[U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds

All of lore.kernel.org
 help / color / mirror / Atom feed

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
@ 2015-08-28 13:50 Lukasz Majewski
  2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-08-28 13:50 UTC (permalink / raw)
  To: u-boot

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
"mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
the default timeout is to short.

The new value - 20 seconds - takes into account the situation when SD card
triggers internal clean up. Such process may take more than 10 seconds on
some cards.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
Cc: Tom Rini <trini@konsulko.com>
---
 drivers/mmc/dw_mmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index 77b87e0..21a92d2 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 
 	if (data) {
 		start = get_timer(0);
-		timeout = 1000;
+		timeout = 20000;
 		for (;;) {
 			mask = dwmci_readl(host, DWMCI_RINTSTS);
 			/* Error during data transfer. */
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console
  2015-08-28 13:50 [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Lukasz Majewski
@ 2015-08-28 13:50 ` Lukasz Majewski
  2015-08-28 23:21   ` Simon Glass
  2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
  2015-08-28 21:55 ` [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Marek Vasut
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-08-28 13:50 UTC (permalink / raw)
  To: u-boot

The timeout error for DW MMC transfer should be visible on the u-boot
console to speed up the process of debugging.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Tom Rini <trini@konsulko.com>
Cc: Simon Glass <sjg@chromium.org>
---
 drivers/mmc/dw_mmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index 21a92d2..98f5cb7 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -231,7 +231,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 
 			/* Check for timeout. */
 			if (get_timer(start) > timeout) {
-				debug("%s: Timeout waiting for data!\n",
+				printf("%s: Timeout waiting for data!\n",
 				       __func__);
 				ret = TIMEOUT;
 				break;
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-28 13:50 [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Lukasz Majewski
  2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
@ 2015-08-28 21:55 ` Marek Vasut
  2015-08-29 11:55   ` Lukasz Majewski
  2015-09-09  7:01 ` Lukasz Majewski
  2015-09-25 16:25 ` [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel) Lukasz Majewski
  3 siblings, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-08-28 21:55 UTC (permalink / raw)
  To: u-boot

On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
> of dw mmc transfer.
> 
> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> the default timeout is to short.
> 
> The new value - 20 seconds - takes into account the situation when SD card
> triggers internal clean up. Such process may take more than 10 seconds on
> some cards.

What happens if you pull the SD card out of the slot during such a process?

Also, where did you find out there is such "cleanup" mechanism please ?

> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> Cc: Tom Rini <trini@konsulko.com>
> ---
>  drivers/mmc/dw_mmc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
> index 77b87e0..21a92d2 100644
> --- a/drivers/mmc/dw_mmc.c
> +++ b/drivers/mmc/dw_mmc.c
> @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct
> mmc_cmd *cmd,
> 
>  	if (data) {
>  		start = get_timer(0);
> -		timeout = 1000;
> +		timeout = 20000;
>  		for (;;) {
>  			mask = dwmci_readl(host, DWMCI_RINTSTS);
>  			/* Error during data transfer. */

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console
  2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
@ 2015-08-28 23:21   ` Simon Glass
  2015-08-29 12:09     ` Lukasz Majewski
  2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
  1 sibling, 1 reply; 50+ messages in thread
From: Simon Glass @ 2015-08-28 23:21 UTC (permalink / raw)
  To: u-boot

Hi Lukasz,

On 28 August 2015 at 07:50, Lukasz Majewski <l.majewski@samsung.com> wrote:
>
> The timeout error for DW MMC transfer should be visible on the u-boot
> console to speed up the process of debugging.
>
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Tom Rini <trini@konsulko.com>
> Cc: Simon Glass <sjg@chromium.org>
> ---
>  drivers/mmc/dw_mmc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Can this be reported at the command line level instead? It feels wrong
to be printing errors in a driver. Also, why does it return TIMEOUT
instead of -ETIMEOUT?

Regards,
Simon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-28 21:55 ` [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Marek Vasut
@ 2015-08-29 11:55   ` Lukasz Majewski
  2015-08-29 13:52     ` Marek Vasut
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-08-29 11:55 UTC (permalink / raw)
  To: u-boot

On Fri, 28 Aug 2015 23:55:17 +0200
Marek Vasut <marex@denx.de> wrote:

> On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
> > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
> > end of dw mmc transfer.
> > 
> > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > the default timeout is to short.
> > 
> > The new value - 20 seconds - takes into account the situation when
> > SD card triggers internal clean up. Such process may take more than
> > 10 seconds on some cards.
> 
> What happens if you pull the SD card out of the slot during such a
> process?

Then we would wait 20 seconds :-) to proceed.

To be clear - the mentioned patch introduced regression. It works for
small files on a commonly available SD cards (like 4 GiB
Kingston/Adata).

When I ran DFU tests I've discovered that there is a problem with
storing 8MiB file (dat_8M.img). 

Even worse - when one wants to store Image.itb file (which might be 4-6
MiB) it sometimes works and sometimes not. Nightmare for debugging.

Please correct me if I'm wrong - but is seems like we are now using 1
second timeout for detection if SD card has been removed?

Shouldn't we use polling on the card detect IO pin instead? We could add
such polling in several places in the MMC subsystem (like we do it
with watchdog).

Marek, Pantelis, what do you think about this?

> 
> Also, where did you find out there is such "cleanup" mechanism
> please ?

Internally we did some tests with several SD cards. We were stunned when
it turned out that for some workloads it took up to 15 seconds to
end write operation for small data.

The culprit is the SD Card embedded controller responsible for FTL -
flash translation layer.
It allows NAND memory on the card to be visible as the block device.
More importantly it also takes care of wear leveling and bad block
management.

Hence, we don't know when it would start housekeeping operations.
We can only poll/wait until this controller finishes it work.
The code as it was (with the indefinite loop) was taking this situation
into account. 

The 1 second timeout is apparently too short and makes using SD card
non-deterministic and error prone in u-boot.

Even worse, many devices use SD card as the only storage device.

> 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Marek Vasut <marex@denx.de>
> > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > Cc: Tom Rini <trini@konsulko.com>
> > ---
> >  drivers/mmc/dw_mmc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
> > index 77b87e0..21a92d2 100644
> > --- a/drivers/mmc/dw_mmc.c
> > +++ b/drivers/mmc/dw_mmc.c
> > @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc,
> > struct mmc_cmd *cmd,
> > 
> >  	if (data) {
> >  		start = get_timer(0);
> > -		timeout = 1000;
> > +		timeout = 20000;
> >  		for (;;) {
> >  			mask = dwmci_readl(host, DWMCI_RINTSTS);
> >  			/* Error during data transfer. */
> 
> Best regards,
> Marek Vasut

Best regards,

Lukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150829/fb200c0f/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console
  2015-08-28 23:21   ` Simon Glass
@ 2015-08-29 12:09     ` Lukasz Majewski
  2015-08-29 15:07       ` Simon Glass
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-08-29 12:09 UTC (permalink / raw)
  To: u-boot

On Fri, 28 Aug 2015 17:21:51 -0600
Simon Glass <sjg@chromium.org> wrote:

> Hi Lukasz,
> 
> On 28 August 2015 at 07:50, Lukasz Majewski <l.majewski@samsung.com>
> wrote:
> >
> > The timeout error for DW MMC transfer should be visible on the
> > u-boot console to speed up the process of debugging.
> >
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Tom Rini <trini@konsulko.com>
> > Cc: Simon Glass <sjg@chromium.org>
> > ---
> >  drivers/mmc/dw_mmc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Can this be reported at the command line level instead? 

With my setup - dfu tests - I didn't receive any error/information
about the MMC (SD Card to be precise) timeout.

(I would expect MMC subsystem to return -ETIMEOUT and then this error
would be propagated to dfu and stop execution of the dfu command).

This didn't work and hence should be scrutinized.

> It feels wrong
> to be printing errors in a driver. 

If this information would be printed on the console I might have
noticed it immediately and save some time for debugging.

> Also, why does it return TIMEOUT
> instead of -ETIMEOUT?

Good question. 

> 
> Regards,
> Simon

Best regards,

Lukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150829/09a5af78/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-29 11:55   ` Lukasz Majewski
@ 2015-08-29 13:52     ` Marek Vasut
  2015-08-29 16:38       ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-08-29 13:52 UTC (permalink / raw)
  To: u-boot

On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:
> On Fri, 28 Aug 2015 23:55:17 +0200

Hi!

> Marek Vasut <marex@denx.de> wrote:
> > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
> > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
> > > end of dw mmc transfer.
> > > 
> > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > the default timeout is to short.
> > > 
> > > The new value - 20 seconds - takes into account the situation when
> > > SD card triggers internal clean up. Such process may take more than
> > > 10 seconds on some cards.
> > 
> > What happens if you pull the SD card out of the slot during such a
> > process?
> 
> Then we would wait 20 seconds :-) to proceed.

Oops, I think I was not clear here. I was wondering what happens to the
card if you yank it out of the slot whole it's performing it's internal
cleanup or whatever. Is it possible that the card suffers data corruption,
effectively trashing user data ? Is this behavior specific to Samsung SD
cards ?

> To be clear - the mentioned patch introduced regression.

That's a bug, not a regression, but anyway, that's not the point. I do
agree with you that we do have a problem and I'm inclined to Ack this
patch, I'd like to understand what the real implications of such a behavior
of these cards are.

> It works for
> small files on a commonly available SD cards (like 4 GiB
> Kingston/Adata).
> 
> When I ran DFU tests I've discovered that there is a problem with
> storing 8MiB file (dat_8M.img).
> 
> Even worse - when one wants to store Image.itb file (which might be 4-6
> MiB) it sometimes works and sometimes not. Nightmare for debugging.

Ew, that's one crappy card you have there. I'm reading the SD card
"Physical Layer Simplified Specification Version 4.10" (part1_410.pdf)
section 4.6.2.2 and it states that for SDHC cards, the write operation
should take at most 250mS, for SDXC it's 500mS. Could it be that your
card is violating the spec ?

> Please correct me if I'm wrong - but is seems like we are now using 1
> second timeout for detection if SD card has been removed?
> 
> Shouldn't we use polling on the card detect IO pin instead? We could add
> such polling in several places in the MMC subsystem (like we do it
> with watchdog).
> 
> Marek, Pantelis, what do you think about this?

If you implement board_mmc_getcd(), you can check if the card is present
this way instead of waiting for command to time out. The infrastructure
for that is already in place. Right ?

It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios
from DT though :)

> > Also, where did you find out there is such "cleanup" mechanism
> > please ?
> 
> Internally we did some tests with several SD cards. We were stunned when
> it turned out that for some workloads it took up to 15 seconds to
> end write operation for small data.
> 
> The culprit is the SD Card embedded controller responsible for FTL -
> flash translation layer.
> It allows NAND memory on the card to be visible as the block device.
> More importantly it also takes care of wear leveling and bad block
> management.
> 
> Hence, we don't know when it would start housekeeping operations.
> We can only poll/wait until this controller finishes it work.
> The code as it was (with the indefinite loop) was taking this situation
> into account.
> 
> The 1 second timeout is apparently too short and makes using SD card
> non-deterministic and error prone in u-boot.
> 
> Even worse, many devices use SD card as the only storage device.

Yes, horrible.

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console
  2015-08-29 12:09     ` Lukasz Majewski
@ 2015-08-29 15:07       ` Simon Glass
  2015-09-03 12:33         ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Simon Glass @ 2015-08-29 15:07 UTC (permalink / raw)
  To: u-boot

Hi Lukasz,

On 29 August 2015 at 06:09, Lukasz Majewski <l.majewski@majess.pl> wrote:
> On Fri, 28 Aug 2015 17:21:51 -0600
> Simon Glass <sjg@chromium.org> wrote:
>
>> Hi Lukasz,
>>
>> On 28 August 2015 at 07:50, Lukasz Majewski <l.majewski@samsung.com>
>> wrote:
>> >
>> > The timeout error for DW MMC transfer should be visible on the
>> > u-boot console to speed up the process of debugging.
>> >
>> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>> > Cc: Tom Rini <trini@konsulko.com>
>> > Cc: Simon Glass <sjg@chromium.org>
>> > ---
>> >  drivers/mmc/dw_mmc.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Can this be reported at the command line level instead?
>
> With my setup - dfu tests - I didn't receive any error/information
> about the MMC (SD Card to be precise) timeout.
>
> (I would expect MMC subsystem to return -ETIMEOUT and then this error
> would be propagated to dfu and stop execution of the dfu command).
>
> This didn't work and hence should be scrutinized.

Agreed.

>
>> It feels wrong
>> to be printing errors in a driver.
>
> If this information would be printed on the console I might have
> noticed it immediately and save some time for debugging.
>
>> Also, why does it return TIMEOUT
>> instead of -ETIMEOUT?
>
> Good question.

It looks likes mmc.h has its own errors (NO_CARD_ERR, etc.). I suspect
these could be converted to use standard errors.

Anyway your question remains. A driver error should be reported by the caller.

Regards,
Simon

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-29 13:52     ` Marek Vasut
@ 2015-08-29 16:38       ` Lukasz Majewski
  2015-08-29 19:19         ` Marek Vasut
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-08-29 16:38 UTC (permalink / raw)
  To: u-boot

Hi Marek,

> On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:
> > On Fri, 28 Aug 2015 23:55:17 +0200
> 
> Hi!
> 
> > Marek Vasut <marex@denx.de> wrote:
> > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
> > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting
> > > > for end of dw mmc transfer.
> > > > 
> > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > > the default timeout is to short.
> > > > 
> > > > The new value - 20 seconds - takes into account the situation
> > > > when SD card triggers internal clean up. Such process may take
> > > > more than 10 seconds on some cards.
> > > 
> > > What happens if you pull the SD card out of the slot during such a
> > > process?
> > 
> > Then we would wait 20 seconds :-) to proceed.
> 
> Oops, I think I was not clear here. I was wondering what happens to
> the card if you yank it out of the slot whole it's performing it's
> internal cleanup or whatever. Is it possible that the card suffers
> data corruption, effectively trashing user data ?

I think that only the card manufacturer may reveal what can happen with
the card (what policy have been implemented in their FW).

I guess that you may lost data in such case.

> Is this behavior
> specific to Samsung SD cards ?

I've experienced the problem with Kingston (brand new one) and Adata
MicroSD HC (4GiB) cards.

> 
> > To be clear - the mentioned patch introduced regression.
> 
> That's a bug, not a regression, but anyway, 
> that's not the point. I do
> agree with you that we do have a problem and I'm inclined to Ack this
> patch, I'd like to understand what the real implications of such a
> behavior of these cards are.
> 
> > It works for
> > small files on a commonly available SD cards (like 4 GiB
> > Kingston/Adata).
> > 
> > When I ran DFU tests I've discovered that there is a problem with
> > storing 8MiB file (dat_8M.img).
> > 
> > Even worse - when one wants to store Image.itb file (which might be
> > 4-6 MiB) it sometimes works and sometimes not. Nightmare for
> > debugging.
> 
> Ew, that's one crappy card you have there. I'm reading the SD card
> "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf)
> section 4.6.2.2 and it states that for SDHC cards, the write operation
> should take at most 250mS, for SDXC it's 500mS. Could it be that your
> card is violating the spec ?

I'll look into the spec and then comment :-). 

For my boards the time was 1.2 seconds for storing 8 MiB file.

> 
> > Please correct me if I'm wrong - but is seems like we are now using
> > 1 second timeout for detection if SD card has been removed?
> > 
> > Shouldn't we use polling on the card detect IO pin instead? We
> > could add such polling in several places in the MMC subsystem (like
> > we do it with watchdog).
> > 
> > Marek, Pantelis, what do you think about this?
> 
> If you implement board_mmc_getcd(), you can check if the card is
> present this way instead of waiting for command to time out. The
> infrastructure for that is already in place. Right ?

So you suggest adding board_mmc_getcd() in several places in the mmc
subsystem driver to detect removal of the SD card? 

> 
> It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios
> from DT though :)

+1

> 
> > > Also, where did you find out there is such "cleanup" mechanism
> > > please ?
> > 
> > Internally we did some tests with several SD cards. We were stunned
> > when it turned out that for some workloads it took up to 15 seconds
> > to end write operation for small data.
> > 
> > The culprit is the SD Card embedded controller responsible for FTL -
> > flash translation layer.
> > It allows NAND memory on the card to be visible as the block device.
> > More importantly it also takes care of wear leveling and bad block
> > management.
> > 
> > Hence, we don't know when it would start housekeeping operations.
> > We can only poll/wait until this controller finishes it work.
> > The code as it was (with the indefinite loop) was taking this
> > situation into account.
> > 
> > The 1 second timeout is apparently too short and makes using SD card
> > non-deterministic and error prone in u-boot.
> > 
> > Even worse, many devices use SD card as the only storage device.
> 
> Yes, horrible.

Good that we have agreed.

> 
> Best regards,
> Marek Vasut

Best regards,

?ukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150829/2b634bd5/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-29 16:38       ` Lukasz Majewski
@ 2015-08-29 19:19         ` Marek Vasut
  2015-09-01 11:19           ` Lukasz Majewski
  2015-09-01 16:22           ` Pantelis Antoniou
  0 siblings, 2 replies; 50+ messages in thread
From: Marek Vasut @ 2015-08-29 19:19 UTC (permalink / raw)
  To: u-boot

On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi Lukasz,

> > On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:
> > > On Fri, 28 Aug 2015 23:55:17 +0200
> > 
> > Hi!
> > 
> > > Marek Vasut <marex@denx.de> wrote:
> > > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
> > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting
> > > > > for end of dw mmc transfer.
> > > > > 
> > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > > > the default timeout is to short.
> > > > > 
> > > > > The new value - 20 seconds - takes into account the situation
> > > > > when SD card triggers internal clean up. Such process may take
> > > > > more than 10 seconds on some cards.
> > > > 
> > > > What happens if you pull the SD card out of the slot during such a
> > > > process?
> > > 
> > > Then we would wait 20 seconds :-) to proceed.
> > 
> > Oops, I think I was not clear here. I was wondering what happens to
> > the card if you yank it out of the slot whole it's performing it's
> > internal cleanup or whatever. Is it possible that the card suffers
> > data corruption, effectively trashing user data ?
> 
> I think that only the card manufacturer may reveal what can happen with
> the card (what policy have been implemented in their FW).
> 
> I guess that you may lost data in such case.

Uuuurgh, that's seriously shitty.

> > Is this behavior
> > specific to Samsung SD cards ?
> 
> I've experienced the problem with Kingston (brand new one) and Adata
> MicroSD HC (4GiB) cards.

I had bad previous experience with Kingston too.

> > > To be clear - the mentioned patch introduced regression.
> > 
> > That's a bug, not a regression, but anyway,
> > that's not the point. I do
> > agree with you that we do have a problem and I'm inclined to Ack this
> > patch, I'd like to understand what the real implications of such a
> > behavior of these cards are.
> > 
> > > It works for
> > > small files on a commonly available SD cards (like 4 GiB
> > > Kingston/Adata).
> > > 
> > > When I ran DFU tests I've discovered that there is a problem with
> > > storing 8MiB file (dat_8M.img).
> > > 
> > > Even worse - when one wants to store Image.itb file (which might be
> > > 4-6 MiB) it sometimes works and sometimes not. Nightmare for
> > > debugging.
> > 
> > Ew, that's one crappy card you have there. I'm reading the SD card
> > "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf)
> > section 4.6.2.2 and it states that for SDHC cards, the write operation
> > should take at most 250mS, for SDXC it's 500mS. Could it be that your
> > card is violating the spec ?
> 
> I'll look into the spec and then comment :-).
> 
> For my boards the time was 1.2 seconds for storing 8 MiB file.

OK, but that's weird. The transfer should always be up to 512B long and not
any longer, unless it's a multiblock transfer.

> > > Please correct me if I'm wrong - but is seems like we are now using
> > > 1 second timeout for detection if SD card has been removed?
> > > 
> > > Shouldn't we use polling on the card detect IO pin instead? We
> > > could add such polling in several places in the MMC subsystem (like
> > > we do it with watchdog).
> > > 
> > > Marek, Pantelis, what do you think about this?
> > 
> > If you implement board_mmc_getcd(), you can check if the card is
> > present this way instead of waiting for command to time out. The
> > infrastructure for that is already in place. Right ?
> 
> So you suggest adding board_mmc_getcd() in several places in the mmc
> subsystem driver to detect removal of the SD card?

Hmmmm, I'm not sure about this one. Panto ?

> > It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios
> > from DT though :)
> 
> +1
> 
> > > > Also, where did you find out there is such "cleanup" mechanism
> > > > please ?
> > > 
> > > Internally we did some tests with several SD cards. We were stunned
> > > when it turned out that for some workloads it took up to 15 seconds
> > > to end write operation for small data.
> > > 
> > > The culprit is the SD Card embedded controller responsible for FTL -
> > > flash translation layer.
> > > It allows NAND memory on the card to be visible as the block device.
> > > More importantly it also takes care of wear leveling and bad block
> > > management.
> > > 
> > > Hence, we don't know when it would start housekeeping operations.
> > > We can only poll/wait until this controller finishes it work.
> > > The code as it was (with the indefinite loop) was taking this
> > > situation into account.
> > > 
> > > The 1 second timeout is apparently too short and makes using SD card
> > > non-deterministic and error prone in u-boot.
> > > 
> > > Even worse, many devices use SD card as the only storage device.
> > 
> > Yes, horrible.
> 
> Good that we have agreed.

Heh :)

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-29 19:19         ` Marek Vasut
@ 2015-09-01 11:19           ` Lukasz Majewski
  2015-09-01 11:33             ` Marek Vasut
  2015-09-01 16:22           ` Pantelis Antoniou
  1 sibling, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-01 11:19 UTC (permalink / raw)
  To: u-boot

Hi Marek,

> On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:
> > Hi Marek,
> 
> Hi Lukasz,
> 
> > > On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski
> > > wrote:
> > > > On Fri, 28 Aug 2015 23:55:17 +0200
> > > 
> > > Hi!
> > > 
> > > > Marek Vasut <marex@denx.de> wrote:
> > > > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski
> > > > > wrote:
> > > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop
> > > > > > waiting for end of dw mmc transfer.
> > > > > > 
> > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB
> > > > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata
> > > > > > 4GiB) the default timeout is to short.
> > > > > > 
> > > > > > The new value - 20 seconds - takes into account the
> > > > > > situation when SD card triggers internal clean up. Such
> > > > > > process may take more than 10 seconds on some cards.
> > > > > 
> > > > > What happens if you pull the SD card out of the slot during
> > > > > such a process?
> > > > 
> > > > Then we would wait 20 seconds :-) to proceed.
> > > 
> > > Oops, I think I was not clear here. I was wondering what happens
> > > to the card if you yank it out of the slot whole it's performing
> > > it's internal cleanup or whatever. Is it possible that the card
> > > suffers data corruption, effectively trashing user data ?
> > 
> > I think that only the card manufacturer may reveal what can happen
> > with the card (what policy have been implemented in their FW).
> > 
> > I guess that you may lost data in such case.
> 
> Uuuurgh, that's seriously shitty.
> 
> > > Is this behavior
> > > specific to Samsung SD cards ?
> > 
> > I've experienced the problem with Kingston (brand new one) and Adata
> > MicroSD HC (4GiB) cards.
> 
> I had bad previous experience with Kingston too.
> 
> > > > To be clear - the mentioned patch introduced regression.
> > > 
> > > That's a bug, not a regression, but anyway,
> > > that's not the point. I do
> > > agree with you that we do have a problem and I'm inclined to Ack
> > > this patch, I'd like to understand what the real implications of
> > > such a behavior of these cards are.
> > > 
> > > > It works for
> > > > small files on a commonly available SD cards (like 4 GiB
> > > > Kingston/Adata).
> > > > 
> > > > When I ran DFU tests I've discovered that there is a problem
> > > > with storing 8MiB file (dat_8M.img).
> > > > 
> > > > Even worse - when one wants to store Image.itb file (which
> > > > might be 4-6 MiB) it sometimes works and sometimes not.
> > > > Nightmare for debugging.
> > > 
> > > Ew, that's one crappy card you have there. I'm reading the SD card
> > > "Physical Layer Simplified Specification Version
> > > 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC
> > > cards, the write operation should take at most 250mS, for SDXC
> > > it's 500mS. Could it be that your card is violating the spec ?

The "timeout" error is for situation when you issue write command
(either single or multiple block) and you don't receive any response
from the card.

In our case we use multiblock transfer (CMD25) with either set number
of block to transfer (CMD23) or explicit end of transmission (CMD12).

Let's consider the second case.

We setup data and issue CMD25. Then we check the CMD25 status (if we
don't receive reply in 250 ms we would get timeout error).
Afterwards data from buffer is transmitted to the card and flashed.
This operation may take long time. During this process you can issue
CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. -
page 85).

Two notable fields of [*] to check are READY_FOR_DATA and CURRENT_STATE.
Those two state what is the SD Card controller situation.

Then, you end the transfer with CMD12, which also provides some status
information from the SDCard (like "prg"|"rcv", etc).

If you want to issue next command you must check READY_FOR_DATA and
CURRENT_STATE. In the case of internal SD card controller operation you
would not get READY_FOR_DATA until it ends.


> > 
> > I'll look into the spec and then comment :-).
> > 
> > For my boards the time was 1.2 seconds for storing 8 MiB file.
> 
> OK, but that's weird. The transfer should always be up to 512B long
> and not any longer, unless it's a multiblock transfer.

It is a multi block transfer.

> 
> > > > Please correct me if I'm wrong - but is seems like we are now
> > > > using 1 second timeout for detection if SD card has been
> > > > removed?
> > > > 
> > > > Shouldn't we use polling on the card detect IO pin instead? We
> > > > could add such polling in several places in the MMC subsystem
> > > > (like we do it with watchdog).
> > > > 
> > > > Marek, Pantelis, what do you think about this?
> > > 
> > > If you implement board_mmc_getcd(), you can check if the card is
> > > present this way instead of waiting for command to time out. The
> > > infrastructure for that is already in place. Right ?
> > 
> > So you suggest adding board_mmc_getcd() in several places in the mmc
> > subsystem driver to detect removal of the SD card?
> 
> Hmmmm, I'm not sure about this one. Panto ?
> 
> > > It'd be cool if the MMC subsystem could pull the wp-gpios and
> > > cd-gpios from DT though :)
> > 
> > +1
> > 
> > > > > Also, where did you find out there is such "cleanup" mechanism
> > > > > please ?
> > > > 
> > > > Internally we did some tests with several SD cards. We were
> > > > stunned when it turned out that for some workloads it took up
> > > > to 15 seconds to end write operation for small data.
> > > > 
> > > > The culprit is the SD Card embedded controller responsible for
> > > > FTL - flash translation layer.
> > > > It allows NAND memory on the card to be visible as the block
> > > > device. More importantly it also takes care of wear leveling
> > > > and bad block management.
> > > > 
> > > > Hence, we don't know when it would start housekeeping
> > > > operations. We can only poll/wait until this controller
> > > > finishes it work. The code as it was (with the indefinite loop)
> > > > was taking this situation into account.
> > > > 
> > > > The 1 second timeout is apparently too short and makes using SD
> > > > card non-deterministic and error prone in u-boot.
> > > > 
> > > > Even worse, many devices use SD card as the only storage device.
> > > 
> > > Yes, horrible.
> > 
> > Good that we have agreed.
> 
> Heh :)
> 
> Best regards,
> Marek Vasut



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-01 11:19           ` Lukasz Majewski
@ 2015-09-01 11:33             ` Marek Vasut
  2015-09-01 15:25               ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-09-01 11:33 UTC (permalink / raw)
  To: u-boot

On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi!

> > On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:
> > > Hi Marek,
> > 
> > Hi Lukasz,
> > 
> > > > On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski
> > > > 
> > > > wrote:
> > > > > On Fri, 28 Aug 2015 23:55:17 +0200
> > > > 
> > > > Hi!
> > > > 
> > > > > Marek Vasut <marex@denx.de> wrote:
> > > > > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski
> > > > > > 
> > > > > > wrote:
> > > > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop
> > > > > > > waiting for end of dw mmc transfer.
> > > > > > > 
> > > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB
> > > > > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata
> > > > > > > 4GiB) the default timeout is to short.
> > > > > > > 
> > > > > > > The new value - 20 seconds - takes into account the
> > > > > > > situation when SD card triggers internal clean up. Such
> > > > > > > process may take more than 10 seconds on some cards.
> > > > > > 
> > > > > > What happens if you pull the SD card out of the slot during
> > > > > > such a process?
> > > > > 
> > > > > Then we would wait 20 seconds :-) to proceed.
> > > > 
> > > > Oops, I think I was not clear here. I was wondering what happens
> > > > to the card if you yank it out of the slot whole it's performing
> > > > it's internal cleanup or whatever. Is it possible that the card
> > > > suffers data corruption, effectively trashing user data ?
> > > 
> > > I think that only the card manufacturer may reveal what can happen
> > > with the card (what policy have been implemented in their FW).
> > > 
> > > I guess that you may lost data in such case.
> > 
> > Uuuurgh, that's seriously shitty.
> > 
> > > > Is this behavior
> > > > specific to Samsung SD cards ?
> > > 
> > > I've experienced the problem with Kingston (brand new one) and Adata
> > > MicroSD HC (4GiB) cards.
> > 
> > I had bad previous experience with Kingston too.
> > 
> > > > > To be clear - the mentioned patch introduced regression.
> > > > 
> > > > That's a bug, not a regression, but anyway,
> > > > that's not the point. I do
> > > > agree with you that we do have a problem and I'm inclined to Ack
> > > > this patch, I'd like to understand what the real implications of
> > > > such a behavior of these cards are.
> > > > 
> > > > > It works for
> > > > > small files on a commonly available SD cards (like 4 GiB
> > > > > Kingston/Adata).
> > > > > 
> > > > > When I ran DFU tests I've discovered that there is a problem
> > > > > with storing 8MiB file (dat_8M.img).
> > > > > 
> > > > > Even worse - when one wants to store Image.itb file (which
> > > > > might be 4-6 MiB) it sometimes works and sometimes not.
> > > > > Nightmare for debugging.
> > > > 
> > > > Ew, that's one crappy card you have there. I'm reading the SD card
> > > > "Physical Layer Simplified Specification Version
> > > > 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for SDHC
> > > > cards, the write operation should take at most 250mS, for SDXC
> > > > it's 500mS. Could it be that your card is violating the spec ?
> 
> The "timeout" error is for situation when you issue write command
> (either single or multiple block) and you don't receive any response
> from the card.
> 
> In our case we use multiblock transfer (CMD25) with either set number
> of block to transfer (CMD23) or explicit end of transmission (CMD12).
> 
> Let's consider the second case.
> 
> We setup data and issue CMD25. Then we check the CMD25 status (if we
> don't receive reply in 250 ms we would get timeout error).
> Afterwards data from buffer is transmitted to the card and flashed.
> This operation may take long time. During this process you can issue
> CMD13 (SEND_STATUS) to receive information about card state ([*] 4.10. -
> page 85).

Maybe we should implement such polling through CMD13 then ? But, this
should be a matter for next MW, this MW we should go with increasing
the timeout to some 20 or 30 second. What do you think ?

> Two notable fields of [*] to check are READY_FOR_DATA and CURRENT_STATE.
> Those two state what is the SD Card controller situation.
> 
> Then, you end the transfer with CMD12, which also provides some status
> information from the SDCard (like "prg"|"rcv", etc).
> 
> If you want to issue next command you must check READY_FOR_DATA and
> CURRENT_STATE. In the case of internal SD card controller operation you
> would not get READY_FOR_DATA until it ends.
> 
> > > I'll look into the spec and then comment :-).
> > > 
> > > For my boards the time was 1.2 seconds for storing 8 MiB file.
> > 
> > OK, but that's weird. The transfer should always be up to 512B long
> > and not any longer, unless it's a multiblock transfer.
> 
> It is a multi block transfer.

[...]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-01 11:33             ` Marek Vasut
@ 2015-09-01 15:25               ` Lukasz Majewski
  2015-09-01 15:35                 ` Marek Vasut
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-01 15:25 UTC (permalink / raw)
  To: u-boot

Hi Marek,

> On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:
> > Hi Marek,
> 
> Hi!
> 
> > > On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski
> > > wrote:
> > > > Hi Marek,
> > > 
> > > Hi Lukasz,
> > > 
> > > > > On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski
> > > > > 
> > > > > wrote:
> > > > > > On Fri, 28 Aug 2015 23:55:17 +0200
> > > > > 
> > > > > Hi!
> > > > > 
> > > > > > Marek Vasut <marex@denx.de> wrote:
> > > > > > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop
> > > > > > > > waiting for end of dw mmc transfer.
> > > > > > > > 
> > > > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB
> > > > > > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata
> > > > > > > > 4GiB) the default timeout is to short.
> > > > > > > > 
> > > > > > > > The new value - 20 seconds - takes into account the
> > > > > > > > situation when SD card triggers internal clean up. Such
> > > > > > > > process may take more than 10 seconds on some cards.
> > > > > > > 
> > > > > > > What happens if you pull the SD card out of the slot
> > > > > > > during such a process?
> > > > > > 
> > > > > > Then we would wait 20 seconds :-) to proceed.
> > > > > 
> > > > > Oops, I think I was not clear here. I was wondering what
> > > > > happens to the card if you yank it out of the slot whole it's
> > > > > performing it's internal cleanup or whatever. Is it possible
> > > > > that the card suffers data corruption, effectively trashing
> > > > > user data ?
> > > > 
> > > > I think that only the card manufacturer may reveal what can
> > > > happen with the card (what policy have been implemented in
> > > > their FW).
> > > > 
> > > > I guess that you may lost data in such case.
> > > 
> > > Uuuurgh, that's seriously shitty.
> > > 
> > > > > Is this behavior
> > > > > specific to Samsung SD cards ?
> > > > 
> > > > I've experienced the problem with Kingston (brand new one) and
> > > > Adata MicroSD HC (4GiB) cards.
> > > 
> > > I had bad previous experience with Kingston too.
> > > 
> > > > > > To be clear - the mentioned patch introduced regression.
> > > > > 
> > > > > That's a bug, not a regression, but anyway,
> > > > > that's not the point. I do
> > > > > agree with you that we do have a problem and I'm inclined to
> > > > > Ack this patch, I'd like to understand what the real
> > > > > implications of such a behavior of these cards are.
> > > > > 
> > > > > > It works for
> > > > > > small files on a commonly available SD cards (like 4 GiB
> > > > > > Kingston/Adata).
> > > > > > 
> > > > > > When I ran DFU tests I've discovered that there is a problem
> > > > > > with storing 8MiB file (dat_8M.img).
> > > > > > 
> > > > > > Even worse - when one wants to store Image.itb file (which
> > > > > > might be 4-6 MiB) it sometimes works and sometimes not.
> > > > > > Nightmare for debugging.
> > > > > 
> > > > > Ew, that's one crappy card you have there. I'm reading the SD
> > > > > card "Physical Layer Simplified Specification Version
> > > > > 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for
> > > > > SDHC cards, the write operation should take at most 250mS,
> > > > > for SDXC it's 500mS. Could it be that your card is violating
> > > > > the spec ?
> > 
> > The "timeout" error is for situation when you issue write command
> > (either single or multiple block) and you don't receive any response
> > from the card.
> > 
> > In our case we use multiblock transfer (CMD25) with either set
> > number of block to transfer (CMD23) or explicit end of transmission
> > (CMD12).
> > 
> > Let's consider the second case.
> > 
> > We setup data and issue CMD25. Then we check the CMD25 status (if we
> > don't receive reply in 250 ms we would get timeout error).
> > Afterwards data from buffer is transmitted to the card and flashed.
> > This operation may take long time. During this process you can issue
> > CMD13 (SEND_STATUS) to receive information about card state ([*]
> > 4.10. - page 85).
> 
> Maybe we should implement such polling through CMD13 then ? But, this
> should be a matter for next MW, this MW we should go with increasing
> the timeout to some 20 or 30 second. What do you think ?

For next release we can increase the timeout. Rewritting eMMC subsystem
code is a challenging task and we will not finish until the next u-boot
release.

> 
> > Two notable fields of [*] to check are READY_FOR_DATA and
> > CURRENT_STATE. Those two state what is the SD Card controller
> > situation.
> > 
> > Then, you end the transfer with CMD12, which also provides some
> > status information from the SDCard (like "prg"|"rcv", etc).
> > 
> > If you want to issue next command you must check READY_FOR_DATA and
> > CURRENT_STATE. In the case of internal SD card controller operation
> > you would not get READY_FOR_DATA until it ends.
> > 
> > > > I'll look into the spec and then comment :-).
> > > > 
> > > > For my boards the time was 1.2 seconds for storing 8 MiB file.
> > > 
> > > OK, but that's weird. The transfer should always be up to 512B
> > > long and not any longer, unless it's a multiblock transfer.
> > 
> > It is a multi block transfer.
> 
> [...]



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-01 15:25               ` Lukasz Majewski
@ 2015-09-01 15:35                 ` Marek Vasut
  0 siblings, 0 replies; 50+ messages in thread
From: Marek Vasut @ 2015-09-01 15:35 UTC (permalink / raw)
  To: u-boot

On Tuesday, September 01, 2015 at 05:25:00 PM, Lukasz Majewski wrote:
> Hi Marek,
> 
> > On Tuesday, September 01, 2015 at 01:19:09 PM, Lukasz Majewski wrote:
> > > Hi Marek,
> > 
> > Hi!
> > 
> > > > On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski
> > > > 
> > > > wrote:
> > > > > Hi Marek,
> > > > 
> > > > Hi Lukasz,
> > > > 
> > > > > > On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski
> > > > > > 
> > > > > > wrote:
> > > > > > > On Fri, 28 Aug 2015 23:55:17 +0200
> > > > > > 
> > > > > > Hi!
> > > > > > 
> > > > > > > Marek Vasut <marex@denx.de> wrote:
> > > > > > > > On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski
> > > > > > > > 
> > > > > > > > wrote:
> > > > > > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop
> > > > > > > > > waiting for end of dw mmc transfer.
> > > > > > > > > 
> > > > > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB
> > > > > > > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata
> > > > > > > > > 4GiB) the default timeout is to short.
> > > > > > > > > 
> > > > > > > > > The new value - 20 seconds - takes into account the
> > > > > > > > > situation when SD card triggers internal clean up. Such
> > > > > > > > > process may take more than 10 seconds on some cards.
> > > > > > > > 
> > > > > > > > What happens if you pull the SD card out of the slot
> > > > > > > > during such a process?
> > > > > > > 
> > > > > > > Then we would wait 20 seconds :-) to proceed.
> > > > > > 
> > > > > > Oops, I think I was not clear here. I was wondering what
> > > > > > happens to the card if you yank it out of the slot whole it's
> > > > > > performing it's internal cleanup or whatever. Is it possible
> > > > > > that the card suffers data corruption, effectively trashing
> > > > > > user data ?
> > > > > 
> > > > > I think that only the card manufacturer may reveal what can
> > > > > happen with the card (what policy have been implemented in
> > > > > their FW).
> > > > > 
> > > > > I guess that you may lost data in such case.
> > > > 
> > > > Uuuurgh, that's seriously shitty.
> > > > 
> > > > > > Is this behavior
> > > > > > specific to Samsung SD cards ?
> > > > > 
> > > > > I've experienced the problem with Kingston (brand new one) and
> > > > > Adata MicroSD HC (4GiB) cards.
> > > > 
> > > > I had bad previous experience with Kingston too.
> > > > 
> > > > > > > To be clear - the mentioned patch introduced regression.
> > > > > > 
> > > > > > That's a bug, not a regression, but anyway,
> > > > > > that's not the point. I do
> > > > > > agree with you that we do have a problem and I'm inclined to
> > > > > > Ack this patch, I'd like to understand what the real
> > > > > > implications of such a behavior of these cards are.
> > > > > > 
> > > > > > > It works for
> > > > > > > small files on a commonly available SD cards (like 4 GiB
> > > > > > > Kingston/Adata).
> > > > > > > 
> > > > > > > When I ran DFU tests I've discovered that there is a problem
> > > > > > > with storing 8MiB file (dat_8M.img).
> > > > > > > 
> > > > > > > Even worse - when one wants to store Image.itb file (which
> > > > > > > might be 4-6 MiB) it sometimes works and sometimes not.
> > > > > > > Nightmare for debugging.
> > > > > > 
> > > > > > Ew, that's one crappy card you have there. I'm reading the SD
> > > > > > card "Physical Layer Simplified Specification Version
> > > > > > 4.10" (part1_410.pdf) section 4.6.2.2 and it states that for
> > > > > > SDHC cards, the write operation should take at most 250mS,
> > > > > > for SDXC it's 500mS. Could it be that your card is violating
> > > > > > the spec ?
> > > 
> > > The "timeout" error is for situation when you issue write command
> > > (either single or multiple block) and you don't receive any response
> > > from the card.
> > > 
> > > In our case we use multiblock transfer (CMD25) with either set
> > > number of block to transfer (CMD23) or explicit end of transmission
> > > (CMD12).
> > > 
> > > Let's consider the second case.
> > > 
> > > We setup data and issue CMD25. Then we check the CMD25 status (if we
> > > don't receive reply in 250 ms we would get timeout error).
> > > Afterwards data from buffer is transmitted to the card and flashed.
> > > This operation may take long time. During this process you can issue
> > > CMD13 (SEND_STATUS) to receive information about card state ([*]
> > > 4.10. - page 85).
> > 
> > Maybe we should implement such polling through CMD13 then ? But, this
> > should be a matter for next MW, this MW we should go with increasing
> > the timeout to some 20 or 30 second. What do you think ?
> 
> For next release we can increase the timeout. Rewritting eMMC subsystem
> code is a challenging task and we will not finish until the next u-boot
> release.

Yeah, we're in agreement.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-29 19:19         ` Marek Vasut
  2015-09-01 11:19           ` Lukasz Majewski
@ 2015-09-01 16:22           ` Pantelis Antoniou
  2015-09-02  8:06             ` Marek Vasut
  1 sibling, 1 reply; 50+ messages in thread
From: Pantelis Antoniou @ 2015-09-01 16:22 UTC (permalink / raw)
  To: u-boot

Hi Marek,

> On Aug 29, 2015, at 22:19 , Marek Vasut <marex@denx.de> wrote:
> 
> On Saturday, August 29, 2015 at 06:38:48 PM, Lukasz Majewski wrote:
>> Hi Marek,
> 
> Hi Lukasz,
> 
>>> On Saturday, August 29, 2015 at 01:55:36 PM, Lukasz Majewski wrote:
>>>> On Fri, 28 Aug 2015 23:55:17 +0200
>>> 
>>> Hi!
>>> 
>>>> Marek Vasut <marex@denx.de> wrote:
>>>>> On Friday, August 28, 2015 at 03:50:20 PM, Lukasz Majewski wrote:
>>>>>> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
>>>>>> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting
>>>>>> for end of dw mmc transfer.
>>>>>> 
>>>>>> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
>>>>>> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
>>>>>> the default timeout is to short.
>>>>>> 
>>>>>> The new value - 20 seconds - takes into account the situation
>>>>>> when SD card triggers internal clean up. Such process may take
>>>>>> more than 10 seconds on some cards.
>>>>> 
>>>>> What happens if you pull the SD card out of the slot during such a
>>>>> process?
>>>> 
>>>> Then we would wait 20 seconds :-) to proceed.
>>> 
>>> Oops, I think I was not clear here. I was wondering what happens to
>>> the card if you yank it out of the slot whole it's performing it's
>>> internal cleanup or whatever. Is it possible that the card suffers
>>> data corruption, effectively trashing user data ?
>> 
>> I think that only the card manufacturer may reveal what can happen with
>> the card (what policy have been implemented in their FW).
>> 
>> I guess that you may lost data in such case.
> 
> Uuuurgh, that's seriously shitty.
> 

Welcome to the world of managed flash. Manufacturers tend not to follow the
spec to the letter. What matters is standard consumer usage benchmarks.

>>> Is this behavior
>>> specific to Samsung SD cards ?
>> 
>> I've experienced the problem with Kingston (brand new one) and Adata
>> MicroSD HC (4GiB) cards.
> 
> I had bad previous experience with Kingston too.
> 
>>>> To be clear - the mentioned patch introduced regression.
>>> 
>>> That's a bug, not a regression, but anyway,
>>> that's not the point. I do
>>> agree with you that we do have a problem and I'm inclined to Ack this
>>> patch, I'd like to understand what the real implications of such a
>>> behavior of these cards are.
>>> 
>>>> It works for
>>>> small files on a commonly available SD cards (like 4 GiB
>>>> Kingston/Adata).
>>>> 
>>>> When I ran DFU tests I've discovered that there is a problem with
>>>> storing 8MiB file (dat_8M.img).
>>>> 
>>>> Even worse - when one wants to store Image.itb file (which might be
>>>> 4-6 MiB) it sometimes works and sometimes not. Nightmare for
>>>> debugging.
>>> 
>>> Ew, that's one crappy card you have there. I'm reading the SD card
>>> "Physical Layer Simplified Specification Version 4.10" (part1_410.pdf)
>>> section 4.6.2.2 and it states that for SDHC cards, the write operation
>>> should take at most 250mS, for SDXC it's 500mS. Could it be that your
>>> card is violating the spec ?
>> 
>> I'll look into the spec and then comment :-).
>> 
>> For my boards the time was 1.2 seconds for storing 8 MiB file.
> 
> OK, but that's weird. The transfer should always be up to 512B long and not
> any longer, unless it's a multiblock transfer.
> 
>>>> Please correct me if I'm wrong - but is seems like we are now using
>>>> 1 second timeout for detection if SD card has been removed?
>>>> 
>>>> Shouldn't we use polling on the card detect IO pin instead? We
>>>> could add such polling in several places in the MMC subsystem (like
>>>> we do it with watchdog).
>>>> 
>>>> Marek, Pantelis, what do you think about this?
>>> 
>>> If you implement board_mmc_getcd(), you can check if the card is
>>> present this way instead of waiting for command to time out. The
>>> infrastructure for that is already in place. Right ?
>> 
>> So you suggest adding board_mmc_getcd() in several places in the mmc
>> subsystem driver to detect removal of the SD card?
> 
> Hmmmm, I'm not sure about this one. Panto ?
> 

I?m fine with this. Perhaps we can avoid spamming this all over the place,
and that would be better.

>>> It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios
>>> from DT though :)
>> 
>> +1
>> 

Indeed. I would expect this to be a per-board thing. I would not expect
all platforms to provide that capability.

>>>>> Also, where did you find out there is such "cleanup" mechanism
>>>>> please ?
>>>> 
>>>> Internally we did some tests with several SD cards. We were stunned
>>>> when it turned out that for some workloads it took up to 15 seconds
>>>> to end write operation for small data.
>>>> 
>>>> The culprit is the SD Card embedded controller responsible for FTL -
>>>> flash translation layer.
>>>> It allows NAND memory on the card to be visible as the block device.
>>>> More importantly it also takes care of wear leveling and bad block
>>>> management.
>>>> 
>>>> Hence, we don't know when it would start housekeeping operations.
>>>> We can only poll/wait until this controller finishes it work.
>>>> The code as it was (with the indefinite loop) was taking this
>>>> situation into account.
>>>> 
>>>> The 1 second timeout is apparently too short and makes using SD card
>>>> non-deterministic and error prone in u-boot.
>>>> 
>>>> Even worse, many devices use SD card as the only storage device.
>>> 
>>> Yes, horrible.
>> 
>> Good that we have agreed.
> 
> Heh :)
> 

Horrible closed source flash management algorithm is horrible, film at 11 :)

> Best regards,
> Marek Vasut

Regards

? Pantelis

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-01 16:22           ` Pantelis Antoniou
@ 2015-09-02  8:06             ` Marek Vasut
  0 siblings, 0 replies; 50+ messages in thread
From: Marek Vasut @ 2015-09-02  8:06 UTC (permalink / raw)
  To: u-boot

On Tuesday, September 01, 2015 at 06:22:08 PM, Pantelis Antoniou wrote:
> Hi Marek,

Hi!

[...]
> >> So you suggest adding board_mmc_getcd() in several places in the mmc
> >> subsystem driver to detect removal of the SD card?
> > 
> > Hmmmm, I'm not sure about this one. Panto ?
> 
> I?m fine with this. Perhaps we can avoid spamming this all over the place,
> and that would be better.
> 
> >>> It'd be cool if the MMC subsystem could pull the wp-gpios and cd-gpios
> >>> from DT though :)
> >> 
> >> +1
> 
> Indeed. I would expect this to be a per-board thing. I would not expect
> all platforms to provide that capability.

That's right.

> >>>>> Also, where did you find out there is such "cleanup" mechanism
> >>>>> please ?
> >>>> 
> >>>> Internally we did some tests with several SD cards. We were stunned
> >>>> when it turned out that for some workloads it took up to 15 seconds
> >>>> to end write operation for small data.
> >>>> 
> >>>> The culprit is the SD Card embedded controller responsible for FTL -
> >>>> flash translation layer.
> >>>> It allows NAND memory on the card to be visible as the block device.
> >>>> More importantly it also takes care of wear leveling and bad block
> >>>> management.
> >>>> 
> >>>> Hence, we don't know when it would start housekeeping operations.
> >>>> We can only poll/wait until this controller finishes it work.
> >>>> The code as it was (with the indefinite loop) was taking this
> >>>> situation into account.
> >>>> 
> >>>> The 1 second timeout is apparently too short and makes using SD card
> >>>> non-deterministic and error prone in u-boot.
> >>>> 
> >>>> Even worse, many devices use SD card as the only storage device.
> >>> 
> >>> Yes, horrible.
> >> 
> >> Good that we have agreed.
> > 
> > Heh :)
> 
> Horrible closed source flash management algorithm is horrible, film at 11
> :)

It was some seriously repugnant flick ...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
  2015-08-28 23:21   ` Simon Glass
@ 2015-09-03 12:21   ` Lukasz Majewski
  2015-09-03 12:44     ` Tom Rini
                       ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-03 12:21 UTC (permalink / raw)
  To: u-boot

It is very common that FAT code is using following pattern:
if (disk_{read|write}() < 0)
        return -1;

Up till now the above code was dead, since disk_{read|write) could only
return value >= 0.
As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at
struct block_dev_desc (@part.h). It returns unsigned long, where 0
indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase}
when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>

Test HW: Odroid XU3 - Exynos 5433
---
 fs/fat/fat.c       | 11 +++++++++--
 fs/fat/fat_write.c | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index bccc3e3..d743014 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;
 
 static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
 {
+	ulong ret;
+
 	if (!cur_dev || !cur_dev->block_read)
 		return -1;
 
-	return cur_dev->block_read(cur_dev->dev,
-			cur_part_info.start + block, nr_blocks, buf);
+	ret = cur_dev->block_read(cur_dev->dev,
+				  cur_part_info.start + block, nr_blocks, buf);
+
+	if (nr_blocks && ret == 0)
+		return -1;
+
+	return ret;
 }
 
 int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t *info)
diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
index 98b88ad..adb6940 100644
--- a/fs/fat/fat_write.c
+++ b/fs/fat/fat_write.c
@@ -30,6 +30,8 @@ static void uppercase(char *str, int len)
 static int total_sector;
 static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
 {
+	ulong ret;
+
 	if (!cur_dev || !cur_dev->block_write)
 		return -1;
 
@@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
 		return -1;
 	}
 
-	return cur_dev->block_write(cur_dev->dev,
-			cur_part_info.start + block, nr_blocks,	buf);
+	ret = cur_dev->block_write(cur_dev->dev,
+				   cur_part_info.start + block,
+				   nr_blocks, buf);
+	if (nr_blocks && ret == 0)
+		return -1;
+
+	return ret;
 }
 
 /*
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console
  2015-08-29 15:07       ` Simon Glass
@ 2015-09-03 12:33         ` Lukasz Majewski
  0 siblings, 0 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-03 12:33 UTC (permalink / raw)
  To: u-boot

Hi Simon,

> Hi Lukasz,
> 
> On 29 August 2015 at 06:09, Lukasz Majewski <l.majewski@majess.pl>
> wrote:
> > On Fri, 28 Aug 2015 17:21:51 -0600
> > Simon Glass <sjg@chromium.org> wrote:
> >
> >> Hi Lukasz,
> >>
> >> On 28 August 2015 at 07:50, Lukasz Majewski
> >> <l.majewski@samsung.com> wrote:
> >> >
> >> > The timeout error for DW MMC transfer should be visible on the
> >> > u-boot console to speed up the process of debugging.
> >> >
> >> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> >> > Cc: Tom Rini <trini@konsulko.com>
> >> > Cc: Simon Glass <sjg@chromium.org>
> >> > ---
> >> >  drivers/mmc/dw_mmc.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Can this be reported at the command line level instead?
> >
> > With my setup - dfu tests - I didn't receive any error/information
> > about the MMC (SD Card to be precise) timeout.
> >
> > (I would expect MMC subsystem to return -ETIMEOUT and then this
> > error would be propagated to dfu and stop execution of the dfu
> > command).
> >
> > This didn't work and hence should be scrutinized.
> 
> Agreed.
> 
> >
> >> It feels wrong
> >> to be printing errors in a driver.
> >
> > If this information would be printed on the console I might have
> > noticed it immediately and save some time for debugging.
> >
> >> Also, why does it return TIMEOUT
> >> instead of -ETIMEOUT?
> >
> > Good question.
> 
> It looks likes mmc.h has its own errors (NO_CARD_ERR, etc.). I suspect
> these could be converted to use standard errors.
> 
> Anyway your question remains. A driver error should be reported by
> the caller.

Please drop this patch. 

I've posted other one (as a reply tp this one):
"[PATCH] FIX: fat: Provide correct return code from disk_{read|write}"

to fix this problem.

> 
> Regards,
> Simon
> 



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
@ 2015-09-03 12:44     ` Tom Rini
  2015-09-03 13:40       ` Lukasz Majewski
  2015-09-09  7:02     ` Lukasz Majewski
  2015-09-12 12:51     ` [U-Boot] " Tom Rini
  2 siblings, 1 reply; 50+ messages in thread
From: Tom Rini @ 2015-09-03 12:44 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could only
> return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
> should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also
fixes this problem?  Thanks!

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150903/2f4fe1e7/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 12:44     ` Tom Rini
@ 2015-09-03 13:40       ` Lukasz Majewski
  2015-09-03 14:18         ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-03 13:40 UTC (permalink / raw)
  To: u-boot

Hi Tom,

> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> 
> > It is very common that FAT code is using following pattern:
> > if (disk_{read|write}() < 0)
> >         return -1;
> > 
> > Up till now the above code was dead, since disk_{read|write) could
> > only return value >= 0.
> > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > caught.
> > 
> > The above behavior was caused by block_{read|write|erase} declared
> > at struct block_dev_desc (@part.h). It returns unsigned long, where
> > 0 indicates error and > 0 indicates that medium operation was
> > correct.
> > 
> > This patch as error regards 0 returned from block_{read|write|erase}
> > when nr_blocks is grater than zero. Read/Write operation with
> > nr_blocks=0 should return 0 and hence is not considered as an error.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > 
> > Test HW: Odroid XU3 - Exynos 5433
> 
> Can you pick up Stephen's FAT replacement series and see if it also
> fixes this problem?  Thanks!
> 

Ok, I will test this fat implementation.

However, since for v2015.10 it won't be included, I would also opt for
adding this fix to the current u-boot.

-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 13:40       ` Lukasz Majewski
@ 2015-09-03 14:18         ` Lukasz Majewski
  2015-09-23  3:17           ` Stephen Warren
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-03 14:18 UTC (permalink / raw)
  To: u-boot

Hi Lukasz,

> Hi Tom,
> 
> > On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> > 
> > > It is very common that FAT code is using following pattern:
> > > if (disk_{read|write}() < 0)
> > >         return -1;
> > > 
> > > Up till now the above code was dead, since disk_{read|write) could
> > > only return value >= 0.
> > > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > > caught.
> > > 
> > > The above behavior was caused by block_{read|write|erase} declared
> > > at struct block_dev_desc (@part.h). It returns unsigned long,
> > > where 0 indicates error and > 0 indicates that medium operation
> > > was correct.
> > > 
> > > This patch as error regards 0 returned from
> > > block_{read|write|erase} when nr_blocks is grater than zero.
> > > Read/Write operation with nr_blocks=0 should return 0 and hence
> > > is not considered as an error.
> > > 
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > 
> > > Test HW: Odroid XU3 - Exynos 5433
> > 
> > Can you pick up Stephen's FAT replacement series and see if it also
> > fixes this problem?  Thanks!
> > 
> 
> Ok, I will test this fat implementation.

I've applied v2 of this patchset
on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

Apparently this code needs some more review/testing before being
included to main line.

> 
> However, since for v2015.10 it won't be included, I would also opt for
> adding this fix to the current u-boot.
> 



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-08-28 13:50 [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Lukasz Majewski
  2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
  2015-08-28 21:55 ` [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Marek Vasut
@ 2015-09-09  7:01 ` Lukasz Majewski
  2015-09-09 11:34   ` Marek Vasut
  2015-09-14 10:33   ` Przemyslaw Marczak
  2015-09-25 16:25 ` [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel) Lukasz Majewski
  3 siblings, 2 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-09  7:01 UTC (permalink / raw)
  To: u-boot

Hi,

> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
> end of dw mmc transfer.
> 
> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> the default timeout is to short.
> 
> The new value - 20 seconds - takes into account the situation when SD
> card triggers internal clean up. Such process may take more than 10
> seconds on some cards.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> Cc: Tom Rini <trini@konsulko.com>

Are there any more questions regarding this patch or is it ready for
submission as fix for v2015.10?

> ---
>  drivers/mmc/dw_mmc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
> index 77b87e0..21a92d2 100644
> --- a/drivers/mmc/dw_mmc.c
> +++ b/drivers/mmc/dw_mmc.c
> @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct
> mmc_cmd *cmd, 
>  	if (data) {
>  		start = get_timer(0);
> -		timeout = 1000;
> +		timeout = 20000;
>  		for (;;) {
>  			mask = dwmci_readl(host, DWMCI_RINTSTS);
>  			/* Error during data transfer. */



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
  2015-09-03 12:44     ` Tom Rini
@ 2015-09-09  7:02     ` Lukasz Majewski
  2015-09-17 14:44       ` Lukasz Majewski
  2015-09-12 12:51     ` [U-Boot] " Tom Rini
  2 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-09  7:02 UTC (permalink / raw)
  To: u-boot

Hi,

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could
> only return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not
> caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with
> nr_blocks=0 should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>

Are there any more questions regarding this patch? I'd be more than
happy if it would be added to v2015.10 :-).

> 
> Test HW: Odroid XU3 - Exynos 5433
> ---
>  fs/fat/fat.c       | 11 +++++++++--
>  fs/fat/fat_write.c | 11 +++++++++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fat/fat.c b/fs/fat/fat.c
> index bccc3e3..d743014 100644
> --- a/fs/fat/fat.c
> +++ b/fs/fat/fat.c
> @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;
>  
>  static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
>  {
> +	ulong ret;
> +
>  	if (!cur_dev || !cur_dev->block_read)
>  		return -1;
>  
> -	return cur_dev->block_read(cur_dev->dev,
> -			cur_part_info.start + block, nr_blocks, buf);
> +	ret = cur_dev->block_read(cur_dev->dev,
> +				  cur_part_info.start + block,
> nr_blocks, buf); +
> +	if (nr_blocks && ret == 0)
> +		return -1;
> +
> +	return ret;
>  }
>  
>  int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t
> *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
> index 98b88ad..adb6940 100644
> --- a/fs/fat/fat_write.c
> +++ b/fs/fat/fat_write.c
> @@ -30,6 +30,8 @@ static void uppercase(char *str, int len)
>  static int total_sector;
>  static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
>  {
> +	ulong ret;
> +
>  	if (!cur_dev || !cur_dev->block_write)
>  		return -1;
>  
> @@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32
> nr_blocks, void *buf) return -1;
>  	}
>  
> -	return cur_dev->block_write(cur_dev->dev,
> -			cur_part_info.start + block,
> nr_blocks,	buf);
> +	ret = cur_dev->block_write(cur_dev->dev,
> +				   cur_part_info.start + block,
> +				   nr_blocks, buf);
> +	if (nr_blocks && ret == 0)
> +		return -1;
> +
> +	return ret;
>  }
>  
>  /*



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-09  7:01 ` Lukasz Majewski
@ 2015-09-09 11:34   ` Marek Vasut
  2015-09-11 17:20     ` Alexey Brodkin
  2015-09-14 10:33   ` Przemyslaw Marczak
  1 sibling, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-09-09 11:34 UTC (permalink / raw)
  To: u-boot

On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski wrote:
> Hi,
> 
> > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
> > end of dw mmc transfer.
> > 
> > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > the default timeout is to short.
> > 
> > The new value - 20 seconds - takes into account the situation when SD
> > card triggers internal clean up. Such process may take more than 10
> > seconds on some cards.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > Cc: Marek Vasut <marex@denx.de>
> > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > Cc: Tom Rini <trini@konsulko.com>
> 
> Are there any more questions regarding this patch or is it ready for
> submission as fix for v2015.10?

No comments, just apply this.

But this should really be fixed properly in the next MW.

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-09 11:34   ` Marek Vasut
@ 2015-09-11 17:20     ` Alexey Brodkin
  2015-09-11 21:45       ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Brodkin @ 2015-09-11 17:20 UTC (permalink / raw)
  To: u-boot

Hi Marek, Lukasz,

> On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski wrote:
> > Hi,
> > 
> > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
> > > end of dw mmc transfer.
> > > 
> > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > the default timeout is to short.
> > > 
> > > The new value - 20 seconds - takes into account the situation when SD
> > > card triggers internal clean up. Such process may take more than 10
> > > seconds on some cards.
> > > 
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > Cc: Marek Vasut <marex@denx.de>
> > > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > > Cc: Tom Rini <trini@konsulko.com>
> > 
> > Are there any more questions regarding this patch or is it ready for
> > submission as fix for v2015.10?
> 
> No comments, just apply this.
> 
> But this should really be fixed properly in the next MW.
> 
> Best regards,
> Marek Vasut

FWIW I faced similar problem even reading data.
At least on one of my boards reading of ~8Mb file
took ~1.7 seconds and so 1 second timeout was interrupting data
exchange.

So indeed we need to have some dirty hack for upcoming release
like bumping timeout to something really huge but later we
need to fix that problem properly.

I though proper solution would be to set timeout depending of amount
of data to be exchanged... something like take slowest speed of SD/MMC
that is allowed by spec and calculate delay based on how much time it
might take for that slow device and for safety multiply it by say 2.

Now from this thread I see that there're other reasons that might affect
length of at least write operation. In other words it could be
complicated unfortunately.

Still we need to fix regression first with virtually infinite timeout :)
I would even thing that simple revert of Marek's patch may make sense for
now. From both points of view for keeping history clean (compared to
stacked fixes/workarounds) and from removal of regression root cause.

It's not that I like to have infinite loops but given previous
implementation worked fine for people in the previous U-Boot release.

-Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-11 17:20     ` Alexey Brodkin
@ 2015-09-11 21:45       ` Lukasz Majewski
  2015-09-12 16:13         ` Marek Vasut
  2015-09-14 10:30         ` Alexey Brodkin
  0 siblings, 2 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-11 21:45 UTC (permalink / raw)
  To: u-boot

Hi Alexey,

> Hi Marek, Lukasz,
> 
> > On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski
> > wrote:
> > > Hi,
> > > 
> > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting
> > > > for end of dw mmc transfer.
> > > > 
> > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > > the default timeout is to short.
> > > > 
> > > > The new value - 20 seconds - takes into account the situation
> > > > when SD card triggers internal clean up. Such process may take
> > > > more than 10 seconds on some cards.
> > > > 
> > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > Cc: Marek Vasut <marex@denx.de>
> > > > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > > > Cc: Tom Rini <trini@konsulko.com>
> > > 
> > > Are there any more questions regarding this patch or is it ready
> > > for submission as fix for v2015.10?
> > 
> > No comments, just apply this.
> > 
> > But this should really be fixed properly in the next MW.
> > 
> > Best regards,
> > Marek Vasut
> 
> FWIW I faced similar problem even reading data.
> At least on one of my boards reading of ~8Mb file
> took ~1.7 seconds and so 1 second timeout was interrupting data
> exchange.

Was it SD card or eMMC device?

> 
> So indeed we need to have some dirty hack for upcoming release
> like bumping timeout to something really huge but later we
> need to fix that problem properly.
> 
> I though proper solution would be to set timeout depending of amount
> of data to be exchanged... something like take slowest speed of SD/MMC
> that is allowed by spec and calculate delay based on how much time it
> might take for that slow device and for safety multiply it by say 2.

As fair as I remember, card provide this kind of information. We can
try to investigate this possibility.

> 
> Now from this thread I see that there're other reasons that might
> affect length of at least write operation. In other words it could be
> complicated unfortunately.

My gut feeling is that proper handling of eMMC would require quite a
fair mmc subsystem rework.

> 
> Still we need to fix regression first with virtually infinite
> timeout :) I would even thing that simple revert of Marek's patch may
> make sense for now. 

+1 - unfortunately there were some other patches applied to this
particular patch. Simple revert might be a bit tricky here.

> From both points of view for keeping history
> clean (compared to stacked fixes/workarounds) and from removal of
> regression root cause.

As I said before - +1 from me.

> 
> It's not that I like to have infinite loops but given previous
> implementation worked fine for people in the previous U-Boot release.

Good justification


Best regards,
Lukasz Majewski

> 
> -Alexey
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150911/a6f13900/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
  2015-09-03 12:44     ` Tom Rini
  2015-09-09  7:02     ` Lukasz Majewski
@ 2015-09-12 12:51     ` Tom Rini
  2 siblings, 0 replies; 50+ messages in thread
From: Tom Rini @ 2015-09-12 12:51 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 03, 2015 at 02:21:39PM +0200, ?ukasz Majewski wrote:

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could only
> return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
> should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Test HW: Odroid XU3 - Exynos 5433

Applied to u-boot/master, thanks!

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150912/ac86cd2b/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-11 21:45       ` Lukasz Majewski
@ 2015-09-12 16:13         ` Marek Vasut
  2015-09-13 10:03           ` Lukasz Majewski
  2015-09-14 10:30         ` Alexey Brodkin
  1 sibling, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-09-12 16:13 UTC (permalink / raw)
  To: u-boot

On Friday, September 11, 2015 at 11:45:28 PM, Lukasz Majewski wrote:
> Hi Alexey,

Hi,

> > Hi Marek, Lukasz,
> > 
> > > On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski
> > > 
> > > wrote:
> > > > Hi,
> > > > 
> > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting
> > > > > for end of dw mmc transfer.
> > > > > 
> > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> > > > > and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> > > > > the default timeout is to short.
> > > > > 
> > > > > The new value - 20 seconds - takes into account the situation
> > > > > when SD card triggers internal clean up. Such process may take
> > > > > more than 10 seconds on some cards.
> > > > > 
> > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > Cc: Marek Vasut <marex@denx.de>
> > > > > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > > > > Cc: Tom Rini <trini@konsulko.com>
> > > > 
> > > > Are there any more questions regarding this patch or is it ready
> > > > for submission as fix for v2015.10?
> > > 
> > > No comments, just apply this.
> > > 
> > > But this should really be fixed properly in the next MW.
> > > 
> > > Best regards,
> > > Marek Vasut
> > 
> > FWIW I faced similar problem even reading data.
> > At least on one of my boards reading of ~8Mb file
> > took ~1.7 seconds and so 1 second timeout was interrupting data
> > exchange.
> 
> Was it SD card or eMMC device?
> 
> > So indeed we need to have some dirty hack for upcoming release
> > like bumping timeout to something really huge but later we
> > need to fix that problem properly.
> > 
> > I though proper solution would be to set timeout depending of amount
> > of data to be exchanged... something like take slowest speed of SD/MMC
> > that is allowed by spec and calculate delay based on how much time it
> > might take for that slow device and for safety multiply it by say 2.
> 
> As fair as I remember, card provide this kind of information. We can
> try to investigate this possibility.

Please do.

> > Now from this thread I see that there're other reasons that might
> > affect length of at least write operation. In other words it could be
> > complicated unfortunately.
> 
> My gut feeling is that proper handling of eMMC would require quite a
> fair mmc subsystem rework.

Can you please elaborate ?

> > Still we need to fix regression first with virtually infinite
> > timeout :) I would even thing that simple revert of Marek's patch may
> > make sense for now.
> 
> +1 - unfortunately there were some other patches applied to this
> particular patch. Simple revert might be a bit tricky here.

-1 - In case the card gets removed during the DMA transfer and the
board doesn't have a watchdog, it will get stuck indefinitelly. We
absolutelly don't want this sort of behavior in U-Boot. I understand
that this is the easiest way for everyone to achieve some sort of
"working" solution, but it is definitelly not the correct one. While
I do agree to increasing the timeout, I do not agree to unbounded
loops, sorry.

> > From both points of view for keeping history
> > clean (compared to stacked fixes/workarounds) and from removal of
> > regression root cause.
> 
> As I said before - +1 from me.

As I said before, -1 from me. Btw. did anything regress in here? To me,
this seems like a newly discovered bug ...

> > It's not that I like to have infinite loops but given previous
> > implementation worked fine for people in the previous U-Boot release.
> 
> Good justification

It is never a justified to return to a potentially problematic version
for the sake of getting some sort of crappy hardware operational.

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-12 16:13         ` Marek Vasut
@ 2015-09-13 10:03           ` Lukasz Majewski
  2015-09-13 14:00             ` Marek Vasut
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-13 10:03 UTC (permalink / raw)
  To: u-boot

Hi Marek,

> On Friday, September 11, 2015 at 11:45:28 PM, Lukasz Majewski wrote:
> > Hi Alexey,
> 
> Hi,
> 
> > > Hi Marek, Lukasz,
> > > 
> > > > On Wednesday, September 09, 2015 at 09:01:30 AM, Lukasz Majewski
> > > > 
> > > > wrote:
> > > > > Hi,
> > > > > 
> > > > > > The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> > > > > > "mmc: dw_mmc: Zap endless timeout" removed endless loop
> > > > > > waiting for end of dw mmc transfer.
> > > > > > 
> > > > > > For some workloads - dfu test @ Odroid XU3 (sending 8MiB
> > > > > > file) - and SD cards (e.g. MicroSD Kingston 4GiB, Adata
> > > > > > 4GiB) the default timeout is to short.
> > > > > > 
> > > > > > The new value - 20 seconds - takes into account the
> > > > > > situation when SD card triggers internal clean up. Such
> > > > > > process may take more than 10 seconds on some cards.
> > > > > > 
> > > > > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > > > > Cc: Marek Vasut <marex@denx.de>
> > > > > > Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> > > > > > Cc: Tom Rini <trini@konsulko.com>
> > > > > 
> > > > > Are there any more questions regarding this patch or is it
> > > > > ready for submission as fix for v2015.10?
> > > > 
> > > > No comments, just apply this.
> > > > 
> > > > But this should really be fixed properly in the next MW.
> > > > 
> > > > Best regards,
> > > > Marek Vasut
> > > 
> > > FWIW I faced similar problem even reading data.
> > > At least on one of my boards reading of ~8Mb file
> > > took ~1.7 seconds and so 1 second timeout was interrupting data
> > > exchange.
> > 
> > Was it SD card or eMMC device?
> > 
> > > So indeed we need to have some dirty hack for upcoming release
> > > like bumping timeout to something really huge but later we
> > > need to fix that problem properly.
> > > 
> > > I though proper solution would be to set timeout depending of
> > > amount of data to be exchanged... something like take slowest
> > > speed of SD/MMC that is allowed by spec and calculate delay based
> > > on how much time it might take for that slow device and for
> > > safety multiply it by say 2.
> > 
> > As fair as I remember, card provide this kind of information. We can
> > try to investigate this possibility.
> 
> Please do.
> 
> > > Now from this thread I see that there're other reasons that might
> > > affect length of at least write operation. In other words it
> > > could be complicated unfortunately.
> > 
> > My gut feeling is that proper handling of eMMC would require quite a
> > fair mmc subsystem rework.
> 
> Can you please elaborate ?

This is only my personal guess (when taking account my previous work
done with dw_mmc on Exynos HW)
- I think Pantelis would have more knowledge to elaborate here.

> 
> > > Still we need to fix regression first with virtually infinite
> > > timeout :) I would even thing that simple revert of Marek's patch
> > > may make sense for now.
> > 
> > +1 - unfortunately there were some other patches applied to this
> > particular patch. Simple revert might be a bit tricky here.
> 
> -1 - In case the card gets removed during the DMA transfer and the
> board doesn't have a watchdog, it will get stuck indefinitelly. 

I'm just wondering here - why the indefinite loop was working
previously? Was anybody complaining (on the ML) about the problem of
removing the SD card when some operation is ongoing?

The problem with potential removal of SD card (after booting the board)
is with us for quite long time. Even with indefinite loop (without your
patch) we also could "hang" the board if the SD card was removed
during a transfer.

> We
> absolutelly don't want this sort of behavior in U-Boot. I understand
> that this is the easiest way for everyone to achieve some sort of
> "working" solution, but it is definitelly not the correct one. While
> I do agree to increasing the timeout, I do not agree to unbounded
> loops, sorry.

We have agreed to not agree :-)

> 
> > > From both points of view for keeping history
> > > clean (compared to stacked fixes/workarounds) and from removal of
> > > regression root cause.
> > 
> > As I said before - +1 from me.
> 
> As I said before, -1 from me. Btw. did anything regress in here? To
> me, this seems like a newly discovered bug ...

Yes, this is a bug. We had similar problem with Samsung's SDHCI, before
we switched to dw_mmc. This issue is new at dw_mmc.

> 
> > > It's not that I like to have infinite loops but given previous
> > > implementation worked fine for people in the previous U-Boot
> > > release.
> > 
> > Good justification
> 
> It is never a justified to return to a potentially problematic version

IMHO revering the change (before the release) is from the software
development point of view better solution than adding some
heuristic delta to timeout.

> for the sake of getting some sort of crappy hardware operational.

Unfortunately this "crappy hardware" is pervasive and we cannot do
anything about it.

To sum up (my point of view):
1. The best would be to revert the patch - but if simple "git revert" is
not working then,
2. We should increase the timeout (with my patch) for v2015.10 release
3. After release we can devise some kind of solution
4. It is a good topic for U-boot's minisummit discussion at Dublin

Marek, Alexey, Tom, Pantelis what do you think?

> 
> Best regards,
> Marek Vasut

Best regards,
Lukasz Majewski
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150913/4d9ee1b9/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-13 10:03           ` Lukasz Majewski
@ 2015-09-13 14:00             ` Marek Vasut
  2015-09-14 10:15               ` Alexey Brodkin
  0 siblings, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-09-13 14:00 UTC (permalink / raw)
  To: u-boot

On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:
> Hi Marek,

Hi,

[...]

> > > > Now from this thread I see that there're other reasons that might
> > > > affect length of at least write operation. In other words it
> > > > could be complicated unfortunately.
> > > 
> > > My gut feeling is that proper handling of eMMC would require quite a
> > > fair mmc subsystem rework.
> > 
> > Can you please elaborate ?
> 
> This is only my personal guess (when taking account my previous work
> done with dw_mmc on Exynos HW)
> - I think Pantelis would have more knowledge to elaborate here.

OK

> > > > Still we need to fix regression first with virtually infinite
> > > > timeout :) I would even thing that simple revert of Marek's patch
> > > > may make sense for now.
> > > 
> > > +1 - unfortunately there were some other patches applied to this
> > > particular patch. Simple revert might be a bit tricky here.
> > 
> > -1 - In case the card gets removed during the DMA transfer and the
> > board doesn't have a watchdog, it will get stuck indefinitelly.
> 
> I'm just wondering here - why the indefinite loop was working
> previously? Was anybody complaining (on the ML) about the problem of
> removing the SD card when some operation is ongoing?

It worked for me for all the workloads I used. Noone was complaining.

> The problem with potential removal of SD card (after booting the board)
> is with us for quite long time. Even with indefinite loop (without your
> patch) we also could "hang" the board if the SD card was removed
> during a transfer.

Which is why we should weed out the unbounded loops.

> > We
> > absolutelly don't want this sort of behavior in U-Boot. I understand
> > that this is the easiest way for everyone to achieve some sort of
> > "working" solution, but it is definitelly not the correct one. While
> > I do agree to increasing the timeout, I do not agree to unbounded
> > loops, sorry.
> 
> We have agreed to not agree :-)

Yes :-)

> > > > From both points of view for keeping history
> > > > clean (compared to stacked fixes/workarounds) and from removal of
> > > > regression root cause.
> > > 
> > > As I said before - +1 from me.
> > 
> > As I said before, -1 from me. Btw. did anything regress in here? To
> > me, this seems like a newly discovered bug ...
> 
> Yes, this is a bug. We had similar problem with Samsung's SDHCI, before
> we switched to dw_mmc. This issue is new at dw_mmc.
> 
> > > > It's not that I like to have infinite loops but given previous
> > > > implementation worked fine for people in the previous U-Boot
> > > > release.
> > > 
> > > Good justification
> > 
> > It is never a justified to return to a potentially problematic version
> 
> IMHO revering the change (before the release) is from the software
> development point of view better solution than adding some
> heuristic delta to timeout.
> 
> > for the sake of getting some sort of crappy hardware operational.
> 
> Unfortunately this "crappy hardware" is pervasive and we cannot do
> anything about it.
> 
> To sum up (my point of view):
> 1. The best would be to revert the patch - but if simple "git revert" is
> not working then,
> 2. We should increase the timeout (with my patch) for v2015.10 release

Let's do this for the sake of crappy cards.

> 3. After release we can devise some kind of solution
> 4. It is a good topic for U-boot's minisummit discussion at Dublin
> 
> Marek, Alexey, Tom, Pantelis what do you think?

I think yes.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-13 14:00             ` Marek Vasut
@ 2015-09-14 10:15               ` Alexey Brodkin
  2015-09-14 11:22                 ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Alexey Brodkin @ 2015-09-14 10:15 UTC (permalink / raw)
  To: u-boot

Hi Marek, Lukasz,

On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:
> > Hi Marek,
> 
> Hi,
> 
> [...]

> 
> > > > > Still we need to fix regression first with virtually infinite
> > > > > timeout :) I would even thing that simple revert of Marek's patch
> > > > > may make sense for now.
> > > > 
> > > > +1 - unfortunately there were some other patches applied to this
> > > > particular patch. Simple revert might be a bit tricky here.
> > > 
> > > -1 - In case the card gets removed during the DMA transfer and the
> > > board doesn't have a watchdog, it will get stuck indefinitelly.
> > 
> > I'm just wondering here - why the indefinite loop was working
> > previously? Was anybody complaining (on the ML) about the problem of
> > removing the SD card when some operation is ongoing?
> 
> It worked for me for all the workloads I used. Noone was complaining.

The same story here - previous code with infinite loop was working for
my boards. And now I do see a problem with pretty simple scenario that
we do use in our products.

> > The problem with potential removal of SD card (after booting the board)
> > is with us for quite long time. Even with indefinite loop (without your
> > patch) we also could "hang" the board if the SD card was removed
> > during a transfer.
> 
> Which is why we should weed out the unbounded loops.
> 
> > > We
> > > absolutelly don't want this sort of behavior in U-Boot. I understand
> > > that this is the easiest way for everyone to achieve some sort of
> > > "working" solution, but it is definitelly not the correct one. While
> > > I do agree to increasing the timeout, I do not agree to unbounded
> > > loops, sorry.
> > 
> > We have agreed to not agree :-)
> 
> Yes :-)

The first thing I care is working U-Boot v2015.10 out of the box on my
boards. And so I may agree on any temporary solution. I see it as timeout
value either being infinite or obviously very high like 60 seconds.

60 seconds might sound stupid but my thought behind this is to make sure
even long transfers succeed. Imagine 100 Mb rootfs or update file downloaded
from slow SD-card.

> > > > > From both points of view for keeping history
> > > > > clean (compared to stacked fixes/workarounds) and from removal of
> > > > > regression root cause.
> > > > 
> > > > As I said before - +1 from me.
> > > 
> > > As I said before, -1 from me. Btw. did anything regress in here? To
> > > me, this seems like a newly discovered bug ...
> > 
> > Yes, this is a bug. We had similar problem with Samsung's SDHCI, before
> > we switched to dw_mmc. This issue is new at dw_mmc.
> > 
> > > > > It's not that I like to have infinite loops but given previous
> > > > > implementation worked fine for people in the previous U-Boot
> > > > > release.
> > > > 
> > > > Good justification
> > > 
> > > It is never a justified to return to a potentially problematic version
> > 
> > IMHO revering the change (before the release) is from the software
> > development point of view better solution than adding some
> > heuristic delta to timeout.
> > 
> > > for the sake of getting some sort of crappy hardware operational.
> > 
> > Unfortunately this "crappy hardware" is pervasive and we cannot do
> > anything about it.
> > 
> > To sum up (my point of view):
> > 1. The best would be to revert the patch - but if simple "git revert" is
> > not working then,

Well even if clean revert won't work we may do manual tweaks so that
functionally it is "revert". If of any interest I may come up with that
sort of patch.

> > 2. We should increase the timeout (with my patch) for v2015.10 release

If everybody is OK with that let's go do it. Because release is around the
corner and I don't want to explain each and every user how to fix their
new problem.

> Let's do this for the sake of crappy cards.
> 
> > 3. After release we can devise some kind of solution
> > 4. It is a good topic for U-boot's minisummit discussion at Dublin
> > 
> > Marek, Alexey, Tom, Pantelis what do you think?
> 
> I think yes.

What's important we need to make sure Tom is aware of this situation and
he won't cut a release until our fix is in place and all involved parties
reported their happiness.

-Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-11 21:45       ` Lukasz Majewski
  2015-09-12 16:13         ` Marek Vasut
@ 2015-09-14 10:30         ` Alexey Brodkin
  2015-09-14 11:15           ` Przemyslaw Marczak
  1 sibling, 1 reply; 50+ messages in thread
From: Alexey Brodkin @ 2015-09-14 10:30 UTC (permalink / raw)
  To: u-boot

On Fri, 2015-09-11 at 23:45 +0200, Lukasz Majewski wrote:
> Hi Alexey,

> > FWIW I faced similar problem even reading data.
> > At least on one of my boards reading of ~8Mb file
> > took ~1.7 seconds and so 1 second timeout was interrupting data
> > exchange.
> 
> Was it SD card or eMMC device?

It was external SD-card.

> > 
> > So indeed we need to have some dirty hack for upcoming release
> > like bumping timeout to something really huge but later we
> > need to fix that problem properly.
> > 
> > I though proper solution would be to set timeout depending of amount
> > of data to be exchanged... something like take slowest speed of SD/MMC
> > that is allowed by spec and calculate delay based on how much time it
> > might take for that slow device and for safety multiply it by say 2.
> 
> As fair as I remember, card provide this kind of information. We can
> try to investigate this possibility.

I'm pretty sure card may report a lot of info about itself.
And if we look at what people do in Linux kernel we may see calculations
of different timeouts in "drivers/mmc/core/core.c".

But keeping in mind we're not writing another OS kernel but we're just
dealing with bootloader I'd prefer to keep things simple and do check
only critical things like totally broken HW. I.e. something simple should
be enough for U-Boot.

> > 
> > Now from this thread I see that there're other reasons that might
> > affect length of at least write operation. In other words it could be
> > complicated unfortunately.

-Alexey

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-09  7:01 ` Lukasz Majewski
  2015-09-09 11:34   ` Marek Vasut
@ 2015-09-14 10:33   ` Przemyslaw Marczak
  1 sibling, 0 replies; 50+ messages in thread
From: Przemyslaw Marczak @ 2015-09-14 10:33 UTC (permalink / raw)
  To: u-boot

Hi all,

On 09/09/2015 09:01 AM, Lukasz Majewski wrote:
> Hi,
>
>> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
>> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for
>> end of dw mmc transfer.
>>
>> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
>> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
>> the default timeout is to short.
>>
>> The new value - 20 seconds - takes into account the situation when SD
>> card triggers internal clean up. Such process may take more than 10
>> seconds on some cards.
>>
>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>> Cc: Marek Vasut <marex@denx.de>
>> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
>> Cc: Tom Rini <trini@konsulko.com>
>
> Are there any more questions regarding this patch or is it ready for
> submission as fix for v2015.10?
>
>> ---
>>   drivers/mmc/dw_mmc.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
>> index 77b87e0..21a92d2 100644
>> --- a/drivers/mmc/dw_mmc.c
>> +++ b/drivers/mmc/dw_mmc.c
>> @@ -213,7 +213,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct
>> mmc_cmd *cmd,
>>   	if (data) {
>>   		start = get_timer(0);
>> -		timeout = 1000;
>> +		timeout = 20000;
>>   		for (;;) {
>>   			mask = dwmci_readl(host, DWMCI_RINTSTS);
>>   			/* Error during data transfer. */
>
>
>

I made some quick mmc command time test with the following code changes:

========================================================================
diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index a84c1e1..78e5d5d 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -113,7 +113,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct 
mmc_cmd *cmd,
  	ALLOC_CACHE_ALIGN_BUFFER(struct dwmci_idmac, cur_idmac,
  				 data ? DIV_ROUND_UP(data->blocks, 8) : 0);
  	int ret = 0, flags = 0, i;
-	unsigned int timeout = 100000;
+	unsigned int t = 0, timeout = 100000;
  	u32 retry = 10000;
  	u32 mask, ctrl;
  	ulong start = get_timer(0);
@@ -219,7 +219,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct 
mmc_cmd *cmd,
  			mask = dwmci_readl(host, DWMCI_RINTSTS);
  			/* Error during data transfer. */
  			if (mask & (DWMCI_DATA_ERR | DWMCI_DATA_TOUT)) {
-				debug("%s: DATA ERROR!\n", __func__);
+				printf("%s: DATA ERROR!\n", __func__);
  				ret = -EINVAL;
  				break;
  			}
@@ -231,13 +231,15 @@ static int dwmci_send_cmd(struct mmc *mmc, struct 
mmc_cmd *cmd,
  			}

  			/* Check for timeout. */
-			if (get_timer(start) > timeout) {
-				debug("%s: Timeout waiting for data!\n",
+			t = get_timer(start);
+			if (t > timeout) {
+				printf("%s: Timeout waiting for data!\n",
  				       __func__);
  				ret = TIMEOUT;
  				break;
  			}
  		}
+		printf("[MMC CMD] Tms: %u\n", t);

  		dwmci_writel(host, DWMCI_RINTSTS, mask);
  ========================================================================


Test info:
----------
1. Tree: commit 850f788709cef8f7d53d571aec3bfb73b14c5531
    "Merge branch 'rmobile' of git://git.denx.de/u-boot-sh"
2. SD card: SanDisk 32GB HC 4
3. eMMC card: Toshiba 16GB,  8-bit DDR mode
4. Board: Odroid XU3


Example usecase:
----------------
ODROID-XU3 # mmc read 0x40000000 0x0 0x800

MMC read: dev # 0, block # 0, count 2048 ... [MMC CMD] Tms: 46
2048 blocks read: OK


Time results for SD read/write:
------------------------------------
1MB:   46ms / 220ms
5MB:  229ms / 931ms
10MB: 456ms / timeout

Time results for eMMC read/write:
------------------------------------
1MB:   13ms / 46ms
5MB:   62ms / 260ms
10MB: 123ms / 515ms
20MB: 245ms / timeout


Timeout error for SD, write of 10MB data:
-----------------------------------------
ODROID-XU3 # mmc write 0x40000000 0x0 0x5000

MMC write: dev # 0, block # 0, count 20480 ... dwmci_send_cmd: Timeout 
waiting for data!
[MMC CMD] Tms: 1001
mmc write failed
0 blocks written: ERROR



I made one more test, by removing the SD card when reading the data:
----------------------------------------------------------------
ODROID-XU3 # mmc read 0x40000000 0x0 0xa000

MMC read: dev # 0, block # 0, count 40960 ... dwmci_send_cmd: DATA ERROR!
[MMC CMD] Tus: 283
0 blocks read: ERROR


Summary
-------
So, as we can see, the 1 second of timeout should be enough for small 
data size transfer. I also note some use cases for which, update the 
kernel of size ~4MB was failed. So this depends on the present card 
condition, and it can change for each use case - you never know, how 
much time card needs to finish the same command.

So, the conslusion is, that the 20-30 seconds of timeout seem to be 
okay. This will not introduce "unbounded loops", because the transfer 
status is read in the loop, and it works well - as the card removing 
example shows.

Please test this code on your platform and share your results.

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-14 10:30         ` Alexey Brodkin
@ 2015-09-14 11:15           ` Przemyslaw Marczak
  0 siblings, 0 replies; 50+ messages in thread
From: Przemyslaw Marczak @ 2015-09-14 11:15 UTC (permalink / raw)
  To: u-boot

Hi Alexey,

On 09/14/2015 12:30 PM, Alexey Brodkin wrote:
> On Fri, 2015-09-11 at 23:45 +0200, Lukasz Majewski wrote:
>> Hi Alexey,
>
>>> FWIW I faced similar problem even reading data.
>>> At least on one of my boards reading of ~8Mb file
>>> took ~1.7 seconds and so 1 second timeout was interrupting data
>>> exchange.
>>
>> Was it SD card or eMMC device?
>
> It was external SD-card.
>
>>>
>>> So indeed we need to have some dirty hack for upcoming release
>>> like bumping timeout to something really huge but later we
>>> need to fix that problem properly.
>>>
>>> I though proper solution would be to set timeout depending of amount
>>> of data to be exchanged... something like take slowest speed of SD/MMC
>>> that is allowed by spec and calculate delay based on how much time it
>>> might take for that slow device and for safety multiply it by say 2.
>>
>> As fair as I remember, card provide this kind of information. We can
>> try to investigate this possibility.
>
> I'm pretty sure card may report a lot of info about itself.
> And if we look at what people do in Linux kernel we may see calculations
> of different timeouts in "drivers/mmc/core/core.c".
>
> But keeping in mind we're not writing another OS kernel but we're just
> dealing with bootloader I'd prefer to keep things simple and do check
> only critical things like totally broken HW. I.e. something simple should
> be enough for U-Boot.
>

The standard doesn't specify the timeout for read/write commands, 
because it is not a constant value. To finish the command, card needs 
make some internal cleanup, which is called "Background operations" in 
Jedec standard.

There is no way to estimate the time to finish the "bkops", maybe the 
card's firmware also doesn't know when it finish.

The 4 min of timeout for background operations is probably a result 
after some tests.

If I good remember only erase timeout is specified by the standard, and 
can be estimated after read some values from EXT_CSD, if the vendor 
provides it.

>>>
>>> Now from this thread I see that there're other reasons that might
>>> affect length of at least write operation. In other words it could be
>>> complicated unfortunately.
>
> -Alexey
>

Some day, I had a device which couldn't boot - it goes down after a 
while, and the console was silent. The device was temporary broken.

After power-up with JTAG, and manually run the "background operations" 
on the card, the device could then boot.

So the conclusion was, that the timeout for read in the bootloader was 
to short, and the tired card tried to make some internal cleanup - so 
the read command could not be finished in the timeout estimated for 
fresh card.

The bkops are supported from eMMC 4.3, and are not supported for SD 
cards. In this case, Linux can trigger the card internal cleanup in it's 
idle state, so the read/write operation will probably finish as fast as 
possible.

It is going to be little frustrating, to wait for the fix, to such 
critical issue...

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-14 10:15               ` Alexey Brodkin
@ 2015-09-14 11:22                 ` Lukasz Majewski
  2015-09-14 13:36                   ` Marek Vasut
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-14 11:22 UTC (permalink / raw)
  To: u-boot

Hi Alexey,

> Hi Marek, Lukasz,
> 
> On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:
> > > Hi Marek,
> > 
> > Hi,
> > 
> > [...]
> 
> > 
> > > > > > Still we need to fix regression first with virtually
> > > > > > infinite timeout :) I would even thing that simple revert
> > > > > > of Marek's patch may make sense for now.
> > > > > 
> > > > > +1 - unfortunately there were some other patches applied to
> > > > > this particular patch. Simple revert might be a bit tricky
> > > > > here.
> > > > 
> > > > -1 - In case the card gets removed during the DMA transfer and
> > > > the board doesn't have a watchdog, it will get stuck
> > > > indefinitelly.
> > > 
> > > I'm just wondering here - why the indefinite loop was working
> > > previously? Was anybody complaining (on the ML) about the problem
> > > of removing the SD card when some operation is ongoing?
> > 
> > It worked for me for all the workloads I used. Noone was
> > complaining.
> 
> The same story here - previous code with infinite loop was working for
> my boards. And now I do see a problem with pretty simple scenario that
> we do use in our products.
> 
> > > The problem with potential removal of SD card (after booting the
> > > board) is with us for quite long time. Even with indefinite loop
> > > (without your patch) we also could "hang" the board if the SD
> > > card was removed during a transfer.
> > 
> > Which is why we should weed out the unbounded loops.
> > 
> > > > We
> > > > absolutelly don't want this sort of behavior in U-Boot. I
> > > > understand that this is the easiest way for everyone to achieve
> > > > some sort of "working" solution, but it is definitelly not the
> > > > correct one. While I do agree to increasing the timeout, I do
> > > > not agree to unbounded loops, sorry.
> > > 
> > > We have agreed to not agree :-)
> > 
> > Yes :-)
> 
> The first thing I care is working U-Boot v2015.10 out of the box on my
> boards. And so I may agree on any temporary solution. I see it as
> timeout value either being infinite or obviously very high like 60
> seconds.
> 
> 60 seconds might sound stupid but my thought behind this is to make
> sure even long transfers succeed. Imagine 100 Mb rootfs or update
> file downloaded from slow SD-card.

Transfer of rootfs to SD-card (downloaded to memory via tftp) is
definitely valid scenario.

> 
> > > > > > From both points of view for keeping history
> > > > > > clean (compared to stacked fixes/workarounds) and from
> > > > > > removal of regression root cause.
> > > > > 
> > > > > As I said before - +1 from me.
> > > > 
> > > > As I said before, -1 from me. Btw. did anything regress in
> > > > here? To me, this seems like a newly discovered bug ...
> > > 
> > > Yes, this is a bug. We had similar problem with Samsung's SDHCI,
> > > before we switched to dw_mmc. This issue is new at dw_mmc.
> > > 
> > > > > > It's not that I like to have infinite loops but given
> > > > > > previous implementation worked fine for people in the
> > > > > > previous U-Boot release.
> > > > > 
> > > > > Good justification
> > > > 
> > > > It is never a justified to return to a potentially problematic
> > > > version
> > > 
> > > IMHO revering the change (before the release) is from the software
> > > development point of view better solution than adding some
> > > heuristic delta to timeout.
> > > 
> > > > for the sake of getting some sort of crappy hardware
> > > > operational.
> > > 
> > > Unfortunately this "crappy hardware" is pervasive and we cannot do
> > > anything about it.
> > > 
> > > To sum up (my point of view):
> > > 1. The best would be to revert the patch - but if simple "git
> > > revert" is not working then,
> 
> Well even if clean revert won't work we may do manual tweaks so that
> functionally it is "revert". If of any interest I may come up with
> that sort of patch.
> 
> > > 2. We should increase the timeout (with my patch) for v2015.10
> > > release
> 
> If everybody is OK with that let's go do it. Because release is
> around the corner and I don't want to explain each and every user how
> to fix their new problem.

v2015.10 correctness is crucial here.

> 
> > Let's do this for the sake of crappy cards.
> > 
> > > 3. After release we can devise some kind of solution
> > > 4. It is a good topic for U-boot's minisummit discussion at Dublin
> > > 
> > > Marek, Alexey, Tom, Pantelis what do you think?
> > 
> > I think yes.
> 
> What's important we need to make sure Tom is aware of this situation
> and he won't cut a release until our fix is in place and all involved
> parties reported their happiness.

I also think that Tom should speak up on this issue.

> 
> -Alexey


-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-14 11:22                 ` Lukasz Majewski
@ 2015-09-14 13:36                   ` Marek Vasut
  2015-09-17 14:43                     ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Marek Vasut @ 2015-09-14 13:36 UTC (permalink / raw)
  To: u-boot

On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:
> Hi Alexey,
> 
> > Hi Marek, Lukasz,
> > 
> > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski wrote:
> > > > Hi Marek,
> > > 
> > > Hi,
> > > 
> > > [...]
> > > 
> > > > > > > Still we need to fix regression first with virtually
> > > > > > > infinite timeout :) I would even thing that simple revert
> > > > > > > of Marek's patch may make sense for now.
> > > > > > 
> > > > > > +1 - unfortunately there were some other patches applied to
> > > > > > this particular patch. Simple revert might be a bit tricky
> > > > > > here.
> > > > > 
> > > > > -1 - In case the card gets removed during the DMA transfer and
> > > > > the board doesn't have a watchdog, it will get stuck
> > > > > indefinitelly.
> > > > 
> > > > I'm just wondering here - why the indefinite loop was working
> > > > previously? Was anybody complaining (on the ML) about the problem
> > > > of removing the SD card when some operation is ongoing?
> > > 
> > > It worked for me for all the workloads I used. Noone was
> > > complaining.
> > 
> > The same story here - previous code with infinite loop was working for
> > my boards. And now I do see a problem with pretty simple scenario that
> > we do use in our products.
> > 
> > > > The problem with potential removal of SD card (after booting the
> > > > board) is with us for quite long time. Even with indefinite loop
> > > > (without your patch) we also could "hang" the board if the SD
> > > > card was removed during a transfer.
> > > 
> > > Which is why we should weed out the unbounded loops.
> > > 
> > > > > We
> > > > > absolutelly don't want this sort of behavior in U-Boot. I
> > > > > understand that this is the easiest way for everyone to achieve
> > > > > some sort of "working" solution, but it is definitelly not the
> > > > > correct one. While I do agree to increasing the timeout, I do
> > > > > not agree to unbounded loops, sorry.
> > > > 
> > > > We have agreed to not agree :-)
> > > 
> > > Yes :-)
> > 
> > The first thing I care is working U-Boot v2015.10 out of the box on my
> > boards. And so I may agree on any temporary solution. I see it as
> > timeout value either being infinite or obviously very high like 60
> > seconds.
> > 
> > 60 seconds might sound stupid but my thought behind this is to make
> > sure even long transfers succeed. Imagine 100 Mb rootfs or update
> > file downloaded from slow SD-card.
> 
> Transfer of rootfs to SD-card (downloaded to memory via tftp) is
> definitely valid scenario.
> 
> > > > > > > From both points of view for keeping history
> > > > > > > clean (compared to stacked fixes/workarounds) and from
> > > > > > > removal of regression root cause.
> > > > > > 
> > > > > > As I said before - +1 from me.
> > > > > 
> > > > > As I said before, -1 from me. Btw. did anything regress in
> > > > > here? To me, this seems like a newly discovered bug ...
> > > > 
> > > > Yes, this is a bug. We had similar problem with Samsung's SDHCI,
> > > > before we switched to dw_mmc. This issue is new at dw_mmc.
> > > > 
> > > > > > > It's not that I like to have infinite loops but given
> > > > > > > previous implementation worked fine for people in the
> > > > > > > previous U-Boot release.
> > > > > > 
> > > > > > Good justification
> > > > > 
> > > > > It is never a justified to return to a potentially problematic
> > > > > version
> > > > 
> > > > IMHO revering the change (before the release) is from the software
> > > > development point of view better solution than adding some
> > > > heuristic delta to timeout.
> > > > 
> > > > > for the sake of getting some sort of crappy hardware
> > > > > operational.
> > > > 
> > > > Unfortunately this "crappy hardware" is pervasive and we cannot do
> > > > anything about it.
> > > > 
> > > > To sum up (my point of view):
> > > > 1. The best would be to revert the patch - but if simple "git
> > > > revert" is not working then,
> > 
> > Well even if clean revert won't work we may do manual tweaks so that
> > functionally it is "revert". If of any interest I may come up with
> > that sort of patch.
> > 
> > > > 2. We should increase the timeout (with my patch) for v2015.10
> > > > release
> > 
> > If everybody is OK with that let's go do it. Because release is
> > around the corner and I don't want to explain each and every user how
> > to fix their new problem.
> 
> v2015.10 correctness is crucial here.
> 
> > > Let's do this for the sake of crappy cards.
> > > 
> > > > 3. After release we can devise some kind of solution
> > > > 4. It is a good topic for U-boot's minisummit discussion at Dublin
> > > > 
> > > > Marek, Alexey, Tom, Pantelis what do you think?
> > > 
> > > I think yes.
> > 
> > What's important we need to make sure Tom is aware of this situation
> > and he won't cut a release until our fix is in place and all involved
> > parties reported their happiness.
> 
> I also think that Tom should speak up on this issue.

btw. I think I know why SoCFPGA never triggered this issue, it has
CONFIG_SYS_MMC_MAX_BLK_COUNT set to 256 so a large transfer was
always broken to smaller chunks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-14 13:36                   ` Marek Vasut
@ 2015-09-17 14:43                     ` Lukasz Majewski
  2015-09-18  0:31                       ` Tom Rini
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-17 14:43 UTC (permalink / raw)
  To: u-boot

Hi Tom,

> On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:
> > Hi Alexey,
> > 
> > > Hi Marek, Lukasz,
> > > 
> > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski
> > > > wrote:
> > > > > Hi Marek,
> > > > 
> > > > Hi,
> > > > 
> > > > [...]
> > > > 
> > > > > > > > Still we need to fix regression first with virtually
> > > > > > > > infinite timeout :) I would even thing that simple
> > > > > > > > revert of Marek's patch may make sense for now.
> > > > > > > 
> > > > > > > +1 - unfortunately there were some other patches applied
> > > > > > > to this particular patch. Simple revert might be a bit
> > > > > > > tricky here.
> > > > > > 
> > > > > > -1 - In case the card gets removed during the DMA transfer
> > > > > > and the board doesn't have a watchdog, it will get stuck
> > > > > > indefinitelly.
> > > > > 
> > > > > I'm just wondering here - why the indefinite loop was working
> > > > > previously? Was anybody complaining (on the ML) about the
> > > > > problem of removing the SD card when some operation is
> > > > > ongoing?
> > > > 
> > > > It worked for me for all the workloads I used. Noone was
> > > > complaining.
> > > 
> > > The same story here - previous code with infinite loop was
> > > working for my boards. And now I do see a problem with pretty
> > > simple scenario that we do use in our products.
> > > 
> > > > > The problem with potential removal of SD card (after booting
> > > > > the board) is with us for quite long time. Even with
> > > > > indefinite loop (without your patch) we also could "hang" the
> > > > > board if the SD card was removed during a transfer.
> > > > 
> > > > Which is why we should weed out the unbounded loops.
> > > > 
> > > > > > We
> > > > > > absolutelly don't want this sort of behavior in U-Boot. I
> > > > > > understand that this is the easiest way for everyone to
> > > > > > achieve some sort of "working" solution, but it is
> > > > > > definitelly not the correct one. While I do agree to
> > > > > > increasing the timeout, I do not agree to unbounded loops,
> > > > > > sorry.
> > > > > 
> > > > > We have agreed to not agree :-)
> > > > 
> > > > Yes :-)
> > > 
> > > The first thing I care is working U-Boot v2015.10 out of the box
> > > on my boards. And so I may agree on any temporary solution. I see
> > > it as timeout value either being infinite or obviously very high
> > > like 60 seconds.
> > > 
> > > 60 seconds might sound stupid but my thought behind this is to
> > > make sure even long transfers succeed. Imagine 100 Mb rootfs or
> > > update file downloaded from slow SD-card.
> > 
> > Transfer of rootfs to SD-card (downloaded to memory via tftp) is
> > definitely valid scenario.
> > 
> > > > > > > > From both points of view for keeping history
> > > > > > > > clean (compared to stacked fixes/workarounds) and from
> > > > > > > > removal of regression root cause.
> > > > > > > 
> > > > > > > As I said before - +1 from me.
> > > > > > 
> > > > > > As I said before, -1 from me. Btw. did anything regress in
> > > > > > here? To me, this seems like a newly discovered bug ...
> > > > > 
> > > > > Yes, this is a bug. We had similar problem with Samsung's
> > > > > SDHCI, before we switched to dw_mmc. This issue is new at
> > > > > dw_mmc.
> > > > > 
> > > > > > > > It's not that I like to have infinite loops but given
> > > > > > > > previous implementation worked fine for people in the
> > > > > > > > previous U-Boot release.
> > > > > > > 
> > > > > > > Good justification
> > > > > > 
> > > > > > It is never a justified to return to a potentially
> > > > > > problematic version
> > > > > 
> > > > > IMHO revering the change (before the release) is from the
> > > > > software development point of view better solution than
> > > > > adding some heuristic delta to timeout.
> > > > > 
> > > > > > for the sake of getting some sort of crappy hardware
> > > > > > operational.
> > > > > 
> > > > > Unfortunately this "crappy hardware" is pervasive and we
> > > > > cannot do anything about it.
> > > > > 
> > > > > To sum up (my point of view):
> > > > > 1. The best would be to revert the patch - but if simple "git
> > > > > revert" is not working then,
> > > 
> > > Well even if clean revert won't work we may do manual tweaks so
> > > that functionally it is "revert". If of any interest I may come
> > > up with that sort of patch.
> > > 
> > > > > 2. We should increase the timeout (with my patch) for v2015.10
> > > > > release
> > > 
> > > If everybody is OK with that let's go do it. Because release is
> > > around the corner and I don't want to explain each and every user
> > > how to fix their new problem.
> > 
> > v2015.10 correctness is crucial here.
> > 
> > > > Let's do this for the sake of crappy cards.
> > > > 
> > > > > 3. After release we can devise some kind of solution
> > > > > 4. It is a good topic for U-boot's minisummit discussion at
> > > > > Dublin
> > > > > 
> > > > > Marek, Alexey, Tom, Pantelis what do you think?
> > > > 
> > > > I think yes.
> > > 
> > > What's important we need to make sure Tom is aware of this
> > > situation and he won't cut a release until our fix is in place
> > > and all involved parties reported their happiness.
> > 


> > I also think that Tom should speak up on this issue.

Tom, could you give your opinion on this issue?


> 
> btw. I think I know why SoCFPGA never triggered this issue, it has
> CONFIG_SYS_MMC_MAX_BLK_COUNT set to 256 so a large transfer was
> always broken to smaller chunks.



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-09  7:02     ` Lukasz Majewski
@ 2015-09-17 14:44       ` Lukasz Majewski
  0 siblings, 0 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-17 14:44 UTC (permalink / raw)
  To: u-boot

Hi Tom,,

> Hi,
> 
> > It is very common that FAT code is using following pattern:
> > if (disk_{read|write}() < 0)
> >         return -1;
> > 
> > Up till now the above code was dead, since disk_{read|write) could
> > only return value >= 0.
> > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > caught.
> > 
> > The above behavior was caused by block_{read|write|erase} declared
> > at struct block_dev_desc (@part.h). It returns unsigned long, where
> > 0 indicates error and > 0 indicates that medium operation was
> > correct.
> > 
> > This patch as error regards 0 returned from block_{read|write|erase}
> > when nr_blocks is grater than zero. Read/Write operation with
> > nr_blocks=0 should return 0 and hence is not considered as an error.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Are there any more questions regarding this patch? I'd be more than
> happy if it would be added to v2015.10 :-).

Gentle ping :-)

> 
> > 
> > Test HW: Odroid XU3 - Exynos 5433
> > ---
> >  fs/fat/fat.c       | 11 +++++++++--
> >  fs/fat/fat_write.c | 11 +++++++++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/fat/fat.c b/fs/fat/fat.c
> > index bccc3e3..d743014 100644
> > --- a/fs/fat/fat.c
> > +++ b/fs/fat/fat.c
> > @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;
> >  
> >  static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
> >  {
> > +	ulong ret;
> > +
> >  	if (!cur_dev || !cur_dev->block_read)
> >  		return -1;
> >  
> > -	return cur_dev->block_read(cur_dev->dev,
> > -			cur_part_info.start + block, nr_blocks,
> > buf);
> > +	ret = cur_dev->block_read(cur_dev->dev,
> > +				  cur_part_info.start + block,
> > nr_blocks, buf); +
> > +	if (nr_blocks && ret == 0)
> > +		return -1;
> > +
> > +	return ret;
> >  }
> >  
> >  int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t
> > *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
> > index 98b88ad..adb6940 100644
> > --- a/fs/fat/fat_write.c
> > +++ b/fs/fat/fat_write.c
> > @@ -30,6 +30,8 @@ static void uppercase(char *str, int len)
> >  static int total_sector;
> >  static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
> >  {
> > +	ulong ret;
> > +
> >  	if (!cur_dev || !cur_dev->block_write)
> >  		return -1;
> >  
> > @@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32
> > nr_blocks, void *buf) return -1;
> >  	}
> >  
> > -	return cur_dev->block_write(cur_dev->dev,
> > -			cur_part_info.start + block,
> > nr_blocks,	buf);
> > +	ret = cur_dev->block_write(cur_dev->dev,
> > +				   cur_part_info.start + block,
> > +				   nr_blocks, buf);
> > +	if (nr_blocks && ret == 0)
> > +		return -1;
> > +
> > +	return ret;
> >  }
> >  
> >  /*
> 
> 
> 



-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-17 14:43                     ` Lukasz Majewski
@ 2015-09-18  0:31                       ` Tom Rini
  2015-09-18  7:32                         ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Rini @ 2015-09-18  0:31 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:

> Hi Tom,
> 
> > On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski wrote:
> > > Hi Alexey,
> > > 
> > > > Hi Marek, Lukasz,
> > > > 
> > > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > > > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz Majewski
> > > > > wrote:
> > > > > > Hi Marek,
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > > > > Still we need to fix regression first with virtually
> > > > > > > > > infinite timeout :) I would even thing that simple
> > > > > > > > > revert of Marek's patch may make sense for now.
> > > > > > > > 
> > > > > > > > +1 - unfortunately there were some other patches applied
> > > > > > > > to this particular patch. Simple revert might be a bit
> > > > > > > > tricky here.
> > > > > > > 
> > > > > > > -1 - In case the card gets removed during the DMA transfer
> > > > > > > and the board doesn't have a watchdog, it will get stuck
> > > > > > > indefinitelly.
> > > > > > 
> > > > > > I'm just wondering here - why the indefinite loop was working
> > > > > > previously? Was anybody complaining (on the ML) about the
> > > > > > problem of removing the SD card when some operation is
> > > > > > ongoing?
> > > > > 
> > > > > It worked for me for all the workloads I used. Noone was
> > > > > complaining.
> > > > 
> > > > The same story here - previous code with infinite loop was
> > > > working for my boards. And now I do see a problem with pretty
> > > > simple scenario that we do use in our products.
> > > > 
> > > > > > The problem with potential removal of SD card (after booting
> > > > > > the board) is with us for quite long time. Even with
> > > > > > indefinite loop (without your patch) we also could "hang" the
> > > > > > board if the SD card was removed during a transfer.
> > > > > 
> > > > > Which is why we should weed out the unbounded loops.
> > > > > 
> > > > > > > We
> > > > > > > absolutelly don't want this sort of behavior in U-Boot. I
> > > > > > > understand that this is the easiest way for everyone to
> > > > > > > achieve some sort of "working" solution, but it is
> > > > > > > definitelly not the correct one. While I do agree to
> > > > > > > increasing the timeout, I do not agree to unbounded loops,
> > > > > > > sorry.
> > > > > > 
> > > > > > We have agreed to not agree :-)
> > > > > 
> > > > > Yes :-)
> > > > 
> > > > The first thing I care is working U-Boot v2015.10 out of the box
> > > > on my boards. And so I may agree on any temporary solution. I see
> > > > it as timeout value either being infinite or obviously very high
> > > > like 60 seconds.
> > > > 
> > > > 60 seconds might sound stupid but my thought behind this is to
> > > > make sure even long transfers succeed. Imagine 100 Mb rootfs or
> > > > update file downloaded from slow SD-card.
> > > 
> > > Transfer of rootfs to SD-card (downloaded to memory via tftp) is
> > > definitely valid scenario.
> > > 
> > > > > > > > > From both points of view for keeping history
> > > > > > > > > clean (compared to stacked fixes/workarounds) and from
> > > > > > > > > removal of regression root cause.
> > > > > > > > 
> > > > > > > > As I said before - +1 from me.
> > > > > > > 
> > > > > > > As I said before, -1 from me. Btw. did anything regress in
> > > > > > > here? To me, this seems like a newly discovered bug ...
> > > > > > 
> > > > > > Yes, this is a bug. We had similar problem with Samsung's
> > > > > > SDHCI, before we switched to dw_mmc. This issue is new at
> > > > > > dw_mmc.
> > > > > > 
> > > > > > > > > It's not that I like to have infinite loops but given
> > > > > > > > > previous implementation worked fine for people in the
> > > > > > > > > previous U-Boot release.
> > > > > > > > 
> > > > > > > > Good justification
> > > > > > > 
> > > > > > > It is never a justified to return to a potentially
> > > > > > > problematic version
> > > > > > 
> > > > > > IMHO revering the change (before the release) is from the
> > > > > > software development point of view better solution than
> > > > > > adding some heuristic delta to timeout.
> > > > > > 
> > > > > > > for the sake of getting some sort of crappy hardware
> > > > > > > operational.
> > > > > > 
> > > > > > Unfortunately this "crappy hardware" is pervasive and we
> > > > > > cannot do anything about it.
> > > > > > 
> > > > > > To sum up (my point of view):
> > > > > > 1. The best would be to revert the patch - but if simple "git
> > > > > > revert" is not working then,
> > > > 
> > > > Well even if clean revert won't work we may do manual tweaks so
> > > > that functionally it is "revert". If of any interest I may come
> > > > up with that sort of patch.
> > > > 
> > > > > > 2. We should increase the timeout (with my patch) for v2015.10
> > > > > > release
> > > > 
> > > > If everybody is OK with that let's go do it. Because release is
> > > > around the corner and I don't want to explain each and every user
> > > > how to fix their new problem.
> > > 
> > > v2015.10 correctness is crucial here.

Yes.

> > > > > Let's do this for the sake of crappy cards.
> > > > > 
> > > > > > 3. After release we can devise some kind of solution
> > > > > > 4. It is a good topic for U-boot's minisummit discussion at
> > > > > > Dublin
> > > > > > 
> > > > > > Marek, Alexey, Tom, Pantelis what do you think?
> > > > > 
> > > > > I think yes.
> > > > 
> > > > What's important we need to make sure Tom is aware of this
> > > > situation and he won't cut a release until our fix is in place
> > > > and all involved parties reported their happiness.
> > > 
> 
> 
> > > I also think that Tom should speak up on this issue.
> 
> Tom, could you give your opinion on this issue?

Well, is there a concensus patch now?

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150917/45ebc03d/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-18  0:31                       ` Tom Rini
@ 2015-09-18  7:32                         ` Lukasz Majewski
  2015-09-18  8:07                           ` Przemyslaw Marczak
  2015-09-18 19:27                           ` Tom Rini
  0 siblings, 2 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-18  7:32 UTC (permalink / raw)
  To: u-boot

Hi Tom,

> On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:
> 
> > Hi Tom,
> > 
> > > On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski
> > > wrote:
> > > > Hi Alexey,
> > > > 
> > > > > Hi Marek, Lukasz,
> > > > > 
> > > > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > > > > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz
> > > > > > Majewski wrote:
> > > > > > > Hi Marek,
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > > > Still we need to fix regression first with virtually
> > > > > > > > > > infinite timeout :) I would even thing that simple
> > > > > > > > > > revert of Marek's patch may make sense for now.
> > > > > > > > > 
> > > > > > > > > +1 - unfortunately there were some other patches
> > > > > > > > > applied to this particular patch. Simple revert might
> > > > > > > > > be a bit tricky here.
> > > > > > > > 
> > > > > > > > -1 - In case the card gets removed during the DMA
> > > > > > > > transfer and the board doesn't have a watchdog, it will
> > > > > > > > get stuck indefinitelly.
> > > > > > > 
> > > > > > > I'm just wondering here - why the indefinite loop was
> > > > > > > working previously? Was anybody complaining (on the ML)
> > > > > > > about the problem of removing the SD card when some
> > > > > > > operation is ongoing?
> > > > > > 
> > > > > > It worked for me for all the workloads I used. Noone was
> > > > > > complaining.
> > > > > 
> > > > > The same story here - previous code with infinite loop was
> > > > > working for my boards. And now I do see a problem with pretty
> > > > > simple scenario that we do use in our products.
> > > > > 
> > > > > > > The problem with potential removal of SD card (after
> > > > > > > booting the board) is with us for quite long time. Even
> > > > > > > with indefinite loop (without your patch) we also could
> > > > > > > "hang" the board if the SD card was removed during a
> > > > > > > transfer.
> > > > > > 
> > > > > > Which is why we should weed out the unbounded loops.
> > > > > > 
> > > > > > > > We
> > > > > > > > absolutelly don't want this sort of behavior in U-Boot.
> > > > > > > > I understand that this is the easiest way for everyone
> > > > > > > > to achieve some sort of "working" solution, but it is
> > > > > > > > definitelly not the correct one. While I do agree to
> > > > > > > > increasing the timeout, I do not agree to unbounded
> > > > > > > > loops, sorry.
> > > > > > > 
> > > > > > > We have agreed to not agree :-)
> > > > > > 
> > > > > > Yes :-)
> > > > > 
> > > > > The first thing I care is working U-Boot v2015.10 out of the
> > > > > box on my boards. And so I may agree on any temporary
> > > > > solution. I see it as timeout value either being infinite or
> > > > > obviously very high like 60 seconds.
> > > > > 
> > > > > 60 seconds might sound stupid but my thought behind this is to
> > > > > make sure even long transfers succeed. Imagine 100 Mb rootfs
> > > > > or update file downloaded from slow SD-card.
> > > > 
> > > > Transfer of rootfs to SD-card (downloaded to memory via tftp) is
> > > > definitely valid scenario.
> > > > 
> > > > > > > > > > From both points of view for keeping history
> > > > > > > > > > clean (compared to stacked fixes/workarounds) and
> > > > > > > > > > from removal of regression root cause.
> > > > > > > > > 
> > > > > > > > > As I said before - +1 from me.
> > > > > > > > 
> > > > > > > > As I said before, -1 from me. Btw. did anything regress
> > > > > > > > in here? To me, this seems like a newly discovered
> > > > > > > > bug ...
> > > > > > > 
> > > > > > > Yes, this is a bug. We had similar problem with Samsung's
> > > > > > > SDHCI, before we switched to dw_mmc. This issue is new at
> > > > > > > dw_mmc.
> > > > > > > 
> > > > > > > > > > It's not that I like to have infinite loops but
> > > > > > > > > > given previous implementation worked fine for
> > > > > > > > > > people in the previous U-Boot release.
> > > > > > > > > 
> > > > > > > > > Good justification
> > > > > > > > 
> > > > > > > > It is never a justified to return to a potentially
> > > > > > > > problematic version
> > > > > > > 
> > > > > > > IMHO revering the change (before the release) is from the
> > > > > > > software development point of view better solution than
> > > > > > > adding some heuristic delta to timeout.
> > > > > > > 
> > > > > > > > for the sake of getting some sort of crappy hardware
> > > > > > > > operational.
> > > > > > > 
> > > > > > > Unfortunately this "crappy hardware" is pervasive and we
> > > > > > > cannot do anything about it.
> > > > > > > 
> > > > > > > To sum up (my point of view):
> > > > > > > 1. The best would be to revert the patch - but if simple
> > > > > > > "git revert" is not working then,
> > > > > 
> > > > > Well even if clean revert won't work we may do manual tweaks
> > > > > so that functionally it is "revert". If of any interest I may
> > > > > come up with that sort of patch.
> > > > > 
> > > > > > > 2. We should increase the timeout (with my patch) for
> > > > > > > v2015.10 release
> > > > > 
> > > > > If everybody is OK with that let's go do it. Because release
> > > > > is around the corner and I don't want to explain each and
> > > > > every user how to fix their new problem.
> > > > 
> > > > v2015.10 correctness is crucial here.
> 
> Yes.
> 
> > > > > > Let's do this for the sake of crappy cards.
> > > > > > 
> > > > > > > 3. After release we can devise some kind of solution
> > > > > > > 4. It is a good topic for U-boot's minisummit discussion
> > > > > > > at Dublin
> > > > > > > 
> > > > > > > Marek, Alexey, Tom, Pantelis what do you think?
> > > > > > 
> > > > > > I think yes.
> > > > > 
> > > > > What's important we need to make sure Tom is aware of this
> > > > > situation and he won't cut a release until our fix is in place
> > > > > and all involved parties reported their happiness.
> > > > 
> > 
> > 
> > > > I also think that Tom should speak up on this issue.
> > 
> > Tom, could you give your opinion on this issue?
> 
> Well, is there a concensus patch now?
> 

There isn't a consensus patch.

There are two options:
1. Try to revert changes (which remove endless loop)

2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before
v2015.10 and provide correct solution (based on internal SD card
information) after the release.


-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-18  7:32                         ` Lukasz Majewski
@ 2015-09-18  8:07                           ` Przemyslaw Marczak
  2015-09-18 19:27                           ` Tom Rini
  1 sibling, 0 replies; 50+ messages in thread
From: Przemyslaw Marczak @ 2015-09-18  8:07 UTC (permalink / raw)
  To: u-boot

Hello,

On 09/18/2015 09:32 AM, Lukasz Majewski wrote:
> Hi Tom,
>
>> On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:
>>
>>> Hi Tom,
>>>
>>>> On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski
>>>> wrote:
>>>>> Hi Alexey,
>>>>>
>>>>>> Hi Marek, Lukasz,
>>>>>>
>>>>>> On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
>>>>>>> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz
>>>>>>> Majewski wrote:
>>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>>> Still we need to fix regression first with virtually
>>>>>>>>>>> infinite timeout :) I would even thing that simple
>>>>>>>>>>> revert of Marek's patch may make sense for now.
>>>>>>>>>>
>>>>>>>>>> +1 - unfortunately there were some other patches
>>>>>>>>>> applied to this particular patch. Simple revert might
>>>>>>>>>> be a bit tricky here.
>>>>>>>>>
>>>>>>>>> -1 - In case the card gets removed during the DMA
>>>>>>>>> transfer and the board doesn't have a watchdog, it will
>>>>>>>>> get stuck indefinitelly.
>>>>>>>>
>>>>>>>> I'm just wondering here - why the indefinite loop was
>>>>>>>> working previously? Was anybody complaining (on the ML)
>>>>>>>> about the problem of removing the SD card when some
>>>>>>>> operation is ongoing?
>>>>>>>
>>>>>>> It worked for me for all the workloads I used. Noone was
>>>>>>> complaining.
>>>>>>
>>>>>> The same story here - previous code with infinite loop was
>>>>>> working for my boards. And now I do see a problem with pretty
>>>>>> simple scenario that we do use in our products.
>>>>>>
>>>>>>>> The problem with potential removal of SD card (after
>>>>>>>> booting the board) is with us for quite long time. Even
>>>>>>>> with indefinite loop (without your patch) we also could
>>>>>>>> "hang" the board if the SD card was removed during a
>>>>>>>> transfer.
>>>>>>>
>>>>>>> Which is why we should weed out the unbounded loops.
>>>>>>>
>>>>>>>>> We
>>>>>>>>> absolutelly don't want this sort of behavior in U-Boot.
>>>>>>>>> I understand that this is the easiest way for everyone
>>>>>>>>> to achieve some sort of "working" solution, but it is
>>>>>>>>> definitelly not the correct one. While I do agree to
>>>>>>>>> increasing the timeout, I do not agree to unbounded
>>>>>>>>> loops, sorry.
>>>>>>>>
>>>>>>>> We have agreed to not agree :-)
>>>>>>>
>>>>>>> Yes :-)
>>>>>>
>>>>>> The first thing I care is working U-Boot v2015.10 out of the
>>>>>> box on my boards. And so I may agree on any temporary
>>>>>> solution. I see it as timeout value either being infinite or
>>>>>> obviously very high like 60 seconds.
>>>>>>
>>>>>> 60 seconds might sound stupid but my thought behind this is to
>>>>>> make sure even long transfers succeed. Imagine 100 Mb rootfs
>>>>>> or update file downloaded from slow SD-card.
>>>>>
>>>>> Transfer of rootfs to SD-card (downloaded to memory via tftp) is
>>>>> definitely valid scenario.
>>>>>
>>>>>>>>>>>  From both points of view for keeping history
>>>>>>>>>>> clean (compared to stacked fixes/workarounds) and
>>>>>>>>>>> from removal of regression root cause.
>>>>>>>>>>
>>>>>>>>>> As I said before - +1 from me.
>>>>>>>>>
>>>>>>>>> As I said before, -1 from me. Btw. did anything regress
>>>>>>>>> in here? To me, this seems like a newly discovered
>>>>>>>>> bug ...
>>>>>>>>
>>>>>>>> Yes, this is a bug. We had similar problem with Samsung's
>>>>>>>> SDHCI, before we switched to dw_mmc. This issue is new at
>>>>>>>> dw_mmc.
>>>>>>>>
>>>>>>>>>>> It's not that I like to have infinite loops but
>>>>>>>>>>> given previous implementation worked fine for
>>>>>>>>>>> people in the previous U-Boot release.
>>>>>>>>>>
>>>>>>>>>> Good justification
>>>>>>>>>
>>>>>>>>> It is never a justified to return to a potentially
>>>>>>>>> problematic version
>>>>>>>>
>>>>>>>> IMHO revering the change (before the release) is from the
>>>>>>>> software development point of view better solution than
>>>>>>>> adding some heuristic delta to timeout.
>>>>>>>>
>>>>>>>>> for the sake of getting some sort of crappy hardware
>>>>>>>>> operational.
>>>>>>>>
>>>>>>>> Unfortunately this "crappy hardware" is pervasive and we
>>>>>>>> cannot do anything about it.
>>>>>>>>
>>>>>>>> To sum up (my point of view):
>>>>>>>> 1. The best would be to revert the patch - but if simple
>>>>>>>> "git revert" is not working then,
>>>>>>
>>>>>> Well even if clean revert won't work we may do manual tweaks
>>>>>> so that functionally it is "revert". If of any interest I may
>>>>>> come up with that sort of patch.
>>>>>>
>>>>>>>> 2. We should increase the timeout (with my patch) for
>>>>>>>> v2015.10 release
>>>>>>
>>>>>> If everybody is OK with that let's go do it. Because release
>>>>>> is around the corner and I don't want to explain each and
>>>>>> every user how to fix their new problem.
>>>>>
>>>>> v2015.10 correctness is crucial here.
>>
>> Yes.
>>
>>>>>>> Let's do this for the sake of crappy cards.
>>>>>>>
>>>>>>>> 3. After release we can devise some kind of solution
>>>>>>>> 4. It is a good topic for U-boot's minisummit discussion
>>>>>>>> at Dublin
>>>>>>>>
>>>>>>>> Marek, Alexey, Tom, Pantelis what do you think?
>>>>>>>
>>>>>>> I think yes.
>>>>>>
>>>>>> What's important we need to make sure Tom is aware of this
>>>>>> situation and he won't cut a release until our fix is in place
>>>>>> and all involved parties reported their happiness.
>>>>>
>>>
>>>
>>>>> I also think that Tom should speak up on this issue.
>>>
>>> Tom, could you give your opinion on this issue?
>>
>> Well, is there a concensus patch now?
>>
>
> There isn't a consensus patch.
>
> There are two options:
> 1. Try to revert changes (which remove endless loop)
>
> 2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before
> v2015.10 and provide correct solution (based on internal SD card
> information) after the release.
>
>

I recommend the second option, however please note that SD card doesn't 
provide any information about the delay required to finish the command,
beside some informations about block erase operation time.

The block erase time info, probably depends on vendor implementation, so 
it would be better to use as long time for delay as possible, and just 
rely on the controller error status info instead of breaking the 
operation after too short timeout.

In such case I think, that it would be nice if user could be notified 
with some time interval, that the card is busy.

In other words, when he's waiting the second minute for the transfer 
end, then how does he know if the board doesn't hang?

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-18  7:32                         ` Lukasz Majewski
  2015-09-18  8:07                           ` Przemyslaw Marczak
@ 2015-09-18 19:27                           ` Tom Rini
  2015-09-21 15:32                             ` Pantelis Antoniou
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Rini @ 2015-09-18 19:27 UTC (permalink / raw)
  To: u-boot

On Fri, Sep 18, 2015 at 09:32:47AM +0200, Lukasz Majewski wrote:
> Hi Tom,
> 
> > On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:
> > 
> > > Hi Tom,
> > > 
> > > > On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski
> > > > wrote:
> > > > > Hi Alexey,
> > > > > 
> > > > > > Hi Marek, Lukasz,
> > > > > > 
> > > > > > On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
> > > > > > > On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz
> > > > > > > Majewski wrote:
> > > > > > > > Hi Marek,
> > > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > > > > Still we need to fix regression first with virtually
> > > > > > > > > > > infinite timeout :) I would even thing that simple
> > > > > > > > > > > revert of Marek's patch may make sense for now.
> > > > > > > > > > 
> > > > > > > > > > +1 - unfortunately there were some other patches
> > > > > > > > > > applied to this particular patch. Simple revert might
> > > > > > > > > > be a bit tricky here.
> > > > > > > > > 
> > > > > > > > > -1 - In case the card gets removed during the DMA
> > > > > > > > > transfer and the board doesn't have a watchdog, it will
> > > > > > > > > get stuck indefinitelly.
> > > > > > > > 
> > > > > > > > I'm just wondering here - why the indefinite loop was
> > > > > > > > working previously? Was anybody complaining (on the ML)
> > > > > > > > about the problem of removing the SD card when some
> > > > > > > > operation is ongoing?
> > > > > > > 
> > > > > > > It worked for me for all the workloads I used. Noone was
> > > > > > > complaining.
> > > > > > 
> > > > > > The same story here - previous code with infinite loop was
> > > > > > working for my boards. And now I do see a problem with pretty
> > > > > > simple scenario that we do use in our products.
> > > > > > 
> > > > > > > > The problem with potential removal of SD card (after
> > > > > > > > booting the board) is with us for quite long time. Even
> > > > > > > > with indefinite loop (without your patch) we also could
> > > > > > > > "hang" the board if the SD card was removed during a
> > > > > > > > transfer.
> > > > > > > 
> > > > > > > Which is why we should weed out the unbounded loops.
> > > > > > > 
> > > > > > > > > We
> > > > > > > > > absolutelly don't want this sort of behavior in U-Boot.
> > > > > > > > > I understand that this is the easiest way for everyone
> > > > > > > > > to achieve some sort of "working" solution, but it is
> > > > > > > > > definitelly not the correct one. While I do agree to
> > > > > > > > > increasing the timeout, I do not agree to unbounded
> > > > > > > > > loops, sorry.
> > > > > > > > 
> > > > > > > > We have agreed to not agree :-)
> > > > > > > 
> > > > > > > Yes :-)
> > > > > > 
> > > > > > The first thing I care is working U-Boot v2015.10 out of the
> > > > > > box on my boards. And so I may agree on any temporary
> > > > > > solution. I see it as timeout value either being infinite or
> > > > > > obviously very high like 60 seconds.
> > > > > > 
> > > > > > 60 seconds might sound stupid but my thought behind this is to
> > > > > > make sure even long transfers succeed. Imagine 100 Mb rootfs
> > > > > > or update file downloaded from slow SD-card.
> > > > > 
> > > > > Transfer of rootfs to SD-card (downloaded to memory via tftp) is
> > > > > definitely valid scenario.
> > > > > 
> > > > > > > > > > > From both points of view for keeping history
> > > > > > > > > > > clean (compared to stacked fixes/workarounds) and
> > > > > > > > > > > from removal of regression root cause.
> > > > > > > > > > 
> > > > > > > > > > As I said before - +1 from me.
> > > > > > > > > 
> > > > > > > > > As I said before, -1 from me. Btw. did anything regress
> > > > > > > > > in here? To me, this seems like a newly discovered
> > > > > > > > > bug ...
> > > > > > > > 
> > > > > > > > Yes, this is a bug. We had similar problem with Samsung's
> > > > > > > > SDHCI, before we switched to dw_mmc. This issue is new at
> > > > > > > > dw_mmc.
> > > > > > > > 
> > > > > > > > > > > It's not that I like to have infinite loops but
> > > > > > > > > > > given previous implementation worked fine for
> > > > > > > > > > > people in the previous U-Boot release.
> > > > > > > > > > 
> > > > > > > > > > Good justification
> > > > > > > > > 
> > > > > > > > > It is never a justified to return to a potentially
> > > > > > > > > problematic version
> > > > > > > > 
> > > > > > > > IMHO revering the change (before the release) is from the
> > > > > > > > software development point of view better solution than
> > > > > > > > adding some heuristic delta to timeout.
> > > > > > > > 
> > > > > > > > > for the sake of getting some sort of crappy hardware
> > > > > > > > > operational.
> > > > > > > > 
> > > > > > > > Unfortunately this "crappy hardware" is pervasive and we
> > > > > > > > cannot do anything about it.
> > > > > > > > 
> > > > > > > > To sum up (my point of view):
> > > > > > > > 1. The best would be to revert the patch - but if simple
> > > > > > > > "git revert" is not working then,
> > > > > > 
> > > > > > Well even if clean revert won't work we may do manual tweaks
> > > > > > so that functionally it is "revert". If of any interest I may
> > > > > > come up with that sort of patch.
> > > > > > 
> > > > > > > > 2. We should increase the timeout (with my patch) for
> > > > > > > > v2015.10 release
> > > > > > 
> > > > > > If everybody is OK with that let's go do it. Because release
> > > > > > is around the corner and I don't want to explain each and
> > > > > > every user how to fix their new problem.
> > > > > 
> > > > > v2015.10 correctness is crucial here.
> > 
> > Yes.
> > 
> > > > > > > Let's do this for the sake of crappy cards.
> > > > > > > 
> > > > > > > > 3. After release we can devise some kind of solution
> > > > > > > > 4. It is a good topic for U-boot's minisummit discussion
> > > > > > > > at Dublin
> > > > > > > > 
> > > > > > > > Marek, Alexey, Tom, Pantelis what do you think?
> > > > > > > 
> > > > > > > I think yes.
> > > > > > 
> > > > > > What's important we need to make sure Tom is aware of this
> > > > > > situation and he won't cut a release until our fix is in place
> > > > > > and all involved parties reported their happiness.
> > > > > 
> > > 
> > > 
> > > > > I also think that Tom should speak up on this issue.
> > > 
> > > Tom, could you give your opinion on this issue?
> > 
> > Well, is there a concensus patch now?
> > 
> 
> There isn't a consensus patch.
> 
> There are two options:
> 1. Try to revert changes (which remove endless loop)
> 
> 2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before
> v2015.10 and provide correct solution (based on internal SD card
> information) after the release.

OK, lets go with #2.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150918/770817ca/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
  2015-09-18 19:27                           ` Tom Rini
@ 2015-09-21 15:32                             ` Pantelis Antoniou
  0 siblings, 0 replies; 50+ messages in thread
From: Pantelis Antoniou @ 2015-09-21 15:32 UTC (permalink / raw)
  To: u-boot

Hi Tom,

> On Sep 18, 2015, at 22:27 , Tom Rini <trini@konsulko.com> wrote:
> 
> On Fri, Sep 18, 2015 at 09:32:47AM +0200, Lukasz Majewski wrote:
>> Hi Tom,
>> 
>>> On Thu, Sep 17, 2015 at 04:43:33PM +0200, Lukasz Majewski wrote:
>>> 
>>>> Hi Tom,
>>>> 
>>>>> On Monday, September 14, 2015 at 01:22:20 PM, Lukasz Majewski
>>>>> wrote:
>>>>>> Hi Alexey,
>>>>>> 
>>>>>>> Hi Marek, Lukasz,
>>>>>>> 
>>>>>>> On Sun, 2015-09-13 at 16:00 +0200, Marek Vasut wrote:
>>>>>>>> On Sunday, September 13, 2015 at 12:03:18 PM, Lukasz
>>>>>>>> Majewski wrote:
>>>>>>>>> Hi Marek,
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> [...]
>>>>>>>> 
>>>>>>>>>>>> Still we need to fix regression first with virtually
>>>>>>>>>>>> infinite timeout :) I would even thing that simple
>>>>>>>>>>>> revert of Marek's patch may make sense for now.
>>>>>>>>>>> 
>>>>>>>>>>> +1 - unfortunately there were some other patches
>>>>>>>>>>> applied to this particular patch. Simple revert might
>>>>>>>>>>> be a bit tricky here.
>>>>>>>>>> 
>>>>>>>>>> -1 - In case the card gets removed during the DMA
>>>>>>>>>> transfer and the board doesn't have a watchdog, it will
>>>>>>>>>> get stuck indefinitelly.
>>>>>>>>> 
>>>>>>>>> I'm just wondering here - why the indefinite loop was
>>>>>>>>> working previously? Was anybody complaining (on the ML)
>>>>>>>>> about the problem of removing the SD card when some
>>>>>>>>> operation is ongoing?
>>>>>>>> 
>>>>>>>> It worked for me for all the workloads I used. Noone was
>>>>>>>> complaining.
>>>>>>> 
>>>>>>> The same story here - previous code with infinite loop was
>>>>>>> working for my boards. And now I do see a problem with pretty
>>>>>>> simple scenario that we do use in our products.
>>>>>>> 
>>>>>>>>> The problem with potential removal of SD card (after
>>>>>>>>> booting the board) is with us for quite long time. Even
>>>>>>>>> with indefinite loop (without your patch) we also could
>>>>>>>>> "hang" the board if the SD card was removed during a
>>>>>>>>> transfer.
>>>>>>>> 
>>>>>>>> Which is why we should weed out the unbounded loops.
>>>>>>>> 
>>>>>>>>>> We
>>>>>>>>>> absolutelly don't want this sort of behavior in U-Boot.
>>>>>>>>>> I understand that this is the easiest way for everyone
>>>>>>>>>> to achieve some sort of "working" solution, but it is
>>>>>>>>>> definitelly not the correct one. While I do agree to
>>>>>>>>>> increasing the timeout, I do not agree to unbounded
>>>>>>>>>> loops, sorry.
>>>>>>>>> 
>>>>>>>>> We have agreed to not agree :-)
>>>>>>>> 
>>>>>>>> Yes :-)
>>>>>>> 
>>>>>>> The first thing I care is working U-Boot v2015.10 out of the
>>>>>>> box on my boards. And so I may agree on any temporary
>>>>>>> solution. I see it as timeout value either being infinite or
>>>>>>> obviously very high like 60 seconds.
>>>>>>> 
>>>>>>> 60 seconds might sound stupid but my thought behind this is to
>>>>>>> make sure even long transfers succeed. Imagine 100 Mb rootfs
>>>>>>> or update file downloaded from slow SD-card.
>>>>>> 
>>>>>> Transfer of rootfs to SD-card (downloaded to memory via tftp) is
>>>>>> definitely valid scenario.
>>>>>> 
>>>>>>>>>>>> From both points of view for keeping history
>>>>>>>>>>>> clean (compared to stacked fixes/workarounds) and
>>>>>>>>>>>> from removal of regression root cause.
>>>>>>>>>>> 
>>>>>>>>>>> As I said before - +1 from me.
>>>>>>>>>> 
>>>>>>>>>> As I said before, -1 from me. Btw. did anything regress
>>>>>>>>>> in here? To me, this seems like a newly discovered
>>>>>>>>>> bug ...
>>>>>>>>> 
>>>>>>>>> Yes, this is a bug. We had similar problem with Samsung's
>>>>>>>>> SDHCI, before we switched to dw_mmc. This issue is new at
>>>>>>>>> dw_mmc.
>>>>>>>>> 
>>>>>>>>>>>> It's not that I like to have infinite loops but
>>>>>>>>>>>> given previous implementation worked fine for
>>>>>>>>>>>> people in the previous U-Boot release.
>>>>>>>>>>> 
>>>>>>>>>>> Good justification
>>>>>>>>>> 
>>>>>>>>>> It is never a justified to return to a potentially
>>>>>>>>>> problematic version
>>>>>>>>> 
>>>>>>>>> IMHO revering the change (before the release) is from the
>>>>>>>>> software development point of view better solution than
>>>>>>>>> adding some heuristic delta to timeout.
>>>>>>>>> 
>>>>>>>>>> for the sake of getting some sort of crappy hardware
>>>>>>>>>> operational.
>>>>>>>>> 
>>>>>>>>> Unfortunately this "crappy hardware" is pervasive and we
>>>>>>>>> cannot do anything about it.
>>>>>>>>> 
>>>>>>>>> To sum up (my point of view):
>>>>>>>>> 1. The best would be to revert the patch - but if simple
>>>>>>>>> "git revert" is not working then,
>>>>>>> 
>>>>>>> Well even if clean revert won't work we may do manual tweaks
>>>>>>> so that functionally it is "revert". If of any interest I may
>>>>>>> come up with that sort of patch.
>>>>>>> 
>>>>>>>>> 2. We should increase the timeout (with my patch) for
>>>>>>>>> v2015.10 release
>>>>>>> 
>>>>>>> If everybody is OK with that let's go do it. Because release
>>>>>>> is around the corner and I don't want to explain each and
>>>>>>> every user how to fix their new problem.
>>>>>> 
>>>>>> v2015.10 correctness is crucial here.
>>> 
>>> Yes.
>>> 
>>>>>>>> Let's do this for the sake of crappy cards.
>>>>>>>> 
>>>>>>>>> 3. After release we can devise some kind of solution
>>>>>>>>> 4. It is a good topic for U-boot's minisummit discussion
>>>>>>>>> at Dublin
>>>>>>>>> 
>>>>>>>>> Marek, Alexey, Tom, Pantelis what do you think?
>>>>>>>> 
>>>>>>>> I think yes.
>>>>>>> 
>>>>>>> What's important we need to make sure Tom is aware of this
>>>>>>> situation and he won't cut a release until our fix is in place
>>>>>>> and all involved parties reported their happiness.
>>>>>> 
>>>> 
>>>> 
>>>>>> I also think that Tom should speak up on this issue.
>>>> 
>>>> Tom, could you give your opinion on this issue?
>>> 
>>> Well, is there a concensus patch now?
>>> 
>> 
>> There isn't a consensus patch.
>> 
>> There are two options:
>> 1. Try to revert changes (which remove endless loop)
>> 
>> 2. Increase the delay to e.g. 4 minutes (as it is done in Linux) before
>> v2015.10 and provide correct solution (based on internal SD card
>> information) after the release.
> 
> OK, lets go with #2.
> 

Send me a patch with the delay to 4mins please (blergh).

> -- 
> Tom

Regards

? Pantelis

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-03 14:18         ` Lukasz Majewski
@ 2015-09-23  3:17           ` Stephen Warren
  2015-09-23  8:40             ` Lukasz Majewski
  0 siblings, 1 reply; 50+ messages in thread
From: Stephen Warren @ 2015-09-23  3:17 UTC (permalink / raw)
  To: u-boot

On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
> Hi Lukasz,
> 
>> Hi Tom,
>>
>>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
>>>
>>>> It is very common that FAT code is using following pattern:
>>>> if (disk_{read|write}() < 0)
>>>>         return -1;
>>>>
>>>> Up till now the above code was dead, since disk_{read|write) could
>>>> only return value >= 0.
>>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
>>>> caught.
>>>>
>>>> The above behavior was caused by block_{read|write|erase} declared
>>>> at struct block_dev_desc (@part.h). It returns unsigned long,
>>>> where 0 indicates error and > 0 indicates that medium operation
>>>> was correct.
>>>>
>>>> This patch as error regards 0 returned from
>>>> block_{read|write|erase} when nr_blocks is grater than zero.
>>>> Read/Write operation with nr_blocks=0 should return 0 and hence
>>>> is not considered as an error.
>>>>
>>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>>>>
>>>> Test HW: Odroid XU3 - Exynos 5433
>>>
>>> Can you pick up Stephen's FAT replacement series and see if it also
>>> fixes this problem?  Thanks!
>>>
>>
>> Ok, I will test this fat implementation.
> 
> I've applied v2 of this patchset
> on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
> 
> Unfortunately, DFU tests fail with first attempt to pass the test.

I've found a couple of problems.

First up, file_fat_write() wasn't truncating the file when writing, so
the file size wasn't changing when over-writing a large file with a
small file. With this fixed, I can run the DFU tests just fine for all
the small files (<1M). I've fixed this locally and in the ff branch on
my github.

Second, ff is slow:

Some random old build I had in flash on my system:
> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> reading dfu1.bin
> 1048576 bytes read in 95 ms (10.5 MiB/s)

With my ff branch:
> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> 1048576 bytes read in 5038 ms (203.1 KiB/s)

That's quite the slow-down! I believe this is causing dfu-util to time
out on the larger files (1M+). Just for functional testing, I'll try and
find a way to hack dfu-util to have a much larger timeout for the final
flush operation. I wonder if the old FAT implementation had a disk cache
(e.g. that 32K buffer in BSS?) and we need the same for ff? I'll try and
track down why it's so slow.

Perhaps there are other issues as yet unfound.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-23  3:17           ` Stephen Warren
@ 2015-09-23  8:40             ` Lukasz Majewski
  2015-09-25  5:47               ` Stephen Warren
  0 siblings, 1 reply; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-23  8:40 UTC (permalink / raw)
  To: u-boot

Hi Stephen,

> On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
> > Hi Lukasz,
> > 
> >> Hi Tom,
> >>
> >>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> >>>
> >>>> It is very common that FAT code is using following pattern:
> >>>> if (disk_{read|write}() < 0)
> >>>>         return -1;
> >>>>
> >>>> Up till now the above code was dead, since disk_{read|write)
> >>>> could only return value >= 0.
> >>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
> >>>> caught.
> >>>>
> >>>> The above behavior was caused by block_{read|write|erase}
> >>>> declared at struct block_dev_desc (@part.h). It returns unsigned
> >>>> long, where 0 indicates error and > 0 indicates that medium
> >>>> operation was correct.
> >>>>
> >>>> This patch as error regards 0 returned from
> >>>> block_{read|write|erase} when nr_blocks is grater than zero.
> >>>> Read/Write operation with nr_blocks=0 should return 0 and hence
> >>>> is not considered as an error.
> >>>>
> >>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> >>>>
> >>>> Test HW: Odroid XU3 - Exynos 5433
> >>>
> >>> Can you pick up Stephen's FAT replacement series and see if it
> >>> also fixes this problem?  Thanks!
> >>>
> >>
> >> Ok, I will test this fat implementation.
> > 
> > I've applied v2 of this patchset
> > on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
> > 
> > Unfortunately, DFU tests fail with first attempt to pass the test.
> 
> I've found a couple of problems.
> 
> First up, file_fat_write() wasn't truncating the file when writing, so
> the file size wasn't changing when over-writing a large file with a
> small file. With this fixed, I can run the DFU tests just fine for all
> the small files (<1M). I've fixed this locally and in the ff branch on
> my github.

Nice to hear that you have found the error.

> 
> Second, ff is slow:
> 
> Some random old build I had in flash on my system:
> > Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> > reading dfu1.bin
> > 1048576 bytes read in 95 ms (10.5 MiB/s)
> 
> With my ff branch:
> > Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> > 1048576 bytes read in 5038 ms (203.1 KiB/s)
> 
> That's quite the slow-down! I believe this is causing dfu-util to time
> out on the larger files (1M+). Just for functional testing, I'll try
> and find a way to hack dfu-util to have a much larger timeout for the
> final flush operation. I wonder if the old FAT implementation had a
> disk cache (e.g. that 32K buffer in BSS?) and we need the same for
> ff? 

I think that our current Fat implementation is optimized for tiny
embedded system (and probably no cache).

> I'll try and track down why it's so slow.
> 
> Perhaps there are other issues as yet unfound.

We might also check with sandbox FS set of tests.

-- 
Best regards,

Lukasz Majewski

Samsung R&D Institute Poland (SRPOL) | Linux Platform Group

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers
  2015-09-23  8:40             ` Lukasz Majewski
@ 2015-09-25  5:47               ` Stephen Warren
  0 siblings, 0 replies; 50+ messages in thread
From: Stephen Warren @ 2015-09-25  5:47 UTC (permalink / raw)
  To: u-boot

On 09/23/2015 02:40 AM, Lukasz Majewski wrote:
> Hi Stephen,
> 
>> On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
>>> Hi Lukasz,
>>>
>>>> Hi Tom,
>>>>
>>>>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
>>>>>
>>>>>> It is very common that FAT code is using following pattern:
>>>>>> if (disk_{read|write}() < 0)
>>>>>>         return -1;
>>>>>>
>>>>>> Up till now the above code was dead, since disk_{read|write)
>>>>>> could only return value >= 0.
>>>>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
>>>>>> caught.
>>>>>>
>>>>>> The above behavior was caused by block_{read|write|erase}
>>>>>> declared at struct block_dev_desc (@part.h). It returns unsigned
>>>>>> long, where 0 indicates error and > 0 indicates that medium
>>>>>> operation was correct.
>>>>>>
>>>>>> This patch as error regards 0 returned from
>>>>>> block_{read|write|erase} when nr_blocks is grater than zero.
>>>>>> Read/Write operation with nr_blocks=0 should return 0 and hence
>>>>>> is not considered as an error.
>>>>>>
>>>>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>>>>>>
>>>>>> Test HW: Odroid XU3 - Exynos 5433
>>>>>
>>>>> Can you pick up Stephen's FAT replacement series and see if it
>>>>> also fixes this problem?  Thanks!
>>>>>
>>>>
>>>> Ok, I will test this fat implementation.
>>>
>>> I've applied v2 of this patchset
>>> on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
>>>
>>> Unfortunately, DFU tests fail with first attempt to pass the test.
>>
>> I've found a couple of problems.
>>
>> First up, file_fat_write() wasn't truncating the file when writing, so
>> the file size wasn't changing when over-writing a large file with a
>> small file. With this fixed, I can run the DFU tests just fine for all
>> the small files (<1M). I've fixed this locally and in the ff branch on
>> my github.
> 
> Nice to hear that you have found the error.
> 
>> Second, ff is slow:
>>
>> Some random old build I had in flash on my system:
>>> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
>>> reading dfu1.bin
>>> 1048576 bytes read in 95 ms (10.5 MiB/s)
>>
>> With my ff branch:
>>> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
>>> 1048576 bytes read in 5038 ms (203.1 KiB/s)
>>
>> That's quite the slow-down! I believe this is causing dfu-util to time
>> out on the larger files (1M+). Just for functional testing, I'll try
>> and find a way to hack dfu-util to have a much larger timeout for the
>> final flush operation. I wonder if the old FAT implementation had a
>> disk cache (e.g. that 32K buffer in BSS?) and we need the same for
>> ff? 

Extending the timeout (massively) in dfu-util did make
dfu_gadget_test.sh work.

> I think that our current Fat implementation is optimized for tiny
> embedded system (and probably no cache).

The ff library claims to be too. I'll try tracing the IO pattern in
sandbox and see where the difference lies.

>> I'll try and track down why it's so slow.
>>
>> Perhaps there are other issues as yet unfound.
> 
> We might also check with sandbox FS set of tests.

I get the same failures for fs-test.sh with or without this series; TC10
fails and everything else passes, for the non-"sb" FAT tests. (with the
file truncation fix I mentioned above applied, although I don't know if
it matters for fs-test.sh)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)
  2015-08-28 13:50 [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Lukasz Majewski
                   ` (2 preceding siblings ...)
  2015-09-09  7:01 ` Lukasz Majewski
@ 2015-09-25 16:25 ` Lukasz Majewski
  2015-09-28 13:43   ` Przemyslaw Marczak
  2015-09-28 21:08   ` [U-Boot] " Tom Rini
  3 siblings, 2 replies; 50+ messages in thread
From: Lukasz Majewski @ 2015-09-25 16:25 UTC (permalink / raw)
  To: u-boot

The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
"mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
of dw mmc transfer.

For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
the default timeout is to short.

The new value - 4 minutes (240 seconds) - is the same as the one used in
Linux kernel driver. Such fix should be good enough until we come up
with better fix for this issue.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Cc: Marek Vasut <marex@denx.de>
Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
Cc: Tom Rini <trini@konsulko.com>
---
 drivers/mmc/dw_mmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index a84c1e1..26d34ae 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -214,7 +214,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 
 	if (data) {
 		start = get_timer(0);
-		timeout = 1000;
+		timeout = 240000;
 		for (;;) {
 			mask = dwmci_readl(host, DWMCI_RINTSTS);
 			/* Error during data transfer. */
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)
  2015-09-25 16:25 ` [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel) Lukasz Majewski
@ 2015-09-28 13:43   ` Przemyslaw Marczak
  2015-09-28 21:08     ` Tom Rini
  2015-09-28 21:08   ` [U-Boot] " Tom Rini
  1 sibling, 1 reply; 50+ messages in thread
From: Przemyslaw Marczak @ 2015-09-28 13:43 UTC (permalink / raw)
  To: u-boot

Hi Lukasz,

On 09/25/2015 06:25 PM, Lukasz Majewski wrote:
> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
> of dw mmc transfer.
>
> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> the default timeout is to short.
>
> The new value - 4 minutes (240 seconds) - is the same as the one used in
> Linux kernel driver. Such fix should be good enough until we come up
> with better fix for this issue.
>

I think, that you should adjust the commit message, because it
describes the result of the real problem which is in the background.

It doesn't fix the DFU, because it wasn't broken.
What is the real problem and how does this commit fix it?

> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> Cc: Tom Rini <trini@konsulko.com>
> ---
>   drivers/mmc/dw_mmc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
> index a84c1e1..26d34ae 100644
> --- a/drivers/mmc/dw_mmc.c
> +++ b/drivers/mmc/dw_mmc.c
> @@ -214,7 +214,7 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>
>   	if (data) {
>   		start = get_timer(0);
> -		timeout = 1000;
> +		timeout = 240000;
>   		for (;;) {
>   			mask = dwmci_readl(host, DWMCI_RINTSTS);
>   			/* Error during data transfer. */
>

I tested this patch on Odroid XU3 and Odroid U3, with SD and eMMC cards.

For read/write data of size 1800MB, it doesn't break the operation.
Read takes about one minute for some eMMC card, and the write takes 
about 2 and a half minute.

The normal read/write operations are done partially for big size data, 
so it should be good, which shows test with write 1800MB to SD card.

Writing 1800MB of data takes about 8 minutes on some SD card and then I 
can see only the info that something was started - but I don't know if 
it's going on, or the board is dead... So maybe we should get some info 
from this part of code, that the transfer is going on, e.g. after each 
next 10% of data portion?

Tested-by: Przemyslaw Marczak <p.marczak@samsung.com>

Best regards,
-- 
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marczak at samsung.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)
  2015-09-25 16:25 ` [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel) Lukasz Majewski
  2015-09-28 13:43   ` Przemyslaw Marczak
@ 2015-09-28 21:08   ` Tom Rini
  1 sibling, 0 replies; 50+ messages in thread
From: Tom Rini @ 2015-09-28 21:08 UTC (permalink / raw)
  To: u-boot

On Fri, Sep 25, 2015 at 06:25:25PM +0200, ?ukasz Majewski wrote:

> The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> "mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
> of dw mmc transfer.
> 
> For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> the default timeout is to short.
> 
> The new value - 4 minutes (240 seconds) - is the same as the one used in
> Linux kernel driver. Such fix should be good enough until we come up
> with better fix for this issue.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> Cc: Tom Rini <trini@konsulko.com>
> Tested-by: Przemyslaw Marczak <p.marczak@samsung.com>

Applied to u-boot/master, thanks!

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150928/2d75e3dc/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel)
  2015-09-28 13:43   ` Przemyslaw Marczak
@ 2015-09-28 21:08     ` Tom Rini
  0 siblings, 0 replies; 50+ messages in thread
From: Tom Rini @ 2015-09-28 21:08 UTC (permalink / raw)
  To: u-boot

On Mon, Sep 28, 2015 at 03:43:50PM +0200, Przemyslaw Marczak wrote:
> Hi Lukasz,
> 
> On 09/25/2015 06:25 PM, Lukasz Majewski wrote:
> >The commit: d9dbb97be0e4a550457aec5f11afefb446169c90
> >"mmc: dw_mmc: Zap endless timeout" removed endless loop waiting for end
> >of dw mmc transfer.
> >
> >For some workloads - dfu test @ Odroid XU3 (sending 8MiB file) -
> >and SD cards (e.g. MicroSD Kingston 4GiB, Adata 4GiB)
> >the default timeout is to short.
> >
> >The new value - 4 minutes (240 seconds) - is the same as the one used in
> >Linux kernel driver. Such fix should be good enough until we come up
> >with better fix for this issue.
> >
> 
> I think, that you should adjust the commit message, because it
> describes the result of the real problem which is in the background.
> 
> It doesn't fix the DFU, because it wasn't broken.
> What is the real problem and how does this commit fix it?

I think this message is good enough (esp since I want to apply it now)
and will link back to all of these other messages about it for details.

-- 
Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: Digital signature
URL: <http://lists.denx.de/pipermail/u-boot/attachments/20150928/cfc168e4/attachment.sig>

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-09-28 21:08 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-28 13:50 [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Lukasz Majewski
2015-08-28 13:50 ` [U-Boot] [PATCH 2/2] mmc: dw_mmc: Make timeout error visible to u-boot console Lukasz Majewski
2015-08-28 23:21   ` Simon Glass
2015-08-29 12:09     ` Lukasz Majewski
2015-08-29 15:07       ` Simon Glass
2015-09-03 12:33         ` Lukasz Majewski
2015-09-03 12:21   ` [U-Boot] [PATCH] FIX: fat: Provide correct return code from disk_{read|write} to upper layers Lukasz Majewski
2015-09-03 12:44     ` Tom Rini
2015-09-03 13:40       ` Lukasz Majewski
2015-09-03 14:18         ` Lukasz Majewski
2015-09-23  3:17           ` Stephen Warren
2015-09-23  8:40             ` Lukasz Majewski
2015-09-25  5:47               ` Stephen Warren
2015-09-09  7:02     ` Lukasz Majewski
2015-09-17 14:44       ` Lukasz Majewski
2015-09-12 12:51     ` [U-Boot] " Tom Rini
2015-08-28 21:55 ` [U-Boot] [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds Marek Vasut
2015-08-29 11:55   ` Lukasz Majewski
2015-08-29 13:52     ` Marek Vasut
2015-08-29 16:38       ` Lukasz Majewski
2015-08-29 19:19         ` Marek Vasut
2015-09-01 11:19           ` Lukasz Majewski
2015-09-01 11:33             ` Marek Vasut
2015-09-01 15:25               ` Lukasz Majewski
2015-09-01 15:35                 ` Marek Vasut
2015-09-01 16:22           ` Pantelis Antoniou
2015-09-02  8:06             ` Marek Vasut
2015-09-09  7:01 ` Lukasz Majewski
2015-09-09 11:34   ` Marek Vasut
2015-09-11 17:20     ` Alexey Brodkin
2015-09-11 21:45       ` Lukasz Majewski
2015-09-12 16:13         ` Marek Vasut
2015-09-13 10:03           ` Lukasz Majewski
2015-09-13 14:00             ` Marek Vasut
2015-09-14 10:15               ` Alexey Brodkin
2015-09-14 11:22                 ` Lukasz Majewski
2015-09-14 13:36                   ` Marek Vasut
2015-09-17 14:43                     ` Lukasz Majewski
2015-09-18  0:31                       ` Tom Rini
2015-09-18  7:32                         ` Lukasz Majewski
2015-09-18  8:07                           ` Przemyslaw Marczak
2015-09-18 19:27                           ` Tom Rini
2015-09-21 15:32                             ` Pantelis Antoniou
2015-09-14 10:30         ` Alexey Brodkin
2015-09-14 11:15           ` Przemyslaw Marczak
2015-09-14 10:33   ` Przemyslaw Marczak
2015-09-25 16:25 ` [U-Boot] [PATCH] mmc: dw_mmc: Increase timeout to 4 minutes (as in Linux kernel) Lukasz Majewski
2015-09-28 13:43   ` Przemyslaw Marczak
2015-09-28 21:08     ` Tom Rini
2015-09-28 21:08   ` [U-Boot] " Tom Rini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.