* [PATCH] mmc: failure of block read wait for long time
From: Sukumar Ghorai @ 2010-07-27 11:27 UTC
  To: linux-mmc; +Cc: linux-arm-kernel, Sukumar Ghorai

 A multi-block read failure is retried as single-block reads, one block at a
 time, and the retries continue through the remaining blocks even after a
 retry fails. The application cannot use the interleaved data anyway, even if
 a few of the single-block reads succeed. This patch fixes the problem by
 returning at the first failure instead of retrying for a long time.

Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
---
 drivers/mmc/card/block.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index cb9fbc8..cfb0827 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
 				spin_lock_irq(&md->lock);
 				ret = __blk_end_request(req, -EIO, brq.data.blksz);
 				spin_unlock_irq(&md->lock);
-				continue;
 			}
 			goto cmd_err;
 		}
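
For context, a hedged sketch of the loop this one-line removal sits in
(simplified from drivers/mmc/card/block.c of this era, with the two error
paths collapsed into one; not the verbatim kernel code):

	do {
		/* first pass covers the whole request; after a read error,
		 * disable_multi forces one 512-byte block per attempt */
		mmc_wait_for_req(card->host, &brq.mrq);

		if (brq.cmd.error || brq.stop.error || brq.data.error) {
			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
				disable_multi = 1;	/* redo single-block */
				continue;
			}
			if (rq_data_dir(req) == READ) {
				/* a single-block read failed: complete just
				 * this block with -EIO ... */
				spin_lock_irq(&md->lock);
				ret = __blk_end_request(req, -EIO,
							brq.data.blksz);
				spin_unlock_irq(&md->lock);
				/* ... before this patch, a 'continue' here
				 * went on to retry every remaining block;
				 * with it removed, control falls through and
				 * the request aborts at the first bad block */
			}
			goto cmd_err;
		}
		/* success: complete what was transferred, loop if more */
		spin_lock_irq(&md->lock);
		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
		spin_unlock_irq(&md->lock);
	} while (ret);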

* Re: [PATCH] mmc: failure of block read wait for long time
From: Adrian Hunter @ 2010-07-27 13:22 UTC
  To: Sukumar Ghorai; +Cc: linux-mmc, linux-arm-kernel

Sukumar Ghorai wrote:
>  A multi-block read failure is retried as single-block reads, one block at a
>  time, and the retries continue through the remaining blocks even after a
>  retry fails. The application cannot use the interleaved data anyway, even if
>  a few of the single-block reads succeed. This patch fixes the problem by
>  returning at the first failure instead of retrying for a long time.

How can you know what sectors are important to the upper layers?

> 
> Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
> ---
>  drivers/mmc/card/block.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index cb9fbc8..cfb0827 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
> @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>  				spin_lock_irq(&md->lock);
>  				ret = __blk_end_request(req, -EIO, brq.data.blksz);
>  				spin_unlock_irq(&md->lock);
> -				continue;
>  			}
>  			goto cmd_err;
>  		}
> --

* RE: [PATCH] mmc: failure of block read wait for long time
From: Ghorai, Sukumar @ 2010-07-27 13:32 UTC
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Tuesday, July 27, 2010 6:52 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> Sukumar Ghorai wrote:
> >  multi-block read failure retries in single block read one by one. It
> continues
> >  retry of subsequent blocks, even after failure. Application will not be
> able
> >  to decode the interleave data (even if few single block read success).
> >  This patch fixes this problem by returning at the first failure instead
> of
> >  waiting for long duration.
> 
> How can you know what sectors are important to the upper layers?
[Ghorai]
1. This is obvious: either hand the complete data up to the layer above, or nothing. Every protocol follows that.

2. Another scenario: say an application requests a 128MB read; it will take quite a long time before the failure is returned (a quick back-of-the-envelope follows below).

3. It is quite obvious that if the read fails for sector(x), it will also fail for sector(x+1).
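
(To put a number on point 2: a back-of-the-envelope sketch, assuming
512-byte sectors:)

	/* illustrative only: worst-case attempts for point 2 above */
	unsigned long long request  = 128ULL << 20;   /* 128MB read          */
	unsigned long long attempts = request / 512;  /* 262144 single-block */
	                                              /* reads before -EIO   */
	                                              /* reaches the caller  */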

> 
> >
> > Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
> > ---
> >  drivers/mmc/card/block.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> > index cb9fbc8..cfb0827 100644
> > --- a/drivers/mmc/card/block.c
> > +++ b/drivers/mmc/card/block.c
> > @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
> struct request *req)
> >  				spin_lock_irq(&md->lock);
> >  				ret = __blk_end_request(req, -EIO,
> brq.data.blksz);
> >  				spin_unlock_irq(&md->lock);
> > -				continue;
> >  			}
> >  			goto cmd_err;
> >  		}
> > --

Regards,
Ghorai

* Re: [PATCH] mmc: failure of block read wait for long time
From: Adrian Hunter @ 2010-07-28  8:41 UTC
  To: Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel

ext Ghorai, Sukumar wrote:
> 
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Tuesday, July 27, 2010 6:52 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> Sukumar Ghorai wrote:
>>>  multi-block read failure retries in single block read one by one. It
>> continues
>>>  retry of subsequent blocks, even after failure. Application will not be
>> able
>>>  to decode the interleave data (even if few single block read success).
>>>  This patch fixes this problem by returning at the first failure instead
>> of
>>>  waiting for long duration.
>> How can you know what sectors are important to the upper layers?
> [Ghorai]
> 1. This is obvious: either hand the complete data up to the layer above, or nothing. Every protocol follows that.
> 
> 2. Another scenario: say an application requests a 128MB read; it will take quite a long time before the failure is returned.
> 
> 3. It is quite obvious that if the read fails for sector(x), it will also fail for sector(x+1).
> 

None of that is exactly true.  However you could change the retry
from sectors to segments e.g.


diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index d545f79..5ed15f6 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -321,13 +321,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request brq;
-	int ret = 1, disable_multi = 0;
+	int ret = 1, retrying = 0;
 
 	mmc_claim_host(card->host);
 
 	do {
 		struct mmc_command cmd;
 		u32 readcmd, writecmd, status = 0;
+		unsigned int blocks;
 
 		memset(&brq, 0, sizeof(struct mmc_blk_request));
 		brq.mrq.cmd = &brq.cmd;
@@ -341,23 +342,26 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		brq.stop.opcode = MMC_STOP_TRANSMISSION;
 		brq.stop.arg = 0;
 		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-		brq.data.blocks = blk_rq_sectors(req);
+
+		/*
+		 * After a read error, we redo the request one segment at a time
+		 * in order to accurately determine which segments can be read
+		 * successfully.
+		 */
+		if (retrying)
+			blocks = blk_rq_cur_sectors(req);
+		else
+			blocks = blk_rq_sectors(req);
 
 		/*
 		 * The block layer doesn't support all sector count
 		 * restrictions, so we need to be prepared for too big
 		 * requests.
 		 */
-		if (brq.data.blocks > card->host->max_blk_count)
-			brq.data.blocks = card->host->max_blk_count;
+		if (blocks > card->host->max_blk_count)
+			blocks = card->host->max_blk_count;
 
-		/*
-		 * After a read error, we redo the request one sector at a time
-		 * in order to accurately determine which sectors can be read
-		 * successfully.
-		 */
-		if (disable_multi && brq.data.blocks > 1)
-			brq.data.blocks = 1;
+		brq.data.blocks = blocks;
 
 		if (brq.data.blocks > 1) {
 			/* SPI multiblock writes terminate using a special
@@ -418,11 +422,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 * programming mode even when things go wrong.
 		 */
 		if (brq.cmd.error || brq.data.error || brq.stop.error) {
-			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
-				/* Redo read one sector at a time */
-				printk(KERN_WARNING "%s: retrying using single "
-				       "block read\n", req->rq_disk->disk_name);
-				disable_multi = 1;
+			if (!retrying && rq_data_dir(req) == READ &&
+			    blk_rq_cur_sectors(req) &&
+			    blk_rq_cur_sectors(req) < blocks) {
+				/* Redo read one segment at a time */
+				printk(KERN_WARNING "%s: retrying read one "
+						    "segment at a time\n",
+						    req->rq_disk->disk_name);
+				retrying = 1;
 				continue;
 			}
 			status = get_card_status(card, req);
@@ -486,12 +493,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		if (brq.cmd.error || brq.stop.error || brq.data.error) {
 			if (rq_data_dir(req) == READ) {
 				/*
-				 * After an error, we redo I/O one sector at a
+				 * After an error, we redo I/O one segment at a
 				 * time, so we only reach here after trying to
-				 * read a single sector.
+				 * read a single segment.  Error out the whole
+				 * segment and continue to the next.
 				 */
 				spin_lock_irq(&md->lock);
-				ret = __blk_end_request(req, -EIO, brq.data.blksz);
+				ret = __blk_end_request(req, -EIO, blk_rq_cur_sectors(req) << 9);
 				spin_unlock_irq(&md->lock);
 				continue;
 			}
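
One detail worth noting in the last hunk: __blk_end_request() takes a byte
count, hence the << 9 sector-to-byte conversion, and its return value says
whether the request still has bytes pending, which is what keeps the
surrounding do/while loop walking segment by segment. A minimal sketch of
that contract (assuming the 2.6.3x block API; not verbatim):

	/* fail the current segment, then move on while the request
	 * still has data outstanding */
	spin_lock_irq(&md->lock);
	ret = __blk_end_request(req, -EIO, blk_rq_cur_sectors(req) << 9);
	spin_unlock_irq(&md->lock);
	if (ret)
		continue;	/* issue the next segment of this request */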


>>> Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
>>> ---
>>>  drivers/mmc/card/block.c |    1 -
>>>  1 files changed, 0 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>>> index cb9fbc8..cfb0827 100644
>>> --- a/drivers/mmc/card/block.c
>>> +++ b/drivers/mmc/card/block.c
>>> @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
>> struct request *req)
>>>  				spin_lock_irq(&md->lock);
>>>  				ret = __blk_end_request(req, -EIO,
>> brq.data.blksz);
>>>  				spin_unlock_irq(&md->lock);
>>> -				continue;
>>>  			}
>>>  			goto cmd_err;
>>>  		}
>>> --
> 
> Regards,
> Ghorai
> 


* Re: [PATCH] mmc: failure of block read wait for long time
From: Chris Ball @ 2010-08-27 20:59 UTC
  To: Sukumar Ghorai; +Cc: linux-mmc, linux-arm-kernel, Adrian Hunter

Hi,

On Tue, Jul 27, 2010 at 04:57:46PM +0530, Sukumar Ghorai wrote:
>  A multi-block read failure is retried as single-block reads, one block at a
>  time, and the retries continue through the remaining blocks even after a
>  retry fails. The application cannot use the interleaved data anyway, even if
>  a few of the single-block reads succeed. This patch fixes the problem by
>  returning at the first failure instead of retrying for a long time.
> 
> Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
> ---
>  drivers/mmc/card/block.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index cb9fbc8..cfb0827 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
> @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>  				spin_lock_irq(&md->lock);
>  				ret = __blk_end_request(req, -EIO, brq.data.blksz);
>  				spin_unlock_irq(&md->lock);
> -				continue;
>  			}
>  			goto cmd_err;
>  		}
> --

Sukumar/Adrian, any thoughts on a new version of this patch containing
Adrian's retry-segments idea?

-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

* RE: [PATCH] mmc: failure of block read wait for long time
From: Ghorai, Sukumar @ 2010-08-30 19:09 UTC
  To: Chris Ball; +Cc: linux-mmc, linux-arm-kernel, Adrian Hunter

> -----Original Message-----
> From: Chris Ball [mailto:cjb@laptop.org]
> Sent: Saturday, August 28, 2010 2:30 AM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> Hi,
> 
> On Tue, Jul 27, 2010 at 04:57:46PM +0530, Sukumar Ghorai wrote:
> >  multi-block read failure retries in single block read one by one. It
> continues
> >  retry of subsequent blocks, even after failure. Application will not be
> able
> >  to decode the interleave data (even if few single block read success).
> >  This patch fixes this problem by returning at the first failure instead
> of
> >  waiting for long duration.
> >
> > Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
> > ---
> >  drivers/mmc/card/block.c |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> > index cb9fbc8..cfb0827 100644
> > --- a/drivers/mmc/card/block.c
> > +++ b/drivers/mmc/card/block.c
> > @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
> struct request *req)
> >  				spin_lock_irq(&md->lock);
> >  				ret = __blk_end_request(req, -EIO,
> brq.data.blksz);
> >  				spin_unlock_irq(&md->lock);
> > -				continue;
> >  			}
> >  			goto cmd_err;
> >  		}
> > --
> 
> Sukumar/Adrian, any thoughts on a new version of this patch containing
> Adrian's retry-segments idea?
[Ghorai] I have checked and tested Adrian's suggestion, but it only reduces the retry count by a constant factor; I mean -
1. Each segment is 4K in size, and on a multi-block read failure we move on to the next segment (4K);
2. i.e. the original code retried for every 512 bytes, while this retries for every 8*512 bytes, so the retries are reduced to 1/8 (spelled out below).

So I will post the 2nd version as suggested by Adrian; but let's see others' input.
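
(Spelling out the ratio in point 2: a sketch assuming 512-byte sectors and
the 4K segments mentioned above:)

	/* illustrative only: attempts to walk one failed request */
	unsigned int request     = 1 << 20;         /* e.g. a 1MB read    */
	unsigned int per_sector  = request / 512;   /* 2048 attempts, old */
	unsigned int per_segment = request / 4096;  /* 256 attempts, i.e. */
	                                            /* 1/8 as many        */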

> 
> --
> Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> One Laptop Per Child

* RE: [PATCH] mmc: failure of block read wait for long time
From: Ghorai, Sukumar @ 2010-09-10 11:30 UTC
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Wednesday, July 28, 2010 2:11 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> ext Ghorai, Sukumar wrote:
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >> Sent: Tuesday, July 27, 2010 6:52 PM
> >> To: Ghorai, Sukumar
> >> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> Sukumar Ghorai wrote:
> >>>  multi-block read failure retries in single block read one by one. It
> >> continues
> >>>  retry of subsequent blocks, even after failure. Application will not
> be
> >> able
> >>>  to decode the interleave data (even if few single block read success).
> >>>  This patch fixes this problem by returning at the first failure
> instead
> >> of
> >>>  waiting for long duration.
> >> How can you know what sectors are important to the upper layers?
> > [Ghorai]
> > 1. This is obvious either give the complete data to above layer or not.
> And every protocol follows that.
> >
> > 2. And other scenario, say apps request for to read the 128MB data and
> it will take quite long time before returning in failure case.
> >
> > 3. it is quite obvious if the read fail for sector(x) it will fail for
> sector(x+1)
> >
> 
> None of that is exactly true.  However you could change the retry
> from sectors to segments e.g.
> 
> 
> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> index d545f79..5ed15f6 100644
> --- a/drivers/mmc/card/block.c
> +++ b/drivers/mmc/card/block.c
> @@ -321,13 +321,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
> struct request *req)
>  	struct mmc_blk_data *md = mq->data;
>  	struct mmc_card *card = md->queue.card;
>  	struct mmc_blk_request brq;
> -	int ret = 1, disable_multi = 0;
> +	int ret = 1, retrying = 0;
> 
>  	mmc_claim_host(card->host);
> 
>  	do {
>  		struct mmc_command cmd;
>  		u32 readcmd, writecmd, status = 0;
> +		unsigned int blocks;
> 
>  		memset(&brq, 0, sizeof(struct mmc_blk_request));
>  		brq.mrq.cmd = &brq.cmd;
> @@ -341,23 +342,26 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
> struct request *req)
>  		brq.stop.opcode = MMC_STOP_TRANSMISSION;
>  		brq.stop.arg = 0;
>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
> -		brq.data.blocks = blk_rq_sectors(req);
> +
> +		/*
> +		 * After a read error, we redo the request one segment at a
> time
> +		 * in order to accurately determine which segments can be read
> +		 * successfully.
> +		 */
> +		if (retrying)
> +			blocks = blk_rq_cur_sectors(req);
> +		else
> +			blocks = blk_rq_sectors(req);
> 
>  		/*
>  		 * The block layer doesn't support all sector count
>  		 * restrictions, so we need to be prepared for too big
>  		 * requests.
>  		 */
> -		if (brq.data.blocks > card->host->max_blk_count)
> -			brq.data.blocks = card->host->max_blk_count;
> +		if (blocks > card->host->max_blk_count)
> +			blocks = card->host->max_blk_count;
> 
> -		/*
> -		 * After a read error, we redo the request one sector at a
> time
> -		 * in order to accurately determine which sectors can be read
> -		 * successfully.
> -		 */
> -		if (disable_multi && brq.data.blocks > 1)
> -			brq.data.blocks = 1;
> +		brq.data.blocks = blocks;
> 
>  		if (brq.data.blocks > 1) {
>  			/* SPI multiblock writes terminate using a special
> @@ -418,11 +422,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
> struct request *req)
>  		 * programming mode even when things go wrong.
>  		 */
>  		if (brq.cmd.error || brq.data.error || brq.stop.error) {
> -			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
> -				/* Redo read one sector at a time */
> -				printk(KERN_WARNING "%s: retrying using single "
> -				       "block read\n", req->rq_disk->disk_name);
> -				disable_multi = 1;
> +			if (!retrying && rq_data_dir(req) == READ &&
> +			    blk_rq_cur_sectors(req) &&
> +			    blk_rq_cur_sectors(req) < blocks) {
> +				/* Redo read one segment at a time */
> +				printk(KERN_WARNING "%s: retrying read one "
> +						    "segment at a time\n",
> +						    req->rq_disk->disk_name);
> +				retrying = 1;
>  				continue;
>  			}
>  			status = get_card_status(card, req);
> @@ -486,12 +493,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
> struct request *req)
>  		if (brq.cmd.error || brq.stop.error || brq.data.error) {
>  			if (rq_data_dir(req) == READ) {
>  				/*
> -				 * After an error, we redo I/O one sector at a
> +				 * After an error, we redo I/O one segment at a
>  				 * time, so we only reach here after trying to
> -				 * read a single sector.
> +				 * read a single segment.  Error out the whole
> +				 * segment and continue to the next.
>  				 */
>  				spin_lock_irq(&md->lock);
> -				ret = __blk_end_request(req, -EIO,
> brq.data.blksz);
> +				ret = __blk_end_request(req, -EIO,
> blk_rq_cur_sectors(req) << 9);
>  				spin_unlock_irq(&md->lock);
>  				continue;
>  			}
> 
> 
> >>> Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
> >>> ---
> >>>  drivers/mmc/card/block.c |    1 -
> >>>  1 files changed, 0 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> >>> index cb9fbc8..cfb0827 100644
> >>> --- a/drivers/mmc/card/block.c
> >>> +++ b/drivers/mmc/card/block.c
> >>> @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
> >> struct request *req)
> >>>  				spin_lock_irq(&md->lock);
> >>>  				ret = __blk_end_request(req, -EIO,
> >> brq.data.blksz);
> >>>  				spin_unlock_irq(&md->lock);
> >>> -				continue;
> >>>  			}
> >>>  			goto cmd_err;
> >>>  		}
> >>> --
[Ghorai] Adrian,
Yes, this works, and it reduces the retries to 1/4 of the original code (2048 down to 512 for a 1MB read);

Initially it retried for each sector (512 bytes) after a multi-block read failure; this solution retries for each segment (2048 bytes here).

1. Now say the block layer reads 1MB and fails on the 1st segment. It will still retry for every 2048 bytes of the 1MB, i.e. 512 times (numbers below).
2. So do you think there is any good reason to retry again and again?
3. And again, as I mentioned, how do we tell the block layer which segments hold valid data?

And I was suggesting not to retry again at all once the 1st single-block retry has failed.
http://comments.gmane.org/gmane.linux.kernel.mmc/2714
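
(The numbers behind point 1 above, assuming the 2048-byte segments seen in
this test:)

	/* illustrative only */
	unsigned int request = 1 << 20;            /* 1MB read             */
	unsigned int segment = 2048;               /* segment size in test */
	unsigned int retries = request / segment;  /* = 512, vs 2048 when  */
	                                           /* retrying per sector  */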

> >
> > Regards,
> > Ghorai
> >


* Re: [PATCH] mmc: failure of block read wait for long time
From: Adrian Hunter @ 2010-09-10 11:43 UTC
  To: ext Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel

ext Ghorai, Sukumar wrote:
> 
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Wednesday, July 28, 2010 2:11 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> ext Ghorai, Sukumar wrote:
>>>> -----Original Message-----
>>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>>>> Sent: Tuesday, July 27, 2010 6:52 PM
>>>> To: Ghorai, Sukumar
>>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> Sukumar Ghorai wrote:
>>>>>  multi-block read failure retries in single block read one by one. It
>>>> continues
>>>>>  retry of subsequent blocks, even after failure. Application will not
>> be
>>>> able
>>>>>  to decode the interleave data (even if few single block read success).
>>>>>  This patch fixes this problem by returning at the first failure
>> instead
>>>> of
>>>>>  waiting for long duration.
>>>> How can you know what sectors are important to the upper layers?
>>> [Ghorai]
>>> 1. This is obvious either give the complete data to above layer or not.
>> And every protocol follows that.
>>> 2. And other scenario, say apps request for to read the 128MB data and
>> it will take quite long time before returning in failure case.
>>> 3. it is quite obvious if the read fail for sector(x) it will fail for
>> sector(x+1)
>> None of that is exactly true.  However you could change the retry
>> from sectors to segments e.g.
>>
>>
>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>> index d545f79..5ed15f6 100644
>> --- a/drivers/mmc/card/block.c
>> +++ b/drivers/mmc/card/block.c
>> @@ -321,13 +321,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  	struct mmc_blk_data *md = mq->data;
>>  	struct mmc_card *card = md->queue.card;
>>  	struct mmc_blk_request brq;
>> -	int ret = 1, disable_multi = 0;
>> +	int ret = 1, retrying = 0;
>>
>>  	mmc_claim_host(card->host);
>>
>>  	do {
>>  		struct mmc_command cmd;
>>  		u32 readcmd, writecmd, status = 0;
>> +		unsigned int blocks;
>>
>>  		memset(&brq, 0, sizeof(struct mmc_blk_request));
>>  		brq.mrq.cmd = &brq.cmd;
>> @@ -341,23 +342,26 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		brq.stop.opcode = MMC_STOP_TRANSMISSION;
>>  		brq.stop.arg = 0;
>>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>> -		brq.data.blocks = blk_rq_sectors(req);
>> +
>> +		/*
>> +		 * After a read error, we redo the request one segment at a
>> time
>> +		 * in order to accurately determine which segments can be read
>> +		 * successfully.
>> +		 */
>> +		if (retrying)
>> +			blocks = blk_rq_cur_sectors(req);
>> +		else
>> +			blocks = blk_rq_sectors(req);
>>
>>  		/*
>>  		 * The block layer doesn't support all sector count
>>  		 * restrictions, so we need to be prepared for too big
>>  		 * requests.
>>  		 */
>> -		if (brq.data.blocks > card->host->max_blk_count)
>> -			brq.data.blocks = card->host->max_blk_count;
>> +		if (blocks > card->host->max_blk_count)
>> +			blocks = card->host->max_blk_count;
>>
>> -		/*
>> -		 * After a read error, we redo the request one sector at a
>> time
>> -		 * in order to accurately determine which sectors can be read
>> -		 * successfully.
>> -		 */
>> -		if (disable_multi && brq.data.blocks > 1)
>> -			brq.data.blocks = 1;
>> +		brq.data.blocks = blocks;
>>
>>  		if (brq.data.blocks > 1) {
>>  			/* SPI multiblock writes terminate using a special
>> @@ -418,11 +422,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		 * programming mode even when things go wrong.
>>  		 */
>>  		if (brq.cmd.error || brq.data.error || brq.stop.error) {
>> -			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
>> -				/* Redo read one sector at a time */
>> -				printk(KERN_WARNING "%s: retrying using single "
>> -				       "block read\n", req->rq_disk->disk_name);
>> -				disable_multi = 1;
>> +			if (!retrying && rq_data_dir(req) == READ &&
>> +			    blk_rq_cur_sectors(req) &&
>> +			    blk_rq_cur_sectors(req) < blocks) {
>> +				/* Redo read one segment at a time */
>> +				printk(KERN_WARNING "%s: retrying read one "
>> +						    "segment at a time\n",
>> +						    req->rq_disk->disk_name);
>> +				retrying = 1;
>>  				continue;
>>  			}
>>  			status = get_card_status(card, req);
>> @@ -486,12 +493,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		if (brq.cmd.error || brq.stop.error || brq.data.error) {
>>  			if (rq_data_dir(req) == READ) {
>>  				/*
>> -				 * After an error, we redo I/O one sector at a
>> +				 * After an error, we redo I/O one segment at a
>>  				 * time, so we only reach here after trying to
>> -				 * read a single sector.
>> +				 * read a single segment.  Error out the whole
>> +				 * segment and continue to the next.
>>  				 */
>>  				spin_lock_irq(&md->lock);
>> -				ret = __blk_end_request(req, -EIO,
>> brq.data.blksz);
>> +				ret = __blk_end_request(req, -EIO,
>> blk_rq_cur_sectors(req) << 9);
>>  				spin_unlock_irq(&md->lock);
>>  				continue;
>>  			}
>>
>>
>>>>> Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
>>>>> ---
>>>>>  drivers/mmc/card/block.c |    1 -
>>>>>  1 files changed, 0 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>>>>> index cb9fbc8..cfb0827 100644
>>>>> --- a/drivers/mmc/card/block.c
>>>>> +++ b/drivers/mmc/card/block.c
>>>>> @@ -419,7 +419,6 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq,
>>>> struct request *req)
>>>>>  				spin_lock_irq(&md->lock);
>>>>>  				ret = __blk_end_request(req, -EIO,
>>>> brq.data.blksz);
>>>>>  				spin_unlock_irq(&md->lock);
>>>>> -				continue;
>>>>>  			}
>>>>>  			goto cmd_err;
>>>>>  		}
>>>>> --
> [Ghorai] Adrian,
> Yes, this works, and it reduces the retries to 1/4 of the original code (2048 down to 512 for a 1MB read);
> 
> Initially it retried for each sector (512 bytes) after a multi-block read failure; this solution retries for each segment (2048 bytes here).
> 
> 1. Now say the block layer reads 1MB and fails on the 1st segment. It will still retry for every 2048 bytes of the 1MB, i.e. 512 times.
> 2. So do you think there is any good reason to retry again and again?

If you have 1MB that is not readable, it sounds like the card is broken.
Why are so many reads failing?  Has the card been removed?

You might very rarely see ECC errors in a small number of sectors,
but more than that sounds like something else is broken.

> 3. And again, as I mentioned, how do we tell the block layer which segments hold valid data?
> 
> And I was suggesting not to retry again at all once the 1st single-block retry has failed.
> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> 
>>> Regards,
>>> Ghorai
>>>
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-10 11:43           ` Adrian Hunter
  0 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-10 11:43 UTC (permalink / raw)
  To: linux-arm-kernel

ext Ghorai, Sukumar wrote:
> 
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
>> Sent: Wednesday, July 28, 2010 2:11 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> ext Ghorai, Sukumar wrote:
>>>> -----Original Message-----
>>>> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
>>>> Sent: Tuesday, July 27, 2010 6:52 PM
>>>> To: Ghorai, Sukumar
>>>> Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> Sukumar Ghorai wrote:
>>>>>  multi-block read failure retries in single block read one by one. It
>>>> continues
>>>>>  retry of subsequent blocks, even after failure. Application will not
>> be
>>>> able
>>>>>  to decode the interleave data (even if few single block read success).
>>>>>  This patch fixes this problem by returning at the first failure
>> instead
>>>> of
>>>>>  waiting for long duration.
>>>> How can you know what sectors are important to the upper layers?
>>> [Ghorai]
>>> 1. This is obvious either give the complete data to above layer or not.
>> And every protocol follows that.
>>> 2. And other scenario, say apps request for to read the 128MB data and
>> it will take quite long time before returning in failure case.
>>> 3. it is quite obvious if the read fail for sector(x) it will fail for
>> sector(x+1)
>> None of that is exactly true.  However you could change the retry
>> from sectors to segments e.g.
>>
>>
>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>> index d545f79..5ed15f6 100644
>> --- a/drivers/mmc/card/block.c
>> +++ b/drivers/mmc/card/block.c
>> @@ -321,13 +321,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  	struct mmc_blk_data *md = mq->data;
>>  	struct mmc_card *card = md->queue.card;
>>  	struct mmc_blk_request brq;
>> -	int ret = 1, disable_multi = 0;
>> +	int ret = 1, retrying = 0;
>>
>>  	mmc_claim_host(card->host);
>>
>>  	do {
>>  		struct mmc_command cmd;
>>  		u32 readcmd, writecmd, status = 0;
>> +		unsigned int blocks;
>>
>>  		memset(&brq, 0, sizeof(struct mmc_blk_request));
>>  		brq.mrq.cmd = &brq.cmd;
>> @@ -341,23 +342,26 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		brq.stop.opcode = MMC_STOP_TRANSMISSION;
>>  		brq.stop.arg = 0;
>>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>> -		brq.data.blocks = blk_rq_sectors(req);
>> +
>> +		/*
>> +		 * After a read error, we redo the request one segment at a
>> time
>> +		 * in order to accurately determine which segments can be read
>> +		 * successfully.
>> +		 */
>> +		if (retrying)
>> +			blocks = blk_rq_cur_sectors(req);
>> +		else
>> +			blocks = blk_rq_sectors(req);
>>
>>  		/*
>>  		 * The block layer doesn't support all sector count
>>  		 * restrictions, so we need to be prepared for too big
>>  		 * requests.
>>  		 */
>> -		if (brq.data.blocks > card->host->max_blk_count)
>> -			brq.data.blocks = card->host->max_blk_count;
>> +		if (blocks > card->host->max_blk_count)
>> +			blocks = card->host->max_blk_count;
>>
>> -		/*
>> -		 * After a read error, we redo the request one sector at a
>> time
>> -		 * in order to accurately determine which sectors can be read
>> -		 * successfully.
>> -		 */
>> -		if (disable_multi && brq.data.blocks > 1)
>> -			brq.data.blocks = 1;
>> +		brq.data.blocks = blocks;
>>
>>  		if (brq.data.blocks > 1) {
>>  			/* SPI multiblock writes terminate using a special
>> @@ -418,11 +422,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		 * programming mode even when things go wrong.
>>  		 */
>>  		if (brq.cmd.error || brq.data.error || brq.stop.error) {
>> -			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
>> -				/* Redo read one sector at a time */
>> -				printk(KERN_WARNING "%s: retrying using single "
>> -				       "block read\n", req->rq_disk->disk_name);
>> -				disable_multi = 1;
>> +			if (!retrying && rq_data_dir(req) == READ &&
>> +			    blk_rq_cur_sectors(req) &&
>> +			    blk_rq_cur_sectors(req) < blocks) {
>> +				/* Redo read one segment at a time */
>> +				printk(KERN_WARNING "%s: retrying read one "
>> +						    "segment at a time\n",
>> +						    req->rq_disk->disk_name);
>> +				retrying = 1;
>>  				continue;
>>  			}
>>  			status = get_card_status(card, req);
>> @@ -486,12 +493,13 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq,
>> struct request *req)
>>  		if (brq.cmd.error || brq.stop.error || brq.data.error) {
>>  			if (rq_data_dir(req) == READ) {
>>  				/*
>> -				 * After an error, we redo I/O one sector at a
>> +				 * After an error, we redo I/O one segment at a
>>  				 * time, so we only reach here after trying to
>> -				 * read a single sector.
>> +				 * read a single segment.  Error out the whole
>> +				 * segment and continue to the next.
>>  				 */
>>  				spin_lock_irq(&md->lock);
>> -				ret = __blk_end_request(req, -EIO,
>> brq.data.blksz);
>> +				ret = __blk_end_request(req, -EIO,
>> blk_rq_cur_sectors(req) << 9);
>>  				spin_unlock_irq(&md->lock);
>>  				continue;
>>  			}
>>
>>
>>>>> [..snip..]
> [Ghorai] Adrian,
> Yes, this works, and it reduced the retries to 1/4 of the original code (from 2048 to 512 retries for a 1MB data read);
> 
> Initially it was retrying for each page (512 bytes) after a multi-block read failure; but this solution is retrying for each segment (2048 bytes);
> 
> 1. Now say the block layer reads 1MB and fails on the 1st segment. It will still retry for 1MB/2048 bytes, i.e. 512 times.
> 2. So do you think there is any good reason to retry again and again?

If you have 1MB that is not readable, it sounds like the card is broken.
Why are so many reads failing?  Has the card been removed?

You might very rarely see ECC errors in a small number of sectors,
but more than that sounds like something else is broken.

> 3. And again, as I mentioned, how do we inform the block layer which segment has the valid data?
> 
> And I was suggesting not to retry again if the 1st retry (single block) failed.
> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> 
>>> Regards,
>>> Ghorai
>>>
> 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-10 11:43           ` Adrian Hunter
@ 2010-09-10 11:48             ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-10 11:48 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel



> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Friday, September 10, 2010 5:14 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> ext Ghorai, Sukumar wrote:
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >> Sent: Wednesday, July 28, 2010 2:11 PM
> >> To: Ghorai, Sukumar
> >> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> ext Ghorai, Sukumar wrote:
> >>>> -----Original Message-----
> >>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >>>> Sent: Tuesday, July 27, 2010 6:52 PM
> >>>> To: Ghorai, Sukumar
> >>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>
> >>>> Sukumar Ghorai wrote:
> >>>>> [..snip..]
> >>>> How can you know what sectors are important to the upper layers?
> >>> [Ghorai]
> >>> 1. This is obvious: either give the complete data to the above layer or not.
> >>> And every protocol follows that.
> >>> 2. And another scenario: say apps request to read 128MB of data; it will
> >>> take quite a long time before returning in the failure case.
> >>> 3. It is quite obvious that if the read fails for sector(x) it will fail
> >>> for sector(x+1).
> >> None of that is exactly true.  However you could change the retry
> >> from sectors to segments e.g.
> >>
> >>
> >> [..snip..]
> > [Ghorai] Adrian,
> > Yes, this works, and it reduced the retries to 1/4 of the original code
> > (from 2048 to 512 retries for a 1MB data read);
> >
> > Initially it was retrying for each page (512 bytes) after a multi-block
> > read failure; but this solution is retrying for each segment (2048 bytes);
> >
> > 1. Now say the block layer reads 1MB and fails on the 1st segment. It
> > will still retry for 1MB/2048 bytes, i.e. 512 times.
> > 2. So do you think there is any good reason to retry again and again?
> 
> If you have 1MB that is not readable, it sounds like the card is broken.
> Why are so many reads failing?  Has the card been removed?
> 
> You might very rarely see ECC errors in a small number of sectors,
> but more than that sounds like something else is broken.

[Ghorai] Yes; one example is that the card is removed while reading data.
And moreover we should not give interleaved data to apps, as we have no way to tell the application which part of the data is valid.

> 
> > 3. And again, as I mentioned, how do we inform the block layer which
> > segment has the valid data?
> >
> > And I was suggesting not to retry again if the 1st retry (single block)
> > failed.
> > http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >
> >>> Regards,
> >>> Ghorai
> >>>
> >
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-10 11:48             ` Ghorai, Sukumar
@ 2010-09-10 13:02               ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-10 13:02 UTC (permalink / raw)
  To: Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel

Ghorai, Sukumar wrote:
> 
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Friday, September 10, 2010 5:14 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> Ghorai, Sukumar wrote:
>>>> -----Original Message-----
>>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>>>> Sent: Wednesday, July 28, 2010 2:11 PM
>>>> To: Ghorai, Sukumar
>>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> Ghorai, Sukumar wrote:
>>>>>> -----Original Message-----
>>>>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>>>>>> Sent: Tuesday, July 27, 2010 6:52 PM
>>>>>> To: Ghorai, Sukumar
>>>>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>>>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>>>
>>>>>> Sukumar Ghorai wrote:
>>>>>>> [..snip..]
>>>>>> How can you know what sectors are important to the upper layers?
>>>>> [Ghorai]
>>>>> 1. This is obvious: either give the complete data to the above layer or not.
>>>>> And every protocol follows that.
>>>>> 2. And another scenario: say apps request to read 128MB of data; it will
>>>>> take quite a long time before returning in the failure case.
>>>>> 3. It is quite obvious that if the read fails for sector(x) it will fail
>>>>> for sector(x+1).
>>>> None of that is exactly true.  However you could change the retry
>>>> from sectors to segments e.g.
>>>>
>>>>
>>>> [..snip..]
>>> [Ghorai] Adrian,
>>> Yes, this works, and it reduced the retries to 1/4 of the original code
>>> (from 2048 to 512 retries for a 1MB data read);
>>> Initially it was retrying for each page (512 bytes) after a multi-block
>>> read failure; but this solution is retrying for each segment (2048 bytes);
>>> 1. Now say the block layer reads 1MB and fails on the 1st segment. It
>>> will still retry for 1MB/2048 bytes, i.e. 512 times.
>>> 2. So do you think there is any good reason to retry again and again?
>> If you have 1MB that is not readable, it sounds like the card is broken.
>> Why are so many reads failing?  Has the card been removed?
>>
>> You might very rarely see ECC errors in a small number of sectors,
>> but more than that sounds like something else is broken.
> 
> [Ghorai] Yes; one example is that the card is removed while reading data.

Well, that is a different case.  Once the card has gone, the block driver
can (and will once the remove method is called) error out all I/O
requests without sending them to MMC.  That doesn't happen until there
is a card detect interrupt and a resulting rescan.

A possible solution is to put a flag on mmc_card to indicate card_gone
that gets set as soon as the driver's card detect interrupt shows there
is no card (hoping that we are not getting any bouncing on card detect)
and then have mmc_wait_for_req() simply return -ENODEV immediately if
the card_gone flag is set.  Finally, if the mmc block driver sees
a -ENODEV error, it should also check the card_gone flag (via a new
core function) and if the card is gone, not retry - and perhaps
even error out the rest of the I/O request queue as well.
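
A rough sketch of that idea, purely for illustration -- the MMC_STATE_GONE
bit, the mmc_card_gone()/mmc_card_set_gone() helpers and the example_*
driver names are all made up here (following the existing MMC_STATE_*
pattern); this is a sketch of the approach, not an actual patch:

/* include/linux/mmc/card.h: hypothetical new state bit and helpers */
#define MMC_STATE_GONE		(1<<4)	/* card has been removed */

#define mmc_card_gone(c)	((c)->state & MMC_STATE_GONE)
#define mmc_card_set_gone(c)	((c)->state |= MMC_STATE_GONE)

/* host driver: the card-detect ISR marks the card gone immediately */
static irqreturn_t example_cd_irq(int irq, void *dev_id)
{
	struct mmc_host *mmc = dev_id;		/* assumes dev_id is the host */

	if (!example_slot_card_present(mmc) && mmc->card)
		mmc_card_set_gone(mmc->card);	/* assumes no CD bouncing */

	mmc_detect_change(mmc, msecs_to_jiffies(100));	/* schedule rescan */
	return IRQ_HANDLED;
}

/* drivers/mmc/core/core.c: fail fast instead of issuing the request */
void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
{
	DECLARE_COMPLETION_ONSTACK(complete);

	if (host->card && mmc_card_gone(host->card)) {
		mrq->cmd->error = -ENODEV;	/* card gone: do not issue */
		return;
	}

	mrq->done_data = &complete;
	mrq->done = mmc_wait_done;

	mmc_start_request(host, mrq);
	wait_for_completion(&complete);
}

/* drivers/mmc/card/block.c error path: stop retrying once it is gone */
if (brq.cmd.error == -ENODEV && mmc_card_gone(card))
	goto cmd_err;		/* errors out the rest of the request */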

I can suggest a patch if you want but I am on vacation next week so
it will have to wait a couple of weeks.

> And moreover we should not give interleaved data to apps, as we have no way to tell the application which part of the data is valid.
> 
>>> 3. And again, as I mentioned, how do we inform the block layer which
>>> segment has the valid data?
>>> And I was suggesting not to retry again if the 1st retry (single block)
>>> failed.
>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>
>>>>> Regards,
>>>>> Ghorai
>>>>>
>>>
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread


* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-10 13:02               ` Adrian Hunter
@ 2010-09-14  5:15                 ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-14  5:15 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel

Adrian,

[..snip..]
> >>> [Ghorai] Adrian,
> >>> Yes, this works, and it reduced the retries to 1/4 of the original code
> >>> (from 2048 to 512 retries for a 1MB data read);
> >>> Initially it was retrying for each page (512 bytes) after a multi-block
> >>> read failure; but this solution is retrying for each segment (2048 bytes);
> >>> 1. Now say the block layer reads 1MB and fails on the 1st segment. It
> >>> will still retry for 1MB/2048 bytes, i.e. 512 times.
> >>> 2. So do you think there is any good reason to retry again and again?
> >> If you have 1MB that is not readable, it sounds like the card is broken.
> >> Why are so many reads failing?  Has the card been removed?
> >>
> >> You might very rarely see ECC errors in a small number of sectors,
> >> but more than that sounds like something else is broken.
> >
> > [Ghorai] Yes; one example is that the card is removed while reading data.
> 
> Well, that is a different case.  Once the card has gone, the block driver
> can (and will once the remove method is called) error out all I/O
> requests without sending them to MMC.  That doesn't happen until there
> is a card detect interrupt and a resulting rescan.

[Ghorai] Here we are discussing two problems:
1. If I/O fails, how to stop retrying, when the failure is due to -
	a. an internal card error
	b. an issue in the filesystem, driver, or host controller
	c. or the card being removed.

2. And 2nd, how to sync block-layer I/O if the card is removed:
	a. case 1: when a card-removed interrupt is supported by the platform
	b. case 2: when a card-removed interrupt is not supported by the platform?

> 
> A possible solution is to put a flag on mmc_card to indicate card_gone
> that gets set as soon as the driver's card detect interrupt shows there
> is no card (hoping that we are not getting any bouncing on card detect)
> and then have mmc_wait_for_req() simply return -ENODEV immediately if
> the card_gone flag is set.  Finally, if the mmc block driver sees
> a -ENODEV error, it should also check the card_gone flag (via a new
> core function) and if the card is gone, not retry - and perhaps
> even error out the rest of the I/O request queue as well.

[Ghorai] Your idea addresses the 2.a case, but not 2.b, 1.a, or 1.b.

And the solution I was proposing is to return the status of the I/O failure to the above layer as soon as possible, and to handle the card-removed interrupt separately, or any other issue in h/w or s/w or a combination of both. Or just think again about platforms that don't have the card-remove interrupt.

So my patch addresses the 1st part,
and for the 2nd part we can submit a patch anytime.

> 
> I can suggest a patch if you want but I am on vacation next week so
> it will have to wait a couple of weeks.
> 
> > And moreover we should not give interleaved data to apps, as we have
> > no way to tell the application which part of the data is valid.
> >
[..snip..]
http://comments.gmane.org/gmane.linux.kernel.mmc/2714
 
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-14  5:15                 ` Ghorai, Sukumar
@ 2010-09-20  7:54                   ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-20  7:54 UTC (permalink / raw)
  To: ext Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel, Adrian Hunter

On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> Adrian,
>
> [..snip..]
>>>>> [Ghorai] Adrian,
>>>>> Yes, this works, and it reduced the retries to 1/4 of the original code
>>>>> (from 2048 to 512 retries for a 1MB data read);
>>>>> Initially it was retrying for each page (512 bytes) after a multi-block
>>>>> read failure; but this solution is retrying for each segment (2048 bytes);
>>>>> 1. Now say the block layer reads 1MB and fails on the 1st segment. It
>>>>> will still retry for 1MB/2048 bytes, i.e. 512 times.
>>>>> 2. So do you think there is any good reason to retry again and again?
>>>> If you have 1MB that is not readable, it sounds like the card is broken.
>>>> Why are so many reads failing?  Has the card been removed?
>>>>
>>>> You might very rarely see ECC errors in a small number of sectors,
>>>> but more than that sounds like something else is broken.
>>>
>>> [Ghorai] Yes; one example is that the card is removed while reading data.
>>
>> Well, that is a different case.  Once the card has gone, the block driver
>> can (and will once the remove method is called) error out all I/O
>> requests without sending them to MMC.  That doesn't happen until there
>> is a card detect interrupt and a resulting rescan.
>
> [Ghorai] Here we are discussing two problems:
> 1. If I/O fails, how to stop retrying, when the failure is due to -
> 	a. an internal card error
> 	b. an issue in the filesystem, driver, or host controller
> 	c. or the card being removed.
>
> 2. And 2nd, how to sync block-layer I/O if the card is removed:
> 	a. case 1: when a card-removed interrupt is supported by the platform
> 	b. case 2: when a card-removed interrupt is not supported by the platform?
>
>>
>> A possible solution is to put a flag on mmc_card to indicate card_gone
>> that gets set as soon as the driver's card detect interrupt shows there
>> is no card (hoping that we are not getting any bouncing on card detect)
>> and then have mmc_wait_for_req() simply return -ENODEV immediately if
>> the card_gone flag is set.  Finally, if the mmc block driver sees
>> a -ENODEV error, it should also check the card_gone flag (via a new
>> core function) and if the card is gone, not retry - and perhaps
>> even error out the rest of the I/O request queue as well.
>
> [Ghorai] Your idea addresses the 2.a case, but not 2.b, 1.a, or 1.b.

The card removal case can be extended to use the bus ops detect method
when there is no card detect irq.  I will send a RFC patch.
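
For platforms with no card-detect line, the check could instead poll the
card with CMD13 (SEND_STATUS), which is essentially what the bus ops
detect method does.  A minimal sketch, with mmc_blk_card_gone() an
invented name (the real RFC patch may look quite different):

/* drivers/mmc/card/block.c: a pulled card cannot answer SEND_STATUS */
static int mmc_blk_card_gone(struct mmc_card *card)
{
	struct mmc_command cmd;

	memset(&cmd, 0, sizeof(struct mmc_command));
	cmd.opcode = MMC_SEND_STATUS;
	if (!mmc_host_is_spi(card->host))
		cmd.arg = card->rca << 16;
	cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;

	/* a couple of retries guard against a transient CRC error */
	return mmc_wait_for_cmd(card->host, &cmd, 2) ? 1 : 0;
}

On removal, the error path could then bail out after one CMD13 round-trip
instead of retrying every remaining segment.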

With respect to 1.a:
  - If the card has an internal error, then it is broken.  The user
  should remove the card and use a better one.  I do not see how reducing
  retry delays really helps the user very much.  Arguably if the card
  becomes unresponsive, the MMC core could provide a facility to
  reinitialise the card, but that is yet another issue.

With respect to 1.b:
  - The file system cannot cause the block driver to have I/O errors.
  - If there are errors in the driver they should be fixed.
  - If there are hardware problems with the host controller, then
  it is up to the host controller driver to deal with them e.g.
  by resetting the controller.  I don't see what this has to do with
  the block driver.

You leave out the important case of ECC errors.  I am concerned about
this because of the possibility that it happens inside a file system
journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
error only affects the last transaction, but perhaps not if it destroys
other transactions - which could happen if the approach you suggest
is taken.

>
> And the solution I was proposing is to return the status of the I/O failure to the above layer as soon as possible, and to handle the card-removed interrupt separately, or any other issue in h/w or s/w or a combination of both. Or just think again about platforms that don't have the card-remove interrupt.
>
> So my patch addresses the 1st part,

It is absolutely unacceptable to return I/O errors to the upper layers
for segments that do not have errors.

> And for the 2nd part we can submit a patch anytime.
>
>>
>> I can suggest a patch if you want but I am on vacation next week so
>> it will have to wait a couple of weeks.
>>
>>> And moreover we should not give interleaved data to apps, as we have
>>> no way to tell the application which part of the data is valid.
>>>
> [..snip..]
> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>
>>>
>
>


^ permalink raw reply	[flat|nested] 54+ messages in thread


* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-20  7:54                   ` Adrian Hunter
@ 2010-09-20  8:57                     ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-20  8:57 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel

Adrian,

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Monday, September 20, 2010 1:24 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> > Adrian,
> >
> > [..snip..]
> >>>>> [Ghorai] Adrian,
> >>>>> Yes, this works, and it reduced the retries to 1/4 of the original
> >>>>> code (from 2048 to 512 retries for a 1MB data read);
> >>>>> Initially it was retrying for each page (512 bytes) after a multi-block
> >>>>> read failure; but this solution is retrying for each segment (2048 bytes);
> >>>>> 1. Now say the block layer reads 1MB and fails on the 1st segment.
> >>>>> It will still retry for 1MB/2048 bytes, i.e. 512 times.
> >>>>> 2. So do you think there is any good reason to retry again and again?
> >>>> If you have 1MB that is not readable, it sounds like the card is broken.
> >>>> Why are so many reads failing?  Has the card been removed?
> >>>>
> >>>> You might very rarely see ECC errors in a small number of sectors,
> >>>> but more than that sounds like something else is broken.
> >>>
> >>> [Ghorai] Yes; one example is that the card is removed while reading data.
> >>
> >> Well, that is a different case.  Once the card has gone, the block driver
> >> can (and will once the remove method is called) error out all I/O
> >> requests without sending them to MMC.  That doesn't happen until there
> >> is a card detect interrupt and a resulting rescan.
> >
> > [Ghorai] Here we are discussing two problems:
> > 1. If I/O fails, how to stop retrying, when the failure is due to -
> > 	a. an internal card error
> > 	b. an issue in the filesystem, driver, or host controller
> > 	c. or the card being removed.
> >
> > 2. And 2nd, how to sync block-layer I/O if the card is removed:
> > 	a. case 1: when a card-removed interrupt is supported by the platform
> > 	b. case 2: when a card-removed interrupt is not supported by the
> > 	platform?
> >
> >>
> >> A possible solution is to put a flag on mmc_card to indicate card_gone
> >> that gets set as soon as the driver's card detect interrupt shows there
> >> is no card (hoping that we are not getting any bouncing on card detect)
> >> and then have mmc_wait_for_req() simply return -ENODEV immediately if
> >> the card_gone flag is set.  Finally, if the mmc block driver sees
> >> a -ENODEV error, it should also check the card_gone flag (via a new
> >> core function) and if the card is gone, not retry - and perhaps
> >> even error out the rest of the I/O request queue as well.
> >
> > [Ghorai] Your idea addresses the 2.a case, but not 2.b, 1.a, or 1.b.
> 
> The card removal case can be extended to use the bus ops detect method
> when there is no card detect irq.  I will send a RFC patch.
> 
> With respect to 1.a:
>   - If the card has an internal error, then it is broken.  The user
>   should remove the card and use a better one.  I do not see how reducing
>   retry delays really helps the user very much.  Arguably if the card
>   becomes unresponsive, the MMC core could provide a facility to
>   reinitialise the card, but that is yet another issue.
> 
> With respect to 1.b:
>   - The file system cannot cause the block driver to have I/O errors.
>   - If there are errors in the driver they should be fixed.
>   - If there are hardware problems with the host controller, then
>   it is up to the host controller driver to deal with them e.g.
>   by resetting the controller.  I don't see what this has to do with
>   the block driver.
> 
> You leave out the important case of ECC errors.  I am concerned about
> this because of the possibility that it happens inside a file system
> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
> error only affects the last transaction, but perhaps not if it destroys
> other transactions - which could happen if the approach you suggest
> is taken.
> 
[Ghorai] Thanks a lot for your detailed answer.
1. Can you answer this one? 2.b, case 2: what happens when the card-removed interrupt is not supported by the platform?

2. Why should the block layer handle interleaved data? Can you give an example of a driver that returns interleaved data? And how would we tell the application that the buffer contains interleaved data?
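(For reference, a minimal untested sketch of the card_gone idea quoted above; card_gone and mmc_card_gone() are made-up names for illustration, not existing core API:

	/* assumed new flag in struct mmc_card, set from the host driver's
	 * card-detect interrupt as soon as removal is seen */
	static inline int mmc_card_gone(struct mmc_card *card)
	{
		return card->card_gone;
	}

	void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
	{
		if (host->card && mmc_card_gone(host->card)) {
			mrq->cmd->error = -ENODEV;	/* fail fast, no bus traffic */
			return;
		}
		/* ... existing path: mmc_start_request() and wait ... */
	}

	/* in the block driver: -ENODEV plus the flag means the card is gone,
	 * so stop retrying and error out the rest of the request */
	if (brq.cmd.error == -ENODEV && mmc_card_gone(card))
		goto cmd_err;
)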

> >
> > And the solution I was proposing to return the status of IO failure as
> soon as possible to above layer; and handle the card removed interrupt
> separately or any other issue in h/w or s/w or combination of both. Or
> just think again when platform don't have the card remove interrupt.
> >
> > So my patch addresses the 1st part
> 
> It is absolutely unacceptable to return I/O errors to the upper layers
> for segments that do not have errors.
> 
> > And for the 2nd part we can submit the patch anytime.
> >
> >>
> >> I can suggest a patch if you want but I am on vacation next week so
> >> it will have to wait a couple of weeks.
> >>
> >>> And moreover we should not give the interleave data to apps, as we
> don't
> >> have option to tell application for the valid data.
> >>>
> > [..snip..]
> > http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >
> >>>
> >
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-20  8:57                     ` Ghorai, Sukumar
@ 2010-09-20 11:49                       ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-20 11:49 UTC (permalink / raw)
  To: Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel, Adrian Hunter

On 20/09/10 11:57, Ghorai, Sukumar wrote:
> Adrian,
>
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Monday, September 20, 2010 1:24 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> Adrian Hunter
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
>>> Adrian,
>>>
>>> [..snip..]
>>>>>>> [Ghorai] Adrian,
>>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times for
>> 1MB
>>>>>> data read) form the original code;
>>>>>>> Initially it was retrying for each page(512 bytes) after multi-block
>>>>>> read fail; but this solution is retying for each segment(2048 bytes);
>>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st segment.
>> So
>>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times.
>>>>>>> 2. So do you think any good reason to retry again and again?
>>>>>> If you have 1MB that is not readable, it sounds like the card is
>> broken.
>>>>>> Why are so many reads failing?  Has the card been removed?
>>>>>>
>>>>>> You might very rarely see ECC errors in a small number of sectors,
>>>>>> but more than that sounds like something else is broken.
>>>>>
>>>>> [Ghorai] yes, one example is we remove the card when reading data,
>>>>
>>>> Well, that is a different case.  Once the card has gone, the block
>> driver
>>>> can (and will once the remove method is called) error out all I/O
>>>> requests without sending them to MMC.  That doesn't happen until there
>>>> is a card detect interrupt and a resulting rescan.
>>>
>>> [Ghorai] here we are discussing two problem,
>>> 1. If IO failed how to stop retry; because of -
>>> 	a. internal card error
>>> 	b. issue in Filesystem, driver, or host controller issue
>>> 	c. or cards removed.
>>>
>>> 2. And 2nd how to sync block-layer IO, if card removed,
>>> 	a. case 1: when card removed interrupt support by the platform
>>> 	b. case 2: when card removed interrupt does not support by the
>> platform?
>>>
>>>>
>>>> A possible solution is to put a flag on mmc_card to indicate card_gone
>>>> that gets set as soon as the drivers card detect interrupt shows there
>>>> is no card (hoping that we are not getting any bouncing on card detect)
>>>> and then have mmc_wait_for_req() simple return -ENODEV immediately if
>>>> the card_gone flag is set.  Finally, if the mmc block driver sees
>>>> a -ENODEV error, it should also check the card_gone flag (via a new
>>>> core function) and if the card is gone, do not retry - and perhaps
>>>> even error out the rest of the I/O request queue as well.
>>>
>>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b
>>
>> The card removal case can be extended to use the bus ops detect method
>> when there is no card detect irq.  I will send a RFC patch.
>>
>> With respect to 1.a:
>>    - If the card has an internal error, then it is broken.  The user
>>    should remove the card and use a better one.  I do not see how reducing
>>    retry delays really helps the user very much.  Arguably if the card
>>    becomes unresponsive, the MMC core could provide a facility to
>>    reinitialise the card, but that is yet another issue.
>>
>> With respect to 1.b:
>>    - The file system cannot cause the block driver to have I/O errors.
>>    - If there are errors in the driver they should be fixed.
>>    - If there are hardware problems with the host controller, then
>>    it is up to the host controller driver to deal with them e.g.
>>    by resetting the controller.  I don't see what this has to do with
>>    the block driver.
>>
>> You leave out the important case of ECC errors.  I am concerned about
>> this because of the possibility that it happens inside a file system
>> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
>> error only affects the last transaction, but perhaps not if it destroys
>> other transactions - which could happen if the approach you suggest
>> is taken.
>>
> [Ghorai] Thanks a lot for your detailed answer.
> 1. Can you answer this one? 2.b, case 2: what happens when the card-removed interrupt is not supported by the platform?

As I wrote above: the card removal case can be extended to use the bus ops
detect method when there is no card detect irq.  I will send an RFC patch.
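Something like this, as a rough untested sketch of the idea, not the RFC itself (mmc_detect_card_removed() is a made-up name here):

	/* when the host has no card-detect irq, let the core re-check card
	 * presence via the bus ops before the block driver retries */
	int mmc_detect_card_removed(struct mmc_host *host)
	{
		if (!host->bus_ops || !host->bus_ops->detect)
			return 0;
		host->bus_ops->detect(host);	/* e.g. queries the card's status */
		return host->bus_dead;		/* detect() tears down a dead bus */
	}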

>
> 2. Why should the block layer handle interleaved data? Can you give an example of a driver that returns interleaved data? And how would we tell the application that the buffer contains interleaved data?

I am not sure what you mean by "interleave data", but file systems, for
example, are free to map any block to any file, directory or file system
object, so a consecutive series of sectors may contain unrelated data.  Up
to a maximum size, the block layer merges I/O requests when the sectors are
consecutive, so a single I/O request can also contain unrelated data.

>
>>>
>>> And the solution I was proposing to return the status of IO failure as
>> soon as possible to above layer; and handle the card removed interrupt
>> separately or any other issue in h/w or s/w or combination of both. Or
>> just think again when platform don't have the card remove interrupt.
>>>
>>> So my patch addresses the 1st part
>>
>> It is absolutely unacceptable to return I/O errors to the upper layers
>> for segments that do not have errors.
>>
>>> And for the 2nd part we can submit the patch anytime.
>>>
>>>>
>>>> I can suggest a patch if you want but I am on vacation next week so
>>>> it will have to wait a couple of weeks.
>>>>
>>>>> And moreover we should not give the interleave data to apps, as we
>> don't
>>>> have option to tell application for the valid data.
>>>>>
>>> [..snip..]
>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>
>>>>>
>>>
>>>
>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-20 11:49                       ` Adrian Hunter
@ 2010-09-20 12:37                         ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-20 12:37 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel



> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Monday, September 20, 2010 5:20 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 20/09/10 11:57, Ghorai, Sukumar wrote:
> > Adrian,
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >> Sent: Monday, September 20, 2010 1:24 PM
> >> To: Ghorai, Sukumar
> >> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> >> Adrian Hunter
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> >>> Adrian,
> >>>
> >>> [..snip..]
> >>>>>>> [Ghorai] Adrian,
> >>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times for
> >> 1MB
> >>>>>> data read) form the original code;
> >>>>>>> Initially it was retrying for each page(512 bytes) after multi-
> block
> >>>>>> read fail; but this solution is retying for each segment(2048
> bytes);
> >>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st segment.
> >> So
> >>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times.
> >>>>>>> 2. So do you think any good reason to retry again and again?
> >>>>>> If you have 1MB that is not readable, it sounds like the card is
> >> broken.
> >>>>>> Why are so many reads failing?  Has the card been removed?
> >>>>>>
> >>>>>> You might very rarely see ECC errors in a small number of sectors,
> >>>>>> but more than that sounds like something else is broken.
> >>>>>
> >>>>> [Ghorai] yes, one example is we remove the card when reading data,
> >>>>
> >>>> Well, that is a different case.  Once the card has gone, the block
> >> driver
> >>>> can (and will once the remove method is called) error out all I/O
> >>>> requests without sending them to MMC.  That doesn't happen until
> there
> >>>> is a card detect interrupt and a resulting rescan.
> >>>
> >>> [Ghorai] here we are discussing two problem,
> >>> 1. If IO failed how to stop retry; because of -
> >>> 	a. internal card error
> >>> 	b. issue in Filesystem, driver, or host controller issue
> >>> 	c. or cards removed.
> >>>
> >>> 2. And 2nd how to sync block-layer IO, if card removed,
> >>> 	a. case 1: when card removed interrupt support by the platform
> >>> 	b. case 2: when card removed interrupt does not support by the
> >> platform?
> >>>
> >>>>
> >>>> A possible solution is to put a flag on mmc_card to indicate
> card_gone
> >>>> that gets set as soon as the drivers card detect interrupt shows
> there
> >>>> is no card (hoping that we are not getting any bouncing on card
> detect)
> >>>> and then have mmc_wait_for_req() simple return -ENODEV immediately if
> >>>> the card_gone flag is set.  Finally, if the mmc block driver sees
> >>>> a -ENODEV error, it should also check the card_gone flag (via a new
> >>>> core function) and if the card is gone, do not retry - and perhaps
> >>>> even error out the rest of the I/O request queue as well.
> >>>
> >>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b
> >>
> >> The card removal case can be extended to use the bus ops detect method
> >> when there is no card detect irq.  I will send a RFC patch.
> >>
> >> With respect to 1.a:
> >>    - If the card has an internal error, then it is broken.  The user
> >>    should remove the card and use a better one.  I do not see how
> reducing
> >>    retry delays really helps the user very much.  Arguably if the card
> >>    becomes unresponsive, the MMC core could provide a facility to
> >>    reinitialise the card, but that is yet another issue.
> >>
> >> With respect to 1.b:
> >>    - The file system cannot cause the block driver to have I/O errors.
> >>    - If there are errors in the driver they should be fixed.
> >>    - If there are hardware problems with the host controller, then
> >>    it is up to the host controller driver to deal with them e.g.
> >>    by resetting the controller.  I don't see what this has to do with
> >>    the block driver.
> >>
> >> You leave out the important case of ECC errors.  I am concerned about
> >> this because of the possibility that it happens inside a file system
> >> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
> >> error only affects the last transaction, but perhaps not if it destroys
> >> other transactions - which could happen if the approach you suggest
> >> is taken.
> >>
> > [Ghorai] Thanks a lot for your detailed answer.
> > 1. Can you answer this one? 2.b, case 2: what happens when the
> > card-removed interrupt is not supported by the platform?
> 
> As I wrote above: the card removal case can be extended to use the bus ops
> detect method when there is no card detect irq.  I will send an RFC patch.
> 
> >
> > 2. Why should the block layer handle interleaved data? Can you give an
> > example of a driver that returns interleaved data? And how would we
> > tell the application that the buffer contains interleaved data?
> 
> I am not sure what you mean by "interleave data", but file systems, for
> example, are free to map any block to any file, directory or file system
> object, so a consecutive series of sectors may contain unrelated data.
> Up to a maximum size, the block layer merges I/O requests when the
> sectors are consecutive, so a single I/O request can also contain
> unrelated data.

[Ghorai]
1. I don't think so; the FS knows where data exists and where the free space is, except for the 0th cluster.

2. Where does the block layer report that, out of all the requested segments [1..n], the read failed for segments x[i], x[j]?
And I have never come across any driver or protocol that retries the (i+1)-th segment after the i-th segment has failed. That is why I prefer my suggestion.
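Something like this is what I mean, as an untested sketch against the single-block fallback loop in drivers/mmc/card/block.c (illustrative only, not the actual patch):

	/* if even a single-block read fails, end the whole request with
	 * -EIO now instead of stepping through the remaining sectors one
	 * retry at a time */
	if (brq.cmd.error || brq.data.error || brq.stop.error) {
		spin_lock_irq(&md->lock);
		while (ret)
			ret = __blk_end_request(req, -EIO,
						blk_rq_cur_bytes(req));
		spin_unlock_irq(&md->lock);
		goto out;	/* report the failure to the upper layers */
	}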

> 
> >
> >>>
> >>> And the solution I was proposing to return the status of IO failure as
> >> soon as possible to above layer; and handle the card removed interrupt
> >> separately or any other issue in h/w or s/w or combination of both. Or
> >> just think again when platform don't have the card remove interrupt.
> >>>
> >>> So my patch addresses the 1st part
> >>
> >> It is absolutely unacceptable to return I/O errors to the upper layers
> >> for segments that do not have errors.
> >>
> >>> And for the 2nd part we can submit the patch anytime.
> >>>
> >>>>
> >>>> I can suggest a patch if you want but I am on vacation next week so
> >>>> it will have to wait a couple of weeks.
> >>>>
> >>>>> And moreover we should not give the interleave data to apps, as we
> >> don't
> >>>> have option to tell application for the valid data.
> >>>>>
> >>> [..snip..]
> >>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>
> >>>>>
> >>>
> >>>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-20 12:37                         ` Ghorai, Sukumar
@ 2010-09-20 13:09                           ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-20 13:09 UTC (permalink / raw)
  To: Ghorai, Sukumar; +Cc: linux-mmc, linux-arm-kernel, Adrian Hunter

On 20/09/10 15:37, ext Ghorai, Sukumar wrote:
>
>
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Monday, September 20, 2010 5:20 PM
>> To: Ghorai, Sukumar
>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>> Adrian Hunter
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> On 20/09/10 11:57, Ghorai, Sukumar wrote:
>>> Adrian,
>>>
>>>> -----Original Message-----
>>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>>>> Sent: Monday, September 20, 2010 1:24 PM
>>>> To: Ghorai, Sukumar
>>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>>>> Adrian Hunter
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
>>>>> Adrian,
>>>>>
>>>>> [..snip..]
>>>>>>>>> [Ghorai] Adrian,
>>>>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times for
>>>> 1MB
>>>>>>>> data read) form the original code;
>>>>>>>>> Initially it was retrying for each page(512 bytes) after multi-
>> block
>>>>>>>> read fail; but this solution is retying for each segment(2048
>> bytes);
>>>>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st segment.
>>>> So
>>>>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times.
>>>>>>>>> 2. So do you think any good reason to retry again and again?
>>>>>>>> If you have 1MB that is not readable, it sounds like the card is
>>>> broken.
>>>>>>>> Why are so many reads failing?  Has the card been removed?
>>>>>>>>
>>>>>>>> You might very rarely see ECC errors in a small number of sectors,
>>>>>>>> but more than that sounds like something else is broken.
>>>>>>>
>>>>>>> [Ghorai] yes, one example is we remove the card when reading data,
>>>>>>
>>>>>> Well, that is a different case.  Once the card has gone, the block
>>>> driver
>>>>>> can (and will once the remove method is called) error out all I/O
>>>>>> requests without sending them to MMC.  That doesn't happen until
>> there
>>>>>> is a card detect interrupt and a resulting rescan.
>>>>>
>>>>> [Ghorai] here we are discussing two problem,
>>>>> 1. If IO failed how to stop retry; because of -
>>>>> 	a. internal card error
>>>>> 	b. issue in Filesystem, driver, or host controller issue
>>>>> 	c. or cards removed.
>>>>>
>>>>> 2. And 2nd how to sync block-layer IO, if card removed,
>>>>> 	a. case 1: when card removed interrupt support by the platform
>>>>> 	b. case 2: when card removed interrupt does not support by the
>>>> platform?
>>>>>
>>>>>>
>>>>>> A possible solution is to put a flag on mmc_card to indicate
>> card_gone
>>>>>> that gets set as soon as the drivers card detect interrupt shows
>> there
>>>>>> is no card (hoping that we are not getting any bouncing on card
>> detect)
>>>>>> and then have mmc_wait_for_req() simple return -ENODEV immediately if
>>>>>> the card_gone flag is set.  Finally, if the mmc block driver sees
>>>>>> a -ENODEV error, it should also check the card_gone flag (via a new
>>>>>> core function) and if the card is gone, do not retry - and perhaps
>>>>>> even error out the rest of the I/O request queue as well.
>>>>>
>>>>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b
>>>>
>>>> The card removal case can be extended to use the bus ops detect method
>>>> when there is no card detect irq.  I will send a RFC patch.
>>>>
>>>> With respect to 1.a:
>>>>     - If the card has an internal error, then it is broken.  The user
>>>>     should remove the card and use a better one.  I do not see how
>> reducing
>>>>     retry delays really helps the user very much.  Arguably if the card
>>>>     becomes unresponsive, the MMC core could provide a facility to
>>>>     reinitialise the card, but that is yet another issue.
>>>>
>>>> With respect to 1.b:
>>>>     - The file system cannot cause the block driver to have I/O errors.
>>>>     - If there are errors in the driver they should be fixed.
>>>>     - If there are hardware problems with the host controller, then
>>>>     it is up to the host controller driver to deal with them e.g.
>>>>     by resetting the controller.  I don't see what this has to do with
>>>>     the block driver.
>>>>
>>>> You leave out the important case of ECC errors.  I am concerned about
>>>> this because of the possibility that it happens inside a file system
>>>> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if the
>>>> error only affects the last transaction, but perhaps not if it destroys
>>>> other transactions - which could happen if the approach you suggest
>>>> is taken.
>>>>
>>> [Ghorai] Thanks a lot for your detailed answer.
>>> 1. Can you answer this one? 2.b, case 2: what happens when the
>>> card-removed interrupt is not supported by the platform?
>>
>> As I wrote above: the card removal case can be extended to use the bus ops
>> detect method when there is no card detect irq.  I will send an RFC patch.
>>
>>>
>>> 2. Why should the block layer handle interleaved data? Can you give an
>>> example of a driver that returns interleaved data? And how would we
>>> tell the application that the buffer contains interleaved data?
>>
>> I am not sure what you mean by "interleave data", but file systems, for
>> example, are free to map any block to any file, directory or file system
>> object, so a consecutive series of sectors may contain unrelated data.
>> Up to a maximum size, the block layer merges I/O requests when the
>> sectors are consecutive, so a single I/O request can also contain
>> unrelated data.
>
> [Ghorai]
> 1. I don't think so; the FS knows where data exists and where the free space is, except for the 0th cluster.

I was not talking about free space.  I was giving an example
of why it is not possible to assume anything about what is in
an I/O request.

>
> 2. Where does the block layer report that, out of all the requested segments [1..n], the read failed for segments x[i], x[j]?
> And I have never come across any driver or protocol that retries the (i+1)-th segment after the i-th segment has failed. That is why I prefer my suggestion.

The SD/MMC protocol does not indicate which sector has the error.
There is no possibility of trying the (i+1)-th segment because i is
unknown.
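Simplified, the existing read path in drivers/mmc/card/block.c finds the failing sector precisely by stepping through the span one block at a time; a sketch with the request set-up details elided:

	do {
		if (disable_multi)
			brq.data.blocks = 1;	/* one 512-byte block per command */

		mmc_wait_for_req(card->host, &brq.mrq);

		if (brq.data.error && brq.data.blocks > 1) {
			disable_multi = 1;	/* sector i unknown: re-read the */
			continue;		/* span block by block */
		}

		/* single-block mode: a failure here pinpoints sector i, and
		 * only that block is errored out */
		spin_lock_irq(&md->lock);
		ret = __blk_end_request(req, brq.data.error ? -EIO : 0,
					brq.data.blksz * brq.data.blocks);
		spin_unlock_irq(&md->lock);
	} while (ret);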

>
>>
>>>
>>>>>
>>>>> And the solution I was proposing to return the status of IO failure as
>>>> soon as possible to above layer; and handle the card removed interrupt
>>>> separately or any other issue in h/w or s/w or combination of both. Or
>>>> just think again when platform don't have the card remove interrupt.
>>>>>
>>>>> So my patch addresses the 1st part
>>>>
>>>> It is absolutely unacceptable to return I/O errors to the upper layers
>>>> for segments that do not have errors.
>>>>
>>>>> And for the 2nd part we can submit the patch anytime.
>>>>>
>>>>>>
>>>>>> I can suggest a patch if you want but I am on vacation next week so
>>>>>> it will have to wait a couple of weeks.
>>>>>>
>>>>>>> And moreover we should not give the interleave data to apps, as we
>>>> don't
>>>>>> have option to tell application for the valid data.
>>>>>>>
>>>>> [..snip..]
>>>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>
>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-20 13:09                           ` Adrian Hunter
@ 2010-09-20 13:25                             ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-20 13:25 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, linux-arm-kernel



> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Monday, September 20, 2010 6:39 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 20/09/10 15:37, ext Ghorai, Sukumar wrote:
> >
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >> Sent: Monday, September 20, 2010 5:20 PM
> >> To: Ghorai, Sukumar
> >> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> >> Adrian Hunter
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> On 20/09/10 11:57, Ghorai, Sukumar wrote:
> >>> Adrian,
> >>>
> >>>> -----Original Message-----
> >>>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >>>> Sent: Monday, September 20, 2010 1:24 PM
> >>>> To: Ghorai, Sukumar
> >>>> Cc: linux-mmc@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> >>>> Adrian Hunter
> >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>
> >>>> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> >>>>> Adrian,
> >>>>>
> >>>>> [..snip..]
> >>>>>>>>> [Ghorai] Adrian,
> >>>>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times
> for
> >>>> 1MB
> >>>>>>>> data read) form the original code;
> >>>>>>>>> Initially it was retrying for each page(512 bytes) after multi-
> >> block
> >>>>>>>> read fail; but this solution is retying for each segment(2048
> >> bytes);
> >>>>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st
> segment.
> >>>> So
> >>>>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times.
> >>>>>>>>> 2. So do you think any good reason to retry again and again?
> >>>>>>>> If you have 1MB that is not readable, it sounds like the card is
> >>>> broken.
> >>>>>>>> Why are so many reads failing?  Has the card been removed?
> >>>>>>>>
> >>>>>>>> You might very rarely see ECC errors in a small number of sectors,
> >>>>>>>> but more than that sounds like something else is broken.
> >>>>>>>
> >>>>>>> [Ghorai] yes, one example is we remove the card when reading data,
> >>>>>>
> >>>>>> Well, that is a different case.  Once the card has gone, the block
> >>>> driver
> >>>>>> can (and will once the remove method is called) error out all I/O
> >>>>>> requests without sending them to MMC.  That doesn't happen until
> >> there
> >>>>>> is a card detect interrupt and a resulting rescan.
> >>>>>
> >>>>> [Ghorai] here we are discussing two problem,
> >>>>> 1. If IO failed how to stop retry; because of -
> >>>>> 	a. internal card error
> >>>>> 	b. issue in Filesystem, driver, or host controller issue
> >>>>> 	c. or cards removed.
> >>>>>
> >>>>> 2. And 2nd how to sync block-layer IO, if card removed,
> >>>>> 	a. case 1: when card removed interrupt support by the platform
> >>>>> 	b. case 2: when card removed interrupt does not support by the
> >>>> platform?
> >>>>>
> >>>>>>
> >>>>>> A possible solution is to put a flag on mmc_card to indicate
> >> card_gone
> >>>>>> that gets set as soon as the drivers card detect interrupt shows
> >> there
> >>>>>> is no card (hoping that we are not getting any bouncing on card
> >> detect)
> >>>>>> and then have mmc_wait_for_req() simple return -ENODEV immediately
> if
> >>>>>> the card_gone flag is set.  Finally, if the mmc block driver sees
> >>>>>> a -ENODEV error, it should also check the card_gone flag (via a new
> >>>>>> core function) and if the card is gone, do not retry - and perhaps
> >>>>>> even error out the rest of the I/O request queue as well.
> >>>>>
> >>>>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b
> >>>>
> >>>> The card removal case can be extended to use the bus ops detect
> method
> >>>> when there is no card detect irq.  I will send a RFC patch.
> >>>>
> >>>> With respect to 1.a:
> >>>>     - If the card has an internal error, then it is broken.  The user
> >>>>     should remove the card and use a better one.  I do not see how
> >> reducing
> >>>>     retry delays really helps the user very much.  Arguably if the
> card
> >>>>     becomes unresponsive, the MMC core could provide a facility to
> >>>>     reinitialise the card, but that is yet another issue.
> >>>>
> >>>> With respect to 1.b:
> >>>>     - The file system cannot cause the block driver to have I/O
> errors.
> >>>>     - If there are errors in the driver they should be fixed.
> >>>>     - If there are hardware problems with the host controller, then
> >>>>     it is up to the host controller driver to deal with them e.g.
> >>>>     by resetting the controller.  I don't see what this has to do
> with
> >>>>     the block driver.
> >>>>
> >>>> You leave out the important case of ECC errors.  I am concerned about
> >>>> this because of the possibility that it happens inside a file system
> >>>> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if
> the
> >>>> error only affects the last transaction, but perhaps not if it
> destroys
> >>>> other transactions - which could happen if the approach you suggest
> >>>> is taken.
> >>>>
> >>> [Ghorai] Thanks a lot for your detailed answer.
> >>> 1. Can you answer this one? 2.b, case 2: what happens when the
> >>> card-removed interrupt is not supported by the platform?
> >>
> >> As I wrote above: the card removal case can be extended to use the
> >> bus ops detect method when there is no card detect irq.  I will send
> >> an RFC patch.
> >>
> >>>
> >>> 2. Why should the block layer handle interleaved data? Can you give
> >>> an example of a driver that returns interleaved data? And how would
> >>> we tell the application that the buffer contains interleaved data?
> >>
> >> I am not sure what you mean by "interleave data", but file systems,
> >> for example, are free to map any block to any file, directory or file
> >> system object, so a consecutive series of sectors may contain
> >> unrelated data.  Up to a maximum size, the block layer merges I/O
> >> requests when the sectors are consecutive, so a single I/O request
> >> can also contain unrelated data.
> >
> > [Ghorai]
> > 1. I don't think so; the FS knows where data exists and where the
> > free space is, except for the 0th cluster.
> 
> I was not talking about free space.  I was giving an example
> of why it is not possible to assume anything about what is in
> an I/O request.
> 
> >
> > 2. Where does the block layer report that, out of all the requested
> > segments [1..n], the read failed for segments x[i], x[j]?
> > And I have never come across any driver or protocol that retries the
> > (i+1)-th segment after the i-th segment has failed. That is why I
> > prefer my suggestion.
> 
> The SD/MMC protocol does not indicate which sector has the error.
> There is no possibility of trying the ith+1-segment because i is
> unknown.
[Ghorai] I understand, but I think you are still in favour of retrying each and every sector (i, i+1, i+2, ..) of all the requested segments sequentially when the single-block read of the i-th sector fails.


> 
> >
> >>
> >>>
> >>>>>
> >>>>> And the solution I was proposing to return the status of IO failure
> as
> >>>> soon as possible to above layer; and handle the card removed
> interrupt
> >>>> separately or any other issue in h/w or s/w or combination of both.
> Or
> >>>> just think again when platform don't have the card remove interrupt.
> >>>>>
> >>>>> So my patch addresses the 1st part
> >>>>
> >>>> It is absolutely unacceptable to return I/O errors to the upper
> layers
> >>>> for segments that do not have errors.
> >>>>
> >>>>> And for the 2nd part we can submit the patch anytime.
> >>>>>
> >>>>>>
> >>>>>> I can suggest a patch if you want but I am on vacation next week so
> >>>>>> it will have to wait a couple of weeks.
> >>>>>>
> >>>>>>> And moreover we should not give the interleave data to apps, as we
> >>>> don't
> >>>>>> have option to tell application for the valid data.
> >>>>>>>
> >>>>> [..snip..]
> >>>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-mmc"
> in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-20 13:25                             ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-20 13:25 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
> Sent: Monday, September 20, 2010 6:39 PM
> To: Ghorai, Sukumar
> Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
> Adrian Hunter
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 20/09/10 15:37, ext Ghorai, Sukumar wrote:
> >
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
> >> Sent: Monday, September 20, 2010 5:20 PM
> >> To: Ghorai, Sukumar
> >> Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
> >> Adrian Hunter
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> On 20/09/10 11:57, Ghorai, Sukumar wrote:
> >>> Adrian,
> >>>
> >>>> -----Original Message-----
> >>>> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
> >>>> Sent: Monday, September 20, 2010 1:24 PM
> >>>> To: Ghorai, Sukumar
> >>>> Cc: linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org;
> >>>> Adrian Hunter
> >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>
> >>>> On 14/09/10 08:15, ext Ghorai, Sukumar wrote:
> >>>>> Adrian,
> >>>>>
> >>>>> [..snip..]
> >>>>>>>>> [Ghorai] Adrian,
> >>>>>>>>> Yes this works and reduced the retry by 1/4 (2048 to 512 times
> for
> >>>> 1MB
> >>>>>>>> data read) form the original code;
> >>>>>>>>> Initially it was retrying for each page(512 bytes) after multi-
> >> block
> >>>>>>>> read fail; but this solution is retying for each segment(2048
> >> bytes);
> >>>>>>>>> 1. Now say block layrer reading 1MB and failed for the 1st
> segment.
> >>>> So
> >>>>>>>> it will still retry for 1MB/2048-bytes, i.e. 512 times.
> >>>>>>>>> 2. So do you think any good reason to retry again and again?
> >>>>>>>> If you have 1MB that is not readable, it sounds like the card is
> >>>> broken.
> >>>>>>>> Why are so many reads failing?  Has the card been removed?
> >>>>>>>>
> >>>>>>>> You might very rarely see ECC errors in a small number of sectors,
> >>>>>>>> but more than that sounds like something else is broken.
> >>>>>>>
> >>>>>>> [Ghorai] yes, one example is we remove the card when reading data,
> >>>>>>
> >>>>>> Well, that is a different case.  Once the card has gone, the block
> >>>> driver
> >>>>>> can (and will once the remove method is called) error out all I/O
> >>>>>> requests without sending them to MMC.  That doesn't happen until
> >> there
> >>>>>> is a card detect interrupt and a resulting rescan.
> >>>>>
> >>>>> [Ghorai] here we are discussing two problem,
> >>>>> 1. If IO failed how to stop retry; because of -
> >>>>> 	a. internal card error
> >>>>> 	b. issue in Filesystem, driver, or host controller issue
> >>>>> 	c. or cards removed.
> >>>>>
> >>>>> 2. And 2nd how to sync block-layer IO, if card removed,
> >>>>> 	a. case 1: when card removed interrupt support by the platform
> >>>>> 	b. case 2: when card removed interrupt does not support by the
> >>>> platform?
> >>>>>
> >>>>>>
> >>>>>> A possible solution is to put a flag on mmc_card to indicate
> >> card_gone
> >>>>>> that gets set as soon as the drivers card detect interrupt shows
> >> there
> >>>>>> is no card (hoping that we are not getting any bouncing on card
> >> detect)
> >>>>>> and then have mmc_wait_for_req() simple return -ENODEV immediately
> if
> >>>>>> the card_gone flag is set.  Finally, if the mmc block driver sees
> >>>>>> a -ENODEV error, it should also check the card_gone flag (via a new
> >>>>>> core function) and if the card is gone, do not retry - and perhaps
> >>>>>> even error out the rest of the I/O request queue as well.
> >>>>>
> >>>>> [Ghorai] your idea address the 2.a case, but not 2.b, 1.a, 1.b
> >>>>
> >>>> The card removal case can be extended to use the bus ops detect
> method
> >>>> when there is no card detect irq.  I will send a RFC patch.
> >>>>
> >>>> With respect to 1.a:
> >>>>     - If the card has an internal error, then it is broken.  The user
> >>>>     should remove the card and use a better one.  I do not see how
> >> reducing
> >>>>     retry delays really helps the user very much.  Arguably if the
> card
> >>>>     becomes unresponsive, the MMC core could provide a facility to
> >>>>     reinitialise the card, but that is yet another issue.
> >>>>
> >>>> With respect to 1.b:
> >>>>     - The file system cannot cause the block driver to have I/O
> errors.
> >>>>     - If there are errors in the driver they should be fixed.
> >>>>     - If there are hardware problems with the host controller, then
> >>>>     it is up to the host controller driver to deal with them e.g.
> >>>>     by resetting the controller.  I don't see what this has to do
> with
> >>>>     the block driver.
> >>>>
> >>>> You leave out the important case of ECC errors.  I am concerned about
> >>>> this because of the possibility that it happens inside a file system
> >>>> journal e.g. EXT4 journal.  Perhaps the journal may be recovered if
> the
> >>>> error only affects the last transaction, but perhaps not if it
> destroys
> >>>> other transactions - which could happen if the approach you suggest
> >>>> is taken.
> >>>>
> >>> [Ghorai] Thanks a lot for your descriptive answer.
> >>> 1. Can you answer this one? 2.b, case 2: what happens when the
> >>> card-removal interrupt is not supported by the platform?
> >>
> >> As I wrote above: The card removal case can be extended to use the bus
> ops
> >> detect method when there is no card detect irq.  I will send a RFC
> patch.
> >>
> >>>
> >>> 2. Why should the block layer handle interleaved data? Can you give an
> >>> example of a driver that returns interleaved data? And how do we tell
> >>> the application that the buffer contains interleaved data?
> >>
> >> I am not sure what you mean by interleave data, but file systems  for
> >> example
> >> are free to map any block to any file, directory or file system object,
> >> so a consecutive series of sectors may contain unrelated data.  Up to a
> >> maximum
> >> size, the block layer merges I/O requests when the sectors are
> consecutive,
> >> so an I/O request can also contain unrelated data.
> >
> > [Ghorai]
> > 1. I don't think so; the FS knows where data exists and where the free
> > space is, except for the 0th cluster.
> 
> I was not talking about free space.  I was giving an example
> of why it is not possible to assume anything about what is in
> an I/O request.
> 
> >
> > 2. Where does the block layer get told that, of all the requested
> > segments [1..n], the read failed for segments x[i], x[j]?
> > And I have never come across any driver/protocol that retries the
> > (i+1)-th segment when the i-th segment has failed. That is why my
> > suggestion is preferable.
> 
> The SD/MMC protocol does not indicate which sector has the error.
> There is no possibility of trying the ith+1-segment because i is
> unknown.
[Ghorai] I understand, but I think you are still in favour of retrying each and every sector (i, i+1, i+2, ..) of all the requested segments sequentially when the single-block read of the i-th sector fails.


> 
> >
> >>
> >>>
> >>>>>
> >>>>> And the solution I was proposing to return the status of IO failure
> as
> >>>> soon as possible to above layer; and handle the card removed
> interrupt
> >>>> separately or any other issue in h/w or s/w or combination of both.
> Or
> >>>> just think again when platform don't have the card remove interrupt.
> >>>>>
> >>>>> So my patch addresses the 1st part
> >>>>
> >>>> It is absolutely unacceptable to return I/O errors to the upper
> layers
> >>>> for segments that do not have errors.
> >>>>
> >>>>> And for the 2nd part we can submit the patch anytime.
> >>>>>
> >>>>>>
> >>>>>> I can suggest a patch if you want but I am on vacation next week so
> >>>>>> it will have to wait a couple of weeks.
> >>>>>>
> >>>>>>> And moreover we should not give the interleave data to apps, as we
> >>>> don't
> >>>>>> have option to tell application for the valid data.
> >>>>>>>
> >>>>> [..snip..]
> >>>>> http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-mmc"
> in
> >>> the body of a message to majordomo at vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-20 13:09                           ` Adrian Hunter
@ 2010-09-20 13:37                             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 54+ messages in thread
From: Russell King - ARM Linux @ 2010-09-20 13:37 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ghorai, Sukumar, linux-mmc, linux-arm-kernel

On Mon, Sep 20, 2010 at 04:09:15PM +0300, Adrian Hunter wrote:
> The SD/MMC protocol does not indicate which sector has the error.
> There is no possibility of trying the ith+1-segment because i is
> unknown.

Not entirely true.

I seem to remember that the MMC protocol (and probably SD protocol)
provides an ack at the end of each data block to say whether the data
block was correct or not.

Some hosts tell you when they receive this ack, and you can track how
many blocks have been successfully read without error.  The ARM MMCI
primecell does this.

However, some hosts don't give you this information, in which case you
don't know where the error occurred in the read - so when a read fails,
you can only safely report that no bytes were read successfully.  This
means that even if the error was on the last block, there's no way to
know this.
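
To illustrate the scheme described here: a host driver that sees per-block
acks can count them and report the count through data->bytes_xfered, which
the core already carries back to the block driver.  A minimal sketch -- the
MYHOST_* register bits and struct myhost are hypothetical; mmc_request_done()
and struct mmc_data are the real core API:

/* Hypothetical interrupt fragment: count each block the card acks, so
 * that on error the core still knows how many leading bytes are good.
 */
static irqreturn_t myhost_irq(int irq, void *dev_id)
{
	struct myhost *host = dev_id;
	u32 status = readl(host->base + MYHOST_STATUS);

	if (status & MYHOST_BLOCK_ACK)		/* one block arrived intact */
		host->blocks_acked++;

	if (status & (MYHOST_DATA_END | MYHOST_DATA_ERROR)) {
		struct mmc_data *data = host->mrq->data;

		data->bytes_xfered = host->blocks_acked * data->blksz;
		if (status & MYHOST_DATA_ERROR)
			data->error = -EILSEQ;	/* data/CRC error */
		mmc_request_done(host->mmc, host->mrq);
	}
	return IRQ_HANDLED;
}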

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-20 13:37                             ` Russell King - ARM Linux
  0 siblings, 0 replies; 54+ messages in thread
From: Russell King - ARM Linux @ 2010-09-20 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 20, 2010 at 04:09:15PM +0300, Adrian Hunter wrote:
> The SD/MMC protocol does not indicate which sector has the error.
> There is no possibility of trying the ith+1-segment because i is
> unknown.

Not entirely true.

I seem to remember that the MMC protocol (and probably SD protocol)
provides an ack at the end of each data block to say whether the data
block was correct or not.

Some hosts tell you when they receive this ack, and you can track how
many blocks have been successfully read without error.  The ARM MMCI
primecell does this.

However, some hosts don't give you this information, in which case you
don't know where the error occurred in the read - so when a read fails,
you can only safely report that no bytes were read successfully.  This
means that even if the error was on the last block, there's no way to
know this.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-20 13:37                             ` Russell King - ARM Linux
@ 2010-09-22  5:32                               ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-22  5:32 UTC (permalink / raw)
  To: Chris Ball, Adrian Hunter
  Cc: linux-mmc, linux-arm-kernel, Russell King - ARM Linux

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

Chris,

> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Monday, September 20, 2010 7:07 PM
> To: Adrian Hunter
> Cc: Ghorai, Sukumar; linux-mmc@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On Mon, Sep 20, 2010 at 04:09:15PM +0300, Adrian Hunter wrote:
> > The SD/MMC protocol does not indicate which sector has the error.
> > There is no possibility of trying the ith+1-segment because i is
> > unknown.
> 
> Not entirely true.
> 
> I seem to remember that the MMC protocol (and probably SD protocol)
> provides an ack at the end of each data block to say whether the data
> block was correct or not.
> 
> Some hosts tell you when they receive this ack, and you can track how
> many blocks have been successfully read without error.  The ARM MMCI
> primecell does this.
> 
> However, some hosts don't give you this information, in which case you
> don't know where the error occurred in the read - so when a read fails,
> you can only safely report that no bytes were read successfully.  This
> means that even if the error was on the last block, there's no way to
> know this.
[Ghorai]
Would you please review and merge this patch [1] (also attached)?
[1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714



[-- Attachment #2: 0001-mmc-failure-of-block-read-wait-for-long-time.patch --]
[-- Type: application/octet-stream, Size: 1114 bytes --]

From 506c0c43c88da31fe56455c89f9294cc231fc4ed Mon Sep 17 00:00:00 2001
From: Sukumar Ghorai <s-ghorai@ti.com>
Date: Wed, 22 Sep 2010 10:58:53 +0530
Subject: [PATCH] mmc: failure of block read wait for long time

 A multi-block read failure is retried as single-block reads, one by one. The
 retry continues with subsequent blocks even after a failure, so the
 application cannot decode the interleaved data (even if a few single-block
 reads succeed). This patch fixes the problem by returning at the first
 failure instead of waiting for a long time.

Signed-off-by: Sukumar Ghorai <s-ghorai@ti.com>
---
 drivers/mmc/card/block.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index d545f79..0ec34af 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -493,7 +493,6 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				spin_lock_irq(&md->lock);
 				ret = __blk_end_request(req, -EIO, brq.data.blksz);
 				spin_unlock_irq(&md->lock);
-				continue;
 			}
 			goto cmd_err;
 		}
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-22  5:32                               ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-22  5:32 UTC (permalink / raw)
  To: linux-arm-kernel

Chris,

> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux at arm.linux.org.uk]
> Sent: Monday, September 20, 2010 7:07 PM
> To: Adrian Hunter
> Cc: Ghorai, Sukumar; linux-mmc at vger.kernel.org; linux-arm-
> kernel at lists.infradead.org
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On Mon, Sep 20, 2010 at 04:09:15PM +0300, Adrian Hunter wrote:
> > The SD/MMC protocol does not indicate which sector has the error.
> > There is no possibility of trying the ith+1-segment because i is
> > unknown.
> 
> Not entirely true.
> 
> I seem to remember that the MMC protocol (and probably SD protocol)
> provides an ack at the end of each data block to say whether the data
> block was correct or not.
> 
> Some hosts tell you when they receive this ack, and you can track how
> many blocks have been successfully read without error.  The ARM MMCI
> primecell does this.
> 
> However, some hosts don't give you this information, in which case you
> don't know where the error occurred in the read - so when a read fails,
> you can only safely report that no bytes were read successfully.  This
> means that even if the error was on the last block, there's no way to
> know this.
[Ghorai]
Would you please review and merge this patch [1] (also attached)?
[1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-mmc-failure-of-block-read-wait-for-long-time.patch
Type: application/octet-stream
Size: 1114 bytes
Desc: 0001-mmc-failure-of-block-read-wait-for-long-time.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20100922/ee9f688f/attachment.obj>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-22  5:32                               ` Ghorai, Sukumar
@ 2010-09-22 12:43                                 ` Chris Ball
  -1 siblings, 0 replies; 54+ messages in thread
From: Chris Ball @ 2010-09-22 12:43 UTC (permalink / raw)
  To: Ghorai, Sukumar
  Cc: Adrian Hunter, linux-mmc, linux-arm-kernel, Russell King - ARM Linux

On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> Would you please review and merge this patch [1] (attached too)?
> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714

I've been following the thread.  I believe Adrian has NACKed this patch,
by saying "It is absolutely unacceptable to return I/O errors to the
upper layers for segments that do not have errors."

I think it's possible to merge patches to improve the situation (such
as the idea of noticing a card disappearing earlier), but your initial
patch is not the patch to do that.  You should continue to work with
Adrian -- when he's happy that a patch does not break the semantics
above, we can consider merging it.

Thanks,

-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-22 12:43                                 ` Chris Ball
  0 siblings, 0 replies; 54+ messages in thread
From: Chris Ball @ 2010-09-22 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> Would you please review and merge this patch [1] (attached too)?
> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714

I've been following the thread.  I believe Adrian has NACKed this patch,
by saying "It is absolutely unacceptable to return I/O errors to the
upper layers for segments that do not have errors."

I think it's possible to merge patches to improve the situation (such
as the idea of noticing a card disappearing earlier), but your initial
patch is not the patch to do that.  You should continue to work with
Adrian -- when he's happy that a patch does not break the semantics
above, we can consider merging it.

Thanks,

-- 
Chris Ball   <cjb@laptop.org>   <http://printf.net/>
One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-22 12:43                                 ` Chris Ball
@ 2010-09-22 12:51                                   ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-22 12:51 UTC (permalink / raw)
  To: Chris Ball
  Cc: Adrian Hunter, linux-mmc, linux-arm-kernel, Russell King - ARM Linux

Chris,

> -----Original Message-----
> From: Chris Ball [mailto:cjb@laptop.org]
> Sent: Wednesday, September 22, 2010 6:13 PM
> To: Ghorai, Sukumar
> Cc: Adrian Hunter; linux-mmc@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; Russell King - ARM Linux
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > Would you please review and merge this patch [1] (attached too)?
> > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> 
> I've been following the thread.  I believe Adrian has NACKed this patch,
> by saying "It is absolutely unacceptable to return I/O errors to the
> upper layers for segments that do not have errors."

[Ghorai] 
I think Russell has also given his opinion. Would you please add your thoughts too?

1. I would prefer Adrian to explain again what this statement means in this context: the data read failed, so how do we make it a success?

2. If the data read fails for sector(x), why do we have to try sectors (x+1, .. x+n)?

3. How do we inform the reading function which of the (1...n) sectors contain valid data?

4. Do we have any driver/code in Linux, or any other OS, that returns interleaved data and reports success?

> 
> I think it's possible to merge patches to improve the situation (such
> as the idea of noticing a card disappearing earlier), but your initial
> patch is not the patch to do that.  You should continue to work with
> Adrian -- when he's happy that a patch does not break the semantics
> above, we can consider merging it.
> 
> Thanks,
> 
> --
> Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-22 12:51                                   ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-22 12:51 UTC (permalink / raw)
  To: linux-arm-kernel

Chris,

> -----Original Message-----
> From: Chris Ball [mailto:cjb at laptop.org]
> Sent: Wednesday, September 22, 2010 6:13 PM
> To: Ghorai, Sukumar
> Cc: Adrian Hunter; linux-mmc at vger.kernel.org; linux-arm-
> kernel at lists.infradead.org; Russell King - ARM Linux
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > Would you please review and merge this patch [1] (attached too)?
> > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> 
> I've been following the thread.  I believe Adrian has NACKed this patch,
> by saying "It is absolutely unacceptable to return I/O errors to the
> upper layers for segments that do not have errors."

[Ghorai] 
I think Russell has also given his opinion. Would you please add your thoughts too?

1. I would prefer Adrian to explain again what this statement means in this context: the data read failed, so how do we make it a success?

2. If the data read fails for sector(x), why do we have to try sectors (x+1, .. x+n)?

3. How do we inform the reading function which of the (1...n) sectors contain valid data?

4. Do we have any driver/code in Linux, or any other OS, that returns interleaved data and reports success?

> 
> I think it's possible to merge patches to improve the situation (such
> as the idea of noticing a card disappearing earlier), but your initial
> patch is not the patch to do that.  You should continue to work with
> Adrian -- when he's happy that a patch does not break the semantics
> above, we can consider merging it.
> 
> Thanks,
> 
> --
> Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-22 12:43                                 ` Chris Ball
@ 2010-09-24 14:35                                   ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-24 14:35 UTC (permalink / raw)
  To: Chris Ball, Adrian Hunter
  Cc: linux-mmc, linux-arm-kernel, Russell King - ARM Linux

Chris and Adrian,

[..snip..]
> 
> > -----Original Message-----
> > From: Chris Ball [mailto:cjb@laptop.org]
> > Sent: Wednesday, September 22, 2010 6:13 PM
> > To: Ghorai, Sukumar
> > Cc: Adrian Hunter; linux-mmc@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; Russell King - ARM Linux
> > Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >
> > On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > > Would you please review and merge this patch [1] (attached too)?
> > > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >
> > I've been following the thread.  I believe Adrian has NACKed this patch,
> > by saying "It is absolutely unacceptable to return I/O errors to the
> > upper layers for segments that do not have errors."
> 
> [Ghorai]
> I think Russell also mentioned his opinion. Would you please add your idea
> too?
> 
> 1. I would prefer Adrian to explain again what this statement means, in
> the context - data read fail and how we make it success?
> 
> 2. if data read fail for sector(x) why we have to try for
> sector(x+1, ..x+n)?
> 
> 3. how to inform reader function which sector having the valid data out of
> (1...n) sectors.
> 
> 4. do we have any driver/code in Linux or any other os, which give inter-
> leave data and return as success?
> 
[Ghorai] Please reply with your input on my suggestion and on Russell's.

> >
> > I think it's possible to merge patches to improve the situation (such
> > as the idea of noticing a card disappearing earlier), but your initial
> > patch is not the patch to do that.  You should continue to work with
> > Adrian -- when he's happy that a patch does not break the semantics
> > above, we can consider merging it.
> >
> > Thanks,
> >
> > --
> > Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> > One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-24 14:35                                   ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-24 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

Chris and Adrian,

[..snip..]
> 
> > -----Original Message-----
> > From: Chris Ball [mailto:cjb at laptop.org]
> > Sent: Wednesday, September 22, 2010 6:13 PM
> > To: Ghorai, Sukumar
> > Cc: Adrian Hunter; linux-mmc at vger.kernel.org; linux-arm-
> > kernel at lists.infradead.org; Russell King - ARM Linux
> > Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >
> > On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > > Would you please review and merge this patch [1] (attached too)?
> > > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >
> > I've been following the thread.  I believe Adrian has NACKed this patch,
> > by saying "It is absolutely unacceptable to return I/O errors to the
> > upper layers for segments that do not have errors."
> 
> [Ghorai]
> I think Russell also mentioned his opinion. Would you please add your idea
> too?
> 
> 1. I would prefer Adrian to explain again what this statement means, in
> the context - data read fail and how we make it success?
> 
> 2. if data read fail for sector(x) why we have to try for
> sector(x+1, ..x+n)?
> 
> 3. how to inform reader function which sector having the valid data out of
> (1...n) sectors.
> 
> 4. do we have any driver/code in Linux or any other os, which give inter-
> leave data and return as success?
> 
[Ghorai] Please reply with your input on my suggestion and on Russell's.

> >
> > I think it's possible to merge patches to improve the situation (such
> > as the idea of noticing a card disappearing earlier), but your initial
> > patch is not the patch to do that.  You should continue to work with
> > Adrian -- when he's happy that a patch does not break the semantics
> > above, we can consider merging it.
> >
> > Thanks,
> >
> > --
> > Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> > One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-22 12:43                                 ` Chris Ball
@ 2010-09-28 15:03                                   ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-28 15:03 UTC (permalink / raw)
  To: Chris Ball, Adrian Hunter
  Cc: linux-mmc, linux-arm-kernel, Russell King - ARM Linux

Chris and Adrian,

[..snip..]
> 
> Chris and Adrian,
> 
> [..snip..]
> >
> > > -----Original Message-----
[..snip..]
> > > Subject: Re: [PATCH] mmc: failure of block read wait for long time
> > >
> > > On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > > > Would you please review and merge this patch [1] (attached too)?
> > > > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> > >
> > > I've been following the thread.  I believe Adrian has NACKed this
> patch,
> > > by saying "It is absolutely unacceptable to return I/O errors to the
> > > upper layers for segments that do not have errors."
> >
> > [Ghorai]
> > I think Russell also mentioned his opinion. Would you please add your
> idea
> > too?
> >
> > 1. I would prefer Adrian to explain again what this statement means, in
> > the context - data read fail and how we make it success?
> >
> > 2. if data read fail for sector(x) why we have to try for
> > sector(x+1, ..x+n)?
> >
> > 3. how to inform reader function which sector having the valid data out
> of
> > (1...n) sectors.
> >
> > 4. do we have any driver/code in Linux or any other os, which give
> inter-
> > leave data and return as success?
> >
> [Ghorai] please reply with your input on my/ Russell's suggestion?
[Ghorai] Any input?
> 
> > >
> > > I think it's possible to merge patches to improve the situation (such
> > > as the idea of noticing a card disappearing earlier), but your initial
> > > patch is not the patch to do that.  You should continue to work with
> > > Adrian -- when he's happy that a patch does not break the semantics
> > > above, we can consider merging it.
> > >
> > > Thanks,
> > >
> > > --
> > > Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> > > One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-28 15:03                                   ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-28 15:03 UTC (permalink / raw)
  To: linux-arm-kernel

Chris and Adrian,

[..snip..]
> 
> Chris and Adrian,
> 
> [..snip..]
> >
> > > -----Original Message-----
[..snip..]
> > > Subject: Re: [PATCH] mmc: failure of block read wait for long time
> > >
> > > On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> > > > Would you please review and merge this patch [1] (attached too)?
> > > > [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> > >
> > > I've been following the thread.  I believe Adrian has NACKed this
> patch,
> > > by saying "It is absolutely unacceptable to return I/O errors to the
> > > upper layers for segments that do not have errors."
> >
> > [Ghorai]
> > I think Russell also mentioned his opinion. Would you please add your
> idea
> > too?
> >
> > 1. I would prefer Adrian to explain again what this statement means, in
> > the context - data read fail and how we make it success?
> >
> > 2. if data read fail for sector(x) why we have to try for
> > sector(x+1, ..x+n)?
> >
> > 3. how to inform reader function which sector having the valid data out
> of
> > (1...n) sectors.
> >
> > 4. do we have any driver/code in Linux or any other os, which give
> inter-
> > leave data and return as success?
> >
> [Ghorai] please reply with your input on my/ Russell's suggestion?
[Ghorai] Any input?
> 
> > >
> > > I think it's possible to merge patches to improve the situation (such
> > > as the idea of noticing a card disappearing earlier), but your initial
> > > patch is not the patch to do that.  You should continue to work with
> > > Adrian -- when he's happy that a patch does not break the semantics
> > > above, we can consider merging it.
> > >
> > > Thanks,
> > >
> > > --
> > > Chris Ball   <cjb@laptop.org>   <http://printf.net/>
> > > One Laptop Per Child

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-28 15:03                                   ` Ghorai, Sukumar
@ 2010-09-28 18:32                                     ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-28 18:32 UTC (permalink / raw)
  To: Ghorai, Sukumar
  Cc: Chris Ball, linux-mmc, linux-arm-kernel, Russell King - ARM Linux

On 28/09/10 18:03, Ghorai, Sukumar wrote:
> Chris and Adrian,
>
> [..snip..]
>>
>> Chris and Adrian,
>>
>> [..snip..]
>>>
>>>> -----Original Message-----
> [..snip..]
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
>>>>> Would you please review and merge this patch [1] (attached too)?
>>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>>
>>>> I've been following the thread.  I believe Adrian has NACKed this
>> patch,
>>>> by saying "It is absolutely unacceptable to return I/O errors to the
>>>> upper layers for segments that do not have errors."
>>>
>>> [Ghorai]
>>> I think Russell also mentioned his opinion. Would you please add your
>> idea
>>> too?
>>>
>>> 1. I would prefer Adrian to explain again what this statement means, in
>>> the context - data read fail and how we make it success?

Because I/O requests are made up of segments and every segment can be a
success or failure.

>>>
>>> 2. if data read fail for sector(x) why we have to try for
>>> sector(x+1, ..x+n)?

See answer to q. 1

>>>
>>> 3. how to inform reader function which sector having the valid data out
>> of
>>> (1...n) sectors.

__blk_end_request() does that
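
For illustration, each __blk_end_request() call completes some number of
bytes from the request's current position with the status given, so good
and failed spans can be reported separately.  A minimal sketch using the
block driver's own variable names:

	/* Complete the leading bytes that were read successfully, then
	 * fail the single sector that follows; the request position
	 * advances after each call, so the upper layers learn exactly
	 * which sectors carry valid data.
	 */
	spin_lock_irq(&md->lock);
	ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
	ret = __blk_end_request(req, -EIO, brq.data.blksz);
	spin_unlock_irq(&md->lock);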

>>>
>>> 4. do we have any driver/code in Linux or any other os, which give
>> inter-
>>> leave data and return as success?

Here is the problem with that question.  The *same* I/O request
can have data for *different* sources.

>>>
>> [Ghorai] please reply with your input on my/ Russell's suggestion?
> [Ghorai] any input?

I have a question for you.  What use cases do you want to address
  - other than card removal?

>>
>>>>
>>>> I think it's possible to merge patches to improve the situation (such
>>>> as the idea of noticing a card disappearing earlier), but your initial
>>>> patch is not the patch to do that.  You should continue to work with
>>>> Adrian -- when he's happy that a patch does not break the semantics
>>>> above, we can consider merging it.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Chris Ball<cjb@laptop.org>    <http://printf.net/>
>>>> One Laptop Per Child
>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-28 18:32                                     ` Adrian Hunter
  0 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-28 18:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/09/10 18:03, Ghorai, Sukumar wrote:
> Chris and Adrian,
>
> [..snip..]
>>
>> Chris and Adrian,
>>
>> [..snip..]
>>>
>>>> -----Original Message-----
> [..snip..]
>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>
>>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
>>>>> Would you please review and merge this patch [1] (attached too)?
>>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>>
>>>> I've been following the thread.  I believe Adrian has NACKed this
>> patch,
>>>> by saying "It is absolutely unacceptable to return I/O errors to the
>>>> upper layers for segments that do not have errors."
>>>
>>> [Ghorai]
>>> I think Russell also mentioned his opinion. Would you please add your
>> idea
>>> too?
>>>
>>> 1. I would prefer Adrian to explain again what this statement means, in
>>> the context - data read fail and how we make it success?

Because I/O requests are made up of segments and every segment can be a
success or failure.

>>>
>>> 2. if data read fail for sector(x) why we have to try for
>>> sector(x+1, ..x+n)?

See answer to q. 1

>>>
>>> 3. how to inform reader function which sector having the valid data out
>> of
>>> (1...n) sectors.

__blk_end_request() does that

>>>
>>> 4. do we have any driver/code in Linux or any other os, which give
>> inter-
>>> leave data and return as success?

Here is the problem with that question.  The *same* I/O request
can have data for *different* sources.

>>>
>> [Ghorai] please reply with your input on my/ Russell's suggestion?
> [Ghorai] any input?

I have a question for you.  What use cases do you want to address
  - other than card removal?

>>
>>>>
>>>> I think it's possible to merge patches to improve the situation (such
>>>> as the idea of noticing a card disappearing earlier), but your initial
>>>> patch is not the patch to do that.  You should continue to work with
>>>> Adrian -- when he's happy that a patch does not break the semantics
>>>> above, we can consider merging it.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Chris Ball<cjb@laptop.org>    <http://printf.net/>
>>>> One Laptop Per Child
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-28 18:32                                     ` Adrian Hunter
@ 2010-09-28 18:59                                       ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-28 18:59 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Chris Ball, linux-mmc, linux-arm-kernel, Russell King - ARM Linux

Adrian,

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Wednesday, September 29, 2010 12:03 AM
> To: Ghorai, Sukumar
> Cc: Chris Ball; linux-mmc@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; Russell King - ARM Linux
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 28/09/10 18:03, Ghorai, Sukumar wrote:
> > Chris and Adrian,
> >
> > [..snip..]
> >>
> >> Chris and Adrian,
> >>
> >> [..snip..]
> >>>
> >>>> -----Original Message-----
> > [..snip..]
> >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>
> >>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> >>>>> Would you please review and merge this patch [1] (attached too)?
> >>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>>
> >>>> I've been following the thread.  I believe Adrian has NACKed this
> >> patch,
> >>>> by saying "It is absolutely unacceptable to return I/O errors to the
> >>>> upper layers for segments that do not have errors."
> >>>
> >>> [Ghorai]
> >>> I think Russell also mentioned his opinion. Would you please add your
> >> idea
> >>> too?
> >>>
> >>> 1. I would prefer Adrian to explain again what this statement means,
> in
> >>> the context - data read fail and how we make it success?
> 
> Because I/O requests are made up of segments and every segment can be a
> success or failure.
[Ghorai] Don't you contradict yourself with the comment you made on the following patch?
[PATCH] MMC: Refine block layer waiting for card state
[Adrian] .. then why wait for lots of errors before doing it.

> 
> >>>
> >>> 2. if data read fail for sector(x) why we have to try for
> >>> sector(x+1, ..x+n)?
> 
> See answer to q. 1
> 
> >>>
> >>> 3. how to inform reader function which sector having the valid data
> out
> >> of
> >>> (1...n) sectors.
> 
> __blk_end_request() does that
[Ghorai] Not true. Please check the code again.

> 
> >>>
> >>> 4. do we have any driver/code in Linux or any other os, which give
> >> inter-
> >>> leave data and return as success?
> 
> Here is the problem with that question.  The *same* I/O request
> can have data for *different*sources.
[Ghorai] The file system does not do that; can you test once how data could come from different sources?
You are also contradicting yourself, given the comment you made on this patch:
[PATCH] MMC: Refine block layer waiting for card state
[Adrian] .. then why wait for lots of errors before doing it.

> 
> >>>
> >> [Ghorai] please reply with your input on my/ Russell's suggestion?
> > [Ghorai] any input?
> 
> I have a question for you.  What use cases do you want to address
>   - other than card removal?
[Ghorai] 
1. Can you reply to Russell's original input on this same thread?
2. Can you check how the FS behaves if you return interleaved data to it?
3. You still have not named any reference driver that returns interleaved data.


> 
> >>
> >>>>
> >>>> I think it's possible to merge patches to improve the situation (such
> >>>> as the idea of noticing a card disappearing earlier), but your
> initial
> >>>> patch is not the patch to do that.  You should continue to work with
> >>>> Adrian -- when he's happy that a patch does not break the semantics
> >>>> above, we can consider merging it.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> --
> >>>> Chris Ball<cjb@laptop.org>    <http://printf.net/>
> >>>> One Laptop Per Child
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-28 18:59                                       ` Ghorai, Sukumar
  0 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-28 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

Adrian,

> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
> Sent: Wednesday, September 29, 2010 12:03 AM
> To: Ghorai, Sukumar
> Cc: Chris Ball; linux-mmc at vger.kernel.org; linux-arm-
> kernel at lists.infradead.org; Russell King - ARM Linux
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 28/09/10 18:03, Ghorai, Sukumar wrote:
> > Chris and Adrian,
> >
> > [..snip..]
> >>
> >> Chris and Adrian,
> >>
> >> [..snip..]
> >>>
> >>>> -----Original Message-----
> > [..snip..]
> >>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>
> >>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> >>>>> Would you please review and merge this patch [1] (attached too)?
> >>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>>
> >>>> I've been following the thread.  I believe Adrian has NACKed this
> >> patch,
> >>>> by saying "It is absolutely unacceptable to return I/O errors to the
> >>>> upper layers for segments that do not have errors."
> >>>
> >>> [Ghorai]
> >>> I think Russell also mentioned his opinion. Would you please add your
> >> idea
> >>> too?
> >>>
> >>> 1. I would prefer Adrian to explain again what this statement means,
> in
> >>> the context - data read fail and how we make it success?
> 
> Because I/O requests are made up of segments and every segment can be a
> success or failure.
[Ghorai] Don't you contradict yourself with the comment you made on the following patch?
[PATCH] MMC: Refine block layer waiting for card state
[Adrian] .. then why wait for lots of errors before doing it.

> 
> >>>
> >>> 2. if data read fail for sector(x) why we have to try for
> >>> sector(x+1, ..x+n)?
> 
> See answer to q. 1
> 
> >>>
> >>> 3. how to inform reader function which sector having the valid data
> out
> >> of
> >>> (1...n) sectors.
> 
> __blk_end_request() does that
[Ghorai] Not true. Please check the code again.

> 
> >>>
> >>> 4. do we have any driver/code in Linux or any other os, which give
> >> inter-
> >>> leave data and return as success?
> 
> Here is the problem with that question.  The *same* I/O request
> can have data for *different*sources.
[Ghorai] The file system does not do that; can you test once how data could come from different sources?
You are also contradicting yourself, given the comment you made on this patch:
[PATCH] MMC: Refine block layer waiting for card state
[Adrian] .. then why wait for lots of errors before doing it.

> 
> >>>
> >> [Ghorai] please reply with your input on my/ Russell's suggestion?
> > [Ghorai] any input?
> 
> I have a question for you.  What use cases do you want to address
>   - other than card removal?
[Ghorai] 
1. Can you reply to Russell's original input on this same thread?
2. Can you check how the FS behaves if you return interleaved data to it?
3. You still have not named any reference driver that returns interleaved data.


> 
> >>
> >>>>
> >>>> I think it's possible to merge patches to improve the situation (such
> >>>> as the idea of noticing a card disappearing earlier), but your
> initial
> >>>> patch is not the patch to do that.  You should continue to work with
> >>>> Adrian -- when he's happy that a patch does not break the semantics
> >>>> above, we can consider merging it.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> --
> >>>> Chris Ball<cjb@laptop.org>    <http://printf.net/>
> >>>> One Laptop Per Child
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mmc: failure of block read wait for long time
  2010-09-28 18:59                                       ` Ghorai, Sukumar
@ 2010-09-28 20:05                                         ` Adrian Hunter
  -1 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-28 20:05 UTC (permalink / raw)
  To: Ghorai, Sukumar
  Cc: Chris Ball, linux-mmc, linux-arm-kernel, Russell King - ARM Linux

On 28/09/10 21:59, ext Ghorai, Sukumar wrote:
> Adrian,
>
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
>> Sent: Wednesday, September 29, 2010 12:03 AM
>> To: Ghorai, Sukumar
>> Cc: Chris Ball; linux-mmc@vger.kernel.org; linux-arm-
>> kernel@lists.infradead.org; Russell King - ARM Linux
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> On 28/09/10 18:03, Ghorai, Sukumar wrote:
>>> Chris and Adrian,
>>>
>>> [..snip..]
>>>>
>>>> Chris and Adrian,
>>>>
>>>> [..snip..]
>>>>>
>>>>>> -----Original Message-----
>>> [..snip..]
>>>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>>>
>>>>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
>>>>>>> Would you please review and merge this patch [1] (attached too)?
>>>>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>>>>
>>>>>> I've been following the thread.  I believe Adrian has NACKed this
>>>> patch,
>>>>>> by saying "It is absolutely unacceptable to return I/O errors to the
>>>>>> upper layers for segments that do not have errors."
>>>>>
>>>>> [Ghorai]
>>>>> I think Russell also mentioned his opinion. Would you please add your
>>>> idea
>>>>> too?
>>>>>
>>>>> 1. I would prefer Adrian to explain again what this statement means,
>> in
>>>>> the context - data read fail and how we make it success?
>>
>> Because I/O requests are made up of segments and every segment can be a
>> success or failure.
> [Ghorai] don't you conflict your self for the comments you provide for following patch -
> [PATCH] MMC: Refine block layer waiting for card state
> [Adrian].. then why wait for lots of errors before doing it.

That patch needs a lot more work.  Please do not base your
understanding on it.

>
>>
>>>>>
>>>>> 2. if data read fail for sector(x) why we have to try for
>>>>> sector(x+1, ..x+n)?
>>
>> See answer to q. 1
>>
>>>>>
>>>>> 3. how to inform reader function which sector having the valid data
>> out
>>>> of
>>>>> (1...n) sectors.
>>
>> __blk_end_request() does that
> [Ghorai] not true. Please check the code again.

Every time you call __blk_end_request() you specify success or
failure for the specified numbers of bytes starting from the
last position.
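
For reference, a condensed sketch of the retry loop being discussed -- this
follows the shape of mmc_blk_issue_rw_rq() of that era, with
mmc_blk_prep_rq() as a hypothetical stand-in for the request setup:

	int ret = 1, disable_multi = 0;

	do {
		mmc_blk_prep_rq(&brq, card, req, disable_multi);
		mmc_wait_for_req(card->host, &brq.mrq);

		if (brq.data.error) {
			if (!disable_multi) {
				/* redo the read one sector at a time */
				disable_multi = 1;
				continue;
			}
			/* a single-block read failed: fail just that
			 * sector and move on to the next one -- this
			 * 'continue' is the line the proposed patch
			 * removes */
			spin_lock_irq(&md->lock);
			ret = __blk_end_request(req, -EIO, brq.data.blksz);
			spin_unlock_irq(&md->lock);
			continue;
		}

		/* complete the bytes that arrived intact */
		spin_lock_irq(&md->lock);
		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
		spin_unlock_irq(&md->lock);
	} while (ret);
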

>
>>
>>>>>
>>>>> 4. do we have any driver/code in Linux or any other os, which give
>>>> inter-
>>>>> leave data and return as success?
>>
>> Here is the problem with that question.  The *same* I/O request
>> can have data for *different*sources.
> [Ghorai] File system does not do that and can you test that once how data comes from difference soure?
> Also conflicting your-self for the input you gave for the patch and as -
> [PATCH] MMC: Refine block layer waiting for card state
> [Adrian].. then why wait for lots of errors before doing it.
>
>>
>>>>>
>>>> [Ghorai] please reply with your input on my/ Russell's suggestion?
>>> [Ghorai] any input?
>>
>> I have a question for you.  What use cases do you want to address
>>    - other than card removal?

Please answer this question.

> [Ghorai]
> 1. can you reply to original input form Russell's on the same thread?

Russell did not make any suggestions.  He pointed out that some drivers,
but not all (and not omap_hsmmc), indicate how many bytes were transferred.
However, it is difficult for me to explain how this will or will not help if
you won't give more information about your use cases.

For example, in the case of ECC errors, there are usually only a few blocks
in error, so only a few of the retries time out, and retrying is not slow.
That is very different in the case the card has been removed, or has become
unresponsive - in which case every retry fails and has to timeout.

I still plan to address the card removal issue, but I am very busy, so don't
hold your breath.

> 2. can you check if you return the interleave data to FS how it can behave?
> 3. still you don't have any reference driver which provide the interleave data.

A single I/O request could have resulted from merging I/O requests from
two *different* file systems on two *different* partitions.  I provide as
reference every single Linux file system.

>
>
>>
>>>>
>>>>>>
>>>>>> I think it's possible to merge patches to improve the situation (such
>>>>>> as the idea of noticing a card disappearing earlier), but your
>> initial
>>>>>> patch is not the patch to do that.  You should continue to work with
>>>>>> Adrian -- when he's happy that a patch does not break the semantics
>>>>>> above, we can consider merging it.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Chris Ball<cjb@laptop.org>     <http://printf.net/>
>>>>>> One Laptop Per Child
>>>
>
>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH] mmc: failure of block read wait for long time
@ 2010-09-28 20:05                                         ` Adrian Hunter
  0 siblings, 0 replies; 54+ messages in thread
From: Adrian Hunter @ 2010-09-28 20:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/09/10 21:59, ext Ghorai, Sukumar wrote:
> Adrian,
>
>> -----Original Message-----
>> From: Adrian Hunter [mailto:adrian.hunter at nokia.com]
>> Sent: Wednesday, September 29, 2010 12:03 AM
>> To: Ghorai, Sukumar
>> Cc: Chris Ball; linux-mmc at vger.kernel.org; linux-arm-
>> kernel at lists.infradead.org; Russell King - ARM Linux
>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>
>> On 28/09/10 18:03, Ghorai, Sukumar wrote:
>>> Chris and Adrian,
>>>
>>> [..snip..]
>>>>
>>>> Chris and Adrian,
>>>>
>>>> [..snip..]
>>>>>
>>>>>> -----Original Message-----
>>> [..snip..]
>>>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
>>>>>>
>>>>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
>>>>>>> Would you please review and merge this patch [1] (attached too)?
>>>>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
>>>>>>
>>>>>> I've been following the thread.  I believe Adrian has NACKed this
>>>> patch,
>>>>>> by saying "It is absolutely unacceptable to return I/O errors to the
>>>>>> upper layers for segments that do not have errors."
>>>>>
>>>>> [Ghorai]
>>>>> I think Russell also mentioned his opinion. Would you please add your
>>>> idea
>>>>> too?
>>>>>
>>>>> 1. I would prefer Adrian to explain again what this statement means,
>> in
>>>>> the context - data read fail and how we make it success?
>>
>> Because I/O requests are made up of segments and every segment can be a
>> success or failure.
> [Ghorai] don't you conflict your self for the comments you provide for following patch -
> [PATCH] MMC: Refine block layer waiting for card state
> [Adrian].. then why wait for lots of errors before doing it.

That patch needs a lot more work.  Please do not base your
understanding on it.

>
>>
>>>>>
>>>>> 2. if data read fail for sector(x) why we have to try for
>>>>> sector(x+1, ..x+n)?
>>
>> See answer to q. 1
>>
>>>>>
>>>>> 3. how to inform reader function which sector having the valid data
>> out
>>>> of
>>>>> (1...n) sectors.
>>
>> __blk_end_request() does that
> [Ghorai] not true. Please check the code again.

Every time you call __blk_end_request() you specify success or
failure for the specified numbers of bytes starting from the
last position.

>
>>
>>>>>
>>>>> 4. do we have any driver/code in Linux or any other os, which give
>>>> inter-
>>>>> leave data and return as success?
>>
>> Here is the problem with that question.  The *same* I/O request
>> can have data for *different*sources.
> [Ghorai] File system does not do that and can you test that once how data comes from difference soure?
> Also conflicting your-self for the input you gave for the patch and as -
> [PATCH] MMC: Refine block layer waiting for card state
> [Adrian].. then why wait for lots of errors before doing it.
>
>>
>>>>>
>>>> [Ghorai] please reply with your input on my/ Russell's suggestion?
>>> [Ghorai] any input?
>>
>> I have a question for you.  What use cases do you want to address
>>    - other than card removal?

Please answer this question.

> [Ghorai]
> 1. can you reply to original input form Russell's on the same thread?

Russell did not make any suggestions.  He pointed out that some drivers,
but not all (and not omap_hsmmc), indicate how many bytes were transferred.
However, it is difficult for me to explain how this will or will not help if
you won't give more information about your use cases.

For example, in the case of ECC errors, there are usually only a few blocks
in error, so only a few of the retries time out, and retrying is not slow.
That is very different in the case the card has been removed, or has become
unresponsive - in which case every retry fails and has to timeout.

I still plan to address the card removal issue, but I am very busy, so don't
hold your breath.

> 2. can you check if you return the interleave data to FS how it can behave?
> 3. still you don't have any reference driver which provide the interleave data.

A single I/O request could have resulted from merging I/O requests from
two *different* file systems on two *different* partitions.  I provide as
reference every single Linux file system.

>
>
>>
>>>>
>>>>>>
>>>>>> I think it's possible to merge patches to improve the situation (such
>>>>>> as the idea of noticing a card disappearing earlier), but your
>> initial
>>>>>> patch is not the patch to do that.  You should continue to work with
>>>>>> Adrian -- when he's happy that a patch does not break the semantics
>>>>>> above, we can consider merging it.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Chris Ball<cjb@laptop.org>     <http://printf.net/>
>>>>>> One Laptop Per Child
>>>
>
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH] mmc: failure of block read wait for long time
  2010-09-28 20:05                                         ` Adrian Hunter
@ 2010-09-29  5:59                                           ` Ghorai, Sukumar
  -1 siblings, 0 replies; 54+ messages in thread
From: Ghorai, Sukumar @ 2010-09-29  5:59 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Russell King - ARM Linux, Chris Ball, linux-mmc, linux-arm-kernel



> -----Original Message-----
> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> Sent: Wednesday, September 29, 2010 1:35 AM
> To: Ghorai, Sukumar
> Cc: Chris Ball; linux-mmc@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; Russell King - ARM Linux
> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> 
> On 28/09/10 21:59, ext Ghorai, Sukumar wrote:
> > Adrian,
> >
> >> -----Original Message-----
> >> From: Adrian Hunter [mailto:adrian.hunter@nokia.com]
> >> Sent: Wednesday, September 29, 2010 12:03 AM
> >> To: Ghorai, Sukumar
> >> Cc: Chris Ball; linux-mmc@vger.kernel.org; linux-arm-
> >> kernel@lists.infradead.org; Russell King - ARM Linux
> >> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>
> >> On 28/09/10 18:03, Ghorai, Sukumar wrote:
> >>> Chris and Adrian,
> >>>
> >>> [..snip..]
> >>>>
> >>>> Chris and Adrian,
> >>>>
> >>>> [..snip..]
> >>>>>
> >>>>>> -----Original Message-----
> >>> [..snip..]
> >>>>>> Subject: Re: [PATCH] mmc: failure of block read wait for long time
> >>>>>>
> >>>>>> On Wed, Sep 22, 2010 at 11:02:08AM +0530, Ghorai, Sukumar wrote:
> >>>>>>> Would you please review and merge this patch [1] (attached too)?
> >>>>>>> [1] http://comments.gmane.org/gmane.linux.kernel.mmc/2714
> >>>>>>
> >>>>>> I've been following the thread.  I believe Adrian has NACKed this
> >>>>>> patch, by saying "It is absolutely unacceptable to return I/O
> >>>>>> errors to the upper layers for segments that do not have errors."
> >>>>>
> >>>>> [Ghorai]
> >>>>> I think Russell also mentioned his opinion. Would you please add
> >>>>> your idea too?
> >>>>>
> >>>>> 1. I would prefer Adrian to explain again what this statement means,
> >>>>> in the context of a data read failing - how do we make it a success?
> >>
> >> Because I/O requests are made up of segments and every segment can be a
> >> success or failure.
> > [Ghorai] Aren't you contradicting yourself, given the comments you
> > provided for the following patch -
> > [PATCH] MMC: Refine block layer waiting for card state
> > [Adrian] .. then why wait for lots of errors before doing it.
> 
> That patch needs a lot more work.  Please do not base your
> understanding on it.
[Ghorai] It's a similar problem when a read fails, and there you also suggested breaking out early.

> 
> >
> >>
> >>>>>
> >>>>> 2. if a data read fails for sector(x), why do we have to try
> >>>>> sector(x+1 .. x+n)?
> >>
> >> See answer to q. 1
> >>
> >>>>>
> >>>>> 3. how to inform the reader function which sectors, out of
> >>>>> (1...n) sectors, have valid data.
> >>
> >> __blk_end_request() does that
> > [Ghorai] not true. Please check the code again.
> 
> Every time you call __blk_end_request() you specify success or
> failure for the specified number of bytes starting from the
> last position.
> 
> >
> >>
> >>>>>
> >>>>> 4. do we have any driver/code in Linux or any other OS which gives
> >>>>> interleaved data back and returns success?
> >>
> >> Here is the problem with that question.  The *same* I/O request
> >> can have data for *different* sources.
> > [Ghorai] A file system does not do that - and can you test, even once,
> > how data comes from different sources?
> > You are also contradicting yourself, given the input you gave for this patch:
> > [PATCH] MMC: Refine block layer waiting for card state
> > [Adrian] .. then why wait for lots of errors before doing it.
> >
> >>
> >>>>>
> >>>> [Ghorai] Please reply with your input on my/Russell's suggestion.
> >>> [Ghorai] Any input?
> >>
> >> I have a question for you.  What use cases do you want to address
> >>    - other than card removal?
> 
> Please answer this question.
[Ghorai] Say a data error (including a timeout), an ECC error, and so on.

> 
> > [Ghorai]
> > 1. Can you reply to Russell's original input on the same thread?
> 
> Russell did not make any suggestions.  He pointed out that some drivers,
> but not all (and not omap_hsmmc), indicate how many bytes were transferred.
> However, it is difficult for me to explain how this will or will not help
> if you won't give more information about your use cases.
[Ghorai] Does any driver give interleaved data to apps? Can you test how the FS behaves if you give it interleaved data? It can even cause a system fault, and then we will spend day and night working out which module the issue is in - memory, host, or card?

> 
> For example, in the case of ECC errors, there are usually only a few
> blocks in error, so only a few of the retries time out, and retrying
> is not slow.  That is very different from the case where the card has
> been removed, or has become unresponsive - in which case every retry
> fails and has to time out.
> 
> I still plan to address the card removal issue, but I am very busy,
> so don't hold your breath.
[Ghorai] So I will be waiting for your patch forever.
> 
> > 2. Can you check how the FS behaves if you return interleaved data
> > to it?
[Ghorai] Any time?
> > 3. You still don't have any reference driver that provides
> > interleaved data.
> 
> A single I/O request could have resulted from merging I/O requests from
> two *different* file systems on two *different* partitions - the block
> layer merges adjacent requests on the same device regardless of partition
> boundaries.  As a reference I offer every single Linux file system.
[Ghorai] A file system never merges two I/Os into a single request - and what about when, say, the page cache is bypassed (sync mode)?

[Ghorai] Thanks for wanting to provide the solution in a different way and in a different patch. I will wait for your patch forever. In the meantime I will take this fix locally for system integration.

In the meantime I will also simulate a case where I/O fails partway through (say, any request touching sectors 100 to 200 returns a data timeout) and check how the system behaves.
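
For example, something like this hack in mmc_blk_issue_rq() could
inject that failure (the sector numbers and the injection point are
illustrative only, not a tested patch):

	/* pretend the data phase timed out for any request
	 * overlapping sectors 100-200 */
	if (blk_rq_pos(req) <= 200 &&
	    blk_rq_pos(req) + blk_rq_sectors(req) > 100)
		brq.data.error = -ETIMEDOUT;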

> 
> >
> >
> >>
> >>>>
> >>>>>>
> >>>>>> I think it's possible to merge patches to improve the situation (such
> >>>>>> as the idea of noticing a card disappearing earlier), but your initial
> >>>>>> patch is not the patch to do that.  You should continue to work with
> >>>>>> Adrian -- when he's happy that a patch does not break the semantics
> >>>>>> above, we can consider merging it.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> --
> >>>>>> Chris Ball<cjb@laptop.org>     <http://printf.net/>
> >>>>>> One Laptop Per Child
> >>>
> >
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2010-09-29  5:59 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-27 11:27 [PATCH] mmc: failure of block read wait for long time Sukumar Ghorai
2010-07-27 11:27 ` Sukumar Ghorai
2010-07-27 13:22 ` Adrian Hunter
2010-07-27 13:22   ` Adrian Hunter
2010-07-27 13:32   ` Ghorai, Sukumar
2010-07-27 13:32     ` Ghorai, Sukumar
2010-07-28  8:41     ` Adrian Hunter
2010-07-28  8:41       ` Adrian Hunter
2010-09-10 11:30       ` Ghorai, Sukumar
2010-09-10 11:30         ` Ghorai, Sukumar
2010-09-10 11:43         ` Adrian Hunter
2010-09-10 11:43           ` Adrian Hunter
2010-09-10 11:48           ` Ghorai, Sukumar
2010-09-10 11:48             ` Ghorai, Sukumar
2010-09-10 13:02             ` Adrian Hunter
2010-09-10 13:02               ` Adrian Hunter
2010-09-14  5:15               ` Ghorai, Sukumar
2010-09-14  5:15                 ` Ghorai, Sukumar
2010-09-20  7:54                 ` Adrian Hunter
2010-09-20  7:54                   ` Adrian Hunter
2010-09-20  8:57                   ` Ghorai, Sukumar
2010-09-20  8:57                     ` Ghorai, Sukumar
2010-09-20 11:49                     ` Adrian Hunter
2010-09-20 11:49                       ` Adrian Hunter
2010-09-20 12:37                       ` Ghorai, Sukumar
2010-09-20 12:37                         ` Ghorai, Sukumar
2010-09-20 13:09                         ` Adrian Hunter
2010-09-20 13:09                           ` Adrian Hunter
2010-09-20 13:25                           ` Ghorai, Sukumar
2010-09-20 13:25                             ` Ghorai, Sukumar
2010-09-20 13:37                           ` Russell King - ARM Linux
2010-09-20 13:37                             ` Russell King - ARM Linux
2010-09-22  5:32                             ` Ghorai, Sukumar
2010-09-22  5:32                               ` Ghorai, Sukumar
2010-09-22 12:43                               ` Chris Ball
2010-09-22 12:43                                 ` Chris Ball
2010-09-22 12:51                                 ` Ghorai, Sukumar
2010-09-22 12:51                                   ` Ghorai, Sukumar
2010-09-24 14:35                                 ` Ghorai, Sukumar
2010-09-24 14:35                                   ` Ghorai, Sukumar
2010-09-28 15:03                                 ` Ghorai, Sukumar
2010-09-28 15:03                                   ` Ghorai, Sukumar
2010-09-28 18:32                                   ` Adrian Hunter
2010-09-28 18:32                                     ` Adrian Hunter
2010-09-28 18:59                                     ` Ghorai, Sukumar
2010-09-28 18:59                                       ` Ghorai, Sukumar
2010-09-28 20:05                                       ` Adrian Hunter
2010-09-28 20:05                                         ` Adrian Hunter
2010-09-29  5:59                                         ` Ghorai, Sukumar
2010-09-29  5:59                                           ` Ghorai, Sukumar
2010-08-27 20:59 ` Chris Ball
2010-08-27 20:59   ` Chris Ball
2010-08-30 19:09   ` Ghorai, Sukumar
2010-08-30 19:09     ` Ghorai, Sukumar
