linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH  1/2] mtd: rawnand: brcmnand: Don't default to edu transfer
@ 2020-06-11  5:44 Kamal Dasu
  2020-06-11  5:44 ` [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers Kamal Dasu
  2020-06-11  7:16 ` [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Miquel Raynal
  0 siblings, 2 replies; 12+ messages in thread
From: Kamal Dasu @ 2020-06-11  5:44 UTC (permalink / raw)
  To: Brian Norris, Kamal Dasu, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra
  Cc: linux-mtd, bcm-kernel-feedback-list, linux-kernel

When flash-dma is absent do not default to using flash-edu.
Make sure flash-edu is enabled before setting EDU transfer
function.

Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
 drivers/mtd/nand/raw/brcmnand/brcmnand.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 8f9ffb46a09f..0c1d6e543586 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -2953,8 +2953,9 @@ int brcmnand_probe(struct platform_device *pdev, struct brcmnand_soc *soc)
 		if (ret < 0)
 			goto err;
 
-		/* set edu transfer function to call */
-		ctrl->dma_trans = brcmnand_edu_trans;
+		if (has_edu(ctrl))
+			/* set edu transfer function to call */
+			ctrl->dma_trans = brcmnand_edu_trans;
 	}
 
 	/* Disable automatic device ID config, direct addressing */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH  2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-11  5:44 [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Kamal Dasu
@ 2020-06-11  5:44 ` Kamal Dasu
  2020-06-11  7:27   ` Miquel Raynal
  2020-06-11  7:16 ` [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Miquel Raynal
  1 sibling, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2020-06-11  5:44 UTC (permalink / raw)
  To: Brian Norris, Kamal Dasu, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra
  Cc: linux-mtd, bcm-kernel-feedback-list, linux-kernel

Implemented ECC correctable and uncorrectable error handling for EDU
reads. If ECC correctable bitflips are encountered  on EDU transfer,
read page again using pio, This is needed due to a controller lmitation
where read and corrected data is not transferred to the DMA buffer on ECC
errors. This holds true for ECC correctable errors beyond set threshold.

Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
 drivers/mtd/nand/raw/brcmnand/brcmnand.c | 26 ++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 0c1d6e543586..d7daa83c8a58 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1855,6 +1855,22 @@ static int brcmnand_edu_trans(struct brcmnand_host *host, u64 addr, u32 *buf,
 	edu_writel(ctrl, EDU_STOP, 0); /* force stop */
 	edu_readl(ctrl, EDU_STOP);
 
+	if (ret == 0 && edu_cmd == EDU_CMD_READ) {
+		u64 err_addr = 0;
+
+		/*
+		 * check for ecc errors here, subpage ecc erros are
+		 * retained in ecc error addr register
+		 */
+		err_addr = brcmnand_get_uncorrecc_addr(ctrl);
+		if (!err_addr) {
+			err_addr = brcmnand_get_correcc_addr(ctrl);
+			if (err_addr)
+				ret = -EUCLEAN;
+		} else
+			ret = -EBADMSG;
+	}
+
 	return ret;
 }
 
@@ -2058,6 +2074,7 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	u64 err_addr = 0;
 	int err;
 	bool retry = true;
+	bool edu_read = false;
 
 	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
 
@@ -2075,6 +2092,10 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 			else
 				return -EIO;
 		}
+
+		if (has_edu(ctrl))
+			edu_read = true;
+
 	} else {
 		if (oob)
 			memset(oob, 0x99, mtd->oobsize);
@@ -2122,6 +2143,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	if (mtd_is_bitflip(err)) {
 		unsigned int corrected = brcmnand_count_corrected(ctrl);
 
+		/* in case of edu correctable error we read again using pio */
+		if (edu_read)
+			err = brcmnand_read_by_pio(mtd, chip, addr, trans, buf,
+						   oob, &err_addr);
+
 		dev_dbg(ctrl->dev, "corrected error at 0x%llx\n",
 			(unsigned long long)err_addr);
 		mtd->ecc_stats.corrected += corrected;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH  1/2] mtd: rawnand: brcmnand: Don't default to edu transfer
  2020-06-11  5:44 [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Kamal Dasu
  2020-06-11  5:44 ` [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers Kamal Dasu
@ 2020-06-11  7:16 ` Miquel Raynal
  2020-06-11 15:22   ` Kamal Dasu
  1 sibling, 1 reply; 12+ messages in thread
From: Miquel Raynal @ 2020-06-11  7:16 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	bcm-kernel-feedback-list, linux-kernel

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:53
-0400:

> When flash-dma is absent do not default to using flash-edu.
> Make sure flash-edu is enabled before setting EDU transfer
> function.
> 
> Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
>  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> index 8f9ffb46a09f..0c1d6e543586 100644
> --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> @@ -2953,8 +2953,9 @@ int brcmnand_probe(struct platform_device *pdev, struct brcmnand_soc *soc)
>  		if (ret < 0)
>  			goto err;
>  
> -		/* set edu transfer function to call */
> -		ctrl->dma_trans = brcmnand_edu_trans;
> +		if (has_edu(ctrl))
> +			/* set edu transfer function to call */
> +			ctrl->dma_trans = brcmnand_edu_trans;

Does this fallback to returning an error in case !has_edu() ? Othewise,
how is it handled?

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH  2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-11  5:44 ` [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers Kamal Dasu
@ 2020-06-11  7:27   ` Miquel Raynal
  2020-06-11 16:04     ` Kamal Dasu
  0 siblings, 1 reply; 12+ messages in thread
From: Miquel Raynal @ 2020-06-11  7:27 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	bcm-kernel-feedback-list, linux-kernel

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
-0400:

> Implemented ECC correctable and uncorrectable error handling for EDU

Implement?

> reads. If ECC correctable bitflips are encountered  on EDU transfer,

extra space                                         ^

> read page again using pio, This is needed due to a controller lmitation

s/pio/PIO/

> where read and corrected data is not transferred to the DMA buffer on ECC
> errors. This holds true for ECC correctable errors beyond set threshold.

error.

Not sure what the last sentence means?

> 
> Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---

Minor nits below :)

>  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 26 ++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> index 0c1d6e543586..d7daa83c8a58 100644
> --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> @@ -1855,6 +1855,22 @@ static int brcmnand_edu_trans(struct brcmnand_host *host, u64 addr, u32 *buf,
>  	edu_writel(ctrl, EDU_STOP, 0); /* force stop */
>  	edu_readl(ctrl, EDU_STOP);
>  
> +	if (ret == 0 && edu_cmd == EDU_CMD_READ) {

!ret

> +		u64 err_addr = 0;
> +
> +		/*
> +		 * check for ecc errors here, subpage ecc erros are
> +		 * retained in ecc error addr register

s/ecc/ECC/
s/erros/errors/
s/addr/address/

> +		 */
> +		err_addr = brcmnand_get_uncorrecc_addr(ctrl);
> +		if (!err_addr) {
> +			err_addr = brcmnand_get_correcc_addr(ctrl);
> +			if (err_addr)
> +				ret = -EUCLEAN;
> +		} else
> +			ret = -EBADMSG;

I don't like very much to see these values being used within NAND
controller drivers but I see it's already the cause, so I guess I can
live with that.

> +	}
> +
>  	return ret;
>  }
>  
> @@ -2058,6 +2074,7 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	u64 err_addr = 0;
>  	int err;
>  	bool retry = true;
> +	bool edu_read = false;
>  
>  	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>  
> @@ -2075,6 +2092,10 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  			else
>  				return -EIO;
>  		}
> +
> +		if (has_edu(ctrl))
> +			edu_read = true;

You don't need this extra value, you already have the cmd parameter
which tells you if it is a read or a write. You might even want to
create a if block so set dir and edu_cmd and eventually a local
edu_read if you think it still makes sense.

> +
>  	} else {
>  		if (oob)
>  			memset(oob, 0x99, mtd->oobsize);
> @@ -2122,6 +2143,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	if (mtd_is_bitflip(err)) {
>  		unsigned int corrected = brcmnand_count_corrected(ctrl);
>  
> +		/* in case of edu correctable error we read again using pio */

s/edu/EDU/ ?
s/pio/PIO/

> +		if (edu_read)
> +			err = brcmnand_read_by_pio(mtd, chip, addr, trans, buf,
> +						   oob, &err_addr);
> +
>  		dev_dbg(ctrl->dev, "corrected error at 0x%llx\n",
>  			(unsigned long long)err_addr);
>  		mtd->ecc_stats.corrected += corrected;

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer
  2020-06-11  7:16 ` [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Miquel Raynal
@ 2020-06-11 15:22   ` Kamal Dasu
  2020-06-11 15:24     ` Miquel Raynal
  0 siblings, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2020-06-11 15:22 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

On Thu, Jun 11, 2020 at 3:16 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Kamal,
>
> Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:53
> -0400:
>
> > When flash-dma is absent do not default to using flash-edu.
> > Make sure flash-edu is enabled before setting EDU transfer
> > function.
> >
> > Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > ---
> >  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > index 8f9ffb46a09f..0c1d6e543586 100644
> > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > @@ -2953,8 +2953,9 @@ int brcmnand_probe(struct platform_device *pdev, struct brcmnand_soc *soc)
> >               if (ret < 0)
> >                       goto err;
> >
> > -             /* set edu transfer function to call */
> > -             ctrl->dma_trans = brcmnand_edu_trans;
> > +             if (has_edu(ctrl))
> > +                     /* set edu transfer function to call */
> > +                     ctrl->dma_trans = brcmnand_edu_trans;
>
> Does this fallback to returning an error in case !has_edu() ? Othewise,
> how is it handled?
>

 The driver will default to pio if both flash-dma and falsh-edu are
not present.

> Thanks,
> Miquèl

Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer
  2020-06-11 15:22   ` Kamal Dasu
@ 2020-06-11 15:24     ` Miquel Raynal
  0 siblings, 0 replies; 12+ messages in thread
From: Miquel Raynal @ 2020-06-11 15:24 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 11:22:56
-0400:

> On Thu, Jun 11, 2020 at 3:16 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Kamal,
> >
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:53
> > -0400:
> >  
> > > When flash-dma is absent do not default to using flash-edu.
> > > Make sure flash-edu is enabled before setting EDU transfer
> > > function.
> > >
> > > Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> > > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > > ---
> > >  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > index 8f9ffb46a09f..0c1d6e543586 100644
> > > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > @@ -2953,8 +2953,9 @@ int brcmnand_probe(struct platform_device *pdev, struct brcmnand_soc *soc)
> > >               if (ret < 0)
> > >                       goto err;
> > >
> > > -             /* set edu transfer function to call */
> > > -             ctrl->dma_trans = brcmnand_edu_trans;
> > > +             if (has_edu(ctrl))
> > > +                     /* set edu transfer function to call */
> > > +                     ctrl->dma_trans = brcmnand_edu_trans;  
> >
> > Does this fallback to returning an error in case !has_edu() ? Othewise,
> > how is it handled?
> >  
> 
>  The driver will default to pio if both flash-dma and falsh-edu are
> not present.
> 
> > Thanks,
> > Miquèl  
> 
> Kamal


Ok, thanks for the clarification!

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-11  7:27   ` Miquel Raynal
@ 2020-06-11 16:04     ` Kamal Dasu
  2020-06-12  7:07       ` Miquel Raynal
  0 siblings, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2020-06-11 16:04 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Kamal,
>
> Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> -0400:
>
> > Implemented ECC correctable and uncorrectable error handling for EDU
>
> Implement?
>
> > reads. If ECC correctable bitflips are encountered  on EDU transfer,
>
> extra space                                         ^
>
> > read page again using pio, This is needed due to a controller lmitation
>
> s/pio/PIO/
>
> > where read and corrected data is not transferred to the DMA buffer on ECC
> > errors. This holds true for ECC correctable errors beyond set threshold.
>
> error.
>
> Not sure what the last sentence means?
>

NAND controller allows for setting a correctable  ECC threshold number
of bits beyond which it will actually report the error to the driver.
e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
generate correctable ECC interrupt however 1-bit and 2-bit errors will
be corrected silently.
From the above example EDU hardware will not transfer corrected data
to the DMA buffer for 3-bit and 4-bit errors that get reported. So
once we detect
the error duing EDU we read the page again using pio.

> >
> > Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > ---
>
> Minor nits below :)
>
> >  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 26 ++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > index 0c1d6e543586..d7daa83c8a58 100644
> > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > @@ -1855,6 +1855,22 @@ static int brcmnand_edu_trans(struct brcmnand_host *host, u64 addr, u32 *buf,
> >       edu_writel(ctrl, EDU_STOP, 0); /* force stop */
> >       edu_readl(ctrl, EDU_STOP);
> >
> > +     if (ret == 0 && edu_cmd == EDU_CMD_READ) {
>
> !ret
>
> > +             u64 err_addr = 0;
> > +
> > +             /*
> > +              * check for ecc errors here, subpage ecc erros are
> > +              * retained in ecc error addr register
>
> s/ecc/ECC/
> s/erros/errors/
> s/addr/address/
>
> > +              */
> > +             err_addr = brcmnand_get_uncorrecc_addr(ctrl);
> > +             if (!err_addr) {
> > +                     err_addr = brcmnand_get_correcc_addr(ctrl);
> > +                     if (err_addr)
> > +                             ret = -EUCLEAN;
> > +             } else
> > +                     ret = -EBADMSG;
>
> I don't like very much to see these values being used within NAND
> controller drivers but I see it's already the cause, so I guess I can
> live with that.
>
> > +     }
> > +
> >       return ret;
> >  }
> >
> > @@ -2058,6 +2074,7 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >       u64 err_addr = 0;
> >       int err;
> >       bool retry = true;
> > +     bool edu_read = false;
> >
> >       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
> >
> > @@ -2075,6 +2092,10 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >                       else
> >                               return -EIO;
> >               }
> > +
> > +             if (has_edu(ctrl))
> > +                     edu_read = true;
>
> You don't need this extra value, you already have the cmd parameter
> which tells you if it is a read or a write. You might even want to
> create a if block so set dir and edu_cmd and eventually a local
> edu_read if you think it still makes sense.
>

I needed the value since dma and edu read has multiple conditions like
oob is not included, buffer is aligned, virtual address is good. This
indicates to
the if (mtd_is_bitflip(err))  block that the error was from an edu
transaction that happened.This way all ecc error handling for dma,
edu, pio is in one place.
Also there is more controller version specific logic for read error
handling in there and this allows us to maintain the hierarchy how we
handle both correctable
and uncorrectable error.

> > +
> >       } else {
> >               if (oob)
> >                       memset(oob, 0x99, mtd->oobsize);
> > @@ -2122,6 +2143,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >       if (mtd_is_bitflip(err)) {
> >               unsigned int corrected = brcmnand_count_corrected(ctrl);
> >
> > +             /* in case of edu correctable error we read again using pio */
>
> s/edu/EDU/ ?
> s/pio/PIO/
>
> > +             if (edu_read)
> > +                     err = brcmnand_read_by_pio(mtd, chip, addr, trans, buf,
> > +                                                oob, &err_addr);
> > +
> >               dev_dbg(ctrl->dev, "corrected error at 0x%llx\n",
> >                       (unsigned long long)err_addr);
> >               mtd->ecc_stats.corrected += corrected;
>

Will fix all the other typos.

> Thanks,
> Miquèl

Thanks
Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-11 16:04     ` Kamal Dasu
@ 2020-06-12  7:07       ` Miquel Raynal
  2020-06-12 16:34         ` Kamal Dasu
  0 siblings, 1 reply; 12+ messages in thread
From: Miquel Raynal @ 2020-06-12  7:07 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 12:04:29
-0400:

> On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Kamal,
> >
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> > -0400:
> >  
> > > Implemented ECC correctable and uncorrectable error handling for EDU  
> >
> > Implement?
> >  
> > > reads. If ECC correctable bitflips are encountered  on EDU transfer,  
> >
> > extra space                                         ^
> >  
> > > read page again using pio, This is needed due to a controller lmitation  
> >
> > s/pio/PIO/
> >  
> > > where read and corrected data is not transferred to the DMA buffer on ECC
> > > errors. This holds true for ECC correctable errors beyond set threshold.  
> >
> > error.
> >
> > Not sure what the last sentence means?
> >  
> 
> NAND controller allows for setting a correctable  ECC threshold number
> of bits beyond which it will actually report the error to the driver.
> e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
> generate correctable ECC interrupt however 1-bit and 2-bit errors will
> be corrected silently.
> From the above example EDU hardware will not transfer corrected data
> to the DMA buffer for 3-bit and 4-bit errors that get reported. So
> once we detect
> the error duing EDU we read the page again using pio.

Ok I see what you mean, can't you fake the threshold instead? The NAND
controller in Linux is not supposed to handle this threshold, the NAND
core is in charge. So what the controller driver should do is just:
increase the number of bitflips + return the maximum number or bitflip
or increase the failure counter. Is this already the case?

> 
> > >
> > > Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> > > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > > ---  
> >
> > Minor nits below :)
> >  
> > >  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 26 ++++++++++++++++++++++++
> > >  1 file changed, 26 insertions(+)
> > >
> > > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > index 0c1d6e543586..d7daa83c8a58 100644
> > > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > @@ -1855,6 +1855,22 @@ static int brcmnand_edu_trans(struct brcmnand_host *host, u64 addr, u32 *buf,
> > >       edu_writel(ctrl, EDU_STOP, 0); /* force stop */
> > >       edu_readl(ctrl, EDU_STOP);
> > >
> > > +     if (ret == 0 && edu_cmd == EDU_CMD_READ) {  
> >
> > !ret
> >  
> > > +             u64 err_addr = 0;
> > > +
> > > +             /*
> > > +              * check for ecc errors here, subpage ecc erros are
> > > +              * retained in ecc error addr register  
> >
> > s/ecc/ECC/
> > s/erros/errors/
> > s/addr/address/
> >  
> > > +              */
> > > +             err_addr = brcmnand_get_uncorrecc_addr(ctrl);
> > > +             if (!err_addr) {
> > > +                     err_addr = brcmnand_get_correcc_addr(ctrl);
> > > +                     if (err_addr)
> > > +                             ret = -EUCLEAN;
> > > +             } else
> > > +                     ret = -EBADMSG;  
> >
> > I don't like very much to see these values being used within NAND
> > controller drivers but I see it's already the cause, so I guess I can

s/cause/case/

> > live with that.
> >  
> > > +     }
> > > +
> > >       return ret;
> > >  }
> > >
> > > @@ -2058,6 +2074,7 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > >       u64 err_addr = 0;
> > >       int err;
> > >       bool retry = true;
> > > +     bool edu_read = false;
> > >
> > >       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
> > >
> > > @@ -2075,6 +2092,10 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > >                       else
> > >                               return -EIO;
> > >               }
> > > +
> > > +             if (has_edu(ctrl))
> > > +                     edu_read = true;  
> >
> > You don't need this extra value, you already have the cmd parameter
> > which tells you if it is a read or a write. You might even want to
> > create a if block so set dir and edu_cmd and eventually a local
> > edu_read if you think it still makes sense.
> >  
> 
> I needed the value since dma and edu read has multiple conditions like
> oob is not included, buffer is aligned, virtual address is good. This
> indicates to
> the if (mtd_is_bitflip(err))  block that the error was from an edu
> transaction that happened.This way all ecc error handling for dma,
> edu, pio is in one place.
> Also there is more controller version specific logic for read error
> handling in there and this allows us to maintain the hierarchy how we
> handle both correctable
> and uncorrectable error.

Fair enough.

> 
> > > +
> > >       } else {
> > >               if (oob)
> > >                       memset(oob, 0x99, mtd->oobsize);
> > > @@ -2122,6 +2143,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > >       if (mtd_is_bitflip(err)) {
> > >               unsigned int corrected = brcmnand_count_corrected(ctrl);
> > >
> > > +             /* in case of edu correctable error we read again using pio */  
> >
> > s/edu/EDU/ ?
> > s/pio/PIO/
> >  
> > > +             if (edu_read)
> > > +                     err = brcmnand_read_by_pio(mtd, chip, addr, trans, buf,
> > > +                                                oob, &err_addr);
> > > +
> > >               dev_dbg(ctrl->dev, "corrected error at 0x%llx\n",
> > >                       (unsigned long long)err_addr);
> > >               mtd->ecc_stats.corrected += corrected;  
> >  
> 
> Will fix all the other typos.
> 
> > Thanks,
> > Miquèl  
> 
> Thanks
> Kamal


Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-12  7:07       ` Miquel Raynal
@ 2020-06-12 16:34         ` Kamal Dasu
  2020-06-15  7:19           ` Miquel Raynal
  0 siblings, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2020-06-12 16:34 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

On Fri, Jun 12, 2020 at 3:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Kamal,
>
> Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 12:04:29
> -0400:
>
> > On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > Hi Kamal,
> > >
> > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> > > -0400:
> > >
> > > > Implemented ECC correctable and uncorrectable error handling for EDU
> > >
> > > Implement?
> > >
> > > > reads. If ECC correctable bitflips are encountered  on EDU transfer,
> > >
> > > extra space                                         ^
> > >
> > > > read page again using pio, This is needed due to a controller lmitation
> > >
> > > s/pio/PIO/
> > >
> > > > where read and corrected data is not transferred to the DMA buffer on ECC
> > > > errors. This holds true for ECC correctable errors beyond set threshold.
> > >
> > > error.
> > >
> > > Not sure what the last sentence means?
> > >
> >
> > NAND controller allows for setting a correctable  ECC threshold number
> > of bits beyond which it will actually report the error to the driver.
> > e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
> > generate correctable ECC interrupt however 1-bit and 2-bit errors will
> > be corrected silently.
> > From the above example EDU hardware will not transfer corrected data
> > to the DMA buffer for 3-bit and 4-bit errors that get reported. So
> > once we detect
> > the error duing EDU we read the page again using pio.
>
> Ok I see what you mean, can't you fake the threshold instead? The NAND
> controller in Linux is not supposed to handle this threshold, the NAND
> core is in charge. So what the controller driver should do is just:
> increase the number of bitflips + return the maximum number or bitflip
> or increase the failure counter. Is this already the case?
>
/* threshold = ceil(BCH-level * 0.75) */
brcmnand_wr_corr_thresh(host, DIV_ROUND_UP(chip->ecc.strength * 3, 4));
This how the threshold is set, all it means is that for high BCH
levels don't interrupt on low number (less than threshold) of
bit_flips. Yes the controller driver only increments correctable ECC
count. But due the EDU design an EDU operation is disrupted when the
controller interrupts on correctable ECC errors during subpage ECC
calculations. Hence the driver needs to read the page again with PIO
to transfer corrected data.

> >
> > > >
> > > > Fixes: a5d53ad26a8b ("mtd: rawnand: brcmnand: Add support for flash-edu for dma transfers")
> > > > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > > > ---
> > >
> > > Minor nits below :)
> > >
> > > >  drivers/mtd/nand/raw/brcmnand/brcmnand.c | 26 ++++++++++++++++++++++++
> > > >  1 file changed, 26 insertions(+)
> > > >
> > > > diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > index 0c1d6e543586..d7daa83c8a58 100644
> > > > --- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > +++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
> > > > @@ -1855,6 +1855,22 @@ static int brcmnand_edu_trans(struct brcmnand_host *host, u64 addr, u32 *buf,
> > > >       edu_writel(ctrl, EDU_STOP, 0); /* force stop */
> > > >       edu_readl(ctrl, EDU_STOP);
> > > >
> > > > +     if (ret == 0 && edu_cmd == EDU_CMD_READ) {
> > >
> > > !ret
> > >
> > > > +             u64 err_addr = 0;
> > > > +
> > > > +             /*
> > > > +              * check for ecc errors here, subpage ecc erros are
> > > > +              * retained in ecc error addr register
> > >
> > > s/ecc/ECC/
> > > s/erros/errors/
> > > s/addr/address/
> > >
> > > > +              */
> > > > +             err_addr = brcmnand_get_uncorrecc_addr(ctrl);
> > > > +             if (!err_addr) {
> > > > +                     err_addr = brcmnand_get_correcc_addr(ctrl);
> > > > +                     if (err_addr)
> > > > +                             ret = -EUCLEAN;
> > > > +             } else
> > > > +                     ret = -EBADMSG;
> > >
> > > I don't like very much to see these values being used within NAND
> > > controller drivers but I see it's already the cause, so I guess I can
>
> s/cause/case/
>
> > > live with that.
> > >
> > > > +     }
> > > > +
> > > >       return ret;
> > > >  }
> > > >
> > > > @@ -2058,6 +2074,7 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > > >       u64 err_addr = 0;
> > > >       int err;
> > > >       bool retry = true;
> > > > +     bool edu_read = false;
> > > >
> > > >       dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
> > > >
> > > > @@ -2075,6 +2092,10 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > > >                       else
> > > >                               return -EIO;
> > > >               }
> > > > +
> > > > +             if (has_edu(ctrl))
> > > > +                     edu_read = true;
> > >
> > > You don't need this extra value, you already have the cmd parameter
> > > which tells you if it is a read or a write. You might even want to
> > > create a if block so set dir and edu_cmd and eventually a local
> > > edu_read if you think it still makes sense.
> > >
> >
> > I needed the value since dma and edu read has multiple conditions like
> > oob is not included, buffer is aligned, virtual address is good. This
> > indicates to
> > the if (mtd_is_bitflip(err))  block that the error was from an edu
> > transaction that happened.This way all ecc error handling for dma,
> > edu, pio is in one place.
> > Also there is more controller version specific logic for read error
> > handling in there and this allows us to maintain the hierarchy how we
> > handle both correctable
> > and uncorrectable error.
>
> Fair enough.
>
> >
> > > > +
> > > >       } else {
> > > >               if (oob)
> > > >                       memset(oob, 0x99, mtd->oobsize);
> > > > @@ -2122,6 +2143,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > > >       if (mtd_is_bitflip(err)) {
> > > >               unsigned int corrected = brcmnand_count_corrected(ctrl);
> > > >
> > > > +             /* in case of edu correctable error we read again using pio */
> > >
> > > s/edu/EDU/ ?
> > > s/pio/PIO/
> > >
> > > > +             if (edu_read)
> > > > +                     err = brcmnand_read_by_pio(mtd, chip, addr, trans, buf,
> > > > +                                                oob, &err_addr);
> > > > +
> > > >               dev_dbg(ctrl->dev, "corrected error at 0x%llx\n",
> > > >                       (unsigned long long)err_addr);
> > > >               mtd->ecc_stats.corrected += corrected;
> > >
> >
> > Will fix all the other typos.
> >
> > > Thanks,
> > > Miquèl
> >
> > Thanks
> > Kamal
>
>
> Thanks,
> Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-12 16:34         ` Kamal Dasu
@ 2020-06-15  7:19           ` Miquel Raynal
  2020-06-15 15:11             ` Kamal Dasu
  0 siblings, 1 reply; 12+ messages in thread
From: Miquel Raynal @ 2020-06-15  7:19 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Fri, 12 Jun 2020 12:34:22
-0400:

> On Fri, Jun 12, 2020 at 3:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Kamal,
> >
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 12:04:29
> > -0400:
> >  
> > > On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > >
> > > > Hi Kamal,
> > > >
> > > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> > > > -0400:
> > > >  
> > > > > Implemented ECC correctable and uncorrectable error handling for EDU  
> > > >
> > > > Implement?
> > > >  
> > > > > reads. If ECC correctable bitflips are encountered  on EDU transfer,  
> > > >
> > > > extra space                                         ^
> > > >  
> > > > > read page again using pio, This is needed due to a controller lmitation  
> > > >
> > > > s/pio/PIO/
> > > >  
> > > > > where read and corrected data is not transferred to the DMA buffer on ECC
> > > > > errors. This holds true for ECC correctable errors beyond set threshold.  
> > > >
> > > > error.
> > > >
> > > > Not sure what the last sentence means?
> > > >  
> > >
> > > NAND controller allows for setting a correctable  ECC threshold number
> > > of bits beyond which it will actually report the error to the driver.
> > > e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
> > > generate correctable ECC interrupt however 1-bit and 2-bit errors will
> > > be corrected silently.
> > > From the above example EDU hardware will not transfer corrected data
> > > to the DMA buffer for 3-bit and 4-bit errors that get reported. So
> > > once we detect
> > > the error duing EDU we read the page again using pio.  
> >
> > Ok I see what you mean, can't you fake the threshold instead? The NAND
> > controller in Linux is not supposed to handle this threshold, the NAND
> > core is in charge. So what the controller driver should do is just:
> > increase the number of bitflips + return the maximum number or bitflip
> > or increase the failure counter. Is this already the case?
> >  
> /* threshold = ceil(BCH-level * 0.75) */
> brcmnand_wr_corr_thresh(host, DIV_ROUND_UP(chip->ecc.strength * 3, 4));
> This how the threshold is set, all it means is that for high BCH
> levels don't interrupt on low number (less than threshold) of
> bit_flips. Yes the controller driver only increments correctable ECC
> count. But due the EDU design an EDU operation is disrupted when the
> controller interrupts on correctable ECC errors during subpage ECC
> calculations. Hence the driver needs to read the page again with PIO
> to transfer corrected data.

IIUC, you are doing the job twice: you should just return a number of
bitflips or an error to the NAND core. So that's why I'm telling that
you should get rid of this threshold. It would avoid the need for the
PIO transfer too.

You also say that the controller "only increments correctable ECC
count", what do you mean exactly? The controller does not report errors
when the number of bitflips happens to be above the BCH threshold? This
would be the only case where what is currently done would be actually
needed though.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-15  7:19           ` Miquel Raynal
@ 2020-06-15 15:11             ` Kamal Dasu
  2020-06-15 17:37               ` Miquel Raynal
  0 siblings, 1 reply; 12+ messages in thread
From: Kamal Dasu @ 2020-06-15 15:11 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

On Mon, Jun 15, 2020 at 3:19 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Kamal,
>
> Kamal Dasu <kdasu.kdev@gmail.com> wrote on Fri, 12 Jun 2020 12:34:22
> -0400:
>
> > On Fri, Jun 12, 2020 at 3:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > Hi Kamal,
> > >
> > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 12:04:29
> > > -0400:
> > >
> > > > On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > > >
> > > > > Hi Kamal,
> > > > >
> > > > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> > > > > -0400:
> > > > >
> > > > > > Implemented ECC correctable and uncorrectable error handling for EDU
> > > > >
> > > > > Implement?
> > > > >
> > > > > > reads. If ECC correctable bitflips are encountered  on EDU transfer,
> > > > >
> > > > > extra space                                         ^
> > > > >
> > > > > > read page again using pio, This is needed due to a controller lmitation
> > > > >
> > > > > s/pio/PIO/
> > > > >
> > > > > > where read and corrected data is not transferred to the DMA buffer on ECC
> > > > > > errors. This holds true for ECC correctable errors beyond set threshold.
> > > > >
> > > > > error.
> > > > >
> > > > > Not sure what the last sentence means?
> > > > >
> > > >
> > > > NAND controller allows for setting a correctable  ECC threshold number
> > > > of bits beyond which it will actually report the error to the driver.
> > > > e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
> > > > generate correctable ECC interrupt however 1-bit and 2-bit errors will
> > > > be corrected silently.
> > > > From the above example EDU hardware will not transfer corrected data
> > > > to the DMA buffer for 3-bit and 4-bit errors that get reported. So
> > > > once we detect
> > > > the error duing EDU we read the page again using pio.
> > >
> > > Ok I see what you mean, can't you fake the threshold instead? The NAND
> > > controller in Linux is not supposed to handle this threshold, the NAND
> > > core is in charge. So what the controller driver should do is just:
> > > increase the number of bitflips + return the maximum number or bitflip
> > > or increase the failure counter. Is this already the case?
> > >
> > /* threshold = ceil(BCH-level * 0.75) */
> > brcmnand_wr_corr_thresh(host, DIV_ROUND_UP(chip->ecc.strength * 3, 4));
> > This how the threshold is set, all it means is that for high BCH
> > levels don't interrupt on low number (less than threshold) of
> > bit_flips. Yes the controller driver only increments correctable ECC
> > count. But due the EDU design an EDU operation is disrupted when the
> > controller interrupts on correctable ECC errors during subpage ECC
> > calculations. Hence the driver needs to read the page again with PIO
> > to transfer corrected data.
>
> IIUC, you are doing the job twice: you should just return a number of
> bitflips or an error to the NAND core. So that's why I'm telling that
> you should get rid of this threshold. It would avoid the need for the
> PIO transfer too.

I think you are reading some statements in isolation that probably are
causing some confusion.
EDU design has a flaw in case of reported  ECC error interrupt in that
corrected data is not transferred to the DMA buffer. The PIO is needed
to read corrected data into the NAND data buffer and only for the
reported errors. So there is no need to change the threshold
calculation logic, if we get rid of the threshold then we will have to
do the PIO read on any correctable bit error if it occurs during EDU
reads.

>
> You also say that the controller "only increments correctable ECC
> count", what do you mean exactly?

Maybe that statement was a bit misleading. To be clear when an ECC
error is reported the controller gives the bit_flips count  as well as
updates the ECC error address Register and ecc error status registers.
This logic works as expected in the hardware.

>The controller does not report errors
> when the number of bitflips happens to be above the BCH threshold? This
> would be the only case where what is currently done would be actually
> needed though.

 It's the other way. The controller only reports bit errors beyond
>=threshold value, will not report otherwise and silently correct the
data. There is no problem in  cases where erros are corrected
silently. Now ECC (un)correctable on EDU reads are detected by simply
reading back the ECC Error address register. And in case of reported
uncorrectable ECC errors are treated as usual.  And for reported
correctable ECC errors we need to read the page again using PIO so
that the corrected data is properly transferred. All this applies to
EDU transfer only.

>
> Thanks,
> Miquèl

Kamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers
  2020-06-15 15:11             ` Kamal Dasu
@ 2020-06-15 17:37               ` Miquel Raynal
  0 siblings, 0 replies; 12+ messages in thread
From: Miquel Raynal @ 2020-06-15 17:37 UTC (permalink / raw)
  To: Kamal Dasu
  Cc: Brian Norris, Richard Weinberger, Vignesh Raghavendra,
	MTD Maling List, bcm-kernel-feedback-list,
	Linux Kernel Mailing List

Hi Kamal,

Kamal Dasu <kdasu.kdev@gmail.com> wrote on Mon, 15 Jun 2020 11:11:00
-0400:

> On Mon, Jun 15, 2020 at 3:19 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Kamal,
> >
> > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Fri, 12 Jun 2020 12:34:22
> > -0400:
> >  
> > > On Fri, Jun 12, 2020 at 3:07 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > >
> > > > Hi Kamal,
> > > >
> > > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 12:04:29
> > > > -0400:
> > > >  
> > > > > On Thu, Jun 11, 2020 at 3:27 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > > > >
> > > > > > Hi Kamal,
> > > > > >
> > > > > > Kamal Dasu <kdasu.kdev@gmail.com> wrote on Thu, 11 Jun 2020 01:44:54
> > > > > > -0400:
> > > > > >  
> > > > > > > Implemented ECC correctable and uncorrectable error handling for EDU  
> > > > > >
> > > > > > Implement?
> > > > > >  
> > > > > > > reads. If ECC correctable bitflips are encountered  on EDU transfer,  
> > > > > >
> > > > > > extra space                                         ^
> > > > > >  
> > > > > > > read page again using pio, This is needed due to a controller lmitation  
> > > > > >
> > > > > > s/pio/PIO/
> > > > > >  
> > > > > > > where read and corrected data is not transferred to the DMA buffer on ECC
> > > > > > > errors. This holds true for ECC correctable errors beyond set threshold.  
> > > > > >
> > > > > > error.
> > > > > >
> > > > > > Not sure what the last sentence means?
> > > > > >  
> > > > >
> > > > > NAND controller allows for setting a correctable  ECC threshold number
> > > > > of bits beyond which it will actually report the error to the driver.
> > > > > e.g. for BCH-4 the threshold is 3, so 3-bit and 4-bit errors will
> > > > > generate correctable ECC interrupt however 1-bit and 2-bit errors will
> > > > > be corrected silently.
> > > > > From the above example EDU hardware will not transfer corrected data
> > > > > to the DMA buffer for 3-bit and 4-bit errors that get reported. So
> > > > > once we detect
> > > > > the error duing EDU we read the page again using pio.  
> > > >
> > > > Ok I see what you mean, can't you fake the threshold instead? The NAND
> > > > controller in Linux is not supposed to handle this threshold, the NAND
> > > > core is in charge. So what the controller driver should do is just:
> > > > increase the number of bitflips + return the maximum number or bitflip
> > > > or increase the failure counter. Is this already the case?
> > > >  
> > > /* threshold = ceil(BCH-level * 0.75) */
> > > brcmnand_wr_corr_thresh(host, DIV_ROUND_UP(chip->ecc.strength * 3, 4));
> > > This how the threshold is set, all it means is that for high BCH
> > > levels don't interrupt on low number (less than threshold) of
> > > bit_flips. Yes the controller driver only increments correctable ECC
> > > count. But due the EDU design an EDU operation is disrupted when the
> > > controller interrupts on correctable ECC errors during subpage ECC
> > > calculations. Hence the driver needs to read the page again with PIO
> > > to transfer corrected data.  
> >
> > IIUC, you are doing the job twice: you should just return a number of
> > bitflips or an error to the NAND core. So that's why I'm telling that
> > you should get rid of this threshold. It would avoid the need for the
> > PIO transfer too.  
> 
> I think you are reading some statements in isolation that probably are
> causing some confusion.
> EDU design has a flaw in case of reported  ECC error interrupt in that
> corrected data is not transferred to the DMA buffer. The PIO is needed
> to read corrected data into the NAND data buffer and only for the
> reported errors. So there is no need to change the threshold
> calculation logic, if we get rid of the threshold then we will have to
> do the PIO read on any correctable bit error if it occurs during EDU
> reads.
> 
> >
> > You also say that the controller "only increments correctable ECC
> > count", what do you mean exactly?  
> 
> Maybe that statement was a bit misleading. To be clear when an ECC
> error is reported the controller gives the bit_flips count  as well as
> updates the ECC error address Register and ecc error status registers.
> This logic works as expected in the hardware.
> 
> >The controller does not report errors
> > when the number of bitflips happens to be above the BCH threshold? This
> > would be the only case where what is currently done would be actually
> > needed though.  
> 
>  It's the other way. The controller only reports bit errors beyond
> >=threshold value, will not report otherwise and silently correct the  
> data. There is no problem in  cases where erros are corrected
> silently. Now ECC (un)correctable on EDU reads are detected by simply
> reading back the ECC Error address register. And in case of reported
> uncorrectable ECC errors are treated as usual.  And for reported
> correctable ECC errors we need to read the page again using PIO so
> that the corrected data is properly transferred. All this applies to
> EDU transfer only.
> 

Thank you very much for the explanation, I understand better how this
controller works now.


Cheers,
Miquèl

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-06-15 17:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11  5:44 [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Kamal Dasu
2020-06-11  5:44 ` [PATCH 2/2] mtd: rawnand: brcmnand: Ecc error handling on EDU transfers Kamal Dasu
2020-06-11  7:27   ` Miquel Raynal
2020-06-11 16:04     ` Kamal Dasu
2020-06-12  7:07       ` Miquel Raynal
2020-06-12 16:34         ` Kamal Dasu
2020-06-15  7:19           ` Miquel Raynal
2020-06-15 15:11             ` Kamal Dasu
2020-06-15 17:37               ` Miquel Raynal
2020-06-11  7:16 ` [PATCH 1/2] mtd: rawnand: brcmnand: Don't default to edu transfer Miquel Raynal
2020-06-11 15:22   ` Kamal Dasu
2020-06-11 15:24     ` Miquel Raynal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).