* Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
@ 2022-02-21 19:00 Daniel Glöckner
  2022-02-22 22:02 ` Han Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Glöckner @ 2022-02-21 19:00 UTC (permalink / raw)
  To: linux-mtd; +Cc: Lothar Waßmann, Brian Norris, Han Xu

Hi,

we are using UBI on a NAND flash with BBT and have recently observed
bad blocks where nand_markbad_bbm returns an error. Since that error is
returned by nand_block_markbad_lowlevel even when marking the block in
the BBT succeeds, UBI goes into read-only mode. We would therefore like
to set NAND_BBT_NO_OOB_BBM.

Unfortunately there is no device tree property for this flag. Also we
internally disagree whether this should be configurable on our platform at
all. We are using an i.MX6 that needs to relocate the bad block marker
to a different byte within the page because of its ECC layout.

In 2014 Lothar already submitted a patch to add a nand-no-oob-bbm device
tree property that got rejected:
https://patchwork.kernel.org/project/linux-arm-kernel/patch/1402579245-13377-5-git-send-email-LW@KARO-electronics.de/
Brian suggested back then to tie this behavior to the non-standard
fsl,no-blockmark-swap property because the marker becomes completely
useless when it stays at the same position as data bytes in good blocks.

So which solution would have the highest chance of being accepted as a
patch? Introducing a device tree property for NAND_BBT_NO_OOB_BBM, using
fsl,no-blockmark-swap, or setting NAND_BBT_NO_OOB_BBM for all boards
inside the i.MX gpmi driver when there is a BBT? Or maybe renaming
fsl,no-blockmark-swap to nand-no-oob-bbm (with a transition phase)?
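As a rough illustration of the alternatives listed above, the following userspace C sketch models how a driver might derive its BBT options from the device tree under each policy. It is not gpmi driver code. The MODEL_* bit values, the policy enum and struct board_model are hypothetical placeholders (not the definitions from bbm.h), and nand-no-oob-bbm is the property proposed in 2014, not an existing binding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for this model only; not the values from bbm.h. */
#define MODEL_BBT_USE_FLASH	(1u << 0)
#define MODEL_BBT_NO_OOB_BBM	(1u << 1)

/* Hypothetical policies mirroring the alternatives in the question. */
enum no_oob_bbm_policy {
	POLICY_DT_PROPERTY,	/* generic property, e.g. nand-no-oob-bbm */
	POLICY_TIE_TO_NO_SWAP,	/* reuse fsl,no-blockmark-swap */
	POLICY_ALWAYS_WITH_BBT,	/* set whenever a flash BBT is used */
};

/* What the driver would learn from the device tree (hypothetical struct). */
struct board_model {
	bool use_flash_bbt;	/* nand-on-flash-bbt present */
	bool no_oob_bbm_prop;	/* proposed nand-no-oob-bbm present */
	bool no_blockmark_swap;	/* fsl,no-blockmark-swap present */
};

/* Return the option bits a driver could set under the given policy. */
static uint32_t model_bbt_options(const struct board_model *b,
				  enum no_oob_bbm_policy policy)
{
	uint32_t opts = 0;
	bool skip_bbm = false;

	if (b->use_flash_bbt)
		opts |= MODEL_BBT_USE_FLASH;

	switch (policy) {
	case POLICY_DT_PROPERTY:
		skip_bbm = b->no_oob_bbm_prop;
		break;
	case POLICY_TIE_TO_NO_SWAP:
		skip_bbm = b->no_blockmark_swap;
		break;
	case POLICY_ALWAYS_WITH_BBT:
		skip_bbm = true;
		break;
	}

	/*
	 * Design choice of this sketch: without a BBT the OOB marker would be
	 * the only record of a bad block, so it is never skipped in that case.
	 */
	if (skip_bbm && b->use_flash_bbt)
		opts |= MODEL_BBT_NO_OOB_BBM;

	return opts;
}
```

With this model, a board that has a flash BBT and fsl,no-blockmark-swap ends up skipping the OOB marker under the second and third policies, but not under the first unless the new property is also set.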

Best regards,

  Daniel

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-21 19:00 Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide? Daniel Glöckner
@ 2022-02-22 22:02 ` Han Xu
  2022-02-23 10:59   ` Daniel Glöckner
  0 siblings, 1 reply; 10+ messages in thread
From: Han Xu @ 2022-02-22 22:02 UTC (permalink / raw)
  To: Daniel Glöckner; +Cc: linux-mtd, Lothar Waßmann, Brian Norris

On 22/02/21 08:00PM, Daniel Glöckner wrote:
> Hi,
> 
> we are using UBI on a NAND flash with BBT and have recently observed
> bad blocks where nand_markbad_bbm returns an error. Since that error is
> returned by nand_block_markbad_lowlevel even when marking the block in
> the BBT succeeds, UBI goes into read-only mode. We would therefore like
> to set NAND_BBT_NO_OOB_BBM.

Could you please describe more details about what kind of error, how to
reproduce it and on which kernel version?

> 
> Unfortunately there is no device tree property for this flag. Also we
> internally disagree whether this should be configurable on our platform at
> all. We are using an i.MX6 that needs to relocate the bad block marker
> to a different byte within the page because of its ECC layout.
> 
> In 2014 Lothar already submitted a patch to add a nand-no-oob-bbm device
> tree property that got rejected:
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/1402579245-13377-5-git-send-email-LW@KARO-electronics.de/
> Brian suggested back then to tie this behavior to the non-standard
> fsl,no-blockmark-swap property because the marker becomes completely
> useless when it stays at the same position as data bytes in good blocks.
> 
> So which solution would have the highest chance of being accepted as a
> patch? Introducing a device tree property for NAND_BBT_NO_OOB_BBM, using
> fsl,no-blockmark-swap, or setting NAND_BBT_NO_OOB_BBM for all boards
> inside the i.MX gpmi driver when there is a BBT? Or maybe renaming
> fsl,no-blockmark-swap to nand-no-oob-bbm (with a transition phase)?
> 
> Best regards,
> 
>   Daniel

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-22 22:02 ` Han Xu
@ 2022-02-23 10:59   ` Daniel Glöckner
  2022-02-24 15:29     ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Glöckner @ 2022-02-23 10:59 UTC (permalink / raw)
  To: Han Xu; +Cc: linux-mtd, Lothar Waßmann, Brian Norris

On 22.02.22 at 23:02, Han Xu wrote:
> On 22/02/21 08:00PM, Daniel Glöckner wrote:
>> we are using UBI on a NAND flash with BBT and have recently observed
>> bad blocks where nand_markbad_bbm returns an error. Since that error is
>> returned by nand_block_markbad_lowlevel even when marking the block in
>> the BBT succeeds, UBI goes into read-only mode. We would therefore like
>> to set NAND_BBT_NO_OOB_BBM.
> 
> Could you please describe more details about what kind of error, how to
> reproduce it and on which kernel version?

You need a flash that has one bad block where programming the BBM sets
NAND_STATUS_FAIL in its status register. The latest kernels should still
have problems when this happens in a UBI.

Best regards,

  Daniel

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-23 10:59   ` Daniel Glöckner
@ 2022-02-24 15:29     ` Miquel Raynal
  2022-02-24 15:55       ` Daniel Glöckner
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2022-02-24 15:29 UTC (permalink / raw)
  To: Daniel Glöckner; +Cc: Han Xu, linux-mtd, Lothar Waßmann, Brian Norris

Hi Daniel,

dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:

> On 22.02.22 at 23:02, Han Xu wrote:
> > On 22/02/21 08:00PM, Daniel Glöckner wrote:  
> >> we are using UBI on a NAND flash with BBT and have recently observed
> >> bad blocks where nand_markbad_bbm returns an error. Since that error is
> >> returned by nand_block_markbad_lowlevel even when marking the block in
> >> the BBT succeeds, UBI goes into read-only mode. We would therefore like
> >> to set NAND_BBT_NO_OOB_BBM.  
> > 
> > Could you please describe more details about what kind of error, how to
> > reproduce it and on which kernel version?  
> 
> You need a flash that has one bad block where programming the BBM sets
> NAND_STATUS_FAIL in its status register. The latest kernels should still
> have problems when this happens in a UBI.

I believe we should try to tackle "why" this happens rather than try to
work around its consequences. Can you give more details about why we get
this status?


Thanks,
Miquèl

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-24 15:29     ` Miquel Raynal
@ 2022-02-24 15:55       ` Daniel Glöckner
  2022-02-24 16:03         ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Glöckner @ 2022-02-24 15:55 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Han Xu, linux-mtd, Lothar Waßmann, Brian Norris

On 24.02.22 at 16:29, Miquel Raynal wrote:
> dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:
>> On 22.02.22 at 23:02, Han Xu wrote:
>>> Could you please describe more details about what kind of error, how to
>>> reproduce it and on which kernel version?  
>>
>> You need a flash that has one bad block where programming the BBM sets
>> NAND_STATUS_FAIL in its status register. The latest kernels should still
>> have problems when this happens in a UBI.
> 
> I believe we should try to tackle "why" this happens rather than try to
> work around its consequences. Can you give more details about why we get
> this status?

Uhm, the block is bad, broken. It shows the same behavior even after
power cycling. The other blocks are ok. I don't think it is our fault
that it died so early.

Best regards,

  Daniel


-- 
Dipl.-Math. Daniel Glöckner, emlix GmbH, http://www.emlix.com
Phone +49 551 30664-0, Fax +49 551 30664-11,
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, Local Court (Amtsgericht) Göttingen HR B 3160
Managing directors: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - your embedded linux partner

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-24 15:55       ` Daniel Glöckner
@ 2022-02-24 16:03         ` Miquel Raynal
  2022-02-24 18:17           ` Daniel Glöckner
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2022-02-24 16:03 UTC (permalink / raw)
  To: Daniel Glöckner; +Cc: Han Xu, linux-mtd, Lothar Waßmann, Brian Norris

Hi Daniel,

dg@emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:

> On 24.02.22 at 16:29, Miquel Raynal wrote:
> > dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:  
> >> On 22.02.22 at 23:02, Han Xu wrote:
> >>> Could you please describe more details about what kind of error, how to
> >>> reproduce it and on which kernel version?    
> >>
> >> You need a flash that has one bad block where programming the BBM sets
> >> NAND_STATUS_FAIL in its status register. The latest kernels should still
> >> have problems when this happens in a UBI.  
> > 
> > I believe we should try to tackle "why" this happens rather than try to
> > work around its consequences. Can you give more details about why we get
> > this status?  
> 
> Uhm, the block is bad, broken. It shows the same behavior even after
> power cycling. The other blocks are ok. I don't think it is our fault
> that it died so early.

But why after a power cycle are we trying to write the BBM? Is it
that there are too many ECC errors and so when reading the block it
is declared bad and the system tries to set the BBM/BBT bit? Or is it
already marked bad somewhere and something silly happens which at
some point tries to re-write the BBM?

Are you using fastmap? Do you use a BBT?

Thanks,
Miquèl

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-24 16:03         ` Miquel Raynal
@ 2022-02-24 18:17           ` Daniel Glöckner
  2022-03-14 15:45             ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Glöckner @ 2022-02-24 18:17 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Han Xu, linux-mtd, Lothar Waßmann, Brian Norris

Hi Miquel,

On 24.02.22 at 17:03, Miquel Raynal wrote:
> dg@emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:
>> On 24.02.22 at 16:29, Miquel Raynal wrote:
>>> dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:  
>>>> On 22.02.22 at 23:02, Han Xu wrote:
>>>>> Could you please describe more details about what kind of error, how to
>>>>> reproduce it and on which kernel version?    
>>>>
>>>> You need a flash that has one bad block where programming the BBM sets
>>>> NAND_STATUS_FAIL in its status register. The latest kernels should still
>>>> have problems when this happens in a UBI.  
>>>
>>> I believe we should try to tackle "why" this happens rather than try to
>>> work around its consequences. Can you give more details about why we get
>>> this status?  
>>
>> Uhm, the block is bad, broken. It shows the same behavior even after
>> power cycling. The other blocks are ok. I don't think it is our fault
>> that it died so early.
> 
> But why after a power cycle are we trying to write the BBM?

I did not want to imply that Linux tries to write the block after every
power cycle. UBI notices that the block is broken once and manages to
mark it as bad in the BBT, so after power cycle it will not try to write
to that block again. What I wanted to say is that manual testing of the
block after power cycling shows that the block remains unusable.

The problem is that UBI switches to read-only mode after it marked the
block as bad in the BBT because the redundant BBM in the OOB of the
block could not be written. And we don't want to get into a situation
where we have to reboot the system, especially if it is because of
something we don't need.

We could change nand_block_markbad_lowlevel to return success as long
as updating the BBT succeeds, if you think that this is the correct
approach.
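A simplified userspace model of the error propagation described in this message may help compare the two behaviors. It is a sketch, not the kernel source: struct markbad_model and the model_* functions are hypothetical, the erase step and device locking are omitted, and the step results are injected rather than coming from real NAND operations. The "current" variant follows the behavior described above (a failed BBM write is reported even though the BBT update succeeded); the "proposed" variant reports only the BBT result when a BBT is in use.

```c
#include <errno.h>
#include <stdbool.h>

/* Injected outcomes of the two steps (hypothetical model type). */
struct markbad_model {
	bool no_oob_bbm;	/* models NAND_BBT_NO_OOB_BBM being set */
	bool has_bbt;		/* models a BBT being in use */
	int bbm_result;		/* 0 or -errno from writing the OOB marker */
	int bbt_result;		/* 0 or -errno from updating the BBT */
};

/* As described: a BBM error is reported even if the BBT update worked. */
static int model_markbad_current(const struct markbad_model *m)
{
	int ret = 0;

	if (!m->no_oob_bbm)
		ret = m->bbm_result;		/* OOB marker is attempted first */

	if (m->has_bbt) {
		int res = m->bbt_result;	/* the BBT is updated either way */

		if (!ret)
			ret = res;		/* but only reported if the BBM worked */
	}

	return ret;
}

/* Proposed: with a BBT, only a failed BBT update is reported. */
static int model_markbad_proposed(const struct markbad_model *m)
{
	int bbm_ret = m->no_oob_bbm ? 0 : m->bbm_result;

	if (m->has_bbt)
		return m->bbt_result;	/* a BBM failure is ignored */

	return bbm_ret;			/* no BBT: the OOB marker is all there is */
}
```

In this model, setting the no_oob_bbm flag and adopting the proposed return logic both turn the "BBM failed, BBT succeeded" case into a success, which is the outcome that would keep UBI out of read-only mode.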

> Is it that there are too many ECC errors and so when reading the block it
> is declared bad and the system tries to set the BBM/BBT bit? Or is it
> already marked bad somewhere and something silly happens which at
> some point tries to re-write the BBM?

I guess when programming the BBM fails with an error in the status
register the same probably happened when UBI tried to write data to the
block.

> Are you using fastmap? Do you use a BBT?

Yes and yes. The fact that we use a BBT is why we want to set
NAND_BBT_NO_OOB_BBM.

Best regards,

  Daniel

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-02-24 18:17           ` Daniel Glöckner
@ 2022-03-14 15:45             ` Miquel Raynal
  2022-03-15  7:06               ` Lothar Waßmann
  0 siblings, 1 reply; 10+ messages in thread
From: Miquel Raynal @ 2022-03-14 15:45 UTC (permalink / raw)
  To: Daniel Glöckner; +Cc: Han Xu, linux-mtd, Lothar Waßmann, Brian Norris

Hi Daniel,

Sorry for the delay.

dg@emlix.com wrote on Thu, 24 Feb 2022 19:17:43 +0100:

> Hi Miquel,
> 
> On 24.02.22 at 17:03, Miquel Raynal wrote:
> > dg@emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:  
> >> On 24.02.22 at 16:29, Miquel Raynal wrote:
> >>> dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:    
> >>>> On 22.02.22 at 23:02, Han Xu wrote:
> >>>>> Could you please describe more details about what kind of error, how to
> >>>>> reproduce it and on which kernel version?      
> >>>>
> >>>> You need a flash that has one bad block where programming the BBM sets
> >>>> NAND_STATUS_FAIL in its status register. The latest kernels should still
> >>>> have problems when this happens in a UBI.    
> >>>
> >>> I believe we should try to tackle "why" this happens rather than try to
> >>> work around its consequences. Can you give more details about why we get
> >>> this status?    
> >>
> >> Uhm, the block is bad, broken. It shows the same behavior even after
> >> power cycling. The other blocks are ok. I don't think it is our fault
> >> that it died so early.  
> > 
> > But why after a power cycle are we trying to write the BBM?  
> 
> I did not want to imply that Linux tries to write the block after every
> power cycle. UBI notices that the block is broken once and manages to
> mark it as bad in the BBT, so after power cycle it will not try to write
> to that block again. What I wanted to say is that manual testing of the
> block after power cycling shows that the block remains unusable.
> 
> The problem is that UBI switches to read-only mode after it marked the
> block as bad in the BBT because the redundant BBM in the OOB of the
> block could not be written.

I think I understand your situation better now.

So here is our problem: why can't we write the OOB? If there is a good
reason this cannot happen, then we can provide the NAND_BBT_NO_OOB_BBM
flag. Otherwise we should find the root cause.

> And we don't want to get into a situation
> where we have to reboot the system, especially if it is because of
> something we don't need.
> 
> We could change nand_block_markbad_lowlevel to return success as long
> as updating the BBT succeeds, if you think that this is the correct
> approach.

That is not a correct approach if we did not ask to bypass writing
BBMs explicitly.

> > Is it that there are too many ECC errors and so when reading the block it
> > is declared bad and the system tries to set the BBM/BBT bit? Or is it
> > already marked bad somewhere and something silly happens which at
> > some point tries to re-write the BBM?  
> 
> I guess when programming the BBM fails with an error in the status
> register

Why would a (without-ECC) program operation fail? I guess this is what
we should understand first.

> the same probably happened when UBI tried to write data to the
> block.
> 
> > Are you using fastmap? Do you use a BBT?
> 
> Yes and yes. The fact that we use a BBT is why we want to set
> NAND_BBT_NO_OOB_BBM.
> 
> Best regards,
> 
>   Daniel
> 

Thanks,
Miquèl

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-03-14 15:45             ` Miquel Raynal
@ 2022-03-15  7:06               ` Lothar Waßmann
  2022-03-15  8:34                 ` Miquel Raynal
  0 siblings, 1 reply; 10+ messages in thread
From: Lothar Waßmann @ 2022-03-15  7:06 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Daniel Glöckner, Han Xu, linux-mtd, Brian Norris

Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> Hi Daniel,
> 
> Sorry for the delay.
> 
> dg@emlix.com wrote on Thu, 24 Feb 2022 19:17:43 +0100:
> 
> > Hi Miquel,
> > 
> > On 24.02.22 at 17:03, Miquel Raynal wrote:
> > > dg@emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:    
> > >> On 24.02.22 at 16:29, Miquel Raynal wrote:
> > >>> dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:      
> > >>>> On 22.02.22 at 23:02, Han Xu wrote:
> > >>>>> Could you please describe more details about what kind of error, how to
> > >>>>> reproduce it and on which kernel version?
> > >>>>
> > >>>> You need a flash that has one bad block where programming the
> > >>>> BBM sets NAND_STATUS_FAIL in its status register. The latest
> > >>>> kernels should still have problems when this happens in a
> > >>>> UBI.      
> > >>>
> > >>> I believe we should try to tackle "why" this happens rather than
> > >>> try to work around its consequences. Can you give more details
> > >>> about why we get this status?      
> > >>
> > >> Uhm, the block is bad, broken. It shows the same behavior even
> > >> after power cycling. The other blocks are ok. I don't think it
> > >> is our fault that it died so early.    
> > > 
> > > But why after a power cycle are we trying to write the BBM?    
> > 
> > I did not want to imply that Linux tries to write the block after
> > every power cycle. UBI notices that the block is broken once and
> > manages to mark it as bad in the BBT, so after power cycle it will
> > not try to write to that block again. What I wanted to say is that
> > manual testing of the block after power cycling shows that the
> > block remains unusable.
> > 
> > The problem is that UBI switches to read-only mode after it marked
> > the block as bad in the BBT because the redundant BBM in the OOB of
> > the block could not be written.  
> 
> I think I understand your situation better now.
> 
> So here is our problem: why can't we write the OOB? If there is a
> good reason this cannot happen, then we can provide the
> NAND_BBT_NO_OOB_BBM flag. Otherwise we should find the root cause.
> 
> > And we don't want to get into a situation
> > where we have to reboot the system, especially if it is because of
> > something we don't need.
> > 
> > We could change nand_block_markbad_lowlevel to return success as
> > long as updating the BBT succeeds, if you think that this is the
> > correct approach.  
> 
> That is not a correct approach if we did not ask to bypass writing
> BBMs explicitly.
>
The BBM in the OOB area is a "Factory Bad Block Marker" where the
manufacturer marks initially bad blocks. There is no guarantee that the
BBM can be written on a block that turned bad later on.
If a block turned BAD during use it is completely useless to try writing
anything to it. Depending on the nature of the NAND error that turned
the block bad, trying to write that block may also affect random other
blocks.


Lothar Waßmann
-- 
___________________________________________________________

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Managing Director: Matthias Kaussen
Commercial register: Local Court (Amtsgericht) Aachen, HRB 4996

www.karo-electronics.de | info@karo-electronics.de
___________________________________________________________

* Re: Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide?
  2022-03-15  7:06               ` Lothar Waßmann
@ 2022-03-15  8:34                 ` Miquel Raynal
  0 siblings, 0 replies; 10+ messages in thread
From: Miquel Raynal @ 2022-03-15  8:34 UTC (permalink / raw)
  To: Lothar Waßmann; +Cc: Daniel Glöckner, Han Xu, linux-mtd, Brian Norris

Hi Lothar,

LW@KARO-electronics.de wrote on Tue, 15 Mar 2022 08:06:02 +0100:

> Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> 
> > Hi Daniel,
> > 
> > Sorry for the delay.
> > 
> > dg@emlix.com wrote on Thu, 24 Feb 2022 19:17:43 +0100:
> >   
> > > Hi Miquel,
> > > 
> > > On 24.02.22 at 17:03, Miquel Raynal wrote:
> > > > dg@emlix.com wrote on Thu, 24 Feb 2022 16:55:27 +0100:      
> > > >> On 24.02.22 at 16:29, Miquel Raynal wrote:
> > > >>> dg@emlix.com wrote on Wed, 23 Feb 2022 11:59:02 +0100:        
> > > >>>> On 22.02.22 at 23:02, Han Xu wrote:
> > > >>>>> Could you please describe more details about what kind of error, how to
> > > >>>>> reproduce it and on which kernel version?
> > > >>>>
> > > >>>> You need a flash that has one bad block where programming the
> > > >>>> BBM sets NAND_STATUS_FAIL in its status register. The latest
> > > >>>> kernels should still have problems when this happens in a
> > > >>>> UBI.        
> > > >>>
> > > >>> I believe we should try to tackle "why" this happens rather than
> > > >>> try to work around its consequences. Can you give more details
> > > >>> about why we get this status?        
> > > >>
> > > >> Uhm, the block is bad, broken. It shows the same behavior even
> > > >> after power cycling. The other blocks are ok. I don't think it
> > > >> is our fault that it died so early.      
> > > > 
> > > > But why after a power cycle are we trying to write the BBM?      
> > > 
> > > I did not want to imply that Linux tries to write the block after
> > > every power cycle. UBI notices that the block is broken once and
> > > manages to mark it as bad in the BBT, so after power cycle it will
> > > not try to write to that block again. What I wanted to say is that
> > > manual testing of the block after power cycling shows that the
> > > block remains unusable.
> > > 
> > > The problem is that UBI switches to read-only mode after it marked
> > > the block as bad in the BBT because the redundant BBM in the OOB of
> > > the block could not be written.    
> > 
> > I think I understand your situation better now.
> > 
> > So here is our problem: why can't we write the OOB? If there is a
> > good reason this cannot happen, then we can provide the
> > NAND_BBT_NO_OOB_BBM flag. Otherwise we should find the root cause.
> >   
> > > And we don't want to get into a situation
> > > where we have to reboot the system, especially if it is because of
> > > something we don't need.
> > > 
> > > We could change nand_block_markbad_lowlevel to return success as
> > > long as updating the BBT succeeds, if you think that this is the
> > > correct approach.    
> > 
> > That is not a correct approach if we did not ask to bypass writing
> > BBMs explicitly.
> >  
> The BBM in the OOB area is a "Factory Bad Block Marker" where the
> manufacturer marks initially bad blocks. There is no guarantee that the
> BBM can be written on a block that turned bad later on.

Writing a BBM means programming one byte to 0. We don't care about the
other bytes in the page, so we don't really care if other bits flip
during this operation. Worst case scenario: none of the bits in the BBM
are programmed (quite unlikely, given that it's probably the "data"
which triggered the errors in the first place, and even less likely
knowing that only the first page of the block will receive the marker,
while it may not be that page which showed errors in the first place).

Anyway, let's assume the bad block marker cannot be programmed. Why
would the raw PROGRAM PAGE operation fail? No read-back happens
automatically. We need to understand why the NAND op failed in the
first place. I don't think it is related to the page being bad, but
rather to the specific 1-byte write that the driver tries to do. I
believe this issue is gpmi-specific.
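The argument above can be made concrete with a small model of a raw one-byte marker write, under stated simplifications: programming can only clear bits (the stored byte becomes the old value ANDed with the new one), the host learns about a failure only through the FAIL bit of the status register (the same bit as NAND_STATUS_FAIL, 0x01) without reading the data back, and a marker byte counts as bad whenever it is not 0xFF, as with an 8-bit marker check. The model_* names and the "stuck" mask are hypothetical; this is not gpmi driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_STATUS_FAIL	0x01	/* same bit as NAND_STATUS_FAIL */

/*
 * Program 'data' into 'cell'. Programming can only clear bits; bits set
 * in 'stuck' model locations that fail to program and therefore stay 1.
 */
static uint8_t model_program_byte(uint8_t cell, uint8_t data, uint8_t stuck)
{
	return cell & (data | stuck);
}

/* Host view: success is decided by the status register alone, no read-back. */
static int model_program_status(uint8_t status)
{
	return (status & MODEL_STATUS_FAIL) ? -EIO : 0;
}

/* 8-bit marker check: any value other than 0xFF marks the block bad. */
static bool model_marker_is_bad(uint8_t marker)
{
	return marker != 0xFF;
}
```

In this model the marker is lost only if every bit of the byte fails to program, and an error can only be reported when the chip sets the FAIL bit, so the open question is why the chip sets that bit for this particular write.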

> If a block turned BAD during use it is completely useless to try writing
> anything to it.

Not necessarily, in particular if UBI decided to mark it bad. It does
not mean the block has worn out completely, just that the block is
about to wear out.

> Depending on the nature of the NAND error that turned
> the block bad, trying to write that block may also affect random other
> blocks.

I don't think this can happen on SLC. And on MLC I believe it is
'correctly' handled thanks to the known pairing scheme.

Thanks,
Miquèl

Thread overview: 10+ messages
2022-02-21 19:00 Make NAND_BBT_NO_OOB_BBM configurable or let the gpmi driver decide? Daniel Glöckner
2022-02-22 22:02 ` Han Xu
2022-02-23 10:59   ` Daniel Glöckner
2022-02-24 15:29     ` Miquel Raynal
2022-02-24 15:55       ` Daniel Glöckner
2022-02-24 16:03         ` Miquel Raynal
2022-02-24 18:17           ` Daniel Glöckner
2022-03-14 15:45             ` Miquel Raynal
2022-03-15  7:06               ` Lothar Waßmann
2022-03-15  8:34                 ` Miquel Raynal
