* UBIFS: recovery of master node
@ 2015-07-16 13:22 Andrea Scian
  2015-07-16 15:29 ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Andrea Scian @ 2015-07-16 13:22 UTC (permalink / raw)
  To: mtd_mailinglist


Dear all,

I'm trying to understand how UBIFS recovers the master node from a corrupted 
flash (e.g. after a power cut during a write of one of the two master node 
copies, or due to flash corruption itself).

IIUC (please correct me if I'm wrong) UBIFS stores two copies of the master 
node, in LEB1 and LEB2 (LEB0 is reserved for the superblock), ref. 
http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf.
ubifs_recover_master_node() first tries to read LEB1 and, only if 
get_master_node() does NOT return an error, it then tries to read 
LEB2 (ref. fs/ubifs/recovery.c).
I'm working with a 3.10-class kernel, but I've found nearly the same 
code in mainline.
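
The control flow described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: read_master_leb() and recover_master() are hypothetical stand-ins for get_master_node() and ubifs_recover_master_node(), and leb_status[] is a made-up way to inject errors. Only the early-return structure is taken from the description.

```c
#include <assert.h>

#define MST_LEB1 1
#define MST_LEB2 2

/* Demo-only state (assumption): error code to return for each LEB. */
static int leb_status[3];
/* Demo-only flag: set once LEB2 has been looked at. */
static int leb2_was_read;

/* Hypothetical stand-in for get_master_node(): 0 on success, <0 on error. */
static int read_master_leb(int lnum)
{
	assert(lnum == MST_LEB1 || lnum == MST_LEB2);
	if (lnum == MST_LEB2)
		leb2_was_read = 1;
	return leb_status[lnum];
}

/* Hypothetical stand-in for the recovery entry point. */
static int recover_master(void)
{
	int err;

	err = read_master_leb(MST_LEB1);
	if (err)
		return err;	/* LEB2 is never looked at on this path */

	err = read_master_leb(MST_LEB2);
	if (err)
		return err;

	return 0;
}
```

With leb_status[MST_LEB1] set to a negative value, recover_master() returns that value and leb2_was_read stays 0, which is the behaviour question 1) below is about.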

On my test-bed, the recovery fails because get_master_node(LEB1) fails 
(exactly here 
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ubifs/recovery.c#n184).
I'm hacking around the code to see what's really happening under the 
hood but, as I'm a UBIFS newbie, I would like to ask:

1) why, if get_master_node(LEB1) fails, don't we ALWAYS look at 
get_master_node(LEB2)? I think we should try to read LEB2 even if 
something really bad happens to LEB1... or not?

2) if I bypass the get_master_node(LEB1) return value, I find that 
get_master_node(LEB2) fails too, for the same reason as LEB1 (see 
above). IIUC we check for empty space because master node pages get 
written without being erased every time, but I'm still studying this 
topic ;-)
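
For illustration, an empty-space check of this kind can be sketched as follows. This is a userspace model, not the code in fs/ubifs/recovery.c: looks_erased() and tail_is_clean() are names invented here, and the layout (master nodes in the first `used` bytes of the LEB, erased space after them) is an assumption based on the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every byte in buf[0..len) is 0xFF, i.e. looks erased. */
static int looks_erased(const uint8_t *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return 0;
	return 1;
}

/*
 * Hypothetical check: the first 'used' bytes of the LEB hold master
 * nodes; everything from 'used' up to 'leb_size' must still be erased.
 */
static int tail_is_clean(const uint8_t *leb, size_t leb_size, size_t used)
{
	assert(used <= leb_size);
	return looks_erased(leb + used, leb_size - used);
}
```

A single flipped bit anywhere in the tail (for example 0xFB instead of 0xFF) is enough to make tail_is_clean() return 0, which matches the failure seen on both LEBs.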

If I bypass that check too, I can mount UBIFS and everything inside the 
FS is there but, of course, I'm sure I'm doing something that may be wrong..

WDYT?

Thanks in advance and kind regards,

-- 

Andrea SCIAN

DAVE Embedded Systems


* Re: UBIFS: recovery of master node
  2015-07-16 13:22 UBIFS: recovery of master node Andrea Scian
@ 2015-07-16 15:29 ` Richard Weinberger
  2015-07-16 15:50   ` Andrea Scian
  2015-07-17  6:58   ` Andrea Scian
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Weinberger @ 2015-07-16 15:29 UTC (permalink / raw)
  To: Andrea Scian; +Cc: mtd_mailinglist

Andrea,

On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>
> Dear all,
>
> I'm trying to understand how UBIFS recovers master node from a corrupted
> flash (e.g. after power cut during one of the two of master node or due
> flash corruption itself).
>
> IIUC (please correct me if I'm wrong) UBIFS store two copies of master node,
> in LEB1 and LEB2 (LEB0 is reserved for superblock), ref.
> http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf.
> Inside ubifs_recover_master_node() try to read first LEB1 and, only in case
> of get_master_node() does NOT return an error, it tries to read LEB2 (ref.
> fs/ubifs/recovery.c)
> I'm working with a 3.10 class kernel, but I've found nearly the same code on
> mainline.
>
> On my test-bed, the recovery fails because get_master_node(LEB1) fails
> (exactly here
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ubifs/recovery.c#n184)
> I'm hacking around the code to see what's really happening under the wood,
> but I'm a UBIFS newbie I would like to ask:
>
> 1) why if get_master_node(LEB1) fails we don't ALWAYS look at
> get_master_node(LEB2)? I think we should try to read LEB2 even if something
> really bad happens to LEB1.. or not?

AFAIK the idea was that the second LEB is used only upon plausible errors.
If reading LEB1 fails due to an internal MTD error, UBIFS gives up.

> 2) if I bypass the get_master_node(LEB1) return value, I found that
> get_master_node(LEB2) fails too, for the same reason of LEB1 (see above).
> IIUC we check about empty space because master node pages get written
> without being erased every time, but I'm still studying this topic ;-)
>
> If I bypass that check too, I can mount UBIFS and everything inside the FS
> is there but, of course, I'm sure I'm doing something that may be wrong..
>
> WDYT?

So, you're facing bitflips on empty space?
Is this MLC NAND?

-- 
Thanks,
//richard


* Re: UBIFS: recovery of master node
  2015-07-16 15:29 ` Richard Weinberger
@ 2015-07-16 15:50   ` Andrea Scian
  2015-07-17  6:58   ` Andrea Scian
  1 sibling, 0 replies; 10+ messages in thread
From: Andrea Scian @ 2015-07-16 15:50 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: mtd_mailinglist


Dear Richard,

On 16/07/2015 17:29, Richard Weinberger wrote:
> Andrea,
>
> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>> Dear all,
>>
>> I'm trying to understand how UBIFS recovers master node from a corrupted
>> flash (e.g. after power cut during one of the two of master node or due
>> flash corruption itself).
>>
>> IIUC (please correct me if I'm wrong) UBIFS store two copies of master node,
>> in LEB1 and LEB2 (LEB0 is reserved for superblock), ref.
>> http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf.
>> Inside ubifs_recover_master_node() try to read first LEB1 and, only in case
>> of get_master_node() does NOT return an error, it tries to read LEB2 (ref.
>> fs/ubifs/recovery.c)
>> I'm working with a 3.10 class kernel, but I've found nearly the same code on
>> mainline.
>>
>> On my test-bed, the recovery fails because get_master_node(LEB1) fails
>> (exactly here
>> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/ubifs/recovery.c#n184)
>> I'm hacking around the code to see what's really happening under the wood,
>> but I'm a UBIFS newbie I would like to ask:
>>
>> 1) why if get_master_node(LEB1) fails we don't ALWAYS look at
>> get_master_node(LEB2)? I think we should try to read LEB2 even if something
>> really bad happens to LEB1.. or not?
> AFAIK the idea was that only upon plausible errors the second LEB will be used.
> If reading LEB1 fails due to an internal MTD error UBIFS gives up.

Understood. I think you already told me that UBI/UBIFS assume that empty 
flash always stays empty.

However, this seems to me a heavy limitation: if we have another 
master node, why don't we always try to use it?
Again, I'm just trying to understand the recovery code and to 
improve it (if possible ;-) )

>> 2) if I bypass the get_master_node(LEB1) return value, I found that
>> get_master_node(LEB2) fails too, for the same reason of LEB1 (see above).
>> IIUC we check about empty space because master node pages get written
>> without being erased every time, but I'm still studying this topic ;-)
>>
>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>
>> WDYT?
> So, you're facing bitflips on empty space?

Yes, and this seems to be the same behavior I saw previously regarding the 
factory bad block marker (ref. 
http://lists.infradead.org/pipermail/linux-mtd/2015-March/058151.html)

> Is this MLC NAND?

Yes, I'm doing some heavy stress tests on it.

Kind Regards,

-- 

Andrea SCIAN

DAVE Embedded Systems


* Re: UBIFS: recovery of master node
  2015-07-16 15:29 ` Richard Weinberger
  2015-07-16 15:50   ` Andrea Scian
@ 2015-07-17  6:58   ` Andrea Scian
  2015-07-17  7:24     ` Richard Weinberger
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Scian @ 2015-07-17  6:58 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: mtd_mailinglist

On 16/07/2015 17:29, Richard Weinberger wrote:
> Andrea,
>
> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>>
>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>
>> WDYT?
>
> So, you're facing bitflips on empty space?

Another UBI/UBIFS "implementation" question: are there other 
places, apart from get_master_node(), where UBIFS checks for empty space 
corruption and fails badly if something is wrong?

TIA,

-- 

Andrea SCIAN

DAVE Embedded Systems


* Re: UBIFS: recovery of master node
  2015-07-17  6:58   ` Andrea Scian
@ 2015-07-17  7:24     ` Richard Weinberger
  2015-07-17  8:04       ` Andrea Scian
  2015-07-17 11:38       ` Artem Bityutskiy
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Weinberger @ 2015-07-17  7:24 UTC (permalink / raw)
  To: Andrea Scian; +Cc: mtd_mailinglist

On 17.07.2015 at 08:58, Andrea Scian wrote:
> On 16/07/2015 17:29, Richard Weinberger wrote:
>> Andrea,
>>
>> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>>>
>>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>>
>>> WDYT?
>>
>> So, you're facing bitflips on empty space?
> 
> Another UBI/UBIFS "implementation" question: are there some other places, apart from get_master_node(), where UBIFS check empty space corruption and fails badly if something wrong?

Having non-corrupted empty space is a fundamental requirement of UBIFS.
If you patch it out, you'll hurt UBIFS's ability to recover from a power cut.
Someone has already tried to do so.

I know, cheap modern NAND, especially MLC, seems to show bitflips on empty pages too.
Not all NAND controllers can deal with that; some will just return an uncorrectable
ECC error upon reading.
IMHO the right place to deal with that is the MTD core.
Please search the archives; Brian posted some patches some time ago.

Thanks,
//richard


* Re: UBIFS: recovery of master node
  2015-07-17  7:24     ` Richard Weinberger
@ 2015-07-17  8:04       ` Andrea Scian
  2015-07-17  8:10         ` Richard Weinberger
  2015-07-17 11:38       ` Artem Bityutskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Scian @ 2015-07-17  8:04 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: mtd_mailinglist


Dear Richard,

On 17/07/2015 09:24, Richard Weinberger wrote:
> On 17.07.2015 at 08:58, Andrea Scian wrote:
>> On 16/07/2015 17:29, Richard Weinberger wrote:
>>> Andrea,
>>>
>>> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>
>>>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>>>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>>>
>>>> WDYT?
>>>
>>> So, you're facing bitflips on empty space?
>>
>> Another UBI/UBIFS "implementation" question: are there some other places,
>> apart from get_master_node(), where UBIFS check empty space corruption
>> and fails badly if something wrong?
>
> Having non-corrupted empty space is a fundamental requirement of UBIFS.
> If you patch it out you'll hurt UBIFS's ability to recover from a power cut.
> Someone tried to do so already.

Thanks, these are the internals of UBIFS I'm not aware of, which is why 
I'm asking the experts :-)

> I know, cheap modern NAND, especially MLC seems to show bitflips also on empty pages.
> Not all NAND controllers can deal with that and will just return an uncorrectable ECC error
> upon reading.

Is there any NAND controller able to do so? ;-)

> IMHO the right place to deal with that is MTD core.

I agree with you; however, I'm handling it at the lowest level, inside the 
NAND controller.
I know that having this code in the MTD NAND layer would allow us to 
have a "controller independent" implementation; however, MTD sees only the 
bigger picture: for example, MTD sees only a NAND page (4k in my case), 
while the NAND controller usually applies ECC on smaller sections (1k in 
my case), and that is where we have the right threshold to apply 
(ecc_strength or something a bit smaller if you prefer).

Finding the right threshold is, IMHO, the real trick.
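
As an illustration of the per-section threshold idea, here is a minimal userspace sketch. It is not taken from any of the patches mentioned in this thread; count_zero_bits() and fixup_erased_page() are hypothetical names, and the policy (accept a page as erased only if every ECC step has at most `threshold` zero bits) is an assumption about how such a check could look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count bits that read as 0 in a chunk that is expected to be all 0xFF. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i];	/* set bits = flipped bits */

		while (inv) {
			zeros += inv & 1u;
			inv >>= 1;
		}
	}
	return zeros;
}

/*
 * Walk the page one ECC step at a time. If every step has at most
 * 'threshold' zero bits, treat the page as erased, rewrite the buffer
 * to 0xFF and return the worst per-step flip count. Return -1 and
 * leave the buffer untouched otherwise.
 */
static int fixup_erased_page(uint8_t *page, size_t page_size,
			     size_t step_size, unsigned int threshold)
{
	unsigned int max_flips = 0;
	size_t off;

	assert(step_size > 0 && page_size % step_size == 0);

	for (off = 0; off < page_size; off += step_size) {
		unsigned int flips = count_zero_bits(page + off, step_size);

		if (flips > threshold)
			return -1;	/* too many zeros: real data or real damage */
		if (flips > max_flips)
			max_flips = flips;
	}

	memset(page, 0xFF, page_size);
	return (int)max_flips;
}
```

With a 4k page and 1k steps, the count is taken per 1k section, so flips concentrated in one section are judged against the same limit the ECC would apply to that section, rather than being averaged over the whole page.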

> Please search the archives, Brian posted some patches some time ago.

Thanks for pointing this out.
I've found some patches from one year ago from Huang Shijie (@freescale) 
which look very close to mine. However, I cannot see that they were ever 
included in mainline.

I'll continue my research.

Thanks and kind regards,

-- 

Andrea SCIAN

DAVE Embedded Systems


* Re: UBIFS: recovery of master node
  2015-07-17  8:04       ` Andrea Scian
@ 2015-07-17  8:10         ` Richard Weinberger
  2015-07-17  8:59           ` Richard Genoud
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2015-07-17  8:10 UTC (permalink / raw)
  To: Andrea Scian; +Cc: mtd_mailinglist

On 17.07.2015 at 10:04, Andrea Scian wrote:
> 
> Dear Richard,
> 
> On 17/07/2015 09:24, Richard Weinberger wrote:
>> On 17.07.2015 at 08:58, Andrea Scian wrote:
>>> On 16/07/2015 17:29, Richard Weinberger wrote:
>>>> Andrea,
>>>>
>>>> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>>
>>>>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>>>>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>>>>
>>>>> WDYT?
>>>>
>>>> So, you're facing bitflips on empty space?
>>>
>>> Another UBI/UBIFS "implementation" question: are there some other places,
>>> apart from get_master_node(),
>>> where UBIFS check empty space corruption and fails
>>> badly if something wrong?
>>
>> Having non-corrupted empty space is a fundamental requirement of UBIFS.
>> If you patch it out you'll hurt UBIFS's ability to recover from a power cut.
>> Someone tried to do so already.
> 
> Thanks, this are the internals of UBIFS I'm not aware of, and for this I'm asking the experts :-)
> 
>> I know, cheap modern NAND, especially MLC seems to show bitflips also on empty pages.
>> Not all NAND controllers can deal with that and will just return an uncorrectable ECC error
>> upon reading.
> 
> Is the any NAND controller able to do so? ;-)

TBH, I don't know. :-)

>> IMHO the right place to deal with that is MTD core.
> 
> I agree with you, however I'm handling it at lowest level, inside the NAND controller.
> I know that having this code into the MTD NAND layer will allow us to have a "controller independent" implementation, however MTD see only a bigger picture: for example MTD sees
> only a NAND page (4k in my case) while the NAND controller usually apply ECC on smaller (1k in my case) section and here we have the right threshold to apply (ecc_strength or
> something a bit smaller if you prefer).
> 
> Finding the right threshold is, IMHO, the real trick.

Another problem is that on some controllers ECC simply does not work for empty
pages. So, you'll get an uncorrectable ECC error upon the first bitflip.

> Thanks for point this out.
> I've found some patches from one year ago from Huang Shijie (@freescale) which looks very close to mine. However I cannot see any inclusion in the mainline.

None of the proposed solutions went mainline so far. :(

Thanks,
//richard


* Re: UBIFS: recovery of master node
  2015-07-17  8:10         ` Richard Weinberger
@ 2015-07-17  8:59           ` Richard Genoud
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Genoud @ 2015-07-17  8:59 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Andrea Scian, mtd_mailinglist

2015-07-17 10:10 GMT+02:00 Richard Weinberger <richard@nod.at>:
> On 17.07.2015 at 10:04, Andrea Scian wrote:
>>
>> Dear Richard,
>>
>> On 17/07/2015 09:24, Richard Weinberger wrote:
>>> On 17.07.2015 at 08:58, Andrea Scian wrote:
>>>> On 16/07/2015 17:29, Richard Weinberger wrote:
>>>>> Andrea,
>>>>>
>>>>> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> wrote:
>>>>>>
>>>>>> If I bypass that check too, I can mount UBIFS and everything inside the FS
>>>>>> is there but, of course, I'm sure I'm doing something that may be wrong..
>>>>>>
>>>>>> WDYT?
>>>>>
>>>>> So, you're facing bitflips on empty space?
>>>>
>>>> Another UBI/UBIFS "implementation" question: are there some other places,
>>>> apart from get_master_node(),
>>>> where UBIFS check empty space corruption and fails
>>>> badly if something wrong?
>>>
>>> Having non-corrupted empty space is a fundamental requirement of UBIFS.
>>> If you patch it out you'll hurt UBIFS's ability to recover from a power cut.
>>> Someone tried to do so already.
>>
>> Thanks, this are the internals of UBIFS I'm not aware of, and for this I'm asking the experts :-)
>>
>>> I know, cheap modern NAND, especially MLC seems to show bitflips also on empty pages.
>>> Not all NAND controllers can deal with that and will just return an uncorrectable ECC error
>>> upon reading.
>>
>> Is the any NAND controller able to do so? ;-)
>
> TBH, I don't know. :-)
The Atmel NAND controllers since SAMA5 can do that (and it's quite easy to do):
all they have to do is set an XOR value at the end of the ECC
computation so that ECC(blank_page)==0xFFFFFFFFF...
(maybe it's more complicated than that, but that's the idea).
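
The XOR idea can be shown with a toy example. Everything here is an assumption made for illustration: toy_ecc() is a made-up 32-bit check code, not the Atmel PMECC algorithm, and blank_mask() and masked_ecc() are hypothetical helpers. The only point is that XORing a fixed constant into the result makes the code of an all-0xFF chunk come out as all 0xFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit check code; a stand-in for the controller's real ECC. */
static uint32_t toy_ecc(const uint8_t *buf, size_t len)
{
	uint32_t ecc = 0;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		ecc = ((ecc << 5) | (ecc >> 27)) ^ buf[i];	/* rotate, then mix in byte */
	return ecc;
}

/* Constant chosen so that the masked code of 'len' 0xFF bytes is 0xFFFFFFFF. */
static uint32_t blank_mask(size_t len)
{
	uint32_t ecc = 0;
	size_t i;

	for (i = 0; i < len; i++)
		ecc = ((ecc << 5) | (ecc >> 27)) ^ 0xFFu;
	return ~ecc;
}

/* Check code with the constant XORed in at the end of the computation. */
static uint32_t masked_ecc(const uint8_t *buf, size_t len)
{
	return toy_ecc(buf, len) ^ blank_mask(len);
}
```

For a blank chunk, masked_ecc() equals toy_ecc(blank) ^ ~toy_ecc(blank), which is 0xFFFFFFFF, so an erased page and its erased ECC bytes agree with each other.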

As far as I've seen, the current implementation (in atmel_nand for
chips >SAMA5) is buggy:
http://lxr.free-electrons.com/source/drivers/mtd/nand/atmel_nand.c#L857
If the ECC is all FF, the page is considered to be a blank page (but
it's not memset-ed to 0)
and a bitflip in the ECC area is not handled either.

I haven't had time to follow up on Brian's patch proposal yet (
http://patchwork.ozlabs.org/patch/328994/ ) but it's clear that we
have to do something to address that.
At least UBIFS screams out loud when a blank page is not blank!
Other filesystems may just write corrupted data...


Richard.


* Re: UBIFS: recovery of master node
  2015-07-17  7:24     ` Richard Weinberger
  2015-07-17  8:04       ` Andrea Scian
@ 2015-07-17 11:38       ` Artem Bityutskiy
  2015-07-17 11:43         ` Richard Weinberger
  1 sibling, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2015-07-17 11:38 UTC (permalink / raw)
  To: Richard Weinberger, Andrea Scian; +Cc: mtd_mailinglist

On Fri, 2015-07-17 at 09:24 +0200, Richard Weinberger wrote:
> On 17.07.2015 at 08:58, Andrea Scian wrote:
> > On 16/07/2015 17:29, Richard Weinberger wrote:
> > > Andrea,
> > > 
> > > On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> 
> > > wrote:
> > > > 
> > > > If I bypass that check too, I can mount UBIFS and everything 
> > > > inside the FS
> > > > is there but, of course, I'm sure I'm doing something that may 
> > > > be wrong..
> > > > 
> > > > WDYT?
> > > 
> > > So, you're facing bitflips on empty space?
> > 
> > Another UBI/UBIFS "implementation" question: are there some other 
> > places, apart from get_master_node(), where UBIFS check empty space 
> > corruption and fails badly if something wrong?
> 
> Having non-corrupted empty space is a fundamental requirement of 
> UBIFS.

I am not sure it is that fundamental. What UBIFS needs is to
distinguish between used and unused flash areas. It does this by
comparing against 0xFFs. Simple, worked fine in the past.

If the space is empty, UBIFS assumes it can write to it. UBIFS is being
paranoid and also verifies that the entire empty space contains all 0xFFs.
Also very simple, worked fine in the past.

Now if you have "corrupted empty space" (i.e., you cannot write to it),
this does not have to be the end of the world, you can clean it up by
doing the "atomic LEB change" operation. It is just not implemented,
but it could be done - matter of engineer-hours spent.

Artem.


* Re: UBIFS: recovery of master node
  2015-07-17 11:38       ` Artem Bityutskiy
@ 2015-07-17 11:43         ` Richard Weinberger
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Weinberger @ 2015-07-17 11:43 UTC (permalink / raw)
  To: dedekind1, Andrea Scian; +Cc: mtd_mailinglist

On 17.07.2015 at 13:38, Artem Bityutskiy wrote:
> On Fri, 2015-07-17 at 09:24 +0200, Richard Weinberger wrote:
>> On 17.07.2015 at 08:58, Andrea Scian wrote:
>>> On 16/07/2015 17:29, Richard Weinberger wrote:
>>>> Andrea,
>>>>
>>>> On Thu, Jul 16, 2015 at 3:22 PM, Andrea Scian <rnd4@dave-tech.it> 
>>>> wrote:
>>>>>
>>>>> If I bypass that check too, I can mount UBIFS and everything 
>>>>> inside the FS
>>>>> is there but, of course, I'm sure I'm doing something that may 
>>>>> be wrong..
>>>>>
>>>>> WDYT?
>>>>
>>>> So, you're facing bitflips on empty space?
>>>
>>> Another UBI/UBIFS "implementation" question: are there some other 
>>> places, apart from get_master_node(), where UBIFS check empty space 
>>> corruption and fails badly if something wrong?
>>
>> Having non-corrupted empty space is a fundamental requirement of 
>> UBIFS.
> 
> I am not sure it is that fundamental. What UBIFS needs is to
> distinguish between used and unused flash areas. It does this by
> comparing against 0xFFs. Simple, worked fine in the past.

Okay. This is good to know. I thought it was much more fundamental. :-)

> If the space is empty, UBIFS assumes it can write to it. UBIFS is being
> paranoid and also verifies that entire empty space contains all 0xFFs.
> Also very simple, worked fine in the past.
> 
> Now if you have "corrupted empty space" (i.e., you cannot write to it),
> this does not have to be the end of the world, you can clean it up by
> doing the "atomic LEB change" operation. It is just not implemented,
> but it could be done - matter of engineer-hours spent.

Thanks for pointing this out, Artem!

Thanks,
//richard
