* ubi vol_size and lots of bad blocks
@ 2011-10-10 12:09 Daniel Drake
  2011-10-11 11:35 ` Atlant Schmidt
  2011-10-14 11:15 ` Artem Bityutskiy
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Drake @ 2011-10-10 12:09 UTC (permalink / raw)
  To: linux-mtd

Hi,

We're still working on getting ubifs shipped on OLPC XO-1.

One outstanding issue we have is that on some laptops, when switching
from jffs2 to ubifs, the laptop simply does not boot (root fs mounting
difficulties).

One case of this is when there are a large number of bad blocks on the
disk. During boot we get:
[   76.855427] UBI error: vtbl_check: too large reserved_pebs 7850, good PEBs 7765
[   76.867878] UBI error: vtbl_check: volume table check failed: record 0, error 9

With so many bad blocks, this is likely a problematic nand or a
corrupt BBT. However, jffs2 worked in this situation, and (with many
of our laptops in remote places) it would be nice for us to figure out
how to make ubifs handle it as well.


There are other cases of this error in the archive, and people have
generally solved it by using a smaller vol_size in the ubinize config.
Am I right in saying that reserved_pebs is computed from the vol_size
specified in the ubinize config?

I guess "good PEBs" is calculated from the amount of non-bad blocks
found during the boot process.
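
To make the failing check concrete, here is a minimal Python sketch of the comparison that the "too large reserved_pebs" message reports. It assumes that reserved_pebs is roughly vol_size divided by the LEB size, rounded up; the function names are illustrative and are not UBI identifiers.

```python
def reserved_pebs_for(vol_size_bytes: int, leb_size_bytes: int) -> int:
    """PEBs a volume of the given size needs (one PEB per LEB, rounded up).

    Assumption: this is only an approximation of how ubinize derives
    reserved_pebs from vol_size.
    """
    return -(-vol_size_bytes // leb_size_bytes)  # ceiling division


def volume_fits(reserved_pebs: int, good_pebs: int) -> bool:
    """A volume may not reserve more PEBs than the device has good PEBs."""
    return reserved_pebs <= good_pebs


# Numbers taken from the kernel log above.
print(volume_fits(7850, 7765))  # False: the volume table check fails
# The same image on a device with fewer bad blocks would pass.
print(volume_fits(7850, 7900))  # True
```

The image is identical in both calls; only the number of good PEBs on the particular device decides whether it attaches.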

This suggests that using vol_size is unsafe for installations such as
ours, where while we do know the NAND size in advance, we also want to
support an unknown, high number of bad blocks which will vary
throughout the field.

I found a note in the UBI FAQ where it says vol_size can be excluded
and it will be computed to be the size of the input image, and then
the autoresize flag can be used to expand the partition later.
Excluding vol_size in this way indeed solves the problem and the
problematic laptop now boots.

So, am I right in saying that for an installation such as OLPC, where
resilience to strange NAND conditions involving high numbers of bad
blocks is desired, it is advisable to *not* specify vol_size in
ubinize.cfg?

(If so I'll send in a FAQ update for the website.)

The one bit I don't understand is what happens if another block goes
bad later. If the autoresize functionality has modified reserved_pebs
to represent the exact number of good blocks on the disk (i.e.
reserved_pebs==good_PEBs), next time a block goes bad the same
reserved_pebs>good_PEBs boot failure would be hit again. But I am
probably missing something.

cheers,
Daniel


* RE: ubi vol_size and lots of bad blocks
  2011-10-10 12:09 ubi vol_size and lots of bad blocks Daniel Drake
@ 2011-10-11 11:35 ` Atlant Schmidt
  2011-10-14 12:58   ` Artem Bityutskiy
  2011-10-14 11:15 ` Artem Bityutskiy
  1 sibling, 1 reply; 8+ messages in thread
From: Atlant Schmidt @ 2011-10-11 11:35 UTC (permalink / raw)
  To: 'Daniel Drake', linux-mtd

Daniel:

> The one bit I don't understand is what happens if another block goes
> bad later. If the autoresize functionality has modified reserved_pebs
> to represent the exact number of good blocks on the disk (i.e.
> reserved_pebs==good_PEBs), next time a block goes bad the same
> reserved_pebs>good_PEBs boot failure would be hit again. But I am
> probably missing something.

  Be careful here -- the last time I looked, blocks that go
  bad *ARE NOT* actually permanently marked as bad; they may
  no longer be used during the current boot, but the next time
  you reboot, they're eligible for attempted-but-often-failing
  use once again.

  That is, once you've initialized the UBIfs, the number of
  bad PEBs never grows, no matter how many times the software
  discovers that (say) PEB #1234 is just atrociously bad.

  And again, this may have changed but was definitely true the
  last time I tested this (although I'd love to be told otherwise).

                                  Atlant

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/



* Re: ubi vol_size and lots of bad blocks
  2011-10-10 12:09 ubi vol_size and lots of bad blocks Daniel Drake
  2011-10-11 11:35 ` Atlant Schmidt
@ 2011-10-14 11:15 ` Artem Bityutskiy
  2011-10-17 13:35   ` Daniel Drake
  1 sibling, 1 reply; 8+ messages in thread
From: Artem Bityutskiy @ 2011-10-14 11:15 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-mtd

Hi Daniel,

On Mon, 2011-10-10 at 13:09 +0100, Daniel Drake wrote:
> One outstanding issue we have is that on some laptops, when switching
> from jffs2 to ubifs, the laptop simply does not boot (root fs mounting
> difficulties).
> 
> One case of this is when there are a large number of bad blocks on the
> disk, during boot we get:
> [   76.855427] UBI error: vtbl_check: too large reserved_pebs 7850,
> good PEBs 7765
> [   76.867878] UBI error: vtbl_check: volume table check failed:
> record 0, error 9

It would be great if you also attached the full kernel log with UBI
debugging enabled, and probably build messages enabled too. It is easier
when you can see the UBI output about the flash geometry, etc.

> With so many bad blocks, this is likely a problematic nand or a
> corrupt BBT. However, jffs2 worked in this situation, and (with many
> of our laptops in remote places) it would be nice for us to figure out
> how to make ubifs handle it as well.
> 
> 
> There are other cases of this error in the archive, and people have
> generally solved it by using a smaller vol_size in the ubinize config.
> Am I right in saying that reserved_pebs is computed from the vol_size
> specified in the ubinize config?
> 
> I guess "good PEBs" is calculated from the amount of non-bad blocks
> found during the boot process.

Yes, I believe it is just the number of non-bad eraseblocks.

> This suggests that using vol_size is unsafe for installations such as
> ours, where while we do know the NAND size in advance, we also want to
> support an unknown, high number of bad blocks which will vary
> throughout the field.

But this is why the autoresize flag was introduced.

When creating a UBI image, you have to know how big your volume has to be.
At least you need to know the _minimum_ size. And you should use this
minimum volume size in your ubinize config file.

> I found a note in the UBI FAQ where it says vol_size can be excluded
> and it will be computed to be the size of the input image, and then
> the autoresize flag can be used to expand the partition later.
> Excluding vol_size in this way indeed solves the problem and the
> problematic laptop now boots.

Well, you probably need some free space as well. Just come up with
some minimum number, say 300MiB and use this number for volume size in
ubinize, and use autoresize flag.

In this case, when you flash this image to your device, UBI will
automatically resize this volume to the maximum possible size.
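
As an illustration of the two variants discussed here (a minimum vol_size plus autoresize, or no vol_size at all), the following Python sketch writes out a ubinize config for a single volume. The section name, image file name and volume name are hypothetical placeholders; the keys themselves (mode, image, vol_id, vol_size, vol_type, vol_name, vol_flags) are the ones ubinize reads.

```python
import configparser
import io
from typing import Optional


def make_ubinize_cfg(image: str, vol_name: str,
                     vol_size: Optional[str] = None) -> str:
    """Return ubinize.cfg text for one dynamic volume with autoresize set."""
    cfg = configparser.ConfigParser()
    section = {
        "mode": "ubi",
        "image": image,
        "vol_id": "0",
        "vol_type": "dynamic",
        "vol_name": vol_name,
        "vol_flags": "autoresize",
    }
    if vol_size is not None:
        # Minimum volume size; leave it out to let ubinize use the image size.
        section["vol_size"] = vol_size
    cfg["rootfs"] = section  # hypothetical section name
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


# Variant with an explicit minimum size.
print(make_ubinize_cfg("rootfs.ubifs", "rootfs", vol_size="300MiB"))
# Variant without vol_size.
print(make_ubinize_cfg("rootfs.ubifs", "rootfs"))
```

Either way the volume only grows when UBI attaches the flashed device, so the value written into the image is a lower bound rather than the final size.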

> So, am I right in saying that for an installation such as OLPC, where
> resilience to strange NAND conditions involving high numbers of bad
> blocks is desired, it is advisable to *not* specify vol_size in
> ubinize.cfg?

Yes, I think you can do this.

> (If so I'll send in a FAQ update for the website.)
> 
> The one bit I don't understand is what happens if another block goes
> bad later. If the autoresize functionality has modified reserved_pebs
> to represent the exact number of good blocks on the disk (i.e.
> reserved_pebs==good_PEBs), next time a block goes bad the same
> reserved_pebs>good_PEBs boot failure would be hit again. But I am
> probably missing something.

Autoresize will not occupy the PEBs reserved for bad block handling.

Dunno how much you looked into UBI code, but it works roughly like this:

1. avail_pebs = good_pebs
2. read volume table, and avail_pebs -= reserved_pebs for each volume,
   i.e., we subtract the amount of PEB which all volumes absolutely
   require.
3. initialize other subsystems, and subtract EBA_RESERVED_PEBS=1,
   WL_RESERVED_PEBS=1. IOW, every subsystem subtracts amount of PEBs
   it requires to operate. E.g., Wear-levelling (WL) subsystem requires
   one eraseblock for its purposes, etc.
4. In the 'ubi_eba_init_scan()' function we calculate the normal amount of
   PEBs which we reserve for bad block handling (default is 1%), and
   subtract that amount from avail_pebs. If avail_pebs is already
   very small, it will become zero in this case.
5. At the very end, we increase the autoresize-marked volume by what
   is left in avail_pebs.

IOW, autoresize will not touch PEBs reserved for BB handling.
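
The five steps above can be sketched as a small Python model. This is a simplification under stated assumptions, not the kernel code: the Volume class and attach() function are invented for illustration, the 1% reserve is computed with integer arithmetic on the good PEB count, and UBI's own internal volumes are ignored.

```python
from dataclasses import dataclass
from typing import List

EBA_RESERVED_PEBS = 1
WL_RESERVED_PEBS = 1


@dataclass
class Volume:
    name: str
    reserved_pebs: int
    autoresize: bool = False


def attach(good_pebs: int, volumes: List[Volume],
           beb_reserve_percent: int = 1) -> int:
    """Apply steps 1-5 and return the PEBs reserved for bad block handling."""
    avail = good_pebs                                # step 1
    for vol in volumes:                              # step 2
        avail -= vol.reserved_pebs
    avail -= EBA_RESERVED_PEBS + WL_RESERVED_PEBS    # step 3
    if avail < 0:
        raise ValueError("not enough good PEBs for the volumes")
    # Step 4: take the bad block reserve, or whatever is left if less.
    beb_rsvd = min(good_pebs * beb_reserve_percent // 100, avail)
    avail -= beb_rsvd
    # Step 5: the autoresize volume gets only what remains.
    for vol in volumes:
        if vol.autoresize:
            vol.reserved_pebs += avail
            break
    return beb_rsvd


vols = [Volume("rootfs", reserved_pebs=2400, autoresize=True)]
beb = attach(good_pebs=7765, volumes=vols)
print(vols[0].reserved_pebs, beb)  # 7686 77
```

With these example numbers the resized volume ends up 79 PEBs short of the good PEB count (77 for bad blocks, 2 for the EBA and WL subsystems), which is why one more bad block does not immediately push reserved_pebs above the number of good PEBs.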

Remember, UBIFS also does autoresize automatically, but it is limited by
what you specified with the -c option to mkfs.ubifs. So specify a large
enough number, but not too large, because the larger it is, the more space
UBIFS will reserve for the LPT. But only power-of-2 boundaries make a
difference for UBIFS. IOW, 4000 and 4095 LEBs in -c are equivalent from
the UBIFS POV. But 4095 and 4096 make a difference.

So whatever you specify for -c (say -c X), you can make that to be
"-c roundup_pow_of_two(X) - 1" and this will not affect anything. 
But "roundup_pow_of_two(X)" will make UBIFS image a bit larger.
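
Taking the statement about power-of-2 boundaries at face value, a Python sketch of picking the largest -c value that stays below the next boundary could look like this. The function name is made up for illustration. Note that it deliberately differs from evaluating roundup_pow_of_two(X) - 1 literally when X is itself a power of two, since that expression would then give X - 1, which is smaller than X.

```python
def largest_equivalent_leb_count(x: int) -> int:
    """Largest LEB count that is still below the next power of two above x."""
    if x < 1:
        raise ValueError("LEB count must be positive")
    # bit_length() is the number of bits needed for x, so 1 << bit_length()
    # is the smallest power of two strictly greater than x.
    return (1 << x.bit_length()) - 1


for x in (4000, 4095, 4096):
    print(x, "->", largest_equivalent_leb_count(x))
# 4000 -> 4095
# 4095 -> 4095
# 4096 -> 8191
```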

I think this info is on the web site in a more readable form.

Sorry if my reply is very messy, feel free to ask questions.

-- 
Best Regards,
Artem Bityutskiy


* RE: ubi vol_size and lots of bad blocks
  2011-10-11 11:35 ` Atlant Schmidt
@ 2011-10-14 12:58   ` Artem Bityutskiy
  2011-10-14 13:03     ` Atlant Schmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Artem Bityutskiy @ 2011-10-14 12:58 UTC (permalink / raw)
  To: Atlant Schmidt; +Cc: linux-mtd, 'Daniel Drake'

On Tue, 2011-10-11 at 07:35 -0400, Atlant Schmidt wrote:
> 
>   Be careful here -- the last time I looked, blocks that go
>   bad *ARE NOT* actually permanently marked as bad; they may
>   no longer be used during the current boot, but the next time
>   you reboot, they're eligible for attempted-but-often-failing
>   use once again.

If this happened to you, this must be because of a bug in your MTD
driver. UBI simply calls the MTD mark_bad() function to mark blocks bad.

-- 
Best Regards,
Artem Bityutskiy


* RE: ubi vol_size and lots of bad blocks
  2011-10-14 12:58   ` Artem Bityutskiy
@ 2011-10-14 13:03     ` Atlant Schmidt
  2011-10-17 11:36       ` Atlant Schmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Atlant Schmidt @ 2011-10-14 13:03 UTC (permalink / raw)
  To: 'dedekind1@gmail.com'; +Cc: linux-mtd, 'Daniel Drake'

Artem:

  Interesting! I'll ask our subject-matter expert;
  it would be great if this were resolved.

                          Atlant


* RE: ubi vol_size and lots of bad blocks
  2011-10-14 13:03     ` Atlant Schmidt
@ 2011-10-17 11:36       ` Atlant Schmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Atlant Schmidt @ 2011-10-17 11:36 UTC (permalink / raw)
  To: Atlant Schmidt, 'dedekind1@gmail.com'
  Cc: linux-mtd, 'Daniel Drake'

Artem:

  Thanks for the prompt -- our subject-matter expert
  says that yes, this was a bug that has now been
  corrected!
                         Atlant


* Re: ubi vol_size and lots of bad blocks
  2011-10-14 11:15 ` Artem Bityutskiy
@ 2011-10-17 13:35   ` Daniel Drake
  2011-10-20 15:57     ` Artem Bityutskiy
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Drake @ 2011-10-17 13:35 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Hi Artem,

Thanks for the detailed response - I'll be sure to send another
documentation patch once we've got to the bottom of everything.

On Fri, Oct 14, 2011 at 12:15 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>> I found a note in the UBI FAQ where it says vol_size can be excluded
>> and it will be computed to be the size of the input image, and then
>> the autoresize flag can be used to expand the partition later.
>> Excluding vol_size in this way indeed solves the problem and the
>> problematic laptop now boots.
>
> Well, you probably need some free space as well. Just come up with
> some minimum number, say 300MiB and use this number for volume size in
> ubinize, and use autoresize flag.

Regarding free space, is it really necessary? My understanding is that
the autoresize functionality will resize the volume *before* it gets
mounted for the first time, so it should be fine to not leave any free
space at image creation time. When it gets mounted for the first time,
it will be freshly resized and have free space available.

As for "some minimum number", I guess it goes without saying that
whatever number is chosen, it must be bigger than the amount of data
that is going to be written into the image. Our image building tool
will be used by different customers who will apply simple
customisations (e.g. with GNOME, with wikipedia) so the range of image
sizes varies. We need to do it based on some kind of calculation that
considers the size of the initial data to be written to the flash. If
we can do it with no free space initially, we can let ubinize do that
for us, with autoresize enabled (this was my trail of thought).

> Autoresize will not occupy the PEBs reserved for bad block handling.

OK, thanks for clarifying.
One final question... What happens when the PEBs reserved for bad
block handling run out?

Thanks,
Daniel


* Re: ubi vol_size and lots of bad blocks
  2011-10-17 13:35   ` Daniel Drake
@ 2011-10-20 15:57     ` Artem Bityutskiy
  0 siblings, 0 replies; 8+ messages in thread
From: Artem Bityutskiy @ 2011-10-20 15:57 UTC (permalink / raw)
  To: Daniel Drake; +Cc: linux-mtd


On Mon, 2011-10-17 at 14:35 +0100, Daniel Drake wrote:
> Hi Artem,
> 
> Thanks for the detailed response - I'll be sure to send another
> documentation patch once we've got to the bottom of everything.
> 
> On Fri, Oct 14, 2011 at 12:15 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> >> I found a note in the UBI FAQ where it says vol_size can be excluded
> >> and it will be computed to be the size of the input image, and then
> >> the autoresize flag can be used to expand the partition later.
> >> Excluding vol_size in this way indeed solves the problem and the
> >> problematic laptop now boots.
> >
> > Well, you probably need some free space as well. Just come up with
> > some minimum number, say 300MiB and use this number for volume size in
> > ubinize, and use autoresize flag.
> 
> Regarding free space, is it really necessary?

Well, I thought that if OLPC requires some free space to boot, it could
be necessary.

>  My understanding is that
> the autoresize functionality will resize the volume *before* it gets
> mounted for the first time, so it should be fine to not leave any free
> space at image creation time. When it gets mounted for the first time,
> it will be freshly resized and have free space available.

Yes. I was just thinking about a situation where you have so many bad
blocks that, after the resize, there will be too little space. In that
case the device will fail to boot with weird and unexpected symptoms. I
thought that if you reserve a minimum amount of free space, it will
instead fail to boot with predictable symptoms: UBI will print a message
like "not enough eraseblocks" or something like that.

> As for "some minimum number", I guess it goes without saying that
> whatever number is chosen, it must be bigger than the amount of data
> that is going to be written into the image.

Frankly, I do not remember; it depends on the ubinize implementation. Most
probably yes: if you put a smaller number, ubinize will throw an error
back.

>  Our image building tool
> will be used by different customers who will apply simple
> customisations (e.g. with GNOME, with wikipedia) so the range of image
> sizes varies. We need to do it based on some kind of calculation that
> considers the size of the initial data to be written to the flash. If
> we can do it with no free space initially, we can let ubinize do that
> for us, with autoresize enabled (this was my trail of thought).

Yeah, you can forget about the free space stuff.

> > Autoresize will not occupy the PEBs reserved for bad block handling.
> 
> OK, thanks for clarifying.
> One final question... What happens when the PEBs reserved for bad
> block handling run out?

Very good question. In this case you will get an error and UBI will
switch to R/O mode.

UBI guarantees that there is a PEB for each LEB. If you run out of good
PEBs, then a write to the LEB may fail.

To recover from this error you could re-flash the device. The run-time
recovery would require deleting or shrinking one of the UBI volumes.
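
A toy Python model of the behaviour described above, where each newly bad PEB consumes one reserved PEB and the device goes read-only once the reserve is empty. The class is invented for illustration and ignores any other spare PEBs the real implementation might draw on first.

```python
class BadBlockReserve:
    """Toy model: every newly bad PEB consumes one reserved PEB."""

    def __init__(self, reserved_pebs: int) -> None:
        self.reserved_pebs = reserved_pebs
        self.read_only = False

    def block_went_bad(self) -> None:
        if self.read_only:
            return
        if self.reserved_pebs == 0:
            # Nothing is left to stand in for the bad PEB.
            self.read_only = True
            return
        self.reserved_pebs -= 1


reserve = BadBlockReserve(reserved_pebs=2)
for _ in range(3):
    reserve.block_went_bad()
print(reserve.read_only)  # True: the third bad block found the reserve empty
```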

So you need to carefully select the amount of PEBs reserved for bad
block handling. For Nokia phones like the N900, 1% was just fine. They have
Samsung OneNAND flash, 256MiB in size, 128KiB PEBs.
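
For a sense of scale, the N900 figures work out as follows in a short Python calculation. It assumes the whole 256MiB chip is handed to UBI with no initial bad blocks, which is a simplification; the function name is illustrative.

```python
from typing import Tuple

KIB = 1024
MIB = 1024 * KIB


def bad_block_reserve(flash_bytes: int, peb_bytes: int,
                      percent: int = 1) -> Tuple[int, int]:
    """Return (total PEBs, PEBs reserved for bad block handling)."""
    total_pebs = flash_bytes // peb_bytes
    return total_pebs, total_pebs * percent // 100


total, reserve = bad_block_reserve(256 * MIB, 128 * KIB)
print(total, reserve)  # 2048 20
```

So a 1% reserve on that flash covers about 20 blocks going bad over the device's lifetime; a device expected to see many more bad blocks would need a larger percentage.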

-- 
Best Regards,
Artem Bityutskiy
