* UBIL design doc
@ 2010-05-08 19:39 Brijesh Singh
  2010-05-10  7:15 ` Artem Bityutskiy
  0 siblings, 1 reply; 18+ messages in thread
From: Brijesh Singh @ 2010-05-08 19:39 UTC (permalink / raw)
  To: dedekind1; +Cc: rohitvdongre, linux-mtd

[-- Attachment #1: Type: text/plain, Size: 228 bytes --]

Hi,
  I am forwarding you the design document for ubi with log. Please
find the ubil document at
http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
design document.pdf

Thanks and Regards,
Brijesh

[-- Attachment #2: UBIL design document.pdf --]
[-- Type: application/pdf, Size: 353657 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-08 19:39 UBIL design doc Brijesh Singh
@ 2010-05-10  7:15 ` Artem Bityutskiy
  2010-05-10 10:31   ` Brijesh Singh
  2010-05-11 19:17   ` Thomas Gleixner
  0 siblings, 2 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2010-05-10  7:15 UTC (permalink / raw)
  To: Brijesh Singh; +Cc: rohitvdongre, linux-mtd

On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
> Hi,
>   I am forwarding you the design document for ubi with log. Please
> find the ubil document at
> http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
> design document.pdf

Hi guys,

I've read the document. Looks very promising. Here is some feedback.

1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
erase cycles? Won't the SB PEB wear out very quickly? Why did you not
go for the chaining approach which I described in the old JFFS3 design
doc?

If we do not implement chaining, we should at least design it and make
sure UBIL can be extended later so that SB chaining could be added.

2. SB PEB at the end. I think this is a very bad idea. Imagine you have
to make UBIL images for factory production. With your design you
have the following serious drawbacks:

  a. NAND flash has initial bad blocks, and you do not know how many,
     and where they sit. These may be the last 8 eraseblocks. So, when
     you prepare an image (say, with the ubinize user-space tool), where
     will you put the second SB PEB?

  b. Currently, UBI/UBIFS images are small. E.g., if you make a
     UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
     your image will be a few megs - it will contain the files, and all
     the needed UBI/UBIFS meta-data.

     So what will the image size be for UBIL? 1GiB, and this is bad.
     You will then have to transfer 1GiB of data to the devices during
     flashing, or invent ways to work around this. Do you need
     these complexities?

I think the second SB PEB should not be at the end.

3. Backward-compatibility. In UBIL you removed the EC and VID headers from
   PEBs. That's fine for optimization purposes. But it has drawbacks:

   a. If any of the UBIL meta-data blocks like SB, CMT or log are
      corrupted - that's it - we are screwed. You can no longer
      reconstruct the data by scanning. The robustness goes down.

   b. Backward compatibility - UBI will not be able to attach UBIL
      images. This is not very nice.

So, I think you should keep EC and VID headers in PEBs. And you should
make the SB/CMT/log blocks a new type of UBI volume with
UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
case UBI will attach UBIL volumes just fine.

Then, you can add an _option_ to have no EC/VID headers in PEBs. This
can then be used for performance, if one wants to sacrifice robustness.
But this should be the second step. In this case, you will just need to
put a VID header with the UBI_COMPAT_REJECT flag into the first PEB.

I have some more notes, but these 3 are enough for now.

What do you think?

In any case, whatever you try to change in UBIL, remember to first make
the current version stable, then make all changes in a way that does not
break it.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: UBIL design doc
  2010-05-10  7:15 ` Artem Bityutskiy
@ 2010-05-10 10:31   ` Brijesh Singh
  2010-05-11 19:17   ` Thomas Gleixner
  1 sibling, 0 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-10 10:31 UTC (permalink / raw)
  To: dedekind1; +Cc: rohitvdongre, linux-mtd

On Mon, May 10, 2010 at 12:45 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
>> Hi,
>>   I am forwarding you the design document for ubi with log. Please
>> find the ubil document at
>> http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
>> design document.pdf
>
> Hi guys,
>
> I've read the document. Looks very promising. Here is some feedback.
>
> 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
> erase cycles? Won't the SB PEB wear out very quickly? Why did you not
> go for the chaining approach which I described in the old JFFS3 design
> doc?

Right.

> If we do not implement chaining, we should at least design it and make
> sure UBIL can be extended later so that SB chaining could be added.

This looks better for now. I will add it to the design.

> 2. SB PEB at the end. I think this is a very bad idea. Imagine you have
> to make UBIL images for factory production. With your design you
> have the following serious drawbacks:
>
>  a. NAND flash has initial bad blocks, and you do not know how many,
>     and where they sit. These may be the last 8 eraseblocks. So, when
>     you prepare an image (say, with the ubinize user-space tool), where
>     will you put the second SB PEB?
>
>  b. Currently, UBI/UBIFS images are small. E.g., if you make a
>     UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
>     your image will be a few megs - it will contain the files, and all
>     the needed UBI/UBIFS meta-data.
>
>     So what will the image size be for UBIL? 1GiB, and this is bad.
>     You will then have to transfer 1GiB of data to the devices during
>     flashing, or invent ways to work around this. Do you need
>     these complexities?
>
> I think the second SB PEB should not be at the end.
Good point. I will put it in the first and second good blocks right now.

> 3. Backward-compatibility. In UBIL you removed the EC and VID headers from
>   PEBs. That's fine for optimization purposes. But it has drawbacks:
>
>   a. If any of the UBIL meta-data blocks like SB, CMT or log are
>      corrupted - that's it - we are screwed. You can no longer
>      reconstruct the data by scanning. The robustness goes down.
>
>   b. Backward compatibility - UBI will not be able to attach UBIL
>      images. This is not very nice.
>
> So, I think you should keep EC and VID headers in PEBs. And you should
> make the SB/CMT/log blocks a new type of UBI volume with
> UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
> case UBI will attach UBIL volumes just fine.
>
> Then, you can add an _option_ to have no EC/VID headers in PEBs. This
> can then be used for performance, if one wants to sacrifice robustness.
> But this should be the second step. In this case, you will just need to
> put a VID header with the UBI_COMPAT_REJECT flag into the first PEB.
>
> I have some more notes, but these 3 are enough for now.
>
> What do you think?
This is nice, but I need to think about it a bit. I will get back to you on this.


* Re: UBIL design doc
  2010-05-10  7:15 ` Artem Bityutskiy
  2010-05-10 10:31   ` Brijesh Singh
@ 2010-05-11 19:17   ` Thomas Gleixner
  2010-05-12  7:03     ` Brijesh Singh
  2010-05-12  7:41     ` Artem Bityutskiy
  1 sibling, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2010-05-11 19:17 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Brijesh Singh, rohitvdongre, linux-mtd

On Mon, 10 May 2010, Artem Bityutskiy wrote:

> On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
> > Hi,
> >   I am forwarding you the design document for ubi with log. Please
> > find the ubil document at
> > http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
> > design document.pdf

@Brijesh, thanks for tackling this!
 
> Hi guys,
> 
> I've read the document. Looks very promising. Here is some feedback.
> 
> 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
> erase cycles? Won't the SB PEB wear out very quickly? Why did you not
> go for the chaining approach which I described in the old JFFS3 design
> doc?
> 
> If we do not implement chaining, we should at least design it and make
> sure UBIL can be extended later so that SB chaining could be added.

The super block needs to be scanned for from the beginning of flash
anyway due to bad blocks. Putting it into a fixed position (first good
erase block) is a very bad design decision vs. wear leveling.

The super block must be moveable like any other block, though we can
keep it as close to the start of flash as possible.

Also chaining has a tradeoff. The more chains you need to walk the
closer you get to the point where you are equally bad as a full scan.

> 2. SB PEB at the end. I think this is a very bad idea. Imagine you have
> to make UBIL images for factory production. With your design you
> have the following serious drawbacks:
> 
>   a. NAND flash has initial bad blocks, and you do not know how many,
>      and where they sit. These may be the last 8 eraseblocks. So, when
>      you prepare an image (say, with the ubinize user-space tool), where
>      will you put the second SB PEB?
> 
>   b. Currently, UBI/UBIFS images are small. E.g., if you make a
>      UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
>      your image will be a few megs - it will contain the files, and all
>      the needed UBI/UBIFS meta-data.
> 
>      So what will the image size be for UBIL? 1GiB, and this is bad.
>      You will then have to transfer 1GiB of data to the devices during
>      flashing, or invent ways to work around this. Do you need
>      these complexities?
> 
> I think the second SB PEB should not be at the end.

I think we do not need a second SB at all. UBI should not depend on
the super block in any way. The super block is an optimization for the
common case - nothing more.

> 3. Backward-compatibility. In UBIL you removed the EC and VID headers from
>    PEBs. That's fine for optimization purposes. But it has drawbacks:
> 
>    a. If any of the UBIL meta-data blocks like SB, CMT or log are
>       corrupted - that's it - we are screwed. You can no longer
>       reconstruct the data by scanning. The robustness goes down.
> 
>    b. Backward compatibility - UBI will not be able to attach UBIL
>       images. This is not very nice.
> 
> So, I think you should keep EC and VID headers in PEBs. And you should
> make the SB/CMT/log blocks a new type of UBI volume with
> UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
> case UBI will attach UBIL volumes just fine.
> 
> Then, you can add an _option_ to have no EC/VID headers in PEBs. This
> can then be used for performance, if one wants to sacrifice robustness.
> But this should be the second step. In this case, you will just need to
> put a VID header with the UBI_COMPAT_REJECT flag into the first PEB.

I don't think it's a good idea to kill the EC/VID headers. It not only
violates backward compatibility, it also fundamentally weakens UBI's
reliability for no good reason, and I doubt that the performance win is
big enough to make it worthwhile.

The performance gain is at attach time by getting rid of the flash
scan, but not by getting rid of writing the EC/VID headers.

The logging is a speed up / optimization for the common case, but it
needs to preserve full reconstruction via scanning all eraseblocks and
checking the EC/VID headers. That also allows retrofitting on existing
devices.

I'd rather see the super block / log volume as a checkpointing
mechanism which provides a snapshot of the EC/VID headers at a given
point and a list of eraseblocks which need to be scanned at attach
time. 

That has two main advantages:
 1) It limits the number of log writes
 2) It allows full backward and forward compatibility

Looking at
http://git.infradead.org/users/brijesh/ubil_results/blob/HEAD:/nand_mount_time.pdf
I still see a linear - though less steep - attach time. For the 1GB
flash size it's still 0.8s which is nice progress vs. the 2s for the
non logging case. But that's surprising as one would expect that
logging would provide a more aggressive and non linear gain.

Just doing the simple math:

1GB FLASH with erase block size 128K and page size 2k translates to
8192 erase blocks.

So UBI scans 8192 erase block EC/VID headers in 2 seconds. That
equals 8192 FLASH pages.

UBIL needs 0.8 seconds. That means that UBIL still scans ~3277 FLASH
pages (or spends the equivalent time) to achieve the same result.

That looks wrong. Care to explain?

Thanks,

	tglx


* Re: UBIL design doc
  2010-05-11 19:17   ` Thomas Gleixner
@ 2010-05-12  7:03     ` Brijesh Singh
  2010-05-12  7:14       ` Brijesh Singh
  2010-05-12  9:02       ` Thomas Gleixner
  2010-05-12  7:41     ` Artem Bityutskiy
  1 sibling, 2 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12  7:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: rohitvdongre, linux-mtd, Artem Bityutskiy

Hi,

On Wed, May 12, 2010 at 12:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 10 May 2010, Artem Bityutskiy wrote:
>
>> On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
>> > Hi,
>> >   I am forwarding you the design document for ubi with log. Please
>> > find the ubil document at
>> > http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
>> > design document.pdf
>
> @Brijesh, thanks for tackling this!
>
>> Hi guys,
>>
>> I've read the document. Looks very promising. Here is some feedback.
>>
>> 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
>> erase cycles? Won't the SB PEB wear out very quickly? Why did you not
>> go for the chaining approach which I described in the old JFFS3 design
>> doc?
>>
>> If we do not implement chaining, we should at least design it and make
>> sure UBIL can be extended later so that SB chaining could be added.
>
> The super block needs to be scanned for from the beginning of flash
> anyway due to bad blocks. Putting it into a fixed position (first good
> erase block) is a very bad design decision vs. wear leveling.

This scan is minimal once the bad blocks are marked bad: the flash
driver generally reports them by reading the OOB area or from the
in-RAM BBT. In comparison, keeping the super block somewhere in the
first few blocks becomes an availability versus wear-leveling trade-off,
and scanning for the super block itself will take a lot of time.
In fact, in that case we would not need the super block at all; we could
just scan to find the first chained commit block. But this doesn't look
like a very good idea.

> The super block must be moveable like any other block, though we can
> keep it as close to the start of flash as possible.

The idea is to get rid of scanning. A super block at a fixed place can
locate the movable headers.

> Also chaining has a tradeoff. The more chains you need to walk the
> closer you get to the point where you are equally bad as a full scan.

As Artem suggested, chaining should help minimize writes to the
anchor block at the fixed location. At first it looked promising,
but this design also has a single point of failure.

>> 2. SB PEB at the end. I think this is a very bad idea. Imagine you have
>> to make UBIL images for factory production. With your design you
>> have the following serious drawbacks:
>>
>>   a. NAND flash has initial bad blocks, and you do not know how many,
>>      and where they sit. These may be the last 8 eraseblocks. So, when
>>      you prepare an image (say, with the ubinize user-space tool), where
>>      will you put the second SB PEB?
>>
>>   b. Currently, UBI/UBIFS images are small. E.g., if you make a
>>      UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
>>      your image will be a few megs - it will contain the files, and all
>>      the needed UBI/UBIFS meta-data.
>>
>>      So what will the image size be for UBIL? 1GiB, and this is bad.
>>      You will then have to transfer 1GiB of data to the devices during
>>      flashing, or invent ways to work around this. Do you need
>>      these complexities?
>>
>> I think the second SB PEB should not be at the end.
>
> I think we do not need a second SB at all. UBI should not depend on
> the super block in any way. The super block is an optimization for the
> common case - nothing more.
>
>> 3. Backward-compatibility. In UBIL you removed the EC and VID headers from
>>    PEBs. That's fine for optimization purposes. But it has drawbacks:
>>
>>    a. If any of the UBIL meta-data blocks like SB, CMT or log are
>>       corrupted - that's it - we are screwed. You can no longer
>>       reconstruct the data by scanning. The robustness goes down.
>>
>>    b. Backward compatibility - UBI will not be able to attach UBIL
>>       images. This is not very nice.
>>
>> So, I think you should keep EC and VID headers in PEBs. And you should
>> make the SB/CMT/log blocks a new type of UBI volume with
>> UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
>> case UBI will attach UBIL volumes just fine.
>>
>> Then, you can add an _option_ to have no EC/VID headers in PEBs. This
>> can then be used for performance, if one wants to sacrifice robustness.
>> But this should be the second step. In this case, you will just need to
>> put a VID header with the UBI_COMPAT_REJECT flag into the first PEB.
>
> I don't think it's a good idea to kill the EC/VID headers. It not only
> violates backward compatibility, it also fundamentally weakens UBI's
> reliability for no good reason, and I doubt that the performance win is
> big enough to make it worthwhile.
>
> The performance gain is at attach time by getting rid of the flash
> scan, but not by getting rid of writing the EC/VID headers.

These flash headers have some more problems, such as wasted space on
MLC and alignment problems for byte-addressable memory. Backward
compatibility is a good idea, but it is possible to implement these
features, with higher performance, by getting rid of the headers.
It seemed a fair trade-off to me, but I am open to any better solution.

> The logging is a speed up / optimization for the common case, but it
> needs to preserve full reconstruction via scanning all eraseblocks and
> checking the EC/VID headers. That also allows retrofitting on existing
> devices.
>
> I'd rather see the super block / log volume as a checkpointing
> mechanism which provides a snapshot of the EC/VID headers at a given
> point and a list of eraseblocks which need to be scanned at attach
> time.
>
> That has two main advantages:
>  1) It limits the number of log writes
>  2) It allows full backward and forward compatibility
>
> Looking at
> http://git.infradead.org/users/brijesh/ubil_results/blob/HEAD:/nand_mount_time.pdf
> I still see a linear - though less steep - attach time. For the 1GB
> flash size it's still 0.8s which is nice progress vs. the 2s for the
> non logging case. But that's surprising as one would expect that
> logging would provide a more aggressive and non linear gain.
>
> Just doing the simple math:
>
> 1GB FLASH with erase block size 128K and page size 2k translates to
> 8192 erase blocks.
>
> So UBI scans 8192 erase block EC/VID headers in 2 seconds. That
> equals 8192 FLASH pages.
>
> UBIL needs 0.8 seconds. That means that UBIL still scans ~3277 FLASH
> pages (or spends the equivalent time) to achieve the same result.
>
> That looks wrong. Care to explain?
Very good point. The calculations are confusing me. :-) I need to check this.


* Re: UBIL design doc
  2010-05-12  7:03     ` Brijesh Singh
@ 2010-05-12  7:14       ` Brijesh Singh
  2010-05-12  9:02       ` Thomas Gleixner
  1 sibling, 0 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12  7:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: rohitvdongre, linux-mtd, Artem Bityutskiy

On Wed, May 12, 2010 at 12:33 PM, Brijesh Singh
<brijesh.s.singh@gmail.com> wrote:
> Very good point. The calculations are confusing me. :-) I need to check this.
Sorry for the email client alignment problem. :(


* Re: UBIL design doc
  2010-05-11 19:17   ` Thomas Gleixner
  2010-05-12  7:03     ` Brijesh Singh
@ 2010-05-12  7:41     ` Artem Bityutskiy
  2010-05-12  8:03       ` Brijesh Singh
  2010-05-12  9:06       ` Thomas Gleixner
  1 sibling, 2 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2010-05-12  7:41 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Brijesh Singh, rohitvdongre, linux-mtd

On Tue, 2010-05-11 at 21:17 +0200, Thomas Gleixner wrote:
> On Mon, 10 May 2010, Artem Bityutskiy wrote:
> 
> > On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
> > > Hi,
> > >   I am forwarding you the design document for ubi with log. Please
> > > find the ubil document at
> > > http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
> > > design document.pdf
> 
> @Brijesh, thanks for tackling this!
>  
> > Hi guys,
> > 
> > I've read the document. Looks very promising. Here is some feedback.
> > 
> > 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
> > erase cycles? Won't the SB PEB wear out very quickly? Why did you not
> > go for the chaining approach which I described in the old JFFS3 design
> > doc?
> > 
> > If we do not implement chaining, we should at least design it and make
> > sure UBIL can be extended later so that SB chaining could be added.
> 
> The super block needs to be scanned for from the beginning of flash
> anyway due to bad blocks. Putting it into a fixed position (first good
> erase block) is a very bad design decision vs. wear leveling.
> 
> The super block must be moveable like any other block, though we can
> keep it as close to the start of flash as possible.
> 
> Also chaining has a tradeoff. The more chains you need to walk the
> closer you get to the point where you are equally bad as a full scan.

Well, every new chain member reduces the superblock wear rate by roughly
two orders of magnitude, so I guess the chain would have 2-4 eraseblocks
in most cases, which is not bad.

In contrast, moving the SB 3-4 eraseblocks further only reduces the
load by a factor of 3-4.

> > 2. SB PEB at the end. I think this is a very bad idea. Imagine you have
> > to make UBIL images for factory production. With your design you
> > have the following serious drawbacks:
> > 
> >   a. NAND flash has initial bad blocks, and you do not know how many,
> >      and where they sit. These may be the last 8 eraseblocks. So, when
> >      you prepare an image (say, with the ubinize user-space tool), where
> >      will you put the second SB PEB?
> > 
> >   b. Currently, UBI/UBIFS images are small. E.g., if you make a
> >      UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
> >      your image will be a few megs - it will contain the files, and all
> >      the needed UBI/UBIFS meta-data.
> > 
> >      So what will the image size be for UBIL? 1GiB, and this is bad.
> >      You will then have to transfer 1GiB of data to the devices during
> >      flashing, or invent ways to work around this. Do you need
> >      these complexities?
> > 
> > I think the second SB PEB should not be at the end.
> 
> I think we do not need a second SB at all. UBI should not depend on
> the super block in any way. The super block is an optimization for the
> common case - nothing more.

Yeah, if we preserve the headers we can always fall back to scanning
should something be broken.

> 
> > 3. Backward-compatibility. In UBIL you removed the EC and VID headers from
> >    PEBs. That's fine for optimization purposes. But it has drawbacks:
> > 
> >    a. If any of the UBIL meta-data blocks like SB, CMT or log are
> >       corrupted - that's it - we are screwed. You can no longer
> >       reconstruct the data by scanning. The robustness goes down.
> > 
> >    b. Backward compatibility - UBI will not be able to attach UBIL
> >       images. This is not very nice.
> > 
> > So, I think you should keep EC and VID headers in PEBs. And you should
> > make the SB/CMT/log blocks a new type of UBI volume with
> > UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
> > case UBI will attach UBIL volumes just fine.
> > 
> > Then, you can add an _option_ to have no EC/VID headers in PEBs. This
> > can then be used for performance, if one wants to sacrifice robustness.
> > But this should be the second step. In this case, you will just need to
> > put a VID header with the UBI_COMPAT_REJECT flag into the first PEB.
> 
> I don't think it's a good idea to kill the EC/VID headers. It not only
> violates backward compatibility, it also fundamentally weakens UBI's
> reliability for no good reason, and I doubt that the performance win is
> big enough to make it worthwhile.
> 
> The performance gain is at attach time by getting rid of the flash
> scan, but not by getting rid of writing the EC/VID headers.

Well, there are some space savings as well.

> 
> The logging is a speed up / optimization for the common case, but it
> needs to preserve full reconstruction via scanning all eraseblocks and
> checking the EC/VID headers. That also allows retrofitting on existing
> devices.
> 
> I'd rather see the super block / log volume as a checkpointing
> mechanism which provides a snapshot of the EC/VID headers at a given
> point and a list of eraseblocks which need to be scanned at attach
> time.
>  
> 
> That has two main advantages:
>  1) It limits the number of log writes
>  2) It allows full backward and forward compatibility

I think this is what they do, but for some reason they removed the
headers. If they add them back, it should look like you described.
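The checkpoint-plus-partial-scan attach described above can be modelled in a few lines of Python. This is a sketch under assumed structures: `PebInfo`, `attach_from_checkpoint` and the `read_headers` callback are hypothetical, and two PEBs claiming the same LEB are resolved by the higher sequence number, roughly the way UBI scanning picks the newer copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PebInfo:
    ec: int      # erase counter (EC header)
    vol_id: int  # volume ID (VID header)
    lnum: int    # logical eraseblock number (VID header)
    sqnum: int   # global sequence number (VID header)


def attach_from_checkpoint(snapshot, dirty_pebs, read_headers):
    """Rebuild the PEB table from a checkpoint plus a partial scan.

    snapshot:     {peb: PebInfo} as recorded at the last checkpoint
    dirty_pebs:   PEBs the checkpoint lists as possibly changed since then
    read_headers: reads the EC/VID headers of one PEB, None if it is free
    """
    state = dict(snapshot)
    for peb in dirty_pebs:            # only these PEBs are read from flash
        info = read_headers(peb)
        if info is None:
            state.pop(peb, None)
        else:
            state[peb] = info

    # Map (vol_id, lnum) -> PEB; the copy with the higher sqnum is newer.
    eba = {}
    for peb, info in state.items():
        key = (info.vol_id, info.lnum)
        if key not in eba or info.sqnum > state[eba[key]].sqnum:
            eba[key] = peb
    return state, eba


# Checkpoint knew LEBs 0 and 1 of volume 1; LEB 0 was later rewritten to PEB 2.
snapshot = {0: PebInfo(10, 1, 0, 5), 1: PebInfo(12, 1, 1, 6)}
on_flash = {1: PebInfo(12, 1, 1, 6), 2: PebInfo(3, 1, 0, 9)}
state, eba = attach_from_checkpoint(snapshot, [2], on_flash.get)
# eba == {(1, 0): 2, (1, 1): 1}; the stale copy in PEB 0 loses on sqnum
```

Only the PEBs listed by the checkpoint cost a flash read, while the EC/VID headers still on every PEB keep a full scan possible as the fallback.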

We should preserve the headers. It is always easy to disable them later,
if someone needs this for optimization purposes. E.g., we can add a
ubi_compat=0 option or something like that.
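What an older UBI does with the SB/CMT/log volume then depends only on the compat flag in its VID headers. A rough Python model of that decision, with the semantics paraphrased from the comments on the UBI_COMPAT_* flags in ubi-media.h; `plan_attach` and the "ubil-log" volume label are hypothetical:

```python
from enum import Enum


class Compat(Enum):
    DELETE = "delete"      # UBI_COMPAT_DELETE
    RO = "ro"              # UBI_COMPAT_RO
    PRESERVE = "preserve"  # UBI_COMPAT_PRESERVE
    REJECT = "reject"      # UBI_COMPAT_REJECT


def plan_attach(unknown_volumes):
    """unknown_volumes: {name: (Compat, [pebs])} for internal volumes this
    UBI does not know. Returns (pebs_to_erase, pebs_to_preserve, read_only);
    raises RuntimeError if any volume asks for the image to be rejected."""
    to_erase, preserved, read_only = set(), set(), False
    for name, (compat, pebs) in unknown_volumes.items():
        if compat is Compat.REJECT:
            raise RuntimeError(f"{name}: refusing to attach this image")
        if compat is Compat.DELETE:
            to_erase.update(pebs)   # drop the volume before writing anything
        elif compat is Compat.RO:
            read_only = True        # attach, but never write to the device
        elif compat is Compat.PRESERVE:
            preserved.update(pebs)  # never touch or wear-level these PEBs
    return to_erase, preserved, read_only


# An old UBI meeting a (hypothetical) UBIL log volume flagged PRESERVE:
print(plan_attach({"ubil-log": (Compat.PRESERVE, [3, 4])}))
# -> (set(), {3, 4}, False)
```

PRESERVE keeps the log intact for a later UBIL-aware attach, while REJECT is what a headerless image would need so that old code refuses it outright.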

> Looking at
> http://git.infradead.org/users/brijesh/ubil_results/blob/HEAD:/nand_mount_time.pdf
> I still see a linear - though less steep - attach time. For the 1GB
> flash size it's still 0.8s which is nice progress vs. the 2s for the
> non logging case. But that's surprising as one would expect that
> logging would provide a more aggressive and non linear gain.
> 
> Just doing the simple math:
> 
> 1GB FLASH with erase block size 128K and page size 2k, that
> translates to 8192 erase blocks
> 
> So UBI scans 8192 erase block EC/VID headers in 2 seconds. That
> equals 8192 FLASH pages.
> 
> UBIL needs 0.8 seconds. That means that UBIL still scans ~3277 FLASH
> pages (or spends the equivalent time) to achieve the same result.
> 
> That looks wrong. Care to explain ?
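The arithmetic can be checked quickly, assuming (as above) that attach time is dominated by one page read per PEB at a constant rate, and taking "1GB" as 1GiB:

```python
FLASH_BYTES = 1 << 30   # 1GiB flash
PEB_BYTES = 128 << 10   # 128KiB eraseblock


def equivalent_pages(pages, seconds, other_seconds):
    """Pages that could be read in `other_seconds` at the rate pages/seconds."""
    return pages * other_seconds / seconds


pebs = FLASH_BYTES // PEB_BYTES            # 8192 PEBs, one header page each
ubil_pages = equivalent_pages(pebs, 2.0, 0.8)
print(pebs, round(ubil_pages))             # -> 8192 3277
```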

I suspect these are implementation issues. I did not look at the code,
but I suspect they read the whole CMT block and populate all the EB
associations at scan time. However, they could populate them lazily,
which would optimize things.
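A minimal sketch of lazy population, assuming a single volume and a hypothetical CMT split into fixed-size chunks that can be read independently (`LazyEba` and `read_chunk` are illustrative names, not UBIL code). A chunk is read from flash only the first time one of its LEBs is looked up, so attach itself does not pay for the whole CMT:

```python
class LazyEba:
    """LEB -> PEB table whose CMT chunks are loaded on first use."""

    def __init__(self, read_chunk, lebs_per_chunk):
        self._read_chunk = read_chunk  # chunk index -> {leb: peb}
        self._per_chunk = lebs_per_chunk
        self._cache = {}               # chunk index -> {leb: peb}
        self.chunk_reads = 0           # flash reads actually performed

    def lookup(self, leb):
        """Return the PEB mapped to `leb`, or None if it is unmapped."""
        idx = leb // self._per_chunk
        if idx not in self._cache:     # first access: read this chunk only
            self._cache[idx] = self._read_chunk(idx)
            self.chunk_reads += 1
        return self._cache[idx].get(leb)


# Fake CMT for the example: LEB n lives in PEB n + 100, 64 LEBs per chunk.
def fake_read_chunk(idx, per=64):
    return {leb: leb + 100 for leb in range(idx * per, (idx + 1) * per)}


eba = LazyEba(fake_read_chunk, lebs_per_chunk=64)
found = (eba.lookup(5), eba.lookup(6), eba.lookup(700))
print(found, eba.chunk_reads)   # -> (105, 106, 800) 2
```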

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  7:41     ` Artem Bityutskiy
@ 2010-05-12  8:03       ` Brijesh Singh
  2010-05-12  8:35         ` Artem Bityutskiy
  2010-05-12  9:06       ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12  8:03 UTC (permalink / raw)
  To: dedekind1; +Cc: Thomas Gleixner, linux-mtd, rohitvdongre

Hi,

I am trying to summarize what I have understood.
I will send the patches if this is correct.
1) The commit will have EC and VID headers just like any other UBI block.
The compat flag helps with backward compatibility.
2) A chained SB will locate the commit. It will be part of an internal volume as well.
3) Commit will be called on unmount.
4) Any unclean unmount will lead to flash scanning, just as in UBI.
If anything goes bad, normal scanning becomes recovery.
5) Not sure if the log is required in the first place. But it could be an option.
Is that correct?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  8:03       ` Brijesh Singh
@ 2010-05-12  8:35         ` Artem Bityutskiy
  2010-05-12  9:49           ` Brijesh Singh
  0 siblings, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2010-05-12  8:35 UTC (permalink / raw)
  To: Brijesh Singh; +Cc: Thomas Gleixner, linux-mtd, rohitvdongre

On Wed, 2010-05-12 at 13:33 +0530, Brijesh Singh wrote:
> Hi,
> 
> I am trying to summarize what I have understood.
> I will send the patches if this is correct.
> 1) The commit will have EC and VID headers just like any other UBI block.
> The compat flag helps with backward compatibility.
> 2) A chained SB will locate the commit. It will be part of an internal volume as well.
> 3) Commit will be called on unmount.
> 4) Any unclean unmount will lead to flash scanning, just as in UBI.

No! Why would you have the log then? Unclean reboots are handled by the log.

Scanning happens only when you have a _corrupted_ SB, a corrupted CMT, or
a corrupted log. Then you fall back to scanning.

> If anything goes bad, normal scanning becomes recovery.
> 5) Not sure if the log is required in the first place. But it could be an option.
> Is that correct?

No, at least I did not suggest that you get rid of the log. It is needed
to handle unclean reboots.
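The attach flow described here (trust SB, CMT and log when they are intact, replay the log to cover unclean reboots, fall back to a full scan only on corruption) can be sketched as follows. The callbacks and the `(leb, peb)` log record format are assumptions for illustration, not UBIL's on-flash format:

```python
def attach(read_sb, read_cmt, read_log, full_scan):
    """Return (eba, method).

    read_sb() / read_cmt(sb) / read_log(sb) return the parsed structure, or
    None if it is missing or fails its CRC check. The log is a list of
    (leb, peb) records written after the commit; peb None means "unmapped".
    """
    sb = read_sb()
    cmt = read_cmt(sb) if sb is not None else None
    log = read_log(sb) if cmt is not None else None
    if log is None:                # any corrupted metadata: fall back
        return full_scan(), "scan"

    eba = dict(cmt)                # state as of the last commit
    for leb, peb in log:           # replay changes, oldest first
        if peb is None:
            eba.pop(leb, None)
        else:
            eba[leb] = peb
    return eba, "log"


# Unclean reboot after three changes: the log covers it, no scan needed.
cmt = {0: 10, 1: 11}
log = [(1, 12), (2, 13), (0, None)]
print(attach(lambda: "sb", lambda sb: cmt, lambda sb: log, dict))
# -> ({1: 12, 2: 13}, 'log')
```

An empty log after a clean unmount is simply `[]`, so the commit is used as-is; only `None` (missing or corrupted) triggers the scan.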

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  7:03     ` Brijesh Singh
  2010-05-12  7:14       ` Brijesh Singh
@ 2010-05-12  9:02       ` Thomas Gleixner
  2010-05-12  9:46         ` Brijesh Singh
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2010-05-12  9:02 UTC (permalink / raw)
  To: Brijesh Singh; +Cc: rohitvdongre, linux-mtd, Artem Bityutskiy

Brijesh,

On Wed, 12 May 2010, Brijesh Singh wrote:
> On Wed, May 12, 2010 at 12:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 10 May 2010, Artem Bityutskiy wrote:
> >> 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
> >> erase cycles? Won't the SB PEB wear out very quickly? Why did you not
> >> go for the chaining approach which I described in the old JFFS3 design
> >> doc?
> >>
> >> If we do not implement chaining, we should at least design it and make
> >> sure UBIL can be extended later so that SB chaining could be added.
> >
> > The super block needs to be scanned for from the beginning of flash
> > anyway due to bad blocks. Putting it into a fixed position (first good
> > erase block) is a very bad design decision vs. wear leveling.
> 
> This scan is minimal once the bad blocks are marked bad. The flash driver
> generally returns an error by reading the OOB area or the in-RAM BBT. In
> comparison, keeping the super block in the first few blocks may become a
> question of availability or a wear-leveling trade-off. Scanning for the
> super block itself will take a lot of time.
> In fact, in that case we won't need the super block at all. Just scan
> to find the first chained commit block. But this doesn't look like a very
> good idea.

Well, the super block is special. It links to the log block, so that
we can move the log block to any place in the flash. So it's going to
be a block which is not recycled too often depending on how the log
chain works.

I don't see why scanning for the super block will take so much
time. If we keep it inside of the first N blocks then it's a well
defined scan time limit. And you just need to read the first page to
find it.
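A sketch of such a bounded search, assuming the SB sits in one of the first `SB_SEARCH_RANGE` PEBs and is recognised by a hypothetical magic at the start of its first page. Bad blocks are skipped via whatever the driver reports (OOB marker or in-RAM BBT) and are not counted as page reads:

```python
SB_SEARCH_RANGE = 32   # assumed limit; 1 would mean "fixed first PEB"
SB_MAGIC = b"UBIL"     # hypothetical SB magic


def find_superblocks(num_pebs, is_bad, read_first_page,
                     search_range=SB_SEARCH_RANGE):
    """Return (PEBs holding an SB, number of first pages read)."""
    found, reads = [], 0
    for peb in range(min(search_range, num_pebs)):
        if is_bad(peb):
            continue                  # known bad: no page read
        reads += 1
        if read_first_page(peb).startswith(SB_MAGIC):
            found.append(peb)
    return found, reads


bad = {0, 1}                                    # factory bad blocks
pages = {3: b"UBIL-old", 20: b"UBIL-new"}       # two SB copies
erased = b"\xff" * 16
print(find_superblocks(8192, bad.__contains__,
                       lambda peb: pages.get(peb, erased)))
# -> ([3, 20], 30)
```

The cost is bounded by `search_range` page reads no matter how large the flash is.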

> > The super block must be moveable like any other block, though we can
> > keep it as close to the start of flash as possible.
> 
> The idea is to get rid of scanning. A fixed place super block can
> locate movable headers.

Right, but there is a difference between scanning 8k blocks and
scanning a small number (16 or 32) of blocks. That's in the single-digit
milliseconds range. So forcing it to a fixed location is just an
overoptimization.
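Scaling the 1GiB measurement from earlier in the thread (2 s to read one header page from each of 8192 PEBs) gives a rough estimate, assuming the same per-page cost:

```python
UBI_SCAN_SECONDS = 2.0   # full scan time reported for 1GiB flash
UBI_SCAN_PEBS = 8192     # PEBs read in that scan


def scan_ms(pebs):
    """Estimated time in ms to read one page from each of `pebs` PEBs."""
    return pebs * UBI_SCAN_SECONDS / UBI_SCAN_PEBS * 1000.0


for n in (16, 32, 8192):
    print(n, scan_ms(n))
# -> 16 3.90625
# -> 32 7.8125
# -> 8192 2000.0
```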

> > Also chaining has a tradeoff. The more chains you need to walk the
> > closer you get to the point where you are equally bad as a full scan.
> 
> As Artem suggested, chaining should help minimize writes to the
> anchor block at a fixed location. At first it looked promising. But
> this design also has a single point of failure.

Not if you make the super block movable. Then you can create the
replacement block before erasing the old super block. So in the worst
case you have two super blocks in that scan range and you need to
figure out which one is the valid one. No big deal.
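Picking the valid one could look like the sketch below, assuming each SB copy carries a hypothetical sequence number that grows on every rewrite plus a CRC over its payload (`zlib.crc32` stands in for whatever CRC UBIL would use). A copy torn by a power cut fails the CRC, so the older intact copy wins:

```python
import zlib
from dataclasses import dataclass


@dataclass
class SbCopy:
    peb: int
    seq: int        # bumped each time the SB is rewritten (assumed field)
    payload: bytes
    crc: int        # CRC32 of payload as stored on flash


def pick_superblock(copies):
    """Return the newest copy whose CRC matches, or None if none is valid."""
    valid = [c for c in copies if zlib.crc32(c.payload) == c.crc]
    return max(valid, key=lambda c: c.seq, default=None)


old = SbCopy(peb=3, seq=7, payload=b"old", crc=zlib.crc32(b"old"))
new = SbCopy(peb=5, seq=8, payload=b"new", crc=zlib.crc32(b"new"))
torn = SbCopy(peb=5, seq=8, payload=b"ne\xff", crc=zlib.crc32(b"new"))

print(pick_superblock([old, new]).peb)    # -> 5
print(pick_superblock([old, torn]).peb)   # -> 3
```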

> > I don't think it's a good idea to kill the EC/VID headers. It not only
> > violates the backwards compability it also fundamentally weakens UBIs
> > reliability for no good reason and I doubt that the performance win is
> > big enough to make it worth.
> >
> > The performance gain is at attach time by getting rid of the flash
> > scan, but not by getting rid of writing the EC/VID headers.
> 
> These flash headers have some more problems, like space wastage in MLC
> and an alignment problem for byte-addressable memory. Backward
> compatibility is a good idea, but it is possible to implement these
> features with higher performance by getting rid of the headers.
> It seemed a fair trade-off to me. But I am open to any better solution.

Well, you can get rid of them, but that needs to be optional.

And if you remove them then your log needs to take care of the EC/VID
tracking.

That means you write at least an ECC-sized chunk of data to the log for
every single transaction. IOW, you move the space waste partially from the
data blocks to the logging blocks.
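To put numbers on it, a small calculation under assumed sizes: 128KiB log eraseblocks, and log records padded to either a 512-byte ECC unit or a full 2KiB page (whichever the NAND/ECC layout forces). Log-block headers are ignored:

```python
def records_per_peb(peb_bytes, record_bytes):
    """Fixed-size log records that fit in one log eraseblock."""
    return peb_bytes // record_bytes


def log_pebs_used(transactions, peb_bytes, record_bytes):
    """Log eraseblocks filled by `transactions` records (rounded up)."""
    per_peb = records_per_peb(peb_bytes, record_bytes)
    return -(-transactions // per_peb)


PEB = 128 * 1024
for record in (512, 2048):
    print(record, records_per_peb(PEB, record), log_pebs_used(10_000, PEB, record))
# -> 512 256 40
# -> 2048 64 157
```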

If you do not, you lose reliability completely, which is not an option
at all.

Further, this will need a careful balance of how many log entries you
write before you need to create a compressed snapshot, simply because
the attach time increases linearly with the number of entries.
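One way to strike that balance is a simple threshold: once the log holds a given number of records, fold them into a new commit (snapshot) and start an empty log, so attach never replays more than that many records. A toy Python model; the class, its fields and the threshold policy are assumptions, not the UBIL implementation:

```python
class CommitLog:
    """Toy model of a commit (snapshot) plus an append-only log."""

    def __init__(self, max_records):
        self.max_records = max_records   # replay bound at attach time
        self.commit = {}                 # snapshot: leb -> peb
        self.log = []                    # (leb, peb) records since the commit
        self.commits_written = 0

    def record(self, leb, peb):
        self.log.append((leb, peb))
        if len(self.log) >= self.max_records:
            self.write_commit()

    def write_commit(self):
        """Fold the log into the snapshot and start an empty log."""
        for leb, peb in self.log:
            self.commit[leb] = peb
        self.log.clear()
        self.commits_written += 1

    def state(self):
        """Current mapping: snapshot with the log replayed on top."""
        eba = dict(self.commit)
        eba.update(self.log)             # later records override earlier ones
        return eba


cl = CommitLog(max_records=100)
for i in range(250):                     # 250 updates spread over 10 LEBs
    cl.record(i % 10, 1000 + i)
print(cl.commits_written, len(cl.log), cl.state()[3])   # -> 2 50 1243
```

A smaller `max_records` bounds the replay at attach but writes (and wears) the commit blocks more often; a larger one does the opposite.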

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  7:41     ` Artem Bityutskiy
  2010-05-12  8:03       ` Brijesh Singh
@ 2010-05-12  9:06       ` Thomas Gleixner
  2010-05-12  9:31         ` Artem Bityutskiy
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2010-05-12  9:06 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: Brijesh Singh, rohitvdongre, linux-mtd

On Wed, 12 May 2010, Artem Bityutskiy wrote:
> On Tue, 2010-05-11 at 21:17 +0200, Thomas Gleixner wrote:
> > 
> > Also chaining has a tradeoff. The more chains you need to walk the
> > closer you get to the point where you are equally bad as a full scan.
> 
> Well, every new chain member reduces the superblock wear speed by order
> 2, so I guess the chain would have 2-4 eraseblocks in most cases,
> which is not bad.
> 
> In contrast, moving the SB 3-4 eraseblocks further reduces the
> load merely by a factor of 3-4.

Right, but having the flexibility of moving the super block within the
first 16 or 32 blocks is not going to hurt the attach time
significantly. I'm not against the super block and chain design; I
merely fight fixed-address designs.
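The wear argument on both sides can be put into rough numbers. The model below is an assumption, not the JFFS3 or UBIL algorithm verbatim: every eraseblock has 64 pages (2KiB pages in a 128KiB PEB), one SB update consumes one page, a full chain block is relocated at the cost of one page in the level above, and the anchor is erased when it is full:

```python
def anchor_erase_interval(pages_per_peb, chain_len):
    """SB updates between erasures of the anchor PEB with `chain_len`
    chain PEBs between the anchor and the SB itself."""
    return pages_per_peb ** (chain_len + 1)


def rotate_erase_interval(pages_per_peb, num_sb_pebs):
    """SB updates between erasures of any one PEB when the SB simply
    rotates over `num_sb_pebs` PEBs without a chain."""
    return pages_per_peb * num_sb_pebs


PAGES = 64
print([anchor_erase_interval(PAGES, k) for k in range(3)])
# -> [64, 4096, 262144]
print([rotate_erase_interval(PAGES, n) for n in (1, 4, 32)])
# -> [64, 256, 2048]
```

Under these assumptions each chain level multiplies the anchor lifetime by the pages per PEB, while rotating the SB over N blocks only multiplies it by N; the two approaches can also be combined.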
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  9:06       ` Thomas Gleixner
@ 2010-05-12  9:31         ` Artem Bityutskiy
  0 siblings, 0 replies; 18+ messages in thread
From: Artem Bityutskiy @ 2010-05-12  9:31 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Brijesh Singh, rohitvdongre, linux-mtd

On Wed, 2010-05-12 at 11:06 +0200, Thomas Gleixner wrote:
> On Wed, 12 May 2010, Artem Bityutskiy wrote:
> > On Tue, 2010-05-11 at 21:17 +0200, Thomas Gleixner wrote:
> > > 
> > > Also chaining has a tradeoff. The more chains you need to walk the
> > > closer you get to the point where you are equally bad as a full scan.
> > 
> > Well, every new chain member reduces the superblock wear speed by order
> > 2, so I guess the chain would have 2-4 eraseblocks in most cases,
> > which is not bad.
> > 
> > In contrast, moving the SB 3-4 eraseblocks further reduces the
> > load merely by a factor of 3-4.
> 
> Right, but having the flexibility of moving the super block within the
> first 16 or 32 blocks is not going to hurt the attach time
> significantly. I'm not against the super block and chain design; I
> merely fight fixed-address designs.

Yeah, I guess it is not a big deal to shift the SB forward a bit if
needed.

It is not worth discussing further, but to make sure Brijesh is focused
on the most important things, I'd like to note that implementation-wise,
it is OK to have a constant defined as 1 for now, then later test that
everything works just fine when it is set to something else, and
optionally implement the SB searching function.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  9:02       ` Thomas Gleixner
@ 2010-05-12  9:46         ` Brijesh Singh
  0 siblings, 0 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12  9:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: rohitvdongre, linux-mtd, Artem Bityutskiy

On Wed, May 12, 2010 at 2:32 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> And if you remove them then your log needs to take care of the EC/VID
> tracking.
>
> That means you write at least an ECC-sized chunk of data to the log for
> every single transaction. IOW, you move the space waste partially from the
> data blocks to the logging blocks.
Right now we are doing this: writing ECC-sized data to the log for each update.
But we removed the EC and VID headers from the blocks, because they were not needed;
the log has all the data. Now I am trying to bring the EC and VID headers back as an option.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: UBIL design doc
  2010-05-12  8:35         ` Artem Bityutskiy
@ 2010-05-12  9:49           ` Brijesh Singh
  2010-05-12 10:01             ` Artem Bityutskiy
  2010-05-12 10:58             ` Thomas Gleixner
  0 siblings, 2 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12  9:49 UTC (permalink / raw)
  To: dedekind1; +Cc: Thomas Gleixner, linux-mtd, rohitvdongre

On Wed, May 12, 2010 at 2:05 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Wed, 2010-05-12 at 13:33 +0530, Brijesh Singh wrote:
>> Hi,
>>
>> On Wed, May 12, 2010 at 1:11 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>> > On Tue, 2010-05-11 at 21:17 +0200, Thomas Gleixner wrote:
>> >> On Mon, 10 May 2010, Artem Bityutskiy wrote:
>> >>
>> >> > On Sun, 2010-05-09 at 01:09 +0530, Brijesh Singh wrote:
>> >> > > Hi,
>> >> > >   I am forwarding you the design document for ubi with log. Please
>> >> > > find the ubil document at
>> >> > > http://git.infradead.org/users/brijesh/ubil_results/blob_plain/HEAD:/UBIL
>> >> > > design document.pdf
>> >>
>> >> @Brijesh, thanks for tackling this !
>> >>
>> >> > Hi guys,
>> >> >
>> >> > I've read the document. Looks very promising. Here is some feedback.
>> >> >
>> >> > 1. SB PEB wear-out. What if the eraseblock lifetime is, say, 10000
>> >> > erase cycles? Won't the SB PEB wear out very quickly? Why did you not
>> >> > go for the chaining approach which I described in the old JFFS3 design
>> >> > doc?
>> >> >
>> >> > If we do not implement chaining, we should at least design it and make
>> >> > sure UBIL can be extended later so that SB chaining could be added.
>> >>
>> >> The super block needs to be scanned for from the beginning of flash
>> >> anyway due to bad blocks. Putting it into a fixed position (first good
>> >> erase block) is a very bad design decision vs. wear leveling.
>> >>
>> >> The super block must be moveable like any other block, though we can
>> >> keep it as close to the start of flash as possible.
>> >>
>> >> Also chaining has a tradeoff. The more chains you need to walk the
>> >> closer you get to the point where you are equally bad as a full scan.
>> >
> >> > Well, every new chain member reduces the superblock wear speed by order
> >> > 2, so the chain would have 2-4 eraseblocks in most cases, I guess,
> >> > which is not bad.
>> >
> >> > In contrast, moving the SB 3-4 eraseblocks further reduces the
> >> > load merely by a factor of 3-4.
>> >
>> >> > 2. SB PEB at the end. I think this is a very bad idea. Imagine you have
>> >> > to make UBIL images for production in the factory. With your design you
>> >> > have the following drawbacks:
>> >> >
>> >> >   a. NAND flash has initial bad blocks, and you do not know how many,
>> >> >      and where they sit. These may be the last 8 eraseblocks. So, when
>> >> >      you prepare an image (say, with the ubinize user-space tool), where
>> >> >      will you put the second SB PEB?
>> >> >
>> >> >   b. Currently, UBI/UBIFS images are small. E.g., if you make a
>> >> >      UBI/UBIFS image for 1GiB flash, and you have just a few KiB of files,
>> >> >      your image will be a few megs - it will contain the files, and all
>> >> >      the needed UBI/UBIFS meta-data.
>> >> >
>> >> >      So what will the image size be for UBIL? 1GiB, and this is bad.
>> >> >      You will then have to transfer 1GiB of data to the devices during
>> >> >      flashing, or invent ways to work around this. Do you need
>> >> >      these complexities?
>> >> >
>> >> > I think the second SB PEB should not be at the end.
>> >>
>> >> I think we do not need a second SB at all. UBI should not depend on
>> >> the super block in any way. The super block is an optimization for the
>> >> common case - nothing more.
>> >
> >> > Yeah, if we preserve the headers we can always fall back to scanning
> >> > should something be broken.
>> >
>> >>
>> >> > 3. Backward-compatibility. In UBIL you removed EC and VID headers in
>> >> >    PEBs. That's fine for optimization purposes. But it has drawbacks:
>> >> >
>> >> >    a. If any of the UBIL meta-data blocks like SB, CMT or log are
>> >> >       corrupted - that's it - we are screwed. You cannot
>> >> >       reconstruct the data by scanning anymore. The robustness goes down.
>> >> >
>> >> >    b. Backward compatibility - UBI will not be able to attach UBIL
>> >> >       images. This is not very nice.
>> >> >
>> >> > So, I think you should keep EC and VID headers in PEBs. And you should
>> >> > make the SB/CMT/log blocks to be a new type of UBI volume with
>> >> > UBI_COMPAT_DELETE or UBI_COMPAT_PRESERVE or UBI_COMPAT_RO type. In this
>> >> > case UBI will attach UBIL volumes just fine.
>> >> >
>> >> > Then, you can add an _option_ to have no EC/VID headers in PEBs. This
>> >> > then can be used for performance, if one wants to sacrifice robustness.
>> >> > But this should be the second step. In this case, you will just need to
>> >> > put a VID header with UBI_COMPAT_REJECT flag to the first PEB.
>> >>
>> >> I don't think it's a good idea to kill the EC/VID headers. It not only
>> >> violates backwards compatibility, it also fundamentally weakens UBI's
>> >> reliability for no good reason, and I doubt that the performance win is
>> >> big enough to make it worth it.
>> >>
>> >> The performance gain is at attach time by getting rid of the flash
>> >> scan, but not by getting rid of writing the EC/VID headers.
>> >
>> > Well, there are some space savings as well.
>> >
>> >>
>> >> The logging is a speed up / optimization for the common case, but it
>> >> needs to preserve full reconstruction via scanning all eraseblocks and
>> >> checking the EC/VID headers. That also allows retrofitting on existing
>> >> devices.
>> >>
>> >> I'd rather see the super block / log volume as a checkpointing
>> >> mechanism which provides a snapshot of the EC/VID headers at a given
>> >> point and a list of eraseblocks which need to be scanned at attach
>> >> time.
>> >>
>> >>
>> >> That has two main advantages:
>> >>  1) It limits the number of log writes
>> >>  2) It allows full backward and forward compatibility
>> >
> >> > I think this is what they do, but for some reason they removed the
> >> > headers. If they add them back, it should look like you described.
>> >
>> > We should preserve the headers. It is always easy to disable them later,
>> > if someone needs this for optimization purposes. E.g., we can add an
>> > ubi_compat=0 option or something like that.
>> >
>> >> Looking at
>> >> http://git.infradead.org/users/brijesh/ubil_results/blob/HEAD:/nand_mount_time.pdf
>> >> I still see a linear - though less steep - attach time. For the 1GB
>> >> flash size it's still 0.8s which is nice progress vs. the 2s for the
>> >> non logging case. But that's surprising as one would expect that
>> >> logging would provide a more aggressive and non linear gain.
>> >>
>> >> Just doing the simple math:
>> >>
>> >> 1GB FLASH with erase block size 128K and page size 2k, that
>> >> translates to 8192 erase blocks
>> >>
>> >> So UBI scans 8192 erase block EC/VID headers in 2 seconds. That
>> >> equals 8192 FLASH pages.
>> >>
>> >> UBIL needs 0.8 seconds. That means that UBIL still scans ~3277 FLASH
>> >> pages (or spends the equivalent time) to achieve the same result.
>> >>
>> >> That looks wrong. Care to explain ?
>> >
> >> > I suspect these are implementation issues. I did not look at the code,
> >> > but I suspect they read the whole CMT block and populate all the EB
> >> > associations at scan time. However, they could populate them lazily,
> >> > which would optimize things.
>> I am trying to summarize what I have understood.
>> I will send the patches if this is correct.
>> 1) Commit will have EC and VID headers just as any other UBI block.
>> The compat flag helps with backward compatibility.
>> 2) Chained SB will locate the commit. It will be part of an internal volume as well.
>> 3) Commit will be called on unmount.
>> 4) Any unclean unmount will lead to flash scanning just as in UBI.
>
> No! Why you have the log then? Unclean reboots are handled by the log.
>
> Scanning happens only when you have _corrupted_ SB, or corrupted cmt, or
> log. Then you fall-back to scanning.
>
>> If anything goes bad, normal scanning becomes recovery.
>> 5) Not sure if the log is required in the first place. But it could be an option.
>> Is that correct?
>
> No, at least I did not suggest you to get rid of the log. It is needed
> to handle unclean reboots.
Log is written for each EC or VID change. The frequency of log writes is the
same as the frequency of these header writes. If we keep both, there will be
one log write penalty per write/erase. So write performance will drop considerably.

* Re: UBIL design doc
  2010-05-12  9:49           ` Brijesh Singh
@ 2010-05-12 10:01             ` Artem Bityutskiy
  2010-05-12 10:25               ` Brijesh Singh
  2010-05-12 10:58             ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Artem Bityutskiy @ 2010-05-12 10:01 UTC
  To: Brijesh Singh; +Cc: Thomas Gleixner, linux-mtd, rohitvdongre

On Wed, 2010-05-12 at 15:19 +0530, Brijesh Singh wrote:
> >> Any thing goes bad, normal scanning becomes recovery.
> >> 5) Not sure if log is required in first place. But it could be an option.
> >> Is that correct?
> >
> > No, at least I did not suggest you to get rid of the log. It is needed
> > to handle unclean reboots.
> Log is written for each EC or VID change.

Yes, I understand.

>  Frequency of log write is same as
> the frequency of these headers.

Right.

>  In case we keep both, there will be one log
> write penalty per write/erase.

Yes, each time you write to an unmapped LEB, you write the VID header
and a log entry.

>  So write performance will drop considerably.

Not sure about 'considerably'. This is to be tested. Keeping the headers
means 1 additional write per peb_size bytes, right? Plus just after
erase, EC header should be written. But this is the price you pay for
robustness and compatibility.
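A rough way to put numbers on this trade-off is to count program operations per eraseblock cycle. The sketch below is a toy model, not UBI code: it assumes the 2 KiB page / 128 KiB PEB geometry used elsewhere in this thread, no subpage writes (so the EC and VID headers each occupy and cost one page), and one log entry costing one program for every EC or VID change, as Brijesh describes. All names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 2 * 1024    # bytes per NAND page (geometry used in this thread)
PEB_SIZE = 128 * 1024   # bytes per physical eraseblock


@dataclass
class CycleCost:
    data_pages: int     # pages left for client data in one PEB
    extra_writes: int   # header + log program operations per PEB cycle

    @property
    def overhead(self) -> float:
        """Extra program operations per data page written."""
        return self.extra_writes / self.data_pages


def peb_cycle_cost(keep_headers: bool, use_log: bool) -> CycleCost:
    """Toy model of one erase -> map -> fill cycle of a single PEB."""
    pages = PEB_SIZE // PAGE_SIZE
    # Assumption: no subpage writes, so the EC header (written after the
    # erase) and the VID header (written when the LEB is mapped) each take
    # a whole page and cost one program operation.
    headers = 2 if keep_headers else 0
    # Assumption: one log entry per EC change and one per VID change,
    # each costing one program operation.
    log_entries = 2 if use_log else 0
    return CycleCost(data_pages=pages - headers,
                     extra_writes=headers + log_entries)


if __name__ == "__main__":
    for name, headers, log in [("UBI (headers only)", True, False),
                               ("UBIL (log only)", False, True),
                               ("headers + log", True, True)]:
        c = peb_cycle_cost(headers, log)
        print(f"{name:20} {c.extra_writes} extra programs / "
              f"{c.data_pages} data pages = {c.overhead:.1%}")
```

Under these assumptions, keeping both the headers and the log doubles the extra programs per eraseblock compared with either scheme alone (4 versus 2, against 62 or 64 data pages). Subpage writes and packing several log entries into one page would shrink those terms, which is why the real cost has to be measured.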

But again, it is very easy to switch off headers if this is needed,
isn't it?

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: UBIL design doc
  2010-05-12 10:01             ` Artem Bityutskiy
@ 2010-05-12 10:25               ` Brijesh Singh
  0 siblings, 0 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-12 10:25 UTC
  To: dedekind1; +Cc: Thomas Gleixner, linux-mtd, rohitvdongre

On Wed, May 12, 2010 at 3:31 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Wed, 2010-05-12 at 15:19 +0530, Brijesh Singh wrote:
>> >> Any thing goes bad, normal scanning becomes recovery.
>> >> 5) Not sure if log is required in first place. But it could be an option.
>> >> Is that correct?
>> >
>> > No, at least I did not suggest you to get rid of the log. It is needed
>> > to handle unclean reboots.
>> Log is written for each EC or VID change.
>
> Yes, I understand.
>
>>  Frequency of log write is same as
>> the frequency of these headers.
>
> Right.
>
>>  In case we keep both, there will be one log
>> write penalty per write/erase.
>
> Yes, each time you write to an unmapped LEB, you write the VID header
> and a log entry.
>
>>  So write performance will drop considerably.
>
> Not sure about 'considerably'. This is to be tested. Keeping the headers
> means 1 additional write per peb_size bytes, right? Plus just after
> erase, EC header should be written. But this is the price you pay for
> robustness and compatibility.
>
> But again, it is very easy to switch off headers if this is needed,
> isn't it?
Yes.

* Re: UBIL design doc
  2010-05-12  9:49           ` Brijesh Singh
  2010-05-12 10:01             ` Artem Bityutskiy
@ 2010-05-12 10:58             ` Thomas Gleixner
  2010-05-13  7:10               ` Brijesh Singh
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2010-05-12 10:58 UTC
  To: Brijesh Singh; +Cc: rohitvdongre, linux-mtd, dedekind1

On Wed, 12 May 2010, Brijesh Singh wrote:
> On Wed, May 12, 2010 at 2:05 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > On Wed, 2010-05-12 at 13:33 +0530, Brijesh Singh wrote:
> >> 4) Any unclean un-mount will lead to flash scanning just as UBI.
> >
> > No! Why you have the log then? Unclean reboots are handled by the log.
> >
> > Scanning happens only when you have _corrupted_ SB, or corrupted cmt, or
> > log. Then you fall-back to scanning.
> >
> >> Any thing goes bad, normal scanning becomes recovery.
> >> 5) Not sure if log is required in first place. But it could be an option.
> >> Is that correct?
> >
> > No, at least I did not suggest you to get rid of the log. It is needed
> > to handle unclean reboots.
>
> Log is written for each EC or VID change. Frequency of log write is same as
> the frequency of these headers. In case we keep both, there will be one log
> write penalty per write/erase. So write performance will drop considerably.

True, but reliability will drop as well. Losing a log block is
going to be fatal, as there is no way to reconstruct, whereas losing a
single block in UBI is not necessarily fatal.

Back then when UBI was designed / written we discussed a different
approach of avoiding the full flash scan while keeping the reliability
intact.

A superblock in the first couple of erase blocks points to a
snapshot block. The snapshot block(s) contain a compressed EC/VID header
snapshot. A defined number of blocks in that snapshot is marked as
NEED_SCAN. At the point of creating the snapshot these blocks are
empty and belong to the blocks with the lowest erase count.

Now when a UBI client (filesystem ...) requests an erase block, one of
those NEED_SCAN marked blocks is given out. Blocks which the client
hands back for erasure and which are not marked NEED_SCAN are erased,
but they are not given out as long as enough empty blocks marked
NEED_SCAN are still available. When we run out of NEED_SCAN marked
blocks, we write a new snapshot with a new set of NEED_SCAN blocks.
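The hand-out policy just described can be modelled in a few lines. This is a hypothetical toy (the class and method names are invented, and "erasing" only bumps an in-memory erase count); it shows which blocks become eligible for hand-out and when a new snapshot has to be written. It does not model wear-leveling moves, which is the gap Brijesh raises in his reply.

```python
from typing import Dict, Set


class NeedScanAllocator:
    """Toy model of the NEED_SCAN hand-out policy (hypothetical names)."""

    def __init__(self, erase_counts: Dict[int, int], n_need_scan: int):
        self.ec = dict(erase_counts)      # PEB number -> erase count
        self.free: Set[int] = set(self.ec)  # erased, unmapped PEBs
        self.n = n_need_scan              # size of each NEED_SCAN set
        self.need_scan: Set[int] = set()
        self.snapshots = 0                # number of snapshots written
        self._write_snapshot()

    def _write_snapshot(self) -> None:
        # Mark the n empty PEBs with the lowest erase counts as NEED_SCAN.
        by_wear = sorted(self.free, key=lambda p: (self.ec[p], p))
        self.need_scan = set(by_wear[: self.n])
        self.snapshots += 1

    def get_peb(self) -> int:
        """Hand out an empty NEED_SCAN PEB, writing a new snapshot if none is left."""
        candidates = self.free & self.need_scan
        if not candidates:
            self._write_snapshot()
            candidates = self.free & self.need_scan
            if not candidates:
                raise RuntimeError("no free eraseblocks left")
        peb = min(candidates, key=lambda p: (self.ec[p], p))
        self.free.remove(peb)
        return peb

    def put_peb(self, peb: int) -> None:
        """Erase a PEB returned by the client.

        A non-NEED_SCAN PEB becomes free but is not handed out again
        until a later snapshot marks it NEED_SCAN.
        """
        self.ec[peb] += 1
        self.free.add(peb)
```

Writing a new snapshot only when the NEED_SCAN pool is exhausted is what bounds both the snapshot write frequency and the number of blocks that must be scanned at attach time.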

So at attach time we read the snapshot and scan the few NEED_SCAN
blocks. They are either empty or assigned to a volume. If assigned,
they can replace an already existing logical erase block reference in
the snapshot; in that case we know that we need to put the original
physical erase block into a lazy background scan list.
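The attach step can be sketched as follows. This is a hypothetical illustration: `read_vid_header` stands in for one page read of a PEB's VID header, and using a sequence number to pick the newest copy when several scanned blocks claim the same LEB is an assumption borrowed from the `sqnum` field UBI already keeps in its VID headers; the mail above does not spell that detail out.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Leb = Tuple[int, int]              # (volume id, logical eraseblock number)
VidHeader = Tuple[int, int, int]   # (volume id, LEB number, sequence number)


def attach(snapshot: Dict[Leb, int],
           need_scan: Iterable[int],
           read_vid_header: Callable[[int], Optional[VidHeader]]):
    """Rebuild the LEB->PEB table from a snapshot plus its NEED_SCAN blocks.

    Returns (eba, free, lazy_scan): the LEB->PEB table, the NEED_SCAN
    PEBs that are still empty, and the superseded PEBs that should be
    checked by a lazy background scan.
    """
    eba = dict(snapshot)
    newest: Dict[Leb, Tuple[int, int]] = {}   # LEB -> (sqnum, PEB)
    free: List[int] = []
    lazy_scan: List[int] = []

    for peb in need_scan:
        hdr = read_vid_header(peb)            # one page read per block
        if hdr is None:
            free.append(peb)                  # still empty since the snapshot
            continue
        vol_id, lnum, sqnum = hdr
        leb = (vol_id, lnum)
        prev = newest.get(leb)
        if prev is None or sqnum > prev[0]:
            if prev is not None:
                lazy_scan.append(prev[1])     # older copy among scanned blocks
            newest[leb] = (sqnum, peb)
        else:
            lazy_scan.append(peb)

    for leb, (_, peb) in newest.items():
        old = eba.get(leb)
        if old is not None and old != peb:
            lazy_scan.append(old)             # snapshot entry was superseded
        eba[leb] = peb
    return eba, free, lazy_scan
```

Only the NEED_SCAN blocks are read here; every other mapping comes straight from the snapshot, so the attach cost does not grow with the amount written since the snapshot.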

With that approach we keep the reliability of UBI untouched with the
penalty of scanning a limited number of erase blocks at attach time.

That limits the number of writes to the snapshot / log
significantly. For devices with a low write frequency that means that
the snapshot block can be untouched for a very long time.

The speed penalty is constant and does not depend on the number of log
entries after the snapshot.

Your full log approach is going to be slower once the number of log
entries is greater than the number of NEED_SCAN marked blocks.

If we assume a page read time of 1ms and 64 NEED_SCAN blocks, then
we talk about a constant overhead of 64 ms.

So let's look at the full picture:

Flashsize:                   1 GiB
Eraseblocksize:            128 KiB
Pagesize:                    2 KiB
Subpagesize:                 1 KiB
Number of erase blocks:   8192

Snapshot size per block:    16 Byte
Full snapshot size:        128 KiB
Full snapshot pages:        64

Number of NEED_SCAN blocks: 64

Number of blocks to scan
for finding super block(s): 64

So with an assumption of page read time == 1ms, the total time of
building the initial data structures in RAM is 3 * 64ms = 192ms
(64 page reads each for the super block search, the snapshot and the
NEED_SCAN blocks).
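The arithmetic behind the figures above can be reproduced directly. This sketch uses only the numbers given in this mail plus the same assumed 1 ms page read time; the variable names are illustrative.

```python
KiB = 1024
GiB = 1024 ** 3

FLASH_SIZE = 1 * GiB
PEB_SIZE = 128 * KiB
PAGE_SIZE = 2 * KiB
SNAPSHOT_ENTRY = 16        # bytes of snapshot per eraseblock
NEED_SCAN_BLOCKS = 64      # each costs one page read (its VID header)
SB_SEARCH_BLOCKS = 64      # blocks probed to find a moveable super block
PAGE_READ_MS = 1.0         # assumed page read time

n_pebs = FLASH_SIZE // PEB_SIZE                     # number of erase blocks
snapshot_bytes = n_pebs * SNAPSHOT_ENTRY
snapshot_pages = -(-snapshot_bytes // PAGE_SIZE)    # ceiling division

attach_pages = SB_SEARCH_BLOCKS + snapshot_pages + NEED_SCAN_BLOCKS
attach_ms = attach_pages * PAGE_READ_MS

print(f"{n_pebs} PEBs, snapshot {snapshot_bytes // KiB} KiB "
      f"= {snapshot_pages} pages, attach ~{attach_ms:.0f} ms")
```

Because none of these terms depends on how many writes happened since the snapshot, the 192 ms figure is a fixed cost for this geometry.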

So yes, it _IS_ 3 times the time which we need for your log approach
(assuming that the super block is the first good block and the number of
log entries after the snapshot is 0).

So once we agree that a moveable super block is the correct way, the
speed advantage of your log approach is 64ms (still assuming that
the number of log entry pages is 0).

Now take the log entries into account. Once you have to read 64 pages
worth of log entries, which happens in the above example after exactly
128 entries (two 1 KiB subpage entries per 2 KiB page), the speed
advantage is exactly zero. From that point on it's going to be worse.

Thoughts ?

	 tglx

* Re: UBIL design doc
  2010-05-12 10:58             ` Thomas Gleixner
@ 2010-05-13  7:10               ` Brijesh Singh
  0 siblings, 0 replies; 18+ messages in thread
From: Brijesh Singh @ 2010-05-13  7:10 UTC
  To: Thomas Gleixner; +Cc: rohitvdongre, linux-mtd, dedekind1

Hi Thomas,
 Thanks for the idea. It looked impressive.

On Wed, May 12, 2010 at 4:28 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 12 May 2010, Brijesh Singh wrote:
>> On Wed, May 12, 2010 at 2:05 PM, Artem Bityutskiy <dedekind1@gmail.com> wrote:
>> > On Wed, 2010-05-12 at 13:33 +0530, Brijesh Singh wrote:
>> >> 4) Any unclean un-mount will lead to flash scanning just as UBI.
>> >
>> > No! Why you have the log then? Unclean reboots are handled by the log.
>> >
>> > Scanning happens only when you have _corrupted_ SB, or corrupted cmt, or
>> > log. Then you fall-back to scanning.
>> >
>> >> Any thing goes bad, normal scanning becomes recovery.
>> >> 5) Not sure if log is required in first place. But it could be an option.
>> >> Is that correct?
>> >
>> > No, at least I did not suggest you to get rid of the log. It is needed
>> > to handle unclean reboots.
>>
>> Log is written for each EC or VID change. Frequency of log write is same as
>> the frequency of these headers. In case we keep both, there will be one log
>> write penalty per write/erase. So write performance will drop considerably.
>
> True, but the reliability will drop as well. Losing a log block is
> going to be fatal as there is no way to reconstruct while losing a
> single block in UBI is not inevitably fatal.
>
> Back then when UBI was designed / written we discussed a different
> approach of avoiding the full flash scan while keeping the reliability
> intact.
>
> Superblock in the first couple of erase blocks which points to a
> snapshot block. snapshot block(s) contain a compressed EC/VID header
> snapshot. A defined number of blocks in that snapshot is marked as
> NEED_SCAN. At the point of creating the snapshot these blocks are
> empty and belong to the blocks with the lowest erase count.
>
> Now when an UBI client (filesystem ...) requests an erase block one of
> those NEED_SCAN marked blocks is given out. Blocks which are handed
> back from the client for erasure which are not marked NEED_SCAN are
> erased and not given out as long as there are still enough empty
> blocks marked NEED_SCAN available. When we run out of NEED_SCAN marked
> blocks we write a new snapshot with a new set of NEED_SCAN blocks.

This compromises wear-leveling. Also, erasing a block writes the EC
header to flash, so we won't be able to erase any non-NEED_SCAN blocks.
Only NEED_SCAN blocks can be erased after the snapshot is written, so
the wear-leveling thread will be inactive.
Problems:
1) If a block which is not a NEED_SCAN block is unmapped, how do we
erase it? We can't.
2) What if the wear-leveling threshold is hit? How do we move blocks?

> So at attach time we read the snapshot and scan the few NEED_SCAN
> blocks. They are either empty or assigned to a volume. If assigned
> they can replace an already existing logical erase block reference in
> the snapshot, so we know that we need to put the original physical
> erase block into a lazy back ground scan list.
>
> With that approach we keep the reliability of UBI untouched with the
> penalty of scanning a limited number of erase blocks at attach time.
>
> That limits the number of writes to the snapshot / log
> significantly. For devices with a low write frequency that means that
> the snapshot block can be untouched for a very long time.
>
> The speed penalty is constant and does not depend on the number of log
> entries after the snapshot.
>
> Your full log approach is going to be slower once the number of log
> entries is greater than the number of NEED_SCAN marked blocks.
>
> If we assume a page read time of 1ms and the number of NEED_SCAN
> blocks of 64, then we talk about a constant overhead of 64 ms.
>
> So let's look at the full picture:
>
> Flashsize:                   1 GiB
> Eraseblocksize:            128 KiB
> Pagesize:                    2 KiB
> Subpagesize:                 1 KiB
> Number of erase blocks:   8192
>
> Snapshot size per block:    16 Byte
> Full snapshot size:        128 KiB
> Full snapshot pages:        64
>
> Number of NEED_SCAN blocks: 64
>
> Number of blocks to scan
> for finding super block(s): 64
>
> So with an assumption of page read time == 1ms the total time of
> building the initial data structures in RAM is 3 * 64ms.
>
> So yes, it _IS_ 3 times the time which we need for your log approach
> (assumed that the super block is first good block and the number of
> log entries after the snapshot is 0)
>
> So once we agree that a moveable super block is the correct way, the
> speed advantage of your log approach is 64ms (still assumed that
> the number of log entry pages is 0)
>
> Now take the log entries into account. Once you have to read 64 pages
> worth of log entries, which happens in the above example after exactly
> 128 entries, the speed advantage is exactly zero. From that point on
> it's going to be worse.
>
> Thoughts ?
It is getting complicated. Should we fix backward compatibility first
and then come to these optimizations?

end of thread

Thread overview: 18+ messages
2010-05-08 19:39 UBIL design doc Brijesh Singh
2010-05-10  7:15 ` Artem Bityutskiy
2010-05-10 10:31   ` Brijesh Singh
2010-05-11 19:17   ` Thomas Gleixner
2010-05-12  7:03     ` Brijesh Singh
2010-05-12  7:14       ` Brijesh Singh
2010-05-12  9:02       ` Thomas Gleixner
2010-05-12  9:46         ` Brijesh Singh
2010-05-12  7:41     ` Artem Bityutskiy
2010-05-12  8:03       ` Brijesh Singh
2010-05-12  8:35         ` Artem Bityutskiy
2010-05-12  9:49           ` Brijesh Singh
2010-05-12 10:01             ` Artem Bityutskiy
2010-05-12 10:25               ` Brijesh Singh
2010-05-12 10:58             ` Thomas Gleixner
2010-05-13  7:10               ` Brijesh Singh
2010-05-12  9:06       ` Thomas Gleixner
2010-05-12  9:31         ` Artem Bityutskiy