* UBIFS power cut issues
@ 2009-09-02  9:35 JiSheng Zhang
  2009-09-08  6:22 ` Artem Bityutskiy
  0 siblings, 1 reply; 10+ messages in thread
From: JiSheng Zhang @ 2009-09-02  9:35 UTC (permalink / raw)
  To: linux-mtd

Hi list,

If we cut power while copying a file into UBIFS, then remount UBIFS and
try to read the file, we find that from some offset onward the data
differs from the data of the original file at the same offset.
Is this a bug in UBIFS?

PS: How do you test the data integrity of UBIFS under power loss?

Thanks in advance,
Jisheng


* Re: UBIFS power cut issues
  2009-09-02  9:35 UBIFS power cut issues JiSheng Zhang
@ 2009-09-08  6:22 ` Artem Bityutskiy
  2009-09-09  9:45   ` JiSheng Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2009-09-08  6:22 UTC (permalink / raw)
  To: JiSheng Zhang; +Cc: linux-mtd

Hi,

sorry for the late answer, I was very busy.

On Wed, 2009-09-02 at 17:35 +0800, JiSheng Zhang wrote:
> If we cut power when copy file into ubifs, then remount ubifs and try
> to read the file, we found that the data at some offset of the file
> began different from the data of the original file at the same offset.
> Is this a bug of ubifs?

This is expected behavior on any asynchronous FS. You may switch to
synchronous behavior with '-o sync' mount option. I wrote a lot of
docs about write-back and the related issues. Dig UBIFS docs and FAQ.
E.g.:

http://www.linux-mtd.infradead.org/faq/ubifs.html#L_empty_file
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writeback

If you have a _specific_ question, feel free to ask, of course. But
for this general question I do not have a better answer than RTFM
:-)))

> PS:how do you test data integrity of ubifs under power loss?

We mostly checked it using either 'integck' (see mtd-utils) or
'fsstress' (see LTP). We ran those tests and cut power off at random
points using these devices:

http://www.cpscom.com/gprod/ipn.htm

Then we mounted the FS. We did not really check the contents of the
FS, because that is tricky to do, but we checked that it mounts,
re-mounts, and that files are readable/writable/deletable.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: UBIFS power cut issues
  2009-09-08  6:22 ` Artem Bityutskiy
@ 2009-09-09  9:45   ` JiSheng Zhang
  2009-09-09 10:06     ` Artem Bityutskiy
  2009-09-10 15:42     ` Artem Bityutskiy
  0 siblings, 2 replies; 10+ messages in thread
From: JiSheng Zhang @ 2009-09-09  9:45 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Hi Artem,

2009/9/8 Artem Bityutskiy <dedekind1@gmail.com>:
> Hi,
>
> sorry for late answer, was very busy.
>
> On Wed, 2009-09-02 at 17:35 +0800, JiSheng Zhang wrote:
>> If we cut power when copy file into ubifs, then remount ubifs and try
>> to read the file, we found that the data at some offset of the file
>> began different from the data of the original file at the same offset.
>> Is this a bug of ubifs?
>
> This is expected behavior on any asynchronous FS. You may switch to
> synchronous behavior with '-o sync' mount option. I wrote a lot of

I have tested with "mount -o sync" and the result is the same. The file
is not empty. For example:
cp fileA /mnt/ubifs/fileB
cut power at a random point before "cp" completes,
then remount.
From the head of /mnt/ubifs/fileB up to some offset offsetC, the data is
the same as fileA. But from offsetC to the end it differs from fileA at
the same offsets, and it is not empty either.
I hope I have expressed myself clearly.

> docs about write-back and the related issues. Dig UBIFS docs and FAQ.
> E.g.:
>
> http://www.linux-mtd.infradead.org/faq/ubifs.html#L_empty_file
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writeback
>
> If you have a _specific_ question, feel free to ask, of course. But
> for this general question I do not have a better answer than RTFM
> :-)))
>
>> PS:how do you test data integrity of ubifs under power loss?
>
> We mostly checked it using either 'integck' (see mtd-utils) or
> 'fsstress' (see LTP). We ran those tests and cut power off at random
> points using these devices:
>
> http://www.cpscom.com/gprod/ipn.htm
>
> Then we mounted the FS. We did not really check the contents of the
> FS, because it is not simple and tricky, but we checked that it mounts,
> re-mounts, and files are readable/writable/deletable.

Thanks for this information.

Regards,
Jisheng


* Re: UBIFS power cut issues
  2009-09-09  9:45   ` JiSheng Zhang
@ 2009-09-09 10:06     ` Artem Bityutskiy
  2009-09-11  9:23       ` JiSheng Zhang
  2009-09-10 15:42     ` Artem Bityutskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2009-09-09 10:06 UTC (permalink / raw)
  To: JiSheng Zhang; +Cc: linux-mtd

On 09/09/2009 12:45 PM, JiSheng Zhang wrote:
>> On Wed, 2009-09-02 at 17:35 +0800, JiSheng Zhang wrote:
>>> If we cut power when copy file into ubifs, then remount ubifs and try
>>> to read the file, we found that the data at some offset of the file
>>> began different from the data of the original file at the same offset.
>>> Is this a bug of ubifs?
>>
>> This is expected behavior on any asynchronous FS. You may switch to
>> synchronous behavior with '-o sync' mount option. I wrote a lot of
>
> I have tested with "mount -o sync", the result is the same. It's not
> empty file. For example:
> cp fileA /mnt/ubifs/fileB
> random cut power before "cp" completed.
> then remount
>  From head of /mnt/ubifs/fileB to some offset offsetC is the same as
> fileA. But from offsetC to the end is different from fileA at the same
> offset offsetC, it's not empty either.
> Hope I expressed myself clearly.

Hmm, ok. What is your kernel version?

Could you please take a closer look and see if these differences
are zeroes or not?

Do you have an automated test for this? Can you share your script?
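
A rough sketch of how such a check could be automated, in case it helps.
It compares the original data with the copy read back after the power
cut and reports whether the deviating tail is all zeroes (a hole),
a truncation, or real garbage. The function names are made up for this
example, and the byte-by-byte loop is only meant for small test files.

```python
def first_difference(original: bytes, copy: bytes):
    """Return the first offset where the two buffers differ, or None."""
    limit = min(len(original), len(copy))
    for offset in range(limit):
        if original[offset] != copy[offset]:
            return offset
    # One buffer is a prefix of the other: they differ where the shorter ends.
    return None if len(original) == len(copy) else limit


def classify_tail(original: bytes, copy: bytes) -> str:
    """Describe how `copy` deviates from `original` after a power cut."""
    offset = first_difference(original, copy)
    if offset is None:
        return "identical"
    if offset >= len(copy):
        return f"truncated at offset {offset}"
    tail = copy[offset:]
    if tail.count(0) == len(tail):
        return f"zero-filled (hole) from offset {offset}"
    return f"non-zero mismatch from offset {offset}"


def check_files(original_path: str, copy_path: str) -> str:
    """Convenience wrapper: read both files fully and classify the copy."""
    with open(original_path, "rb") as a, open(copy_path, "rb") as b:
        return classify_tail(a.read(), b.read())
```

With the example from this thread one would call
check_files("fileA", "/mnt/ubifs/fileB") after the remount.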

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: UBIFS power cut issues
  2009-09-09  9:45   ` JiSheng Zhang
  2009-09-09 10:06     ` Artem Bityutskiy
@ 2009-09-10 15:42     ` Artem Bityutskiy
  2009-09-10 16:00       ` Bill Gatliff
  2009-09-11  9:33       ` JiSheng Zhang
  1 sibling, 2 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2009-09-10 15:42 UTC (permalink / raw)
  To: JiSheng Zhang; +Cc: linux-mtd

On Wed, 2009-09-09 at 17:45 +0800, JiSheng Zhang wrote:
> Hi Artem,
> 
> 2009/9/8 Artem Bityutskiy <dedekind1@gmail.com>:
> > Hi,
> >
> > sorry for late answer, was very busy.
> >
> > On Wed, 2009-09-02 at 17:35 +0800, JiSheng Zhang wrote:
> >> If we cut power when copy file into ubifs, then remount ubifs and try
> >> to read the file, we found that the data at some offset of the file
> >> began different from the data of the original file at the same offset.
> >> Is this a bug of ubifs?
> >
> > This is expected behavior on any asynchronous FS. You may switch to
> > synchronous behavior with '-o sync' mount option. I wrote a lot of
> 
> I have tested with "mount -o sync", the result is the same. It's not
> empty file. For example:
> cp fileA /mnt/ubifs/fileB
> random cut power before "cp" completed.
> then remount
> From head of /mnt/ubifs/fileB to some offset offsetC is the same as
> fileA. But from offsetC to the end is different from fileA at the same
> offset offsetC, it's not empty either.
> Hope I expressed myself clearly.

I believe you have zeroes at the end. These are actually holes. And this
is actually expected. I've added these pieces of documentation for you:

http://www.linux-mtd.infradead.org/faq/ubifs.html#L_end_hole
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics

And the text here, just in case someone would review it.

UBIFS in synchronous mode vs JFFS2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When UBIFS is mounted in synchronous mode (-o sync mount options) - all
file system operations become synchronous. This means that all data are
written to flash before the file-system operations return.

For example, if you write 10MiB of data to a file f.dat using the
write() call, and UBIFS is in synchronous mode, then UBIFS guarantees
that all 10MiB of data and the meta-data (file size and date changes)
will reach the flash media before write() returns. And if a power cut
happens after the write() call returns, the file will contain the
written data.

The same is true for situations when f.dat was opened with O_SYNC or
has the sync flag (see man 1 chattr).
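
As an illustration only, an application could request this per-file
behaviour roughly as follows. This is a sketch using the standard
O_SYNC flag and fsync(); the helper name write_durably is invented for
the example. Note that it only covers the "after write() returns" case
and says nothing about what is on flash if power is cut in the middle.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write `data` to `path`, returning only after the kernel reports it synced."""
    # O_SYNC asks the kernel to make every write() on this descriptor synchronous.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        # Redundant when O_SYNC is set, but needed if the flag is ever dropped.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Dropping O_SYNC and keeping only the final fsync() gives a single
synchronization point at the end instead of one per write() call.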

It is well-known that the JFFS2 file-system is synchronous (except a
small write-buffer). However, UBIFS in synchronous mode is not the same
as JFFS2 and provides somewhat fewer guarantees than JFFS2 does with
respect to sudden power cuts.

In JFFS2 all the meta-data (like inode atime/mtime/ctime, inode size,
UID/GID, etc) are stored in the data node headers. Data nodes carry 4KiB
of (compressed) data. This means that the meta-data information is
duplicated in many places, but this also means that every time JFFS2
writes a data node to the flash media, it updates inode size as well.

In practice this means that JFFS2 will write these 10MiB of data
sequentially, from the beginning to the end. And if you have a power
cut, you will just loose some amount of data at the end of the inode.
For example, if JFFS2 starts writing those 10MiB of data, writes 5MiB,
and a power cut happens, you will end up with a 5MiB f.dat file. You
loose only the last 5MiB.

Things are a little bit more complex in case of UBIFS, where data are
stored in data nodes and meta-data are stored in (separate) inode nodes.
The meta-data are not duplicated in each data node, like in JFFS2. Let's
consider an example.

      * User creates an empty file f.dat. The file is synchronous, or
        UBIFS is mounted in synchronous mode. User calls the write()
        function with a 10MiB buffer.
      * The kernel first copies all 10MiB of the data to the page cache.
        Inode size is changed to 10MiB as well and the inode is marked
        as dirty. Nothing has been written to the flash media so far. If
        a power cut happens at this point, the user will end up with an
        empty f.dat file.
      * UBIFS sees that the I/O has to be synchronous, and starts
        synchronizing the inode. First of all, it writes the inode node
        to the flash media. If a power cut happens at this moment, the
        user will end up with a 10MiB file which contains no data
        (hole), and if he reads this file, he will get 10MiB of zeroes.
      * UBIFS starts writing the data. If a power cut happens at this
        point, the user will end up with a 10MiB file containing a hole
        at the end.

Note, if the I/O was not synchronous, UBIFS would skip the last step and
would just return. And the actual write-back would then happen in
back-ground. But power cuts during write-back could anyway lead to files
with holes at the end.
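
The difference can be made concrete with a toy model. This is only a
sketch of the behaviour described above, not of either implementation:
the function names are invented, NODE_SIZE is borrowed from the 4KiB
JFFS2 data node figure, and "nodes_written" simply counts how many data
nodes reached the flash before the power cut.

```python
NODE_SIZE = 4096  # assumed data-node payload size for this toy model


def jffs2_after_cut(data: bytes, nodes_written: int) -> bytes:
    """JFFS2-like: every data node carries the size, so the file is truncated."""
    return data[: nodes_written * NODE_SIZE]


def ubifs_after_cut(data: bytes, inode_written: bool, nodes_written: int) -> bytes:
    """UBIFS-like: the inode node (with the full size) goes first, data follows."""
    if not inode_written:
        return b""  # nothing reached the flash yet: empty file
    kept = data[: nodes_written * NODE_SIZE]
    # Data nodes that never reached the flash read back as zeroes (a hole).
    return kept + bytes(len(data) - len(kept))


if __name__ == "__main__":
    payload = b"\x01" * (10 * NODE_SIZE)
    print(len(jffs2_after_cut(payload, 5)))        # shorter file
    print(len(ubifs_after_cut(payload, True, 5)))  # full size, zero tail
```

In this model the JFFS2-like file ends where the data ends, while the
UBIFS-like file always has the final size once the inode node is on
flash, with zeroes standing in for whatever was not written.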

Thus, synchronous I/O in UBIFS provides fewer guarantees than JFFS2 I/O:
UBIFS can leave holes at the end of files. In an ideal world,
applications should not assume anything about the contents of files
which were not synchronized before a power cut happened. And
"mainstream" file-systems like ext3 do not provide JFFS2-like
guarantees.

However, UBIFS is sometimes used as a JFFS2 replacement and people may
want it to behave the same way as JFFS2 if it is mounted synchronously.
This is doable, but needs some non-trivial development, so it has not
been implemented so far. On the other hand, there was no strong demand.
You may implement this as an exercise, or you may try to convince UBIFS
authors to do this.


-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: UBIFS power cut issues
  2009-09-10 15:42     ` Artem Bityutskiy
@ 2009-09-10 16:00       ` Bill Gatliff
  2009-09-11  8:01         ` Artem Bityutskiy
  2009-09-11  9:33       ` JiSheng Zhang
  1 sibling, 1 reply; 10+ messages in thread
From: Bill Gatliff @ 2009-09-10 16:00 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, JiSheng Zhang

Artem Bityutskiy wrote:
> And the text here, just in case someone would review it.
>   

When you mean "something is lost", the correct spelling is "lose".  To 
"loose" means to "disconnect", or "release" something.


> However, UBIFS is sometimes used as a JFFS2 replacement and people may
> want it to behave the same way as JFFS2 if it is mounted synchronously.
> This is doable, but needs some non-trivial development, so this was not
> implemented so far. On the other hand, there was no strong demand. You
> may implement this as an excercise, or you may try to convince UBIFS
> authors to do this.
>   

In summary, the differences in results between JFFS2 and UBIFS in the 
case of interrupted, large synchronous writes are related to differences 
in how the two store and/or compute file sizes?

Based on your documentation, my understanding is that with JFFS2 file 
sizes are stored along with the file data nodes, and are updated as the 
file grows in size--- so an interruption truncates the file at the point 
the interruption occurs.  For UBIFS, in contrast, file sizes are stored 
in separate nodes which might not have been written at the point of 
interruption--- so the state of the file when power is restored depends 
highly upon the precise moment that the interruption occurs.



b.g.

-- 
Bill Gatliff
bgat@billgatliff.com


* Re: UBIFS power cut issues
  2009-09-10 16:00       ` Bill Gatliff
@ 2009-09-11  8:01         ` Artem Bityutskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2009-09-11  8:01 UTC (permalink / raw)
  To: Bill Gatliff; +Cc: linux-mtd, JiSheng Zhang

On 09/10/2009 07:00 PM, Bill Gatliff wrote:
> Artem Bityutskiy wrote:
>> And the text here, just in case someone would review it.
>
> When you mean "something is lost", the correct spelling is "lose". To
> "loose" means to "disconnect", or "release" something.

Thanks, fixed:

http://git.infradead.org/mtd-www.git/commit/8b407024f4b8377eae6557644c29a31bc20e1350

>> However, UBIFS is sometimes used as a JFFS2 replacement and people may
>> want it to behave the same way as JFFS2 if it is mounted synchronously.
>> This is doable, but needs some non-trivial development, so this was not
>> implemented so far. On the other hand, there was no strong demand. You
>> may implement this as an excercise, or you may try to convince UBIFS
>> authors to do this.
>
> In summary, the differences in results between JFFS2 and UBIFS in the
> case of interrupted, large synchronous writes are related to differences
> in how the two store and/or compute file sizes?

Yes. JFFS2 stores inode size in data nodes. So every time it writes the
data node to the flash, it updates the inode size. When JFFS2 mounts the
flash, it does full scanning, finds the last written data node and thus,
it has correct inode size.

UBIFS does not store file size in data nodes, but stores it in separate
inode nodes, pretty much like any FS does. And UBIFS does not do scanning.
This is where the difficulties come from.

> Based on your documentation, my understanding is that with JFFS2 file
> sizes are stored along with the file data nodes, and are updated as the
> file grows in size--- so an interruption truncates the file at the point
> the interruption occurs.

Right.

> For UBIFS, in contrast, file sizes are stored
> in separate nodes which might not have been written at the point of
> interruption--- so the state if the file when power is restored depends
> highly upon the precise moment that the interruption occurs.

Not exactly. UBIFS never writes data nodes beyond the on-flash inode size.
If it has to write a data node and the data node is beyond the on-flash inode
size (the in-memory inode has the up-to-date size, but it is dirty and has not
been flushed yet), then UBIFS first writes the inode to the media, and then it
starts writing the data. And if you have an interruption, you _lose_ data
nodes and you have holes (or old data nodes, if you are overwriting).

If you want to know why UBIFS never writes beyond the inode size, you may
take a look at file.c; there is a comment explaining this:

/*
  * When writing-back dirty inodes, VFS first writes-back pages belonging to the
  * inode, then the inode itself. For UBIFS this may cause a problem. Consider a
  * situation when a we have an inode with size 0, then a megabyte of data is
  * appended to the inode, then write-back starts and flushes some amount of the
  * dirty pages, the journal becomes full, commit happens and finishes, and then
  * an unclean reboot happens. When the file system is mounted next time, the
  * inode size would still be 0, but there would be many pages which are beyond
  * the inode size, they would be indexed and consume flash space. Because the
  * journal has been committed, the replay would not be able to detect this
  * situation and correct the inode size. This means UBIFS would have to scan
  * whole index and correct all inode sizes, which is long an unacceptable.
  *
  * To prevent situations like this, UBIFS writes pages back only if they are
  * within the last synchronized inode size, i.e. the size which has been
  * written to the flash media last time. Otherwise, UBIFS forces inode
  * write-back, thus making sure the on-flash inode contains current inode size,
  * and then keeps writing pages back.
...
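
To make the rule from that comment concrete, here is a toy sketch of the
ordering it implies. It is not the UBIFS code: the ToyInode class, its
fields and the exact boundary check are assumptions made up for the
example, and "flash" is just a list recording the order of writes.

```python
class ToyInode:
    """Toy model: tracks in-memory size vs. the size last written to flash."""

    def __init__(self):
        self.size = 0         # in-memory size, possibly dirty
        self.synced_size = 0  # size stored in the last on-flash inode node
        self.log = []         # order in which nodes reach the pretend flash

    def write_inode(self):
        # Writing the inode node makes the on-flash size current.
        self.synced_size = self.size
        self.log.append(("inode", self.size))

    def write_back_page(self, offset: int, length: int):
        # Rule from the quoted comment: never write data beyond the
        # last synchronized size; force inode write-back first.
        if offset + length > self.synced_size:
            self.write_inode()
        self.log.append(("data", offset))


if __name__ == "__main__":
    inode = ToyInode()
    inode.size = 8192  # 8KiB appended in memory, nothing on flash yet
    inode.write_back_page(0, 4096)
    inode.write_back_page(4096, 4096)
    print(inode.log)
```

The inode node always lands before the first data node that lies beyond
the old size, which is why a cut in between leaves a hole rather than
indexed data past the end of the file.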

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)


* Re: UBIFS power cut issues
  2009-09-09 10:06     ` Artem Bityutskiy
@ 2009-09-11  9:23       ` JiSheng Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: JiSheng Zhang @ 2009-09-11  9:23 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd

Hi Artem,

2009/9/9 Artem Bityutskiy <dedekind1@gmail.com>:
> On 09/09/2009 12:45 PM, JiSheng Zhang wrote:
>>>
>>> On Wed, 2009-09-02 at 17:35 +0800, JiSheng Zhang wrote:
>>>>
>>>> If we cut power when copy file into ubifs, then remount ubifs and try
>>>> to read the file, we found that the data at some offset of the file
>>>> began different from the data of the original file at the same offset.
>>>> Is this a bug of ubifs?
>>>
>>> This is expected behavior on any asynchronous FS. You may switch to
>>> synchronous behavior with '-o sync' mount option. I wrote a lot of
>>
>> I have tested with "mount -o sync", the result is the same. It's not
>> empty file. For example:
>> cp fileA /mnt/ubifs/fileB
>> random cut power before "cp" completed.
>> then remount
>>  From head of /mnt/ubifs/fileB to some offset offsetC is the same as
>> fileA. But from offsetC to the end is different from fileA at the same
>> offset offsetC, it's not empty either.
>> Hope I expressed myself clearly.
>
> Hmm, ok. What is your kernel version?
>
> Could you please take a closer look and see if these differences
> are zeroes or not?

My mistake, sorry. I have looked from that offset to the end of the file,
and the bytes really are all 0, i.e. it is a file hole.
>
> Do you have an automated test for this? Can you share your script?

Hmm, I just run the copy manually and diff the files once mounted again.
>
> --
> Best Regards,
> Artem Bityutskiy (Артём Битюцкий)
>

Best Regards,
Jisheng


* Re: UBIFS power cut issues
  2009-09-10 15:42     ` Artem Bityutskiy
  2009-09-10 16:00       ` Bill Gatliff
@ 2009-09-11  9:33       ` JiSheng Zhang
  2009-09-11 10:06         ` Artem Bityutskiy
  1 sibling, 1 reply; 10+ messages in thread
From: JiSheng Zhang @ 2009-09-11  9:33 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

Hi Artem,


2009/9/10 Artem Bityutskiy <dedekind1@gmail.com>:
>
>      * User creates an empty file f.dat. The file is synchronous, or
>        UBIFS is mounted in synchronous mode. User calls the write()
>        function with a 10MiB buffer.
>      * The kernel first copies all 10MiB of the data to the page cache.
>        Inode size is changed to 10MiB as well and the inode is marked
>        as dirty. Nothing has been written to the flash media so far. If
>        a power cut happens at this point, the user will end up with an
>        empty f.dat file.
>      * UBIFS sees that the I/O has to be synchronous, and starts
>        synchronizing the inode. First of all, it writes the inode node
>        to the flash media. If a power cut happens at this moment, the
>        user will end up with a 10MiB file which contains no data
>        (hole), and if he read this file, he will get 10MiB of zeroes.
>      * UBIFS starts writing the data. If a power cut happens at this
>        point, the user will end up with a 10MiB file containing a hole
>        at the end.
>
> Note, if the I/O was not synchronous, UBIFS would skip the last step and
> would just return. And the actual write-back would then happen in
> back-ground. But power cuts during write-back could anyway lead to files
> with holes at the end.

Thanks very much for this excellent document, I like it very much.
>
> Thus, synchronous I/O in UBIFS provides less guarantees than JFFS2 I/O -
> UBIFS has an effect of holes at the end of files. In ideal world
> applications should not assume anything about the contents of files
> which were not synchronized before a power-cut has happened. And
> "mainstream" file-systems like ext3 do not provide JFSS2-like
> guarantees.
>
> However, UBIFS is sometimes used as a JFFS2 replacement and people may
> want it to behave the same way as JFFS2 if it is mounted synchronously.
> This is doable, but needs some non-trivial development, so this was not
> implemented so far. On the other hand, there was no strong demand. You
> may implement this as an excercise, or you may try to convince UBIFS
> authors to do this.

Hmmm, this behavior (a hole at the end of the file) is acceptable.

Thanks again,
Jisheng


* Re: UBIFS power cut issues
  2009-09-11  9:33       ` JiSheng Zhang
@ 2009-09-11 10:06         ` Artem Bityutskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2009-09-11 10:06 UTC (permalink / raw)
  To: JiSheng Zhang; +Cc: linux-mtd

On 09/11/2009 12:33 PM, JiSheng Zhang wrote:
>> However, UBIFS is sometimes used as a JFFS2 replacement and people may
>> want it to behave the same way as JFFS2 if it is mounted synchronously.
>> This is doable, but needs some non-trivial development, so this was not
>> implemented so far. On the other hand, there was no strong demand. You
>> may implement this as an excercise, or you may try to convince UBIFS
>> authors to do this.
>
> Hmmm, this style(there's hole at the end of file) can be accepted.

Also note, due to an MM bug, the pages are sometimes written not exactly
in order. Adrian made a patch for this, but the patch has not yet made
it upstream:

http://marc.info/?l=linux-kernel&m=125233252015797&w=2

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
