All of lore.kernel.org
 help / color / mirror / Atom feed
* UBIFS: file data corruption during the power cut-off test
@ 2019-06-06  9:10 Sergei Poselenov
  2019-06-06 14:55 ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-06  9:10 UTC (permalink / raw)
  To: linux-mtd

Hello,

We have an embedded system based on i.MX 6ULL, with the root filesystem deployed on a UBI volume built on top of the SLC NAND device. The volume is mounted as "rw,sync,relatime".

We are doing the following test:

   - Copy the same file (/bin/busybox) to a different file names, in a sequence.

   - Power is cut off during the copy sequence.

   - After reboot, UBI mounts OK, and we see a number of valid copies of /bin/busybox (md5sum matches), and the last file with the short length (partially written), as expected.

The question is: should we expect the last partially-written file to have the valid data of the original? We observe that quite often, the last 512 or so bytes in the short file doesn't match the original.

Any thoughts are welcome.

Regards,
Sergei

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-06  9:10 UBIFS: file data corruption during the power cut-off test Sergei Poselenov
@ 2019-06-06 14:55 ` Richard Weinberger
  2019-06-06 18:08   ` Sergei Poselenov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-06 14:55 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

On Thu, Jun 6, 2019 at 11:10 AM Sergei Poselenov <sposelenov@emcraft.com> wrote:
>
> Hello,
>
> We have an embedded system based on i.MX 6ULL, with the root filesystem deployed on a UBI volume built on top of the SLC NAND device. The volume is mounted as "rw,sync,relatime".
>
> We are doing the following test:
>
>    - Copy the same file (/bin/busybox) to a different file names, in a sequence.
>
>    - Power is cut off during the copy sequence.
>
>    - After reboot, UBI mounts OK, and we see a number of valid copies of /bin/busybox (md5sum matches), and the last file with the short length (partially written), as expected.
>
> The question is: should we expect the last partially-written file to have the valid data of the original? We observe that quite often, the last 512 or so bytes in the short file doesn't match the original.

Hm, I don't fully understand your question.
If you copy files and this is interrupted, the copy is not complete
and therefore contents are missing.

-- 
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-06 14:55 ` Richard Weinberger
@ 2019-06-06 18:08   ` Sergei Poselenov
  2019-06-06 18:13     ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-06 18:08 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard,


On Thu, 6 Jun 2019 16:55:27 +0200 Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Thu, Jun 6, 2019 at 11:10 AM Sergei Poselenov <sposelenov@emcraft.com> wrote:
> >
> > Hello,
> >
> > We have an embedded system based on i.MX 6ULL, with the root filesystem deployed on a UBI volume built on top of the SLC NAND device. The volume is mounted as "rw,sync,relatime".
> >
> > We are doing the following test:
> >
> >    - Copy the same file (/bin/busybox) to a different file names, in a sequence.
> >
> >    - Power is cut off during the copy sequence.
> >
> >    - After reboot, UBI mounts OK, and we see a number of valid copies of /bin/busybox (md5sum matches), and the last file with the short length (partially written), as expected.
> >
> > The question is: should we expect the last partially-written file to have the valid data of the original? We observe that quite often, the last 512 or so bytes in the short file doesn't match the original.  
> 
> Hm, I don't fully understand your question.
> If you copy files and this is interrupted, the copy is not complete
> and therefore contents are missing.
> 

This is understood. However, on the file length that is written to the partition, I'd expect that the file content will be the same as in the original file. This is not so.
Is it expected, or is it a deficiency of UBI?

Thanks!

Regards,
Sergei

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-06 18:08   ` Sergei Poselenov
@ 2019-06-06 18:13     ` Richard Weinberger
  2019-06-07 14:23       ` Sergei Poselenov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-06 18:13 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

On Thu, Jun 6, 2019 at 8:08 PM Sergei Poselenov <sposelenov@emcraft.com> wrote:
> This is understood. However, on the file length that is written to the partition, I'd expect that the file content will be the same as in the original file. This is not so.
> Is it expected, or is it a deficiency of UBI?

Please show in detail what you are doing, on syscall level, and what
the expected output is.

-- 
Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-06 18:13     ` Richard Weinberger
@ 2019-06-07 14:23       ` Sergei Poselenov
  2019-06-07 16:01         ` Steve deRosier
  2019-06-07 16:02         ` Richard Weinberger
  0 siblings, 2 replies; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-07 14:23 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard,

On Thu, 6 Jun 2019 20:13:07 +0200 Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Thu, Jun 6, 2019 at 8:08 PM Sergei Poselenov <sposelenov@emcraft.com> wrote:
> > This is understood. However, on the file length that is written to the partition, I'd expect that the file content will be the same as in the original file. This is not so.
> > Is it expected, or is it a deficiency of UBI?  
> 
> Please show in detail what you are doing, on syscall level, and what
> the expected output is.
> 

Here is my test:
/mnt/data1 # for i in `seq 0 99`; do
> dd if=/bin/busybox of=test${i}; done

During the  test, the board is powered off.

After bootup:
/mnt/data1 # ls -l
-rw-r--r--    1 root     root        396795 Jan  8 23:58 test0
-rw-r--r--    1 root     root        396795 Jan  8 23:58 test1
-rw-r--r--    1 root     root        113664 Jan  8 23:58 test2
/mnt/data1 #
/mnt/data1 # md5sum /bin/busybox *
23376319de62934c3859615e6244aae0  /bin/busybox
23376319de62934c3859615e6244aae0  test0
23376319de62934c3859615e6244aae0  test1
87cd1d1a28896e63eb039d75efef43a3  test2

As expected, test2 is shorter, hence has a different checksum.

Now, I' overwriting the content of the "test0" with "test2", without "test0" truncation:
mnt/data1 # dd if=test2 of=test0 conv=notrunc
222+0 records in
222+0 records out
113664 bytes (111.0KB) copied, 0.859668 seconds, 129.1KB/s
/mnt/data1 # md5sum *
23376319de62934c3859615e6244aae0  test0
23376319de62934c3859615e6244aae0  test1
87cd1d1a28896e63eb039d75efef43a3  test2
/mnt/data1 #

For this particular test pass, the content of test2 matches the content of the original file test0, so overwriting test0 with test2 doesn't change the data.

However, upon retry of the very same test from the beginning (with the power cut-off in the middle) it's easily to have the content of test2 (exactly the last 512 bytes in my case) which doesn't match test0, so "dd if=test2 of=test0 conv=notrunc" will result in test0 with a different checksum.

To remind, the partition is mounted with the "sync" option.

Thanks!

Regards,
Sergei


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-07 14:23       ` Sergei Poselenov
@ 2019-06-07 16:01         ` Steve deRosier
  2019-06-09  8:32           ` Sergei Poselenov
  2019-06-07 16:02         ` Richard Weinberger
  1 sibling, 1 reply; 16+ messages in thread
From: Steve deRosier @ 2019-06-07 16:01 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: Richard Weinberger, linux-mtd

On Fri, Jun 7, 2019 at 7:24 AM Sergei Poselenov <sposelenov@emcraft.com> wrote:
>
> Hello Richard,
>
> On Thu, 6 Jun 2019 20:13:07 +0200 Richard Weinberger <richard.weinberger@gmail.com> wrote:
>
> > On Thu, Jun 6, 2019 at 8:08 PM Sergei Poselenov <sposelenov@emcraft.com> wrote:
> > > This is understood. However, on the file length that is written to the partition, I'd expect that the file content will be the same as in the original file. This is not so.
> > > Is it expected, or is it a deficiency of UBI?
> >
> > Please show in detail what you are doing, on syscall level, and what
> > the expected output is.
> >
>
> Here is my test:
...
>
> However, upon retry of the very same test from the beginning (with the power cut-off in the middle) it's easily to have the content of test2 (exactly the last 512 bytes in my case) which doesn't match test0, so "dd if=test2 of=test0 conv=notrunc" will result in test0 with a different checksum.
>

IMHO, your test is invalid and it's your expectations that are wrong.
The file didn't finish writing because you did a power-cut. If I had
to guess, those exactly "last 512 bytes", are the size of your page or
subpage on the NAND flash, and I'd bet they're filled with 0xFF.
Unlike other filesystem media, writing flash media is done in pages,
where they're erased and then written, and erasing and writing is slow
and complex process.

If I had to continue my guessing - the valid portion of the file test2
that was successfully written is not a multiple of your NAND's page
size.  Likely you've got 2Kb pages with 4 512 byte subpages.  The last
page of that flash that was written for that file wrote three of the
four subpages.  When you `dd` the file overwrite the existing file,
you corrupt it yourself by using the no-trim option - for each page
from the start of test0, it erases, writes the page from test2, until
it gets to the last page of test2 where it erases the page, writes
three subpages, and leaves the last subpage as erased, but now you've
got invalid data in the middle of your file because you don't trim the
size to the write and so the erased data is now part of your file.

You're seeing a hardware effect and expecting a software result.

Simple fact is - a power cut when writing a large file, even with sync
on, will result in an invalid (short) file.  UBIFS (nor ANY
filesystem) can not protect against that. UBIFS is doing it's job by
making sure your filesystem is still usable after the power cut
despite it being in the middle of a write.  Which, since your system
is booting and you're not posting any kernel logs showing corrupted
filesystem, it seems to me that UBIFS is doing what it is supposed to.

If you want to understand more, the mtd website is a good start, and
you should absolutely read all the datasheets and app notes for the
flash device and the NAND interface you're using.

- Steve

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-07 14:23       ` Sergei Poselenov
  2019-06-07 16:01         ` Steve deRosier
@ 2019-06-07 16:02         ` Richard Weinberger
  2019-06-08  6:42           ` Sergei Poselenov
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-07 16:02 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

----- Ursprüngliche Mail -----
> Now, I' overwriting the content of the "test0" with "test2", without "test0"
> truncation:
> mnt/data1 # dd if=test2 of=test0 conv=notrunc
> 222+0 records in
> 222+0 records out
> 113664 bytes (111.0KB) copied, 0.859668 seconds, 129.1KB/s
> /mnt/data1 # md5sum *
> 23376319de62934c3859615e6244aae0  test0
> 23376319de62934c3859615e6244aae0  test1
> 87cd1d1a28896e63eb039d75efef43a3  test2
> /mnt/data1 #
> 
> For this particular test pass, the content of test2 matches the content of the
> original file test0, so overwriting test0 with test2 doesn't change the data.
> 
> However, upon retry of the very same test from the beginning (with the power
> cut-off in the middle) it's easily to have the content of test2 (exactly the
> last 512 bytes in my case) which doesn't match test0, so "dd if=test2 of=test0
> conv=notrunc" will result in test0 with a different checksum.
> 
> To remind, the partition is mounted with the "sync" option.

Ahhh, now understand what you do. Sorry for being dense.

I fear your assumption is not correct because UBIFS is strictly copy-on-write.
So if you overwrite a file, it will always create new data blocks. A data block
has 4k. If you face a power-cut from this 4k block the last written page can be
lost. This is likely what you see.

Does my answer help?

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-07 16:02         ` Richard Weinberger
@ 2019-06-08  6:42           ` Sergei Poselenov
  2019-06-08  8:46             ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-08  6:42 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard,

On Fri, 7 Jun 2019 18:02:16 +0200 (CEST) Richard Weinberger <
richard@nod.at> wrote:

> ----- Ursprüngliche Mail -----
> > Now, I' overwriting the content of the "test0" with "test2",
> > without "test0"
> > truncation:
> > mnt/data1 # dd if=test2 of=test0 conv=notrunc
> > 222+0 records in
> > 222+0 records out
> > 113664 bytes (111.0KB) copied, 0.859668 seconds, 129.1KB/s
> > /mnt/data1 # md5sum *
> > 23376319de62934c3859615e6244aae0  test0
> > 23376319de62934c3859615e6244aae0  test1
> > 87cd1d1a28896e63eb039d75efef43a3  test2
> > /mnt/data1 #
> > 
> > For this particular test pass, the content of test2 matches the
> > content of the
> > original file test0, so overwriting test0 with test2 doesn't change
> > the data.
> > 
> > However, upon retry of the very same test from the beginning (with
> > the power
> > cut-off in the middle) it's easily to have the content of test2
> > (exactly the
> > last 512 bytes in my case) which doesn't match test0, so "dd
> > if=test2 of=test0
> > conv=notrunc" will result in test0 with a different checksum.
> > 
> > To remind, the partition is mounted with the "sync" option.  
> 
> Ahhh, now understand what you do. Sorry for being dense.
> 
> I fear your assumption is not correct because UBIFS is strictly copy-
> on-write.
> So if you overwrite a file, it will always create new data blocks. A
> data block
> has 4k. If you face a power-cut from this 4k block the last written
> page can be
> lost. This is likely what you see.
> 
> Does my answer help?
> 

Not actually, I'm afraid. I don't see how overwriting a file is
relevant here. Definitely, I don't do power cut during "dd if=test2
of=test0 conv=notrunc". At the moment of power cut-off, test2 has been
created from scratch and has some data written to it, with more writes in progress.

Overwriting with "dd if=test2 of=test0 conv=notrunc" is just an easy
way to demonstrate that the last written file has invalid data. I still
don't understand why is that? It's not the dd operation itself that
corrupts test0. It's test2 which already has invalid data.

It looks like the directory entry was updated before the data was
actually written to NAND, regardless of the "sync" mount option?

Just in case, here is my sequence:
repeat:
0. cd /mnt/data1; rm -rf *
1. /mnt/data1 # for i in `seq 0 99`; do
> dd if=/bin/busybox of=test${i}; done  
3. Power is cut during the above "for" loop
4. After boot-up, check the last written file:
/mnt/data1 # ls -l
-rw-r--r--    1 root     root        396795 Jan  8 23:58 test0
-rw-r--r--    1 root     root        396795 Jan  8 23:58 test1
-rw-r--r--    1 root     root        113664 Jan  8 23:58 test2

/mnt/data1# dd if=test2 of=test0 conv=notrunc
/mnt/data1# md5sum *
Result:Sometimes, "test0" has invalid data, which means that "test2"
has corrupted data on its written length.
5. goto repeat;


Update: I just re-done the test. Here is what I have after the system
recovery after the power cut-off:
~ # ls -l /mnt/data1
-rw-r--r--    1 root     root        430519 Jan  1 00:00 test0
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test1
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test2
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test3
-rw-r--r--    1 root     root        286720 Jan  1 00:04 test4

Now, my test:
/mnt/data1 # md5sum *
0dd9133f519bbee46ac593490b9d8805  test0
0dd9133f519bbee46ac593490b9d8805  test1
0dd9133f519bbee46ac593490b9d8805  test2
0dd9133f519bbee46ac593490b9d8805  test3
76d6ffbaad4220ac474484bf09b1bbb8  test4
/mnt/data1 # dd if=test4 of=test0 conv=notrunc
/mnt/data1 # md5sum *
81c3dcc4f2dfd65592ad245a6976091d  test0
0dd9133f519bbee46ac593490b9d8805  test1
0dd9133f519bbee46ac593490b9d8805  test2
0dd9133f519bbee46ac593490b9d8805  test3
76d6ffbaad4220ac474484bf09b1bbb8  test4

OK, test4 is corrupted:
# hexdump test0 > test0.hex
# hexdump test4 > test4.hex
# hexdump /bin/busybox > busybox.hex
# diff -pu busybox.hex test0.hex | more
--- busybox.hex	2019-06-07 21:24:16.000000000 +0300
+++ test0.hex	2019-06-07 21:24:28.000000000 +0300
@@ -17852,38 +17852,8 @@
 0045dd0 0004 e1a0 81f0 e8bd 9ba0 0007 3014 e59f
 0045de0 4010 e92d 4000 e593 0008 e594 ffa9 ebff
 0045df0 0008 e584 8010 e8bd 9ba0 0007 0000 e151
-0045e00 40f8 e92d 3000 31a0 4000 e1a0 4001 31a0
-0045e10 1003 31a0 0001 e1a0 ff8d ebff 3044 e59f
-0045e20 5000 e3a0 6000 e593 7000 e1a0 0007 e154
-0045e30 000a 8a00 3004 e596 3001 e243 0003 e154
-0045e40 0006 8a00 0004 e1a0 ff81 ebff 3000 e5d0
-0045e50 4001 e280 000a e353 5001 0285 fff2 eaff
-0045e60 0005 e1a0 80f8 e8bd 9ba0 0007 3000 e3e0
-0045e70 40f7 e92d 5001 e1a0 3000 e581 1000 e5d0
-0045e80 4000 e1a0 002e e351 3130 059f 4001 0280
-0045e90 6000 0593 0008 0596 0038 0a00 0027 e351
-0045ea0 0013 1a00 3001 e5d0 4002 e280 2041 e243
-0045eb0 0019 e352 3020 9283 3073 96ef 3061 e243
-0045ec0 3073 e6ef 0019 e353 0039 8a00 20ec e59f
-0045ed0 3044 e283 2000 e592 1103 e792 0000 e351
-0045ee0 0033 0a00 0000 e592 ffc3 ebff 0000 e585
-0045ef0 002f ea00 002f e351 0019 1a00 6001 e280
-0045f00 0006 e1a0 f7c7 ebfe 4000 e1a0 1004 e066
-0045f10 0006 e1a0 ff36 ebfe 3000 e5d4 2001 e3a0
-0045f20 002f e353 3094 e59f 6000 e1a0 7000 e593
-0045f30 1006 e1a0 0008 e597 3002 e1a0 4001 0284
-0045f40 ff6d ebff 1000 e250 0002 0a00 0000 e597
-0045f50 ffa9 ebff 0000 e585 0006 e1a0 f7b4 ebfe
-0045f60 0013 ea00 0024 e351 0008 1a00 304c e59f
-0045f70 4001 e280 6000 e593 0004 e596 0001 e240
-0045f80 fee5 ebff 1000 e1a0 0000 e596 ffd5 eaff
-0045f90 3030 e241 0009 e353 0005 8a00 3004 e28d
-0045fa0 101c e59f 2005 e1a0 fa0e ebfe 3004 e59d
-0045fb0 4003 e084 0004 e1a0 d00c e28d 80f0 e8bd
-0045fc0 9ba0 0007 dd97 0006 4038 e92d 302c e59f
-0045fd0 5000 e3a0 3000 e593 4034 e593 0058 e593
-0045fe0 4002 e244 0004 e155 0002 2a00 ff29 ebff
-0045ff0 5001 e285 fffa eaff 4038 e8bd ff14 eaff
+0045e00 0000 0000 0000 0000 0000 0000 0000 0000
+*
 0046000 9ba0 0007 40f8 e92d 306c e59f 6000 e1a0
 0046010 7001 e1a0 5000 e593 4003 e1a0 0000 e356
 0046020 0008 da00 0000 e357 0058 e595 0001 aa00

You see, exactly 512 bytes at 0x45e00 are zeroes. That's the test4
file:
# hexdump test4.hex
...
0045dd0 0004 e1a0 81f0 e8bd 9ba0 0007 3014 e59f
0045de0 4010 e92d 4000 e593 0008 e594 ffa9 ebff
0045df0 0008 e584 8010 e8bd 9ba0 0007 0000 e151
0045e00 0000 0000 0000 0000 0000 0000 0000 0000
*
0046000
#

Thanks!

Regards,
Sergei


> Thanks,
> //richard
> 


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-08  6:42           ` Sergei Poselenov
@ 2019-06-08  8:46             ` Richard Weinberger
  2019-06-09  8:18               ` Sergei Poselenov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-08  8:46 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

----- Ursprüngliche Mail -----
> Not actually, I'm afraid. I don't see how overwriting a file is
> relevant here. Definitely, I don't do power cut during "dd if=test2
> of=test0 conv=notrunc". At the moment of power cut-off, test2 has been
> created from scratch and has some data written to it, with more writes in
> progress.

I'm sorry, but I'm really confused now.

Please share your real test case, with real results and what you expect.
No guesswork, no simplified pseudo code. :-)

Otherwise we have little chance to sort this out.
Also read what Steve wrote.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-08  8:46             ` Richard Weinberger
@ 2019-06-09  8:18               ` Sergei Poselenov
  2019-06-09  8:53                 ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-09  8:18 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard,

Thank you for your patience. OK, I'll try to describe my real test case
once again, below.

1. On my system, /mnt/data1 is a UBI partition mounted with the "sync"
option on NAND device.
2. Initially, the partition is empty.
3. I do:
/mnt/data1 # for i in `seq 0 99`; do
> dd if=/bin/busybox of=test${i}; done
4. I do a power cut-off during the loop above.
5. After restoring the power, the system reboots and recovers the UBI
partition successfully. Here is what I have there now:
~ # ls -l /mnt/data1
-rw-r--r--    1 root     root        430519 Jan  1 00:00 test0
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test1
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test2
-rw-r--r--    1 root     root        430519 Jan  1 00:04 test3
-rw-r--r--    1 root     root        286720 Jan  1 00:04 test4
6. The test0-test3 files above are exact replicas of the initial image
/bin/busybox. "test4" is incompletely written, as expected.
7. Let's check the data at the end of the last partially written file
test4:
# hexdump test4.hex
...
0045dd0 0004 e1a0 81f0 e8bd 9ba0 0007 3014 e59f
0045de0 4010 e92d 4000 e593 0008 e594 ffa9 ebff
0045df0 0008 e584 8010 e8bd 9ba0 0007 0000 e151
0045e00 0000 0000 0000 0000 0000 0000 0000 0000
*
0046000
# 
As you can see, the last 512 bytes starting from 0x0045e00 are zeroes.

Let's check the data by this offset in the original file /bin/busybox:
   0045dd0 0004 e1a0 81f0 e8bd 9ba0 0007 3014 e59f
   0045de0 4010 e92d 4000 e593 0008 e594 ffa9 ebff
   0045df0 0008 e584 8010 e8bd 9ba0 0007 0000 e151
-> 0045e00 40f8 e92d 3000 31a0 4000 e1a0 4001 31a0
   0045e10 1003 31a0 0001 e1a0 ff8d ebff 3044 e59f
   0045e20 5000 e3a0 6000 e593 7000 e1a0 0007 e154
   0045e30 000a 8a00 3004 e596 3001 e243 0003 e154
   0045e40 0006 8a00 0004 e1a0 ff81 ebff 3000 e5d0
   0045e50 4001 e280 000a e353 5001 0285 fff2 eaff
   0045e60 0005 e1a0 80f8 e8bd 9ba0 0007 3000 e3e0
   0045e70 40f7 e92d 5001 e1a0 3000 e581 1000 e5d0
   0045e80 4000 e1a0 002e e351 3130 059f 4001 0280
...


Now, the question: why are these 512 bytes of zeros in test4?

Extrapolating to a real use case, consider a system writing a log file
to the UBI partition. Power cutoff, recovery, and we have a garbage at
the end of the log file. Is this expected?

Thanks!

Regards,
Sergei
On Sat, 2019-06-08 at 10:46 +0200, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Not actually, I'm afraid. I don't see how overwriting a file is
> > relevant here. Definitely, I don't do power cut during "dd if=test2
> > of=test0 conv=notrunc". At the moment of power cut-off, test2 has
> > been
> > created from scratch and has some data written to it, with more
> > writes in
> > progress.
> 
> I'm sorry, but I'm really confused now.
> 
> Please share your real test case, with real results and what you
> expect.
> No guesswork, no simplified pseudo code. :-)
> 
> Otherwise we have little chance to sort this out.
> Also read what Steve wrote.
> 
> Thanks,
> //richard
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-07 16:01         ` Steve deRosier
@ 2019-06-09  8:32           ` Sergei Poselenov
  2019-06-09 15:25             ` Steve deRosier
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-09  8:32 UTC (permalink / raw)
  To: Steve deRosier; +Cc: Richard Weinberger, linux-mtd

Hello Steve,

Please see my comment below.

On Fri, 2019-06-07 at 09:01 -0700, Steve deRosier wrote:
> On Fri, Jun 7, 2019 at 7:24 AM Sergei Poselenov <
> sposelenov@emcraft.com> wrote:
> > Hello Richard,
> > 
> > On Thu, 6 Jun 2019 20:13:07 +0200 Richard Weinberger <
> > richard.weinberger@gmail.com> wrote:
> > 
> > > On Thu, Jun 6, 2019 at 8:08 PM Sergei Poselenov <
> > > sposelenov@emcraft.com> wrote:
> > > > This is understood. However, on the file length that is written
> > > > to the partition, I'd expect that the file content will be the
> > > > same as in the original file. This is not so.
> > > > Is it expected, or is it a deficiency of UBI?
> > > 
> > > Please show in detail what you are doing, on syscall level, and
> > > what
> > > the expected output is.
> > > 
> > 
> > Here is my test:
> ...
> > However, upon retry of the very same test from the beginning (with
> > the power cut-off in the middle) it's easily to have the content of
> > test2 (exactly the last 512 bytes in my case) which doesn't match
> > test0, so "dd if=test2 of=test0 conv=notrunc" will result in test0
> > with a different checksum.
> > 
> 
> IMHO, your test is invalid and it's your expectations that are wrong.
> The file didn't finish writing because you did a power-cut. If I had
> to guess, those exactly "last 512 bytes", are the size of your page
> or
> subpage on the NAND flash, and I'd bet they're filled with 0xFF.
Actually, in that last subpage I've seen 512 bytes of zeroes, or some
other data, but never 0xff.
> Unlike other filesystem media, writing flash media is done in pages,
> where they're erased and then written, and erasing and writing is
> slow
> and complex process.
> 
> If I had to continue my guessing - the valid portion of the file
> test2
> that was successfully written is not a multiple of your NAND's page
> size.  Likely you've got 2Kb pages with 4 512 byte subpages.  The
> last
> page of that flash that was written for that file wrote three of the
> four subpages.  When you `dd` the file overwrite the existing file,
Looks like you are right, what I'm seeing is that only 3 of 4 512-bytes 
subpages written correctly.

So, you are saying that the NAND controller (or the kernel device
driver?) returned "success" for the "4K page write" operation, while
that wasn't actually true?

Thanks!

Regards,
Sergei


> you corrupt it yourself by using the no-trim option - for each page
> from the start of test0, it erases, writes the page from test2, until
> it gets to the last page of test2 where it erases the page, writes
> three subpages, and leaves the last subpage as erased, but now you've
> got invalid data in the middle of your file because you don't trim
> the
> size to the write and so the erased data is now part of your file.
> 
> You're seeing a hardware effect and expecting a software result.
> 
> Simple fact is - a power cut when writing a large file, even with
> sync
> on, will result in an invalid (short) file.  UBIFS (nor ANY
> filesystem) can not protect against that. UBIFS is doing it's job by
> making sure your filesystem is still usable after the power cut
> despite it being in the middle of a write.  Which, since your system
> is booting and you're not posting any kernel logs showing corrupted
> filesystem, it seems to me that UBIFS is doing what it is supposed
> to.
> 
> If you want to understand more, the mtd website is a good start, and
> you should absolutely read all the datasheets and app notes for the
> flash device and the NAND interface you're using.
> 
> - Steve
> 


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-09  8:18               ` Sergei Poselenov
@ 2019-06-09  8:53                 ` Richard Weinberger
  2019-06-09  9:59                   ` Sergei Poselenov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-09  8:53 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

----- Ursprüngliche Mail -----
> Now, the question: why are these 512 bytes of zeros in test4?
> 
> Extrapolating to a real use case, consider a system writing a log file
> to the UBI partition. Power cutoff, recovery, and we have a garbage at
> the end of the log file. Is this expected?

Well, zero bytes are not garbage. ;)
But yes, this is is expected. In your case, I guess, the inode size
does not match exactly what is present on flash.
So the file size is 512 bytes too large, if you read too far, 0x0 bytes are returned.
During journal replay UBIFS tries to fixup file lengths but is approach
does not work in all cases perfectly.

This case is actually documented, please see:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-09  8:53                 ` Richard Weinberger
@ 2019-06-09  9:59                   ` Sergei Poselenov
  2019-06-09 10:00                     ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-09  9:59 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

On Sun, 2019-06-09 at 10:53 +0200, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Now, the question: why are these 512 bytes of zeros in test4?
> > 
> > Extrapolating to a real use case, consider a system writing a log
> > file
> > to the UBI partition. Power cutoff, recovery, and we have a garbage
> > at
> > the end of the log file. Is this expected?
> 
> Well, zero bytes are not garbage. ;)

I've seen some data in the last 512 bytes as well.

> But yes, this is is expected. In your case, I guess, the inode size
> does not match exactly what is present on flash.
> So the file size is 512 bytes too large, if you read too far, 0x0
> bytes are returned.
> During journal replay UBIFS tries to fixup file lengths but is
> approach
> does not work in all cases perfectly.
> 
> This case is actually documented, please see:
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics
> 
Thank you for the pointer. I'll look.

Regards,
Sergei
> Thanks,
> //richard
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-09  9:59                   ` Sergei Poselenov
@ 2019-06-09 10:00                     ` Richard Weinberger
  2019-06-15  7:26                       ` Sergei Poselenov
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2019-06-09 10:00 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linux-mtd

----- Ursprüngliche Mail -----
>> Well, zero bytes are not garbage. ;)
> 
> I've seen some data in the last 512 bytes as well.

That would be a bug, unless you are overwriting an existing
file.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-09  8:32           ` Sergei Poselenov
@ 2019-06-09 15:25             ` Steve deRosier
  0 siblings, 0 replies; 16+ messages in thread
From: Steve deRosier @ 2019-06-09 15:25 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: Richard Weinberger, linux-mtd

Hi Sergei,

On Sun, Jun 9, 2019 at 1:32 AM Sergei Poselenov <sposelenov@emcraft.com> wrote:
>
> Hello Steve,
>
> Please see my comment below.
>

> > If I had to continue my guessing - the valid portion of the file
> > test2
> > that was successfully written is not a multiple of your NAND's page
> > size.  Likely you've got 2Kb pages with 4 512 byte subpages.  The
> > last
> > page of that flash that was written for that file wrote three of the
> > four subpages.  When you `dd` the file overwrite the existing file,
> Looks like you are right, what I'm seeing is that only 3 of 4 512-bytes
> subpages written correctly.
>
> So, you are saying that the NAND controller (or the kernel device
> driver?) returned "success" for the "4K page write" operation, while
> that wasn't actually true?
>
>

No, that's not what I'm saying.  I'm saying the NAND was written
exactly as you specified.  You basically said: "write these bytes from
this page, ignoring the fact that the space I'm writing to is too
long". Flash gets erased and written in pages and subpages
(respectively). My suspicion is that your test case itself causes the
issue.  UBIFS, nor any filesystem will protect a file from getting
corrupted when the power goes out when you write it. It also can't
protect you from purposely corrupting a file with you test-case. The
purpose of UBIFS's power-cut tolerance is to be sure the filesystem
itself doesn't corrupt and can still boot.

While I'm still pretty sure your test case is the cause of the
corruption itself, I tried the basics (minus the power cut) to test a
part of my theory. Namely the dd of the partial page with notrunc is
the source of your problem.  On my platform, on a single test, I
couldn't replicate what I think is happening.  However, that's hardly
conclusive because I _know_ I'm using different hardware and software
stack than you.

So, let's start back at the basics:

* What is your processor hardware? You said "based on i.MX 6ULL", but
let's be 100% specific.
* What is your NAND chip?
* What is the layout of the NAND chip (sectors, pages, subpages)?
* What ECC level are you using?
* What version of Linux are you using (all three x.y.z, preferably
with a reference to the actual git branch you're using, vendor or
stock)
* What NAND controller driver are you using?
* What NAND chip driver are you using?

And finally:
* Do you see overall UBIFS corruption? In other words, when the device
boots, do you see it unable to correct a problem caused by the power
cut?

Read the link Richard sent you. The basic rule is: "...applications
should not assume anything about the contents of files which were not
synchronized before a power-cut has happened. " Having -osync on
doesn't mean your file was synced if the power-cut comes in the middle
of a write. It just means that the OS is going to do the sync
automatically (per the semantics) so you don't have to issue a sync
command.

I think UBIFS is behaving as designed and intended and your test-case
and expectations are flawed.  However, please give the asked for
details, actual logs and data dumps and if there's a bug here it'll
get looked at.

- Steve

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: UBIFS: file data corruption during the power cut-off test
  2019-06-09 10:00                     ` Richard Weinberger
@ 2019-06-15  7:26                       ` Sergei Poselenov
  0 siblings, 0 replies; 16+ messages in thread
From: Sergei Poselenov @ 2019-06-15  7:26 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd

Hello Richard, (Steve),

On Sun, 2019-06-09 at 12:00 +0200, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > > Well, zero bytes are not garbage. ;)
> > 
> > I've seen some data in the last 512 bytes as well.
> 
> That would be a bug, unless you are overwriting an existing
> file.
> 
I did more tests and was unable to see a non-zero data at the end of
the last file after recovery. Apparently, I was mistaken.

The UBI recovery worked fine every time after the power cut-off.

Thank you and Steve for your comments.

Regards,
Sergei
> Thanks,
> //richard
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/


______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-06-15  7:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06  9:10 UBIFS: file data corruption during the power cut-off test Sergei Poselenov
2019-06-06 14:55 ` Richard Weinberger
2019-06-06 18:08   ` Sergei Poselenov
2019-06-06 18:13     ` Richard Weinberger
2019-06-07 14:23       ` Sergei Poselenov
2019-06-07 16:01         ` Steve deRosier
2019-06-09  8:32           ` Sergei Poselenov
2019-06-09 15:25             ` Steve deRosier
2019-06-07 16:02         ` Richard Weinberger
2019-06-08  6:42           ` Sergei Poselenov
2019-06-08  8:46             ` Richard Weinberger
2019-06-09  8:18               ` Sergei Poselenov
2019-06-09  8:53                 ` Richard Weinberger
2019-06-09  9:59                   ` Sergei Poselenov
2019-06-09 10:00                     ` Richard Weinberger
2019-06-15  7:26                       ` Sergei Poselenov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.