* Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 12:31 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 12:31 UTC (permalink / raw) To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Hi, Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly reporting file system corruptions. OpenWrt uses overlay(fs) with squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate & describe test case for reproducing corruption when doing a power cut after first boot. Interestingly it cannot be reproduced on all devices (NAND dependant? arch dependant?!). I couldn't reproduce that problem on none of my Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up"). FWIW I was told it also affects: Asus RT-AC58U (ARCH_IPQ40XX=y) powerpc RB493G, DIR-860L (ATH79=y) Steps to reproduce the problem: 1) Flash firmware 2) Boot (for the first time) 3) Let the init script copy config files from lowerdir to the upperdir 4) Wait for boot to finish 5) Verify content of some unmodified config on overlay, using either: hexdump -C /etc/config/dropbear hexdump -C /overlay/upper/etc/config/dropbear 6) Power cut & boot again 7) Check the content of the same file After above regressing commit the later check confirms the file size looks correct but it's filled with all 00-es only. Can I ask you to check if there is something possibly wrong with the above ovl commit? Or does it expose some problem with the ubifs? Or maybe the whole UBI? FWIW testing above commit (and one before it) always results in single error in the kernel log: [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice That UBIFS error doesn't occur with 4.12.14. Unfortunately it's impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 12:31 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 12:31 UTC (permalink / raw) To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Hi, Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly reporting file system corruptions. OpenWrt uses overlay(fs) with squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate & describe test case for reproducing corruption when doing a power cut after first boot. Interestingly it cannot be reproduced on all devices (NAND dependant? arch dependant?!). I couldn't reproduce that problem on none of my Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up"). FWIW I was told it also affects: Asus RT-AC58U (ARCH_IPQ40XX=y) powerpc RB493G, DIR-860L (ATH79=y) Steps to reproduce the problem: 1) Flash firmware 2) Boot (for the first time) 3) Let the init script copy config files from lowerdir to the upperdir 4) Wait for boot to finish 5) Verify content of some unmodified config on overlay, using either: hexdump -C /etc/config/dropbear hexdump -C /overlay/upper/etc/config/dropbear 6) Power cut & boot again 7) Check the content of the same file After above regressing commit the later check confirms the file size looks correct but it's filled with all 00-es only. Can I ask you to check if there is something possibly wrong with the above ovl commit? Or does it expose some problem with the ubifs? Or maybe the whole UBI? FWIW testing above commit (and one before it) always results in single error in the kernel log: [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice That UBIFS error doesn't occur with 4.12.14. Unfortunately it's impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 12:31 ` Rafał Miłecki @ 2018-10-19 14:45 ` Richard Weinberger -1 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 14:45 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List Rafał, ----- Ursprüngliche Mail ----- > Von: "Rafał Miłecki" <zajec5@gmail.com> > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org, > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt > Development List" <openwrt-devel@lists.openwrt.org> > Gesendet: Freitag, 19. Oktober 2018 14:31:29 > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") > Hi, > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > Interestingly it cannot be reproduced on all devices (NAND dependant? > arch dependant?!). I couldn't reproduce that problem on none of my > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > lower inode on copy up"). > > FWIW I was told it also affects: > Asus RT-AC58U (ARCH_IPQ40XX=y) > powerpc > RB493G, DIR-860L (ATH79=y) > > Steps to reproduce the problem: > 1) Flash firmware > 2) Boot (for the first time) > 3) Let the init script copy config files from lowerdir to the upperdir > 4) Wait for boot to finish > 5) Verify content of some unmodified config on overlay, using either: > hexdump -C /etc/config/dropbear > hexdump -C /overlay/upper/etc/config/dropbear > 6) Power cut & boot again > 7) Check the content of the same file Do you have something also I can test? A C reproducer? An xfstest case? > After above regressing commit the later check confirms the file size > looks correct but it's filled with all 00-es only. > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. Did you bisect the problem and you are sure that the said commit is the first bad commit? > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice Please show the full log. The orphan thing rings a bell, we had such a bug already. Thanks, //richard ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 14:45 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 14:45 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Rafał, ----- Ursprüngliche Mail ----- > Von: "Rafał Miłecki" <zajec5@gmail.com> > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org, > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt > Development List" <openwrt-devel@lists.openwrt.org> > Gesendet: Freitag, 19. Oktober 2018 14:31:29 > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") > Hi, > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > Interestingly it cannot be reproduced on all devices (NAND dependant? > arch dependant?!). I couldn't reproduce that problem on none of my > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > lower inode on copy up"). > > FWIW I was told it also affects: > Asus RT-AC58U (ARCH_IPQ40XX=y) > powerpc > RB493G, DIR-860L (ATH79=y) > > Steps to reproduce the problem: > 1) Flash firmware > 2) Boot (for the first time) > 3) Let the init script copy config files from lowerdir to the upperdir > 4) Wait for boot to finish > 5) Verify content of some unmodified config on overlay, using either: > hexdump -C /etc/config/dropbear > hexdump -C /overlay/upper/etc/config/dropbear > 6) Power cut & boot again > 7) Check the content of the same file Do you have something also I can test? A C reproducer? An xfstest case? > After above regressing commit the later check confirms the file size > looks correct but it's filled with all 00-es only. > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. Did you bisect the problem and you are sure that the said commit is the first bad commit? > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice Please show the full log. The orphan thing rings a bell, we had such a bug already. Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:45 ` Richard Weinberger (?) @ 2018-10-19 14:59 ` Richard Weinberger -1 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 14:59 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > Did you bisect the problem and you are sure that the said commit is the first bad commit? Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, if I read the commit correctly. Maybe this is the root of the problem. I'm currently traveling, so I cannot test much. Please make sure whether 3a1e819b4e80 is really the first bad commit. Then I can go for a bug hunt. :-) Thanks, //richard ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:45 ` Richard Weinberger (?) (?) @ 2018-10-19 14:59 ` Richard Weinberger 2018-10-19 15:07 ` Amir Goldstein ` (3 more replies) -1 siblings, 4 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 14:59 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > Did you bisect the problem and you are sure that the said commit is the first bad commit? Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, if I read the commit correctly. Maybe this is the root of the problem. I'm currently traveling, so I cannot test much. Please make sure whether 3a1e819b4e80 is really the first bad commit. Then I can go for a bug hunt. :-) Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:59 ` Richard Weinberger @ 2018-10-19 15:07 ` Amir Goldstein 2018-10-19 15:07 ` Amir Goldstein ` (2 subsequent siblings) 3 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-19 15:07 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Rafał Miłecki, Adrian Hunter, overlayfs, Miklos Szeredi, linux-mtd, russell, linux-fsdevel, openwrt-devel On Fri, Oct 19, 2018 at 5:59 PM Richard Weinberger <richard@nod.at> wrote: > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > if I read the commit correctly. > Maybe this is the root of the problem. > You are not reading the commit correctly. Commit says: "If lower fs does not support NFS export ops store a zero sized xattr so we can always use the overlay.origin xattr to distinguish between a copy up and a pure upper inode." What it means is: - Doesn't matter if ubifs supports NFS export - it's not the lower fs - squashfs supports NFS export, but has zero s_uuid, so currently that is treated the same as no NFS export support (I have patch to relax this) - The result of no lower NFS export support is that a zero size xattr is written as opposed to non-zero size xattr in the case of lower support NFS export Thanks, Amir. ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:59 ` Richard Weinberger 2018-10-19 15:07 ` Amir Goldstein @ 2018-10-19 15:07 ` Amir Goldstein 2018-10-19 21:28 ` Rafał Miłecki 2018-10-19 21:28 ` Rafał Miłecki 3 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-19 15:07 UTC (permalink / raw) To: Richard Weinberger Cc: Rafał Miłecki, Miklos Szeredi, overlayfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, russell, openwrt-devel On Fri, Oct 19, 2018 at 5:59 PM Richard Weinberger <richard@nod.at> wrote: > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > if I read the commit correctly. > Maybe this is the root of the problem. > You are not reading the commit correctly. Commit says: "If lower fs does not support NFS export ops store a zero sized xattr so we can always use the overlay.origin xattr to distinguish between a copy up and a pure upper inode." What it means is: - Doesn't matter if ubifs supports NFS export - it's not the lower fs - squashfs supports NFS export, but has zero s_uuid, so currently that is treated the same as no NFS export support (I have patch to relax this) - The result of no lower NFS export support is that a zero size xattr is written as opposed to non-zero size xattr in the case of lower support NFS export Thanks, Amir. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:59 ` Richard Weinberger 2018-10-19 15:07 ` Amir Goldstein 2018-10-19 15:07 ` Amir Goldstein @ 2018-10-19 21:28 ` Rafał Miłecki 2018-10-19 21:28 ` Rafał Miłecki 3 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 21:28 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > if I read the commit correctly. > Maybe this is the root of the problem. > > I'm currently traveling, so I cannot test much. Please make sure whether > 3a1e819b4e80 is really the first bad commit. > Then I can go for a bug hunt. :-) I did another test and everything keeps pointing to the 3a1e819b4e80. 1) I switched to the 4.12.14 2) I confirmed problem exists there 3) I added "return 0;" at the beginning of ovl_set_origin() 4) I confirmed problem disappeared -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:59 ` Richard Weinberger ` (2 preceding siblings ...) 2018-10-19 21:28 ` Rafał Miłecki @ 2018-10-19 21:28 ` Rafał Miłecki 2018-10-20 6:58 ` Richard Weinberger 2018-10-20 6:58 ` Richard Weinberger 3 siblings, 2 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 21:28 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > if I read the commit correctly. > Maybe this is the root of the problem. > > I'm currently traveling, so I cannot test much. Please make sure whether > 3a1e819b4e80 is really the first bad commit. > Then I can go for a bug hunt. :-) I did another test and everything keeps pointing to the 3a1e819b4e80. 1) I switched to the 4.12.14 2) I confirmed problem exists there 3) I added "return 0;" at the beginning of ovl_set_origin() 4) I confirmed problem disappeared -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 21:28 ` Rafał Miłecki @ 2018-10-20 6:58 ` Richard Weinberger 2018-10-22 7:01 ` Rafał Miłecki 2018-10-20 6:58 ` Richard Weinberger 1 sibling, 1 reply; 40+ messages in thread From: Richard Weinberger @ 2018-10-20 6:58 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki: > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > > if I read the commit correctly. > > Maybe this is the root of the problem. > > > > I'm currently traveling, so I cannot test much. Please make sure whether > > 3a1e819b4e80 is really the first bad commit. > > Then I can go for a bug hunt. :-) > > I did another test and everything keeps pointing to the 3a1e819b4e80. > > 1) I switched to the 4.12.14 > 2) I confirmed problem exists there > 3) I added "return 0;" at the beginning of ovl_set_origin() > 4) I confirmed problem disappeared Okay. But I still think we're facing some UBIFS problem. To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop) it goes away? Do you see the warning regarding the double orphan only in the bad case? Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-20 6:58 ` Richard Weinberger @ 2018-10-22 7:01 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 7:01 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List On Sat, 20 Oct 2018 at 08:58, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > > > > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > > > if I read the commit correctly. > > > Maybe this is the root of the problem. > > > > > > I'm currently traveling, so I cannot test much. Please make sure whether > > > 3a1e819b4e80 is really the first bad commit. > > > Then I can go for a bug hunt. :-) > > > > I did another test and everything keeps pointing to the 3a1e819b4e80. > > > > 1) I switched to the 4.12.14 > > 2) I confirmed problem exists there > > 3) I added "return 0;" at the beginning of ovl_set_origin() > > 4) I confirmed problem disappeared > > Okay. But I still think we're facing some UBIFS problem. > > To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop) > it goes away? Yes. This is exactly what I described above (4 steps test). Making ovl_set_origin() a NOP in 4.12.14 makes problem disappear. > Do you see the warning regarding the double orphan only in the bad case? No. Let me quote myself: "Please note I wrote this error appears *with* ovl commit and also with one commit earlier.". If I checkout 3a1e819b4e80~1 (one commit before) I don't see power cut corruption BUT I still see orphan error. It seems despite me explaining some things twice still not everything is clear. Let me work on a complete summary for all commits & behaviors. -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 7:01 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 7:01 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Sat, 20 Oct 2018 at 08:58, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > > > > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > > > if I read the commit correctly. > > > Maybe this is the root of the problem. > > > > > > I'm currently traveling, so I cannot test much. Please make sure whether > > > 3a1e819b4e80 is really the first bad commit. > > > Then I can go for a bug hunt. :-) > > > > I did another test and everything keeps pointing to the 3a1e819b4e80. > > > > 1) I switched to the 4.12.14 > > 2) I confirmed problem exists there > > 3) I added "return 0;" at the beginning of ovl_set_origin() > > 4) I confirmed problem disappeared > > Okay. But I still think we're facing some UBIFS problem. > > To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop) > it goes away? Yes. This is exactly what I described above (4 steps test). Making ovl_set_origin() a NOP in 4.12.14 makes problem disappear. > Do you see the warning regarding the double orphan only in the bad case? No. Let me quote myself: "Please note I wrote this error appears *with* ovl commit and also with one commit earlier.". If I checkout 3a1e819b4e80~1 (one commit before) I don't see power cut corruption BUT I still see orphan error. It seems despite me explaining some things twice still not everything is clear. Let me work on a complete summary for all commits & behaviors. -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 21:28 ` Rafał Miłecki 2018-10-20 6:58 ` Richard Weinberger @ 2018-10-20 6:58 ` Richard Weinberger 1 sibling, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-20 6:58 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki: > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote: > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger: > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > > > Did you bisect the problem and you are sure that the said commit is the first bad commit? > > > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs, > > if I read the commit correctly. > > Maybe this is the root of the problem. > > > > I'm currently traveling, so I cannot test much. Please make sure whether > > 3a1e819b4e80 is really the first bad commit. > > Then I can go for a bug hunt. :-) > > I did another test and everything keeps pointing to the 3a1e819b4e80. > > 1) I switched to the 4.12.14 > 2) I confirmed problem exists there > 3) I added "return 0;" at the beginning of ovl_set_origin() > 4) I confirmed problem disappeared Okay. But I still think we're facing some UBIFS problem. To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop) it goes away? Do you see the warning regarding the double orphan only in the bad case? Thanks, //richard ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:45 ` Richard Weinberger ` (2 preceding siblings ...) (?) @ 2018-10-19 16:18 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 16:18 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List On Fri, 19 Oct 2018 at 16:45, Richard Weinberger <richard@nod.at> wrote: > ----- Ursprüngliche Mail ----- > > Von: "Rafał Miłecki" <zajec5@gmail.com> > > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org, > > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" > > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt > > Development List" <openwrt-devel@lists.openwrt.org> > > Gesendet: Freitag, 19. Oktober 2018 14:31:29 > > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") > > > Hi, > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > & describe test case for reproducing corruption when doing a power cut > > after first boot. > > > > Interestingly it cannot be reproduced on all devices (NAND dependant? > > arch dependant?!). I couldn't reproduce that problem on none of my > > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > > lower inode on copy up"). > > > > FWIW I was told it also affects: > > Asus RT-AC58U (ARCH_IPQ40XX=y) > > powerpc > > RB493G, DIR-860L (ATH79=y) > > > > Steps to reproduce the problem: > > 1) Flash firmware > > 2) Boot (for the first time) > > 3) Let the init script copy config files from lowerdir to the upperdir > > 4) Wait for boot to finish > > 5) Verify content of some unmodified config on overlay, using either: > > hexdump -C /etc/config/dropbear > > hexdump -C /overlay/upper/etc/config/dropbear > > 6) Power cut & boot again > > 7) Check the content of the same file > > Do you have something also I can test? > A C reproducer? An xfstest case? I don't. I may try writing one with info provided my Amir, but I'm not experienced with such things, won't be trivial for me. > > After above regressing commit the later check confirms the file size > > looks correct but it's filled with all 00-es only. > > > > Can I ask you to check if there is something possibly wrong with the > > above ovl commit? Or does it expose some problem with the ubifs? Or > > maybe the whole UBI? > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > Did you bisect the problem and you are sure that the said commit is the first bad commit? Yes, I did git bisect and then double verified that. > > FWIW testing above commit (and one before it) always results in single > > error in the kernel log: > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > Please show the full log. > The orphan thing rings a bell, we had such a bug already. I will get a full log later. Please note I wrote this error appears *with* ovl commit and also with one commit earlier. So it's very unlikely to be caused by ovl change. Most likely it was some error present in 4.11.0-rc1 and fixed later (not related to ovl). -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 14:45 ` Richard Weinberger ` (3 preceding siblings ...) (?) @ 2018-10-19 16:18 ` Rafał Miłecki 2018-10-19 17:18 ` Richard Weinberger -1 siblings, 1 reply; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 16:18 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Fri, 19 Oct 2018 at 16:45, Richard Weinberger <richard@nod.at> wrote: > ----- Ursprüngliche Mail ----- > > Von: "Rafał Miłecki" <zajec5@gmail.com> > > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org, > > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" > > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt > > Development List" <openwrt-devel@lists.openwrt.org> > > Gesendet: Freitag, 19. Oktober 2018 14:31:29 > > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") > > > Hi, > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > & describe test case for reproducing corruption when doing a power cut > > after first boot. > > > > Interestingly it cannot be reproduced on all devices (NAND dependant? > > arch dependant?!). I couldn't reproduce that problem on none of my > > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > > lower inode on copy up"). > > > > FWIW I was told it also affects: > > Asus RT-AC58U (ARCH_IPQ40XX=y) > > powerpc > > RB493G, DIR-860L (ATH79=y) > > > > Steps to reproduce the problem: > > 1) Flash firmware > > 2) Boot (for the first time) > > 3) Let the init script copy config files from lowerdir to the upperdir > > 4) Wait for boot to finish > > 5) Verify content of some unmodified config on overlay, using either: > > hexdump -C /etc/config/dropbear > > hexdump -C /overlay/upper/etc/config/dropbear > > 6) Power cut & boot again > > 7) Check the content of the same file > > Do you have something also I can test? > A C reproducer? An xfstest case? I don't. I may try writing one with info provided my Amir, but I'm not experienced with such things, won't be trivial for me. > > After above regressing commit the later check confirms the file size > > looks correct but it's filled with all 00-es only. > > > > Can I ask you to check if there is something possibly wrong with the > > above ovl commit? Or does it expose some problem with the ubifs? Or > > maybe the whole UBI? > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs. > Did you bisect the problem and you are sure that the said commit is the first bad commit? Yes, I did git bisect and then double verified that. > > FWIW testing above commit (and one before it) always results in single > > error in the kernel log: > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > Please show the full log. > The orphan thing rings a bell, we had such a bug already. I will get a full log later. Please note I wrote this error appears *with* ovl commit and also with one commit earlier. So it's very unlikely to be caused by ovl change. Most likely it was some error present in 4.11.0-rc1 and fixed later (not related to ovl). -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 16:18 ` Rafał Miłecki @ 2018-10-19 17:18 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 17:18 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki: > > Do you have something also I can test? > > A C reproducer? An xfstest case? > > I don't. I may try writing one with info provided my Amir, but I'm not > experienced with such things, won't be trivial for me. Next week I'm in Edinburgh at ELCE, you too? Maybe we find an hour or two to work together on the problem. Thanks, //richard ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 17:18 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-19 17:18 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki: > > Do you have something also I can test? > > A C reproducer? An xfstest case? > > I don't. I may try writing one with info provided my Amir, but I'm not > experienced with such things, won't be trivial for me. Next week I'm in Edinburgh at ELCE, you too? Maybe we find an hour or two to work together on the problem. Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 17:18 ` Richard Weinberger @ 2018-10-19 21:29 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 21:29 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List On Fri, 19 Oct 2018 at 19:18, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki: > > > Do you have something also I can test? > > > A C reproducer? An xfstest case? > > > > I don't. I may try writing one with info provided my Amir, but I'm not > > experienced with such things, won't be trivial for me. > > Next week I'm in Edinburgh at ELCE, you too? > Maybe we find an hour or two to work together on the problem. Sadly I'm not attending ELCE :( I chose to attend OpenWrt summit which takes place a week later. -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 21:29 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-19 21:29 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Fri, 19 Oct 2018 at 19:18, Richard Weinberger <richard@nod.at> wrote: > Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki: > > > Do you have something also I can test? > > > A C reproducer? An xfstest case? > > > > I don't. I may try writing one with info provided my Amir, but I'm not > > experienced with such things, won't be trivial for me. > > Next week I'm in Edinburgh at ELCE, you too? > Maybe we find an hour or two to work together on the problem. Sadly I'm not attending ELCE :( I chose to attend OpenWrt summit which takes place a week later. -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 12:31 ` Rafał Miłecki @ 2018-10-19 14:56 ` Amir Goldstein -1 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-19 14:56 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi, Adrian Hunter, overlayfs, linux-mtd, russell, linux-fsdevel, openwrt-devel On Fri, Oct 19, 2018 at 3:31 PM Rafał Miłecki <zajec5@gmail.com> wrote: > > Hi, > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > Interestingly it cannot be reproduced on all devices (NAND dependant? > arch dependant?!). I couldn't reproduce that problem on none of my > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > lower inode on copy up"). > > FWIW I was told it also affects: > Asus RT-AC58U (ARCH_IPQ40XX=y) > powerpc > RB493G, DIR-860L (ATH79=y) > > Steps to reproduce the problem: > 1) Flash firmware > 2) Boot (for the first time) > 3) Let the init script copy config files from lowerdir to the upperdir > 4) Wait for boot to finish > 5) Verify content of some unmodified config on overlay, using either: > hexdump -C /etc/config/dropbear > hexdump -C /overlay/upper/etc/config/dropbear > 6) Power cut & boot again > 7) Check the content of the same file > > After above regressing commit the later check confirms the file size > looks correct but it's filled with all 00-es only. > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? > > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > This looks very suspicious and should be taken up with ubifs developers. There shouldn't be any way for an external module (overlayfs) to generate internal errors like this in ubifs. Let me describes what the "offending" commit changes to help you narrow down the reproducer. Before the change when config file is copied up overlayfs does: - open O_TMPFILE in upper ubifs - copy lower file data to tmpfile - copy lower file metadata to tmpfile - link tmpfile to upper ubifs path After the change, the *only* extra operation is: vfs_setxattr(tmpfile, "trusted.overlay.origin", NULL, 0, 0); after opening tmpfile Add to that the fact that O_TMPFILE support in ubifs is rather new and you get high likelihood that the problem is stemmed somewhere around setting extended attributes on O_TMPFILE in ubifs. You can try to write a small reproducer that does the 5 steps above without overlayfs directly on ubifs and see if youo get similar results after power cut. BTW, creating an O_TMPFILE "orphands" the created inode (once) something in this procedure, allegedly setxattr() for some reason causes the inode to be orphaned twice if we believe the error from ubifs. BTW, off topic, I posted some overlayfs patches that enable more features for lower squashfs: https://www.spinics.net/lists/linux-unionfs/msg05570.html The most prominent feature is NFS export, but there is also persistent st_dev;st_ino values that don't change after copy up + reboot and not breaking hardlinks on copy up. Can anyone from OpenWrt say if this is at all interesting for your users? Thanks, Amir. > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > -- > Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-19 14:56 ` Amir Goldstein 0 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-19 14:56 UTC (permalink / raw) To: Rafał Miłecki Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd, russell, openwrt-devel On Fri, Oct 19, 2018 at 3:31 PM Rafał Miłecki <zajec5@gmail.com> wrote: > > Hi, > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > Interestingly it cannot be reproduced on all devices (NAND dependant? > arch dependant?!). I couldn't reproduce that problem on none of my > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of > lower inode on copy up"). > > FWIW I was told it also affects: > Asus RT-AC58U (ARCH_IPQ40XX=y) > powerpc > RB493G, DIR-860L (ATH79=y) > > Steps to reproduce the problem: > 1) Flash firmware > 2) Boot (for the first time) > 3) Let the init script copy config files from lowerdir to the upperdir > 4) Wait for boot to finish > 5) Verify content of some unmodified config on overlay, using either: > hexdump -C /etc/config/dropbear > hexdump -C /overlay/upper/etc/config/dropbear > 6) Power cut & boot again > 7) Check the content of the same file > > After above regressing commit the later check confirms the file size > looks correct but it's filled with all 00-es only. > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? > > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > This looks very suspicious and should be taken up with ubifs developers. There shouldn't be any way for an external module (overlayfs) to generate internal errors like this in ubifs. Let me describes what the "offending" commit changes to help you narrow down the reproducer. Before the change when config file is copied up overlayfs does: - open O_TMPFILE in upper ubifs - copy lower file data to tmpfile - copy lower file metadata to tmpfile - link tmpfile to upper ubifs path After the change, the *only* extra operation is: vfs_setxattr(tmpfile, "trusted.overlay.origin", NULL, 0, 0); after opening tmpfile Add to that the fact that O_TMPFILE support in ubifs is rather new and you get high likelihood that the problem is stemmed somewhere around setting extended attributes on O_TMPFILE in ubifs. You can try to write a small reproducer that does the 5 steps above without overlayfs directly on ubifs and see if youo get similar results after power cut. BTW, creating an O_TMPFILE "orphands" the created inode (once) something in this procedure, allegedly setxattr() for some reason causes the inode to be orphaned twice if we believe the error from ubifs. BTW, off topic, I posted some overlayfs patches that enable more features for lower squashfs: https://www.spinics.net/lists/linux-unionfs/msg05570.html The most prominent feature is NFS export, but there is also persistent st_dev;st_ino values that don't change after copy up + reboot and not breaking hardlinks on copy up. Can anyone from OpenWrt say if this is at all interesting for your users? Thanks, Amir. > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > -- > Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-19 12:31 ` Rafał Miłecki @ 2018-10-22 7:14 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 7:14 UTC (permalink / raw) To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > (...) > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? > > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. Let me provide a summary of all relevant commits & tests: By "Corruption" I mean file system corruption after power cut By "UBIFS error" I mean: [ 14.308125] UBIFS error (ubi0:1 pid 628): ubifs_add_orphan: orphaned twice 1) 3a1e819b4e80~1 (one commit before regression) Corruption: NO UBIFS error: YES 2) 3a1e819b4e80~1 + cherry-pick of 32fe905c17f0 (one commit before regression + ubifs fix) Corruption: NO UBIFS error: NO 3) 3a1e819b4e80 (regressing commit) Corruption: YES UBIFS error: YES 4) 3a1e819b4e80 + cherry-pick of 32fe905c17f0 (regressing commit + ubifs fix) Corruption: YES UBIFS error: NO 5) 4.12.14 Corruption: YES UBIFS error: NO 6) 4.12.14 + ovl_set_origin() made nop Corruption: NO UBIFS error: NO It proves that: 1) UBIFS error is not related to this file system corruption and can be ignored 2) 3a1e819b4e80 and ovl_set_origin() are definitely responsible for the regression and file system corruption -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 7:14 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 7:14 UTC (permalink / raw) To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > reporting file system corruptions. OpenWrt uses overlay(fs) with > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > & describe test case for reproducing corruption when doing a power cut > after first boot. > > (...) > > Can I ask you to check if there is something possibly wrong with the > above ovl commit? Or does it expose some problem with the ubifs? Or > maybe the whole UBI? > > FWIW testing above commit (and one before it) always results in single > error in the kernel log: > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. Let me provide a summary of all relevant commits & tests: By "Corruption" I mean file system corruption after power cut By "UBIFS error" I mean: [ 14.308125] UBIFS error (ubi0:1 pid 628): ubifs_add_orphan: orphaned twice 1) 3a1e819b4e80~1 (one commit before regression) Corruption: NO UBIFS error: YES 2) 3a1e819b4e80~1 + cherry-pick of 32fe905c17f0 (one commit before regression + ubifs fix) Corruption: NO UBIFS error: NO 3) 3a1e819b4e80 (regressing commit) Corruption: YES UBIFS error: YES 4) 3a1e819b4e80 + cherry-pick of 32fe905c17f0 (regressing commit + ubifs fix) Corruption: YES UBIFS error: NO 5) 4.12.14 Corruption: YES UBIFS error: NO 6) 4.12.14 + ovl_set_origin() made nop Corruption: NO UBIFS error: NO It proves that: 1) UBIFS error is not related to this file system corruption and can be ignored 2) 3a1e819b4e80 and ovl_set_origin() are definitely responsible for the regression and file system corruption -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 7:14 ` Rafał Miłecki (?) @ 2018-10-22 8:26 ` Richard Weinberger -1 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-22 8:26 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > & describe test case for reproducing corruption when doing a power cut > > after first boot. > > > > (...) > > > > Can I ask you to check if there is something possibly wrong with the > > above ovl commit? Or does it expose some problem with the ubifs? Or > > maybe the whole UBI? > > > > FWIW testing above commit (and one before it) always results in single > > error in the kernel log: > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > Let me provide a summary of all relevant commits & tests: > > By "Corruption" I mean file system corruption after power cut Well, is the filesystem not consistent anymore? From what Russel explained to me, I thought the main problem is that no write back happens. IOW the inode is present, has correct length, but no content is there (all zeros). Just like the typical case where userspace does not fsync. But in your case sooner or later write back should have happened because the writeback timer fires at some point. Thanks, //richard ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 8:26 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-22 8:26 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > & describe test case for reproducing corruption when doing a power cut > > after first boot. > > > > (...) > > > > Can I ask you to check if there is something possibly wrong with the > > above ovl commit? Or does it expose some problem with the ubifs? Or > > maybe the whole UBI? > > > > FWIW testing above commit (and one before it) always results in single > > error in the kernel log: > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > Let me provide a summary of all relevant commits & tests: > > By "Corruption" I mean file system corruption after power cut Well, is the filesystem not consistent anymore? From what Russel explained to me, I thought the main problem is that no write back happens. IOW the inode is present, has correct length, but no content is there (all zeros). Just like the typical case where userspace does not fsync. But in your case sooner or later write back should have happened because the writeback timer fires at some point. Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 8:26 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-22 8:26 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > & describe test case for reproducing corruption when doing a power cut > > after first boot. > > > > (...) > > > > Can I ask you to check if there is something possibly wrong with the > > above ovl commit? Or does it expose some problem with the ubifs? Or > > maybe the whole UBI? > > > > FWIW testing above commit (and one before it) always results in single > > error in the kernel log: > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > Let me provide a summary of all relevant commits & tests: > > By "Corruption" I mean file system corruption after power cut Well, is the filesystem not consistent anymore? >From what Russel explained to me, I thought the main problem is that no write back happens. IOW the inode is present, has correct length, but no content is there (all zeros). Just like the typical case where userspace does not fsync. But in your case sooner or later write back should have happened because the writeback timer fires at some point. Thanks, //richard ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 8:26 ` Richard Weinberger @ 2018-10-22 8:57 ` Amir Goldstein -1 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-22 8:57 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Rafał Miłecki, Adrian Hunter, overlayfs, Miklos Szeredi, linux-mtd, russell, linux-fsdevel, openwrt-devel On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). > > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. > For the records overlayfs does: - open(O_TMPFILE) - setxattr() [with 3a1e819b4e80] - write to tmpfile - fsync tmpfile - link tmpfile I suggest that you try the same from user space on ubifs. Thanks, Amir. ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 8:57 ` Amir Goldstein 0 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-22 8:57 UTC (permalink / raw) To: Richard Weinberger Cc: Rafał Miłecki, Miklos Szeredi, overlayfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, russell, openwrt-devel On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). > > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. > For the records overlayfs does: - open(O_TMPFILE) - setxattr() [with 3a1e819b4e80] - write to tmpfile - fsync tmpfile - link tmpfile I suggest that you try the same from user space on ubifs. Thanks, Amir. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 8:57 ` Amir Goldstein @ 2018-10-22 15:34 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 15:34 UTC (permalink / raw) To: Amir Goldstein Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi, Adrian Hunter, linux-unionfs, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 2806 bytes --] On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > > & describe test case for reproducing corruption when doing a power cut > > > > after first boot. > > > > > > > > (...) > > > > > > > > Can I ask you to check if there is something possibly wrong with the > > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > > maybe the whole UBI? > > > > > > > > FWIW testing above commit (and one before it) always results in single > > > > error in the kernel log: > > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > > > Let me provide a summary of all relevant commits & tests: > > > > > > By "Corruption" I mean file system corruption after power cut > > > > Well, is the filesystem not consistent anymore? > > From what Russel explained to me, I thought the main problem is that no write back happens. > > IOW the inode is present, has correct length, but no content is there (all zeros). > > > > Just like the typical case where userspace does not fsync. > > But in your case sooner or later write back should have happened because the writeback timer > > fires at some point. > > > > For the records overlayfs does: > - open(O_TMPFILE) > - setxattr() [with 3a1e819b4e80] > - write to tmpfile > - fsync tmpfile > - link tmpfile > > I suggest that you try the same from user space on ubifs. Are you 100% sure about it? I tried writing C app behaving as you described above and I could not reproduce the problem. Then I took a close look at ovl_copy_up_locked() and it seems above info isn't accurate. It seems to me that setxattr() happens between fsync and link. I modified my C app to follow that order (open, write, fsync, setxattr, link) and I can reproduce the problem now! Steps to reproduce the problem: 1) compile tmptest.c 2) tmptest /overlay/upper/foo.txt user.bar baz 3) wait 5 seconds (so ubifs writes to flash) 4) power cut 5) boot again and check content of /overlay/upper/foo.txt 6) in my case content appears to be 00 00 00 00 Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()? -- Rafał [-- Attachment #2: tmptest.c --] [-- Type: text/x-csrc, Size: 1616 bytes --] #include <errno.h> #include <fcntl.h> #include <libgen.h> #include <linux/limits.h> #include <stdio.h> #include <string.h> #include <sys/xattr.h> #include <unistd.h> static const char foo[] = "foo\n"; int main(int argc, char **argv) { char path[PATH_MAX]; char *dir; int err; int fd; if (argc < 2) { fprintf(stderr, "You have to pass file as argument\n"); return -ENOENT; } dir = strdup(argv[1]); if (!dir) { return -ENOMEM; } dir = dirname(dir); fd = open(dir, O_TMPFILE | O_RDWR); if (fd < 0) { err = -errno; fprintf(stderr, "Failed to open O_TMPFILE: %d\n", err); return err; } #if 0 if (argc >= 4) { if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) { err = -errno; fprintf(stderr, "Failed to fsetxattr(): %d\n", err); return err; } printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]); } #endif if (write(fd, foo, sizeof(foo)) != sizeof(foo)) { err = -errno; fprintf(stderr, "Failed to write(): %d\n", err); return err; } if (fsync(fd) < 0) { err = -errno; fprintf(stderr, "Failed to fsync(): %d\n", err); return err; } #if 1 if (argc >= 4) { if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) { err = -errno; fprintf(stderr, "Failed to fsetxattr(): %d\n", err); return err; } printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]); } #endif snprintf(path, sizeof(path), "/proc/self/fd/%d", fd); if (linkat(-1, path, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0) { err = -errno; fprintf(stderr, "Failed to linkat(): %d\n", err); return err; } printf("Success!\n"); return 0; } [-- Attachment #3: Type: text/plain, Size: 144 bytes --] ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 15:34 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 15:34 UTC (permalink / raw) To: Amir Goldstein Cc: Richard Weinberger, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 2806 bytes --] On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > > & describe test case for reproducing corruption when doing a power cut > > > > after first boot. > > > > > > > > (...) > > > > > > > > Can I ask you to check if there is something possibly wrong with the > > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > > maybe the whole UBI? > > > > > > > > FWIW testing above commit (and one before it) always results in single > > > > error in the kernel log: > > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > > > Let me provide a summary of all relevant commits & tests: > > > > > > By "Corruption" I mean file system corruption after power cut > > > > Well, is the filesystem not consistent anymore? > > From what Russel explained to me, I thought the main problem is that no write back happens. > > IOW the inode is present, has correct length, but no content is there (all zeros). > > > > Just like the typical case where userspace does not fsync. > > But in your case sooner or later write back should have happened because the writeback timer > > fires at some point. > > > > For the records overlayfs does: > - open(O_TMPFILE) > - setxattr() [with 3a1e819b4e80] > - write to tmpfile > - fsync tmpfile > - link tmpfile > > I suggest that you try the same from user space on ubifs. Are you 100% sure about it? I tried writing C app behaving as you described above and I could not reproduce the problem. Then I took a close look at ovl_copy_up_locked() and it seems above info isn't accurate. It seems to me that setxattr() happens between fsync and link. I modified my C app to follow that order (open, write, fsync, setxattr, link) and I can reproduce the problem now! Steps to reproduce the problem: 1) compile tmptest.c 2) tmptest /overlay/upper/foo.txt user.bar baz 3) wait 5 seconds (so ubifs writes to flash) 4) power cut 5) boot again and check content of /overlay/upper/foo.txt 6) in my case content appears to be 00 00 00 00 Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()? -- Rafał [-- Attachment #2: tmptest.c --] [-- Type: text/x-csrc, Size: 1616 bytes --] #include <errno.h> #include <fcntl.h> #include <libgen.h> #include <linux/limits.h> #include <stdio.h> #include <string.h> #include <sys/xattr.h> #include <unistd.h> static const char foo[] = "foo\n"; int main(int argc, char **argv) { char path[PATH_MAX]; char *dir; int err; int fd; if (argc < 2) { fprintf(stderr, "You have to pass file as argument\n"); return -ENOENT; } dir = strdup(argv[1]); if (!dir) { return -ENOMEM; } dir = dirname(dir); fd = open(dir, O_TMPFILE | O_RDWR); if (fd < 0) { err = -errno; fprintf(stderr, "Failed to open O_TMPFILE: %d\n", err); return err; } #if 0 if (argc >= 4) { if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) { err = -errno; fprintf(stderr, "Failed to fsetxattr(): %d\n", err); return err; } printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]); } #endif if (write(fd, foo, sizeof(foo)) != sizeof(foo)) { err = -errno; fprintf(stderr, "Failed to write(): %d\n", err); return err; } if (fsync(fd) < 0) { err = -errno; fprintf(stderr, "Failed to fsync(): %d\n", err); return err; } #if 1 if (argc >= 4) { if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) { err = -errno; fprintf(stderr, "Failed to fsetxattr(): %d\n", err); return err; } printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]); } #endif snprintf(path, sizeof(path), "/proc/self/fd/%d", fd); if (linkat(-1, path, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0) { err = -errno; fprintf(stderr, "Failed to linkat(): %d\n", err); return err; } printf("Success!\n"); return 0; } ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 15:34 ` Rafał Miłecki (?) @ 2018-10-22 17:00 ` Amir Goldstein -1 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-22 17:00 UTC (permalink / raw) To: Rafał Miłecki Cc: Richard Weinberger, Miklos Szeredi, overlayfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, russell, openwrt-devel On Mon, Oct 22, 2018 at 6:34 PM Rafał Miłecki <zajec5@gmail.com> wrote: > > On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote: > > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > > > > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > > > & describe test case for reproducing corruption when doing a power cut > > > > > after first boot. > > > > > > > > > > (...) > > > > > > > > > > Can I ask you to check if there is something possibly wrong with the > > > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > > > maybe the whole UBI? > > > > > > > > > > FWIW testing above commit (and one before it) always results in single > > > > > error in the kernel log: > > > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > > > > > Let me provide a summary of all relevant commits & tests: > > > > > > > > By "Corruption" I mean file system corruption after power cut > > > > > > Well, is the filesystem not consistent anymore? > > > From what Russel explained to me, I thought the main problem is that no write back happens. > > > IOW the inode is present, has correct length, but no content is there (all zeros). > > > > > > Just like the typical case where userspace does not fsync. > > > But in your case sooner or later write back should have happened because the writeback timer > > > fires at some point. > > > > > > > For the records overlayfs does: > > - open(O_TMPFILE) > > - setxattr() [with 3a1e819b4e80] > > - write to tmpfile > > - fsync tmpfile > > - link tmpfile > > > > I suggest that you try the same from user space on ubifs. > > Are you 100% sure about it? I tried writing C app behaving as you > described above and I could not reproduce the problem. > I am never 100% sure ;-) I looked at upstream code, which does ovl_set_origin() before copying data. commit 3a1e819b4e80 does ovl_set_origin() after fsync like you say. > Then I took a close look at ovl_copy_up_locked() and it seems above > info isn't accurate. It seems to me that setxattr() happens between > fsync and link. I modified my C app to follow that order (open, write, > fsync, setxattr, link) and I can reproduce the problem now! > > Steps to reproduce the problem: > 1) compile tmptest.c > 2) tmptest /overlay/upper/foo.txt user.bar baz > 3) wait 5 seconds (so ubifs writes to flash) > 4) power cut > 5) boot again and check content of /overlay/upper/foo.txt > 6) in my case content appears to be 00 00 00 00 > > Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()? > Let's just clarify that for the matter of data corruption it doesn't matter if and when overlayfs does vfs_setxattr(). Overalyfs does write()+fsync() and that is all that should matter for guarantying data integrity. Regarding "Is it correct to setxattr after fsync()?", that question depends on what overlayfs needs to guaranty w.r.t crash consistency of its private xattrs and on the type of crash consistency guaranties overlayfs expects from the underlying filesystem. Overlayfs needs to guaranty that if crash happens mid copy up, after reboot, either the upper file is not there or the upper file is there with all copied data and all copied metadata and the overlay xattrs. Overlayfs current code expects the upper filesystem to have "strictly ordered metadata" semantics, meaning that setxattr();link(), cannot appear after crash as a linked file without the xattr. Just the same as mkdir();chown();rename() cannot appear after crash as renamed dir without the chwon(); Those semantics are provided by the journaling file systems (xfs, ext4) and to be honest I am not sure if they are provided by btrfs or not - every time this subject came up, btrfs developer answers were a bit too vague for me to understand what is the expected behavior in btrfs. Does ubifs has a problem to provide this guaranty? Finally, I'll repeat with what I started - this has nothing to do with data consistency. In fact, in the specific case of lower squashfs, overlayfs will end up writing a zero length xattr, which overlayfs only ever cares about on directories anyway. Thanks, Amir. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 15:34 ` Rafał Miłecki (?) (?) @ 2018-10-22 17:00 ` Amir Goldstein -1 siblings, 0 replies; 40+ messages in thread From: Amir Goldstein @ 2018-10-22 17:00 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi, Adrian Hunter, overlayfs, linux-mtd, russell, linux-fsdevel, openwrt-devel On Mon, Oct 22, 2018 at 6:34 PM Rafał Miłecki <zajec5@gmail.com> wrote: > > On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote: > > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote: > > > > > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > > > & describe test case for reproducing corruption when doing a power cut > > > > > after first boot. > > > > > > > > > > (...) > > > > > > > > > > Can I ask you to check if there is something possibly wrong with the > > > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > > > maybe the whole UBI? > > > > > > > > > > FWIW testing above commit (and one before it) always results in single > > > > > error in the kernel log: > > > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > > > > > Let me provide a summary of all relevant commits & tests: > > > > > > > > By "Corruption" I mean file system corruption after power cut > > > > > > Well, is the filesystem not consistent anymore? > > > From what Russel explained to me, I thought the main problem is that no write back happens. > > > IOW the inode is present, has correct length, but no content is there (all zeros). > > > > > > Just like the typical case where userspace does not fsync. > > > But in your case sooner or later write back should have happened because the writeback timer > > > fires at some point. > > > > > > > For the records overlayfs does: > > - open(O_TMPFILE) > > - setxattr() [with 3a1e819b4e80] > > - write to tmpfile > > - fsync tmpfile > > - link tmpfile > > > > I suggest that you try the same from user space on ubifs. > > Are you 100% sure about it? I tried writing C app behaving as you > described above and I could not reproduce the problem. > I am never 100% sure ;-) I looked at upstream code, which does ovl_set_origin() before copying data. commit 3a1e819b4e80 does ovl_set_origin() after fsync like you say. > Then I took a close look at ovl_copy_up_locked() and it seems above > info isn't accurate. It seems to me that setxattr() happens between > fsync and link. I modified my C app to follow that order (open, write, > fsync, setxattr, link) and I can reproduce the problem now! > > Steps to reproduce the problem: > 1) compile tmptest.c > 2) tmptest /overlay/upper/foo.txt user.bar baz > 3) wait 5 seconds (so ubifs writes to flash) > 4) power cut > 5) boot again and check content of /overlay/upper/foo.txt > 6) in my case content appears to be 00 00 00 00 > > Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()? > Let's just clarify that for the matter of data corruption it doesn't matter if and when overlayfs does vfs_setxattr(). Overalyfs does write()+fsync() and that is all that should matter for guarantying data integrity. Regarding "Is it correct to setxattr after fsync()?", that question depends on what overlayfs needs to guaranty w.r.t crash consistency of its private xattrs and on the type of crash consistency guaranties overlayfs expects from the underlying filesystem. Overlayfs needs to guaranty that if crash happens mid copy up, after reboot, either the upper file is not there or the upper file is there with all copied data and all copied metadata and the overlay xattrs. Overlayfs current code expects the upper filesystem to have "strictly ordered metadata" semantics, meaning that setxattr();link(), cannot appear after crash as a linked file without the xattr. Just the same as mkdir();chown();rename() cannot appear after crash as renamed dir without the chwon(); Those semantics are provided by the journaling file systems (xfs, ext4) and to be honest I am not sure if they are provided by btrfs or not - every time this subject came up, btrfs developer answers were a bit too vague for me to understand what is the expected behavior in btrfs. Does ubifs has a problem to provide this guaranty? Finally, I'll repeat with what I started - this has nothing to do with data consistency. In fact, in the specific case of lower squashfs, overlayfs will end up writing a zero length xattr, which overlayfs only ever cares about on directories anyway. Thanks, Amir. ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 15:34 ` Rafał Miłecki @ 2018-10-27 19:33 ` Richard Weinberger -1 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-27 19:33 UTC (permalink / raw) To: Rafał Miłecki Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 1088 bytes --] Rafał, Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki: > Then I took a close look at ovl_copy_up_locked() and it seems above > info isn't accurate. It seems to me that setxattr() happens between > fsync and link. I modified my C app to follow that order (open, write, > fsync, setxattr, link) and I can reproduce the problem now! > > Steps to reproduce the problem: > 1) compile tmptest.c > 2) tmptest /overlay/upper/foo.txt user.bar baz > 3) wait 5 seconds (so ubifs writes to flash) > 4) power cut > 5) boot again and check content of /overlay/upper/foo.txt > 6) in my case content appears to be 00 00 00 00 Just returned from Edinburgh and had a chance to look at the problem. The problem is not that no write-back happens, in fact it happens just fine. But we have a problem upon journal replay if an unlinked file (O_TMPFILE) gets relinked in combination with xattrs. Can you please give the attached patch a try? It is not perfect but if I understand the problem correctly it should fix the issues you're facing. Thanks, //richard [-- Attachment #2: replay_tmpfile_fix.diff --] [-- Type: text/x-patch, Size: 1263 bytes --] diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c index 4844538eb926..82b1244e4138 100644 --- a/fs/ubifs/replay.c +++ b/fs/ubifs/replay.c @@ -209,6 +209,19 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) return ubifs_tnc_remove_range(c, &min_key, &max_key); } +static bool inode_comes_back(struct ubifs_info *c, struct replay_entry *rino) +{ + struct replay_entry *r; + + list_for_each_entry(r, &c->replay_list, list) { + if (keys_cmp(c, &rino->key, &r->key) == 0 && + r->deletion == 0) + return true; + } + + return false; +} + /** * apply_replay_entry - apply a replay entry to the TNC. * @c: UBIFS file-system description object @@ -218,7 +231,7 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) */ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { - int err; + int err = 0; dbg_mntk(&r->key, "LEB %d:%d len %d deletion %d sqnum %llu key ", r->lnum, r->offs, r->len, r->deletion, r->sqnum); @@ -236,6 +249,9 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { ino_t inum = key_inum(c, &r->key); + if (inode_comes_back(c, r)) + break; + err = ubifs_tnc_remove_ino(c, inum); break; } [-- Attachment #3: Type: text/plain, Size: 144 bytes --] ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-27 19:33 ` Richard Weinberger 0 siblings, 0 replies; 40+ messages in thread From: Richard Weinberger @ 2018-10-27 19:33 UTC (permalink / raw) To: Rafał Miłecki Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 1088 bytes --] Rafał, Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki: > Then I took a close look at ovl_copy_up_locked() and it seems above > info isn't accurate. It seems to me that setxattr() happens between > fsync and link. I modified my C app to follow that order (open, write, > fsync, setxattr, link) and I can reproduce the problem now! > > Steps to reproduce the problem: > 1) compile tmptest.c > 2) tmptest /overlay/upper/foo.txt user.bar baz > 3) wait 5 seconds (so ubifs writes to flash) > 4) power cut > 5) boot again and check content of /overlay/upper/foo.txt > 6) in my case content appears to be 00 00 00 00 Just returned from Edinburgh and had a chance to look at the problem. The problem is not that no write-back happens, in fact it happens just fine. But we have a problem upon journal replay if an unlinked file (O_TMPFILE) gets relinked in combination with xattrs. Can you please give the attached patch a try? It is not perfect but if I understand the problem correctly it should fix the issues you're facing. Thanks, //richard [-- Attachment #2: replay_tmpfile_fix.diff --] [-- Type: text/x-patch, Size: 1263 bytes --] diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c index 4844538eb926..82b1244e4138 100644 --- a/fs/ubifs/replay.c +++ b/fs/ubifs/replay.c @@ -209,6 +209,19 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) return ubifs_tnc_remove_range(c, &min_key, &max_key); } +static bool inode_comes_back(struct ubifs_info *c, struct replay_entry *rino) +{ + struct replay_entry *r; + + list_for_each_entry(r, &c->replay_list, list) { + if (keys_cmp(c, &rino->key, &r->key) == 0 && + r->deletion == 0) + return true; + } + + return false; +} + /** * apply_replay_entry - apply a replay entry to the TNC. * @c: UBIFS file-system description object @@ -218,7 +231,7 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) */ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { - int err; + int err = 0; dbg_mntk(&r->key, "LEB %d:%d len %d deletion %d sqnum %llu key ", r->lnum, r->offs, r->len, r->deletion, r->sqnum); @@ -236,6 +249,9 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { ino_t inum = key_inum(c, &r->key); + if (inode_comes_back(c, r)) + break; + err = ubifs_tnc_remove_ino(c, inum); break; } ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-27 19:33 ` Richard Weinberger (?) @ 2018-10-28 7:09 ` Russell Senior -1 siblings, 0 replies; 40+ messages in thread From: Russell Senior @ 2018-10-28 7:09 UTC (permalink / raw) To: richard Cc: zajec5, amir73il, miklos, linux-unionfs, linux-fsdevel, dedekind1, adrian.hunter, linux-mtd, openwrt-devel [-- Attachment #1: Type: text/plain, Size: 1306 bytes --] On Sat, Oct 27, 2018 at 12:33 PM Richard Weinberger <richard@nod.at> wrote: > Rafał, > > Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki: > > Then I took a close look at ovl_copy_up_locked() and it seems above > > info isn't accurate. It seems to me that setxattr() happens between > > fsync and link. I modified my C app to follow that order (open, write, > > fsync, setxattr, link) and I can reproduce the problem now! > > > > Steps to reproduce the problem: > > 1) compile tmptest.c > > 2) tmptest /overlay/upper/foo.txt user.bar baz > > 3) wait 5 seconds (so ubifs writes to flash) > > 4) power cut > > 5) boot again and check content of /overlay/upper/foo.txt > > 6) in my case content appears to be 00 00 00 00 > > Just returned from Edinburgh and had a chance to look at the problem. > The problem is not that no write-back happens, in fact it happens just > fine. > But we have a problem upon journal replay if an unlinked file (O_TMPFILE) > gets relinked in combination with xattrs. > > Can you please give the attached patch a try? It is not perfect but if I > understand the problem correctly it should fix the issues you're facing. > Your patch seems to have fixed it for me, sample size = 1. -- Russell Senior russell@personaltelco.net [-- Attachment #2: Type: text/html, Size: 1777 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 8:26 ` Richard Weinberger @ 2018-10-22 21:23 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 21:23 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 2568 bytes --] On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote: > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). I probably misused "corruption" word. What I meant by "corruption" was file having all "00"es instead of expected data. > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. As you probably noticed I wrote tmptest.c - could you test it, please? See if you can reproduce the problem? Please call it with 3 arguments: 1) Path on ubifs mount point 2) Some xattr name 3) Some xattr value Please note I wait 5 seconds (this matches vm.dirty_writeback_centisecs being 500) before doing a power cut. That lets ubifs write to flash. For some reason files that got fsetxattr called for them still are all "00"es after a power cut. I did some extra testing after enabling ubifs debugging for io.c, file.c and journal.c. Debugging output looks like expected. I can clearly see wbuf_timer_callback_nolock() being called. I attached my debugging summary as ubifs-dbg.html please take a look at it in case I missed something. -- Rafał [-- Attachment #2: ubifs-dbg.html --] [-- Type: text/html, Size: 5727 bytes --] [-- Attachment #3: Type: text/plain, Size: 144 bytes --] ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 21:23 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 21:23 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List [-- Attachment #1: Type: text/plain, Size: 2568 bytes --] On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote: > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). I probably misused "corruption" word. What I meant by "corruption" was file having all "00"es instead of expected data. > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. As you probably noticed I wrote tmptest.c - could you test it, please? See if you can reproduce the problem? Please call it with 3 arguments: 1) Path on ubifs mount point 2) Some xattr name 3) Some xattr value Please note I wait 5 seconds (this matches vm.dirty_writeback_centisecs being 500) before doing a power cut. That lets ubifs write to flash. For some reason files that got fsetxattr called for them still are all "00"es after a power cut. I did some extra testing after enabling ubifs debugging for io.c, file.c and journal.c. Debugging output looks like expected. I can clearly see wbuf_timer_callback_nolock() being called. I attached my debugging summary as ubifs-dbg.html please take a look at it in case I missed something. -- Rafał [-- Attachment #2: ubifs-dbg.html --] [-- Type: text/html, Size: 5727 bytes --] ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") 2018-10-22 8:26 ` Richard Weinberger @ 2018-10-22 21:27 ` Rafał Miłecki -1 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 21:27 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs, Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel, OpenWrt Development List [resending without HTML attachment] [see http://files.zajec.net/openwrt/ubifs-dbg.html instead] On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote: > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). I probably misused "corruption" word. What I meant by "corruption" was file having all "00"es instead of expected data. > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. As you probably noticed I wrote tmptest.c - could you test it, please? See if you can reproduce the problem? Please call it with 3 arguments: 1) Path on ubifs mount point 2) Some xattr name 3) Some xattr value Please note I wait 5 seconds (this matches vm.dirty_writeback_centisecs being 500) before doing a power cut. That lets ubifs write to flash. For some reason files that got fsetxattr called for them still are all "00"es after a power cut. I did some extra testing after enabling ubifs debugging for io.c, file.c and journal.c. Debugging output looks like expected. I can clearly see wbuf_timer_callback_nolock() being called. I attached my debugging summary as ubifs-dbg.html please take a look at it in case I missed something. -- Rafał ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") @ 2018-10-22 21:27 ` Rafał Miłecki 0 siblings, 0 replies; 40+ messages in thread From: Rafał Miłecki @ 2018-10-22 21:27 UTC (permalink / raw) To: Richard Weinberger Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior, OpenWrt Development List [resending without HTML attachment] [see http://files.zajec.net/openwrt/ubifs-dbg.html instead] On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote: > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki: > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote: > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly > > > reporting file system corruptions. OpenWrt uses overlay(fs) with > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate > > > & describe test case for reproducing corruption when doing a power cut > > > after first boot. > > > > > > (...) > > > > > > Can I ask you to check if there is something possibly wrong with the > > > above ovl commit? Or does it expose some problem with the ubifs? Or > > > maybe the whole UBI? > > > > > > FWIW testing above commit (and one before it) always results in single > > > error in the kernel log: > > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice > > > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14. > > > > Let me provide a summary of all relevant commits & tests: > > > > By "Corruption" I mean file system corruption after power cut > > Well, is the filesystem not consistent anymore? > From what Russel explained to me, I thought the main problem is that no write back happens. > IOW the inode is present, has correct length, but no content is there (all zeros). I probably misused "corruption" word. What I meant by "corruption" was file having all "00"es instead of expected data. > Just like the typical case where userspace does not fsync. > But in your case sooner or later write back should have happened because the writeback timer > fires at some point. As you probably noticed I wrote tmptest.c - could you test it, please? See if you can reproduce the problem? Please call it with 3 arguments: 1) Path on ubifs mount point 2) Some xattr name 3) Some xattr value Please note I wait 5 seconds (this matches vm.dirty_writeback_centisecs being 500) before doing a power cut. That lets ubifs write to flash. For some reason files that got fsetxattr called for them still are all "00"es after a power cut. I did some extra testing after enabling ubifs debugging for io.c, file.c and journal.c. Debugging output looks like expected. I can clearly see wbuf_timer_callback_nolock() being called. I attached my debugging summary as ubifs-dbg.html please take a look at it in case I missed something. -- Rafał ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2018-10-28 7:09 UTC | newest] Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-19 12:31 Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Rafał Miłecki 2018-10-19 12:31 ` Rafał Miłecki 2018-10-19 14:45 ` Richard Weinberger 2018-10-19 14:45 ` Richard Weinberger 2018-10-19 14:59 ` Richard Weinberger 2018-10-19 14:59 ` Richard Weinberger 2018-10-19 15:07 ` Amir Goldstein 2018-10-19 15:07 ` Amir Goldstein 2018-10-19 21:28 ` Rafał Miłecki 2018-10-19 21:28 ` Rafał Miłecki 2018-10-20 6:58 ` Richard Weinberger 2018-10-22 7:01 ` Rafał Miłecki 2018-10-22 7:01 ` Rafał Miłecki 2018-10-20 6:58 ` Richard Weinberger 2018-10-19 16:18 ` Rafał Miłecki 2018-10-19 16:18 ` Rafał Miłecki 2018-10-19 17:18 ` Richard Weinberger 2018-10-19 17:18 ` Richard Weinberger 2018-10-19 21:29 ` Rafał Miłecki 2018-10-19 21:29 ` Rafał Miłecki 2018-10-19 14:56 ` Amir Goldstein 2018-10-19 14:56 ` Amir Goldstein 2018-10-22 7:14 ` Rafał Miłecki 2018-10-22 7:14 ` Rafał Miłecki 2018-10-22 8:26 ` Richard Weinberger 2018-10-22 8:26 ` Richard Weinberger 2018-10-22 8:26 ` Richard Weinberger 2018-10-22 8:57 ` Amir Goldstein 2018-10-22 8:57 ` Amir Goldstein 2018-10-22 15:34 ` Rafał Miłecki 2018-10-22 15:34 ` Rafał Miłecki 2018-10-22 17:00 ` Amir Goldstein 2018-10-22 17:00 ` Amir Goldstein 2018-10-27 19:33 ` Richard Weinberger 2018-10-27 19:33 ` Richard Weinberger 2018-10-28 7:09 ` Russell Senior 2018-10-22 21:23 ` Rafał Miłecki 2018-10-22 21:23 ` Rafał Miłecki 2018-10-22 21:27 ` Rafał Miłecki 2018-10-22 21:27 ` Rafał Miłecki
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.