Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

All of lore.kernel.org
 help / color / mirror / Atom feed

* Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 12:31 ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 12:31 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	Russell Senior, OpenWrt Development List

Hi,

Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
reporting file system corruptions. OpenWrt uses overlay(fs) with
squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
& describe test case for reproducing corruption when doing a power cut
after first boot.

Interestingly it cannot be reproduced on all devices (NAND dependant?
arch dependant?!). I couldn't reproduce that problem on none of my
Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
lower inode on copy up").

FWIW I was told it also affects:
Asus RT-AC58U (ARCH_IPQ40XX=y)
powerpc
RB493G, DIR-860L (ATH79=y)

Steps to reproduce the problem:
1) Flash firmware
2) Boot (for the first time)
3) Let the init script copy config files from lowerdir to the upperdir
4) Wait for boot to finish
5) Verify content of some unmodified config on overlay, using either:
hexdump -C /etc/config/dropbear
hexdump -C /overlay/upper/etc/config/dropbear
6) Power cut & boot again
7) Check the content of the same file

After above regressing commit the later check confirms the file size
looks correct but it's filled with all 00-es only.

Can I ask you to check if there is something possibly wrong with the
above ovl commit? Or does it expose some problem with the ubifs? Or
maybe the whole UBI?

FWIW testing above commit (and one before it) always results in single
error in the kernel log:
[   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice

That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 12:31 ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 12:31 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	Russell Senior, OpenWrt Development List

Hi,

Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
reporting file system corruptions. OpenWrt uses overlay(fs) with
squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
& describe test case for reproducing corruption when doing a power cut
after first boot.

Interestingly it cannot be reproduced on all devices (NAND dependant?
arch dependant?!). I couldn't reproduce that problem on none of my
Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
lower inode on copy up").

FWIW I was told it also affects:
Asus RT-AC58U (ARCH_IPQ40XX=y)
powerpc
RB493G, DIR-860L (ATH79=y)

Steps to reproduce the problem:
1) Flash firmware
2) Boot (for the first time)
3) Let the init script copy config files from lowerdir to the upperdir
4) Wait for boot to finish
5) Verify content of some unmodified config on overlay, using either:
hexdump -C /etc/config/dropbear
hexdump -C /overlay/upper/etc/config/dropbear
6) Power cut & boot again
7) Check the content of the same file

After above regressing commit the later check confirms the file size
looks correct but it's filled with all 00-es only.

Can I ask you to check if there is something possibly wrong with the
above ovl commit? Or does it expose some problem with the ubifs? Or
maybe the whole UBI?

FWIW testing above commit (and one before it) always results in single
error in the kernel log:
[   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice

That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 12:31 ` Rafał Miłecki
@ 2018-10-19 14:45   ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 14:45 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

Rafał,

----- Ursprüngliche Mail -----
> Von: "Rafał Miłecki" <zajec5@gmail.com>
> An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org,
> linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter"
> <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt
> Development List" <openwrt-devel@lists.openwrt.org>
> Gesendet: Freitag, 19. Oktober 2018 14:31:29
> Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

> Hi,
> 
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
> 
> Interestingly it cannot be reproduced on all devices (NAND dependant?
> arch dependant?!). I couldn't reproduce that problem on none of my
> Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> lower inode on copy up").
> 
> FWIW I was told it also affects:
> Asus RT-AC58U (ARCH_IPQ40XX=y)
> powerpc
> RB493G, DIR-860L (ATH79=y)
> 
> Steps to reproduce the problem:
> 1) Flash firmware
> 2) Boot (for the first time)
> 3) Let the init script copy config files from lowerdir to the upperdir
> 4) Wait for boot to finish
> 5) Verify content of some unmodified config on overlay, using either:
> hexdump -C /etc/config/dropbear
> hexdump -C /overlay/upper/etc/config/dropbear
> 6) Power cut & boot again
> 7) Check the content of the same file

Do you have something also I can test?
A C reproducer? An xfstest case?

> After above regressing commit the later check confirms the file size
> looks correct but it's filled with all 00-es only.
> 
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?

Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
Did you bisect the problem and you are sure that the said commit is the first bad commit?
 
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice

Please show the full log.
The orphan thing rings a bell, we had such a bug already.

Thanks,
//richard

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 14:45   ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 14:45 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Rafał,

----- Ursprüngliche Mail -----
> Von: "Rafał Miłecki" <zajec5@gmail.com>
> An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org,
> linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter"
> <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt
> Development List" <openwrt-devel@lists.openwrt.org>
> Gesendet: Freitag, 19. Oktober 2018 14:31:29
> Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

> Hi,
> 
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
> 
> Interestingly it cannot be reproduced on all devices (NAND dependant?
> arch dependant?!). I couldn't reproduce that problem on none of my
> Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> lower inode on copy up").
> 
> FWIW I was told it also affects:
> Asus RT-AC58U (ARCH_IPQ40XX=y)
> powerpc
> RB493G, DIR-860L (ATH79=y)
> 
> Steps to reproduce the problem:
> 1) Flash firmware
> 2) Boot (for the first time)
> 3) Let the init script copy config files from lowerdir to the upperdir
> 4) Wait for boot to finish
> 5) Verify content of some unmodified config on overlay, using either:
> hexdump -C /etc/config/dropbear
> hexdump -C /overlay/upper/etc/config/dropbear
> 6) Power cut & boot again
> 7) Check the content of the same file

Do you have something also I can test?
A C reproducer? An xfstest case?

> After above regressing commit the later check confirms the file size
> looks correct but it's filled with all 00-es only.
> 
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?

Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
Did you bisect the problem and you are sure that the said commit is the first bad commit?
 
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice

Please show the full log.
The orphan thing rings a bell, we had such a bug already.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:45   ` Richard Weinberger
  (?)
@ 2018-10-19 14:59   ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 14:59 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> Did you bisect the problem and you are sure that the said commit is the first bad commit?

Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
if I read the commit correctly.
Maybe this is the root of the problem.

I'm currently traveling, so I cannot test much. Please make sure whether
3a1e819b4e80 is really the first bad commit.
Then I can go for a bug hunt. :-)

Thanks,
//richard



______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:45   ` Richard Weinberger
  (?)
  (?)
@ 2018-10-19 14:59   ` Richard Weinberger
  2018-10-19 15:07     ` Amir Goldstein
                       ` (3 more replies)
  -1 siblings, 4 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 14:59 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> Did you bisect the problem and you are sure that the said commit is the first bad commit?

Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
if I read the commit correctly.
Maybe this is the root of the problem.

I'm currently traveling, so I cannot test much. Please make sure whether
3a1e819b4e80 is really the first bad commit.
Then I can go for a bug hunt. :-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:59   ` Richard Weinberger
@ 2018-10-19 15:07     ` Amir Goldstein
  2018-10-19 15:07     ` Amir Goldstein
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-19 15:07 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Rafał Miłecki, Adrian Hunter,
	overlayfs, Miklos Szeredi, linux-mtd, russell, linux-fsdevel,
	openwrt-devel

On Fri, Oct 19, 2018 at 5:59 PM Richard Weinberger <richard@nod.at> wrote:
>
> Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > Did you bisect the problem and you are sure that the said commit is the first bad commit?
>
> Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> if I read the commit correctly.
> Maybe this is the root of the problem.
>

You are not reading the commit correctly.

Commit says:
    "If lower fs does not support NFS export ops store a zero sized xattr so we
    can always use the overlay.origin xattr to distinguish between a copy up
    and a pure upper inode."

What it means is:
- Doesn't matter if ubifs supports NFS export - it's not the lower fs
- squashfs supports NFS export, but has zero s_uuid, so currently that
  is treated the same as no NFS export support (I have patch to relax this)
- The result of no lower NFS export support is that a zero size xattr is written
  as opposed to non-zero size xattr in the case of lower support NFS export

Thanks,
Amir.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:59   ` Richard Weinberger
  2018-10-19 15:07     ` Amir Goldstein
@ 2018-10-19 15:07     ` Amir Goldstein
  2018-10-19 21:28     ` Rafał Miłecki
  2018-10-19 21:28     ` Rafał Miłecki
  3 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-19 15:07 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Rafał Miłecki, Miklos Szeredi, overlayfs,
	linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	russell, openwrt-devel

On Fri, Oct 19, 2018 at 5:59 PM Richard Weinberger <richard@nod.at> wrote:
>
> Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > Did you bisect the problem and you are sure that the said commit is the first bad commit?
>
> Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> if I read the commit correctly.
> Maybe this is the root of the problem.
>

You are not reading the commit correctly.

Commit says:
    "If lower fs does not support NFS export ops store a zero sized xattr so we
    can always use the overlay.origin xattr to distinguish between a copy up
    and a pure upper inode."

What it means is:
- Doesn't matter if ubifs supports NFS export - it's not the lower fs
- squashfs supports NFS export, but has zero s_uuid, so currently that
  is treated the same as no NFS export support (I have patch to relax this)
- The result of no lower NFS export support is that a zero size xattr is written
  as opposed to non-zero size xattr in the case of lower support NFS export

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:59   ` Richard Weinberger
  2018-10-19 15:07     ` Amir Goldstein
  2018-10-19 15:07     ` Amir Goldstein
@ 2018-10-19 21:28     ` Rafał Miłecki
  2018-10-19 21:28     ` Rafał Miłecki
  3 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 21:28 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > Did you bisect the problem and you are sure that the said commit is the first bad commit?
>
> Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> if I read the commit correctly.
> Maybe this is the root of the problem.
>
> I'm currently traveling, so I cannot test much. Please make sure whether
> 3a1e819b4e80 is really the first bad commit.
> Then I can go for a bug hunt. :-)

I did another test and everything keeps pointing to the 3a1e819b4e80.

1) I switched to the 4.12.14
2) I confirmed problem exists there
3) I added "return 0;" at the beginning of ovl_set_origin()
4) I confirmed problem disappeared

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:59   ` Richard Weinberger
                       ` (2 preceding siblings ...)
  2018-10-19 21:28     ` Rafał Miłecki
@ 2018-10-19 21:28     ` Rafał Miłecki
  2018-10-20  6:58       ` Richard Weinberger
  2018-10-20  6:58       ` Richard Weinberger
  3 siblings, 2 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 21:28 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > Did you bisect the problem and you are sure that the said commit is the first bad commit?
>
> Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> if I read the commit correctly.
> Maybe this is the root of the problem.
>
> I'm currently traveling, so I cannot test much. Please make sure whether
> 3a1e819b4e80 is really the first bad commit.
> Then I can go for a bug hunt. :-)

I did another test and everything keeps pointing to the 3a1e819b4e80.

1) I switched to the 4.12.14
2) I confirmed problem exists there
3) I added "return 0;" at the beginning of ovl_set_origin()
4) I confirmed problem disappeared

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 21:28     ` Rafał Miłecki
@ 2018-10-20  6:58       ` Richard Weinberger
  2018-10-22  7:01           ` Rafał Miłecki
  2018-10-20  6:58       ` Richard Weinberger
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Weinberger @ 2018-10-20  6:58 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki:
> On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > > Did you bisect the problem and you are sure that the said commit is the first bad commit?
> >
> > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> > if I read the commit correctly.
> > Maybe this is the root of the problem.
> >
> > I'm currently traveling, so I cannot test much. Please make sure whether
> > 3a1e819b4e80 is really the first bad commit.
> > Then I can go for a bug hunt. :-)
> 
> I did another test and everything keeps pointing to the 3a1e819b4e80.
> 
> 1) I switched to the 4.12.14
> 2) I confirmed problem exists there
> 3) I added "return 0;" at the beginning of ovl_set_origin()
> 4) I confirmed problem disappeared

Okay. But I still think we're facing some UBIFS problem.

To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop)
it goes away?
Do you see the warning regarding the double orphan only in the bad case?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-20  6:58       ` Richard Weinberger
@ 2018-10-22  7:01           ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22  7:01 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

On Sat, 20 Oct 2018 at 08:58, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > > > Did you bisect the problem and you are sure that the said commit is the first bad commit?
> > >
> > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> > > if I read the commit correctly.
> > > Maybe this is the root of the problem.
> > >
> > > I'm currently traveling, so I cannot test much. Please make sure whether
> > > 3a1e819b4e80 is really the first bad commit.
> > > Then I can go for a bug hunt. :-)
> >
> > I did another test and everything keeps pointing to the 3a1e819b4e80.
> >
> > 1) I switched to the 4.12.14
> > 2) I confirmed problem exists there
> > 3) I added "return 0;" at the beginning of ovl_set_origin()
> > 4) I confirmed problem disappeared
>
> Okay. But I still think we're facing some UBIFS problem.
>
> To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop)
> it goes away?

Yes. This is exactly what I described above (4 steps test). Making
ovl_set_origin() a NOP in 4.12.14 makes problem disappear.

> Do you see the warning regarding the double orphan only in the bad case?

No. Let me quote myself: "Please note I wrote this error appears
*with* ovl commit and also with one commit earlier.". If I checkout
3a1e819b4e80~1 (one commit before) I don't see power cut corruption
BUT I still see orphan error.


It seems despite me explaining some things twice still not everything
is clear. Let me work on a complete summary for all commits &
behaviors.

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22  7:01           ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22  7:01 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

On Sat, 20 Oct 2018 at 08:58, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> > > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > > > Did you bisect the problem and you are sure that the said commit is the first bad commit?
> > >
> > > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> > > if I read the commit correctly.
> > > Maybe this is the root of the problem.
> > >
> > > I'm currently traveling, so I cannot test much. Please make sure whether
> > > 3a1e819b4e80 is really the first bad commit.
> > > Then I can go for a bug hunt. :-)
> >
> > I did another test and everything keeps pointing to the 3a1e819b4e80.
> >
> > 1) I switched to the 4.12.14
> > 2) I confirmed problem exists there
> > 3) I added "return 0;" at the beginning of ovl_set_origin()
> > 4) I confirmed problem disappeared
>
> Okay. But I still think we're facing some UBIFS problem.
>
> To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop)
> it goes away?

Yes. This is exactly what I described above (4 steps test). Making
ovl_set_origin() a NOP in 4.12.14 makes problem disappear.

> Do you see the warning regarding the double orphan only in the bad case?

No. Let me quote myself: "Please note I wrote this error appears
*with* ovl commit and also with one commit earlier.". If I checkout
3a1e819b4e80~1 (one commit before) I don't see power cut corruption
BUT I still see orphan error.


It seems despite me explaining some things twice still not everything
is clear. Let me work on a complete summary for all commits &
behaviors.

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 21:28     ` Rafał Miłecki
  2018-10-20  6:58       ` Richard Weinberger
@ 2018-10-20  6:58       ` Richard Weinberger
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-20  6:58 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 23:28:33 CEST schrieb Rafał Miłecki:
> On Fri, 19 Oct 2018 at 16:59, Richard Weinberger <richard@nod.at> wrote:
> > Am Freitag, 19. Oktober 2018, 16:45:53 CEST schrieb Richard Weinberger:
> > > Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> > > Did you bisect the problem and you are sure that the said commit is the first bad commit?
> >
> > Another thing, UBIFS has no export operations, so overlayfs will fall back to xattrs,
> > if I read the commit correctly.
> > Maybe this is the root of the problem.
> >
> > I'm currently traveling, so I cannot test much. Please make sure whether
> > 3a1e819b4e80 is really the first bad commit.
> > Then I can go for a bug hunt. :-)
> 
> I did another test and everything keeps pointing to the 3a1e819b4e80.
> 
> 1) I switched to the 4.12.14
> 2) I confirmed problem exists there
> 3) I added "return 0;" at the beginning of ovl_set_origin()
> 4) I confirmed problem disappeared

Okay. But I still think we're facing some UBIFS problem.

To double check, on Linux 4.12.14 the problem exists and by reverting 3a1e819b4e80 (or making it a nop)
it goes away?
Do you see the warning regarding the double orphan only in the bad case?

Thanks,
//richard



______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:45   ` Richard Weinberger
                     ` (2 preceding siblings ...)
  (?)
@ 2018-10-19 16:18   ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 16:18 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 16:45, Richard Weinberger <richard@nod.at> wrote:
> ----- Ursprüngliche Mail -----
> > Von: "Rafał Miłecki" <zajec5@gmail.com>
> > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org,
> > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter"
> > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt
> > Development List" <openwrt-devel@lists.openwrt.org>
> > Gesendet: Freitag, 19. Oktober 2018 14:31:29
> > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
>
> > Hi,
> >
> > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > & describe test case for reproducing corruption when doing a power cut
> > after first boot.
> >
> > Interestingly it cannot be reproduced on all devices (NAND dependant?
> > arch dependant?!). I couldn't reproduce that problem on none of my
> > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> > lower inode on copy up").
> >
> > FWIW I was told it also affects:
> > Asus RT-AC58U (ARCH_IPQ40XX=y)
> > powerpc
> > RB493G, DIR-860L (ATH79=y)
> >
> > Steps to reproduce the problem:
> > 1) Flash firmware
> > 2) Boot (for the first time)
> > 3) Let the init script copy config files from lowerdir to the upperdir
> > 4) Wait for boot to finish
> > 5) Verify content of some unmodified config on overlay, using either:
> > hexdump -C /etc/config/dropbear
> > hexdump -C /overlay/upper/etc/config/dropbear
> > 6) Power cut & boot again
> > 7) Check the content of the same file
>
> Do you have something also I can test?
> A C reproducer? An xfstest case?

I don't. I may try writing one with info provided my Amir, but I'm not
experienced with such things, won't be trivial for me.


> > After above regressing commit the later check confirms the file size
> > looks correct but it's filled with all 00-es only.
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
>
> Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> Did you bisect the problem and you are sure that the said commit is the first bad commit?

Yes, I did git bisect and then double verified that.


> > FWIW testing above commit (and one before it) always results in single
> > error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>
> Please show the full log.
> The orphan thing rings a bell, we had such a bug already.

I will get a full log later. Please note I wrote this error appears
*with* ovl commit and also with one commit earlier. So it's very
unlikely to be caused by ovl change. Most likely it was some error
present in 4.11.0-rc1 and fixed later (not related to ovl).

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 14:45   ` Richard Weinberger
                     ` (3 preceding siblings ...)
  (?)
@ 2018-10-19 16:18   ` Rafał Miłecki
  2018-10-19 17:18       ` Richard Weinberger
  -1 siblings, 1 reply; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 16:18 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 16:45, Richard Weinberger <richard@nod.at> wrote:
> ----- Ursprüngliche Mail -----
> > Von: "Rafał Miłecki" <zajec5@gmail.com>
> > An: "Amir Goldstein" <amir73il@gmail.com>, "Miklos Szeredi" <miklos@szeredi.hu>, linux-unionfs@vger.kernel.org,
> > linux-fsdevel@vger.kernel.org, "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter"
> > <adrian.hunter@intel.com>, linux-mtd@lists.infradead.org, "Russell Senior" <russell@personaltelco.net>, "OpenWrt
> > Development List" <openwrt-devel@lists.openwrt.org>
> > Gesendet: Freitag, 19. Oktober 2018 14:31:29
> > Betreff: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
>
> > Hi,
> >
> > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > & describe test case for reproducing corruption when doing a power cut
> > after first boot.
> >
> > Interestingly it cannot be reproduced on all devices (NAND dependant?
> > arch dependant?!). I couldn't reproduce that problem on none of my
> > Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> > EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> > bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> > lower inode on copy up").
> >
> > FWIW I was told it also affects:
> > Asus RT-AC58U (ARCH_IPQ40XX=y)
> > powerpc
> > RB493G, DIR-860L (ATH79=y)
> >
> > Steps to reproduce the problem:
> > 1) Flash firmware
> > 2) Boot (for the first time)
> > 3) Let the init script copy config files from lowerdir to the upperdir
> > 4) Wait for boot to finish
> > 5) Verify content of some unmodified config on overlay, using either:
> > hexdump -C /etc/config/dropbear
> > hexdump -C /overlay/upper/etc/config/dropbear
> > 6) Power cut & boot again
> > 7) Check the content of the same file
>
> Do you have something also I can test?
> A C reproducer? An xfstest case?

I don't. I may try writing one with info provided my Amir, but I'm not
experienced with such things, won't be trivial for me.


> > After above regressing commit the later check confirms the file size
> > looks correct but it's filled with all 00-es only.
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
>
> Well, I fear it uncovers a problem in UBIFS. We had already problems with overlayfs.
> Did you bisect the problem and you are sure that the said commit is the first bad commit?

Yes, I did git bisect and then double verified that.


> > FWIW testing above commit (and one before it) always results in single
> > error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>
> Please show the full log.
> The orphan thing rings a bell, we had such a bug already.

I will get a full log later. Please note I wrote this error appears
*with* ovl commit and also with one commit earlier. So it's very
unlikely to be caused by ovl change. Most likely it was some error
present in 4.11.0-rc1 and fixed later (not related to ovl).

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 16:18   ` Rafał Miłecki
@ 2018-10-19 17:18       ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 17:18 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki:
> > Do you have something also I can test?
> > A C reproducer? An xfstest case?
> 
> I don't. I may try writing one with info provided my Amir, but I'm not
> experienced with such things, won't be trivial for me.

Next week I'm in Edinburgh at ELCE, you too?
Maybe we find an hour or two to work together on the problem.

Thanks,
//richard 




______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 17:18       ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-19 17:18 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki:
> > Do you have something also I can test?
> > A C reproducer? An xfstest case?
> 
> I don't. I may try writing one with info provided my Amir, but I'm not
> experienced with such things, won't be trivial for me.

Next week I'm in Edinburgh at ELCE, you too?
Maybe we find an hour or two to work together on the problem.

Thanks,
//richard 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 17:18       ` Richard Weinberger
@ 2018-10-19 21:29         ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 21:29 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 19:18, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki:
> > > Do you have something also I can test?
> > > A C reproducer? An xfstest case?
> >
> > I don't. I may try writing one with info provided my Amir, but I'm not
> > experienced with such things, won't be trivial for me.
>
> Next week I'm in Edinburgh at ELCE, you too?
> Maybe we find an hour or two to work together on the problem.

Sadly I'm not attending ELCE :( I chose to attend OpenWrt summit which
takes place a week later.

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 21:29         ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-19 21:29 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

On Fri, 19 Oct 2018 at 19:18, Richard Weinberger <richard@nod.at> wrote:
> Am Freitag, 19. Oktober 2018, 18:18:05 CEST schrieb Rafał Miłecki:
> > > Do you have something also I can test?
> > > A C reproducer? An xfstest case?
> >
> > I don't. I may try writing one with info provided my Amir, but I'm not
> > experienced with such things, won't be trivial for me.
>
> Next week I'm in Edinburgh at ELCE, you too?
> Maybe we find an hour or two to work together on the problem.

Sadly I'm not attending ELCE :( I chose to attend OpenWrt summit which
takes place a week later.

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 12:31 ` Rafał Miłecki
@ 2018-10-19 14:56   ` Amir Goldstein
  -1 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-19 14:56 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi,
	Adrian Hunter, overlayfs, linux-mtd, russell, linux-fsdevel,
	openwrt-devel

On Fri, Oct 19, 2018 at 3:31 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>
> Hi,
>
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
>
> Interestingly it cannot be reproduced on all devices (NAND dependant?
> arch dependant?!). I couldn't reproduce that problem on none of my
> Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> lower inode on copy up").
>
> FWIW I was told it also affects:
> Asus RT-AC58U (ARCH_IPQ40XX=y)
> powerpc
> RB493G, DIR-860L (ATH79=y)
>
> Steps to reproduce the problem:
> 1) Flash firmware
> 2) Boot (for the first time)
> 3) Let the init script copy config files from lowerdir to the upperdir
> 4) Wait for boot to finish
> 5) Verify content of some unmodified config on overlay, using either:
> hexdump -C /etc/config/dropbear
> hexdump -C /overlay/upper/etc/config/dropbear
> 6) Power cut & boot again
> 7) Check the content of the same file
>
> After above regressing commit the later check confirms the file size
> looks correct but it's filled with all 00-es only.
>
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?
>
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>

This looks very suspicious and should be taken up with ubifs developers.
There shouldn't be any way for an external module (overlayfs) to generate
internal errors like this in ubifs.

Let me describes what the "offending" commit changes to help you narrow
down the reproducer.

Before the change when config file is copied up overlayfs does:
- open O_TMPFILE in upper ubifs
- copy lower file data to tmpfile
- copy lower file metadata to tmpfile
- link tmpfile to upper ubifs path

After the change, the *only* extra operation is:
vfs_setxattr(tmpfile, "trusted.overlay.origin", NULL, 0, 0);
after opening tmpfile

Add to that the fact that O_TMPFILE support in ubifs is rather new
and you get high likelihood that the problem is stemmed somewhere
around setting extended attributes on O_TMPFILE in ubifs.

You can try to write a small reproducer that does the 5 steps above
without overlayfs directly on ubifs and see if youo get similar results
after power cut.

BTW, creating an O_TMPFILE "orphands" the created inode (once)
something in this procedure, allegedly setxattr() for some reason
causes the inode to be orphaned twice if we believe the error from
ubifs.

BTW, off topic, I posted some overlayfs patches that enable more features
for lower squashfs:
https://www.spinics.net/lists/linux-unionfs/msg05570.html

The most prominent feature is NFS export, but there is also persistent
st_dev;st_ino values that don't change after copy up + reboot and not
breaking hardlinks on copy up.

Can anyone from OpenWrt say if this is at all interesting for your users?

Thanks,
Amir.



> That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
>
> --
> Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-19 14:56   ` Amir Goldstein
  0 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-19 14:56 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Miklos Szeredi, overlayfs, linux-fsdevel, Richard Weinberger,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, russell,
	openwrt-devel

On Fri, Oct 19, 2018 at 3:31 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>
> Hi,
>
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
>
> Interestingly it cannot be reproduced on all devices (NAND dependant?
> arch dependant?!). I couldn't reproduce that problem on none of my
> Broadcom devices (ARM=y ARCH_BCM_5301X=y) so I had to buy Ubiquiti
> EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it then and
> bisected down to the commit 3a1e819b4e80 ("ovl: store file handle of
> lower inode on copy up").
>
> FWIW I was told it also affects:
> Asus RT-AC58U (ARCH_IPQ40XX=y)
> powerpc
> RB493G, DIR-860L (ATH79=y)
>
> Steps to reproduce the problem:
> 1) Flash firmware
> 2) Boot (for the first time)
> 3) Let the init script copy config files from lowerdir to the upperdir
> 4) Wait for boot to finish
> 5) Verify content of some unmodified config on overlay, using either:
> hexdump -C /etc/config/dropbear
> hexdump -C /overlay/upper/etc/config/dropbear
> 6) Power cut & boot again
> 7) Check the content of the same file
>
> After above regressing commit the later check confirms the file size
> looks correct but it's filled with all 00-es only.
>
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?
>
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>

This looks very suspicious and should be taken up with ubifs developers.
There shouldn't be any way for an external module (overlayfs) to generate
internal errors like this in ubifs.

Let me describes what the "offending" commit changes to help you narrow
down the reproducer.

Before the change when config file is copied up overlayfs does:
- open O_TMPFILE in upper ubifs
- copy lower file data to tmpfile
- copy lower file metadata to tmpfile
- link tmpfile to upper ubifs path

After the change, the *only* extra operation is:
vfs_setxattr(tmpfile, "trusted.overlay.origin", NULL, 0, 0);
after opening tmpfile

Add to that the fact that O_TMPFILE support in ubifs is rather new
and you get high likelihood that the problem is stemmed somewhere
around setting extended attributes on O_TMPFILE in ubifs.

You can try to write a small reproducer that does the 5 steps above
without overlayfs directly on ubifs and see if youo get similar results
after power cut.

BTW, creating an O_TMPFILE "orphands" the created inode (once)
something in this procedure, allegedly setxattr() for some reason
causes the inode to be orphaned twice if we believe the error from
ubifs.

BTW, off topic, I posted some overlayfs patches that enable more features
for lower squashfs:
https://www.spinics.net/lists/linux-unionfs/msg05570.html

The most prominent feature is NFS export, but there is also persistent
st_dev;st_ino values that don't change after copy up + reboot and not
breaking hardlinks on copy up.

Can anyone from OpenWrt say if this is at all interesting for your users?

Thanks,
Amir.



> That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
>
> --
> Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-19 12:31 ` Rafał Miłecki
@ 2018-10-22  7:14   ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22  7:14 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	Russell Senior, OpenWrt Development List

On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
>
> (...)
>
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?
>
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>
> That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.

Let me provide a summary of all relevant commits & tests:

By "Corruption" I mean file system corruption after power cut
By "UBIFS error" I mean:
[   14.308125] UBIFS error (ubi0:1 pid 628): ubifs_add_orphan: orphaned twice


1) 3a1e819b4e80~1
(one commit before regression)
Corruption: NO
UBIFS error: YES

2) 3a1e819b4e80~1 + cherry-pick of 32fe905c17f0
(one commit before regression + ubifs fix)
Corruption: NO
UBIFS error: NO

3) 3a1e819b4e80
(regressing commit)
Corruption: YES
UBIFS error: YES

4) 3a1e819b4e80 + cherry-pick of 32fe905c17f0
(regressing commit + ubifs fix)
Corruption: YES
UBIFS error: NO

5) 4.12.14
Corruption: YES
UBIFS error: NO

6) 4.12.14 + ovl_set_origin() made nop
Corruption: NO
UBIFS error: NO


It proves that:
1) UBIFS error is not related to this file system corruption and can be ignored
2) 3a1e819b4e80 and ovl_set_origin() are definitely responsible for
the regression and file system corruption

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22  7:14   ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22  7:14 UTC (permalink / raw)
  To: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Richard Weinberger, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	Russell Senior, OpenWrt Development List

On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> reporting file system corruptions. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> & describe test case for reproducing corruption when doing a power cut
> after first boot.
>
> (...)
>
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?
>
> FWIW testing above commit (and one before it) always results in single
> error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
>
> That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.

Let me provide a summary of all relevant commits & tests:

By "Corruption" I mean file system corruption after power cut
By "UBIFS error" I mean:
[   14.308125] UBIFS error (ubi0:1 pid 628): ubifs_add_orphan: orphaned twice


1) 3a1e819b4e80~1
(one commit before regression)
Corruption: NO
UBIFS error: YES

2) 3a1e819b4e80~1 + cherry-pick of 32fe905c17f0
(one commit before regression + ubifs fix)
Corruption: NO
UBIFS error: NO

3) 3a1e819b4e80
(regressing commit)
Corruption: YES
UBIFS error: YES

4) 3a1e819b4e80 + cherry-pick of 32fe905c17f0
(regressing commit + ubifs fix)
Corruption: YES
UBIFS error: NO

5) 4.12.14
Corruption: YES
UBIFS error: NO

6) 4.12.14 + ovl_set_origin() made nop
Corruption: NO
UBIFS error: NO


It proves that:
1) UBIFS error is not related to this file system corruption and can be ignored
2) 3a1e819b4e80 and ovl_set_origin() are definitely responsible for
the regression and file system corruption

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22  7:14   ` Rafał Miłecki
  (?)
@ 2018-10-22  8:26     ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-22  8:26 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > & describe test case for reproducing corruption when doing a power cut
> > after first boot.
> >
> > (...)
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
> >
> > FWIW testing above commit (and one before it) always results in single
> > error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> >
> > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> 
> Let me provide a summary of all relevant commits & tests:
> 
> By "Corruption" I mean file system corruption after power cut

Well, is the filesystem not consistent anymore?
From what Russel explained to me, I thought the main problem is that no write back happens.
IOW the inode is present, has correct length, but no content is there (all zeros).

Just like the typical case where userspace does not fsync.
But in your case sooner or later write back should have happened because the writeback timer
fires at some point.

Thanks,
//richard





______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22  8:26     ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-22  8:26 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > & describe test case for reproducing corruption when doing a power cut
> > after first boot.
> >
> > (...)
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
> >
> > FWIW testing above commit (and one before it) always results in single
> > error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> >
> > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> 
> Let me provide a summary of all relevant commits & tests:
> 
> By "Corruption" I mean file system corruption after power cut

Well, is the filesystem not consistent anymore?
From what Russel explained to me, I thought the main problem is that no write back happens.
IOW the inode is present, has correct length, but no content is there (all zeros).

Just like the typical case where userspace does not fsync.
But in your case sooner or later write back should have happened because the writeback timer
fires at some point.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22  8:26     ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-22  8:26 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > & describe test case for reproducing corruption when doing a power cut
> > after first boot.
> >
> > (...)
> >
> > Can I ask you to check if there is something possibly wrong with the
> > above ovl commit? Or does it expose some problem with the ubifs? Or
> > maybe the whole UBI?
> >
> > FWIW testing above commit (and one before it) always results in single
> > error in the kernel log:
> > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> >
> > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> 
> Let me provide a summary of all relevant commits & tests:
> 
> By "Corruption" I mean file system corruption after power cut

Well, is the filesystem not consistent anymore?
>From what Russel explained to me, I thought the main problem is that no write back happens.
IOW the inode is present, has correct length, but no content is there (all zeros).

Just like the typical case where userspace does not fsync.
But in your case sooner or later write back should have happened because the writeback timer
fires at some point.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22  8:26     ` Richard Weinberger
@ 2018-10-22  8:57       ` Amir Goldstein
  -1 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-22  8:57 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Rafał Miłecki, Adrian Hunter,
	overlayfs, Miklos Szeredi, linux-mtd, russell, linux-fsdevel,
	openwrt-devel

On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
>
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).
>
> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.
>

For the records overlayfs does:
- open(O_TMPFILE)
- setxattr() [with 3a1e819b4e80]
- write to tmpfile
- fsync tmpfile
- link tmpfile

I suggest that you try the same from user space on ubifs.

Thanks,
Amir.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22  8:57       ` Amir Goldstein
  0 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-22  8:57 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Rafał Miłecki, Miklos Szeredi, overlayfs,
	linux-fsdevel, Artem Bityutskiy, Adrian Hunter, linux-mtd,
	russell, openwrt-devel

On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
>
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).
>
> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.
>

For the records overlayfs does:
- open(O_TMPFILE)
- setxattr() [with 3a1e819b4e80]
- write to tmpfile
- fsync tmpfile
- link tmpfile

I suggest that you try the same from user space on ubifs.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22  8:57       ` Amir Goldstein
@ 2018-10-22 15:34         ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 15:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi,
	Adrian Hunter, linux-unionfs, linux-mtd, Russell Senior,
	linux-fsdevel, OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 2806 bytes --]

On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
> >
> > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > > & describe test case for reproducing corruption when doing a power cut
> > > > after first boot.
> > > >
> > > > (...)
> > > >
> > > > Can I ask you to check if there is something possibly wrong with the
> > > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > > maybe the whole UBI?
> > > >
> > > > FWIW testing above commit (and one before it) always results in single
> > > > error in the kernel log:
> > > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > > >
> > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> > >
> > > Let me provide a summary of all relevant commits & tests:
> > >
> > > By "Corruption" I mean file system corruption after power cut
> >
> > Well, is the filesystem not consistent anymore?
> > From what Russel explained to me, I thought the main problem is that no write back happens.
> > IOW the inode is present, has correct length, but no content is there (all zeros).
> >
> > Just like the typical case where userspace does not fsync.
> > But in your case sooner or later write back should have happened because the writeback timer
> > fires at some point.
> >
>
> For the records overlayfs does:
> - open(O_TMPFILE)
> - setxattr() [with 3a1e819b4e80]
> - write to tmpfile
> - fsync tmpfile
> - link tmpfile
>
> I suggest that you try the same from user space on ubifs.

Are you 100% sure about it? I tried writing C app behaving as you
described above and I could not reproduce the problem.

Then I took a close look at ovl_copy_up_locked() and it seems above
info isn't accurate. It seems to me that setxattr() happens between
fsync and link. I modified my C app to follow that order (open, write,
fsync, setxattr, link) and I can reproduce the problem now!

Steps to reproduce the problem:
1) compile tmptest.c
2) tmptest /overlay/upper/foo.txt user.bar baz
3) wait 5 seconds (so ubifs writes to flash)
4) power cut
5) boot again and check content of /overlay/upper/foo.txt
6) in my case content appears to be 00 00 00 00

Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()?

-- 
Rafał

[-- Attachment #2: tmptest.c --]
[-- Type: text/x-csrc, Size: 1616 bytes --]

#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <linux/limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>

static const char foo[] = "foo\n";

int main(int argc, char **argv) {
	char path[PATH_MAX];
	char *dir;
	int err;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "You have to pass file as argument\n");
		return -ENOENT;
	}

	dir = strdup(argv[1]);
	if (!dir) {
		return -ENOMEM;
	}
	dir = dirname(dir);
	fd = open(dir, O_TMPFILE | O_RDWR);
	if (fd < 0) {
		err = -errno;
		fprintf(stderr, "Failed to open O_TMPFILE: %d\n", err);
		return err;
	}

#if 0
	if (argc >= 4) {
		if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
			err = -errno;
			fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
			return err;
		}
		printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
	}
#endif

	if (write(fd, foo, sizeof(foo)) != sizeof(foo)) {
		err = -errno;
		fprintf(stderr, "Failed to write(): %d\n", err);
		return err;
	}

	if (fsync(fd) < 0) {
		err = -errno;
		fprintf(stderr, "Failed to fsync(): %d\n", err);
		return err;
	}

#if 1
	if (argc >= 4) {
		if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
			err = -errno;
			fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
			return err;
		}
		printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
	}
#endif

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	if (linkat(-1, path, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0) {
		err = -errno;
		fprintf(stderr, "Failed to linkat(): %d\n", err);
		return err;
	}

	printf("Success!\n");

	return 0;
}

[-- Attachment #3: Type: text/plain, Size: 144 bytes --]

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22 15:34         ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 15:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Richard Weinberger, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 2806 bytes --]

On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
> >
> > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > > & describe test case for reproducing corruption when doing a power cut
> > > > after first boot.
> > > >
> > > > (...)
> > > >
> > > > Can I ask you to check if there is something possibly wrong with the
> > > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > > maybe the whole UBI?
> > > >
> > > > FWIW testing above commit (and one before it) always results in single
> > > > error in the kernel log:
> > > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > > >
> > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> > >
> > > Let me provide a summary of all relevant commits & tests:
> > >
> > > By "Corruption" I mean file system corruption after power cut
> >
> > Well, is the filesystem not consistent anymore?
> > From what Russel explained to me, I thought the main problem is that no write back happens.
> > IOW the inode is present, has correct length, but no content is there (all zeros).
> >
> > Just like the typical case where userspace does not fsync.
> > But in your case sooner or later write back should have happened because the writeback timer
> > fires at some point.
> >
>
> For the records overlayfs does:
> - open(O_TMPFILE)
> - setxattr() [with 3a1e819b4e80]
> - write to tmpfile
> - fsync tmpfile
> - link tmpfile
>
> I suggest that you try the same from user space on ubifs.

Are you 100% sure about it? I tried writing C app behaving as you
described above and I could not reproduce the problem.

Then I took a close look at ovl_copy_up_locked() and it seems above
info isn't accurate. It seems to me that setxattr() happens between
fsync and link. I modified my C app to follow that order (open, write,
fsync, setxattr, link) and I can reproduce the problem now!

Steps to reproduce the problem:
1) compile tmptest.c
2) tmptest /overlay/upper/foo.txt user.bar baz
3) wait 5 seconds (so ubifs writes to flash)
4) power cut
5) boot again and check content of /overlay/upper/foo.txt
6) in my case content appears to be 00 00 00 00

Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()?

-- 
Rafał

[-- Attachment #2: tmptest.c --]
[-- Type: text/x-csrc, Size: 1616 bytes --]

#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <linux/limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>

static const char foo[] = "foo\n";

int main(int argc, char **argv) {
	char path[PATH_MAX];
	char *dir;
	int err;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "You have to pass file as argument\n");
		return -ENOENT;
	}

	dir = strdup(argv[1]);
	if (!dir) {
		return -ENOMEM;
	}
	dir = dirname(dir);
	fd = open(dir, O_TMPFILE | O_RDWR);
	if (fd < 0) {
		err = -errno;
		fprintf(stderr, "Failed to open O_TMPFILE: %d\n", err);
		return err;
	}

#if 0
	if (argc >= 4) {
		if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
			err = -errno;
			fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
			return err;
		}
		printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
	}
#endif

	if (write(fd, foo, sizeof(foo)) != sizeof(foo)) {
		err = -errno;
		fprintf(stderr, "Failed to write(): %d\n", err);
		return err;
	}

	if (fsync(fd) < 0) {
		err = -errno;
		fprintf(stderr, "Failed to fsync(): %d\n", err);
		return err;
	}

#if 1
	if (argc >= 4) {
		if (fsetxattr(fd, argv[2], argv[3], strlen(argv[3]), 0) != 0) {
			err = -errno;
			fprintf(stderr, "Failed to fsetxattr(): %d\n", err);
			return err;
		}
		printf("Wrote xattr \"%s\" value \"%s\"\n", argv[2], argv[3]);
	}
#endif

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	if (linkat(-1, path, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0) {
		err = -errno;
		fprintf(stderr, "Failed to linkat(): %d\n", err);
		return err;
	}

	printf("Success!\n");

	return 0;
}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22 15:34         ` Rafał Miłecki
  (?)
@ 2018-10-22 17:00         ` Amir Goldstein
  -1 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-22 17:00 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Richard Weinberger, Miklos Szeredi, overlayfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, russell,
	openwrt-devel

On Mon, Oct 22, 2018 at 6:34 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>
> On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
> > >
> > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > > > & describe test case for reproducing corruption when doing a power cut
> > > > > after first boot.
> > > > >
> > > > > (...)
> > > > >
> > > > > Can I ask you to check if there is something possibly wrong with the
> > > > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > > > maybe the whole UBI?
> > > > >
> > > > > FWIW testing above commit (and one before it) always results in single
> > > > > error in the kernel log:
> > > > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > > > >
> > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> > > >
> > > > Let me provide a summary of all relevant commits & tests:
> > > >
> > > > By "Corruption" I mean file system corruption after power cut
> > >
> > > Well, is the filesystem not consistent anymore?
> > > From what Russel explained to me, I thought the main problem is that no write back happens.
> > > IOW the inode is present, has correct length, but no content is there (all zeros).
> > >
> > > Just like the typical case where userspace does not fsync.
> > > But in your case sooner or later write back should have happened because the writeback timer
> > > fires at some point.
> > >
> >
> > For the records overlayfs does:
> > - open(O_TMPFILE)
> > - setxattr() [with 3a1e819b4e80]
> > - write to tmpfile
> > - fsync tmpfile
> > - link tmpfile
> >
> > I suggest that you try the same from user space on ubifs.
>
> Are you 100% sure about it? I tried writing C app behaving as you
> described above and I could not reproduce the problem.
>

I am never 100% sure ;-)
I looked at upstream code, which does ovl_set_origin() before
copying data.
commit  3a1e819b4e80 does ovl_set_origin() after fsync like you say.

> Then I took a close look at ovl_copy_up_locked() and it seems above
> info isn't accurate. It seems to me that setxattr() happens between
> fsync and link. I modified my C app to follow that order (open, write,
> fsync, setxattr, link) and I can reproduce the problem now!
>
> Steps to reproduce the problem:
> 1) compile tmptest.c
> 2) tmptest /overlay/upper/foo.txt user.bar baz
> 3) wait 5 seconds (so ubifs writes to flash)
> 4) power cut
> 5) boot again and check content of /overlay/upper/foo.txt
> 6) in my case content appears to be 00 00 00 00
>
> Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()?
>

Let's just clarify that for the matter of data corruption it doesn't matter
if and when overlayfs does vfs_setxattr(). Overalyfs does write()+fsync()
and that is all that should matter for guarantying data integrity.

Regarding "Is it correct to setxattr after fsync()?", that question depends
on what overlayfs needs to guaranty w.r.t crash consistency of its private
xattrs and on the type of crash consistency guaranties overlayfs expects
from the underlying filesystem.

Overlayfs needs to guaranty that if crash happens mid copy up, after
reboot, either the upper file is not there or the upper file is there with
all copied data and all copied metadata and the overlay xattrs.

Overlayfs current code expects the upper filesystem to have "strictly
ordered metadata" semantics, meaning that setxattr();link(), cannot
appear after crash as a linked file without the xattr. Just the same as
mkdir();chown();rename() cannot appear after crash as renamed dir
without the chwon();

Those semantics are provided by the journaling file systems (xfs, ext4)
and to be honest I am not sure if they are provided by btrfs or not -
every time this subject came up, btrfs developer answers were a bit too
vague for me to understand what is the expected behavior in btrfs.

Does ubifs has a problem to provide this guaranty?

Finally, I'll repeat with what I started - this has nothing to do with
data consistency. In fact, in the specific case of lower squashfs,
overlayfs will end up writing a zero length xattr, which overlayfs
only ever cares about on directories anyway.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22 15:34         ` Rafał Miłecki
  (?)
  (?)
@ 2018-10-22 17:00         ` Amir Goldstein
  -1 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2018-10-22 17:00 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Richard Weinberger, Miklos Szeredi,
	Adrian Hunter, overlayfs, linux-mtd, russell, linux-fsdevel,
	openwrt-devel

On Mon, Oct 22, 2018 at 6:34 PM Rafał Miłecki <zajec5@gmail.com> wrote:
>
> On Mon, 22 Oct 2018 at 10:57, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Mon, Oct 22, 2018 at 11:26 AM Richard Weinberger <richard@nod.at> wrote:
> > >
> > > Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > > > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > > > & describe test case for reproducing corruption when doing a power cut
> > > > > after first boot.
> > > > >
> > > > > (...)
> > > > >
> > > > > Can I ask you to check if there is something possibly wrong with the
> > > > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > > > maybe the whole UBI?
> > > > >
> > > > > FWIW testing above commit (and one before it) always results in single
> > > > > error in the kernel log:
> > > > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > > > >
> > > > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> > > >
> > > > Let me provide a summary of all relevant commits & tests:
> > > >
> > > > By "Corruption" I mean file system corruption after power cut
> > >
> > > Well, is the filesystem not consistent anymore?
> > > From what Russel explained to me, I thought the main problem is that no write back happens.
> > > IOW the inode is present, has correct length, but no content is there (all zeros).
> > >
> > > Just like the typical case where userspace does not fsync.
> > > But in your case sooner or later write back should have happened because the writeback timer
> > > fires at some point.
> > >
> >
> > For the records overlayfs does:
> > - open(O_TMPFILE)
> > - setxattr() [with 3a1e819b4e80]
> > - write to tmpfile
> > - fsync tmpfile
> > - link tmpfile
> >
> > I suggest that you try the same from user space on ubifs.
>
> Are you 100% sure about it? I tried writing C app behaving as you
> described above and I could not reproduce the problem.
>

I am never 100% sure ;-)
I looked at upstream code, which does ovl_set_origin() before
copying data.
commit  3a1e819b4e80 does ovl_set_origin() after fsync like you say.

> Then I took a close look at ovl_copy_up_locked() and it seems above
> info isn't accurate. It seems to me that setxattr() happens between
> fsync and link. I modified my C app to follow that order (open, write,
> fsync, setxattr, link) and I can reproduce the problem now!
>
> Steps to reproduce the problem:
> 1) compile tmptest.c
> 2) tmptest /overlay/upper/foo.txt user.bar baz
> 3) wait 5 seconds (so ubifs writes to flash)
> 4) power cut
> 5) boot again and check content of /overlay/upper/foo.txt
> 6) in my case content appears to be 00 00 00 00
>
> Is this correct for ovl to call vfs_setxattr() *after* calling vfs_fsync()?
>

Let's just clarify that for the matter of data corruption it doesn't matter
if and when overlayfs does vfs_setxattr(). Overalyfs does write()+fsync()
and that is all that should matter for guarantying data integrity.

Regarding "Is it correct to setxattr after fsync()?", that question depends
on what overlayfs needs to guaranty w.r.t crash consistency of its private
xattrs and on the type of crash consistency guaranties overlayfs expects
from the underlying filesystem.

Overlayfs needs to guaranty that if crash happens mid copy up, after
reboot, either the upper file is not there or the upper file is there with
all copied data and all copied metadata and the overlay xattrs.

Overlayfs current code expects the upper filesystem to have "strictly
ordered metadata" semantics, meaning that setxattr();link(), cannot
appear after crash as a linked file without the xattr. Just the same as
mkdir();chown();rename() cannot appear after crash as renamed dir
without the chwon();

Those semantics are provided by the journaling file systems (xfs, ext4)
and to be honest I am not sure if they are provided by btrfs or not -
every time this subject came up, btrfs developer answers were a bit too
vague for me to understand what is the expected behavior in btrfs.

Does ubifs has a problem to provide this guaranty?

Finally, I'll repeat with what I started - this has nothing to do with
data consistency. In fact, in the specific case of lower squashfs,
overlayfs will end up writing a zero length xattr, which overlayfs
only ever cares about on directories anyway.

Thanks,
Amir.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22 15:34         ` Rafał Miłecki
@ 2018-10-27 19:33           ` Richard Weinberger
  -1 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-27 19:33 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]

Rafał,

Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki:
> Then I took a close look at ovl_copy_up_locked() and it seems above
> info isn't accurate. It seems to me that setxattr() happens between
> fsync and link. I modified my C app to follow that order (open, write,
> fsync, setxattr, link) and I can reproduce the problem now!
> 
> Steps to reproduce the problem:
> 1) compile tmptest.c
> 2) tmptest /overlay/upper/foo.txt user.bar baz
> 3) wait 5 seconds (so ubifs writes to flash)
> 4) power cut
> 5) boot again and check content of /overlay/upper/foo.txt
> 6) in my case content appears to be 00 00 00 00

Just returned from Edinburgh and had a chance to look at the problem.
The problem is not that no write-back happens, in fact it happens just fine.
But we have a problem upon journal replay if an unlinked file (O_TMPFILE)
gets relinked in combination with xattrs.

Can you please give the attached patch a try? It is not perfect but if I 
understand the problem correctly it should fix the issues you're facing.

Thanks,
//richard

[-- Attachment #2: replay_tmpfile_fix.diff --]
[-- Type: text/x-patch, Size: 1263 bytes --]

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 4844538eb926..82b1244e4138 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -209,6 +209,19 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
 	return ubifs_tnc_remove_range(c, &min_key, &max_key);
 }
 
+static bool inode_comes_back(struct ubifs_info *c, struct replay_entry *rino)
+{
+	struct replay_entry *r;
+
+	list_for_each_entry(r, &c->replay_list, list) {
+		if (keys_cmp(c, &rino->key, &r->key) == 0 &&
+		    r->deletion == 0)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * apply_replay_entry - apply a replay entry to the TNC.
  * @c: UBIFS file-system description object
@@ -218,7 +231,7 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
  */
 static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 {
-	int err;
+	int err = 0;
 
 	dbg_mntk(&r->key, "LEB %d:%d len %d deletion %d sqnum %llu key ",
 		 r->lnum, r->offs, r->len, r->deletion, r->sqnum);
@@ -236,6 +249,9 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 			{
 				ino_t inum = key_inum(c, &r->key);
 
+				if (inode_comes_back(c, r))
+					break;
+
 				err = ubifs_tnc_remove_ino(c, inum);
 				break;
 			}

[-- Attachment #3: Type: text/plain, Size: 144 bytes --]

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-27 19:33           ` Richard Weinberger
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Weinberger @ 2018-10-27 19:33 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]

Rafał,

Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki:
> Then I took a close look at ovl_copy_up_locked() and it seems above
> info isn't accurate. It seems to me that setxattr() happens between
> fsync and link. I modified my C app to follow that order (open, write,
> fsync, setxattr, link) and I can reproduce the problem now!
> 
> Steps to reproduce the problem:
> 1) compile tmptest.c
> 2) tmptest /overlay/upper/foo.txt user.bar baz
> 3) wait 5 seconds (so ubifs writes to flash)
> 4) power cut
> 5) boot again and check content of /overlay/upper/foo.txt
> 6) in my case content appears to be 00 00 00 00

Just returned from Edinburgh and had a chance to look at the problem.
The problem is not that no write-back happens, in fact it happens just fine.
But we have a problem upon journal replay if an unlinked file (O_TMPFILE)
gets relinked in combination with xattrs.

Can you please give the attached patch a try? It is not perfect but if I 
understand the problem correctly it should fix the issues you're facing.

Thanks,
//richard

[-- Attachment #2: replay_tmpfile_fix.diff --]
[-- Type: text/x-patch, Size: 1263 bytes --]

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 4844538eb926..82b1244e4138 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -209,6 +209,19 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
 	return ubifs_tnc_remove_range(c, &min_key, &max_key);
 }
 
+static bool inode_comes_back(struct ubifs_info *c, struct replay_entry *rino)
+{
+	struct replay_entry *r;
+
+	list_for_each_entry(r, &c->replay_list, list) {
+		if (keys_cmp(c, &rino->key, &r->key) == 0 &&
+		    r->deletion == 0)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * apply_replay_entry - apply a replay entry to the TNC.
  * @c: UBIFS file-system description object
@@ -218,7 +231,7 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
  */
 static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 {
-	int err;
+	int err = 0;
 
 	dbg_mntk(&r->key, "LEB %d:%d len %d deletion %d sqnum %llu key ",
 		 r->lnum, r->offs, r->len, r->deletion, r->sqnum);
@@ -236,6 +249,9 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 			{
 				ino_t inum = key_inum(c, &r->key);
 
+				if (inode_comes_back(c, r))
+					break;
+
 				err = ubifs_tnc_remove_ino(c, inum);
 				break;
 			}

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-27 19:33           ` Richard Weinberger
  (?)
@ 2018-10-28  7:09           ` Russell Senior
  -1 siblings, 0 replies; 40+ messages in thread
From: Russell Senior @ 2018-10-28  7:09 UTC (permalink / raw)
  To: richard
  Cc: zajec5, amir73il, miklos, linux-unionfs, linux-fsdevel,
	dedekind1, adrian.hunter, linux-mtd, openwrt-devel

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

On Sat, Oct 27, 2018 at 12:33 PM Richard Weinberger <richard@nod.at> wrote:

> Rafał,
>
> Am Montag, 22. Oktober 2018, 17:34:44 CEST schrieb Rafał Miłecki:
> > Then I took a close look at ovl_copy_up_locked() and it seems above
> > info isn't accurate. It seems to me that setxattr() happens between
> > fsync and link. I modified my C app to follow that order (open, write,
> > fsync, setxattr, link) and I can reproduce the problem now!
> >
> > Steps to reproduce the problem:
> > 1) compile tmptest.c
> > 2) tmptest /overlay/upper/foo.txt user.bar baz
> > 3) wait 5 seconds (so ubifs writes to flash)
> > 4) power cut
> > 5) boot again and check content of /overlay/upper/foo.txt
> > 6) in my case content appears to be 00 00 00 00
>
> Just returned from Edinburgh and had a chance to look at the problem.
> The problem is not that no write-back happens, in fact it happens just
> fine.
> But we have a problem upon journal replay if an unlinked file (O_TMPFILE)
> gets relinked in combination with xattrs.
>
> Can you please give the attached patch a try? It is not perfect but if I
> understand the problem correctly it should fix the issues you're facing.
>

Your patch seems to have fixed it for me, sample size = 1.

-- 
Russell Senior
russell@personaltelco.net

[-- Attachment #2: Type: text/html, Size: 1777 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22  8:26     ` Richard Weinberger
@ 2018-10-22 21:23       ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 21:23 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 2568 bytes --]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

[-- Attachment #2: ubifs-dbg.html --]
[-- Type: text/html, Size: 5727 bytes --]

[-- Attachment #3: Type: text/plain, Size: 144 bytes --]

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22 21:23       ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 21:23 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

[-- Attachment #1: Type: text/plain, Size: 2568 bytes --]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

[-- Attachment #2: ubifs-dbg.html --]
[-- Type: text/html, Size: 5727 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
  2018-10-22  8:26     ` Richard Weinberger
@ 2018-10-22 21:27       ` Rafał Miłecki
  -1 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 21:27 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Artem Bityutskiy, Amir Goldstein, Adrian Hunter, linux-unionfs,
	Miklos Szeredi, linux-mtd, Russell Senior, linux-fsdevel,
	OpenWrt Development List

[resending without HTML attachment]
[see http://files.zajec.net/openwrt/ubifs-dbg.html instead]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
@ 2018-10-22 21:27       ` Rafał Miłecki
  0 siblings, 0 replies; 40+ messages in thread
From: Rafał Miłecki @ 2018-10-22 21:27 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs, linux-fsdevel,
	Artem Bityutskiy, Adrian Hunter, linux-mtd, Russell Senior,
	OpenWrt Development List

[resending without HTML attachment]
[see http://files.zajec.net/openwrt/ubifs-dbg.html instead]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-10-28  7:09 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-19 12:31 Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Rafał Miłecki
2018-10-19 12:31 ` Rafał Miłecki
2018-10-19 14:45 ` Richard Weinberger
2018-10-19 14:45   ` Richard Weinberger
2018-10-19 14:59   ` Richard Weinberger
2018-10-19 14:59   ` Richard Weinberger
2018-10-19 15:07     ` Amir Goldstein
2018-10-19 15:07     ` Amir Goldstein
2018-10-19 21:28     ` Rafał Miłecki
2018-10-19 21:28     ` Rafał Miłecki
2018-10-20  6:58       ` Richard Weinberger
2018-10-22  7:01         ` Rafał Miłecki
2018-10-22  7:01           ` Rafał Miłecki
2018-10-20  6:58       ` Richard Weinberger
2018-10-19 16:18   ` Rafał Miłecki
2018-10-19 16:18   ` Rafał Miłecki
2018-10-19 17:18     ` Richard Weinberger
2018-10-19 17:18       ` Richard Weinberger
2018-10-19 21:29       ` Rafał Miłecki
2018-10-19 21:29         ` Rafał Miłecki
2018-10-19 14:56 ` Amir Goldstein
2018-10-19 14:56   ` Amir Goldstein
2018-10-22  7:14 ` Rafał Miłecki
2018-10-22  7:14   ` Rafał Miłecki
2018-10-22  8:26   ` Richard Weinberger
2018-10-22  8:26     ` Richard Weinberger
2018-10-22  8:26     ` Richard Weinberger
2018-10-22  8:57     ` Amir Goldstein
2018-10-22  8:57       ` Amir Goldstein
2018-10-22 15:34       ` Rafał Miłecki
2018-10-22 15:34         ` Rafał Miłecki
2018-10-22 17:00         ` Amir Goldstein
2018-10-22 17:00         ` Amir Goldstein
2018-10-27 19:33         ` Richard Weinberger
2018-10-27 19:33           ` Richard Weinberger
2018-10-28  7:09           ` Russell Senior
2018-10-22 21:23     ` Rafał Miłecki
2018-10-22 21:23       ` Rafał Miłecki
2018-10-22 21:27     ` Rafał Miłecki
2018-10-22 21:27       ` Rafał Miłecki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.