All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafał Miłecki" <zajec5@gmail.com>
To: Richard Weinberger <richard@nod.at>
Cc: Artem Bityutskiy <dedekind1@gmail.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-unionfs@vger.kernel.org, Miklos Szeredi <miklos@szeredi.hu>,
	linux-mtd@lists.infradead.org,
	Russell Senior <russell@personaltelco.net>,
	linux-fsdevel@vger.kernel.org,
	OpenWrt Development List <openwrt-devel@lists.openwrt.org>
Subject: Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
Date: Mon, 22 Oct 2018 23:27:00 +0200	[thread overview]
Message-ID: <CACna6rzOuC8vf12qRPOCJwdMPH7-R=eUGaofNYjD7f0_1a23oA@mail.gmail.com> (raw)
In-Reply-To: <3000620.g91H2S8sUk@blindfold>

[resending without HTML attachment]
[see http://files.zajec.net/openwrt/ubifs-dbg.html instead]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

WARNING: multiple messages have this Message-ID (diff)
From: "Rafał Miłecki" <zajec5@gmail.com>
To: Richard Weinberger <richard@nod.at>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-unionfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-mtd@lists.infradead.org,
	Russell Senior <russell@personaltelco.net>,
	OpenWrt Development List <openwrt-devel@lists.openwrt.org>
Subject: Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
Date: Mon, 22 Oct 2018 23:27:00 +0200	[thread overview]
Message-ID: <CACna6rzOuC8vf12qRPOCJwdMPH7-R=eUGaofNYjD7f0_1a23oA@mail.gmail.com> (raw)
In-Reply-To: <3000620.g91H2S8sUk@blindfold>

[resending without HTML attachment]
[see http://files.zajec.net/openwrt/ubifs-dbg.html instead]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger <richard@nod.at> wrote:
> Am Montag, 22. Oktober 2018, 09:14:08 CEST schrieb Rafał Miłecki:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki <zajec5@gmail.com> wrote:
> > > Since OpenWrt switch from kernel 4.9 to 4.14 users started randomly
> > > reporting file system corruptions. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > & describe test case for reproducing corruption when doing a power cut
> > > after first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem with the ubifs? Or
> > > maybe the whole UBI?
> > >
> > > FWIW testing above commit (and one before it) always results in single
> > > error in the kernel log:
> > > [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately it's
> > > impossible to cleanly revert 3a1e819b4e80 from the top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "Corruption" I mean file system corruption after power cut
>
> Well, is the filesystem not consistent anymore?
> From what Russel explained to me, I thought the main problem is that no write back happens.
> IOW the inode is present, has correct length, but no content is there (all zeros).

I probably misused "corruption" word. What I meant by "corruption" was
file having all "00"es instead of expected data.


> Just like the typical case where userspace does not fsync.
> But in your case sooner or later write back should have happened because the writeback timer
> fires at some point.

As you probably noticed I wrote tmptest.c - could you test it, please?
See if you can reproduce the problem? Please call it with 3 arguments:
1) Path on ubifs mount point
2) Some xattr name
3) Some xattr value

Please note I wait 5 seconds (this matches
vm.dirty_writeback_centisecs being 500) before doing a power cut. That
lets ubifs write to flash. For some reason files that got fsetxattr
called for them still are all "00"es after a power cut.

I did some extra testing after enabling ubifs debugging for io.c,
file.c and journal.c. Debugging output looks like expected. I can
clearly see wbuf_timer_callback_nolock() being called.
I attached my debugging summary as ubifs-dbg.html please take a look
at it in case I missed something.

-- 
Rafał

  parent reply	other threads:[~2018-10-22 21:27 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-19 12:31 Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Rafał Miłecki
2018-10-19 12:31 ` Rafał Miłecki
2018-10-19 14:45 ` Richard Weinberger
2018-10-19 14:45   ` Richard Weinberger
2018-10-19 14:59   ` Richard Weinberger
2018-10-19 14:59   ` Richard Weinberger
2018-10-19 15:07     ` Amir Goldstein
2018-10-19 15:07     ` Amir Goldstein
2018-10-19 21:28     ` Rafał Miłecki
2018-10-19 21:28     ` Rafał Miłecki
2018-10-20  6:58       ` Richard Weinberger
2018-10-22  7:01         ` Rafał Miłecki
2018-10-22  7:01           ` Rafał Miłecki
2018-10-20  6:58       ` Richard Weinberger
2018-10-19 16:18   ` Rafał Miłecki
2018-10-19 16:18   ` Rafał Miłecki
2018-10-19 17:18     ` Richard Weinberger
2018-10-19 17:18       ` Richard Weinberger
2018-10-19 21:29       ` Rafał Miłecki
2018-10-19 21:29         ` Rafał Miłecki
2018-10-19 14:56 ` Amir Goldstein
2018-10-19 14:56   ` Amir Goldstein
2018-10-22  7:14 ` Rafał Miłecki
2018-10-22  7:14   ` Rafał Miłecki
2018-10-22  8:26   ` Richard Weinberger
2018-10-22  8:26     ` Richard Weinberger
2018-10-22  8:26     ` Richard Weinberger
2018-10-22  8:57     ` Amir Goldstein
2018-10-22  8:57       ` Amir Goldstein
2018-10-22 15:34       ` Rafał Miłecki
2018-10-22 15:34         ` Rafał Miłecki
2018-10-22 17:00         ` Amir Goldstein
2018-10-22 17:00         ` Amir Goldstein
2018-10-27 19:33         ` Richard Weinberger
2018-10-27 19:33           ` Richard Weinberger
2018-10-28  7:09           ` Russell Senior
2018-10-22 21:23     ` Rafał Miłecki
2018-10-22 21:23       ` Rafał Miłecki
2018-10-22 21:27     ` Rafał Miłecki [this message]
2018-10-22 21:27       ` Rafał Miłecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACna6rzOuC8vf12qRPOCJwdMPH7-R=eUGaofNYjD7f0_1a23oA@mail.gmail.com' \
    --to=zajec5@gmail.com \
    --cc=adrian.hunter@intel.com \
    --cc=amir73il@gmail.com \
    --cc=dedekind1@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=openwrt-devel@lists.openwrt.org \
    --cc=richard@nod.at \
    --cc=russell@personaltelco.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.