From: Xuanrui Qi <me@xuanruiqi.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
Date: Tue, 02 Jun 2020 10:51:51 +0900	[thread overview]
Message-ID: <DM6PR02MB4427F7962A6FD26BC147B4B2C98B0@DM6PR02MB4427.namprd02.prod.outlook.com> (raw)
In-Reply-To: <bf3629ad-730d-3808-38e5-8c42eccbaf5e@gmx.com>

Hello Wenruo (and all),

> Any log on `btrfs check` without --repair?

This was all after I reformatted the partition, so it might not be as
useful. But as you see, `dmesg` reports 14 corruption errors on
/dev/sda1 (which has been functioning correctly) but `btrfs scrub` does
not report any problems. I'll do a btrfs check when I boot from a live
USB.
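
As far as I understand, the "errs:" line printed at mount time shows
cumulative per-device counters that persist across mounts (the same
numbers `btrfs device stats` prints), which might explain why scrub is
clean while dmesg still says "corrupt 14". In case it helps anyone compare
logs, here is a minimal Python sketch that pulls those counters out of
dmesg output. The regex assumes the unwrapped line format from my log
quoted below; the function names are my own, not part of any btrfs tool.

```python
import re

# Matches the per-device counter line btrfs logs at mount time, e.g.
# "BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0"
# (format assumed from the dmesg excerpt in this thread)
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_errs(line):
    """Return (device, {counter: value}) for a matching line, else None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


def devices_with_errors(lines):
    """Map device path -> counters, keeping only devices with a nonzero counter."""
    result = {}
    for line in lines:
        parsed = parse_bdev_errs(line)
        if parsed is not None and any(parsed[1].values()):
            result[parsed[0]] = parsed[1]
    return result


if __name__ == "__main__":
    sample = [
        "[    3.461540] BTRFS info (device sda1): has skinny extents",
        "[    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: "
        "wr 0, rd 0, flush 0, corrupt 14, gen 0",
    ]
    for dev, counters in devices_with_errors(sample).items():
        print(dev, counters)
```

Something like `dmesg | python3 this_script.py` would need the sample list
swapped for `sys.stdin`; I left that out to keep the sketch short.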

> But normally, csum read shouldn't lead to RO, thus I believe there
> are more problems of that previous failure.

I think there are indeed other problems, not just csum mismatches. I got
lots of I/O errors, but after reformatting my partition they simply
disappeared. In particular, writing to the filesystem could randomly
crash it. It could be a hardware issue, but now it seems more likely to
be software-related.
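
On the buffered IO vs. direct IO point quoted below: my understanding is
that with direct IO the checksum is computed from the caller's buffer, so
if the buffer changes while the write is still in flight, what lands on
disk no longer matches the stored checksum. Below is a toy Python
simulation of that race, just to make sure I have the idea right. Nothing
in it is btrfs-specific: `zlib.crc32` is a stand-in for the crc32c that
btrfs uses by default, and `simulated_direct_write` is a hypothetical
helper of mine, not a real API.

```python
import zlib


def simulated_direct_write(buf, modify_in_flight=False):
    """Toy model of a checksummed direct IO write.

    Returns (stored_csum, data_on_disk).
    """
    # The checksum is taken from the caller's buffer at submission time.
    stored_csum = zlib.crc32(bytes(buf))
    if modify_in_flight:
        # The application touches the buffer before the IO completes.
        buf[0] ^= 0xFF
    # The device receives whatever the buffer holds at this point.
    data_on_disk = bytes(buf)
    return stored_csum, data_on_disk


def csum_ok(stored_csum, data_on_disk):
    """Re-compute the checksum on read and compare with the stored one."""
    return zlib.crc32(data_on_disk) == stored_csum


if __name__ == "__main__":
    stable = simulated_direct_write(bytearray(b"hello btrfs"))
    racy = simulated_direct_write(bytearray(b"hello btrfs"), modify_in_flight=True)
    print("stable buffer verifies:", csum_ok(*stable))
    print("modified buffer verifies:", csum_ok(*racy))
```

If that model is right, a filesystem without data checksums would store
the modified bytes and never notice, which matches Qu's remark.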

Best,
Xuanrui

On Tue, 2020-06-02 at 09:18 +0800, Qu Wenruo wrote:
> 
> On 2020/6/2 5:08 AM, Xuanrui Qi wrote:
> > Hello all,
> > 
> > I have just recovered from a massive filesystem corruption problem
> > which turned out to be a total nightmare, and I have strong reason to
> > suspect that it is related to eCryptfs-encrypted folders on btrfs.
> > 
> > I run Arch Linux and have my /home directory as a btrfs partition. My
> > user's home directory (/home/xuanrui) is encrypted using eCryptfs.
> > 
> > I ran into a massive filesystem corruption issue a while ago. When
> > reading certain files or occasionally writing to files, I encountered
> > FS errors (mainly checksum errors, but also other I/O errors). Then my
> > file system became read-only because errors were encountered.
> 
> It's a pity we won't get the dmesg of that incident, which would be
> super useful for debugging.
> 
> > A `btrfs scrub` identified a dozen checksum errors which were "not
> > correctable", and `btrfs check --repair` (and `btrfs check --repair
> > --init-csum-tree`)
> 
> Not recommended, but the output may still help.
> 
> > also failed to fix anything. The former crashed with a segfault, and
> > the latter refused to write anything because of an "I/O error".
> > 
> > Unfortunately, I don't have any logs because I had to nuke (wipe &
> > re-make) my filesystem as the solution. However, after the
> > reformatting I gave up using eCryptfs, and the file corruption bugs
> > have not reappeared since.
> 
> That's a little strange. I guess there is some buffered IO mixed with
> direct IO, which is known to cause csum mismatch, while other fs just
> can't detect such data corruption and pretend nothing happened.
> 
> But normally, csum read shouldn't lead to RO, thus I believe there are
> more problems of that previous failure.
> 
> > Initially I suspected that it was a hardware issue, but I did a SMART
> > test and no errors were detected; I strongly suspect that it is
> > related to eCryptfs.
> > 
> > System info:
> > 
> > uname -a:
> > 
> > Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000 x86_64 GNU/Linux
> > 
> > btrfs --version:
> > btrfs-progs v5.6.1
> > 
> > (the rest is from after the reformat, but the setup is identical to
> > before the reformat sans eCryptfs)
> > 
> > btrfs fi show:
> > Label: none  uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
> > 	Total devices 1 FS bytes used 57.58GiB
> > 	devid    1 size 332.94GiB used 60.02GiB path /dev/sda3
> > 
> > btrfs fi df /home:
> > Data, single: total=59.01GiB, used=57.26GiB
> > System, single: total=4.00MiB, used=16.00KiB
> > Metadata, single: total=1.01GiB, used=328.25MiB
> > GlobalReserve, single: total=75.17MiB, used=0.00B
> > 
> > Some output from dmesg (note that /dev/sda1 is not the corrupted
> > filesystem; these corruptions seem to have been self-corrected by
> > btrfs):
> > 
> > [    3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
> > [    3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
> > [    3.461539] BTRFS info (device sda1): disk space caching is enabled
> > [    3.461540] BTRFS info (device sda1): has skinny extents
> > [    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
> 
> Corruption count 14 doesn't seem good.
> 
> > [    3.510991] BTRFS info (device sda1): enabling ssd optimizations
> > [    5.938153] BTRFS info (device sda1): disk space caching is enabled
> > [    7.072974] BTRFS info (device sda3): enabling ssd optimizations
> > [    7.072977] BTRFS info (device sda3): disk space caching is enabled
> > [    7.072978] BTRFS info (device sda3): has skinny extents
> > [ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled
> 
> And btrfs is trying to init qgroup rescan while qgroup is not enabled?
> That doesn't sound good either.
> 
> > [ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
> > [ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
> > [ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
> > [ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0
> 
> Any log on `btrfs check` without --repair?
> 
> Thanks,
> Qu
> > If anyone could look into the issue, it would be greatly
> > appreciated.
> > 
> > Best,
> > Xuanrui
> > 

Thread overview: 6+ messages
2020-06-01 21:08 Massive filesystem corruption, potentially related to eCryptfs-on-btrfs Xuanrui Qi
2020-06-02  1:18 ` Qu Wenruo
2020-06-02  1:51   ` Xuanrui Qi [this message]
2020-06-02  3:58     ` Chris Murphy
2020-06-02  6:04 ` Swâmi Petaramesh
2020-06-03 13:01   ` Martin Steigerwald
