* Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Xuanrui Qi @ 2020-06-01 21:08 UTC
To: linux-btrfs

Hello all,

I have just recovered from a massive filesystem corruption problem
which turned out to be a total nightmare, and I have strong reason to
suspect that it is related to eCryptfs-encrypted folders on btrfs.

I run Arch Linux and have my /home directory on a btrfs partition. My
user's home directory (/home/xuanrui) is encrypted using eCryptfs.

I ran into a massive filesystem corruption issue a while ago. When
reading certain files, or occasionally when writing to files, I
encountered FS errors (mainly checksum errors, but also other I/O
errors). My filesystem then became read-only because errors were
encountered.

A `btrfs scrub` identified about a dozen checksum errors which were
"not correctable", and `btrfs check --repair` (and `btrfs check
--repair --init-csum-tree`) also failed to fix anything. The former
crashed with a segfault, and the latter refused to write anything
because of an "I/O error".

Unfortunately, I don't have any logs because I had to nuke (wipe &
re-make) my filesystem as the solution. However, after the reformat I
gave up using eCryptfs, and the file corruption bugs have not
reappeared since. Initially I suspected that it was a hardware issue,
but I ran a SMART test and no errors were detected; I strongly suspect
that it is related to eCryptfs.

System info:

uname -a:
Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000 x86_64 GNU/Linux

btrfs --version:
btrfs-progs v5.6.1

(the rest is from after the reformat, but the setup is identical to
before the reformat, sans eCryptfs)

btrfs fi show:
Label: none uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
Total devices 1 FS bytes used 57.58GiB
devid 1 size 332.94GiB used 60.02GiB path /dev/sda3

btrfs fi df /home:
Data, single: total=59.01GiB, used=57.26GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=1.01GiB, used=328.25MiB
GlobalReserve, single: total=75.17MiB, used=0.00B

Some output from dmesg (note that /dev/sda1 is not the corrupted
filesystem; these corruptions seem to have been self-corrected by
btrfs):

[ 3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
[ 3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
[ 3.461539] BTRFS info (device sda1): disk space caching is enabled
[ 3.461540] BTRFS info (device sda1): has skinny extents
[ 3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
[ 3.510991] BTRFS info (device sda1): enabling ssd optimizations
[ 5.938153] BTRFS info (device sda1): disk space caching is enabled
[ 7.072974] BTRFS info (device sda3): enabling ssd optimizations
[ 7.072977] BTRFS info (device sda3): disk space caching is enabled
[ 7.072978] BTRFS info (device sda3): has skinny extents
[ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled
[ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
[ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
[ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
[ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0

If anyone could look into the issue, it would be greatly appreciated.

Best,
Xuanrui
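Since the dmesg excerpt in the message above mixes messages for two filesystems (sda1 and sda3), it can help to group the lines per device before reading them. The following is only a minimal sketch: it assumes the log has been saved as plain text with one message per line, it relies solely on the `BTRFS <level> (device <name>):` prefix visible in the excerpt, and `group_btrfs_lines` is a hypothetical helper name, not part of any btrfs tool.

```python
import re
from collections import defaultdict

# Matches the "BTRFS <level> (device <name>): <message>" prefix seen in the excerpt.
# Lines such as "BTRFS: device fsid ..." have no "(device ...)" part and are skipped.
_BTRFS_LINE = re.compile(r"BTRFS (\w+) \(device (\w+)\): (.*)")


def group_btrfs_lines(log_text):
    """Group BTRFS kernel messages by device name (hypothetical helper)."""
    by_device = defaultdict(list)
    for line in log_text.splitlines():
        match = _BTRFS_LINE.search(line)
        if match:
            level, device, message = match.groups()
            by_device[device].append((level, message))
    return dict(by_device)


# Three lines copied from the excerpt above.
sample = """\
[ 3.461539] BTRFS info (device sda1): disk space caching is enabled
[ 7.072978] BTRFS info (device sda3): has skinny extents
[ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled
"""

grouped = group_btrfs_lines(sample)
for device, entries in sorted(grouped.items()):
    for level, message in entries:
        print(device, level, message)
```

Grouping this way makes it easier to see that the only warning in the excerpt belongs to sda3, while the `corrupt 14` counter belongs to sda1.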
* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Qu Wenruo @ 2020-06-02 1:18 UTC
To: Xuanrui Qi, linux-btrfs

On 2020/6/2 5:08 AM, Xuanrui Qi wrote:
> Hello all,
>
> I have just recovered from a massive filesystem corruption problem
> which turned out to be a total nightmare, and I have strong reason to
> suspect that it is related to eCryptfs-encrypted folders on btrfs.
>
> I run Arch Linux and have my /home directory as a btrfs partition. My
> user's home directory (/home/xuanrui) is encrypted using eCryptfs.
>
> I ran into a massive filesystem corruption issue a while ago. When
> reading certain files or occasionally writing to files, I encounter FS
> errors (mainly checksum errors, but also other I/O errors). Then my
> file system becomes read-only because errors were encountered.

It's a pity we don't have the dmesg from that incident; it would have
been super useful for debugging.

>
> A `btrfs scrub` identified a dozen of checksum errors which were "not
> correctable", and `btrfs check --repair` (and `btrfs check --repair
> --init-csum-tree`)

Not recommended, but the output may still help.

> also failed to fix anything. The former crashed in a
> segfault, and the latter refused to write anything because of an "I/O
> error".
>
> Unfortunately, I don't have any logs because I had to nuke (wipe &
> re-make) my filesystem as the solution. However, after the reformatting I
> gave up using eCryptfs, and the file corruption bugs have not
> reappeared since.

That's a little strange. I guess there is some buffered IO mixed with
direct IO, which is known to cause csum mismatches, while other
filesystems simply can't detect such data corruption and carry on as if
nothing happened.

But normally, a csum error on read shouldn't lead to RO, thus I believe
there were more problems behind that previous failure.

> Initially I suspected that it was a hardware issue,
> but I did a SMART test and no errors were detected; I strongly suspect
> that it is related to eCryptfs.
>
> System info:
>
> uname -a:
>
> Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000
> x86_64 GNU/Linux
>
> btrfs --version:
> btrfs-progs v5.6.1
>
> (the rest is from after the reformat, but the setup is identical to
> before the reformat sans eCryptfs)
>
> btrfs fi show:
> Label: none uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
> Total devices 1 FS bytes used 57.58GiB
> devid 1 size 332.94GiB used 60.02GiB path /dev/sda3
>
> btrfs fi df /home:
> Data, single: total=59.01GiB, used=57.26GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=1.01GiB, used=328.25MiB
> GlobalReserve, single: total=75.17MiB, used=0.00B
>
> Some output from dmesg (note that /dev/sda1 is not the corrupted
> filesystem; these corruptions seem to have been self-corrected by
> btrfs):
>
> [ 3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
> [ 3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
> [ 3.461539] BTRFS info (device sda1): disk space caching is enabled
> [ 3.461540] BTRFS info (device sda1): has skinny extents
> [ 3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0

A corruption count of 14 doesn't look good.

> [ 3.510991] BTRFS info (device sda1): enabling ssd optimizations
> [ 5.938153] BTRFS info (device sda1): disk space caching is enabled
> [ 7.072974] BTRFS info (device sda3): enabling ssd optimizations
> [ 7.072977] BTRFS info (device sda3): disk space caching is enabled
> [ 7.072978] BTRFS info (device sda3): has skinny extents
> [ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled

And btrfs is trying to init a qgroup rescan while qgroup is not enabled?
That doesn't sound good either.

> [ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
> [ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
> [ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
> [ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0

Any log on `btrfs check` without --repair?

Thanks,
Qu

>
> If anyone could look into the issue, it would be greatly appreciated.
>
> Best,
> Xuanrui
>
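To make the csum-mismatch remark in Qu's reply more concrete, here is a toy model of one way such a mismatch can arise: the checksum is taken from a buffer, the buffer changes while the write is still in flight, and the changed bytes are what reach the disk. This is a sketch under stated assumptions, not a description of the actual btrfs code path: nothing here touches a real filesystem, `zlib.crc32` merely stands in for whatever checksum the filesystem uses, and `ToyDisk`, `write_with_race`, `read_verified` and `flip_first_byte` are hypothetical names.

```python
import zlib


class ToyDisk:
    """In-memory stand-in for one data block plus its stored checksum (not real btrfs)."""

    def __init__(self):
        self.data = b""
        self.csum = 0

    def write_with_race(self, buffer, mutate=None):
        # Step 1: the checksum is computed from the buffer as it looks at submission.
        self.csum = zlib.crc32(bytes(buffer))
        # Step 2: the caller changes the same buffer while the write is in flight.
        if mutate is not None:
            mutate(buffer)
        # Step 3: the (possibly changed) buffer contents are what land on disk.
        self.data = bytes(buffer)

    def read_verified(self):
        # A checksumming filesystem compares the stored checksum with the data on read.
        if zlib.crc32(self.data) != self.csum:
            raise OSError("checksum mismatch")
        return self.data


def flip_first_byte(buf):
    """Simulate an application touching its buffer during the write."""
    buf[0] ^= 0xFF


disk = ToyDisk()

# No race: data and checksum agree.
disk.write_with_race(bytearray(b"hello"))
clean = disk.read_verified()

# Race: the buffer changes after the checksum was taken.
disk.write_with_race(bytearray(b"hello"), mutate=flip_first_byte)
try:
    disk.read_verified()
    mismatch_detected = False
except OSError:
    mismatch_detected = True

print(clean, mismatch_detected)
```

In this model a filesystem without data checksums would simply return the changed bytes on read, which is in line with Qu's remark that other filesystems cannot detect this kind of corruption.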
* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Xuanrui Qi @ 2020-06-02 1:51 UTC
To: Qu Wenruo, linux-btrfs

Hello Wenruo (and all),

> Any log on `btrfs check` without --repair?

This was all after I reformatted the partition, so it might not be as
useful. But as you see, `dmesg` reports 14 corruption errors on
/dev/sda1 (which has been functioning correctly), yet `btrfs scrub`
does not report any problems. I'll do a btrfs check when I boot from a
live USB.

> But normally, a csum error on read shouldn't lead to RO, thus I believe
> there were more problems behind that previous failure.

I think there were indeed other problems, not just csum mismatches. I
got lots of I/O errors, but after reformatting my partition they
simply disappeared. In particular, writing to the filesystem could
randomly crash it. It could be a hardware issue, but it now seems more
likely to be software-related.

Best,
Xuanrui
* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Chris Murphy @ 2020-06-02 3:58 UTC
To: Xuanrui Qi
Cc: Qu Wenruo, Btrfs BTRFS

On Mon, Jun 1, 2020 at 7:52 PM Xuanrui Qi <me@xuanruiqi.com> wrote:
>
> Hello Wenruo (and all),
>
> > Any log on `btrfs check` without --repair?
>
> This was all after I reformatted the partition, so it might not be as
> useful. But as you see, `dmesg` reports 14 corruption errors on
> /dev/sda1 (which has been functioning correctly) but `btrfs scrub` does
> not report any problems. I'll do a btrfs check when I boot from a live
> USB.

[ 3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0

This is a persistent counter, not a live event. So it's probably old
if scrub isn't finding problems.

--
Chris Murphy
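Chris's point is that the `errs:` line reports cumulative per-device counters rather than new events. As a rough illustration, the sketch below parses that line and compares two readings to see whether any counter actually grew. The regular expression is based only on the line format quoted above, and `parse_errs` and `new_errors` are hypothetical helper names, not part of btrfs-progs.

```python
import re

# Pattern derived from the single "errs:" line quoted in this thread.
_ERRS = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def parse_errs(line):
    """Parse a 'bdev ... errs: ...' kernel message into (device, counters)."""
    match = _ERRS.search(line)
    if match is None:
        return None
    device = match.group(1)
    values = match.groups()[1:]
    counters = {name: int(value) for name, value in zip(_NAMES, values)}
    return device, counters


def new_errors(previous, current):
    """Return only the counters that grew between two readings."""
    return {
        name: current[name] - previous[name]
        for name in current
        if current[name] > previous[name]
    }


line = ("[ 3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: "
        "wr 0, rd 0, flush 0, corrupt 14, gen 0")
device, counters = parse_errs(line)
print(device, counters)

# Identical counters on a later boot mean no new errors were recorded.
print(new_errors(counters, dict(counters)))
```

The same counters can also be read with `btrfs device stats <mountpoint>`; comparing readings over time, rather than looking at the absolute value, shows whether corruption is still occurring.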
* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Swâmi Petaramesh @ 2020-06-02 6:04 UTC
To: Xuanrui Qi, linux-btrfs

On 01/06/2020 at 23:08, Xuanrui Qi wrote:
>
> I have just recovered from a massive filesystem corruption problem
> which turned out to be a total nightmare, and I have strong reason to
> suspect that it is related to eCryptfs-encrypted folders on btrfs.

Hi there,

For the record, I've been using ecryptfs on BTRFS for years on more
than 10 different machines (including the one on which I'm presently
writing this) and have *NEVER* run into a corruption problem involving
BTRFS and ecryptfs.

Although I have had some FS corruption issues with BTRFS, they were all
caused by problems that have since been diagnosed and had nothing to do
with ecryptfs.

Kind regards.

ॐ

--
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E
* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
From: Martin Steigerwald @ 2020-06-03 13:01 UTC
To: Xuanrui Qi, linux-btrfs, Swâmi Petaramesh

Hi.

Swâmi Petaramesh - 02.06.20, 08:04:59 CEST:
> On 01/06/2020 at 23:08, Xuanrui Qi wrote:
> > I have just recovered from a massive filesystem corruption problem
> > which turned out to be a total nightmare, and I have strong reason
> > to suspect that it is related to eCryptfs-encrypted folders on btrfs.
[…]
> For the record, I've been using ecryptfs on BTRFS for years on more
> than 10 different machines (including the one on which I'm presently
> writing this) and *NEVER* went into a corruption problem relating
> BTRFS and ecryptfs.

I have been using ecryptfs on BTRFS on just one machine, but also for
years. No issues either.

So I do not believe that there is a fundamental issue with running
ecryptfs on BTRFS.

Best,
--
Martin
Thread overview: 6+ messages

2020-06-01 21:08 Massive filesystem corruption, potentially related to eCryptfs-on-btrfs Xuanrui Qi
2020-06-02  1:18 ` Qu Wenruo
2020-06-02  1:51   ` Xuanrui Qi
2020-06-02  3:58     ` Chris Murphy
2020-06-02  6:04 ` Swâmi Petaramesh
2020-06-03 13:01   ` Martin Steigerwald