* Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
@ 2020-06-01 21:08 Xuanrui Qi
  2020-06-02  1:18 ` Qu Wenruo
  2020-06-02  6:04 ` Swâmi Petaramesh
  0 siblings, 2 replies; 6+ messages in thread
From: Xuanrui Qi @ 2020-06-01 21:08 UTC (permalink / raw)
  To: linux-btrfs

Hello all,

I have just recovered from a massive filesystem corruption problem
which turned out to be a total nightmare, and I have strong reason to
suspect that it is related to eCryptfs-encrypted folders on btrfs.

I run Arch Linux and have my /home directory on a btrfs partition. My
user's home directory (/home/xuanrui) is encrypted using eCryptfs.

I ran into a massive filesystem corruption issue a while ago. When
reading certain files, or occasionally when writing to files, I
encountered FS errors (mainly checksum errors, but also other I/O
errors). The file system then became read-only because errors were
encountered.

A `btrfs scrub` identified a dozen checksum errors which were "not
correctable", and `btrfs check --repair` (and
`btrfs check --repair --init-csum-tree`) also failed to fix anything. The
former crashed with a segfault, and the latter refused to write anything
because of an "I/O error".

Unfortunately, I don't have any logs because I had to nuke (wipe and
remake) my filesystem as the solution. However, after the reformatting I
gave up using eCryptfs, and the file corruption bugs have not
reappeared since. Initially I suspected that it was a hardware issue,
but I ran a SMART test and no errors were detected; I strongly suspect
that it is related to eCryptfs.

System info:

uname -a:

Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000 x86_64 GNU/Linux

btrfs --version:
btrfs-progs v5.6.1

(the rest is from after the reformat, but the setup is identical to
before the reformat sans eCryptFS)

btrfs fi show:
Label: none  uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
	Total devices 1 FS bytes used 57.58GiB
	devid    1 size 332.94GiB used 60.02GiB path /dev/sda3

btrfs fi df /home:
Data, single: total=59.01GiB, used=57.26GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=1.01GiB, used=328.25MiB
GlobalReserve, single: total=75.17MiB, used=0.00B

Some output from dmesg (note that /dev/sda1 is not the corrupted
filesystem; these corruptions seem to have been self-corrected by
btrfs):

[    3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
[    3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
[    3.461539] BTRFS info (device sda1): disk space caching is enabled
[    3.461540] BTRFS info (device sda1): has skinny extents
[    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
[    3.510991] BTRFS info (device sda1): enabling ssd optimizations
[    5.938153] BTRFS info (device sda1): disk space caching is enabled
[    7.072974] BTRFS info (device sda3): enabling ssd optimizations
[    7.072977] BTRFS info (device sda3): disk space caching is enabled
[    7.072978] BTRFS info (device sda3): has skinny extents
[ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled
[ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
[ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
[ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
[ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0
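
Since the log above interleaves two filesystems (sda1, which is not the one that was corrupted, and sda3), it can help to split the messages per device before reading them. The following is a minimal sketch, assuming each kernel message sits on a single line in the `BTRFS <level> (device <name>): <message>` shape seen above; the helper name `group_by_device` is made up for this illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "BTRFS info (device sda1): has skinny extents".
DEVICE_RE = re.compile(r"BTRFS \w+ \(device (?P<dev>\w+)\): (?P<msg>.*)")

def group_by_device(lines):
    """Group btrfs kernel messages by the device named in each line."""
    grouped = defaultdict(list)
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:  # lines without a "(device ...)" tag are skipped
            grouped[match.group("dev")].append(match.group("msg"))
    return dict(grouped)

sample = [
    "[    3.461540] BTRFS info (device sda1): has skinny extents",
    "[    7.072978] BTRFS info (device sda3): has skinny extents",
    "[ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled",
    "[ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1",
]
by_dev = group_by_device(sample)
for dev in sorted(by_dev):
    print(dev, len(by_dev[dev]))
```

In practice the input would come from saved `dmesg` output rather than the hard-coded sample.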

If anyone could look into the issue, it would be greatly appreciated.

Best,
Xuanrui



* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
  2020-06-01 21:08 Massive filesystem corruption, potentially related to eCryptfs-on-btrfs Xuanrui Qi
@ 2020-06-02  1:18 ` Qu Wenruo
  2020-06-02  1:51   ` Xuanrui Qi
  2020-06-02  6:04 ` Swâmi Petaramesh
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2020-06-02  1:18 UTC (permalink / raw)
  To: Xuanrui Qi, linux-btrfs


On 2020/6/2 5:08 AM, Xuanrui Qi wrote:
> Hello all,
> 
> I have just recovered from a massive filesystem corruption problem
> which turned out to be a total nightmare, and I have strong reason to
> suspect that it is related to eCryptfs-encrypted folders on btrfs.
> 
> I run Arch Linux and have my /home directory as a btrfs partition. My
> user's home directory (/home/xuanrui) is encrypted using eCryptFS.
> 
> I ran into a massive filesystem corruption issue a while ago. When
> reading certain files or occasionally writing to files, I encounter FS
> errors (mainly checksum errors, but also other I/O errors). Then my
> file system becomes read-only because errors were encountered.

It's a pity we won't get the dmesg of that incident, which would have
been very useful for debugging.

> 
> A `btrfs scrub` identified a dozen of checksum errors which were "not
> correctable", and `btrfs check --repair` (and
> `btrfs check --repair --init-csum-tree`)

Not recommended, but the output may still help.

> also failed to fix anything. The former crashed in a
> segfault, and the latter refused to write anything because of an "I/O
> error".
> 
> Unfortunately, I don't have any logs because I had to nuke (wipe & re-
> make) my filesystem as the solution. However, after the reformatting I
> gave up using eCryptFs, and the file corruption bugs have not
> reappeared since.

That's a little strange. My guess is that buffered IO was mixed with
direct IO, which is known to cause csum mismatches on btrfs, while other
filesystems simply can't detect such data corruption and carry on as if
nothing happened.

But normally a csum error on read shouldn't lead to RO, so I believe
there were more problems behind that previous failure.
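
As a rough picture of how such a csum mismatch can arise, here is a toy model. It is a sketch under stated assumptions, not btrfs code: it assumes the checksum is taken when the write is submitted and the data is captured later, and it uses zlib's CRC-32 as a stand-in for the real data checksum. The function names are made up for this illustration.

```python
import zlib

def checksum(block: bytes) -> int:
    # Stand-in for the filesystem's data checksum; zlib's CRC-32 is used
    # here only to keep the sketch within the standard library.
    return zlib.crc32(block)

def simulate_write(buffer: bytearray, modify_in_flight: bool) -> bool:
    """Return True if the data that lands 'on disk' matches its checksum."""
    stored_csum = checksum(bytes(buffer))  # checksum taken at submission time
    if modify_in_flight:
        buffer[0] ^= 0xFF                  # another writer touches the same memory
    on_disk = bytes(buffer)                # data captured later, at completion
    return checksum(on_disk) == stored_csum

stable = simulate_write(bytearray(b"A" * 4096), modify_in_flight=False)
racy = simulate_write(bytearray(b"A" * 4096), modify_in_flight=True)
print(stable, racy)
```

In this model the data itself is whatever the last writer left behind; only a filesystem that keeps data checksums would notice the inconsistency on a later read.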

> Initially I suspected that it was a hardware issue,
> but I did a SMART test and no errors were detected; I strongly suspect
> that it is related to eCryptFS.
> 
> System info:
> 
> uname -a:
> 
> Linux xuanruiwork 5.6.15-3-clear #1 SMP Sun, 31 May 2020 19:57:42 +0000 x86_64 GNU/Linux
> 
> btrfs --version:
> btrfs-progs v5.6.1
> 
> (the rest is from after the reformat, but the setup is identical to
> before the reformat sans eCryptFS)
> 
> btrfs fi show:
> Label: none  uuid: 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64
> 	Total devices 1 FS bytes used 57.58GiB
> 	devid    1 size 332.94GiB used 60.02GiB path /dev/sda3
> 
> btrfs fi df /home:
> Data, single: total=59.01GiB, used=57.26GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=1.01GiB, used=328.25MiB
> GlobalReserve, single: total=75.17MiB, used=0.00B
> 
> Some output from dmesg (note that /dev/sda1 is not the corrupted
> filesystem; these corruptions seem to have been self-corrected by
> btrfs):
> 
> [    3.434351] BTRFS: device fsid 823961e1-6b9e-4ab8-b5a7-c17eb8c40d64 devid 1 transid 79 /dev/sda3 scanned by systemd-udevd (519)
> [    3.440896] BTRFS: device fsid a3892669-1ad8-4ff3-9747-0f8c405c0e6a devid 1 transid 4769881 /dev/sda1 scanned by systemd-udevd (487)
> [    3.461539] BTRFS info (device sda1): disk space caching is enabled
> [    3.461540] BTRFS info (device sda1): has skinny extents
> [    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0

Corruption count 14 doesn't seem good.

> [    3.510991] BTRFS info (device sda1): enabling ssd optimizations
> [    5.938153] BTRFS info (device sda1): disk space caching is enabled
> [    7.072974] BTRFS info (device sda3): enabling ssd optimizations
> [    7.072977] BTRFS info (device sda3): disk space caching is enabled
> [    7.072978] BTRFS info (device sda3): has skinny extents
> [ 3710.968433] BTRFS warning (device sda3): qgroup rescan init failed, qgroup is not enabled

And btrfs is trying to init a qgroup rescan while qgroup is not enabled?
That doesn't sound good either.

> [ 7412.459332] BTRFS info (device sda1): scrub: started on devid 1
> [ 7545.641724] BTRFS info (device sda1): scrub: finished on devid 1 with status: 0
> [ 8244.846830] BTRFS info (device sda3): scrub: started on devid 1
> [ 8369.651774] BTRFS info (device sda3): scrub: finished on devid 1 with status: 0

Any log on `btrfs check` without --repair?

Thanks,
Qu
> 
> If anyone could look into the issue, it would be greatly appreciated.
> 
> Best,
> Xuanrui
> 



* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
  2020-06-02  1:18 ` Qu Wenruo
@ 2020-06-02  1:51   ` Xuanrui Qi
  2020-06-02  3:58     ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Xuanrui Qi @ 2020-06-02  1:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hello Wenruo (and all),

> Any log on `btrfs check` without --repair?

This was all after I reformatted the partition, so it might not be as
useful. But as you see, `dmesg` reports 14 corruption errors on
/dev/sda1 (which has been functioning correctly) but `btrfs scrub` does
not report any problems. I'll do a btrfs check when I boot from a live
USB.

> But normally a csum error on read shouldn't lead to RO, so I believe
> there were more problems behind that previous failure.

I think there were indeed other problems, not just csum mismatches. I got
lots of I/O errors, but after reformatting my partition they simply
disappeared. In particular, writing to the filesystem could randomly
crash it. It could be a hardware issue, but now it seems more likely to
be software-related.

Best,
Xuanrui



* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
  2020-06-02  1:51   ` Xuanrui Qi
@ 2020-06-02  3:58     ` Chris Murphy
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2020-06-02  3:58 UTC (permalink / raw)
  To: Xuanrui Qi; +Cc: Qu Wenruo, Btrfs BTRFS

On Mon, Jun 1, 2020 at 7:52 PM Xuanrui Qi <me@xuanruiqi.com> wrote:
>
> Hello Wenruo (and all),
>
> > Any log on `btrfs check` without --repair?
>
> This was all after I reformatted the partition, so it might not be as
> useful. But as you see, `dmesg` reports 14 corruption errors on
> /dev/sda1 (which has been functioning correctly) but `btrfs scrub` does
> not report any problems. I'll do a btrfs check when I boot from a live
> USB.

[    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0

This is a persistent counter, not a live event. So it's probably old
if scrub isn't finding problems.
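
To make the counters in that line easier to compare across boots, they can be pulled out of the log programmatically. The following is a minimal sketch, assuming the summary keeps the `errs: wr N, rd N, flush N, corrupt N, gen N` shape shown above and sits on one line; `parse_dev_stats` is a name made up for this illustration. The same persistent counters can also be read with `btrfs device stats`.

```python
import re

# Pattern for the per-device error summary printed at mount time.
STATS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_dev_stats(line: str) -> dict:
    """Extract the device path and its error counters from one log line."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError("no device-stats summary in line")
    stats = {k: int(v) for k, v in match.groupdict().items() if k != "dev"}
    stats["dev"] = match.group("dev")
    return stats

line = ("[    3.464079] BTRFS info (device sda1): bdev /dev/sda1 errs: "
        "wr 0, rd 0, flush 0, corrupt 14, gen 0")
stats = parse_dev_stats(line)
print(stats["dev"], stats["corrupt"])
```

If the `corrupt` value stays at 14 from one boot to the next while scrub finishes cleanly, that is consistent with an old, already-recorded event rather than new corruption.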


-- 
Chris Murphy


* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
  2020-06-01 21:08 Massive filesystem corruption, potentially related to eCryptfs-on-btrfs Xuanrui Qi
  2020-06-02  1:18 ` Qu Wenruo
@ 2020-06-02  6:04 ` Swâmi Petaramesh
  2020-06-03 13:01   ` Martin Steigerwald
  1 sibling, 1 reply; 6+ messages in thread
From: Swâmi Petaramesh @ 2020-06-02  6:04 UTC (permalink / raw)
  To: Xuanrui Qi, linux-btrfs

On 01/06/2020 at 23:08, Xuanrui Qi wrote:
> 
> I have just recovered from a massive filesystem corruption problem
> which turned out to be a total nightmare, and I have strong reason to
> suspect that it is related to eCryptfs-encrypted folders on btrfs.
Hi there,

For the record, I've been using ecryptfs on BTRFS for years on more than
10 different machines (including the one on which I'm presently writing
this) and *NEVER* ran into a corruption problem involving BTRFS and
ecryptfs.

Although I had some FS corruption issues with BTRFS, they all related to
issues that have been diagnosed since and had nothing to do with ecryptfs.

Kind regards.

ॐ

-- 
Swâmi Petaramesh <swami@petaramesh.org> PGP 9076E32E


* Re: Massive filesystem corruption, potentially related to eCryptfs-on-btrfs
  2020-06-02  6:04 ` Swâmi Petaramesh
@ 2020-06-03 13:01   ` Martin Steigerwald
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Steigerwald @ 2020-06-03 13:01 UTC (permalink / raw)
  To: Xuanrui Qi, linux-btrfs, Swâmi Petaramesh

Hi.

Swâmi Petaramesh - 02.06.20, 08:04:59 CEST:
> On 01/06/2020 at 23:08, Xuanrui Qi wrote:
> > I have just recovered from a massive filesystem corruption problem
> > which turned out to be a total nightmare, and I have strong reason
> > to
> > suspect that it is related to eCryptfs-encrypted folders on btrfs.
[…]
> For the record, I've been using ecryptfs on BTRFS for years on more
> than 10 different machines (including the one on which I'm presently
> writing this) and *NEVER* ran into a corruption problem involving
> BTRFS and ecryptfs.

I have been using ecryptfs on BTRFS just on one machine, but also for 
years. No issues either.

So I do not believe that there is a fundamental problem with running
ecryptfs on BTRFS.

Best,
-- 
Martin



