Linux-BTRFS Archive on lore.kernel.org
From: "Michael Laß" <bevan@bi-co.net>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
Date: Fri, 17 May 2019 19:37:49 +0200
Message-ID: <8C31D41C-9608-4A65-B543-8ABCC0B907A0@bi-co.net>
In-Reply-To: <CAJCQCtR-uo9fgs66pBMEoYX_xAye=O-L8kiMwyAdFjPS5T4+CA@mail.gmail.com>


> On 17.05.2019 at 01:42, Chris Murphy <lists@colorremedies.com> wrote:
> 
> Btrfs balance is supposed to be COW. So a block group is not
> dereferenced until it is copied successfully and metadata is updated.
> So it sounds like the fstrim happened before the metadata was updated.
> But I don't see how that's possible in normal operation even without a
> sync, let alone with the sync.

Balance is indeed not to blame here. See below.

> The most reliable way to test it, ideally keep everything the same, do
> a new mkfs.btrfs, and try to reproduce the problem. And then do a
> bisect. That for sure will find it, whether it's btrfs or something
> else that's changed in the kernel. But it's also a bit tedious.
> 
> I'm not sure how to test this with any other filesystem on top of your
> existing storage stack instead of btrfs, to see if it's btrfs or
> something else. And you'll still have to do a lot of iteration. So it
> doesn't make things that much easier than doing a kernel bisect.
> Neither ext4 nor XFS have block group move like Btrfs does. LVM does
> however, with pvmove. But that makes the testing more complicated,
> introduces more factors. So...I still vote for bisect.
> 
> But even if you can't bisect, if you can reproduce, that might help
> someone else who can do the bisect.

I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:

fstrim: /: FITRIM ioctl failed: Input/output error

Now it gets interesting: After this, the btrfs file system was fine. However, two other LVM logical volumes that are formatted with ext4 were destroyed. I cannot reproduce this issue with an older Linux 4.19 live CD, so I assume that it is not an issue with the SSD itself. I’ll start bisecting now. It could take a while since every “successful” (i.e., destructive) test requires me to recreate the system.
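
For a rough sense of how many test runs such a bisect needs, here is a minimal sketch of the underlying binary search. It is not git bisect itself. The names `versions` and `is_bad` are hypothetical: `versions` is an ordered list of builds from known-good to known-bad, and `is_bad` stands in for the manual cycle of booting a kernel, running fstrim and checking whether the ext4 volumes survived.

```python
def bisect_first_bad(versions, is_bad):
    """Return (index of the first bad version, number of tests run).

    Assumptions: versions[0] is known good, versions[-1] is known bad,
    and every version after the first bad one is bad as well.
    """
    good, bad = 0, len(versions) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(versions[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return bad, tests
```

With n candidate builds this needs about log2(n) test runs, so even a few thousand commits come down to roughly a dozen tests, each of which here may mean recreating the system.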

> Your stack looks like this?
> 
> Btrfs
> LUKS/dmcrypt
> LVM
> Samsung SSD

To be precise, there is also an MBR partition in the stack:

Btrfs
LUKS/dmcrypt
LVM
MBR partition
Samsung SSD
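
To illustrate why a discard issued on one volume can only hit other volumes if some layer translates the range incorrectly, here is a minimal sketch of how a sector range is remapped on its way down the stack. It assumes a simple linear mapping per layer, which real LVM and dm-crypt setups need not have. The `Layer` class, the `map_down` function and all sector numbers are hypothetical and not taken from my system.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    start: int   # first sector of this layer's data on the layer below
    length: int  # number of sectors this layer exposes upward


def map_down(start, count, layers):
    """Translate a (start, count) sector range from the top layer to the disk.

    `layers` is ordered top to bottom. Raises ValueError if the range
    does not fit inside a layer, which a correct stack must never allow.
    """
    for layer in layers:
        if start < 0 or start + count > layer.length:
            raise ValueError(f"range exceeds {layer.name}")
        start += layer.start
    return start, count


# Hypothetical layout, in 512-byte sectors.
layers = [
    Layer("LUKS/dmcrypt", start=4096, length=1_000_000),
    Layer("LVM LV", start=500_000, length=1_004_096),
    Layer("MBR partition", start=2048, length=10_000_000),
]
```

In this model, `map_down(0, 8, layers)` yields the disk range that the first eight sectors of the file system occupy. A range that ends up outside the logical volume's extent would land in a neighbouring volume.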

Cheers,
Michael


