* Massive filesystem corruption after balance + fstrim on Linux 5.1.2
@ 2019-05-16 22:16 Michael Laß
2019-05-16 23:41 ` Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-16 22:16 UTC (permalink / raw)
To: linux-btrfs
Hi.
Today I managed to destroy my btrfs root filesystem using the following
sequence of commands:
sync
btrfs balance start -dusage 75 -musage 75 /
sync
fstrim -v /
Shortly after, the kernel spewed out lots of messages like the following:
BTRFS warning (device dm-5): csum failed root 257 ino 16634085 off
21504884736 csum 0xd47cc2a2 expected csum 0xcebd791b mirror 1
A btrfs scrub shows roughly 27000 unrecoverable csum errors and lots of
data on that system is not accessible anymore.
I'm running Linux 5.1.2 on Arch Linux. Their kernel pretty much
matches upstream with only one non-btrfs-related patch on top:
https://git.archlinux.org/linux.git/log/?h=v5.1.2-arch1
The btrfs file system was mounted with compress=lzo. The underlying
storage device is a LUKS volume, on top of an LVM logical volume and the
underlying physical volume is a Samsung 830 SSD. The LUKS volume is
opened with the option "discard" so that trim commands are passed to the
device.
SMART shows no errors on the SSD itself. I never had issues with
balancing or trimming the btrfs volume before, even the exact same
sequence of commands as above never caused any issues. Until now.
Does anyone have an idea of what happened here? Could this be a bug in
btrfs?
I have made a copy of that volume so I can get further information out
of it if necessary. I already ran btrfs check on it (using the slightly
outdated version 4.19.1) and it did not show any errors. So it seems
like only data has been corrupted.
Please tell me if I can provide any more useful information on this.
Cheers,
Michael
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-16 22:16 Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Michael Laß
@ 2019-05-16 23:41 ` Qu Wenruo
2019-05-16 23:42 ` Chris Murphy
2019-05-28 12:36 ` Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Christoph Anton Mitterer
2 siblings, 0 replies; 24+ messages in thread
From: Qu Wenruo @ 2019-05-16 23:41 UTC (permalink / raw)
To: Michael Laß, linux-btrfs
On 2019/5/17 6:16 AM, Michael Laß wrote:
> Hi.
>
> Today I managed to destroy my btrfs root filesystem using the following
> sequence of commands:
I don't have a filled root fs, but I have a btrfs filesystem holding a
compiled Linux kernel tree, filling 5G of a total 10G.
I'm using that fs in my VM to try to reproduce.
>
> sync
> btrfs balance start -dusage 75 -musage 75 /
> sync
> fstrim -v /
I tried the same, though I used --full-balance for that balance to ensure
all chunks get relocated.
>
> Shortly after, the kernel spew out lots of messages like the following:
>
> BTRFS warning (device dm-5): csum failed root 257 ino 16634085 off
> 21504884736 csum 0xd47cc2a2 expected csum 0xcebd791b mirror 1
>
> A btrfs scrub shows roughly 27000 unrecoverable csum errors and lots of
> data on that system is not accessible anymore.
After the above operations, scrub reported no errors:
$ sudo btrfs scrub start -B /mnt/btrfs/
scrub done for 1dd1bcf6-4392-4be1-8c0e-0bfd16321ade
scrub started at Fri May 17 07:34:26 2019 and finished after 00:00:02
total bytes scrubbed: 4.19GiB with 0 errors
>
> I'm running Linux 5.1.2 on an Arch Linux. Their kernel pretty much
> matches upstream with only one non btrfs-related patch on top:
> https://git.archlinux.org/linux.git/log/?h=v5.1.2-arch1
>
> The btrfs file system was mounted with compress=lzo. The underlying
> storage device is a LUKS volume, on top of an LVM logical volume and the
> underlying physical volume is a Samsung 830 SSD. The LUKS volume is
> opened with the option "discard" so that trim commands are passed to the
> device.
I'm not sure whether LUKS or btrfs is to blame.
In my test environment, I'm using LVM but without LUKS.
My LVM setup has issue_discards = 1 set.
Would you please try to verify the behavior on a plain partition to rule
out possible interference?
Thanks,
Qu
>
> SMART shows no errors on the SSD itself. I never had issues with
> balancing or trimming the btrfs volume before, even the exact same
> sequence of commands as above never caused any issues. Until now.
>
> Does anyone have an idea of what happened here? Could this be a bug in
> btrfs?
>
> I have made a copy of that volume so I can get further information out
> of it if necessary. I already ran btrfs check on it (using the slightly
> outdated version 4.19.1) and it did not show any errors. So it seems
> like only data has been corrupted.
>
> Please tell me if I can provide any more useful information on this.
>
> Cheers,
> Michael
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-16 22:16 Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Michael Laß
2019-05-16 23:41 ` Qu Wenruo
@ 2019-05-16 23:42 ` Chris Murphy
2019-05-17 17:37 ` Michael Laß
2019-05-28 12:36 ` Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Christoph Anton Mitterer
2 siblings, 1 reply; 24+ messages in thread
From: Chris Murphy @ 2019-05-16 23:42 UTC (permalink / raw)
To: Michael Laß; +Cc: Btrfs BTRFS
On Thu, May 16, 2019 at 4:26 PM Michael Laß <bevan@bi-co.net> wrote:
>
> Hi.
>
> Today I managed to destroy my btrfs root filesystem using the following
> sequence of commands:
>
> sync
> btrfs balance start -dusage 75 -musage 75 /
> sync
> fstrim -v /
>
> Shortly after, the kernel spew out lots of messages like the following:
>
> BTRFS warning (device dm-5): csum failed root 257 ino 16634085 off
> 21504884736 csum 0xd47cc2a2 expected csum 0xcebd791b mirror 1
>
> A btrfs scrub shows roughly 27000 unrecoverable csum errors and lots of
> data on that system is not accessible anymore.
>
> I'm running Linux 5.1.2 on an Arch Linux. Their kernel pretty much
> matches upstream with only one non btrfs-related patch on top:
> https://git.archlinux.org/linux.git/log/?h=v5.1.2-arch1
>
> The btrfs file system was mounted with compress=lzo. The underlying
> storage device is a LUKS volume, on top of an LVM logical volume and the
> underlying physical volume is a Samsung 830 SSD. The LUKS volume is
> opened with the option "discard" so that trim commands are passed to the
> device.
>
> SMART shows no errors on the SSD itself. I never had issues with
> balancing or trimming the btrfs volume before, even the exact same
> sequence of commands as above never caused any issues. Until now.
>
> Does anyone have an idea of what happened here? Could this be a bug in
> btrfs?
I suspect there's a regression somewhere; the question is where. I've used
a Samsung 830 SSD extensively with Btrfs and fstrim in the past, but
without dm-crypt. I'm using Btrfs extensively with dm-crypt but on
hard drives. So I can't test this.
Btrfs balance is supposed to be COW: a block group is not
dereferenced until it is copied successfully and the metadata is updated.
So it sounds like the fstrim happened before the metadata was updated.
But I don't see how that's possible in normal operation even without a
sync, let alone with the sync.
The most reliable way to test it is to keep everything else the same, do
a new mkfs.btrfs, and try to reproduce the problem. And then do a
bisect. That will find it for sure, whether it's btrfs or something
else that changed in the kernel. But it's also a bit tedious.
I'm not sure how to test this with any other filesystem on top of your
existing storage stack instead of btrfs, to see if it's btrfs or
something else. And you'll still have to do a lot of iteration. So it
doesn't make things that much easier than doing a kernel bisect.
Neither ext4 nor XFS has block group moves like Btrfs does. LVM does,
however, with pvmove. But that makes the testing more complicated and
introduces more factors. So... I still vote for bisect.
But even if you can't bisect, if you can reproduce, that might help
someone else who can do the bisect.
Your stack looks like this?
Btrfs
LUKS/dmcrypt
LVM
Samsung SSD
--
Chris Murphy
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-16 23:42 ` Chris Murphy
@ 2019-05-17 17:37 ` Michael Laß
2019-05-18 4:09 ` Chris Murphy
0 siblings, 1 reply; 24+ messages in thread
From: Michael Laß @ 2019-05-17 17:37 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
> On 17.05.2019 at 01:42, Chris Murphy <lists@colorremedies.com> wrote:
>
> Btrfs balance is supposed to be COW. So a block group is not
> dereferenced until it is copied successfully and metadata is updated.
> So it sounds like the fstrim happened before the metadata was updated.
> But I don't see how that's possible in normal operation even without a
> sync, let alone with the sync.
Balance is indeed not to blame here. See below.
> The most reliable way to test it, ideally keep everything the same, do
> a new mkfs.btrfs, and try to reproduce the problem. And then do a
> bisect. That for sure will find it, whether it's btrfs or something
> else that's changed in the kernel. But it's also a bit tedious.
>
> I'm not sure how to test this with any other filesystem on top of your
> existing storage stack instead of btrfs, to see if it's btrfs or
> something else. And you'll still have to do a lot of iteration. So it
> doesn't make things that much easier than doing a kernel bisect.
> Neither ext4 nor XFS have block group move like Btrfs does. LVM does
> however, with pvmove. But that makes the testing more complicated,
> introduces more factors. So...I still vote for bisect.
>
> But even if you can't bisect, if you can reproduce, that might help
> someone else who can do the bisect.
I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
fstrim: /: FITRIM ioctl failed: Input/output error
Now it gets interesting: after this, the btrfs file system was fine. However, two other LVM logical volumes formatted with ext4 were destroyed. I cannot reproduce this issue with an older Linux 4.19 live CD, so I assume that it is not an issue with the SSD itself. I’ll start bisecting now. It could take a while, since every “successful” (i.e., destructive) test requires me to recreate the system.
> Your stack looks like this?
>
> Btrfs
> LUKS/dmcrypt
> LVM
> Samsung SSD
To be precise, there’s an MBR partition in the game as well:
Btrfs
LUKS/dmcrypt
LVM
MBR partition
Samsung SSD
Cheers,
Michael
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-17 17:37 ` Michael Laß
@ 2019-05-18 4:09 ` Chris Murphy
2019-05-18 9:18 ` Michael Laß
0 siblings, 1 reply; 24+ messages in thread
From: Chris Murphy @ 2019-05-18 4:09 UTC (permalink / raw)
To: Michael Laß; +Cc: Btrfs BTRFS
On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>
>
> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>
> fstrim: /: FITRIM ioctl failed: Input/output error
Huh. Any kernel message at the same time? I would expect any fstrim
user space error message to also have a kernel message. Any i/o error
suggests some kind of storage stack failure - which could be hardware
or software, you can't know without seeing the kernel messages.
--
Chris Murphy
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-18 4:09 ` Chris Murphy
@ 2019-05-18 9:18 ` Michael Laß
2019-05-18 9:31 ` Roman Mamedov
2019-05-18 10:26 ` Qu Wenruo
0 siblings, 2 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-18 9:18 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
>
> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>>
>>
>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>>
>> fstrim: /: FITRIM ioctl failed: Input/output error
>
> Huh. Any kernel message at the same time? I would expect any fstrim
> user space error message to also have a kernel message. Any i/o error
> suggests some kind of storage stack failure - which could be hardware
> or software, you can't know without seeing the kernel messages.
I missed that. The kernel messages are:
attempt to access beyond end of device
sda1: rw=16387, want=252755893, limit=250067632
BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
Here is some more information on the partitions and LVM physical segments:
fdisk -l /dev/sda:
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
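The numbers line up: as an illustrative sanity check (constants copied from the fdisk output and the kernel warning above), the rejected request reaches well past the end of sda1:

```python
# Sector numbers from "fdisk -l /dev/sda" and the kernel warning.
start, end = 2048, 250069679     # sda1 boundaries (512-byte sectors)
limit = end - start + 1          # partition size in sectors
want = 252755893                 # sector the rejected request wanted

assert limit == 250067632        # matches the "limit=" in the warning
overshoot = want - limit         # how far past the device end
print(overshoot)                          # 2688261 sectors
print(round(overshoot * 512 / 2**30, 2))  # ≈ 1.28 GiB beyond sda1
```

So the discard request addressed roughly 1.28 GiB beyond the end of the partition.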
pvdisplay -m:
--- Physical volume ---
PV Name /dev/sda1
VG Name vg_system
PV Size 119.24 GiB / not usable <22.34 MiB
Allocatable yes (but full)
PE Size 32.00 MiB
Total PE 3815
Free PE 0
Allocated PE 3815
PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
--- Physical Segments ---
Physical extent 0 to 1248:
Logical volume /dev/vg_system/btrfs
Logical extents 2231 to 3479
Physical extent 1249 to 1728:
Logical volume /dev/vg_system/btrfs
Logical extents 640 to 1119
Physical extent 1729 to 1760:
Logical volume /dev/vg_system/grml-images
Logical extents 0 to 31
Physical extent 1761 to 2016:
Logical volume /dev/vg_system/swap
Logical extents 0 to 255
Physical extent 2017 to 2047:
Logical volume /dev/vg_system/btrfs
Logical extents 3480 to 3510
Physical extent 2048 to 2687:
Logical volume /dev/vg_system/btrfs
Logical extents 0 to 639
Physical extent 2688 to 3007:
Logical volume /dev/vg_system/btrfs
Logical extents 1911 to 2230
Physical extent 3008 to 3320:
Logical volume /dev/vg_system/btrfs
Logical extents 1120 to 1432
Physical extent 3321 to 3336:
Logical volume /dev/vg_system/boot
Logical extents 0 to 15
Physical extent 3337 to 3814:
Logical volume /dev/vg_system/btrfs
Logical extents 1433 to 1910
Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue? Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
Cheers,
Michael
PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-18 9:18 ` Michael Laß
@ 2019-05-18 9:31 ` Roman Mamedov
2019-05-18 10:09 ` Michael Laß
2019-05-18 10:26 ` Qu Wenruo
1 sibling, 1 reply; 24+ messages in thread
From: Roman Mamedov @ 2019-05-18 9:31 UTC (permalink / raw)
To: Michael Laß; +Cc: Chris Murphy, Btrfs BTRFS
On Sat, 18 May 2019 11:18:31 +0200
Michael Laß <bevan@bi-co.net> wrote:
>
> > On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
> >
> > On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
> >>
> >>
> >> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
> >>
> >> fstrim: /: FITRIM ioctl failed: Input/output error
> >
> > Huh. Any kernel message at the same time? I would expect any fstrim
> > user space error message to also have a kernel message. Any i/o error
> > suggests some kind of storage stack failure - which could be hardware
> > or software, you can't know without seeing the kernel messages.
>
> I missed that. The kernel messages are:
>
> attempt to access beyond end of device
> sda1: rw=16387, want=252755893, limit=250067632
> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>
> Here are some more information on the partitions and LVM physical segments:
>
> fdisk -l /dev/sda:
>
> Device Boot Start End Sectors Size Id Type
> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
>
> pvdisplay -m:
>
> --- Physical volume ---
> PV Name /dev/sda1
> VG Name vg_system
> PV Size 119.24 GiB / not usable <22.34 MiB
> Allocatable yes (but full)
> PE Size 32.00 MiB
> Total PE 3815
> Free PE 0
> Allocated PE 3815
> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
Such a peculiar physical layout suggests you resize your LVs up and down a lot;
is there any chance you recently shrunk the LV without first
resizing down all the layers above it (Btrfs and LUKS) in the proper order?
> --- Physical Segments ---
> Physical extent 0 to 1248:
> Logical volume /dev/vg_system/btrfs
> Logical extents 2231 to 3479
> Physical extent 1249 to 1728:
> Logical volume /dev/vg_system/btrfs
> Logical extents 640 to 1119
> Physical extent 1729 to 1760:
> Logical volume /dev/vg_system/grml-images
> Logical extents 0 to 31
> Physical extent 1761 to 2016:
> Logical volume /dev/vg_system/swap
> Logical extents 0 to 255
> Physical extent 2017 to 2047:
> Logical volume /dev/vg_system/btrfs
> Logical extents 3480 to 3510
> Physical extent 2048 to 2687:
> Logical volume /dev/vg_system/btrfs
> Logical extents 0 to 639
> Physical extent 2688 to 3007:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1911 to 2230
> Physical extent 3008 to 3320:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1120 to 1432
> Physical extent 3321 to 3336:
> Logical volume /dev/vg_system/boot
> Logical extents 0 to 15
> Physical extent 3337 to 3814:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1433 to 1910
>
>
> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue? Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
>
> Cheers,
> Michael
>
> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
--
With respect,
Roman
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-18 9:31 ` Roman Mamedov
@ 2019-05-18 10:09 ` Michael Laß
0 siblings, 0 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-18 10:09 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Chris Murphy, Btrfs BTRFS
> On 18.05.2019 at 11:31, Roman Mamedov <rm@romanrm.net> wrote:
>
> On Sat, 18 May 2019 11:18:31 +0200
> Michael Laß <bevan@bi-co.net> wrote:
>>
>> pvdisplay -m:
>>
>> --- Physical volume ---
>> PV Name /dev/sda1
>> VG Name vg_system
>> PV Size 119.24 GiB / not usable <22.34 MiB
>> Allocatable yes (but full)
>> PE Size 32.00 MiB
>> Total PE 3815
>> Free PE 0
>> Allocated PE 3815
>> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
>
> Such peculiar physical layout suggests you resize your LVs up and down a lot,
> is there any chance you could have recently shrinked the LV without first
> resizing down all the layers above it (Btrfs and LUKS) in proper order?
This is mostly a result of my transition from several ext4 volumes to one btrfs volume, during which I extended the new btrfs volume several times. I quickly checked my shell history and it was something like this:
cryptsetup luksFormat /dev/mapper/vg_system-btrfs
cryptsetup luksOpen --allow-discards /dev/mapper/vg_system-btrfs cryptsystem
mkfs.btrfs -L system /dev/mapper/cryptsystem
lvextend -l100%free /dev/vg_system/btrfs
cryptsetup resize cryptsystem
btrfs fi resize max /
The previous ext4 volumes had been resized a couple of times before as well. However, the last resize operation was in 2015 and has never caused any issues since.
The btrfs file system which I now use to reproduce the issue is freshly created. So if there is any fallout from these resize operations, it would have to be in dm-crypt or LVM. Just to double-check, I compared the output of “cryptsetup status” and “lvdisplay”:
lvdisplay shows me that vg_system/btrfs uses 3511 LE. Each of those is 32MiB which makes
3511 * 32 * 1024 * 1024 / 512 = 230096896 sectors
cryptsetup shows me that the volume has a size of 230092800 sectors and an offset of 4096 which makes
230092800 + 4096 = 230096896 sectors
So this seems to match perfectly.
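The comparison above can be written out explicitly (illustrative Python; all numbers come from the lvdisplay and cryptsetup output as described):

```python
SECTOR = 512
LE = 32 * 1024 * 1024                 # one 32 MiB logical extent, bytes

lv_sectors = 3511 * LE // SECTOR      # vg_system/btrfs, per lvdisplay
assert lv_sectors == 230096896

luks_payload = 230092800              # sectors, per "cryptsetup status"
luks_offset = 4096                    # LUKS header offset, sectors
assert luks_payload + luks_offset == lv_sectors  # sizes match exactly
```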
>> --- Physical Segments ---
>> Physical extent 0 to 1248:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 2231 to 3479
>> Physical extent 1249 to 1728:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 640 to 1119
>> Physical extent 1729 to 1760:
>> Logical volume /dev/vg_system/grml-images
>> Logical extents 0 to 31
>> Physical extent 1761 to 2016:
>> Logical volume /dev/vg_system/swap
>> Logical extents 0 to 255
>> Physical extent 2017 to 2047:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 3480 to 3510
>> Physical extent 2048 to 2687:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 0 to 639
>> Physical extent 2688 to 3007:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1911 to 2230
>> Physical extent 3008 to 3320:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1120 to 1432
>> Physical extent 3321 to 3336:
>> Logical volume /dev/vg_system/boot
>> Logical extents 0 to 15
>> Physical extent 3337 to 3814:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1433 to 1910
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-18 9:18 ` Michael Laß
2019-05-18 9:31 ` Roman Mamedov
@ 2019-05-18 10:26 ` Qu Wenruo
2019-05-19 19:55 ` fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss Michael Laß
1 sibling, 1 reply; 24+ messages in thread
From: Qu Wenruo @ 2019-05-18 10:26 UTC (permalink / raw)
To: Michael Laß, Chris Murphy; +Cc: Btrfs BTRFS
On 2019/5/18 5:18 PM, Michael Laß wrote:
>
>> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
>>
>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>>>
>>>
>>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>>>
>>> fstrim: /: FITRIM ioctl failed: Input/output error
>>
>> Huh. Any kernel message at the same time? I would expect any fstrim
>> user space error message to also have a kernel message. Any i/o error
>> suggests some kind of storage stack failure - which could be hardware
>> or software, you can't know without seeing the kernel messages.
>
> I missed that. The kernel messages are:
>
> attempt to access beyond end of device
> sda1: rw=16387, want=252755893, limit=250067632
> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>
> Here are some more information on the partitions and LVM physical segments:
>
> fdisk -l /dev/sda:
>
> Device Boot Start End Sectors Size Id Type
> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
>
> pvdisplay -m:
>
> --- Physical volume ---
> PV Name /dev/sda1
> VG Name vg_system
> PV Size 119.24 GiB / not usable <22.34 MiB
> Allocatable yes (but full)
> PE Size 32.00 MiB
> Total PE 3815
> Free PE 0
> Allocated PE 3815
> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
>
> --- Physical Segments ---
> Physical extent 0 to 1248:
> Logical volume /dev/vg_system/btrfs
> Logical extents 2231 to 3479
> Physical extent 1249 to 1728:
> Logical volume /dev/vg_system/btrfs
> Logical extents 640 to 1119
> Physical extent 1729 to 1760:
> Logical volume /dev/vg_system/grml-images
> Logical extents 0 to 31
> Physical extent 1761 to 2016:
> Logical volume /dev/vg_system/swap
> Logical extents 0 to 255
> Physical extent 2017 to 2047:
> Logical volume /dev/vg_system/btrfs
> Logical extents 3480 to 3510
> Physical extent 2048 to 2687:
> Logical volume /dev/vg_system/btrfs
> Logical extents 0 to 639
> Physical extent 2688 to 3007:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1911 to 2230
> Physical extent 3008 to 3320:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1120 to 1432
> Physical extent 3321 to 3336:
> Logical volume /dev/vg_system/boot
> Logical extents 0 to 15
> Physical extent 3337 to 3814:
> Logical volume /dev/vg_system/btrfs
> Logical extents 1433 to 1910
>
>
> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue?
I can't say for sure, but (at least on the latest kernel) btrfs performs a
lot of extra mount-time self checks, including checking chunk stripes
against the underlying device, so the possibility shouldn't be that high
for btrfs.
> Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
Sure, you could use dm-log-writes.
It records all reads and writes (including trims) for later replay.
So in your case, you can build the storage stack like:
Btrfs
<dm-log-writes>
LUKS/dmcrypt
LVM
MBR partition
Samsung SSD
Then replay the log (using src/log-writes/replay-log from fstests) with
verbose output; that way you can verify every trim operation against the
dmcrypt device size.
If all trim are fine, then move the dm-log-writes a layer lower, until
you find which layer is causing the problem.
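For reference, a rough sketch of what inserting that layer could look like. The device names and the exact replay-log invocation are assumptions here; see the kernel's dm-log-writes documentation and fstests' src/log-writes for the authoritative usage:

```shell
# Hypothetical device names - adapt to the real stack.
DEV=/dev/mapper/cryptsystem        # layer to interpose on
LOGDEV=/dev/sdb1                   # scratch device receiving the log
SIZE=$(blockdev --getsz "$DEV")    # target size in 512-byte sectors

# Stack a log-writes target on top of the dm-crypt device, then
# mount btrfs from /dev/mapper/logged instead of $DEV.
dmsetup create logged --table "0 $SIZE log-writes $DEV $LOGDEV"

# Reproduce (mount, fstrim, unmount), mark the log, then replay it
# verbosely so each discard can be checked against the device size.
dmsetup message logged 0 mark after-fstrim
./src/log-writes/replay-log --log "$LOGDEV" --replay "$DEV" -v
```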
Thanks,
Qu
>
> Cheers,
> Michael
>
> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
>
* fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-18 10:26 ` Qu Wenruo
@ 2019-05-19 19:55 ` Michael Laß
2019-05-20 11:38 ` [dm-devel] " Michael Laß
[not found] ` <CAK-xaQYPs62v971zm1McXw_FGzDmh_vpz3KLEbxzkmrsSgTfXw@mail.gmail.com>
0 siblings, 2 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-19 19:55 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS, dm-devel
CC'ing dm-devel, as this seems to be a dm-related issue. Short summary for new readers:
On Linux 5.1 (tested up to 5.1.3), fstrim may discard too many blocks, leading to data loss. I have the following storage stack:
btrfs
dm-crypt (LUKS)
LVM logical volume
LVM single physical volume
MBR partition
Samsung 830 SSD
The mapping between logical volumes and physical segments is a bit mixed up; see below for the output of “pvdisplay -m”. When I issue fstrim on the mounted btrfs volume, I get the following kernel messages:
attempt to access beyond end of device
sda1: rw=16387, want=252755893, limit=250067632
BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
At the same time, other logical volumes on the same physical volume are destroyed. Also the btrfs volume itself may be damaged (this seems to depend on the actual usage).
I can easily reproduce this issue locally and I’m currently bisecting. So far I have narrowed the range of commits down to:
Good: 92fff53b7191cae566be9ca6752069426c7f8241
Bad: 225557446856448039a9e495da37b72c20071ef2
In this range of commits, there are only dm-related changes.
So far, I have not reproduced the issue with other file systems or a simplified stack. I first want to continue bisecting but this may take another day.
> On 18.05.2019 at 12:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2019/5/18 5:18 PM, Michael Laß wrote:
>>
>>> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
>>>
>>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>>>>
>>>>
>>>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>>>>
>>>> fstrim: /: FITRIM ioctl failed: Input/output error
>>>
>>> Huh. Any kernel message at the same time? I would expect any fstrim
>>> user space error message to also have a kernel message. Any i/o error
>>> suggests some kind of storage stack failure - which could be hardware
>>> or software, you can't know without seeing the kernel messages.
>>
>> I missed that. The kernel messages are:
>>
>> attempt to access beyond end of device
>> sda1: rw=16387, want=252755893, limit=250067632
>> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>>
>> Here are some more information on the partitions and LVM physical segments:
>>
>> fdisk -l /dev/sda:
>>
>> Device Boot Start End Sectors Size Id Type
>> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
>>
>> pvdisplay -m:
>>
>> --- Physical volume ---
>> PV Name /dev/sda1
>> VG Name vg_system
>> PV Size 119.24 GiB / not usable <22.34 MiB
>> Allocatable yes (but full)
>> PE Size 32.00 MiB
>> Total PE 3815
>> Free PE 0
>> Allocated PE 3815
>> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
>>
>> --- Physical Segments ---
>> Physical extent 0 to 1248:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 2231 to 3479
>> Physical extent 1249 to 1728:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 640 to 1119
>> Physical extent 1729 to 1760:
>> Logical volume /dev/vg_system/grml-images
>> Logical extents 0 to 31
>> Physical extent 1761 to 2016:
>> Logical volume /dev/vg_system/swap
>> Logical extents 0 to 255
>> Physical extent 2017 to 2047:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 3480 to 3510
>> Physical extent 2048 to 2687:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 0 to 639
>> Physical extent 2688 to 3007:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1911 to 2230
>> Physical extent 3008 to 3320:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1120 to 1432
>> Physical extent 3321 to 3336:
>> Logical volume /dev/vg_system/boot
>> Logical extents 0 to 15
>> Physical extent 3337 to 3814:
>> Logical volume /dev/vg_system/btrfs
>> Logical extents 1433 to 1910
>>
>>
>> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue?
>
> I can't speak sure, but (at least for latest kernel) btrfs has a lot of
> extra mount time self check, including chunk stripe check against
> underlying device, thus the possibility shouldn't be that high for btrfs.
Indeed, bisecting the issue led me to a range of commits that only contains dm-related and no btrfs-related changes. So I assume this is a bug in dm.
>> Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
>
> Sure, you could use dm-log-writes.
> It will record all read/write (including trim) for later replay.
>
> So in your case, you can build the storage stack like:
>
> Btrfs
> <dm-log-writes>
> LUKS/dmcrypt
> LVM
> MBR partition
> Samsung SSD
>
> Then replay the log (using src/log-write/replay-log in fstests) with
> verbose output, you can verify every trim operation against the dmcrypt
> device size.
>
> If all trim are fine, then move the dm-log-writes a layer lower, until
> you find which layer is causing the problem.
That sounds like a plan! However, I first want to continue bisecting as I am afraid to lose my reproducer by changing parts of my storage stack.
Cheers,
Michael
>
> Thanks,
> Qu
>>
>> Cheers,
>> Michael
>>
>> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
* Re: [dm-devel] fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-19 19:55 ` fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss Michael Laß
@ 2019-05-20 11:38 ` Michael Laß
2019-05-21 16:46 ` Michael Laß
[not found] ` <CAK-xaQYPs62v971zm1McXw_FGzDmh_vpz3KLEbxzkmrsSgTfXw@mail.gmail.com>
1 sibling, 1 reply; 24+ messages in thread
From: Michael Laß @ 2019-05-20 11:38 UTC (permalink / raw)
To: dm-devel; +Cc: Chris Murphy, Btrfs BTRFS, Qu Wenruo
> On 19.05.2019 at 21:55, Michael Laß <bevan@bi-co.net> wrote:
>
> CC'ing dm-devel, as this seems to be a dm-related issue. Short summary for new readers:
>
> On Linux 5.1 (tested up to 5.1.3), fstrim may discard too many blocks, leading to data loss. I have the following storage stack:
>
> btrfs
> dm-crypt (LUKS)
> LVM logical volume
> LVM single physical volume
> MBR partition
> Samsung 830 SSD
>
> The mapping between logical volumes and physical segments is a bit mixed up. See below for the output for “pvdisplay -m”. When I issue fstrim on the mounted btrfs volume, I get the following kernel messages:
>
> attempt to access beyond end of device
> sda1: rw=16387, want=252755893, limit=250067632
> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>
> At the same time, other logical volumes on the same physical volume are destroyed. Also the btrfs volume itself may be damaged (this seems to depend on the actual usage).
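The "want"/"limit" pair in the kernel message above already quantifies how far the discard ran past the partition; a quick check of the arithmetic (assuming the usual 512-byte sectors the kernel reports in):

```python
SECTOR = 512  # bytes; the kernel reports want/limit in 512-byte sectors

want = 252_755_893    # sector the discard tried to reach ("want=")
limit = 250_067_632   # size of sda1 in sectors ("limit=")

overshoot = want - limit
print(overshoot)                    # 2688261 sectors
print(overshoot * SECTOR / 2**30)   # ~1.28 GiB beyond the partition end
```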
>
> I can easily reproduce this issue locally and I’m currently bisecting. So far I could narrow down the range of commits to:
> Good: 92fff53b7191cae566be9ca6752069426c7f8241
> Bad: 225557446856448039a9e495da37b72c20071ef2
I finished bisecting. Here’s the responsible commit:
commit 61697a6abd24acba941359c6268a94f4afe4a53d
Author: Mike Snitzer <snitzer@redhat.com>
Date: Fri Jan 18 14:19:26 2019 -0500
dm: eliminate 'split_discard_bios' flag from DM target interface
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits. A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Maybe the assumption taken here ("A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits.") isn't valid in my case? Does anyone have an idea?
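For readers unfamiliar with the mechanics: the commit moves discard splitting out of DM core and relies on blk_queue_split(), which chunks a bio purely by queue_limits such as max_discard_sectors. A toy model of that size-based splitting (not the kernel code, and deliberately ignoring discard_granularity and alignment):

```python
def split_discard(sector, nr_sectors, max_discard_sectors):
    """Chunk one discard into pieces of at most max_discard_sectors.

    This mimics only the size limit. Crucially, nothing here knows about
    dm target/segment boundaries - boundary-aware splitting is what the
    removed 'split_discard_bios' behavior used to provide in DM core.
    """
    chunks = []
    while nr_sectors > 0:
        n = min(nr_sectors, max_discard_sectors)
        chunks.append((sector, n))
        sector += n
        nr_sectors -= n
    return chunks

print(split_discard(0, 10, 4))  # [(0, 4), (4, 4), (8, 2)]
```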
>
> In this range of commits, there are only dm-related changes.
>
> So far, I have not reproduced the issue with other file systems or a simplified stack. I first want to continue bisecting but this may take another day.
>
>
>> On 18.05.2019 at 12:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2019/5/18 5:18 PM, Michael Laß wrote:
>>>
>>>> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
>>>>
>>>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>>>>>
>>>>>
>>>>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>>>>>
>>>>> fstrim: /: FITRIM ioctl failed: Input/output error
>>>>
>>>> Huh. Any kernel message at the same time? I would expect any fstrim
>>>> user space error message to also have a kernel message. Any i/o error
>>>> suggests some kind of storage stack failure - which could be hardware
>>>> or software, you can't know without seeing the kernel messages.
>>>
>>> I missed that. The kernel messages are:
>>>
>>> attempt to access beyond end of device
>>> sda1: rw=16387, want=252755893, limit=250067632
>>> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>>>
>>> Here is some more information on the partitions and LVM physical segments:
>>>
>>> fdisk -l /dev/sda:
>>>
>>> Device Boot Start End Sectors Size Id Type
>>> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
>>>
>>> pvdisplay -m:
>>>
>>> --- Physical volume ---
>>> PV Name /dev/sda1
>>> VG Name vg_system
>>> PV Size 119.24 GiB / not usable <22.34 MiB
>>> Allocatable yes (but full)
>>> PE Size 32.00 MiB
>>> Total PE 3815
>>> Free PE 0
>>> Allocated PE 3815
>>> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
>>>
>>> --- Physical Segments ---
>>> Physical extent 0 to 1248:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 2231 to 3479
>>> Physical extent 1249 to 1728:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 640 to 1119
>>> Physical extent 1729 to 1760:
>>> Logical volume /dev/vg_system/grml-images
>>> Logical extents 0 to 31
>>> Physical extent 1761 to 2016:
>>> Logical volume /dev/vg_system/swap
>>> Logical extents 0 to 255
>>> Physical extent 2017 to 2047:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 3480 to 3510
>>> Physical extent 2048 to 2687:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 0 to 639
>>> Physical extent 2688 to 3007:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 1911 to 2230
>>> Physical extent 3008 to 3320:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 1120 to 1432
>>> Physical extent 3321 to 3336:
>>> Logical volume /dev/vg_system/boot
>>> Logical extents 0 to 15
>>> Physical extent 3337 to 3814:
>>> Logical volume /dev/vg_system/btrfs
>>> Logical extents 1433 to 1910
>>>
>>>
>>> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue?
>>
>> I can't say for sure, but (at least for the latest kernel) btrfs has a lot
>> of extra mount-time self checks, including a chunk stripe check against the
>> underlying device, so the possibility shouldn't be that high for btrfs.
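To make the danger concrete, the pvdisplay segment table quoted above can be turned into a small lookup: adjacent logical extents of the btrfs LV land in non-adjacent physical extents, so a discard that crosses a segment boundary and is simply continued linearly runs into foreign LVs or off the end of the PV. A sketch (table transcribed from the output above; PE size 32 MiB):

```python
# (pe_start, pe_end, lv_name, le_start) per physical segment, from pvdisplay -m
SEGMENTS = [
    (0,    1248, "btrfs",       2231),
    (1249, 1728, "btrfs",       640),
    (1729, 1760, "grml-images", 0),
    (1761, 2016, "swap",        0),
    (2017, 2047, "btrfs",       3480),
    (2048, 2687, "btrfs",       0),
    (2688, 3007, "btrfs",       1911),
    (3008, 3320, "btrfs",       1120),
    (3321, 3336, "boot",        0),
    (3337, 3814, "btrfs",       1433),
]

def lv_to_pe(lv, le):
    """Map one logical extent of an LV to its physical extent."""
    for pe_start, pe_end, name, le_start in SEGMENTS:
        if name == lv and le_start <= le <= le_start + (pe_end - pe_start):
            return pe_start + (le - le_start)
    raise LookupError(f"{lv} LE {le} not mapped")

# Adjacent LEs of the btrfs LV jump between distant physical segments:
print(lv_to_pe("btrfs", 639), lv_to_pe("btrfs", 640))  # 2687 1249
```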
>
> Indeed, bisecting the issue led me to a range of commits that only contains dm-related and no btrfs-related changes. So I assume this is a bug in dm.
>
>>> Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
>>
>> Sure, you could use dm-log-writes.
>> It will record all read/write (including trim) for later replay.
>>
>> So in your case, you can build the storage stack like:
>>
>> Btrfs
>> <dm-log-writes>
>> LUKS/dmcrypt
>> LVM
>> MBR partition
>> Samsung SSD
>>
>> Then replay the log (using src/log-write/replay-log in fstests) with
>> verbose output, you can verify every trim operation against the dmcrypt
>> device size.
>>
>> If all trims are fine, then move dm-log-writes one layer lower, until
>> you find which layer is causing the problem.
>
> That sounds like a plan! However, I first want to continue bisecting as I am afraid to lose my reproducer by changing parts of my storage stack.
>
> Cheers,
> Michael
>
>>
>> Thanks,
>> Qu
>>>
>>> Cheers,
>>> Michael
>>>
>>> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
[not found] ` <CAK-xaQYPs62v971zm1McXw_FGzDmh_vpz3KLEbxzkmrsSgTfXw@mail.gmail.com>
@ 2019-05-20 13:58 ` Michael Laß
2019-05-20 14:53 ` Andrea Gelmini
0 siblings, 1 reply; 24+ messages in thread
From: Michael Laß @ 2019-05-20 13:58 UTC (permalink / raw)
To: Andrea Gelmini; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS, dm-devel
> On 20.05.2019 at 15:53, Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> Had the same issue on a similar (well, almost exactly the same) setup, on a machine in production.
> But it is more than 4 TB of data, so in the end I re-dd'd the image and restarted; sticking to the 5.0.y branch, I never had the problem.
> I was able to replicate it. Samsung SSD, a more recent model.
> Not with btrfs but with ext4, by the way.
Thanks for the info, that eliminates one variable. So you also used dm-crypt on top of LVM?
Cheers,
Michael
> I saw the discard of a big initial part of the LVM partition. I can't find superblock copies in the first half, only towards the end of the logical volume.
>
> Sorry, I can't play with it again, but I have the whole (4 TB) dd image with the bug.
>
>
> Ciao,
> Gelma
>
> On Mon, 20 May 2019 at 02:38, Michael Laß <bevan@bi-co.net> wrote:
> CC'ing dm-devel, as this seems to be a dm-related issue. Short summary for new readers:
>
> On Linux 5.1 (tested up to 5.1.3), fstrim may discard too many blocks, leading to data loss. I have the following storage stack:
>
> btrfs
> dm-crypt (LUKS)
> LVM logical volume
> LVM single physical volume
> MBR partition
> Samsung 830 SSD
>
> The mapping between logical volumes and physical segments is a bit mixed up. See below for the output for “pvdisplay -m”. When I issue fstrim on the mounted btrfs volume, I get the following kernel messages:
>
> attempt to access beyond end of device
> sda1: rw=16387, want=252755893, limit=250067632
> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>
> At the same time, other logical volumes on the same physical volume are destroyed. Also the btrfs volume itself may be damaged (this seems to depend on the actual usage).
>
> I can easily reproduce this issue locally and I’m currently bisecting. So far I could narrow down the range of commits to:
> Good: 92fff53b7191cae566be9ca6752069426c7f8241
> Bad: 225557446856448039a9e495da37b72c20071ef2
>
> In this range of commits, there are only dm-related changes.
>
> So far, I have not reproduced the issue with other file systems or a simplified stack. I first want to continue bisecting but this may take another day.
>
>
> > On 18.05.2019 at 12:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > On 2019/5/18 5:18 PM, Michael Laß wrote:
> >>
> >>> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
> >>>
> >>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
> >>>>
> >>>>
> >>>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
> >>>>
> >>>> fstrim: /: FITRIM ioctl failed: Input/output error
> >>>
> >>> Huh. Any kernel message at the same time? I would expect any fstrim
> >>> user space error message to also have a kernel message. Any i/o error
> >>> suggests some kind of storage stack failure - which could be hardware
> >>> or software, you can't know without seeing the kernel messages.
> >>
> >> I missed that. The kernel messages are:
> >>
> >> attempt to access beyond end of device
> >> sda1: rw=16387, want=252755893, limit=250067632
> >> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
> >>
> >> Here is some more information on the partitions and LVM physical segments:
> >>
> >> fdisk -l /dev/sda:
> >>
> >> Device Boot Start End Sectors Size Id Type
> >> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
> >>
> >> pvdisplay -m:
> >>
> >> --- Physical volume ---
> >> PV Name /dev/sda1
> >> VG Name vg_system
> >> PV Size 119.24 GiB / not usable <22.34 MiB
> >> Allocatable yes (but full)
> >> PE Size 32.00 MiB
> >> Total PE 3815
> >> Free PE 0
> >> Allocated PE 3815
> >> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
> >>
> >> --- Physical Segments ---
> >> Physical extent 0 to 1248:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 2231 to 3479
> >> Physical extent 1249 to 1728:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 640 to 1119
> >> Physical extent 1729 to 1760:
> >> Logical volume /dev/vg_system/grml-images
> >> Logical extents 0 to 31
> >> Physical extent 1761 to 2016:
> >> Logical volume /dev/vg_system/swap
> >> Logical extents 0 to 255
> >> Physical extent 2017 to 2047:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 3480 to 3510
> >> Physical extent 2048 to 2687:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 0 to 639
> >> Physical extent 2688 to 3007:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 1911 to 2230
> >> Physical extent 3008 to 3320:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 1120 to 1432
> >> Physical extent 3321 to 3336:
> >> Logical volume /dev/vg_system/boot
> >> Logical extents 0 to 15
> >> Physical extent 3337 to 3814:
> >> Logical volume /dev/vg_system/btrfs
> >> Logical extents 1433 to 1910
> >>
> >>
> >> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue?
> >
> > I can't say for sure, but (at least for the latest kernel) btrfs has a lot
> > of extra mount-time self checks, including a chunk stripe check against the
> > underlying device, so the possibility shouldn't be that high for btrfs.
>
> Indeed, bisecting the issue led me to a range of commits that only contains dm-related and no btrfs-related changes. So I assume this is a bug in dm.
>
> >> Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
> >
> > Sure, you could use dm-log-writes.
> > It will record all read/write (including trim) for later replay.
> >
> > So in your case, you can build the storage stack like:
> >
> > Btrfs
> > <dm-log-writes>
> > LUKS/dmcrypt
> > LVM
> > MBR partition
> > Samsung SSD
> >
> > Then replay the log (using src/log-write/replay-log in fstests) with
> > verbose output, you can verify every trim operation against the dmcrypt
> > device size.
> >
> > If all trims are fine, then move dm-log-writes one layer lower, until
> > you find which layer is causing the problem.
>
> That sounds like a plan! However, I first want to continue bisecting as I am afraid to lose my reproducer by changing parts of my storage stack.
>
> Cheers,
> Michael
>
> >
> > Thanks,
> > Qu
> >>
> >> Cheers,
> >> Michael
> >>
> >> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
>
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-20 13:58 ` Michael Laß
@ 2019-05-20 14:53 ` Andrea Gelmini
2019-05-20 16:45 ` Milan Broz
0 siblings, 1 reply; 24+ messages in thread
From: Andrea Gelmini @ 2019-05-20 14:53 UTC (permalink / raw)
To: Michael Laß; +Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS, dm-devel
On Mon, 20 May 2019 at 15:58, Michael Laß <bevan@bi-co.net> wrote:
>
>
> > On 20.05.2019 at 15:53, Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
> >
> > Had the same issue on a similar (well, almost exactly the same) setup, on a machine in production.
> > But it is more than 4 TB of data, so in the end I re-dd'd the image and restarted; sticking to the 5.0.y branch, I never had the problem.
> > I was able to replicate it. Samsung SSD, a more recent model.
> > Not with btrfs but with ext4, by the way.
>
> Thanks for the info, that eliminates one variable. So you also used dm-crypt on top of LVM?
root@glet:~# lsblk |grep -v loop
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3,7T 0 disk
├─sda1 8:1 0 260M 0 part /boot/efi
├─sda2 8:2 0 16M 0 part
├─sda3 8:3 0 67,6G 0 part
├─sda4 8:4 0 883M 0 part
├─sda5 8:5 0 1,9G 0 part /boot
└─sda6 8:6 0 3,5T 0 part
└─sda6_crypt 254:0 0 3,5T 0 crypt
├─cry-root 254:1 0 28G 0 lvm /
├─cry-swap 254:2 0 70G 0 lvm [SWAP]
└─cry-home 254:3 0 2,7T 0 lvm /home
nvme0n1 259:0 0 119,2G 0 disk
├─nvme0n1p1 259:1 0 97,8G 0 part /mnt/nvme
└─nvme0n1p2 259:2 0 21,5G 0 part [SWAP]
root@glet:~#
Booting with a kernel > 5.0, it discards the first big part of cry-home.
root@glet:~# lvdisplay -vv
devices/global_filter not found in config: defaulting to
global_filter = [ "a|.*/|" ]
Setting global/locking_type to 1
Setting global/use_lvmetad to 1
global/lvmetad_update_wait_time not found in config: defaulting to 10
Setting response to OK
Setting protocol to lvmetad
Setting version to 1
Setting global/use_lvmpolld to 1
Setting devices/sysfs_scan to 1
Setting devices/multipath_component_detection to 1
Setting devices/md_component_detection to 1
Setting devices/fw_raid_component_detection to 0
Setting devices/ignore_suspended_devices to 0
Setting devices/ignore_lvm_mirrors to 1
devices/filter not found in config: defaulting to filter = [ "a|.*/|" ]
Setting devices/cache_dir to /run/lvm
Setting devices/cache_file_prefix to
devices/cache not found in config: defaulting to /run/lvm/.cache
Setting devices/write_cache_state to 1
Setting global/use_lvmetad to 1
Setting activation/activation_mode to degraded
metadata/record_lvs_history not found in config: defaulting to 0
Setting activation/monitoring to 1
Setting global/locking_type to 1
Setting global/wait_for_locks to 1
File-based locking selected.
Setting global/prioritise_write_locks to 1
Setting global/locking_dir to /run/lock/lvm
Setting global/use_lvmlockd to 0
Setting response to OK
Setting token to filter:3239235440
Setting daemon_pid to 650
Setting response to OK
Setting global_disable to 0
report/output_format not found in config: defaulting to basic
log/report_command_log not found in config: defaulting to 0
Setting response to OK
Setting response to OK
Setting response to OK
Setting name to cry
Processing VG cry Orkwof-zq16-e1qM-rUMt-vKV1-Lc13-CgiKYp
Locking /run/lock/lvm/V_cry RB
Reading VG cry Orkwofzq16e1qMrUMtvKV1Lc13CgiKYp
Setting response to OK
Setting response to OK
Setting response to OK
Setting name to cry
Setting metadata/format to lvm2
Setting id to OtoEfX-bpWN-l9gd-kLJW-1xca-PaHR-ARrSKr
Setting format to lvm2
Setting device to 65024
Setting dev_size to 7465840640
Setting label_sector to 1
Setting ext_flags to 1
Setting ext_version to 2
Setting size to 1044480
Setting start to 4096
Setting ignore to 0
Setting response to OK
Setting response to OK
Setting response to OK
/dev/mapper/sda6_crypt: size is 7465842688 sectors
Adding cry/root to the list of LVs to be processed.
Adding cry/swap to the list of LVs to be processed.
Adding cry/home to the list of LVs to be processed.
Processing LV root in VG cry.
--- Logical volume ---
global/lvdisplay_shows_full_device_path not found in config:
defaulting to 0
LV Path /dev/cry/root
LV Name root
VG Name cry
LV UUID J0vJ5D-Rzyt-9fOm-cJVU-bwjc-6pc1-jGqIhc
LV Write Access read/write
LV Creation host, time glet, 2018-11-02 17:51:35 +0100
LV Status available
# open 1
LV Size <27,94 GiB
Current LE 7152
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:1
Processing LV swap in VG cry.
--- Logical volume ---
global/lvdisplay_shows_full_device_path not found in config:
defaulting to 0
LV Path /dev/cry/swap
LV Name swap
VG Name cry
LV UUID c4iLex-xxMu-Quyr-4qkt-hFk2-uOb5-BDF5ls
LV Write Access read/write
LV Creation host, time glet, 2018-11-02 17:51:43 +0100
LV Status available
# open 2
LV Size 70,00 GiB
Current LE 17920
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:2
Processing LV home in VG cry.
--- Logical volume ---
global/lvdisplay_shows_full_device_path not found in config:
defaulting to 0
LV Path /dev/cry/home
LV Name home
VG Name cry
LV UUID jycl7w-59lN-F3Ne-DBDa-G21g-CAmb-ROvIaX
LV Write Access read/write
LV Creation host, time glet, 2018-11-02 17:51:50 +0100
LV Status available
# open 1
LV Size <2,71 TiB
Current LE 709591
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:3
Unlocking /run/lock/lvm/V_cry
Setting global/notify_dbus to 1
Also, changing crypttab:
root@glet:~# cat /etc/crypttab
sda6_crypt UUID=fe03e2e6-b8b1-4672-8a3e-b536ac4e1539 none luks,discard
removing discard didn't solve the issue.
In my setup it was enough to boot the system, after which it complained that
mounting /home was impossible (no filesystem found).
Well, keep in mind that at boot I have a few things, like:
root@glet:~# grep -i swap /etc/fstab
/dev/mapper/cry-swap none swap sw,discard=once,pri=0
0 0
/dev/nvme0n1p2 none swap sw,discard=once,pri=1
And other stuff in cron and so on.
So with my changes I can trigger the problem at boot (Ubuntu 19.04).
Hope it helps.
Uhm, by the way, my SSD (latest firmware):
root@glet:~# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: Samsung SSD 860 EVO 4TB
Serial Number: S3YPNWAK101163T
Firmware Revision: RVT02B6Q
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II
Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x005e)
Supported: 11 8 7 6 5
Likely used: 11
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 7814037168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 3815447 MBytes
device size with M = 1000*1000: 4000787 MBytes (4000 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 1 Current = 1
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
* Device-initiated interface power management
* Asynchronous notification (eg. media change)
* Software settings preservation
* Device Sleep (DEVSLP)
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* reserved 69[4]
* DOWNLOAD MICROCODE DMA command
* SET MAX SETPASSWORD/UNLOCK DMA commands
* WRITE BUFFER DMA command
* READ BUFFER DMA command
* Data Set Management TRIM supported (limit 8 blocks)
* Deterministic read ZEROs after TRIM
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538e7001e8a7
NAA : 5
IEEE OUI : 002538
Unique ID : e7001e8a7
Device Sleep:
DEVSLP Exit Timeout (DETO): 50 ms (drive)
Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct
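The hdparm line "Data Set Management TRIM supported (limit 8 blocks)" bounds how much a single TRIM command can cover. Per the ATA DSM command format, each 512-byte payload block holds 64 eight-byte LBA range entries, and each entry addresses at most 65535 sectors; the rough arithmetic (assuming 512-byte sectors):

```python
BLOCKS = 8                  # "limit 8 blocks" from hdparm -I
ENTRIES_PER_BLOCK = 64      # 512-byte block / 8-byte LBA range entry
MAX_SECTORS_PER_ENTRY = 65535
SECTOR = 512                # bytes

max_ranges = BLOCKS * ENTRIES_PER_BLOCK
max_bytes = max_ranges * MAX_SECTORS_PER_ENTRY * SECTOR
print(max_ranges)            # 512 ranges per TRIM command
print(max_bytes / 2**30)     # just under 16 GiB per command
```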
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-20 14:53 ` Andrea Gelmini
@ 2019-05-20 16:45 ` Milan Broz
2019-05-20 19:58 ` Michael Laß
2019-05-21 18:54 ` Andrea Gelmini
0 siblings, 2 replies; 24+ messages in thread
From: Milan Broz @ 2019-05-20 16:45 UTC (permalink / raw)
To: Andrea Gelmini, Michael Laß
Cc: Qu Wenruo, Chris Murphy, Btrfs BTRFS, dm-devel
On 20/05/2019 16:53, Andrea Gelmini wrote:
...
> Also, changing crypttab:
> root@glet:~# cat /etc/crypttab
> sda6_crypt UUID=fe03e2e6-b8b1-4672-8a3e-b536ac4e1539 none luks,discard
>
> removing discard didn't solve the issue.
This is very strange, disabling discard should reject every discard IO
on the dmcrypt layer. Are you sure it was really disabled?
Note, it is the root filesystem, so you have to regenerate initramfs
to update crypttab inside it.
Could you paste "dmsetup table" and "lsblk -D" to verify that discard flag
is not there?
(I mean dmsetup table with the zeroed key, as a default and safe output.)
Milan
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-20 16:45 ` Milan Broz
@ 2019-05-20 19:58 ` Michael Laß
2019-05-21 18:54 ` Andrea Gelmini
1 sibling, 0 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-20 19:58 UTC (permalink / raw)
To: Milan Broz; +Cc: Andrea Gelmini, Qu Wenruo, Chris Murphy, Btrfs BTRFS, dm-devel
> On 20.05.2019 at 18:45, Milan Broz <gmazyland@gmail.com> wrote:
>
> On 20/05/2019 16:53, Andrea Gelmini wrote:
> ...
>> Also, changing crypttab:
>> root@glet:~# cat /etc/crypttab
>> sda6_crypt UUID=fe03e2e6-b8b1-4672-8a3e-b536ac4e1539 none luks,discard
>>
>> removing discard didn't solve the issue.
>
> This is very strange, disabling discard should reject every discard IO
> on the dmcrypt layer. Are you sure it was really disabled?
>
> Note, it is the root filesystem, so you have to regenerate initramfs
> to update crypttab inside it.
For me, I cannot reproduce the issue when I remove the discard option from the crypttab (and regenerate the initramfs). When trying fstrim I just get “the discard operation is not supported”, as I would expect. No damage is done to other logical volumes.
However, my stack differs from Andrea’s in that I have dm-crypt on an LVM logical volume and not dm-crypt as a physical volume for LVM. Not sure if that makes a difference here.
Cheers,
Michael
> Could you paste "dmsetup table" and "lsblk -D" to verify that discard flag
> is not there?
> (I mean dmsetup table with the zeroed key, as a default and safe output.)
>
> Milan
* Re: [dm-devel] fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-20 11:38 ` [dm-devel] " Michael Laß
@ 2019-05-21 16:46 ` Michael Laß
2019-05-21 19:00 ` Andrea Gelmini
0 siblings, 1 reply; 24+ messages in thread
From: Michael Laß @ 2019-05-21 16:46 UTC (permalink / raw)
To: dm-devel; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS
> On 20.05.2019 at 13:38, Michael Laß <bevan@bi-co.net> wrote:
>
>>
>> On 19.05.2019 at 21:55, Michael Laß <bevan@bi-co.net> wrote:
>>
>> CC'ing dm-devel, as this seems to be a dm-related issue. Short summary for new readers:
>>
>> On Linux 5.1 (tested up to 5.1.3), fstrim may discard too many blocks, leading to data loss. I have the following storage stack:
>>
>> btrfs
>> dm-crypt (LUKS)
>> LVM logical volume
>> LVM single physical volume
>> MBR partition
>> Samsung 830 SSD
>>
>> The mapping between logical volumes and physical segments is a bit mixed up. See below for the output for “pvdisplay -m”. When I issue fstrim on the mounted btrfs volume, I get the following kernel messages:
>>
>> attempt to access beyond end of device
>> sda1: rw=16387, want=252755893, limit=250067632
>> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>>
>> At the same time, other logical volumes on the same physical volume are destroyed. Also the btrfs volume itself may be damaged (this seems to depend on the actual usage).
>>
>> I can easily reproduce this issue locally and I’m currently bisecting. So far I could narrow down the range of commits to:
>> Good: 92fff53b7191cae566be9ca6752069426c7f8241
>> Bad: 225557446856448039a9e495da37b72c20071ef2
>
> I finished bisecting. Here’s the responsible commit:
>
> commit 61697a6abd24acba941359c6268a94f4afe4a53d
> Author: Mike Snitzer <snitzer@redhat.com>
> Date: Fri Jan 18 14:19:26 2019 -0500
>
> dm: eliminate 'split_discard_bios' flag from DM target interface
>
> There is no need to have DM core split discards on behalf of a DM target
> now that blk_queue_split() handles splitting discards based on the
> queue_limits. A DM target just needs to set max_discard_sectors,
> discard_granularity, etc, in queue_limits.
>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reverting that commit solves the issue for me on Linux 5.1.3. Would that be an option until the root cause has been identified? I’d rather not let more people run into this issue.
Cheers,
Michael
> Maybe the assumption taken here ("A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits.") isn't valid in my case? Does anyone have an idea?
>
>
>>
>> In this range of commits, there are only dm-related changes.
>>
>> So far, I have not reproduced the issue with other file systems or a simplified stack. I first want to continue bisecting but this may take another day.
>>
>>
>>> On 18.05.2019 at 12:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> On 2019/5/18 5:18 PM, Michael Laß wrote:
>>>>
>>>>> On 18.05.2019 at 06:09, Chris Murphy <lists@colorremedies.com> wrote:
>>>>>
>>>>> On Fri, May 17, 2019 at 11:37 AM Michael Laß <bevan@bi-co.net> wrote:
>>>>>>
>>>>>>
>>>>>> I tried to reproduce this issue: I recreated the btrfs file system, set up a minimal system and issued fstrim again. It printed the following error message:
>>>>>>
>>>>>> fstrim: /: FITRIM ioctl failed: Input/output error
>>>>>
>>>>> Huh. Any kernel message at the same time? I would expect any fstrim
>>>>> user space error message to also have a kernel message. Any i/o error
>>>>> suggests some kind of storage stack failure - which could be hardware
>>>>> or software, you can't know without seeing the kernel messages.
>>>>
>>>> I missed that. The kernel messages are:
>>>>
>>>> attempt to access beyond end of device
>>>> sda1: rw=16387, want=252755893, limit=250067632
>>>> BTRFS warning (device dm-5): failed to trim 1 device(s), last error -5
>>>>
>>>> Here is some more information on the partitions and LVM physical segments:
>>>>
>>>> fdisk -l /dev/sda:
>>>>
>>>> Device Boot Start End Sectors Size Id Type
>>>> /dev/sda1 * 2048 250069679 250067632 119.2G 8e Linux LVM
>>>>
>>>> pvdisplay -m:
>>>>
>>>> --- Physical volume ---
>>>> PV Name /dev/sda1
>>>> VG Name vg_system
>>>> PV Size 119.24 GiB / not usable <22.34 MiB
>>>> Allocatable yes (but full)
>>>> PE Size 32.00 MiB
>>>> Total PE 3815
>>>> Free PE 0
>>>> Allocated PE 3815
>>>> PV UUID mqCLFy-iDnt-NfdC-lfSv-Maor-V1Ih-RlG8lP
>>>>
>>>> --- Physical Segments ---
>>>> Physical extent 0 to 1248:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 2231 to 3479
>>>> Physical extent 1249 to 1728:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 640 to 1119
>>>> Physical extent 1729 to 1760:
>>>> Logical volume /dev/vg_system/grml-images
>>>> Logical extents 0 to 31
>>>> Physical extent 1761 to 2016:
>>>> Logical volume /dev/vg_system/swap
>>>> Logical extents 0 to 255
>>>> Physical extent 2017 to 2047:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 3480 to 3510
>>>> Physical extent 2048 to 2687:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 0 to 639
>>>> Physical extent 2688 to 3007:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 1911 to 2230
>>>> Physical extent 3008 to 3320:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 1120 to 1432
>>>> Physical extent 3321 to 3336:
>>>> Logical volume /dev/vg_system/boot
>>>> Logical extents 0 to 15
>>>> Physical extent 3337 to 3814:
>>>> Logical volume /dev/vg_system/btrfs
>>>> Logical extents 1433 to 1910
>>>>
>>>>
>>>> Would btrfs even be able to accidentally trim parts of other LVs or does this clearly hint towards a LVM/dm issue?
>>>
>>> I can't say for sure, but (at least for the latest kernel) btrfs has a lot
>>> of extra mount-time self checks, including a chunk stripe check against the
>>> underlying device, so the possibility shouldn't be that high for btrfs.
>>
>> Indeed, bisecting the issue led me to a range of commits that only contains dm-related and no btrfs-related changes. So I assume this is a bug in dm.
>>
>>>> Is there an easy way to somehow trace the trim through the different layers so one can see where it goes wrong?
>>>
>>> Sure, you could use dm-log-writes.
>>> It will record all read/write (including trim) for later replay.
>>>
>>> So in your case, you can build the storage stack like:
>>>
>>> Btrfs
>>> <dm-log-writes>
>>> LUKS/dmcrypt
>>> LVM
>>> MBR partition
>>> Samsung SSD
>>>
>>> Then replay the log (using src/log-writes/replay-log in fstests) with
>>> verbose output, you can verify every trim operation against the dmcrypt
>>> device size.
>>>
>>> If all trim are fine, then move the dm-log-writes a layer lower, until
>>> you find which layer is causing the problem.
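The verification step described above can be scripted: feed replay-log's verbose output through a small filter and flag every trim that extends past the device. The log-line format below is an assumption modeled on replay-log's output, as are the sample lines; adjust the pattern to whatever your fstests build actually prints:

```python
import re

# Hypothetical line format -- check the real output of replay-log -v
# from your fstests build and adjust this regular expression.
LOG_RE = re.compile(
    r"replaying (\d+): sector (\d+), size (\d+), flags \d+ \((\w+)\)")

def out_of_range_trims(log_lines, device_sectors):
    """Return (entry, start_sector, nr_sectors) for every DISCARD that
    would reach past the end of a device of device_sectors 512-byte
    sectors -- e.g. the dmcrypt device in the stack above."""
    bad = []
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        entry, sector, size, op = m.groups()
        if op != "DISCARD":
            continue
        nr_sectors = int(size) // 512  # size assumed to be in bytes
        if int(sector) + nr_sectors > device_sectors:
            bad.append((int(entry), int(sector), nr_sectors))
    return bad

sample = [
    "replaying 7: sector 2048, size 1048576, flags 2 (DISCARD)",
    "replaying 8: sector 999424, size 4194304, flags 2 (DISCARD)",
]
print(out_of_range_trims(sample, device_sectors=1000000))
# -> [(8, 999424, 8192)]
```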
>>
>> That sounds like a plan! However, I first want to continue bisecting as I am afraid to lose my reproducer by changing parts of my storage stack.
>>
>> Cheers,
>> Michael
>>
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> PS: Current state of bisection: It looks like the error was introduced somewhere between b5dd0c658c31b469ccff1b637e5124851e7a4a1c and v5.1.
>>
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-20 16:45 ` Milan Broz
2019-05-20 19:58 ` Michael Laß
@ 2019-05-21 18:54 ` Andrea Gelmini
1 sibling, 0 replies; 24+ messages in thread
From: Andrea Gelmini @ 2019-05-21 18:54 UTC (permalink / raw)
To: Milan Broz
Cc: Michael Laß, Qu Wenruo, Chris Murphy, Btrfs BTRFS, dm-devel
On Mon, 20 May 2019 at 18:45, Milan Broz
<gmazyland@gmail.com> wrote:
> Note, it is the root filesystem, so you have to regenerate initramfs
> to update crypttab inside it.
Good catch. I didn't re-mkinitramfs.
> Could you paste "dmsetup table" and "lsblk -D" to verify that discard flag
> is not there?
> (I mean dmsetup table with the zeroed key, as a default and safe output.)
If I have time this weekend I'm going to re-test it. It takes a long
time to restore 4 TB.
Thanks a lot,
Andrea
* Re: [dm-devel] fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-21 16:46 ` Michael Laß
@ 2019-05-21 19:00 ` Andrea Gelmini
2019-05-21 19:59 ` Michael Laß
2019-05-21 20:12 ` Mike Snitzer
0 siblings, 2 replies; 24+ messages in thread
From: Andrea Gelmini @ 2019-05-21 19:00 UTC (permalink / raw)
To: Michael Laß
Cc: dm-devel, Chris Murphy, Qu Wenruo, Btrfs BTRFS, Mike Snitzer
On Tue, May 21, 2019 at 06:46:20PM +0200, Michael Laß wrote:
> > I finished bisecting. Here’s the responsible commit:
> >
> > commit 61697a6abd24acba941359c6268a94f4afe4a53d
> > Author: Mike Snitzer <snitzer@redhat.com>
> > Date: Fri Jan 18 14:19:26 2019 -0500
> >
> > dm: eliminate 'split_discard_bios' flag from DM target interface
> >
> > There is no need to have DM core split discards on behalf of a DM target
> > now that blk_queue_split() handles splitting discards based on the
> > queue_limits. A DM target just needs to set max_discard_sectors,
> > discard_granularity, etc, in queue_limits.
> >
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>
> Reverting that commit solves the issue for me on Linux 5.1.3. Would that be an option until the root cause has been identified? I’d rather not let more people run into this issue.
Thanks a lot Michael, for your time/work.
This kind of bisecting is very boring and time-consuming.
I CC: also the patch author.
Thanks again,
Andrea
* Re: [dm-devel] fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-21 19:00 ` Andrea Gelmini
@ 2019-05-21 19:59 ` Michael Laß
2019-05-21 20:12 ` Mike Snitzer
1 sibling, 0 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-21 19:59 UTC (permalink / raw)
To: Andrea Gelmini
Cc: dm-devel, Chris Murphy, Qu Wenruo, Btrfs BTRFS, Mike Snitzer
> Am 21.05.2019 um 21:00 schrieb Andrea Gelmini <andrea.gelmini@linux.it>:
>
> On Tue, May 21, 2019 at 06:46:20PM +0200, Michael Laß wrote:
>>> I finished bisecting. Here’s the responsible commit:
>>>
>>> commit 61697a6abd24acba941359c6268a94f4afe4a53d
>>> Author: Mike Snitzer <snitzer@redhat.com>
>>> Date: Fri Jan 18 14:19:26 2019 -0500
>>>
>>> dm: eliminate 'split_discard_bios' flag from DM target interface
>>>
>>> There is no need to have DM core split discards on behalf of a DM target
>>> now that blk_queue_split() handles splitting discards based on the
>>> queue_limits. A DM target just needs to set max_discard_sectors,
>>> discard_granularity, etc, in queue_limits.
>>>
>>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>>
>> Reverting that commit solves the issue for me on Linux 5.1.3. Would that be an option until the root cause has been identified? I’d rather not let more people run into this issue.
>
> Thanks a lot Michael, for your time/work.
>
> This kind of bisecting is very boring and time-consuming.
I just sent a patch to dm-devel which fixes the issue for me. Maybe you can test that in your environment?
Cheers,
Michael
PS: Sorry if the patch was sent multiple times. I had some issues with git send-email.
> I CC: also the patch author.
>
> Thanks again,
> Andrea
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-21 19:00 ` Andrea Gelmini
2019-05-21 19:59 ` Michael Laß
@ 2019-05-21 20:12 ` Mike Snitzer
2019-05-24 15:00 ` Andrea Gelmini
1 sibling, 1 reply; 24+ messages in thread
From: Mike Snitzer @ 2019-05-21 20:12 UTC (permalink / raw)
To: Andrea Gelmini
Cc: Michael Laß, dm-devel, Chris Murphy, Qu Wenruo, Btrfs BTRFS
On Tue, May 21 2019 at 3:00pm -0400,
Andrea Gelmini <andrea.gelmini@linux.it> wrote:
> On Tue, May 21, 2019 at 06:46:20PM +0200, Michael Laß wrote:
> > > I finished bisecting. Here’s the responsible commit:
> > >
> > > commit 61697a6abd24acba941359c6268a94f4afe4a53d
> > > Author: Mike Snitzer <snitzer@redhat.com>
> > > Date: Fri Jan 18 14:19:26 2019 -0500
> > >
> > > dm: eliminate 'split_discard_bios' flag from DM target interface
> > >
> > > There is no need to have DM core split discards on behalf of a DM target
> > > now that blk_queue_split() handles splitting discards based on the
> > > queue_limits. A DM target just needs to set max_discard_sectors,
> > > discard_granularity, etc, in queue_limits.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> >
> > Reverting that commit solves the issue for me on Linux 5.1.3. Would
> that be an option until the root cause has been identified? I’d rather
> not let more people run into this issue.
>
> Thanks a lot Michael, for your time/work.
>
> This kind of bisecting is very boring and time-consuming.
>
> I CC: also the patch author.
Thanks for cc'ing me, this thread didn't catch my eye.
Sorry for your troubles. Can you please try this patch?
Thanks,
Mike
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1fb1333fefec..997385c1ca54 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1469,7 +1469,7 @@ static unsigned get_num_write_zeroes_bios(struct dm_target *ti)
static int __send_changing_extent_only(struct clone_info *ci, struct dm_target *ti,
unsigned num_bios)
{
- unsigned len = ci->sector_count;
+ unsigned len;
/*
* Even though the device advertised support for this type of
@@ -1480,6 +1480,8 @@ static int __send_changing_extent_only(struct clone_info *ci, struct dm_target *
if (!num_bios)
return -EOPNOTSUPP;
+ len = min((sector_t)ci->sector_count, max_io_len_target_boundary(ci->sector, ti));
+
__send_duplicate_bios(ci, ti, num_bios, &len);
ci->sector += len;
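The two added lines clamp the discard length to the current target's remaining span. A toy model of a two-target linear map (hypothetical sector numbers, plain Python instead of kernel C) illustrates what the unclamped length did: the whole discard is mapped through the first target, so its tail is discarded past that target's backing region:

```python
def send_discard(targets, sector, count, clamp_to_boundary):
    """targets: list of (start, length, phys_offset) describing a
    linear map. Returns the (phys_start, length) chunks the discard
    is sent to. This is only a model of the arithmetic, not DM code."""
    mapped = []
    while count > 0:
        start, length, phys = next(
            t for t in targets if t[0] <= sector < t[0] + t[1])
        boundary = start + length - sector  # sectors left in this target
        n = min(count, boundary) if clamp_to_boundary else count
        mapped.append((phys + (sector - start), n))
        sector += n
        count -= n
    return mapped

# Two adjacent targets backed by distant physical regions: target 0
# owns physical sectors 1000..1099, target 1 owns 5000..5099.
targets = [(0, 100, 1000), (100, 100, 5000)]

buggy = send_discard(targets, 90, 20, clamp_to_boundary=False)
fixed = send_discard(targets, 90, 20, clamp_to_boundary=True)
print(buggy)  # [(1090, 20)] -- 10 sectors trimmed past target 0's backing
print(fixed)  # [(1090, 10), (5000, 10)]
```

In the buggy case the tail of the discard hits physical sectors 1100-1109, which belong to whatever happens to sit behind the next part of the disk, matching the cross-LV corruption reported earlier in the thread.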
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-21 20:12 ` Mike Snitzer
@ 2019-05-24 15:00 ` Andrea Gelmini
2019-05-24 15:10 ` Greg KH
0 siblings, 1 reply; 24+ messages in thread
From: Andrea Gelmini @ 2019-05-24 15:00 UTC (permalink / raw)
To: Mike Snitzer
Cc: Michael Laß, dm-devel, Chris Murphy, Qu Wenruo, Btrfs BTRFS, gregkh
Hi Mike,
I'm setting up to replicate and test the condition. I see your
patch is already in the 5.2 dev kernel.
I'm going to try with the latest git and see what happens. Anyway,
don't you think it would be good
to have this patch ( 51b86f9a8d1c4bb4e3862ee4b4c5f46072f7520d )
in the 5.1 stable branch as well?
Thanks a lot for your time,
Gelma
On Tue, 21 May 2019 at 22:12, Mike Snitzer
<snitzer@redhat.com> wrote:
>
> On Tue, May 21 2019 at 3:00pm -0400,
> Andrea Gelmini <andrea.gelmini@linux.it> wrote:
>
> > On Tue, May 21, 2019 at 06:46:20PM +0200, Michael Laß wrote:
> > > > I finished bisecting. Here’s the responsible commit:
> > > >
> > > > commit 61697a6abd24acba941359c6268a94f4afe4a53d
> > > > Author: Mike Snitzer <snitzer@redhat.com>
> > > > Date: Fri Jan 18 14:19:26 2019 -0500
> > > >
> > > > dm: eliminate 'split_discard_bios' flag from DM target interface
> > > >
> > > > There is no need to have DM core split discards on behalf of a DM target
> > > > now that blk_queue_split() handles splitting discards based on the
> > > > queue_limits. A DM target just needs to set max_discard_sectors,
> > > > discard_granularity, etc, in queue_limits.
> > > >
> > > > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > >
> > > Reverting that commit solves the issue for me on Linux 5.1.3. Would
> > that be an option until the root cause has been identified? I’d rather
> > not let more people run into this issue.
> >
> > Thanks a lot Michael, for your time/work.
> >
> > This kind of bisecting is very boring and time-consuming.
> >
> > I CC: also the patch author.
>
> Thanks for cc'ing me, this thread didn't catch my eye.
>
> Sorry for your troubles. Can you please try this patch?
>
> Thanks,
> Mike
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 1fb1333fefec..997385c1ca54 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1469,7 +1469,7 @@ static unsigned get_num_write_zeroes_bios(struct dm_target *ti)
> static int __send_changing_extent_only(struct clone_info *ci, struct dm_target *ti,
> unsigned num_bios)
> {
> - unsigned len = ci->sector_count;
> + unsigned len;
>
> /*
> * Even though the device advertised support for this type of
> @@ -1480,6 +1480,8 @@ static int __send_changing_extent_only(struct clone_info *ci, struct dm_target *
> if (!num_bios)
> return -EOPNOTSUPP;
>
> + len = min((sector_t)ci->sector_count, max_io_len_target_boundary(ci->sector, ti));
> +
> __send_duplicate_bios(ci, ti, num_bios, &len);
>
> ci->sector += len;
>
* Re: fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss
2019-05-24 15:00 ` Andrea Gelmini
@ 2019-05-24 15:10 ` Greg KH
0 siblings, 0 replies; 24+ messages in thread
From: Greg KH @ 2019-05-24 15:10 UTC (permalink / raw)
To: Andrea Gelmini
Cc: Mike Snitzer, Michael Laß,
dm-devel, Chris Murphy, Qu Wenruo, Btrfs BTRFS
On Fri, May 24, 2019 at 05:00:51PM +0200, Andrea Gelmini wrote:
> Hi Mike,
> I'm doing setup to replicate and test the condition. I see your
> patch is already in the 5.2 dev kernel.
> I'm going to try with latest git, and see what happens. Anyway,
> don't you think it would be good
> to have this patch ( 51b86f9a8d1c4bb4e3862ee4b4c5f46072f7520d )
> anyway in the 5.1 stable branch?
It's already in the 5.1 stable queue and will be in the next 5.1 release
in a day or so.
thanks,
greg k-h
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-16 22:16 Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Michael Laß
2019-05-16 23:41 ` Qu Wenruo
2019-05-16 23:42 ` Chris Murphy
@ 2019-05-28 12:36 ` Christoph Anton Mitterer
2019-05-28 12:43 ` Michael Laß
2 siblings, 1 reply; 24+ messages in thread
From: Christoph Anton Mitterer @ 2019-05-28 12:36 UTC (permalink / raw)
To: Michael Laß, linux-btrfs
Hey.
Just to be on the safe side...
AFAIU this issue only occurred in 5.1.2 and later, right?
Starting with which 5.1.x and 5.2.x versions has the fix been merged?
Cheers,
Chris.
* Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2
2019-05-28 12:36 ` Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Christoph Anton Mitterer
@ 2019-05-28 12:43 ` Michael Laß
0 siblings, 0 replies; 24+ messages in thread
From: Michael Laß @ 2019-05-28 12:43 UTC (permalink / raw)
To: Christoph Anton Mitterer; +Cc: linux-btrfs
> Am 28.05.2019 um 14:36 schrieb Christoph Anton Mitterer <calestyo@scientia.net>:
>
> Hey.
>
> Just to be on the safe side...
>
> AFAIU this issue only occured in 5.1.2 and later, right?
No. The issue was already introduced in v5.1-rc1 (commit 61697a6abd24).
> Starting with which 5.1.x and 5.2.x versions has the fix been merged?
It's fixed in v5.2-rc2 (commit 51b86f9a8d1c) and v5.1.5 (commit 871e122d55e8).
Cheers,
Michael
end of thread, other threads:[~2019-05-28 12:43 UTC | newest]
Thread overview: 24+ messages
2019-05-16 22:16 Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Michael Laß
2019-05-16 23:41 ` Qu Wenruo
2019-05-16 23:42 ` Chris Murphy
2019-05-17 17:37 ` Michael Laß
2019-05-18 4:09 ` Chris Murphy
2019-05-18 9:18 ` Michael Laß
2019-05-18 9:31 ` Roman Mamedov
2019-05-18 10:09 ` Michael Laß
2019-05-18 10:26 ` Qu Wenruo
2019-05-19 19:55 ` fstrim discarding too many or wrong blocks on Linux 5.1, leading to data loss Michael Laß
2019-05-20 11:38 ` [dm-devel] " Michael Laß
2019-05-21 16:46 ` Michael Laß
2019-05-21 19:00 ` Andrea Gelmini
2019-05-21 19:59 ` Michael Laß
2019-05-21 20:12 ` Mike Snitzer
2019-05-24 15:00 ` Andrea Gelmini
2019-05-24 15:10 ` Greg KH
[not found] ` <CAK-xaQYPs62v971zm1McXw_FGzDmh_vpz3KLEbxzkmrsSgTfXw@mail.gmail.com>
2019-05-20 13:58 ` Michael Laß
2019-05-20 14:53 ` Andrea Gelmini
2019-05-20 16:45 ` Milan Broz
2019-05-20 19:58 ` Michael Laß
2019-05-21 18:54 ` Andrea Gelmini
2019-05-28 12:36 ` Massive filesystem corruption after balance + fstrim on Linux 5.1.2 Christoph Anton Mitterer
2019-05-28 12:43 ` Michael Laß