Re: It's broke, Jim. BTRFS mounted read only after corruption errors

From: Duncan @ 2021-09-01  2:45 UTC
  To: Martin Steigerwald, Btrfs BTRFS

Martin Steigerwald posted on Sun, 22 Aug 2021 13:14:39 +0200 as
excerpted:

> This might be a sequel of:
> 
> Corruption errors on Samsung 980 Pro
> 
> https://lore.kernel.org/linux-btrfs/2729231.WZja5ltl65@ananda/

I saw on the previous thread some discussion of trim/discard but lost 
track of whether you're still trying to enable it in the mount options
or not.

I'd suggest *NOT* enabling trim/discard on any samsung SSDs unless you 
are extremely confident that it is well tested and known to work on
your particular model, because...

I have a samsung 850 evo and did some earlier research on trim/discard 
for it.

What I found was that at least for earlier samsung ssds, queued-trim
had been found not to work safely, with a number of bugs filed over the 
years, resulting in samsung ssds (and a few others) being queued-trim-
blacklisted in the kernel.  Back when I did my research at least, the 
blacklist was for all samsung ssd models, with the problem being that 
they claimed sata 3.1 compliance which requires queued-trim, but they 
didn't actually handle it as a queued command.  When trim isn't queued,
the queued command stream must be flushed before the discard, then the
discard issued, then another flush issued to ensure proper write order,
before the queued command stream can be resumed.

In theory the black-listing should mean the kernel does the right thing 
and it's simply slower, but then it's slower, so not enabling the
discard mount option is probably a good idea anyway.

Now your newer 980 pro is an NVMe device rather than SATA, so the SATA
queued-trim blacklist shouldn't even apply to it, and it's quite possible
that it handles discard properly.  But it's also possible that it still
doesn't, and given that you're seeing problems, probably better safe than
sorry.  I'd leave discard disabled.
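
To check whether discard is actually in effect, as opposed to what one
remembers putting in fstab, the mount options the kernel reports can be
scanned.  A minimal sketch, assuming the usual /proc/mounts field layout
(device, mountpoint, fstype, options); the function name and the sample
text are made up for illustration:

```python
def discard_mounts(mounts_text):
    """Return (mountpoint, option) pairs for btrfs mounts using discard."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        for opt in fields[3].split(","):
            # Matches "discard" and "discard=..." but not "nodiscard".
            if opt == "discard" or opt.startswith("discard="):
                found.append((fields[1], opt))
    return found


# Hypothetical sample in /proc/mounts format, not taken from a real system.
SAMPLE = """\
/dev/nvme0n1p2 / btrfs rw,noatime,ssd,discard=async,subvolid=5,subvol=/ 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0
/dev/sdb1 /home btrfs rw,noatime,ssd,nodiscard 0 0
"""

if __name__ == "__main__":
    print(discard_mounts(SAMPLE))
```

On a live system the input would be open("/proc/mounts").read() instead
of the sample.  An empty result means no btrfs is currently mounted with
a discard option.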

Another consideration for btrfs is the older root-blocks that are not 
normally immediately overwritten, that thus remain available to use for 
repair/recovery should that be necessary.  Because they're technically
no longer in use, the discard mount option clears these along with other
unused blocks, so they're no longer an option for repair/recovery. =:^(

The alternative (beyond possibly deliberately leaving some unpartitioned
free-space for the ssd wear-leveling algorithm to work with, in
addition to the unreported space it already reserves for that
purpose) is fstrim.

At least on my systemd-option gentoo, there's a weekly fstrim scheduled 
(see fstrim.service and fstrim.timer, owned by the util-linux package), 
tho I don't recall whether I had to enable it or whether it was enabled 
automatically.

Tho it's worth noting that the default fstrim.service apparently (based 
on my logs) only trims filesystems mounted read-write when it runs.  I 
have several filesystems not mounted by default (backups and /boot 
mostly), and my / is mounted read-only by default, and they don't get 
fstrimmed.  But the backups tend to be mkfs.btrfs, mount, backup, 
unmount, with few or no writes after the backup, and mkfs.btrfs already 
does a trim to clear the partition before it does the mkfs, so there's 
little to trim there.  / and /boot get more writes, but /boot is
sub-GiB and / is only 8 GiB, trivial when I've several hundred GiB of
the 1 TB ssd entirely unpartitioned for the ssd firmware to wear-level
with, so I'm not too worried.  But it's something to be aware of and to
consider modifying the scheduled commandline if necessary for your
use-case.
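
To see in advance which filesystems a scheduled run would pass over, the
mounted filesystems can be split by whether they're read-write.  A rough
sketch, assuming (as my logs suggest) that read-only mounts are skipped;
the function name, the filesystem-type list and the sample text are only
illustrative:

```python
def split_by_writability(mounts_text, fstypes=("btrfs", "ext4", "xfs")):
    """Split mountpoints into (read-write, read-only) lists."""
    trimmed, skipped = [], []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device mountpoint fstype options dump pass
        if len(fields) < 4 or fields[2] not in fstypes:
            continue
        opts = fields[3].split(",")
        # A read-only mount carries the plain "ro" option.
        (skipped if "ro" in opts else trimmed).append(fields[1])
    return trimmed, skipped


# Hypothetical sample in /proc/mounts format, not taken from a real system.
SAMPLE = """\
/dev/sda2 / btrfs ro,noatime,ssd 0 0
/dev/sda5 /home btrfs rw,noatime,ssd 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""

if __name__ == "__main__":
    rw, ro = split_by_writability(SAMPLE)
    print("would be trimmed:", rw)
    print("would be skipped:", ro)
```

Filesystems that aren't mounted at all when the timer fires won't appear
in either list, so they need to be accounted for separately.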

Something else I used to wonder about was whether fstrim handled all
devices on a multi-device btrfs, or just the specific device it was
pointed at (the one that mount reports as mounted, in the case of the
automatic runs).  The log only indicates the one device as fstrimmed,
but the reported space trimmed is the free space of the entire
filesystem.  Most of my btrfs are pair-device raid1, and the amount
reported as trimmed is double the free space of the one device.  So it
does appear to trim the free space on all devices of the filesystem,
despite only listing one in the log.
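
That comparison can be done mechanically.  A sketch, assuming the verbose
fstrim log line looks like "MOUNTPOINT: SIZE (N bytes) trimmed on DEVICE"
(the format may differ between util-linux versions); the function names
and all the numbers are invented for illustration:

```python
import re

# Assumed verbose fstrim line format; the " on DEVICE" part is optional.
LINE_RE = re.compile(r"^(\S+): .*\((\d+) bytes\) trimmed(?: on (\S+))?$")


def parse_fstrim_line(line):
    """Return (mountpoint, bytes_trimmed, device_or_None), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


def devices_covered(bytes_trimmed, free_per_device):
    """Ratio of trimmed bytes to one device's free space."""
    return bytes_trimmed / free_per_device


if __name__ == "__main__":
    # Hypothetical log line: 200 GiB trimmed, one device listed.
    line = "/home: 200 GiB (214748364800 bytes) trimmed on /dev/sda5"
    mountpoint, trimmed, device = parse_fstrim_line(line)
    # Hypothetical free space on that one device: 100 GiB.
    free_one_device = 100 * 1024**3
    print(mountpoint, device, devices_covered(trimmed, free_one_device))
```

A ratio near 2 on a two-device raid1 is what suggests both devices were
trimmed, while a ratio near 1 would suggest only the listed one was.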

As for backup root-blocks, fstrim will still clear those too.  But since
it's running just once a week, on filesystems with any routine writes
at all the window in which there's not at least a couple backup
root-blocks available is going to be reasonably small.  That's likely to
be considered worth the trivial incremental risk by anyone following the
sysadmin's rule that the value of the data is defined by the number
(and freshness) of backups of said data it's considered valuable enough
to have.  And on filesystems without a lot of writes, there's less risk
of damage during any potential crash within that window anyway.  So
again, worth the trivial incremental risk, especially compared to that
of using the discard mount option.

-- 
Duncan - No HTML messages please; they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
