On 2015-11-24 00:42, Duncan wrote:
> Nils Steinger posted on Mon, 23 Nov 2015 22:10:12 +0100 as excerpted:
>
>> Do we know anything about what might cause a filesystem to enter a state
>> which `send` chokes on?
>> I've only seen a small sample of the corrupted files before growing
>> tired of the process and just recreating the whole thing, but all of
>> them were database files (presumably SQLite). Could it be that the files
>> were being written to during an unclean shutdown, leading to some kind
>> of corruption of the FS? Unfortunately, I was a little trigger-happy when
>> cleaning up old snapshots, so there aren't any left to aid in
>> troubleshooting this problem further…
That's OK, I've not been able to figure out much anyway. I had a case of
this about a month ago where roughly 200 different files hit the issue
(I wrote a script at the time to automate fixing it, but haven't been
able to find it since), plus the other cases on my systems over the past
year (I only started using send for backups about a year ago). It might
be worth noting that you're the first person to report this directly (I
would have, but I hate reporting something that isn't a critical data
safety issue without a reliable reproducer).
>
> Austin's the one attempting to trace down the problem, so he'd have the
> most direct answer there.  (My use-case doesn't involve snapshotting or
> send/receive at all.)
I stopped using send/receive for backups about a month ago, after
hitting this for what I think was the seventh time in the past year. I
still use snapshots for backups, but now I use them to generate SquashFS
images (I really don't care about the block layout, inode numbers, or
most of the BTRFS-related properties). That still gives me bootable
backups, and it also saves significant storage space both locally and on
the cloud storage services I use for off-site backups (which in turn
saves money on those too). I am still trying to pull together something
to reliably reproduce this though, as I still use send/receive for some
things (like cloning VMs without taking them offline or hitting the
issues with block copies of a BTRFS filesystem).
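
As a rough sketch of the snapshot-to-SquashFS approach described above
(not the exact script in use): the function name, the example paths, and
the choice of xz compression are all assumptions made for illustration.
It only builds the two commands, a read-only snapshot followed by
mksquashfs on that snapshot, and prints them unless told to run them.

```python
import subprocess
from pathlib import PurePosixPath


def backup_commands(subvol, snap_dir, image_dir, name):
    """Build the commands for one snapshot -> SquashFS backup.

    All arguments are hypothetical examples: `subvol` is the subvolume to
    back up, `snap_dir` is where snapshots live, `image_dir` is where the
    SquashFS image should be written.
    """
    snap = PurePosixPath(snap_dir) / name
    image = PurePosixPath(image_dir) / f"{name}.squashfs"
    return [
        # Read-only snapshot, so the tree cannot change while it is packed.
        ["btrfs", "subvolume", "snapshot", "-r", str(subvol), str(snap)],
        # Pack the snapshot contents; xz compression is an assumed choice.
        ["mksquashfs", str(snap), str(image), "-comp", "xz"],
    ]


def run_backup(subvol, snap_dir, image_dir, name, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in backup_commands(subvol, snap_dir, image_dir, name):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_backup("/", "/snapshots", "/backups", "root-20151124")
```

Since SquashFS only records the file tree, nothing BTRFS-specific (block
layout, inode numbers) carries over into the image, which matches the
trade-off mentioned above.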
>
> But if any type of files would be likely to create issues, it'd be
> something like database or VM image files, since the random-file-rewrite-
> pattern they typically have is in general the most problematic for copy-
> on-write (COW) filesystems such as btrfs.  Without some sort of
> additional fragmentation management (like the autodefrag mount option),
> these files will end up _highly_ fragmented on btrfs, often thousands of
> fragments, tens of thousands when the files in question are multi-gig.
In general, I've seen this mostly with three types of files:
1. Database files and VM images (in my experience, these have been the
majority of the issue on filesystems that have them. Autodefrag doesn't
seem to help, at least not for SQLite or Berkeley DB/GDBM databases).
2. Shared libraries and executables (these are the majority of the issue
on filesystems without databases or VM images, although I can't for the
life of me figure out why, as they are usually written to very
infrequently).
3. Plain text configuration files.

For example, the last time I had this happen, it was on the root
filesystem of one of my systems, and about a third of the problem files
were either in /etc or text files under /usr/share, while the remaining
two thirds were mostly stuff under /usr/lib and /lib. It's probably
worth noting also that there are certain files I would expect to trigger
this based on the above, but have never seen do so, in particular:
1. ClamAV virus databases (IIRC, these are similar in structure to
SQLite DBs).
2. BOINC applications.
3. Almost anything in /usr/libexec (stuff like GCC and binutils).
4. Almost any kind of script.
It's probably also worth noting that I occasionally see inconsistencies
in database files that cause this to happen, but have never seen any
corruption in any other type of file, so it doesn't seem to have an
impact on data safety.