On 2015-11-24 00:42, Duncan wrote:
> Nils Steinger posted on Mon, 23 Nov 2015 22:10:12 +0100 as excerpted:
>
>> Do we anything about what might cause a filesystem to enter a state
>> which `send` chokes on?
>> I've only seen a small sample of the corrupted files before growing
>> tired of the process and just recreating the whole thing, but all of
>> them were database files (presumably SQLite). Could it be that the files
>> were being written to during an unclean shutdown, leading to some kind
>> of corruption of the FS? Unfortunately, I was a little triggerhappy when
>> cleaning up old snapshots, so there aren't any left to aid in
>> troubleshooting this problem further…

That's OK, I haven't been able to figure out much either, despite
hitting this about a month ago with roughly 200 different files (I wrote
a script at the time to automate fixing it, but for some reason I can't
find it now), plus the other cases I've had on my systems over the past
year (I only started using send for backups about a year ago). It might
be worth noting that you're the first person to report this directly (I
would have, but I hate to report things that aren't a critical data
safety issue without a reliable reproducer).
>
> Austin's the one attempting to trace down the problem, so he'd have the
> most direct answer there. (My use-case doesn't involve snapshotting or
> send/receive at all.)

I stopped using send/receive for backups about a month ago, after
hitting this for what I think was the seventh time in the past year. I
still use snapshots for backups, but now I use them to generate SquashFS
images (I really don't care about the block layout, inode numbers, or
most of the BTRFS-specific properties). That still gives me bootable
backups, and it also saves significant storage space both locally and on
the cloud storage services I use for off-site backups (which in turn
saves money on those too).

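A rough sketch of what a snapshot-to-SquashFS flow like the one described
above could look like. This is not the actual backup setup from this
message: the function name, all three paths, and the choice of xz
compression are assumptions made purely for illustration. It only builds
and prints the commands, so nothing is run against a real filesystem.

```python
import shlex


def squashfs_backup_commands(subvol, snapshot, image):
    """Return the commands for one snapshot-to-SquashFS backup run.

    All three arguments are hypothetical placeholder paths.
    """
    return [
        # Take a read-only snapshot so the tree cannot change while packing.
        ["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot],
        # Pack the snapshot's file tree into a compressed image
        # (xz is an assumed choice, not something stated in this thread).
        ["mksquashfs", snapshot, image, "-comp", "xz"],
        # Remove the snapshot once the image has been written.
        ["btrfs", "subvolume", "delete", snapshot],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in squashfs_backup_commands(
        "/", "/.snapshots/root-backup", "/mnt/backup/root.sqsh"
    ):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to review them first
and only later hand them to something like `subprocess.run(cmd, check=True)`
as root.
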
I am still trying to put together a reliable reproducer for this though,
as I still use send/receive for some things (like cloning VMs without
taking them offline or running into the issues with block-level copies
of a BTRFS filesystem).
>
> But if any type of files would be likely to create issues, it'd be
> something like database or VM image files, since the random-file-rewrite-
> pattern they typically have is in general the most problematic for copy-
> on-write (COW) filesystems such as btrfs. Without some sort of
> additional fragmentation management (like the autodefrag mount option),
> these files will end up _highly_ fragmented on btrfs, often thousands of
> fragments, tens of thousands when the files in question are multi-gig.

In general, I've seen this mostly with three types of files:
1. Database files and VM images. In my experience, these account for
   the majority of the problem files on filesystems that have them.
   Autodefrag doesn't seem to help, at least not for SQLite or
   BerkeleyDB/GDBM databases.
2. Shared libraries and executables. These are the majority of the
   problem files on filesystems without databases or VM images,
   although I can't for the life of me figure out why, as they are
   usually written to very infrequently.
3. Plain text configuration files.

For example, the last time I had this happen, it was on the root
filesystem of one of my systems, and about a third of the problem files
were either in /etc or text files under /usr/share, while the remaining
two thirds were mostly stuff under /usr/lib and /lib.

It's probably also worth noting that some files I would expect to
trigger this, based on the above, never have. In particular:
1. ClamAV virus databases (IIRC, these are similar in structure to
   SQLite DBs).
2. BOINC applications.
3. Almost anything in /usr/libexec (stuff like GCC and binutils).
4. Almost any kind of script.

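For anyone collecting their own data points, the kind of per-directory
breakdown given above (a third under /etc and /usr/share, the rest under
/usr/lib and /lib) can be tallied with a few lines of Python. This is
only a sketch: it assumes the paths of the affected files have already
been collected by some other means, and the function name and default
prefixes are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def tally_by_prefix(paths, prefixes=("/etc", "/usr/share", "/usr/lib", "/lib")):
    """Count paths under each directory prefix; the rest go to 'other'."""
    counts = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        for prefix in prefixes:
            # Compare whole path components, so /lib64 is not counted as /lib.
            if PurePosixPath(prefix) in path.parents:
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example paths, not taken from a real failure.
    sample = ["/etc/fstab", "/usr/lib/libfoo.so", "/lib/libbar.so"]
    for prefix, count in tally_by_prefix(sample).items():
        print(prefix, count)
```
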
Finally, I occasionally see inconsistencies in database files that cause
this to happen, but I have never seen any corruption in any other type
of file, so it doesn't seem to have an impact on data safety.