On Thu, Feb 14, 2019 at 01:21:29PM +0100, Christoph Anton Mitterer wrote:
> On Thu, 2019-02-14 at 01:22 +0000, Filipe Manana wrote:
> > The following one liner fixes it:
> > https://friendpaste.com/22t4OdktHQTl0aMGxcWLj3
>
> Great to see that fixed... is there any advise that can be given for
> users/admins?
>
> Like whether and how any occurred corruptions can be detected (right
> now, people may still have backups)?

The problem occurs only on reads.  Data that is written to disk will be
OK, and can be read correctly by a fixed kernel.  A kernel without the
fix will return corrupt data on reads, with no indication of corruption
other than the changes to the data itself.

Applications that copy data may read corrupted data and write it back
to the filesystem.  This makes the corruption permanent in the copied
data.  Given the age of the bug, backups that can be corrupted by this
bug probably already are.  Verify files against internal CRC/hashes
where possible.  The original files are likely to be OK, since the bug
does not affect writes.

If your situation has the risk factors listed below, it may be
worthwhile to create a fresh set of non-incremental backups after
applying the kernel fix.

> Or under which exact circumstances did the corruption happen? And under
> which was one safe?

Compression is required to trigger the bug, so you are safe if you (or
the applications you run) never enabled filesystem compression.  Even
with compression enabled, the file data must actually be compressed for
the bug to corrupt it.  Incompressible data extents will never be
affected by this bug.

If you do use compression, you are still safe if:

	- you never punch holes in files

	- you never dedupe or clone files

If you do use compression and also do the other things, the probability
of corruption by this particular bug is non-zero.  Whether you get
corruption, and how often, depends on the technical details of what
you're doing.
To get corruption you have to have one data extent that is split in two
parts by punching a hole, or an extent that is cloned/deduped in two
parts to adjacent logical offsets in the same file.  Both of these
methods create the on-disk pattern which triggers the bug.

Files that consist entirely of unique data will not be affected by
dedupe, so they will not trigger the bug that way.  Files that consist
partially of unique data may or may not be affected, depending on the
dedupe tool, data alignment, etc.

> E.g. only on specific compression algos (I've been using -o compress
> (which should be zlib) for quite a while but never found any

All decompression algorithms are affected.  The bug is in the generic
btrfs decompression handling, so it is not limited to any single
algorithm.  Compression (i.e. writing) is not affected--whatever data
is written to disk can be read back correctly with a fixed kernel.

> compression),... or only when specific file operations were done (I did
> e.g. cp with refcopy, but I think none of the standard tools does hole-
> punching)?

That depends on whether you consider fallocate or qemu to be standard
tools.  Hole punching has been a feature of several Linux filesystems
for some years now, so we can expect it to be more widely adopted over
time.  You'd have to do an audit to be sure none of the tools you use
are punching holes.

"Ordinary" sparse files (made by seeking forward while writing, as done
by older Unix utilities including cp, tar, rsync, cpio, binutils) do
not trigger this bug.  An ordinary sparse file has two distinct data
extents from two different writes, separated by a hole which has never
contained file data.  A punched hole splits an existing single data
extent into two pieces, with a newly created hole between them that
replaces previously existing file data.  These actions create different
extent reference patterns, and only the hole-punching one is affected
by the bug.
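For the curious, the hole-punching half of that pattern is easy to
reproduce with util-linux's fallocate(1).  This sketch runs on tmpfs
just to show the operation itself; on an actual btrfs mounted with
-o compress, this is what splits one compressed extent into two
references:

```shell
# Create a file whose contents compress well, i.e. a compressible
# extent when written to a filesystem mounted with -o compress.
f=$(mktemp -p /dev/shm)
head -c 1048576 /dev/zero | tr '\0' 'a' > "$f"

# Punch a 4K hole inside the extent.  On btrfs this leaves two file
# references to parts of the same (compressed) extent -- the pattern
# that triggers the read corruption on unfixed kernels.
fallocate --punch-hole --offset 65536 --length 4096 "$f"

# The file keeps its logical size; the punched range reads back as zeros.
stat -c 'size=%s blocks=%b' "$f"
```

The clone/dedupe half needs a reflink-capable filesystem, so it is not
shown here.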
Files that contain no blocks full of zeros will not be affected by
fallocate -d-style hole punching (it searches for existing zeros and
punches holes over them--no zeros, no holes).  If the hole punching
intentionally introduces zeros where zeros did not exist before
(e.g. qemu discard operations on raw image files) then it may trigger
the bug.

btrfs send and receive may be affected, but I don't use them, so I
don't have any experience of the bug related to these tools.  From
reading the btrfs receive code, it seems to lack any code capable of
punching a hole; but that's only a quick search for words like "punch",
not a detailed code analysis.

bees continues to be an awesome tool for discovering btrfs kernel bugs.
It compresses, dedupes, *and* punches holes.

>
> Cheers,
> Chris.
>
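P.S.  The fallocate -d behaviour described above (holes are only dug
where zero blocks already exist, without changing the file's logical
contents) can be seen on any filesystem that supports hole punching;
tmpfs is used here for illustration:

```shell
# A file with a 4K run of zeros in the middle of non-zero data.
f=$(mktemp -p /dev/shm)
{ head -c 4096 /dev/zero | tr '\0' 'a'
  head -c 4096 /dev/zero
  head -c 4096 /dev/zero | tr '\0' 'a'; } > "$f"

before=$(stat -c %b "$f")
# Dig holes: only the already-zero block is replaced by a hole; the
# logical contents of the file are unchanged.
fallocate --dig-holes "$f"
after=$(stat -c %b "$f")
echo "blocks: $before -> $after"
```

The allocated block count drops while the file size stays the same --
which is exactly why a file with no zero blocks is left untouched by
this mode.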