From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: compress=lzo safe to use?
Date: Mon, 12 Sep 2016 04:36:07 +0000 (UTC)

Hans van Kranenburg posted on Sun, 11 Sep 2016 22:49:58 +0200 as
excerpted:

> So, you can use a lot of compress without problems for years.
>
> Only if your hardware is starting to break in a specific way, causing
> lots and lots of checksum errors, the kernel might not currently be
> able to handle all of them at the same time.
>
> The compress might be super stable itself, but in this case another
> part of the filesystem is not perfectly able to handle certain failure
> scenarios involving it.

Well put.

In my case I had problems triggered by exactly two things, tho there
are obviously other ways of triggering the same issues, including a
crash in the middle of a commit, with one copy of the raid1 already
updated while the other is still being written:

1) I first discovered the problem when one of my pair of ssds was going
bad.  Because I had btrfs raid1 and could normally scrub-fix things, and
because I had backups anyway, I chose to continue running it for some
time, just to see how it handled things, as more and more sectors became
unwritable and were replaced by spares.
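For anyone following along, compress=lzo is simply a btrfs mount option;
a minimal /etc/fstab sketch (the UUID, devices and mountpoint here are
placeholders, not my actual setup):

```shell
# /etc/fstab fragment: btrfs raid1 across two ssds, lzo compression on.
# UUID and mountpoint are examples only.
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  btrfs  defaults,compress=lzo  0 0
```

The same option can of course be passed ad-hoc with mount -o
compress=lzo.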
By the end I had several MiB worth of spares in-use, altho smart
reported I had only used about 15% of the available spares, but by then
it was getting bad enough and the newness had worn off, so I just
replaced it and got rid of the hassle.

But as a result of the above, I had a *LOT* of practice with btrfs
recovery, mostly running scrub.  And what I found was that if btrfs
raid1 encounters too many checksum errors in compressed data, it will
crash btrfs and the kernel, even when it *SHOULD* recover from the other
device because it has a good copy, as demonstrated by the fact that
after a reboot I could run a scrub and fix everything, no uncorrected
errors at all.

At first I thought it was just the way btrfs worked -- that it could
handle a few checksum errors but not too many at once.  I had no idea it
was compression related.  But nobody else seemed to mention the problem,
which I thought a bit strange, until someone /did/ mention it, and
furthermore, actually tested both compressed and uncompressed btrfs, and
found the problem only when btrfs was reading compressed data.  If the
data wasn't compressed, btrfs went ahead and read the second copy
correctly, without crashing the system, every time.

The extra kink in this is that at the time, I had a boot-time service
setup to cache (via cat > /dev/null) a bunch of files in a particular
directory.  This particular directory is a cache for news archives, with
articles in some groups going back over a decade to 2002, and my news
client (pan) is slow to start up with several gigs of cached messages
like that, so I had the boot-time service pre-cache everything; by the
time I started X and pan, it would be done or nearly so, and I'd not
have to wait for pan to start up.

The problem was that many of the new files were in this directory, and
all that activity tended to hit the going-bad sectors on that ssd rather
frequently, making one copy often bad.
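That pre-cache job was nothing fancy; a minimal sketch of the idea (the
cache path here is an example, not my actual pan cache location):

```shell
#!/bin/sh
# Boot-time pre-cache sketch: read every file in the news-archive cache
# once, discarding the bytes, so the kernel page cache is warm before
# the news client starts.  CACHE_DIR is a hypothetical example path.
CACHE_DIR="${CACHE_DIR:-$HOME/News/cache}"
if [ -d "$CACHE_DIR" ]; then
    find "$CACHE_DIR" -type f -exec cat {} + > /dev/null
fi
```

Hooked into a boot-time service, this is exactly the kind of job that
hammers one directory's blocks on every boot -- which is why it kept
tripping over the failing sectors.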
Additionally, these are mostly text messages, so they compress quite
well, meaning compress=lzo would trigger compression on many of them.
And because I had it reading them at boot, the kernel tended to overload
on checksum errors before it finished booting, far more frequently than
it would have otherwise.  Of course, that would crash the system before
I could get a login in order to run btrfs scrub and fix the problem.

What I had to do then was boot to rescue mode, with the filesystems
mounted but before normal services (including this caching service) ran,
run the scrub from there, and then continue the boot, which would then
work just fine because I'd fixed all the checksum errors.

But, as I said, I eventually got tired of the hassle and just replaced
the failing device.  Btrfs replace worked nicely.  =:^)

2a) My second trigger is that multiple devices, as in multi-device
btrfs, but also back when I used to run mdraid, don't always resume from
suspend-to-RAM very well.  Often one device takes longer to wake up than
the other(s), and the kernel will try to resume while one still isn't
responding properly.  (FWIW, I ran into this problem on spinning rust
back on mdraid, but I see it now on ssds on btrfs as well, so it seems
to be a common issue, which probably remains relatively obscure, I'd
guess, because relatively few people with multi-device btrfs or mdraid
do suspend-to-RAM.)

The result is that btrfs will try to write to the remaining device(s),
getting them out of sync with the one that isn't responding properly
yet.  Ultimately this leads to a crash if I don't catch it and complete
a controlled shutdown first, and sometimes I see the same
crash-on-boot-due-to-too-many-checksum-errors problem I saw with #1.  I
no longer have that caching job running at boot and thus don't see it as
often, but it still happens occasionally.
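For the record, the rescue-mode recovery dance, and the eventual device
swap, look roughly like this (device names are examples; everything here
needs root and a mounted btrfs):

```shell
# From rescue mode, filesystems mounted but normal services not started:
btrfs scrub start -Bd /     # -B: wait for completion, -d: per-device stats
btrfs scrub status /        # confirm errors were corrected, none uncorrectable

# Once tired of scrub-fixing a failing device, swap it out online:
btrfs replace start /dev/old-ssd /dev/new-ssd /
btrfs replace status /
```

After the scrub reports everything corrected, continuing the normal boot
works fine, exactly as described above.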
Again, once I boot to rescue mode and run scrub, it fixes the problem
and I can resume the normal-mode boot without further issue.

So I pretty much quit suspending to RAM, at least for any longer period,
and just shutdown and reboot now.  With systemd and ssds, the boot
doesn't take significantly longer anyway, tho it does mean I can't
simply resume and pick up where I was; I have to reopen my work, etc.

2b) Closely related to #2a and most recent, since I'm no longer trying
to suspend to RAM: I think one of the ssds now has a bad backup
capacitor or something, as if I leave it idle for too long, it'll fail
to respond once I start trying to use it again.  Same story: the other
device gets writes that the unresponsive device is missing, and
eventually, if I don't reboot, I crash.  Upon reboot, again, if too many
things were written to the device that stayed up that didn't make it to
the other one, it can trigger a crash due to checksum failure.  However,
if I can get a command prompt, either because it boots all the way or
because I boot to rescue mode, I can run a scrub and update the bad
device from the good one, and then everything works fine once again...
until the device goes unresponsive again.

Again, I once thought all this was just the stage at which btrfs was,
until I found out that it doesn't seem to happen if btrfs compression
isn't being used.  Something about the way it recovers from checksum
errors on compressed data differs from the way it recovers from checksum
errors on uncompressed data, and there's a bug in the compressed-data
processing path.  But beyond that, I'm not a dev and it gets a bit
fuzzy, which also explains why I've not gone code diving and submitted
patches to try to fix it myself.

But if I'm correct, it probably doesn't matter what the compression type
is, only how much of it there is.
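As an aside, if you want to see how much of your own data actually got
stored compressed, filefrag can tell you: btrfs reports compressed
extents with the "encoded" flag in its verbose output (the path below is
just an example):

```shell
# Count the extents of a file that btrfs stored compressed:
# compressed extents show the "encoded" flag in filefrag -v output.
filefrag -v /path/to/some/cached/article | grep -c encoded
```

Lots of "encoded" extents across heavily-read files would put you in the
same boat I was in.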
So compress-force would tend to trigger the issue far more frequently
than simply compress, unless of course your use-case is a corner-case
like mine, trying to read all those compressible text messages into
cache at boot, but compress (or compress-force) =lzo vs =zlib shouldn't
matter.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman