[Bug] btrfs' clear_cache mount option doesn't appear to do a rebuild, as documented that it should.

From: Duncan @ 2014-08-07 16:40 UTC
  To: linux-btrfs

Kernel 3.16.0 from git, btrfs-progs 3.14.2 from git, gentoo/~amd64.

Earlier today I had a device (SSD) not respond quickly enough after 
resume from suspend-to-ram, a problem I had frequently some months ago, 
but that I thought was fixed as I've not had it in a while.

The affected filesystems were all dual-device raid1 (data/metadata), and 
to the best of my knowledge a quit-X, systemctl emergency and SysRq s-u-b 
prevented too much damage.  / is btrfs as well, but it is mounted 
read-only by default, as it was in this case, so it was clean; only /home 
and /var/log were writable and damaged.  After reboot I did a scrub on the 
affected filesystems and believe I recovered (almost) everything else: 
besides not seeing any files missing/damaged, scrub did fix a number of 
errors on the first run, which didn't show up on a second verification run.
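
As a rough illustration of the scrub-then-verify step, the sketch below 
runs a foreground scrub twice on each writable filesystem.  The mount 
points are the ones named above; the run helper and the DRY_RUN variable 
are hypothetical names added for this sketch, so by default it only 
prints the commands (set DRY_RUN=0 to actually execute them, as root).

```shell
#!/bin/sh
# Sketch only: scrub each affected filesystem twice, the second pass
# serving as a check that the first pass's repairs held.
# run() and DRY_RUN are illustrative helpers, not part of btrfs-progs.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"        # dry run: show the command only
    else
        "$@"
    fi
}

scrub_twice() {
    for mnt in "$@"; do
        # -B keeps scrub in the foreground so its summary is printed
        run btrfs scrub start -B "$mnt"   # first pass: may repair errors
        run btrfs scrub start -B "$mnt"   # second pass: verification
    done
}

scrub_twice /home /var/log
```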

But the space-cache remained screwed up on /home (/var/log was fine after 
the scrub).

After trying various things, including btrfs check (read-only at first, 
to be sure it wasn't going to do anything else), remounting with 
clear_cache, and remounting with nospace_cache and then again with the 
space cache enabled, nothing was clearing the space-cache errors.
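
For anyone wanting to retrace those attempts, they can be sketched 
roughly as follows.  This is a reconstruction, not the exact commands 
that were run: DEV is a placeholder for one device of the raid1 pair, 
and run and DRY_RUN are hypothetical helper names, so the script only 
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the kind of cache-reset sequence described above.
# DEV and MNT are placeholders; run() and DRY_RUN are illustrative helpers.
DEV=${DEV:-/dev/sdX}    # hypothetical: one device of the raid1 pair
MNT=${MNT:-/home}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"        # dry run: show the command only
    else
        "$@"
    fi
}

try_cache_resets() {
    run umount "$MNT"                 # btrfs check wants the fs unmounted
    run btrfs check "$DEV"            # no --repair given, so nothing is written
    for opt in clear_cache nospace_cache space_cache; do
        run mount -o "$opt" "$DEV" "$MNT"
        run umount "$MNT"             # mount/umount is where errors get logged
    done
}

try_cache_resets
```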

In fact, mounting with clear_cache resulted in even *MORE* space-cache 
errors!  As best I can see, it cleared the space-cache, but didn't 
rebuild it as the documentation says it should -- no activity beyond the 
initial mount, and the errors remained, both as reported by mount/umount 
and as reported by btrfs check.

After I persuaded myself it wasn't going to do anything else besides 
attempt to fix the cache, I ran btrfs check --repair as well, with the 
same result: it apparently cleared the cache, but neither then nor on a 
subsequent mount did it appear to be rebuilt, and I kept getting the 
errors.

Eventually I did a (full) balance, which DID cure the problem, no more 
space-cache errors. =:^)

But why didn't clear_cache, or for that matter, btrfs check --repair, 
trigger a cache rebuild, and why was I still getting space-cache 
generation errors after a couple of mount/umount cycles, with no 
space_cache rebuild activity noted?
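
One way to put a number on "still getting errors" is to count the kernel 
log lines that mention the space cache before and after each mount/umount 
cycle.  The sketch below is a minimal helper for that; the function name 
is made up for illustration, and the match pattern is deliberately loose 
because the exact message wording differs between kernel versions.

```shell
#!/bin/sh
# Sketch only: count log lines that mention the space cache, so the
# number can be compared before and after a mount/umount cycle.
# count_cache_msgs is an illustrative name; the pattern is a loose guess.
count_cache_msgs() {
    # Reads log text on stdin and prints the number of matching lines.
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -ci 'space[ _-]cache' || true
}

# Typical use (may need root):
#   dmesg | count_cache_msgs
```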

That might be why space-cache errors are so common in the various 
posted reports -- once there's a single space-cache error, nothing but 
balance is actually fixing it, despite documentation to the contrary.

Meanwhile, I did a full balance (under 100 GB on SSD so that doesn't take 
long) and that DID fix the problem, but now I'm wondering what bit of the 
balance I actually had to run?  Would a -s/system have fixed it, or would 
-m/metadata (which implies -s as well, I believe) have been necessary, or 
is there no direct way to rebalance the space-cache at all, without doing 
a full rebalance?  I guess the space_cache wouldn't be -d/data area?
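
To narrow that down, the per-type balances could be tried one at a time, 
checking for space-cache errors after each.  The following is only a 
sketch of that experiment and makes no claim about which step, if any, is 
sufficient.  MNT is a placeholder, and run and DRY_RUN are hypothetical 
helper names, so by default the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch only: progressively wider balances, narrowest first.
# MNT is a placeholder; run() and DRY_RUN are illustrative helpers.
MNT=${MNT:-/home}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"        # dry run: show the command only
    else
        "$@"
    fi
}

balance_narrowest_first() {
    # Re-check for space-cache errors (mount/umount, btrfs check)
    # between steps to see which one makes them go away.
    run btrfs balance start -s -f "$MNT"   # system chunks; -s is refused without -f
    run btrfs balance start -m "$MNT"      # metadata chunks
    run btrfs balance start -d "$MNT"      # data chunks
}

balance_narrowest_first
```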

So the bug is, clear_cache may indeed clear it, but it doesn't appear to 
trigger a rebuild as documented, and btrfs check --repair seems to have 
the same behavior: possibly a clear, but no triggered rebuild either then 
or on the next mount.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
