* Scrub: no space left on device
@ 2015-12-08 15:06 Marc MERLIN
  2015-12-08 15:37 ` Holger Hoffstätte
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Marc MERLIN @ 2015-12-08 15:06 UTC (permalink / raw)
  To: linux-btrfs

Howdy,

Why would scrub need space and why would it cancel if there isn't enough of
it?
(kernel 4.3)

/etc/cron.daily/btrfs-scrub:
btrfs scrub start -Bd /dev/mapper/cryptroot
scrub device /dev/mapper/cryptroot (id 1) done
	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
	total bytes scrubbed: 130.84GiB with 0 errors
btrfs scrub start -Bd /dev/mapper/pool1
ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
scrub device /dev/mapper/pool1 (id 1) canceled

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: Scrub: no space left on device
  2015-12-08 15:06 Scrub: no space left on device Marc MERLIN
@ 2015-12-08 15:37 ` Holger Hoffstätte
  2015-12-08 15:46   ` Lionel Bouton
  2015-12-08 15:39 ` Lionel Bouton
  2015-12-08 16:00 ` Austin S Hemmelgarn
  2 siblings, 1 reply; 10+ messages in thread
From: Holger Hoffstätte @ 2015-12-08 15:37 UTC (permalink / raw)
  To: Marc MERLIN, linux-btrfs

On 12/08/15 16:06, Marc MERLIN wrote:
> Howdy,
> 
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)
> 
> /etc/cron.daily/btrfs-scrub:
> btrfs scrub start -Bd /dev/mapper/cryptroot
> scrub device /dev/mapper/cryptroot (id 1) done
> 	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
> 	total bytes scrubbed: 130.84GiB with 0 errors
> btrfs scrub start -Bd /dev/mapper/pool1
> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
> scrub device /dev/mapper/pool1 (id 1) canceled

Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
can lead to temporary metadata expansion (stuff gets COWed around); it's
a bit surprising but makes sense if you think about it. The fact that you
ENOSPCed means that the fs was probably already fully allocated.

If it bothers you, a subsequent balance with -musage=10 should vacuum things
up. Alternatively just keep using the filesystem; eventually the empty metadata
chunks should be collected, on the next remount at the latest.

tl;dr: Never allocate all the chunks. Yes, this needs more graceful handling.

-h



* Re: Scrub: no space left on device
  2015-12-08 15:06 Scrub: no space left on device Marc MERLIN
  2015-12-08 15:37 ` Holger Hoffstätte
@ 2015-12-08 15:39 ` Lionel Bouton
  2015-12-08 16:00 ` Austin S Hemmelgarn
  2 siblings, 0 replies; 10+ messages in thread
From: Lionel Bouton @ 2015-12-08 15:39 UTC (permalink / raw)
  To: Marc MERLIN, linux-btrfs

On 08/12/2015 16:06, Marc MERLIN wrote:
> Howdy,
>
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)
>
> /etc/cron.daily/btrfs-scrub:
> btrfs scrub start -Bd /dev/mapper/cryptroot
> scrub device /dev/mapper/cryptroot (id 1) done
> 	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
> 	total bytes scrubbed: 130.84GiB with 0 errors
> btrfs scrub start -Bd /dev/mapper/pool1
> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
> scrub device /dev/mapper/pool1 (id 1) canceled

I can't be sure (not a dev), but one possibility that comes to mind is
that if an error is detected, writes must be done on the device. The
repair might not be done in place but with CoW. Even if the error is
not repaired for lack of redundancy, IIRC each device tracks the number
of errors detected, so I assume this is written somewhere (most probably
in system or metadata chunks).

Best regards,

Lionel


* Re: Scrub: no space left on device
  2015-12-08 15:37 ` Holger Hoffstätte
@ 2015-12-08 15:46   ` Lionel Bouton
  2015-12-08 16:02     ` Holger Hoffstätte
  2015-12-08 16:06     ` Marc MERLIN
  0 siblings, 2 replies; 10+ messages in thread
From: Lionel Bouton @ 2015-12-08 15:46 UTC (permalink / raw)
  To: Holger Hoffstätte, Marc MERLIN, linux-btrfs

On 08/12/2015 16:37, Holger Hoffstätte wrote:
> On 12/08/15 16:06, Marc MERLIN wrote:
>> Howdy,
>>
>> Why would scrub need space and why would it cancel if there isn't enough of
>> it?
>> (kernel 4.3)
>>
>> /etc/cron.daily/btrfs-scrub:
>> btrfs scrub start -Bd /dev/mapper/cryptroot
>> scrub device /dev/mapper/cryptroot (id 1) done
>> 	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
>> 	total bytes scrubbed: 130.84GiB with 0 errors
>> btrfs scrub start -Bd /dev/mapper/pool1
>> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
>> scrub device /dev/mapper/pool1 (id 1) canceled
> Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
> can lead to temporary metadata expansion (stuff gets COWed around); it's
> a bit surprising but makes sense if you think about it.

How long must I think about it until it makes sense? :-)

Sorry, I'm not sure why metadata is rewritten if no error is detected.
I have several theories but lack information: is the fact that no error
has been detected stored somewhere? Is scrub using some kind of internal
temporary snapshot(s) to avoid interfering with other operations? Some
other reason I haven't thought of?

Lionel


* Re: Scrub: no space left on device
  2015-12-08 15:06 Scrub: no space left on device Marc MERLIN
  2015-12-08 15:37 ` Holger Hoffstätte
  2015-12-08 15:39 ` Lionel Bouton
@ 2015-12-08 16:00 ` Austin S Hemmelgarn
  2 siblings, 0 replies; 10+ messages in thread
From: Austin S Hemmelgarn @ 2015-12-08 16:00 UTC (permalink / raw)
  To: Marc MERLIN, linux-btrfs

On 2015-12-08 10:06, Marc MERLIN wrote:
> Howdy,
>
> Why would scrub need space and why would it cancel if there isn't enough of
> it?
> (kernel 4.3)
>
Wild guess here, but maybe scrub unconditionally updates the error 
counters, regardless of whether any errors were found or not?





* Re: Scrub: no space left on device
  2015-12-08 15:46   ` Lionel Bouton
@ 2015-12-08 16:02     ` Holger Hoffstätte
  2015-12-08 16:06     ` Marc MERLIN
  1 sibling, 0 replies; 10+ messages in thread
From: Holger Hoffstätte @ 2015-12-08 16:02 UTC (permalink / raw)
  To: Lionel Bouton, Marc MERLIN, linux-btrfs

On 12/08/15 16:46, Lionel Bouton wrote:
> On 08/12/2015 16:37, Holger Hoffstätte wrote:
>> On 12/08/15 16:06, Marc MERLIN wrote:
>>> Howdy,
>>>
>>> Why would scrub need space and why would it cancel if there isn't enough of
>>> it?
>>> (kernel 4.3)
>>>
>>> /etc/cron.daily/btrfs-scrub:
>>> btrfs scrub start -Bd /dev/mapper/cryptroot
>>> scrub device /dev/mapper/cryptroot (id 1) done
>>> 	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
>>> 	total bytes scrubbed: 130.84GiB with 0 errors
>>> btrfs scrub start -Bd /dev/mapper/pool1
>>> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
>>> scrub device /dev/mapper/pool1 (id 1) canceled
>> Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
>> can lead to temporary metadata expansion (stuff gets COWed around); it's
>> a bit surprising but makes sense if you think about it.
> 
> How long must I think about it until it makes sense? :-)
> 
> Sorry I'm not sure why metadata is rewritten if no error is detected.
> I've several theories but lack information: is the fact that no error
> has been detected stored somewhere? is scrub using some kind of internal
> temporary snapshot(s) to avoid interfering with other operations? other
> reason I didn't think about?

Well... I have no idea what the historical motivation for this behaviour was,
though I can think of at least two: rewriting known-good checksums
generally (since you know they are good this very moment), and in case of
error avoiding the area where the block error occurred (read errors on rust
are often clustered and affect entire tracks).

That's really all I know. I agree it's surprising, especially since it
happens by default and also in -r mode, which might be considered a bug.

-h



* Re: Scrub: no space left on device
  2015-12-08 15:46   ` Lionel Bouton
  2015-12-08 16:02     ` Holger Hoffstätte
@ 2015-12-08 16:06     ` Marc MERLIN
  2015-12-08 16:24       ` Holger Hoffstätte
  2015-12-09  6:46       ` Duncan
  1 sibling, 2 replies; 10+ messages in thread
From: Marc MERLIN @ 2015-12-08 16:06 UTC (permalink / raw)
  To: Lionel Bouton; +Cc: Holger Hoffstätte, linux-btrfs

On Tue, Dec 08, 2015 at 04:46:32PM +0100, Lionel Bouton wrote:
> On 08/12/2015 16:37, Holger Hoffstätte wrote:
> > On 12/08/15 16:06, Marc MERLIN wrote:
> >> Howdy,
> >>
> >> Why would scrub need space and why would it cancel if there isn't enough of
> >> it?
> >> (kernel 4.3)
> >>
> >> /etc/cron.daily/btrfs-scrub:
> >> btrfs scrub start -Bd /dev/mapper/cryptroot
> >> scrub device /dev/mapper/cryptroot (id 1) done
> >> 	scrub started at Mon Dec  7 01:35:08 2015 and finished after 258 seconds
> >> 	total bytes scrubbed: 130.84GiB with 0 errors
> >> btrfs scrub start -Bd /dev/mapper/pool1
> >> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1 (No space left on device)
> >> scrub device /dev/mapper/pool1 (id 1) canceled
> > Scrub rewrites metadata (apparently even in -r aka readonly mode), and that
> > can lead to temporary metadata expansion (stuff gets COWed around); it's
> > a bit surprising but makes sense if you think about it.
> 
> How long must I think about it until it makes sense? :-)
> 
> Sorry I'm not sure why metadata is rewritten if no error is detected.
> I've several theories but lack information: is the fact that no error
> has been detected stored somewhere? is scrub using some kind of internal
> temporary snapshot(s) to avoid interfering with other operations? other
> reason I didn't think about?

Yeah, I was also wondering why metadata should be rewritten on a single
device scrub.
Does not make sense to me.

And this is what I got:
legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1/ 
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
ERROR: error during balancing '/mnt/btrfs_pool1/' - No space left on device
There may be more info in syslog - try dmesg | tail

Ok, that sucks.

legolas:~# btrfs balance start -musage=0 -v /mnt/btrfs_pool1/
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 618 chunks

This worked. Mmmh, I thought this wouldn't be necessary anymore in 4.3 kernels?

legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=10
  SYSTEM (flags 0x2): balancing, usage=10
Done, had to relocate 1 out of 618 chunks

And now I'm back in business...

Still, this is a bit disappointing and at the very least very unexpected in 4.3.

legolas:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=604.88GiB, used=520.09GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=5.00GiB, used=4.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
legolas:~# btrfs fi show /mnt/btrfs_pool1
Label: 'btrfs_pool1'  uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
	Total devices 1 FS bytes used 524.26GiB
	devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1


Marc


* Re: Scrub: no space left on device
  2015-12-08 16:06     ` Marc MERLIN
@ 2015-12-08 16:24       ` Holger Hoffstätte
  2015-12-08 16:39         ` Marc MERLIN
  2015-12-09  6:46       ` Duncan
  1 sibling, 1 reply; 10+ messages in thread
From: Holger Hoffstätte @ 2015-12-08 16:24 UTC (permalink / raw)
  To: Marc MERLIN, Lionel Bouton; +Cc: linux-btrfs

On 12/08/15 17:06, Marc MERLIN wrote:
> Label: 'btrfs_pool1'  uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
> 	Total devices 1 FS bytes used 524.26GiB
> 	devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1
                        ^^^^^^^^^^^^^^^^^^^^^^^^
This is what I was alluding to. You could have started a -dusage balance
*before* the scrub so that one or several data chunks get freed.
Balancing metadata when you're out of space accomplishes nothing and will
very likely fail, just as you saw. You have ~90GB usable space, but
that space is spread over chunks with low utilisation.
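
As a rough illustration of the figure marked above, unallocated space is
simply "size" minus "used" on the devid line. A minimal Python sketch,
assuming the `btrfs fi show` line format quoted in this thread (the
helper name is made up for illustration, and GiB units are assumed):

```python
import re

# Matches a devid line as printed by `btrfs fi show` in this thread.
# Assumption: both figures are reported in GiB, as in the quoted output.
DEVID_RE = re.compile(
    r"devid\s+(?P<id>\d+)\s+size\s+(?P<size>[\d.]+)GiB\s+used\s+(?P<used>[\d.]+)GiB"
)

def unallocated_gib(devid_line: str) -> float:
    """Return size minus used (in GiB) for one devid line (hypothetical helper)."""
    m = DEVID_RE.search(devid_line)
    if m is None:
        raise ValueError("not a devid line: %r" % devid_line)
    return round(float(m.group("size")) - float(m.group("used")), 2)

line = "devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1"
print(unallocated_gib(line))  # 0.07
```

With only about 0.07 GiB unallocated, there is no room to allocate a
new chunk, even though the existing data chunks have plenty of free
space inside them.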

-h



* Re: Scrub: no space left on device
  2015-12-08 16:24       ` Holger Hoffstätte
@ 2015-12-08 16:39         ` Marc MERLIN
  0 siblings, 0 replies; 10+ messages in thread
From: Marc MERLIN @ 2015-12-08 16:39 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Lionel Bouton, linux-btrfs

On Tue, Dec 08, 2015 at 05:24:16PM +0100, Holger Hoffstätte wrote:
> On 12/08/15 17:06, Marc MERLIN wrote:
> > Label: 'btrfs_pool1'  uuid: 5ee24229-2431-448a-868e-2c325d10bfa7
> > 	Total devices 1 FS bytes used 524.26GiB
> > 	devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1
>                         ^^^^^^^^^^^^^^^^^^^^^^^^
> This is what I was alluding to. You could have started a -dusage balance
> *before* the scrub so that one or several data chunks get freed.
> Balancing metadata when you're out of space accomplishes nothing and will
> very likely fail, just as you saw. You have ~90GB usable space, but
> that space is spread over chunks with low utilisation.

Yes, my partition got a bit full, I freed up space, and unfortunately we
still don't have a background rebalance to fix this, so I did run a manual
one.
But my filesystem was usable, I was writing to it just fine. I was just very
surprised that scrub needed to rewrite blocks on a single disk device.

You could make the case that scrub and balance=0 should be run together.
In the meantime, I upgraded my script:
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair
http://marc.merlins.org/linux/scripts/btrfs-scrub

I figured there is no good reason not to run a balance with usage=20 on
metadata and data every night.
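
A minimal sketch of that ordering, in Python. This is not the script at
the URL above; it only builds the command lines (it does not run them),
and the function name, the mount point argument and the default of 20
are assumptions made for illustration:

```python
import shlex
from typing import List

def nightly_commands(mountpoint: str, device: str, usage: int = 20) -> List[List[str]]:
    """Build (but do not run) a balance-then-scrub sequence.

    Hypothetical helper: first drop empty chunks, then compact chunks
    under `usage` percent, and only then scrub the device.
    """
    if not 0 <= usage <= 100:
        raise ValueError("usage must be a percentage between 0 and 100")
    return [
        ["btrfs", "balance", "start", "-dusage=0", "-musage=0", mountpoint],
        ["btrfs", "balance", "start", f"-dusage={usage}", f"-musage={usage}", mountpoint],
        ["btrfs", "scrub", "start", "-Bd", device],
    ]

for cmd in nightly_commands("/mnt/btrfs_pool1", "/dev/mapper/pool1"):
    print(shlex.join(cmd))
```

Running the balances first means the scrub starts with whatever
unallocated space the balance managed to free.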

Marc


* Re: Scrub: no space left on device
  2015-12-08 16:06     ` Marc MERLIN
  2015-12-08 16:24       ` Holger Hoffstätte
@ 2015-12-09  6:46       ` Duncan
  1 sibling, 0 replies; 10+ messages in thread
From: Duncan @ 2015-12-09  6:46 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN posted on Tue, 08 Dec 2015 08:06:15 -0800 as excerpted:

> On Tue, Dec 08, 2015 at 04:46:32PM +0100, Lionel Bouton wrote:
>> On 08/12/2015 16:37, Holger Hoffstätte wrote:
>> > On 12/08/15 16:06, Marc MERLIN wrote:
>> >>
>> >> Why would scrub need space and why would it cancel if there isn't
>> >> enough of it? (kernel 4.3)
>> >>
>> >> btrfs scrub start -Bd /dev/mapper/pool1
>> >> ERROR: scrubbing /dev/mapper/pool1 failed for device id 1
>> >> (No space left on device)
>> >> scrub device /dev/mapper/pool1 (id 1) canceled
>> > Scrub rewrites metadata (apparently even in -r aka readonly mode),
>> > and that can lead to temporary metadata expansion (stuff gets COWed
>> > around); it's a bit surprising but makes sense if you think about it.

Are you sure about that?

My / is mounted ro by default, and if I try to scrub it in normal mode, 
it'll error out due to read-only.  But I can run a read-only scrub just 
fine, and if I find errors, I simply mount it writable and redo the scrub 
without the -r.  (My / is only 8 GiB, under half used including metadata 
on a fast SSD, so scrubs complete in under 30 seconds, and doing a read-
only scrub followed by a mount-writable and a second fixing scrub if 
necessary, is trivial.)

>> Sorry I'm not sure why metadata is rewritten if no error is detected.

But scrub will of course do copy-on-write if there's an error, and it's 
possible that on initialization it checks for space to do a few cows if 
necessary, before it actually checks for the -r read-only flag.  I try to 
leave at least enough unallocated space to do a balance, which of course 
except for -dusage=0 (or -musage=0) writes a new chunk to rewrite 
existing chunks into, so I'd be unlikely to ever get that close to out of 
space to trigger the possible initialization-time space-warning, and thus 
wouldn't know whether it has one or whether it comes before the -r check, 
or not.

> And this is what I got:
> legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1/
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=10
>   SYSTEM (flags 0x2): balancing, usage=10
> ERROR: error during balancing '/mnt/btrfs_pool1/' - No space left on device
> There may be more info in syslog - try dmesg | tail
> 
> Ok, that sucks.
> 
> legolas:~# btrfs balance start -musage=0 -v /mnt/btrfs_pool1/
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=0
>   SYSTEM (flags 0x2): balancing, usage=0
> Done, had to relocate 0 out of 618 chunks
> 
> This worked. Mmmh, I thought this wouldn't be necessary anymore in 4.3
> kernels?

Well, it said it had to relocate zero chunks, so it _appears_ that it 
didn't do anything, which would be expected on reasonably current kernels 
as they already clean up zero-usage chunks, automatically.  *BUT*...

> legolas:~# btrfs balance start -musage=10 -v /mnt/btrfs_pool1
> Dumping filters: flags 0x6, state 0x0, force is off
>   METADATA (flags 0x2): balancing, usage=10
>   SYSTEM (flags 0x2):  balancing, usage=10
> Done, had to relocate 1 out of 618 chunks

... if it did nothing in the -musage=0 case above, why did the -musage=10 
case fail before, but succeed after?

That's a very good question I don't have an answer to.  Good question for 
the devs and others that actually read code.

Meanwhile, note that if it relocates only a single chunk (of non-zero 
usage), under normal circumstances, it'll take exactly the same amount of 
space as before, because it'd allocate a new chunk of exactly the same 
size as the one it was rewriting.

However, once remaining unallocated space gets tight enough, it starts 
allocating smaller than normal chunks, which may be what happened this 
time.  Presumably that chunk was originally allocated when the filesystem 
still had much more unallocated free space, so it was a standard size 
chunk.  When it was rewritten, unallocated space was much tighter, so a 
smaller chunk would likely be written, which would then be rather fuller 
than it was previously, as it would have the same amount of metadata in 
it, but be a smaller chunk.

And, perhaps partially answering my own question above, the balance with 
-musage=0 somehow triggered a space reevaluation, thus allowing the 
-musage=10 balance to run afterward when it wouldn't before, even tho the 
-musage=0 didn't actually relocate (to /dev/null as they'd be empty, IOW, 
delete) any empty chunks.

But... it still shouldn't happen, as if -musage=0 didn't relocate 
anything, it shouldn't trigger a space reevaluation that -musage=10 
wouldn't trigger on its own, so while this might partially answer what 
happened, it does nothing to explain /why/ it happened.  I'd call it a 
bug in the balance code, as the result of the -musage=10 should be 
exactly the same before and after, because the -musage=0 didn't actually 
relocate/delete anything.

> And now I'm back in business...
> 
> Still, this is a bit disappointing and at the very least very unexpected
> in 4.3.
> 
> legolas:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=604.88GiB, used=520.09GiB
> System, DUP: total=32.00MiB, used=96.00KiB
> Metadata, DUP: total=5.00GiB, used=4.17GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

> legolas:~# btrfs fi show /mnt/btrfs_pool1
> Label: 'btrfs_pool1'  uuid: [...]
> 	Total devices 1 FS bytes used 524.26GiB
>       devid    1 size 615.01GiB used 614.94GiB path /dev/mapper/pool1


As Holger points out, you really are out of unallocated space.

And metadata is 5.00 GiB allocated, 4.17 directly used, plus the global 
reserve (which was recently confirmed on-list to come out of metadata) of 
half a GiB, so 4.17 + 0.50 = 4.67 GiB out of 5.00 used, so while not 
entirely full, you're close enough (under half a GiB free, and it's dup 
so you're under a pair of quarter-GiB metadata chunks free) that large 
operations may fail.
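
The arithmetic above can be written out as a small Python sketch. It
assumes, as stated above, that the global reserve is counted against
metadata; the function name is made up for illustration:

```python
def metadata_headroom(total_gib: float, used_gib: float, global_reserve_gib: float):
    """Return (effective_used, free) in GiB for the metadata allocation.

    Assumption: the global reserve comes out of metadata, so it is added
    to the directly used figure before computing what is left.
    """
    effective_used = used_gib + global_reserve_gib
    free = total_gib - effective_used
    return round(effective_used, 2), round(free, 2)

# Figures from the `btrfs fi df` output quoted above.
print(metadata_headroom(5.00, 4.17, 0.50))  # (4.67, 0.33)
```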

But as Holger also alluded to, you have all sorts of data space available 
(see below for why), with metadata space almost entirely used.

So why were you running -m balances, when -m was basically full but -d 
had some spare room and you actually needed to clear it?  Why weren't you 
doing -dusage=, to clear out those (partially, again, see below) empty 
data chunks, instead of the -musage=, which couldn't do much as metadata 
was pretty much fully used already?

And your command-prompts don't include timestamps so I can't say for 
sure, but presumably those results were AFTER the balance -musage=10 
succeeded and we don't have any pre-balance reports.  It's possible you 
were actually in worse shape before.

Meanwhile, it's worth noting that while current kernel btrfs /does/ 
automatically delete entirely empty chunks now, so -[dm]usage=0 can be 
expected to do nothing as the kernel already does that on its own now, 
thereby fixing the previously most extreme out-of-balance scenarios where 
there's loads of entirely empty chunks lying around, the kernel does 
*not* automatically do balances of _mostly-but-not-entirely_ empty chunks.

Which means that over time, normal usage is still likely to accumulate a 
bunch of say 1-60% full chunks, most likely data, that can still add up 
to tens or even hundreds of gigs of wasted chunk allocations that are 
*not* automatically cleared, because there's still at least *some* usage 
in those chunks.
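
To make the difference concrete, here is a simplified Python model of
which chunks a usage filter would pick. It is a sketch under the
assumption that usage=0 matches only completely empty chunks and any
other value matches chunks strictly below that percentage; the chunk
list is hypothetical and this is not the kernel's code:

```python
from typing import List

def selected_by_usage_filter(chunk_usages_pct: List[int], threshold_pct: int) -> List[int]:
    """Return the chunk usages a `usage=N` filter would pick in this model."""
    if threshold_pct == 0:
        # Only entirely empty chunks, which current kernels already clean up.
        return [u for u in chunk_usages_pct if u == 0]
    return [u for u in chunk_usages_pct if u < threshold_pct]

chunks = [0, 3, 8, 45, 60, 92]  # hypothetical per-chunk usage, in percent
print(selected_by_usage_filter(chunks, 0))   # [0]
print(selected_by_usage_filter(chunks, 10))  # [0, 3, 8]
print(selected_by_usage_filter(chunks, 70))  # [0, 3, 8, 45, 60]
```

The partially used chunks (3%, 8%, 45%, 60% here) are the ones that
stay allocated until someone runs a balance that covers them.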

Of course people leaving old snapshots lying around will exacerbate the 
problem, but even without snapshots, it'll likely still develop, given 
enough time, tho with usage=0 chunks automatically deleted now, it should 
take far longer than it did before.

That explains that data line above, nearly 605 GiB data chunk allocation, 
with only just over 520 GiB actually used, a difference of ~85 GiB.
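
The same subtraction can be done for every line of `btrfs fi df`. A
minimal Python sketch, assuming the output format quoted above (the
regular expression, unit table and function name are illustrative
assumptions):

```python
import re

# One line of `btrfs fi df` as quoted in this thread, e.g.
# "Data, single: total=604.88GiB, used=520.09GiB"
FI_DF_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Za-z]+), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Za-z]+)$"
)
UNIT_TO_GIB = {"B": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0, "TiB": 1024.0}

def slack_gib(fi_df_output: str) -> dict:
    """Map each block group type to allocated-minus-used space, in GiB."""
    result = {}
    for line in fi_df_output.splitlines():
        m = FI_DF_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a fi df line
        total = float(m.group("total")) * UNIT_TO_GIB[m.group("tunit")]
        used = float(m.group("used")) * UNIT_TO_GIB[m.group("uunit")]
        result[m.group("kind")] = round(total - used, 2)
    return result

sample = """\
Data, single: total=604.88GiB, used=520.09GiB
System, DUP: total=32.00MiB, used=96.00KiB
Metadata, DUP: total=5.00GiB, used=4.17GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""
print(slack_gib(sample)["Data"])  # 84.79
```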

While space is pretty tight and you might have to start pretty small (or 
delete a bunch of snapshots or temporarily delete or move off-filesystem 
a bunch of unsnapshotted files, hopefully clearing at least some data 
chunks to usage=0 so they can be cleaned up by the kernel or manually), 
say at -dusage=1, you should be able to get a good portion of that 85 GiB 
back with balance -dusage=, going up to say 70% if necessary, as you may 
have several 70% full chunks that can combine into one or two fewer chunks 
if they're rebalanced.
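
A rough Python sketch of that stepwise approach. The step values are
illustrative assumptions (the text above only suggests starting near 1
and going up to about 70 if necessary), the commands are built but not
executed, and the consolidation estimate assumes equally sized chunks
that repack perfectly:

```python
from typing import List, Sequence

# Assumption: illustrative thresholds, low first so each pass needs little room.
DEFAULT_STEPS = (1, 5, 10, 20, 30, 50, 70)

def dusage_commands(mountpoint: str, steps: Sequence[int] = DEFAULT_STEPS) -> List[List[str]]:
    """Build one `btrfs balance start -dusage=N` command per step (not executed)."""
    return [["btrfs", "balance", "start", f"-dusage={pct}", mountpoint]
            for pct in sorted(steps)]

def chunks_freed(n_chunks: int, usage_pct: int) -> int:
    """Chunks reclaimed if n equally sized chunks at usage_pct repack tightly."""
    needed = -(-n_chunks * usage_pct // 100)  # ceiling division
    return n_chunks - needed

for cmd in dusage_commands("/mnt/btrfs_pool1"):
    print(" ".join(cmd))
print(chunks_freed(4, 70))  # 1: four 70%-full chunks fit into three
```

Starting low matters because each pass has to find room for the chunk
it writes; the early passes free space that the later, larger passes
can then use.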

After that, please try to keep at least 5 or even 10 GiB unallocated, 
doing -dusage= balances while you still have enough room for balance to 
write new chunks, not letting it get so tight.  That's even more critical 
now than it was before, because there's unlikely to be zero-usage chunks 
lying around to balance away getting you out of the tight spot, because 
the kernel now balances those away on its own.

And of course if you do that, you shouldn't run into the scrub ENOSPC 
errors, either. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Thread overview: 10+ messages in thread
2015-12-08 15:06 Scrub: no space left on device Marc MERLIN
2015-12-08 15:37 ` Holger Hoffstätte
2015-12-08 15:46   ` Lionel Bouton
2015-12-08 16:02     ` Holger Hoffstätte
2015-12-08 16:06     ` Marc MERLIN
2015-12-08 16:24       ` Holger Hoffstätte
2015-12-08 16:39         ` Marc MERLIN
2015-12-09  6:46       ` Duncan
2015-12-08 15:39 ` Lionel Bouton
2015-12-08 16:00 ` Austin S Hemmelgarn
