* Is autodefrag recommended?
@ 2017-09-04  9:31 Marat Khalili
  2017-09-04 10:23 ` Henk Slager
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Marat Khalili @ 2017-09-04  9:31 UTC (permalink / raw)
  To: linux-btrfs

Hello list,
good time of the day,

More than once I see mentioned in this list that autodefrag option 
solves problems with no apparent drawbacks, but it's not the default. 
Can you recommend to just switch it on indiscriminately on all 
installations?

I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's 
Ubuntu that gives us this strange choice, no idea why it's not 4.9). 
Only spinning rust here, no SSDs.

--

With Best Regards,
Marat Khalili

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
@ 2017-09-04 10:23 ` Henk Slager
  2017-09-04 10:34 ` Duncan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Henk Slager @ 2017-09-04 10:23 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs

On Mon, Sep 4, 2017 at 11:31 AM, Marat Khalili <mkh@rqc.ru> wrote:
> Hello list,
> good time of the day,
>
> More than once I see mentioned in this list that autodefrag option solves
> problems with no apparent drawbacks, but it's not the default. Can you
> recommend to just switch it on indiscriminately on all installations?

Of course it has drawbacks; what the trade-off is depends on the
use-cases on the filesystem. If the filesystem was created a long time
ago and has 4k leaves, then on HDD you get, over time, excessive
fragmentation and 4k blocks scattered all over the disk for a file
with a lot of random writes (standard CoW for the whole fs), like a
50G vm image: easily 500k extents.

With autodefrag on from the moment of fs creation, most extent sizes
will be 128k or 256k, in that order, and the number of extents for the
same vm image is then roughly 50k. So statistically, the average
extent size is not 4k but 128k, which at least means less free-space
fragmentation (I use SSD caching of the HDD; otherwise even those 50k
extents result in totally unacceptable performance). For the newer
standard 16k leaves it is more or less the same story.

The drawbacks for me are:
1. I use nightly differential snapshotting for backup/replication over
a metered mobile network link, and autodefrag causes a certain amount
of unnecessary fake content difference due to the CoW basis of
send|receive. The amount of extra data volume it causes is still
acceptable; if I were to let the guest OS defragment its fs inside the
vm image, for example, the data volume per day would become
unacceptable.
2. It causes extra HDD activity, so noise, power consumption etc.,
which might be unacceptable for some use-cases.

> I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's Ubuntu
> that gives us this strange choice, no idea why it's not 4.9). Only spinning
> rust here, no SSDs.
Kernel 4.4 is new enough w.r.t. autodefrag, but if you can switch to
4.8 or newer, I would do so.
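
For reference, it is just a mount option; a sketch (the UUID and
mountpoint below are placeholders, not taken from this thread):

```shell
# Persistent: add autodefrag to the fstab options for the filesystem.
#   UUID=0123abcd-...  /data  btrfs  defaults,autodefrag  0  0
# One-off, effective until the next mount:
#   mount -o remount,autodefrag /data
```

Note that enabling the option only affects writes from then on; it
does not defragment data that is already fragmented.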

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
  2017-09-04 10:23 ` Henk Slager
@ 2017-09-04 10:34 ` Duncan
  2017-09-04 11:09   ` Henk Slager
  2017-09-04 10:54 ` Hugo Mills
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Duncan @ 2017-09-04 10:34 UTC (permalink / raw)
  To: linux-btrfs

Marat Khalili posted on Mon, 04 Sep 2017 12:31:54 +0300 as excerpted:

> Hello list,
> good time of the day,
> 
> More than once I see mentioned in this list that autodefrag option 
> solves problems with no apparent drawbacks, but it's not the default. 
> Can you recommend to just switch it on indiscriminately on all 
> installations?
> 
> I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's 
> Ubuntu that gives us this strange choice, no idea why it's not 4.9). 
> Only spinning rust here, no SSDs.

AFAIK autodefrag is recommended in general, but may not be for certain 
specific use-cases.

* Because the mechanism involves watching written files for fragmentation 
and scheduling areas that are too fragmented for later rewrite, if the 
filesystem is operating at near capacity already, adding the extra load 
of the defragmenting rewrites may actually reduce throughput and increase 
latency, at least short term.  (Longer term the additional fragmentation 
from /not/ using it will become a factor and reduce throughput even more.)

* Users just turning autodefrag on after not using it for an extended 
period, thus having an already highly fragmented filesystem, may well see 
a period of higher latencies and lower throughput until the system 
"catches up" and has defragged frequently written-to files.  (This is 
avoided with a policy of always having it on from the first time the 
filesystem is mounted, so it's on at initial filesystem population.)

* As with many issues on a COW-based filesystem such as btrfs, it's the 
frequently rewritten (aka internal-rewrite-pattern) files that are the 
biggest test case.  Files written at once and never rewritten in place 
(for file safety many editors make a temporary copy, fsync it, and then 
atomically replace the original with a rename, so those are not in-place 
rewrites) don't tend to be an issue, unless the filesystem is already 
fragmented enough at write time that the file must be fragmented as it 
is initially written.

In general, this internal-rewrite-pattern is most commonly seen for 
database files and virtual machines, with systemd's journal files likely 
being the most common example of the former -- they're NOT the common 
append-only log file format that so-called "legacy" text-based log files 
tend to be.  Also extremely common are the browser database files used by 
both gecko and webkit based browsers (and browser-based apps such as 
thunderbird).

* Autodefrag works very well when these internal-rewrite-pattern files 
are relatively small, say a quarter GiB or less, but, again with near-
capacity throughput, not necessarily so well with larger databases or VM 
images of a GiB or larger.  (The quarter-gig to gig size is intermediate, 
not as often a problem and not a problem for many, but it can be for 
slower devices, while those on fast ssds may not see a problem until 
sizes reach multiple GiB.)

For larger internal-rewrite-pattern files, again, starting at a gig or so 
depending on device speed as well as rewrite activity, where 
fragmentation AND performance are issues, the NOCOW file attribute may be 
useful, though there are side effects (losing btrfs checksumming and 
compression functionality, interaction with btrfs snapshotting forcing 
a one-time COW after each snapshot, etc).

However, COW-based filesystems in general, including btrfs, are not 
going to perform well with this use-case, and those operating large DBs 
or VMs with performance considerations may find more traditional 
filesystems (or even operating on the bare device, bypassing the 
filesystem layer entirely) a better match for their needs.  I 
personally consider NOCOW an unacceptable compromise, losing many of 
the advantages that make btrfs so nice in general, so IMO it's then 
better to just use a different filesystem better suited to that 
use-case.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
  2017-09-04 10:23 ` Henk Slager
  2017-09-04 10:34 ` Duncan
@ 2017-09-04 10:54 ` Hugo Mills
  2017-09-05 11:45   ` Austin S. Hemmelgarn
  2017-09-05 12:36 ` A L
  2017-09-05 14:01 ` Is autodefrag recommended? -- re-duplication??? Marat Khalili
  4 siblings, 1 reply; 12+ messages in thread
From: Hugo Mills @ 2017-09-04 10:54 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1057 bytes --]

On Mon, Sep 04, 2017 at 12:31:54PM +0300, Marat Khalili wrote:
> Hello list,
> good time of the day,
> 
> More than once I see mentioned in this list that autodefrag option
> solves problems with no apparent drawbacks, but it's not the
> default. Can you recommend to just switch it on indiscriminately on
> all installations?
> 
> I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's
> Ubuntu that gives us this strange choice, no idea why it's not 4.9).
> Only spinning rust here, no SSDs.

   autodefrag effectively works by taking a small region around every
write or cluster of writes and making that into a stand-alone extent.

   This has two consequences:

 - You end up duplicating more data than is strictly necessary. This
   is, IIRC, something like 128 KiB for a write.

 - There's an I/O overhead for enabling autodefrag, because it's
   increasing the amount of data written.
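
   A back-of-the-envelope sketch of that second point (my own
illustration, assuming the ~128 KiB figure above; not exact btrfs
behaviour):

```python
KIB = 1024

def write_amplification(write_size, region=128 * KIB):
    """Worst-case factor by which autodefrag inflates the data written,
    assuming each isolated write rewrites a region-sized area."""
    return max(write_size, region) / write_size

print(write_amplification(4 * KIB))    # 32.0 for isolated 4 KiB writes
print(write_amplification(256 * KIB))  # 1.0: large writes unaffected
```

For large or sequential writes the overhead vanishes; it is the small
random writes that pay.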

   Hugo.

-- 
Hugo Mills             | The future isn't what it used to be.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04 10:34 ` Duncan
@ 2017-09-04 11:09   ` Henk Slager
  2017-09-04 22:27     ` Duncan
  0 siblings, 1 reply; 12+ messages in thread
From: Henk Slager @ 2017-09-04 11:09 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Mon, Sep 4, 2017 at 12:34 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> * Autodefrag works very well when these internal-rewrite-pattern files
> are relatively small, say a quarter GiB or less, but, again with near-
> capacity throughput, not necessarily so well with larger databases or VM
> images of a GiB or larger.  (The quarter-gig to gig size is intermediate,
> not as often a problem and not a problem for many, but it can be for
> slower devices, while those on fast ssds may not see a problem until
> sizes reach multiple GiB.)

I have seen you stating this before, about some quarter-GiB file size
or so, but it is irrelevant; it is simply not how it works. See Hugo's
explanation of how it works. I can post/store an actual filefrag
output of a vm image that has been around for 2 years on one of my
btrfs filesystems; then you can do some statistics on it and see from
there how it works.
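
For anyone wanting to redo those statistics, a hedged sketch of
pulling extent sizes out of `filefrag -v` output (the sample lines are
invented, and the column layout is assumed to match current
e2fsprogs):

```python
import re

# Invented sample in the `filefrag -v` format (lengths are in fs blocks).
SAMPLE = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:      34816..     34847:     32:
   1:       32..      63:      70656..     70687:     32:             shared
   2:       64..      95:     112640..    112671:     32:
"""

def extent_lengths(filefrag_output):
    """Extent lengths in filesystem blocks from `filefrag -v` output."""
    pat = re.compile(r"\s*\d+:\s*\d+\.\.\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+):")
    return [int(m.group(1))
            for line in filefrag_output.splitlines()
            if (m := pat.match(line))]

lens = extent_lengths(SAMPLE)
print(len(lens), sum(lens) // len(lens))  # 3 extents, 32 blocks average
```

At 4k blocks, 32-block extents are the 128k that autodefrag tends to
produce; a histogram over a real image shows the distribution.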

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04 11:09   ` Henk Slager
@ 2017-09-04 22:27     ` Duncan
  0 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2017-09-04 22:27 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Mon, 04 Sep 2017 13:09:24 +0200 as excerpted:

> On Mon, Sep 4, 2017 at 12:34 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> * Autodefrag works very well when these internal-rewrite-pattern files
>> are relatively small, say a quarter GiB or less, but, again with near-
>> capacity throughput, not necessarily so well with larger databases or
>> VM images of a GiB or larger.  (The quarter-gig to gig size is
>> intermediate,
>> not as often a problem and not a problem for many, but it can be for
>> slower devices, while those on fast ssds may not see a problem until
>> sizes reach multiple GiB.)
> 
> I have seen you stating this before about some quarter GiB filesize or
> so, but it is irrelevant, it is simply not how it works. See explanation
> of Hugo for how it works. I can post/store an actual filefrag output of
> a vm image that is around for 2 years on the one of my btrfs fs, then
> you can do some statistics on it and see from there how it works.

FWIW...

I believe it did work that way (whole-file autodefrag) at one point.  
Back in the early kernel 3.x era at least, we had complaints about 
autodefrag performance with larger internal-rewrite-pattern files, 
where the larger the file the worse the performance, and the 
documentation mentioned something about being appropriate for small 
files but less so for large files as well.

But I also believe you're correct that it no longer works that way (if it 
ever did, maybe the complaints were due to some unrelated side effect, in 
any case I've not seen any for quite some time now), and hasn't since 
before anything we're still trying to reasonably support on this list 
(IOW, back two LTS kernel series ago, so to 4.4).

So I should drop the size factor, or mention that it's not nearly the 
problem it once was, at least.

Thanks for forcing the reckoning. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04 10:54 ` Hugo Mills
@ 2017-09-05 11:45   ` Austin S. Hemmelgarn
  2017-09-05 12:49     ` Henk Slager
  0 siblings, 1 reply; 12+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-05 11:45 UTC (permalink / raw)
  To: Hugo Mills, Marat Khalili, linux-btrfs

On 2017-09-04 06:54, Hugo Mills wrote:
> On Mon, Sep 04, 2017 at 12:31:54PM +0300, Marat Khalili wrote:
>> Hello list,
>> good time of the day,
>>
>> More than once I see mentioned in this list that autodefrag option
>> solves problems with no apparent drawbacks, but it's not the
>> default. Can you recommend to just switch it on indiscriminately on
>> all installations?
>>
>> I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's
>> Ubuntu that gives us this strange choice, no idea why it's not 4.9).
>> Only spinning rust here, no SSDs.
> 
>     autodefrag effectively works by taking a small region around every
> write or cluster of writes and making that into a stand-alone extent.
I was under the impression that it had some kind of 'random access' 
detection heuristic, and only triggered if that flagged the write 
patterns as 'random'.
> 
>     This has two consequences:
> 
>   - You end up duplicating more data than is strictly necessary. This
>     is, IIRC, something like 128 KiB for a write.
FWIW, I'm pretty sure you can mitigate this first issue by running a 
regular defrag on a semi-regular basis (monthly is what I would 
probably suggest).
> 
>   - There's an I/O overhead for enabling autodefrag, because it's
>     increasing the amount of data written.
And this second issue may not be much of one in practice.  The region 
being rewritten gets written out sequentially, so it will increase the 
amount of data written, but in most cases probably won't increase IO 
request counts to the device by much.  If you care mostly about raw 
bandwidth, then this could still have an impact, but if you care about 
IOPS, it probably won't have much impact unless you're already running 
the device at peak capacity.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
                   ` (2 preceding siblings ...)
  2017-09-04 10:54 ` Hugo Mills
@ 2017-09-05 12:36 ` A L
  2017-09-05 14:01 ` Is autodefrag recommended? -- re-duplication??? Marat Khalili
  4 siblings, 0 replies; 12+ messages in thread
From: A L @ 2017-09-05 12:36 UTC (permalink / raw)
  To: linux-btrfs

There is a drawback in that defragmentation re-duplicates data that was previously deduplicated or shared between snapshots/subvolumes.

---- From: Marat Khalili <mkh@rqc.ru> -- Sent: 2017-09-04 - 11:31 ----

> Hello list,
> good time of the day,
> 
> More than once I see mentioned in this list that autodefrag option 
> solves problems with no apparent drawbacks, but it's not the default. 
> Can you recommend to just switch it on indiscriminately on all 
> installations?
> 
> I'm currently on kernel 4.4, can switch to 4.10 if necessary (it's 
> Ubuntu that gives us this strange choice, no idea why it's not 4.9). 
> Only spinning rust here, no SSDs.
> 
> --
> 
> With Best Regards,
> Marat Khalili
> --



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-05 11:45   ` Austin S. Hemmelgarn
@ 2017-09-05 12:49     ` Henk Slager
  2017-09-05 13:00       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 12+ messages in thread
From: Henk Slager @ 2017-09-05 12:49 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On Tue, Sep 5, 2017 at 1:45 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>>   - You end up duplicating more data than is strictly necessary. This
>>     is, IIRC, something like 128 KiB for a write.
>
> FWIW, I'm pretty sure you can mitigate this first issue by running a regular
> defrag on a semi-regular basis (monthly is what I would probably suggest).

No, both autodefrag and regular defrag duplicate data, so if you keep
snapshots around for weeks or months, it can eat up a significant
amount of space.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended?
  2017-09-05 12:49     ` Henk Slager
@ 2017-09-05 13:00       ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 12+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-05 13:00 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

On 2017-09-05 08:49, Henk Slager wrote:
> On Tue, Sep 5, 2017 at 1:45 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
>>>    - You end up duplicating more data than is strictly necessary. This
>>>      is, IIRC, something like 128 KiB for a write.
>>
>> FWIW, I'm pretty sure you can mitigate this first issue by running a regular
>> defrag on a semi-regular basis (monthly is what I would probably suggest).
> 
> No, both autodefrag and regular defrag duplicate data, so if you keep
> snapshots around for weeks or months, it can eat up a significant
> amount of space.
> 
I'm not talking about data duplication due to broken reflinks, I'm 
talking about data duplication due to how partial extent rewrites are 
handled in BTRFS.

As a more illustrative example, suppose you've got a 256k file that has 
just one extent.  Such a file will require 256k of space for the data.  
Now rewrite the range from 128k to 192k.  The file now technically takes 
up 320k, because the region you rewrote is still allocated in the 
original extent.

I know that sub-extent-size reflinks are handled like this: in the 
above example, if you instead use the CLONE ioctl to create a new file 
reflinking that range, then delete the original, the remaining 192k of 
space in the extent ends up unreferenced, but is kept around until the 
referenced region is no longer referenced (and the easiest way to 
ensure that is to either rewrite the whole file or defragment it).  
I'm pretty sure from reading the code that mid-extent writes are 
handled this way too, in which case a full defrag can reclaim that 
space.
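
A toy model of the arithmetic in that example (an illustration of the
bookkeeping as described, not btrfs code):

```python
KIB = 1024

# Original file: one fully referenced 256 KiB extent.
extents = [256 * KIB]

# Rewriting the 128k..192k range allocates a new 64 KiB extent, while
# the old extent stays fully allocated because parts of it (0..128k
# and 192k..256k) are still referenced.
extents.append(64 * KIB)

allocated = sum(extents)
print(allocated // KIB)  # 320
```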

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended? -- re-duplication???
  2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
                   ` (3 preceding siblings ...)
  2017-09-05 12:36 ` A L
@ 2017-09-05 14:01 ` Marat Khalili
  2017-09-05 14:39   ` Hugo Mills
  4 siblings, 1 reply; 12+ messages in thread
From: Marat Khalili @ 2017-09-05 14:01 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Henk Slager, Duncan, Hugo Mills, Austin S. Hemmelgarn, Henk Slager, A L

Dear experts,

At first my reaction to just switching autodefrag on was positive, but 
the mentions of re-duplication are very scary. The main use of BTRFS 
here is backup snapshots, so re-duplication would be disastrous.

To stick to a concrete example, let there be two files, 4KB and 4GB in 
size, each referenced in 100 read-only snapshots, and suppose some 4KB 
of both files is rewritten each night and then another snapshot is 
created (let's ignore snapshot deletion here). AFAIU, 8KB of additional 
space (+metadata) will be allocated each night without autodefrag. With 
autodefrag will it perhaps be 4KB+128KB, or something much worse?

--

With Best Regards,
Marat Khalili


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Is autodefrag recommended? -- re-duplication???
  2017-09-05 14:01 ` Is autodefrag recommended? -- re-duplication??? Marat Khalili
@ 2017-09-05 14:39   ` Hugo Mills
  0 siblings, 0 replies; 12+ messages in thread
From: Hugo Mills @ 2017-09-05 14:39 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs, Henk Slager, Duncan, Austin S. Hemmelgarn, A L

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

On Tue, Sep 05, 2017 at 05:01:10PM +0300, Marat Khalili wrote:
> Dear experts,
> 
> At first reaction to just switching autodefrag on was positive, but
> mentions of re-duplication are very scary. Main use of BTRFS here is
> backup snapshots, so re-duplication would be disastrous.
> 
> In order to stick to concrete example, let there be two files, 4KB
> and 4GB in size, referenced in read-only snapshots 100 times each,
> and some 4KB of both files are rewritten each night and then another
> snapshot is created (let's ignore snapshots deletion here). AFAIU
> 8KB of additional space (+metadata) will be allocated each night
> without autodefrag. With autodefrag will it be perhaps 4KB+128KB or
> something much worse?

   I'm going for 132 KiB (4+128).

   Of course, if there are two 4 KiB writes close together, then
there's less overhead, as they'll share the range.
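
   A sketch of that arithmetic (the 128 KiB region is the figure from
earlier in the thread; capping at the file size for the 4 KiB file is
my own assumption):

```python
KIB = 1024
GIB = 1024 * 1024 * KIB

def nightly_allocation(file_size, write=4 * KIB, region=128 * KIB,
                       autodefrag=True):
    """New space allocated by one isolated small rewrite of the file."""
    if not autodefrag:
        return write
    # assumption: the rewritten region cannot exceed the file itself
    return min(file_size, write + region)

print(nightly_allocation(4 * GIB) // KIB)                    # 132
print(nightly_allocation(4 * KIB) // KIB)                    # 4
print(nightly_allocation(4 * GIB, autodefrag=False) // KIB)  # 4
```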

   Hugo.

-- 
Hugo Mills             | Once is happenstance; twice is coincidence; three
hugo@... carfax.org.uk | times is enemy action.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-09-05 14:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-04  9:31 Is autodefrag recommended? Marat Khalili
2017-09-04 10:23 ` Henk Slager
2017-09-04 10:34 ` Duncan
2017-09-04 11:09   ` Henk Slager
2017-09-04 22:27     ` Duncan
2017-09-04 10:54 ` Hugo Mills
2017-09-05 11:45   ` Austin S. Hemmelgarn
2017-09-05 12:49     ` Henk Slager
2017-09-05 13:00       ` Austin S. Hemmelgarn
2017-09-05 12:36 ` A L
2017-09-05 14:01 ` Is autodefrag recommended? -- re-duplication??? Marat Khalili
2017-09-05 14:39   ` Hugo Mills
