* Query about proposed dedup patches and behaviours
@ 2016-01-14 16:13 James Hogarth
  2016-01-14 16:46 ` Austin S. Hemmelgarn
  2016-01-23 22:11 ` Query about proposed dedup patches and behaviours Mark Fasheh
  0 siblings, 2 replies; 15+ messages in thread
From: James Hogarth @ 2016-01-14 16:13 UTC (permalink / raw)
  To: linux-btrfs

Hi,

The duperemove[1] tool is in the process of being packaged for Fedora at
present, but I was wondering what future it may have with the 4.5
dedup patches being proposed.

Will the btrfs command have the ability to out-of-line dedup files
similar to duperemove (thus negating the need for it) or will this
only control in-line dedup with a tool like duperemove still being
required for periodic only (or restricted path) dedup?

To avoid memory usage bloat, if the btrfs command can order dedup of X
files on the path correctly, can it be passed a path to carry the hash
map in some form (similar to how duperemove can use sqlite for this),
or is this another use case for the external tool?

Finally what's the present situation with regards to defragmentation
and deduplication? Is it safe to turn on autodefrag now when using
snapshots and duperemove? What should the behaviour be with the
proposed 4.5 dedup patches if both inline dedup and autodefrag are
enabled as mount options?

Cheers,

James

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1244678


* Re: Query about proposed dedup patches and behaviours
  2016-01-14 16:13 Query about proposed dedup patches and behaviours James Hogarth
@ 2016-01-14 16:46 ` Austin S. Hemmelgarn
  2016-01-14 19:26   ` Liu Bo
  2016-01-23 22:11 ` Query about proposed dedup patches and behaviours Mark Fasheh
  1 sibling, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-14 16:46 UTC (permalink / raw)
  To: James Hogarth, linux-btrfs

On 2016-01-14 11:13, James Hogarth wrote:
> Hi,
>
> The duperemove[1] tool is in the process of being packaged for Fedora at
> present, but I was wondering what future it may have with the 4.5
> dedup patches being proposed.
>
> Will the btrfs command have the ability to out-of-line dedup files
> similar to duperemove (thus negating the need for it) or will this
> only control in-line dedup with a tool like duperemove still being
> required for periodic only (or restricted path) dedup?
Unless I'm horribly misreading the code, the regular btrfs-progs will 
not be adding the ability to do out-of-band deduplication.  It may at 
some point add a shortcut for the required ioctl to be used from 
scripts, but that's probably unlikely.
>
> To avoid memory usage bloat, if the btrfs command can order dedup of X
> files on the path correctly, can it be passed a path to carry the hash
> map in some form (similar to how duperemove can use sqlite for this),
> or is this another use case for the external tool?
This shouldn't be an issue for in-line deduplication, as that's handled 
in the kernel.
>
> Finally what's the present situation with regards to defragmentation
> and deduplication? Is it safe to turn on autodefrag now when using
> snapshots and duperemove? What should the behaviour be with the
> proposed 4.5 dedup patches if both inline dedup and autodefrag are
> enabled as mount options?
I'm not entirely certain how deduplication would interact with any form 
of defragmentation.  I'm pretty certain though that autodefrag does 
properly handle snapshots, such that the reflinks aren't broken, and 
it's the original copy that gets any shared extents defragmented into it.



* Re: Query about proposed dedup patches and behaviours
  2016-01-14 16:46 ` Austin S. Hemmelgarn
@ 2016-01-14 19:26   ` Liu Bo
  2016-01-14 19:41     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Liu Bo @ 2016-01-14 19:26 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: James Hogarth, linux-btrfs

On Thu, Jan 14, 2016 at 11:46:33AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-01-14 11:13, James Hogarth wrote:
> >Hi,
> >
> >The duperemove[1] tool is in the process of being packaged for Fedora at
> >present, but I was wondering what future it may have with the 4.5
> >dedup patches being proposed.
> >
> >Will the btrfs command have the ability to out-of-line dedup files
> >similar to duperemove (thus negating the need for it) or will this
> >only control in-line dedup with a tool like duperemove still being
> >required for periodic only (or restricted path) dedup?
> Unless I'm horribly misreading the code, the regular btrfs-progs will not be
> adding the ability to do out-of-band deduplication.  It may at some point
> add a shortcut for the required ioctl to be used from scripts, but that's
> probably unlikely.
> >
> >To avoid memory usage bloat, if the btrfs command can order dedup of X
> >files on the path correctly, can it be passed a path to carry the hash
> >map in some form (similar to how duperemove can use sqlite for this),
> >or is this another use case for the external tool?
> This shouldn't be an issue for in-line deduplication, as that's handled in
> the kernel.
> >
> >Finally what's the present situation with regards to defragmentation
> >and deduplication? Is it safe to turn on autodefrag now when using
> >snapshots and duperemove? What should the behaviour be with the
> >proposed 4.5 dedup patches if both inline dedup and autodefrag are
> >enabled as mount options?
> I'm not entirely certain how deduplication would interact with any form of
> defragmentation.  I'm pretty certain though that autodefrag does properly
> handle snapshots, such that the reflinks aren't broken, and it's the
> original copy that gets any shared extents defragmented into it.

If it refers to snapshot-aware defrag, it's been disabled, so now btrfs
will not maintain reflinks between snapshots.

Thanks,

-liubo


* Re: Query about proposed dedup patches and behaviours
  2016-01-14 19:26   ` Liu Bo
@ 2016-01-14 19:41     ` Austin S. Hemmelgarn
  2016-01-15  1:47       ` Duncan
  2016-01-20 15:33       ` Interjection: autodefrag mount option aye, nae? Al
  0 siblings, 2 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-14 19:41 UTC (permalink / raw)
  To: bo.li.liu; +Cc: James Hogarth, linux-btrfs

On 2016-01-14 14:26, Liu Bo wrote:
> On Thu, Jan 14, 2016 at 11:46:33AM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-01-14 11:13, James Hogarth wrote:
>>> Hi,
>>>
>>> The duperemove[1] tool is in the process of being packaged for Fedora at
>>> present, but I was wondering what future it may have with the 4.5
>>> dedup patches being proposed.
>>>
>>> Will the btrfs command have the ability to out-of-line dedup files
>>> similar to duperemove (thus negating the need for it) or will this
>>> only control in-line dedup with a tool like duperemove still being
>>> required for periodic only (or restricted path) dedup?
>> Unless I'm horribly misreading the code, the regular btrfs-progs will not be
>> adding the ability to do out-of-band deduplication.  It may at some point
>> add a shortcut for the required ioctl to be used from scripts, but that's
>> probably unlikely.
>>>
>>> To avoid memory usage bloat, if the btrfs command can order dedup of X
>>> files on the path correctly, can it be passed a path to carry the hash
>>> map in some form (similar to how duperemove can use sqlite for this),
>>> or is this another use case for the external tool?
>> This shouldn't be an issue for in-line deduplication, as that's handled in
>> the kernel.
>>>
>>> Finally what's the present situation with regards to defragmentation
>>> and deduplication? Is it safe to turn on autodefrag now when using
>>> snapshots and duperemove? What should the behaviour be with the
>>> proposed 4.5 dedup patches if both inline dedup and autodefrag are
>>> enabled as mount options?
>> I'm not entirely certain how deduplication would interact with any form of
>> defragmentation.  I'm pretty certain though that autodefrag does properly
>> handle snapshots, such that the reflinks aren't broken, and it's the
>> original copy that gets any shared extents defragmented into it.
>
> If it refers to snapshot-aware defrag, it's been disabled, so now btrfs
> will not maintain reflinks between snapshots.
>
I was under the impression that autodefrag had been done separately from 
the snapshot-aware manually triggered defrag, and that it's always been 
snapshot aware.



* Re: Query about proposed dedup patches and behaviours
  2016-01-14 19:41     ` Austin S. Hemmelgarn
@ 2016-01-15  1:47       ` Duncan
  2016-01-15  9:33         ` James Hogarth
  2016-01-20 15:33       ` Interjection: autodefrag mount option aye, nae? Al
  1 sibling, 1 reply; 15+ messages in thread
From: Duncan @ 2016-01-15  1:47 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Thu, 14 Jan 2016 14:41:27 -0500 as
excerpted:

> On 2016-01-14 14:26, Liu Bo wrote:
>> On Thu, Jan 14, 2016 at 11:46:33AM -0500, Austin S. Hemmelgarn wrote:
>>> On 2016-01-14 11:13, James Hogarth wrote:

>>>> Finally what's the present situation with regards to defragmentation
>>>> and deduplication? Is it safe to turn on autodefrag now when using
>>>> snapshots and duperemove? What should the behaviour be with the
>>>> proposed 4.5 dedup patches if both inline dedup and autodefrag are
>>>> enabled as mount options?

>>> I'm not entirely certain how deduplication would interact with any
>>> form of defragmentation.  I'm pretty certain though that autodefrag
>>> does properly handle snapshots, such that the reflinks aren't broken,
>>> and it's the original copy that gets any shared extents defragmented
>>> into it.
>>
>> If it refers to snapshot-aware defrag, it's been disabled, so now btrfs
>> will not maintain reflinks between snapshots.
>>
> I was under the impression that autodefrag had been done separately from
> the snapshot-aware manually triggered defrag, and that it's always been
> snapshot aware.

Hugo should really explain as he was the one that said that, but upon 
looking into it, he found that while he was correct in a sense, his 
reasoning was a bit narrow, and autodefrag isn't snapshot aware in the 
wider context.

Without attempting to explain his reasoning as I think I sort of 
understand it but not well enough to try to explain, autodefrag isn't 
snapshot aware and will break reflinks, but due to $reasons, autodefrag's 
damage to reflinking apparently isn't as bad as manual defrag.

That's the best I can do to explain the situation.  In general, 
autodefrag remains bad for reflinks, but apparently not h***-bad, as 
manual defrag is.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Query about proposed dedup patches and behaviours
  2016-01-15  1:47       ` Duncan
@ 2016-01-15  9:33         ` James Hogarth
  2016-01-15 12:18           ` Duncan
  0 siblings, 1 reply; 15+ messages in thread
From: James Hogarth @ 2016-01-15  9:33 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On 15 January 2016 at 01:47, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Hugo should really explain as he was the one that said that, but upon
> looking into it, he found that while he was correct in a sense, his
> reasoning was a bit narrow, and autodefrag isn't snapshot aware in the
> wider context.
>
> Without attempting to explain his reasoning as I think I sort of
> understand it but not well enough to try to explain, autodefrag isn't
> snapshot aware and will break reflinks, but due to $reasons, autodefrag's
> damage to reflinking apparently isn't as bad as manual defrag.
>
> That's the best I can do to explain the situation.  In general,
> autodefrag remains bad for reflinks, but apparently not h***-bad, as
> manual defrag is.
>

As I recall, autodefrag will break the reflink to pretty much the same
extent as if you had just started writing to each instance.

http://article.gmane.org/gmane.comp.file-systems.btrfs/51441

Looking through the patches again, I see that Qu has indeed already
looked at an on-disk hash rather than an in-memory one, so that relieves
my memory-bloat concerns.

http://thread.gmane.org/gmane.comp.file-systems.btrfs/52215

It does appear that btrfs-progs is only being extended to enable or
disable dedup on a whole pool rather than to dedup X files

http://news.gmane.org/find-root.php?message_id=1452751070%2d2460%2d3%2dgit%2dsend%2demail%2dquwenruo%40cn.fujitsu.com

I suppose that one could in principle target a btrfs balance to
particular extents after enabling dedupe on the pool in order to try
and target particular files but that seems rather cumbersome, and if
wanting to dedup an entire pool then enabling the feature followed by
a full balance ought to do it.

So I see two things out of this:

1) At least a note in the man page (or preferably in the command output
as well) reminding users that autodefrag will, to an extent, work
against dedupe (and it may be worth testing the effect of having both
enabled and, if it is poor, preventing one whilst the other is active).

2) Qu is there any intention to be able to do btrfs dedup
/path1../pathN or is the intention for this work only to enable
in-band across an entire pool (less any files with the proposed
attribute changed to say nodedup)?

If there is no intention for 2 then the duperemove packaging is still
worthwhile to carry out.


* Re: Query about proposed dedup patches and behaviours
  2016-01-15  9:33         ` James Hogarth
@ 2016-01-15 12:18           ` Duncan
  0 siblings, 0 replies; 15+ messages in thread
From: Duncan @ 2016-01-15 12:18 UTC (permalink / raw)
  To: linux-btrfs

James Hogarth posted on Fri, 15 Jan 2016 09:33:44 +0000 as excerpted:

> On 15 January 2016 at 01:47, Duncan <1i5t5.duncan@cox.net> wrote:
>>
>> Hugo should really explain as he was the one that said that, but 
>> [...]  In general, autodefrag remains bad for reflinks, but
>> apparently not h***-bad, as manual defrag is.
>>
> As I recall, autodefrag will break the reflink to pretty much the same
> extent as if you had just started writing to each instance.
> 
> http://article.gmane.org/gmane.comp.file-systems.btrfs/51441

That's it, yes.  Thanks.  =:^)

I think I had read it but hadn't actually explained it to anyone myself 
yet, which tends to solidify it in my mind, so forgot enough of the 
detail that I couldn't easily do so.  Let's see if the below explanation 
solves that for next time.  =:^)

Tho just writing to the file would normally only copy the 4096-byte 
block, while autodefrag will check how fragmented the file is around that 
block, and if the extents are small enough to trigger a defrag, it'll 
rewrite rather more of the file into a (hopefully) larger single extent.

So autodefrag will break reflinks to a rather larger extent (literally, 
file extent) than will writing to an individual block within a file, but 
(on a reasonably large file, say 100-MiB scale) it should still be a much 
smaller effect (breaking reflinks for a rather smaller part of the file) 
than defragging the entire file, which is what a manual defrag would do.

And as Hugo said, of course if you're rewriting most of the file, it's 
likely all or almost all the file will be reflink-broken, but that would 
be expected anyway, if you're rewriting the file.

So as I said, autodefrag is a bit bad for reflinks, yes, but not h***-bad 
for them, as manual defrag is.
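
To put rough numbers on that comparison, here is a minimal sketch in
Python.  It is only a toy model: the 128 KiB window that autodefrag is
assumed to rewrite on each side of a write is a made-up figure chosen for
illustration, not a value taken from the kernel, and the function name is
hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
BLOCK = 4 * KIB  # the 4096-byte block size mentioned above


def unshared_bytes(file_size, write_offset, write_len, defrag_window=0):
    """Bytes that stop being shared after one in-place write.

    defrag_window is a hypothetical number of bytes rewritten on each
    side of the write; 0 models plain COW with no defrag at all.
    """
    # Plain COW rewrites whole blocks, so round the write out to block edges.
    start = (write_offset // BLOCK) * BLOCK
    end = -(-(write_offset + write_len) // BLOCK) * BLOCK
    # Widen by the assumed window, clamped to the file boundaries.
    start = max(0, start - defrag_window)
    end = min(file_size, end + defrag_window)
    return end - start


size = 100 * MIB
cow_only = unshared_bytes(size, 10 * MIB, 100)
autodefrag = unshared_bytes(size, 10 * MIB, 100, defrag_window=128 * KIB)
manual = size  # a whole-file defrag rewrites every block

print(cow_only, autodefrag, manual)  # 4096 266240 104857600
```

Under these assumptions a single small write unshares 4 KiB on its own,
about 260 KiB with the assumed autodefrag window, and the full 100 MiB
after a whole-file defrag, which is the ordering described above.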


> It does appear that btrfs-progs is only being extended to enable or
> disable dedup on a whole pool rather than to dedup X files

> So I see two things out of this:
> 
> 1) At least a note in the man page (or preferably in the command output
> as well) reminding users that autodefrag will, to an extent, work
> against dedupe (and it may be worth testing the effect of having both
> enabled and, if it is poor, preventing one whilst the other is active).

Agreed, a manpage (and wiki mount options page) note explaining that 
autodefrag can partially undo dedup's work, would be useful.

> 2) Qu is there any intention to be able to do btrfs dedup /path1../pathN
> or is the intention for this work only to enable in-band across an
> entire pool (less any files with the proposed attribute changed to say
> nodedup)?
> 
> If there is no intention for 2 then the duperemove packaging is still
> worthwhile to carry out.

Previous discussion has made plain that this is /inline/ dedup, write new 
data, and it's compared against existing data (hashes or the like) to see 
if part or all of it can be reflinked instead of written separately.
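
As a rough illustration of that write path, the following Python sketch
models a block store that hashes each incoming block and reuses an
existing block when the hash is already known.  It is a toy, not the
proposed kernel code: the class and attribute names are hypothetical, and
trusting the hash alone (with no byte-for-byte comparison) is a
simplification made here.

```python
import hashlib

BLOCK = 4096


class ToyStore:
    """Toy model of in-band dedup: new data is hashed as it is written."""

    def __init__(self):
        self.blocks = []    # physical blocks actually stored
        self.by_hash = {}   # digest -> index into self.blocks
        self.files = {}     # file name -> list of block indexes

    def write(self, name, data):
        refs = []
        for off in range(0, len(data), BLOCK):
            chunk = data[off:off + BLOCK]
            digest = hashlib.sha256(chunk).digest()
            idx = self.by_hash.get(digest)
            if idx is None:
                # Unseen content: store it and remember its hash.
                idx = len(self.blocks)
                self.blocks.append(chunk)
                self.by_hash[digest] = idx
            # Known content: only a reference is added (the "reflink").
            refs.append(idx)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.blocks[i] for i in self.files[name])


store = ToyStore()
store.write("a", b"x" * BLOCK + b"y" * BLOCK)
store.write("b", b"x" * BLOCK + b"z" * BLOCK)
print(len(store.blocks))  # 3 physical blocks back 4 logical ones
```

Note that nothing in this model revisits data that was written before
the hash table knew about it, which is the gap an out-of-line tool fills.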

And while not yet part of the patches, a per-file nodedup property is 
intended as well, which if set, will mean the file isn't included as 
existing data when the comparison of new data against existing data is 
made.

As such, there will indeed be no way to specifically dedup one already 
existing file against another (unless of course you rewrite it, 
triggering the inline dedup), which is precisely where (separate) out-of-
line dedup comes in.

So yes, duperemove, as one option for that separate out-of-line dedup, 
will still be worthwhile, with the two functionalities complementing each 
other.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Interjection: autodefrag mount option aye, nae?
  2016-01-14 19:41     ` Austin S. Hemmelgarn
  2016-01-15  1:47       ` Duncan
@ 2016-01-20 15:33       ` Al
  2016-01-20 15:39         ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 15+ messages in thread
From: Al @ 2016-01-20 15:33 UTC (permalink / raw)
  To: linux-btrfs

[very quietly] I've had autodefrag out of my mount options for a long while
now. Is that still the recommended position?



* Re: Interjection: autodefrag mount option aye, nae?
  2016-01-20 15:33       ` Interjection: autodefrag mount option aye, nae? Al
@ 2016-01-20 15:39         ` Austin S. Hemmelgarn
  2016-01-20 18:39           ` Duncan
  2016-01-21 20:59           ` Kai Krakow
  0 siblings, 2 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-20 15:39 UTC (permalink / raw)
  To: Al, linux-btrfs

On 2016-01-20 10:33, Al wrote:
> [very quietly] I've had autodefrag out of my mount options for a long while
> now. Is that still the recommended position?
I think it really depends on what you're doing.  In my case, I usually 
have it on, and the only issue I've ever seen is that Chrome sometimes 
loads pages from its local cache more slowly than it should.  I don't use 
ridiculous numbers of snapshots either (I use them only to get a stable 
view of the filesystem when generating a backup), so I don't have much 
experience with how they interact with autodefrag.


* Re: Interjection: autodefrag mount option aye, nae?
  2016-01-20 15:39         ` Austin S. Hemmelgarn
@ 2016-01-20 18:39           ` Duncan
  2016-01-21 20:59           ` Kai Krakow
  1 sibling, 0 replies; 15+ messages in thread
From: Duncan @ 2016-01-20 18:39 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Wed, 20 Jan 2016 10:39:58 -0500 as
excerpted:

> On 2016-01-20 10:33, Al wrote:
>> [very quietly] I've had autodefrag out of my mount options for a long
>> while now. Is that still the recommended position?

> I think it really depends on what you're doing.  In my case, I usually
> have it on, and the only issue I've ever seen is that Chrome sometimes
> loads pages from local cache slower than it should be.  I also don't use
> ridiculous numbers of snapshots either (I use them only to get a stable
> view of the filesystem when generating a backup), so I don't have much
> experience with how they interact with autodefrag.

I use autodefrag here too.

The situations where autodefrag won't make sense are going to be ones 
where people are doing large files (half-gig plus) with heavy rewrites -- 
typically large database and VM image files.  Those need other measures, 
generally nocow, lower snapshotting frequencies, and periodic manual 
defrag.

Autodefrag with heavy snapshotting is more of an open question, as would 
be autodefrag on SSD.  I'd personally argue that the benefits of 
autodefrag exceed the down sides in these cases, but can easily see how 
some may argue otherwise, so it's admin's call, after suitable testing if 
they care enough about it to do that.

Autodefrag is definitely recommended for "desktop" usage (particularly on 
non-ssd), however, where the largest random-rewrite-pattern files are the 
smaller (typically under quarter GiB) sqlite type databases common to 
firefox/chrome/thunderbird/evolution/etc, as that's where autodefrag does 
its best.

My typical usage is pretty close to this "desktop" usage, tho I am on 
SSD, so I use autodefrag.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Interjection: autodefrag mount option aye, nae?
  2016-01-20 15:39         ` Austin S. Hemmelgarn
  2016-01-20 18:39           ` Duncan
@ 2016-01-21 20:59           ` Kai Krakow
  2016-01-22 12:14             ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 15+ messages in thread
From: Kai Krakow @ 2016-01-21 20:59 UTC (permalink / raw)
  To: linux-btrfs

On Wed, 20 Jan 2016 10:39:58 -0500,
"Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

> On 2016-01-20 10:33, Al wrote:
> > [very quietly] I've had autodefrag out of my mount options for a
> > long while now. Is that still the recommended position?
> I think it really depends on what you're doing.  In my case, I
> usually have it on, and the only issue I've ever seen is that Chrome
> sometimes loads pages from local cache slower than it should be.  I
> also don't use ridiculous numbers of snapshots either (I use them
> only to get a stable view of the filesystem when generating a
> backup), so I don't have much experience with how they interact with
> autodefrag.

I'd recommend setting Chrome's caching to the simple HTTP cache in
chrome://flags, as this is more suitable for btrfs (as it is for most
Unix file systems, which deal better with many small files than with
random updates in a big fat file).

I experienced much improved performance and responsiveness with it. May
be worth a try for you. I'd be interested in your results.

chrome://flags/#enable-simple-cache-backend

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: Interjection: autodefrag mount option aye, nae?
  2016-01-21 20:59           ` Kai Krakow
@ 2016-01-22 12:14             ` Austin S. Hemmelgarn
  2016-01-22 19:43               ` Kai Krakow
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2016-01-22 12:14 UTC (permalink / raw)
  To: Kai Krakow, linux-btrfs

On 2016-01-21 15:59, Kai Krakow wrote:
> On Wed, 20 Jan 2016 10:39:58 -0500,
> "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>
>> On 2016-01-20 10:33, Al wrote:
>>> [very quietly] I've had autodefrag out of my mount options for a
>>> long while now. Is that still the recommended position?
>> I think it really depends on what you're doing.  In my case, I
>> usually have it on, and the only issue I've ever seen is that Chrome
>> sometimes loads pages from local cache slower than it should be.  I
>> also don't use ridiculous numbers of snapshots either (I use them
>> only to get a stable view of the filesystem when generating a
>> backup), so I don't have much experience with how they interact with
>> autodefrag.
>
> I'd recommend to set chrome caching to simple http cache in
> chrome://flags as this is more suitable for btrfs (as for most Unix
> file systems which deal with many small files better than with random
> updates in a big fat file).
>
> I experienced much improved performance and responsiveness with it. May
> be worth a try for you. I'd be interested in your results.
>
> chrome://flags/#enable-simple-cache-backend
>
Thanks for the suggestion, it does in fact appear to improve things on 
BTRFS.


* Re: Interjection: autodefrag mount option aye, nae?
  2016-01-22 12:14             ` Austin S. Hemmelgarn
@ 2016-01-22 19:43               ` Kai Krakow
  0 siblings, 0 replies; 15+ messages in thread
From: Kai Krakow @ 2016-01-22 19:43 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 22 Jan 2016 07:14:57 -0500,
"Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

> On 2016-01-21 15:59, Kai Krakow wrote:
> > On Wed, 20 Jan 2016 10:39:58 -0500,
> > "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> >
> >> On 2016-01-20 10:33, Al wrote:
> >>> [very quietly] I've had autodefrag out of my mount options for a
> >>> long while now. Is that still the recommended position?
> >> I think it really depends on what you're doing.  In my case, I
> >> usually have it on, and the only issue I've ever seen is that
> >> Chrome sometimes loads pages from local cache slower than it
> >> should be.  I also don't use ridiculous numbers of snapshots
> >> either (I use them only to get a stable view of the filesystem
> >> when generating a backup), so I don't have much experience with
> >> how they interact with autodefrag.
> >
> > I'd recommend to set chrome caching to simple http cache in
> > chrome://flags as this is more suitable for btrfs (as for most Unix
> > file systems which deal with many small files better than with
> > random updates in a big fat file).
> >
> > I experienced much improved performance and responsiveness with it.
> > May be worth a try for you. I'd be interested in your results.
> >
> > chrome://flags/#enable-simple-cache-backend
> >
> Thanks for the suggestion, it does in fact appear to improve things
> on BTRFS.

The original Chrome cache manages HTTP files in big, database-like
files. This design is better for Windows machines, as NTFS (or probably
an almost non-existent I/O scheduler) is not good at handling many small
files. Unix is traditionally much better optimized for that and
outperforms Windows here. Add the fact that COW file systems are not
good at database-like workloads, and that explains why the simple cache
works much better.

But it was also more responsive when I used it back in my XFS days
(spanning multiple devices using LVM JBOD), so I stayed with it.
Chrome's occasional multi-second freezes drove me crazy. This fixed
them.

-- 
Regards,
Kai

Replies to list-only preferred.



* Re: Query about proposed dedup patches and behaviours
  2016-01-14 16:13 Query about proposed dedup patches and behaviours James Hogarth
  2016-01-14 16:46 ` Austin S. Hemmelgarn
@ 2016-01-23 22:11 ` Mark Fasheh
  2016-01-24  5:12   ` Duncan
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Fasheh @ 2016-01-23 22:11 UTC (permalink / raw)
  To: James Hogarth; +Cc: linux-btrfs

On Thu, Jan 14, 2016 at 04:13:00PM +0000, James Hogarth wrote:
> The duperemove[1] tool is in the process of being packaged for Fedora at
> present, but I was wondering what future it may have with the 4.5
> dedup patches being proposed.
> 
> Will the btrfs command have the ability to out-of-line dedup files
> similar to duperemove (thus negating the need for it) or will this
> only control in-line dedup with a tool like duperemove still being
> required for periodic only (or restricted path) dedup?

Similar to duperemove, I doubt it. Duperemove is about 12,000 lines at this
point and very little of it is duplicated from btrfs-progs. Much of it is
concerned with efficiently scanning files, making extents from duplicated
blocks, managing a sqlite db, etc. Things that the btrfs command doesn't
need to handle.
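
For readers unfamiliar with that kind of scan, a minimal Python sketch of
the idea follows: hash fixed-size blocks, keep the hashes in sqlite
rather than in memory, and ask the database which hashes occur more than
once.  The table layout, function names, and block size are assumptions
made for this illustration and are not duperemove's actual schema or
algorithm; in-memory byte strings stand in for real files.

```python
import hashlib
import sqlite3

BLOCK = 4096


def index_blocks(db, name, data):
    """Record one hash row per block of `data` under the label `name`."""
    rows = []
    for off in range(0, len(data), BLOCK):
        digest = hashlib.sha256(data[off:off + BLOCK]).hexdigest()
        rows.append((digest, name, off))
    db.executemany("INSERT INTO hashes VALUES (?, ?, ?)", rows)


def duplicate_groups(db):
    """Return lists of (name, offset) pairs whose blocks share a hash."""
    groups = {}
    query = (
        "SELECT digest, path, file_off FROM hashes "
        "WHERE digest IN (SELECT digest FROM hashes "
        "GROUP BY digest HAVING COUNT(*) > 1) "
        "ORDER BY digest, path, file_off"
    )
    for digest, name, off in db.execute(query):
        groups.setdefault(digest, []).append((name, off))
    return list(groups.values())


db = sqlite3.connect(":memory:")  # a file path here would keep the map on disk
db.execute("CREATE TABLE hashes (digest TEXT, path TEXT, file_off INTEGER)")
index_blocks(db, "a.img", b"A" * BLOCK + b"B" * BLOCK)
index_blocks(db, "b.img", b"C" * BLOCK + b"A" * BLOCK)
dupes = duplicate_groups(db)
print(dupes)  # [[('a.img', 0), ('b.img', 4096)]]
```

Each group is a candidate for a dedupe request; turning adjacent
duplicate blocks into larger extents is left out of this sketch.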

Also Ocfs2 should be able to support extent-same at some point and
duperemove will want to run on that FS as well.

We could always have a small wrapper to the ioctl but again the difference
between 'hey dedupe a couple of files' and 'scan terabytes of data to
dedupe' is pretty big if you care about getting it done efficiently.
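
A small wrapper of that kind might look like the Python sketch below.
It assumes the generic FIDEDUPERANGE request as defined in linux/fs.h on
newer kernels (which shares its number with btrfs's extent-same ioctl),
the request encoding used on x86 and most other architectures, and a
single destination per call; the function names are hypothetical.

```python
import fcntl
import os
import struct

# struct file_dedupe_range header: src_offset (u64), src_length (u64),
# dest_count (u16), reserved1 (u16), reserved2 (u32).
HDR = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd (s64), dest_offset (u64),
# bytes_deduped (u64), status (s32), reserved (u32).
INFO = struct.Struct("=qQQiI")


def _iowr(magic, nr, size):
    # Linux _IOWR() encoding; a few architectures use different direction bits.
    return (3 << 30) | (size << 16) | (magic << 8) | nr


FIDEDUPERANGE = _iowr(0x94, 54, HDR.size)


def pack_request(src_off, length, dest_fd, dest_off):
    """Build the ioctl argument for one source range and one destination."""
    return bytearray(
        HDR.pack(src_off, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_off, 0, 0, 0)
    )


def dedupe_range(src_path, dest_path, length, src_off=0, dest_off=0):
    """Ask the kernel to share `length` bytes of dest with src if identical.

    Returns (bytes_deduped, status); status 0 means the ranges matched,
    1 means the data differed, and a negative value is an errno.
    """
    src = os.open(src_path, os.O_RDONLY)
    dest = os.open(dest_path, os.O_RDWR)
    try:
        buf = pack_request(src_off, length, dest, dest_off)
        fcntl.ioctl(src, FIDEDUPERANGE, buf)  # kernel writes results into buf
        _, _, deduped, status, _ = INFO.unpack_from(buf, HDR.size)
        return deduped, status
    finally:
        os.close(src)
        os.close(dest)
```

A call such as dedupe_range("a.img", "b.img", 4096) only makes sense on a
filesystem that supports the request.  The kernel compares the two
ranges itself before sharing them, so the wrapper is the easy part and
deciding what to pass to it is where the scanning effort goes.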


> To avoid memory usage bloat, if the btrfs command can order dedup of X
> files on the path correctly, can it be passed a path to carry the hash
> map in some form (similar to how duperemove can use sqlite for this),
> or is this another use case for the external tool?

I'm not totally clear on what you're asking here. Do you want the duperemove
hashes passed into the kernel? There's no point since we just use that map
to call our ioctl...


> Finally what's the present situation with regards to defragmentation
> and deduplication? Is it safe to turn on autodefrag now when using
> snapshots and duperemove? What should the behaviour be with the
> proposed 4.5 dedup patches if both inline dedup and autodefrag are
> enabled as mount options?

Was there ever a reason it was unsafe to do dedupe + autodefrag? To my
knowledge this should be fine.

Thanks,
	--Mark

--
Mark Fasheh


* Re: Query about proposed dedup patches and behaviours
  2016-01-23 22:11 ` Query about proposed dedup patches and behaviours Mark Fasheh
@ 2016-01-24  5:12   ` Duncan
  0 siblings, 0 replies; 15+ messages in thread
From: Duncan @ 2016-01-24  5:12 UTC (permalink / raw)
  To: linux-btrfs

Mark Fasheh posted on Sat, 23 Jan 2016 14:11:16 -0800 as excerpted:

> On Thu, Jan 14, 2016 at 04:13:00PM +0000, James Hogarth wrote:
> 
>> Finally what's the present situation with regards to defragmentation
>> and deduplication? Is it safe to turn on autodefrag now when using
>> snapshots and duperemove? What should the behaviour be with the
>> proposed 4.5 dedup patches if both inline dedup and autodefrag are
>> enabled as mount options?
> 
> Was there ever a reason it was unsafe to do dedupe + autodefrag? To my
> knowledge this should be fine.

There's "unsafe" and there's "unsafe".  In this case, the question uses 
"unsafe" not as in "can crash or cause corruption unsafe", but rather as 
in "will it break the dedup reflinks I've worked so hard to create, 
reduplicating the content, unsafe".

The question was based on list discussion, originally in the context of 
(manual) defrag breaking snapshot reflinks and duplicating defragged 
content due to being (again) snapshot unaware.  The question in that form 
was if manual defrag is so bad in terms of additional space usage due to 
breaking reflinks, what about autodefrag?  The logical extension of the 
question here is what is the reflink-breaking effect of autodefrag on 
dedup?

In the original snapshot context of the question, there was originally 
some difference of opinion.  The one side, taken by a dev or two, was 
that it uses the same mechanism, so the effect should be similar.  The 
other side, originally taken by Hugo, was that it was no big deal, at 
first simply stated without a reason given, thus making things very 
confusing for pretty much everyone.

After the confusion became apparent, Hugo (as he later explained in his 
reply) did some research, originally intending to confirm his reasoning 
by pointing at the code.  However, in doing so he found out both sides 
were correct, they were simply looking at things from different 
viewpoints.

So here's the deal.  (Manual) defrag is pointed at some files and if they 
appear to be fragmented in the subvolume (snapshot or working copy) that 
it is pointed at, it will rewrite potentially large portions of the file 
as it attempts to consolidate fragmented sections into fewer fragments.  
Of course as it does so, it breaks reflinks (snapshot or otherwise), 
thereby increasing space usage.

Autodefrag, meanwhile, apparently primarily (only?) triggers on partial 
rewrite, and only checks a relatively small portion of the file around 
the written block, scheduling them for later rewrite of the relatively 
smaller section, if it is fragmented.  Yes, it'll break reflinks as well, 
but the write by itself will obviously break them for the block being 
rewritten, already, due to COW.  And because autodefrag primarily (only?) 
triggers on writes, not reads as well, and only in a relatively small 
area around the write itself, only these much smaller areas are subject 
to reflink breakage, and some such breakage would already be occurring 
due to the write in the first place.

So the answer, at least as Hugo explained it (or more precisely, at least 
as I understand his explanation...), is that autodefrag will rewrite more 
of the file and thus break more reflinks and (re)duplicate more blocks 
than doing the writes without autodefrag, but it'll be a relatively small 
increase in duplication, likely acceptable given the higher read 
efficiency of the autodefragged area, compared to manual defrag of the 
same files.

So autodefrag in the context of dedup could be a small positive or a 
small negative, depending on how sensitive specific installations that 
are already doing dedup are to the relatively small increase in size, but 
the effect should be nothing at all like manual defrag on the same files 
that were deduped, which has a far larger potential to undo all the 
reflinking the dedup did in the first place.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



