* Announcing btrfs-dedupe
@ 2016-11-06 13:30 James Pharaoh
2016-11-07 14:02 ` David Sterba
` (3 more replies)
0 siblings, 4 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-06 13:30 UTC (permalink / raw)
To: linux-btrfs
Hi all,
I'm pleased to announce my btrfs deduplication utility, written in Rust.
This operates on whole files, is fast, and I believe complements the
existing utilities (duperemove, bedup).
Please visit the homepage for more information:
http://btrfs-dedupe.com
James Pharaoh
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Announcing btrfs-dedupe
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
@ 2016-11-07 14:02 ` David Sterba
2016-11-07 17:48 ` Mark Fasheh
2016-11-08 2:40 ` Christoph Anton Mitterer
2016-11-07 17:59 ` Mark Fasheh
` (2 subsequent siblings)
3 siblings, 2 replies; 42+ messages in thread
From: David Sterba @ 2016-11-07 14:02 UTC (permalink / raw)
To: James Pharaoh; +Cc: linux-btrfs, mark
On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
> I'm pleased to announce my btrfs deduplication utility, written in Rust.
> This operates on whole files, is fast, and I believe complements the
> existing utilities (duperemove, bedup), which exist currently.
Mark can correct me if I'm wrong, but AFAIK duperemove can consume the
output of fdupes, which does the whole-file scanning for duplicates. And
I think adding a whole-file dedupe mode to duperemove would be better
(from a user's POV) than writing a whole new tool, e.g. because
duperemove is already available in the distros.
Also, looking at your roadmap, some of the items are already implemented
in duperemove: a database of existing csums, crossing filesystem
boundaries, mtime-based speedups.
* Re: Announcing btrfs-dedupe
2016-11-07 14:02 ` David Sterba
@ 2016-11-07 17:48 ` Mark Fasheh
2016-11-07 20:54 ` Adam Borowski
2016-11-08 2:40 ` Christoph Anton Mitterer
1 sibling, 1 reply; 42+ messages in thread
From: Mark Fasheh @ 2016-11-07 17:48 UTC (permalink / raw)
To: dsterba, James Pharaoh, linux-btrfs, Mark Fasheh
Hi David and James,
On Mon, Nov 7, 2016 at 6:02 AM, David Sterba <dsterba@suse.cz> wrote:
> On Sun, Nov 06, 2016 at 02:30:52PM +0100, James Pharaoh wrote:
>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>> This operates on whole files, is fast, and I believe complements the
>> existing utilities (duperemove, bedup), which exist currently.
>
> Mark can correct me if I'm wrong, but AFAIK, duperemove can consume
> output of fdupes, which does the whole file scanning for duplicates. And
> I think adding a whole-file dedup mode to duperemove would be better
> (from user's POV) than writing a whole new tool, eg. because of existing
> availability of duperemove in the distros.
Yeah you are correct - fdupes -r /foo | duperemove --fdupes will get
you the same effect.
There's been a request for us to do all of that internally so that the
whole file dedupe works with the mtime checking code. This is entirely
doable. I would probably either add a field to the files table or add
a new table to hold whole-file hashes. We can then squeeze down our
existing block hashes into one big one or just rehash the whole file.
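That "squeeze" could look roughly like this; the block size and hash
function below are illustrative assumptions, not duperemove's actual
scheme:

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed scan block size, for illustration only

def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block, as a block-level dedupe scan would."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def squeeze(hashes):
    """Derive a whole-file key by hashing the concatenated block hashes,
    avoiding a second full pass over the file data."""
    whole = hashlib.sha256()
    for h in hashes:
        whole.update(h)
    return whole.digest()

key = squeeze(block_hashes(b"x" * (3 * BLOCK_SIZE + 17)))
```

Identical files then get identical whole-file keys without re-reading
their contents, at the cost of the key depending on the block size used
during the original scan.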
> Also looking to your roadmap, some of the items are implemented in
> duperemove: database of existing csums, cross filesystem boundary,
> mtime-based speedups).
Yeah, rescanning based on mtime was a huge speedup for Duperemove as
was keeping checksums in a db. We do all this today, also on XFS with
the dedupe ioctl (I believe this should be out with Linux-4.9).
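In sketch form, the mtime speedup amounts to keying cached checksums on
(size, mtime) and skipping the read when they match; the real duperemove
logic and storage are of course more involved:

```python
import hashlib
import os

_cache = {}  # path -> ((size, mtime_ns), digest); a stand-in for the db

def cached_hash(path):
    """Return the file's checksum, re-reading the file only when its
    size or mtime has changed since the last scan."""
    st = os.stat(path)
    key = (st.st_size, st.st_mtime_ns)
    hit = _cache.get(path)
    if hit is not None and hit[0] == key:
        return hit[1]  # unchanged since last scan: skip the read entirely
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    _cache[path] = (key, digest)
    return digest
```

On a mostly-unchanged filesystem this turns a rescan into a pass of
stat() calls instead of a full re-read of every file.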
Btw, there are lots of little details and bug fixes which I feel add up
to a relatively complete (though far from perfect!) tool. For example,
the dedupe code can handle multiple kernel versions, including old
kernels which couldn't dedupe on non-aligned block boundaries. Every
major step in duperemove is threaded at this point too, which has also
been an enormous performance win (one that new features benefit from).
Thanks,
--Mark
--
"When the going gets weird, the weird turn pro."
Hunter S. Thompson
* Re: Announcing btrfs-dedupe
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
@ 2016-11-07 17:59 ` Mark Fasheh
2016-11-07 18:49 ` James Pharaoh
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 22:36 ` Saint Germain
3 siblings, 1 reply; 42+ messages in thread
From: Mark Fasheh @ 2016-11-07 17:59 UTC (permalink / raw)
To: James Pharaoh; +Cc: linux-btrfs
Hi James,
Re the following text on your project page:
"IMPORTANT CAVEAT — I have read that there are race and/or error
conditions which can cause filesystem corruption in the kernel
implementation of the deduplication ioctl."
Can you expound on that? I'm not aware of any bugs right now, but if
there are any it'd absolutely be worth having that info on the btrfs
list.
Thanks,
--Mark
On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
<james@wellbehavedsoftware.com> wrote:
> Hi all,
>
> I'm pleased to announce my btrfs deduplication utility, written in Rust.
> This operates on whole files, is fast, and I believe complements the
> existing utilities (duperemove, bedup), which exist currently.
>
> Please visit the homepage for more information:
>
> http://btrfs-dedupe.com
>
> James Pharaoh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Announcing btrfs-dedupe
2016-11-07 17:59 ` Mark Fasheh
@ 2016-11-07 18:49 ` James Pharaoh
2016-11-07 18:53 ` James Pharaoh
2016-11-14 18:07 ` Zygo Blaxell
0 siblings, 2 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-07 18:49 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-btrfs
Annoyingly I can't find this now, but I definitely remember reading
someone, apparently knowledgeable, claim that the latest kernel version
I was using at the time still suffered from issues in the dedupe code.
This was a while ago, and I would be very pleased to hear that there is
high confidence in the current implementation! I'll post a link if I
manage to find the comments.
James
On 07/11/16 18:59, Mark Fasheh wrote:
> Hi James,
>
> Re the following text on your project page:
>
> "IMPORTANT CAVEAT — I have read that there are race and/or error
> conditions which can cause filesystem corruption in the kernel
> implementation of the deduplication ioctl."
>
> Can you expound on that? I'm not aware of any bugs right now but if
> there is any it'd absolutely be worth having that info on the btrfs
> list.
>
> Thanks,
> --Mark
>
>
> On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
> <james@wellbehavedsoftware.com> wrote:
>> Hi all,
>>
>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>> This operates on whole files, is fast, and I believe complements the
>> existing utilities (duperemove, bedup), which exist currently.
>>
>> Please visit the homepage for more information:
>>
>> http://btrfs-dedupe.com
>>
>> James Pharaoh
* Re: Announcing btrfs-dedupe
2016-11-07 18:49 ` James Pharaoh
@ 2016-11-07 18:53 ` James Pharaoh
2016-11-14 18:07 ` Zygo Blaxell
1 sibling, 0 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-07 18:53 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-btrfs
FWIW I have updated my comments about duperemove and also the "caveat"
section you mentioned in your other mail in the readme.
http://btrfs-dedupe.com
James
On 07/11/16 19:49, James Pharaoh wrote:
> Annoyingly I can't find this now, but I definitely remember reading
> someone, apparently someone knowledgable, claim that the latest version
> of the kernel which I was using at the time, still suffered from issues
> regarding the dedupe code.
>
> This was a while ago, and I would be very pleased to hear that there is
> high confidence in the current implementation! I'll post a link if I
> manage to find the comments.
>
> James
>
> On 07/11/16 18:59, Mark Fasheh wrote:
>> Hi James,
>>
>> Re the following text on your project page:
>>
>> "IMPORTANT CAVEAT — I have read that there are race and/or error
>> conditions which can cause filesystem corruption in the kernel
>> implementation of the deduplication ioctl."
>>
>> Can you expound on that? I'm not aware of any bugs right now but if
>> there is any it'd absolutely be worth having that info on the btrfs
>> list.
>>
>> Thanks,
>> --Mark
>>
>>
>> On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
>> <james@wellbehavedsoftware.com> wrote:
>>> Hi all,
>>>
>>> I'm pleased to announce my btrfs deduplication utility, written in Rust.
>>> This operates on whole files, is fast, and I believe complements the
>>> existing utilities (duperemove, bedup), which exist currently.
>>>
>>> Please visit the homepage for more information:
>>>
>>> http://btrfs-dedupe.com
>>>
>>> James Pharaoh
* Re: Announcing btrfs-dedupe
2016-11-07 17:48 ` Mark Fasheh
@ 2016-11-07 20:54 ` Adam Borowski
2016-11-08 2:17 ` Darrick J. Wong
2016-11-09 15:02 ` David Sterba
0 siblings, 2 replies; 42+ messages in thread
From: Adam Borowski @ 2016-11-07 20:54 UTC (permalink / raw)
To: Mark Fasheh; +Cc: dsterba, James Pharaoh, linux-btrfs
On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote:
> also on XFS with the dedupe ioctl (I believe this should be out with
> Linux-4.9).
It's already there in 4.9-rc1, although you need a special version of
xfsprogs (possibly already released, I didn't check). It's an experimental
feature that needs to be enabled with "-m reflink=1".
Despite that experimental status, I'd strongly recommend that James test
his tool on XFS as well, as it's the second major implementation of this
API[1]. Mark has already included XFS in the duperemove documentation;
all that looks amiss is btrfs-extent-same having an obsolete name. But
then, I never did any non-superficial tests on XFS, beyond "seems to
work".
Meow!
[1]. For some reason the zfs-on-linux guys haven't implemented this yet,
despite it being an obvious thing on ZFS.
--
A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
raspberries, 0.4kg sugar; put into a big jar for 1 month. Filter out and
throw away the fruits (can dump them into a cake, etc), let the drink age
at least 3-6 months.
* Re: Announcing btrfs-dedupe
2016-11-07 20:54 ` Adam Borowski
@ 2016-11-08 2:17 ` Darrick J. Wong
2016-11-08 18:59 ` Mark Fasheh
2016-11-09 15:02 ` David Sterba
1 sibling, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2016-11-08 2:17 UTC (permalink / raw)
To: Adam Borowski; +Cc: Mark Fasheh, dsterba, James Pharaoh, linux-btrfs
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> On Mon, Nov 07, 2016 at 09:48:41AM -0800, Mark Fasheh wrote:
> > also on XFS with the dedupe ioctl (I believe this should be out with
> > Linux-4.9).
>
> It's already there in 4.9-rc1, although you need a special version of
> xfsprogs (possibly already released, I didn't check). It's an experimental
> feature that needs to be enabled with "-m reflink=1".
The code will be available in xfsprogs 4.9, due out after Linux 4.9.
You'll still have to pass '-m reflink=1' to enable reflink until we
declare the feature stable, however.
> Despite that experimental status, I'd strongly recommend James to test his
> tool on xfs as well, as it's the second major implementation of this API[1].
Agreed. :)
> Mark has already included XFS in documentation of duperemove, all that looks
> amiss is btrfs-extent-same having an obsolete name. But then, I never did
> any non-superficial tests on XFS, beyond "seems to work".
/me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)
--Darrick
>
>
> Meow!
>
> [1]. For some reasons zfs-on-linux guys didn't implement this yet, despite
> it being an obvious thing on ZFS.
> --
> A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
> raspberries, 0.4kg sugar; put into a big jar for 1 month. Filter out and
> throw away the fruits (can dump them into a cake, etc), let the drink age
> at least 3-6 months.
* Re: Announcing btrfs-dedupe
2016-11-07 14:02 ` David Sterba
2016-11-07 17:48 ` Mark Fasheh
@ 2016-11-08 2:40 ` Christoph Anton Mitterer
2016-11-08 6:11 ` James Pharaoh
` (2 more replies)
1 sibling, 3 replies; 42+ messages in thread
From: Christoph Anton Mitterer @ 2016-11-08 2:40 UTC (permalink / raw)
To: dsterba, James Pharaoh; +Cc: linux-btrfs, mark
On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
> I think adding a whole-file dedup mode to duperemove would be better
> (from user's POV) than writing a whole new tool
What would IMO be really good from a user's POV is if one of the
tools, deemed to be the "best", were added to btrfs-progs and simply
became "the official" one.
Cheers,
Chris.
* Re: Announcing btrfs-dedupe
2016-11-08 2:40 ` Christoph Anton Mitterer
@ 2016-11-08 6:11 ` James Pharaoh
2016-11-08 13:26 ` Austin S. Hemmelgarn
2016-11-08 18:49 ` Mark Fasheh
2 siblings, 0 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-08 6:11 UTC (permalink / raw)
To: Christoph Anton Mitterer, dsterba; +Cc: linux-btrfs, mark
Perhaps the complexity of doing this efficiently makes it inappropriate
for inclusion in btrfs-progs itself; I believe the core implementation's
focus is on in-band deduplication, which is automatic and behind the
scenes.
On 08/11/16 03:40, Christoph Anton Mitterer wrote:
> On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
>> I think adding a whole-file dedup mode to duperemove would be better
>> (from user's POV) than writing a whole new tool
>
> What would IMO be really good from a user's POV was, if one of the
> tools, deemed to be the "best", would be added to the btrfs-progs and
> simply become "the official" one.
>
> Cheers,
> Chris.
>
* Re: Announcing btrfs-dedupe
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
2016-11-07 17:59 ` Mark Fasheh
@ 2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38 ` James Pharaoh
2016-11-14 18:27 ` Zygo Blaxell
2016-11-08 22:36 ` Saint Germain
3 siblings, 2 replies; 42+ messages in thread
From: Niccolò Belli @ 2016-11-08 11:06 UTC (permalink / raw)
To: James Pharaoh; +Cc: linux-btrfs
Nice, you should probably update the btrfs wiki as well, because there is
no mention of btrfs-dedupe.
First question: why this name? Don't you plan to support XFS as well?
Second question: I'm trying deduplication tools for the very first time
and I still have to figure out how to handle snapper snapshots, which
are read-only. I tried duperemove 0.11 git and I get tons of "Error 30:
Read-only file system" while opening "/.../@snapshots/4385/...". How am
I supposed to handle snapper snapshots?
I do not run duperemove from a live distro, instead I run it directly on
the system I want to deduplicate:
sudo mount -o noatime,compress=lzo,autodefrag /dev/mapper/cryptroot
/home/niko/nosnap/rootfs/
sudo duperemove -drh --dedupe-options=nofiemap
--hashfile=/home/niko/nosnap/rootfs.hash /home/niko/nosnap/rootfs/
Is btrfs-dedupe able to handle snapper snapshots?
Thanks,
Niccolo' Belli
* Re: Announcing btrfs-dedupe
2016-11-08 11:06 ` Niccolò Belli
@ 2016-11-08 11:38 ` James Pharaoh
2016-11-08 16:57 ` Niccolò Belli
2016-11-14 18:27 ` Zygo Blaxell
1 sibling, 1 reply; 42+ messages in thread
From: James Pharaoh @ 2016-11-08 11:38 UTC (permalink / raw)
To: Niccolò Belli; +Cc: linux-btrfs
On 08/11/16 12:06, Niccolò Belli wrote:
> Nice, you should probably update the btrfs wiki as well, because there
> is no mention of btrfs-dedupe.
I am planning to; I had to apply for an account, which has now been
approved.
> First question, why this name? Don't you plan to support xfs as well?
It didn't occur to me, to be honest. I might support XFS as well, but I
don't use it, and will possibly be adding other btrfs-specific stuff to
it. You'll notice it's part of a bigger wbs-backup repo, with other
tools, which I'm developing to manage my storage and backup requirements.
I'll take a look at it, and certainly see if it works out of the box.
> Second question, I'm trying deduplication tools for the very first time
> and I still have to figure out how to handle snapper snapshots, which
> are read only. I currently tried duperemove 0.11 git and I get tons of
> "Error 30: Read-only file system while opening
> "/.../@snapshots/4385/...". How am I supposed to handle snapper snapshots?
> Is btrfs-dedupe able to handle snapper snapshots?
You can't deduplicate read-only snapshots, but you can create
read-write snapshots from them, deduplicate those, and then recreate the
read-only ones. This is what I've done.
In theory, once this has been done once, it shouldn't have to be done
again, at least for those snapshots, unless you want to modify the
deduplication. It's probably a good idea to defragment files and
directories first, as well.
It should be possible to deduplicate a read-only file to a read-write
one, but that's probably not worth the effort in many real-world use cases.
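As a sketch, that round-trip could be scripted roughly as follows. The
paths and dedupe command are hypothetical, and note that recreating the
snapshot this way gives it a new subvolume ID, which tools like snapper
may care about:

```python
def dedupe_ro_snapshot_cmds(ro_snap, work_path, dedupe_cmd):
    """Build the btrfs(8) command sequence for deduplicating a read-only
    snapshot: clone it read-write, dedupe the clone, then replace the
    original with a fresh read-only snapshot of the deduplicated clone."""
    return [
        ["btrfs", "subvolume", "snapshot", ro_snap, work_path],
        dedupe_cmd + [work_path],  # e.g. ["duperemove", "-dr"]
        ["btrfs", "subvolume", "delete", ro_snap],
        ["btrfs", "subvolume", "snapshot", "-r", work_path, ro_snap],
        ["btrfs", "subvolume", "delete", work_path],
    ]

# Each command list can then be run with subprocess.run(cmd, check=True).
cmds = dedupe_ro_snapshot_cmds("/mnt/snap", "/mnt/work", ["duperemove", "-dr"])
```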
James
* Re: Announcing btrfs-dedupe
2016-11-08 2:40 ` Christoph Anton Mitterer
2016-11-08 6:11 ` James Pharaoh
@ 2016-11-08 13:26 ` Austin S. Hemmelgarn
2016-11-08 16:57 ` Darrick J. Wong
2016-11-08 18:49 ` Mark Fasheh
2 siblings, 1 reply; 42+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-08 13:26 UTC (permalink / raw)
To: Christoph Anton Mitterer, dsterba, James Pharaoh; +Cc: linux-btrfs, mark
On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
> On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
>> I think adding a whole-file dedup mode to duperemove would be better
>> (from user's POV) than writing a whole new tool
>
> What would IMO be really good from a user's POV was, if one of the
> tools, deemed to be the "best", would be added to the btrfs-progs and
> simply become "the official" one.
The problem is that for deduplication, most tools won't work well for
everything. The cases I use it in, for example, are very specific and
have horrible performance using pretty much any available tool. I have
a couple of cases where I have disjoint subsets of the same directory
tree with different prefixes, so I can tell exactly which files are
duplicated, and that any duplicate file is 100% duplicate. I also have
a couple of cases where changes are small, scattered, and highly
predictable, so it's easier to find what's changed and dedupe everything
else than to find what's the same. None of the existing options do well
in either situation.
I'd argue at minimum for having the extent-same tool from duperemove in
btrfs-progs, as that lets people do deduplication how they want without
having to write C code. Something equivalent that would let you call
any BTRFS ioctl with (reasonably) arbitrary arguments might actually be
even better (I can see such a tool being wonderful for debugging).
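To that point, the ioctl is simple enough to drive without C at all.
Here is a sketch in Python using fcntl, with the struct layout
transcribed from linux/fs.h; both files must be open and the filesystem
has to support dedupe, so expect an OSError anywhere else:

```python
import fcntl
import struct

# struct file_dedupe_range (linux/fs.h): u64 src_offset, u64 src_length,
# u16 dest_count, u16 reserved1, u32 reserved2, followed by dest_count
# struct file_dedupe_range_info entries:
# s64 dest_fd, u64 dest_offset, u64 bytes_deduped, s32 status, u32 reserved.
RANGE_FMT = "=QQHHI"  # 24-byte fixed header
INFO_FMT = "=qQQiI"   # 32 bytes per destination

# _IOWR(0x94, 54, struct file_dedupe_range); the size field counts only
# the fixed header, giving the well-known request value 0xC0189436.
FIDEDUPERANGE = (3 << 30) | (struct.calcsize(RANGE_FMT) << 16) | (0x94 << 8) | 54

def dedupe_range(src_fd, src_off, length, dest_fd, dest_off):
    """Ask the kernel to share `length` bytes of src into dest.
    Returns (bytes_deduped, status); a negative status is -errno, and
    status 1 (FILE_DEDUPE_RANGE_DIFFERS) means the contents differed."""
    buf = bytearray(struct.pack(RANGE_FMT, src_off, length, 1, 0, 0))
    buf += struct.pack(INFO_FMT, dest_fd, dest_off, 0, 0, 0)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    _, _, deduped, status, _ = struct.unpack_from(
        INFO_FMT, buf, struct.calcsize(RANGE_FMT))
    return deduped, status
```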
* Re: Announcing btrfs-dedupe
2016-11-08 13:26 ` Austin S. Hemmelgarn
@ 2016-11-08 16:57 ` Darrick J. Wong
2016-11-08 17:04 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2016-11-08 16:57 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Christoph Anton Mitterer, dsterba, James Pharaoh, linux-btrfs, mark
On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
> >On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
> >>I think adding a whole-file dedup mode to duperemove would be better
> >>(from user's POV) than writing a whole new tool
> >
> >What would IMO be really good from a user's POV was, if one of the
> >tools, deemed to be the "best", would be added to the btrfs-progs and
> >simply become "the official" one.
>
> The problem is that for deduplication, most tools won't work well for
> everything. For example the cases I use it in are very specific and have
> horrible performance using pretty much any available tool (I have a couple
> cases where I have disjoint subsets of the same directory tree with
> different prefixes, so I can tell exactly which files are duplicated, and
> that any duplicate file is 100% duplicate, as well as a couple of cases
> where changes are small, scattered, and highly predictable (and thus it's
> easier to find what's changed and dedupe everything else instead of finding
> what's the same), and none of the existing options do well in either
> situation).
>
> I'd argue at minimum for having the extent-same tool from duperemove in
> btrfs-progs, as that lets people do deduplication how they want without
> having to write C code. Something equivalent that would let you call any
> BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
> better (I can see such a tool being wonderful for debugging).
Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
FIDEDUPERANGE (f.k.a. EXTENT SAME):
$ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile
--D
* Re: Announcing btrfs-dedupe
2016-11-08 11:38 ` James Pharaoh
@ 2016-11-08 16:57 ` Niccolò Belli
2016-11-08 16:58 ` James Pharaoh
0 siblings, 1 reply; 42+ messages in thread
From: Niccolò Belli @ 2016-11-08 16:57 UTC (permalink / raw)
To: James Pharaoh; +Cc: linux-btrfs
On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote:
> You can't deduplicate a read-only snapshot, but you can create
> read-write snapshots from them, deduplicate those, and then
> recreate the read-only ones. This is what I've done.
Since snapper creates hundreds of snapshots, isn't this something that
the deduplication software could do for me if I explicitly tell it to do
so? I mean momentarily switching the snapshot to rw in order to
deduplicate it, then switching it back to ro.
> In theory, once this has been done once, it shouldn't have to
> be done again, at least for those snapshots, unless you want to
> modify the deduplication. It's probably a good idea to
> defragment files and directories first, as well.
I can't defragment anything, because it would take too much free space
to do so with so many snapshots. Instead, the deduplication software
could defragment each file before calling the extent-same ioctl; that
would be feasible. That way you would not need ridiculous amounts of
free space to defragment the fs.
> It should be possible to deduplicate a read-only file to a
> read-write one, but that's probably not worth the effort in many
> real-world use cases.
This is exactly what I would expect a deduplication tool to do when it
encounters a ro snapshot, except when I explicitly tell it to
momentarily switch the snapshot to rw in order to deduplicate it.
Niccolo' Belli
* Re: Announcing btrfs-dedupe
2016-11-08 16:57 ` Niccolò Belli
@ 2016-11-08 16:58 ` James Pharaoh
2016-11-08 17:08 ` Niccolò Belli
0 siblings, 1 reply; 42+ messages in thread
From: James Pharaoh @ 2016-11-08 16:58 UTC (permalink / raw)
To: Niccolò Belli; +Cc: linux-btrfs
Yes, everything you have described here is something I intend to create,
and might as well include in the tool itself. I'll add it to the roadmap ;-)
James
On 08/11/16 17:57, Niccolò Belli wrote:
> On martedì 8 novembre 2016 12:38:48 CET, James Pharaoh wrote:
>> You can't deduplicate a read-only snapshot, but you can create
>> read-write snapshots from them, deduplicate those, and then recreate
>> the read-only ones. This is what I've done.
>
> Since snapper creates hundreds of snapshots, isn't this something that
> the deduplication software could do for me if I explicitely tell it to
> do so? I mean momentarily switching the snapshot to rw in order to
> deduplicate it, then switching it back to ro.
>
>> In theory, once this has been done once, it shouldn't have to be done
>> again, at least for those snapshots, unless you want to modify the
>> deduplication. It's probably a good idea to defragment files and
>> directories first, as well.
>
> I can't defragment anything, because it would take too much free space
> to do so with so many snapshots. Instead, the deduplication software
> could defragment each file before calling the extent-same ioctl, that
> would be feasible. Such a way you will not need hilarious amounts of
> free space to defragment the fs.
>
>> It should be possible to deduplicate a read-only file to a read-write
>> one, but that's probably not worth the effort in many real-world use
>> cases.
>
> This is exactly what I would expect a deduplication tool to do when it
> encounters a ro snapshot, except when I explicitely tell it to
> momentarily switch the snapshot to rw in order to deduplicate it.
>
> Niccolo' Belli
* Re: Announcing btrfs-dedupe
2016-11-08 16:57 ` Darrick J. Wong
@ 2016-11-08 17:04 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 42+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-08 17:04 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Anton Mitterer, dsterba, James Pharaoh, linux-btrfs, mark
On 2016-11-08 11:57, Darrick J. Wong wrote:
> On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
>>> On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
>>>> I think adding a whole-file dedup mode to duperemove would be better
>>>> (from user's POV) than writing a whole new tool
>>>
>>> What would IMO be really good from a user's POV was, if one of the
>>> tools, deemed to be the "best", would be added to the btrfs-progs and
>>> simply become "the official" one.
>>
>> The problem is that for deduplication, most tools won't work well for
>> everything. For example the cases I use it in are very specific and have
>> horrible performance using pretty much any available tool (I have a couple
>> cases where I have disjoint subsets of the same directory tree with
>> different prefixes, so I can tell exactly which files are duplicated, and
>> that any duplicate file is 100% duplicate, as well as a couple of cases
>> where changes are small, scattered, and highly predictable (and thus it's
>> easier to find what's changed and dedupe everything else instead of finding
>> what's the same), and none of the existing options do well in either
>> situation).
>>
>> I'd argue at minimum for having the extent-same tool from duperemove in
>> btrfs-progs, as that lets people do deduplication how they want without
>> having to write C code. Something equivalent that would let you call any
>> BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
>> better (I can see such a tool being wonderful for debugging).
>
> Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
> FIDEDUPERANGE (f.k.a. EXTENT SAME):
>
> $ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile
>
I actually hadn't known about this, thanks. It means that xfs_io just
got even more useful despite me not running XFS.
* Re: Announcing btrfs-dedupe
2016-11-08 16:58 ` James Pharaoh
@ 2016-11-08 17:08 ` Niccolò Belli
0 siblings, 0 replies; 42+ messages in thread
From: Niccolò Belli @ 2016-11-08 17:08 UTC (permalink / raw)
To: James Pharaoh; +Cc: linux-btrfs
On martedì 8 novembre 2016 17:58:52 CET, James Pharaoh wrote:
> Yes, everything you have described here is something I intend
> to create, and might as well include in the tool itself. I'll
> add it to the roadmap ;-)
Sounds good, but I have yet another feature request which is even more
interesting in my opinion.
If you have ever used snapper, you have probably found yourself in the
position where you want to free some space and actually can't, because
the files you want to delete are already present in countless snapshots.
You then have to delete the unwanted files from every snapshot, which is
a tedious task, even more difficult if you have moved/renamed those
files. What I actually do is use duperemove's hashfile to look up the
checksum and obtain all the paths. Then I have to switch the snapshots
to rw, manually delete each file and finally switch them back to ro. A
tool which automates this task would be awesome.
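The hashfile is an SQLite database, so the lookup step can be an SQL
query rather than a grep. The table and column names below (`files` and
`hashes`, joined on inode and subvolume) are assumptions about
duperemove's schema, which varies between versions:

```python
import sqlite3

def paths_for_digest(hashfile, digest):
    """Return every path whose stored digest matches, i.e. every
    snapshot copy of the same duplicated content."""
    con = sqlite3.connect(hashfile)
    try:
        rows = con.execute(
            "SELECT DISTINCT f.filename "
            "FROM files f JOIN hashes h "
            "ON h.ino = f.ino AND h.subvol = f.subvol "
            "WHERE h.digest = ?",
            (digest,),
        )
        return [filename for (filename,) in rows]
    finally:
        con.close()
```

The returned list is exactly the set of files that would need deleting
from each snapshot before the space is actually freed.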
Niccolo'
* Re: Announcing btrfs-dedupe
2016-11-08 2:40 ` Christoph Anton Mitterer
2016-11-08 6:11 ` James Pharaoh
2016-11-08 13:26 ` Austin S. Hemmelgarn
@ 2016-11-08 18:49 ` Mark Fasheh
2 siblings, 0 replies; 42+ messages in thread
From: Mark Fasheh @ 2016-11-08 18:49 UTC (permalink / raw)
To: Christoph Anton Mitterer; +Cc: dsterba, James Pharaoh, linux-btrfs
On Mon, Nov 7, 2016 at 6:40 PM, Christoph Anton Mitterer
<calestyo@scientia.net> wrote:
> On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
>> I think adding a whole-file dedup mode to duperemove would be better
>> (from user's POV) than writing a whole new tool
>
> What would IMO be really good from a user's POV was, if one of the
> tools, deemed to be the "best", would be added to the btrfs-progs and
> simply become "the official" one.
Yeah, there are two problems. One is that the extent-same ioctl (and
duperemove) is cross-filesystem now. The other, which James touches on,
is that there's a non-trivial amount of complexity in duperemove, so
shoving it into btrfs-progs just means we're going to have parallel
development streams solving somewhat different problems.
That's not to say that every dedupe tool has to be complex - we have
xfs_io to run the ioctl, and I don't think it'd be a bad idea if
btrfs-progs had a simple interface to it too.
--Mark
--
"When the going gets weird, the weird turn pro."
Hunter S. Thompson
* Re: Announcing btrfs-dedupe
2016-11-08 2:17 ` Darrick J. Wong
@ 2016-11-08 18:59 ` Mark Fasheh
2016-11-08 19:47 ` [Ocfs2-devel] " Darrick J. Wong
0 siblings, 1 reply; 42+ messages in thread
From: Mark Fasheh @ 2016-11-08 18:59 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Adam Borowski, dsterba, James Pharaoh, linux-btrfs
On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
>> Mark has already included XFS in documentation of duperemove, all that looks
>> amiss is btrfs-extent-same having an obsolete name. But then, I never did
>> any non-superficial tests on XFS, beyond "seems to work".
I'd actually be ok dropping btrfs-extent-same completely at this point
but I'm concerned that it would leave some users behind.
> /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)
Hey, Ocfs2 started the reflink party! But yeah it's fallen behind
since then with respect to cow and dedupe. More importantly though I'd
like to see some extra extent tracking in there like XFS did with the
reflink b+tree.
--Mark
--
"When the going gets weird, the weird turn pro."
Hunter S. Thompson
* Re: Announcing btrfs-dedupe
2016-11-08 18:59 ` Mark Fasheh
@ 2016-11-08 19:47 ` Darrick J. Wong
0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2016-11-08 19:47 UTC (permalink / raw)
To: Mark Fasheh
Cc: Adam Borowski, dsterba, James Pharaoh, linux-btrfs, ocfs2-devel
On Tue, Nov 08, 2016 at 10:59:56AM -0800, Mark Fasheh wrote:
> On Mon, Nov 7, 2016 at 6:17 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> >> Mark has already included XFS in documentation of duperemove, all that looks
> >> amiss is btrfs-extent-same having an obsolete name. But then, I never did
> >> any non-superficial tests on XFS, beyond "seems to work".
>
> I'd actually be ok dropping btrfs-extent-same completely at this point
> but I'm concerned that it would leave some users behind.
>
>
> > /me wonders if ocfs2 will ever catch up to the reflink/dedupe party. ;)
>
> Hey, Ocfs2 started the reflink party! But yeah it's fallen behind
> since then with respect to cow and dedupe. More importantly though I'd
> like to see some extra extent tracking in there like XFS did with the
> reflink b+tree.
Perhaps this should move to the ocfs2 list, but...
...as I understand ocfs2, each inode can point to the head of a refcount
tree that maintains refcounts for all the physical blocks that are
mapped by any of the files that share that refcount tree. It wouldn't
be difficult to hook up this existing refcount structure to the reflink
and dedupe vfs ioctls, with the huge caveat that both inodes will end up
belonging to the same refcount tree (or the call fails). This might not
be such a huge issue for reflink since we're generally only using it
during a file copy anyway, but for dedupe this could have disastrous
consequences if someone does an fs-wide dedupe and every file in the fs
ends up with the same refcount tree.
So I guess you could give each block group its own refcount tree or
something so that all the writes in the fs don't end up contending for a
single data structure.
--D
> --Mark
>
> --
> "When the going gets weird, the weird turn pro."
> Hunter S. Thompson
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Announcing btrfs-dedupe
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
` (2 preceding siblings ...)
2016-11-08 11:06 ` Niccolò Belli
@ 2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24 ` Niccolò Belli
2016-11-13 12:45 ` James Pharaoh
3 siblings, 2 replies; 42+ messages in thread
From: Saint Germain @ 2016-11-08 22:36 UTC (permalink / raw)
To: linux-btrfs; +Cc: James Pharaoh
On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh
<james@wellbehavedsoftware.com> wrote :
> Hi all,
>
> I'm pleased to announce my btrfs deduplication utility, written in
> Rust. This operates on whole files, is fast, and I believe
> complements the existing utilities (duperemove, bedup), which exist
> currently.
>
> Please visit the homepage for more information:
>
> http://btrfs-dedupe.com
>
Thanks for sharing your work.
Please be aware of these similar tools:
- jdupes: https://github.com/jbruchon/jdupes
- rmlint: https://github.com/sahib/rmlint
And of course fdupes.
Some interesting points I have seen in them:
- use xxhash to identify potential duplicates (huge speedup)
- ability to deduplicate read-only snapshots
- identify potential reflinked files (see also my email here:
https://www.spinics.net/lists/linux-btrfs/msg60081.html)
- ability to filter out hardlinks
- triangle problem: see jdupes readme
- jdupes has started the process to be included in Debian
I hope that helps and that you can share some code with them!
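The candidate-screening step these tools share (group files by size, then by content hash, and only treat matching pairs as duplicates) can be sketched in Python. This is an illustration of the idea, not code from jdupes, rmlint, or fdupes, and it uses the standard library's SHA-256 where those tools would use the much faster xxhash:

```python
import hashlib
import os
from collections import defaultdict

def duplicate_candidates(paths):
    """Group regular files into sets of probable duplicates.

    Files are first bucketed by size (cheap stat), then by content hash,
    so most non-duplicates are rejected without ever being read.
    """
    by_size = defaultdict(list)
    for path in paths:
        # skip symlinks so we only consider real file contents
        if os.path.isfile(path) and not os.path.islink(path):
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, files in by_size.items():
        if size == 0 or len(files) < 2:
            continue  # unique size (or empty file): cannot be a dedup win
        by_hash = defaultdict(list)
        for path in files:
            digest = hashlib.sha256()  # stand-in for xxhash
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

A real tool would still byte-compare each group (or let the extent-same ioctl do its locked byte-compare) before linking anything, since hashes only identify candidates.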
* Re: Announcing btrfs-dedupe
2016-11-08 22:36 ` Saint Germain
@ 2016-11-09 11:24 ` Niccolò Belli
2016-11-09 12:47 ` Saint Germain
2016-11-13 12:45 ` James Pharaoh
1 sibling, 1 reply; 42+ messages in thread
From: Niccolò Belli @ 2016-11-09 11:24 UTC (permalink / raw)
To: Saint Germain; +Cc: linux-btrfs, James Pharaoh
Hi,
What do you think about jdupes? I'm searching for an alternative to
duperemove, and rmlint doesn't seem to support btrfs deduplication, so I
would like to try jdupes. My main problem with duperemove is a memory
leak; it also seems to lead to greater disk usage:
https://github.com/markfasheh/duperemove/issues/163
Niccolo' Belli
On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote:
> Please be aware of these similar tools:
> - jdupes: https://github.com/jbruchon/jdupes
> - rmlint: https://github.com/sahib/rmlint
> And of course fdupes.
>
> Some interesting points I have seen in them:
> - use xxhash to identify potential duplicates (huge speedup)
> - ability to deduplicate read-only snapshots
> - identify potential reflinked files (see also my email here:
> https://www.spinics.net/lists/linux-btrfs/msg60081.html)
> - ability to filter out hardlinks
> - triangle problem: see jdupes readme
> - jdupes has started the process to be included in Debian
>
> I hope that helps and that you can share some code with them!
* Re: Announcing btrfs-dedupe
2016-11-09 11:24 ` Niccolò Belli
@ 2016-11-09 12:47 ` Saint Germain
0 siblings, 0 replies; 42+ messages in thread
From: Saint Germain @ 2016-11-09 12:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Niccolò Belli
On Wed, 09 Nov 2016 12:24:51 +0100, Niccolò Belli
<darkbasic@linuxsystems.it> wrote :
>
> On martedì 8 novembre 2016 23:36:25 CET, Saint Germain wrote:
> > Please be aware of these similar tools:
> > - jdupes: https://github.com/jbruchon/jdupes
> > - rmlint: https://github.com/sahib/rmlint
> > And of course fdupes.
> >
> > Some interesting points I have seen in them:
> > - use xxhash to identify potential duplicates (huge speedup)
> > - ability to deduplicate read-only snapshots
> > - identify potential reflinked files (see also my email here:
> > https://www.spinics.net/lists/linux-btrfs/msg60081.html)
> > - ability to filter out hardlinks
> > - triangle problem: see jdupes readme
> > - jdupes has started the process to be included in Debian
> >
> > I hope that helps and that you can share some code with them!
> >
> Hi,
> What do you think about jdupes? I'm searching for an alternative to
> duperemove, and rmlint doesn't seem to support btrfs deduplication, so
> I would like to try jdupes. My main problem with duperemove is a
> memory leak; it also seems to lead to greater disk usage:
> https://github.com/markfasheh/duperemove/issues/163
rmlint does support btrfs deduplication:
rmlint --algorithm=xxhash --types="duplicates" --hidden --config=sh:handler=clone --no-hardlinked
I've used jdupes and rmlint to deduplicate 2TB with 4GB RAM and it took
a few hours, so it is acceptable from a performance point of view.
The problems I found have been corrected in both.
The jdupes author is really kind and responsive!
* Re: Announcing btrfs-dedupe
2016-11-07 20:54 ` Adam Borowski
2016-11-08 2:17 ` Darrick J. Wong
@ 2016-11-09 15:02 ` David Sterba
1 sibling, 0 replies; 42+ messages in thread
From: David Sterba @ 2016-11-09 15:02 UTC (permalink / raw)
To: Adam Borowski; +Cc: Mark Fasheh, dsterba, James Pharaoh, linux-btrfs
On Mon, Nov 07, 2016 at 09:54:09PM +0100, Adam Borowski wrote:
> [1]. For some reason the zfs-on-linux guys haven't implemented this yet,
> despite it being an obvious thing on ZFS.
In my understanding, the COW mechanics are different: there are no
extent back references, so this would require some design updates. See
issue 405 in the ZoL tracker.
* Re: Announcing btrfs-dedupe
2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24 ` Niccolò Belli
@ 2016-11-13 12:45 ` James Pharaoh
1 sibling, 0 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-13 12:45 UTC (permalink / raw)
To: Saint Germain, linux-btrfs
I've updated the BTRFS wiki here with all the new tools people have
mentioned:
https://btrfs.wiki.kernel.org/index.php/Deduplication#Other_tools
Please let me know if anyone who does not have access to the wiki has
any additions, updates or corrections to what I've written here.
James
On 08/11/16 23:36, Saint Germain wrote:
> On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh
> <james@wellbehavedsoftware.com> wrote :
>
>> Hi all,
>>
>> I'm pleased to announce my btrfs deduplication utility, written in
>> Rust. This operates on whole files, is fast, and I believe
>> complements the existing utilities (duperemove, bedup), which exist
>> currently.
>>
>> Please visit the homepage for more information:
>>
>> http://btrfs-dedupe.com
>>
>
> Thanks for sharing your work.
> Please be aware of these similar tools:
> - jdupes: https://github.com/jbruchon/jdupes
> - rmlint: https://github.com/sahib/rmlint
> And of course fdupes.
>
> Some interesting points I have seen in them:
> - use xxhash to identify potential duplicates (huge speedup)
> - ability to deduplicate read-only snapshots
> - identify potential reflinked files (see also my email here:
> https://www.spinics.net/lists/linux-btrfs/msg60081.html)
> - ability to filter out hardlinks
> - triangle problem: see jdupes readme
> - jdupes has started the process to be included in Debian
>
> I hope that helps and that you can share some code with them!
>
* Re: Announcing btrfs-dedupe
2016-11-07 18:49 ` James Pharaoh
2016-11-07 18:53 ` James Pharaoh
@ 2016-11-14 18:07 ` Zygo Blaxell
2016-11-14 18:22 ` James Pharaoh
1 sibling, 1 reply; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 18:07 UTC (permalink / raw)
To: James Pharaoh; +Cc: Mark Fasheh, linux-btrfs
On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:
> Annoyingly I can't find this now, but I definitely remember reading someone,
> apparently someone knowledgable, claim that the latest version of the kernel
> which I was using at the time, still suffered from issues regarding the
> dedupe code.
> This was a while ago, and I would be very pleased to hear that there is high
> confidence in the current implementation! I'll post a link if I manage to
> find the comments.
I've been running the btrfs dedup ioctl 7 times per second on average
over 42TB of test data for most of a year (and at a lower rate for two
years). I have not found any data corruptions due to _dedup_. I did find
three distinct data corruption kernel bugs unrelated to dedup, and two
test machines with bad RAM, so I'm pretty sure my corruption detection
is working.
That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels
might be OK too, but only if they're up to date with backported btrfs
fixes.
Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
only deduplicate static data (i.e. data you are certain is not being
concurrently modified). Before 3.12 there are so many bugs you might
as well not bother.
Older kernels are bad for dedup because of non-corruption reasons.
Between 3.13 and 4.4, the following bugs were fixed:
- false-negative capability checks (e.g. same-inode, EOF extent)
reduce dedup efficiency
- ctime updates (older versions would update ctime when a file was
deduped) mess with incremental backup tools, build systems, etc.
- kernel memory leaks (self-explanatory)
- multiple kernel hang/panic bugs (e.g. a deadlock if two threads
try to read the same extent at the same time, and at least one
of those threads is dedup; and there was some race condition
leading to invalid memory access on dedup's comparison reads)
which won't eat your data, but they might ruin your day anyway.
There is also a still-unresolved problem where the filesystem CPU usage
rises exponentially for some operations depending on the number of shared
references to an extent. Files which contain blocks with more than a few
thousand shared references can trigger this problem. A file over 1TB can
keep the kernel busy at 100% CPU for over 40 minutes at a time.
There might also be a correlation between delalloc data and hangs in
extent-same, but I have NOT been able to confirm this. All I know
at this point is that doing a fsync() on the source FD just before
doing the extent-same ioctl dramatically reduces filesystem hang rates:
several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
or less without.
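The sequence described above — fsync the source, then hand the kernel an extent-same request and let it do the locked byte-compare — looks roughly like the sketch below. This is an illustration, not code from any tool in this thread; the ioctl number and struct layout are FIDEDUPERANGE from linux/fs.h (kernels 4.5+), and older kernels expose the same argument layout as the btrfs-specific BTRFS_IOC_FILE_EXTENT_SAME:

```python
import fcntl
import os
import struct

# FIDEDUPERANGE = _IOWR(0x94, 54, struct file_dedupe_range)
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0     # status: ranges matched and were deduped
FILE_DEDUPE_RANGE_DIFFERS = 1  # status: kernel's byte-compare found a diff

# struct file_dedupe_range header (24 bytes) followed by one
# struct file_dedupe_range_info (32 bytes), packed without padding.
DEDUPE_FMT = "=QQHHIqQQiI"

def dedupe_whole_file(src_path, dst_path):
    """Ask the kernel to share src's extents with dst; returns bytes deduped.

    Returns 0 if the kernel's locked byte-compare found the files differ.
    """
    length = os.path.getsize(src_path)
    src = os.open(src_path, os.O_RDONLY)
    # The dedupe destination may be opened read-only if we own the file
    # (or are root); unlike clone, no write access is required.
    dst = os.open(dst_path, os.O_RDONLY)
    try:
        os.fsync(src)  # the hang-avoidance workaround described above
        arg = bytearray(struct.pack(
            DEDUPE_FMT,
            0, length, 1, 0, 0,  # src_offset, src_length, dest_count, reserved
            dst, 0, 0, 0, 0))    # dest_fd, dest_offset, out-fields zeroed
        fcntl.ioctl(src, FIDEDUPERANGE, arg)  # kernel writes results back
        bytes_deduped, status = struct.unpack(DEDUPE_FMT, arg)[7:9]
        if status < 0:  # a negative status is a per-range -errno
            raise OSError(-status, os.strerror(-status))
        return bytes_deduped if status == FILE_DEDUPE_RANGE_SAME else 0
    finally:
        os.close(src)
        os.close(dst)
```

On a filesystem without dedupe support (or a pre-4.5 kernel) the ioctl itself fails with EOPNOTSUPP or EINVAL, so callers should be prepared for that.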
> James
>
> On 07/11/16 18:59, Mark Fasheh wrote:
> >Hi James,
> >
> >Re the following text on your project page:
> >
> >"IMPORTANT CAVEAT — I have read that there are race and/or error
> >conditions which can cause filesystem corruption in the kernel
> >implementation of the deduplication ioctl."
> >
> >Can you expound on that? I'm not aware of any bugs right now but if
> >there is any it'd absolutely be worth having that info on the btrfs
> >list.
> >
> >Thanks,
> > --Mark
> >
> >
> >On Sun, Nov 6, 2016 at 7:30 AM, James Pharaoh
> ><james@wellbehavedsoftware.com> wrote:
> >>Hi all,
> >>
> >>I'm pleased to announce my btrfs deduplication utility, written in Rust.
> >>This operates on whole files, is fast, and I believe complements the
> >>existing utilities (duperemove, bedup), which exist currently.
> >>
> >>Please visit the homepage for more information:
> >>
> >>http://btrfs-dedupe.com
> >>
> >>James Pharaoh
* Re: Announcing btrfs-dedupe
2016-11-14 18:07 ` Zygo Blaxell
@ 2016-11-14 18:22 ` James Pharaoh
2016-11-14 18:39 ` Austin S. Hemmelgarn
2016-11-14 18:43 ` Zygo Blaxell
0 siblings, 2 replies; 42+ messages in thread
From: James Pharaoh @ 2016-11-14 18:22 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: Mark Fasheh, linux-btrfs
On 14/11/16 19:07, Zygo Blaxell wrote:
> On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:
>> Annoyingly I can't find this now, but I definitely remember reading someone,
>> apparently someone knowledgable, claim that the latest version of the kernel
>> which I was using at the time, still suffered from issues regarding the
>> dedupe code.
>
>> This was a while ago, and I would be very pleased to hear that there is high
>> confidence in the current implementation! I'll post a link if I manage to
>> find the comments.
>
> I've been running the btrfs dedup ioctl 7 times per second on average
> over 42TB of test data for most of a year (and at a lower rate for two
> years). I have not found any data corruptions due to _dedup_. I did find
> three distinct data corruption kernel bugs unrelated to dedup, and two
> test machines with bad RAM, so I'm pretty sure my corruption detection
> is working.
>
> That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels
> might be OK too, but only if they're up to date with backported btrfs
> fixes.
Ok, I think this might have referred to the 4.2 kernel, which was newly
released at the time. I wish I could find the post!
> Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
> only deduplicate static data (i.e. data you are certain is not being
> concurrently modified). Before 3.12 there are so many bugs you might
> as well not bother.
Yes well I don't need to be told that, sadly.
> Older kernels are bad for dedup because of non-corruption reasons.
> Between 3.13 and 4.4, the following bugs were fixed:
>
> - false-negative capability checks (e.g. same-inode, EOF extent)
> reduce dedup efficiency
>
> - ctime updates (older versions would update ctime when a file was
> deduped) mess with incremental backup tools, build systems, etc.
>
> - kernel memory leaks (self-explanatory)
>
> - multiple kernel hang/panic bugs (e.g. a deadlock if two threads
> try to read the same extent at the same time, and at least one
> of those threads is dedup; and there was some race condition
> leading to invalid memory access on dedup's comparison reads)
> which won't eat your data, but they might ruin your day anyway.
Ok, I think I've seen some stuff like this; I certainly have
problems, but never a loss of data. Things can take a LONG time to get
out of the filesystem, though.
> There is also a still-unresolved problem where the filesystem CPU usage
> rises exponentially for some operations depending on the number of shared
> references to an extent. Files which contain blocks with more than a few
> thousand shared references can trigger this problem. A file over 1TB can
> keep the kernel busy at 100% CPU for over 40 minutes at a time.
Yes, I see this all the time. For my use cases, I don't really care
about "shared references" as blocks of files, but am happy to simply
deduplicate at the whole-file level. I wonder whether this will still
have the same effect, however. I guess this could be mitigated in a
tool, but that would be both annoying and not the most elegant
solution.
> There might also be a correlation between delalloc data and hangs in
> extent-same, but I have NOT been able to confirm this. All I know
> at this point is that doing a fsync() on the source FD just before
> doing the extent-same ioctl dramatically reduces filesystem hang rates:
> several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
> or less without.
Interesting, I'll maybe see if I can make use of this.
One thing I am keen to understand is whether BTRFS will automatically
ignore a request to deduplicate a file if it is already deduplicated.
Given the performance I see when doing a repeat deduplication, it seems
to me that it can't be doing so, although this could be caused by the
CPU usage you mention above.
In any case, I'm considering some digging into the filesystem structures
to see if I can work this out myself before I do any deduplication. I'm
fairly sure this should be relatively simple to work out, at least well
enough for my purposes.
James
* Re: Announcing btrfs-dedupe
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38 ` James Pharaoh
@ 2016-11-14 18:27 ` Zygo Blaxell
1 sibling, 0 replies; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 18:27 UTC (permalink / raw)
To: Niccolò Belli; +Cc: James Pharaoh, linux-btrfs
On Tue, Nov 08, 2016 at 12:06:01PM +0100, Niccolò Belli wrote:
> Nice, you should probably update the btrfs wiki as well, because there is no
> mention of btrfs-dedupe.
>
> First question, why this name? Don't you plan to support xfs as well?
Does XFS plan to support LOGICAL_INO, INO_PATHS, and something analogous
to SEARCH_V2?
POSIX API + FILE_EXTENT_SAME is OK for the lowest common denominator
across arbitrary filesystems, but a btrfs-specific tool can do a lot
better. Especially for incremental dedup and low-RAM algorithms.
* Re: Announcing btrfs-dedupe
2016-11-14 18:22 ` James Pharaoh
@ 2016-11-14 18:39 ` Austin S. Hemmelgarn
2016-11-14 19:51 ` Zygo Blaxell
2016-11-14 18:43 ` Zygo Blaxell
1 sibling, 1 reply; 42+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-14 18:39 UTC (permalink / raw)
To: James Pharaoh, Zygo Blaxell; +Cc: Mark Fasheh, linux-btrfs
On 2016-11-14 13:22, James Pharaoh wrote:
> On 14/11/16 19:07, Zygo Blaxell wrote:
>> On Mon, Nov 07, 2016 at 07:49:51PM +0100, James Pharaoh wrote:
>>> Annoyingly I can't find this now, but I definitely remember reading
>>> someone,
>>> apparently someone knowledgable, claim that the latest version of the
>>> kernel
>>> which I was using at the time, still suffered from issues regarding the
>>> dedupe code.
>>
>>> This was a while ago, and I would be very pleased to hear that there
>>> is high
>>> confidence in the current implementation! I'll post a link if I
>>> manage to
>>> find the comments.
>>
>> I've been running the btrfs dedup ioctl 7 times per second on average
>> over 42TB of test data for most of a year (and at a lower rate for two
>> years). I have not found any data corruptions due to _dedup_. I did
>> find
>> three distinct data corruption kernel bugs unrelated to dedup, and two
>> test machines with bad RAM, so I'm pretty sure my corruption detection
>> is working.
>>
>> That said, I wouldn't run dedup on a kernel older than 4.4. LTS kernels
>> might be OK too, but only if they're up to date with backported btrfs
>> fixes.
>
> Ok, I think this might have referred to the 4.2 kernel, which was newly
> released at the time. I wish I could find the post!
>
>> Kernels older than 3.13 lack the FILE_EXTENT_SAME ioctl and can
>> only deduplicate static data (i.e. data you are certain is not being
>> concurrently modified). Before 3.12 there are so many bugs you might
>> as well not bother.
>
> Yes well I don't need to be told that, sadly.
>
>> Older kernels are bad for dedup because of non-corruption reasons.
>> Between 3.13 and 4.4, the following bugs were fixed:
>>
>> - false-negative capability checks (e.g. same-inode, EOF extent)
>> reduce dedup efficiency
>>
>> - ctime updates (older versions would update ctime when a file was
>> deduped) mess with incremental backup tools, build systems, etc.
>>
>> - kernel memory leaks (self-explanatory)
>>
>> - multiple kernel hang/panic bugs (e.g. a deadlock if two threads
>> try to read the same extent at the same time, and at least one
>> of those threads is dedup; and there was some race condition
>> leading to invalid memory access on dedup's comparison reads)
>> which won't eat your data, but they might ruin your day anyway.
>
> Ok, I think I've seen some stuff like this; I certainly have
> problems, but never a loss of data. Things can take a LONG time to get
> out of the filesystem, though.
>
>> There is also a still-unresolved problem where the filesystem CPU usage
>> rises exponentially for some operations depending on the number of shared
>> references to an extent. Files which contain blocks with more than a few
>> thousand shared references can trigger this problem. A file over 1TB can
>> keep the kernel busy at 100% CPU for over 40 minutes at a time.
>
> Yes, I see this all the time. For my use cases, I don't really care
> about "shared references" as blocks of files, but am happy to simply
> deduplicate at the whole-file level. I wonder if this still will have
> the same effect, however. I guess that this could be mitigated in a
> tool, but this is going to be both annoying and not the most elegant
> solution.
The issue is at the extent level, so it will impact whole files too (but
it will have less impact on defragmented files that are then
deduplicated as whole files). Pretty much anything that pins references
to extents will impact this, so cloned extents and snapshots will also
have an impact.
>
>> There might also be a correlation between delalloc data and hangs in
>> extent-same, but I have NOT been able to confirm this. All I know
>> at this point is that doing a fsync() on the source FD just before
>> doing the extent-same ioctl dramatically reduces filesystem hang rates:
>> several weeks between hangs (or no hangs at all) with fsync, vs. 18 hours
>> or less without.
>
> Interesting, I'll maybe see if I can make use of this.
>
> One thing I am keen to understand is if BTRFS will automatically ignore
> a request to deduplicate a file if it is already deduplicated? Given the
> performance I see when doing a repeat deduplication, it seems to me that
> it can't be doing so, although this could be caused by the CPU usage you
> mention above.
What's happening is that the dedupe ioctl does a byte-wise comparison of
the ranges to make sure they're the same before linking them. This is
actually what takes most of the time when calling the ioctl, and is part
of why it takes longer the larger the range to deduplicate is. In
essence, it's behaving like an OS should and not trusting userspace to
make reasonable requests (which is also why there's a separate ioctl to
clone a range from another file instead of deduplicating existing data).
TBH, even though it's kind of annoying from a performance perspective,
it's a rather nice safety net to have. For example, one of the cases
where I do deduplication is a couple of directories where each directory
is an overlapping partial subset of one large tree which I keep
elsewhere. In this case, I can tell just by filename exactly what files
might be duplicates, so the ioctl's check lets me just call the ioctl on
all potential duplicates (after checking size, no point in wasting time
if the files obviously aren't duplicates), and have it figure out
whether or not they can be deduplicated.
>
> In any case, I'm considering some digging into the filesystem structures
> to see if I can work this out myself before I do any deduplication. I'm
> fairly sure this should be relatively simple to work out, at least well
> enough for my purposes.
Sadly, there's no way to avoid doing so right now.
* Re: Announcing btrfs-dedupe
2016-11-14 18:22 ` James Pharaoh
2016-11-14 18:39 ` Austin S. Hemmelgarn
@ 2016-11-14 18:43 ` Zygo Blaxell
1 sibling, 0 replies; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 18:43 UTC (permalink / raw)
To: James Pharaoh; +Cc: Mark Fasheh, linux-btrfs
On Mon, Nov 14, 2016 at 07:22:59PM +0100, James Pharaoh wrote:
> On 14/11/16 19:07, Zygo Blaxell wrote:
> >There is also a still-unresolved problem where the filesystem CPU usage
> >rises exponentially for some operations depending on the number of shared
> >references to an extent. Files which contain blocks with more than a few
> >thousand shared references can trigger this problem. A file over 1TB can
> >keep the kernel busy at 100% CPU for over 40 minutes at a time.
>
> Yes, I see this all the time. For my use cases, I don't really care about
> "shared references" as blocks of files, but am happy to simply deduplicate
> at the whole-file level. I wonder if this still will have the same effect,
> however. I guess that this could be mitigated in a tool, but this is going
> to be both annoying and not the most elegant solution.
If you have huge files (1TB+) this can be a problem even with whole-file
deduplications (which are really just extent-level deduplications applied
to the entire file). The CPU time is a product of file size and extent
reference count with some other multipliers on top.
I've hacked around it by timing how long it takes to manipulate the data,
and blacklisting any hash value or block address that takes more than
10 seconds to process (if such a block is found after blacklisting, just
skip processing the block/extent/file entirely). It turns out there are
very few of these in practice (only a few hundred per TB) but these few
hundred block hash values occur millions of times in a large data corpus.
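That timing-based blacklist can be sketched generically as a wrapper around the expensive per-key operation. This is an illustration of the idea, not actual code from any dedup tool; the 10-second cutoff is exposed as a parameter:

```python
import time

class SlowKeyBlacklist:
    """Skip work on keys (hash values, block addresses) that have
    previously taken too long to process."""

    def __init__(self, threshold_seconds=10.0):
        self.threshold = threshold_seconds
        self.blacklisted = set()

    def run(self, key, operation):
        """Run operation() unless key is blacklisted; blacklist it if slow.

        Returns (ran, result); ran is False when the key was skipped.
        """
        if key in self.blacklisted:
            return False, None
        start = time.monotonic()
        result = operation()
        if time.monotonic() - start > self.threshold:
            # Too expensive: never process this key again this run.
            self.blacklisted.add(key)
        return True, result
```

As noted above, only a few hundred keys per TB ever trip the threshold, so the blacklist stays tiny while avoiding the pathological extents.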
> One thing I am keen to understand is if BTRFS will automatically ignore a
> request to deduplicate a file if it is already deduplicated? Given the
> performance I see when doing a repeat deduplication, it seems to me that it
> can't be doing so, although this could be caused by the CPU usage you
> mention above.
As far as I can tell btrfs doesn't do anything different in this
case--it'll happily repeat the entire lock/read/compare/delete/insert
sequence even if the outcome cannot be different from the initial
conditions. Due to limitations of VFS caching it'll read the same blocks
from storage hardware twice, too.
> In any case, I'm considering some digging into the filesystem structures to
> see if I can work this out myself before I do any deduplication. I'm fairly
> sure this should be relatively simple to work out, at least well enough for
> my purposes.
I used FIEMAP (then later replaced it with SEARCH_V2 for speed) to map
the extents to physical addresses before deduping them. If you're only
going to do whole-file dedup then you only need to care about the physical
address of the first non-hole extent.
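The first-extent check can be sketched with the generic FIEMAP ioctl (SEARCH_V2 is btrfs-specific and faster, but FIEMAP is portable). This is an illustrative sketch, not code from any tool here; the struct layouts follow linux/fiemap.h:

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B     # _IOWR('f', 11, struct fiemap), 32-byte arg
FIEMAP_FLAG_SYNC = 0x00000001  # flush dirty data before mapping
FIEMAP_MAX_OFFSET = 2**64 - 1

# struct fiemap header and struct fiemap_extent, packed without padding.
FIEMAP_HEADER = "=QQIIII"      # fm_start, fm_length, fm_flags,
                               # fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_EXTENT = "=QQQQQIIII"   # fe_logical, fe_physical, fe_length,
                               # fe_reserved64[2], fe_flags, fe_reserved[3]

def first_extent_physical(fd):
    """Return the physical address of the file's first mapped extent,
    or None for an empty or entirely sparse file."""
    hdr = struct.pack(FIEMAP_HEADER, 0, FIEMAP_MAX_OFFSET,
                      FIEMAP_FLAG_SYNC, 0, 1, 0)  # request at most 1 extent
    buf = bytearray(hdr + bytes(struct.calcsize(FIEMAP_EXTENT)))
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills in the extent array
    mapped = struct.unpack_from(FIEMAP_HEADER, buf)[3]
    if mapped == 0:
        return None
    extent = struct.unpack_from(FIEMAP_EXTENT, buf,
                                struct.calcsize(FIEMAP_HEADER))
    return extent[1]  # fe_physical
```

Two whole-file duplicates that share their first physical address have almost certainly been deduplicated already and can be skipped without issuing the expensive extent-same call.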
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Announcing btrfs-dedupe
2016-11-14 18:39 ` Austin S. Hemmelgarn
@ 2016-11-14 19:51 ` Zygo Blaxell
2016-11-14 19:56 ` Austin S. Hemmelgarn
2016-11-14 20:07 ` James Pharaoh
0 siblings, 2 replies; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 19:51 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: James Pharaoh, Mark Fasheh, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]
On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 13:22, James Pharaoh wrote:
> >One thing I am keen to understand is if BTRFS will automatically ignore
> >a request to deduplicate a file if it is already deduplicated? Given the
> >performance I see when doing a repeat deduplication, it seems to me that
> >it can't be doing so, although this could be caused by the CPU usage you
> >mention above.
> What's happening is that the dedupe ioctl does a byte-wise comparison of the
> ranges to make sure they're the same before linking them. This is actually
> what takes most of the time when calling the ioctl, and is part of why it
> takes longer the larger the range to deduplicate is. In essence, it's
> behaving like an OS should and not trusting userspace to make reasonable
> requests (which is also why there's a separate ioctl to clone a range from
> another file instead of deduplicating existing data).
Deduplicating an extent that may be concurrently modified during the
dedup is a reasonable userspace request. In the general case there's
no way for userspace to ensure that it's not happening.
That said, some optimization is possible (although there are good reasons
not to bother with optimization in the kernel):
- VFS could recognize when it has two separate references to
the same physical extent and not re-read the same data twice
(but that requires teaching VFS how to do CoW in general, and is
hard for political reasons on top of the obvious technical ones).
- the extent-same ioctl could check to see which extents
are referenced by the src and dst ranges, and return success
immediately without reading data if they are the same (but
userspace should already know this, or it's wasting a huge amount
of time before it even calls the kernel).
> TBH, even though it's kind of annoying from a performance perspective, it's
> a rather nice safety net to have. For example, one of the cases where I do
> deduplication is a couple of directories where each directory is an
> overlapping partial subset of one large tree which I keep elsewhere. In
> this case, I can tell just by filename exactly what files might be
> duplicates, so the ioctl's check lets me just call the ioctl on all
> potential duplicates (after checking size, no point in wasting time if the
> files obviously aren't duplicates), and have it figure out whether or not
> they can be deduplicated.
> >
> >In any case, I'm considering some digging into the filesystem structures
> >to see if I can work this out myself before I do any deduplication. I'm
> >fairly sure this should be relatively simple to work out, at least well
> >enough for my purposes.
> Sadly, there's no way to avoid doing so right now.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: Announcing btrfs-dedupe
2016-11-14 19:51 ` Zygo Blaxell
@ 2016-11-14 19:56 ` Austin S. Hemmelgarn
2016-11-14 21:10 ` Zygo Blaxell
2016-11-14 20:07 ` James Pharaoh
1 sibling, 1 reply; 42+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-14 19:56 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: James Pharaoh, Mark Fasheh, linux-btrfs
On 2016-11-14 14:51, Zygo Blaxell wrote:
> On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-11-14 13:22, James Pharaoh wrote:
>>> One thing I am keen to understand is if BTRFS will automatically ignore
>>> a request to deduplicate a file if it is already deduplicated? Given the
>>> performance I see when doing a repeat deduplication, it seems to me that
>>> it can't be doing so, although this could be caused by the CPU usage you
>>> mention above.
>> What's happening is that the dedupe ioctl does a byte-wise comparison of the
>> ranges to make sure they're the same before linking them. This is actually
>> what takes most of the time when calling the ioctl, and is part of why it
>> takes longer the larger the range to deduplicate is. In essence, it's
>> behaving like an OS should and not trusting userspace to make reasonable
>> requests (which is also why there's a separate ioctl to clone a range from
>> another file instead of deduplicating existing data).
>
> Deduplicating an extent that may be concurrently modified during the
> dedup is a reasonable userspace request. In the general case there's
> no way for userspace to ensure that it's not happening.
I'm not even talking about the locking, I'm talking about the data
comparison that the ioctl does to ensure they are the same before
deduplicating them, and specifically that it protects against userspace
just passing in two random extents that happen to be the same size but
don't contain the same data (because deduplication _should_ reject such a
situation; that's what the clone ioctl is for).
The locking is perfectly reasonable and shouldn't contribute that much
to the overhead (unless you're being crazy and deduplicating thousands
of tiny blocks of data).
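The candidate-selection workflow described in this exchange (match by size first, then let the ioctl verify contents) might be sketched like this. The helper is hypothetical; in practice the sizes would come from `os.stat`:

```python
from collections import defaultdict

def dedup_candidates(files):
    """Pair same-size files as whole-file dedupe candidates.

    `files` is an iterable of (path, size) pairs. Only same-size files can
    be whole-file duplicates; the dedupe ioctl then verifies the contents
    byte-for-byte before linking anything, so a false candidate costs
    time but never data.
    """
    by_size = defaultdict(list)
    for path, size in files:
        if size > 0:  # skip empty files: nothing to share
            by_size[size].append(path)
    pairs = []
    for group in by_size.values():
        keep = group[0]  # arbitrary "canonical" copy
        pairs.extend((keep, other) for other in group[1:])
    return pairs
```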
>
> That said, some optimization is possible (although there are good reasons
> not to bother with optimization in the kernel):
>
> - VFS could recognize when it has two separate references to
> the same physical extent and not re-read the same data twice
> (but that requires teaching VFS how to do CoW in general, and is
> hard for political reasons on top of the obvious technical ones).
>
> - the extent-same ioctl could check to see which extents
> are referenced by the src and dst ranges, and return success
> immediately without reading data if they are the same (but
> userspace should already know this, or it's wasting a huge amount
> of time before it even calls the kernel).
>
>> TBH, even though it's kind of annoying from a performance perspective, it's
>> a rather nice safety net to have. For example, one of the cases where I do
>> deduplication is a couple of directories where each directory is an
>> overlapping partial subset of one large tree which I keep elsewhere. In
>> this case, I can tell just by filename exactly what files might be
>> duplicates, so the ioctl's check lets me just call the ioctl on all
>> potential duplicates (after checking size, no point in wasting time if the
>> files obviously aren't duplicates), and have it figure out whether or not
>> they can be deduplicated.
>>>
>>> In any case, I'm considering some digging into the filesystem structures
>>> to see if I can work this out myself before I do any deduplication. I'm
>>> fairly sure this should be relatively simple to work out, at least well
>>> enough for my purposes.
>> Sadly, there's no way to avoid doing so right now.
>>
* Re: Announcing btrfs-dedupe
2016-11-14 19:51 ` Zygo Blaxell
2016-11-14 19:56 ` Austin S. Hemmelgarn
@ 2016-11-14 20:07 ` James Pharaoh
2016-11-14 21:22 ` Zygo Blaxell
1 sibling, 1 reply; 42+ messages in thread
From: James Pharaoh @ 2016-11-14 20:07 UTC (permalink / raw)
To: Zygo Blaxell, Austin S. Hemmelgarn; +Cc: Mark Fasheh, linux-btrfs
On 14/11/16 20:51, Zygo Blaxell wrote:
> On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-11-14 13:22, James Pharaoh wrote:
>>> One thing I am keen to understand is if BTRFS will automatically ignore
>>> a request to deduplicate a file if it is already deduplicated? Given the
>>> performance I see when doing a repeat deduplication, it seems to me that
>>> it can't be doing so, although this could be caused by the CPU usage you
>>> mention above.
>>
>> What's happening is that the dedupe ioctl does a byte-wise comparison of the
>> ranges to make sure they're the same before linking them. This is actually
>> what takes most of the time when calling the ioctl, and is part of why it
>> takes longer the larger the range to deduplicate is. In essence, it's
>> behaving like an OS should and not trusting userspace to make reasonable
>> requests (which is also why there's a separate ioctl to clone a range from
>> another file instead of deduplicating existing data).
>
> - the extent-same ioctl could check to see which extents
> are referenced by the src and dst ranges, and return success
> immediately without reading data if they are the same (but
> userspace should already know this, or it's wasting a huge amount
> of time before it even calls the kernel).
Yes, this is what I am talking about. I believe I should be able to read
data about the BTRFS data structures and determine if this is the case.
I don't care if there are false matches due to concurrent updates, but
there'll be a /lot/ of repeat deduplications unless I do this, because
even if the file is identical, the mtime etc hasn't changed, and I have
a record of previously doing a dedupe, there's no guarantee that the
file hasn't been rewritten in place (eg by rsync), and no way that I
know of to reliably detect if a file has been changed.
I am sure there are libraries out there which can look into the data
structures of a BTRFS file system, I haven't researched this in detail
though. I imagine that with some kind of lock on a BTRFS root, this
could be achieved by simply reading the data from the disk, since I
believe that everything is copy-on-write, so no existing data should be
overwritten until all roots referring to it are updated. Perhaps I'm
missing something though...
James
* Re: Announcing btrfs-dedupe
2016-11-14 19:56 ` Austin S. Hemmelgarn
@ 2016-11-14 21:10 ` Zygo Blaxell
2016-11-15 12:26 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 21:10 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: James Pharaoh, Mark Fasheh, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 4307 bytes --]
On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 14:51, Zygo Blaxell wrote:
> >Deduplicating an extent that may be concurrently modified during the
> >dedup is a reasonable userspace request. In the general case there's
> >no way for userspace to ensure that it's not happening.
> I'm not even talking about the locking, I'm talking about the data
> comparison that the ioctl does to ensure they are the same before
> deduplicating them, and specifically that it protects against userspace just
> passing in two random extents that happen to be the same size but don't
> contain the same data (because deduplication _should_ reject such a
> situation; that's what the clone ioctl is for).
If I'm deduping a VM image, and the virtual host is writing to said image
(which is likely since an incremental dedup will be intentionally doing
dedup over recently active data sets), the extent I just compared in
userspace might be different by the time the kernel sees it.
This is an important reason why the whole lock/read/compare/replace step
is an atomic operation from userspace's PoV.
The read also saves having to confirm a short/weak hash isn't a collision.
The RAM savings from using weak hashes (~48 bits) are a huge performance
win.
The locking overhead is very small compared to the reading overhead,
and (in the absence of bugs) it will only block concurrent writes to the
same offset range in the src/dst inodes (based on a read of the code...I
don't know if there's also an inode-level or backref-level barrier that
expands the locking scope).
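The RAM argument is easy to quantify. A rough back-of-the-envelope calculation (illustrative numbers: 4K blocks on a 4TB filesystem, ignoring hash-table overhead; the helper name is made up):

```python
def hash_index_bytes(fs_bytes, block_size, hash_bits):
    """Approximate bytes of hash storage needed to index every block."""
    return (fs_bytes // block_size) * (hash_bits // 8)

TB = 1024 ** 4
strong = hash_index_bytes(4 * TB, 4096, 256)  # e.g. a SHA-256 per block
weak = hash_index_bytes(4 * TB, 4096, 48)     # ~48-bit weak hash per block
# strong -> 32 GiB of raw hash data, weak -> 6 GiB: roughly the difference
# between an index that fits in RAM and one that doesn't on such a server.
```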
I'm not sure the ioctl is well designed for simply throwing random
data at it, especially not entire files (it can't handle files over
16MB anyway). It will read more data than it has to compared to a
block-by-block comparison from userspace with prefetches or a pair of
IO threads. If userspace reads both copies of the data just before
issuing the extent-same call, the kernel will read the data from cache
reasonably quickly.
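Given that per-call size cap, deduplicating a large range means issuing a series of smaller extent-same calls; a splitting helper might look like this (the 16MiB constant reflects the limit mentioned above):

```python
MAX_DEDUPE_LEN = 16 * 1024 * 1024  # per-call cap discussed above (16MiB)

def dedupe_ranges(file_size, max_len=MAX_DEDUPE_LEN):
    """Split [0, file_size) into (offset, length) pieces for extent-same."""
    ranges = []
    offset = 0
    while offset < file_size:
        length = min(max_len, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges
```

A real tool would also want to align the split points to extent boundaries to avoid the tiny-tail problem mentioned later in the thread.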
> The locking is perfectly reasonable and shouldn't contribute that much to
> the overhead (unless you're being crazy and deduplicating thousands of tiny
> blocks of data).
Why is deduplicating thousands of blocks of data crazy? I already
deduplicate four orders of magnitude more than that per week.
> >That said, some optimization is possible (although there are good reasons
> >not to bother with optimization in the kernel):
> >
> > - VFS could recognize when it has two separate references to
> > the same physical extent and not re-read the same data twice
> > (but that requires teaching VFS how to do CoW in general, and is
> > hard for political reasons on top of the obvious technical ones).
> >
> > - the extent-same ioctl could check to see which extents
> > are referenced by the src and dst ranges, and return success
> > immediately without reading data if they are the same (but
> > userspace should already know this, or it's wasting a huge amount
> > of time before it even calls the kernel).
> >
> >>TBH, even though it's kind of annoying from a performance perspective, it's
> >>a rather nice safety net to have. For example, one of the cases where I do
> >>deduplication is a couple of directories where each directory is an
> >>overlapping partial subset of one large tree which I keep elsewhere. In
> >>this case, I can tell just by filename exactly what files might be
> >>duplicates, so the ioctl's check lets me just call the ioctl on all
> >>potential duplicates (after checking size, no point in wasting time if the
> >>files obviously aren't duplicates), and have it figure out whether or not
> >>they can be deduplicated.
> >>>
> >>>In any case, I'm considering some digging into the filesystem structures
> >>>to see if I can work this out myself before I do any deduplication. I'm
> >>>fairly sure this should be relatively simple to work out, at least well
> >>>enough for my purposes.
> >>Sadly, there's no way to avoid doing so right now.
> >>
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: Announcing btrfs-dedupe
2016-11-14 20:07 ` James Pharaoh
@ 2016-11-14 21:22 ` Zygo Blaxell
0 siblings, 0 replies; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-14 21:22 UTC (permalink / raw)
To: James Pharaoh; +Cc: Austin S. Hemmelgarn, Mark Fasheh, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3411 bytes --]
On Mon, Nov 14, 2016 at 09:07:51PM +0100, James Pharaoh wrote:
> On 14/11/16 20:51, Zygo Blaxell wrote:
> >On Mon, Nov 14, 2016 at 01:39:02PM -0500, Austin S. Hemmelgarn wrote:
> >>On 2016-11-14 13:22, James Pharaoh wrote:
> >>>One thing I am keen to understand is if BTRFS will automatically ignore
> >>>a request to deduplicate a file if it is already deduplicated? Given the
> >>>performance I see when doing a repeat deduplication, it seems to me that
> >>>it can't be doing so, although this could be caused by the CPU usage you
> >>>mention above.
> >>
> >>What's happening is that the dedupe ioctl does a byte-wise comparison of the
> >>ranges to make sure they're the same before linking them. This is actually
> >>what takes most of the time when calling the ioctl, and is part of why it
> >>takes longer the larger the range to deduplicate is. In essence, it's
> >>behaving like an OS should and not trusting userspace to make reasonable
> >>requests (which is also why there's a separate ioctl to clone a range from
> >>another file instead of deduplicating existing data).
> >
> > - the extent-same ioctl could check to see which extents
> > are referenced by the src and dst ranges, and return success
> > immediately without reading data if they are the same (but
> > userspace should already know this, or it's wasting a huge amount
> > of time before it even calls the kernel).
>
> Yes, this is what I am talking about. I believe I should be able to read
> data about the BTRFS data structures and determine if this is the case. I
> don't care if there are false matches, due to concurrent updates, but
> there'll be a /lot/ of repeat deduplications unless I do this, because even
> if the file is identical, the mtime etc hasn't changed, and I have a record
> of previously doing a dedupe, there's no guarantee that the file hasn't been
> rewritten in place (eg by rsync), and no way that I know of to reliably
> detect if a file has been changed.
>
> I am sure there are libraries out there which can look into the data
> structures of a BTRFS file system, I haven't researched this in detail
> though. I imagine that with some kind of lock on a BTRFS root, this could be
> achieved by simply reading the data from the disk, since I believe that
> everything is copy-on-write, so no existing data should be overwritten until
> all roots referring to it are updated. Perhaps I'm missing something
> though...
FIEMAP (VFS) and SEARCH_V2 (btrfs-specific) will both give you access
to the underlying physical block numbers. SEARCH_V2 is non-trivial
to use without reverse-engineering significant parts of btrfs-progs.
SEARCH_V2 is a generic tree-searching tool which will give you all kinds
of information about btrfs structures...it's essential for a sophisticated
deduplicator and overkill for a simple one.
For full-file dedup using FIEMAP you only need to look at the "physical"
field of the first extent (if it's zero or the same as the other file, the
files cannot be deduplicated or are already deduplicated, respectively).
The source for 'filefrag' (from e2fsprogs) is good for learning how
FIEMAP works.
For block-level dedup you need to look at each extent individually.
That's much slower and full of additional caveats. If you're going down
that road it's probably better to just improve duperemove instead.
> James
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: Announcing btrfs-dedupe
2016-11-14 21:10 ` Zygo Blaxell
@ 2016-11-15 12:26 ` Austin S. Hemmelgarn
2016-11-15 17:52 ` Zygo Blaxell
0 siblings, 1 reply; 42+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-15 12:26 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: James Pharaoh, Mark Fasheh, linux-btrfs
On 2016-11-14 16:10, Zygo Blaxell wrote:
> On Mon, Nov 14, 2016 at 02:56:51PM -0500, Austin S. Hemmelgarn wrote:
>> On 2016-11-14 14:51, Zygo Blaxell wrote:
>>> Deduplicating an extent that may be concurrently modified during the
>>> dedup is a reasonable userspace request. In the general case there's
>>> no way for userspace to ensure that it's not happening.
>> I'm not even talking about the locking, I'm talking about the data
>> comparison that the ioctl does to ensure they are the same before
>> deduplicating them, and specifically that it protects against userspace just
>> passing in two random extents that happen to be the same size but don't
>> contain the same data (because deduplication _should_ reject such a
>> situation; that's what the clone ioctl is for).
>
> If I'm deduping a VM image, and the virtual host is writing to said image
> (which is likely since an incremental dedup will be intentionally doing
> dedup over recently active data sets), the extent I just compared in
> userspace might be different by the time the kernel sees it.
>
> This is an important reason why the whole lock/read/compare/replace step
> is an atomic operation from userspace's PoV.
>
> The read also saves having to confirm a short/weak hash isn't a collision.
> The RAM savings from using weak hashes (~48 bits) are a huge performance
> win.
>
> The locking overhead is very small compared to the reading overhead,
> and (in the absence of bugs) it will only block concurrent writes to the
> same offset range in the src/dst inodes (based on a read of the code...I
> don't know if there's also an inode-level or backref-level barrier that
> expands the locking scope).
I'm not arguing that it's a bad thing that the kernel is doing this, I'm
just saying that the locking overhead is minuscule in most cases
compared to the data comparison. It is absolutely necessary for exactly
the reasons you are outlining.
>
> I'm not sure the ioctl is well designed for simply throwing random
> data at it, especially not entire files (it can't handle files over
> 16MB anyway). It will read more data than it has to compared to a
> block-by-block comparison from userspace with prefetches or a pair of
> IO threads. If userspace reads both copies of the data just before
> issuing the extent-same call, the kernel will read the data from cache
> reasonably quickly.
It still depends on the use case to a certain extent. In the case I was
using as an example, I know to a reasonably certain degree (barring
tampering, bugs, or hardware failure) that any two files are identical,
and I actually don't want to trash the page-cache just to deduplicate
data faster (the data set in question is large, but most of it is idle at
any given point in time), so there's no point in me prereading
everything in userspace, which in turn makes the script I use much
simpler (the most complex part is figuring out how to split extents for
files bigger than the ioctl can handle such that I don't have tiny tail
extents but still have a minimum number per file).
>
>> The locking is perfectly reasonable and shouldn't contribute that much to
>> the overhead (unless you're being crazy and deduplicating thousands of tiny
>> blocks of data).
>
> Why is deduplicating thousands of blocks of data crazy? I already
> deduplicate four orders of magnitude more than that per week.
You missed the 'tiny' quantifier. I'm talking really small blocks, on
the order of less than 64k (so, IOW, stuff that's not much bigger than a
few filesystem blocks), and that is somewhat crazy because it ends up
not only taking _really_ long to do compared to larger chunks (because
you're running more independent hashes than with bigger blocks), but
also because it will often split extents unnecessarily and contribute to
fragmentation, which will lead to all kinds of other performance
problems on the FS.
* Re: Announcing btrfs-dedupe
2016-11-15 12:26 ` Austin S. Hemmelgarn
@ 2016-11-15 17:52 ` Zygo Blaxell
2016-11-16 22:24 ` Niccolò Belli
0 siblings, 1 reply; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-15 17:52 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: James Pharaoh, Mark Fasheh, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3732 bytes --]
On Tue, Nov 15, 2016 at 07:26:53AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-14 16:10, Zygo Blaxell wrote:
> >Why is deduplicating thousands of blocks of data crazy? I already
> >deduplicate four orders of magnitude more than that per week.
> You missed the 'tiny' quantifier. I'm talking really small blocks, on the
> order of less than 64k (so, IOW, stuff that's not much bigger than a few
> filesystem blocks), and that is somewhat crazy because it ends up not only
> taking _really_ long to do compared to larger chunks (because you're running
> more independent hashes than with bigger blocks), but also because it will
> often split extents unnecessarily and contribute to fragmentation, which
> will lead to all kinds of other performance problems on the FS.
Like I said, millions of extents per week...
64K is an enormous dedup block size, especially if it comes with a 64K
alignment constraint as well.
These are the top ten duplicate block sizes from a sample of 95251
dedup ops on a medium-sized production server with 4TB of filesystem
(about one machine-day of data):
 total bytes  extent count  dup size
  2750808064         20987    131072
   803733504          1533    524288
   123801600           975    126976
   103575552          8429     12288
    97443840           793    122880
    82051072         10016      8192
    77492224         18919      4096
    71331840           645    110592
    64143360           540    118784
    63897600           650     98304

   all bytes   all extents  average dup size
  6129995776         95251             64356
128K and 512K are the most common sizes due to btrfs compression (it
limits the block size to 128K for compressed extents and seems to limit
uncompressed extents to 512K for some reason). 12K is #4, and 3 of the
top ten sizes are below 16K. The average size is just a little below 64K.
These are the duplicates with block sizes smaller than 64K:
 total bytes  extent count  extent size
    41615360           635        65536
    46264320           753        61440
    45817856           799        57344
    41267200           775        53248
    45760512           931        49152
    46948352          1042        45056
    43417600          1060        40960
    47296512          1283        36864
    59277312          1809        32768
    49029120          1710        28672
    43745280          1780        24576
    53616640          2618        20480
    43466752          2653        16384
   103575552          8429        12288
    82051072         10016         8192
    77492224         18919         4096

 all bytes <=64K  extents <=64K  average dup size <=64K
       870641664          55212                   15769
14% of my duplicate bytes are in blocks smaller than 64K or blocks not
aligned to a 64K boundary within a file. It's too large a space saving
to ignore on machines that have constrained storage.
It may be worthwhile skipping 4K and 8K dedups--at 250 ms per dedup,
they're 30% of the total run time and only 2.6% of the total dedup bytes.
On the other hand, this machine is already deduping everything fast enough
to keep up with new data, so there's no performance problem to solve here.
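The 2.6% figure can be reproduced from the table earlier in this message:

```python
# Figures copied from the table earlier in this message.
total_dup_bytes = 6129995776           # "all bytes"
small_dup_bytes = 82051072 + 77492224  # the 8192-byte and 4096-byte rows
share = small_dup_bytes / total_dup_bytes
# share is about 0.026, i.e. the 2.6% quoted above
```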
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: Announcing btrfs-dedupe
2016-11-15 17:52 ` Zygo Blaxell
@ 2016-11-16 22:24 ` Niccolò Belli
2016-11-17 3:01 ` Zygo Blaxell
0 siblings, 1 reply; 42+ messages in thread
From: Niccolò Belli @ 2016-11-16 22:24 UTC (permalink / raw)
To: Zygo Blaxell
Cc: Austin S. Hemmelgarn, James Pharaoh, Mark Fasheh, linux-btrfs
On Tuesday 15 November 2016 18:52:01 CET, Zygo Blaxell wrote:
> Like I said, millions of extents per week...
>
> 64K is an enormous dedup block size, especially if it comes with a 64K
> alignment constraint as well.
>
> These are the top ten duplicate block sizes from a sample of 95251
> dedup ops on a medium-sized production server with 4TB of filesystem
> (about one machine-day of data):
Which software do you use to dedupe your data? I tried duperemove but it
gets killed by the OOM killer because it triggers some kind of memory leak:
https://github.com/markfasheh/duperemove/issues/163
Niccolò Belli
* Re: Announcing btrfs-dedupe
2016-11-16 22:24 ` Niccolò Belli
@ 2016-11-17 3:01 ` Zygo Blaxell
2016-11-18 10:36 ` Niccolò Belli
0 siblings, 1 reply; 42+ messages in thread
From: Zygo Blaxell @ 2016-11-17 3:01 UTC (permalink / raw)
To: Niccolò Belli
Cc: Austin S. Hemmelgarn, James Pharaoh, Mark Fasheh, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 1766 bytes --]
On Wed, Nov 16, 2016 at 11:24:33PM +0100, Niccolò Belli wrote:
> On Tuesday 15 November 2016 18:52:01 CET, Zygo Blaxell wrote:
> >Like I said, millions of extents per week...
> >
> >64K is an enormous dedup block size, especially if it comes with a 64K
> >alignment constraint as well.
> >
> >These are the top ten duplicate block sizes from a sample of 95251
> >dedup ops on a medium-sized production server with 4TB of filesystem
> >(about one machine-day of data):
>
> Which software do you use to dedupe your data? I tried duperemove but it
> gets killed by the OOM killer because it triggers some kind of memory leak:
> https://github.com/markfasheh/duperemove/issues/163
Duperemove does use a lot of memory, but the logs at that URL only show
2G of RAM in duperemove--not nearly enough to trigger OOM under normal
conditions on an 8G machine. There's another process with 6G of virtual
address space (although much less than that resident) that looks more
interesting (i.e. duperemove might just be the victim of some interaction
between baloo_file and the OOM killer).
On the other hand, the logs also show kernel 4.8. 100% of my test
machines failed to finish booting before they were cut down by OOM on
4.7.x kernels. The same problem occurs on early kernels in the 4.8.x
series. I am having good results with 4.8.6 and later, but you should
be aware that significant changes have been made to the way OOM works
in these kernel versions, and maybe you're hitting a regression for your
use case.
> Niccolò Belli
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: Announcing btrfs-dedupe
2016-11-17 3:01 ` Zygo Blaxell
@ 2016-11-18 10:36 ` Niccolò Belli
0 siblings, 0 replies; 42+ messages in thread
From: Niccolò Belli @ 2016-11-18 10:36 UTC (permalink / raw)
To: Zygo Blaxell
Cc: Austin S. Hemmelgarn, James Pharaoh, Mark Fasheh, linux-btrfs
On Thursday 17 November 2016 04:01:52 CET, Zygo Blaxell wrote:
> Duperemove does use a lot of memory, but the logs at that URL only show
> 2G of RAM in duperemove--not nearly enough to trigger OOM under normal
> conditions on an 8G machine. There's another process with 6G of virtual
> address space (although much less than that resident) that looks more
> interesting (i.e. duperemove might just be the victim of some interaction
> between baloo_file and the OOM killer).
Thanks, I killed baloo_file before starting duperemove and it somehow
improved (it reached 99.73% before being killed by the OOM killer once
again):
[ 6342.147251] Purging GPU memory, 0 pages freed, 18268 pages still pinned.
[ 6342.147253] 48 and 0 pages still available in the bound and unbound GPU
page lists.
[ 6342.147340] Xorg invoked oom-killer:
gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
oom_score_adj=0
[ 6342.147341] Xorg cpuset=/ mems_allowed=0
[ 6342.147346] CPU: 3 PID: 650 Comm: Xorg Not tainted 4.8.8-2-ARCH #1
[ 6342.147347] Hardware name: Dell Inc. XPS 13 9343/0F5KF3, BIOS A09
08/29/2016
[ 6342.147348] 0000000000000286 000000009b89a9c8 ffff88020752f598
ffffffff812fde10
[ 6342.147351] ffff88020752f758 ffff8801edc62ac0 ffff88020752f608
ffffffff81205fa2
[ 6342.147353] 000188020752f5a0 000000009b89a9c8 00000000ffffffff
0000000000000000
[ 6342.147356] Call Trace:
[ 6342.147361] [<ffffffff812fde10>] dump_stack+0x63/0x83
[ 6342.147364] [<ffffffff81205fa2>] dump_header+0x5c/0x1ea
[ 6342.147366] [<ffffffff8117ca35>] oom_kill_process+0x265/0x410
[ 6342.147368] [<ffffffff810864c7>] ? has_capability_noaudit+0x17/0x20
[ 6342.147369] [<ffffffff8117cfb0>] out_of_memory+0x380/0x420
[ 6342.147373] [<ffffffff81312c48>] ? find_next_bit+0x18/0x20
[ 6342.147374] [<ffffffff81182610>] __alloc_pages_nodemask+0xda0/0xde0
[ 6342.147377] [<ffffffff811d6605>] alloc_pages_current+0x95/0x140
[ 6342.147380] [<ffffffff811a3e6e>] kmalloc_order_trace+0x2e/0xf0
[ 6342.147382] [<ffffffff811e1f4a>] __kmalloc+0x1ea/0x200
[ 6342.147397] [<ffffffffa05f825e>] ? alloc_gen8_temp_bitmaps+0x2e/0x80
[i915]
[ 6342.147407] [<ffffffffa05f8277>] alloc_gen8_temp_bitmaps+0x47/0x80
[i915]
[ 6342.147417] [<ffffffffa05fb648>] gen8_alloc_va_range_3lvl+0x98/0x9c0
[i915]
[ 6342.147419] [<ffffffff81197f3d>] ? shmem_getpage_gfp+0xed/0xc30
[ 6342.147421] [<ffffffff8130e3ba>] ? sg_init_table+0x1a/0x40
[ 6342.147423] [<ffffffff813292e3>] ? swiotlb_map_sg_attrs+0x53/0x130
[ 6342.147432] [<ffffffffa05fc1c6>] gen8_alloc_va_range+0x256/0x490 [i915]
[ 6342.147442] [<ffffffffa05fea7b>] i915_vma_bind+0x9b/0x190 [i915]
[ 6342.147453] [<ffffffffa060522b>] i915_gem_object_do_pin+0x86b/0xa90
[i915]
[ 6342.147463] [<ffffffffa060547d>] i915_gem_object_pin+0x2d/0x30 [i915]
[ 6342.147472] [<ffffffffa05f316f>]
i915_gem_execbuffer_reserve_vma.isra.7+0x9f/0x180 [i915]
[ 6342.147482] [<ffffffffa05f35e6>]
i915_gem_execbuffer_reserve.isra.8+0x396/0x3c0 [i915]
[ 6342.147491] [<ffffffffa05f457b>]
i915_gem_do_execbuffer.isra.14+0x68b/0x1270 [i915]
[ 6342.147493] [<ffffffff8158ad31>] ? unix_stream_read_generic+0x281/0x8a0
[ 6342.147503] [<ffffffffa05f5dc4>] i915_gem_execbuffer2+0x104/0x270
[i915]
[ 6342.147509] [<ffffffffa046cd40>] drm_ioctl+0x200/0x4f0 [drm]
[ 6342.147518] [<ffffffffa05f5cc0>] ? i915_gem_execbuffer+0x330/0x330
[i915]
[ 6342.147520] [<ffffffff810ecc1d>] ? enqueue_hrtimer+0x3d/0xa0
[ 6342.147522] [<ffffffff813061f4>] ? timerqueue_del+0x24/0x70
[ 6342.147523] [<ffffffff810ececc>] ? __remove_hrtimer+0x3c/0x90
[ 6342.147525] [<ffffffff8121c383>] do_vfs_ioctl+0xa3/0x5f0
[ 6342.147527] [<ffffffff810ee65b>] ? do_setitimer+0x12b/0x230
[ 6342.147529] [<ffffffff812275f7>] ? __fget+0x77/0xb0
[ 6342.147531] [<ffffffff8121c949>] SyS_ioctl+0x79/0x90
[ 6342.147533] [<ffffffff815f7e72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 6342.147535] Mem-Info:
[ 6342.147538] active_anon:76311 inactive_anon:76782 isolated_anon:0
active_file:347581 inactive_file:1415592 isolated_file:64
unevictable:8 dirty:482 writeback:0 unstable:0
slab_reclaimable:27219 slab_unreclaimable:14772
mapped:20714 shmem:30458 pagetables:10557 bounce:0
free:25642 free_pcp:327 free_cma:0
[ 6342.147541] Node 0 active_anon:305244kB inactive_anon:307128kB
active_file:1390324kB inactive_file:5662368kB unevictable:32kB
isolated(anon):0kB isolated(file):256kB mapped:82856kB dirty:1928kB
writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 81920kB anon_thp:
121832kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable?
no
[ 6342.147542] Node 0 DMA free:15688kB min:132kB low:164kB high:196kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB writepending:0kB present:15984kB managed:15896kB
mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:208kB kernel_stack:0kB
pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 6342.147545] lowmem_reserve[]: 0 3395 7850 7850 7850
[ 6342.147548] Node 0 DMA32 free:48772kB min:29172kB low:36464kB
high:43756kB active_anon:84724kB inactive_anon:87164kB active_file:555728kB
inactive_file:2639796kB unevictable:0kB writepending:696kB
present:3564504kB managed:3488752kB mlocked:0kB slab_reclaimable:47196kB
slab_unreclaimable:11472kB kernel_stack:192kB pagetables:200kB bounce:0kB
free_pcp:1284kB local_pcp:0kB free_cma:0kB
[ 6342.147553] lowmem_reserve[]: 0 0 4454 4454 4454
[ 6342.147555] Node 0 Normal free:38108kB min:38276kB low:47844kB
high:57412kB active_anon:220520kB inactive_anon:219964kB
active_file:834596kB inactive_file:3022312kB unevictable:32kB
writepending:1232kB present:4694016kB managed:4561616kB mlocked:32kB
slab_reclaimable:61680kB slab_unreclaimable:47408kB kernel_stack:7776kB
pagetables:42028kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 6342.147558] lowmem_reserve[]: 0 0 0 0 0
[ 6342.147561] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 1*64kB (U) 2*128kB
(U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15688kB
[ 6342.147570] Node 0 DMA32: 12108*4kB (UE) 29*8kB (UE) 0*16kB 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48664kB
[ 6342.147578] Node 0 Normal: 9514*4kB (UMH) 47*8kB (H) 0*16kB 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 38432kB
[ 6342.147586] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
[ 6342.147588] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
[ 6342.147589] 1796570 total pagecache pages
[ 6342.147591] 2853 pages in swap cache
[ 6342.147592] Swap cache stats: add 378584, delete 375731, find
101539/126272
[ 6342.147592] Free swap = 7743900kB
[ 6342.147593] Total swap = 8387904kB
[ 6342.147594] 2068626 pages RAM
[ 6342.147594] 0 pages HighMem/MovableOnly
[ 6342.147595] 52060 pages reserved
[ 6342.147595] 0 pages hwpoisoned
[ 6342.147596] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds
swapents oom_score_adj name
[ 6342.147600] [ 244] 0 244 21377 2220 37 4
87 0 systemd-journal
[ 6342.147603] [ 278] 0 278 9122 33 20 3
334 -1000 systemd-udevd
[ 6342.147605] [ 597] 192 597 35700 41 39 3
146 0 systemd-timesyn
[ 6342.147607] [ 601] 84 601 10328 158 24 3
74 0 avahi-daemon
[ 6342.147609] [ 603] 0 603 1841 3 9 3
21 0 gpm
[ 6342.147610] [ 604] 0 604 9294 32 21 3
140 0 bluetoothd
[ 6342.147612] [ 606] 84 606 10295 0 23 3
78 0 avahi-daemon
[ 6342.147614] [ 607] 81 607 8584 716 21 3
95 -900 dbus-daemon
[ 6342.147615] [ 609] 0 609 9635 373 23 3
114 0 systemd-logind
[ 6342.147617] [ 610] 0 610 90299 1407 74 3
1048 0 NetworkManager
[ 6342.147619] [ 629] 0 629 22126 369 46 3
462 0 cupsd
[ 6342.147621] [ 630] 0 630 10100 0 24 3
143 -1000 sshd
[ 6342.147622] [ 631] 0 631 186671 75 145 4
1675 0 libvirtd
[ 6342.147624] [ 633] 0 633 47157 132 44 3
345 0 sddm
[ 6342.147626] [ 650] 0 650 80478 10316 161 3
2042 0 Xorg
[ 6342.147628] [ 651] 124 651 78876 73 53 3
2031 0 colord
[ 6342.147630] [ 663] 0 663 79429 432 150 4
621 0 smbd
[ 6342.147631] [ 665] 0 665 78446 9 144 4
643 0 smbd-notifyd
[ 6342.147634] [ 666] 0 666 78444 10 142 4
642 0 cleanupd
[ 6342.147635] [ 669] 0 669 79429 664 145 4
598 0 lpqd
[ 6342.147637] [ 681] 0 681 10855 0 25 3
114 0 wpa_supplicant
[ 6342.147639] [ 689] 102 689 131227 585 53 3
2366 0 polkitd
[ 6342.147640] [ 777] 99 777 11640 0 26 3
134 0 dnsmasq
[ 6342.147642] [ 778] 0 778 11640 0 25 3
132 0 dnsmasq
[ 6342.147644] [ 838] 99 838 11640 108 24 3
130 0 dnsmasq
[ 6342.147645] [ 839] 0 839 11607 0 24 3
90 0 dnsmasq
[ 6342.147647] [ 844] 0 844 93062 666 46 3
248 0 udisksd
[ 6342.147648] [ 850] 0 850 75056 510 46 3
285 0 upowerd
[ 6342.147650] [ 883] 0 883 40487 226 69 3
397 0 sddm-helper
[ 6342.147651] [ 884] 1000 884 13702 505 31 3
206 0 systemd
[ 6342.147653] [ 885] 1000 885 24689 15 49 3
428 0 (sd-pam)
[ 6342.147655] [ 894] 1000 894 3425 51 11 3
114 0 startkde
[ 6342.147656] [ 901] 1000 901 8539 69 21 3
419 0 dbus-daemon
[ 6342.147658] [ 935] 1000 935 1526 0 8 3
21 0 start_kdeinit
[ 6342.147659] [ 936] 1000 936 75329 236 126 4
762 0 kdeinit5
[ 6342.147661] [ 937] 1000 937 129068 50 169 3
942 0 klauncher
[ 6342.147662] [ 940] 1000 940 367750 2255 318 5
2577 0 kded5
[ 6342.147664] [ 948] 1000 948 132228 123 168 4
1562 0 kaccess
[ 6342.147666] [ 962] 1000 962 136114 850 63 4
439 0 mission-control
[ 6342.147667] [ 967] 1000 967 129490 142 162 4
1190 0 kglobalaccel5
[ 6342.147669] [ 973] 1000 973 46251 73 26 3
135 0 dconf-service
[ 6342.147670] [ 980] 1000 980 21602 62 32 4
145 0 kwrapper5
[ 6342.147672] [ 981] 1000 981 159329 271 188 3
1237 0 ksmserver
[ 6342.147673] [ 991] 1000 991 89508 221 99 3
502 0 kscreen_backend
[ 6342.147675] [ 993] 1000 993 780636 196 268 5
6207 0 kwin_x11
[ 6342.147677] [ 996] 1000 996 164381 573 203 4
1179 0 kdeconnectd
[ 6342.147678] [ 997] 1000 997 190138 156 242 4
4271 0 krunner
[ 6342.147680] [ 999] 1000 999 928816 13708 461 5
17112 0 plasmashell
[ 6342.147682] [ 1000] 1000 1000 172107 144 179 4
1034 0 polkit-kde-auth
[ 6342.147683] [ 1001] 1000 1001 129793 58 167 3
934 0 xembedsniproxy
[ 6342.147685] [ 1006] 1000 1006 127736 476 167 3
860 0 org_kde_powerde
[ 6342.147686] [ 1007] 1000 1007 267637 360 284 4
4630 0 korgac
[ 6342.147688] [ 1077] 1000 1077 125471 67 93 4
1316 0 pulseaudio
[ 6342.147689] [ 1079] 133 1079 44462 0 22 3
71 0 rtkit-daemon
[ 6342.147691] [ 1093] 1000 1093 199345 71 65 4
5741 0 xiccd
[ 6342.147692] [ 1116] 1000 1116 21171 0 43 3
175 0 gconf-helper
[ 6342.147694] [ 1118] 1000 1118 16483 25 34 3
198 0 gconfd-2
[ 6342.147695] [ 1123] 1000 1123 70044 0 38 3
225 0 gvfsd
[ 6342.147697] [ 1125] 1000 1125 226815 111 183 4
1213 0 kactivitymanage
[ 6342.147701] [ 1130] 1000 1130 86949 69 35 3
744 0 gvfsd-fuse
[ 6342.147702] [ 1146] 1000 1146 129379 146 164 3
948 0 kactivitymanage
[ 6342.147704] [ 1150] 1000 1150 129380 130 158 4
956 0 kactivitymanage
[ 6342.147705] [ 1195] 1000 1195 10293 44 24 3
149 0 obexd
[ 6342.147707] [ 1197] 1000 1197 128967 247 166 3
944 0 akonadi_control
[ 6342.147709] [ 1206] 1000 1206 682254 0 210 6
3962 0 akonadiserver
[ 6342.147710] [ 1211] 1000 1211 146367 5703 109 3
19313 0 mysqld
[ 6342.147712] [ 1261] 1000 1261 172241 0 178 4
1600 0 akonadi_akonote
[ 6342.147714] [ 1262] 1000 1262 328923 0 321 4
4533 0 akonadi_archive
[ 6342.147715] [ 1263] 1000 1263 193336 0 187 3
1156 0 akonadi_birthda
[ 6342.147717] [ 1264] 1000 1264 169386 6 177 4
1586 0 akonadi_contact
[ 6342.147718] [ 1265] 1000 1265 196042 380 192 4
1077 0 akonadi_followu
[ 6342.147719] [ 1266] 1000 1266 195484 884 195 4
1081 0 akonadi_ical_re
[ 6342.147721] [ 1269] 1000 1269 148974 114 207 4
2098 0 kwalletd5
[ 6342.147723] [ 1271] 1000 1271 178961 0 187 4
1328 0 akonadi_indexin
[ 6342.147724] [ 1272] 1000 1272 172239 0 179 4
1608 0 akonadi_maildir
[ 6342.147726] [ 1275] 1000 1275 229896 260 192 4
1551 0 akonadi_maildis
[ 6342.147727] [ 1278] 1000 1278 356078 0 344 5
5204 0 akonadi_mailfil
[ 6342.147729] [ 1279] 1000 1279 169395 9 179 4
1077 0 akonadi_migrati
[ 6342.147730] [ 1281] 1000 1281 237225 0 271 4
3373 0 akonadi_newmail
[ 6342.147731] [ 1283] 1000 1283 296254 1054 280 4
3818 0 akonadi_notes_a
[ 6342.147733] [ 1284] 1000 1284 292138 501 312 4
3846 0 akonadi_sendlat
[ 6342.147736] [ 1486] 1000 1486 129345 71 170 3
977 0 kuiserver5
[ 6342.147738] [ 1575] 1000 1575 157148 10329 220 4
2473 0 konsole
[ 6342.147739] [ 1579] 1000 1579 4021 56 13 3
164 0 bash
[ 6342.147741] [ 1582] 0 1582 17563 23 38 3
248 0 sudo
[ 6342.147742] [ 1583] 0 1583 3425 53 11 3
118 0 duperemove.sh
[ 6342.147744] [ 4060] 0 4060 168501 92579 203 3
24 0 duperemove
[ 6342.147746] Out of memory: Kill process 4060 (duperemove) score 21 or
sacrifice child
[ 6342.147754] Killed process 4060 (duperemove) total-vm:674004kB,
anon-rss:367672kB, file-rss:2644kB, shmem-rss:0kB
Any ideas? The process with the highest total_vm is plasmashell, at 928816
pages, which is about 3.5 GiB of virtual address space (total_vm is counted
in 4 KiB pages, not KiB), though far less than that is resident.
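A quick sanity check on those units (an editor's sketch, assuming the standard
4 KiB page size on x86_64): the total_vm and rss columns in the OOM process
table are page counts, and converting them reproduces the kill message exactly:

```python
# total_vm and rss in the OOM process table are counted in 4 KiB pages,
# not in KiB or bytes.
PAGE_KB = 4

def pages_to_kib(pages):
    """Convert a page count from the OOM table to KiB."""
    return pages * PAGE_KB

def pages_to_mib(pages):
    """Convert a page count from the OOM table to MiB."""
    return pages * PAGE_KB / 1024

# duperemove's total_vm of 168501 pages matches the kill message exactly:
print(pages_to_kib(168501))         # 674004 -- cf. "total-vm:674004kB"
# plasmashell's total_vm is far more than 900 MB once the units are right:
print(round(pages_to_mib(928816)))  # 3628 MiB of virtual address space
```

This also shows why duperemove was the victim despite not being the largest
process: the badness score is driven by resident and swapped pages, and its
rss of 92579 pages (~362 MiB) was the largest anonymous footprint at the time.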
Niccolò Belli
Thread overview: 42+ messages
2016-11-06 13:30 Announcing btrfs-dedupe James Pharaoh
2016-11-07 14:02 ` David Sterba
2016-11-07 17:48 ` Mark Fasheh
2016-11-07 20:54 ` Adam Borowski
2016-11-08 2:17 ` Darrick J. Wong
2016-11-08 18:59 ` Mark Fasheh
2016-11-08 19:47 ` Darrick J. Wong
2016-11-08 19:47 ` [Ocfs2-devel] " Darrick J. Wong
2016-11-09 15:02 ` David Sterba
2016-11-08 2:40 ` Christoph Anton Mitterer
2016-11-08 6:11 ` James Pharaoh
2016-11-08 13:26 ` Austin S. Hemmelgarn
2016-11-08 16:57 ` Darrick J. Wong
2016-11-08 17:04 ` Austin S. Hemmelgarn
2016-11-08 18:49 ` Mark Fasheh
2016-11-07 17:59 ` Mark Fasheh
2016-11-07 18:49 ` James Pharaoh
2016-11-07 18:53 ` James Pharaoh
2016-11-14 18:07 ` Zygo Blaxell
2016-11-14 18:22 ` James Pharaoh
2016-11-14 18:39 ` Austin S. Hemmelgarn
2016-11-14 19:51 ` Zygo Blaxell
2016-11-14 19:56 ` Austin S. Hemmelgarn
2016-11-14 21:10 ` Zygo Blaxell
2016-11-15 12:26 ` Austin S. Hemmelgarn
2016-11-15 17:52 ` Zygo Blaxell
2016-11-16 22:24 ` Niccolò Belli
2016-11-17 3:01 ` Zygo Blaxell
2016-11-18 10:36 ` Niccolò Belli
2016-11-14 20:07 ` James Pharaoh
2016-11-14 21:22 ` Zygo Blaxell
2016-11-14 18:43 ` Zygo Blaxell
2016-11-08 11:06 ` Niccolò Belli
2016-11-08 11:38 ` James Pharaoh
2016-11-08 16:57 ` Niccolò Belli
2016-11-08 16:58 ` James Pharaoh
2016-11-08 17:08 ` Niccolò Belli
2016-11-14 18:27 ` Zygo Blaxell
2016-11-08 22:36 ` Saint Germain
2016-11-09 11:24 ` Niccolò Belli
2016-11-09 12:47 ` Saint Germain
2016-11-13 12:45 ` James Pharaoh