From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <dsterba@suse.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement
Date: Thu, 27 Aug 2015 08:52:25 +0800
Message-ID: <55DE5F49.9@cn.fujitsu.com>
In-Reply-To: <55BF15DF.9080805@cn.fujitsu.com>



Qu Wenruo wrote on 2015/08/03 15:18 +0800:
>
>
> David Sterba wrote on 2015/07/28 16:50 +0200:
>> On Tue, Jul 28, 2015 at 04:30:36PM +0800, Qu Wenruo wrote:
>>> Although Liu Bo has already submitted V10 of his deduplication
>>> implementation, here is another implementation of it.
>>
>> What's the reason to start another implementation?
>>
>>> [[CORE FEATURES]]
>>> The main design concept is the following:
>>> 1) Controllable memory usage
>>> 2) No guarantee to dedup every duplication.
>>> 3) No on-disk format change or new format
>>> 4) Page size level deduplication
>>
>> 1 and 2) are good goals, allow usability tradeoffs
>>
>> 3) so the dedup hash is stored only for the mount life time. Though it
>> avoids the on-disk format changes, it also reduces the effectiveness. It
>> is possible to "seed" the in-memory tree by reading all files that
>> contain potentially duplicate blocks but one would have to do that after
>> each mount.
>>
>> 4) page-sized dedup chunk is IMHO way too small. Although it can achieve
>> high dedup rate, the metadata can potentially explode and cause more
>> fragmentation.
>>
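A rough back-of-the-envelope sketch of the metadata concern in point 4, under stated assumptions: the 48-byte per-entry cost (a 32-byte digest plus 16 bytes of extent information) and the 128 KiB comparison chunk size are illustrative figures only and are not taken from either patchset. It just counts how many hash entries a fully populated index would need per GiB of data.

```python
def dedup_index_cost(data_bytes, chunk_bytes, entry_bytes):
    """Return (entries, index_bytes) for a fully populated hash index."""
    # Ceiling division: a trailing partial chunk still needs its own entry.
    entries = -(-data_bytes // chunk_bytes)
    return entries, entries * entry_bytes


GiB = 1 << 30
ENTRY_BYTES = 48  # assumed: 32-byte digest + 16 bytes of extent info

# 4 KiB is the page-sized chunk under discussion; 128 KiB is a
# hypothetical larger chunk used only for comparison.
for chunk in (4 << 10, 128 << 10):
    entries, size = dedup_index_cost(GiB, chunk, ENTRY_BYTES)
    print(f"{chunk >> 10:>4} KiB chunks: {entries} entries, "
          f"{size >> 10} KiB of index per GiB of data")
```

With these assumed numbers the entry count scales inversely with the chunk size, which is the trade-off between dedup rate and metadata volume raised above.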
>>> Implementation details include the following:
>>> 1) LRU hash maps to limit the memory usage
>>>     The hash -> extent mapping is controlled by LRU (or unlimited), to
>>>     get controllable memory usage (tunable by mount option)
>>>     along with controllable read/write overhead for hash searching.
>>
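To make item 1 concrete, here is a minimal userspace sketch of an LRU-bounded hash -> extent map. It is not the kernel implementation from the patchset: the class name `LRUHashMap`, the `limit` parameter, and the use of a plain byte offset (`bytenr`) as the extent value are all assumptions for illustration. `limit=None` models the "unlimited" mode.

```python
from collections import OrderedDict


class LRUHashMap:
    """Hypothetical hash -> extent map with an optional entry limit."""

    def __init__(self, limit=None):
        self.limit = limit          # None means unlimited
        self._map = OrderedDict()   # oldest entry first, newest last

    def add(self, digest, bytenr):
        # Insert or refresh the entry and mark it most recently used.
        self._map[digest] = bytenr
        self._map.move_to_end(digest)
        # Evict the least recently used entry once over the limit.
        if self.limit is not None and len(self._map) > self.limit:
            self._map.popitem(last=False)

    def search(self, digest):
        # A hit refreshes the entry; a miss returns None.
        bytenr = self._map.get(digest)
        if bytenr is not None:
            self._map.move_to_end(digest)
        return bytenr

    def __len__(self):
        return len(self._map)
```

Because eviction only drops a hash, never data, a bounded map simply misses some duplicates, which matches design goal 2 (no guarantee to dedup every duplication).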
>> In Liu Bo's series, I rejected the mount options as an interface and
>> will do that here as well. His patches added a dedup ioctl to (at least)
>> enable/disable the dedup.
> BTW, would you please give me some reasons why it's not a good idea to
> use mount options to trigger/change dedup options?
>
> Thanks,
> Qu

Ping?
No other comment?

Thanks,
Qu
>>
>>> 2) Reuse existing ordered_extent infrastructure
>>>     For a duplicated page, it will still submit an ordered_extent (only one
>>>     page long), to make full use of the existing infrastructure,
>>>     but it will not submit a bio.
>>>     This reduces the number of code lines.
>>
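The flow described in item 2 can be sketched in userspace roughly as follows. This is a simplified model, not the patchset's code: the function name `write_pages`, the tuple layouts, the bump allocator, the plain dict standing in for the hash map, and the choice of SHA-256 are all assumptions. Every page gets an ordered-extent record, but only pages whose hash is not yet known produce a bio.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


def write_pages(data, hash_map, next_bytenr=0):
    """Return (ordered_extents, bios) for a buffered write of `data`.

    ordered_extents: (file_offset, bytenr, length) for every page.
    bios:            (bytenr, page) only for pages that must hit disk.
    """
    ordered, bios = [], []
    for off in range(0, len(data), PAGE_SIZE):
        page = data[off:off + PAGE_SIZE]
        digest = hashlib.sha256(page).digest()
        bytenr = hash_map.get(digest)
        if bytenr is None:
            # Miss: allocate space, remember the hash, submit a bio.
            bytenr = next_bytenr
            next_bytenr += PAGE_SIZE
            hash_map[digest] = bytenr
            bios.append((bytenr, page))
        # Hit or miss, the page still goes through the ordered-extent path.
        ordered.append((off, bytenr, len(page)))
    return ordered, bios
```

For example, writing three pages where the first and third are identical yields three ordered-extent records but only two bios, with the first and third records pointing at the same `bytenr`.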
>>> 3) Mount option to control dedup behavior
>>>     Deduplication and its memory usage can be tuned by mount option.
>>>     No need for a dedicated ioctl interface.
>>
>> I'd say the other way around.
>>
>>>     Furthermore, it can easily support a BTRFS_INODE flag like
>>>     compression, to allow further per-file dedup fine tuning.
>>>
>>> [[TODO]]
>>> 3. Add support for per file dedup flags
>>>     Much easier, just like compression flags.
>>
>> How is that supposed to work? You mean add per-file flags/attributes to
>> mark a file so it fills the dedup hash tree and is actively going to be
>> deduped against other files?
>>
>>> Any early review or advice/question on the design is welcomed.
>>
>> The implementation looks simpler than Liu Bo's, but (IMHO) at the
>> cost of reduced functionality.
>>
>> Ideally, we merge one patchset with all desired functionality. Some kind
>> of control interface is needed not only to enable/disable the whole
>> feature but to affect the trade-offs (memory consumption vs dedup
>> efficiency vs speed), and that in a way that's flexible according to
>> immediate needs.
>>
>> The persistent dedup hash storage is not mandatory in theory, so we
>> could implement an "in-memory tree only" mode, ie. what you're
>> proposing, on top of Liu Bo's patchset.
>>


Thread overview: 19+ messages
2015-07-28  8:30 [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 01/14] btrfs: file-item: Introduce btrfs_setup_file_extent function Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 02/14] btrfs: Use btrfs_fill_file_extent to reduce duplicated codes Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 03/14] btrfs: dedup: Add basic init/free functions for inband dedup Qu Wenruo
2015-07-28  8:30 ` [PATCH RFC 04/14] btrfs: dedup: Add internal add/remove/search function for btrfs dedup Qu Wenruo
2015-07-28  8:56 ` [PATCH RFC 00/14] Yet Another In-band(online) deduplication implement Qu Wenruo
2015-07-28  9:52 ` Liu Bo
2015-07-29  2:09   ` Qu Wenruo
2015-07-28 14:50 ` David Sterba
2015-07-29  1:07   ` Chris Mason
2015-07-29  1:47   ` Qu Wenruo
2015-07-29  2:40     ` Liu Bo
2015-08-03  7:18   ` Qu Wenruo
2015-08-27  0:52     ` Qu Wenruo [this message]
2015-08-27  9:14     ` David Sterba
2015-08-31  1:13       ` Qu Wenruo
2015-09-22 15:07         ` David Sterba
2015-09-23  7:16           ` Qu Wenruo
2015-07-28  9:14 Qu Wenruo
