From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Chris Mason <clm@fb.com>, <linux-btrfs@vger.kernel.org>,
	Liu Bo <bo.li.liu@oracle.com>,
	Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Subject: Re: [PATCH v8 10/27] btrfs: dedupe: Add basic tree structure for on-disk dedupe method
Date: Fri, 25 Mar 2016 09:59:39 +0800
Message-ID: <56F49B8B.4000701@cn.fujitsu.com>
In-Reply-To: <20160324205807.ctxig7c2r4rfoowp@floor.thefacebook.com>



Chris Mason wrote on 2016/03/24 16:58 -0400:
> On Tue, Mar 22, 2016 at 09:35:35AM +0800, Qu Wenruo wrote:
>> Introduce a new tree, dedupe tree to record on-disk dedupe hash.
>> As a persist hash storage instead of in-memeory only implement.
>>
>> Unlike Liu Bo's implement, in this version we won't do hack for
>> bytenr -> hash search, but add a new type, DEDUP_BYTENR_ITEM for such
>> search case, just like in-memory backend.
>
> Thanks for refreshing this again, I'm starting to go through the disk
> format in more detail.
>
>>
>> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/ctree.h             | 63 +++++++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/dedupe.h            |  5 ++++
>>   fs/btrfs/disk-io.c           |  1 +
>>   include/trace/events/btrfs.h |  3 ++-
>>   4 files changed, 70 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 022ab61..bed9273 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
>>   /* tracks free space in block groups. */
>>   #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
>>
>> +/* on-disk dedupe tree (EXPERIMENTAL) */
>> +#define BTRFS_DEDUPE_TREE_OBJECTID 11ULL
>> +
>>   /* device stats in the device tree */
>>   #define BTRFS_DEV_STATS_OBJECTID 0ULL
>>
>> @@ -508,6 +511,7 @@ struct btrfs_super_block {
>>    * ones specified below then we will fail to mount
>>    */
>>   #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE	(1ULL << 0)
>> +#define BTRFS_FEATURE_COMPAT_RO_DEDUPE		(1ULL << 1)
>>
>>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
>>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
>> @@ -537,7 +541,8 @@ struct btrfs_super_block {
>>   #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR		0ULL
>>
>>   #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
>> -	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE)
>> +	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
>> +	 BTRFS_FEATURE_COMPAT_RO_DEDUPE)
>>
>>   #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
>>   #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
>> @@ -959,6 +964,42 @@ struct btrfs_csum_item {
>>   	u8 csum;
>>   } __attribute__ ((__packed__));
>>
>> +/*
>> + * Objectid: 0
>> + * Type: BTRFS_DEDUPE_STATUS_ITEM_KEY
>> + * Offset: 0
>> + */
>> +struct btrfs_dedupe_status_item {
>> +	__le64 blocksize;
>> +	__le64 limit_nr;
>> +	__le16 hash_type;
>> +	__le16 backend;
>> +} __attribute__ ((__packed__));
>> +
>> +/*
>> + * Objectid: Last 64 bit of the hash
>> + * Type: BTRFS_DEDUPE_HASH_ITEM_KEY
>> + * Offset: Bytenr of the hash
>> + *
>> + * Used for hash <-> bytenr search
>> + */
>> +struct btrfs_dedupe_hash_item {
>> +	/* length of dedupe range */
>> +	__le32 len;
>> +
>> +	/* Hash follows */
>> +} __attribute__ ((__packed__));
>
> Are you storing the entire hash, or just the parts not represented in
> the key?  I'd like to keep the on-disk part as compact as possible for
> this part.

Currently, it's the entire hash.

More details can be found in another mail:
http://article.gmane.org/gmane.comp.file-systems.btrfs/54432

Although truncating the duplicated last 8 bytes (64 bits) is OK with me,
I still prefer the current implementation, as a single memcpy() is simpler.
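
To make the trade-off concrete, here is a rough userspace sketch, not code
from the patch. It assumes a 32-byte digest (e.g. SHA-256) and uses host byte
order for the key tail, while real on-disk keys are little-endian. All
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32 /* assumption: a 32-byte digest such as SHA-256 */

/* Key objectid: the last 64 bits of the hash (host byte order here). */
uint64_t hash_tail(const uint8_t hash[HASH_LEN])
{
	uint64_t tail;

	memcpy(&tail, hash + HASH_LEN - sizeof(tail), sizeof(tail));
	return tail;
}

/* Current scheme: one memcpy() of the whole hash. Returns bytes stored. */
size_t store_full(uint8_t *item, const uint8_t hash[HASH_LEN])
{
	memcpy(item, hash, HASH_LEN);
	return HASH_LEN;
}

/* Truncated scheme: skip the 8 bytes already present in the key. */
size_t store_truncated(uint8_t *item, const uint8_t hash[HASH_LEN])
{
	memcpy(item, hash, HASH_LEN - sizeof(uint64_t));
	return HASH_LEN - sizeof(uint64_t);
}

/* Reading a truncated item back needs the key to rebuild the full hash. */
void rebuild_hash(uint8_t out[HASH_LEN], const uint8_t *item,
		  uint64_t objectid)
{
	memcpy(out, item, HASH_LEN - sizeof(objectid));
	memcpy(out + HASH_LEN - sizeof(objectid), &objectid, sizeof(objectid));
}

/* Round trip: truncated payload plus key tail gives back the hash. */
int roundtrip_ok(const uint8_t hash[HASH_LEN])
{
	uint8_t item[HASH_LEN], out[HASH_LEN];
	size_t n = store_truncated(item, hash);

	assert(n == HASH_LEN - sizeof(uint64_t));
	rebuild_hash(out, item, hash_tail(hash));
	return memcmp(out, hash, HASH_LEN) == 0;
}
```

Under these assumptions, truncation saves 8 bytes per hash item, at the cost
of a second copy whenever the full hash has to be reassembled.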

>
>> +
>> +/*
>> + * Objectid: bytenr
>> + * Type: BTRFS_DEDUPE_BYTENR_ITEM_KEY
>> + * offset: Last 64 bit of the hash
>> + *
>> + * Used for bytenr <-> hash search (for free_extent)
>> + * all its content is hash.
>> + * So no special item struct is needed.
>> + */
>> +
>
> Can we do this instead with a backref from the extent?  It'll save us a
> huge amount of IO as we delete things.

That's the original implementation from Liu Bo.

The problem is that it changes the data backref rules (originally, only
an EXTENT_DATA item can create a data backref), and it would make dedupe
INCOMPAT rather than the current RO_COMPAT.
So I'd really rather not change the data backref rule.
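
For readers less familiar with the feature bits, the difference can be
sketched as below. This is a simplified userspace model of the mount-time
check, not the kernel code: an unknown INCOMPAT bit makes the mount fail,
while an unknown RO_COMPAT bit still allows a read-only mount. The enum and
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_COMPAT_RO_FREE_SPACE_TREE	(1ULL << 0)
#define FEATURE_COMPAT_RO_DEDUPE		(1ULL << 1)

static_assert((FEATURE_COMPAT_RO_FREE_SPACE_TREE &
	       FEATURE_COMPAT_RO_DEDUPE) == 0,
	      "feature bits must not overlap");

enum mount_result { MOUNT_RW_OK, MOUNT_RO_ONLY, MOUNT_REFUSED };

/*
 * compat_ro/incompat: flags found in the super block.
 * *_supp: flags this (hypothetical) kernel understands.
 */
enum mount_result check_features(uint64_t compat_ro, uint64_t incompat,
				 uint64_t compat_ro_supp,
				 uint64_t incompat_supp)
{
	/* Any unknown INCOMPAT bit: refuse to mount at all. */
	if (incompat & ~incompat_supp)
		return MOUNT_REFUSED;
	/* Any unknown RO_COMPAT bit: reading is safe, writing is not. */
	if (compat_ro & ~compat_ro_supp)
		return MOUNT_RO_ONLY;
	return MOUNT_RW_OK;
}
```

With dedupe as RO_COMPAT, a kernel that only knows the free space tree bit
would get MOUNT_RO_ONLY from this model on a dedupe-enabled filesystem, rather
than being refused outright.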


If the only goal is to reduce on-disk space, just dropping the hash and
making DEDUPE_BYTENR_ITEM carry no data would be good enough, since
(bytenr, DEDUPE_BYTENR_ITEM) can locate the hash uniquely.

In fact, no code actually checks the hash in the dedupe bytenr item; it all
just swaps objectid and offset, resets the type, and searches for
DEDUPE_HASH_ITEM.

So it's OK to omit the hash.
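
The key swap described above looks roughly like this. It is a standalone
sketch: the struct is a simplified stand-in for a btrfs key, and the two type
values are placeholders, not the real key numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define DEDUPE_HASH_ITEM_KEY	1
#define DEDUPE_BYTENR_ITEM_KEY	2

/* Simplified stand-in for a btrfs key. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/*
 * (bytenr, DEDUPE_BYTENR_ITEM, hash tail)
 *   -> (hash tail, DEDUPE_HASH_ITEM, bytenr)
 *
 * Only the key is used; the item payload is never read.
 */
struct key bytenr_key_to_hash_key(struct key bytenr_key)
{
	struct key hash_key;

	assert(bytenr_key.type == DEDUPE_BYTENR_ITEM_KEY);
	hash_key.objectid = bytenr_key.offset;	/* last 64 bits of the hash */
	hash_key.type = DEDUPE_HASH_ITEM_KEY;
	hash_key.offset = bytenr_key.objectid;	/* bytenr */
	return hash_key;
}
```

Since the resulting key is all that is needed to search for the hash item, the
payload of the bytenr item carries no extra information for this lookup.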

Thanks,
Qu

>
> -chris
>
>


