* [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework
@ 2016-01-14  5:57 Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 01/18] btrfs: dedup: Introduce dedup framework and its header Qu Wenruo
                   ` (17 more replies)
  0 siblings, 18 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs

This updated version of inband de-duplication has the following features:
1) ONE unified dedup framework
   Most of its code is contained in dedup.c, exporting only minimal
   interfaces to its callers.
   Reviewers and future developers will benefit from the unified
   framework.

2) TWO different back-ends with different trade-offs
   One is an improved version of the previous Fujitsu in-memory-only
   dedup.
   The other is an enhanced dedup implementation from Liu Bo, with its
   tree structure changed to handle bytenr -> hash lookups for hash
   deletion, without the hideous data backref hack.

3) Ioctl interface with persistent dedup status
   As advised by David, we now use an ioctl to enable/disable dedup.
   Later we will add a per-file/dir dedup disable command to that
   ioctl.

   Dedup status is now recorded in the first item of the dedup tree,
   so, just like quota, once enabled, no extra ioctl is needed on the
   next mount.

4) Ability to disable dedup for given dirs/files
   It works just like the compression property, implemented by adding
   a new xattr.

For stability, we have run xfstests with *DEDUP ENABLED*; apart from
some ENOSPC test cases that would take forever to fill the filesystem,
all tests pass and dedup causes no problems.

TODO:
1) Support compression for the hash-miss case
   A hash miss currently goes through the non-compressed write routine,
   so the data is not compressed even when it could be.
   This may require changing the on-disk format of the on-disk backend.

2) Add extent-by-extent comparison to allow faster, collision-prone
   hash algorithms
   The current SHA-256 hash is quite slow, and on some old (5-year-old)
   CPUs the CPU, rather than the IO, may even become the bottleneck.
   A faster hash will inevitably produce collisions, so we need
   byte-wise extent comparison before we introduce a new dedup
   algorithm.
   

Changelog:
v2:
  Totally reworked to handle multiple backends.
v3:
  Fix a stupid but deadly on-disk backend bug.
  Handle the corner case of multiple hashes at the same bytenr, fixing
  a transaction abort.
  Increase the dedup rate by enhancing the delayed ref handler for
  both backends.
  Move dedup_add() to run_delayed_ref() time, fixing a transaction
  abort.
  Raise the dedup block size upper limit to 8M.
v4:
  Add a dedup prop for disabling dedup on given files/dirs.
  Merge inmem_search() and ondisk_search() into generic_search() to
  save some code.
  Fix another delayed_ref related bug.
  Use the same mutex for both the in-memory and on-disk backends.
  Move dedup_add() back to btrfs_finish_ordered_io() to increase the
  dedup rate.

Qu Wenruo (6):
  btrfs: delayed-ref: Add support for atomic increasing extent ref
  btrfs: dedup: Add basic tree structure for on-disk dedup method
  btrfs: dedup: Introduce interfaces to resume and cleanup dedup info
  btrfs: dedup: Add support for on-disk hash search
  btrfs: dedup: Add support to delete hash for on-disk backend
  btrfs: dedup: Add support for adding hash for on-disk backend

Wang Xiaoguang (12):
  btrfs: dedup: Introduce dedup framework and its header
  btrfs: dedup: Introduce function to initialize dedup info
  btrfs: dedup: Introduce function to add hash into in-memory tree
  btrfs: dedup: Introduce function to remove hash from in-memory tree
  btrfs: dedup: Introduce function to search for an existing hash
  btrfs: dedup: Implement btrfs_dedup_calc_hash interface
  btrfs: ordered-extent: Add support for dedup
  btrfs: dedup: Inband in-memory only de-duplication implement
  btrfs: dedup: Add ioctl for inband deduplication
  btrfs: dedup: add an inode nodedup flag
  btrfs: dedup: add a property handler for online dedup
  btrfs: dedup: add per-file online dedup control

 fs/btrfs/Makefile            |   2 +-
 fs/btrfs/ctree.h             |  76 +++-
 fs/btrfs/dedup.c             | 957 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dedup.h             | 154 +++++++
 fs/btrfs/delayed-ref.c       |  25 +-
 fs/btrfs/delayed-ref.h       |   2 +-
 fs/btrfs/disk-io.c           |  28 +-
 fs/btrfs/disk-io.h           |   1 +
 fs/btrfs/extent-tree.c       |  40 +-
 fs/btrfs/extent_io.c         |  30 +-
 fs/btrfs/extent_io.h         |  15 +
 fs/btrfs/inode.c             | 314 +++++++++++++-
 fs/btrfs/ioctl.c             |  69 +++-
 fs/btrfs/ordered-data.c      |  33 +-
 fs/btrfs/ordered-data.h      |  13 +
 fs/btrfs/props.c             |  40 ++
 include/trace/events/btrfs.h |   3 +-
 include/uapi/linux/btrfs.h   |  23 ++
 18 files changed, 1779 insertions(+), 46 deletions(-)
 create mode 100644 fs/btrfs/dedup.c
 create mode 100644 fs/btrfs/dedup.h

-- 
2.7.0





* [PATCH v4 01/18] btrfs: dedup: Introduce dedup framework and its header
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info Qu Wenruo
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce the header for the btrfs online (write time) de-duplication
framework.

The new de-duplication framework is going to support two different
dedup methods and one dedup hash algorithm.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |   3 ++
 fs/btrfs/dedup.h | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)
 create mode 100644 fs/btrfs/dedup.h

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c5f40dc..2132fa5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1860,6 +1860,9 @@ struct btrfs_fs_info {
 	struct list_head pinned_chunks;
 
 	int creating_free_space_tree;
+
+	/* reference to inband de-duplication info */
+	struct btrfs_dedup_info *dedup_info;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
new file mode 100644
index 0000000..f0edc76
--- /dev/null
+++ b/fs/btrfs/dedup.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_DEDUP__
+#define __BTRFS_DEDUP__
+
+#include <linux/btrfs.h>
+#include <crypto/hash.h>
+
+/*
+ * Dedup storage backend
+ * On disk is persistent storage, but its overhead is large
+ * In memory is fast, but loses all hashes on umount
+ */
+#define BTRFS_DEDUP_BACKEND_INMEMORY		0
+#define BTRFS_DEDUP_BACKEND_ONDISK		1
+#define BTRFS_DEDUP_BACKEND_LAST		2
+
+/* Dedup block size limit and default value */
+#define BTRFS_DEDUP_BLOCKSIZE_MAX	(8 * 1024 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_MIN	(16 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_DEFAULT	(32 * 1024)
+
+/* Hash algorithm; only SHA-256 is supported so far */
+#define BTRFS_DEDUP_HASH_SHA256		0
+
+static int btrfs_dedup_sizes[] = { 32 };
+
+/*
+ * For caller outside of dedup.c
+ *
+ * Different dedup backends should have their own hash structure
+ */
+struct btrfs_dedup_hash {
+	u64 bytenr;
+	u32 num_bytes;
+
+	/* last field is a variable length array of dedup hash */
+	u8 hash[];
+};
+
+struct btrfs_dedup_info {
+	/* dedup blocksize */
+	u64 blocksize;
+	u16 backend;
+	u16 hash_type;
+
+	struct crypto_shash *dedup_driver;
+	struct mutex lock;
+
+	/* following members are only used in in-memory dedup mode */
+	struct rb_root hash_root;
+	struct rb_root bytenr_root;
+	struct list_head lru_list;
+	u64 limit_nr;
+	u64 current_nr;
+};
+
+struct btrfs_trans_handle;
+
+int btrfs_dedup_hash_size(u16 type);
+struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type);
+
+/*
+ * Initial inband dedup info
+ * Called at dedup enable time.
+ */
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+		       u64 blocksize, u64 limit);
+
+/*
+ * Disable dedup and invalidate all its dedup data.
+ * Called at dedup disable time.
+ */
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info);
+
+/*
+ * Calculate hash for dedup.
+ * Caller must ensure [start, start + dedup_bs) has valid data.
+ */
+int btrfs_dedup_calc_hash(struct btrfs_root *root, struct inode *inode,
+			  u64 start, struct btrfs_dedup_hash *hash);
+
+/*
+ * Search for duplicated extents by calculated hash
+ * Caller must call btrfs_dedup_calc_hash() first to get the hash.
+ *
+ * @inode: the inode we are writing to
+ * @file_pos: offset inside the inode
+ * As we will increase extent ref immediately after a hash match,
+ * we need @file_pos and @inode in this case.
+ *
+ * Return > 0 for a hash match, and the extent ref will be
+ * *INCREASED*, and hash->bytenr/num_bytes will record the existing
+ * extent data.
+ * Return 0 for a hash miss; nothing is done.
+ */
+int btrfs_dedup_search(struct inode *inode, u64 file_pos,
+		       struct btrfs_dedup_hash *hash);
+
+/* Add a dedup hash into dedup info */
+int btrfs_dedup_add(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		    struct btrfs_dedup_hash *hash);
+
+/* Remove a dedup hash from dedup info */
+int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		    u64 bytenr);
+#endif
-- 
2.7.0





* [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 01/18] btrfs: dedup: Introduce dedup framework and its header Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14 21:33   ` kbuild test robot
  2016-01-14  5:57 ` [PATCH v4 03/18] btrfs: dedup: Introduce function to add hash into in-memory tree Qu Wenruo
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Add a generic function to initialize dedup info.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/Makefile |  2 +-
 fs/btrfs/dedup.c  | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/dedup.h  | 14 ++++++--
 3 files changed, 109 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/dedup.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17..a6207ff 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-	   uuid-tree.o props.o hash.o free-space-tree.o
+	   uuid-tree.o props.o hash.o free-space-tree.o dedup.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
new file mode 100644
index 0000000..10a1db4
--- /dev/null
+++ b/fs/btrfs/dedup.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include "ctree.h"
+#include "dedup.h"
+#include "btrfs_inode.h"
+#include "transaction.h"
+#include "delayed-ref.h"
+
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+		       u64 blocksize, u64 limit)
+{
+	struct btrfs_dedup_info *dedup_info;
+	int ret = 0;
+
+	/* Sanity check */
+	if (blocksize > BTRFS_DEDUP_BLOCKSIZE_MAX ||
+	    blocksize < BTRFS_DEDUP_BLOCKSIZE_MIN ||
+	    blocksize < fs_info->tree_root->sectorsize ||
+	    !is_power_of_2(blocksize))
+		return -EINVAL;
+	if (type >= ARRAY_SIZE(btrfs_dedup_sizes))
+		return -EINVAL;
+	if (backend >= BTRFS_DEDUP_BACKEND_LAST)
+		return -EINVAL;
+	if (backend == BTRFS_DEDUP_BACKEND_INMEMORY && limit == 0)
+		limit = 4096; /* default value */
+	if (backend == BTRFS_DEDUP_BACKEND_ONDISK && limit != 0)
+		limit = 0;
+
+	if (fs_info->dedup_info) {
+		dedup_info = fs_info->dedup_info;
+
+		/* Check if we are re-enabling with a different dedup config */
+		if (dedup_info->blocksize != blocksize ||
+		    dedup_info->hash_type != type ||
+		    dedup_info->backend != backend) {
+			btrfs_dedup_disable(fs_info);
+			goto enable;
+		}
+
+		/* On-the-fly limit changes are OK */
+		mutex_lock(&dedup_info->lock);
+		fs_info->dedup_info->limit_nr = limit;
+		mutex_unlock(&dedup_info->lock);
+		return 0;
+	}
+
+enable:
+	fs_info->dedup_info = kzalloc(sizeof(*dedup_info), GFP_NOFS);
+	if (!fs_info->dedup_info)
+		return -ENOMEM;
+
+	dedup_info = fs_info->dedup_info;
+
+	dedup_info->hash_type = type;
+	dedup_info->backend = backend;
+	dedup_info->blocksize = blocksize;
+	dedup_info->limit_nr = limit;
+
+	/* Only support SHA256 yet */
+	dedup_info->dedup_driver = crypto_alloc_shash("sha256", 0, 0);
+	if (IS_ERR(dedup_info->dedup_driver)) {
+		btrfs_err(fs_info, "failed to init sha256 driver");
+		ret = PTR_ERR(dedup_info->dedup_driver);
+		goto out;
+	}
+
+	dedup_info->hash_root = RB_ROOT;
+	dedup_info->bytenr_root = RB_ROOT;
+	dedup_info->current_nr = 0;
+	INIT_LIST_HEAD(&dedup_info->lru_list);
+	mutex_init(&dedup_info->lock);
+
+	fs_info->dedup_info = dedup_info;
+out:
+	if (ret < 0) {
+		kfree(dedup_info);
+		fs_info->dedup_info = NULL;
+	}
+	return ret;
+}
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index f0edc76..a859ad8 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -73,8 +73,18 @@ struct btrfs_dedup_info {
 
 struct btrfs_trans_handle;
 
-int btrfs_dedup_hash_size(u16 type);
-struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type);
+static inline int btrfs_dedup_hash_size(u16 type)
+{
+	if (WARN_ON(type >= ARRAY_SIZE(btrfs_dedup_sizes)))
+		return -EINVAL;
+	return sizeof(struct btrfs_dedup_hash) + btrfs_dedup_sizes[type];
+}
+
+static inline struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type)
+{
+	return kzalloc(btrfs_dedup_hash_size(type), GFP_NOFS);
+}
+
 
 /*
  * Initial inband dedup info
-- 
2.7.0





* [PATCH v4 03/18] btrfs: dedup: Introduce function to add hash into in-memory tree
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 01/18] btrfs: dedup: Introduce dedup framework and its header Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 04/18] btrfs: dedup: Introduce function to remove hash from " Qu Wenruo
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce the static function inmem_add() to add a hash into the
in-memory tree, and with it implement the btrfs_dedup_add() interface.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 155 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 10a1db4..727424e 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -21,6 +21,25 @@
 #include "transaction.h"
 #include "delayed-ref.h"
 
+struct inmem_hash {
+	struct rb_node hash_node;
+	struct rb_node bytenr_node;
+	struct list_head lru_list;
+
+	u64 bytenr;
+	u32 num_bytes;
+
+	u8 hash[];
+};
+
+static inline struct inmem_hash *inmem_alloc_hash(u16 type)
+{
+	if (WARN_ON(type >= ARRAY_SIZE(btrfs_dedup_sizes)))
+		return NULL;
+	return kzalloc(sizeof(struct inmem_hash) + btrfs_dedup_sizes[type],
+			GFP_NOFS);
+}
+
 int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
 		       u64 blocksize, u64 limit)
 {
@@ -94,3 +113,139 @@ out:
 	}
 	return ret;
 }
+
+static int inmem_insert_hash(struct rb_root *root,
+			     struct inmem_hash *hash, int hash_len)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct inmem_hash *entry = NULL;
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct inmem_hash, hash_node);
+		if (memcmp(hash->hash, entry->hash, hash_len) < 0)
+			p = &(*p)->rb_left;
+		else if (memcmp(hash->hash, entry->hash, hash_len) > 0)
+			p = &(*p)->rb_right;
+		else
+			return 1;
+	}
+	rb_link_node(&hash->hash_node, parent, p);
+	rb_insert_color(&hash->hash_node, root);
+	return 0;
+}
+
+static int inmem_insert_bytenr(struct rb_root *root,
+			       struct inmem_hash *hash)
+{
+	struct rb_node **p = &root->rb_node;
+	struct rb_node *parent = NULL;
+	struct inmem_hash *entry = NULL;
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+		if (hash->bytenr < entry->bytenr)
+			p = &(*p)->rb_left;
+		else if (hash->bytenr > entry->bytenr)
+			p = &(*p)->rb_right;
+		else
+			return 1;
+	}
+	rb_link_node(&hash->bytenr_node, parent, p);
+	rb_insert_color(&hash->bytenr_node, root);
+	return 0;
+}
+
+static void __inmem_del(struct btrfs_dedup_info *dedup_info,
+			struct inmem_hash *hash)
+{
+	list_del(&hash->lru_list);
+	rb_erase(&hash->hash_node, &dedup_info->hash_root);
+	rb_erase(&hash->bytenr_node, &dedup_info->bytenr_root);
+
+	if (!WARN_ON(dedup_info->current_nr == 0))
+		dedup_info->current_nr--;
+
+	kfree(hash);
+}
+
+/*
+ * Insert a hash into in-memory dedup tree
+ * Will evict the least recently used hashes once the limit is exceeded.
+ *
+ * If the hash matches an existing one, we won't insert it, to
+ * save memory.
+ */
+static int inmem_add(struct btrfs_dedup_info *dedup_info,
+		     struct btrfs_dedup_hash *hash)
+{
+	int ret = 0;
+	u16 type = dedup_info->hash_type;
+	struct inmem_hash *ihash;
+
+	ihash = inmem_alloc_hash(type);
+
+	if (!ihash)
+		return -ENOMEM;
+
+	/* Copy the data out */
+	ihash->bytenr = hash->bytenr;
+	ihash->num_bytes = hash->num_bytes;
+	memcpy(ihash->hash, hash->hash, btrfs_dedup_sizes[type]);
+
+	mutex_lock(&dedup_info->lock);
+
+	ret = inmem_insert_bytenr(&dedup_info->bytenr_root, ihash);
+	if (ret > 0) {
+		kfree(ihash);
+		ret = 0;
+		goto out;
+	}
+
+	ret = inmem_insert_hash(&dedup_info->hash_root, ihash,
+				btrfs_dedup_sizes[type]);
+	if (ret > 0) {
+		/*
+		 * We only keep one hash in tree to save memory, so if
+		 * hash conflicts, free the one to insert.
+		 */
+		rb_erase(&ihash->bytenr_node, &dedup_info->bytenr_root);
+		kfree(ihash);
+		ret = 0;
+		goto out;
+	}
+
+	list_add(&ihash->lru_list, &dedup_info->lru_list);
+	dedup_info->current_nr++;
+
+	/* Remove the last dedup hash if we exceed limit */
+	while (dedup_info->current_nr > dedup_info->limit_nr) {
+		struct inmem_hash *last;
+
+		last = list_entry(dedup_info->lru_list.prev,
+				  struct inmem_hash, lru_list);
+		__inmem_del(dedup_info, last);
+	}
+out:
+	mutex_unlock(&dedup_info->lock);
+	return 0;
+}
+
+int btrfs_dedup_add(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		    struct btrfs_dedup_hash *hash)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+	if (!dedup_info || !hash)
+		return 0;
+
+	if (WARN_ON(hash->bytenr == 0))
+		return -EINVAL;
+
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+		return inmem_add(dedup_info, hash);
+	return -EINVAL;
+}
-- 
2.7.0





* [PATCH v4 04/18] btrfs: dedup: Introduce function to remove hash from in-memory tree
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (2 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 03/18] btrfs: dedup: Introduce function to add hash into in-memory tree Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref Qu Wenruo
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce the static function inmem_del() to remove a hash from the
in-memory dedup tree, and implement the btrfs_dedup_del() and
btrfs_dedup_disable() interfaces.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 727424e..39391b8 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -249,3 +249,80 @@ int btrfs_dedup_add(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		return inmem_add(dedup_info, hash);
 	return -EINVAL;
 }
+
+static struct inmem_hash *
+inmem_search_bytenr(struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+	struct rb_node **p = &dedup_info->bytenr_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct inmem_hash *entry = NULL;
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+
+		if (bytenr < entry->bytenr)
+			p = &(*p)->rb_left;
+		else if (bytenr > entry->bytenr)
+			p = &(*p)->rb_right;
+		else
+			return entry;
+	}
+
+	return NULL;
+}
+
+/* Delete a hash from in-memory dedup tree */
+static int inmem_del(struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+	struct inmem_hash *hash;
+
+	mutex_lock(&dedup_info->lock);
+	hash = inmem_search_bytenr(dedup_info, bytenr);
+	if (!hash) {
+		mutex_unlock(&dedup_info->lock);
+		return 0;
+	}
+
+	__inmem_del(dedup_info, hash);
+	mutex_unlock(&dedup_info->lock);
+	return 0;
+}
+
+/* Remove a dedup hash from dedup tree */
+int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		    u64 bytenr)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+	if (!dedup_info)
+		return 0;
+
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+		return inmem_del(dedup_info, bytenr);
+	return -EINVAL;
+}
+
+static void inmem_destroy(struct btrfs_fs_info *fs_info)
+{
+	struct inmem_hash *entry, *tmp;
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+	mutex_lock(&dedup_info->lock);
+	list_for_each_entry_safe(entry, tmp, &dedup_info->lru_list, lru_list)
+		__inmem_del(dedup_info, entry);
+	mutex_unlock(&dedup_info->lock);
+}
+
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+	if (!dedup_info)
+		return 0;
+
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+		inmem_destroy(fs_info);
+	return 0;
+}
-- 
2.7.0





* [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (3 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 04/18] btrfs: dedup: Introduce function to remove hash from " Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  9:56   ` Filipe Manana
  2016-01-14  5:57 ` [PATCH v4 06/18] btrfs: dedup: Introduce function to search for an existing hash Qu Wenruo
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs

Slightly modify btrfs_add_delayed_data_ref() so that it can allocate
with GFP_ATOMIC and be called while holding a spinlock.

This is used by later dedup patches.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  4 ++++
 fs/btrfs/delayed-ref.c | 25 +++++++++++++++++--------
 fs/btrfs/delayed-ref.h |  2 +-
 fs/btrfs/extent-tree.c | 24 +++++++++++++++++++++---
 4 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2132fa5..671be87 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3539,6 +3539,10 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_root *root,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner, u64 offset);
+int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root, u64 bytenr,
+				u64 num_bytes, u64 parent,
+				u64 root_objectid, u64 owner, u64 offset);
 
 int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 914ac13..e869442 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -812,26 +812,31 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       u64 bytenr, u64 num_bytes,
 			       u64 parent, u64 ref_root,
 			       u64 owner, u64 offset, u64 reserved, int action,
-			       struct btrfs_delayed_extent_op *extent_op)
+			       int atomic)
 {
 	struct btrfs_delayed_data_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
+	gfp_t gfp_flags;
+
+	if (atomic)
+		gfp_flags = GFP_ATOMIC;
+	else
+		gfp_flags = GFP_NOFS;
 
-	BUG_ON(extent_op && !extent_op->is_data);
-	ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
+	ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, gfp_flags);
 	if (!ref)
 		return -ENOMEM;
 
-	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
+	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, gfp_flags);
 	if (!head_ref) {
 		kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
 		return -ENOMEM;
 	}
 
 	if (fs_info->quota_enabled && is_fstree(ref_root)) {
-		record = kmalloc(sizeof(*record), GFP_NOFS);
+		record = kmalloc(sizeof(*record), gfp_flags);
 		if (!record) {
 			kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
 			kmem_cache_free(btrfs_delayed_ref_head_cachep,
@@ -840,10 +845,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 		}
 	}
 
-	head_ref->extent_op = extent_op;
+	head_ref->extent_op = NULL;
 
 	delayed_refs = &trans->transaction->delayed_refs;
-	spin_lock(&delayed_refs->lock);
+
+	/* For atomic case, caller should already hold the delayed_refs lock */
+	if (!atomic)
+		spin_lock(&delayed_refs->lock);
 
 	/*
 	 * insert both the head node and the new ref without dropping
@@ -856,7 +864,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
 				   num_bytes, parent, ref_root, owner, offset,
 				   action);
-	spin_unlock(&delayed_refs->lock);
+	if (!atomic)
+		spin_unlock(&delayed_refs->lock);
 
 	return 0;
 }
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index c24b653..e34f96a 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -249,7 +249,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       u64 bytenr, u64 num_bytes,
 			       u64 parent, u64 ref_root,
 			       u64 owner, u64 offset, u64 reserved, int action,
-			       struct btrfs_delayed_extent_op *extent_op);
+			       int atomic);
 int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
 				     struct btrfs_trans_handle *trans,
 				     u64 ref_root, u64 bytenr, u64 num_bytes);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 60cc139..4a01ca9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2105,11 +2105,29 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
 					num_bytes, parent, root_objectid,
 					owner, offset, 0,
-					BTRFS_ADD_DELAYED_REF, NULL);
+					BTRFS_ADD_DELAYED_REF, 0);
 	}
 	return ret;
 }
 
+int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root, u64 bytenr,
+				u64 num_bytes, u64 parent,
+				u64 root_objectid, u64 owner, u64 offset)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+
+	BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
+	       root_objectid == BTRFS_TREE_LOG_OBJECTID);
+
+	/* Only used by dedup, so only data is possible */
+	if (WARN_ON(owner < BTRFS_FIRST_FREE_OBJECTID))
+		return -EINVAL;
+	return btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+			num_bytes, parent, root_objectid,
+			owner, offset, 0, BTRFS_ADD_DELAYED_REF, 1);
+}
+
 static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 				  struct btrfs_root *root,
 				  struct btrfs_delayed_ref_node *node,
@@ -6893,7 +6911,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 						num_bytes,
 						parent, root_objectid, owner,
 						offset, 0,
-						BTRFS_DROP_DELAYED_REF, NULL);
+						BTRFS_DROP_DELAYED_REF, 0);
 	}
 	return ret;
 }
@@ -7845,7 +7863,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 					 ins->offset, 0,
 					 root_objectid, owner, offset,
 					 ram_bytes, BTRFS_ADD_DELAYED_EXTENT,
-					 NULL);
+					 0);
 	return ret;
 }
 
-- 
2.7.0





* [PATCH v4 06/18] btrfs: dedup: Introduce function to search for an existing hash
  2016-01-14  5:57 [PATCH v4 00/18][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (4 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface Qu Wenruo
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce the static function inmem_search() to handle the search in
the in-memory hash tree.

The trick is that we must ensure the delayed ref head is not being run
at the time we search for the hash.

With inmem_search(), we can implement the btrfs_dedup_search()
interface.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 39391b8..d6ee576 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -326,3 +326,142 @@ int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
 		inmem_destroy(fs_info);
 	return 0;
 }
+
+/*
+ * Caller must ensure the corresponding ref head is not being run.
+ */
+static struct inmem_hash *
+inmem_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash)
+{
+	struct rb_node **p = &dedup_info->hash_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct inmem_hash *entry = NULL;
+	u16 hash_type = dedup_info->hash_type;
+	int hash_len = btrfs_dedup_sizes[hash_type];
+
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct inmem_hash, hash_node);
+
+		if (memcmp(hash, entry->hash, hash_len) < 0) {
+			p = &(*p)->rb_left;
+		} else if (memcmp(hash, entry->hash, hash_len) > 0) {
+			p = &(*p)->rb_right;
+		} else {
+			/* Found, need to re-add it to LRU list head */
+			list_del(&entry->lru_list);
+			list_add(&entry->lru_list, &dedup_info->lru_list);
+			return entry;
+		}
+	}
+	return NULL;
+}
+
+static int inmem_search(struct inode *inode, u64 file_pos,
+			struct btrfs_dedup_hash *hash)
+{
+	int ret;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *head;
+	struct inmem_hash *found_hash;
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+	u64 bytenr;
+	u32 num_bytes;
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+again:
+	mutex_lock(&dedup_info->lock);
+	found_hash = inmem_search_hash(dedup_info, hash->hash);
+	/* If we don't find a duplicated extent, just return. */
+	if (!found_hash) {
+		ret = 0;
+		goto out;
+	}
+	bytenr = found_hash->bytenr;
+	num_bytes = found_hash->num_bytes;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	spin_lock(&delayed_refs->lock);
+	head = btrfs_find_delayed_ref_head(trans, bytenr);
+	if (!head) {
+		/*
+		 * We can safely insert a new delayed_ref as long as we
+		 * hold delayed_refs->lock.
+		 * Only need to use atomic inc_extent_ref()
+		 */
+		ret = btrfs_inc_extent_ref_atomic(trans, root, bytenr,
+				num_bytes, 0, root->root_key.objectid,
+				btrfs_ino(inode), file_pos);
+		spin_unlock(&delayed_refs->lock);
+
+		if (ret == 0) {
+			hash->bytenr = bytenr;
+			hash->num_bytes = num_bytes;
+			ret = 1;
+		}
+		goto out;
+	}
+
+	/*
+	 * We can't lock the ref head while holding dedup_info->lock, or we
+	 * would cause an ABBA deadlock.
+	 */
+	mutex_unlock(&dedup_info->lock);
+	ret = btrfs_delayed_ref_lock(trans, head);
+	spin_unlock(&delayed_refs->lock);
+	if (ret == -EAGAIN)
+		goto again;
+
+	mutex_lock(&dedup_info->lock);
+	/* Search again to ensure the hash is still here */
+	found_hash = inmem_search_hash(dedup_info, hash->hash);
+	if (!found_hash) {
+		ret = 0;
+		mutex_unlock(&head->mutex);
+		goto out;
+	}
+	hash->bytenr = bytenr;
+	hash->num_bytes = num_bytes;
+
+	/*
+	 * Increase the extent ref right now, before the delayed ref runs,
+	 * or we may increase the ref on a non-existent extent.
+	 */
+	btrfs_inc_extent_ref(trans, root, bytenr, num_bytes, 0,
+			     root->root_key.objectid,
+			     btrfs_ino(inode), file_pos);
+	mutex_unlock(&head->mutex);
+out:
+	mutex_unlock(&dedup_info->lock);
+	btrfs_end_transaction(trans, root);
+
+	return ret;
+}
+
+int btrfs_dedup_search(struct inode *inode, u64 file_pos,
+		       struct btrfs_dedup_hash *hash)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+	int ret = 0;
+
+	if (WARN_ON(!dedup_info || !hash))
+		return 0;
+
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+		ret = inmem_search(inode, file_pos, hash);
+
+	/* It's possible hash->bytenr/num_bytes already changed */
+	if (ret == 0) {
+		hash->num_bytes = 0;
+		hash->bytenr = 0;
+	}
+	return ret;
+}
-- 
2.7.0


* [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (5 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 06/18] btrfs: dedup: Introduce function to search for an existing hash Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14 10:08   ` Filipe Manana
  2016-01-14  5:57 ` [PATCH v4 08/18] btrfs: ordered-extent: Add support for dedup Qu Wenruo
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Unlike the backend choice (in-memory or on-disk), only the SHA256 hash
algorithm is supported so far, so implement the btrfs_dedup_calc_hash()
interface using SHA256.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index d6ee576..8860916 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -465,3 +465,60 @@ int btrfs_dedup_search(struct inode *inode, u64 file_pos,
 	}
 	return ret;
 }
+
+static int hash_data(struct btrfs_dedup_info *dedup_info, const char *data,
+		     u64 length, struct btrfs_dedup_hash *hash)
+{
+	struct crypto_shash *tfm = dedup_info->dedup_driver;
+	struct {
+		struct shash_desc desc;
+		char ctx[crypto_shash_descsize(tfm)];
+	} sdesc;
+	int ret;
+
+	sdesc.desc.tfm = tfm;
+	sdesc.desc.flags = 0;
+
+	ret = crypto_shash_digest(&sdesc.desc, data, length,
+				  (char *)(hash->hash));
+	return ret;
+}
+
+int btrfs_dedup_calc_hash(struct btrfs_root *root, struct inode *inode,
+			  u64 start, struct btrfs_dedup_hash *hash)
+{
+	struct page *p;
+	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
+	char *data;
+	int i;
+	int ret;
+	u64 dedup_bs;
+	u64 sectorsize = root->sectorsize;
+
+	if (!dedup_info || !hash)
+		return 0;
+
+	WARN_ON(!IS_ALIGNED(start, sectorsize));
+
+	dedup_bs = dedup_info->blocksize;
+	sectorsize = root->sectorsize;
+
+	data = kmalloc(dedup_bs, GFP_NOFS);
+	if (!data)
+		return -ENOMEM;
+	for (i = 0; sectorsize * i < dedup_bs; i++) {
+		char *d;
+
+		/* TODO: Add support for subpage size case */
+		p = find_get_page(inode->i_mapping,
+				  (start >> PAGE_CACHE_SHIFT) + i);
+		WARN_ON(!p);
+		d = kmap_atomic(p);
+		memcpy((data + sectorsize * i), d, sectorsize);
+		kunmap_atomic(d);
+		page_cache_release(p);
+	}
+	ret = hash_data(dedup_info, data, dedup_bs, hash);
+	kfree(data);
+	return ret;
+}
-- 
2.7.0


* [PATCH v4 08/18] btrfs: ordered-extent: Add support for dedup
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (6 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 09/18] btrfs: dedup: Inband in-memory only de-duplication implement Qu Wenruo
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Add ordered-extent support for dedup.

Note that the current ordered-extent support only handles non-compressed
source extents.
Support for compressed source extents will be added later.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ordered-data.c | 33 +++++++++++++++++++++++++++++----
 fs/btrfs/ordered-data.h | 13 +++++++++++++
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 8c27292..46493f5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -25,6 +25,7 @@
 #include "btrfs_inode.h"
 #include "extent_io.h"
 #include "disk-io.h"
+#include "dedup.h"
 
 static struct kmem_cache *btrfs_ordered_extent_cache;
 
@@ -183,12 +184,14 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree,
  */
 static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 				      u64 start, u64 len, u64 disk_len,
-				      int type, int dio, int compress_type)
+				      int type, int dio, int compress_type,
+				      struct btrfs_dedup_hash *hash)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_ordered_inode_tree *tree;
 	struct rb_node *node;
 	struct btrfs_ordered_extent *entry;
+	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
 
 	tree = &BTRFS_I(inode)->ordered_tree;
 	entry = kmem_cache_zalloc(btrfs_ordered_extent_cache, GFP_NOFS);
@@ -203,6 +206,20 @@ static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 	entry->inode = igrab(inode);
 	entry->compress_type = compress_type;
 	entry->truncated_len = (u64)-1;
+	entry->hash = NULL;
+	if (hash && dedup_info) {
+		entry->hash = btrfs_dedup_alloc_hash(dedup_info->hash_type);
+		if (!entry->hash) {
+			kmem_cache_free(btrfs_ordered_extent_cache, entry);
+			return -ENOMEM;
+		}
+		/* Hash contains locks, only copy what we need */
+		entry->hash->bytenr = hash->bytenr;
+		entry->hash->num_bytes = hash->num_bytes;
+		memcpy(entry->hash->hash, hash->hash,
+		       btrfs_dedup_sizes[dedup_info->hash_type]);
+	}
+
 	if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
 		set_bit(type, &entry->flags);
 
@@ -249,15 +266,23 @@ int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 {
 	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
 					  disk_len, type, 0,
-					  BTRFS_COMPRESS_NONE);
+					  BTRFS_COMPRESS_NONE, NULL);
 }
 
+int btrfs_add_ordered_extent_dedup(struct inode *inode, u64 file_offset,
+				   u64 start, u64 len, u64 disk_len, int type,
+				   struct btrfs_dedup_hash *hash)
+{
+	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
+					  disk_len, type, 0,
+					  BTRFS_COMPRESS_NONE, hash);
+}
 int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
 				 u64 start, u64 len, u64 disk_len, int type)
 {
 	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
 					  disk_len, type, 1,
-					  BTRFS_COMPRESS_NONE);
+					  BTRFS_COMPRESS_NONE, NULL);
 }
 
 int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
@@ -266,7 +291,7 @@ int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
 {
 	return __btrfs_add_ordered_extent(inode, file_offset, start, len,
 					  disk_len, type, 0,
-					  compress_type);
+					  compress_type, NULL);
 }
 
 /*
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 23c9605..58519ce 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -139,6 +139,16 @@ struct btrfs_ordered_extent {
 	struct completion completion;
 	struct btrfs_work flush_work;
 	struct list_head work_list;
+
+	/*
+	 * For inband deduplication
+	 * If hash is NULL, no deduplication.
+	 * If hash->bytenr is zero, this is a dedup miss and the hash will
+	 * be added into the dedup tree.
+	 * If hash->bytenr is non-zero, this is a dedup hit. Extent ref is
+	 * *ALREADY* increased.
+	 */
+	struct btrfs_dedup_hash *hash;
 };
 
 /*
@@ -172,6 +182,9 @@ int btrfs_dec_test_first_ordered_pending(struct inode *inode,
 				   int uptodate);
 int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 			     u64 start, u64 len, u64 disk_len, int type);
+int btrfs_add_ordered_extent_dedup(struct inode *inode, u64 file_offset,
+				   u64 start, u64 len, u64 disk_len, int type,
+				   struct btrfs_dedup_hash *hash);
 int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
 				 u64 start, u64 len, u64 disk_len, int type);
 int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
-- 
2.7.0


* [PATCH v4 09/18] btrfs: dedup: Inband in-memory only de-duplication implement
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (7 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 08/18] btrfs: ordered-extent: Add support for dedup Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 10/18] btrfs: dedup: Add basic tree structure for on-disk dedup method Qu Wenruo
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Core implementation of inband de-duplication.
It reuses the async_cow_start() facility to calculate the dedup hash,
and uses that hash to do inband de-duplication at the extent level.

The work flow is as below:
1) Run delalloc range for an inode
2) Calculate the hash for the delalloc range in units of dedup_bs
3) On a hash match (duplicate), just increase the source extent ref
   and insert the file extent.
   On a hash mismatch, go through the normal cow_file_range()
   fallback, and add the hash into the dedup tree.
   Compression for the hash miss case is not supported yet.

The current implementation stores all dedup hashes in an in-memory
rb-tree, with LRU behavior to enforce the configured limit.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |  16 +++
 fs/btrfs/extent_io.c   |  30 ++---
 fs/btrfs/extent_io.h   |  15 +++
 fs/btrfs/inode.c       | 301 +++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 338 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4a01ca9..a74cf36 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -37,6 +37,7 @@
 #include "math.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -2431,6 +2432,15 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 			btrfs_pin_extent(root, node->bytenr,
 					 node->num_bytes, 1);
 			if (head->is_data) {
+				/*
+				 * If insert_reserved is given, it means
+				 * a new extent was reserved, then deleted
+				 * in one transaction, and inc/dec merged to 0.
+				 *
+				 * In this case, we need to remove its dedup
+				 * hash.
+				 */
+				btrfs_dedup_del(trans, root, node->bytenr);
 				ret = btrfs_del_csums(trans, root,
 						      node->bytenr,
 						      node->num_bytes);
@@ -6722,6 +6732,12 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 
 		if (is_data) {
+			ret = btrfs_dedup_del(trans, root, bytenr);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, extent_root,
+							ret);
+				goto out;
+			}
 			ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
 			if (ret) {
 				btrfs_abort_transaction(trans, extent_root, ret);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2e7c97a..55edf5a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2456,7 +2456,7 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_writepage(struct bio *bio)
+void end_bio_extent_writepage(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	u64 start;
@@ -2718,8 +2718,8 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 }
 
 
-static int __must_check submit_one_bio(int rw, struct bio *bio,
-				       int mirror_num, unsigned long bio_flags)
+int __must_check submit_one_bio(int rw, struct bio *bio,
+				int mirror_num, unsigned long bio_flags)
 {
 	int ret = 0;
 	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
@@ -2756,18 +2756,18 @@ static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page,
 
 }
 
-static int submit_extent_page(int rw, struct extent_io_tree *tree,
-			      struct writeback_control *wbc,
-			      struct page *page, sector_t sector,
-			      size_t size, unsigned long offset,
-			      struct block_device *bdev,
-			      struct bio **bio_ret,
-			      unsigned long max_pages,
-			      bio_end_io_t end_io_func,
-			      int mirror_num,
-			      unsigned long prev_bio_flags,
-			      unsigned long bio_flags,
-			      bool force_bio_submit)
+int submit_extent_page(int rw, struct extent_io_tree *tree,
+			struct writeback_control *wbc,
+			struct page *page, sector_t sector,
+			size_t size, unsigned long offset,
+			struct block_device *bdev,
+			struct bio **bio_ret,
+			unsigned long max_pages,
+			bio_end_io_t end_io_func,
+			int mirror_num,
+			unsigned long prev_bio_flags,
+			unsigned long bio_flags,
+			bool force_bio_submit)
 {
 	int ret = 0;
 	struct bio *bio;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0377413..053f745 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -436,6 +436,21 @@ int clean_io_failure(struct inode *inode, u64 start, struct page *page,
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb,
 			 int mirror_num);
+int submit_extent_page(int rw, struct extent_io_tree *tree,
+		       struct writeback_control *wbc,
+		       struct page *page, sector_t sector,
+		       size_t size, unsigned long offset,
+		       struct block_device *bdev,
+		       struct bio **bio_ret,
+		       unsigned long max_pages,
+		       bio_end_io_t end_io_func,
+		       int mirror_num,
+		       unsigned long prev_bio_flags,
+		       unsigned long bio_flags,
+		       bool force_bio_submit);
+int __must_check submit_one_bio(int rw, struct bio *bio,
+				int mirror_num, unsigned long bio_flags);
+void end_bio_extent_writepage(struct bio *bio);
 
 /*
  * When IO fails, either with EIO or csum verification fails, we
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 85afe66..5a49afa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -60,6 +60,7 @@
 #include "hash.h"
 #include "props.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 struct btrfs_iget_args {
 	struct btrfs_key *location;
@@ -671,6 +672,254 @@ static void free_async_extent_pages(struct async_extent *async_extent)
 	async_extent->pages = NULL;
 }
 
+static int submit_dedup_extent(struct inode *inode, u64 start,
+			       unsigned long len, u64 disk_start, int dedup)
+{
+	int i, ret = 0;
+	unsigned long nr_pages = len / PAGE_CACHE_SIZE;
+	sector_t sector;
+	struct page *page = NULL;
+	struct bio *bio = NULL;
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct block_device *bdev = root->fs_info->fs_devices->latest_bdev;
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+
+	for (i = 0; i < nr_pages; i++) {
+		/* page has been locked by caller */
+		page = find_get_page(inode->i_mapping,
+				     start >> PAGE_CACHE_SHIFT);
+		WARN_ON(!page);
+
+		if (dedup) {
+			end_extent_writepage(page, 0, start,
+					     start + PAGE_CACHE_SIZE - 1);
+			/* we need to do this ourselves because we skip IO */
+			end_page_writeback(page);
+
+			/* Don't forget to free qgroup reserved space */
+			btrfs_qgroup_free_data(inode, start, PAGE_CACHE_SIZE);
+		} else {
+			sector = (disk_start + PAGE_CACHE_SIZE * i) >> 9;
+			ret = submit_extent_page(WRITE, io_tree, NULL, page,
+						 sector, PAGE_CACHE_SIZE, 0,
+						 bdev, &bio, 0,
+						 end_bio_extent_writepage,
+						 0, 0, 0, 0);
+			if (ret)
+				break;
+		}
+
+		start += PAGE_CACHE_SIZE;
+		unlock_page(page);
+		page_cache_release(page);
+		page = NULL;
+	}
+
+	if (bio) {
+		if (ret)
+			bio_put(bio);
+		else
+			ret = submit_one_bio(WRITE, bio, 0, 0);
+		bio = NULL;
+	}
+
+	if (ret && page)
+		SetPageError(page);
+	if (page) {
+		unlock_page(page);
+		page_cache_release(page);
+	}
+
+	return ret;
+}
+
+/*
+ * Run dedup for delalloc range
+ * Will calculate the hash for the range.
+ */
+static noinline int
+run_delalloc_dedup(struct inode *inode, struct page *locked_page, u64 start,
+		   u64 end, struct async_cow *async_cow)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+	struct extent_map *em;
+	struct page *page = NULL;
+	struct btrfs_key ins;
+	u64 blocksize = root->sectorsize;
+	u64 num_bytes;
+	u64 cur_alloc_size;
+	u64 cur_end;
+	u64 alloc_hint = 0;
+	int found = 0;
+	int type = 0;
+	int ret = 0;
+	struct extent_state *cached_state = NULL;
+	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
+	u64 dedup_bs = dedup_info->blocksize;
+	u16 hash_type = dedup_info->hash_type;
+	struct btrfs_dedup_hash *hash = NULL;
+
+	WARN_ON(btrfs_is_free_space_inode(inode));
+
+	num_bytes = ALIGN(end - start + 1, blocksize);
+	num_bytes = max(blocksize, num_bytes);
+
+	hash = btrfs_dedup_alloc_hash(hash_type);
+	if (!hash) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	btrfs_drop_extent_cache(inode, start, start + num_bytes - 1, 0);
+
+	while (num_bytes > 0) {
+		unsigned long op = 0;
+
+		/* data too small for dedup, go the normal path */
+		if (num_bytes < dedup_bs) {
+			int page_started = 0;
+			unsigned long nr_written = 0;
+
+			/* page has been locked by caller */
+			page = find_get_page(inode->i_mapping,
+					     start >> PAGE_CACHE_SHIFT);
+			WARN_ON(!page); /* page should be here */
+
+			cur_end = start + num_bytes - 1;
+
+			/* Now locked_page is not dirty. */
+			if (page_offset(locked_page) >= start &&
+			    page_offset(locked_page) <= cur_end) {
+				__set_page_dirty_nobuffers(locked_page);
+			}
+
+			lock_extent(tree, start, cur_end);
+
+			/* allocate blocks */
+			ret = cow_file_range(inode, page, start, cur_end,
+					     &page_started, &nr_written, 0);
+
+			if (!page_started && !ret)
+				extent_write_locked_range(tree, inode, start,
+						cur_end, btrfs_get_extent,
+						WB_SYNC_ALL);
+			else if (ret)
+				unlock_page(page);
+
+			if (ret)
+				SetPageError(page);
+
+			page_cache_release(page);
+			page = NULL;
+
+			start += num_bytes;
+			num_bytes = 0;
+			cond_resched();
+			continue;
+		}
+
+		cur_alloc_size = min_t(u64, num_bytes, dedup_bs);
+		WARN_ON(cur_alloc_size < dedup_bs);	/* shouldn't happen */
+		cur_end = start + cur_alloc_size - 1;
+
+		/* see comments in compress_file_range */
+		extent_range_clear_dirty_for_io(inode, start, cur_end);
+
+		ret = btrfs_dedup_calc_hash(root, inode, start, hash);
+		if (ret < 0)
+			goto out;
+
+		found = btrfs_dedup_search(inode, start, hash);
+
+		if (found == 0) {
+			/* Dedup hash miss, normal routine */
+			ret = btrfs_reserve_extent(root, cur_alloc_size,
+					   cur_alloc_size, 0, alloc_hint,
+					   &ins, 1, 1);
+			if (ret < 0)
+				goto out;
+		} else {
+			/* Dedup hash hit, only insert file extent */
+			ins.objectid = hash->bytenr;
+			ins.offset = hash->num_bytes;
+		}
+
+		lock_extent(tree, start, cur_end);
+
+		em = alloc_extent_map();
+		if (!em) {
+			ret = -ENOMEM;
+			goto out_reserve;
+		}
+		em->start = start;
+		em->orig_start = em->start;
+		em->len = cur_alloc_size;
+		em->mod_start = em->start;
+		em->mod_len = em->len;
+
+		em->block_start = ins.objectid;
+		em->block_len = ins.offset;
+		em->orig_block_len = ins.offset;
+		em->bdev = root->fs_info->fs_devices->latest_bdev;
+		set_bit(EXTENT_FLAG_PINNED, &em->flags);
+		em->generation = -1;
+
+		while (1) {
+			write_lock(&em_tree->lock);
+			ret = add_extent_mapping(em_tree, em, 1);
+			write_unlock(&em_tree->lock);
+			if (ret != -EEXIST) {
+				free_extent_map(em);
+				break;
+			}
+			btrfs_drop_extent_cache(inode, start, cur_end, 0);
+		}
+		if (ret)
+			goto out_reserve;
+
+		ret = btrfs_add_ordered_extent_dedup(inode, start, ins.objectid,
+						     cur_alloc_size, ins.offset,
+						     type, hash);
+		if (ret)
+			goto out_reserve;
+
+		op |= PAGE_SET_WRITEBACK | PAGE_CLEAR_DIRTY;
+		extent_clear_unlock_delalloc(inode, start, cur_end,
+					     NULL,
+					     EXTENT_LOCKED | EXTENT_DELALLOC,
+					     op);
+
+		ret = submit_dedup_extent(inode, start, cur_alloc_size,
+					  ins.objectid, found);
+		if (ret)
+			break;
+
+		num_bytes -= dedup_bs;
+		alloc_hint = ins.objectid + dedup_bs;
+		start += dedup_bs;
+		cond_resched();
+	}
+
+out:
+	if (ret && num_bytes > 0)
+		extent_clear_unlock_delalloc(inode,
+			     start, start + num_bytes - 1, NULL,
+			     EXTENT_DELALLOC | EXTENT_LOCKED | EXTENT_DEFRAG,
+			     PAGE_UNLOCK | PAGE_SET_WRITEBACK |
+			     PAGE_END_WRITEBACK | PAGE_CLEAR_DIRTY);
+
+	kfree(hash);
+	free_extent_state(cached_state);
+	return ret;
+
+out_reserve:
+	if (found == 0)
+		btrfs_free_reserved_extent(root, ins.objectid, ins.offset, 1);
+	goto out;
+}
+
 /*
  * phase two of compressed writeback.  This is the ordered portion
  * of the code, which only gets called in the order the work was
@@ -1083,11 +1332,19 @@ static noinline void async_cow_start(struct btrfs_work *work)
 {
 	struct async_cow *async_cow;
 	int num_added = 0;
+	int ret = 0;
 	async_cow = container_of(work, struct async_cow, work);
 
-	compress_file_range(async_cow->inode, async_cow->locked_page,
-			    async_cow->start, async_cow->end, async_cow,
-			    &num_added);
+	if (inode_need_compress(async_cow->inode))
+		compress_file_range(async_cow->inode, async_cow->locked_page,
+				    async_cow->start, async_cow->end, async_cow,
+				    &num_added);
+	else
+		ret = run_delalloc_dedup(async_cow->inode,
+				async_cow->locked_page, async_cow->start,
+				async_cow->end, async_cow);
+	WARN_ON(ret);
+
 	if (num_added == 0) {
 		btrfs_add_delayed_iput(async_cow->inode);
 		async_cow->inode = NULL;
@@ -1537,6 +1794,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 {
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
@@ -1544,7 +1803,7 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_need_compress(inode)) {
+	} else if (!inode_need_compress(inode) && !dedup_info) {
 		ret = cow_file_range(inode, locked_page, start, end,
 				      page_started, nr_written, 1);
 	} else {
@@ -2075,7 +2334,8 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 				       u64 disk_bytenr, u64 disk_num_bytes,
 				       u64 num_bytes, u64 ram_bytes,
 				       u8 compression, u8 encryption,
-				       u16 other_encoding, int extent_type)
+				       u16 other_encoding, int extent_type,
+				       struct btrfs_dedup_hash *hash)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_file_extent_item *fi;
@@ -2137,10 +2397,29 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.objectid = disk_bytenr;
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
-	ret = btrfs_alloc_reserved_file_extent(trans, root,
+
+	/*
+	 * Only for the no-dedup or hash miss case do we need to increase
+	 * the extent reference.
+	 * For the hash hit case, the reference is already increased.
+	 */
+	if (!hash || hash->bytenr == 0)
+		ret = btrfs_alloc_reserved_file_extent(trans, root,
 					root->root_key.objectid,
 					btrfs_ino(inode), file_pos,
 					ram_bytes, &ins);
+	if (ret < 0)
+		goto out_qgroup;
+
+	/* Add missed hash into dedup tree */
+	if (hash && hash->bytenr == 0) {
+		hash->bytenr = ins.objectid;
+		hash->num_bytes = ins.offset;
+		ret = btrfs_dedup_add(trans, root, hash);
+	}
+
+out_qgroup:
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
@@ -2924,7 +3203,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 						ordered_extent->disk_len,
 						logical_len, logical_len,
 						compress_type, 0, 0,
-						BTRFS_FILE_EXTENT_REG);
+						BTRFS_FILE_EXTENT_REG,
+						ordered_extent->hash);
 		if (!ret)
 			btrfs_release_delalloc_bytes(root,
 						     ordered_extent->start,
@@ -2953,6 +3233,9 @@ out_unlock:
 			     ordered_extent->file_offset +
 			     ordered_extent->len - 1, &cached_state, GFP_NOFS);
 out:
+	/* free dedup hash */
+	kfree(ordered_extent->hash);
+
 	if (root != root->fs_info->tree_root)
 		btrfs_delalloc_release_metadata(inode, ordered_extent->len);
 	if (trans)
@@ -2984,7 +3267,6 @@ out:
 						   ordered_extent->disk_len, 1);
 	}
 
-
 	/*
 	 * This needs to be done to make sure anybody waiting knows we are done
 	 * updating everything for this ordered extent.
@@ -9807,7 +10089,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 						  cur_offset, ins.objectid,
 						  ins.offset, ins.offset,
 						  ins.offset, 0, 0, 0,
-						  BTRFS_FILE_EXTENT_PREALLOC);
+						  BTRFS_FILE_EXTENT_PREALLOC,
+						  NULL);
 		if (ret) {
 			btrfs_free_reserved_extent(root, ins.objectid,
 						   ins.offset, 0);
-- 
2.7.0


* [PATCH v4 10/18] btrfs: dedup: Add basic tree structure for on-disk dedup method
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (8 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 09/18] btrfs: dedup: Inband in-memory only de-duplication implement Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 11/18] btrfs: dedup: Introduce interfaces to resume and cleanup dedup info Qu Wenruo
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Liu Bo, Wang Xiaoguang

Introduce a new tree, the dedup tree, to record on-disk dedup hashes.
It serves as persistent hash storage, instead of an in-memory only
implementation.

Unlike Liu Bo's implementation, this version does not use a backref hack
for the bytenr -> hash search, but adds a new item type,
DEDUP_BYTENR_ITEM, for that search, just like the in-memory backend.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h             | 67 +++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/dedup.h             |  5 ++++
 fs/btrfs/disk-io.c           |  1 +
 include/trace/events/btrfs.h |  3 +-
 4 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 671be87..6f75e48 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* on-disk dedup tree (EXPERIMENTAL) */
+#define BTRFS_DEDUP_TREE_OBJECTID 11ULL
+
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
@@ -505,6 +508,7 @@ struct btrfs_super_block {
  * ones specified below then we will fail to mount
  */
 #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE	(1ULL << 0)
+#define BTRFS_FEATURE_COMPAT_RO_DEDUP		(1ULL << 1)
 
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
@@ -534,7 +538,8 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR		0ULL
 
 #define BTRFS_FEATURE_COMPAT_RO_SUPP			\
-	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE)
+	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
+	 BTRFS_FEATURE_COMPAT_RO_DEDUP)
 
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
@@ -964,6 +969,46 @@ struct btrfs_csum_item {
 	u8 csum;
 } __attribute__ ((__packed__));
 
+/*
+ * Objectid: 0
+ * Type: BTRFS_DEDUP_STATUS_ITEM_KEY
+ * Offset: 0
+ */
+struct btrfs_dedup_status_item {
+	__le64 blocksize;
+	__le64 limit_nr;
+	__le16 hash_type;
+	__le16 backend;
+} __attribute__ ((__packed__));
+
+/*
+ * Objectid: Last 64 bits of the hash
+ * Type: BTRFS_DEDUP_HASH_ITEM_KEY
+ * Offset: Bytenr of the hash
+ *
+ * Used for hash <-> bytenr search
+ * XXX: On-disk format not stable yet, see the __unused space
+ */
+struct btrfs_dedup_hash_item {
+	/* on disk length of dedup range */
+	__le64 len;
+
+	/* Spare space */
+	u8 __unused[16];
+
+	/* Hash follows */
+} __attribute__ ((__packed__));
+
+/*
+ * Objectid: bytenr
+ * Type: BTRFS_DEDUP_BYTENR_ITEM_KEY
+ * Offset: Last 64 bits of the hash
+ *
+ * Used for bytenr <-> hash search (for free_extent)
+ * All its content is the hash,
+ * so no special item struct is needed.
+ */
+
 struct btrfs_dev_stats_item {
 	/*
 	 * grow this item struct at the end for future enhancements and keep
@@ -2165,6 +2210,13 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_CHUNK_ITEM_KEY	228
 
 /*
+ * Dedup item and status
+ */
+#define BTRFS_DEDUP_STATUS_ITEM_KEY	230
+#define BTRFS_DEDUP_HASH_ITEM_KEY	231
+#define BTRFS_DEDUP_BYTENR_ITEM_KEY	232
+
+/*
  * Records the overall state of the qgroups.
  * There's only one instance of this key present,
  * (0, BTRFS_QGROUP_STATUS_KEY, 0)
@@ -3227,6 +3279,19 @@ static inline unsigned long btrfs_leaf_data(struct extent_buffer *l)
 	return offsetof(struct btrfs_leaf, items);
 }
 
+/* btrfs_dedup_status */
+BTRFS_SETGET_FUNCS(dedup_status_blocksize, struct btrfs_dedup_status_item,
+		   blocksize, 64);
+BTRFS_SETGET_FUNCS(dedup_status_limit, struct btrfs_dedup_status_item,
+		   limit_nr, 64);
+BTRFS_SETGET_FUNCS(dedup_status_hash_type, struct btrfs_dedup_status_item,
+		   hash_type, 16);
+BTRFS_SETGET_FUNCS(dedup_status_backend, struct btrfs_dedup_status_item,
+		   backend, 16);
+
+/* btrfs_dedup_hash_item */
+BTRFS_SETGET_FUNCS(dedup_hash_len, struct btrfs_dedup_hash_item, len, 64);
+
 /* struct btrfs_file_extent_item */
 BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8);
 BTRFS_SETGET_STACK_FUNCS(stack_file_extent_disk_bytenr,
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index a859ad8..d22031b 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -54,6 +54,8 @@ struct btrfs_dedup_hash {
 	u8 hash[];
 };
 
+struct btrfs_root;
+
 struct btrfs_dedup_info {
 	/* dedup blocksize */
 	u64 blocksize;
@@ -69,6 +71,9 @@ struct btrfs_dedup_info {
 	struct list_head lru_list;
 	u64 limit_nr;
 	u64 current_nr;
+
+	/* for persistent data like dedup hash and dedup status */
+	struct btrfs_root *dedup_root;
 };
 
 struct btrfs_trans_handle;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c67c129..a544277 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -176,6 +176,7 @@ static struct btrfs_lockdep_keyset {
 	{ .id = BTRFS_TREE_RELOC_OBJECTID,	.name_stem = "treloc"	},
 	{ .id = BTRFS_DATA_RELOC_TREE_OBJECTID,	.name_stem = "dreloc"	},
 	{ .id = BTRFS_UUID_TREE_OBJECTID,	.name_stem = "uuid"	},
+	{ .id = BTRFS_DEDUP_TREE_OBJECTID,	.name_stem = "dedup"	},
 	{ .id = 0,				.name_stem = "tree"	},
 };
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index d866f21..44d5e0f 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -47,12 +47,13 @@ struct btrfs_qgroup_operation;
 		{ BTRFS_TREE_RELOC_OBJECTID,	"TREE_RELOC"	},	\
 		{ BTRFS_UUID_TREE_OBJECTID,	"UUID_TREE"	},	\
 		{ BTRFS_FREE_SPACE_TREE_OBJECTID, "FREE_SPACE_TREE" },	\
+		{ BTRFS_DEDUP_TREE_OBJECTID,	"DEDUP_TREE"	},	\
 		{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
 
 #define show_root_type(obj)						\
 	obj, ((obj >= BTRFS_DATA_RELOC_TREE_OBJECTID) ||		\
 	      (obj >= BTRFS_ROOT_TREE_OBJECTID &&			\
-	       obj <= BTRFS_QUOTA_TREE_OBJECTID)) ? __show_root_type(obj) : "-"
+	       obj <= BTRFS_DEDUP_TREE_OBJECTID)) ? __show_root_type(obj) : "-"
 
 #define BTRFS_GROUP_FLAGS	\
 	{ BTRFS_BLOCK_GROUP_DATA,	"DATA"},	\
-- 
2.7.0




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 11/18] btrfs: dedup: Introduce interfaces to resume and cleanup dedup info
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (9 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 10/18] btrfs: dedup: Add basic tree structure for on-disk dedup method Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 12/18] btrfs: dedup: Add support for on-disk hash search Qu Wenruo
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

Since we will introduce a new on-disk dedup backend, introduce new
interfaces to resume a previous dedup setup at mount time.

And since we introduce a new tree for dedup status, also add a disable
handler for it.
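The resume path reads the status item's fields back from disk, where they are stored little-endian. A minimal userspace sketch of that 20-byte encoding (`pack_status`/`unpack_status` and the helpers are made-up names; the field order mirrors btrfs_dedup_status_item from the previous patch):

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian helpers: __le64/__le16 store least-significant byte first. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = v >> (8 * i);
}

static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = v;
	p[1] = v >> 8;
}

static uint16_t get_le16(const uint8_t *p)
{
	return p[0] | (uint16_t)p[1] << 8;
}

/* Pack the status item fields: blocksize, limit_nr, hash_type, backend. */
static void pack_status(uint8_t buf[20], uint64_t blocksize, uint64_t limit,
			uint16_t type, uint16_t backend)
{
	put_le64(buf, blocksize);
	put_le64(buf + 8, limit);
	put_le16(buf + 16, type);
	put_le16(buf + 18, backend);
}
```

On resume, the kernel side simply reads these four fields back with the BTRFS_SETGET accessors and feeds them to init_dedup_info().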

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/dedup.c   | 192 +++++++++++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/dedup.h   |  13 ++++
 fs/btrfs/disk-io.c |  26 +++++++-
 fs/btrfs/disk-io.h |   1 +
 4 files changed, 211 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 8860916..c97823f 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -20,6 +20,42 @@
 #include "btrfs_inode.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "disk-io.h"
+
+static int init_dedup_info(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+			   u64 blocksize, u64 limit)
+{
+	struct btrfs_dedup_info *dedup_info;
+	int ret;
+
+	fs_info->dedup_info = kzalloc(sizeof(*dedup_info), GFP_NOFS);
+	if (!fs_info->dedup_info)
+		return -ENOMEM;
+
+	dedup_info = fs_info->dedup_info;
+
+	dedup_info->hash_type = type;
+	dedup_info->backend = backend;
+	dedup_info->blocksize = blocksize;
+	dedup_info->limit_nr = limit;
+
+	/* Only support SHA256 yet */
+	dedup_info->dedup_driver = crypto_alloc_shash("sha256", 0, 0);
+	if (IS_ERR(dedup_info->dedup_driver)) {
+		btrfs_err(fs_info, "failed to init sha256 driver");
+		ret = PTR_ERR(dedup_info->dedup_driver);
+		kfree(fs_info->dedup_info);
+		fs_info->dedup_info = NULL;
+		return ret;
+	}
+
+	dedup_info->hash_root = RB_ROOT;
+	dedup_info->bytenr_root = RB_ROOT;
+	dedup_info->current_nr = 0;
+	INIT_LIST_HEAD(&dedup_info->lru_list);
+	mutex_init(&dedup_info->lock);
+	return 0;
+}
 
 struct inmem_hash {
 	struct rb_node hash_node;
@@ -44,6 +80,13 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
 		       u64 blocksize, u64 limit)
 {
 	struct btrfs_dedup_info *dedup_info;
+	struct btrfs_root *dedup_root;
+	struct btrfs_key key;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_path *path;
+	struct btrfs_dedup_status_item *status;
+	int create_tree;
+	u64 compat_ro_flag = btrfs_super_compat_ro_flags(fs_info->super_copy);
 	int ret = 0;
 
 	/* Sanity check */
@@ -61,6 +104,18 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
 	if (backend == BTRFS_DEDUP_BACKEND_ONDISK && limit != 0)
 		limit = 0;
 
+	/*
+	 * If current fs doesn't support DEDUP feature, don't enable
+	 * on-disk dedup.
+	 */
+	if (!(compat_ro_flag & BTRFS_FEATURE_COMPAT_RO_DEDUP) &&
+	    backend == BTRFS_DEDUP_BACKEND_ONDISK)
+		return -EINVAL;
+
+	/* Meaningless and unable to enable dedup for RO fs */
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EINVAL;
+
 	if (fs_info->dedup_info) {
 		dedup_info = fs_info->dedup_info;
 
@@ -80,32 +135,63 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
 	}
 
 enable:
-	fs_info->dedup_info = kzalloc(sizeof(*dedup_info), GFP_NOFS);
-	if (!fs_info->dedup_info)
-		return -ENOMEM;
+	create_tree = compat_ro_flag & BTRFS_FEATURE_COMPAT_RO_DEDUP;
 
+	ret = init_dedup_info(fs_info, type, backend, blocksize, limit);
 	dedup_info = fs_info->dedup_info;
+	if (ret < 0)
+		goto out;
 
-	dedup_info->hash_type = type;
-	dedup_info->backend = backend;
-	dedup_info->blocksize = blocksize;
-	dedup_info->limit_nr = limit;
+	if (!create_tree)
+		goto out;
 
-	/* Only support SHA256 yet */
-	dedup_info->dedup_driver = crypto_alloc_shash("sha256", 0, 0);
-	if (IS_ERR(dedup_info->dedup_driver)) {
-		btrfs_err(fs_info, "failed to init sha256 driver");
-		ret = PTR_ERR(dedup_info->dedup_driver);
+	/* Create dedup tree for status at least */
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
 		goto out;
 	}
 
-	dedup_info->hash_root = RB_ROOT;
-	dedup_info->bytenr_root = RB_ROOT;
-	dedup_info->current_nr = 0;
-	INIT_LIST_HEAD(&dedup_info->lru_list);
-	mutex_init(&dedup_info->lock);
+	trans = btrfs_start_transaction(fs_info->tree_root, 2);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		btrfs_free_path(path);
+		goto out;
+	}
+
+	dedup_root = btrfs_create_tree(trans, fs_info,
+				       BTRFS_DEDUP_TREE_OBJECTID);
+	if (IS_ERR(dedup_root)) {
+		ret = PTR_ERR(dedup_root);
+		btrfs_abort_transaction(trans, fs_info->tree_root, ret);
+		btrfs_free_path(path);
+		goto out;
+	}
+
+	dedup_info->dedup_root = dedup_root;
+
+	key.objectid = 0;
+	key.type = BTRFS_DEDUP_STATUS_ITEM_KEY;
+	key.offset = 0;
+
+	ret = btrfs_insert_empty_item(trans, dedup_root, path, &key,
+				      sizeof(*status));
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, fs_info->tree_root, ret);
+		btrfs_free_path(path);
+		goto out;
+	}
+	status = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				struct btrfs_dedup_status_item);
+	btrfs_set_dedup_status_blocksize(path->nodes[0], status, blocksize);
+	btrfs_set_dedup_status_limit(path->nodes[0], status, limit);
+	btrfs_set_dedup_status_hash_type(path->nodes[0], status, type);
+	btrfs_set_dedup_status_backend(path->nodes[0], status, backend);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+
+	btrfs_free_path(path);
+	ret = btrfs_commit_transaction(trans, fs_info->tree_root);
 
-	fs_info->dedup_info = dedup_info;
 out:
 	if (ret < 0) {
 		kfree(dedup_info);
@@ -114,6 +200,68 @@ out:
 	return ret;
 }
 
+int btrfs_dedup_resume(struct btrfs_fs_info *fs_info,
+		       struct btrfs_root *dedup_root)
+{
+	struct btrfs_dedup_status_item *status;
+	struct btrfs_key key;
+	struct btrfs_path *path;
+	u64 blocksize;
+	u64 limit;
+	u16 type;
+	u16 backend;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = 0;
+	key.type = BTRFS_DEDUP_STATUS_ITEM_KEY;
+	key.offset = 0;
+
+	ret = btrfs_search_slot(NULL, dedup_root, &key, path, 0, 0);
+	if (ret > 0) {
+		ret = -ENOENT;
+		goto out;
+	} else if (ret < 0) {
+		goto out;
+	}
+
+	status = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				struct btrfs_dedup_status_item);
+	blocksize = btrfs_dedup_status_blocksize(path->nodes[0], status);
+	limit = btrfs_dedup_status_limit(path->nodes[0], status);
+	type = btrfs_dedup_status_hash_type(path->nodes[0], status);
+	backend = btrfs_dedup_status_backend(path->nodes[0], status);
+
+	ret = init_dedup_info(fs_info, type, backend, blocksize, limit);
+	if (ret < 0)
+		goto out;
+	fs_info->dedup_info->dedup_root = dedup_root;
+
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static void inmem_destroy(struct btrfs_fs_info *fs_info);
+int btrfs_dedup_cleanup(struct btrfs_fs_info *fs_info)
+{
+	if (!fs_info->dedup_info)
+		return 0;
+	if (fs_info->dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+		inmem_destroy(fs_info);
+	if (fs_info->dedup_info->dedup_root) {
+		free_root_extent_buffers(fs_info->dedup_info->dedup_root);
+		kfree(fs_info->dedup_info->dedup_root);
+	}
+	crypto_free_shash(fs_info->dedup_info->dedup_driver);
+	kfree(fs_info->dedup_info);
+	fs_info->dedup_info = NULL;
+	return 0;
+}
+
 static int inmem_insert_hash(struct rb_root *root,
 			     struct inmem_hash *hash, int hash_len)
 {
@@ -318,13 +466,19 @@ static void inmem_destroy(struct btrfs_fs_info *fs_info)
 int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+	int ret = 0;
 
 	if (!dedup_info)
 		return 0;
 
 	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
 		inmem_destroy(fs_info);
-	return 0;
+	if (dedup_info->dedup_root)
+		ret = btrfs_drop_snapshot(dedup_info->dedup_root, NULL, 1, 0);
+	crypto_free_shash(fs_info->dedup_info->dedup_driver);
+	kfree(fs_info->dedup_info);
+	fs_info->dedup_info = NULL;
+	return ret;
 }
 
 /*
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index d22031b..f23053c 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -105,6 +105,19 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
 int btrfs_dedup_disable(struct btrfs_fs_info *fs_info);
 
 /*
+ * Restore previous dedup setup from disk
+ * Called at mount time
+ */
+int btrfs_dedup_resume(struct btrfs_fs_info *fs_info,
+		       struct btrfs_root *dedup_root);
+
+/*
+ * Free current btrfs_dedup_info
+ * Called at umount(close_ctree) time
+ */
+int btrfs_dedup_cleanup(struct btrfs_fs_info *fs_info);
+
+/*
  * Calculate hash for dedup.
  * Caller must ensure [start, start + dedup_bs) has valid data.
  */
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a544277..5754eb2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -50,6 +50,7 @@
 #include "raid56.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 #ifdef CONFIG_X86
 #include <asm/cpufeature.h>
@@ -2131,7 +2132,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_destroy_workqueue(fs_info->extent_workers);
 }
 
-static void free_root_extent_buffers(struct btrfs_root *root)
+void free_root_extent_buffers(struct btrfs_root *root)
 {
 	if (root) {
 		free_extent_buffer(root->node);
@@ -2463,7 +2464,25 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info,
 		fs_info->free_space_root = root;
 	}
 
-	return 0;
+	location.objectid = BTRFS_DEDUP_TREE_OBJECTID;
+	root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(root)) {
+		ret = PTR_ERR(root);
+		if (ret != -ENOENT)
+			return ret;
+		/* Just OK if there is no dedup root */
+		return 0;
+	}
+
+	set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+	/* Found dedup root, resume previous dedup setup */
+	ret = btrfs_dedup_resume(fs_info, root);
+
+	if (ret < 0) {
+		free_root_extent_buffers(root);
+		kfree(root);
+	}
+	return ret;
 }
 
 int open_ctree(struct super_block *sb,
@@ -3868,6 +3887,9 @@ void close_ctree(struct btrfs_root *root)
 
 	btrfs_free_qgroup_config(fs_info);
 
+	/* Cleanup dedup info */
+	btrfs_dedup_cleanup(fs_info);
+
 	if (percpu_counter_sum(&fs_info->delalloc_bytes)) {
 		btrfs_info(fs_info, "at unmount delalloc count %lld",
 		       percpu_counter_sum(&fs_info->delalloc_bytes));
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 8e79d00..42c4ff2 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -70,6 +70,7 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
 int btrfs_init_fs_root(struct btrfs_root *root);
 int btrfs_insert_fs_root(struct btrfs_fs_info *fs_info,
 			 struct btrfs_root *root);
+void free_root_extent_buffers(struct btrfs_root *root);
 void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
-- 
2.7.0




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 12/18] btrfs: dedup: Add support for on-disk hash search
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (10 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 11/18] btrfs: dedup: Introduce interfaces to resume and cleanup dedup info Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend Qu Wenruo
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

The on-disk backend is now able to search hashes.
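The search is two-staged: candidates are located by the last 64 bits of the hash (the tree key), then verified with a full memcmp to reject key collisions. A userspace sketch of that idea over a flat table (names and the fixed SHA-256 length are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct demo_hash_entry {
	uint8_t  hash[32];	/* full SHA-256 hash */
	uint64_t bytenr;
	uint32_t len;
};

/*
 * Mirror of ondisk_search_hash(): match on the last 64 bits of the hash
 * first (cheap, that's all the tree key holds), then confirm with a
 * full-hash memcmp. Return 1 for found, 0 for not found.
 */
static int demo_search_hash(const struct demo_hash_entry *tab, int n,
			    const uint8_t *hash, uint64_t *bytenr_ret,
			    uint32_t *len_ret)
{
	uint64_t want, got;

	memcpy(&want, hash + 24, 8);
	for (int i = 0; i < n; i++) {
		memcpy(&got, tab[i].hash + 24, 8);
		if (got != want)		/* key mismatch, skip cheaply */
			continue;
		if (memcmp(tab[i].hash, hash, 32))
			continue;		/* 64-bit tail collided */
		*bytenr_ret = tab[i].bytenr;
		*len_ret = tab[i].len;
		return 1;
	}
	return 0;
}
```

The kernel version walks items with btrfs_previous_item() instead of a flat loop, but the collision handling is the same.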

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++--------
 fs/btrfs/dedup.h |   3 ++
 2 files changed, 127 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index c97823f..4acbbfb 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -482,6 +482,79 @@ int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * Return 0 for not found
+ * Return >0 for found and set bytenr_ret
+ * Return <0 for error
+ */
+static int ondisk_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash,
+			      u64 *bytenr_ret, u32 *num_bytes_ret)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	struct btrfs_root *dedup_root = dedup_info->dedup_root;
+	u8 *buf = NULL;
+	u64 hash_key;
+	int hash_len = btrfs_dedup_sizes[dedup_info->hash_type];
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	buf = kmalloc(hash_len, GFP_NOFS);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	memcpy(&hash_key, hash + hash_len - 8, 8);
+	key.objectid = hash_key;
+	key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	ret = btrfs_search_slot(NULL, dedup_root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	WARN_ON(ret == 0);
+	while (1) {
+		struct extent_buffer *node;
+		struct btrfs_dedup_hash_item *hash_item;
+		int slot;
+
+		ret = btrfs_previous_item(dedup_root, path, hash_key,
+					  BTRFS_DEDUP_HASH_ITEM_KEY);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+
+		node = path->nodes[0];
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(node, &key, slot);
+
+		if (key.type != BTRFS_DEDUP_HASH_ITEM_KEY ||
+		    memcmp(&key.objectid, hash + hash_len - 8, 8))
+			break;
+		hash_item = btrfs_item_ptr(node, slot,
+				struct btrfs_dedup_hash_item);
+		read_extent_buffer(node, buf, (unsigned long)(hash_item + 1),
+				   hash_len);
+		if (!memcmp(buf, hash, hash_len)) {
+			ret = 1;
+			*bytenr_ret = key.offset;
+			*num_bytes_ret = btrfs_dedup_hash_len(node, hash_item);
+			break;
+		}
+	}
+out:
+	kfree(buf);
+	btrfs_free_path(path);
+	return ret;
+}
+
+/*
  * Caller must ensure the corresponding ref head is not being run.
  */
 static struct inmem_hash *
@@ -511,7 +584,34 @@ inmem_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash)
 	return NULL;
 }
 
-static int inmem_search(struct inode *inode, u64 file_pos,
+/* Wrapper for different backends, caller needs to hold dedup_info->lock */
+static inline int generic_search_hash(struct btrfs_dedup_info *dedup_info,
+				      u8 *hash, u64 *bytenr_ret,
+				      u32 *num_bytes_ret)
+{
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY) {
+		struct inmem_hash *found_hash;
+		int ret;
+
+		found_hash = inmem_search_hash(dedup_info, hash);
+		if (found_hash) {
+			ret = 1;
+			*bytenr_ret = found_hash->bytenr;
+			*num_bytes_ret = found_hash->num_bytes;
+		} else {
+			ret = 0;
+			*bytenr_ret = 0;
+			*num_bytes_ret = 0;
+		}
+		return ret;
+	} else if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK) {
+		return ondisk_search_hash(dedup_info, hash, bytenr_ret,
+					  num_bytes_ret);
+	}
+	return -EINVAL;
+}
+
+static int generic_search(struct inode *inode, u64 file_pos,
 			struct btrfs_dedup_hash *hash)
 {
 	int ret;
@@ -520,9 +620,9 @@ static int inmem_search(struct inode *inode, u64 file_pos,
 	struct btrfs_trans_handle *trans;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_delayed_ref_head *head;
-	struct inmem_hash *found_hash;
 	struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
 	u64 bytenr;
+	u64 tmp_bytenr;
 	u32 num_bytes;
 
 	trans = btrfs_join_transaction(root);
@@ -531,14 +631,9 @@ static int inmem_search(struct inode *inode, u64 file_pos,
 
 again:
 	mutex_lock(&dedup_info->lock);
-	found_hash = inmem_search_hash(dedup_info, hash->hash);
-	/* If we don't find a duplicated extent, just return. */
-	if (!found_hash) {
-		ret = 0;
+	ret = generic_search_hash(dedup_info, hash->hash, &bytenr, &num_bytes);
+	if (ret <= 0)
 		goto out;
-	}
-	bytenr = found_hash->bytenr;
-	num_bytes = found_hash->num_bytes;
 
 	delayed_refs = &trans->transaction->delayed_refs;
 
@@ -574,13 +669,21 @@ again:
 		goto again;
 
 	mutex_lock(&dedup_info->lock);
-	/* Search again to ensure the hash is still here */
-	found_hash = inmem_search_hash(dedup_info, hash->hash);
-	if (!found_hash) {
-		ret = 0;
+	/*
+	 * Search again to ensure the hash is still here and bytenr didn't
+	 * change
+	 */
+	ret = generic_search_hash(dedup_info, hash->hash, &tmp_bytenr,
+				  &num_bytes);
+	if (ret <= 0) {
 		mutex_unlock(&head->mutex);
 		goto out;
 	}
+	if (tmp_bytenr != bytenr) {
+		mutex_unlock(&head->mutex);
+		mutex_unlock(&dedup_info->lock);
+		goto again;
+	}
 	hash->bytenr = bytenr;
 	hash->num_bytes = num_bytes;
 
@@ -609,15 +712,15 @@ int btrfs_dedup_search(struct inode *inode, u64 file_pos,
 	if (WARN_ON(!dedup_info || !hash))
 		return 0;
 
-	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
-		ret = inmem_search(inode, file_pos, hash);
-
-	/* It's possible hash->bytenr/num_bytenr already changed */
-	if (ret == 0) {
-		hash->num_bytes = 0;
-		hash->bytenr = 0;
+	if (dedup_info->backend < BTRFS_DEDUP_BACKEND_LAST) {
+		ret = generic_search(inode, file_pos, hash);
+		if (ret == 0) {
+			hash->num_bytes = 0;
+			hash->bytenr = 0;
+		}
+		return ret;
 	}
-	return ret;
+	return -EINVAL;
 }
 
 static int hash_data(struct btrfs_dedup_info *dedup_info, const char *data,
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index f23053c..3c08b86 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -137,6 +137,9 @@ int btrfs_dedup_calc_hash(struct btrfs_root *root, struct inode *inode,
  * *INCREASED*, and hash->bytenr/num_bytes will record the existing
  * extent data.
  * Return 0 for a hash miss. Nothing is done
+ * Return <0 for error.
+ *
+ * Only the on-disk backend may return an error, though.
  */
 int btrfs_dedup_search(struct inode *inode, u64 file_pos,
 		       struct btrfs_dedup_hash *hash);
-- 
2.7.0




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (11 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 12/18] btrfs: dedup: Add support for on-disk hash search Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14 10:19   ` Filipe Manana
  2016-01-14  5:57 ` [PATCH v4 14/18] btrfs: dedup: Add support for adding " Qu Wenruo
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

The on-disk backend can now delete hashes.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 4acbbfb..66c8f05 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -437,6 +437,97 @@ static int inmem_del(struct btrfs_dedup_info *dedup_info, u64 bytenr)
 	return 0;
 }
 
+/*
+ * If prepare_del is given, this will setup search_slot() for delete.
+ * Caller needs to do proper locking.
+ *
+ * Return > 0 for found.
+ * Return 0 for not found.
+ * Return < 0 for error.
+ */
+static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
+				struct btrfs_dedup_info *dedup_info,
+				struct btrfs_path *path, u64 bytenr,
+				int prepare_del)
+{
+	struct btrfs_key key;
+	struct btrfs_root *dedup_root = dedup_info->dedup_root;
+	int ret;
+	int ins_len = 0;
+	int cow = 0;
+
+	if (prepare_del) {
+		if (WARN_ON(trans == NULL))
+			return -EINVAL;
+		cow = 1;
+		ins_len = -1;
+	}
+
+	key.objectid = bytenr;
+	key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	ret = btrfs_search_slot(trans, dedup_root, &key, path,
+				ins_len, cow);
+	if (ret < 0)
+		return ret;
+
+	WARN_ON(ret == 0);
+	ret = btrfs_previous_item(dedup_root, path, bytenr,
+				  BTRFS_DEDUP_BYTENR_ITEM_KEY);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		return 0;
+	return 1;
+}
+
+static int ondisk_del(struct btrfs_trans_handle *trans,
+		      struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+	struct btrfs_root *dedup_root = dedup_info->dedup_root;
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	key.objectid = bytenr;
+	key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+	key.offset = 0;
+
+	mutex_lock(&dedup_info->lock);
+
+	ret = ondisk_search_bytenr(trans, dedup_info, path, bytenr, 1);
+	if (ret <= 0)
+		goto out;
+
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	btrfs_del_item(trans, dedup_root, path);
+	btrfs_release_path(path);
+
+	/* Search for hash item and delete it */
+	key.objectid = key.offset;
+	key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+	key.offset = bytenr;
+
+	ret = btrfs_search_slot(trans, dedup_root, &key, path, -1, 1);
+	if (WARN_ON(ret > 0)) {
+		ret = -ENOENT;
+		goto out;
+	}
+	if (ret < 0)
+		goto out;
+	btrfs_del_item(trans, dedup_root, path);
+
+out:
+	btrfs_free_path(path);
+	mutex_unlock(&dedup_info->lock);
+	return ret;
+}
+
 /* Remove a dedup hash from dedup tree */
 int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		    u64 bytenr)
@@ -449,6 +540,8 @@ int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
 		return inmem_del(dedup_info, bytenr);
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
+		return ondisk_del(trans, dedup_info, bytenr);
 	return -EINVAL;
 }
 
-- 
2.7.0




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 14/18] btrfs: dedup: Add support for adding hash for on-disk backend
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (12 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 15/18] btrfs: dedup: Add ioctl for inband deduplication Qu Wenruo
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

The on-disk backend can now add hashes.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/dedup.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 66c8f05..413d640 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -245,6 +245,8 @@ out:
 	return ret;
 }
 
+static int ondisk_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash,
+			      u64 *bytenr_ret, u32 *num_bytes_ret);
 static void inmem_destroy(struct btrfs_fs_info *fs_info);
 int btrfs_dedup_cleanup(struct btrfs_fs_info *fs_info)
 {
@@ -381,6 +383,85 @@ out:
 	return 0;
 }
 
+static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
+				struct btrfs_dedup_info *dedup_info,
+				struct btrfs_path *path, u64 bytenr,
+				int prepare_del);
+static int ondisk_add(struct btrfs_trans_handle *trans,
+		      struct btrfs_dedup_info *dedup_info,
+		      struct btrfs_dedup_hash *hash)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *dedup_root = dedup_info->dedup_root;
+	struct btrfs_key key;
+	struct btrfs_dedup_hash_item *hash_item;
+	u64 bytenr;
+	u32 num_bytes;
+	int hash_len = btrfs_dedup_sizes[dedup_info->hash_type];
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	mutex_lock(&dedup_info->lock);
+
+	ret = ondisk_search_bytenr(NULL, dedup_info, path, hash->bytenr, 0);
+	if (ret < 0)
+		goto out;
+	if (ret > 0) {
+		ret = 0;
+		goto out;
+	}
+	btrfs_release_path(path);
+
+	ret = ondisk_search_hash(dedup_info, hash->hash, &bytenr, &num_bytes);
+	if (ret < 0)
+		goto out;
+	/* Same hash found, don't re-add to save dedup tree space */
+	if (ret > 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/* Insert hash->bytenr item */
+	memcpy(&key.objectid, hash->hash + hash_len - 8, 8);
+	key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+	key.offset = hash->bytenr;
+
+	ret = btrfs_insert_empty_item(trans, dedup_root, path, &key,
+			sizeof(*hash_item) + hash_len);
+	WARN_ON(ret == -EEXIST);
+	if (ret < 0)
+		goto out;
+	hash_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				   struct btrfs_dedup_hash_item);
+	btrfs_set_dedup_hash_len(path->nodes[0], hash_item, hash->num_bytes);
+	write_extent_buffer(path->nodes[0], hash->hash,
+			    (unsigned long)(hash_item + 1), hash_len);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+	btrfs_release_path(path);
+
+	/* Then bytenr->hash item */
+	key.objectid = hash->bytenr;
+	key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+	memcpy(&key.offset, hash->hash + hash_len - 8, 8);
+
+	ret = btrfs_insert_empty_item(trans, dedup_root, path, &key, hash_len);
+	WARN_ON(ret == -EEXIST);
+	if (ret < 0)
+		goto out;
+	write_extent_buffer(path->nodes[0], hash->hash,
+			btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+			hash_len);
+	btrfs_mark_buffer_dirty(path->nodes[0]);
+
+out:
+	mutex_unlock(&dedup_info->lock);
+	btrfs_free_path(path);
+	return ret;
+}
+
 int btrfs_dedup_add(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		    struct btrfs_dedup_hash *hash)
 {
@@ -395,6 +476,8 @@ int btrfs_dedup_add(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
 		return inmem_add(dedup_info, hash);
+	if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
+		return ondisk_add(trans, dedup_info, hash);
 	return -EINVAL;
 }
 
-- 
2.7.0




^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v4 15/18] btrfs: dedup: Add ioctl for inband deduplication
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (13 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 14/18] btrfs: dedup: Add support for adding " Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 16/18] btrfs: dedup: add an inode nodedup flag Qu Wenruo
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Add ioctl interface for inband deduplication, which includes:
1) enable
2) disable
3) status

We will later add ioctl to disable inband dedup for given file/dir.
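From userspace, enabling dedup then amounts to filling the args struct and issuing the ioctl. A hedged sketch of the caller side (the real struct and constants live in include/uapi/linux/btrfs.h in this patch; the names and values below are assumptions for illustration only):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed command and parameter values; the real uapi may differ. */
#define DEMO_DEDUP_CTL_ENABLE  0
#define DEMO_DEDUP_CTL_DISABLE 1
#define DEMO_DEDUP_CTL_STATUS  2

#define DEMO_DEDUP_HASH_SHA256   0
#define DEMO_DEDUP_BACKEND_INMEM 0

struct demo_dedup_args {
	uint16_t cmd;
	uint16_t hash_type;
	uint16_t backend;
	uint64_t blocksize;
	uint64_t limit_nr;
};

/*
 * Fill the args a caller would pass to the dedup ioctl to enable the
 * in-memory backend: SHA-256 hashes, a 16KiB dedup blocksize, and a cap
 * on the number of in-memory hash records.
 */
static void demo_fill_enable(struct demo_dedup_args *a)
{
	memset(a, 0, sizeof(*a));
	a->cmd = DEMO_DEDUP_CTL_ENABLE;
	a->hash_type = DEMO_DEDUP_HASH_SHA256;
	a->backend = DEMO_DEDUP_BACKEND_INMEM;
	a->blocksize = 16 * 1024;
	a->limit_nr = 4096;
}
```

The actual call would then be an ioctl on an fd inside the mounted filesystem; since the status is persisted in the dedup tree, no repeat call is needed on the next mount.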

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h           |  1 +
 fs/btrfs/disk-io.c         |  1 +
 fs/btrfs/ioctl.c           | 63 ++++++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/btrfs.h | 23 +++++++++++++++++
 4 files changed, 88 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6f75e48..a68e23d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1908,6 +1908,7 @@ struct btrfs_fs_info {
 
 	/* reference to inband de-duplication info */
 	struct btrfs_dedup_info *dedup_info;
+	struct mutex dedup_ioctl_mutex;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5754eb2..7841bcb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2576,6 +2576,7 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
+	mutex_init(&fs_info->dedup_ioctl_mutex);
 	seqlock_init(&fs_info->profiles_lock);
 	init_rwsem(&fs_info->delayed_iput_sem);
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e392dd6..c18be18 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -59,6 +59,7 @@
 #include "props.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 #ifdef CONFIG_64BIT
 /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
@@ -3214,6 +3215,66 @@ out:
 	return ret;
 }
 
+static long btrfs_ioctl_dedup_ctl(struct btrfs_root *root, void __user *args)
+{
+	struct btrfs_ioctl_dedup_args *dargs;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_dedup_info *dedup_info;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	dargs = memdup_user(args, sizeof(*dargs));
+	if (IS_ERR(dargs)) {
+		ret = PTR_ERR(dargs);
+		return ret;
+	}
+
+	if (dargs->cmd >= BTRFS_DEDUP_CTL_LAST) {
+		ret = -EINVAL;
+		goto out;
+	}
+	switch (dargs->cmd) {
+	case BTRFS_DEDUP_CTL_ENABLE:
+		ret = btrfs_dedup_enable(fs_info, dargs->hash_type,
+					 dargs->backend, dargs->blocksize,
+					 dargs->limit_nr);
+		break;
+	case BTRFS_DEDUP_CTL_DISABLE:
+		ret = btrfs_dedup_disable(fs_info);
+		break;
+	case BTRFS_DEDUP_CTL_STATUS:
+		dedup_info = fs_info->dedup_info;
+		if (dedup_info) {
+			dargs->status = 1;
+			dargs->blocksize = dedup_info->blocksize;
+			dargs->backend = dedup_info->backend;
+			dargs->hash_type = dedup_info->hash_type;
+			dargs->limit_nr = dedup_info->limit_nr;
+			dargs->current_nr = dedup_info->current_nr;
+		} else {
+			dargs->status = 0;
+			dargs->blocksize = 0;
+			dargs->backend = 0;
+			dargs->hash_type = 0;
+			dargs->limit_nr = 0;
+			dargs->current_nr = 0;
+		}
+		if (copy_to_user(args, dargs, sizeof(*dargs)))
+			ret = -EFAULT;
+		else
+			ret = 0;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+out:
+	kfree(dargs);
+	return ret;
+}
+
 static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 				     struct inode *inode,
 				     u64 endoff,
@@ -5576,6 +5637,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_set_fslabel(file, argp);
 	case BTRFS_IOC_FILE_EXTENT_SAME:
 		return btrfs_ioctl_file_extent_same(file, argp);
+	case BTRFS_IOC_DEDUP_CTL:
+		return btrfs_ioctl_dedup_ctl(root, argp);
 	case BTRFS_IOC_GET_SUPPORTED_FEATURES:
 		return btrfs_ioctl_get_supported_features(file, argp);
 	case BTRFS_IOC_GET_FEATURES:
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index dea8931..b33da24 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -445,6 +445,27 @@ struct btrfs_ioctl_get_dev_stats {
 	__u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
 };
 
+/*
+ * de-duplication control modes
+ * For re-config, re-enable will handle it
+ * TODO: Add support to disable per-file/dir dedup operation
+ */
+#define BTRFS_DEDUP_CTL_ENABLE	1
+#define BTRFS_DEDUP_CTL_DISABLE 2
+#define BTRFS_DEDUP_CTL_STATUS	3
+#define BTRFS_DEDUP_CTL_LAST	4
+struct btrfs_ioctl_dedup_args {
+	__u16 cmd;		/* In: command(see above macro) */
+	__u64 blocksize;	/* In/Out: For enable/status */
+	__u64 limit_nr;		/* In/Out: For enable/status */
+	__u64 current_nr;	/* Out: For status output */
+	__u16 backend;		/* In/Out: For enable/status */
+	__u16 hash_type;	/* In/Out: For enable/status */
+	u8 status;		/* Out: For status output */
+	/* pad to 512 bytes */
+	u8 __unused[489];
+};
+
 #define BTRFS_QUOTA_CTL_ENABLE	1
 #define BTRFS_QUOTA_CTL_DISABLE	2
 #define BTRFS_QUOTA_CTL_RESCAN__NOTUSED	3
@@ -653,6 +674,8 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
 				    struct btrfs_ioctl_dev_replace_args)
 #define BTRFS_IOC_FILE_EXTENT_SAME _IOWR(BTRFS_IOCTL_MAGIC, 54, \
 					 struct btrfs_ioctl_same_args)
+#define BTRFS_IOC_DEDUP_CTL	_IOWR(BTRFS_IOCTL_MAGIC, 55, \
+				      struct btrfs_ioctl_dedup_args)
 #define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
 				   struct btrfs_ioctl_feature_flags)
 #define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 57, \
-- 
2.7.0
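The dispatch in btrfs_ioctl_dedup_ctl() above bounds-checks dargs->cmd against BTRFS_DEDUP_CTL_LAST before switching on it. The shape of that validate-then-dispatch pattern can be sketched in userspace C — a minimal sketch, not the kernel implementation; the constants mirror the UAPI header and the stubbed-out handlers are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BTRFS_DEDUP_CTL_ENABLE	1
#define BTRFS_DEDUP_CTL_DISABLE	2
#define BTRFS_DEDUP_CTL_STATUS	3
#define BTRFS_DEDUP_CTL_LAST	4

/* Hypothetical stand-in for the kernel's validate-then-dispatch logic:
 * reject out-of-range commands with -EINVAL before acting on them. */
static int dedup_ctl_dispatch(uint16_t cmd)
{
	if (cmd >= BTRFS_DEDUP_CTL_LAST)
		return -EINVAL;
	switch (cmd) {
	case BTRFS_DEDUP_CTL_ENABLE:	/* would call btrfs_dedup_enable() */
	case BTRFS_DEDUP_CTL_DISABLE:	/* would call btrfs_dedup_disable() */
	case BTRFS_DEDUP_CTL_STATUS:	/* would fill dargs and copy it back */
		return 0;
	default:
		return -EINVAL;		/* cmd == 0 lands here */
	}
}
```

Note the double rejection: the range check catches commands above the known maximum, while the switch default catches command 0, matching the structure of the patch.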





* [PATCH v4 16/18] btrfs: dedup: add an inode nodedup flag
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (14 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 15/18] btrfs: dedup: Add ioctl for inband deduplication Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup Qu Wenruo
  2016-01-14  5:57 ` [PATCH v4 18/18] btrfs: dedup: add per-file online dedup control Qu Wenruo
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce the BTRFS_INODE_NODEDUP flag so that online data deduplication
can be explicitly disabled for specified files.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h | 1 +
 fs/btrfs/ioctl.c | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a68e23d..ddc6ff2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2400,6 +2400,7 @@ do {                                                                   \
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
 #define BTRFS_INODE_COMPRESS		(1 << 11)
+#define BTRFS_INODE_NODEDUP		(1 << 12)
 
 #define BTRFS_INODE_ROOT_ITEM_INIT	(1 << 31)
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c18be18..32c3aa2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -159,7 +159,8 @@ void btrfs_update_iflags(struct inode *inode)
 /*
  * Inherit flags from the parent inode.
  *
- * Currently only the compression flags and the cow flags are inherited.
+ * Currently only the compression flags, the dedup flags and the cow
+ * flags are inherited.
  */
 void btrfs_inherit_iflags(struct inode *inode, struct inode *dir)
 {
@@ -184,6 +185,9 @@ void btrfs_inherit_iflags(struct inode *inode, struct inode *dir)
 			BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
 	}
 
+	if (flags & BTRFS_INODE_NODEDUP)
+		BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
+
 	btrfs_update_iflags(inode);
 }
 
-- 
2.7.0
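The inheritance hunk above copies the NODEDUP bit from the parent directory into a newly created inode. A minimal userspace sketch of that bit propagation — the flag value comes from the patch, everything else is an illustrative stand-in:

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_INODE_NODEDUP	(1U << 12)	/* value from the patch */

/* Stand-in for the btrfs_inherit_iflags() hunk: a new inode picks up
 * NODEDUP if (and only if) its parent directory carries the flag. */
static uint32_t inherit_nodedup(uint32_t inode_flags, uint32_t dir_flags)
{
	if (dir_flags & BTRFS_INODE_NODEDUP)
		inode_flags |= BTRFS_INODE_NODEDUP;
	return inode_flags;
}
```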





* [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (15 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 16/18] btrfs: dedup: add an inode nodedup flag Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  2016-01-14  9:56   ` Filipe Manana
  2016-01-14  5:57 ` [PATCH v4 18/18] btrfs: dedup: add per-file online dedup control Qu Wenruo
  17 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

We use the btrfs extended attribute "btrfs.dedup" to record per-file
online dedup status, so add a dedup property handler.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index f9e6023..ae8b76d 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
 				  size_t len);
 static const char *prop_compression_extract(struct inode *inode);
 
+static int prop_dedup_validate(const char *value, size_t len);
+static int prop_dedup_apply(struct inode *inode, const char *value, size_t len);
+static const char *prop_dedup_extract(struct inode *inode);
+
 static struct prop_handler prop_handlers[] = {
 	{
 		.xattr_name = XATTR_BTRFS_PREFIX "compression",
@@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
 		.extract = prop_compression_extract,
 		.inheritable = 1
 	},
+	{
+		.xattr_name = XATTR_BTRFS_PREFIX "dedup",
+		.validate = prop_dedup_validate,
+		.apply = prop_dedup_apply,
+		.extract = prop_dedup_extract,
+		.inheritable = 1
+	},
 };
 
 void __init btrfs_props_init(void)
@@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct inode *inode)
 	return NULL;
 }
 
+static int prop_dedup_validate(const char *value, size_t len)
+{
+	if (!strncmp("disable", value, len))
+		return 0;
+
+	return -EINVAL;
+}
+
+static int prop_dedup_apply(struct inode *inode, const char *value, size_t len)
+{
+	if (len == 0) {
+		BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
+		return 0;
+	}
+
+	if (!strncmp("disable", value, len)) {
+		BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static const char *prop_dedup_extract(struct inode *inode)
+{
+	if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
+		return "disable";
 
+	return NULL;
+}
-- 
2.7.0





* [PATCH v4 18/18] btrfs: dedup: add per-file online dedup control
  2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
                   ` (16 preceding siblings ...)
  2016-01-14  5:57 ` [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup Qu Wenruo
@ 2016-01-14  5:57 ` Qu Wenruo
  17 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-14  5:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Xiaoguang

From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

Introduce inode_need_dedup() to implement per-file online dedup control.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5a49afa..6d2f068 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -672,6 +672,21 @@ static void free_async_extent_pages(struct async_extent *async_extent)
 	async_extent->pages = NULL;
 }
 
+static inline int inode_need_dedup(struct inode *inode)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
+
+	if (!dedup_info)
+		return 0;
+
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODEDUP)
+		return 0;
+
+	return 1;
+}
+
+
 static int submit_dedup_extent(struct inode *inode, u64 start,
 			       unsigned long len, u64 disk_start, int dedup)
 {
@@ -1794,8 +1809,6 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 {
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
@@ -1803,7 +1816,7 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_need_compress(inode) && !dedup_info) {
+	} else if (!inode_need_compress(inode) && !inode_need_dedup(inode)) {
 		ret = cow_file_range(inode, locked_page, start, end,
 				      page_started, nr_written, 1);
 	} else {
-- 
2.7.0
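The decision made by inode_need_dedup() above reduces to two checks: dedup must be enabled filesystem-wide (a non-NULL dedup_info) and the inode must not carry the NODEDUP flag. A userspace sketch of that predicate — the flag value is from patch 16; struct dedup_info here merely models fs_info->dedup_info:

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_INODE_NODEDUP	(1U << 12)	/* value from patch 16 */

struct dedup_info { uint64_t blocksize; };	/* stand-in for btrfs_dedup_info */

static const struct dedup_info enabled = { 8192 };

/* Dedup runs only when it is enabled fs-wide (non-NULL info) and the
 * inode has not opted out via the NODEDUP flag. */
static int need_dedup(const struct dedup_info *info, uint32_t inode_flags)
{
	if (!info)
		return 0;
	if (inode_flags & BTRFS_INODE_NODEDUP)
		return 0;
	return 1;
}
```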





* Re: [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref
  2016-01-14  5:57 ` [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref Qu Wenruo
@ 2016-01-14  9:56   ` Filipe Manana
  2016-01-15  1:16     ` Qu Wenruo
  0 siblings, 1 reply; 35+ messages in thread
From: Filipe Manana @ 2016-01-14  9:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Slightly modify btrfs_add_delayed_data_ref() to allow it to accept
> GFP_ATOMIC, and to allow it to be called inside a spinlock.

Hi Qu,

I really would prefer to avoid gfp_atomic allocations.
Instead of using them, how about changing the approach so that callers
allocate the ref structure (without using gfp_atomic), then take the
spinlock, and then pass the structure to the inc/dec ref functions?

thanks
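The allocate-before-locking pattern suggested here can be sketched in userspace terms, with a pthread mutex standing in for the delayed_refs spinlock. All names below are hypothetical illustrations, not btrfs API; the point is that the allocation (which may sleep, or use GFP_NOFS in the kernel) happens before the lock, so the critical section never needs GFP_ATOMIC:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct delayed_ref {
	long bytenr;
	struct delayed_ref *next;
};

static pthread_mutex_t refs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct delayed_ref *refs_head;

/* Allocate outside the lock; the locked region only links the node. */
static int add_ref_prealloc(long bytenr)
{
	struct delayed_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		return -1;
	ref->bytenr = bytenr;

	pthread_mutex_lock(&refs_lock);
	ref->next = refs_head;		/* no allocation under the lock */
	refs_head = ref;
	pthread_mutex_unlock(&refs_lock);
	return 0;
}
```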

>
> This is used by later dedup patches.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h       |  4 ++++
>  fs/btrfs/delayed-ref.c | 25 +++++++++++++++++--------
>  fs/btrfs/delayed-ref.h |  2 +-
>  fs/btrfs/extent-tree.c | 24 +++++++++++++++++++++---
>  4 files changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2132fa5..671be87 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3539,6 +3539,10 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>                          struct btrfs_root *root,
>                          u64 bytenr, u64 num_bytes, u64 parent,
>                          u64 root_objectid, u64 owner, u64 offset);
> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
> +                               struct btrfs_root *root, u64 bytenr,
> +                               u64 num_bytes, u64 parent,
> +                               u64 root_objectid, u64 owner, u64 offset);
>
>  int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>                                    struct btrfs_root *root);
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 914ac13..e869442 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -812,26 +812,31 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>                                u64 bytenr, u64 num_bytes,
>                                u64 parent, u64 ref_root,
>                                u64 owner, u64 offset, u64 reserved, int action,
> -                              struct btrfs_delayed_extent_op *extent_op)
> +                              int atomic)
>  {
>         struct btrfs_delayed_data_ref *ref;
>         struct btrfs_delayed_ref_head *head_ref;
>         struct btrfs_delayed_ref_root *delayed_refs;
>         struct btrfs_qgroup_extent_record *record = NULL;
> +       gfp_t gfp_flags;
> +
> +       if (atomic)
> +               gfp_flags = GFP_ATOMIC;
> +       else
> +               gfp_flags = GFP_NOFS;
>
> -       BUG_ON(extent_op && !extent_op->is_data);
> -       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
> +       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, gfp_flags);
>         if (!ref)
>                 return -ENOMEM;
>
> -       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
> +       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, gfp_flags);
>         if (!head_ref) {
>                 kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>                 return -ENOMEM;
>         }
>
>         if (fs_info->quota_enabled && is_fstree(ref_root)) {
> -               record = kmalloc(sizeof(*record), GFP_NOFS);
> +               record = kmalloc(sizeof(*record), gfp_flags);
>                 if (!record) {
>                         kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>                         kmem_cache_free(btrfs_delayed_ref_head_cachep,
> @@ -840,10 +845,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>                 }
>         }
>
> -       head_ref->extent_op = extent_op;
> +       head_ref->extent_op = NULL;
>
>         delayed_refs = &trans->transaction->delayed_refs;
> -       spin_lock(&delayed_refs->lock);
> +
> +       /* For atomic case, caller should already hold the delayed_refs lock */
> +       if (!atomic)
> +               spin_lock(&delayed_refs->lock);
>
>         /*
>          * insert both the head node and the new ref without dropping
> @@ -856,7 +864,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>         add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
>                                    num_bytes, parent, ref_root, owner, offset,
>                                    action);
> -       spin_unlock(&delayed_refs->lock);
> +       if (!atomic)
> +               spin_unlock(&delayed_refs->lock);
>
>         return 0;
>  }
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index c24b653..e34f96a 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -249,7 +249,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>                                u64 bytenr, u64 num_bytes,
>                                u64 parent, u64 ref_root,
>                                u64 owner, u64 offset, u64 reserved, int action,
> -                              struct btrfs_delayed_extent_op *extent_op);
> +                              int atomic);
>  int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
>                                      struct btrfs_trans_handle *trans,
>                                      u64 ref_root, u64 bytenr, u64 num_bytes);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 60cc139..4a01ca9 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2105,11 +2105,29 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>                 ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>                                         num_bytes, parent, root_objectid,
>                                         owner, offset, 0,
> -                                       BTRFS_ADD_DELAYED_REF, NULL);
> +                                       BTRFS_ADD_DELAYED_REF, 0);
>         }
>         return ret;
>  }
>
> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
> +                               struct btrfs_root *root, u64 bytenr,
> +                               u64 num_bytes, u64 parent,
> +                               u64 root_objectid, u64 owner, u64 offset)
> +{
> +       struct btrfs_fs_info *fs_info = root->fs_info;
> +
> +       BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
> +              root_objectid == BTRFS_TREE_LOG_OBJECTID);
> +
> +       /* Only used by dedup, so only data is possible */
> +       if (WARN_ON(owner < BTRFS_FIRST_FREE_OBJECTID))
> +               return -EINVAL;
> +       return btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
> +                       num_bytes, parent, root_objectid,
> +                       owner, offset, 0, BTRFS_ADD_DELAYED_REF, 1);
> +}
> +
>  static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>                                   struct btrfs_root *root,
>                                   struct btrfs_delayed_ref_node *node,
> @@ -6893,7 +6911,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>                                                 num_bytes,
>                                                 parent, root_objectid, owner,
>                                                 offset, 0,
> -                                               BTRFS_DROP_DELAYED_REF, NULL);
> +                                               BTRFS_DROP_DELAYED_REF, 0);
>         }
>         return ret;
>  }
> @@ -7845,7 +7863,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>                                          ins->offset, 0,
>                                          root_objectid, owner, offset,
>                                          ram_bytes, BTRFS_ADD_DELAYED_EXTENT,
> -                                        NULL);
> +                                        0);
>         return ret;
>  }
>
> --
> 2.7.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-14  5:57 ` [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup Qu Wenruo
@ 2016-01-14  9:56   ` Filipe Manana
  2016-01-14 19:04     ` Darrick J. Wong
  2016-01-15  1:37     ` Qu Wenruo
  0 siblings, 2 replies; 35+ messages in thread
From: Filipe Manana @ 2016-01-14  9:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Wang Xiaoguang

On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> We use btrfs extended attribute "btrfs.dedup" to record per-file online
> dedup status, so add a dedup property handler.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
> index f9e6023..ae8b76d 100644
> --- a/fs/btrfs/props.c
> +++ b/fs/btrfs/props.c
> @@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
>                                   size_t len);
>  static const char *prop_compression_extract(struct inode *inode);
>
> +static int prop_dedup_validate(const char *value, size_t len);
> +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len);
> +static const char *prop_dedup_extract(struct inode *inode);
> +
>  static struct prop_handler prop_handlers[] = {
>         {
>                 .xattr_name = XATTR_BTRFS_PREFIX "compression",
> @@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
>                 .extract = prop_compression_extract,
>                 .inheritable = 1
>         },
> +       {
> +               .xattr_name = XATTR_BTRFS_PREFIX "dedup",
> +               .validate = prop_dedup_validate,
> +               .apply = prop_dedup_apply,
> +               .extract = prop_dedup_extract,
> +               .inheritable = 1
> +       },
>  };
>
>  void __init btrfs_props_init(void)
> @@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct inode *inode)
>         return NULL;
>  }
>
> +static int prop_dedup_validate(const char *value, size_t len)
> +{
> +       if (!strncmp("disable", value, len))
> +               return 0;
> +
> +       return -EINVAL;
> +}
> +
> +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len)
> +{
> +       if (len == 0) {
> +               BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
> +               return 0;
> +       }
> +
> +       if (!strncmp("disable", value, len)) {
> +               BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
> +               return 0;
> +       }
> +
> +       return -EINVAL;
> +}
> +
> +static const char *prop_dedup_extract(struct inode *inode)
> +{
> +       if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
> +               return "disable";

| -> &
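To make the operator slip concrete: OR-ing in a non-zero mask never yields zero, so the posted check reports "disable" for every inode. A minimal sketch contrasting the two forms (the flag value is taken from patch 16):

```c
#include <assert.h>
#include <stdint.h>

#define BTRFS_INODE_NODEDUP	(1U << 12)

/* The posted check: (flags | mask) is non-zero for any flags value,
 * so this always claims the flag is set. */
static int nodedup_buggy(uint32_t flags)
{
	return (flags | BTRFS_INODE_NODEDUP) != 0;
}

/* The intended check: AND isolates the flag bit. */
static int nodedup_fixed(uint32_t flags)
{
	return (flags & BTRFS_INODE_NODEDUP) != 0;
}
```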

How about writing test cases (for xfstests)? Not only for the property
but for the whole dedup feature, it would avoid such simple
mistakes...
Take a look at the good example of xfs development. For example when
all the recent patches for their reflink implementation was posted
(and before getting merged), a comprehensive set of test cases for
xfstests was also posted...

>
> +       return NULL;
> +}
> --
> 2.7.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface
  2016-01-14  5:57 ` [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface Qu Wenruo
@ 2016-01-14 10:08   ` Filipe Manana
  2016-01-15  1:41     ` Qu Wenruo
  0 siblings, 1 reply; 35+ messages in thread
From: Filipe Manana @ 2016-01-14 10:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Wang Xiaoguang

On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>
> No matter whether the in-memory or the on-disk dedup backend is used,
> only the SHA256 hash method is supported so far, so implement the
> btrfs_dedup_calc_hash() interface using SHA256.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> ---
>  fs/btrfs/dedup.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
>
> diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
> index d6ee576..8860916 100644
> --- a/fs/btrfs/dedup.c
> +++ b/fs/btrfs/dedup.c
> @@ -465,3 +465,60 @@ int btrfs_dedup_search(struct inode *inode, u64 file_pos,
>         }
>         return ret;
>  }
> +
> +static int hash_data(struct btrfs_dedup_info *dedup_info, const char *data,
> +                    u64 length, struct btrfs_dedup_hash *hash)
> +{
> +       struct crypto_shash *tfm = dedup_info->dedup_driver;
> +       struct {
> +               struct shash_desc desc;
> +               char ctx[crypto_shash_descsize(tfm)];
> +       } sdesc;
> +       int ret;
> +
> +       sdesc.desc.tfm = tfm;
> +       sdesc.desc.flags = 0;
> +
> +       ret = crypto_shash_digest(&sdesc.desc, data, length,
> +                                 (char *)(hash->hash));
> +       return ret;
> +}
> +
> +int btrfs_dedup_calc_hash(struct btrfs_root *root, struct inode *inode,
> +                         u64 start, struct btrfs_dedup_hash *hash)
> +{
> +       struct page *p;
> +       struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
> +       char *data;
> +       int i;
> +       int ret;
> +       u64 dedup_bs;
> +       u64 sectorsize = root->sectorsize;
> +
> +       if (!dedup_info || !hash)
> +               return 0;
> +
> +       WARN_ON(!IS_ALIGNED(start, sectorsize));
> +
> +       dedup_bs = dedup_info->blocksize;
> +       sectorsize = root->sectorsize;
> +
> +       data = kmalloc(dedup_bs, GFP_NOFS);
> +       if (!data)
> +               return -ENOMEM;
> +       for (i = 0; sectorsize * i < dedup_bs; i++) {
> +               char *d;
> +
> +               /* TODO: Add support for subpage size case */
> +               p = find_get_page(inode->i_mapping,
> +                                 (start >> PAGE_CACHE_SHIFT) + i);
> +               WARN_ON(!p);

If !p, we should return with an error too, otherwise we'll get a null
pointer dereference below.
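A hedged userspace analogue of the guard being requested — fail the operation instead of dereferencing a NULL lookup result. Here lookup_page() and copy_block() are illustrative stand-ins, not kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cache lookup: NULL means "page not present". */
static const char *lookup_page(const char **cache, size_t n, size_t idx)
{
	return (idx < n) ? cache[idx] : NULL;
}

/* Propagate an error instead of dereferencing a NULL lookup result,
 * mirroring the check asked for after find_get_page(). */
static int copy_block(const char **cache, size_t n, size_t idx,
		      const char **out)
{
	const char *p = lookup_page(cache, n, idx);

	if (!p)
		return -ENOENT;
	*out = p;
	return 0;
}

static int demo(void)
{
	const char *cache[2] = { "data", NULL };
	const char *out = NULL;

	if (copy_block(cache, 2, 1, &out) != -ENOENT)	/* missing page */
		return 1;
	if (copy_block(cache, 2, 0, &out) != 0 || !out)	/* present page */
		return 2;
	return 0;
}
```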

> +               d = kmap_atomic(p);
> +               memcpy((data + sectorsize * i), d, sectorsize);
> +               kunmap_atomic(d);
> +               page_cache_release(p);
> +       }
> +       ret = hash_data(dedup_info, data, dedup_bs, hash);
> +       kfree(data);
> +       return ret;
> +}
> --
> 2.7.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend
  2016-01-14  5:57 ` [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend Qu Wenruo
@ 2016-01-14 10:19   ` Filipe Manana
  2016-01-15  1:43     ` Qu Wenruo
  0 siblings, 1 reply; 35+ messages in thread
From: Filipe Manana @ 2016-01-14 10:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Wang Xiaoguang

On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Now the on-disk backend can delete hashes.
>
> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/dedup.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>
> diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
> index 4acbbfb..66c8f05 100644
> --- a/fs/btrfs/dedup.c
> +++ b/fs/btrfs/dedup.c
> @@ -437,6 +437,97 @@ static int inmem_del(struct btrfs_dedup_info *dedup_info, u64 bytenr)
>         return 0;
>  }
>
> +/*
> + * If prepare_del is given, this will setup search_slot() for delete.
> + * Caller needs to do proper locking.
> + *
> + * Return > 0 for found.
> + * Return 0 for not found.
> + * Return < 0 for error.
> + */
> +static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
> +                               struct btrfs_dedup_info *dedup_info,
> +                               struct btrfs_path *path, u64 bytenr,
> +                               int prepare_del)
> +{
> +       struct btrfs_key key;
> +       struct btrfs_root *dedup_root = dedup_info->dedup_root;
> +       int ret;
> +       int ins_len = 0;
> +       int cow = 0;
> +
> +       if (prepare_del) {
> +               if (WARN_ON(trans == NULL))
> +                       return -EINVAL;
> +               cow = 1;
> +               ins_len = -1;
> +       }
> +
> +       key.objectid = bytenr;
> +       key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
> +       key.offset = (u64)-1;
> +
> +       ret = btrfs_search_slot(trans, dedup_root, &key, path,
> +                               ins_len, cow);
> +       if (ret < 0)
> +               return ret;
> +
> +       WARN_ON(ret == 0);

If ret == 0 (who knows if one day there won't be a hash algorithm
that produces 64-bit hashes with all bits set for some inputs), then
we shouldn't do the btrfs_previous_item() call, which would make us
miss the item.

> +       ret = btrfs_previous_item(dedup_root, path, bytenr,
> +                                 BTRFS_DEDUP_BYTENR_ITEM_KEY);
> +       if (ret < 0)
> +               return ret;
> +       if (ret > 0)
> +               return 0;
> +       return 1;
> +}
> +
> +static int ondisk_del(struct btrfs_trans_handle *trans,
> +                     struct btrfs_dedup_info *dedup_info, u64 bytenr)
> +{
> +       struct btrfs_root *dedup_root = dedup_info->dedup_root;
> +       struct btrfs_path *path;
> +       struct btrfs_key key;
> +       int ret;
> +
> +       path = btrfs_alloc_path();
> +       if (!path)
> +               return -ENOMEM;
> +
> +       key.objectid = bytenr;
> +       key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
> +       key.offset = 0;
> +
> +       mutex_lock(&dedup_info->lock);
> +
> +       ret = ondisk_search_bytenr(trans, dedup_info, path, bytenr, 1);
> +       if (ret <= 0)
> +               goto out;
> +
> +       btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +       btrfs_del_item(trans, dedup_root, path);
> +       btrfs_release_path(path);
> +
> +       /* Search for hash item and delete it */
> +       key.objectid = key.offset;
> +       key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
> +       key.offset = bytenr;
> +
> +       ret = btrfs_search_slot(trans, dedup_root, &key, path, -1, 1);
> +       if (WARN_ON(ret > 0)) {
> +               ret = -ENOENT;
> +               goto out;
> +       }
> +       if (ret < 0)
> +               goto out;
> +       btrfs_del_item(trans, dedup_root, path);

btrfs_del_item() can return errors.
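For reference, a hedged sketch of the propagation being asked for: both deletions can fail, so each return value must flow out through ret instead of being dropped. del_item() is a stub standing in for btrfs_del_item(), and -EIO is just an illustrative error value.

```c
#include <assert.h>
#include <errno.h>

/* Stub for btrfs_del_item(): fails on demand for the sketch. */
static int del_item(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/*
 * Shape of the fixed ondisk_del() error handling: check the first
 * delete before touching the second, and return whichever error
 * occurred.
 */
static int ondisk_del_sketch(int bytenr_del_fails, int hash_del_fails)
{
	int ret;

	ret = del_item(bytenr_del_fails);	/* bytenr item */
	if (ret)
		goto out;
	ret = del_item(hash_del_fails);		/* hash item */
out:
	return ret;
}
```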

> +
> +out:
> +       btrfs_free_path(path);
> +       mutex_unlock(&dedup_info->lock);
> +       return ret;
> +}
> +
>  /* Remove a dedup hash from dedup tree */
>  int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>                     u64 bytenr)
> @@ -449,6 +540,8 @@ int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>
>         if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
>                 return inmem_del(dedup_info, bytenr);
> +       if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
> +               return ondisk_del(trans, dedup_info, bytenr);
>         return -EINVAL;
>  }
>
> --
> 2.7.0
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-14  9:56   ` Filipe Manana
@ 2016-01-14 19:04     ` Darrick J. Wong
  2016-01-15  1:37     ` Qu Wenruo
  1 sibling, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2016-01-14 19:04 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs, Wang Xiaoguang

On Thu, Jan 14, 2016 at 09:56:12AM +0000, Filipe Manana wrote:
> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> > From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> >
> > We use btrfs extended attribute "btrfs.dedup" to record per-file online
> > dedup status, so add a dedup property handler.
> >
> > Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> > ---
> >  fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
> > index f9e6023..ae8b76d 100644
> > --- a/fs/btrfs/props.c
> > +++ b/fs/btrfs/props.c
> > @@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
> >                                   size_t len);
> >  static const char *prop_compression_extract(struct inode *inode);
> >
> > +static int prop_dedup_validate(const char *value, size_t len);
> > +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len);
> > +static const char *prop_dedup_extract(struct inode *inode);
> > +
> >  static struct prop_handler prop_handlers[] = {
> >         {
> >                 .xattr_name = XATTR_BTRFS_PREFIX "compression",
> > @@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
> >                 .extract = prop_compression_extract,
> >                 .inheritable = 1
> >         },
> > +       {
> > +               .xattr_name = XATTR_BTRFS_PREFIX "dedup",
> > +               .validate = prop_dedup_validate,
> > +               .apply = prop_dedup_apply,
> > +               .extract = prop_dedup_extract,
> > +               .inheritable = 1
> > +       },
> >  };
> >
> >  void __init btrfs_props_init(void)
> > @@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct inode *inode)
> >         return NULL;
> >  }
> >
> > +static int prop_dedup_validate(const char *value, size_t len)
> > +{
> > +       if (!strncmp("disable", value, len))
> > +               return 0;
> > +
> > +       return -EINVAL;
> > +}
> > +
> > +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len)
> > +{
> > +       if (len == 0) {
> > +               BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
> > +               return 0;
> > +       }
> > +
> > +       if (!strncmp("disable", value, len)) {
> > +               BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
> > +               return 0;
> > +       }
> > +
> > +       return -EINVAL;
> > +}
> > +
> > +static const char *prop_dedup_extract(struct inode *inode)
> > +{
> > +       if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
> > +               return "disable";
> 
> | -> &
> 
> How about writing test cases (for xfstests)? Not only for the property
> but for the whole dedup feature, it would avoid such simple
> mistakes...
> Take a look at the good example of xfs development. For example when
> all the recent patches for their reflink implementation were posted
> (and before getting merged), a comprehensive set of test cases for
> xfstests was also posted...

Seconded. ;)

(More reflink xfstests are coming, too...)

--D
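The impact of the `|`/`&` mix-up flagged above can be demonstrated in isolation: with `|` the test is always true, so every inode would report dedup as disabled. The flag value below is illustrative, not the real BTRFS_INODE_NODEDUP bit.

```c
#include <assert.h>

#define NODEDUP (1U << 10)	/* illustrative flag bit */

/* The buggy form: flags | NODEDUP is nonzero for any input. */
static int buggy_is_nodedup(unsigned int flags)
{
	return (flags | NODEDUP) != 0;
}

/* The one-character fix: mask the bit, don't set it. */
static int fixed_is_nodedup(unsigned int flags)
{
	return (flags & NODEDUP) != 0;
}
```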

> 
> >
> > +       return NULL;
> > +}
> > --
> > 2.7.0
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


* Re: [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info
  2016-01-14  5:57 ` [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info Qu Wenruo
@ 2016-01-14 21:33   ` kbuild test robot
  0 siblings, 0 replies; 35+ messages in thread
From: kbuild test robot @ 2016-01-14 21:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: kbuild-all, linux-btrfs, Wang Xiaoguang

[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]

Hi Wang,

[auto build test ERROR on btrfs/next]
[also build test ERROR on next-20160114]
[cannot apply to v4.4]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-dedup-Introduce-dedup-framework-and-its-header/20160114-140449
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: s390-gcov_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

Note: the linux-review/Qu-Wenruo/btrfs-dedup-Introduce-dedup-framework-and-its-header/20160114-140449 HEAD 32b59e8e273105ae1025de27a5e2d47d7e2191e4 builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):

   fs/built-in.o: In function `btrfs_dedup_enable':
>> fs/btrfs/dedup.c:52: undefined reference to `btrfs_dedup_disable'

vim +52 fs/btrfs/dedup.c

    46			dedup_info = fs_info->dedup_info;
    47	
    48			/* Check if we are re-enable for different dedup config */
    49			if (dedup_info->blocksize != blocksize ||
    50			    dedup_info->hash_type != type ||
    51			    dedup_info->backend != backend) {
  > 52				btrfs_dedup_disable(fs_info);
    53				goto enable;
    54			}
    55	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 15606 bytes --]


* Re: [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref
  2016-01-14  9:56   ` Filipe Manana
@ 2016-01-15  1:16     ` Qu Wenruo
  2016-01-20  3:25       ` Qu Wenruo
  0 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-15  1:16 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Hi Filipe,

Thanks for your review.

Filipe Manana wrote on 2016/01/14 09:56 +0000:
> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Slightly modify btrfs_add_delayed_data_ref() to allow it accept
>> GFP_ATOMIC, and allow it to do be called inside a spinlock.
>
> Hi Qu,
>
> I really would prefer to avoid gfp_atomic allocations.
> Instead of using them, how about changing the approach so that callers
> allocate the ref structure (without using gfp_atomic), then take the
> spinlock, and then pass the structure to the inc/dec ref functions?

Yes, we considered that method as well.

But since this is only used in the dedup hit case, where no delayed ref
head exists yet, it is an uncommon path.

On the other hand, with the method you mentioned, allocating the memory
first and then passing it to add_delayed_data_ref(), we would always pay
for the allocation even though it is only needed in a minority of cases.

It would be very helpful if you could point out any further
disadvantages of using GFP_ATOMIC; maybe there is something important I
missed.

Thanks
Qu
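For comparison, a minimal sketch of the allocate-before-lock pattern Filipe proposes, using a pthread mutex in place of the delayed_refs spinlock. The point is that the sleeping allocation happens outside the critical section, so the locked region never needs GFP_ATOMIC. All names are illustrative, not btrfs API.

```c
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>

/* Stand-in for a delayed data ref. */
struct ref {
	struct ref *next;
	unsigned long bytenr;
};

static struct ref *head;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int add_ref_prealloc(unsigned long bytenr)
{
	/* GFP_NOFS-style allocation: done before the lock is held */
	struct ref *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->bytenr = bytenr;

	/* spin_lock(&delayed_refs->lock) analogue: link-in only */
	pthread_mutex_lock(&lock);
	r->next = head;
	head = r;
	pthread_mutex_unlock(&lock);
	return 0;
}
```

The trade-off Qu raises is visible here: the allocation is paid on every call, even when the locked section might decide the ref is unnecessary and the structure has to be freed again.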

>
> thanks
>
>>
>> This is used by later dedup patches.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/ctree.h       |  4 ++++
>>   fs/btrfs/delayed-ref.c | 25 +++++++++++++++++--------
>>   fs/btrfs/delayed-ref.h |  2 +-
>>   fs/btrfs/extent-tree.c | 24 +++++++++++++++++++++---
>>   4 files changed, 43 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 2132fa5..671be87 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3539,6 +3539,10 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>                           struct btrfs_root *root,
>>                           u64 bytenr, u64 num_bytes, u64 parent,
>>                           u64 root_objectid, u64 owner, u64 offset);
>> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
>> +                               struct btrfs_root *root, u64 bytenr,
>> +                               u64 num_bytes, u64 parent,
>> +                               u64 root_objectid, u64 owner, u64 offset);
>>
>>   int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>>                                     struct btrfs_root *root);
>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>> index 914ac13..e869442 100644
>> --- a/fs/btrfs/delayed-ref.c
>> +++ b/fs/btrfs/delayed-ref.c
>> @@ -812,26 +812,31 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>>                                 u64 bytenr, u64 num_bytes,
>>                                 u64 parent, u64 ref_root,
>>                                 u64 owner, u64 offset, u64 reserved, int action,
>> -                              struct btrfs_delayed_extent_op *extent_op)
>> +                              int atomic)
>>   {
>>          struct btrfs_delayed_data_ref *ref;
>>          struct btrfs_delayed_ref_head *head_ref;
>>          struct btrfs_delayed_ref_root *delayed_refs;
>>          struct btrfs_qgroup_extent_record *record = NULL;
>> +       gfp_t gfp_flags;
>> +
>> +       if (atomic)
>> +               gfp_flags = GFP_ATOMIC;
>> +       else
>> +               gfp_flags = GFP_NOFS;
>>
>> -       BUG_ON(extent_op && !extent_op->is_data);
>> -       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
>> +       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, gfp_flags);
>>          if (!ref)
>>                  return -ENOMEM;
>>
>> -       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
>> +       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, gfp_flags);
>>          if (!head_ref) {
>>                  kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>>                  return -ENOMEM;
>>          }
>>
>>          if (fs_info->quota_enabled && is_fstree(ref_root)) {
>> -               record = kmalloc(sizeof(*record), GFP_NOFS);
>> +               record = kmalloc(sizeof(*record), gfp_flags);
>>                  if (!record) {
>>                          kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>>                          kmem_cache_free(btrfs_delayed_ref_head_cachep,
>> @@ -840,10 +845,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>>                  }
>>          }
>>
>> -       head_ref->extent_op = extent_op;
>> +       head_ref->extent_op = NULL;
>>
>>          delayed_refs = &trans->transaction->delayed_refs;
>> -       spin_lock(&delayed_refs->lock);
>> +
>> +       /* For atomic case, caller should already hold the delayed_refs lock */
>> +       if (!atomic)
>> +               spin_lock(&delayed_refs->lock);
>>
>>          /*
>>           * insert both the head node and the new ref without dropping
>> @@ -856,7 +864,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>>          add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
>>                                     num_bytes, parent, ref_root, owner, offset,
>>                                     action);
>> -       spin_unlock(&delayed_refs->lock);
>> +       if (!atomic)
>> +               spin_unlock(&delayed_refs->lock);
>>
>>          return 0;
>>   }
>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>> index c24b653..e34f96a 100644
>> --- a/fs/btrfs/delayed-ref.h
>> +++ b/fs/btrfs/delayed-ref.h
>> @@ -249,7 +249,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>>                                 u64 bytenr, u64 num_bytes,
>>                                 u64 parent, u64 ref_root,
>>                                 u64 owner, u64 offset, u64 reserved, int action,
>> -                              struct btrfs_delayed_extent_op *extent_op);
>> +                              int atomic);
>>   int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
>>                                       struct btrfs_trans_handle *trans,
>>                                       u64 ref_root, u64 bytenr, u64 num_bytes);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 60cc139..4a01ca9 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -2105,11 +2105,29 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>                  ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>>                                          num_bytes, parent, root_objectid,
>>                                          owner, offset, 0,
>> -                                       BTRFS_ADD_DELAYED_REF, NULL);
>> +                                       BTRFS_ADD_DELAYED_REF, 0);
>>          }
>>          return ret;
>>   }
>>
>> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
>> +                               struct btrfs_root *root, u64 bytenr,
>> +                               u64 num_bytes, u64 parent,
>> +                               u64 root_objectid, u64 owner, u64 offset)
>> +{
>> +       struct btrfs_fs_info *fs_info = root->fs_info;
>> +
>> +       BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
>> +              root_objectid == BTRFS_TREE_LOG_OBJECTID);
>> +
>> +       /* Only used by dedup, so only data is possible */
>> +       if (WARN_ON(owner < BTRFS_FIRST_FREE_OBJECTID))
>> +               return -EINVAL;
>> +       return btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>> +                       num_bytes, parent, root_objectid,
>> +                       owner, offset, 0, BTRFS_ADD_DELAYED_REF, 1);
>> +}
>> +
>>   static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>                                    struct btrfs_root *root,
>>                                    struct btrfs_delayed_ref_node *node,
>> @@ -6893,7 +6911,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>>                                                  num_bytes,
>>                                                  parent, root_objectid, owner,
>>                                                  offset, 0,
>> -                                               BTRFS_DROP_DELAYED_REF, NULL);
>> +                                               BTRFS_DROP_DELAYED_REF, 0);
>>          }
>>          return ret;
>>   }
>> @@ -7845,7 +7863,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>>                                           ins->offset, 0,
>>                                           root_objectid, owner, offset,
>>                                           ram_bytes, BTRFS_ADD_DELAYED_EXTENT,
>> -                                        NULL);
>> +                                        0);
>>          return ret;
>>   }
>>
>> --
>> 2.7.0
>>
>>
>>
>
>
>




* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-14  9:56   ` Filipe Manana
  2016-01-14 19:04     ` Darrick J. Wong
@ 2016-01-15  1:37     ` Qu Wenruo
  2016-01-15  9:19       ` Filipe Manana
  1 sibling, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2016-01-15  1:37 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Wang Xiaoguang



Filipe Manana wrote on 2016/01/14 09:56 +0000:
> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>
>> We use btrfs extended attribute "btrfs.dedup" to record per-file online
>> dedup status, so add a dedup property handler.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
>> index f9e6023..ae8b76d 100644
>> --- a/fs/btrfs/props.c
>> +++ b/fs/btrfs/props.c
>> @@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
>>                                    size_t len);
>>   static const char *prop_compression_extract(struct inode *inode);
>>
>> +static int prop_dedup_validate(const char *value, size_t len);
>> +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len);
>> +static const char *prop_dedup_extract(struct inode *inode);
>> +
>>   static struct prop_handler prop_handlers[] = {
>>          {
>>                  .xattr_name = XATTR_BTRFS_PREFIX "compression",
>> @@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
>>                  .extract = prop_compression_extract,
>>                  .inheritable = 1
>>          },
>> +       {
>> +               .xattr_name = XATTR_BTRFS_PREFIX "dedup",
>> +               .validate = prop_dedup_validate,
>> +               .apply = prop_dedup_apply,
>> +               .extract = prop_dedup_extract,
>> +               .inheritable = 1
>> +       },
>>   };
>>
>>   void __init btrfs_props_init(void)
>> @@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct inode *inode)
>>          return NULL;
>>   }
>>
>> +static int prop_dedup_validate(const char *value, size_t len)
>> +{
>> +       if (!strncmp("disable", value, len))
>> +               return 0;
>> +
>> +       return -EINVAL;
>> +}
>> +
>> +static int prop_dedup_apply(struct inode *inode, const char *value, size_t len)
>> +{
>> +       if (len == 0) {
>> +               BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
>> +               return 0;
>> +       }
>> +
>> +       if (!strncmp("disable", value, len)) {
>> +               BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
>> +               return 0;
>> +       }
>> +
>> +       return -EINVAL;
>> +}
>> +
>> +static const char *prop_dedup_extract(struct inode *inode)
>> +{
>> +       if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
>> +               return "disable";
>
> | -> &
>
> How about writing test cases (for xfstests)? Not only for the property
> but for the whole dedup feature, it would avoid such simple
> mistakes...

Yes, it's already on our schedule, and we have some simple internal test
cases.

The problem is that some of the behavior and formats are not completely
settled yet, especially the on-disk format (for the ondisk backend only)
and the ioctl interface.

So I'm a little concerned about submitting them too early, before we
have made a final decision on the ioctl interface.

> Take a look at the good example of xfs development. For example when
> all the recent patches for their reflink implementation were posted
> (and before getting merged), a comprehensive set of test cases for
> xfstests was also posted...

Yes, test-driven development is very nice, but there are problems too,
especially for btrfs. For example, btrfs/047 needs btrfs send
--stream-version support.

Unfortunately, upstream btrfs-progs has no --stream-version support, so
that test case never gets executed against upstream btrfs-progs.

Personally speaking, xfs development can only work that way because the
xfstests maintainer is also the xfs maintainer: Dave knows every aspect
of xfs and can determine which features will be merged. That's not true
for btrfs, so a test case can be merged that is inconsistent with what
btrfs actually supports.

Thanks,
Qu

>
>>
>> +       return NULL;
>> +}
>> --
>> 2.7.0
>>
>>
>>
>
>
>




* Re: [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface
  2016-01-14 10:08   ` Filipe Manana
@ 2016-01-15  1:41     ` Qu Wenruo
  0 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-15  1:41 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Wang Xiaoguang



Filipe Manana wrote on 2016/01/14 10:08 +0000:
> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>
>> Unlike in-memory or on-disk dedup method, only SHA256 hash method is
>> supported yet, so implement btrfs_dedup_calc_hash() interface using
>> SHA256.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> ---
>>   fs/btrfs/dedup.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 57 insertions(+)
>>
>> diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
>> index d6ee576..8860916 100644
>> --- a/fs/btrfs/dedup.c
>> +++ b/fs/btrfs/dedup.c
>> @@ -465,3 +465,60 @@ int btrfs_dedup_search(struct inode *inode, u64 file_pos,
>>          }
>>          return ret;
>>   }
>> +
>> +static int hash_data(struct btrfs_dedup_info *dedup_info, const char *data,
>> +                    u64 length, struct btrfs_dedup_hash *hash)
>> +{
>> +       struct crypto_shash *tfm = dedup_info->dedup_driver;
>> +       struct {
>> +               struct shash_desc desc;
>> +               char ctx[crypto_shash_descsize(tfm)];
>> +       } sdesc;
>> +       int ret;
>> +
>> +       sdesc.desc.tfm = tfm;
>> +       sdesc.desc.flags = 0;
>> +
>> +       ret = crypto_shash_digest(&sdesc.desc, data, length,
>> +                                 (char *)(hash->hash));
>> +       return ret;
>> +}
>> +
>> +int btrfs_dedup_calc_hash(struct btrfs_root *root, struct inode *inode,
>> +                         u64 start, struct btrfs_dedup_hash *hash)
>> +{
>> +       struct page *p;
>> +       struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
>> +       char *data;
>> +       int i;
>> +       int ret;
>> +       u64 dedup_bs;
>> +       u64 sectorsize = root->sectorsize;
>> +
>> +       if (!dedup_info || !hash)
>> +               return 0;
>> +
>> +       WARN_ON(!IS_ALIGNED(start, sectorsize));
>> +
>> +       dedup_bs = dedup_info->blocksize;
>> +       sectorsize = root->sectorsize;
>> +
>> +       data = kmalloc(dedup_bs, GFP_NOFS);
>> +       if (!data)
>> +               return -ENOMEM;
>> +       for (i = 0; sectorsize * i < dedup_bs; i++) {
>> +               char *d;
>> +
>> +               /* TODO: Add support for subpage size case */
>> +               p = find_get_page(inode->i_mapping,
>> +                                 (start >> PAGE_CACHE_SHIFT) + i);
>> +               WARN_ON(!p);
>
> If !p, we should return with an error too, otherwise we'll get a null
> pointer dereference below.

Right, and the same is true for all the other callers of
find_get_page() in btrfs.

I'll locate and fix them in a separate patch ahead of the btrfs dedup
patchset.

Thanks for the review and hint.
Qu
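A standalone sketch of the error path being requested: if a page lookup fails mid-loop, bail out with an error and free the staging buffer instead of dereferencing NULL. get_page() is a stub standing in for find_get_page(), and the sector size is arbitrary; names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR 16	/* arbitrary sector size for the sketch */

/* Stub for find_get_page(): NULL models a page-cache miss. */
static const char *get_page(const char **pages, size_t npages, size_t i)
{
	return i < npages ? pages[i] : NULL;
}

/*
 * Shape of the fixed loop in btrfs_dedup_calc_hash(): on a NULL page,
 * free the buffer and return an error rather than only WARN_ON(!p).
 */
static int gather_sketch(const char **pages, size_t npages,
			 size_t nr_sectors, char **out)
{
	char *data = malloc(nr_sectors * SECTOR);
	size_t i;

	if (!data)
		return -1;
	for (i = 0; i < nr_sectors; i++) {
		const char *p = get_page(pages, npages, i);

		if (!p) {		/* previously only a warning */
			free(data);
			return -1;
		}
		memcpy(data + i * SECTOR, p, SECTOR);
	}
	*out = data;
	return 0;
}
```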

>
>> +               d = kmap_atomic(p);
>> +               memcpy((data + sectorsize * i), d, sectorsize);
>> +               kunmap_atomic(d);
>> +               page_cache_release(p);
>> +       }
>> +       ret = hash_data(dedup_info, data, dedup_bs, hash);
>> +       kfree(data);
>> +       return ret;
>> +}
>> --
>> 2.7.0
>>
>>
>>
>
>
>




* Re: [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend
  2016-01-14 10:19   ` Filipe Manana
@ 2016-01-15  1:43     ` Qu Wenruo
  0 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-15  1:43 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Wang Xiaoguang



Filipe Manana wrote on 2016/01/14 10:19 +0000:
> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Now on-disk backend can delete hash now.
>>
>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/dedup.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 93 insertions(+)
>>
>> diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
>> index 4acbbfb..66c8f05 100644
>> --- a/fs/btrfs/dedup.c
>> +++ b/fs/btrfs/dedup.c
>> @@ -437,6 +437,97 @@ static int inmem_del(struct btrfs_dedup_info *dedup_info, u64 bytenr)
>>          return 0;
>>   }
>>
>> +/*
>> + * If prepare_del is given, this will setup search_slot() for delete.
>> + * Caller needs to do proper locking.
>> + *
>> + * Return > 0 for found.
>> + * Return 0 for not found.
>> + * Return < 0 for error.
>> + */
>> +static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
>> +                               struct btrfs_dedup_info *dedup_info,
>> +                               struct btrfs_path *path, u64 bytenr,
>> +                               int prepare_del)
>> +{
>> +       struct btrfs_key key;
>> +       struct btrfs_root *dedup_root = dedup_info->dedup_root;
>> +       int ret;
>> +       int ins_len = 0;
>> +       int cow = 0;
>> +
>> +       if (prepare_del) {
>> +               if (WARN_ON(trans == NULL))
>> +                       return -EINVAL;
>> +               cow = 1;
>> +               ins_len = -1;
>> +       }
>> +
>> +       key.objectid = bytenr;
>> +       key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
>> +       key.offset = (u64)-1;
>> +
>> +       ret = btrfs_search_slot(trans, dedup_root, &key, path,
>> +                               ins_len, cow);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       WARN_ON(ret == 0);
>
> If ret == 0 (who knows, one day there may be a hash algorithm that
> produces 64-bit hash tails with all bits set for some inputs), then
> we shouldn't do the btrfs_previous_item() call, which would make us
> miss the item.

That's right, and it doesn't even need another algorithm: at least in
theory, SHA256 can also produce such a hash (the last 64 bits only need
to be all ones).

I'll change it to skip the btrfs_previous_item() call if ret == 0.

Thanks,
Qu

>
>> +       ret = btrfs_previous_item(dedup_root, path, bytenr,
>> +                                 BTRFS_DEDUP_BYTENR_ITEM_KEY);
>> +       if (ret < 0)
>> +               return ret;
>> +       if (ret > 0)
>> +               return 0;
>> +       return 1;
>> +}
>> +
>> +static int ondisk_del(struct btrfs_trans_handle *trans,
>> +                     struct btrfs_dedup_info *dedup_info, u64 bytenr)
>> +{
>> +       struct btrfs_root *dedup_root = dedup_info->dedup_root;
>> +       struct btrfs_path *path;
>> +       struct btrfs_key key;
>> +       int ret;
>> +
>> +       path = btrfs_alloc_path();
>> +       if (!path)
>> +               return -ENOMEM;
>> +
>> +       key.objectid = bytenr;
>> +       key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
>> +       key.offset = 0;
>> +
>> +       mutex_lock(&dedup_info->lock);
>> +
>> +       ret = ondisk_search_bytenr(trans, dedup_info, path, bytenr, 1);
>> +       if (ret <= 0)
>> +               goto out;
>> +
>> +       btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +       btrfs_del_item(trans, dedup_root, path);
>> +       btrfs_release_path(path);
>> +
>> +       /* Search for hash item and delete it */
>> +       key.objectid = key.offset;
>> +       key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
>> +       key.offset = bytenr;
>> +
>> +       ret = btrfs_search_slot(trans, dedup_root, &key, path, -1, 1);
>> +       if (WARN_ON(ret > 0)) {
>> +               ret = -ENOENT;
>> +               goto out;
>> +       }
>> +       if (ret < 0)
>> +               goto out;
>> +       btrfs_del_item(trans, dedup_root, path);
>
> btrfs_del_item() can return errors.
>
>> +
>> +out:
>> +       btrfs_free_path(path);
>> +       mutex_unlock(&dedup_info->lock);
>> +       return ret;
>> +}
>> +
>>   /* Remove a dedup hash from dedup tree */
>>   int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>>                      u64 bytenr)
>> @@ -449,6 +540,8 @@ int btrfs_dedup_del(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>>
>>          if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
>>                  return inmem_del(dedup_info, bytenr);
>> +       if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
>> +               return ondisk_del(trans, dedup_info, bytenr);
>>          return -EINVAL;
>>   }
>>
>> --
>> 2.7.0
>>
>>
>>
>
>
>




* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-15  1:37     ` Qu Wenruo
@ 2016-01-15  9:19       ` Filipe Manana
  2016-01-15  9:33         ` Qu Wenruo
  2016-01-15 12:36         ` Duncan
  0 siblings, 2 replies; 35+ messages in thread
From: Filipe Manana @ 2016-01-15  9:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Wang Xiaoguang

On Fri, Jan 15, 2016 at 1:37 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>
> Filipe Manana wrote on 2016/01/14 09:56 +0000:
>>
>> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> wrote:
>>>
>>> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>
>>> We use btrfs extended attribute "btrfs.dedup" to record per-file online
>>> dedup status, so add a dedup property handler.
>>>
>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>> ---
>>>   fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 40 insertions(+)
>>>
>>> diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
>>> index f9e6023..ae8b76d 100644
>>> --- a/fs/btrfs/props.c
>>> +++ b/fs/btrfs/props.c
>>> @@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
>>>                                    size_t len);
>>>   static const char *prop_compression_extract(struct inode *inode);
>>>
>>> +static int prop_dedup_validate(const char *value, size_t len);
>>> +static int prop_dedup_apply(struct inode *inode, const char *value,
>>> size_t len);
>>> +static const char *prop_dedup_extract(struct inode *inode);
>>> +
>>>   static struct prop_handler prop_handlers[] = {
>>>          {
>>>                  .xattr_name = XATTR_BTRFS_PREFIX "compression",
>>> @@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
>>>                  .extract = prop_compression_extract,
>>>                  .inheritable = 1
>>>          },
>>> +       {
>>> +               .xattr_name = XATTR_BTRFS_PREFIX "dedup",
>>> +               .validate = prop_dedup_validate,
>>> +               .apply = prop_dedup_apply,
>>> +               .extract = prop_dedup_extract,
>>> +               .inheritable = 1
>>> +       },
>>>   };
>>>
>>>   void __init btrfs_props_init(void)
>>> @@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct
>>> inode *inode)
>>>          return NULL;
>>>   }
>>>
>>> +static int prop_dedup_validate(const char *value, size_t len)
>>> +{
>>> +       if (!strncmp("disable", value, len))
>>> +               return 0;
>>> +
>>> +       return -EINVAL;
>>> +}
>>> +
>>> +static int prop_dedup_apply(struct inode *inode, const char *value,
>>> size_t len)
>>> +{
>>> +       if (len == 0) {
>>> +               BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
>>> +               return 0;
>>> +       }
>>> +
>>> +       if (!strncmp("disable", value, len)) {
>>> +               BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
>>> +               return 0;
>>> +       }
>>> +
>>> +       return -EINVAL;
>>> +}
>>> +
>>> +static const char *prop_dedup_extract(struct inode *inode)
>>> +{
>>> +       if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
>>> +               return "disable";
>>
>>
>> | -> &
>>
>> How about writing test cases (for xfstests)? Not only for the property
>> but for the whole dedup feature, it would avoid such simple
>> mistakes...
>
>
> Yes, it's already on our schedule, and we have some simple internal test
> cases.
>
> But the problem is, some behavior/format is not completely determined
> yet, especially the on-disk format (ondisk backend only) and the ioctl
> interface.

How does someone guess that? You could add an RFC tag to the patches...

>
> So I'm a little concerned about submitting them too early, before we make
> a final decision on the ioctl interface.

What's the problem? You can submit such tests only to the btrfs list
for now and tag them as RFC and mention that.

>
>> Take a look at the good example of xfs development. For example when
>> all the recent patches for their reflink implementation was posted
>> (and before getting merged), a comprehensive set of test cases for
>> xfstests was also posted...
>
>
> Yes, test-driven development is very nice, but there are also problems,
> especially for btrfs, like btrfs/047, which needs btrfs send
> --stream-version support.

Yes, this one was a special case in that around 2 years ago it seemed
the functionality was going to be merged, but it ended up not being
merged for reasons not known to me (no reasons were given in the
mailing list nor in private).

Better to have unused tests than features without tests (and many
btrfs-specific features had no tests, or almost none, not that long
ago).

>
> But unfortunately, there is no --stream-version support in upstream
> btrfs-progs, so the test case never gets executed on upstream btrfs-progs.
>
> Personally speaking, xfs development can only happen in that way because
> the xfstests maintainer is also the maintainer of xfs, and Dave knows every
> aspect of xfs and can determine which features will be merged.

He can also accept patches to remove tests or change them...
Shouldn't be an excuse to not share tests along with patches for new
features, as noted above.

> But that's not true for btrfs, so he can merge a wrong patch and cause an
> inconsistency between the test cases and what btrfs really supports.
>
> Thanks,
> Qu
>
>
>>
>>>
>>> +       return NULL;
>>> +}
>>> --
>>> 2.7.0
>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-15  9:19       ` Filipe Manana
@ 2016-01-15  9:33         ` Qu Wenruo
  2016-01-15 12:36         ` Duncan
  1 sibling, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-15  9:33 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Wang Xiaoguang



Filipe Manana wrote on 2016/01/15 09:19 +0000:
> On Fri, Jan 15, 2016 at 1:37 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>> Filipe Manana wrote on 2016/01/14 09:56 +0000:
>>>
>>> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> wrote:
>>>>
>>>> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>>
>>>> We use btrfs extended attribute "btrfs.dedup" to record per-file online
>>>> dedup status, so add a dedup property handler.
>>>>
>>>> Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
>>>> ---
>>>>    fs/btrfs/props.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 40 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
>>>> index f9e6023..ae8b76d 100644
>>>> --- a/fs/btrfs/props.c
>>>> +++ b/fs/btrfs/props.c
>>>> @@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
>>>>                                     size_t len);
>>>>    static const char *prop_compression_extract(struct inode *inode);
>>>>
>>>> +static int prop_dedup_validate(const char *value, size_t len);
>>>> +static int prop_dedup_apply(struct inode *inode, const char *value,
>>>> size_t len);
>>>> +static const char *prop_dedup_extract(struct inode *inode);
>>>> +
>>>>    static struct prop_handler prop_handlers[] = {
>>>>           {
>>>>                   .xattr_name = XATTR_BTRFS_PREFIX "compression",
>>>> @@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
>>>>                   .extract = prop_compression_extract,
>>>>                   .inheritable = 1
>>>>           },
>>>> +       {
>>>> +               .xattr_name = XATTR_BTRFS_PREFIX "dedup",
>>>> +               .validate = prop_dedup_validate,
>>>> +               .apply = prop_dedup_apply,
>>>> +               .extract = prop_dedup_extract,
>>>> +               .inheritable = 1
>>>> +       },
>>>>    };
>>>>
>>>>    void __init btrfs_props_init(void)
>>>> @@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct
>>>> inode *inode)
>>>>           return NULL;
>>>>    }
>>>>
>>>> +static int prop_dedup_validate(const char *value, size_t len)
>>>> +{
>>>> +       if (!strncmp("disable", value, len))
>>>> +               return 0;
>>>> +
>>>> +       return -EINVAL;
>>>> +}
>>>> +
>>>> +static int prop_dedup_apply(struct inode *inode, const char *value,
>>>> size_t len)
>>>> +{
>>>> +       if (len == 0) {
>>>> +               BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       if (!strncmp("disable", value, len)) {
>>>> +               BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       return -EINVAL;
>>>> +}
>>>> +
>>>> +static const char *prop_dedup_extract(struct inode *inode)
>>>> +{
>>>> +       if (BTRFS_I(inode)->flags | BTRFS_INODE_NODEDUP)
>>>> +               return "disable";
>>>
>>>
>>> | -> &
>>>
>>> How about writing test cases (for xfstests)? Not only for the property
>>> but for the whole dedup feature, it would avoid such simple
>>> mistakes...
>>
>>
>> Yes, it's already on our schedule, and we have some simple internal test
>> cases.
>>
>> But the problem is, some behavior/format is not completely determined
>> yet, especially the on-disk format (ondisk backend only) and the ioctl
>> interface.
>
> How does someone guess that? You could add an RFC tag to the patches...
>
>>
>> So I'm a little concerned about submitting them too early, before we make
>> a final decision on the ioctl interface.
>
> What's the problem? You can submit such tests only to the btrfs list
> for now and tag them as RFC and mention that.

Wow, I never thought xfstests patches could be submitted to the btrfs
mailing list only!!

What a wonderful idea for RFC features like dedup!

I'll soon add such test cases.

Many thanks for the advice!
Qu

>
>>
>>> Take a look at the good example of xfs development. For example when
>>> all the recent patches for their reflink implementation was posted
>>> (and before getting merged), a comprehensive set of test cases for
>>> xfstests was also posted...
>>
>>
>> Yes, test-driven development is very nice, but there are also problems,
>> especially for btrfs, like btrfs/047, which needs btrfs send
>> --stream-version support.
>
> Yes, this one was a special case in that around 2 years ago it seemed
> the functionality was going to be merged, but it ended up not being
> merged for reasons not known to me (no reasons were given in the
> mailing list nor in private).
>
> Better to have unused tests than features without tests (and many
> btrfs-specific features had no tests, or almost none, not that long
> ago).
>
>>
>> But unfortunately, there is no --stream-version support in upstream
>> btrfs-progs, so the test case never gets executed on upstream btrfs-progs.
>>
>> Personally speaking, xfs development can only happen in that way because
>> the xfstests maintainer is also the maintainer of xfs, and Dave knows every
>> aspect of xfs and can determine which features will be merged.
>
> He can also accept patches to remove tests or change them...
> Shouldn't be an excuse to not share tests along with patches for new
> features, as noted above.
>
>> But that's not true for btrfs, so he can merge a wrong patch and cause an
>> inconsistency between the test cases and what btrfs really supports.
>>
>> Thanks,
>> Qu
>>
>>
>>>
>>>>
>>>> +       return NULL;
>>>> +}
>>>> --
>>>> 2.7.0
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>




* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-15  9:19       ` Filipe Manana
  2016-01-15  9:33         ` Qu Wenruo
@ 2016-01-15 12:36         ` Duncan
  2016-01-15 15:22           ` Filipe Manana
  1 sibling, 1 reply; 35+ messages in thread
From: Duncan @ 2016-01-15 12:36 UTC (permalink / raw)
  To: linux-btrfs

Filipe Manana posted on Fri, 15 Jan 2016 09:19:28 +0000 as excerpted:

>> Yes, test driven development is very nice, but there is also problem,
>> especially for btrfs, like btrfs/047 which needs btrfs send
>> --stream-version support.
> 
> Yes, this one was a special case in that around 2 years ago it seemed
> the functionality was going to be merged, but it ended up not being
> merged for reasons not known to me (no reasons were given in the mailing
> list nor in private).

AFAIK/IIRC, the reason as I understood it is that ideally Chris Mason 
wants to do just one more stream version bump, which means being /very/ 
sure it includes anything that has come up as needed in intervening 
development.  There's a few things agreed to be missing in the current 
stream version, yes, but it's usable as-is in the near term, and before 
that bump is done, everything else related needs to be in a mature enough 
development state that we can be reasonably sure nothing else is going to 
come up as still missing, at least for a couple years after the stream 
version bump.

And without a second stream version to try, a --stream-version option 
isn't particularly testable, so it was judged to be too early to add it 
as an option.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-15 12:36         ` Duncan
@ 2016-01-15 15:22           ` Filipe Manana
  2016-01-16  2:53             ` Duncan
  0 siblings, 1 reply; 35+ messages in thread
From: Filipe Manana @ 2016-01-15 15:22 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Fri, Jan 15, 2016 at 12:36 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Filipe Manana posted on Fri, 15 Jan 2016 09:19:28 +0000 as excerpted:
>
>>> Yes, test driven development is very nice, but there is also problem,
>>> especially for btrfs, like btrfs/047 which needs btrfs send
>>> --stream-version support.
>>
>> Yes, this one was a special case in that around 2 years ago it seemed
>> the functionality was going to be merged, but it ended up not being
>> merged due to reasons not known to me (no reasons pointed in the mailing
>> list nor in private).
>
> AFAIK/IIRC, the reason as I understood it is that ideally Chris Mason
> wants to do just one more stream version bump, which means being /very/
> sure it includes anything that has come up as needed in intervening
> development.  There's a few things agreed to be missing in the current
> stream version

Either you have had access to information I didn't have (Chris never
commented on it, but Josef, David and Mark did) or you missed part of
past discussions in the mailing list (or perhaps my mail accounts didn't
get anything).
Yes, there were several new things discussed for a new version (a lot of
them listed in some wiki page IIRC), some of which I implemented and the
rest I didn't, but I left the necessary changes in the stream format to
avoid bumping the stream version again. And that's where things settled,
without any more comments (neither on the mailing list nor anywhere I
could read, something that unfortunately happens often).

>, yes, but it's usable as-is in the near term, and before
> that bump is done, everything else related needs to be in a mature enough
> development state that we can be reasonably sure nothing else is going to
> come up as still missing, at least for a couple years after the stream
> version bump.

Yes, send has had (and still has) several problems, but that shouldn't
prevent it from getting new useful commands indefinitely. Just like every
part of btrfs still has bugs of varying degrees of severity (you can see
it from the endless user reports or from the patches coming in with every
release).

>
> And without a second stream version to try, a --stream-version option
> isn't particularly testable, so it was judged to be too early to add it
> as an option.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup
  2016-01-15 15:22           ` Filipe Manana
@ 2016-01-16  2:53             ` Duncan
  0 siblings, 0 replies; 35+ messages in thread
From: Duncan @ 2016-01-16  2:53 UTC (permalink / raw)
  To: linux-btrfs

Filipe Manana posted on Fri, 15 Jan 2016 15:22:39 +0000 as excerpted:

> On Fri, Jan 15, 2016 at 12:36 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Filipe Manana posted on Fri, 15 Jan 2016 09:19:28 +0000 as excerpted:
>>
>>>> Yes, test driven development is very nice, but there is also problem,
>>>> especially for btrfs, like btrfs/047 which needs btrfs send
>>>> --stream-version support.
>>>
>>> Yes, this one was a special case in that around 2 years ago it seemed
>>> the functionality was going to be merged, but it ended up not being
>>> merged for reasons not known to me (no reasons were given in the
>>> mailing list nor in private).
>>
>> AFAIK/IIRC, the reason as I understood it is that ideally Chris Mason
>> wants to do just one more stream version bump, which means being /very/
>> sure it includes anything that has come up as needed in intervening
>> development.  There's a few things agreed to be missing in the current
>> stream version
> 
> Either you have had access to information I didn't have (Chris never
> commented on it, but Josef, David and Mark did) or you missed part of
> past discussions in the mailing list (or perhaps my mail accounts didn't
> get anything).
> Yes, there were several new things discussed for a new version (a lot of
> them listed in some wiki page IIRC), some of which I implemented and the
> rest I didn't, but I left the necessary changes in the stream format to
> avoid bumping the stream version again. And that's where things settled,
> without any more comments (neither on the mailing list nor anywhere I
> could read, something that unfortunately happens often).

So AFAIK/IIRC was incorrect, as I thought I had seen CMason comment on it 
but perhaps it was the others, and I wasn't aware that you had 
implemented all the proposed stream format changes, even if not all the 
code to actually make use of them.  But then it just... dropped.

Regardless of the details from back then, however, btrfs is arguably in a 
far more stable and mature state now than it was two years ago, and it 
may well be time to actually do that stream format bump now.

Tho "now" obviously means 4.6 at the earliest as it'll surely take a bit 
to dust off and reevaluate the patches, putting it beyond the current 4.5 
commit window.

Thanks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref
  2016-01-15  1:16     ` Qu Wenruo
@ 2016-01-20  3:25       ` Qu Wenruo
  0 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2016-01-20  3:25 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs



Qu Wenruo wrote on 2016/01/15 09:16 +0800:
> Hi Filipe,
>
> Thanks for your review.
>
> Filipe Manana wrote on 2016/01/14 09:56 +0000:
>> On Thu, Jan 14, 2016 at 5:57 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> wrote:
>>> Slightly modify btrfs_add_delayed_data_ref() to allow it to accept
>>> GFP_ATOMIC, and allow it to be called inside a spinlock.
>>
>> Hi Qu,
>>
>> I really would prefer to avoid gfp_atomic allocations.
>> Instead of using them, how about changing the approach so that callers
>> allocate the ref structure (without using gfp_atomic), then take the
>> spinlock, and then pass the structure to the inc/dec ref functions?
>
> Yes, we considered that method too.
>
> But since it's only used for the dedup hit case, which has no existing
> delayed ref head, it's an uncommon routine.
>
> On the other hand, with the method you mentioned (allocate memory first,
> then pass it to add_delayed_data_ref()), we would always need to allocate
> memory even though it is only needed in a minority of cases.
>
> It would be very nice if you could point out some additional disadvantage
> of using GFP_ATOMIC; maybe there is something important I missed.
>
> Thanks
> Qu
>

My previous comment was totally wrong.

Such an ATOMIC allocation failure has already happened twice, which is
much more likely than I expected.

So I'll use the method you mentioned.

Thanks,
Qu

>>
>> thanks
>>
>>>
>>> This is used by later dedup patches.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>>   fs/btrfs/ctree.h       |  4 ++++
>>>   fs/btrfs/delayed-ref.c | 25 +++++++++++++++++--------
>>>   fs/btrfs/delayed-ref.h |  2 +-
>>>   fs/btrfs/extent-tree.c | 24 +++++++++++++++++++++---
>>>   4 files changed, 43 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 2132fa5..671be87 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -3539,6 +3539,10 @@ int btrfs_inc_extent_ref(struct
>>> btrfs_trans_handle *trans,
>>>                           struct btrfs_root *root,
>>>                           u64 bytenr, u64 num_bytes, u64 parent,
>>>                           u64 root_objectid, u64 owner, u64 offset);
>>> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
>>> +                               struct btrfs_root *root, u64 bytenr,
>>> +                               u64 num_bytes, u64 parent,
>>> +                               u64 root_objectid, u64 owner, u64
>>> offset);
>>>
>>>   int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
>>>                                     struct btrfs_root *root);
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index 914ac13..e869442 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -812,26 +812,31 @@ int btrfs_add_delayed_data_ref(struct
>>> btrfs_fs_info *fs_info,
>>>                                 u64 bytenr, u64 num_bytes,
>>>                                 u64 parent, u64 ref_root,
>>>                                 u64 owner, u64 offset, u64 reserved,
>>> int action,
>>> -                              struct btrfs_delayed_extent_op
>>> *extent_op)
>>> +                              int atomic)
>>>   {
>>>          struct btrfs_delayed_data_ref *ref;
>>>          struct btrfs_delayed_ref_head *head_ref;
>>>          struct btrfs_delayed_ref_root *delayed_refs;
>>>          struct btrfs_qgroup_extent_record *record = NULL;
>>> +       gfp_t gfp_flags;
>>> +
>>> +       if (atomic)
>>> +               gfp_flags = GFP_ATOMIC;
>>> +       else
>>> +               gfp_flags = GFP_NOFS;
>>>
>>> -       BUG_ON(extent_op && !extent_op->is_data);
>>> -       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
>>> +       ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep,
>>> gfp_flags);
>>>          if (!ref)
>>>                  return -ENOMEM;
>>>
>>> -       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep,
>>> GFP_NOFS);
>>> +       head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep,
>>> gfp_flags);
>>>          if (!head_ref) {
>>>                  kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>>>                  return -ENOMEM;
>>>          }
>>>
>>>          if (fs_info->quota_enabled && is_fstree(ref_root)) {
>>> -               record = kmalloc(sizeof(*record), GFP_NOFS);
>>> +               record = kmalloc(sizeof(*record), gfp_flags);
>>>                  if (!record) {
>>>
>>> kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
>>>                          kmem_cache_free(btrfs_delayed_ref_head_cachep,
>>> @@ -840,10 +845,13 @@ int btrfs_add_delayed_data_ref(struct
>>> btrfs_fs_info *fs_info,
>>>                  }
>>>          }
>>>
>>> -       head_ref->extent_op = extent_op;
>>> +       head_ref->extent_op = NULL;
>>>
>>>          delayed_refs = &trans->transaction->delayed_refs;
>>> -       spin_lock(&delayed_refs->lock);
>>> +
>>> +       /* For atomic case, caller should already hold the
>>> delayed_refs lock */
>>> +       if (!atomic)
>>> +               spin_lock(&delayed_refs->lock);
>>>
>>>          /*
>>>           * insert both the head node and the new ref without dropping
>>> @@ -856,7 +864,8 @@ int btrfs_add_delayed_data_ref(struct
>>> btrfs_fs_info *fs_info,
>>>          add_delayed_data_ref(fs_info, trans, head_ref, &ref->node,
>>> bytenr,
>>>                                     num_bytes, parent, ref_root,
>>> owner, offset,
>>>                                     action);
>>> -       spin_unlock(&delayed_refs->lock);
>>> +       if (!atomic)
>>> +               spin_unlock(&delayed_refs->lock);
>>>
>>>          return 0;
>>>   }
>>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>>> index c24b653..e34f96a 100644
>>> --- a/fs/btrfs/delayed-ref.h
>>> +++ b/fs/btrfs/delayed-ref.h
>>> @@ -249,7 +249,7 @@ int btrfs_add_delayed_data_ref(struct
>>> btrfs_fs_info *fs_info,
>>>                                 u64 bytenr, u64 num_bytes,
>>>                                 u64 parent, u64 ref_root,
>>>                                 u64 owner, u64 offset, u64 reserved,
>>> int action,
>>> -                              struct btrfs_delayed_extent_op
>>> *extent_op);
>>> +                              int atomic);
>>>   int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
>>>                                       struct btrfs_trans_handle *trans,
>>>                                       u64 ref_root, u64 bytenr, u64
>>> num_bytes);
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 60cc139..4a01ca9 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -2105,11 +2105,29 @@ int btrfs_inc_extent_ref(struct
>>> btrfs_trans_handle *trans,
>>>                  ret = btrfs_add_delayed_data_ref(fs_info, trans,
>>> bytenr,
>>>                                          num_bytes, parent,
>>> root_objectid,
>>>                                          owner, offset, 0,
>>> -                                       BTRFS_ADD_DELAYED_REF, NULL);
>>> +                                       BTRFS_ADD_DELAYED_REF, 0);
>>>          }
>>>          return ret;
>>>   }
>>>
>>> +int btrfs_inc_extent_ref_atomic(struct btrfs_trans_handle *trans,
>>> +                               struct btrfs_root *root, u64 bytenr,
>>> +                               u64 num_bytes, u64 parent,
>>> +                               u64 root_objectid, u64 owner, u64
>>> offset)
>>> +{
>>> +       struct btrfs_fs_info *fs_info = root->fs_info;
>>> +
>>> +       BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
>>> +              root_objectid == BTRFS_TREE_LOG_OBJECTID);
>>> +
>>> +       /* Only used by dedup, so only data is possible */
>>> +       if (WARN_ON(owner < BTRFS_FIRST_FREE_OBJECTID))
>>> +               return -EINVAL;
>>> +       return btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>>> +                       num_bytes, parent, root_objectid,
>>> +                       owner, offset, 0, BTRFS_ADD_DELAYED_REF, 1);
>>> +}
>>> +
>>>   static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>>>                                    struct btrfs_root *root,
>>>                                    struct btrfs_delayed_ref_node *node,
>>> @@ -6893,7 +6911,7 @@ int btrfs_free_extent(struct btrfs_trans_handle
>>> *trans, struct btrfs_root *root,
>>>                                                  num_bytes,
>>>                                                  parent,
>>> root_objectid, owner,
>>>                                                  offset, 0,
>>> -
>>> BTRFS_DROP_DELAYED_REF, NULL);
>>> +
>>> BTRFS_DROP_DELAYED_REF, 0);
>>>          }
>>>          return ret;
>>>   }
>>> @@ -7845,7 +7863,7 @@ int btrfs_alloc_reserved_file_extent(struct
>>> btrfs_trans_handle *trans,
>>>                                           ins->offset, 0,
>>>                                           root_objectid, owner, offset,
>>>                                           ram_bytes,
>>> BTRFS_ADD_DELAYED_EXTENT,
>>> -                                        NULL);
>>> +                                        0);
>>>          return ret;
>>>   }
>>>
>>> --
>>> 2.7.0
>>>
>>>
>>>
>>
>>
>>




end of thread, other threads:[~2016-01-20  3:25 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-14  5:57 [PATCH v4 00/14][For 4.6] Btrfs: Add inband (write time) de-duplication framework Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 01/18] btrfs: dedup: Introduce dedup framework and its header Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 02/18] btrfs: dedup: Introduce function to initialize dedup info Qu Wenruo
2016-01-14 21:33   ` kbuild test robot
2016-01-14  5:57 ` [PATCH v4 03/18] btrfs: dedup: Introduce function to add hash into in-memory tree Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 04/18] btrfs: dedup: Introduce function to remove hash from " Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 05/18] btrfs: delayed-ref: Add support for atomic increasing extent ref Qu Wenruo
2016-01-14  9:56   ` Filipe Manana
2016-01-15  1:16     ` Qu Wenruo
2016-01-20  3:25       ` Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 06/18] btrfs: dedup: Introduce function to search for an existing hash Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 07/18] btrfs: dedup: Implement btrfs_dedup_calc_hash interface Qu Wenruo
2016-01-14 10:08   ` Filipe Manana
2016-01-15  1:41     ` Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 08/18] btrfs: ordered-extent: Add support for dedup Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 09/18] btrfs: dedup: Inband in-memory only de-duplication implement Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 10/18] btrfs: dedup: Add basic tree structure for on-disk dedup method Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 11/18] btrfs: dedup: Introduce interfaces to resume and cleanup dedup info Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 12/18] btrfs: dedup: Add support for on-disk hash search Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 13/18] btrfs: dedup: Add support to delete hash for on-disk backend Qu Wenruo
2016-01-14 10:19   ` Filipe Manana
2016-01-15  1:43     ` Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 14/18] btrfs: dedup: Add support for adding " Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 15/18] btrfs: dedup: Add ioctl for inband deduplication Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 16/18] btrfs: dedup: add an inode nodedup flag Qu Wenruo
2016-01-14  5:57 ` [PATCH v4 17/18] btrfs: dedup: add a property handler for online dedup Qu Wenruo
2016-01-14  9:56   ` Filipe Manana
2016-01-14 19:04     ` Darrick J. Wong
2016-01-15  1:37     ` Qu Wenruo
2016-01-15  9:19       ` Filipe Manana
2016-01-15  9:33         ` Qu Wenruo
2016-01-15 12:36         ` Duncan
2016-01-15 15:22           ` Filipe Manana
2016-01-16  2:53             ` Duncan
2016-01-14  5:57 ` [PATCH v4 18/18] btrfs: dedup: add per-file online dedup control Qu Wenruo
