* [PATCH RFC 00/14] Accurate qgroup reserve framework
@ 2015-09-08  8:37 Qu Wenruo
  2015-09-08  8:37 ` [PATCH 01/19] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
                   ` (18 more replies)
  0 siblings, 19 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

[[BUG]]
One of the most common ways to trigger the bug is the following:
1) Enable quota
2) Limit excl of qgroup 5 to 16M
3) Buffered-write [0,2M) of a file inside subvol 5 ten times, without sync

EDQUOT will be triggered at about the 8th write (8 x 2M reaches the 16M
excl limit).
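
For reference, a minimal user-space reproducer could look like the
following sketch. It assumes quota is already enabled and the 16M excl
limit is set on qgroup 5 beforehand (e.g. via btrfs-progs); the mount
point /mnt/btrfs is only an example:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LEN (2 << 20)	/* 2M */

int main(void)
{
	char *buf = malloc(LEN);
	int fd, i;

	if (!buf)
		return 1;
	memset(buf, 0xaa, LEN);
	fd = open("/mnt/btrfs/file", O_CREAT | O_WRONLY, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < 10; i++) {
		/* each pass rewrites the same [0,2M) range, no sync */
		if (pwrite(fd, buf, LEN, 0) < 0) {
			/* with the bug, fails with EDQUOT around pass 8 */
			perror("pwrite");
			break;
		}
	}
	close(fd);
	return 0;
}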

[[CAUSE]]
The problem is caused by the fact that qgroup will reserve space even
if the data space is already reserved.

In the above reproducer, each buffered write to [0,2M) makes qgroup
reserve another 2M of space. But in fact the first write already
reserved 2M, and from then on we don't need to reserve any more data
space, as we are only rewriting [0,2M).

Also, the reserved space will only be freed *ONCE*, when its backref
is run at commit_transaction() time.

That is what causes the reserved space to leak.

[[FIX]]
The fix is not a simple one, as btrfs_qgroup_reserve() currently
follows the very bad btrfs space allocation principle:
  Allocate as much as you may need, even if it won't all be used.

So for accurate qgroup reserve, we introduce a completely new framework
for data and metadata.
1) Per-inode data reserve map
   Now each inode will have a data reserve map, recording which ranges
   of data are already reserved.
   If we are writing a range which is already reserved, we won't need
   to reserve space again (see the user-space sketch below).

   Also, since qgroup is only accounted at commit_trans() time, when
   data is committed to disk and its metadata is inserted into the
   current tree, we should release the data range from the reserve map
   but still keep the reserved space itself until commit_trans().

   So delayed_ref_head will have new members to record how much space
   is reserved, and to free it at commit_trans() time.

2) Per-root metadata reserve counter
   For metadata (tree blocks), it's impossible to know in advance
   exactly how much space will be used.
   And due to the new qgroup accounting framework, the old
   free-at-end-of-transaction behavior may lead to exceeding the limit.

   So we record how much metadata space is reserved for each root, and
   free it at commit_trans() time.
   This method is not perfect, but thanks to the comparatively small
   size of metadata, it should work quite well.

More detailed info can be found in each commit message and in the
source comments.
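
Here is the simplified user-space model of the reserve map idea
referenced above (illustration only, not the kernel implementation):
given already-reserved, sorted, non-overlapping ranges, compute how
many new bytes a write actually needs to reserve.

#include <stdio.h>

struct range {
	unsigned long long start;
	unsigned long long len;
};

/* bytes a write [start, start + len) must newly reserve */
static unsigned long long bytes_to_reserve(const struct range *rsv, int nr,
					   unsigned long long start,
					   unsigned long long len)
{
	unsigned long long cur = start, end = start + len, need = 0;
	int i;

	for (i = 0; i < nr && cur < end; i++) {
		if (rsv[i].start + rsv[i].len <= cur)
			continue;	/* reserved range entirely left of cur */
		if (rsv[i].start >= end)
			break;		/* reserved range entirely past the write */
		if (rsv[i].start > cur)
			need += rsv[i].start - cur;	/* hole before range */
		cur = rsv[i].start + rsv[i].len;	/* skip covered part */
	}
	if (cur < end)
		need += end - cur;	/* trailing hole */
	return need;
}

int main(void)
{
	/* [0,8K) and [12K,16K) already reserved, as in the example above */
	struct range rsv[] = { { 0, 8192 }, { 12288, 4096 } };

	/* write [4K,16K): only [8K,12K) is new, so 4K must be reserved */
	printf("%llu\n", bytes_to_reserve(rsv, 2, 4096, 12288));
	return 0;
}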

Qu Wenruo (19):
  btrfs: qgroup: New function declaration for new reserve implement
  btrfs: qgroup: Implement data_rsv_map init/free functions
  btrfs: qgroup: Introduce new function to search most left reserve
    range
  btrfs: qgroup: Introduce function to insert non-overlap reserve range
  btrfs: qgroup: Introduce function to reserve data range per inode
  btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
  btrfs: qgroup: Introduce function to release reserved range
  btrfs: qgroup: Introduce function to release/free reserved data range
  btrfs: delayed_ref: Add new function to record reserved space into    
    delayed ref
  btrfs: delayed_ref: release and free qgroup reserved at proper timing
  btrfs: qgroup: Introduce new functions to reserve/free metadata
  btrfs: qgroup: Use new metadata reservation.
  btrfs: extent-tree: Add new versions of btrfs_check_data_free_space
  btrfs: Switch to new check_data_free_space
  btrfs: fallocate: Add support to accurate qgroup reserve
  btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space
  btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function
  btrfs: qgroup: Cleanup old inaccurate facilities
  btrfs: qgroup: Add handler for NOCOW and inline

 fs/btrfs/btrfs_inode.h |   6 +
 fs/btrfs/ctree.h       |   8 +-
 fs/btrfs/delayed-ref.c |  29 +++
 fs/btrfs/delayed-ref.h |  14 +
 fs/btrfs/disk-io.c     |   1 +
 fs/btrfs/extent-tree.c |  99 +++++---
 fs/btrfs/file.c        | 169 +++++++++----
 fs/btrfs/inode-map.c   |   2 +-
 fs/btrfs/inode.c       |  51 +++-
 fs/btrfs/ioctl.c       |   3 +-
 fs/btrfs/qgroup.c      | 674 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/qgroup.h      |  18 +-
 fs/btrfs/transaction.c |  34 +--
 fs/btrfs/transaction.h |   1 -
 14 files changed, 979 insertions(+), 130 deletions(-)

-- 
2.5.1



* [PATCH 01/19] btrfs: qgroup: New function declaration for new reserve implement
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 02/19] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Add new structures and function declarations for the dirty phase of
the new qgroup reserve implementation. The dirty phase focuses on
avoiding over-reserve: for a dirty range whose space is already
reserved, we won't reserve space again.

This patch adds the needed structure declaration and comments.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/btrfs_inode.h |  4 ++++
 fs/btrfs/qgroup.c      | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  3 +++
 3 files changed, 65 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 81220b2..e3ece65 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -24,6 +24,7 @@
 #include "extent_io.h"
 #include "ordered-data.h"
 #include "delayed-inode.h"
+#include "qgroup.h"
 
 /*
  * ordered_data_close is set by truncate when a file that used
@@ -195,6 +196,9 @@ struct btrfs_inode {
 	struct timespec i_otime;
 
 	struct inode vfs_inode;
+
+	/* qgroup dirty map for data space reserve */
+	struct btrfs_qgroup_data_rsv_map *qgroup_rsv_map;
 };
 
 extern unsigned char btrfs_filetype_table[];
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e9ace09..561c36d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -91,6 +91,64 @@ struct btrfs_qgroup {
 	u64 new_refcnt;
 };
 
+/*
+ * Record one range of reserved space.
+ */
+struct data_rsv_range {
+	struct rb_node node;
+	u64 start;
+	u64 len;
+};
+
+/*
+ * Record per inode reserved range.
+ * This is mainly used to resolve the reserved space leaking problem,
+ * one cause of which is a mismatch between reserve and free.
+ *
+ * The new qgroup code will handle reserve in two phases.
+ * 1) Dirty phase.
+ *    Pages are just marked dirty, but not written to disk.
+ * 2) Flushed phase
+ *    Pages are written to disk, but transaction is not committed yet.
+ *
+ * In the dirty phase, we only need to focus on avoiding over-reserve.
+ *
+ * The idea is like below.
+ * 1) Write [0,8K)
+ * 0	4K	8K	12K	16K
+ * |////////////|
+ * Reserve +8K, total reserved: 8K
+ *
+ * 2) Write [0,4K)
+ * 0	4K	8K	12K	16K
+ * |////////////|
+ * Reserve 0, total reserved 8K
+ *
+ * 3) Write [12K,16K)
+ * 0	4K	8K	12K	16K
+ * |////////////|	|///////|
+ * Reserve +4K, total reserved 12K
+ *
+ * 4) Flush [0,8K)
+ * Can happen without commit transaction, like fallocate will trigger the
+ * write.
+ * 0	4K	8K	12K	16K
+ *			|///////|
+ * Reserve 0, total reserved 12K
+ * As the extent is written to disk and is no longer dirty, the range
+ * gets removed.
+ * But as its delayed_refs are not run, its reserved space will not be
+ * freed, and things continue into the flushed phase.
+ *
+ * By this method, we can avoid over-reserve, which will lead to reserved
+ * space leak.
+ */
+struct btrfs_qgroup_data_rsv_map {
+	struct rb_root root;
+	u64 reserved;
+	spinlock_t lock;
+};
+
 static void btrfs_qgroup_update_old_refcnt(struct btrfs_qgroup *qg, u64 seq,
 					   int mod)
 {
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 6387dcf..2f863a4 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -33,6 +33,9 @@ struct btrfs_qgroup_extent_record {
 	struct ulist *old_roots;
 };
 
+/* For per-inode dirty range reserve */
+struct btrfs_qgroup_data_rsv_map;
+
 int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 		       struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
-- 
2.5.1



* [PATCH 02/19] btrfs: qgroup: Implement data_rsv_map init/free functions
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
  2015-09-08  8:37 ` [PATCH 01/19] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 03/19] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Add new functions btrfs_qgroup_init/free_data_rsv_map() to init/free
the data reserve map.

The data reserve map is used to mark which ranges already hold
reserved space, to avoid the current reserved space leak.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/btrfs_inode.h |  2 ++
 fs/btrfs/inode.c       | 10 +++++++
 fs/btrfs/qgroup.c      | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h      |  3 ++
 4 files changed, 92 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index e3ece65..27cc338 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -199,6 +199,8 @@ struct btrfs_inode {
 
 	/* qgroup dirty map for data space reserve */
 	struct btrfs_qgroup_data_rsv_map *qgroup_rsv_map;
+	/* lock to ensure rsv_map will only be initialized once */
+	spinlock_t qgroup_init_lock;
 };
 
 extern unsigned char btrfs_filetype_table[];
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 37dd8d0..61b2c17 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8939,6 +8939,14 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	RB_CLEAR_NODE(&ei->rb_node);
 
+	/*
+	 * Init qgroup info to empty, as it will be initialized at write
+	 * time.
+	 * This behavior is needed for the case where quota is enabled later.
+	 */
+	spin_lock_init(&ei->qgroup_init_lock);
+	ei->qgroup_rsv_map = NULL;
+
 	return inode;
 }
 
@@ -8996,6 +9004,8 @@ void btrfs_destroy_inode(struct inode *inode)
 			btrfs_put_ordered_extent(ordered);
 		}
 	}
+	/* free and check data rsv map */
+	btrfs_qgroup_free_data_rsv_map(inode);
 	inode_tree_del(inode);
 	btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
 free:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 561c36d..cf07c17 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2539,3 +2539,80 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 		btrfs_queue_work(fs_info->qgroup_rescan_workers,
 				 &fs_info->qgroup_rescan_work);
 }
+
+/*
+ * Init data_rsv_map for a given inode.
+ *
+ * This is needed at write time, as quota can be disabled and then enabled.
+ */
+int btrfs_qgroup_init_data_rsv_map(struct inode *inode)
+{
+	struct btrfs_inode *binode = BTRFS_I(inode);
+	struct btrfs_root *root = binode->root;
+	struct btrfs_qgroup_data_rsv_map *dirty_map;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+		return 0;
+
+	spin_lock(&binode->qgroup_init_lock);
+	/* Quick route for init */
+	if (likely(binode->qgroup_rsv_map))
+		goto out;
+	spin_unlock(&binode->qgroup_init_lock);
+
+	/*
+	 * Slow allocation route
+	 *
+	 * TODO: Use kmem_cache to speedup allocation
+	 */
+	dirty_map = kmalloc(sizeof(*dirty_map), GFP_NOFS);
+	if (!dirty_map)
+		return -ENOMEM;
+
+	dirty_map->reserved = 0;
+	dirty_map->root = RB_ROOT;
+	spin_lock_init(&dirty_map->lock);
+
+	/* Lock again to ensure no one else initialized it in the meantime */
+	spin_lock(&binode->qgroup_init_lock);
+	if (binode->qgroup_rsv_map) {
+		spin_unlock(&binode->qgroup_init_lock);
+		kfree(dirty_map);
+		return 0;
+	}
+	binode->qgroup_rsv_map = dirty_map;
+out:
+	spin_unlock(&binode->qgroup_init_lock);
+	return 0;
+}
+
+void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
+{
+	struct btrfs_inode *binode = BTRFS_I(inode);
+	struct btrfs_root *root = binode->root;
+	struct btrfs_qgroup_data_rsv_map *dirty_map = binode->qgroup_rsv_map;
+	struct rb_node *node;
+
+	/*
+	 * This function is called from the inode destroy routine, so no
+	 * concurrency will happen; no need to take the lock.
+	 */
+	if (!dirty_map)
+		return;
+
+	/* sanity check */
+	WARN_ON(!root->fs_info->quota_enabled || !is_fstree(root->objectid));
+
+	btrfs_qgroup_free(root, dirty_map->reserved);
+	spin_lock(&dirty_map->lock);
+	while ((node = rb_first(&dirty_map->root)) != NULL) {
+		struct data_rsv_range *range;
+
+		range = rb_entry(node, struct data_rsv_range, node);
+		rb_erase(node, &dirty_map->root);
+		kfree(range);
+	}
+	spin_unlock(&dirty_map->lock);
+	kfree(dirty_map);
+	binode->qgroup_rsv_map = NULL;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 2f863a4..c87b7dc 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -84,4 +84,7 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 			       u64 rfer, u64 excl);
 #endif
 
+/* for qgroup reserve */
+int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
+void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.5.1



* [PATCH 03/19] btrfs: qgroup: Introduce new function to search most left reserve range
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
  2015-09-08  8:37 ` [PATCH 01/19] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
  2015-09-08  8:37 ` [PATCH 02/19] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 04/19] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function to find the nearest reserved range at or to
the left of a given offset in a reserve map.

It provides the basis for the later reserve map implementation.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index cf07c17..fc24fc3 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2541,6 +2541,42 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * Return the reserved range nearest to the left of the given start.
+ * Note: the returned range is not guaranteed to cover @start.
+ */
+static struct data_rsv_range *
+find_reserve_range(struct btrfs_qgroup_data_rsv_map *map, u64 start)
+{
+	struct rb_node **p = &map->root.rb_node;
+	struct rb_node *parent = NULL;
+	struct rb_node *prev = NULL;
+	struct data_rsv_range *range = NULL;
+
+	while (*p) {
+		parent = *p;
+		range = rb_entry(parent, struct data_rsv_range, node);
+		if (range->start < start)
+			p = &(*p)->rb_right;
+		else if (range->start > start)
+			p = &(*p)->rb_left;
+		else
+			return range;
+	}
+
+	/* empty tree */
+	if (!parent)
+		return NULL;
+	if (range->start <= start)
+		return range;
+
+	prev = rb_prev(parent);
+	/* Already most left one */
+	if (!prev)
+		return range;
+	return rb_entry(prev, struct data_rsv_range, node);
+}
+
+/*
  * Init data_rsv_map for a given inode.
  *
  * This is needed at write time as quota can be disabled and then enabled
-- 
2.5.1



* [PATCH 04/19] btrfs: qgroup: Introduce function to insert non-overlap reserve range
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (2 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 03/19] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 05/19] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

The new function insert_data_ranges() will insert non-overlapping
reserve ranges into a reserve map, merging with adjacent ranges where
possible; e.g. inserting [4K,8K) between existing [0,4K) and [8K,12K)
merges all three into a single [0,12K) range.

It provides the basis for the later qgroup reserve map implementation.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index fc24fc3..a4e3af4 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2577,6 +2577,130 @@ find_reserve_range(struct btrfs_qgroup_data_rsv_map *map, u64 start)
 }
 
 /*
+ * Insert one data range.
+ * The given [start, len) ranges won't overlap with each other.
+ *
+ * Return 0 if range is inserted and tmp is not used.
+ * Return > 0 if range is inserted and tmp is used.
+ * There is no catchable error case; the only possible failure is a
+ * logic error, which triggers BUG_ON().
+ */
+static int insert_data_range(struct btrfs_qgroup_data_rsv_map *map,
+			     struct data_rsv_range *tmp,
+			     u64 start, u64 len)
+{
+	struct rb_node **p = &map->root.rb_node;
+	struct rb_node *parent = NULL;
+	struct rb_node *tmp_node = NULL;
+	struct data_rsv_range *range = NULL;
+	struct data_rsv_range *prev_range = NULL;
+	struct data_rsv_range *next_range = NULL;
+	int prev_merged = 0;
+	int next_merged = 0;
+	int ret = 0;
+
+	while (*p) {
+		parent = *p;
+		range = rb_entry(parent, struct data_rsv_range, node);
+		if (range->start < start)
+			p = &(*p)->rb_right;
+		else if (range->start > start)
+			p = &(*p)->rb_left;
+		else
+			BUG_ON(1);
+	}
+
+	/* Empty tree, goto isolated case */
+	if (!range)
+		goto insert_isolated;
+
+	/* get adjacent ranges */
+	if (range->start < start) {
+		prev_range = range;
+		tmp_node = rb_next(parent);
+		if (tmp_node)
+			next_range = rb_entry(tmp_node, struct data_rsv_range,
+					      node);
+	} else {
+		next_range = range;
+		tmp_node = rb_prev(parent);
+		if (tmp_node)
+			prev_range = rb_entry(tmp_node, struct data_rsv_range,
+					      node);
+	}
+
+	/* try to merge with previous and next ranges */
+	if (prev_range && prev_range->start + prev_range->len == start) {
+		prev_merged = 1;
+		prev_range->len += len;
+	}
+	if (next_range && start + len == next_range->start) {
+		next_merged = 1;
+
+		/*
+		 * the range can be merged with the two adjacent ranges into
+		 * one; remove the trailing range.
+		 */
+		if (prev_merged) {
+			prev_range->len += next_range->len;
+			rb_erase(&next_range->node, &map->root);
+			kfree(next_range);
+		} else {
+			next_range->start = start;
+			next_range->len += len;
+		}
+	}
+
+insert_isolated:
+	/* isolated case, need to insert range now */
+	if (!next_merged && !prev_merged) {
+		BUG_ON(!tmp);
+
+		tmp->start = start;
+		tmp->len = len;
+		rb_link_node(&tmp->node, parent, p);
+		rb_insert_color(&tmp->node, &map->root);
+		ret = 1;
+	}
+	return ret;
+}
+
+/*
+ * insert reserve range and merge them if possible
+ *
+ * Return 0 if all inserted and tmp not used
+ * Return > 0 if all inserted and tmp used
+ * No catchable error return value.
+ */
+static int insert_data_ranges(struct btrfs_qgroup_data_rsv_map *map,
+			      struct data_rsv_range *tmp,
+			      struct ulist *insert_list)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	int tmp_used = 0;
+	int ret = 0;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(insert_list, &uiter))) {
+		ret = insert_data_range(map, tmp, unode->val, unode->aux);
+
+		/*
+		 * insert_data_range() won't return an error value, so no
+		 * need to handle the < 0 case.
+		 *
+		 * Also tmp should be used at most one time, so clear it to
+		 * NULL to cooperate with sanity check in insert_data_range().
+		 */
+		if (ret > 0) {
+			tmp_used = 1;
+			tmp = NULL;
+		}
+	}
+	return tmp_used;
+}
+
+/*
  * Init data_rsv_map for a given inode.
  *
  * This is needed at write time as quota can be disabled and then enabled
-- 
2.5.1



* [PATCH 05/19] btrfs: qgroup: Introduce function to reserve data range per inode
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (3 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 04/19] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 06/19] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Introduce the new function reserve_data_range().
This function will find the uncovered parts of a range, reserve space
for them, and insert them into the reserve map using the previously
introduced functions.

This provides the basis for the later per-inode reserve map
implementation.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a4e3af4..77a2e07 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2701,6 +2701,98 @@ static int insert_data_ranges(struct btrfs_qgroup_data_rsv_map *map,
 }
 
 /*
+ * Check qgroup limit and insert dirty range into reserve_map.
+ *
+ * Must be called with map->lock held.
+ */
+static int reserve_data_range(struct btrfs_root *root,
+			      struct btrfs_qgroup_data_rsv_map *map,
+			      struct data_rsv_range *tmp,
+			      struct ulist *insert_list, u64 start, u64 len)
+{
+	struct data_rsv_range *range;
+	u64 cur_start = 0;
+	u64 cur_len = 0;
+	u64 reserve = 0;
+	int ret = 0;
+
+	range = find_reserve_range(map, start);
+	/* empty tree, insert the whole range */
+	if (!range) {
+		reserve = len;
+		ret = ulist_add(insert_list, start, len, GFP_ATOMIC);
+		if (ret < 0)
+			return ret;
+		goto insert;
+	}
+
+	/* For the case where the found range covers the leading part */
+	if (range->start <= start && range->start + range->len > start)
+		cur_start = range->start + range->len;
+	else
+		cur_start = start;
+
+	/*
+	 * iterate until the end of the range.
+	 * Like the following:
+	 *
+	 *	|<--------desired---------------------->|
+	 *|//1//|	|////2//|	|///3///|	<- exists
+	 * Then we will need to insert the following
+	 *	|\\\4\\\|	|\\\5\\\|	|\\\6\\\|
+	 * And only add qgroup->reserved for ranges 4, 5 and 6.
+	 */
+	while (cur_start < start + len) {
+		struct rb_node *next_node;
+		u64 next_start;
+
+		if (range->start + range->len <= cur_start) {
+			/*
+			 * Move to next range if current range is before
+			 * cur_start
+			 * e.g range is 1, cur_start is the end of range 1.
+			 */
+			next_node = rb_next(&range->node);
+			if (!next_node) {
+				/*
+				 * no next range, fill the rest
+				 * e.g range is 3, cur_start is end of range 3.
+				 */
+				cur_len = start + len - cur_start;
+				next_start = start + len;
+			} else {
+				range = rb_entry(next_node,
+						 struct data_rsv_range, node);
+				cur_len = min(range->start, start + len) -
+					  cur_start;
+				next_start = range->start + range->len;
+			}
+		} else {
+			/*
+			 * current range is already after cur_start
+			 * e.g range is 2, cur_start is end of range 1.
+			 */
+			cur_len = min(range->start, start + len) - cur_start;
+			next_start = range->start + range->len;
+		}
+		reserve += cur_len;
+		ret = ulist_add(insert_list, cur_start, cur_len, GFP_ATOMIC);
+		if (ret < 0)
+			return ret;
+
+		cur_start = next_start;
+	}
+insert:
+	ret = btrfs_qgroup_reserve(root, reserve);
+	if (ret < 0)
+		return ret;
+	/* ranges must be inserted after we are sure it has enough space */
+	ret = insert_data_ranges(map, tmp, insert_list);
+	map->reserved += reserve;
+	return ret;
+}
+
+/*
  * Init data_rsv_map for a given inode.
  *
  * This is needed at write time as quota can be disabled and then enabled
-- 
2.5.1



* [PATCH 06/19] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (4 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 05/19] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 07/19] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

This new function will do all the hard work to reserve precise space
for a write.

The overall work flow will be the following.

File A already has some dirty pages:

0	4K	8K	12K	16K
|///////|	|///////|

And then, someone want to write some data into range [4K, 16K).
	|<------desired-------->|

Unlike the old, incorrect implementation, which would reserve 12K,
this function will only reserve space for the newly dirtied part:
	|\\\\\\\|	|\\\\\\\|
which takes only 8K of reserve space, as the other parts have already
reserved their own space.

So the final reserve map will be:
|///////////////////////////////|

This provides the basis to resolve the long-existing qgroup limit bug.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h |  1 +
 2 files changed, 58 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 77a2e07..337b784 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2793,6 +2793,63 @@ insert:
 }
 
 /*
+ * Make sure the data space for [start, start + len) is reserved.
+ * It will either reserve new space from given qgroup or reuse the already
+ * reserved space.
+ *
+ * Return 0 for successful reserve.
+ * Return <0 for error.
+ *
+ * TODO: handle the nocow case, like NODATACOW or writes into prealloc space,
+ * along with other mixed cases.
+ * E.g. write 2M: the first 1M can be nocowed, but the next 1M is a hole and needs COW.
+ */
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_inode *binode = BTRFS_I(inode);
+	struct btrfs_root *root = binode->root;
+	struct btrfs_qgroup_data_rsv_map *reserve_map;
+	struct data_rsv_range *tmp = NULL;
+	struct ulist *insert_list;
+	int ret;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+	    len == 0)
+		return 0;
+
+	if (!binode->qgroup_rsv_map) {
+		ret = btrfs_qgroup_init_data_rsv_map(inode);
+		if (ret < 0)
+			return ret;
+	}
+	reserve_map = binode->qgroup_rsv_map;
+	insert_list = ulist_alloc(GFP_NOFS);
+	if (!insert_list)
+		return -ENOMEM;
+	tmp = kzalloc(sizeof(*tmp), GFP_NOFS);
+	if (!tmp) {
+		ulist_free(insert_list);
+		return -ENOMEM;
+	}
+
+	spin_lock(&reserve_map->lock);
+	ret = reserve_data_range(root, reserve_map, tmp, insert_list, start,
+				 len);
+	/*
+	 * For the error and already-reserved cases, free the tmp memory.
+	 * For the tmp-used case, set ret to 0, as some careless
+	 * callers consider > 0 an error.
+	 */
+	if (ret <= 0)
+		kfree(tmp);
+	else
+		ret = 0;
+	spin_unlock(&reserve_map->lock);
+	ulist_free(insert_list);
+	return ret;
+}
+
+/*
  * Init data_rsv_map for a given inode.
  *
  * This is needed at write time as quota can be disabled and then enabled
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index c87b7dc..366b853 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -87,4 +87,5 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 /* for qgroup reserve */
 int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
 void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.5.1



* [PATCH 07/19] btrfs: qgroup: Introduce function to release reserved range
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (5 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 06/19] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 08/19] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Introduce the new function release_data_range() to release reserved
ranges. It will iterate through all affected ranges and remove or
shrink them.

Note this function will not free reserved space, as a range can be
released under the following conditions:
1) The dirty range gets written to disk.
   In this case, the reserved range will be released, but the reserved
   bytes will not be freed until the delayed_ref is run.

2) Truncate
   In this case, dirty ranges will be released and the reserved bytes
   will also be freed.

So the new function won't free reserved space itself, but records the
number of bytes into a parameter if the caller needs it.
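
To make the split case concrete, here is the arithmetic as a tiny
user-space sketch (illustration only, not the kernel code): releasing
the middle of a reserved range shrinks the head and inserts one new
tail range, which is why a single pre-allocated 'tmp' range is enough.

#include <stdio.h>

struct range {
	unsigned long long start;
	unsigned long long len;
};

int main(void)
{
	struct range r = { 0, 16384 };			/* reserved [0,16K) */
	unsigned long long start = 4096, len = 8192;	/* release [4K,12K) */
	struct range tail;
	unsigned long long released = len;

	/* middle release: shrink the head, create one tail range */
	tail.start = start + len;
	tail.len = r.start + r.len - (start + len);
	r.len = start - r.start;

	printf("head [%llu,%llu) tail [%llu,%llu) released %llu\n",
	       r.start, r.start + r.len,
	       tail.start, tail.start + tail.len, released);
	return 0;
}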

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 337b784..e24c10d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2849,6 +2849,136 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
 	return ret;
 }
 
+/* Small helper used in release_data_range() to update rsv map */
+static inline void __update_rsv(struct btrfs_qgroup_data_rsv_map *map,
+				u64 *reserved, u64 cur_rsv)
+{
+	if (reserved)
+		*reserved += cur_rsv;
+	if (WARN_ON(map->reserved < cur_rsv))
+		map->reserved = 0;
+	else
+		map->reserved -= cur_rsv;
+}
+
+/*
+ * Release the range [start, start + len) from rsv map.
+ *
+ * The behavior should be much like reserve_data_range().
+ * @tmp: pre-allocated memory for the case where an existing range
+ *       needs to be split into two.
+ * @reserved: the number of bytes that may need to be freed
+ * Return > 0 if 'tmp' memory is used and the range is released successfully
+ * Return 0 if 'tmp' memory is not used and the range is released successfully
+ * Return < 0 for error
+ */
+static int release_data_range(struct btrfs_qgroup_data_rsv_map *map,
+			      struct data_rsv_range *tmp,
+			      u64 start, u64 len, u64 *reserved)
+{
+	struct data_rsv_range *range;
+	u64 cur_rsv = 0;
+	int ret = 0;
+
+	range = find_reserve_range(map, start);
+	/* empty tree, just return */
+	if (!range)
+		return 0;
+	/*
+	 * For split case
+	 *		|<----desired---->|
+	 * |////////////////////////////////////////////|
+	 * In this case, we need to insert one new range.
+	 */
+	if (range->start < start && range->start + range->len > start + len) {
+		u64 new_start = start + len;
+		u64 new_len = range->start + range->len - start - len;
+
+		cur_rsv = len;
+		if (reserved)
+			*reserved += cur_rsv;
+		map->reserved -= cur_rsv;
+
+		range->len = start - range->start;
+		ret = insert_data_range(map, tmp, new_start, new_len);
+		WARN_ON(ret <= 0);
+		return 1;
+	}
+
+	/*
+	 * Iterate until the end of the range and release all
+	 * reserved data from the map.
+	 * We iterate by existing range, as that makes the code a
+	 * little cleaner.
+	 *
+	 *	|<---------desired------------------------>|
+	 * |//1//|	|//2//|		|//3//|		|//4//|
+	 */
+	while (range->start < start + len) {
+		struct rb_node *next = NULL;
+		int range_freed = 0;
+
+		/*
+		 *		|<---desired---->|
+		 * |///////|
+		 */
+		if (unlikely(range->start + range->len <= start))
+			goto next;
+
+		/*
+		 *	|<----desired---->|
+		 * |///////|
+		 */
+		if (range->start < start &&
+		    range->start + range->len > start) {
+			cur_rsv = range->start + range->len - start;
+
+			range->len = start - range->start;
+			goto next;
+		}
+
+		/*
+		 *	|<--desired-->|
+		 *	    |/////|
+		 * Including same start/end case, so other case don't need
+		 * to check start/end equal case and don't need bother
+		 * deleting range.
+		 */
+		if (range->start >= start &&
+		    range->start + range->len <= start + len) {
+			cur_rsv = range->len;
+
+			range_freed = 1;
+			next = rb_next(&range->node);
+			rb_erase(&range->node, &map->root);
+			kfree(range);
+			goto next;
+
+		}
+
+		/*
+		 *	|<--desired-->|
+		 *		  |///////|
+		 */
+		if (range->start < start + len &&
+		    range->start + range->len > start + len) {
+			cur_rsv = start + len - range->start;
+
+			range->len = range->start + range->len - start - len;
+			range->start = start + len;
+			goto next;
+		}
+next:
+		__update_rsv(map, reserved, cur_rsv);
+		if (!range_freed)
+			next = rb_next(&range->node);
+		if (!next)
+			break;
+		range = rb_entry(next, struct data_rsv_range, node);
+	}
+	return 0;
+}
+
 /*
  * Init data_rsv_map for a given inode.
  *
-- 
2.5.1



* [PATCH 08/19] btrfs: qgroup: Introduce function to release/free reserved data range
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (6 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 07/19] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 09/19] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Introduce functions btrfs_qgroup_release/free_data() to release/free
reserved data range.

Release means just removing the data range from the data rsv map,
without freeing the reserved space.
This is for the normal buffered write case: when data is written to
disk and its metadata is added into the tree, its reserved space
should still be kept until commit_trans().
So in that case, we only release the dirty range, but keep the
reserved space recorded elsewhere (in the delayed ref head) until
commit_trans().

Free means not only removing the data range, but also freeing the
reserved space.
This is used for cleanup cases, such as truncate.
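
A tiny user-space model of the release vs free distinction
(illustration only): both drop bytes from the dirty map, but only
"free" returns them to the qgroup counters immediately.

#include <stdio.h>

static unsigned long long map_reserved = 12288;	/* bytes in the rsv map */
static unsigned long long qgroup_used = 12288;	/* bytes charged to qgroup */

static void drop_range(unsigned long long len, int free_reserved)
{
	map_reserved -= len;		/* range leaves the dirty map */
	if (free_reserved)
		qgroup_used -= len;	/* free path: give bytes back now */
	/*
	 * else: release path keeps the bytes charged (handed over to
	 * the delayed ref head) until commit_transaction() time
	 */
}

int main(void)
{
	drop_range(4096, 0);	/* release: range written to disk */
	drop_range(8192, 1);	/* free: range truncated */
	printf("map %llu, qgroup %llu\n", map_reserved, qgroup_used);
	return 0;
}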

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h |  2 ++
 2 files changed, 50 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e24c10d..ba7888f 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2979,6 +2979,54 @@ next:
 	return 0;
 }
 
+static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
+				       int free_reserved)
+{
+	struct data_rsv_range *tmp;
+	struct btrfs_qgroup_data_rsv_map *map;
+	u64 reserved = 0;
+	int ret;
+
+	spin_lock(&BTRFS_I(inode)->qgroup_init_lock);
+	map = BTRFS_I(inode)->qgroup_rsv_map;
+	spin_unlock(&BTRFS_I(inode)->qgroup_init_lock);
+	if (!map)
+		return 0;
+
+	tmp = kmalloc(sizeof(*tmp), GFP_NOFS);
+	if (!tmp)
+		return -ENOMEM;
+	spin_lock(&map->lock);
+	ret = release_data_range(map, tmp, start, len, &reserved);
+	/* release_data_range() won't fail; only check whether tmp was used */
+	if (ret == 0)
+		kfree(tmp);
+	if (free_reserved)
+		btrfs_qgroup_free(BTRFS_I(inode)->root, reserved);
+	spin_unlock(&map->lock);
+	return 0;
+}
+
+/*
+ * Callers should be truncate/invalidate_page,
+ * as this path also frees the reserved space.
+ */
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
+{
+	return __btrfs_qgroup_release_data(inode, start, len, 1);
+}
+
+/*
+ * Caller should be finish_ordered_io.
+ * As qgroup accounting happens at commit time, for data written to disk
+ * its reserved space should not be freed until commit,
+ * or we may go beyond the limit.
+ */
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
+{
+	return __btrfs_qgroup_release_data(inode, start, len, 0);
+}
+
 /*
  * Init data_rsv_map for a given inode.
  *
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 366b853..8e69dc1 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -88,4 +88,6 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 int btrfs_qgroup_init_data_rsv_map(struct inode *inode);
 void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
 int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.5.1



* [PATCH 09/19] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (7 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 08/19] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 10/19] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Add a new function, btrfs_add_delayed_qgroup_reserve(), to record how
much space is reserved for a given extent.

As btrfs only does qgroup accounting at run_delayed_refs() time, a
newly allocated extent should keep its reserved space until then.

So add the needed function, with related members, to do it.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
 fs/btrfs/delayed-ref.h | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&head_ref->ref_list);
 	head_ref->processing = 0;
 	head_ref->total_ref_mod = count_mod;
+	head_ref->qgroup_reserved = 0;
+	head_ref->qgroup_ref_root = 0;
 
 	/* Record qgroup extent info if provided */
 	if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *ref_head;
+	int ret = 0;
+
+	if (!fs_info->quota_enabled || !is_fstree(ref_root))
+		return 0;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	spin_lock(&delayed_refs->lock);
+	ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+	if (!ref_head) {
+		ret = -ENOENT;
+		goto out;
+	}
+	WARN_ON(ref_head->qgroup_reserved || ref_head->qgroup_ref_root);
+	ref_head->qgroup_ref_root = ref_root;
+	ref_head->qgroup_reserved = num_bytes;
+out:
+	spin_unlock(&delayed_refs->lock);
+	return ret;
+}
+
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 13fb5e6..d4c41e2 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -113,6 +113,17 @@ struct btrfs_delayed_ref_head {
 	int total_ref_mod;
 
 	/*
+	 * For qgroup reserved space freeing.
+	 *
+	 * ref_root and reserved will be recorded after
+	 * BTRFS_ADD_DELAYED_EXTENT is called.
+	 * And will be used to free reserved qgroup space at
+	 * run_delayed_refs() time.
+	 */
+	u64 qgroup_ref_root;
+	u64 qgroup_reserved;
+
+	/*
 	 * when a new extent is allocated, it is just reserved in memory
 	 * The actual extent isn't inserted into the extent allocation tree
 	 * until the delayed ref is processed.  must_insert_reserved is
@@ -242,6 +253,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       u64 owner, u64 offset, int action,
 			       struct btrfs_delayed_extent_op *extent_op,
 			       int no_quota);
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes);
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes,
-- 
2.5.1



* [PATCH 10/19] btrfs: delayed_ref: release and free qgroup reserved at proper timing
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (8 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 09/19] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 11/19] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Qgroup reserved space needs to be released from the inode dirty map
and freed at two different times:

1) Release when the metadata is written into the tree
After the corresponding metadata is written into the tree, any newer
write will be COWed (the NOCOW case is not handled yet).
So we must release its range from the inode dirty range map, or a
later write to the same range would skip its needed reservation,
causing accounting to exceed the limit.

2) Free reserved bytes when the delayed ref is run
When delayed refs are run, qgroup accounting will follow soon and turn
the reserved bytes into rfer/excl numbers.
As run_delayed_refs() and qgroup accounting are both done at
commit_transaction() time, we are safe to free the reserved space at
run_delayed_refs() time.

With this release/free timing, we should be able to resolve the
long-existing qgroup reserved space leak problem.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |  4 ++++
 fs/btrfs/inode.c       | 10 ++++++++++
 fs/btrfs/qgroup.c      |  5 ++---
 fs/btrfs/qgroup.h      |  8 +++++++-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5411f0a..65e60eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2345,6 +2345,10 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 						      node->num_bytes);
 			}
 		}
+
+		/* Also free its reserved qgroup space */
+		btrfs_qgroup_free_refroot(root->fs_info, head->qgroup_ref_root,
+					  head->qgroup_reserved);
 		return ret;
 	}
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61b2c17..1f7cac0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2112,6 +2112,16 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ret = btrfs_alloc_reserved_file_extent(trans, root,
 					root->root_key.objectid,
 					btrfs_ino(inode), file_pos, &ins);
+	if (ret < 0)
+		goto out;
+	/*
+	 * Release the reserved range from the inode dirty range map and
+	 * hand it over to the delayed ref code, as accounting now only
+	 * happens at commit_transaction() time.
+	 */
+	btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	ret = btrfs_add_delayed_qgroup_reserve(root->fs_info, trans,
+			root->objectid, disk_bytenr, ram_bytes);
 out:
 	btrfs_free_path(path);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ba7888f..5a69a2d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2169,14 +2169,13 @@ out:
 	return ret;
 }
 
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+			       u64 ref_root, u64 num_bytes)
 {
 	struct btrfs_root *quota_root;
 	struct btrfs_qgroup *qgroup;
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
-	u64 ref_root = root->root_key.objectid;
 	int ret = 0;
 
 	if (!is_fstree(ref_root))
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 8e69dc1..49fa15e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -75,7 +75,13 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+			       u64 ref_root, u64 num_bytes);
+static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+					 num_bytes);
+}
 
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
-- 
2.5.1



* [PATCH 11/19] btrfs: qgroup: Introduce new functions to reserve/free metadata
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (9 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 10/19] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 12/19] btrfs: qgroup: Use new metadata reservation Qu Wenruo
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Introduce new functions btrfs_qgroup_reserve/free_meta() to reserve
and free reserved metadata space, tracked by a per-root counter.
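
The free-all path relies on atomically swapping the per-root counter
to zero, so reserves racing with it are not lost. A user-space
analogue of that design choice (illustration only, not the kernel
code):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int qgroup_meta_rsv;

static void reserve_meta(int bytes)
{
	atomic_fetch_add(&qgroup_meta_rsv, bytes);
}

static void free_meta_all(void)
{
	/*
	 * xchg to 0: whatever was reserved at this instant is freed;
	 * any concurrent new reserve stays in the counter for the
	 * next commit
	 */
	int reserved = atomic_exchange(&qgroup_meta_rsv, 0);

	printf("freeing %d bytes\n", reserved);
}

int main(void)
{
	reserve_meta(16384);
	reserve_meta(16384);
	free_meta_all();
	return 0;
}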

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/qgroup.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h  |  4 ++++
 4 files changed, 48 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..ae86025 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1943,6 +1943,9 @@ struct btrfs_root {
 	int send_in_progress;
 	struct btrfs_subvolume_writers *subv_writers;
 	atomic_t will_be_snapshoted;
+
+	/* For qgroup metadata space reserve */
+	atomic_t qgroup_meta_rsv;
 };
 
 struct btrfs_ioctl_defrag_range_args {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0b658d0..704d212 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1259,6 +1259,7 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
 	atomic_set(&root->orphan_inodes, 0);
 	atomic_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshoted, 0);
+	atomic_set(&root->qgroup_meta_rsv, 0);
 	root->log_transid = 0;
 	root->log_transid_committed = -1;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 5a69a2d..b759e96 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3102,3 +3102,43 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
 	kfree(dirty_map);
 	binode->qgroup_rsv_map = NULL;
 }
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
+{
+	int ret;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+	    num_bytes == 0)
+		return 0;
+
+	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+	ret = btrfs_qgroup_reserve(root, num_bytes);
+	if (ret < 0)
+		return ret;
+	atomic_add(num_bytes, &root->qgroup_meta_rsv);
+	return ret;
+}
+
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
+{
+	int reserved;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+		return;
+
+	reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
+	if (reserved == 0)
+		return;
+	btrfs_qgroup_free(root, reserved);
+}
+
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
+{
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+		return;
+
+	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+	WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
+	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
+	btrfs_qgroup_free(root, num_bytes);
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 49fa15e..2d507c8 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -96,4 +96,8 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode);
 int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes);
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root);
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.5.1



* [PATCH 12/19] btrfs: qgroup: Use new metadata reservation.
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (10 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 11/19] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 13/19] btrfs: extent-tree: Add new versions of btrfs_check_data_free_space Qu Wenruo
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

As we have the new metadata reservation functions, use them to replace
the old btrfs_qgroup_reserve() call for metadata.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 14 ++++++--------
 fs/btrfs/transaction.c | 34 ++++++----------------------------
 fs/btrfs/transaction.h |  1 -
 3 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 65e60eb..402415c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5343,7 +5343,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	if (root->fs_info->quota_enabled) {
 		/* One for parent inode, two for dir entries */
 		num_bytes = 3 * root->nodesize;
-		ret = btrfs_qgroup_reserve(root, num_bytes);
+		ret = btrfs_qgroup_reserve_meta(root, num_bytes);
 		if (ret)
 			return ret;
 	} else {
@@ -5361,10 +5361,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	if (ret == -ENOSPC && use_global_rsv)
 		ret = btrfs_block_rsv_migrate(global_rsv, rsv, num_bytes);
 
-	if (ret) {
-		if (*qgroup_reserved)
-			btrfs_qgroup_free(root, *qgroup_reserved);
-	}
+	if (ret && *qgroup_reserved)
+		btrfs_qgroup_free_meta(root, *qgroup_reserved);
 
 	return ret;
 }
@@ -5525,15 +5523,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
+		ret = btrfs_qgroup_reserve_meta(root,
+				nr_extents * root->nodesize);
 		if (ret)
 			goto out_fail;
 	}
 
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
 	if (unlikely(ret)) {
-		if (root->fs_info->quota_enabled)
-			btrfs_qgroup_free(root, nr_extents * root->nodesize);
+		btrfs_qgroup_free_meta(root, nr_extents * root->nodesize);
 		goto out_fail;
 	}
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 68ad89e..707e8ea 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -446,13 +446,10 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items > 0 && root != root->fs_info->chunk_root) {
-		if (root->fs_info->quota_enabled &&
-		    is_fstree(root->root_key.objectid)) {
-			qgroup_reserved = num_items * root->nodesize;
-			ret = btrfs_qgroup_reserve(root, qgroup_reserved);
-			if (ret)
-				return ERR_PTR(ret);
-		}
+		qgroup_reserved = num_items * root->nodesize;
+		ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved);
+		if (ret)
+			return ERR_PTR(ret);
 
 		num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
 		/*
@@ -521,7 +518,6 @@ again:
 	h->block_rsv = NULL;
 	h->orig_rsv = NULL;
 	h->aborted = 0;
-	h->qgroup_reserved = 0;
 	h->delayed_ref_elem.seq = 0;
 	h->type = type;
 	h->allocating_chunk = false;
@@ -546,7 +542,6 @@ again:
 		h->bytes_reserved = num_bytes;
 		h->reloc_reserved = reloc_reserved;
 	}
-	h->qgroup_reserved = qgroup_reserved;
 
 got_it:
 	btrfs_record_root_in_trans(h, root);
@@ -564,8 +559,7 @@ alloc_fail:
 		btrfs_block_rsv_release(root, &root->fs_info->trans_block_rsv,
 					num_bytes);
 reserve_fail:
-	if (qgroup_reserved)
-		btrfs_qgroup_free(root, qgroup_reserved);
+	btrfs_qgroup_free_meta(root, qgroup_reserved);
 	return ERR_PTR(ret);
 }
 
@@ -782,15 +776,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 			must_run_delayed_refs = 2;
 	}
 
-	if (trans->qgroup_reserved) {
-		/*
-		 * the same root has to be passed here between start_transaction
-		 * and end_transaction. Subvolume quota depends on this.
-		 */
-		btrfs_qgroup_free(trans->root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
-
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
 
@@ -1205,6 +1190,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
 			spin_lock(&fs_info->fs_roots_radix_lock);
 			if (err)
 				break;
+			btrfs_qgroup_free_meta_all(root);
 		}
 	}
 	spin_unlock(&fs_info->fs_roots_radix_lock);
@@ -1813,10 +1799,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
-	if (trans->qgroup_reserved) {
-		btrfs_qgroup_free(root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
 
 	cur_trans = trans->transaction;
 
@@ -2169,10 +2151,6 @@ cleanup_transaction:
 	btrfs_trans_release_metadata(trans, root);
 	btrfs_trans_release_chunk_metadata(trans);
 	trans->block_rsv = NULL;
-	if (trans->qgroup_reserved) {
-		btrfs_qgroup_free(root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
 	btrfs_warn(root->fs_info, "Skipping commit of aborted transaction.");
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index edc2fbc..6586ef1 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -105,7 +105,6 @@ struct btrfs_trans_handle {
 	u64 transid;
 	u64 bytes_reserved;
 	u64 chunk_bytes_reserved;
-	u64 qgroup_reserved;
 	unsigned long use_count;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
-- 
2.5.1



* [PATCH 13/19] btrfs: extent-tree: Add new versions of btrfs_check_data_free_space
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (11 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 12/19] btrfs: qgroup: Use new metadata reservation Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 14/19] btrfs: Switch to new check_data_free_space Qu Wenruo
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Add the new function __btrfs_check_data_free_space() to do precise
space reservation.

The new function will replace the old btrfs_check_data_free_space(),
but until all the changes are done, let's just use the new name.

Also, export the internal-use function btrfs_alloc_data_chunk_ondemand():
qgroup reserve now requires precise byte counts, which in some callers
can only be determined in a later loop (like fallocate).
But the data space info check and data chunk allocation don't need to
be that accurate, and can be called at the beginning.

So export it for those later operations.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ae86025..c1a0aaf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3453,6 +3453,8 @@ enum btrfs_reserve_flush_enum {
 };
 
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 402415c..61366ca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3907,11 +3907,7 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 	return ret;
 }
 
-/*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 {
 	struct btrfs_space_info *data_sinfo;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -4032,19 +4028,55 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
-	ret = btrfs_qgroup_reserve(root, write_bytes);
-	if (ret)
-		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
-out:
 	spin_unlock(&data_sinfo->lock);
 
 	return ret;
 }
 
 /*
+ * This will check the space that the inode allocates from to make sure we have
+ * enough space for bytes.
+ */
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_qgroup_reserve(root, write_bytes);
+	return ret;
+}
+
+/*
+ * New check_data_free_space() with the ability for precise data reservation
+ * Will replace the old btrfs_check_data_free_space(), but for an easier
+ * patch split, add the new function first and then replace the old one.
+ */
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	/* align the range */
+	len = round_up(start + len, root->sectorsize) -
+	      round_down(start, root->sectorsize);
+	start = round_down(start, root->sectorsize);
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+	if (ret < 0)
+		return ret;
+
+	/* Use new btrfs_qgroup_reserve_data to reserve precise data space */
+	ret = btrfs_qgroup_reserve_data(inode, start, len);
+	return ret;
+}
+
+/*
  * Called if we need to clear a data reservation for this inode.
  */
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
-- 
2.5.1



* [PATCH 14/19] btrfs: Switch to new check_data_free_space
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (12 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 15/19] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Use the new check_data_free_space() for buffered writes and the inode
cache.

For the buffered write case, a nodatacow write won't increase the quota
accounting. So unlike the old behavior, which reserved space before
checking for nocow, we now check nocow first and only reserve data
space if we can't do a nocow write.
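
In sketch form, the reordered flow in __btrfs_buffered_write() looks
like this (simplified from the diff below):

	if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
				     BTRFS_INODE_PREALLOC) &&
	    check_can_nocow(inode, pos, &write_bytes) > 0) {
		/* nocow write: only metadata needs to be reserved */
		only_release_metadata = true;
	} else {
		ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
		if (ret < 0)
			break;
	}
	ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);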

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |  2 +-
 fs/btrfs/file.c        | 22 +++++++++++++---------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 61366ca..2e3f19e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3352,7 +3352,7 @@ again:
 	num_pages *= 16;
 	num_pages *= PAGE_CACHE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, num_pages, num_pages);
+	ret = __btrfs_check_data_free_space(inode, 0, num_pages);
 	if (ret)
 		goto out_put;
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..c1eec4f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1510,12 +1510,17 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		}
 
 		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
-		ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
-		if (ret == -ENOSPC &&
-		    (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
-					      BTRFS_INODE_PREALLOC))) {
+
+		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					     BTRFS_INODE_PREALLOC)) {
 			ret = check_can_nocow(inode, pos, &write_bytes);
+			if (ret < 0)
+				break;
 			if (ret > 0) {
+				/*
+				 * For the nodatacow case, there is no need
+				 * to reserve data space.
+				 */
 				only_release_metadata = true;
 				/*
 				 * our prealloc extent may be smaller than
@@ -1524,15 +1529,14 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				num_pages = DIV_ROUND_UP(write_bytes + offset,
 							 PAGE_CACHE_SIZE);
 				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
-				ret = 0;
-			} else {
-				ret = -ENOSPC;
+				goto reserve_metadata;
 			}
 		}
-
-		if (ret)
+		ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+		if (ret < 0)
 			break;
 
+reserve_metadata:
 		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
 		if (ret) {
 			if (!only_release_metadata)
-- 
2.5.1



* [PATCH 15/19] btrfs: fallocate: Add support to accurate qgroup reserve
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (13 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 14/19] btrfs: Switch to new check_data_free_space Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 16/19] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space Qu Wenruo
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Now fallocate will do an accurate qgroup reserve space check, unlike the
old method, which always reserved the whole length of the range.

With this patch, fallocate will:
1) Iterate over the desired range and mark it in the data rsv map
   Only ranges which are going to be allocated are recorded in the data
   rsv map and have their space reserved.
   Already allocated ranges (normal/prealloc extents) are skipped.
   Also, record the marked ranges in a new list for later use, as
   sketched below.

2) If 1) succeeded, do the real file extent allocation.
   At file extent allocation time, the corresponding range is removed
   from the data rsv map.
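
In rough pseudo code, the new flow is (a simplified sketch of the loop
in the diff below):

	INIT_LIST_HEAD(&reserve_list);
	while (cur_offset < alloc_end) {
		/* for each extent map in [alloc_start, alloc_end) */
		if (range is a hole, or prealloc beyond i_size) {
			add_falloc_range(&reserve_list, cur_offset, len);
			btrfs_qgroup_reserve_data(inode, cur_offset, len);
		}
		cur_offset = last_byte;
	}

	/* only allocate once every needed range is reserved */
	list_for_each_entry_safe(range, tmp, &reserve_list, list)
		btrfs_prealloc_file_range(inode, mode, range->start,
					  range->len, ...);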

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/file.c | 147 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 107 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c1eec4f..26e59bc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2545,17 +2545,61 @@ out_only_mutex:
 	return err;
 }
 
+/* Helper structure to record which range is already reserved */
+struct falloc_range {
+	struct list_head list;
+	u64 start;
+	u64 len;
+};
+
+/*
+ * Helper function to add falloc range
+ *
+ * Caller should have locked the larger extent range containing
+ * [start, len)
+ */
+static int add_falloc_range(struct list_head *head, u64 start, u64 len)
+{
+	struct falloc_range *prev = NULL;
+	struct falloc_range *range = NULL;
+
+	if (list_empty(head))
+		goto insert;
+
+	/*
+	 * As fallocate iterates in bytenr order, we only need to check
+	 * the last range.
+	 */
+	prev = list_entry(head->prev, struct falloc_range, list);
+	if (prev->start + prev->len == start) {
+		prev->len += len;
+		return 0;
+	}
+insert:
+	range = kmalloc(sizeof(*range), GFP_NOFS);
+	if (!range)
+		return -ENOMEM;
+	range->start = start;
+	range->len = len;
+	list_add_tail(&range->list, head);
+	return 0;
+}
+
 static long btrfs_fallocate(struct file *file, int mode,
 			    loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
+	struct falloc_range *range;
+	struct falloc_range *tmp;
+	struct list_head reserve_list;
 	u64 cur_offset;
 	u64 last_byte;
 	u64 alloc_start;
 	u64 alloc_end;
 	u64 alloc_hint = 0;
 	u64 locked_end;
+	u64 actual_end = 0;
 	struct extent_map *em;
 	int blocksize = BTRFS_I(inode)->root->sectorsize;
 	int ret;
@@ -2571,10 +2615,11 @@ static long btrfs_fallocate(struct file *file, int mode,
 		return btrfs_punch_hole(inode, offset, len);
 
 	/*
-	 * Make sure we have enough space before we do the
-	 * allocation.
+	 * Only trigger disk allocation, don't trigger qgroup reserve
+	 *
+	 * For qgroup space, it will be checked later.
 	 */
-	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start, alloc_end - alloc_start);
+	ret = btrfs_alloc_data_chunk_ondemand(inode, alloc_end - alloc_start);
 	if (ret)
 		return ret;
 
@@ -2583,6 +2628,13 @@ static long btrfs_fallocate(struct file *file, int mode,
 	if (ret)
 		goto out;
 
+	/*
+	 * TODO: Move these two operations to after we have checked the
+	 * accurate reserved space, or fallocate can still fail yet leave
+	 * pages truncated or the size expanded.
+	 *
+	 * But that's a minor problem and won't do much harm anyway.
+	 */
 	if (alloc_start > inode->i_size) {
 		ret = btrfs_cont_expand(inode, i_size_read(inode),
 					alloc_start);
@@ -2641,10 +2693,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 		}
 	}
 
+	/* First, check if we exceed the qgroup limit */
+	INIT_LIST_HEAD(&reserve_list);
 	cur_offset = alloc_start;
 	while (1) {
-		u64 actual_end;
-
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
 		if (IS_ERR_OR_NULL(em)) {
@@ -2657,54 +2709,69 @@ static long btrfs_fallocate(struct file *file, int mode,
 		last_byte = min(extent_map_end(em), alloc_end);
 		actual_end = min_t(u64, extent_map_end(em), offset + len);
 		last_byte = ALIGN(last_byte, blocksize);
-
 		if (em->block_start == EXTENT_MAP_HOLE ||
 		    (cur_offset >= inode->i_size &&
 		     !test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
-			ret = btrfs_prealloc_file_range(inode, mode, cur_offset,
-							last_byte - cur_offset,
-							1 << inode->i_blkbits,
-							offset + len,
-							&alloc_hint);
-		} else if (actual_end > inode->i_size &&
-			   !(mode & FALLOC_FL_KEEP_SIZE)) {
-			struct btrfs_trans_handle *trans;
-			struct btrfs_root *root = BTRFS_I(inode)->root;
-
-			/*
-			 * We didn't need to allocate any more space, but we
-			 * still extended the size of the file so we need to
-			 * update i_size and the inode item.
-			 */
-			trans = btrfs_start_transaction(root, 1);
-			if (IS_ERR(trans)) {
-				ret = PTR_ERR(trans);
-			} else {
-				inode->i_ctime = CURRENT_TIME;
-				i_size_write(inode, actual_end);
-				btrfs_ordered_update_i_size(inode, actual_end,
-							    NULL);
-				ret = btrfs_update_inode(trans, root, inode);
-				if (ret)
-					btrfs_end_transaction(trans, root);
-				else
-					ret = btrfs_end_transaction(trans,
-								    root);
+			ret = add_falloc_range(&reserve_list, cur_offset,
+					       last_byte - cur_offset);
+			if (ret < 0) {
+				free_extent_map(em);
+				goto out;
 			}
+			ret = btrfs_qgroup_reserve_data(inode, cur_offset,
+					last_byte - cur_offset);
 		}
 		free_extent_map(em);
-		if (ret < 0)
-			break;
-
 		cur_offset = last_byte;
-		if (cur_offset >= alloc_end) {
-			ret = 0;
+		if (cur_offset >= alloc_end)
 			break;
+	}
+	if (ret < 0)
+		goto out;
+
+	/* Now we are sure the qgroup has reserved enough space */
+	list_for_each_entry_safe(range, tmp, &reserve_list, list) {
+		ret = btrfs_prealloc_file_range(inode, mode, range->start,
+				range->len, 1 << inode->i_blkbits,
+				offset + len, &alloc_hint);
+		if (ret < 0)
+			goto out;
+	}
+	if (actual_end > inode->i_size &&
+	    !(mode & FALLOC_FL_KEEP_SIZE)) {
+		struct btrfs_trans_handle *trans;
+		struct btrfs_root *root = BTRFS_I(inode)->root;
+
+		/*
+		 * We didn't need to allocate any more space, but we
+		 * still extended the size of the file so we need to
+		 * update i_size and the inode item.
+		 */
+		trans = btrfs_start_transaction(root, 1);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+		} else {
+			inode->i_ctime = CURRENT_TIME;
+			i_size_write(inode, actual_end);
+			btrfs_ordered_update_i_size(inode, actual_end, NULL);
+			ret = btrfs_update_inode(trans, root, inode);
+			if (ret)
+				btrfs_end_transaction(trans, root);
+			else
+				ret = btrfs_end_transaction(trans, root);
 		}
 	}
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_NOFS);
 out:
+	/*
+	 * As we have waited on the extent range, the data_rsv_map must be
+	 * empty in the range, as any written data range will have been
+	 * released from it.
+	 * And for a preallocated extent, it will also be released when
+	 * its metadata is written.
+	 * So this is purely a cleanup.
+	 */
+	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	mutex_unlock(&inode->i_mutex);
 	/* Let go of our reservation. */
 	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start);
-- 
2.5.1



* [PATCH 16/19] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (14 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 15/19] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 17/19] btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function Qu Wenruo
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Add a new version of the btrfs_delalloc_reserve_space() function, which
supports accurate qgroup reserve.
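
A sketch of how callers are expected to use it (the actual conversions
happen in the next patch; page_start below is simply the page-aligned
file offset being written):

	ret = __btrfs_delalloc_reserve_space(inode, page_start,
					     PAGE_CACHE_SIZE);
	if (ret < 0)
		return ret;	/* -ENOSPC or a quota error */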

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  1 +
 fs/btrfs/extent-tree.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c1a0aaf..12f14fd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3472,6 +3472,7 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2e3f19e..07f45b7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5686,6 +5686,44 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 }
 
 /**
+ * __btrfs_delalloc_reserve_space - reserve data and metadata space for
+ * delalloc
+ * @inode: inode we're writing to
+ * @start: start range we are writing to
+ * @len: length of the range we are writing to
+ *
+ * TODO: This function will eventually replace the old
+ * btrfs_delalloc_reserve_space()
+ *
+ * This will do the following things
+ *
+ * o reserve space in the data space info for len bytes
+ *   and reserve the corresponding precise qgroup space
+ *   (done in check_data_free_space)
+ *
+ * o reserve space in the metadata space info, based on the number of
+ *   outstanding extents and how much csums will be needed;
+ *   also reserve metadata space in a per-root over-reserve manner.
+ * o add to the inodes->delalloc_bytes
+ * o add it to the fs_info's delalloc inodes list.
+ *   (Above 3 all done in delalloc_reserve_metadata)
+ *
+ * Return 0 for success
+ * Return <0 for error (-ENOSPC or -EDQUOT)
+ */
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+{
+	int ret;
+
+	ret = __btrfs_check_data_free_space(inode, start, len);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_delalloc_reserve_metadata(inode, len);
+	if (ret < 0)
+		btrfs_free_reserved_data_space(inode, len);
+	return ret;
+}
+
+/**
  * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
  * @inode: inode we're writing to
  * @num_bytes: the number of bytes we want to allocate
-- 
2.5.1



* [PATCH 17/19] btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (15 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 16/19] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 18/19] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
  2015-09-08  8:37 ` [PATCH 19/19] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Use the new __btrfs_delalloc_reserve_space() to reserve space.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/inode-map.c |  2 +-
 fs/btrfs/inode.c     | 16 ++++++++++------
 fs/btrfs/ioctl.c     |  5 +++--
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index d4a582a..ab639d3 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,7 +488,7 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_CACHE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, prealloc);
+	ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
 	if (ret)
 		goto out_put;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1f7cac0..d70cb26 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1985,7 +1985,8 @@ again:
 		goto again;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	ret = __btrfs_delalloc_reserve_space(inode, page_start,
+					     PAGE_CACHE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
@@ -4581,7 +4582,8 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	ret = __btrfs_delalloc_reserve_space(inode,
+			round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
 	if (ret)
 		goto out;
 
@@ -8373,7 +8375,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			mutex_unlock(&inode->i_mutex);
 			relock = true;
 		}
-		ret = btrfs_delalloc_reserve_space(inode, count);
+		ret = __btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
 			goto out;
 		outstanding_extents = div64_u64(count +
@@ -8620,7 +8622,11 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	u64 page_end;
 
 	sb_start_pagefault(inode->i_sb);
-	ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	page_start = page_offset(page);
+	page_end = page_start + PAGE_CACHE_SIZE - 1;
+
+	ret = __btrfs_delalloc_reserve_space(inode, page_start,
+					     PAGE_CACHE_SIZE);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
@@ -8639,8 +8645,6 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 again:
 	lock_page(page);
 	size = i_size_read(inode);
-	page_start = page_offset(page);
-	page_end = page_start + PAGE_CACHE_SIZE - 1;
 
 	if ((page->mapping != inode->i_mapping) ||
 	    (page_start >= size)) {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..e0291fc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,8 +1119,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = btrfs_delalloc_reserve_space(inode,
-					   page_cnt << PAGE_CACHE_SHIFT);
+	ret = __btrfs_delalloc_reserve_space(inode,
+			start_index << PAGE_CACHE_SHIFT,
+			page_cnt << PAGE_CACHE_SHIFT);
 	if (ret)
 		return ret;
 	i_done = 0;
-- 
2.5.1



* [PATCH 18/19] btrfs: qgroup: Cleanup old inaccurate facilities
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (16 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 17/19] btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  2015-09-08  8:37 ` [PATCH 19/19] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

Clean up the old facilities which use the old btrfs_qgroup_reserve()
function call, replace them with the newer version, and remove the "__"
prefix from them.

Also, make the btrfs_qgroup_reserve/free() functions private, as they
are now only used inside the qgroup code.

Now the whole btrfs qgroup is switched to use the new reserve facilities.
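
For reference, the resulting renames are (old name -> new name):

	__btrfs_check_data_free_space()  -> btrfs_check_data_free_space()
	__btrfs_delalloc_reserve_space() -> btrfs_delalloc_reserve_space()
	btrfs_qgroup_reserve()           -> qgroup_reserve()  (now static)
	btrfs_qgroup_free()              -> qgroup_free()     (now static)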

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  6 ++----
 fs/btrfs/extent-tree.c | 56 ++++----------------------------------------------
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode-map.c   |  2 +-
 fs/btrfs/inode.c       | 12 +++++------
 fs/btrfs/ioctl.c       |  2 +-
 fs/btrfs/qgroup.c      | 19 ++++++++++-------
 fs/btrfs/qgroup.h      |  7 -------
 8 files changed, 27 insertions(+), 79 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 12f14fd..8489419 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3452,8 +3452,7 @@ enum btrfs_reserve_flush_enum {
 	BTRFS_RESERVE_FLUSH_ALL,
 };
 
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
 int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -3471,8 +3470,7 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 				      u64 qgroup_reserved);
 int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 07f45b7..ab1b1a1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3352,7 +3352,7 @@ again:
 	num_pages *= 16;
 	num_pages *= PAGE_CACHE_SIZE;
 
-	ret = __btrfs_check_data_free_space(inode, 0, num_pages);
+	ret = btrfs_check_data_free_space(inode, 0, num_pages);
 	if (ret)
 		goto out_put;
 
@@ -4037,27 +4037,11 @@ commit_trans:
 }
 
 /*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
-{
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	int ret;
-
-	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
-	if (ret < 0)
-		return ret;
-	ret = btrfs_qgroup_reserve(root, write_bytes);
-	return ret;
-}
-
-/*
  * New check_data_free_space() with the ability for precise data reservation
  * Will replace the old btrfs_check_data_free_space(), but for an easier
  * patch split, add the new function first and then replace the old one.
  */
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	int ret;
@@ -5710,11 +5694,11 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  * Return 0 for success
  * Return <0 for error (-ENOSPC or -EDQUOT)
  */
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
 {
 	int ret;
 
-	ret = __btrfs_check_data_free_space(inode, start, len);
+	ret = btrfs_check_data_free_space(inode, start, len);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(inode, len);
@@ -5724,38 +5708,6 @@ int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
 }
 
 /**
- * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
- * @inode: inode we're writing to
- * @num_bytes: the number of bytes we want to allocate
- *
- * This will do the following things
- *
- * o reserve space in the data space info for num_bytes
- * o reserve space in the metadata space info based on number of outstanding
- *   extents and how much csums will be needed
- * o add to the inodes ->delalloc_bytes
- * o add it to the fs_info's delalloc inodes list.
- *
- * This will return 0 for success and -ENOSPC if there is no space left.
- */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
-{
-	int ret;
-
-	ret = btrfs_check_data_free_space(inode, num_bytes, num_bytes);
-	if (ret)
-		return ret;
-
-	ret = btrfs_delalloc_reserve_metadata(inode, num_bytes);
-	if (ret) {
-		btrfs_free_reserved_data_space(inode, num_bytes);
-		return ret;
-	}
-
-	return 0;
-}
-
-/**
  * btrfs_delalloc_release_space - release data and metadata space for delalloc
  * @inode: inode we're releasing space for
  * @num_bytes: the number of bytes we want to free up
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 26e59bc..124c1d4 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1532,7 +1532,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				goto reserve_metadata;
 			}
 		}
-		ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
 		if (ret < 0)
 			break;
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index ab639d3..9a0ad27 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,7 +488,7 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_CACHE_SIZE;
 
-	ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
 	if (ret)
 		goto out_put;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d70cb26..8c09197 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1985,8 +1985,8 @@ again:
 		goto again;
 	}
 
-	ret = __btrfs_delalloc_reserve_space(inode, page_start,
-					     PAGE_CACHE_SIZE);
+	ret = btrfs_delalloc_reserve_space(inode, page_start,
+					   PAGE_CACHE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
@@ -4582,7 +4582,7 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
-	ret = __btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode,
 			round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
 	if (ret)
 		goto out;
@@ -8375,7 +8375,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			mutex_unlock(&inode->i_mutex);
 			relock = true;
 		}
-		ret = __btrfs_delalloc_reserve_space(inode, offset, count);
+		ret = btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
 			goto out;
 		outstanding_extents = div64_u64(count +
@@ -8625,8 +8625,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	page_start = page_offset(page);
 	page_end = page_start + PAGE_CACHE_SIZE - 1;
 
-	ret = __btrfs_delalloc_reserve_space(inode, page_start,
-					     PAGE_CACHE_SIZE);
+	ret = btrfs_delalloc_reserve_space(inode, page_start,
+					   PAGE_CACHE_SIZE);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e0291fc..c057317 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,7 +1119,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = __btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode,
 			start_index << PAGE_CACHE_SHIFT,
 			page_cnt << PAGE_CACHE_SHIFT);
 	if (ret)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b759e96..5f7d7f5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2088,7 +2088,7 @@ out:
 	return ret;
 }
 
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
+static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 {
 	struct btrfs_root *quota_root;
 	struct btrfs_qgroup *qgroup;
@@ -2221,6 +2221,11 @@ out:
 	spin_unlock(&fs_info->qgroup_lock);
 }
 
+static inline void qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+					 num_bytes);
+}
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
 {
 	if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
@@ -2782,7 +2787,7 @@ static int reserve_data_range(struct btrfs_root *root,
 		cur_start = next_start;
 	}
 insert:
-	ret = btrfs_qgroup_reserve(root, reserve);
+	ret = qgroup_reserve(root, reserve);
 	if (ret < 0)
 		return ret;
 	/* ranges must be inserted after we are sure it has enough space */
@@ -3001,7 +3006,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 	if (ret == 0)
 		kfree(tmp);
 	if (free_reserved)
-		btrfs_qgroup_free(BTRFS_I(inode)->root, reserved);
+		qgroup_free(BTRFS_I(inode)->root, reserved);
 	spin_unlock(&map->lock);
 	return 0;
 }
@@ -3089,7 +3094,7 @@ void btrfs_qgroup_free_data_rsv_map(struct inode *inode)
 	/* insanity check */
 	WARN_ON(!root->fs_info->quota_enabled || !is_fstree(root->objectid));
 
-	btrfs_qgroup_free(root, dirty_map->reserved);
+	qgroup_free(root, dirty_map->reserved);
 	spin_lock(&dirty_map->lock);
 	while ((node = rb_first(&dirty_map->root)) != NULL) {
 		struct data_rsv_range *range;
@@ -3112,7 +3117,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
 		return 0;
 
 	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
-	ret = btrfs_qgroup_reserve(root, num_bytes);
+	ret = qgroup_reserve(root, num_bytes);
 	if (ret < 0)
 		return ret;
 	atomic_add(num_bytes, &root->qgroup_meta_rsv);
@@ -3129,7 +3134,7 @@ void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
 	reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
 	if (reserved == 0)
 		return;
-	btrfs_qgroup_free(root, reserved);
+	qgroup_free(root, reserved);
 }
 
 void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
@@ -3140,5 +3145,5 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
 	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
 	WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
 	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
-	btrfs_qgroup_free(root, num_bytes);
+	qgroup_free(root, num_bytes);
 }
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 2d507c8..da849db 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -74,15 +74,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 			       u64 ref_root, u64 num_bytes);
-static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
-{
-	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
-					 num_bytes);
-}
-
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-- 
2.5.1



* [PATCH 19/19] btrfs: qgroup: Add handler for NOCOW and inline
  2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
                   ` (17 preceding siblings ...)
  2015-09-08  8:37 ` [PATCH 18/19] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
@ 2015-09-08  8:37 ` Qu Wenruo
  18 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  8:37 UTC (permalink / raw)
  To: linux-btrfs

For the NOCOW and inline cases, no delayed_ref will be created for
them, so we should free their reserved data space at the proper time
(finish_ordered_io for NOCOW and cow_file_range_inline for inline).
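
In sketch form, the two release points added by this patch are:

	/* cow_file_range_inline(): an inline extent never becomes a data
	 * extent, and the reservation is always page aligned, so free
	 * exactly one page */
	btrfs_qgroup_free_data(inode, 0, PAGE_CACHE_SIZE);

	/* btrfs_finish_ordered_io(), NOCOW branch: no delayed ref will
	 * release the range for us, so free it here */
	btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
			       ordered_extent->len);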

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |  7 ++++++-
 fs/btrfs/inode.c       | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ab1b1a1..ca15bd3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4055,7 +4055,12 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 	if (ret < 0)
 		return ret;
 
-	/* Use new btrfs_qgroup_reserve_data to reserve precise data space */
+	/*
+	 * Use new btrfs_qgroup_reserve_data to reserve precise data space
+	 *
+	 * TODO: Find a good way to avoid reserving data space for a NOCOW
+	 * range, without impacting performance when quota is disabled.
+	 */
 	ret = btrfs_qgroup_reserve_data(inode, start, len);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8c09197..9b783e6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -310,6 +310,13 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 	btrfs_delalloc_release_metadata(inode, end + 1 - start);
 	btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
 out:
+	/*
+	 * Don't forget to free the reserved space; an inlined extent
+	 * won't be counted as a data extent, so free it directly here.
+	 * And at reserve time, the space is always aligned to page size,
+	 * so just free one page here.
+	 */
+	btrfs_qgroup_free_data(inode, 0, PAGE_CACHE_SIZE);
 	btrfs_free_path(path);
 	btrfs_end_transaction(trans, root);
 	return ret;
@@ -2831,6 +2838,14 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 
 	if (test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags)) {
 		BUG_ON(!list_empty(&ordered_extent->list)); /* Logic error */
+
+		/*
+		 * For the mwrite (mmap + memset to write) case, we still
+		 * reserve space for the NOCOW range.
+		 * As NOCOW won't cause a new delayed ref, just free the
+		 * space here.
+		 */
+		btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
+				       ordered_extent->len);
 		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
 		if (nolock)
 			trans = btrfs_join_transaction_nolock(root);
-- 
2.5.1



* Re: [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space
  2015-09-08  9:22 ` [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space Qu Wenruo
@ 2015-09-09  1:35   ` Tsutomu Itoh
  0 siblings, 0 replies; 22+ messages in thread
From: Tsutomu Itoh @ 2015-09-09  1:35 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hi, Qu,

On 2015/09/08 18:22, Qu Wenruo wrote:
> Add new function __btrfs_check_data_free_space() to do precious space
> reservation.
> 
> The new function will replace old btrfs_check_data_free_space(), but
> until all the change is done, let's just use the new name.
> 
> Also, export internal use function btrfs_alloc_data_chunk_ondemand(), as
> now qgroup reserve requires precious bytes, which can only be got in
> later loop(like fallocate).
> But data space info check and data chunk allocate doesn't need to be
> that accurate, and can be called at the beginning.
> 
> So export it for later operations.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   fs/btrfs/ctree.h       |  2 ++
>   fs/btrfs/extent-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
>   2 files changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ae86025..c1a0aaf 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3453,6 +3453,8 @@ enum btrfs_reserve_flush_enum {
>   };
>   
>   int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
> +int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
> +int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
>   void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
>   void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
>   				struct btrfs_root *root);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 402415c..61366ca 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3907,11 +3907,7 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
>   	return ret;
>   }
>   
> -/*
> - * This will check the space that the inode allocates from to make sure we have
> - * enough space for bytes.
> - */
> -int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
> +int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
>   {
>   	struct btrfs_space_info *data_sinfo;
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
> @@ -4032,19 +4028,55 @@ commit_trans:
>   					      data_sinfo->flags, bytes, 1);
>   		return -ENOSPC;
>   	}
> -	ret = btrfs_qgroup_reserve(root, write_bytes);
> -	if (ret)
> -		goto out;
>   	data_sinfo->bytes_may_use += bytes;
>   	trace_btrfs_space_reservation(root->fs_info, "space_info",
>   				      data_sinfo->flags, bytes, 1);
> -out:
>   	spin_unlock(&data_sinfo->lock);
>   
>   	return ret;
>   }
>   
>   /*
> + * This will check the space that the inode allocates from to make sure we have
> + * enough space for bytes.
> + */
> +int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
> +{
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	int ret;
> +
> +	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
> +	if (ret < 0)
> +		return ret;
> +	ret = btrfs_qgroup_reserve(root, write_bytes);
> +	return ret;
> +}
> +
> +/*

> + * New check_data_free_space() with ability for precious data reserveation

                                                                 reservation

Thanks,
Tsutomu

> + * Will replace old btrfs_check_data_free_space(), but for patch split,
> + * add a new function first and then replace it.
> + */
> +int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
> +{
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	int ret;
> +
> +	/* align the range */
> +	len = round_up(start + len, root->sectorsize) -
> +	      round_down(start, root->sectorsize);
> +	start = round_down(start, root->sectorsize);
> +
> +	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* Use new btrfs_qgroup_reserve_data to reserve precious data space */
> +	ret = btrfs_qgroup_reserve_data(inode, start, len);
> +	return ret;
> +}
> +
> +/*
>    * Called if we need to clear a data reservation for this inode.
>    */
>   void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
> 




* [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space
  2015-09-08  8:56 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
@ 2015-09-08  9:22 ` Qu Wenruo
  2015-09-09  1:35   ` Tsutomu Itoh
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2015-09-08  9:22 UTC (permalink / raw)
  To: linux-btrfs

Add new function __btrfs_check_data_free_space() to do precious space
reservation.

The new function will replace old btrfs_check_data_free_space(), but
until all the change is done, let's just use the new name.

Also, export internal use function btrfs_alloc_data_chunk_ondemand(), as
now qgroup reserve requires precious bytes, which can only be got in
later loop(like fallocate).
But data space info check and data chunk allocate doesn't need to be
that accurate, and can be called at the beginning.

So export it for later operations.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ae86025..c1a0aaf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3453,6 +3453,8 @@ enum btrfs_reserve_flush_enum {
 };
 
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 402415c..61366ca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3907,11 +3907,7 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 	return ret;
 }
 
-/*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 {
 	struct btrfs_space_info *data_sinfo;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -4032,19 +4028,55 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
-	ret = btrfs_qgroup_reserve(root, write_bytes);
-	if (ret)
-		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
-out:
 	spin_unlock(&data_sinfo->lock);
 
 	return ret;
 }
 
 /*
+ * This will check the space that the inode allocates from to make sure we have
+ * enough space for bytes.
+ */
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_qgroup_reserve(root, write_bytes);
+	return ret;
+}
+
+/*
+ * New check_data_free_space() with ability for precious data reserveation
+ * Will replace old btrfs_check_data_free_space(), but for patch split,
+ * add a new function first and then replace it.
+ */
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	/* align the range */
+	len = round_up(start + len, root->sectorsize) -
+	      round_down(start, root->sectorsize);
+	start = round_down(start, root->sectorsize);
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+	if (ret < 0)
+		return ret;
+
+	/* Use new btrfs_qgroup_reserve_data to reserve precious data space */
+	ret = btrfs_qgroup_reserve_data(inode, start, len);
+	return ret;
+}
+
+/*
  * Called if we need to clear a data reservation for this inode.
  */
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
-- 
2.5.1



Thread overview: 22+ messages
2015-09-08  8:37 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
2015-09-08  8:37 ` [PATCH 01/19] btrfs: qgroup: New function declaration for new reserve implement Qu Wenruo
2015-09-08  8:37 ` [PATCH 02/19] btrfs: qgroup: Implement data_rsv_map init/free functions Qu Wenruo
2015-09-08  8:37 ` [PATCH 03/19] btrfs: qgroup: Introduce new function to search most left reserve range Qu Wenruo
2015-09-08  8:37 ` [PATCH 04/19] btrfs: qgroup: Introduce function to insert non-overlap " Qu Wenruo
2015-09-08  8:37 ` [PATCH 05/19] btrfs: qgroup: Introduce function to reserve data range per inode Qu Wenruo
2015-09-08  8:37 ` [PATCH 06/19] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
2015-09-08  8:37 ` [PATCH 07/19] btrfs: qgroup: Introduce function to release reserved range Qu Wenruo
2015-09-08  8:37 ` [PATCH 08/19] btrfs: qgroup: Introduce function to release/free reserved data range Qu Wenruo
2015-09-08  8:37 ` [PATCH 09/19] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
2015-09-08  8:37 ` [PATCH 10/19] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
2015-09-08  8:37 ` [PATCH 11/19] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
2015-09-08  8:37 ` [PATCH 12/19] btrfs: qgroup: Use new metadata reservation Qu Wenruo
2015-09-08  8:37 ` [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space Qu Wenruo
2015-09-08  8:37 ` [PATCH 14/19] btrfs: Switch to new check_data_free_space Qu Wenruo
2015-09-08  8:37 ` [PATCH 15/19] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
2015-09-08  8:37 ` [PATCH 16/19] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve_space Qu Wenruo
2015-09-08  8:37 ` [PATCH 17/19] btrfs: extent-tree: Use new __btrfs_delalloc_reserve_space function Qu Wenruo
2015-09-08  8:37 ` [PATCH 18/19] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
2015-09-08  8:37 ` [PATCH 19/19] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
2015-09-08  8:56 [PATCH RFC 00/14] Accurate qgroup reserve framework Qu Wenruo
2015-09-08  9:22 ` [PATCH 13/19] btrfs: extent-tree: Add new version of btrfs_check_data_free_space Qu Wenruo
2015-09-09  1:35   ` Tsutomu Itoh
