* [PATCH v3 00/21] Rework btrfs qgroup reserved space framework
@ 2015-10-13  2:20 Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recording set/clear bits Qu Wenruo
                   ` (20 more replies)
  0 siblings, 21 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

In the previous rework of qgroup, we succeeded in fixing the qgroup
accounting part, making the rfer/excl numbers accurate.

But that is only part of the qgroup work. Another part, the qgroup
reserved space handling, still has quite a lot of problems, and can lead
to EDQUOT even when we are far below the limit.

[[BUG]]
The easiest way to trigger the bug:
1) Enable quota
2) Limit excl of qgroup 5 to 16M
3) Write [0,2M) of a file inside subvol 5 ten times, without sync

EDQUOT is triggered at about the 8th write, but after a remount we can
still write until about 15M.
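The reproducer's arithmetic can be sketched with a tiny userspace model
(purely illustrative; `first_failing_write` and its constants are
hypothetical, not btrfs code): each buffered write of the same 2M range
charges the qgroup again, so the 16M excl limit is exhausted after a
handful of writes.

```c
#include <assert.h>

#define MIB (1024ULL * 1024)

/*
 * Model of the buggy behavior: every buffered write charges its full
 * length against the qgroup excl limit, even when the same range was
 * already reserved, and nothing is freed until commit.
 *
 * Returns the 1-based index of the first write that would hit EDQUOT,
 * or 0 if all writes fit.
 */
static int first_failing_write(unsigned long long limit,
			       unsigned long long write_len, int writes)
{
	unsigned long long reserved = 0;
	int i;

	for (i = 1; i <= writes; i++) {
		if (reserved + write_len > limit)
			return i;	/* would exceed excl -> EDQUOT */
		reserved += write_len;	/* leak: same range charged again */
	}
	return 0;
}
```

In this simplified model the hard failure comes at the 9th 2M write
(8 * 2M fills the 16M limit exactly); on a real filesystem metadata
reservations push the failure to roughly the 8th write, as described
above.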

[[CAUSE]]
The problem is that qgroup reserves space even when the data space is
already reserved.

In the above reproducer, each buffered write of [0,2M) makes qgroup
reserve another 2M, but in fact the 1st write already reserved 2M, and
from then on we don't need to reserve any data space, as we are only
rewriting [0,2M).

Also, the reserved space is only freed *ONCE*, when its backref is run
at commit_transaction() time.

That causes the reserved space to leak.

[[FIX]]
Reuse the existing io_tree facilities to record which ranges are already
reserved for qgroup.

Although the qgroup reserved space behavior is quite similar to the
existing DELALLOC flag, fallocate doesn't go through DELALLOC, so we
introduce a new extent flag, EXTENT_QGROUP_RESERVED, for our own
purpose, without interfering with any existing flag.

The new API itself is quite safe: by the nature of the design, a
careless caller reserving or freeing a range twice or more causes no
problem.
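The idempotence claim can be illustrated with a minimal standalone model
(hypothetical code, not the btrfs implementation; the io_tree's
EXTENT_QGROUP_RESERVED bit is modeled as a per-block flag array): only
blocks that flip from unreserved to reserved are charged, so repeating a
reservation of the same range is a no-op.

```c
#include <assert.h>

/*
 * Minimal model of the fix: remember which blocks of a file range are
 * already reserved (the role EXTENT_QGROUP_RESERVED plays in the
 * io_tree) and charge the qgroup only for newly marked blocks.
 */
#define NBLOCKS 16

struct resv_model {
	unsigned char reserved[NBLOCKS];	/* 1 = already reserved */
	unsigned long long charged;		/* total charged to qgroup */
};

/* Reserve [start, start + len); returns how many blocks were newly charged */
static int model_reserve(struct resv_model *m, int start, int len)
{
	int i, newly = 0;

	for (i = start; i < start + len; i++) {
		if (!m->reserved[i]) {
			m->reserved[i] = 1;
			newly++;
		}
	}
	m->charged += newly;
	return newly;
}
```

Reserving the same range ten times charges it once; that is why a
caller reserving or freeing a range twice is harmless by design.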

[[PATCH STRUCTURE]]
As the patchset is a little huge, it can be split into different parts:
1) Accurate reserve space framework API (Patch 1 ~ 8)
   Use the io_tree to implement the needed data reserve API,
   and slightly change the metadata reserve API.

2) Apply needed hooks to related callers (Patch 9 ~ 18)
   The following functions need to be converted to the new qgroup
   reserve API:
   btrfs_check_data_free_space()
   btrfs_free_reserved_data_space()
   btrfs_delalloc_reserve_space()
   btrfs_delalloc_release_space()

   And the following function needs to change its behavior for accurate
   qgroup reserved space:
   btrfs_fallocate()

   Also add ftrace support for new APIs in patch 17.

3) Minor enhancements and fixes (Patch 19 ~ 21)
   Avoid unneeded page truncation (Patch 19).
   Fix a deadlock caused by locking the io_tree while the io_tree lock
   is already held in set_bit_hook() (Patch 20).
   And finally, make qgroup reserved space leaks much more obvious for
   further debugging (Patch 21).

[[Changelog]]
v2:
  Add new handlers to avoid reserved space leaks for a buffered write
  followed by a truncate:
    btrfs_invalidatepage()
    evict_inode_truncate_pages()
  Add new handlers to avoid reserved space leaks in error handling
  routines:
    btrfs_free_reserved_data_space()
    btrfs_delalloc_release_space()

v3:
  Use the io_tree to implement the data reserve map, which hugely
  reduces the patchset size, from a net insertion of 1300+ lines to
  600+ lines.
  Suggested-by: Josef Bacik <jbacik@fb.com>

Qu Wenruo (21):
  btrfs: extent_io: Introduce needed structure for recording set/clear
    bits
  btrfs: extent_io: Introduce new function set_record_extent_bits
  btrfs: extent_io: Introduce new function clear_record_extent_bits()
  btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
  btrfs: qgroup: Introduce functions to release/free qgroup reserve data
        space
  btrfs: delayed_ref: Add new function to record reserved space into
    delayed ref
  btrfs: delayed_ref: release and free qgroup reserved at proper timing
  btrfs: qgroup: Introduce new functions to reserve/free metadata
  btrfs: qgroup: Use new metadata reservation.
  btrfs: extent-tree: Add new version of btrfs_check_data_free_space and
    btrfs_free_reserved_data_space.
  btrfs: extent-tree: Switch to new check_data_free_space and
    free_reserved_data_space
  btrfs: extent-tree: Add new version of
    btrfs_delalloc_reserve/release_space
  btrfs: extent-tree: Switch to new delalloc space reserve and release
  btrfs: qgroup: Cleanup old inaccurate facilities
  btrfs: qgroup: Add handler for NOCOW and inline
  btrfs: Add handler for invalidate page
  btrfs: qgroup: Add new trace point for qgroup data reserve
  btrfs: fallocate: Add support to accurate qgroup reserve
  btrfs: Avoid truncating trailing page if fallocate range doesn't
    exceed inode size
  btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in
    clear_bit_hook
  btrfs: qgroup: Check if qgroup reserved space leaked

 fs/btrfs/ctree.h             |  14 ++-
 fs/btrfs/delayed-ref.c       |  29 +++++++
 fs/btrfs/delayed-ref.h       |  14 +++
 fs/btrfs/disk-io.c           |   1 +
 fs/btrfs/extent-tree.c       | 149 ++++++++++++++++++++++----------
 fs/btrfs/extent_io.c         | 121 +++++++++++++++++++-------
 fs/btrfs/extent_io.h         |  19 +++++
 fs/btrfs/file.c              | 193 +++++++++++++++++++++++++++++------------
 fs/btrfs/inode-map.c         |   6 +-
 fs/btrfs/inode.c             |  86 ++++++++++++++++---
 fs/btrfs/ioctl.c             |  10 ++-
 fs/btrfs/qgroup.c            | 199 ++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/qgroup.h            |  31 ++++++-
 fs/btrfs/relocation.c        |   8 +-
 fs/btrfs/transaction.c       |  34 ++------
 fs/btrfs/transaction.h       |   1 -
 include/trace/events/btrfs.h | 113 ++++++++++++++++++++++++
 17 files changed, 832 insertions(+), 196 deletions(-)

-- 
2.6.1



* [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recording set/clear bits
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 02/21] btrfs: extent_io: Introduce new function set_record_extent_bits Qu Wenruo
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Add a new structure, extent_changeset, to record how many bytes are
changed in one set/clear_extent_bits() operation, together with the
detailed changed ranges.

This provides the facilities needed by the later qgroup reserve
framework.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v3:
  Newly introduced, to reuse existing extent_io facilities.
---
 fs/btrfs/extent_io.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c668f36..3107a6e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -2,6 +2,7 @@
 #define __EXTENTIO__
 
 #include <linux/rbtree.h>
+#include "ulist.h"
 
 /* bits for the extent state */
 #define EXTENT_DIRTY		(1U << 0)
@@ -161,6 +162,17 @@ struct extent_buffer {
 #endif
 };
 
+/*
+ * Structure to record how many bytes and which ranges are set/cleared
+ */
+struct extent_changeset {
+	/* How many bytes are set/cleared in this operation */
+	u64 bytes_changed;
+
+	/* Changed ranges */
+	struct ulist *range_changed;
+};
+
 static inline void extent_set_compress_type(unsigned long *bio_flags,
 					    int compress_type)
 {
-- 
2.6.1



* [PATCH v3 02/21] btrfs: extent_io: Introduce new function set_record_extent_bits
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recording set/clear bits Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 03/21] btrfs: extent_io: Introduce new function clear_record_extent_bits() Qu Wenruo
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, set_record_extent_bits(), which will not only
set the given bits, but also record how many bytes are changed, with
detailed range info.

This is quite important for the later qgroup reserve framework.
The number of bytes will be used for the qgroup reservation, and the
detailed range info will be used for cleanup in the EDQUOT case.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v3:
  Newly introduced
---
 fs/btrfs/extent_io.c | 71 +++++++++++++++++++++++++++++++++++++++-------------
 fs/btrfs/extent_io.h |  3 +++
 2 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 363726b..f5efaa6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -131,6 +131,23 @@ struct extent_page_data {
 	unsigned int sync_io:1;
 };
 
+static void add_extent_changeset(struct extent_state *state, unsigned bits,
+				 struct extent_changeset *changeset,
+				 int set)
+{
+	int ret;
+
+	if (!changeset)
+		return;
+	if (set && (state->state & bits) == bits)
+		return;
+	changeset->bytes_changed += state->end - state->start + 1;
+	ret = ulist_add(changeset->range_changed, state->start, state->end,
+			GFP_ATOMIC);
+	/* ENOMEM */
+	BUG_ON(ret < 0);
+}
+
 static noinline void flush_write_bio(void *data);
 static inline struct btrfs_fs_info *
 tree_fs_info(struct extent_io_tree *tree)
@@ -410,7 +427,8 @@ static void clear_state_cb(struct extent_io_tree *tree,
 }
 
 static void set_state_bits(struct extent_io_tree *tree,
-			   struct extent_state *state, unsigned *bits);
+			   struct extent_state *state, unsigned *bits,
+			   struct extent_changeset *changeset);
 
 /*
  * insert an extent_state struct into the tree.  'bits' are set on the
@@ -426,7 +444,7 @@ static int insert_state(struct extent_io_tree *tree,
 			struct extent_state *state, u64 start, u64 end,
 			struct rb_node ***p,
 			struct rb_node **parent,
-			unsigned *bits)
+			unsigned *bits, struct extent_changeset *changeset)
 {
 	struct rb_node *node;
 
@@ -436,7 +454,7 @@ static int insert_state(struct extent_io_tree *tree,
 	state->start = start;
 	state->end = end;
 
-	set_state_bits(tree, state, bits);
+	set_state_bits(tree, state, bits, changeset);
 
 	node = tree_insert(&tree->state, NULL, end, &state->rb_node, p, parent);
 	if (node) {
@@ -789,7 +807,7 @@ out:
 
 static void set_state_bits(struct extent_io_tree *tree,
 			   struct extent_state *state,
-			   unsigned *bits)
+			   unsigned *bits, struct extent_changeset *changeset)
 {
 	unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
 
@@ -798,6 +816,7 @@ static void set_state_bits(struct extent_io_tree *tree,
 		u64 range = state->end - state->start + 1;
 		tree->dirty_bytes += range;
 	}
+	add_extent_changeset(state, bits_to_set, changeset, 1);
 	state->state |= bits_to_set;
 }
 
@@ -835,7 +854,7 @@ static int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 unsigned bits, unsigned exclusive_bits,
 		 u64 *failed_start, struct extent_state **cached_state,
-		 gfp_t mask)
+		 gfp_t mask, struct extent_changeset *changeset)
 {
 	struct extent_state *state;
 	struct extent_state *prealloc = NULL;
@@ -873,7 +892,7 @@ again:
 		prealloc = alloc_extent_state_atomic(prealloc);
 		BUG_ON(!prealloc);
 		err = insert_state(tree, prealloc, start, end,
-				   &p, &parent, &bits);
+				   &p, &parent, &bits, changeset);
 		if (err)
 			extent_io_tree_panic(tree, err);
 
@@ -899,7 +918,7 @@ hit_next:
 			goto out;
 		}
 
-		set_state_bits(tree, state, &bits);
+		set_state_bits(tree, state, &bits, changeset);
 		cache_state(state, cached_state);
 		merge_state(tree, state);
 		if (last_end == (u64)-1)
@@ -945,7 +964,7 @@ hit_next:
 		if (err)
 			goto out;
 		if (state->end <= end) {
-			set_state_bits(tree, state, &bits);
+			set_state_bits(tree, state, &bits, changeset);
 			cache_state(state, cached_state);
 			merge_state(tree, state);
 			if (last_end == (u64)-1)
@@ -980,7 +999,7 @@ hit_next:
 		 * the later extent.
 		 */
 		err = insert_state(tree, prealloc, start, this_end,
-				   NULL, NULL, &bits);
+				   NULL, NULL, &bits, changeset);
 		if (err)
 			extent_io_tree_panic(tree, err);
 
@@ -1008,7 +1027,7 @@ hit_next:
 		if (err)
 			extent_io_tree_panic(tree, err);
 
-		set_state_bits(tree, prealloc, &bits);
+		set_state_bits(tree, prealloc, &bits, changeset);
 		cache_state(prealloc, cached_state);
 		merge_state(tree, prealloc);
 		prealloc = NULL;
@@ -1038,7 +1057,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   struct extent_state **cached_state, gfp_t mask)
 {
 	return __set_extent_bit(tree, start, end, bits, 0, failed_start,
-				cached_state, mask);
+				cached_state, mask, NULL);
 }
 
 
@@ -1111,7 +1130,7 @@ again:
 			goto out;
 		}
 		err = insert_state(tree, prealloc, start, end,
-				   &p, &parent, &bits);
+				   &p, &parent, &bits, NULL);
 		if (err)
 			extent_io_tree_panic(tree, err);
 		cache_state(prealloc, cached_state);
@@ -1130,7 +1149,7 @@ hit_next:
 	 * Just lock what we found and keep going
 	 */
 	if (state->start == start && state->end <= end) {
-		set_state_bits(tree, state, &bits);
+		set_state_bits(tree, state, &bits, NULL);
 		cache_state(state, cached_state);
 		state = clear_state_bit(tree, state, &clear_bits, 0);
 		if (last_end == (u64)-1)
@@ -1171,7 +1190,7 @@ hit_next:
 		if (err)
 			goto out;
 		if (state->end <= end) {
-			set_state_bits(tree, state, &bits);
+			set_state_bits(tree, state, &bits, NULL);
 			cache_state(state, cached_state);
 			state = clear_state_bit(tree, state, &clear_bits, 0);
 			if (last_end == (u64)-1)
@@ -1208,7 +1227,7 @@ hit_next:
 		 * the later extent.
 		 */
 		err = insert_state(tree, prealloc, start, this_end,
-				   NULL, NULL, &bits);
+				   NULL, NULL, &bits, NULL);
 		if (err)
 			extent_io_tree_panic(tree, err);
 		cache_state(prealloc, cached_state);
@@ -1233,7 +1252,7 @@ hit_next:
 		if (err)
 			extent_io_tree_panic(tree, err);
 
-		set_state_bits(tree, prealloc, &bits);
+		set_state_bits(tree, prealloc, &bits, NULL);
 		cache_state(prealloc, cached_state);
 		clear_state_bit(tree, prealloc, &clear_bits, 0);
 		prealloc = NULL;
@@ -1274,6 +1293,22 @@ int set_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 			      NULL, mask);
 }
 
+int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
+			   unsigned bits, gfp_t mask,
+			   struct extent_changeset *changeset)
+{
+	/*
+	 * We don't support EXTENT_LOCKED yet, as current changeset will
+	 * record any bits changed, so for EXTENT_LOCKED case, it will
+	 * either fail with -EEXIST or changeset will record the whole
+	 * range.
+	 */
+	BUG_ON(bits & EXTENT_LOCKED);
+
+	return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL, mask,
+				changeset);
+}
+
 int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		      unsigned bits, gfp_t mask)
 {
@@ -1343,7 +1378,7 @@ int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 	while (1) {
 		err = __set_extent_bit(tree, start, end, EXTENT_LOCKED | bits,
 				       EXTENT_LOCKED, &failed_start,
-				       cached_state, GFP_NOFS);
+				       cached_state, GFP_NOFS, NULL);
 		if (err == -EEXIST) {
 			wait_extent_bit(tree, failed_start, end, EXTENT_LOCKED);
 			start = failed_start;
@@ -1365,7 +1400,7 @@ int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 	u64 failed_start;
 
 	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, EXTENT_LOCKED,
-			       &failed_start, NULL, GFP_NOFS);
+			       &failed_start, NULL, GFP_NOFS, NULL);
 	if (err == -EEXIST) {
 		if (failed_start > start)
 			clear_extent_bit(tree, start, failed_start - 1,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3107a6e..4a7c9d9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -227,6 +227,9 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     struct extent_state **cached, gfp_t mask);
 int set_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		    unsigned bits, gfp_t mask);
+int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
+			   unsigned bits, gfp_t mask,
+			   struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   unsigned bits, u64 *failed_start,
 		   struct extent_state **cached_state, gfp_t mask);
-- 
2.6.1



* [PATCH v3 03/21] btrfs: extent_io: Introduce new function clear_record_extent_bits()
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recording set/clear bits Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 02/21] btrfs: extent_io: Introduce new function set_record_extent_bits Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 04/21] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, clear_record_extent_bits(), which will clear
bits in the given range and record which ranges are cleared and how many
bytes are changed in total.

This provides the basis for the later qgroup reserve code.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v3:
  Newly introduced
---
 fs/btrfs/extent_io.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
 fs/btrfs/extent_io.h |  3 +++
 2 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f5efaa6..1c20f8be 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -141,6 +141,8 @@ static void add_extent_changeset(struct extent_state *state, unsigned bits,
 		return;
 	if (set && (state->state & bits) == bits)
 		return;
+	if (!set && (state->state & bits) == 0)
+		return;
 	changeset->bytes_changed += state->end - state->start + 1;
 	ret = ulist_add(changeset->range_changed, state->start, state->end,
 			GFP_ATOMIC);
@@ -529,7 +531,8 @@ static struct extent_state *next_state(struct extent_state *state)
  */
 static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 					    struct extent_state *state,
-					    unsigned *bits, int wake)
+					    unsigned *bits, int wake,
+					    struct extent_changeset *changeset)
 {
 	struct extent_state *next;
 	unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
@@ -540,6 +543,7 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 		tree->dirty_bytes -= range;
 	}
 	clear_state_cb(tree, state, bits);
+	add_extent_changeset(state, bits_to_clear, changeset, 0);
 	state->state &= ~bits_to_clear;
 	if (wake)
 		wake_up(&state->wq);
@@ -587,10 +591,10 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  *
  * This takes the tree lock, and returns 0 on success and < 0 on error.
  */
-int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
-		     struct extent_state **cached_state,
-		     gfp_t mask)
+static int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+			      unsigned bits, int wake, int delete,
+			      struct extent_state **cached_state,
+			      gfp_t mask, struct extent_changeset *changeset)
 {
 	struct extent_state *state;
 	struct extent_state *cached;
@@ -689,7 +693,8 @@ hit_next:
 		if (err)
 			goto out;
 		if (state->end <= end) {
-			state = clear_state_bit(tree, state, &bits, wake);
+			state = clear_state_bit(tree, state, &bits, wake,
+						changeset);
 			goto next;
 		}
 		goto search_again;
@@ -710,13 +715,13 @@ hit_next:
 		if (wake)
 			wake_up(&state->wq);
 
-		clear_state_bit(tree, prealloc, &bits, wake);
+		clear_state_bit(tree, prealloc, &bits, wake, changeset);
 
 		prealloc = NULL;
 		goto out;
 	}
 
-	state = clear_state_bit(tree, state, &bits, wake);
+	state = clear_state_bit(tree, state, &bits, wake, changeset);
 next:
 	if (last_end == (u64)-1)
 		goto out;
@@ -1151,7 +1156,7 @@ hit_next:
 	if (state->start == start && state->end <= end) {
 		set_state_bits(tree, state, &bits, NULL);
 		cache_state(state, cached_state);
-		state = clear_state_bit(tree, state, &clear_bits, 0);
+		state = clear_state_bit(tree, state, &clear_bits, 0, NULL);
 		if (last_end == (u64)-1)
 			goto out;
 		start = last_end + 1;
@@ -1192,7 +1197,8 @@ hit_next:
 		if (state->end <= end) {
 			set_state_bits(tree, state, &bits, NULL);
 			cache_state(state, cached_state);
-			state = clear_state_bit(tree, state, &clear_bits, 0);
+			state = clear_state_bit(tree, state, &clear_bits, 0,
+						NULL);
 			if (last_end == (u64)-1)
 				goto out;
 			start = last_end + 1;
@@ -1254,7 +1260,7 @@ hit_next:
 
 		set_state_bits(tree, prealloc, &bits, NULL);
 		cache_state(prealloc, cached_state);
-		clear_state_bit(tree, prealloc, &clear_bits, 0);
+		clear_state_bit(tree, prealloc, &clear_bits, 0, NULL);
 		prealloc = NULL;
 		goto out;
 	}
@@ -1309,6 +1315,14 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 				changeset);
 }
 
+int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
+		     unsigned bits, int wake, int delete,
+		     struct extent_state **cached, gfp_t mask)
+{
+	return __clear_extent_bit(tree, start, end, bits, wake, delete,
+				  cached, mask, NULL);
+}
+
 int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		      unsigned bits, gfp_t mask)
 {
@@ -1320,6 +1334,20 @@ int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 	return clear_extent_bit(tree, start, end, bits, wake, 0, NULL, mask);
 }
 
+int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
+			     unsigned bits, gfp_t mask,
+			     struct extent_changeset *changeset)
+{
+	/*
+	 * Don't support EXTENT_LOCKED case, same reason as
+	 * set_record_extent_bits().
+	 */
+	BUG_ON(bits & EXTENT_LOCKED);
+
+	return __clear_extent_bit(tree, start, end, bits, 0, 0, NULL, mask,
+				  changeset);
+}
+
 int set_extent_delalloc(struct extent_io_tree *tree, u64 start, u64 end,
 			struct extent_state **cached_state, gfp_t mask)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4a7c9d9..51e1b71 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -222,6 +222,9 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   struct extent_state *cached_state);
 int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		      unsigned bits, gfp_t mask);
+int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
+			     unsigned bits, gfp_t mask,
+			     struct extent_changeset *changeset);
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     unsigned bits, int wake, int delete,
 		     struct extent_state **cached, gfp_t mask);
-- 
2.6.1



* [PATCH v3 04/21] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (2 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 03/21] btrfs: extent_io: Introduce new function clear_record_extent_bits() Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 05/21] btrfs: qgroup: Introduce functions to release/free qgroup reserve data space Qu Wenruo
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, btrfs_qgroup_reserve_data(), which uses the
io_tree to do accurate qgroup reservation and avoid reserved space
leaks.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Add needed parameter for later trace functions
v3:
  Use io_tree facilities instead of data_rsv_map facilities
---
 fs/btrfs/extent_io.h |  1 +
 fs/btrfs/qgroup.c    | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h    |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 51e1b71..f4c1ae1 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -19,6 +19,7 @@
 #define EXTENT_NEED_WAIT	(1U << 13)
 #define EXTENT_DAMAGED		(1U << 14)
 #define EXTENT_NORESERVE	(1U << 15)
+#define EXTENT_QGROUP_RESERVED	(1U << 16)
 #define EXTENT_IOBITS		(EXTENT_LOCKED | EXTENT_WRITEBACK)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | EXTENT_FIRST_DELALLOC)
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index e9ace09..9ef8d73 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2481,3 +2481,52 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 		btrfs_queue_work(fs_info->qgroup_rescan_workers,
 				 &fs_info->qgroup_rescan_work);
 }
+
+/*
+ * Reserve qgroup space for range [start, start + len).
+ *
+ * This function will either reserve space from related qgroups or doing
+ * nothing if the range is already reserved.
+ *
+ * Return 0 for successful reserve
 + * Return <0 for error (including -EDQUOT)
+ *
+ * NOTE: this function may sleep for memory allocation.
+ */
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct extent_changeset changeset;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	int ret;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+	    len == 0)
+		return 0;
+
+	changeset.bytes_changed = 0;
+	changeset.range_changed = ulist_alloc(GFP_NOFS);
+
+	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
+			start + len -1, EXTENT_QGROUP_RESERVED, GFP_NOFS,
+			&changeset);
+	if (ret < 0)
+		goto cleanup;
+	ret = btrfs_qgroup_reserve(root, changeset.bytes_changed);
+	if (ret < 0)
+		goto cleanup;
+
+	ulist_free(changeset.range_changed);
+	return ret;
+
+cleanup:
+	/* cleanup already reserved ranges */
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(changeset.range_changed, &uiter)))
+		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
+				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
+				 GFP_NOFS);
+	ulist_free(changeset.range_changed);
+	return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 6387dcf..bd17cc2 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -81,4 +81,6 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 			       u64 rfer, u64 excl);
 #endif
 
+/* New io_tree based accurate qgroup reserve API */
+int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.6.1



* [PATCH v3 05/21] btrfs: qgroup: Introduce functions to release/free qgroup reserve data space
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (3 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 04/21] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Introduce functions btrfs_qgroup_release/free_data() to release/free a
reserved data range.

Release means just removing the data range from the io_tree, without
freeing the reserved space.
This is for the normal buffered write case: when data is written to disk
and its metadata is added into the tree, its reserved space should still
be kept until commit_transaction().
So in that case, we only release the dirty range, but keep the reserved
space recorded somewhere else until commit_transaction().

Free means not only removing the data range, but also freeing the
reserved space.
This is used for the cleanup and page invalidation cases.
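The release/free distinction can be sketched as a small standalone model
(hypothetical names, not btrfs code; the io_tree marking is modeled as a
per-block flag array): both drop the range marking, but only free
returns the charge, while release leaves the charge to be settled at
commit_transaction() time.

```c
#include <assert.h>

#define QR_NBLOCKS 16

struct qr_model {
	unsigned char reserved[QR_NBLOCKS];	/* modeled io_tree marking */
	unsigned long long charged;		/* blocks still charged */
};

/* Mark [start, start + len) reserved and charge the newly marked blocks */
static void qr_reserve(struct qr_model *m, int start, int len)
{
	int i;

	for (i = start; i < start + len; i++) {
		if (!m->reserved[i]) {
			m->reserved[i] = 1;
			m->charged++;
		}
	}
}

/* Clear the marking for [start, start + len); returns blocks cleared */
static int qr_clear_range(struct qr_model *m, int start, int len)
{
	int i, cleared = 0;

	for (i = start; i < start + len; i++) {
		if (m->reserved[i]) {
			m->reserved[i] = 0;
			cleared++;
		}
	}
	return cleared;
}

/*
 * Release: data reached disk.  Drop the range marking so further writes
 * can be COWed, but keep the charge until commit_transaction() does the
 * real accounting.
 */
static void qr_release(struct qr_model *m, int start, int len)
{
	qr_clear_range(m, start, len);
}

/*
 * Free: page invalidated or error cleanup, so the data never reaches
 * disk.  Drop both the marking and the charge.
 */
static void qr_free(struct qr_model *m, int start, int len)
{
	m->charged -= qr_clear_range(m, start, len);
}
```

Note how a release keeps the counter unchanged while a free returns it,
matching the two functions introduced by this patch.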

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Fix comment typo
  Update comment, to make it clear that the reserved space for any page
  cache will either be released (it goes to disk) or freed directly
  (truncated before reaching disk)
v3:
  Change to use io_tree facilities.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9ef8d73..a86d9c6 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2530,3 +2530,63 @@ cleanup:
 	ulist_free(changeset.range_changed);
 	return ret;
 }
+
+static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
+				       int free)
+{
+	struct extent_changeset changeset;
+	int ret;
+
+	changeset.bytes_changed = 0;
+	changeset.range_changed = ulist_alloc(GFP_NOFS);
+	if (!changeset.range_changed)
+		return -ENOMEM;
+
+	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
+			start + len -1, EXTENT_QGROUP_RESERVED, GFP_NOFS,
+			&changeset);
+	if (ret < 0)
+		goto out;
+
+	if (free)
+		btrfs_qgroup_free(BTRFS_I(inode)->root,
+				  changeset.bytes_changed);
+out:
+	ulist_free(changeset.range_changed);
+	return ret;
+}
+
+/*
+ * Free a reserved space range from io_tree and related qgroups
+ *
+ * Should be called when a range of pages get invalidated before reaching disk.
+ * Or for error cleanup case.
+ *
+ * For data written to disk, use btrfs_qgroup_release_data().
+ *
+ * NOTE: This function may sleep for memory allocation.
+ */
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
+{
+	return __btrfs_qgroup_release_data(inode, start, len, 1);
+}
+
+/*
+ * Release a reserved space range from io_tree only.
+ *
+ * Should be called when a range of pages get written to disk and corresponding
+ * FILE_EXTENT is inserted into corresponding root.
+ *
+ * Since new qgroup accounting framework will only update qgroup numbers at
+ * commit_transaction() time, its reserved space shouldn't be freed from
+ * related qgroups.
+ *
+ * But we should release the range from io_tree, to allow further write to be
+ * COWed.
+ *
+ * NOTE: This function may sleep for memory allocation.
+ */
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
+{
+	return __btrfs_qgroup_release_data(inode, start, len, 0);
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index bd17cc2..564eb21 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -83,4 +83,6 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 
 /* New io_tree based accurate qgroup reserve API */
 int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.6.1



* [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (4 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 05/21] btrfs: qgroup: Introduce functions to release/free qgroup reserve data space Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-25 14:39   ` Filipe Manana
  2015-10-13  2:20 ` [PATCH v3 07/21] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Add a new function, btrfs_add_delayed_qgroup_reserve(), to record
how much space is reserved for a given extent.

As btrfs only accounts qgroups at run_delayed_refs() time, a newly
allocated extent should keep its reserved space until then.

So add the needed function, along with the related members, to do it.
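The idea can be sketched in a few lines of standalone C. This is a minimal userspace model, not kernel code: the lookup and locking from the patch are simplified away, and `struct toy_ref_head` with `record_qgroup_reserve()` are hypothetical names that only mirror the fields introduced below.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of what btrfs_add_delayed_qgroup_reserve() records: find the
 * head for an extent and stash (ref_root, reserved bytes) on it, to be
 * consumed when the delayed ref is run.
 */
struct toy_ref_head {
	uint64_t bytenr;
	uint64_t qgroup_ref_root;
	uint64_t qgroup_reserved;
};

static int record_qgroup_reserve(struct toy_ref_head *head,
				 uint64_t ref_root, uint64_t num_bytes)
{
	if (!head)
		return -2; /* stands in for -ENOENT in the kernel */
	/* the real patch WARN_ONs if something was already recorded */
	head->qgroup_ref_root = ref_root;
	head->qgroup_reserved = num_bytes;
	return 0;
}
```

The real function additionally skips the work when quota is disabled or the root is not an fs tree, and holds `delayed_refs->lock` around the lookup and store.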

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  None
v3:
  None
---
 fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
 fs/btrfs/delayed-ref.h | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&head_ref->ref_list);
 	head_ref->processing = 0;
 	head_ref->total_ref_mod = count_mod;
+	head_ref->qgroup_reserved = 0;
+	head_ref->qgroup_ref_root = 0;
 
 	/* Record qgroup extent info if provided */
 	if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_head *ref_head;
+	int ret = 0;
+
+	if (!fs_info->quota_enabled || !is_fstree(ref_root))
+		return 0;
+
+	delayed_refs = &trans->transaction->delayed_refs;
+
+	spin_lock(&delayed_refs->lock);
+	ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
+	if (!ref_head) {
+		ret = -ENOENT;
+		goto out;
+	}
+	WARN_ON(ref_head->qgroup_reserved || ref_head->qgroup_ref_root);
+	ref_head->qgroup_ref_root = ref_root;
+	ref_head->qgroup_reserved = num_bytes;
+out:
+	spin_unlock(&delayed_refs->lock);
+	return ret;
+}
+
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 13fb5e6..d4c41e2 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -113,6 +113,17 @@ struct btrfs_delayed_ref_head {
 	int total_ref_mod;
 
 	/*
+	 * For qgroup reserved space freeing.
+	 *
+	 * ref_root and reserved will be recorded after
+	 * BTRFS_ADD_DELAYED_EXTENT is called.
+	 * And will be used to free reserved qgroup space at
+	 * run_delayed_refs() time.
+	 */
+	u64 qgroup_ref_root;
+	u64 qgroup_reserved;
+
+	/*
 	 * when a new extent is allocated, it is just reserved in memory
 	 * The actual extent isn't inserted into the extent allocation tree
 	 * until the delayed ref is processed.  must_insert_reserved is
@@ -242,6 +253,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       u64 owner, u64 offset, int action,
 			       struct btrfs_delayed_extent_op *extent_op,
 			       int no_quota);
+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+				     struct btrfs_trans_handle *trans,
+				     u64 ref_root, u64 bytenr, u64 num_bytes);
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes,
-- 
2.6.1



* [PATCH v3 07/21] btrfs: delayed_ref: release and free qgroup reserved at proper timing
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (5 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 08/21] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Qgroup reserved space needs to be released from the inode dirty map
and freed, at two different times:

1) Release when the metadata is written into the tree
After the corresponding metadata is written into the tree, any newer
write will be COWed (the NOCOW case is not handled yet).
So we must release the range from the inode dirty range map, or later
writes to it will skip the needed reservation, letting accounting exceed
the limit.

2) Free the reserved bytes when the delayed ref is run
When delayed refs are run, qgroup accounting will follow soon after and
turn the reserved bytes into rfer/excl numbers.
As run_delayed_refs() and qgroup accounting are both done at
commit_transaction() time, we are safe to free the reserved space at
run_delayed_refs() time.

With this release/free timing, we should be able to resolve the
long-standing qgroup reserved space leak problem.
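The two paths above can be sketched as a toy accounting model. This is a hypothetical userspace illustration (names like `toy_inode` are invented, not the kernel API): "release" drops the range from the dirty map while the bytes stay charged, and "free" returns the bytes once the delayed ref has run.

```c
#include <assert.h>
#include <stdint.h>

struct toy_inode {
	uint64_t dirty_bytes;    /* bytes tracked in the dirty range map */
	uint64_t reserved_bytes; /* bytes charged against the qgroup     */
};

static void toy_reserve(struct toy_inode *ino, uint64_t len)
{
	ino->dirty_bytes += len;
	ino->reserved_bytes += len;
}

/* Path 1: extent written to disk; later writes must reserve again. */
static void toy_release(struct toy_inode *ino, uint64_t len)
{
	ino->dirty_bytes -= len;
	/* reserved_bytes untouched: freed later, at run_delayed_refs() */
}

/* Path 2: delayed ref ran; accounting took over, return the bytes. */
static void toy_free_delayed_ref(struct toy_inode *ino, uint64_t len)
{
	ino->reserved_bytes -= len;
}
```

The key invariant is that a release never touches the charged byte count; only the delayed-ref path gives bytes back, which is what closes the leak.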

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Use a better-wrapped function for the delayed_ref reserved space
  release, as a direct call to btrfs_qgroup_free_ref() would make it
  hard to add a trace event.
v3:
  None
---
 fs/btrfs/extent-tree.c |  5 +++++
 fs/btrfs/inode.c       | 10 ++++++++++
 fs/btrfs/qgroup.c      |  5 ++---
 fs/btrfs/qgroup.h      | 18 +++++++++++++++++-
 4 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 601d7d4..4f6758b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2345,6 +2345,11 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 						      node->num_bytes);
 			}
 		}
+
+		/* Also free its reserved qgroup space */
+		btrfs_qgroup_free_delayed_ref(root->fs_info,
+					      head->qgroup_ref_root,
+					      head->qgroup_reserved);
 		return ret;
 	}
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b7e439b..f5c2ffe 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2112,6 +2112,16 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ret = btrfs_alloc_reserved_file_extent(trans, root,
 					root->root_key.objectid,
 					btrfs_ino(inode), file_pos, &ins);
+	if (ret < 0)
+		goto out;
+	/*
+	 * Release the reserved range from inode dirty range map, and
+	 * move it to delayed ref codes, as now accounting only happens at
+	 * commit_transaction() time.
+	 */
+	btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	ret = btrfs_add_delayed_qgroup_reserve(root->fs_info, trans,
+			root->objectid, disk_bytenr, ram_bytes);
 out:
 	btrfs_free_path(path);
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a86d9c6..a2678f6 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2111,14 +2111,13 @@ out:
 	return ret;
 }
 
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+			       u64 ref_root, u64 num_bytes)
 {
 	struct btrfs_root *quota_root;
 	struct btrfs_qgroup *qgroup;
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
-	u64 ref_root = root->root_key.objectid;
 	int ret = 0;
 
 	if (!is_fstree(ref_root))
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 564eb21..80924ae 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -72,7 +72,23 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
 int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
-void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
+			       u64 ref_root, u64 num_bytes);
+static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+					 num_bytes);
+}
+
+/*
+ * TODO: Add a proper trace point for it; as btrfs_qgroup_free() is
+ * called from everywhere, it can't provide a good trace for the delayed ref case.
+ */
+static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
+						 u64 ref_root, u64 num_bytes)
+{
+	btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
+}
 
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
-- 
2.6.1



* [PATCH v3 08/21] btrfs: qgroup: Introduce new functions to reserve/free metadata
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (6 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 07/21] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 09/21] btrfs: qgroup: Use new metadata reservation Qu Wenruo
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Introduce new functions btrfs_qgroup_reserve/free_meta() to reserve/free
metadata reserved space.
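The counter semantics introduced below can be illustrated with a standalone C11 sketch (hypothetical `sketch_*` names, not the kernel functions): reserve adds to a per-root atomic counter after the qgroup charge succeeds, free subtracts, and free_all atomically drains whatever is left, mirroring the `atomic_xchg()` in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Plays the role of root->qgroup_meta_rsv from the patch below. */
static atomic_long qgroup_meta_rsv = 0;

static int sketch_reserve_meta(long num_bytes)
{
	/* in the kernel, btrfs_qgroup_reserve() runs first and may fail */
	atomic_fetch_add(&qgroup_meta_rsv, num_bytes);
	return 0;
}

static long sketch_free_meta(long num_bytes)
{
	atomic_fetch_sub(&qgroup_meta_rsv, num_bytes);
	return num_bytes; /* bytes handed back to the qgroup */
}

static long sketch_free_meta_all(void)
{
	/* read and zero in one step, like atomic_xchg() in the patch */
	return atomic_exchange(&qgroup_meta_rsv, 0);
}
```

Draining with a single exchange is what makes the end-of-transaction path safe: no window exists where the counter has been read but not yet zeroed.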

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  None
v3:
  None
---
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/qgroup.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h  |  4 ++++
 4 files changed, 48 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..ae86025 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1943,6 +1943,9 @@ struct btrfs_root {
 	int send_in_progress;
 	struct btrfs_subvolume_writers *subv_writers;
 	atomic_t will_be_snapshoted;
+
+	/* For qgroup metadata space reserve */
+	atomic_t qgroup_meta_rsv;
 };
 
 struct btrfs_ioctl_defrag_range_args {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 807f685..2b51705 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1259,6 +1259,7 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
 	atomic_set(&root->orphan_inodes, 0);
 	atomic_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshoted, 0);
+	atomic_set(&root->qgroup_meta_rsv, 0);
 	root->log_transid = 0;
 	root->log_transid_committed = -1;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a2678f6..b5d1850 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2589,3 +2589,43 @@ int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
 {
 	return __btrfs_qgroup_release_data(inode, start, len, 0);
 }
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
+{
+	int ret;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid) ||
+	    num_bytes == 0)
+		return 0;
+
+	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+	ret = btrfs_qgroup_reserve(root, num_bytes);
+	if (ret < 0)
+		return ret;
+	atomic_add(num_bytes, &root->qgroup_meta_rsv);
+	return ret;
+}
+
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
+{
+	int reserved;
+
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+		return;
+
+	reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
+	if (reserved == 0)
+		return;
+	btrfs_qgroup_free(root, reserved);
+}
+
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
+{
+	if (!root->fs_info->quota_enabled || !is_fstree(root->objectid))
+		return;
+
+	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
+	WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
+	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
+	btrfs_qgroup_free(root, num_bytes);
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 80924ae..7d1c87c 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -101,4 +101,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
+
+int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes);
+void btrfs_qgroup_free_meta_all(struct btrfs_root *root);
+void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.6.1



* [PATCH v3 09/21] btrfs: qgroup: Use new metadata reservation.
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (7 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 08/21] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 10/21] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

As we have the new metadata reservation functions, use them to replace
the old btrfs_qgroup_reserve() call for metadata.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  None
v3:
  None
---
 fs/btrfs/extent-tree.c | 14 ++++++--------
 fs/btrfs/transaction.c | 34 ++++++----------------------------
 fs/btrfs/transaction.h |  1 -
 3 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4f6758b..22702bd 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5345,7 +5345,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	if (root->fs_info->quota_enabled) {
 		/* One for parent inode, two for dir entries */
 		num_bytes = 3 * root->nodesize;
-		ret = btrfs_qgroup_reserve(root, num_bytes);
+		ret = btrfs_qgroup_reserve_meta(root, num_bytes);
 		if (ret)
 			return ret;
 	} else {
@@ -5363,10 +5363,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	if (ret == -ENOSPC && use_global_rsv)
 		ret = btrfs_block_rsv_migrate(global_rsv, rsv, num_bytes);
 
-	if (ret) {
-		if (*qgroup_reserved)
-			btrfs_qgroup_free(root, *qgroup_reserved);
-	}
+	if (ret && *qgroup_reserved)
+		btrfs_qgroup_free_meta(root, *qgroup_reserved);
 
 	return ret;
 }
@@ -5527,15 +5525,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
+		ret = btrfs_qgroup_reserve_meta(root,
+				nr_extents * root->nodesize);
 		if (ret)
 			goto out_fail;
 	}
 
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
 	if (unlikely(ret)) {
-		if (root->fs_info->quota_enabled)
-			btrfs_qgroup_free(root, nr_extents * root->nodesize);
+		btrfs_qgroup_free_meta(root, nr_extents * root->nodesize);
 		goto out_fail;
 	}
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 376191c..5ed06b8 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -478,13 +478,10 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items > 0 && root != root->fs_info->chunk_root) {
-		if (root->fs_info->quota_enabled &&
-		    is_fstree(root->root_key.objectid)) {
-			qgroup_reserved = num_items * root->nodesize;
-			ret = btrfs_qgroup_reserve(root, qgroup_reserved);
-			if (ret)
-				return ERR_PTR(ret);
-		}
+		qgroup_reserved = num_items * root->nodesize;
+		ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved);
+		if (ret)
+			return ERR_PTR(ret);
 
 		num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
 		/*
@@ -553,7 +550,6 @@ again:
 	h->block_rsv = NULL;
 	h->orig_rsv = NULL;
 	h->aborted = 0;
-	h->qgroup_reserved = 0;
 	h->delayed_ref_elem.seq = 0;
 	h->type = type;
 	h->allocating_chunk = false;
@@ -579,7 +575,6 @@ again:
 		h->bytes_reserved = num_bytes;
 		h->reloc_reserved = reloc_reserved;
 	}
-	h->qgroup_reserved = qgroup_reserved;
 
 got_it:
 	btrfs_record_root_in_trans(h, root);
@@ -597,8 +592,7 @@ alloc_fail:
 		btrfs_block_rsv_release(root, &root->fs_info->trans_block_rsv,
 					num_bytes);
 reserve_fail:
-	if (qgroup_reserved)
-		btrfs_qgroup_free(root, qgroup_reserved);
+	btrfs_qgroup_free_meta(root, qgroup_reserved);
 	return ERR_PTR(ret);
 }
 
@@ -815,15 +809,6 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 			must_run_delayed_refs = 2;
 	}
 
-	if (trans->qgroup_reserved) {
-		/*
-		 * the same root has to be passed here between start_transaction
-		 * and end_transaction. Subvolume quota depends on this.
-		 */
-		btrfs_qgroup_free(trans->root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
-
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
 
@@ -1238,6 +1223,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
 			spin_lock(&fs_info->fs_roots_radix_lock);
 			if (err)
 				break;
+			btrfs_qgroup_free_meta_all(root);
 		}
 	}
 	spin_unlock(&fs_info->fs_roots_radix_lock);
@@ -1846,10 +1832,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	btrfs_trans_release_metadata(trans, root);
 	trans->block_rsv = NULL;
-	if (trans->qgroup_reserved) {
-		btrfs_qgroup_free(root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
 
 	cur_trans = trans->transaction;
 
@@ -2202,10 +2184,6 @@ cleanup_transaction:
 	btrfs_trans_release_metadata(trans, root);
 	btrfs_trans_release_chunk_metadata(trans);
 	trans->block_rsv = NULL;
-	if (trans->qgroup_reserved) {
-		btrfs_qgroup_free(root, trans->qgroup_reserved);
-		trans->qgroup_reserved = 0;
-	}
 	btrfs_warn(root->fs_info, "Skipping commit of aborted transaction.");
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index a994bb0..ce41bc9 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -107,7 +107,6 @@ struct btrfs_trans_handle {
 	u64 transid;
 	u64 bytes_reserved;
 	u64 chunk_bytes_reserved;
-	u64 qgroup_reserved;
 	unsigned long use_count;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
-- 
2.6.1



* [PATCH v3 10/21] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space.
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (8 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 09/21] btrfs: qgroup: Use new metadata reservation Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 11/21] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Add new functions __btrfs_check_data_free_space() and
__btrfs_free_reserved_data_space() to work with new accurate qgroup
reserved space framework.

The new functions will replace the old btrfs_check_data_free_space()
and btrfs_free_reserved_data_space() respectively, but until all the
changes are done, let's just use the new names.

Also, export the internal-use function btrfs_alloc_data_chunk_ondemand():
qgroup reserve now requires precise byte counts, and some operations
can't get the accurate number in advance (like fallocate).
But the data space info check and data chunk allocation don't need to
be that accurate, and can be done at the beginning.

So export it for later operations.
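The sector alignment that the new check performs before reserving can be shown in isolation. This is a standalone sketch with a fixed 4K sector size (an assumption for illustration; the kernel uses `root->sectorsize`): the start is rounded down and the end rounded up, so the reserved length covers every sector the range touches.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE 4096u /* assumed; the kernel reads root->sectorsize */

/* Length actually reserved for a byte range [start, start + len),
 * mirroring the round_up/round_down pair in the patch below. */
static uint64_t aligned_len(uint64_t start, uint64_t len)
{
	uint64_t aend = (start + len + SECTORSIZE - 1)
			/ SECTORSIZE * SECTORSIZE; /* round end up   */
	uint64_t astart = start / SECTORSIZE * SECTORSIZE; /* round down */
	return aend - astart;
}
```

A write of 200 bytes straddling a sector boundary therefore reserves two full sectors, which is why the precise accounting must work on aligned ranges.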

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Fix comment typo
  Add __btrfs_free_reserved_data_space() function, or we will leak
  reserved space at EQUOT error handle routine.
v3:
  None
---
 fs/btrfs/ctree.h       |  3 ++
 fs/btrfs/extent-tree.c | 85 ++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ae86025..19450a1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3453,7 +3453,10 @@ enum btrfs_reserve_flush_enum {
 };
 
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
+void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 22702bd..0cd6baa 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3908,11 +3908,7 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 	return ret;
 }
 
-/*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes)
 {
 	struct btrfs_space_info *data_sinfo;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -4033,19 +4029,55 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
-	ret = btrfs_qgroup_reserve(root, write_bytes);
-	if (ret)
-		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
-out:
 	spin_unlock(&data_sinfo->lock);
 
 	return ret;
 }
 
 /*
+ * This will check the space that the inode allocates from to make sure we have
+ * enough space for bytes.
+ */
+int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_qgroup_reserve(root, write_bytes);
+	return ret;
+}
+
+/*
+ * New check_data_free_space() with the ability to do precise data reservation.
+ * Will replace old btrfs_check_data_free_space(), but for patch split,
+ * add a new function first and then replace it.
+ */
+int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int ret;
+
+	/* align the range */
+	len = round_up(start + len, root->sectorsize) -
+	      round_down(start, root->sectorsize);
+	start = round_down(start, root->sectorsize);
+
+	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+	if (ret < 0)
+		return ret;
+
+	/* Use new btrfs_qgroup_reserve_data to reserve precise data space */
+	ret = btrfs_qgroup_reserve_data(inode, start, len);
+	return ret;
+}
+
+/*
  * Called if we need to clear a data reservation for this inode.
  */
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
@@ -4065,6 +4097,41 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	spin_unlock(&data_sinfo->lock);
 }
 
+/*
+ * Called if we need to clear a data reservation for this inode,
+ * normally in an error case.
+ *
+ * This one will handle the per-inode data rsv map for the accurate
+ * reserved space framework.
+ */
+void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_space_info *data_sinfo;
+
+	/* Make sure the range is aligned to sectorsize */
+	len = round_up(start + len, root->sectorsize) -
+	      round_down(start, root->sectorsize);
+	start = round_down(start, root->sectorsize);
+
+	/*
+	 * Free any reserved qgroup data space first
+	 * As it will alloc memory, we can't do it with data sinfo
+	 * spinlock hold.
+	 */
+	btrfs_qgroup_free_data(inode, start, len);
+
+	data_sinfo = root->fs_info->data_sinfo;
+	spin_lock(&data_sinfo->lock);
+	if (WARN_ON(data_sinfo->bytes_may_use < len))
+		data_sinfo->bytes_may_use = 0;
+	else
+		data_sinfo->bytes_may_use -= len;
+	trace_btrfs_space_reservation(root->fs_info, "space_info",
+				      data_sinfo->flags, len, 0);
+	spin_unlock(&data_sinfo->lock);
+}
+
 static void force_metadata_allocation(struct btrfs_fs_info *info)
 {
 	struct list_head *head = &info->space_info;
-- 
2.6.1



* [PATCH v3 11/21] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (9 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 10/21] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 12/21] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Use new reserve/free for buffered write and inode cache.

For the buffered write case, a nodatacow write won't increase the quota
accounting. So unlike the old behavior, which reserves before checking
nocow, we now check nocow first and only reserve data space if we can't
do a nocow write.
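The reordered decision can be sketched as a tiny pure function (hypothetical names; the real path also adjusts write_bytes and reserve_bytes after the nocow check): a write that can be served NOCOW skips the data reservation entirely and only reserves metadata.

```c
#include <assert.h>
#include <stdbool.h>

enum rsv_plan {
	RSV_METADATA_ONLY, /* nocow write: no qgroup data space needed */
	RSV_DATA_AND_META, /* COW write: reserve data space first      */
};

/* nocow_flag: inode has NODATACOW or PREALLOC set;
 * range_is_nocow: check_can_nocow() said the range can be overwritten. */
static enum rsv_plan plan_data_reservation(bool nocow_flag,
					   bool range_is_nocow)
{
	if (nocow_flag && range_is_nocow)
		return RSV_METADATA_ONLY;
	return RSV_DATA_AND_META;
}
```

Checking nocow first matters because the old order reserved data space, hit -ENOSPC, and only then fell back to nocow, which is exactly the pattern the precise qgroup accounting can no longer tolerate.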

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Add a call to the new free function too, or we will leak reserved
  space when the data reservation succeeds but the metadata reservation
  fails.
v3:
  None
---
 fs/btrfs/extent-tree.c |  4 ++--
 fs/btrfs/file.c        | 34 +++++++++++++++++++++-------------
 fs/btrfs/relocation.c  |  8 ++++----
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0cd6baa..f4b9db8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3356,7 +3356,7 @@ again:
 	num_pages *= 16;
 	num_pages *= PAGE_CACHE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, num_pages, num_pages);
+	ret = __btrfs_check_data_free_space(inode, 0, num_pages);
 	if (ret)
 		goto out_put;
 
@@ -3365,7 +3365,7 @@ again:
 					      &alloc_hint);
 	if (!ret)
 		dcs = BTRFS_DC_SETUP;
-	btrfs_free_reserved_data_space(inode, num_pages);
+	__btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..142b217 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1510,12 +1510,17 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		}
 
 		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
-		ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes);
-		if (ret == -ENOSPC &&
-		    (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
-					      BTRFS_INODE_PREALLOC))) {
+
+		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					     BTRFS_INODE_PREALLOC)) {
 			ret = check_can_nocow(inode, pos, &write_bytes);
+			if (ret < 0)
+				break;
 			if (ret > 0) {
+				/*
+				 * For nodata cow case, no need to reserve
+				 * data space.
+				 */
 				only_release_metadata = true;
 				/*
 				 * our prealloc extent may be smaller than
@@ -1524,20 +1529,19 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				num_pages = DIV_ROUND_UP(write_bytes + offset,
 							 PAGE_CACHE_SIZE);
 				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
-				ret = 0;
-			} else {
-				ret = -ENOSPC;
+				goto reserve_metadata;
 			}
 		}
-
-		if (ret)
+		ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+		if (ret < 0)
 			break;
 
+reserve_metadata:
 		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
 		if (ret) {
 			if (!only_release_metadata)
-				btrfs_free_reserved_data_space(inode,
-							       reserve_bytes);
+				__btrfs_free_reserved_data_space(inode, pos,
+							         write_bytes);
 			else
 				btrfs_end_write_no_snapshoting(root);
 			break;
@@ -2569,8 +2573,11 @@ static long btrfs_fallocate(struct file *file, int mode,
 	/*
 	 * Make sure we have enough space before we do the
 	 * allocation.
+	 * XXX: The behavior must be changed to do accurate check first
+	 * and then check data reserved space.
 	 */
-	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start, alloc_end - alloc_start);
+	ret = btrfs_check_data_free_space(inode, alloc_start,
+					  alloc_end - alloc_start);
 	if (ret)
 		return ret;
 
@@ -2703,7 +2710,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 out:
 	mutex_unlock(&inode->i_mutex);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start);
+	__btrfs_free_reserved_data_space(inode, alloc_start,
+					 alloc_end - alloc_start);
 	return ret;
 }
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 303babe..f4621c5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3034,8 +3034,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	mutex_lock(&inode->i_mutex);
 
-	ret = btrfs_check_data_free_space(inode, cluster->end +
-					  1 - cluster->start, 0);
+	ret = __btrfs_check_data_free_space(inode, cluster->start,
+					    cluster->end + 1 - cluster->start);
 	if (ret)
 		goto out;
 
@@ -3056,8 +3056,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, cluster->end +
-				       1 - cluster->start);
+	__btrfs_free_reserved_data_space(inode, cluster->start,
+					 cluster->end + 1 - cluster->start);
 out:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
-- 
2.6.1



* [PATCH v3 12/21] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (10 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 11/21] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 13/21] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Add new versions of the btrfs_delalloc_reserve_space() and
btrfs_delalloc_release_space() functions, which support accurate qgroup
reserves.
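The compound reserve below follows a reserve-then-rollback shape that can be shown in a few lines of standalone C (illustrative `*_rsv` globals and stub functions, not the kernel API): data space is taken first, and if the metadata step fails, the data reservation is undone so nothing leaks.

```c
#include <assert.h>

static long data_rsv, meta_rsv; /* stand-ins for the two reservations */

static int reserve_data(long len)  { data_rsv += len; return 0; }
static void free_data(long len)    { data_rsv -= len; }
static int reserve_meta(long len, int fail)
{
	if (fail)
		return -1; /* simulate -ENOSPC */
	meta_rsv += len;
	return 0;
}

/* Mirrors __btrfs_delalloc_reserve_space(): data, then metadata,
 * rolling back the data reservation on metadata failure. */
static int delalloc_reserve(long len, int meta_fails)
{
	int ret = reserve_data(len);
	if (ret < 0)
		return ret;
	ret = reserve_meta(len, meta_fails);
	if (ret < 0)
		free_data(len); /* roll back so data space is not leaked */
	return ret;
}
```

The rollback on the second step is the whole point of pairing the two helpers: a caller sees either both reservations held or neither.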

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Add new function btrfs_delalloc_release_space() to handle error case.
v3:
  None
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 19450a1..4221bfd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3473,7 +3473,9 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
+void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
 					      unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f4b9db8..32455e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5723,6 +5723,44 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 }
 
 /**
+ * __btrfs_delalloc_reserve_space - reserve data and metadata space for
+ * delalloc
+ * @inode: inode we're writing to
+ * @start: start of the range we are writing to
+ * @len: length of the range we are writing
+ *
+ * TODO: This function will eventually replace old btrfs_delalloc_reserve_space()
+ *
+ * This will do the following things
+ *
+ * o reserve space in the data space info for len bytes
+ *   and precisely reserve the corresponding qgroup space
+ *   (Done in check_data_free_space)
+ *
+ * o reserve space for metadata space, based on the number of outstanding
+ *   extents and how many csums will be needed
+ *   also reserve metadata space in a per-root over-reserve manner.
+ * o add to the inode's ->delalloc_bytes
+ * o add it to the fs_info's delalloc inodes list.
+ *   (Above 3 all done in delalloc_reserve_metadata)
+ *
+ * Return 0 for success
+ * Return <0 for error (-ENOSPC or -EDQUOT)
+ */
+int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+{
+	int ret;
+
+	ret = __btrfs_check_data_free_space(inode, start, len);
+	if (ret < 0)
+		return ret;
+	ret = btrfs_delalloc_reserve_metadata(inode, len);
+	if (ret < 0)
+		__btrfs_free_reserved_data_space(inode, start, len);
+	return ret;
+}
+
+/**
  * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
  * @inode: inode we're writing to
  * @num_bytes: the number of bytes we want to allocate
@@ -5755,6 +5793,27 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
 }
 
 /**
+ * __btrfs_delalloc_release_space - release data and metadata space for delalloc
+ * @inode: inode we're releasing space for
+ * @start: start position of the space already reserved
+ * @len: the length of the space already reserved
+ *
+ * This must be matched with a call to btrfs_delalloc_reserve_space.  This is
+ * called in the case that we don't need the metadata AND data reservations
+ * anymore, e.g. if there is an error or we insert an inline extent.
+ *
+ * This function will release the metadata space that was not used and will
+ * decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
+ * list if there are no delalloc bytes left.
+ * Also it will handle the qgroup reserved space.
+ */
+void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+{
+	btrfs_delalloc_release_metadata(inode, len);
+	__btrfs_free_reserved_data_space(inode, start, len);
+}
+
+/**
  * btrfs_delalloc_release_space - release data and metadata space for delalloc
  * @inode: inode we're releasing space for
  * @num_bytes: the number of bytes we want to free up
-- 
2.6.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 13/21] btrfs: extent-tree: Switch to new delalloc space reserve and release
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (11 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 12/21] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 14/21] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Use the new __btrfs_delalloc_reserve_space() and
__btrfs_delalloc_release_space() functions to reserve and release space
for delalloc.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Also use __btrfs_delalloc_release_space() function.
v3:
  None
---
 fs/btrfs/file.c      |  5 +++--
 fs/btrfs/inode-map.c |  6 +++---
 fs/btrfs/inode.c     | 38 +++++++++++++++++++++++---------------
 fs/btrfs/ioctl.c     | 14 +++++++++-----
 4 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 142b217..bf4d5fb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1611,7 +1611,7 @@ again:
 				btrfs_delalloc_release_metadata(inode,
 								release_bytes);
 			else
-				btrfs_delalloc_release_space(inode,
+				__btrfs_delalloc_release_space(inode, pos,
 							     release_bytes);
 		}
 
@@ -1664,7 +1664,8 @@ again:
 			btrfs_end_write_no_snapshoting(root);
 			btrfs_delalloc_release_metadata(inode, release_bytes);
 		} else {
-			btrfs_delalloc_release_space(inode, release_bytes);
+			__btrfs_delalloc_release_space(inode, pos,
+						       release_bytes);
 		}
 	}
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index d4a582a..78bc09c 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,17 +488,17 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_CACHE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, prealloc);
+	ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
 	if (ret)
 		goto out_put;
 
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, prealloc);
+		__btrfs_delalloc_release_space(inode, 0, prealloc);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, prealloc);
+	__btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f5c2ffe..df3cff2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1766,7 +1766,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
 		    && do_list && !(state->state & EXTENT_NORESERVE))
-			btrfs_free_reserved_data_space(inode, len);
+			__btrfs_free_reserved_data_space(inode, state->start,
+							 len);
 
 		__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
 				     root->fs_info->delalloc_batch);
@@ -1985,7 +1986,8 @@ again:
 		goto again;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	ret = __btrfs_delalloc_reserve_space(inode, page_start,
+					     PAGE_CACHE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
@@ -4581,14 +4583,17 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	ret = __btrfs_delalloc_reserve_space(inode,
+			round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
 	if (ret)
 		goto out;
 
 again:
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
-		btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+		__btrfs_delalloc_release_space(inode,
+				round_down(from, PAGE_CACHE_SIZE),
+				PAGE_CACHE_SIZE);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -4656,7 +4661,8 @@ again:
 
 out_unlock:
 	if (ret)
-		btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+		__btrfs_delalloc_release_space(inode, page_start,
+					       PAGE_CACHE_SIZE);
 	unlock_page(page);
 	page_cache_release(page);
 out:
@@ -7587,7 +7593,7 @@ unlock:
 			spin_unlock(&BTRFS_I(inode)->lock);
 		}
 
-		btrfs_free_reserved_data_space(inode, len);
+		__btrfs_free_reserved_data_space(inode, start, len);
 		WARN_ON(dio_data->reserve < len);
 		dio_data->reserve -= len;
 		current->journal_info = dio_data;
@@ -8380,7 +8386,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			mutex_unlock(&inode->i_mutex);
 			relock = true;
 		}
-		ret = btrfs_delalloc_reserve_space(inode, count);
+		ret = __btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
 			goto out;
 		dio_data.outstanding_extents = div64_u64(count +
@@ -8409,11 +8415,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
-				btrfs_delalloc_release_space(inode,
-							dio_data.reserve);
+				__btrfs_delalloc_release_space(inode, offset,
+						dio_data.reserve);
 		} else if (ret >= 0 && (size_t)ret < count)
-			btrfs_delalloc_release_space(inode,
-						     count - (size_t)ret);
+			__btrfs_delalloc_release_space(inode, offset,
+						       count - (size_t)ret);
 	}
 out:
 	if (wakeup)
@@ -8621,7 +8627,11 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	u64 page_end;
 
 	sb_start_pagefault(inode->i_sb);
-	ret  = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+	page_start = page_offset(page);
+	page_end = page_start + PAGE_CACHE_SIZE - 1;
+
+	ret = __btrfs_delalloc_reserve_space(inode, page_start,
+					     PAGE_CACHE_SIZE);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
@@ -8640,8 +8650,6 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 again:
 	lock_page(page);
 	size = i_size_read(inode);
-	page_start = page_offset(page);
-	page_end = page_start + PAGE_CACHE_SIZE - 1;
 
 	if ((page->mapping != inode->i_mapping) ||
 	    (page_start >= size)) {
@@ -8718,7 +8726,7 @@ out_unlock:
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+	__btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0adf542..3158b0f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,8 +1119,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = btrfs_delalloc_reserve_space(inode,
-					   page_cnt << PAGE_CACHE_SHIFT);
+	ret = __btrfs_delalloc_reserve_space(inode,
+			start_index << PAGE_CACHE_SHIFT,
+			page_cnt << PAGE_CACHE_SHIFT);
 	if (ret)
 		return ret;
 	i_done = 0;
@@ -1209,8 +1210,9 @@ again:
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents++;
 		spin_unlock(&BTRFS_I(inode)->lock);
-		btrfs_delalloc_release_space(inode,
-				     (page_cnt - i_done) << PAGE_CACHE_SHIFT);
+		__btrfs_delalloc_release_space(inode,
+				start_index << PAGE_CACHE_SHIFT,
+				(page_cnt - i_done) << PAGE_CACHE_SHIFT);
 	}
 
 
@@ -1235,7 +1237,9 @@ out:
 		unlock_page(pages[i]);
 		page_cache_release(pages[i]);
 	}
-	btrfs_delalloc_release_space(inode, page_cnt << PAGE_CACHE_SHIFT);
+	__btrfs_delalloc_release_space(inode,
+			start_index << PAGE_CACHE_SHIFT,
+			page_cnt << PAGE_CACHE_SHIFT);
 	return ret;
 
 }
-- 
2.6.1



* [PATCH v3 14/21] btrfs: qgroup: Cleanup old inaccurate facilities
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (12 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 13/21] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 15/21] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Clean up the old facilities that use the old btrfs_qgroup_reserve()
function call, replace them with the newer versions, and remove the
"__" prefix from them.

Also, make the btrfs_qgroup_reserve/free() functions private, as they
are now only used inside the qgroup code.

Now the whole btrfs qgroup system is switched to the new reserve
facilities.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Apply newly introduced functions too.
v3:
  None
---
 fs/btrfs/ctree.h       |  12 ++----
 fs/btrfs/extent-tree.c | 109 +++++--------------------------------------------
 fs/btrfs/file.c        |  15 ++++---
 fs/btrfs/inode-map.c   |   6 +--
 fs/btrfs/inode.c       |  34 +++++++--------
 fs/btrfs/ioctl.c       |   6 +--
 fs/btrfs/qgroup.c      |  18 ++++----
 fs/btrfs/qgroup.h      |   8 ----
 fs/btrfs/relocation.c  |   8 ++--
 9 files changed, 60 insertions(+), 156 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4221bfd..f20b901 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3452,11 +3452,9 @@ enum btrfs_reserve_flush_enum {
 	BTRFS_RESERVE_FLUSH_ALL,
 };
 
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes);
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
 int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
-void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
@@ -3472,10 +3470,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root,
 				      u64 qgroup_reserved);
 int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
-void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
 					      unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32455e0..1dadbba 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3356,7 +3356,7 @@ again:
 	num_pages *= 16;
 	num_pages *= PAGE_CACHE_SIZE;
 
-	ret = __btrfs_check_data_free_space(inode, 0, num_pages);
+	ret = btrfs_check_data_free_space(inode, 0, num_pages);
 	if (ret)
 		goto out_put;
 
@@ -3365,7 +3365,7 @@ again:
 					      &alloc_hint);
 	if (!ret)
 		dcs = BTRFS_DC_SETUP;
-	__btrfs_free_reserved_data_space(inode, 0, num_pages);
+	btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
@@ -4038,27 +4038,11 @@ commit_trans:
 }
 
 /*
- * This will check the space that the inode allocates from to make sure we have
- * enough space for bytes.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 bytes, u64 write_bytes)
-{
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	int ret;
-
-	ret = btrfs_alloc_data_chunk_ondemand(inode, bytes);
-	if (ret < 0)
-		return ret;
-	ret = btrfs_qgroup_reserve(root, write_bytes);
-	return ret;
-}
-
-/*
  * New check_data_free_space() with the ability for precise data reservation
  * Will replace old btrfs_check_data_free_space(), but for patch split,
  * add a new function first and then replace it.
  */
-int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	int ret;
@@ -4078,33 +4062,13 @@ int __btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 }
 
 /*
- * Called if we need to clear a data reservation for this inode.
- */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
-{
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_space_info *data_sinfo;
-
-	/* make sure bytes are sectorsize aligned */
-	bytes = ALIGN(bytes, root->sectorsize);
-
-	data_sinfo = root->fs_info->data_sinfo;
-	spin_lock(&data_sinfo->lock);
-	WARN_ON(data_sinfo->bytes_may_use < bytes);
-	data_sinfo->bytes_may_use -= bytes;
-	trace_btrfs_space_reservation(root->fs_info, "space_info",
-				      data_sinfo->flags, bytes, 0);
-	spin_unlock(&data_sinfo->lock);
-}
-
-/*
  * Called if we need to clear a data reservation for this inode
  * Normally in a error case.
  *
  * This one will handle the per-inode data rsv map for accurate reserved
  * space framework.
  */
-void __btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_space_info *data_sinfo;
@@ -5723,7 +5687,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 }
 
 /**
- * __btrfs_delalloc_reserve_space - reserve data and metadata space for
+ * btrfs_delalloc_reserve_space - reserve data and metadata space for
  * delalloc
  * @inode: inode we're writing to
  * @start: start of the range we are writing to
@@ -5747,53 +5711,21 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
  * Return 0 for success
  * Return <0 for error (-ENOSPC or -EDQUOT)
  */
-int __btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
 {
 	int ret;
 
-	ret = __btrfs_check_data_free_space(inode, start, len);
+	ret = btrfs_check_data_free_space(inode, start, len);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(inode, len);
 	if (ret < 0)
-		__btrfs_free_reserved_data_space(inode, start, len);
+		btrfs_free_reserved_data_space(inode, start, len);
 	return ret;
 }
 
 /**
- * btrfs_delalloc_reserve_space - reserve data and metadata space for delalloc
- * @inode: inode we're writing to
- * @num_bytes: the number of bytes we want to allocate
- *
- * This will do the following things
- *
- * o reserve space in the data space info for num_bytes
- * o reserve space in the metadata space info based on number of outstanding
- *   extents and how much csums will be needed
- * o add to the inodes ->delalloc_bytes
- * o add it to the fs_info's delalloc inodes list.
- *
- * This will return 0 for success and -ENOSPC if there is no space left.
- */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
-{
-	int ret;
-
-	ret = btrfs_check_data_free_space(inode, num_bytes, num_bytes);
-	if (ret)
-		return ret;
-
-	ret = btrfs_delalloc_reserve_metadata(inode, num_bytes);
-	if (ret) {
-		btrfs_free_reserved_data_space(inode, num_bytes);
-		return ret;
-	}
-
-	return 0;
-}
-
-/**
- * __btrfs_delalloc_release_space - release data and metadata space for delalloc
+ * btrfs_delalloc_release_space - release data and metadata space for delalloc
  * @inode: inode we're releasing space for
  * @start: start position of the space already reserved
  * @len: the length of the space already reserved
@@ -5807,29 +5739,10 @@ int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes)
  * list if there are no delalloc bytes left.
  * Also it will handle the qgroup reserved space.
  */
-void __btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
 {
 	btrfs_delalloc_release_metadata(inode, len);
-	__btrfs_free_reserved_data_space(inode, start, len);
-}
-
-/**
- * btrfs_delalloc_release_space - release data and metadata space for delalloc
- * @inode: inode we're releasing space for
- * @num_bytes: the number of bytes we want to free up
- *
- * This must be matched with a call to btrfs_delalloc_reserve_space.  This is
- * called in the case that we don't need the metadata AND data reservations
- * anymore.  So if there is an error or we insert an inline extent.
- *
- * This function will release the metadata space that was not used and will
- * decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
- * list if there are no delalloc bytes left.
- */
-void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes)
-{
-	btrfs_delalloc_release_metadata(inode, num_bytes);
-	btrfs_free_reserved_data_space(inode, num_bytes);
+	btrfs_free_reserved_data_space(inode, start, len);
 }
 
 static int update_block_group(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index bf4d5fb..c97b24f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1532,7 +1532,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				goto reserve_metadata;
 			}
 		}
-		ret = __btrfs_check_data_free_space(inode, pos, write_bytes);
+		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
 		if (ret < 0)
 			break;
 
@@ -1540,8 +1540,8 @@ reserve_metadata:
 		ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
 		if (ret) {
 			if (!only_release_metadata)
-				__btrfs_free_reserved_data_space(inode, pos,
-							         write_bytes);
+				btrfs_free_reserved_data_space(inode, pos,
+							       write_bytes);
 			else
 				btrfs_end_write_no_snapshoting(root);
 			break;
@@ -1611,7 +1611,7 @@ again:
 				btrfs_delalloc_release_metadata(inode,
 								release_bytes);
 			else
-				__btrfs_delalloc_release_space(inode, pos,
+				btrfs_delalloc_release_space(inode, pos,
 							     release_bytes);
 		}
 
@@ -1664,8 +1664,7 @@ again:
 			btrfs_end_write_no_snapshoting(root);
 			btrfs_delalloc_release_metadata(inode, release_bytes);
 		} else {
-			__btrfs_delalloc_release_space(inode, pos,
-						       release_bytes);
+			btrfs_delalloc_release_space(inode, pos, release_bytes);
 		}
 	}
 
@@ -2711,8 +2710,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 out:
 	mutex_unlock(&inode->i_mutex);
 	/* Let go of our reservation. */
-	__btrfs_free_reserved_data_space(inode, alloc_start,
-					 alloc_end - alloc_start);
+	btrfs_free_reserved_data_space(inode, alloc_start,
+				       alloc_end - alloc_start);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 78bc09c..767a605 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -488,17 +488,17 @@ again:
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_CACHE_SIZE;
 
-	ret = __btrfs_delalloc_reserve_space(inode, 0, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
 	if (ret)
 		goto out_put;
 
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		__btrfs_delalloc_release_space(inode, 0, prealloc);
+		btrfs_delalloc_release_space(inode, 0, prealloc);
 		goto out_put;
 	}
-	__btrfs_free_reserved_data_space(inode, 0, prealloc);
+	btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index df3cff2..ef0f8cd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1766,8 +1766,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
 		    && do_list && !(state->state & EXTENT_NORESERVE))
-			__btrfs_free_reserved_data_space(inode, state->start,
-							 len);
+			btrfs_free_reserved_data_space(inode, state->start,
+						       len);
 
 		__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
 				     root->fs_info->delalloc_batch);
@@ -1986,8 +1986,8 @@ again:
 		goto again;
 	}
 
-	ret = __btrfs_delalloc_reserve_space(inode, page_start,
-					     PAGE_CACHE_SIZE);
+	ret = btrfs_delalloc_reserve_space(inode, page_start,
+					   PAGE_CACHE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
@@ -4583,7 +4583,7 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
-	ret = __btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode,
 			round_down(from, PAGE_CACHE_SIZE), PAGE_CACHE_SIZE);
 	if (ret)
 		goto out;
@@ -4591,7 +4591,7 @@ int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
 again:
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
-		__btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode,
 				round_down(from, PAGE_CACHE_SIZE),
 				PAGE_CACHE_SIZE);
 		ret = -ENOMEM;
@@ -4661,8 +4661,8 @@ again:
 
 out_unlock:
 	if (ret)
-		__btrfs_delalloc_release_space(inode, page_start,
-					       PAGE_CACHE_SIZE);
+		btrfs_delalloc_release_space(inode, page_start,
+					     PAGE_CACHE_SIZE);
 	unlock_page(page);
 	page_cache_release(page);
 out:
@@ -7593,7 +7593,7 @@ unlock:
 			spin_unlock(&BTRFS_I(inode)->lock);
 		}
 
-		__btrfs_free_reserved_data_space(inode, start, len);
+		btrfs_free_reserved_data_space(inode, start, len);
 		WARN_ON(dio_data->reserve < len);
 		dio_data->reserve -= len;
 		current->journal_info = dio_data;
@@ -8386,7 +8386,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			mutex_unlock(&inode->i_mutex);
 			relock = true;
 		}
-		ret = __btrfs_delalloc_reserve_space(inode, offset, count);
+		ret = btrfs_delalloc_reserve_space(inode, offset, count);
 		if (ret)
 			goto out;
 		dio_data.outstanding_extents = div64_u64(count +
@@ -8415,11 +8415,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
-				__btrfs_delalloc_release_space(inode, offset,
-						dio_data.reserve);
+				btrfs_delalloc_release_space(inode, offset,
+							     dio_data.reserve);
 		} else if (ret >= 0 && (size_t)ret < count)
-			__btrfs_delalloc_release_space(inode, offset,
-						       count - (size_t)ret);
+			btrfs_delalloc_release_space(inode, offset,
+						     count - (size_t)ret);
 	}
 out:
 	if (wakeup)
@@ -8630,8 +8630,8 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	page_start = page_offset(page);
 	page_end = page_start + PAGE_CACHE_SIZE - 1;
 
-	ret = __btrfs_delalloc_reserve_space(inode, page_start,
-					     PAGE_CACHE_SIZE);
+	ret = btrfs_delalloc_reserve_space(inode, page_start,
+					   PAGE_CACHE_SIZE);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
@@ -8726,7 +8726,7 @@ out_unlock:
 	}
 	unlock_page(page);
 out:
-	__btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
+	btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	return ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3158b0f..97aee25 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1119,7 +1119,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = __btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode,
 			start_index << PAGE_CACHE_SHIFT,
 			page_cnt << PAGE_CACHE_SHIFT);
 	if (ret)
@@ -1210,7 +1210,7 @@ again:
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents++;
 		spin_unlock(&BTRFS_I(inode)->lock);
-		__btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode,
 				start_index << PAGE_CACHE_SHIFT,
 				(page_cnt - i_done) << PAGE_CACHE_SHIFT);
 	}
@@ -1237,7 +1237,7 @@ out:
 		unlock_page(pages[i]);
 		page_cache_release(pages[i]);
 	}
-	__btrfs_delalloc_release_space(inode,
+	btrfs_delalloc_release_space(inode,
 			start_index << PAGE_CACHE_SHIFT,
 			page_cnt << PAGE_CACHE_SHIFT);
 	return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b5d1850..646a867 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2030,7 +2030,7 @@ out:
 	return ret;
 }
 
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
+static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
 {
 	struct btrfs_root *quota_root;
 	struct btrfs_qgroup *qgroup;
@@ -2163,6 +2163,11 @@ out:
 	spin_unlock(&fs_info->qgroup_lock);
 }
 
+static inline void qgroup_free(struct btrfs_root *root, u64 num_bytes)
+{
+	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
+					 num_bytes);
+}
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
 {
 	if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
@@ -2512,7 +2517,7 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
 			&changeset);
 	if (ret < 0)
 		goto cleanup;
-	ret = btrfs_qgroup_reserve(root, changeset.bytes_changed);
+	ret = qgroup_reserve(root, changeset.bytes_changed);
 	if (ret < 0)
 		goto cleanup;
 
@@ -2548,8 +2553,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 		goto out;
 
 	if (free)
-		btrfs_qgroup_free(BTRFS_I(inode)->root,
-				  changeset.bytes_changed);
+		qgroup_free(BTRFS_I(inode)->root, changeset.bytes_changed);
 out:
 	ulist_free(changeset.range_changed);
 	return ret;
@@ -2599,7 +2603,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes)
 		return 0;
 
 	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
-	ret = btrfs_qgroup_reserve(root, num_bytes);
+	ret = qgroup_reserve(root, num_bytes);
 	if (ret < 0)
 		return ret;
 	atomic_add(num_bytes, &root->qgroup_meta_rsv);
@@ -2616,7 +2620,7 @@ void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
 	reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
 	if (reserved == 0)
 		return;
-	btrfs_qgroup_free(root, reserved);
+	qgroup_free(root, reserved);
 }
 
 void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
@@ -2627,5 +2631,5 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
 	BUG_ON(num_bytes != round_down(num_bytes, root->nodesize));
 	WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
 	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
-	btrfs_qgroup_free(root, num_bytes);
+	qgroup_free(root, num_bytes);
 }
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 7d1c87c..adb03da 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -71,15 +71,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
 			 struct btrfs_qgroup_inherit *inherit);
-int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
 void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 			       u64 ref_root, u64 num_bytes);
-static inline void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes)
-{
-	return btrfs_qgroup_free_refroot(root->fs_info, root->objectid,
-					 num_bytes);
-}
-
 /*
  * TODO: Add proper trace point for it, as btrfs_qgroup_free() is
  * called by everywhere, can't provide good trace for delayed ref case.
@@ -89,7 +82,6 @@ static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
 {
 	btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
 }
-
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f4621c5..f823276 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3034,8 +3034,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	mutex_lock(&inode->i_mutex);
 
-	ret = __btrfs_check_data_free_space(inode, cluster->start,
-					    cluster->end + 1 - cluster->start);
+	ret = btrfs_check_data_free_space(inode, cluster->start,
+					  cluster->end + 1 - cluster->start);
 	if (ret)
 		goto out;
 
@@ -3056,8 +3056,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 			break;
 		nr++;
 	}
-	__btrfs_free_reserved_data_space(inode, cluster->start,
-					 cluster->end + 1 - cluster->start);
+	btrfs_free_reserved_data_space(inode, cluster->start,
+				       cluster->end + 1 - cluster->start);
 out:
 	mutex_unlock(&inode->i_mutex);
 	return ret;
-- 
2.6.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 15/21] btrfs: qgroup: Add handler for NOCOW and inline
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (13 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 14/21] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 16/21] btrfs: Add handler for invalidate page Qu Wenruo
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

For the NOCOW and inline cases, no delayed_ref is created for them, so
we should free their reserved data space at the proper time
(finish_ordered_io() for NOCOW and cow_file_range_inline() for inline).
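The lifecycle difference can be sketched as a toy model in Python (not kernel code; the class and method names are illustrative, only the semantics mirror btrfs_qgroup_reserve_data()/free_data()):

```python
# Toy model of per-inode qgroup data reservation (illustrative only).
# A write reserves its range; a COW write hands the reservation to a
# delayed ref, which frees it at commit time.  NOCOW and inline writes
# never create a delayed ref, so they must free the space explicitly.

class QgroupRsv:
    def __init__(self):
        self.reserved = 0          # bytes currently reserved
        self.delayed_refs = []     # byte counts handed off to delayed refs

    def reserve_data(self, length):
        self.reserved += length

    def free_data(self, length):
        freed = min(length, self.reserved)
        self.reserved -= freed
        return freed

    def finish_write_cow(self, length):
        # COW: reservation is released to a delayed ref, freed at commit.
        self.reserved -= length
        self.delayed_refs.append(length)

    def finish_write_nocow(self, length):
        # NOCOW/inline: no delayed ref exists, so free here (this patch).
        self.free_data(length)

rsv = QgroupRsv()
rsv.reserve_data(4096)
rsv.finish_write_nocow(4096)
assert rsv.reserved == 0           # no leaked reservation
```

Without the explicit free in the NOCOW branch, `reserved` would stay non-zero forever, which is exactly the leak this patch closes.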

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Newly introduced
v3:
  None
---
 fs/btrfs/extent-tree.c |  7 ++++++-
 fs/btrfs/inode.c       | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1dadbba..765f7e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4056,7 +4056,12 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 	if (ret < 0)
 		return ret;
 
-	/* Use new btrfs_qgroup_reserve_data to reserve precious data space */
+	/*
+	 * Use new btrfs_qgroup_reserve_data to reserve precious data space
+	 *
+	 * TODO: Find a good method to avoid reserve data space for NOCOW
+	 * range, but don't impact performance on quota disable case.
+	 */
 	ret = btrfs_qgroup_reserve_data(inode, start, len);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef0f8cd..2fe95f0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -310,6 +310,13 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 	btrfs_delalloc_release_metadata(inode, end + 1 - start);
 	btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
 out:
+	/*
+	 * Don't forget to free the reserved space: an inlined extent
+	 * won't count as a data extent, so free the space directly here.
+	 * And at reserve time, it's always aligned to page size, so
+	 * just free one page here.
+	 */
+	btrfs_qgroup_free_data(inode, 0, PAGE_CACHE_SIZE);
 	btrfs_free_path(path);
 	btrfs_end_transaction(trans, root);
 	return ret;
@@ -2832,6 +2839,14 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 
 	if (test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags)) {
 		BUG_ON(!list_empty(&ordered_extent->list)); /* Logic error */
+
+		/*
+		 * For mwrite(mmap + memset to write) case, we still reserve
+		 * space for NOCOW range.
+		 * As NOCOW won't cause a new delayed ref, just free the space
+		 */
+		btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
+				       ordered_extent->len);
 		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
 		if (nolock)
 			trans = btrfs_join_transaction_nolock(root);
-- 
2.6.1



* [PATCH v3 16/21] btrfs: Add handler for invalidate page
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (14 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 15/21] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 17/21] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_invalidatepage() and its variant evict_inode_truncate_pages(),
some pages will never reach disk.
In that case, their reserved space will be neither released nor freed
by finish_ordered_io() nor by the delayed_ref handler.

So we must free their qgroup reserved space, or we will leak reserved
space again.

This patch calls btrfs_qgroup_free_data() from invalidatepage() and its
variant evict_inode_truncate_pages().

Due to the nature of the new btrfs_qgroup_reserve/free_data() API, a
range will only be reserved or freed once, so for pages which are
already flushed to disk, their reserved space has already been released
and will be freed by the delayed_ref handler.

Double freeing won't be a problem.
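Why the double free is harmless can be sketched as a toy model in Python (not kernel code; a set of page indexes stands in for the EXTENT_QGROUP_RESERVED bits in the io_tree):

```python
# Toy model of the EXTENT_QGROUP_RESERVED semantics (illustrative only):
# reservations are tracked per range, so a free only refunds what is
# actually still marked reserved.  A second free of the same range,
# e.g. from invalidatepage after finish_ordered_io, is a no-op.

class RangeRsv:
    def __init__(self):
        self.reserved = set()          # set of reserved page indexes

    def reserve(self, pages):
        newly = set(pages) - self.reserved
        self.reserved |= newly
        return len(newly)              # only newly-set pages are charged

    def free(self, pages):
        hit = set(pages) & self.reserved
        self.reserved -= hit
        return len(hit)                # only still-set pages are refunded

r = RangeRsv()
assert r.reserve(range(0, 4)) == 4     # first reserve charges 4 pages
assert r.reserve(range(0, 4)) == 0     # double reserve charges nothing
assert r.free(range(0, 4)) == 4        # free refunds all 4 pages
assert r.free(range(0, 4)) == 0        # double free is a harmless no-op
```

This is the property the commit message relies on: callers never need to know whether a range was already released elsewhere.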

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Newly introduced
v3:
  None
---
 fs/btrfs/inode.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2fe95f0..fee54b6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5075,6 +5075,18 @@ static void evict_inode_truncate_pages(struct inode *inode)
 		spin_unlock(&io_tree->lock);
 
 		lock_extent_bits(io_tree, start, end, 0, &cached_state);
+
+		/*
+		 * If it still has the DELALLOC flag, the extent didn't reach disk,
+		 * and its reserved space won't be freed by delayed_ref.
+		 * So we need to free its reserved space here.
+		 * (Refer to comment in btrfs_invalidatepage, case 2)
+		 *
+		 * Note, end is the bytenr of last byte, so we need + 1 here.
+		 */
+		if (state->state & EXTENT_DELALLOC)
+			btrfs_qgroup_free_data(inode, start, end - start + 1);
+
 		clear_extent_bit(io_tree, start, end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
 				 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
@@ -8592,6 +8604,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		}
 	}
 
+	/*
+	 * Qgroup reserved space handler
+	 * Page here will be either
+	 * 1) Already written to disk
+	 *    In this case, its reserved space is released from data rsv map
+	 *    and will be freed by delayed_ref handler finally.
+	 *    So even we call qgroup_free_data(), it won't decrease reserved
+	 *    space.
+	 * 2) Not written to disk
+	 *    This means the reserved space should be freed here.
+	 */
+	btrfs_qgroup_free_data(inode, page_start, PAGE_CACHE_SIZE);
 	if (!inode_evicting) {
 		clear_extent_bit(tree, page_start, page_end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
-- 
2.6.1



* [PATCH v3 17/21] btrfs: qgroup: Add new trace point for qgroup data reserve
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (15 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 16/21] btrfs: Add handler for invalidate page Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 18/21] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Now each qgroup data reserve operation has its own ftrace event for
better debugging.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Newly introduced
v3:
  None
---
 fs/btrfs/qgroup.c            |  11 ++++-
 fs/btrfs/qgroup.h            |   8 +++
 include/trace/events/btrfs.h | 113 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 646a867..b3b485a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2511,10 +2511,12 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
 
 	changeset.bytes_changed = 0;
 	changeset.range_changed = ulist_alloc(GFP_NOFS);
-
 	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
 			start + len -1, EXTENT_QGROUP_RESERVED, GFP_NOFS,
 			&changeset);
+	trace_btrfs_qgroup_reserve_data(inode, start, len,
+					changeset.bytes_changed,
+					QGROUP_RESERVE);
 	if (ret < 0)
 		goto cleanup;
 	ret = qgroup_reserve(root, changeset.bytes_changed);
@@ -2539,6 +2541,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 				       int free)
 {
 	struct extent_changeset changeset;
+	int trace_op = QGROUP_RELEASE;
 	int ret;
 
 	changeset.bytes_changed = 0;
@@ -2552,8 +2555,12 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 	if (ret < 0)
 		goto out;
 
-	if (free)
+	if (free) {
 		qgroup_free(BTRFS_I(inode)->root, changeset.bytes_changed);
+		trace_op = QGROUP_FREE;
+	}
+	trace_btrfs_qgroup_release_data(inode, start, len,
+					changeset.bytes_changed, trace_op);
 out:
 	ulist_free(changeset.range_changed);
 	return ret;
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index adb03da..686b60f 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -33,6 +33,13 @@ struct btrfs_qgroup_extent_record {
 	struct ulist *old_roots;
 };
 
+/*
+ * For qgroup event trace points only
+ */
+#define QGROUP_RESERVE		(1<<0)
+#define QGROUP_RELEASE		(1<<1)
+#define QGROUP_FREE		(1<<2)
+
 int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 		       struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
@@ -81,6 +88,7 @@ static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
 						 u64 ref_root, u64 num_bytes)
 {
 	btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
+	trace_btrfs_qgroup_free_delayed_ref(ref_root, num_bytes);
 }
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 0b73af9..b4473da 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1117,6 +1117,119 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
 	TP_ARGS(wq)
 );
 
+DECLARE_EVENT_CLASS(btrfs__qgroup_data_map,
+
+	TP_PROTO(struct inode *inode, u64 free_reserved),
+
+	TP_ARGS(inode, free_reserved),
+
+	TP_STRUCT__entry(
+		__field(	u64,		rootid		)
+		__field(	unsigned long,	ino		)
+		__field(	u64,		free_reserved	)
+	),
+
+	TP_fast_assign(
+		__entry->rootid		=	BTRFS_I(inode)->root->objectid;
+		__entry->ino		=	inode->i_ino;
+		__entry->free_reserved	=	free_reserved;
+	),
+
+	TP_printk("rootid=%llu, ino=%lu, free_reserved=%llu",
+		  __entry->rootid, __entry->ino, __entry->free_reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_init_data_rsv_map,
+
+	TP_PROTO(struct inode *inode, u64 free_reserved),
+
+	TP_ARGS(inode, free_reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_data_map, btrfs_qgroup_free_data_rsv_map,
+
+	TP_PROTO(struct inode *inode, u64 free_reserved),
+
+	TP_ARGS(inode, free_reserved)
+);
+
+#define BTRFS_QGROUP_OPERATIONS				\
+	{ QGROUP_RESERVE,	"reserve"	},	\
+	{ QGROUP_RELEASE,	"release"	},	\
+	{ QGROUP_FREE,		"free"		}
+
+DECLARE_EVENT_CLASS(btrfs__qgroup_rsv_data,
+
+	TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+	TP_ARGS(inode, start, len, reserved, op),
+
+	TP_STRUCT__entry(
+		__field(	u64,		rootid		)
+		__field(	unsigned long,	ino		)
+		__field(	u64,		start		)
+		__field(	u64,		len		)
+		__field(	u64,		reserved	)
+		__field(	int,		op		)
+	),
+
+	TP_fast_assign(
+		__entry->rootid		= BTRFS_I(inode)->root->objectid;
+		__entry->ino		= inode->i_ino;
+		__entry->start		= start;
+		__entry->len		= len;
+		__entry->reserved	= reserved;
+		__entry->op		= op;
+	),
+
+	TP_printk("root=%llu, ino=%lu, start=%llu, len=%llu, reserved=%llu, op=%s",
+		  __entry->rootid, __entry->ino, __entry->start, __entry->len,
+		  __entry->reserved,
+		  __print_flags((unsigned long)__entry->op, "",
+				BTRFS_QGROUP_OPERATIONS)
+	)
+);
+
+DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_reserve_data,
+
+	TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+	TP_ARGS(inode, start, len, reserved, op)
+);
+
+DEFINE_EVENT(btrfs__qgroup_rsv_data, btrfs_qgroup_release_data,
+
+	TP_PROTO(struct inode *inode, u64 start, u64 len, u64 reserved, int op),
+
+	TP_ARGS(inode, start, len, reserved, op)
+);
+
+DECLARE_EVENT_CLASS(btrfs__qgroup_delayed_ref,
+
+	TP_PROTO(u64 ref_root, u64 reserved),
+
+	TP_ARGS(ref_root, reserved),
+
+	TP_STRUCT__entry(
+		__field(	u64,		ref_root	)
+		__field(	u64,		reserved	)
+	),
+
+	TP_fast_assign(
+		__entry->ref_root	= ref_root;
+		__entry->reserved	= reserved;
+	),
+
+	TP_printk("root=%llu, reserved=%llu, op=free",
+		  __entry->ref_root, __entry->reserved)
+);
+
+DEFINE_EVENT(btrfs__qgroup_delayed_ref, btrfs_qgroup_free_delayed_ref,
+
+	TP_PROTO(u64 ref_root, u64 reserved),
+
+	TP_ARGS(ref_root, reserved)
+);
 #endif /* _TRACE_BTRFS_H */
 
 /* This part must be outside protection */
-- 
2.6.1



* [PATCH v3 18/21] btrfs: fallocate: Add support to accurate qgroup reserve
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (16 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 17/21] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 19/21] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Now fallocate does an accurate qgroup reserved space check, unlike the
old method, which always reserved the whole length of the range.

With this patch, fallocate will:
1) Iterate the desired range and mark it in the data rsv map
   Only ranges which are going to be allocated are recorded in the data
   rsv map and have their space reserved.
   Already allocated ranges (normal/prealloc extents) are skipped.
   Also, record the marked ranges into a new list for later use.

2) If 1) succeeded, do the real file extent allocation.
   At file extent allocation time, the corresponding range is removed
   from the data rsv map.
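The first pass above can be sketched as a toy model in Python (not kernel code; `collect_falloc_ranges` is a hypothetical helper that only mirrors the hole-walking and the adjacent-range merge done by add_falloc_range()):

```python
# Toy sketch of fallocate pass 1 (illustrative only): walk the target
# range, record only the holes (ranges needing allocation), and merge
# a hole that is adjacent to the previously recorded one.  Pass 2 would
# then do the real extent allocation per recorded range.

def collect_falloc_ranges(extents, start, end):
    """extents: sorted, non-overlapping (start, len) allocated ranges."""
    reserve_list = []
    cur = start
    for e_start, e_len in extents + [(end, 0)]:   # sentinel ends the walk
        if cur < e_start:                         # hole: needs reservation
            hole_start, hole_len = cur, min(e_start, end) - cur
            if reserve_list and sum(reserve_list[-1]) == hole_start:
                # adjacent to the previous hole: merge, as in
                # add_falloc_range()
                prev_start, prev_len = reserve_list.pop()
                hole_start, hole_len = prev_start, prev_len + hole_len
            reserve_list.append((hole_start, hole_len))
        cur = max(cur, e_start + e_len)
        if cur >= end:
            break
    return reserve_list

# fallocate [0, 64K) with an extent already present at [16K, 32K):
ranges = collect_falloc_ranges([(16384, 16384)], 0, 65536)
assert ranges == [(0, 16384), (32768, 32768)]
```

Only the two holes get recorded and reserved; the pre-existing extent in the middle is skipped, which is what makes the qgroup check accurate.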

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Fix comment typo
  Add missing cleanup for falloc list
v3:
  Fix a false error return, due to chunk_allocation may return >0, but
  incorrectly considered as an error.
---
 fs/btrfs/file.c | 161 ++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 117 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c97b24f..35bfabf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2545,17 +2545,61 @@ out_only_mutex:
 	return err;
 }
 
+/* Helper structure to record which range is already reserved */
+struct falloc_range {
+	struct list_head list;
+	u64 start;
+	u64 len;
+};
+
+/*
+ * Helper function to add falloc range
+ *
+ * Caller should have locked the larger range of extent containing
+ * [start, len)
+ */
+static int add_falloc_range(struct list_head *head, u64 start, u64 len)
+{
+	struct falloc_range *prev = NULL;
+	struct falloc_range *range = NULL;
+
+	if (list_empty(head))
+		goto insert;
+
+	/*
+	 * As fallocate iterate by bytenr order, we only need to check
+	 * the last range.
+	 */
+	prev = list_entry(head->prev, struct falloc_range, list);
+	if (prev->start + prev->len == start) {
+		prev->len += len;
+		return 0;
+	}
+insert:
+	range = kmalloc(sizeof(*range), GFP_NOFS);
+	if (!range)
+		return -ENOMEM;
+	range->start = start;
+	range->len = len;
+	list_add_tail(&range->list, head);
+	return 0;
+}
+
 static long btrfs_fallocate(struct file *file, int mode,
 			    loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
+	struct falloc_range *range;
+	struct falloc_range *tmp;
+	struct list_head reserve_list;
 	u64 cur_offset;
 	u64 last_byte;
 	u64 alloc_start;
 	u64 alloc_end;
 	u64 alloc_hint = 0;
 	u64 locked_end;
+	u64 actual_end = 0;
 	struct extent_map *em;
 	int blocksize = BTRFS_I(inode)->root->sectorsize;
 	int ret;
@@ -2571,14 +2615,12 @@ static long btrfs_fallocate(struct file *file, int mode,
 		return btrfs_punch_hole(inode, offset, len);
 
 	/*
-	 * Make sure we have enough space before we do the
-	 * allocation.
-	 * XXX: The behavior must be changed to do accurate check first
-	 * and then check data reserved space.
+	 * Only trigger disk allocation, don't trigger qgroup reserve
+	 *
+	 * For qgroup space, it will be checked later.
 	 */
-	ret = btrfs_check_data_free_space(inode, alloc_start,
-					  alloc_end - alloc_start);
-	if (ret)
+	ret = btrfs_alloc_data_chunk_ondemand(inode, alloc_end - alloc_start);
+	if (ret < 0)
 		return ret;
 
 	mutex_lock(&inode->i_mutex);
@@ -2586,6 +2628,13 @@ static long btrfs_fallocate(struct file *file, int mode,
 	if (ret)
 		goto out;
 
+	/*
+	 * TODO: Move these two operations after we have checked
+	 * accurate reserved space, or fallocate can still fail but
+	 * with page truncated or size expanded.
+	 *
+	 * But that's a minor problem and won't do much harm BTW.
+	 */
 	if (alloc_start > inode->i_size) {
 		ret = btrfs_cont_expand(inode, i_size_read(inode),
 					alloc_start);
@@ -2644,10 +2693,10 @@ static long btrfs_fallocate(struct file *file, int mode,
 		}
 	}
 
+	/* First, check if we exceed the qgroup limit */
+	INIT_LIST_HEAD(&reserve_list);
 	cur_offset = alloc_start;
 	while (1) {
-		u64 actual_end;
-
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
 		if (IS_ERR_OR_NULL(em)) {
@@ -2660,54 +2709,78 @@ static long btrfs_fallocate(struct file *file, int mode,
 		last_byte = min(extent_map_end(em), alloc_end);
 		actual_end = min_t(u64, extent_map_end(em), offset + len);
 		last_byte = ALIGN(last_byte, blocksize);
-
 		if (em->block_start == EXTENT_MAP_HOLE ||
 		    (cur_offset >= inode->i_size &&
 		     !test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
-			ret = btrfs_prealloc_file_range(inode, mode, cur_offset,
-							last_byte - cur_offset,
-							1 << inode->i_blkbits,
-							offset + len,
-							&alloc_hint);
-		} else if (actual_end > inode->i_size &&
-			   !(mode & FALLOC_FL_KEEP_SIZE)) {
-			struct btrfs_trans_handle *trans;
-			struct btrfs_root *root = BTRFS_I(inode)->root;
-
-			/*
-			 * We didn't need to allocate any more space, but we
-			 * still extended the size of the file so we need to
-			 * update i_size and the inode item.
-			 */
-			trans = btrfs_start_transaction(root, 1);
-			if (IS_ERR(trans)) {
-				ret = PTR_ERR(trans);
-			} else {
-				inode->i_ctime = CURRENT_TIME;
-				i_size_write(inode, actual_end);
-				btrfs_ordered_update_i_size(inode, actual_end,
-							    NULL);
-				ret = btrfs_update_inode(trans, root, inode);
-				if (ret)
-					btrfs_end_transaction(trans, root);
-				else
-					ret = btrfs_end_transaction(trans,
-								    root);
+			ret = add_falloc_range(&reserve_list, cur_offset,
+					       last_byte - cur_offset);
+			if (ret < 0) {
+				free_extent_map(em);
+				break;
 			}
+			ret = btrfs_qgroup_reserve_data(inode, cur_offset,
+					last_byte - cur_offset);
+			if (ret < 0)
+				break;
 		}
 		free_extent_map(em);
-		if (ret < 0)
-			break;
-
 		cur_offset = last_byte;
-		if (cur_offset >= alloc_end) {
-			ret = 0;
+		if (cur_offset >= alloc_end)
 			break;
+	}
+
+	/*
+	 * If ret is still 0, means we're OK to fallocate.
+	 * Or just cleanup the list and exit.
+	 */
+	list_for_each_entry_safe(range, tmp, &reserve_list, list) {
+		if (!ret)
+			ret = btrfs_prealloc_file_range(inode, mode,
+					range->start,
+					range->len, 1 << inode->i_blkbits,
+					offset + len, &alloc_hint);
+		list_del(&range->list);
+		kfree(range);
+	}
+	if (ret < 0)
+		goto out_unlock;
+
+	if (actual_end > inode->i_size &&
+	    !(mode & FALLOC_FL_KEEP_SIZE)) {
+		struct btrfs_trans_handle *trans;
+		struct btrfs_root *root = BTRFS_I(inode)->root;
+
+		/*
+		 * We didn't need to allocate any more space, but we
+		 * still extended the size of the file so we need to
+		 * update i_size and the inode item.
+		 */
+		trans = btrfs_start_transaction(root, 1);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+		} else {
+			inode->i_ctime = CURRENT_TIME;
+			i_size_write(inode, actual_end);
+			btrfs_ordered_update_i_size(inode, actual_end, NULL);
+			ret = btrfs_update_inode(trans, root, inode);
+			if (ret)
+				btrfs_end_transaction(trans, root);
+			else
+				ret = btrfs_end_transaction(trans, root);
 		}
 	}
+out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_NOFS);
 out:
+	/*
+	 * As we waited the extent range, the data_rsv_map must be empty
+	 * in the range, as written data range will be released from it.
+	 * And for preallocated extent, it will also be released when
+	 * its metadata is written.
+	 * So this is completely used as cleanup.
+	 */
+	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	mutex_unlock(&inode->i_mutex);
 	/* Let go of our reservation. */
 	btrfs_free_reserved_data_space(inode, alloc_start,
-- 
2.6.1



* [PATCH v3 19/21] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (17 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 18/21] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 20/21] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 21/21] btrfs: qgroup: Check if qgroup reserved space leaked Qu Wenruo
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

The current code always truncates the tailing page if alloc_start is
smaller than the inode size.

This behavior causes a lot of unneeded page-sized COW extents.

This patch avoids that problem.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  Newly introduced
v3:
  None
---
 fs/btrfs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 35bfabf..e59b123 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2640,7 +2640,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 					alloc_start);
 		if (ret)
 			goto out;
-	} else {
+	} else if (offset + len > inode->i_size) {
 		/*
 		 * If we are fallocating from the end of the file onward we
 		 * need to zero out the end of the page if i_size lands in the
-- 
2.6.1



* [PATCH v3 20/21] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (18 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 19/21] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  2015-10-13  2:20 ` [PATCH v3 21/21] btrfs: qgroup: Check if qgroup reserved space leaked Qu Wenruo
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

In clear_bit_hook(), qgroup reserved data is already handled quite
well, either released by finish_ordered_io() or by invalidatepage().

So calling btrfs_qgroup_free_data() here is completely redundant, and
since btrfs_qgroup_free_data() locks the io_tree, it can't be called
with the io_tree lock held anyway.

This patch adds a new function,
btrfs_free_reserved_data_space_noquota(), for clear_bit_hook() to use,
avoiding the deadlock.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v2:
  None
v3:
  Update commit message as now it will cause a deadlock instead of
  lockdep warning
---
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/extent-tree.c | 28 ++++++++++++++++++----------
 fs/btrfs/inode.c       |  4 ++--
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f20b901..3970426 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3455,6 +3455,8 @@ enum btrfs_reserve_flush_enum {
 int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
 int btrfs_alloc_data_chunk_ondemand(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
+					    u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 765f7e0..af221eb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4070,10 +4070,12 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
  * Called if we need to clear a data reservation for this inode
  * Normally in a error case.
  *
- * This one will handle the per-indoe data rsv map for accurate reserved
- * space framework.
+ * This one will *NOT* use accurate qgroup reserved space API, just for case
+ * which we can't sleep and is sure it won't affect qgroup reserved space.
+ * Like clear_bit_hook().
  */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
+					    u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_space_info *data_sinfo;
@@ -4083,13 +4085,6 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
 	      round_down(start, root->sectorsize);
 	start = round_down(start, root->sectorsize);
 
-	/*
-	 * Free any reserved qgroup data space first
-	 * As it will alloc memory, we can't do it with data sinfo
-	 * spinlock hold.
-	 */
-	btrfs_qgroup_free_data(inode, start, len);
-
 	data_sinfo = root->fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	if (WARN_ON(data_sinfo->bytes_may_use < len))
@@ -4101,6 +4096,19 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
 	spin_unlock(&data_sinfo->lock);
 }
 
+/*
+ * Called if we need to clear a data reservation for this inode
+ * Normally in a error case.
+ *
+ * This one will handle the per-inode data rsv map for accurate reserved
+ * space framework.
+ */
+void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+{
+	btrfs_free_reserved_data_space_noquota(inode, start, len);
+	btrfs_qgroup_free_data(inode, start, len);
+}
+
 static void force_metadata_allocation(struct btrfs_fs_info *info)
 {
 	struct list_head *head = &info->space_info;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fee54b6..39c9191 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1773,8 +1773,8 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
 		    && do_list && !(state->state & EXTENT_NORESERVE))
-			btrfs_free_reserved_data_space(inode, state->start,
-						       len);
+			btrfs_free_reserved_data_space_noquota(inode,
+					state->start, len);
 
 		__percpu_counter_add(&root->fs_info->delalloc_bytes, -len,
 				     root->fs_info->delalloc_batch);
-- 
2.6.1



* [PATCH v3 21/21] btrfs: qgroup: Check if qgroup reserved space leaked
  2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
                   ` (19 preceding siblings ...)
  2015-10-13  2:20 ` [PATCH v3 20/21] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
@ 2015-10-13  2:20 ` Qu Wenruo
  20 siblings, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-13  2:20 UTC (permalink / raw)
  To: linux-btrfs

Add a check at btrfs_destroy_inode() time to detect qgroup reserved
space leaks.
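The check can be sketched as a toy model in Python (not kernel code; `check_reserved_leak` is a hypothetical stand-in for btrfs_qgroup_check_reserved_leak() and its clear_record_extent_bits() call):

```python
# Toy sketch of the destroy-time leak check (illustrative only): at
# inode teardown, clear every range still marked reserved.  Anything
# cleared at this point was leaked, so warn about each range and
# return the byte count for the caller to refund via qgroup_free().

def check_reserved_leak(reserved_ranges, warn):
    """reserved_ranges: list of inclusive (start, end) byte ranges
    still flagged EXTENT_QGROUP_RESERVED at destroy time."""
    leaked = 0
    for start, end in reserved_ranges:
        warn(f"leaking qgroup reserved space, start: {start}, end: {end}")
        leaked += end - start + 1          # end is the last byte, so +1
    reserved_ranges.clear()                # analogue of clearing the bits
    return leaked

msgs = []
assert check_reserved_leak([(0, 4095)], msgs.append) == 4096
assert len(msgs) == 1
assert check_reserved_leak([], msgs.append) == 0   # clean inode: silent
```

On a correctly behaving inode the range list is empty, so the check costs one pass and emits nothing.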

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
v3:
  Separate from old btrfs_qgroup_free_data_rsv_map().
---
 fs/btrfs/inode.c  |  1 +
 fs/btrfs/qgroup.c | 32 ++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h |  1 +
 3 files changed, 34 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 39c9191..15d6ee0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9054,6 +9054,7 @@ void btrfs_destroy_inode(struct inode *inode)
 			btrfs_put_ordered_extent(ordered);
 		}
 	}
+	btrfs_qgroup_check_reserved_leak(inode);
 	inode_tree_del(inode);
 	btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
 free:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b3b485a..f452c85 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2640,3 +2640,35 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
 	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
 	qgroup_free(root, num_bytes);
 }
+
+/*
+ * Check qgroup reserved space leaking, normally at destroy inode
+ * time
+ */
+void btrfs_qgroup_check_reserved_leak(struct inode *inode)
+{
+	struct extent_changeset changeset;
+	struct ulist_node *unode;
+	struct ulist_iterator iter;
+	int ret;
+
+	changeset.bytes_changed = 0;
+	changeset.range_changed = ulist_alloc(GFP_NOFS);
+	if (WARN_ON(!changeset.range_changed))
+		return;
+
+	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
+			EXTENT_QGROUP_RESERVED, GFP_NOFS, &changeset);
+
+	WARN_ON(ret < 0);
+	if (WARN_ON(changeset.bytes_changed)) {
+		ULIST_ITER_INIT(&iter);
+		while ((unode = ulist_next(changeset.range_changed, &iter))) {
+			btrfs_warn(BTRFS_I(inode)->root->fs_info,
+				"leaking qgroup reserved space, ino: %lu, start: %llu, end: %llu",
+				inode->i_ino, unode->val, unode->aux);
+		}
+		qgroup_free(BTRFS_I(inode)->root, changeset.bytes_changed);
+	}
+	ulist_free(changeset.range_changed);
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 686b60f..ecb2c14 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -105,4 +105,5 @@ int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes);
 void btrfs_qgroup_free_meta_all(struct btrfs_root *root);
 void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes);
+void btrfs_qgroup_check_reserved_leak(struct inode *inode);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.6.1



* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-13  2:20 ` [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
@ 2015-10-25 14:39   ` Filipe Manana
  2015-10-26  1:27     ` Qu Wenruo
  2015-10-27  4:13     ` Qu Wenruo
  0 siblings, 2 replies; 35+ messages in thread
From: Filipe Manana @ 2015-10-25 14:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Add a new function, btrfs_add_delayed_qgroup_reserve(), to record
> how much space is reserved for that extent.
>
> As btrfs only accounts qgroup at run_delayed_refs() time, so newly
> allocated extent should keep the reserved space until then.
>
> So add needed function with related members to do it.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> v2:
>   None
> v3:
>   None
> ---
>  fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
>  fs/btrfs/delayed-ref.h | 14 ++++++++++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index ac3e81d..bd9b63b 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>         INIT_LIST_HEAD(&head_ref->ref_list);
>         head_ref->processing = 0;
>         head_ref->total_ref_mod = count_mod;
> +       head_ref->qgroup_reserved = 0;
> +       head_ref->qgroup_ref_root = 0;
>
>         /* Record qgroup extent info if provided */
>         if (qrecord) {
> @@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>         return 0;
>  }
>
> +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
> +                                    struct btrfs_trans_handle *trans,
> +                                    u64 ref_root, u64 bytenr, u64 num_bytes)
> +{
> +       struct btrfs_delayed_ref_root *delayed_refs;
> +       struct btrfs_delayed_ref_head *ref_head;
> +       int ret = 0;
> +
> +       if (!fs_info->quota_enabled || !is_fstree(ref_root))
> +               return 0;
> +
> +       delayed_refs = &trans->transaction->delayed_refs;
> +
> +       spin_lock(&delayed_refs->lock);
> +       ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
> +       if (!ref_head) {
> +               ret = -ENOENT;
> +               goto out;
> +       }

Hi Qu,

So while running btrfs/063 with qgroups enabled (I modified the test
to enable qgroups), I ran into this twice:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] ------------[ cut here ]------------
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
  W       4.3.0-rc5-btrfs-next-17+ #1
[169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
04/01/2014
[169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[169125.382167]  0000000000000000 ffff88007ef2bc28 ffffffff812566f4
ffff88007ef2bc70
[169125.383643]  ffff88007ef2bc60 ffffffff8104d0a6 ffffffffa03cac33
ffff8801f5ca6db0
[169125.385197]  ffff8802c6c7ee98 ffff880122bc1000 00000000fffffffe
ffff88007ef2bcc8
[169125.386691] Call Trace:
[169125.387194]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
[169125.388205]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[169125.389386]  [<ffffffffa03cac33>] ?
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.390837]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[169125.391839]  [<ffffffffa03d67bb>] ? unpin_extent_cache+0xbe/0xcc [btrfs]
[169125.392973]  [<ffffffffa03cac33>]
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.395714]  [<ffffffff8147c612>] ? _raw_spin_unlock_irqrestore+0x38/0x60
[169125.396888]  [<ffffffff81087d0b>] ? trace_hardirqs_off_caller+0x1f/0xb9
[169125.397986]  [<ffffffffa03cadec>] finish_ordered_fn+0x15/0x17 [btrfs]
[169125.399122]  [<ffffffffa03ec706>] normal_work_helper+0x14c/0x32a [btrfs]
[169125.400300]  [<ffffffffa03ec9e6>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[169125.401450]  [<ffffffff81063b23>] process_one_work+0x24a/0x4ac
[169125.402631]  [<ffffffff81064285>] worker_thread+0x206/0x2c2
[169125.403622]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
[169125.404693]  [<ffffffff8106904d>] kthread+0xef/0xf7
[169125.405727]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
[169125.406808]  [<ffffffff8147d10f>] ret_from_fork+0x3f/0x70
[169125.407834]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
[169125.408840] ---[ end trace 6ee4342a5722b119 ]---
[169125.409654] BTRFS: error (device sdc) in
btrfs_finish_ordered_io:2929: errno=-2 No such entry

So what you have here is racy:

btrfs_finish_ordered_io()
   joins existing transaction (or starts a new one)
   insert_reserved_file_extent()
      btrfs_alloc_reserved_file_extent() --> creates delayed ref

      ******* delayed refs are run: someone called
      btrfs_async_run_delayed_refs() from btrfs_end_transaction(),
      and the ref head is removed *******

      btrfs_add_delayed_qgroup_reserve() --> does not find the delayed
      ref head, returns -ENOENT, and finish_ordered_io aborts the
      current transaction...

A very tiny race, but...

thanks


> +       WARN_ON(ref_head->qgroup_reserved || ref_head->qgroup_ref_root);
> +       ref_head->qgroup_ref_root = ref_root;
> +       ref_head->qgroup_reserved = num_bytes;
> +out:
> +       spin_unlock(&delayed_refs->lock);
> +       return ret;
> +}
> +
>  int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>                                 struct btrfs_trans_handle *trans,
>                                 u64 bytenr, u64 num_bytes,
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 13fb5e6..d4c41e2 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -113,6 +113,17 @@ struct btrfs_delayed_ref_head {
>         int total_ref_mod;
>
>         /*
> +        * For qgroup reserved space freeing.
> +        *
> +        * ref_root and reserved will be recorded after
> +        * BTRFS_ADD_DELAYED_EXTENT is called.
> +        * And will be used to free reserved qgroup space at
> +        * run_delayed_refs() time.
> +        */
> +       u64 qgroup_ref_root;
> +       u64 qgroup_reserved;
> +
> +       /*
>          * when a new extent is allocated, it is just reserved in memory
>          * The actual extent isn't inserted into the extent allocation tree
>          * until the delayed ref is processed.  must_insert_reserved is
> @@ -242,6 +253,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>                                u64 owner, u64 offset, int action,
>                                struct btrfs_delayed_extent_op *extent_op,
>                                int no_quota);
> +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
> +                                    struct btrfs_trans_handle *trans,
> +                                    u64 ref_root, u64 bytenr, u64 num_bytes);
>  int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>                                 struct btrfs_trans_handle *trans,
>                                 u64 bytenr, u64 num_bytes,
> --
> 2.6.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-25 14:39   ` Filipe Manana
@ 2015-10-26  1:27     ` Qu Wenruo
  2015-10-27  4:13     ` Qu Wenruo
  1 sibling, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-26  1:27 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs



Filipe Manana wrote on 2015/10/25 14:39 +0000:
> On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> [...]
>
> Hi Qu,
>
> So while running btrfs/063, with qgroups enabled (I modified the test
> to enable qgroups), ran into this 2 times:

Thanks for the test.

I also want a way to enable quota for all the other btrfs/generic tests,
but I have no good idea other than modifying the test cases themselves.

Any good ideas?
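One low-effort option is a small helper sourced from the harness and called after each scratch mount. This is a sketch only, not an existing fstests API: the helper name is invented, and it assumes the usual fstests variables ($FSTYP, $SCRATCH_MNT, $BTRFS_UTIL_PROG) are set:

```shell
# Hypothetical helper, not an fstests API: turn quotas on right after
# the scratch fs is mounted, so every test exercises qgroup accounting.
_qgroup_enable_on_scratch()
{
	[ "$FSTYP" = "btrfs" ] || return 0
	$BTRFS_UTIL_PROG quota enable "$SCRATCH_MNT" || return 1
	# Wait for the initial rescan so accounting is consistent
	# before the test starts dirtying data.
	$BTRFS_UTIL_PROG quota rescan -w "$SCRATCH_MNT"
}
```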
>
> [169125.246506] BTRFS info (device sdc): disk space caching is enabled
> [169125.363164] ------------[ cut here ]------------
> [169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
> [169125.367702] BTRFS: Transaction aborted (error -2)
> [169125.368830] Modules linked in: btrfs dm_flakey dm_mod
> crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
> lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
> psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
> serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
> ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
> scsi_mod e1000 virtio [last unloaded: btrfs]
> [169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
>    W       4.3.0-rc5-btrfs-next-17+ #1
> [169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
> 04/01/2014
> [169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [169125.382167]  0000000000000000 ffff88007ef2bc28 ffffffff812566f4
> ffff88007ef2bc70
> [169125.383643]  ffff88007ef2bc60 ffffffff8104d0a6 ffffffffa03cac33
> ffff8801f5ca6db0
> [169125.385197]  ffff8802c6c7ee98 ffff880122bc1000 00000000fffffffe
> ffff88007ef2bcc8
> [169125.386691] Call Trace:
> [169125.387194]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
> [169125.388205]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
> [169125.389386]  [<ffffffffa03cac33>] ?
> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
> [169125.390837]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
> [169125.391839]  [<ffffffffa03d67bb>] ? unpin_extent_cache+0xbe/0xcc [btrfs]
> [169125.392973]  [<ffffffffa03cac33>]
> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
> [169125.395714]  [<ffffffff8147c612>] ? _raw_spin_unlock_irqrestore+0x38/0x60
> [169125.396888]  [<ffffffff81087d0b>] ? trace_hardirqs_off_caller+0x1f/0xb9
> [169125.397986]  [<ffffffffa03cadec>] finish_ordered_fn+0x15/0x17 [btrfs]
> [169125.399122]  [<ffffffffa03ec706>] normal_work_helper+0x14c/0x32a [btrfs]
> [169125.400300]  [<ffffffffa03ec9e6>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
> [169125.401450]  [<ffffffff81063b23>] process_one_work+0x24a/0x4ac
> [169125.402631]  [<ffffffff81064285>] worker_thread+0x206/0x2c2
> [169125.403622]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
> [169125.404693]  [<ffffffff8106904d>] kthread+0xef/0xf7
> [169125.405727]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
> [169125.406808]  [<ffffffff8147d10f>] ret_from_fork+0x3f/0x70
> [169125.407834]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
> [169125.408840] ---[ end trace 6ee4342a5722b119 ]---
> [169125.409654] BTRFS: error (device sdc) in
> btrfs_finish_ordered_io:2929: errno=-2 No such entry
>
> So what you have here is racy:
>
> btrfs_finish_ordered_io()
>     joins existing transaction (or starts a new one)
>     insert_reserved_file_extent()
>        btrfs_alloc_reserved_file_extent() --> creates delayed ref
>
>        ******* delayed refs are run, someone called
> btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref head
> is removed ******
>
>        btrfs_add_delayed_qgroup_reserve() --> does not find delayed ref
> head, returns -ENOENT and finish_ordered_io aborts current
> transaction...
>
> A very tiny race, but...

Oh, an aborted transaction, that's quite a big problem.

The original idea behind introducing btrfs_add_delayed_qgroup_reserve() 
was to keep all the related qgroup code in qgroup.c, but that turned out 
to be too idealistic.

I'll add a new patch that modifies btrfs_add_delayed_data_ref(): drop 
its last parameter, no_quota, and add a new reserved parameter, so the 
reserved bytes can be recorded at delayed-ref insertion time.

Thanks,
Qu
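The direction described above can be sketched as an illustrative diff against the header quoted earlier (this is not the final patch; the exact parameter position is assumed, and unrelated parameters are elided). Recording the reservation while the head is being inserted, under delayed_refs->lock, removes the second lookup and with it the -ENOENT window:

```diff
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ (illustrative only)
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
                                struct btrfs_trans_handle *trans,
                                ...
-                               u64 owner, u64 offset, int action,
-                               struct btrfs_delayed_extent_op *extent_op,
-                               int no_quota);
+                               u64 owner, u64 offset, u64 reserved,
+                               int action,
+                               struct btrfs_delayed_extent_op *extent_op);
```

With that shape, add_delayed_ref_head() can set head_ref->qgroup_ref_root and head_ref->qgroup_reserved directly while it already holds delayed_refs->lock, so no caller ever has to find the head again after it may have been processed.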



* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-25 14:39   ` Filipe Manana
  2015-10-26  1:27     ` Qu Wenruo
@ 2015-10-27  4:13     ` Qu Wenruo
  2015-10-27  5:14       ` Chris Mason
  2015-10-27  9:22       ` Filipe Manana
  1 sibling, 2 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-27  4:13 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs



Filipe Manana wrote on 2015/10/25 14:39 +0000:
> On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> [...]
>
> Hi Qu,
>
> So while running btrfs/063, with qgroups enabled (I modified the test
> to enable qgroups), ran into this 2 times:
>
> [...]
> [169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
>    W       4.3.0-rc5-btrfs-next-17+ #1

Hi Filipe,

Although this is not related to the bug report, I'm curious about your 
testing kernel.

Are you testing integration-4.4 from Chris's repo, or a 4.3-rc from the 
mainline repo with my qgroup reserve patchset applied?

integration-4.4 has already merged the qgroup reserve patchset, but it 
is hitting some strange bugs there, such as over-decrementing data 
sinfo->bytes_may_use, mainly in the generic/127 test case.

With the qgroup reserve patchset rebased onto integration-4.3 instead 
(all my old tests were based on that), generic/127 shows no problem at 
all.

Thanks,
Qu



* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  4:13     ` Qu Wenruo
@ 2015-10-27  5:14       ` Chris Mason
  2015-10-27  5:48         ` Qu Wenruo
  2015-10-27  9:22       ` Filipe Manana
  1 sibling, 1 reply; 35+ messages in thread
From: Chris Mason @ 2015-10-27  5:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: fdmanana, linux-btrfs

On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:
> 
> 
> Filipe Manana wrote on 2015/10/25 14:39 +0000:
> >On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> > [...]
> 
> Hi Filipe,
> 
> Although not related to the bug report, I'm a little interested in your
> testing kernel.
> 
> Are you testing integration-4.4 from Chris repo?
> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
> 
> Although integration-4.4 already merged qgroup reserve patchset, but it's
> causing some strange bug like over decrease data sinfo->bytes_may_use,
> mainly in generic/127 testcase.
> 
> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> old tests based on that), no generic/127 problem at all.

Did I mismerge things?

-chris


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  5:14       ` Chris Mason
@ 2015-10-27  5:48         ` Qu Wenruo
  2015-10-27  6:12           ` Chris Mason
  0 siblings, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2015-10-27  5:48 UTC (permalink / raw)
  To: Chris Mason, fdmanana, linux-btrfs



Chris Mason wrote on 2015/10/27 01:14 -0400:
> On Tue, Oct 27, 2015 at 12:13:11PM +0800, Qu Wenruo wrote:
>>
>>
>> Filipe Manana wrote on 2015/10/25 14:39 +0000:
>>> [...]
>>
>> Hi Filipe,
>>
>> Although not related to the bug report, I'm a little interested in your
>> testing kernel.
>>
>> Are you testing integration-4.4 from Chris repo?
>> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>>
>> Although integration-4.4 already merged qgroup reserve patchset, but it's
>> causing some strange bug like over decrease data sinfo->bytes_may_use,
>> mainly in generic/127 testcase.
>>
>> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>> old tests based on that), no generic/127 problem at all.
>
> Did I mismerge things?
>
> -chris
>
Not sure yet.

But at least some patches in 4.3 are not in integration-4.4, like the 
following patch:
btrfs: Avoid truncate tailing page if fallocate range doesn't exceed 
inode size

I'll continue testing and bisecting to see what triggers the strange 
WARN_ON() in integration-4.4.
------
Oct 27 11:05:00 vmware kernel: WARNING: CPU: 4 PID: 13711 at 
fs/btrfs//extent-tree.c:4171 
btrfs_free_reserved_data_space_noquota+0x175/0x190 [btrfs]()
Oct 27 11:05:00 vmware kernel: Modules linked in: btrfs(OE) fuse vfat 
msdos fat xfs binfmt_misc bridge stp llc dm_snapshot dm_bufio dm_flakey 
loop iptable_nat nf_conntrack_ipv4 nf
_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw 
iptable_filter ip_tables dm_mirror dm_region_hash dm_log xor dm_mod 
crc32c_intel vmw_balloon raid6_pq nfsd
vmw_vmci i2c_piix4 shpchp auth_rpcgss acpi_cpufreq nfs_acl lockd grace 
sunrpc ext4 mbcache jbd2 sd_mod vmwgfx drm_kms_helper syscopyarea 
sysfillrect sysimgblt fb_sys_fops ttm drm
ata_piix vmxnet3 libata vmw_pvscsi floppy [last unloaded: btrfs]
Oct 27 11:05:00 vmware kernel: CPU: 4 PID: 13711 Comm: fsx Tainted: G 
      W  OE   4.3.0-rc5+ #5
Oct 27 11:05:00 vmware kernel: Hardware name: VMware, Inc. VMware 
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/16/2013
Oct 27 11:05:00 vmware kernel: 0000000000000000 000000002caf2373 
ffff88021f63b760 ffffffff81302e73
Oct 27 11:05:00 vmware kernel: 0000000000000000 ffff88021f63b798 
ffffffff810600f6 ffff88021c6e9000
Oct 27 11:05:00 vmware kernel: ffff88022a4abc00 0000000000006000 
ffff88021f63b8ac ffff8800827a0820
Oct 27 11:05:00 vmware kernel: Call Trace:
Oct 27 11:05:00 vmware kernel: [<ffffffff81302e73>] dump_stack+0x4b/0x68
Oct 27 11:05:00 vmware kernel: [<ffffffff810600f6>] 
warn_slowpath_common+0x86/0xc0
Oct 27 11:05:00 vmware kernel: [<ffffffff8106023a>] 
warn_slowpath_null+0x1a/0x20
Oct 27 11:05:00 vmware kernel: [<ffffffffa04d6b25>] 
btrfs_free_reserved_data_space_noquota+0x175/0x190 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa04f6b8d>] 
btrfs_clear_bit_hook+0x2ed/0x360 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa05113ad>] 
clear_state_bit+0x5d/0x1d0 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa051174a>] 
__clear_extent_bit+0x22a/0x3d0 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa051283a>] 
extent_clear_unlock_delalloc+0x7a/0x2c0 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffff8161a547>] ? 
_raw_spin_unlock+0x27/0x40
Oct 27 11:05:00 vmware kernel: [<ffffffffa050d665>] ? 
__btrfs_add_ordered_extent+0x245/0x3b0 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa04f934b>] 
cow_file_range+0x27b/0x430 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa04fa112>] 
run_delalloc_range+0x102/0x400 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa0513152>] 
writepage_delalloc.isra.35+0x112/0x170 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa0514235>] 
__extent_writepage+0xf5/0x370 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa0514817>] 
extent_write_cache_pages.isra.32.constprop.47+0x367/0x420 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa051662c>] 
extent_writepages+0x5c/0x90 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa04f73b0>] ? 
btrfs_real_readdir+0x570/0x570 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa04f4a38>] 
btrfs_writepages+0x28/0x30 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffff81191501>] do_writepages+0x21/0x40
Oct 27 11:05:00 vmware kernel: [<ffffffff81185ad0>] 
__filemap_fdatawrite_range+0x80/0xb0
Oct 27 11:05:00 vmware kernel: [<ffffffff81185bc3>] 
filemap_fdatawrite_range+0x13/0x20
Oct 27 11:05:00 vmware kernel: [<ffffffffa05095c0>] 
btrfs_fdatawrite_range+0x20/0x50 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa0509609>] 
start_ordered_ops+0x19/0x30 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffffa05096a3>] 
btrfs_sync_file+0x83/0x420 [btrfs]
Oct 27 11:05:00 vmware kernel: [<ffffffff811bd9e0>] ? SyS_msync+0x90/0x1f0
Oct 27 11:05:00 vmware kernel: [<ffffffff8122f7cd>] 
vfs_fsync_range+0x3d/0xb0
Oct 27 11:05:00 vmware kernel: [<ffffffff811bdac1>] SyS_msync+0x171/0x1f0
Oct 27 11:05:00 vmware kernel: [<ffffffff8161af17>] 
entry_SYSCALL_64_fastpath+0x12/0x6f
------

At least this won't cause anything wrong, as I enhanced the existing 
WARN_ON() in the old btrfs_free_reserved_data_space() to handle the 
underflow case quite well.
But it still needs investigation, as it seems to be a regression.

Maybe there are some other hidden bugs in my qgroup patchset... :(

Thanks,
Qu

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  5:48         ` Qu Wenruo
@ 2015-10-27  6:12           ` Chris Mason
  2015-10-27  7:26             ` Qu Wenruo
  2015-10-27  9:05             ` Qu Wenruo
  0 siblings, 2 replies; 35+ messages in thread
From: Chris Mason @ 2015-10-27  6:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: fdmanana, linux-btrfs

On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
> >>Are you testing integration-4.4 from Chris repo?
> >>Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
> >>
> >>Although integration-4.4 already merged qgroup reserve patchset, but it's
> >>causing some strange bug like over decrease data sinfo->bytes_may_use,
> >>mainly in generic/127 testcase.
> >>
> >>But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> >>old tests based on that), no generic/127 problem at all.
> >
> >Did I mismerge things?
> >
> >-chris
> >
> Not sure yet.
> 
> But at least some patches in 4.3 is not in integration-4.4, like the
> following patch:
> btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
> size

Have you tried testing integration-4.4 merged with current Linus git?

-chris


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  6:12           ` Chris Mason
@ 2015-10-27  7:26             ` Qu Wenruo
  2015-10-27  9:05             ` Qu Wenruo
  1 sibling, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-27  7:26 UTC (permalink / raw)
  To: Chris Mason, fdmanana, linux-btrfs



Chris Mason wrote on 2015/10/27 02:12 -0400:
> On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
>>>> Are you testing integration-4.4 from Chris repo?
>>>> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>>>>
>>>> Although integration-4.4 already merged qgroup reserve patchset, but it's
>>>> causing some strange bug like over decrease data sinfo->bytes_may_use,
>>>> mainly in generic/127 testcase.
>>>>
>>>> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>>>> old tests based on that), no generic/127 problem at all.
>>>
>>> Did I mismerge things?
>>>
>>> -chris
>>>
>> Not sure yet.
>>
>> But at least some patches in 4.3 is not in integration-4.4, like the
>> following patch:
>> btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
>> size
>
> Have you tried testing integration-4.4 merged with current Linus git?
>
> -chris
>
Nice advice, compiling now.

But even after rebasing my local old branch (based on 7d35199e15b82a4d1a200, 
with the 21 patches) onto 4.3-rc7, the result got screwed up...

If things get crazy, I'm afraid we would need to revert the whole qgroup 
patchset from integration-4.4. :(

Thanks,
Qu


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  6:12           ` Chris Mason
  2015-10-27  7:26             ` Qu Wenruo
@ 2015-10-27  9:05             ` Qu Wenruo
  2015-10-27 11:34               ` Chris Mason
  1 sibling, 1 reply; 35+ messages in thread
From: Qu Wenruo @ 2015-10-27  9:05 UTC (permalink / raw)
  To: Chris Mason, fdmanana, linux-btrfs



Chris Mason wrote on 2015/10/27 02:12 -0400:
> On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
>>>> Are you testing integration-4.4 from Chris repo?
>>>> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>>>>
>>>> Although integration-4.4 already merged qgroup reserve patchset, but it's
>>>> causing some strange bug like over decrease data sinfo->bytes_may_use,
>>>> mainly in generic/127 testcase.
>>>>
>>>> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>>>> old tests based on that), no generic/127 problem at all.
>>>
>>> Did I mismerge things?
>>>
>>> -chris
>>>
>> Not sure yet.
>>
>> But at least some patches in 4.3 is not in integration-4.4, like the
>> following patch:
>> btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
>> size
>
> Have you tried testing integration-4.4 merged with current Linus git?
>
> -chris
>
Integration-4.4 merged with Linus' master also fails. :(

The currently known working branches are all based on 4.3-integration (4.2-rc5):
https://github.com/adam900710/linux/tree/qgroup_reserve_good

I tried 4.3-rc5 and 4.3-rc7; all fail with a kernel warning in generic/137.

And due to the huge difference, I'm afraid it will take quite some time to 
find the root cause...

Thanks,
Qu


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  4:13     ` Qu Wenruo
  2015-10-27  5:14       ` Chris Mason
@ 2015-10-27  9:22       ` Filipe Manana
  1 sibling, 0 replies; 35+ messages in thread
From: Filipe Manana @ 2015-10-27  9:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Oct 27, 2015 at 4:13 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>
> Filipe Manana wrote on 2015/10/25 14:39 +0000:
>>
>> On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> wrote:
>>>
>>> Add new function btrfs_add_delayed_qgroup_reserve() function to record
>>> how much space is reserved for that extent.
>>>
>>> As btrfs only accounts qgroup at run_delayed_refs() time, so newly
>>> allocated extent should keep the reserved space until then.
>>>
>>> So add needed function with related members to do it.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>> v2:
>>>    None
>>> v3:
>>>    None
>>> ---
>>>   fs/btrfs/delayed-ref.c | 29 +++++++++++++++++++++++++++++
>>>   fs/btrfs/delayed-ref.h | 14 ++++++++++++++
>>>   2 files changed, 43 insertions(+)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index ac3e81d..bd9b63b 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>>>          INIT_LIST_HEAD(&head_ref->ref_list);
>>>          head_ref->processing = 0;
>>>          head_ref->total_ref_mod = count_mod;
>>> +       head_ref->qgroup_reserved = 0;
>>> +       head_ref->qgroup_ref_root = 0;
>>>
>>>          /* Record qgroup extent info if provided */
>>>          if (qrecord) {
>>> @@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info
>>> *fs_info,
>>>          return 0;
>>>   }
>>>
>>> +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
>>> +                                    struct btrfs_trans_handle *trans,
>>> +                                    u64 ref_root, u64 bytenr, u64
>>> num_bytes)
>>> +{
>>> +       struct btrfs_delayed_ref_root *delayed_refs;
>>> +       struct btrfs_delayed_ref_head *ref_head;
>>> +       int ret = 0;
>>> +
>>> +       if (!fs_info->quota_enabled || !is_fstree(ref_root))
>>> +               return 0;
>>> +
>>> +       delayed_refs = &trans->transaction->delayed_refs;
>>> +
>>> +       spin_lock(&delayed_refs->lock);
>>> +       ref_head = find_ref_head(&delayed_refs->href_root, bytenr, 0);
>>> +       if (!ref_head) {
>>> +               ret = -ENOENT;
>>> +               goto out;
>>> +       }
>>
>>
>> Hi Qu,
>>
>> So while running btrfs/063, with qgroups enabled (I modified the test
>> to enable qgroups), ran into this 2 times:
>>
>> [169125.246506] BTRFS info (device sdc): disk space caching is enabled
>> [169125.363164] ------------[ cut here ]------------
>> [169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
>> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
>> [169125.367702] BTRFS: Transaction aborted (error -2)
>> [169125.368830] Modules linked in: btrfs dm_flakey dm_mod
>> crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
>> lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
>> psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
>> serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
>> ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
>> scsi_mod e1000 virtio [last unloaded: btrfs]
>> [169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
>>    W       4.3.0-rc5-btrfs-next-17+ #1
>
>
> Hi Filipe,

Hi Qu,

>
> Although not related to the bug report, I'm a little interested in your
> testing kernel.
>
> Are you testing integration-4.4 from Chris repo?

Yes, I got that from Chris' integration-4.4 branch.

> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>
> Although integration-4.4 already merged qgroup reserve patchset, but it's
> causing some strange bug like over decrease data sinfo->bytes_may_use,
> mainly in generic/127 testcase.

Haven't hit that one yet.

>
> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> old tests based on that), no generic/127 problem at all.
>
> Thanks,
> Qu
>
>
>> [169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
>> 04/01/2014
>> [169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper
>> [btrfs]
>> [169125.382167]  0000000000000000 ffff88007ef2bc28 ffffffff812566f4
>> ffff88007ef2bc70
>> [169125.383643]  ffff88007ef2bc60 ffffffff8104d0a6 ffffffffa03cac33
>> ffff8801f5ca6db0
>> [169125.385197]  ffff8802c6c7ee98 ffff880122bc1000 00000000fffffffe
>> ffff88007ef2bcc8
>> [169125.386691] Call Trace:
>> [169125.387194]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
>> [169125.388205]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
>> [169125.389386]  [<ffffffffa03cac33>] ?
>> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
>> [169125.390837]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
>> [169125.391839]  [<ffffffffa03d67bb>] ? unpin_extent_cache+0xbe/0xcc
>> [btrfs]
>> [169125.392973]  [<ffffffffa03cac33>]
>> btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
>> [169125.395714]  [<ffffffff8147c612>] ?
>> _raw_spin_unlock_irqrestore+0x38/0x60
>> [169125.396888]  [<ffffffff81087d0b>] ?
>> trace_hardirqs_off_caller+0x1f/0xb9
>> [169125.397986]  [<ffffffffa03cadec>] finish_ordered_fn+0x15/0x17 [btrfs]
>> [169125.399122]  [<ffffffffa03ec706>] normal_work_helper+0x14c/0x32a
>> [btrfs]
>> [169125.400300]  [<ffffffffa03ec9e6>] btrfs_endio_write_helper+0x12/0x14
>> [btrfs]
>> [169125.401450]  [<ffffffff81063b23>] process_one_work+0x24a/0x4ac
>> [169125.402631]  [<ffffffff81064285>] worker_thread+0x206/0x2c2
>> [169125.403622]  [<ffffffff8106407f>] ? rescuer_thread+0x2cb/0x2cb
>> [169125.404693]  [<ffffffff8106904d>] kthread+0xef/0xf7
>> [169125.405727]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
>> [169125.406808]  [<ffffffff8147d10f>] ret_from_fork+0x3f/0x70
>> [169125.407834]  [<ffffffff81068f5e>] ? kthread_parkme+0x24/0x24
>> [169125.408840] ---[ end trace 6ee4342a5722b119 ]---
>> [169125.409654] BTRFS: error (device sdc) in
>> btrfs_finish_ordered_io:2929: errno=-2 No such entry
>>
>> So what you have here is racy:
>>
>> btrfs_finish_ordered_io()
>>     joins existing transaction (or starts a new one)
>>     insert_reserved_file_extent()
>>        btrfs_alloc_reserved_file_extent() --> creates delayed ref
>>
>>        ******* delayed refs are run, someone called
>> btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref head
>> is removed ******
>>
>>        btrfs_add_delayed_qgroup_reserve() --> does not find delayed ref
>> head, returns -ENOENT and finish_ordered_io aborts current
>> transaction...
>>
>> A very tiny race, but...
>>
>> thanks
>>
>>
>>> +       WARN_ON(ref_head->qgroup_reserved || ref_head->qgroup_ref_root);
>>> +       ref_head->qgroup_ref_root = ref_root;
>>> +       ref_head->qgroup_reserved = num_bytes;
>>> +out:
>>> +       spin_unlock(&delayed_refs->lock);
>>> +       return ret;
>>> +}
>>> +
>>>   int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>>>                                  struct btrfs_trans_handle *trans,
>>>                                  u64 bytenr, u64 num_bytes,
>>> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
>>> index 13fb5e6..d4c41e2 100644
>>> --- a/fs/btrfs/delayed-ref.h
>>> +++ b/fs/btrfs/delayed-ref.h
>>> @@ -113,6 +113,17 @@ struct btrfs_delayed_ref_head {
>>>          int total_ref_mod;
>>>
>>>          /*
>>> +        * For qgroup reserved space freeing.
>>> +        *
>>> +        * ref_root and reserved will be recorded after
>>> +        * BTRFS_ADD_DELAYED_EXTENT is called.
>>> +        * And will be used to free reserved qgroup space at
>>> +        * run_delayed_refs() time.
>>> +        */
>>> +       u64 qgroup_ref_root;
>>> +       u64 qgroup_reserved;
>>> +
>>> +       /*
>>>           * when a new extent is allocated, it is just reserved in memory
>>>           * The actual extent isn't inserted into the extent allocation
>>> tree
>>>           * until the delayed ref is processed.  must_insert_reserved is
>>> @@ -242,6 +253,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info
>>> *fs_info,
>>>                                 u64 owner, u64 offset, int action,
>>>                                 struct btrfs_delayed_extent_op
>>> *extent_op,
>>>                                 int no_quota);
>>> +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
>>> +                                    struct btrfs_trans_handle *trans,
>>> +                                    u64 ref_root, u64 bytenr, u64
>>> num_bytes);
>>>   int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>>>                                  struct btrfs_trans_handle *trans,
>>>                                  u64 bytenr, u64 num_bytes,
>>> --
>>> 2.6.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27  9:05             ` Qu Wenruo
@ 2015-10-27 11:34               ` Chris Mason
  2015-10-28  0:25                 ` Qu Wenruo
  2015-10-28 13:36                 ` Holger Hoffstätte
  0 siblings, 2 replies; 35+ messages in thread
From: Chris Mason @ 2015-10-27 11:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: fdmanana, linux-btrfs

On Tue, Oct 27, 2015 at 05:05:56PM +0800, Qu Wenruo wrote:
> 
> 
> Chris Mason wrote on 2015/10/27 02:12 -0400:
> >On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
> >>>>Are you testing integration-4.4 from Chris repo?
> >>>>Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
> >>>>
> >>>>Although integration-4.4 already merged qgroup reserve patchset, but it's
> >>>>causing some strange bug like over decrease data sinfo->bytes_may_use,
> >>>>mainly in generic/127 testcase.
> >>>>
> >>>>But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> >>>>old tests based on that), no generic/127 problem at all.
> >>>
> >>>Did I mismerge things?
> >>>
> >>>-chris
> >>>
> >>Not sure yet.
> >>
> >>But at least some patches in 4.3 is not in integration-4.4, like the
> >>following patch:
> >>btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
> >>size
> >
> >Have you tried testing integration-4.4 merged with current Linus git?
> >
> >-chris
> >
> Integration-4.4 merged with Linus' master also fails. :(
> 
> Current known working branches are all based on 4.3-integration(4.2-rc5):
> https://github.com/adam900710/linux/tree/qgroup_reserve_good
> 
> Tried 4.3-rc5 and 4.3-rc7, all fails with kernel warning in generic/137.
> 
> And due to the huge difference, I'm afraid it won't take a short time to
> find the root cause...

Ok, this is the top merge commit in integration:

commit a9e6d153563d2ed69c6cd7fb4fa5ce4ca7c712eb
Merge: 56fa9d0 0584f71
Author: Chris Mason <clm@fb.com>
Date:   Wed Oct 21 19:00:38 2015 -0700

    Merge branch 'allocator-fixes' into for-linus-4.4

Please try commit 56fa9d0, which doesn't have Josef's allocator fixes.
It's possible there is a conflict with your changes in there.

-chris



* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27 11:34               ` Chris Mason
@ 2015-10-28  0:25                 ` Qu Wenruo
  2015-10-28 13:36                 ` Holger Hoffstätte
  1 sibling, 0 replies; 35+ messages in thread
From: Qu Wenruo @ 2015-10-28  0:25 UTC (permalink / raw)
  To: Chris Mason, fdmanana, linux-btrfs



Chris Mason wrote on 2015/10/27 07:34 -0400:
> On Tue, Oct 27, 2015 at 05:05:56PM +0800, Qu Wenruo wrote:
>>
>>
>> Chris Mason wrote on 2015/10/27 02:12 -0400:
>>> On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
>>>>>> Are you testing integration-4.4 from Chris repo?
>>>>>> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>>>>>>
>>>>>> Although integration-4.4 already merged qgroup reserve patchset, but it's
>>>>>> causing some strange bug like over decrease data sinfo->bytes_may_use,
>>>>>> mainly in generic/127 testcase.
>>>>>>
>>>>>> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>>>>>> old tests based on that), no generic/127 problem at all.
>>>>>
>>>>> Did I mismerge things?
>>>>>
>>>>> -chris
>>>>>
>>>> Not sure yet.
>>>>
>>>> But at least some patches in 4.3 is not in integration-4.4, like the
>>>> following patch:
>>>> btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
>>>> size
>>>
>>> Have you tried testing integration-4.4 merged with current Linus git?
>>>
>>> -chris
>>>
>> Integration-4.4 merged with Linus' master also fails. :(
>>
>> Current known working branches are all based on 4.3-integration(4.2-rc5):
>> https://github.com/adam900710/linux/tree/qgroup_reserve_good
>>
>> Tried 4.3-rc5 and 4.3-rc7, all fails with kernel warning in generic/137.
>>
>> And due to the huge difference, I'm afraid it won't take a short time to
>> find the root cause...
>
> Ok, this is the top merge commit in integration:
>
> commit a9e6d153563d2ed69c6cd7fb4fa5ce4ca7c712eb
> Merge: 56fa9d0 0584f71
> Author: Chris Mason <clm@fb.com>
> Date:   Wed Oct 21 19:00:38 2015 -0700
>
>      Merge branch 'allocator-fixes' into for-linus-4.4
>
> Please try commit 56fa9d0, which doesn't have Josef's allocator fixes.
> It's possible there is a conflict with your changes in there.
>
> -chris
>
Tried it, still getting the warning.

I'd better investigate it now, and forget about why it's OK in 4.3-integration...

Thanks,
Qu


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-27 11:34               ` Chris Mason
  2015-10-28  0:25                 ` Qu Wenruo
@ 2015-10-28 13:36                 ` Holger Hoffstätte
  2015-10-29  6:29                   ` Chris Mason
  1 sibling, 1 reply; 35+ messages in thread
From: Holger Hoffstätte @ 2015-10-28 13:36 UTC (permalink / raw)
  To: Chris Mason, Qu Wenruo, fdmanana, linux-btrfs

On Tue, Oct 27, 2015 at 12:34 PM, Chris Mason <clm@fb.com> wrote:
> On Tue, Oct 27, 2015 at 05:05:56PM +0800, Qu Wenruo wrote:
>>
>>
>> Chris Mason wrote on 2015/10/27 02:12 -0400:
>> >On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
>> >>>>Are you testing integration-4.4 from Chris repo?
>> >>>>Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>> >>>>
>> >>>>Although integration-4.4 already merged qgroup reserve patchset, but it's
>> >>>>causing some strange bug like over decrease data sinfo->bytes_may_use,
>> >>>>mainly in generic/127 testcase.
>> >>>>
>> >>>>But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>> >>>>old tests based on that), no generic/127 problem at all.
>> >>>
>> >>>Did I mismerge things?
>> >>>
>> >>>-chris
>> >>>
>> >>Not sure yet.
>> >>
>> >>But at least some patches in 4.3 is not in integration-4.4, like the
>> >>following patch:
>> >>btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
>> >>size
>> >
>> >Have you tried testing integration-4.4 merged with current Linus git?

Chris, something definitely went wrong with the 4.4-integration
branch, and it's not the point where you merged from Josef. Mainline
has 0f6925fa2907df58496cabc33fa4677c635e2223 ("btrfs: Avoid truncate
tailing page if fallocate range doesn't exceed inode size"), and that
commit just doesn't exist in 4.4-integration any more. Neither did any
merges touch file.c, so it seems this just got lost for some reason
(rebase? forced push?).
It's difficult to say what else might have gone missing.

-h


* Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref
  2015-10-28 13:36                 ` Holger Hoffstätte
@ 2015-10-29  6:29                   ` Chris Mason
  0 siblings, 0 replies; 35+ messages in thread
From: Chris Mason @ 2015-10-29  6:29 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Qu Wenruo, fdmanana, linux-btrfs

On Wed, Oct 28, 2015 at 02:36:42PM +0100, Holger Hoffstätte wrote:
> On Tue, Oct 27, 2015 at 12:34 PM, Chris Mason <clm@fb.com> wrote:
> > On Tue, Oct 27, 2015 at 05:05:56PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> Chris Mason wrote on 2015/10/27 02:12 -0400:
> >> >On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
> >> >>>>Are you testing integration-4.4 from Chris repo?
> >> >>>>Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
> >> >>>>
> >> >>>>Although integration-4.4 already merged qgroup reserve patchset, but it's
> >> >>>>causing some strange bug like over decrease data sinfo->bytes_may_use,
> >> >>>>mainly in generic/127 testcase.
> >> >>>>
> >> >>>>But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
> >> >>>>old tests based on that), no generic/127 problem at all.
> >> >>>
> >> >>>Did I mismerge things?
> >> >>>
> >> >>>-chris
> >> >>>
> >> >>Not sure yet.
> >> >>
> >> >>But at least some patches in 4.3 is not in integration-4.4, like the
> >> >>following patch:
> >> >>btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
> >> >>size
> >> >
> >> >Have you tried testing integration-4.4 merged with current Linus git?
> 
> Chris, something went definitely wrong with the 4.4-integration
> branch, and it's not the point where you merged from Josef. Mainline
> has: 0f6925fa2907df58496cabc33fa4677c635e2223 ("btrfs: Avoid truncate
> tailing page if fallocate range doesn't exceed inode size"), and that
> commit just doesn't exist in 4.4-integration any more. Neither did any
> merges touch file.c, so it
> seems this just got lost for some reason (rebase? forced push?).
> It's difficult to say what else might have gone missing.

Hi Holger,

integration-4.4 is based on 4.3-rc5, and it doesn't include any of the
btrfs commits that went in after rc5.  So if you want the latest commits
from 4.3, you just need to merge integration-4.4 with a more recent
Linus rc.

This isn't completely intuitive ;)  I could merge in 4.3-rc7, but for the
trees that I send to Linus, he prefers I not add extra merges unless it
solves some dependency (like a new API, or highly critical bug).

So when I test integration, I test it merged into Linus' latest rc, but
I apply patches on top of the older base.  It makes the resulting graph
of merges look much nicer when Linus pulls from me, and if you scroll
through the commits with git log or gitweb, it's clearer where the
new commits are.

-chris



end of thread, other threads:[~2015-10-29  6:29 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-13  2:20 [PATCH v3 00/21] Rework btrfs qgroup reserved space framework Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 01/21] btrfs: extent_io: Introduce needed structure for recoding set/clear bits Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 02/21] btrfs: extent_io: Introduce new function set_record_extent_bits Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 03/21] btrfs: extent_io: Introduce new function clear_record_extent_bits() Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 04/21] btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 05/21] btrfs: qgroup: Introduce functions to release/free qgroup reserve data space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref Qu Wenruo
2015-10-25 14:39   ` Filipe Manana
2015-10-26  1:27     ` Qu Wenruo
2015-10-27  4:13     ` Qu Wenruo
2015-10-27  5:14       ` Chris Mason
2015-10-27  5:48         ` Qu Wenruo
2015-10-27  6:12           ` Chris Mason
2015-10-27  7:26             ` Qu Wenruo
2015-10-27  9:05             ` Qu Wenruo
2015-10-27 11:34               ` Chris Mason
2015-10-28  0:25                 ` Qu Wenruo
2015-10-28 13:36                 ` Holger Hoffstätte
2015-10-29  6:29                   ` Chris Mason
2015-10-27  9:22       ` Filipe Manana
2015-10-13  2:20 ` [PATCH v3 07/21] btrfs: delayed_ref: release and free qgroup reserved at proper timing Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 08/21] btrfs: qgroup: Introduce new functions to reserve/free metadata Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 09/21] btrfs: qgroup: Use new metadata reservation Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 10/21] btrfs: extent-tree: Add new version of btrfs_check_data_free_space and btrfs_free_reserved_data_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 11/21] btrfs: extent-tree: Switch to new check_data_free_space and free_reserved_data_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 12/21] btrfs: extent-tree: Add new version of btrfs_delalloc_reserve/release_space Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 13/21] btrfs: extent-tree: Switch to new delalloc space reserve and release Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 14/21] btrfs: qgroup: Cleanup old inaccurate facilities Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 15/21] btrfs: qgroup: Add handler for NOCOW and inline Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 16/21] btrfs: Add handler for invalidate page Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 17/21] btrfs: qgroup: Add new trace point for qgroup data reserve Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 18/21] btrfs: fallocate: Add support to accurate qgroup reserve Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 19/21] btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 20/21] btrfs: qgroup: Avoid calling btrfs_free_reserved_data_space in clear_bit_hook Qu Wenruo
2015-10-13  2:20 ` [PATCH v3 21/21] btrfs: qgroup: Check if qgroup reserved space leaked Qu Wenruo
