* [PATCH v2 0/9] Qgroup fixes
@ 2017-03-13  7:52 Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space Qu Wenruo
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

Patchset can be fetched from my github:
https://github.com/adam900710/linux.git qgroup_fixes

The new base is v4.11-rc1; there were only minor conflicts, all easy to handle.

This patchset includes:
1) Fix for the inode_cache mount option
   Although hardly anyone really cares about the inode_cache mount
   option, it has been corrupting qgroup numbers for a long time.
   It breaks not only qgroup tests that use golden output, but also
   new test cases that use btrfsck to verify qgroups.

2) Fix for the btrfs/104 kernel warning
   This is caused by enabling quota while dirty buffers have not yet
   been written to disk.

   Fixed by checking the EXTENT_QGROUP_RESERVED flag instead of just
   decreasing qgroup->reserved.

3) Fix qgroup underflow caused by freeing ranges not reserved by the
   caller
   Mainly exposed by Chandan on PPC64.

   It's possible for a buffered write to fail while reserving
   metadata, in which case the error path releases reserved space for
   both data and metadata.

   However, the data range may already have been reserved by another
   process writing the same page.

   In that case, the reserved data space can be freed twice, once by
   the error path and once by the normal write routine, causing an
   underflow.

   Fixed by checking whether the data range was reserved by ourselves,
   and only freeing it if so.

Update since 2016/12/09:
  Rebased onto the latest for-linux-4.11.

  Added missing Reviewed-by and Tested-by tags.

  Added more comments to btrfs_qgroup_reserve_data() about error
  handling.

  Added more comments to qgroup_free_reserved_data() for later
  enhancement (not a functional enhancement).

v2:
  Added a Reviewed-by tag to the 2nd patch.
  Updated the first patch to follow the new trace point standard.

Qu Wenruo (9):
  btrfs: qgroup: Add trace point for qgroup reserved space
  btrfs: qgroup: Re-arrange tracepoint timing to co-operate with
    reserved space tracepoint
  btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount
    option
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
    data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
    write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
    reserved ranges

 fs/btrfs/ctree.h             |  12 +-
 fs/btrfs/extent-tree.c       |  31 ++++--
 fs/btrfs/extent_io.h         |  14 ++-
 fs/btrfs/file.c              |  46 +++++---
 fs/btrfs/inode-map.c         |   6 +-
 fs/btrfs/inode.c             |  64 +++++++----
 fs/btrfs/ioctl.c             |  11 +-
 fs/btrfs/qgroup.c            | 258 ++++++++++++++++++++++++++-----------------
 fs/btrfs/qgroup.h            |  58 ++++++++--
 fs/btrfs/relocation.c        |  13 ++-
 fs/btrfs/transaction.c       |  21 ++--
 include/trace/events/btrfs.h |  44 ++++++++
 12 files changed, 396 insertions(+), 182 deletions(-)

-- 
2.12.0




^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-30 12:49   ` David Sterba
  2017-03-13  7:52 ` [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint Qu Wenruo
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

Introduce the following trace points:
qgroup_update_reserve
qgroup_meta_reserve

These trace points are handy for tracing problems related to qgroup
reserved space.

Also export the btrfs_qgroup structure (moving its definition to
qgroup.h), as we now pass the structure directly to the trace points,
so it needs to be visible to them.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c            | 52 +++++++-------------------------------------
 fs/btrfs/qgroup.h            | 44 +++++++++++++++++++++++++++++++++++++
 include/trace/events/btrfs.h | 44 +++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a5da750c1087..68a7305b0a46 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -47,50 +47,6 @@
  *  - check all ioctl parameters
  */
 
-/*
- * one struct for each qgroup, organized in fs_info->qgroup_tree.
- */
-struct btrfs_qgroup {
-	u64 qgroupid;
-
-	/*
-	 * state
-	 */
-	u64 rfer;	/* referenced */
-	u64 rfer_cmpr;	/* referenced compressed */
-	u64 excl;	/* exclusive */
-	u64 excl_cmpr;	/* exclusive compressed */
-
-	/*
-	 * limits
-	 */
-	u64 lim_flags;	/* which limits are set */
-	u64 max_rfer;
-	u64 max_excl;
-	u64 rsv_rfer;
-	u64 rsv_excl;
-
-	/*
-	 * reservation tracking
-	 */
-	u64 reserved;
-
-	/*
-	 * lists
-	 */
-	struct list_head groups;  /* groups this group is member of */
-	struct list_head members; /* groups that are members of this group */
-	struct list_head dirty;   /* dirty groups */
-	struct rb_node node;	  /* tree of qgroups */
-
-	/*
-	 * temp variables for accounting operations
-	 * Refer to qgroup_shared_accounting() for details.
-	 */
-	u64 old_refcnt;
-	u64 new_refcnt;
-};
-
 static void btrfs_qgroup_update_old_refcnt(struct btrfs_qgroup *qg, u64 seq,
 					   int mod)
 {
@@ -1075,6 +1031,7 @@ static int __qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 	qgroup->excl += sign * num_bytes;
 	qgroup->excl_cmpr += sign * num_bytes;
 	if (sign > 0) {
+		trace_qgroup_update_reserve(fs_info, qgroup, -(s64)num_bytes);
 		if (WARN_ON(qgroup->reserved < num_bytes))
 			report_reserved_underflow(fs_info, qgroup, num_bytes);
 		else
@@ -1100,6 +1057,8 @@ static int __qgroup_excl_accounting(struct btrfs_fs_info *fs_info,
 		WARN_ON(sign < 0 && qgroup->excl < num_bytes);
 		qgroup->excl += sign * num_bytes;
 		if (sign > 0) {
+			trace_qgroup_update_reserve(fs_info, qgroup,
+						    -(s64)num_bytes);
 			if (WARN_ON(qgroup->reserved < num_bytes))
 				report_reserved_underflow(fs_info, qgroup,
 							  num_bytes);
@@ -2424,6 +2383,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce)
 
 		qg = unode_aux_to_qgroup(unode);
 
+		trace_qgroup_update_reserve(fs_info, qg, num_bytes);
 		qg->reserved += num_bytes;
 	}
 
@@ -2469,6 +2429,7 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 
 		qg = unode_aux_to_qgroup(unode);
 
+		trace_qgroup_update_reserve(fs_info, qg, -(s64)num_bytes);
 		if (WARN_ON(qg->reserved < num_bytes))
 			report_reserved_underflow(fs_info, qg, num_bytes);
 		else
@@ -2945,6 +2906,7 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 		return 0;
 
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
+	trace_qgroup_meta_reserve(root, (s64)num_bytes);
 	ret = qgroup_reserve(root, num_bytes, enforce);
 	if (ret < 0)
 		return ret;
@@ -2964,6 +2926,7 @@ void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
 	reserved = atomic_xchg(&root->qgroup_meta_rsv, 0);
 	if (reserved == 0)
 		return;
+	trace_qgroup_meta_reserve(root, -(s64)reserved);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, reserved);
 }
 
@@ -2978,6 +2941,7 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
 	WARN_ON(atomic_read(&root->qgroup_meta_rsv) < num_bytes);
 	atomic_sub(num_bytes, &root->qgroup_meta_rsv);
+	trace_qgroup_meta_reserve(root, -(s64)num_bytes);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes);
 }
 
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 26932a8a1993..e5b42f1044fc 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -62,6 +62,50 @@ struct btrfs_qgroup_extent_record {
 };
 
 /*
+ * one struct for each qgroup, organized in fs_info->qgroup_tree.
+ */
+struct btrfs_qgroup {
+	u64 qgroupid;
+
+	/*
+	 * state
+	 */
+	u64 rfer;	/* referenced */
+	u64 rfer_cmpr;	/* referenced compressed */
+	u64 excl;	/* exclusive */
+	u64 excl_cmpr;	/* exclusive compressed */
+
+	/*
+	 * limits
+	 */
+	u64 lim_flags;	/* which limits are set */
+	u64 max_rfer;
+	u64 max_excl;
+	u64 rsv_rfer;
+	u64 rsv_excl;
+
+	/*
+	 * reservation tracking
+	 */
+	u64 reserved;
+
+	/*
+	 * lists
+	 */
+	struct list_head groups;  /* groups this group is member of */
+	struct list_head members; /* groups that are members of this group */
+	struct list_head dirty;   /* dirty groups */
+	struct rb_node node;	  /* tree of qgroups */
+
+	/*
+	 * temp variables for accounting operations
+	 * Refer to qgroup_shared_accounting() for details.
+	 */
+	u64 old_refcnt;
+	u64 new_refcnt;
+};
+
+/*
  * For qgroup event trace points only
  */
 #define QGROUP_RESERVE		(1<<0)
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index a3c3cab643a9..143dc304dfc1 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -24,6 +24,7 @@ struct extent_buffer;
 struct btrfs_work;
 struct __btrfs_workqueue;
 struct btrfs_qgroup_extent_record;
+struct btrfs_qgroup;
 
 #define show_ref_type(type)						\
 	__print_symbolic(type,						\
@@ -1475,6 +1476,49 @@ TRACE_EVENT(qgroup_update_counters,
 		  __entry->cur_new_count)
 );
 
+TRACE_EVENT(qgroup_update_reserve,
+
+	TP_PROTO(struct btrfs_fs_info *fs_info, struct btrfs_qgroup *qgroup,
+		 s64 diff),
+
+	TP_ARGS(fs_info, qgroup, diff),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	qgid			)
+		__field(	u64,	cur_reserved		)
+		__field(	s64,	diff			)
+	),
+
+	TP_fast_assign_btrfs(fs_info,
+		__entry->qgid		= qgroup->qgroupid;
+		__entry->cur_reserved	= qgroup->reserved;
+		__entry->diff		= diff;
+	),
+
+	TP_printk_btrfs("qgid=%llu cur_reserved=%llu diff=%lld",
+		__entry->qgid, __entry->cur_reserved, __entry->diff)
+);
+
+TRACE_EVENT(qgroup_meta_reserve,
+
+	TP_PROTO(struct btrfs_root *root, s64 diff),
+
+	TP_ARGS(root, diff),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	refroot			)
+		__field(	s64,	diff			)
+	),
+
+	TP_fast_assign_btrfs(root->fs_info,
+		__entry->refroot	= root->objectid;
+		__entry->diff		= diff;
+	),
+
+	TP_printk_btrfs("refroot=%llu(%s) diff=%lld",
+		show_root_type(__entry->refroot), __entry->diff)
+);
+
 #endif /* _TRACE_BTRFS_H */
 
 /* This part must be outside protection */
-- 
2.12.0





* [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-30 13:11   ` David Sterba
  2017-03-13  7:52 ` [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Qu Wenruo
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

The newly introduced qgroup reserved space trace points are normally
nested inside several common qgroup operations.

However, some existing trace points are not well placed to cooperate
with them, causing confusing output.

This patch re-arranges the trace_btrfs_qgroup_release_data() and
trace_btrfs_qgroup_free_delayed_ref() trace points so that they are
triggered before the reserved space ones.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/qgroup.c | 10 +++++-----
 fs/btrfs/qgroup.h |  6 +-----
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 68a7305b0a46..1fe481d71be5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2847,14 +2847,14 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 	if (ret < 0)
 		goto out;
 
-	if (free) {
-		btrfs_qgroup_free_refroot(BTRFS_I(inode)->root->fs_info,
-				BTRFS_I(inode)->root->objectid,
-				changeset.bytes_changed);
+	if (free)
 		trace_op = QGROUP_FREE;
-	}
 	trace_btrfs_qgroup_release_data(inode, start, len,
 					changeset.bytes_changed, trace_op);
+	if (free)
+		btrfs_qgroup_free_refroot(BTRFS_I(inode)->root->fs_info,
+				BTRFS_I(inode)->root->objectid,
+				changeset.bytes_changed);
 out:
 	ulist_release(&changeset.range_changed);
 	return ret;
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index e5b42f1044fc..19188e094d0e 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -230,15 +230,11 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 			 struct btrfs_qgroup_inherit *inherit);
 void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 			       u64 ref_root, u64 num_bytes);
-/*
- * TODO: Add proper trace point for it, as btrfs_qgroup_free() is
- * called by everywhere, can't provide good trace for delayed ref case.
- */
 static inline void btrfs_qgroup_free_delayed_ref(struct btrfs_fs_info *fs_info,
 						 u64 ref_root, u64 num_bytes)
 {
-	btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
 	trace_btrfs_qgroup_free_delayed_ref(fs_info, ref_root, num_bytes);
+	btrfs_qgroup_free_refroot(fs_info, ref_root, num_bytes);
 }
 void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
 
-- 
2.12.0





* [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-04-07 11:05   ` David Sterba
  2017-03-13  7:52 ` [PATCH v2 4/9] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
The easiest way to reproduce the bug is:
------
 # mkfs.btrfs -f $dev -n 16K
 # mount $dev $mnt -o inode_cache
 # btrfs quota enable $mnt
 # btrfs quota rescan -w $mnt
 # btrfs qgroup show $mnt
qgroupid         rfer         excl
--------         ----         ----
0/5          32.00KiB     32.00KiB
             ^^ Twice the correct value
------

The fstests btrfs qgroup test group can easily detect this with the
inode_cache mount option, although some of the failures are false
alerts, since old test cases use fixed golden output while new test
cases use "btrfs check" to detect qgroup mismatches.

[CAUSE]
The inode_cache mount option makes commit_fs_roots() call
btrfs_save_ino_cache() to update the fs/subvol trees, generating new
delayed refs.

However, we call btrfs_qgroup_prepare_account_extents() too early,
before commit_fs_roots().
This makes the "old_roots" for the newly generated extents always
NULL.
When such an extent is freed, both new_roots and old_roots end up
empty, while the correct old_roots should not be empty.
This causes qgroup numbers not to be decreased correctly.

[FIX]
Move the call to btrfs_qgroup_prepare_account_extents() to just before
btrfs_qgroup_account_extents(), and add the needed delayed_refs
handling.
Then qgroup can handle the inode_cache mount option correctly.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/transaction.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 61b807de3e16..894c4980c15c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2130,13 +2130,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
-	/* Reocrd old roots for later qgroup accounting */
-	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
-	if (ret) {
-		mutex_unlock(&fs_info->reloc_mutex);
-		goto scrub_continue;
-	}
-
 	/*
 	 * make sure none of the code above managed to slip in a
 	 * delayed item
@@ -2179,6 +2172,24 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_free_log_root_tree(trans, fs_info);
 
 	/*
+	 * commit_fs_roots() can call btrfs_save_ino_cache(), which generates
+	 * new delayed refs. Must handle them or qgroup can be wrong.
+	 */
+	ret = btrfs_run_delayed_refs(trans, fs_info, (unsigned long)-1);
+	if (ret) {
+		mutex_unlock(&fs_info->tree_log_mutex);
+		mutex_unlock(&fs_info->reloc_mutex);
+		goto scrub_continue;
+	}
+
+	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
+	if (ret) {
+		mutex_unlock(&fs_info->tree_log_mutex);
+		mutex_unlock(&fs_info->reloc_mutex);
+		goto scrub_continue;
+	}
+
+	/*
 	 * Since fs roots are all committed, we can get a quite accurate
 	 * new_roots. So let's do quota accounting.
 	 */
-- 
2.12.0





* [PATCH v2 4/9] btrfs: qgroup: Add quick exit for non-fs extents
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 5/9] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

Modify btrfs_qgroup_account_extent() to exit quicker for non-fs
extents.

This also reduces the noise in the trace_btrfs_qgroup_account_extent
event.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1fe481d71be5..445014de924c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1915,6 +1915,33 @@ static int qgroup_update_counters(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+/*
+ * Helper to check if the @roots is a list of fs tree roots
+ * Return 0 for definitely not a fs/subvol tree roots ulist
+ * Return 1 for possible fs/subvol tree roots ulist(including empty)
+ */
+static int maybe_fs_roots(struct ulist *roots)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	/* Empty one, still possible for fs roots */
+	if (!roots || roots->nnodes == 0)
+		return 1;
+
+	ULIST_ITER_INIT(&uiter);
+	unode = ulist_next(roots, &uiter);
+	if (!unode)
+		return 1;
+
+	/*
+	 * If it contains fs tree roots, then it must belongs to fs/subvol
+	 * trees.
+	 * If it contains non-fs tree, it won't be shared to fs/subvol trees.
+	 */
+	return is_fstree(unode->val);
+}
+
 int
 btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
 			    struct btrfs_fs_info *fs_info,
@@ -1931,10 +1958,20 @@ btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
 		return 0;
 
-	if (new_roots)
+	if (new_roots) {
+		if (!maybe_fs_roots(new_roots))
+			goto out_free;
 		nr_new_roots = new_roots->nnodes;
-	if (old_roots)
+	}
+	if (old_roots) {
+		if (!maybe_fs_roots(old_roots))
+			goto out_free;
 		nr_old_roots = old_roots->nnodes;
+	}
+
+	/* Quick exit, either not fs tree roots, or won't affect any qgroup */
+	if (nr_old_roots == 0 && nr_new_roots == 0)
+		goto out_free;
 
 	BUG_ON(!fs_info->quota_root);
 
-- 
2.12.0





* [PATCH v2 5/9] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 4/9] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

Quite a lot of qgroup corruption happens due to calling
btrfs_qgroup_prepare_account_extents() at the wrong time.

Since the safest time to call it is just before
btrfs_qgroup_account_extents(), there is no need to keep these two
functions separate.

Merging them makes the code cleaner and less bug-prone.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c      | 49 ++++++++++++++++---------------------------------
 fs/btrfs/qgroup.h      |  2 --
 fs/btrfs/transaction.c | 10 ----------
 3 files changed, 16 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 445014de924c..9535ea56b6f3 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1403,38 +1403,6 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_qgroup_extent_record *record;
-	struct btrfs_delayed_ref_root *delayed_refs;
-	struct rb_node *node;
-	u64 qgroup_to_skip;
-	int ret = 0;
-
-	delayed_refs = &trans->transaction->delayed_refs;
-	qgroup_to_skip = delayed_refs->qgroup_to_skip;
-
-	/*
-	 * No need to do lock, since this function will only be called in
-	 * btrfs_commit_transaction().
-	 */
-	node = rb_first(&delayed_refs->dirty_extent_root);
-	while (node) {
-		record = rb_entry(node, struct btrfs_qgroup_extent_record,
-				  node);
-		if (WARN_ON(!record->old_roots))
-			ret = btrfs_find_all_roots(NULL, fs_info,
-					record->bytenr, 0, &record->old_roots);
-		if (ret < 0)
-			break;
-		if (qgroup_to_skip)
-			ulist_del(record->old_roots, qgroup_to_skip, 0);
-		node = rb_next(node);
-	}
-	return ret;
-}
-
 int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 				struct btrfs_delayed_ref_root *delayed_refs,
 				struct btrfs_qgroup_extent_record *record)
@@ -2051,6 +2019,18 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 
 		if (!ret) {
 			/*
+			 * old roots should be searched when inserting qgroup
+			 * extent record
+			 */
+			if (WARN_ON(!record->old_roots)) {
+				/* Search commit root to find old_roots */
+				ret = btrfs_find_all_roots(NULL, fs_info,
+						record->bytenr, 0,
+						&record->old_roots);
+				if (ret < 0)
+					goto cleanup;
+			}
+			/*
 			 * Use (u64)-1 as time_seq to do special search, which
 			 * doesn't lock tree or delayed_refs and search current
 			 * root. It's safe inside commit_transaction().
@@ -2059,8 +2039,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 					record->bytenr, (u64)-1, &new_roots);
 			if (ret < 0)
 				goto cleanup;
-			if (qgroup_to_skip)
+			if (qgroup_to_skip) {
 				ulist_del(new_roots, qgroup_to_skip, 0);
+				ulist_del(record->old_roots, qgroup_to_skip,
+					  0);
+			}
 			ret = btrfs_qgroup_account_extent(trans, fs_info,
 					record->bytenr, record->num_bytes,
 					record->old_roots, new_roots);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 19188e094d0e..b3c77f029dfe 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -134,8 +134,6 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
 int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
 void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
 struct btrfs_delayed_extent_op;
-int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info);
 /*
  * Inform qgroup to trace one dirty extent, its info is recorded in @record.
  * So qgroup can account it at transaction committing time.
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 894c4980c15c..edee58751eb3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1376,9 +1376,6 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	ret = commit_fs_roots(trans, fs_info);
 	if (ret)
 		goto out;
-	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
-	if (ret < 0)
-		goto out;
 	ret = btrfs_qgroup_account_extents(trans, fs_info);
 	if (ret < 0)
 		goto out;
@@ -2182,13 +2179,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
-	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
-	if (ret) {
-		mutex_unlock(&fs_info->tree_log_mutex);
-		mutex_unlock(&fs_info->reloc_mutex);
-		goto scrub_continue;
-	}
-
 	/*
 	 * Since fs roots are all committed, we can get a quite accurate
 	 * new_roots. So let's do quota accounting.
-- 
2.12.0





* [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 5/9] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-30 13:08   ` David Sterba
  2017-03-13  7:52 ` [PATCH v2 7/9] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

btrfs_qgroup_release/free_data() only returns 0 or a negative error
number (ENOMEM is the only possible error).

This is normally good enough, but sometimes we need the exact number
of bytes it freed/released.

Change it to return the number of bytes actually released/freed
instead of 0 on success.
Also slightly modify the related extent_changeset structure: since in
btrfs one non-hole data extent won't be larger than 128M, "unsigned
int" is large enough for this use case.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/extent_io.h   | 2 +-
 fs/btrfs/qgroup.c      | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index be5477676cc8..89d1bcba3f8c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4288,7 +4288,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
 	ret = btrfs_qgroup_reserve_data(inode, start, len);
-	if (ret)
+	if (ret < 0)
 		btrfs_free_reserved_data_space_noquota(inode, start, len);
 	return ret;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3e4fad4a909d..bdca004dc859 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -201,7 +201,7 @@ struct extent_buffer {
  */
 struct extent_changeset {
 	/* How many bytes are set/cleared in this operation */
-	u64 bytes_changed;
+	unsigned int bytes_changed;
 
 	/* Changed ranges */
 	struct ulist range_changed;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9535ea56b6f3..87aeefb34959 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2875,6 +2875,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 		btrfs_qgroup_free_refroot(BTRFS_I(inode)->root->fs_info,
 				BTRFS_I(inode)->root->objectid,
 				changeset.bytes_changed);
+	ret = changeset.bytes_changed;
 out:
 	ulist_release(&changeset.range_changed);
 	return ret;
-- 
2.12.0





* [PATCH v2 7/9] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
  2017-03-13  7:52 ` [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
  8 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
Under the following case, we can underflow qgroup reserved space.

            Task A                |            Task B
---------------------------------------------------------------
 Quota disabled                   |
 Buffered write                   |
 |- btrfs_check_data_free_space() |
 |  *NO* qgroup space is reserved |
 |  since quota is *DISABLED*     |
 |- All pages are copied to page  |
    cache                         |
                                  | Enable quota
                                  | Quota scan finished
                                  |
                                  | Sync_fs
                                  | |- run_delalloc_range
                                  | |- Write pages
                                  | |- btrfs_finish_ordered_io
                                  |    |- insert_reserved_file_extent
                                  |       |- btrfs_qgroup_release_data()
                                  |          Since no qgroup space is
                                             reserved in Task A, we
                                             underflow qgroup reserved
                                             space
This can be detected by fstest btrfs/104.

[CAUSE]
In insert_reserved_file_extent() we inform qgroup to release
@ram_bytes of qgroup reserved space in all cases.
And btrfs_qgroup_release_data() only checks whether qgroup is
currently enabled.

However, in the above case the buffered write happens before quota is
enabled, so we don't have reserved space for that range.

[FIX]
In insert_reserved_file_extent(), inform qgroup to release the actual
number of bytes it released.
In the above case, since we don't have reserved space, we inform
qgroup to release 0 bytes, so the problem is fixed.

Thanks to the @reserved parameter introduced by the qgroup rework and
the previous patch returning the released bytes, the fix is fewer
than 10 lines.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c40060cc481f..36b08abff564 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2044,6 +2044,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct extent_buffer *leaf;
 	struct btrfs_key ins;
+	u64 qg_released;
 	int extent_inserted = 0;
 	int ret;
 
@@ -2099,13 +2100,17 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.objectid = disk_bytenr;
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
-	ret = btrfs_alloc_reserved_file_extent(trans, root->root_key.objectid,
-			btrfs_ino(BTRFS_I(inode)), file_pos, ram_bytes, &ins);
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
 	 */
-	btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	if (ret < 0)
+		goto out;
+	qg_released = ret;
+	ret = btrfs_alloc_reserved_file_extent(trans, root->root_key.objectid,
+			btrfs_ino(BTRFS_I(inode)), file_pos, qg_released, &ins);
 out:
 	btrfs_free_path(path);
 
-- 
2.12.0





* [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 7/9] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-04-07 12:00   ` David Sterba
  2017-03-13  7:52 ` [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new parameter, struct extent_changeset, for
btrfs_qgroup_reserve_data() and its callers.

The extent_changeset is used in btrfs_qgroup_reserve_data() to record
which ranges were reserved by the current call, so they can be freed in
the error path.

The reason we need to export it to callers is that in the buffered write
error path, without knowing exactly which ranges the current allocation
reserved, we may free space that was not reserved by us.

This will lead to qgroup reserved space underflow.

Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  6 ++++--
 fs/btrfs/extent-tree.c | 17 ++++++++++++-----
 fs/btrfs/extent_io.h   | 12 ++++++++++++
 fs/btrfs/file.c        | 17 ++++++++++++++---
 fs/btrfs/inode-map.c   |  6 +++++-
 fs/btrfs/inode.c       | 24 ++++++++++++++++++++----
 fs/btrfs/ioctl.c       |  7 ++++++-
 fs/btrfs/qgroup.c      | 33 +++++++++++++++++++++------------
 fs/btrfs/qgroup.h      |  3 ++-
 fs/btrfs/relocation.c  |  5 ++++-
 10 files changed, 100 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 29b7fc28c607..8349b3bd6522 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2687,8 +2687,9 @@ enum btrfs_flush_state {
 	COMMIT_TRANS		=	6,
 };
 
-int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
+int btrfs_check_data_free_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
@@ -2706,7 +2707,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_rsv *rsv);
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 89d1bcba3f8c..67f4e97ef8c6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_root *root = fs_info->tree_root;
 	struct inode *inode = NULL;
+	struct extent_changeset data_reserved;
 	u64 alloc_hint = 0;
 	int dcs = BTRFS_DC_ERROR;
 	u64 num_pages = 0;
 	int retries = 0;
 	int ret = 0;
 
+	extent_changeset_init(&data_reserved);
 	/*
 	 * If this block group is smaller than 100 megs don't bother caching the
 	 * block group.
@@ -3473,7 +3475,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	num_pages *= 16;
 	num_pages *= PAGE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, 0, num_pages);
+	ret = btrfs_check_data_free_space(inode, &data_reserved, 0, num_pages);
 	if (ret)
 		goto out_put;
 
@@ -3504,6 +3506,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	block_group->disk_cache_state = dcs;
 	spin_unlock(&block_group->lock);
 
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
@@ -4272,7 +4275,8 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
  * Will replace old btrfs_check_data_free_space(), but for patch split,
  * add a new function first and then replace it.
  */
-int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+int btrfs_check_data_free_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	int ret;
@@ -4287,9 +4291,11 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 		return ret;
 
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
-	ret = btrfs_qgroup_reserve_data(inode, start, len);
+	ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
 	if (ret < 0)
 		btrfs_free_reserved_data_space_noquota(inode, start, len);
+	else
+		ret = 0;
 	return ret;
 }
 
@@ -6130,11 +6136,12 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * Return 0 for success
  * Return <0 for error(-ENOSPC or -EQUOT)
  */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	int ret;
 
-	ret = btrfs_check_data_free_space(inode, start, len);
+	ret = btrfs_check_data_free_space(inode, reserved, start, len);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index bdca004dc859..61db0fca74f9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -207,6 +207,18 @@ struct extent_changeset {
 	struct ulist range_changed;
 };
 
+static inline void extent_changeset_init(struct extent_changeset *changeset)
+{
+	changeset->bytes_changed = 0;
+	ulist_init(&changeset->range_changed);
+}
+
+static inline void extent_changeset_release(struct extent_changeset *changeset)
+{
+	changeset->bytes_changed = 0;
+	ulist_release(&changeset->range_changed);
+}
+
 static inline void extent_set_compress_type(unsigned long *bio_flags,
 					    int compress_type)
 {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 520cb7230b2d..209e67051c07 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1529,6 +1529,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct page **pages = NULL;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset data_reserved;
 	u64 release_bytes = 0;
 	u64 lockstart;
 	u64 lockend;
@@ -1539,6 +1540,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 	bool force_page_uptodate = false;
 	bool need_unlock;
 
+	extent_changeset_init(&data_reserved);
+
 	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
 			PAGE_SIZE / (sizeof(struct page *)));
 	nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied);
@@ -1576,7 +1579,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		reserve_bytes = round_up(write_bytes + sector_offset,
 				fs_info->sectorsize);
 
-		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
+		extent_changeset_release(&data_reserved);
+		ret = btrfs_check_data_free_space(inode, &data_reserved, pos,
+						  write_bytes);
 		if (ret < 0) {
 			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 						      BTRFS_INODE_PREALLOC)) &&
@@ -1750,6 +1755,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		}
 	}
 
+	extent_changeset_release(&data_reserved);
 	return num_written ? num_written : ret;
 }
 
@@ -2722,6 +2728,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset data_reserved;
 	struct falloc_range *range;
 	struct falloc_range *tmp;
 	struct list_head reserve_list;
@@ -2736,6 +2743,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 	int blocksize = btrfs_inode_sectorsize(inode);
 	int ret;
 
+	extent_changeset_init(&data_reserved);
+
 	alloc_start = round_down(offset, blocksize);
 	alloc_end = round_up(offset + len, blocksize);
 	cur_offset = alloc_start;
@@ -2854,10 +2863,11 @@ static long btrfs_fallocate(struct file *file, int mode,
 				free_extent_map(em);
 				break;
 			}
-			ret = btrfs_qgroup_reserve_data(inode, cur_offset,
-					last_byte - cur_offset);
+			ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
+					cur_offset, last_byte - cur_offset);
 			if (ret < 0)
 				break;
+			ret = 0;
 		} else {
 			/*
 			 * Do not need to reserve unwritten extent for this
@@ -2925,6 +2935,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	if (ret != 0)
 		btrfs_free_reserved_data_space(inode, alloc_start,
 				       alloc_end - cur_offset);
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 5c6c20ec64d8..66003fa79935 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -400,6 +400,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	struct btrfs_path *path;
 	struct inode *inode;
 	struct btrfs_block_rsv *rsv;
+	struct extent_changeset data_reserved;
 	u64 num_bytes;
 	u64 alloc_hint = 0;
 	int ret;
@@ -419,6 +420,8 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	if (!btrfs_test_opt(fs_info, INODE_MAP_CACHE))
 		return 0;
 
+	extent_changeset_init(&data_reserved);
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -492,7 +495,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, 0, prealloc);
 	if (ret)
 		goto out_put;
 
@@ -516,6 +519,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	trans->bytes_reserved = num_bytes;
 
 	btrfs_free_path(path);
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 36b08abff564..216f28e94d57 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1936,12 +1936,15 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	struct btrfs_writepage_fixup *fixup;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset data_reserved;
 	struct page *page;
 	struct inode *inode;
 	u64 page_start;
 	u64 page_end;
 	int ret;
 
+	extent_changeset_init(&data_reserved);
+
 	fixup = container_of(work, struct btrfs_writepage_fixup, work);
 	page = fixup->page;
 again:
@@ -1973,7 +1976,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		goto again;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode, page_start,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					   PAGE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
@@ -1993,6 +1996,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	unlock_page(page);
 	put_page(page);
 	kfree(fixup);
+	extent_changeset_release(&data_reserved);
 }
 
 /*
@@ -4647,6 +4651,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset data_reserved;
 	char *kaddr;
 	u32 blocksize = fs_info->sectorsize;
 	pgoff_t index = from >> PAGE_SHIFT;
@@ -4657,11 +4662,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	u64 block_start;
 	u64 block_end;
 
+	extent_changeset_init(&data_reserved);
+
 	if ((offset & (blocksize - 1)) == 0 &&
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
 
-	ret = btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
 			round_down(from, blocksize), blocksize);
 	if (ret)
 		goto out;
@@ -4746,6 +4753,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	unlock_page(page);
 	put_page(page);
 out:
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
@@ -8583,6 +8591,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct inode *inode = file->f_mapping->host;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_data dio_data = { 0 };
+	struct extent_changeset data_reserved;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
 	int flags = 0;
@@ -8593,6 +8602,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (check_direct_IO(fs_info, iocb, iter, offset))
 		return 0;
 
+	extent_changeset_init(&data_reserved);
 	inode_dio_begin(inode);
 	smp_mb__after_atomic();
 
@@ -8619,7 +8629,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			inode_unlock(inode);
 			relock = true;
 		}
-		ret = btrfs_delalloc_reserve_space(inode, offset, count);
+		ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
+						   offset, count);
 		if (ret)
 			goto out;
 		dio_data.outstanding_extents = count_max_extents(count);
@@ -8676,6 +8687,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (relock)
 		inode_lock(inode);
 
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
@@ -8907,6 +8919,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset data_reserved;
 	char *kaddr;
 	unsigned long zero_start;
 	loff_t size;
@@ -8917,6 +8930,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	u64 page_end;
 	u64 end;
 
+	extent_changeset_init(&data_reserved);
 	reserved_space = PAGE_SIZE;
 
 	sb_start_pagefault(inode->i_sb);
@@ -8932,7 +8946,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	 * end up waiting indefinitely to get a lock on the page currently
 	 * being processed by btrfs_page_mkwrite() function.
 	 */
-	ret = btrfs_delalloc_reserve_space(inode, page_start,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					   reserved_space);
 	if (!ret) {
 		ret = file_update_time(vmf->vma->vm_file);
@@ -9038,6 +9052,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 out_unlock:
 	if (!ret) {
 		sb_end_pagefault(inode->i_sb);
+		extent_changeset_release(&data_reserved);
 		return VM_FAULT_LOCKED;
 	}
 	unlock_page(page);
@@ -9045,6 +9060,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	btrfs_delalloc_release_space(inode, page_start, reserved_space);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dabfc7ac48a6..876e177a55fc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1127,15 +1127,18 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
 	struct extent_io_tree *tree;
+	struct extent_changeset data_reserved;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
+	extent_changeset_init(&data_reserved);
+
 	file_end = (isize - 1) >> PAGE_SHIFT;
 	if (!isize || start_index > file_end)
 		return 0;
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
 	if (ret)
@@ -1247,6 +1250,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
+	extent_changeset_release(&data_reserved);
 	return i_done;
 out:
 	for (i = 0; i < i_done; i++) {
@@ -1256,6 +1260,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	btrfs_delalloc_release_space(inode,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
+	extent_changeset_release(&data_reserved);
 	return ret;
 
 }
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 87aeefb34959..a8eff593cabd 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2813,43 +2813,52 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
  * Return <0 for error (including -EQUOT)
  *
  * NOTE: this function may sleep for memory allocation.
+ *       if btrfs_qgroup_reserve_data() is called multiple times with
+ *       same @reserved, caller must ensure when error happens it's OK
+ *       to free *ALL* reserved space.
  */
-int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
+int btrfs_qgroup_reserve_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct extent_changeset changeset;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
+	u64 orig_reserved;
+	u64 to_reserve;
 	int ret;
 
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) ||
 	    !is_fstree(root->objectid) || len == 0)
 		return 0;
 
-	changeset.bytes_changed = 0;
-	ulist_init(&changeset.range_changed);
+	/* @reserved parameter is mandatory for qgroup */
+	if (WARN_ON(!reserved))
+		return -EINVAL;
+	/* Record already reserved space */
+	orig_reserved = reserved->bytes_changed;
 	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
-			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
+			start + len -1, EXTENT_QGROUP_RESERVED, reserved);
+
+	/* Newly reserved space */
+	to_reserve = reserved->bytes_changed - orig_reserved;
 	trace_btrfs_qgroup_reserve_data(inode, start, len,
-					changeset.bytes_changed,
-					QGROUP_RESERVE);
+					to_reserve, QGROUP_RESERVE);
 	if (ret < 0)
 		goto cleanup;
-	ret = qgroup_reserve(root, changeset.bytes_changed, true);
+	ret = qgroup_reserve(root, to_reserve, true);
 	if (ret < 0)
 		goto cleanup;
 
-	ulist_release(&changeset.range_changed);
 	return ret;
 
 cleanup:
-	/* cleanup already reserved ranges */
+	/* cleanup *ALL* already reserved ranges */
 	ULIST_ITER_INIT(&uiter);
-	while ((unode = ulist_next(&changeset.range_changed, &uiter)))
+	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
 				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
 				 GFP_NOFS);
-	ulist_release(&changeset.range_changed);
+	extent_changeset_release(reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index b3c77f029dfe..5fbe2023e17f 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -242,7 +242,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 #endif
 
 /* New io_tree based accurate qgroup reserve API */
-int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_reserve_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d60df51959f7..493d4dc19edf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3093,11 +3093,13 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
 	u64 cur_offset;
+	struct extent_changeset data_reserved;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
+	extent_changeset_init(&data_reserved);
 	inode_lock(inode);
 
-	ret = btrfs_check_data_free_space(inode, prealloc_start,
+	ret = btrfs_check_data_free_space(inode, &data_reserved, prealloc_start,
 					  prealloc_end + 1 - prealloc_start);
 	if (ret)
 		goto out;
@@ -3129,6 +3131,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 				       prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
+	extent_changeset_release(&data_reserved);
 	return ret;
 }
 
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
  2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
                   ` (7 preceding siblings ...)
  2017-03-13  7:52 ` [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
@ 2017-03-13  7:52 ` Qu Wenruo
  2017-04-07 12:05   ` David Sterba
  8 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-03-13  7:52 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
For the following case, btrfs can underflow qgroup reserved space
at error path:
(Page size 4K, function name without "btrfs_" prefix)

         Task A                  |             Task B
----------------------------------------------------------------------
Buffered_write [0, 2K)           |
|- check_data_free_space()       |
|  |- qgroup_reserve_data()      |
|     Range aligned to page      |
|     range [0, 4K)          <<< |
|     4K bytes reserved      <<< |
|- copy pages to page cache      |
                                 | Buffered_write [2K, 4K)
                                 | |- check_data_free_space()
                                 | |  |- qgroup_reserved_data()
                                 | |     Range aligned to page
                                 | |     range [0, 4K)
                                 | |     Already reserved by A <<<
                                 | |     0 bytes reserved      <<<
                                 | |- delalloc_reserve_metadata()
                                 | |  And it *FAILED* (maybe -EDQUOT)
                                 | |- free_reserved_data_space()
                                      |- qgroup_free_data()
                                         Range aligned to page range
                                         [0, 4K)
                                         Freeing 4K
(Special thanks to Chandan for the detailed report and analysis)

[CAUSE]
Task B above frees the reserved data range [0, 4K), which was actually
reserved by Task A.

At writeback time, the page dirtied by Task A goes through the writeback
routine, which frees the 4K of reserved data space at file extent insert
time, causing the qgroup underflow.

[FIX]
Add a @reserved parameter to btrfs_qgroup_free_data() so it only frees
data ranges reserved by a previous btrfs_qgroup_reserve_data().
In the above case, Task B will then free 0 bytes, so there is no underflow.

Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/btrfs/ctree.h       |  6 +++--
 fs/btrfs/extent-tree.c | 12 +++++----
 fs/btrfs/file.c        | 29 +++++++++++---------
 fs/btrfs/inode.c       | 29 ++++++++++----------
 fs/btrfs/ioctl.c       |  4 +--
 fs/btrfs/qgroup.c      | 72 ++++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/qgroup.h      |  3 ++-
 fs/btrfs/relocation.c  |  8 +++---
 8 files changed, 117 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8349b3bd6522..f76b3ecdc715 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2690,7 +2690,10 @@ enum btrfs_flush_state {
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
 int btrfs_check_data_free_space(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
+void btrfs_delalloc_release_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -2709,7 +2712,6 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
 					      unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 67f4e97ef8c6..9de0436becd2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4336,7 +4336,8 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
  * This one will handle the per-inode data rsv map for accurate reserved
  * space framework.
  */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
@@ -4346,7 +4347,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
 	start = round_down(start, root->fs_info->sectorsize);
 
 	btrfs_free_reserved_data_space_noquota(inode, start, len);
-	btrfs_qgroup_free_data(inode, start, len);
+	btrfs_qgroup_free_data(inode, reserved, start, len);
 }
 
 static void force_metadata_allocation(struct btrfs_fs_info *info)
@@ -6146,7 +6147,7 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
 	if (ret < 0)
-		btrfs_free_reserved_data_space(inode, start, len);
+		btrfs_free_reserved_data_space(inode, reserved, start, len);
 	return ret;
 }
 
@@ -6165,10 +6166,11 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
  * list if there are no delalloc bytes left.
  * Also it will handle the qgroup reserved space.
  */
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+void btrfs_delalloc_release_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
-	btrfs_free_reserved_data_space(inode, start, len);
+	btrfs_free_reserved_data_space(inode, reserved, start, len);
 }
 
 static int update_block_group(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 209e67051c07..83b622285c69 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1610,8 +1610,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				reserve_bytes);
 		if (ret) {
 			if (!only_release_metadata)
-				btrfs_free_reserved_data_space(inode, pos,
-							       write_bytes);
+				btrfs_free_reserved_data_space(inode,
+						&data_reserved, pos,
+						write_bytes);
 			else
 				btrfs_end_write_no_snapshoting(root);
 			break;
@@ -1693,8 +1694,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				__pos = round_down(pos,
 						   fs_info->sectorsize) +
 					(dirty_pages << PAGE_SHIFT);
-				btrfs_delalloc_release_space(inode, __pos,
-							     release_bytes);
+				btrfs_delalloc_release_space(inode,
+						&data_reserved, __pos,
+						release_bytes);
 			}
 		}
 
@@ -1749,9 +1751,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
 					release_bytes);
 		} else {
-			btrfs_delalloc_release_space(inode,
-						round_down(pos, fs_info->sectorsize),
-						release_bytes);
+			btrfs_delalloc_release_space(inode, &data_reserved,
+					round_down(pos, fs_info->sectorsize),
+					release_bytes);
 		}
 	}
 
@@ -2874,8 +2876,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 			 * range, free reserved data space first, otherwise
 			 * it'll result in false ENOSPC error.
 			 */
-			btrfs_free_reserved_data_space(inode, cur_offset,
-				last_byte - cur_offset);
+			btrfs_free_reserved_data_space(inode, &data_reserved,
+					cur_offset, last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2894,8 +2896,9 @@ static long btrfs_fallocate(struct file *file, int mode,
 					range->len, i_blocksize(inode),
 					offset + len, &alloc_hint);
 		else
-			btrfs_free_reserved_data_space(inode, range->start,
-						       range->len);
+			btrfs_free_reserved_data_space(inode,
+					&data_reserved, range->start,
+					range->len);
 		list_del(&range->list);
 		kfree(range);
 	}
@@ -2933,8 +2936,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 	inode_unlock(inode);
 	/* Let go of our reservation. */
 	if (ret != 0)
-		btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - cur_offset);
+		btrfs_free_reserved_data_space(inode, &data_reserved,
+				alloc_start, alloc_end - cur_offset);
 	extent_changeset_release(&data_reserved);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 216f28e94d57..9004d5773f67 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -325,7 +325,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 	 * And at reserve time, it's always aligned to page size, so
 	 * just free one page here.
 	 */
-	btrfs_qgroup_free_data(inode, 0, PAGE_SIZE);
+	btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE);
 	btrfs_free_path(path);
 	btrfs_end_transaction(trans);
 	return ret;
@@ -2829,7 +2829,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		 * space for NOCOW range.
 		 * As NOCOW won't cause a new delayed ref, just free the space
 		 */
-		btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
+		btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
 				       ordered_extent->len);
 		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
 		if (nolock)
@@ -4676,7 +4676,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 again:
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
-		btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode, &data_reserved,
 				round_down(from, blocksize),
 				blocksize);
 		ret = -ENOMEM;
@@ -4748,7 +4748,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 
 out_unlock:
 	if (ret)
-		btrfs_delalloc_release_space(inode, block_start,
+		btrfs_delalloc_release_space(inode, &data_reserved, block_start,
 					     blocksize);
 	unlock_page(page);
 	put_page(page);
@@ -5148,7 +5148,7 @@ static void evict_inode_truncate_pages(struct inode *inode)
 		 * Note, end is the bytenr of last byte, so we need + 1 here.
 		 */
 		if (state->state & EXTENT_DELALLOC)
-			btrfs_qgroup_free_data(inode, start, end - start + 1);
+			btrfs_qgroup_free_data(inode, NULL, start, end - start + 1);
 
 		clear_extent_bit(io_tree, start, end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
@@ -8662,8 +8662,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
-				btrfs_delalloc_release_space(inode, offset,
-							     dio_data.reserve);
+				btrfs_delalloc_release_space(inode, &data_reserved,
+					offset, dio_data.reserve);
 			/*
 			 * On error we might have left some ordered extents
 			 * without submitting corresponding bios for them, so
@@ -8678,8 +8678,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 					dio_data.unsubmitted_oe_range_start,
 					0);
 		} else if (ret >= 0 && (size_t)ret < count)
-			btrfs_delalloc_release_space(inode, offset,
-						     count - (size_t)ret);
+			btrfs_delalloc_release_space(inode, &data_reserved,
+					offset, count - (size_t)ret);
 	}
 out:
 	if (wakeup)
@@ -8877,7 +8877,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 *    free the entire extent.
 	 */
 	if (PageDirty(page))
-		btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
+		btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
 	if (!inode_evicting) {
 		clear_extent_bit(tree, page_start, page_end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
@@ -9000,8 +9000,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 			spin_lock(&BTRFS_I(inode)->lock);
 			BTRFS_I(inode)->outstanding_extents++;
 			spin_unlock(&BTRFS_I(inode)->lock);
-			btrfs_delalloc_release_space(inode, page_start,
-						PAGE_SIZE - reserved_space);
+			btrfs_delalloc_release_space(inode, &data_reserved,
+					page_start, PAGE_SIZE - reserved_space);
 		}
 	}
 
@@ -9057,7 +9057,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_space(inode, page_start, reserved_space);
+	btrfs_delalloc_release_space(inode, &data_reserved, page_start,
+				     reserved_space);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_release(&data_reserved);
@@ -10414,7 +10415,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			btrfs_end_transaction(trans);
 	}
 	if (cur_offset < end)
-		btrfs_free_reserved_data_space(inode, cur_offset,
+		btrfs_free_reserved_data_space(inode, NULL, cur_offset,
 			end - cur_offset + 1);
 	return ret;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 876e177a55fc..824b2390812f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1229,7 +1229,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents++;
 		spin_unlock(&BTRFS_I(inode)->lock);
-		btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode, &data_reserved,
 				start_index << PAGE_SHIFT,
 				(page_cnt - i_done) << PAGE_SHIFT);
 	}
@@ -1257,7 +1257,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
-	btrfs_delalloc_release_space(inode,
+	btrfs_delalloc_release_space(inode, &data_reserved,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
 	extent_changeset_release(&data_reserved);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index a8eff593cabd..d6abc269093e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2862,13 +2862,72 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
 	return ret;
 }
 
-static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
-				       int free)
+/* Free ranges specified by @reserved, normally in error path */
+static int qgroup_free_reserved_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct extent_changeset changeset;
+	int freed = 0;
+	int ret;
+
+	extent_changeset_init(&changeset);
+	len = round_up(start + len, root->fs_info->sectorsize) -
+	      round_down(start, root->fs_info->sectorsize);
+	start = round_down(start, root->fs_info->sectorsize);
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(&reserved->range_changed, &uiter))) {
+		u64 range_start = unode->val;
+		/* unode->aux is the inclusive end */
+		u64 range_len = unode->aux - range_start + 1;
+		u64 free_start;
+		u64 free_len;
+
+		extent_changeset_release(&changeset);
+
+		/* Only free ranges within [start, start + len) */
+		if (range_start >= start + len ||
+		    range_start + range_len <= start)
+			continue;
+		free_start = max(range_start, start);
+		free_len = min(start + len, range_start + range_len) -
+			   free_start;
+		/*
+		 * TODO: Also modify reserved->ranges_reserved to reflect
+		 * the modification.
+		 *
+		 * However, as long as we free qgroup reserved space according
+		 * to EXTENT_QGROUP_RESERVED, we won't double free.
+		 * So there is no need to rush.
+		 */
+		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
+				free_start, free_start + free_len - 1,
+				EXTENT_QGROUP_RESERVED, &changeset);
+		if (ret < 0)
+			goto out;
+		freed += changeset.bytes_changed;
+	}
+	btrfs_qgroup_free_refroot(root->fs_info, root->objectid, freed);
+	ret = freed;
+out:
+	extent_changeset_release(&changeset);
+	return ret;
+}
+
+static int __btrfs_qgroup_release_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len,
+			int free)
 {
 	struct extent_changeset changeset;
 	int trace_op = QGROUP_RELEASE;
 	int ret;
 
+	/* In release case, we shouldn't have @reserved */
+	WARN_ON(!free && reserved);
+	if (free && reserved)
+		return qgroup_free_reserved_data(inode, reserved, start, len);
 	changeset.bytes_changed = 0;
 	ulist_init(&changeset.range_changed);
 	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
 			start + len - 1, EXTENT_QGROUP_RESERVED, &changeset);
@@ -2895,14 +2954,17 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
  *
  * Should be called when a range of pages get invalidated before reaching disk.
  * Or for error cleanup case.
+ * If @reserved is given, only reserved ranges within [@start, @start + @len)
+ * will be freed.
  *
  * For data written to disk, use btrfs_qgroup_release_data().
  *
  * NOTE: This function may sleep for memory allocation.
  */
-int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
+int btrfs_qgroup_free_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
-	return __btrfs_qgroup_release_data(inode, start, len, 1);
+	return __btrfs_qgroup_release_data(inode, reserved, start, len, 1);
 }
 
 /*
@@ -2922,7 +2984,7 @@ int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
  */
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
 {
-	return __btrfs_qgroup_release_data(inode, start, len, 0);
+	return __btrfs_qgroup_release_data(inode, NULL, start, len, 0);
 }
 
 int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 5fbe2023e17f..dd9553636d63 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -245,7 +245,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 int btrfs_qgroup_reserve_data(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
-int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_free_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 
 int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 			      bool enforce);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 493d4dc19edf..66f6f0966a60 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3115,8 +3115,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		num_bytes = end + 1 - start;
 		if (cur_offset < start)
-			btrfs_free_reserved_data_space(inode, cur_offset,
-					start - cur_offset);
+			btrfs_free_reserved_data_space(inode, &data_reserved,
+					cur_offset, start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, start,
 						num_bytes, num_bytes,
 						end + 1, &alloc_hint);
@@ -3127,8 +3127,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 		nr++;
 	}
 	if (cur_offset < prealloc_end)
-		btrfs_free_reserved_data_space(inode, cur_offset,
-				       prealloc_end + 1 - cur_offset);
+		btrfs_free_reserved_data_space(inode, &data_reserved,
+				cur_offset, prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	extent_changeset_release(&data_reserved);
-- 
2.12.0




^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space
  2017-03-13  7:52 ` [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space Qu Wenruo
@ 2017-03-30 12:49   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-03-30 12:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:08PM +0800, Qu Wenruo wrote:
> Introduce the following trace points:
> qgroup_update_reserve
> qgroup_meta_reserve
> 
> These trace points are handy for tracing qgroup reserved space related
> problems.
> 
> Also export the btrfs_qgroup structure: we now pass it directly to
> trace points, so the structure needs to be exported.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data
  2017-03-13  7:52 ` [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
@ 2017-03-30 13:08   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-03-30 13:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:13PM +0800, Qu Wenruo wrote:
> btrfs_qgroup_release/free_data() only return 0 or a negative error
> number (ENOMEM is the only possible error).

btrfs_qgroup_release_data -> __btrfs_qgroup_release_data
will not allocate the ulist anymore, and there are no errors propagated
from clear_record_extent_bits (just 0), but there are still some
unhandled cases. So here a negative value should be expected anyway.

> This is normally good enough, but sometimes we need the exact number of
> bytes it freed/released.
> 
> Change it to return the number of bytes actually released/freed instead
> of 0 on success.
> Also slightly modify the related extent_changeset structure: since in
> btrfs a non-hole data extent won't be larger than 128M, "unsigned int"
> is large enough for this use case.

> -	u64 bytes_changed;
> +	unsigned int bytes_changed;

> @@ -2875,6 +2875,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
>  		btrfs_qgroup_free_refroot(BTRFS_I(inode)->root->fs_info,
>  				BTRFS_I(inode)->root->objectid,
>  				changeset.bytes_changed);
> +	ret = changeset.bytes_changed;

This relies on the 128M limit, which fits into an int when assigned. I'm
thinking about the signedness: it makes sense to use unsigned for
bytes_changed, but we also need the negative error returned. Probably
ok.

>  out:
>  	ulist_release(&changeset.range_changed);
>  	return ret;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
  2017-03-13  7:52 ` [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint Qu Wenruo
@ 2017-03-30 13:11   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-03-30 13:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:09PM +0800, Qu Wenruo wrote:
> Newly introduced qgroup reserved space trace points are normally nested
> inside several common qgroup operations.
> 
> However, some other trace points are not well placed to cooperate with
> them, causing confusing output.
> 
> This patch re-arranges the trace_btrfs_qgroup_release_data() and
> trace_btrfs_qgroup_free_delayed_ref() trace points so that they are
> triggered before the reserved space ones.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Reviewed-by: Jeff Mahoney <jeffm@suse.com>

I'll add 1 and 2 to the 4.12 queue, but please keep them in your patchset
so they're not dependent on the current development patch queue.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
  2017-03-13  7:52 ` [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Qu Wenruo
@ 2017-04-07 11:05   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-04-07 11:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:10PM +0800, Qu Wenruo wrote:
> [BUG]
> The easiest way to reproduce the bug is:
> ------
>  # mkfs.btrfs -f $dev -n 16K
>  # mount $dev $mnt -o inode_cache
>  # btrfs quota enable $mnt
>  # btrfs quota rescan -w $mnt
>  # btrfs qgroup show $mnt
> qgroupid         rfer         excl
> --------         ----         ----
> 0/5          32.00KiB     32.00KiB
>              ^^ Twice the correct value
> ------
> 
> And the fstests btrfs qgroup test group can easily detect this with the
> inode_cache mount option.
> Some of the failures are false alerts, since old test cases use a fixed
> golden output, while new test cases use "btrfs check" to detect qgroup
> mismatches.
> 
> [CAUSE]
> The inode_cache mount option makes commit_fs_roots() call
> btrfs_save_ino_cache() to update fs/subvol trees, generating new
> delayed refs.
> 
> However, we call btrfs_qgroup_prepare_account_extents() too early,
> before commit_fs_roots().
> This makes the "old_roots" for newly generated extents always NULL.
> For the extent freeing case, this makes both new_roots and old_roots
> empty, while the correct old_roots should not be empty.
> This causes qgroup numbers to not be decreased correctly.
> 
> [FIX]
> Move the call to btrfs_qgroup_prepare_account_extents() to just before
> btrfs_qgroup_account_extents(), and add the needed delayed_refs
> handling.
> So qgroup can handle the inode_cache mount option correctly.
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Patch added to 4.12 queue.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-03-13  7:52 ` [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
@ 2017-04-07 12:00   ` David Sterba
  2017-04-10  1:25     ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2017-04-07 12:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:15PM +0800, Qu Wenruo wrote:
> @@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
>  	struct btrfs_root *root = fs_info->tree_root;
>  	struct inode *inode = NULL;
> +	struct extent_changeset data_reserved;

The size of this structure is 40 bytes, and it's being added to many
functions, so it will be noticeable in stack consumption.

-extent-tree.c:btrfs_check_data_free_space              40 static
-extent-tree.c:cache_save_setup                         96 static
+extent-tree.c:btrfs_check_data_free_space              48 static
+extent-tree.c:cache_save_setup                         136 static
-file.c:__btrfs_buffered_write                          192 static
+file.c:__btrfs_buffered_write                          232 static
-file.c:btrfs_fallocate                                 208 static
+file.c:btrfs_fallocate                                 248 static
-inode-map.c:btrfs_save_ino_cache                       112 static
+inode-map.c:btrfs_save_ino_cache                       152 static
-inode.c:btrfs_direct_IO                                128 static
+inode.c:btrfs_direct_IO                                176 static
-inode.c:btrfs_writepage_fixup_worker                   88 static
+inode.c:btrfs_writepage_fixup_worker                   128 static
-inode.c:btrfs_truncate_block                           136 static
+inode.c:btrfs_truncate_block                           176 static
-inode.c:btrfs_page_mkwrite                             112 static
+inode.c:btrfs_page_mkwrite                             152 static
+ioctl.c:cluster_pages_for_defrag                       200 static
-ioctl.c:btrfs_defrag_file                              312 static
+ioctl.c:btrfs_defrag_file                              232 static
-qgroup.c:btrfs_qgroup_reserve_data                     136 static
+qgroup.c:btrfs_qgroup_reserve_data                     128 static
-relocation.c:prealloc_file_extent_cluster              152 static
+relocation.c:prealloc_file_extent_cluster              192 static

These are generic functions, so this will affect non-qgroup workloads as
well. So this needs either a dynamic allocation (which would add another
point of failure) or a rework in another way.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
  2017-03-13  7:52 ` [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
@ 2017-04-07 12:05   ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-04-07 12:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 13, 2017 at 03:52:16PM +0800, Qu Wenruo wrote:
> +		/*
> +		 * TODO: Also modify reserved->ranges_reserved to reflect

No new TODOs in the code please.

> +		 * the modification.
> +		 *
> +		 * However as long as we free qgroup reserved according to
> +		 * EXTENT_QGROUP_RESERVED, we won't double free.
> +		 * So there is no need to rush.
> +		 */
> +		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
> +				free_start, free_start + free_len - 1,
> +				EXTENT_QGROUP_RESERVED, &changeset);
> +		if (ret < 0)
> +			goto out;
> +		freed += changeset.bytes_changed;
> +	}
> +	btrfs_qgroup_free_refroot(root->fs_info, root->objectid, freed);
> +	ret = freed;
> +out:
> +	extent_changeset_release(&changeset);
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-04-07 12:00   ` David Sterba
@ 2017-04-10  1:25     ` Qu Wenruo
  2017-04-10 14:00       ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-04-10  1:25 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 04/07/2017 08:00 PM, David Sterba wrote:
> On Mon, Mar 13, 2017 at 03:52:15PM +0800, Qu Wenruo wrote:
>> @@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
>>  	struct btrfs_root *root = fs_info->tree_root;
>>  	struct inode *inode = NULL;
>> +	struct extent_changeset data_reserved;
>
> Size of this structure is 40 bytes, and it's being added to many
> functions. This will be noticeable on the stack consumption.
>
> -extent-tree.c:btrfs_check_data_free_space              40 static
> -extent-tree.c:cache_save_setup                         96 static
> +extent-tree.c:btrfs_check_data_free_space              48 static
> +extent-tree.c:cache_save_setup                         136 static
> -file.c:__btrfs_buffered_write                          192 static
> +file.c:__btrfs_buffered_write                          232 static
> -file.c:btrfs_fallocate                                 208 static
> +file.c:btrfs_fallocate                                 248 static
> -inode-map.c:btrfs_save_ino_cache                       112 static
> +inode-map.c:btrfs_save_ino_cache                       152 static
> -inode.c:btrfs_direct_IO                                128 static
> +inode.c:btrfs_direct_IO                                176 static
> -inode.c:btrfs_writepage_fixup_worker                   88 static
> +inode.c:btrfs_writepage_fixup_worker                   128 static
> -inode.c:btrfs_truncate_block                           136 static
> +inode.c:btrfs_truncate_block                           176 static
> -inode.c:btrfs_page_mkwrite                             112 static
> +inode.c:btrfs_page_mkwrite                             152 static
> +ioctl.c:cluster_pages_for_defrag                       200 static
> -ioctl.c:btrfs_defrag_file                              312 static
> +ioctl.c:btrfs_defrag_file                              232 static
> -qgroup.c:btrfs_qgroup_reserve_data                     136 static
> +qgroup.c:btrfs_qgroup_reserve_data                     128 static
> -relocation.c:prealloc_file_extent_cluster              152 static
> +relocation.c:prealloc_file_extent_cluster              192 static
>
> There are generic functions so this will affect non-qgroup workloads as
> well. So there need to be a dynamic allocation (which would add another
> point of failure), or reworked in another way.

Well, I'll try to rework this to allocate the extent_changeset inside
btrfs_qgroup_reserve_data(), so that callers only need an extra 8 bytes
of stack space and no extra error handlers need to be introduced.

Thanks,
Qu




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-04-10  1:25     ` Qu Wenruo
@ 2017-04-10 14:00       ` David Sterba
  2017-04-10 14:14         ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2017-04-10 14:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Apr 10, 2017 at 09:25:18AM +0800, Qu Wenruo wrote:
> 
> 
> At 04/07/2017 08:00 PM, David Sterba wrote:
> > On Mon, Mar 13, 2017 at 03:52:15PM +0800, Qu Wenruo wrote:
> >> @@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
> >>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
> >>  	struct btrfs_root *root = fs_info->tree_root;
> >>  	struct inode *inode = NULL;
> >> +	struct extent_changeset data_reserved;
> >
> > Size of this structure is 40 bytes, and it's being added to many
> > functions. This will be noticeable on the stack consumption.
> >
> > -extent-tree.c:btrfs_check_data_free_space              40 static
> > -extent-tree.c:cache_save_setup                         96 static
> > +extent-tree.c:btrfs_check_data_free_space              48 static
> > +extent-tree.c:cache_save_setup                         136 static
> > -file.c:__btrfs_buffered_write                          192 static
> > +file.c:__btrfs_buffered_write                          232 static
> > -file.c:btrfs_fallocate                                 208 static
> > +file.c:btrfs_fallocate                                 248 static
> > -inode-map.c:btrfs_save_ino_cache                       112 static
> > +inode-map.c:btrfs_save_ino_cache                       152 static
> > -inode.c:btrfs_direct_IO                                128 static
> > +inode.c:btrfs_direct_IO                                176 static
> > -inode.c:btrfs_writepage_fixup_worker                   88 static
> > +inode.c:btrfs_writepage_fixup_worker                   128 static
> > -inode.c:btrfs_truncate_block                           136 static
> > +inode.c:btrfs_truncate_block                           176 static
> > -inode.c:btrfs_page_mkwrite                             112 static
> > +inode.c:btrfs_page_mkwrite                             152 static
> > +ioctl.c:cluster_pages_for_defrag                       200 static
> > -ioctl.c:btrfs_defrag_file                              312 static
> > +ioctl.c:btrfs_defrag_file                              232 static
> > -qgroup.c:btrfs_qgroup_reserve_data                     136 static
> > +qgroup.c:btrfs_qgroup_reserve_data                     128 static
> > -relocation.c:prealloc_file_extent_cluster              152 static
> > +relocation.c:prealloc_file_extent_cluster              192 static
> >
> > There are generic functions so this will affect non-qgroup workloads as
> > well. So there need to be a dynamic allocation (which would add another
> > point of failure), or reworked in another way.
> 
> Well, I'll try to rework this to allocate extent_changeset inside 
> btrfs_qgroup_reserve_data(), so that callers only need extra 8 bytes 
> stack space, and no need to introduce extra error handlers.

So this still requires dynamic allocation on various call paths, which
is what I object to most at the moment.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-04-10 14:00       ` David Sterba
@ 2017-04-10 14:14         ` David Sterba
  2017-04-11  1:44           ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2017-04-10 14:14 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On Mon, Apr 10, 2017 at 04:00:25PM +0200, David Sterba wrote:
> On Mon, Apr 10, 2017 at 09:25:18AM +0800, Qu Wenruo wrote:
> > 
> > 
> > At 04/07/2017 08:00 PM, David Sterba wrote:
> > > On Mon, Mar 13, 2017 at 03:52:15PM +0800, Qu Wenruo wrote:
> > >> @@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
> > >>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
> > >>  	struct btrfs_root *root = fs_info->tree_root;
> > >>  	struct inode *inode = NULL;
> > >> +	struct extent_changeset data_reserved;
> > >
> > > Size of this structure is 40 bytes, and it's being added to many
> > > functions. This will be noticeable on the stack consumption.
> > >
> > > -extent-tree.c:btrfs_check_data_free_space              40 static
> > > -extent-tree.c:cache_save_setup                         96 static
> > > +extent-tree.c:btrfs_check_data_free_space              48 static
> > > +extent-tree.c:cache_save_setup                         136 static
> > > -file.c:__btrfs_buffered_write                          192 static
> > > +file.c:__btrfs_buffered_write                          232 static
> > > -file.c:btrfs_fallocate                                 208 static
> > > +file.c:btrfs_fallocate                                 248 static
> > > -inode-map.c:btrfs_save_ino_cache                       112 static
> > > +inode-map.c:btrfs_save_ino_cache                       152 static
> > > -inode.c:btrfs_direct_IO                                128 static
> > > +inode.c:btrfs_direct_IO                                176 static
> > > -inode.c:btrfs_writepage_fixup_worker                   88 static
> > > +inode.c:btrfs_writepage_fixup_worker                   128 static
> > > -inode.c:btrfs_truncate_block                           136 static
> > > +inode.c:btrfs_truncate_block                           176 static
> > > -inode.c:btrfs_page_mkwrite                             112 static
> > > +inode.c:btrfs_page_mkwrite                             152 static
> > > +ioctl.c:cluster_pages_for_defrag                       200 static
> > > -ioctl.c:btrfs_defrag_file                              312 static
> > > +ioctl.c:btrfs_defrag_file                              232 static
> > > -qgroup.c:btrfs_qgroup_reserve_data                     136 static
> > > +qgroup.c:btrfs_qgroup_reserve_data                     128 static
> > > -relocation.c:prealloc_file_extent_cluster              152 static
> > > +relocation.c:prealloc_file_extent_cluster              192 static
> > >
> > > There are generic functions so this will affect non-qgroup workloads as
> > > well. So there need to be a dynamic allocation (which would add another
> > > point of failure), or reworked in another way.
> > 
> > Well, I'll try to rework this to allocate extent_changeset inside 
> > btrfs_qgroup_reserve_data(), so that callers only need extra 8 bytes 
> > stack space, and no need to introduce extra error handlers.
> 
> So this still requires the dynamic allocation, on various call paths,
> that's what I object against most at the moment.

I get that we cannot easily avoid using the extent_changeset, so we'll
end up with it one way or another; the stack conservation has a slight
preference.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-04-10 14:14         ` David Sterba
@ 2017-04-11  1:44           ` Qu Wenruo
  2017-05-05 17:24             ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2017-04-11  1:44 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 04/10/2017 10:14 PM, David Sterba wrote:
> On Mon, Apr 10, 2017 at 04:00:25PM +0200, David Sterba wrote:
>> On Mon, Apr 10, 2017 at 09:25:18AM +0800, Qu Wenruo wrote:
>>>
>>>
>>> At 04/07/2017 08:00 PM, David Sterba wrote:
>>>> On Mon, Mar 13, 2017 at 03:52:15PM +0800, Qu Wenruo wrote:
>>>>> @@ -3355,12 +3355,14 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>>>>>   	struct btrfs_fs_info *fs_info = block_group->fs_info;
>>>>>   	struct btrfs_root *root = fs_info->tree_root;
>>>>>   	struct inode *inode = NULL;
>>>>> +	struct extent_changeset data_reserved;
>>>>
>>>> Size of this structure is 40 bytes, and it's being added to many
>>>> functions. This will be noticeable on the stack consumption.
>>>>
>>>> -extent-tree.c:btrfs_check_data_free_space              40 static
>>>> -extent-tree.c:cache_save_setup                         96 static
>>>> +extent-tree.c:btrfs_check_data_free_space              48 static
>>>> +extent-tree.c:cache_save_setup                         136 static
>>>> -file.c:__btrfs_buffered_write                          192 static
>>>> +file.c:__btrfs_buffered_write                          232 static
>>>> -file.c:btrfs_fallocate                                 208 static
>>>> +file.c:btrfs_fallocate                                 248 static
>>>> -inode-map.c:btrfs_save_ino_cache                       112 static
>>>> +inode-map.c:btrfs_save_ino_cache                       152 static
>>>> -inode.c:btrfs_direct_IO                                128 static
>>>> +inode.c:btrfs_direct_IO                                176 static
>>>> -inode.c:btrfs_writepage_fixup_worker                   88 static
>>>> +inode.c:btrfs_writepage_fixup_worker                   128 static
>>>> -inode.c:btrfs_truncate_block                           136 static
>>>> +inode.c:btrfs_truncate_block                           176 static
>>>> -inode.c:btrfs_page_mkwrite                             112 static
>>>> +inode.c:btrfs_page_mkwrite                             152 static
>>>> +ioctl.c:cluster_pages_for_defrag                       200 static
>>>> -ioctl.c:btrfs_defrag_file                              312 static
>>>> +ioctl.c:btrfs_defrag_file                              232 static
>>>> -qgroup.c:btrfs_qgroup_reserve_data                     136 static
>>>> +qgroup.c:btrfs_qgroup_reserve_data                     128 static
>>>> -relocation.c:prealloc_file_extent_cluster              152 static
>>>> +relocation.c:prealloc_file_extent_cluster              192 static
>>>>
>>>> There are generic functions so this will affect non-qgroup workloads as
>>>> well. So there need to be a dynamic allocation (which would add another
>>>> point of failure), or reworked in another way.
>>>
>>> Well, I'll try to rework this to allocate extent_changeset inside
>>> btrfs_qgroup_reserve_data(), so that callers only need extra 8 bytes
>>> stack space, and no need to introduce extra error handlers.
>>
>> So this still requires the dynamic allocation, on various call paths,
>> that's what I object against most at the moment.
> 
> I get that we cannot easily avoid using the extent_changeset, so we'll
> end up one way or another, the stack conservation has slight preference.

Yes, I understand dynamic allocation can complicate the error handling.

But doing the allocation inside btrfs_qgroup_reserve_data() could reduce
the impact, and has less impact on the qgroup-disabled case.
(So btrfs_qgroup_reserve_data() would accept a struct extent_changeset **
rather than a struct extent_changeset *.)

The amount of code to be modified should be about the same as in the
current patch, mostly changing extent_changeset_release() to
extent_changeset_free().

So overall the impact is not that huge, unless the current error
handling already has some problems.

Personally speaking I'm OK either way.

Do I need to build a temporary branch/patchset for your evaluation?

Thanks,
Qu





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-04-11  1:44           ` Qu Wenruo
@ 2017-05-05 17:24             ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2017-05-05 17:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Tue, Apr 11, 2017 at 09:44:03AM +0800, Qu Wenruo wrote:
> > I get that we cannot easily avoid using the extent_changeset, so we'll
> > end up one way or another, the stack conservation has slight preference.
> 
> Yes, I understand that dynamic allocation can complicate the error
> handling.
> 
> But doing the allocation inside btrfs_qgroup_reserve_data() could reduce
> the impact, and it has less impact on the qgroup-disabled case.
> (So btrfs_qgroup_reserve_data() would accept struct extent_changeset **
> rather than struct extent_changeset *.)
> 
> The amount of code to modify should be about the same as in the current
> patch, mostly changing extent_changeset_release() to
> extent_changeset_free().
> 
> 
> So overall the impact is not that huge, unless current error handlers 
> already have some problems.
> 
> Personally speaking I'm OK either way.
> 
> Do I need to build a temporary branch/patchset for your evaluation?

Yes please.


end of thread

Thread overview: 21+ messages
2017-03-13  7:52 [PATCH v2 0/9] Qgroup fixes Qu Wenruo
2017-03-13  7:52 ` [PATCH v2 1/9] btrfs: qgroup: Add trace point for qgroup reserved space Qu Wenruo
2017-03-30 12:49   ` David Sterba
2017-03-13  7:52 ` [PATCH v2 2/9] btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint Qu Wenruo
2017-03-30 13:11   ` David Sterba
2017-03-13  7:52 ` [PATCH v2 3/9] btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option Qu Wenruo
2017-04-07 11:05   ` David Sterba
2017-03-13  7:52 ` [PATCH v2 4/9] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
2017-03-13  7:52 ` [PATCH v2 5/9] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
2017-03-13  7:52 ` [PATCH v2 6/9] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
2017-03-30 13:08   ` David Sterba
2017-03-13  7:52 ` [PATCH v2 7/9] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
2017-03-13  7:52 ` [PATCH v2 8/9] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
2017-04-07 12:00   ` David Sterba
2017-04-10  1:25     ` Qu Wenruo
2017-04-10 14:00       ` David Sterba
2017-04-10 14:14         ` David Sterba
2017-04-11  1:44           ` Qu Wenruo
2017-05-05 17:24             ` David Sterba
2017-03-13  7:52 ` [PATCH v2 9/9] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
2017-04-07 12:05   ` David Sterba
