* [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version
@ 2017-05-17  2:56 Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 1/6] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

The remaining qgroup fix patches, based on Chris' for-linus-4.12
branch with commit 9bcaaea7418d09691f1ffab5c49aacafe3eef9d0 as the base.

Can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_fixes_non_stack

This update only changes commit messages.

v2:
  Add reviewed-by tag for 2nd patch
  Update the first patch to follow the new trace point standard
RFC v3:
  Use non-stack (dynamic allocation) for the extent_changeset structure
  in the 5th patch, to reduce the impact on quota-disabled cases.
  Rebase onto the latest for-linus-4.12 branch.
RFC v3.1:
  Update the comment to include the newly introduced parameter.
  Use init/release functions to replace open-coded ulist_init()/ulist_release().
RFC v3.2:
  Update commit message of 1st and 3rd patch.

Qu Wenruo (6):
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
    data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
    write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
    reserved ranges

 fs/btrfs/ctree.h       |  12 ++-
 fs/btrfs/extent-tree.c |  37 +++++----
 fs/btrfs/extent_io.h   |  36 ++++++++-
 fs/btrfs/file.c        |  41 ++++++----
 fs/btrfs/inode-map.c   |   4 +-
 fs/btrfs/inode.c       |  58 ++++++++-----
 fs/btrfs/ioctl.c       |   9 ++-
 fs/btrfs/qgroup.c      | 215 ++++++++++++++++++++++++++++++++++++-------------
 fs/btrfs/qgroup.h      |   8 +-
 fs/btrfs/relocation.c  |  12 +--
 fs/btrfs/transaction.c |  10 ---
 11 files changed, 303 insertions(+), 139 deletions(-)

-- 
2.13.0




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH v3.2 1/6] btrfs: qgroup: Add quick exit for non-fs extents
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 2/6] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.

The quick exit conditions are:

1) The extent belongs to a non-fs tree
   Only fs-tree extents can affect qgroup numbers, and fs trees are the
   only case where an extent can be shared between different trees.

   Strictly speaking, an extent in the data-reloc or tree-reloc tree
   can also be shared, but data/tree-reloc roots won't appear in the
   result of btrfs_find_all_roots(), so we can ignore that corner case.

   So we can just check the first root in the old_roots/new_roots ulist:
   if the first root is not a fs/subvol root, we can skip the extent;
   if it is a fs/subvol root, we must continue the calculation.

OR

2) Both 'nr_old_roots' and 'nr_new_roots' are 0
   This means the extent either got allocated and then freed in the
   current transaction, or it is a new reloc tree extent, whose
   nr_new_roots is 0.
   Either way it won't affect qgroup accounting and can be skipped
   safely.

This quick exit also makes the trace output quieter and less confusing:
(example with fs uuid and time stamp removed)

Before:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
------
An extent tree block triggers the btrfs_qgroup_account_extent() trace
point even though no qgroup number changes, since the extent tree
doesn't affect qgroup accounting.

After:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
------
Now such unrelated extents no longer trigger the
btrfs_qgroup_account_extent() trace point, making the trace less noisy.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 3f75b5cbbfef..905fed1ee0dd 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1915,6 +1915,33 @@ static int qgroup_update_counters(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+/*
+ * Helper to check if @roots is a ulist of fs tree roots.
+ * Return 0 if it is definitely not a fs/subvol tree roots ulist.
+ * Return 1 if it may be a fs/subvol tree roots ulist (including empty).
+ */
+static int maybe_fs_roots(struct ulist *roots)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+
+	/* Empty one, still possible for fs roots */
+	if (!roots || roots->nnodes == 0)
+		return 1;
+
+	ULIST_ITER_INIT(&uiter);
+	unode = ulist_next(roots, &uiter);
+	if (!unode)
+		return 1;
+
+	/*
+	 * If it contains a fs tree root, the extent must belong to
+	 * fs/subvol trees. If it contains a non-fs tree root, it won't
+	 * be shared with fs/subvol trees.
+	 */
+	return is_fstree(unode->val);
+}
+
 int
 btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
 			    struct btrfs_fs_info *fs_info,
@@ -1931,10 +1958,20 @@ btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans,
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
 		return 0;
 
-	if (new_roots)
+	if (new_roots) {
+		if (!maybe_fs_roots(new_roots))
+			goto out_free;
 		nr_new_roots = new_roots->nnodes;
-	if (old_roots)
+	}
+	if (old_roots) {
+		if (!maybe_fs_roots(old_roots))
+			goto out_free;
 		nr_old_roots = old_roots->nnodes;
+	}
+
+	/* Quick exit, either not fs tree roots, or won't affect any qgroup */
+	if (nr_old_roots == 0 && nr_new_roots == 0)
+		goto out_free;
 
 	BUG_ON(!fs_info->quota_root);
 
-- 
2.13.0





* [RFC PATCH v3.2 2/6] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 1/6] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 3/6] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Quite a lot of qgroup corruption happens due to calling
btrfs_qgroup_prepare_account_extents() at the wrong time.

Since the safest time to call it is just before
btrfs_qgroup_account_extents(), there is no need to keep these two
functions separate.

Merging them makes the code cleaner and less bug-prone.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/qgroup.c      | 50 +++++++++++++++++---------------------------------
 fs/btrfs/qgroup.h      |  2 --
 fs/btrfs/transaction.c | 10 ----------
 3 files changed, 17 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 905fed1ee0dd..9f01c25469f7 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1403,38 +1403,6 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_qgroup_extent_record *record;
-	struct btrfs_delayed_ref_root *delayed_refs;
-	struct rb_node *node;
-	u64 qgroup_to_skip;
-	int ret = 0;
-
-	delayed_refs = &trans->transaction->delayed_refs;
-	qgroup_to_skip = delayed_refs->qgroup_to_skip;
-
-	/*
-	 * No need to do lock, since this function will only be called in
-	 * btrfs_commit_transaction().
-	 */
-	node = rb_first(&delayed_refs->dirty_extent_root);
-	while (node) {
-		record = rb_entry(node, struct btrfs_qgroup_extent_record,
-				  node);
-		if (WARN_ON(!record->old_roots))
-			ret = btrfs_find_all_roots(NULL, fs_info,
-					record->bytenr, 0, &record->old_roots);
-		if (ret < 0)
-			break;
-		if (qgroup_to_skip)
-			ulist_del(record->old_roots, qgroup_to_skip, 0);
-		node = rb_next(node);
-	}
-	return ret;
-}
-
 int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
 				struct btrfs_delayed_ref_root *delayed_refs,
 				struct btrfs_qgroup_extent_record *record)
@@ -2051,6 +2019,19 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 
 		if (!ret) {
 			/*
+			 * old roots should be searched when inserting qgroup
+			 * extent record
+			 */
+			if (WARN_ON(!record->old_roots)) {
+				/* Search commit root to find old_roots */
+				ret = btrfs_find_all_roots(NULL, fs_info,
+						record->bytenr, 0,
+						&record->old_roots);
+				if (ret < 0)
+					goto cleanup;
+			}
+
+			/*
 			 * Use SEQ_LAST as time_seq to do special search, which
 			 * doesn't lock tree or delayed_refs and search current
 			 * root. It's safe inside commit_transaction().
@@ -2059,8 +2040,11 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans,
 					record->bytenr, SEQ_LAST, &new_roots);
 			if (ret < 0)
 				goto cleanup;
-			if (qgroup_to_skip)
+			if (qgroup_to_skip) {
 				ulist_del(new_roots, qgroup_to_skip, 0);
+				ulist_del(record->old_roots, qgroup_to_skip,
+					  0);
+			}
 			ret = btrfs_qgroup_account_extent(trans, fs_info,
 					record->bytenr, record->num_bytes,
 					record->old_roots, new_roots);
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index fe04d3f295c6..38d14d4575c0 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -134,8 +134,6 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
 int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
 void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
 struct btrfs_delayed_extent_op;
-int btrfs_qgroup_prepare_account_extents(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info);
 /*
  * Inform qgroup to trace one dirty extent, its info is recorded in @record.
  * So qgroup can account it at transaction committing time.
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2168654c90a1..ee5b41d297d1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1374,9 +1374,6 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	ret = commit_fs_roots(trans, fs_info);
 	if (ret)
 		goto out;
-	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
-	if (ret < 0)
-		goto out;
 	ret = btrfs_qgroup_account_extents(trans, fs_info);
 	if (ret < 0)
 		goto out;
@@ -2180,13 +2177,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
-	ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
-	if (ret) {
-		mutex_unlock(&fs_info->tree_log_mutex);
-		mutex_unlock(&fs_info->reloc_mutex);
-		goto scrub_continue;
-	}
-
 	/*
 	 * Since fs roots are all committed, we can get a quite accurate
 	 * new_roots. So let's do quota accounting.
-- 
2.13.0





* [RFC PATCH v3.2 3/6] btrfs: qgroup: Return actually freed bytes for qgroup release or free data
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 1/6] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 2/6] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 4/6] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

btrfs_qgroup_release/free_data() only return 0 or a negative error
number (ENOMEM is the only possible error).

This is normally good enough, but sometimes we need the exact number
of bytes that was freed/released.

Change them to return the number of bytes actually released/freed
instead of 0 on success.
Also slightly modify the related extent_changeset structure: since in
btrfs a single non-hole data extent won't be larger than 128M,
"unsigned int" is large enough for this use case.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/extent_io.h   | 2 +-
 fs/btrfs/qgroup.c      | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e390451c72e6..4f62696131a6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4298,7 +4298,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
 	ret = btrfs_qgroup_reserve_data(inode, start, len);
-	if (ret)
+	if (ret < 0)
 		btrfs_free_reserved_data_space_noquota(inode, start, len);
 	return ret;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 1eafa2f0ede3..cc1b08fa9fe7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -205,7 +205,7 @@ struct extent_buffer {
  */
 struct extent_changeset {
 	/* How many bytes are set/cleared in this operation */
-	u64 bytes_changed;
+	unsigned int bytes_changed;
 
 	/* Changed ranges */
 	struct ulist range_changed;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9f01c25469f7..ad2e99491395 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2886,6 +2886,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 		btrfs_qgroup_free_refroot(BTRFS_I(inode)->root->fs_info,
 				BTRFS_I(inode)->root->objectid,
 				changeset.bytes_changed);
+	ret = changeset.bytes_changed;
 out:
 	ulist_release(&changeset.range_changed);
 	return ret;
-- 
2.13.0





* [RFC PATCH v3.2 4/6] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-05-17  2:56 ` [RFC PATCH v3.2 3/6] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-05-17  2:56 ` [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

[BUG]
Under the following case, we can underflow qgroup reserved space.

            Task A                |            Task B
---------------------------------------------------------------
 Quota disabled                   |
 Buffered write                   |
 |- btrfs_check_data_free_space() |
 |  *NO* qgroup space is reserved |
 |  since quota is *DISABLED*     |
 |- All pages are copied to page  |
    cache                         |
                                  | Enable quota
                                  | Quota scan finished
                                  |
                                  | Sync_fs
                                  | |- run_delalloc_range
                                  | |- Write pages
                                  | |- btrfs_finish_ordered_io
                                  |    |- insert_reserved_file_extent
                                  |       |- btrfs_qgroup_release_data()
                                  |          Since no qgroup space is
                                  |          reserved in Task A, we
                                  |          underflow qgroup reserved
                                  |          space
This can be detected by fstests btrfs/104.

[CAUSE]
In insert_reserved_file_extent() we inform qgroup to release the
@ram_bytes size of qgroup reserved space in all cases, and
btrfs_qgroup_release_data() only checks whether qgroup is currently
enabled.

However in the above case the buffered write happened before quota was
enabled, so we don't have reserved space for that range.

[FIX]
In insert_reserved_file_extent(), inform qgroup to release the actual
number of bytes it released.
In the above case, since we don't have reserved space, we inform
qgroup to release 0 bytes, so the problem is fixed.

Thanks to the @reserved parameter introduced by the qgroup rework, and
the previous patch making release return the freed bytes, the fix ends
up smaller than 10 lines.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17cbe9306faf..a1294d5baef5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2143,6 +2143,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct extent_buffer *leaf;
 	struct btrfs_key ins;
+	u64 qg_released;
 	int extent_inserted = 0;
 	int ret;
 
@@ -2198,13 +2199,17 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	ins.objectid = disk_bytenr;
 	ins.offset = disk_num_bytes;
 	ins.type = BTRFS_EXTENT_ITEM_KEY;
-	ret = btrfs_alloc_reserved_file_extent(trans, root->root_key.objectid,
-			btrfs_ino(BTRFS_I(inode)), file_pos, ram_bytes, &ins);
+
 	/*
 	 * Release the reserved range from inode dirty range map, as it is
 	 * already moved into delayed_ref_head
 	 */
-	btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	ret = btrfs_qgroup_release_data(inode, file_pos, ram_bytes);
+	if (ret < 0)
+		goto out;
+	qg_released = ret;
+	ret = btrfs_alloc_reserved_file_extent(trans, root->root_key.objectid,
+			btrfs_ino(BTRFS_I(inode)), file_pos, qg_released, &ins);
 out:
 	btrfs_free_path(path);
 
-- 
2.13.0





* [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-05-17  2:56 ` [RFC PATCH v3.2 4/6] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-05-17 15:37   ` David Sterba
  2017-05-17  2:56 ` [RFC PATCH v3.2 6/6] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
  2017-06-21 19:09 ` [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version David Sterba
  6 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

Introduce a new parameter, struct extent_changeset, for
btrfs_qgroup_reserve_data() and its callers.

The extent_changeset is used in btrfs_qgroup_reserve_data() to record
which ranges the current call actually reserved, so they can be freed
in the error path.

The reason we need to expose it to callers is that, in the buffered
write error path, without knowing exactly which ranges the current
allocation reserved, we could free space that was not reserved by us.

That would lead to qgroup reserved space underflow.

Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  6 ++++--
 fs/btrfs/extent-tree.c | 23 +++++++++++++----------
 fs/btrfs/extent_io.h   | 34 +++++++++++++++++++++++++++++++++
 fs/btrfs/file.c        | 12 +++++++++---
 fs/btrfs/inode-map.c   |  4 +++-
 fs/btrfs/inode.c       | 18 ++++++++++++++----
 fs/btrfs/ioctl.c       |  5 ++++-
 fs/btrfs/qgroup.c      | 51 ++++++++++++++++++++++++++++++++------------------
 fs/btrfs/qgroup.h      |  3 ++-
 fs/btrfs/relocation.c  |  4 +++-
 10 files changed, 119 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1e82516fe2d8..52a0147cd612 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2704,8 +2704,9 @@ enum btrfs_flush_state {
 	COMMIT_TRANS		=	6,
 };
 
-int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
+int btrfs_check_data_free_space(struct inode *inode,
+			struct extent_changeset **reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
@@ -2723,7 +2724,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_rsv *rsv);
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_space(struct inode *inode,
+			struct extent_changeset **reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4f62696131a6..ef09cc37f25f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3364,6 +3364,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	struct btrfs_fs_info *fs_info = block_group->fs_info;
 	struct btrfs_root *root = fs_info->tree_root;
 	struct inode *inode = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	u64 alloc_hint = 0;
 	int dcs = BTRFS_DC_ERROR;
 	u64 num_pages = 0;
@@ -3483,7 +3484,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	num_pages *= 16;
 	num_pages *= PAGE_SIZE;
 
-	ret = btrfs_check_data_free_space(inode, 0, num_pages);
+	ret = btrfs_check_data_free_space(inode, &data_reserved, 0, num_pages);
 	if (ret)
 		goto out_put;
 
@@ -3514,6 +3515,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 	block_group->disk_cache_state = dcs;
 	spin_unlock(&block_group->lock);
 
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
@@ -4277,12 +4279,8 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 	return ret;
 }
 
-/*
- * New check_data_free_space() with ability for precious data reservation
- * Will replace old btrfs_check_data_free_space(), but for patch split,
- * add a new function first and then replace it.
- */
-int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
+int btrfs_check_data_free_space(struct inode *inode,
+			struct extent_changeset **reserved, u64 start, u64 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	int ret;
@@ -4297,9 +4295,11 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
 		return ret;
 
 	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
-	ret = btrfs_qgroup_reserve_data(inode, start, len);
+	ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
 	if (ret < 0)
 		btrfs_free_reserved_data_space_noquota(inode, start, len);
+	else
+		ret = 0;
 	return ret;
 }
 
@@ -6123,6 +6123,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * @inode: inode we're writing to
  * @start: start range we are writing to
  * @len: how long the range we are writing to
+ * @reserved: mandatory parameter, records the qgroup ranges actually
+ * 	      reserved by the current reservation.
  *
  * This will do the following things
  *
@@ -6140,11 +6142,12 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * Return 0 for success
  * Return <0 for error(-ENOSPC or -EQUOT)
  */
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
+int btrfs_delalloc_reserve_space(struct inode *inode,
+			struct extent_changeset **reserved, u64 start, u64 len)
 {
 	int ret;
 
-	ret = btrfs_check_data_free_space(inode, start, len);
+	ret = btrfs_check_data_free_space(inode, reserved, start, len);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index cc1b08fa9fe7..afb3553f4f78 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -211,6 +211,40 @@ struct extent_changeset {
 	struct ulist range_changed;
 };
 
+static inline void extent_changeset_init(struct extent_changeset *changeset)
+{
+	changeset->bytes_changed = 0;
+	ulist_init(&changeset->range_changed);
+}
+
+static inline struct extent_changeset *extent_changeset_alloc(void)
+{
+	struct extent_changeset *ret;
+
+	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
+	if (!ret)
+		return NULL;
+
+	extent_changeset_init(ret);
+	return ret;
+}
+
+static inline void extent_changeset_release(struct extent_changeset *changeset)
+{
+	if (!changeset)
+		return;
+	changeset->bytes_changed = 0;
+	ulist_release(&changeset->range_changed);
+}
+
+static inline void extent_changeset_free(struct extent_changeset *changeset)
+{
+	if (!changeset)
+		return;
+	extent_changeset_release(changeset);
+	kfree(changeset);
+}
+
 static inline void extent_set_compress_type(unsigned long *bio_flags,
 					    int compress_type)
 {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index da1096eb1a40..d31c935b36ed 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1581,6 +1581,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct page **pages = NULL;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	u64 release_bytes = 0;
 	u64 lockstart;
 	u64 lockend;
@@ -1628,7 +1629,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		reserve_bytes = round_up(write_bytes + sector_offset,
 				fs_info->sectorsize);
 
-		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
+		extent_changeset_release(data_reserved);
+		ret = btrfs_check_data_free_space(inode, &data_reserved, pos,
+						  write_bytes);
 		if (ret < 0) {
 			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 						      BTRFS_INODE_PREALLOC)) &&
@@ -1802,6 +1805,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		}
 	}
 
+	extent_changeset_free(data_reserved);
 	return num_written ? num_written : ret;
 }
 
@@ -2769,6 +2773,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	struct falloc_range *range;
 	struct falloc_range *tmp;
 	struct list_head reserve_list;
@@ -2898,8 +2903,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 				free_extent_map(em);
 				break;
 			}
-			ret = btrfs_qgroup_reserve_data(inode, cur_offset,
-					last_byte - cur_offset);
+			ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
+					cur_offset, last_byte - cur_offset);
 			if (ret < 0) {
 				free_extent_map(em);
 				break;
@@ -2971,6 +2976,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	if (ret != 0)
 		btrfs_free_reserved_data_space(inode, alloc_start,
 				       alloc_end - cur_offset);
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 5c6c20ec64d8..d02019747d00 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -400,6 +400,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	struct btrfs_path *path;
 	struct inode *inode;
 	struct btrfs_block_rsv *rsv;
+	struct extent_changeset *data_reserved = NULL;
 	u64 num_bytes;
 	u64 alloc_hint = 0;
 	int ret;
@@ -492,7 +493,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	/* Just to make sure we have enough space */
 	prealloc += 8 * PAGE_SIZE;
 
-	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, 0, prealloc);
 	if (ret)
 		goto out_put;
 
@@ -516,6 +517,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	trans->bytes_reserved = num_bytes;
 
 	btrfs_free_path(path);
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a1294d5baef5..4e211b54b267 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2035,6 +2035,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	struct btrfs_writepage_fixup *fixup;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	struct page *page;
 	struct inode *inode;
 	u64 page_start;
@@ -2072,7 +2073,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		goto again;
 	}
 
-	ret = btrfs_delalloc_reserve_space(inode, page_start,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					   PAGE_SIZE);
 	if (ret) {
 		mapping_set_error(page->mapping, ret);
@@ -2092,6 +2093,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 	unlock_page(page);
 	put_page(page);
 	kfree(fixup);
+	extent_changeset_free(data_reserved);
 }
 
 /*
@@ -4767,6 +4769,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	char *kaddr;
 	u32 blocksize = fs_info->sectorsize;
 	pgoff_t index = from >> PAGE_SHIFT;
@@ -4781,7 +4784,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	    (!len || ((len & (blocksize - 1)) == 0)))
 		goto out;
 
-	ret = btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
 			round_down(from, blocksize), blocksize);
 	if (ret)
 		goto out;
@@ -4866,6 +4869,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	unlock_page(page);
 	put_page(page);
 out:
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
@@ -8725,6 +8729,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct inode *inode = file->f_mapping->host;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_data dio_data = { 0 };
+	struct extent_changeset *data_reserved = NULL;
 	loff_t offset = iocb->ki_pos;
 	size_t count = 0;
 	int flags = 0;
@@ -8761,7 +8766,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 			inode_unlock(inode);
 			relock = true;
 		}
-		ret = btrfs_delalloc_reserve_space(inode, offset, count);
+		ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
+						   offset, count);
 		if (ret)
 			goto out;
 		dio_data.outstanding_extents = count_max_extents(count);
@@ -8818,6 +8824,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (relock)
 		inode_lock(inode);
 
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
@@ -9050,6 +9057,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
+	struct extent_changeset *data_reserved = NULL;
 	char *kaddr;
 	unsigned long zero_start;
 	loff_t size;
@@ -9075,7 +9083,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	 * end up waiting indefinitely to get a lock on the page currently
 	 * being processed by btrfs_page_mkwrite() function.
 	 */
-	ret = btrfs_delalloc_reserve_space(inode, page_start,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					   reserved_space);
 	if (!ret) {
 		ret = file_update_time(vmf->vma->vm_file);
@@ -9181,6 +9189,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 out_unlock:
 	if (!ret) {
 		sb_end_pagefault(inode->i_sb);
+		extent_changeset_free(data_reserved);
 		return VM_FAULT_LOCKED;
 	}
 	unlock_page(page);
@@ -9188,6 +9197,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	btrfs_delalloc_release_space(inode, page_start, reserved_space);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a29dc3fd7152..a16a16a66ed9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1127,6 +1127,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	struct btrfs_ordered_extent *ordered;
 	struct extent_state *cached_state = NULL;
 	struct extent_io_tree *tree;
+	struct extent_changeset *data_reserved = NULL;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
 	file_end = (isize - 1) >> PAGE_SHIFT;
@@ -1135,7 +1136,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 
 	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
-	ret = btrfs_delalloc_reserve_space(inode,
+	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
 	if (ret)
@@ -1247,6 +1248,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
+	extent_changeset_free(data_reserved);
 	return i_done;
 out:
 	for (i = 0; i < i_done; i++) {
@@ -1256,6 +1258,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	btrfs_delalloc_release_space(inode,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
+	extent_changeset_free(data_reserved);
 	return ret;
 
 }
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ad2e99491395..6086a648e67e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2824,43 +2824,60 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
  * Return <0 for error (including -EQUOT)
  *
  * NOTE: this function may sleep for memory allocation.
+ *       If btrfs_qgroup_reserve_data() is called multiple times with
+ *       the same @reserved, the caller must ensure that when an error
+ *       happens it is OK to free *ALL* reserved space.
  */
-int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
+int btrfs_qgroup_reserve_data(struct inode *inode,
+			struct extent_changeset **reserved_ret, u64 start,
+			u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct extent_changeset changeset;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
+	struct extent_changeset *reserved;
+	u64 orig_reserved;
+	u64 to_reserve;
 	int ret;
 
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) ||
 	    !is_fstree(root->objectid) || len == 0)
 		return 0;
 
-	changeset.bytes_changed = 0;
-	ulist_init(&changeset.range_changed);
+	/* @reserved parameter is mandatory for qgroup */
+	if (WARN_ON(!reserved_ret))
+		return -EINVAL;
+	if (!*reserved_ret) {
+		*reserved_ret = extent_changeset_alloc();
+		if (!*reserved_ret)
+			return -ENOMEM;
+	}
+	reserved = *reserved_ret;
+	/* Record already reserved space */
+	orig_reserved = reserved->bytes_changed;
 	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
-			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
+			start + len -1, EXTENT_QGROUP_RESERVED, reserved);
+
+	/* Newly reserved space */
+	to_reserve = reserved->bytes_changed - orig_reserved;
 	trace_btrfs_qgroup_reserve_data(inode, start, len,
-					changeset.bytes_changed,
-					QGROUP_RESERVE);
+					to_reserve, QGROUP_RESERVE);
 	if (ret < 0)
 		goto cleanup;
-	ret = qgroup_reserve(root, changeset.bytes_changed, true);
+	ret = qgroup_reserve(root, to_reserve, true);
 	if (ret < 0)
 		goto cleanup;
 
-	ulist_release(&changeset.range_changed);
 	return ret;
 
 cleanup:
-	/* cleanup already reserved ranges */
+	/* cleanup *ALL* already reserved ranges */
 	ULIST_ITER_INIT(&uiter);
-	while ((unode = ulist_next(&changeset.range_changed, &uiter)))
+	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
 				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
 				 GFP_NOFS);
-	ulist_release(&changeset.range_changed);
+	extent_changeset_release(reserved);
 	return ret;
 }
 
@@ -2871,8 +2888,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 	int trace_op = QGROUP_RELEASE;
 	int ret;
 
-	changeset.bytes_changed = 0;
-	ulist_init(&changeset.range_changed);
+	extent_changeset_init(&changeset);
 	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
 			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
 	if (ret < 0)
@@ -2888,7 +2904,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
 				changeset.bytes_changed);
 	ret = changeset.bytes_changed;
 out:
-	ulist_release(&changeset.range_changed);
+	extent_changeset_release(&changeset);
 	return ret;
 }
 
@@ -2988,8 +3004,7 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
 	struct ulist_iterator iter;
 	int ret;
 
-	changeset.bytes_changed = 0;
-	ulist_init(&changeset.range_changed);
+	extent_changeset_init(&changeset);
 	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
 			EXTENT_QGROUP_RESERVED, &changeset);
 
@@ -3006,5 +3021,5 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
 				changeset.bytes_changed);
 
 	}
-	ulist_release(&changeset.range_changed);
+	extent_changeset_release(&changeset);
 }
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 38d14d4575c0..c7a2bcaff5d5 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -241,7 +241,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 #endif
 
 /* New io_tree based accurate qgroup reserve API */
-int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_reserve_data(struct inode *inode,
+			struct extent_changeset **reserved, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d60df51959f7..5823e2d67a84 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3093,11 +3093,12 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
 	u64 cur_offset;
+	struct extent_changeset *data_reserved = NULL;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
 
-	ret = btrfs_check_data_free_space(inode, prealloc_start,
+	ret = btrfs_check_data_free_space(inode, &data_reserved, prealloc_start,
 					  prealloc_end + 1 - prealloc_start);
 	if (ret)
 		goto out;
@@ -3129,6 +3130,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 				       prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
+	extent_changeset_free(data_reserved);
 	return ret;
 }
 
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH v3.2 6/6] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-05-17  2:56 ` [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
@ 2017-05-17  2:56 ` Qu Wenruo
  2017-06-21 19:09 ` [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version David Sterba
  6 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-17  2:56 UTC (permalink / raw)
  To: linux-btrfs, dsterba

[BUG]
In the following case, btrfs can underflow qgroup reserved space
in the error path:
(Page size 4K, function names without the "btrfs_" prefix)

         Task A                  |             Task B
----------------------------------------------------------------------
Buffered_write [0, 2K)           |
|- check_data_free_space()       |
|  |- qgroup_reserve_data()      |
|     Range aligned to page      |
|     range [0, 4K)          <<< |
|     4K bytes reserved      <<< |
|- copy pages to page cache      |
                                 | Buffered_write [2K, 4K)
                                 | |- check_data_free_space()
                                 | |  |- qgroup_reserve_data()
                                 | |     Range aligned to page
                                 | |     range [0, 4K)
                                 | |     Already reserved by A <<<
                                 | |     0 bytes reserved      <<<
                                 | |- delalloc_reserve_metadata()
                                 | |  And it *FAILED* (Maybe EQUOTA)
                                 | |- free_reserved_data_space()
                                      |- qgroup_free_data()
                                         Range aligned to page range
                                         [0, 4K)
                                         Freeing 4K
(Special thanks to Chandan for the detailed report and analysis)

[CAUSE]
In the above case, Task B frees the reserved data range [0, 4K), which
was actually reserved by Task A.

Later, the page dirtied by Task A goes through the writeback routine,
which frees the 4K of reserved data space again at file extent insert
time, causing the qgroup underflow.

[FIX]
Add a @reserved parameter to btrfs_qgroup_free_data() so that it only
frees data ranges recorded by a previous btrfs_qgroup_reserve_data()
call.
In the above case, Task B will then try to free 0 bytes, avoiding the
underflow.

Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
---
 fs/btrfs/ctree.h       |  6 +++--
 fs/btrfs/extent-tree.c | 12 +++++----
 fs/btrfs/file.c        | 29 +++++++++++---------
 fs/btrfs/inode.c       | 29 ++++++++++----------
 fs/btrfs/ioctl.c       |  4 +--
 fs/btrfs/qgroup.c      | 72 ++++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/qgroup.h      |  3 ++-
 fs/btrfs/relocation.c  |  8 +++---
 8 files changed, 117 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 52a0147cd612..75d2eced61b2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2707,7 +2707,10 @@ enum btrfs_flush_state {
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
 int btrfs_check_data_free_space(struct inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
+void btrfs_free_reserved_data_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
+void btrfs_delalloc_release_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -2726,7 +2729,6 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
 					      unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef09cc37f25f..eeeccc8a618e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4340,7 +4340,8 @@ void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
  * This one will handle the per-inode data rsv map for accurate reserved
  * space framework.
  */
-void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
+void btrfs_free_reserved_data_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
@@ -4350,7 +4351,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len)
 	start = round_down(start, root->fs_info->sectorsize);
 
 	btrfs_free_reserved_data_space_noquota(inode, start, len);
-	btrfs_qgroup_free_data(inode, start, len);
+	btrfs_qgroup_free_data(inode, reserved, start, len);
 }
 
 static void force_metadata_allocation(struct btrfs_fs_info *info)
@@ -6152,7 +6153,7 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
 	if (ret < 0)
-		btrfs_free_reserved_data_space(inode, start, len);
+		btrfs_free_reserved_data_space(inode, *reserved, start, len);
 	return ret;
 }
 
@@ -6171,10 +6172,11 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
  * list if there are no delalloc bytes left.
  * Also it will handle the qgroup reserved space.
  */
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len)
+void btrfs_delalloc_release_space(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
-	btrfs_free_reserved_data_space(inode, start, len);
+	btrfs_free_reserved_data_space(inode, reserved, start, len);
 }
 
 static int update_block_group(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d31c935b36ed..0502bd2272fe 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1660,8 +1660,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				reserve_bytes);
 		if (ret) {
 			if (!only_release_metadata)
-				btrfs_free_reserved_data_space(inode, pos,
-							       write_bytes);
+				btrfs_free_reserved_data_space(inode,
+						data_reserved, pos,
+						write_bytes);
 			else
 				btrfs_end_write_no_snapshoting(root);
 			break;
@@ -1743,8 +1744,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				__pos = round_down(pos,
 						   fs_info->sectorsize) +
 					(dirty_pages << PAGE_SHIFT);
-				btrfs_delalloc_release_space(inode, __pos,
-							     release_bytes);
+				btrfs_delalloc_release_space(inode,
+						data_reserved, __pos,
+						release_bytes);
 			}
 		}
 
@@ -1799,9 +1801,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
 					release_bytes);
 		} else {
-			btrfs_delalloc_release_space(inode,
-						round_down(pos, fs_info->sectorsize),
-						release_bytes);
+			btrfs_delalloc_release_space(inode, data_reserved,
+					round_down(pos, fs_info->sectorsize),
+					release_bytes);
 		}
 	}
 
@@ -2915,8 +2917,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 			 * range, free reserved data space first, otherwise
 			 * it'll result in false ENOSPC error.
 			 */
-			btrfs_free_reserved_data_space(inode, cur_offset,
-				last_byte - cur_offset);
+			btrfs_free_reserved_data_space(inode, data_reserved,
+					cur_offset, last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2935,8 +2937,9 @@ static long btrfs_fallocate(struct file *file, int mode,
 					range->len, i_blocksize(inode),
 					offset + len, &alloc_hint);
 		else
-			btrfs_free_reserved_data_space(inode, range->start,
-						       range->len);
+			btrfs_free_reserved_data_space(inode,
+					data_reserved, range->start,
+					range->len);
 		list_del(&range->list);
 		kfree(range);
 	}
@@ -2974,8 +2977,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 	inode_unlock(inode);
 	/* Let go of our reservation. */
 	if (ret != 0)
-		btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - cur_offset);
+		btrfs_free_reserved_data_space(inode, data_reserved,
+				alloc_start, alloc_end - cur_offset);
 	extent_changeset_free(data_reserved);
 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4e211b54b267..c545aaaecb70 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -350,7 +350,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 	 * And at reserve time, it's always aligned to page size, so
 	 * just free one page here.
 	 */
-	btrfs_qgroup_free_data(inode, 0, PAGE_SIZE);
+	btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE);
 	btrfs_free_path(path);
 	btrfs_end_transaction(trans);
 	return ret;
@@ -2933,7 +2933,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		 * space for NOCOW range.
 		 * As NOCOW won't cause a new delayed ref, just free the space
 		 */
-		btrfs_qgroup_free_data(inode, ordered_extent->file_offset,
+		btrfs_qgroup_free_data(inode, NULL, ordered_extent->file_offset,
 				       ordered_extent->len);
 		btrfs_ordered_update_i_size(inode, 0, ordered_extent);
 		if (nolock)
@@ -4792,7 +4792,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 again:
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
-		btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode, data_reserved,
 				round_down(from, blocksize),
 				blocksize);
 		ret = -ENOMEM;
@@ -4864,7 +4864,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 
 out_unlock:
 	if (ret)
-		btrfs_delalloc_release_space(inode, block_start,
+		btrfs_delalloc_release_space(inode, data_reserved, block_start,
 					     blocksize);
 	unlock_page(page);
 	put_page(page);
@@ -5264,7 +5264,7 @@ static void evict_inode_truncate_pages(struct inode *inode)
 		 * Note, end is the bytenr of last byte, so we need + 1 here.
 		 */
 		if (state->state & EXTENT_DELALLOC)
-			btrfs_qgroup_free_data(inode, start, end - start + 1);
+			btrfs_qgroup_free_data(inode, NULL, start, end - start + 1);
 
 		clear_extent_bit(io_tree, start, end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
@@ -8799,8 +8799,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		current->journal_info = NULL;
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
-				btrfs_delalloc_release_space(inode, offset,
-							     dio_data.reserve);
+				btrfs_delalloc_release_space(inode, data_reserved,
+					offset, dio_data.reserve);
 			/*
 			 * On error we might have left some ordered extents
 			 * without submitting corresponding bios for them, so
@@ -8815,8 +8815,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 					dio_data.unsubmitted_oe_range_start,
 					false);
 		} else if (ret >= 0 && (size_t)ret < count)
-			btrfs_delalloc_release_space(inode, offset,
-						     count - (size_t)ret);
+			btrfs_delalloc_release_space(inode, data_reserved,
+					offset, count - (size_t)ret);
 	}
 out:
 	if (wakeup)
@@ -9015,7 +9015,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 *    free the entire extent.
 	 */
 	if (PageDirty(page))
-		btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
+		btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
 	if (!inode_evicting) {
 		clear_extent_bit(tree, page_start, page_end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
@@ -9137,8 +9137,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 			spin_lock(&BTRFS_I(inode)->lock);
 			BTRFS_I(inode)->outstanding_extents++;
 			spin_unlock(&BTRFS_I(inode)->lock);
-			btrfs_delalloc_release_space(inode, page_start,
-						PAGE_SIZE - reserved_space);
+			btrfs_delalloc_release_space(inode, data_reserved,
+					page_start, PAGE_SIZE - reserved_space);
 		}
 	}
 
@@ -9194,7 +9194,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_space(inode, page_start, reserved_space);
+	btrfs_delalloc_release_space(inode, data_reserved, page_start,
+				     reserved_space);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
@@ -10553,7 +10554,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			btrfs_end_transaction(trans);
 	}
 	if (cur_offset < end)
-		btrfs_free_reserved_data_space(inode, cur_offset,
+		btrfs_free_reserved_data_space(inode, NULL, cur_offset,
 			end - cur_offset + 1);
 	return ret;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a16a16a66ed9..29f8d1817a1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1227,7 +1227,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents++;
 		spin_unlock(&BTRFS_I(inode)->lock);
-		btrfs_delalloc_release_space(inode,
+		btrfs_delalloc_release_space(inode, data_reserved,
 				start_index << PAGE_SHIFT,
 				(page_cnt - i_done) << PAGE_SHIFT);
 	}
@@ -1255,7 +1255,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
-	btrfs_delalloc_release_space(inode,
+	btrfs_delalloc_release_space(inode, data_reserved,
 			start_index << PAGE_SHIFT,
 			page_cnt << PAGE_SHIFT);
 	extent_changeset_free(data_reserved);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6086a648e67e..86366c6f2675 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2881,13 +2881,72 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
 	return ret;
 }
 
-static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
-				       int free)
+/* Free ranges specified by @reserved, normally in the error path */
+static int qgroup_free_reserved_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct extent_changeset changeset;
+	int freed = 0;
+	int ret;
+
+	extent_changeset_init(&changeset);
+	len = round_up(start + len, root->fs_info->sectorsize);
+	start = round_down(start, root->fs_info->sectorsize);
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(&reserved->range_changed, &uiter))) {
+		u64 range_start = unode->val;
+		/* unode->aux is the inclusive end */
+		u64 range_len = unode->aux - range_start + 1;
+		u64 free_start;
+		u64 free_len;
+
+		extent_changeset_release(&changeset);
+
+		/* Only free the part of this range within [start, start + len) */
+		if (range_start >= start + len ||
+		    range_start + range_len <= start)
+			continue;
+		free_start = max(range_start, start);
+		free_len = min(start + len, range_start + range_len) -
+			   free_start;
+		/*
+		 * TODO: Also modify reserved->ranges_reserved to reflect
+		 * the modification.
+		 *
+		 * However, as long as we free qgroup reserved space according
+		 * to the EXTENT_QGROUP_RESERVED bit, we won't double free,
+		 * so there is no need to rush.
+		 */
+		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
+				free_start, free_start + free_len - 1,
+				EXTENT_QGROUP_RESERVED, &changeset);
+		if (ret < 0)
+			goto out;
+		freed += changeset.bytes_changed;
+	}
+	btrfs_qgroup_free_refroot(root->fs_info, root->objectid, freed);
+	ret = freed;
+out:
+	extent_changeset_release(&changeset);
+	return ret;
+}
+
+static int __btrfs_qgroup_release_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len,
+			int free)
 {
 	struct extent_changeset changeset;
 	int trace_op = QGROUP_RELEASE;
 	int ret;
 
+	/* In release case, we shouldn't have @reserved */
+	WARN_ON(!free && reserved);
+	if (free && reserved)
+		return qgroup_free_reserved_data(inode, reserved, start, len);
 	extent_changeset_init(&changeset);
 	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
 			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
@@ -2913,14 +2972,17 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
  *
  * Should be called when a range of pages get invalidated before reaching disk.
  * Or for error cleanup case.
+ * If @reserved is given, only the ranges recorded in @reserved that fall
+ * inside [@start, @start + @len) will be freed.
  *
  * For data written to disk, use btrfs_qgroup_release_data().
  *
  * NOTE: This function may sleep for memory allocation.
  */
-int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
+int btrfs_qgroup_free_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len)
 {
-	return __btrfs_qgroup_release_data(inode, start, len, 1);
+	return __btrfs_qgroup_release_data(inode, reserved, start, len, 1);
 }
 
 /*
@@ -2940,7 +3002,7 @@ int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len)
  */
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
 {
-	return __btrfs_qgroup_release_data(inode, start, len, 0);
+	return __btrfs_qgroup_release_data(inode, NULL, start, len, 0);
 }
 
 int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index c7a2bcaff5d5..102aa7fb342b 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -244,7 +244,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
 int btrfs_qgroup_reserve_data(struct inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
 int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
-int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
+int btrfs_qgroup_free_data(struct inode *inode,
+			struct extent_changeset *reserved, u64 start, u64 len);
 
 int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 			      bool enforce);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 5823e2d67a84..2115f3e02dd5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3114,8 +3114,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		num_bytes = end + 1 - start;
 		if (cur_offset < start)
-			btrfs_free_reserved_data_space(inode, cur_offset,
-					start - cur_offset);
+			btrfs_free_reserved_data_space(inode, data_reserved,
+					cur_offset, start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, start,
 						num_bytes, num_bytes,
 						end + 1, &alloc_hint);
@@ -3126,8 +3126,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 		nr++;
 	}
 	if (cur_offset < prealloc_end)
-		btrfs_free_reserved_data_space(inode, cur_offset,
-				       prealloc_end + 1 - cur_offset);
+		btrfs_free_reserved_data_space(inode, data_reserved,
+				cur_offset, prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	extent_changeset_free(data_reserved);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-17  2:56 ` [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
@ 2017-05-17 15:37   ` David Sterba
  2017-05-18  0:24     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2017-05-17 15:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, May 17, 2017 at 10:56:27AM +0800, Qu Wenruo wrote:
> Introduce a new parameter, struct extent_changeset, for
> btrfs_qgroup_reserve_data() and its callers.
> 
> The extent_changeset is used in btrfs_qgroup_reserve_data() to record
> which ranges it reserved in the current call, so they can be freed in
> the error path.
> 
> The reason we need to expose it to callers is that, in the buffered
> write error path, without knowing exactly which ranges we reserved in
> the current allocation, we could free space that was not reserved by
> us.
> 
> This would lead to qgroup reserved space underflow.
> 
> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/ctree.h       |  6 ++++--
>  fs/btrfs/extent-tree.c | 23 +++++++++++++----------
>  fs/btrfs/extent_io.h   | 34 +++++++++++++++++++++++++++++++++
>  fs/btrfs/file.c        | 12 +++++++++---
>  fs/btrfs/inode-map.c   |  4 +++-
>  fs/btrfs/inode.c       | 18 ++++++++++++++----
>  fs/btrfs/ioctl.c       |  5 ++++-
>  fs/btrfs/qgroup.c      | 51 ++++++++++++++++++++++++++++++++------------------
>  fs/btrfs/qgroup.h      |  3 ++-
>  fs/btrfs/relocation.c  |  4 +++-
>  10 files changed, 119 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 1e82516fe2d8..52a0147cd612 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2704,8 +2704,9 @@ enum btrfs_flush_state {
>  	COMMIT_TRANS		=	6,
>  };
>  
> -int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len);
>  int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
> +int btrfs_check_data_free_space(struct inode *inode,
> +			struct extent_changeset **reserved, u64 start, u64 len);
>  void btrfs_free_reserved_data_space(struct inode *inode, u64 start, u64 len);
>  void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
>  					    u64 len);
> @@ -2723,7 +2724,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
>  				      struct btrfs_block_rsv *rsv);
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
>  void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
> -int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
> +int btrfs_delalloc_reserve_space(struct inode *inode,
> +			struct extent_changeset **reserved, u64 start, u64 len);
>  void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
>  void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
>  struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 4f62696131a6..ef09cc37f25f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3364,6 +3364,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>  	struct btrfs_fs_info *fs_info = block_group->fs_info;
>  	struct btrfs_root *root = fs_info->tree_root;
>  	struct inode *inode = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	u64 alloc_hint = 0;
>  	int dcs = BTRFS_DC_ERROR;
>  	u64 num_pages = 0;
> @@ -3483,7 +3484,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>  	num_pages *= 16;
>  	num_pages *= PAGE_SIZE;
>  
> -	ret = btrfs_check_data_free_space(inode, 0, num_pages);
> +	ret = btrfs_check_data_free_space(inode, &data_reserved, 0, num_pages);
>  	if (ret)
>  		goto out_put;
>  
> @@ -3514,6 +3515,7 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
>  	block_group->disk_cache_state = dcs;
>  	spin_unlock(&block_group->lock);
>  
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> @@ -4277,12 +4279,8 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
>  	return ret;
>  }
>  
> -/*
> - * New check_data_free_space() with ability for precious data reservation
> - * Will replace old btrfs_check_data_free_space(), but for patch split,
> - * add a new function first and then replace it.
> - */
> -int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
> +int btrfs_check_data_free_space(struct inode *inode,
> +			struct extent_changeset **reserved, u64 start, u64 len)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	int ret;
> @@ -4297,9 +4295,11 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
>  		return ret;
>  
>  	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
> -	ret = btrfs_qgroup_reserve_data(inode, start, len);
> +	ret = btrfs_qgroup_reserve_data(inode, reserved, start, len);
>  	if (ret < 0)
>  		btrfs_free_reserved_data_space_noquota(inode, start, len);
> +	else
> +		ret = 0;
>  	return ret;
>  }
>  
> @@ -6123,6 +6123,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
>   * @inode: inode we're writing to
>   * @start: start range we are writing to
>   * @len: how long the range we are writing to
> + * @reserved: mandatory parameter, records the actually reserved qgroup
> + * 	      ranges of the current reservation.
>   *
>   * This will do the following things
>   *
> @@ -6140,11 +6142,12 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
>   * Return 0 for success
>   * Return <0 for error(-ENOSPC or -EQUOT)
>   */
> -int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len)
> +int btrfs_delalloc_reserve_space(struct inode *inode,
> +			struct extent_changeset **reserved, u64 start, u64 len)
>  {
>  	int ret;
>  
> -	ret = btrfs_check_data_free_space(inode, start, len);
> +	ret = btrfs_check_data_free_space(inode, reserved, start, len);
>  	if (ret < 0)
>  		return ret;
>  	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index cc1b08fa9fe7..afb3553f4f78 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -211,6 +211,40 @@ struct extent_changeset {
>  	struct ulist range_changed;
>  };
>  
> +static inline void extent_changeset_init(struct extent_changeset *changeset)
> +{
> +	changeset->bytes_changed = 0;
> +	ulist_init(&changeset->range_changed);
> +}
> +
> +static inline struct extent_changeset *extent_changeset_alloc(void)
> +{
> +	struct extent_changeset *ret;
> +
> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);

I don't remember if we'd discussed this before, but have you evaluated
if GFP_KERNEL is ok to use in this context?

The potentially problematic callchain:

btrfs_fallocate
  lock_extent_bits
  btrfs_qgroup_reserve_data
    extent_changeset_alloc
      kmalloc(GFP_KERNEL)

so if the allocator wants to flush data and we go back to locking the
extent range, this could deadlock.

> +	if (!ret)
> +		return NULL;
> +
> +	extent_changeset_init(ret);
> +	return ret;
> +}
> +
> +static inline void extent_changeset_release(struct extent_changeset *changeset)
> +{
> +	if (!changeset)
> +		return;
> +	changeset->bytes_changed = 0;
> +	ulist_release(&changeset->range_changed);
> +}
> +
> +static inline void extent_changeset_free(struct extent_changeset *changeset)
> +{
> +	if (!changeset)
> +		return;
> +	extent_changeset_release(changeset);
> +	kfree(changeset);
> +}
> +
>  static inline void extent_set_compress_type(unsigned long *bio_flags,
>  					    int compress_type)
>  {
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index da1096eb1a40..d31c935b36ed 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1581,6 +1581,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct page **pages = NULL;
>  	struct extent_state *cached_state = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	u64 release_bytes = 0;
>  	u64 lockstart;
>  	u64 lockend;
> @@ -1628,7 +1629,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  		reserve_bytes = round_up(write_bytes + sector_offset,
>  				fs_info->sectorsize);
>  
> -		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> +		extent_changeset_release(data_reserved);
> +		ret = btrfs_check_data_free_space(inode, &data_reserved, pos,
> +						  write_bytes);
>  		if (ret < 0) {
>  			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>  						      BTRFS_INODE_PREALLOC)) &&
> @@ -1802,6 +1805,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  		}
>  	}
>  
> +	extent_changeset_free(data_reserved);
>  	return num_written ? num_written : ret;
>  }
>  
> @@ -2769,6 +2773,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>  {
>  	struct inode *inode = file_inode(file);
>  	struct extent_state *cached_state = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	struct falloc_range *range;
>  	struct falloc_range *tmp;
>  	struct list_head reserve_list;
> @@ -2898,8 +2903,8 @@ static long btrfs_fallocate(struct file *file, int mode,
>  				free_extent_map(em);
>  				break;
>  			}
> -			ret = btrfs_qgroup_reserve_data(inode, cur_offset,
> -					last_byte - cur_offset);
> +			ret = btrfs_qgroup_reserve_data(inode, &data_reserved,
> +					cur_offset, last_byte - cur_offset);
>  			if (ret < 0) {
>  				free_extent_map(em);
>  				break;
> @@ -2971,6 +2976,7 @@ static long btrfs_fallocate(struct file *file, int mode,
>  	if (ret != 0)
>  		btrfs_free_reserved_data_space(inode, alloc_start,
>  				       alloc_end - cur_offset);
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 5c6c20ec64d8..d02019747d00 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -400,6 +400,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	struct btrfs_path *path;
>  	struct inode *inode;
>  	struct btrfs_block_rsv *rsv;
> +	struct extent_changeset *data_reserved = NULL;
>  	u64 num_bytes;
>  	u64 alloc_hint = 0;
>  	int ret;
> @@ -492,7 +493,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	/* Just to make sure we have enough space */
>  	prealloc += 8 * PAGE_SIZE;
>  
> -	ret = btrfs_delalloc_reserve_space(inode, 0, prealloc);
> +	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, 0, prealloc);
>  	if (ret)
>  		goto out_put;
>  
> @@ -516,6 +517,7 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	trans->bytes_reserved = num_bytes;
>  
>  	btrfs_free_path(path);
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a1294d5baef5..4e211b54b267 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2035,6 +2035,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>  	struct btrfs_writepage_fixup *fixup;
>  	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	struct page *page;
>  	struct inode *inode;
>  	u64 page_start;
> @@ -2072,7 +2073,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>  		goto again;
>  	}
>  
> -	ret = btrfs_delalloc_reserve_space(inode, page_start,
> +	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
>  					   PAGE_SIZE);
>  	if (ret) {
>  		mapping_set_error(page->mapping, ret);
> @@ -2092,6 +2093,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>  	unlock_page(page);
>  	put_page(page);
>  	kfree(fixup);
> +	extent_changeset_free(data_reserved);
>  }
>  
>  /*
> @@ -4767,6 +4769,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>  	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	char *kaddr;
>  	u32 blocksize = fs_info->sectorsize;
>  	pgoff_t index = from >> PAGE_SHIFT;
> @@ -4781,7 +4784,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>  	    (!len || ((len & (blocksize - 1)) == 0)))
>  		goto out;
>  
> -	ret = btrfs_delalloc_reserve_space(inode,
> +	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>  			round_down(from, blocksize), blocksize);
>  	if (ret)
>  		goto out;
> @@ -4866,6 +4869,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>  	unlock_page(page);
>  	put_page(page);
>  out:
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> @@ -8725,6 +8729,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  	struct inode *inode = file->f_mapping->host;
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct btrfs_dio_data dio_data = { 0 };
> +	struct extent_changeset *data_reserved = NULL;
>  	loff_t offset = iocb->ki_pos;
>  	size_t count = 0;
>  	int flags = 0;
> @@ -8761,7 +8766,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  			inode_unlock(inode);
>  			relock = true;
>  		}
> -		ret = btrfs_delalloc_reserve_space(inode, offset, count);
> +		ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> +						   offset, count);
>  		if (ret)
>  			goto out;
>  		dio_data.outstanding_extents = count_max_extents(count);
> @@ -8818,6 +8824,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  	if (relock)
>  		inode_lock(inode);
>  
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> @@ -9050,6 +9057,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>  	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
> +	struct extent_changeset *data_reserved = NULL;
>  	char *kaddr;
>  	unsigned long zero_start;
>  	loff_t size;
> @@ -9075,7 +9083,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  	 * end up waiting indefinitely to get a lock on the page currently
>  	 * being processed by btrfs_page_mkwrite() function.
>  	 */
> -	ret = btrfs_delalloc_reserve_space(inode, page_start,
> +	ret = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
>  					   reserved_space);
>  	if (!ret) {
>  		ret = file_update_time(vmf->vma->vm_file);
> @@ -9181,6 +9189,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  out_unlock:
>  	if (!ret) {
>  		sb_end_pagefault(inode->i_sb);
> +		extent_changeset_free(data_reserved);
>  		return VM_FAULT_LOCKED;
>  	}
>  	unlock_page(page);
> @@ -9188,6 +9197,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  	btrfs_delalloc_release_space(inode, page_start, reserved_space);
>  out_noreserve:
>  	sb_end_pagefault(inode->i_sb);
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a29dc3fd7152..a16a16a66ed9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1127,6 +1127,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  	struct btrfs_ordered_extent *ordered;
>  	struct extent_state *cached_state = NULL;
>  	struct extent_io_tree *tree;
> +	struct extent_changeset *data_reserved = NULL;
>  	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
>  
>  	file_end = (isize - 1) >> PAGE_SHIFT;
> @@ -1135,7 +1136,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  
>  	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
>  
> -	ret = btrfs_delalloc_reserve_space(inode,
> +	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>  			start_index << PAGE_SHIFT,
>  			page_cnt << PAGE_SHIFT);
>  	if (ret)
> @@ -1247,6 +1248,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  		unlock_page(pages[i]);
>  		put_page(pages[i]);
>  	}
> +	extent_changeset_free(data_reserved);
>  	return i_done;
>  out:
>  	for (i = 0; i < i_done; i++) {
> @@ -1256,6 +1258,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  	btrfs_delalloc_release_space(inode,
>  			start_index << PAGE_SHIFT,
>  			page_cnt << PAGE_SHIFT);
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  
>  }
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index ad2e99491395..6086a648e67e 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2824,43 +2824,60 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>   * Return <0 for error (including -EQUOT)
>   *
>   * NOTE: this function may sleep for memory allocation.
> + *       If btrfs_qgroup_reserve_data() is called multiple times with
> + *       the same @reserved, the caller must ensure that it is OK to
> + *       free *ALL* reserved space when an error happens.
>   */
> -int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
> +int btrfs_qgroup_reserve_data(struct inode *inode,
> +			struct extent_changeset **reserved_ret, u64 start,
> +			u64 len)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> -	struct extent_changeset changeset;
>  	struct ulist_node *unode;
>  	struct ulist_iterator uiter;
> +	struct extent_changeset *reserved;
> +	u64 orig_reserved;
> +	u64 to_reserve;
>  	int ret;
>  
>  	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) ||
>  	    !is_fstree(root->objectid) || len == 0)
>  		return 0;
>  
> -	changeset.bytes_changed = 0;
> -	ulist_init(&changeset.range_changed);
> +	/* @reserved parameter is mandatory for qgroup */
> +	if (WARN_ON(!reserved_ret))
> +		return -EINVAL;
> +	if (!*reserved_ret) {
> +		*reserved_ret = extent_changeset_alloc();
> +		if (!*reserved_ret)
> +			return -ENOMEM;
> +	}
> +	reserved = *reserved_ret;
> +	/* Record already reserved space */
> +	orig_reserved = reserved->bytes_changed;
>  	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
> -			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
> +			start + len -1, EXTENT_QGROUP_RESERVED, reserved);
> +
> +	/* Newly reserved space */
> +	to_reserve = reserved->bytes_changed - orig_reserved;
>  	trace_btrfs_qgroup_reserve_data(inode, start, len,
> -					changeset.bytes_changed,
> -					QGROUP_RESERVE);
> +					to_reserve, QGROUP_RESERVE);
>  	if (ret < 0)
>  		goto cleanup;
> -	ret = qgroup_reserve(root, changeset.bytes_changed, true);
> +	ret = qgroup_reserve(root, to_reserve, true);
>  	if (ret < 0)
>  		goto cleanup;
>  
> -	ulist_release(&changeset.range_changed);
>  	return ret;
>  
>  cleanup:
> -	/* cleanup already reserved ranges */
> +	/* cleanup *ALL* already reserved ranges */
>  	ULIST_ITER_INIT(&uiter);
> -	while ((unode = ulist_next(&changeset.range_changed, &uiter)))
> +	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
>  		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
>  				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
>  				 GFP_NOFS);
> -	ulist_release(&changeset.range_changed);
> +	extent_changeset_release(reserved);
>  	return ret;
>  }
>  
> @@ -2871,8 +2888,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
>  	int trace_op = QGROUP_RELEASE;
>  	int ret;
>  
> -	changeset.bytes_changed = 0;
> -	ulist_init(&changeset.range_changed);
> +	extent_changeset_init(&changeset);
>  	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start, 
>  			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
>  	if (ret < 0)
> @@ -2888,7 +2904,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
>  				changeset.bytes_changed);
>  	ret = changeset.bytes_changed;
>  out:
> -	ulist_release(&changeset.range_changed);
> +	extent_changeset_release(&changeset);
>  	return ret;
>  }
>  
> @@ -2988,8 +3004,7 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
>  	struct ulist_iterator iter;
>  	int ret;
>  
> -	changeset.bytes_changed = 0;
> -	ulist_init(&changeset.range_changed);
> +	extent_changeset_init(&changeset);
>  	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
>  			EXTENT_QGROUP_RESERVED, &changeset);
>  
> @@ -3006,5 +3021,5 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
>  				changeset.bytes_changed);
>  
>  	}
> -	ulist_release(&changeset.range_changed);
> +	extent_changeset_release(&changeset);
>  }
> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> index 38d14d4575c0..c7a2bcaff5d5 100644
> --- a/fs/btrfs/qgroup.h
> +++ b/fs/btrfs/qgroup.h
> @@ -241,7 +241,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
>  #endif
>  
>  /* New io_tree based accurate qgroup reserve API */
> -int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
> +int btrfs_qgroup_reserve_data(struct inode *inode,
> +			struct extent_changeset **reserved, u64 start, u64 len);
>  int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
>  int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
>  
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index d60df51959f7..5823e2d67a84 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3093,11 +3093,12 @@ int prealloc_file_extent_cluster(struct inode *inode,
>  	u64 prealloc_start = cluster->start - offset;
>  	u64 prealloc_end = cluster->end - offset;
>  	u64 cur_offset;
> +	struct extent_changeset *data_reserved = NULL;
>  
>  	BUG_ON(cluster->start != cluster->boundary[0]);
>  	inode_lock(inode);
>  
> -	ret = btrfs_check_data_free_space(inode, prealloc_start,
> +	ret = btrfs_check_data_free_space(inode, &data_reserved, prealloc_start,
>  					  prealloc_end + 1 - prealloc_start);
>  	if (ret)
>  		goto out;
> @@ -3129,6 +3130,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
>  				       prealloc_end + 1 - cur_offset);
>  out:
>  	inode_unlock(inode);
> +	extent_changeset_free(data_reserved);
>  	return ret;
>  }
>  
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-17 15:37   ` David Sterba
@ 2017-05-18  0:24     ` Qu Wenruo
  2017-05-18 13:45       ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-05-18  0:24 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 05/17/2017 11:37 PM, David Sterba wrote:
> On Wed, May 17, 2017 at 10:56:27AM +0800, Qu Wenruo wrote:
>> Introduce a new parameter, struct extent_changeset for
>> btrfs_qgroup_reserved_data() and its callers.
>>
>> Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
>> which range it reserved in current reserve, so it can free it at error
>> path.
>>
>> The reason we need to export it to callers is, at buffered write error
>> path, without knowing what exactly which range we reserved in current
>> allocation, we can free space which is not reserved by us.
>>
>> This will lead to qgroup reserved space underflow.
>>
>> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/ctree.h       |  6 ++++--
>>   fs/btrfs/extent-tree.c | 23 +++++++++++++----------
>>   fs/btrfs/extent_io.h   | 34 +++++++++++++++++++++++++++++++++
>>   fs/btrfs/file.c        | 12 +++++++++---
>>   fs/btrfs/inode-map.c   |  4 +++-
>>   fs/btrfs/inode.c       | 18 ++++++++++++++----
>>   fs/btrfs/ioctl.c       |  5 ++++-
>>   fs/btrfs/qgroup.c      | 51 ++++++++++++++++++++++++++++++++------------------
>>   fs/btrfs/qgroup.h      |  3 ++-
>>   fs/btrfs/relocation.c  |  4 +++-
>>   10 files changed, 119 insertions(+), 41 deletions(-)
>>
>> [...]
>> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
>> index cc1b08fa9fe7..afb3553f4f78 100644
>> --- a/fs/btrfs/extent_io.h
>> +++ b/fs/btrfs/extent_io.h
>> @@ -211,6 +211,40 @@ struct extent_changeset {
>>   	struct ulist range_changed;
>>   };
>>   
>> +static inline void extent_changeset_init(struct extent_changeset *changeset)
>> +{
>> +	changeset->bytes_changed = 0;
>> +	ulist_init(&changeset->range_changed);
>> +}
>> +
>> +static inline struct extent_changeset *extent_changeset_alloc(void)
>> +{
>> +	struct extent_changeset *ret;
>> +
>> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
> 
> I don't remember if we'd discussed this before, but have you evaluated
> if GFP_KERNEL is ok to use in this context?

IIRC you have informed me that I shouldn't abuse GFP_NOFS.

> 
> The potentially problematic callchain:
> 
> btrfs_fallocate
>    lock_extent_bits
>    btrfs_qgroup_reserve_data
>      extent_changeset_alloc
>        kmalloc(GFP_KERNEL)
> 
> so if the allocator wants to flush data and we go back to locking the
> extent range, this could deadlock.

That's right, GFP_NOFS should be used here.

Thanks,
Qu
>>   			page_cnt << PAGE_SHIFT);
>>   	if (ret)
>> @@ -1247,6 +1248,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>>   		unlock_page(pages[i]);
>>   		put_page(pages[i]);
>>   	}
>> +	extent_changeset_free(data_reserved);
>>   	return i_done;
>>   out:
>>   	for (i = 0; i < i_done; i++) {
>> @@ -1256,6 +1258,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>>   	btrfs_delalloc_release_space(inode,
>>   			start_index << PAGE_SHIFT,
>>   			page_cnt << PAGE_SHIFT);
>> +	extent_changeset_free(data_reserved);
>>   	return ret;
>>   
>>   }
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index ad2e99491395..6086a648e67e 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2824,43 +2824,60 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
>>    * Return <0 for error (including -EQUOT)
>>    *
>>    * NOTE: this function may sleep for memory allocation.
>> + *       if btrfs_qgroup_reserve_data() is called multiple times with
>> + *       same @reserved, caller must ensure when error happens it's OK
>> + *       to free *ALL* reserved space.
>>    */
>> -int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len)
>> +int btrfs_qgroup_reserve_data(struct inode *inode,
>> +			struct extent_changeset **reserved_ret, u64 start,
>> +			u64 len)
>>   {
>>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>> -	struct extent_changeset changeset;
>>   	struct ulist_node *unode;
>>   	struct ulist_iterator uiter;
>> +	struct extent_changeset *reserved;
>> +	u64 orig_reserved;
>> +	u64 to_reserve;
>>   	int ret;
>>   
>>   	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &root->fs_info->flags) ||
>>   	    !is_fstree(root->objectid) || len == 0)
>>   		return 0;
>>   
>> -	changeset.bytes_changed = 0;
>> -	ulist_init(&changeset.range_changed);
>> +	/* @reserved parameter is mandatory for qgroup */
>> +	if (WARN_ON(!reserved_ret))
>> +		return -EINVAL;
>> +	if (!*reserved_ret) {
>> +		*reserved_ret = extent_changeset_alloc();
>> +		if (!*reserved_ret)
>> +			return -ENOMEM;
>> +	}
>> +	reserved = *reserved_ret;
>> +	/* Record already reserved space */
>> +	orig_reserved = reserved->bytes_changed;
>>   	ret = set_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
>> -			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
>> +			start + len -1, EXTENT_QGROUP_RESERVED, reserved);
>> +
>> +	/* Newly reserved space */
>> +	to_reserve = reserved->bytes_changed - orig_reserved;
>>   	trace_btrfs_qgroup_reserve_data(inode, start, len,
>> -					changeset.bytes_changed,
>> -					QGROUP_RESERVE);
>> +					to_reserve, QGROUP_RESERVE);
>>   	if (ret < 0)
>>   		goto cleanup;
>> -	ret = qgroup_reserve(root, changeset.bytes_changed, true);
>> +	ret = qgroup_reserve(root, to_reserve, true);
>>   	if (ret < 0)
>>   		goto cleanup;
>>   
>> -	ulist_release(&changeset.range_changed);
>>   	return ret;
>>   
>>   cleanup:
>> -	/* cleanup already reserved ranges */
>> +	/* cleanup *ALL* already reserved ranges */
>>   	ULIST_ITER_INIT(&uiter);
>> -	while ((unode = ulist_next(&changeset.range_changed, &uiter)))
>> +	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
>>   		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
>>   				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
>>   				 GFP_NOFS);
>> -	ulist_release(&changeset.range_changed);
>> +	extent_changeset_release(reserved);
>>   	return ret;
>>   }
>>   
>> @@ -2871,8 +2888,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
>>   	int trace_op = QGROUP_RELEASE;
>>   	int ret;
>>   
>> -	changeset.bytes_changed = 0;
>> -	ulist_init(&changeset.range_changed);
>> +	extent_changeset_init(&changeset);
>>   	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
>>   			start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
>>   	if (ret < 0)
>> @@ -2888,7 +2904,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len,
>>   				changeset.bytes_changed);
>>   	ret = changeset.bytes_changed;
>>   out:
>> -	ulist_release(&changeset.range_changed);
>> +	extent_changeset_release(&changeset);
>>   	return ret;
>>   }
>>   
>> @@ -2988,8 +3004,7 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
>>   	struct ulist_iterator iter;
>>   	int ret;
>>   
>> -	changeset.bytes_changed = 0;
>> -	ulist_init(&changeset.range_changed);
>> +	extent_changeset_init(&changeset);
>>   	ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, 0, (u64)-1,
>>   			EXTENT_QGROUP_RESERVED, &changeset);
>>   
>> @@ -3006,5 +3021,5 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode)
>>   				changeset.bytes_changed);
>>   
>>   	}
>> -	ulist_release(&changeset.range_changed);
>> +	extent_changeset_release(&changeset);
>>   }
>> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
>> index 38d14d4575c0..c7a2bcaff5d5 100644
>> --- a/fs/btrfs/qgroup.h
>> +++ b/fs/btrfs/qgroup.h
>> @@ -241,7 +241,8 @@ int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid,
>>   #endif
>>   
>>   /* New io_tree based accurate qgroup reserve API */
>> -int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len);
>> +int btrfs_qgroup_reserve_data(struct inode *inode,
>> +			struct extent_changeset **reserved, u64 start, u64 len);
>>   int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
>>   int btrfs_qgroup_free_data(struct inode *inode, u64 start, u64 len);
>>   
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index d60df51959f7..5823e2d67a84 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -3093,11 +3093,12 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>   	u64 prealloc_start = cluster->start - offset;
>>   	u64 prealloc_end = cluster->end - offset;
>>   	u64 cur_offset;
>> +	struct extent_changeset *data_reserved = NULL;
>>   
>>   	BUG_ON(cluster->start != cluster->boundary[0]);
>>   	inode_lock(inode);
>>   
>> -	ret = btrfs_check_data_free_space(inode, prealloc_start,
>> +	ret = btrfs_check_data_free_space(inode, &data_reserved, prealloc_start,
>>   					  prealloc_end + 1 - prealloc_start);
>>   	if (ret)
>>   		goto out;
>> @@ -3129,6 +3130,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
>>   				       prealloc_end + 1 - cur_offset);
>>   out:
>>   	inode_unlock(inode);
>> +	extent_changeset_free(data_reserved);
>>   	return ret;
>>   }
>>   
>> -- 
>> 2.13.0

^ permalink raw reply	[flat|nested] 17+ messages in thread
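
The delta accounting the patch above introduces in btrfs_qgroup_reserve_data() — snapshotting bytes_changed before recording the range and charging only the difference — can be sketched in userspace as follows (a minimal model; the names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the extent_changeset pattern: the caller passes a
 * changeset that may already hold bytes from earlier reservations, and
 * only the newly recorded delta is charged against the qgroup. */
struct changeset {
	unsigned long long bytes_changed;
};

struct changeset *changeset_alloc(void)
{
	return calloc(1, sizeof(struct changeset));
}

/* Stand-in for set_record_extent_bits() recording 'newly_set' bytes. */
static void record_range(struct changeset *cs, unsigned long long newly_set)
{
	cs->bytes_changed += newly_set;
}

/* Returns the amount that must be charged for THIS call only. */
unsigned long long reserve_data(struct changeset *cs,
				unsigned long long newly_set)
{
	unsigned long long orig = cs->bytes_changed; /* already reserved */

	record_range(cs, newly_set);
	return cs->bytes_changed - orig;             /* "to_reserve" */
}
```

The point of the snapshot is that repeated calls with the same changeset must not double-charge ranges that were already accounted in an earlier call.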

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-18  0:24     ` Qu Wenruo
@ 2017-05-18 13:45       ` David Sterba
  2017-05-19  0:32         ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2017-05-18 13:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, May 18, 2017 at 08:24:26AM +0800, Qu Wenruo wrote:
> >> +static inline void extent_changeset_init(struct extent_changeset *changeset)
> >> +{
> >> +	changeset->bytes_changed = 0;
> >> +	ulist_init(&changeset->range_changed);
> >> +}
> >> +
> >> +static inline struct extent_changeset *extent_changeset_alloc(void)
> >> +{
> >> +	struct extent_changeset *ret;
> >> +
> >> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
> > 
> > I don't remember if we'd discussed this before, but have you evaluated
> > if GFP_KERNEL is ok to use in this context?
> 
> IIRC you have informed me that I shouldn't abuse GFP_NOFS.

Use of GFP_NOFS or _KERNEL has to be evaluated case by case. So if it's
"let's use NOFS because everybody else does" or "he said I should not
use NOFS, then I'll use KERNEL", then it's wrong and I'll complain.

A short notice in the changelog or a comment above the allocation would
better signify that the patch author spent some time thinking about the
consequences.

Sometimes it can become pretty hard to find the potential deadlock
scenarios. Using GFP_NOFS in such case is a matter of precaution, but at
least it would be nice to have it explicitly stated somewhere.

The hard cases help to understand the callchain patterns and it's easier
to detect them in the future. For example, in your patch I already knew
that it's a problem when I saw lock_extent_bits, because I had seen this
pattern in a patch doing allocation in FIEMAP. Commit
afce772e87c36c7f07f230a76d525025aaf09e41, discussion in thread
http://lkml.kernel.org/r/1465362783-27078-1-git-send-email-lufq.fnst@cn.fujitsu.com
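
The deadlock pattern referenced here can be modeled abstractly: a GFP_KERNEL allocation may enter direct reclaim, which can call back into the filesystem and block on an extent lock the allocating task already holds; GFP_NOFS tells the allocator to skip filesystem reclaim and so avoids the recursion. A toy sketch (purely illustrative, none of these names are kernel APIs):

```c
#include <assert.h>

/* Toy model of the recursion: with GFP_KERNEL, reclaim may re-enter the
 * filesystem and need the same lock the allocating task holds. */
enum gfp_sim { GFP_NOFS_SIM, GFP_KERNEL_SIM };

static int extent_lock_held;

/* Returns 1 if this allocation would self-deadlock. */
int alloc_would_deadlock(enum gfp_sim flags)
{
	if (flags == GFP_KERNEL_SIM && extent_lock_held)
		return 1;	/* fs reclaim re-takes the lock we hold */
	return 0;		/* NOFS skips fs reclaim entirely */
}
```

This is also why lockdep stays silent: the dependency only materializes when reclaim happens to run on the allocation path while the lock is held.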

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-18 13:45       ` David Sterba
@ 2017-05-19  0:32         ` Qu Wenruo
  2017-05-29 15:51           ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-05-19  0:32 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 05/18/2017 09:45 PM, David Sterba wrote:
> On Thu, May 18, 2017 at 08:24:26AM +0800, Qu Wenruo wrote:
>>>> +static inline void extent_changeset_init(struct extent_changeset *changeset)
>>>> +{
>>>> +	changeset->bytes_changed = 0;
>>>> +	ulist_init(&changeset->range_changed);
>>>> +}
>>>> +
>>>> +static inline struct extent_changeset *extent_changeset_alloc(void)
>>>> +{
>>>> +	struct extent_changeset *ret;
>>>> +
>>>> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
>>>
>>> I don't remember if we'd discussed this before, but have you evaluated
>>> if GFP_KERNEL is ok to use in this context?
>>
>> IIRC you have informed me that I shouldn't abuse GFP_NOFS.
> 
> Use of GFP_NOFS or _KERNEL has to be evaluated case by case. So if it's
> "let's use NOFS because everybody else does" or "he said I should not
> use NOFS, then I'll use KERNEL", then it's wrong and I'll complain.
> 
> A short notice in the changelog or a comment above the allocation would
> better signify that the patch author spent some time thinking about the
> consequences.
> 
> Sometimes it can become pretty hard to find the potential deadlock
> scenarios. Using GFP_NOFS in such case is a matter of precaution, but at
> least would be nice to be explictly stated somewhere.

Yes it's hard to find such deadlock especially when lockdep will not 
detect it.

And this makes the advantage of using stack memory in v3 patch more obvious.

I didn't realize the extra possible deadlock when memory pressure is 
high, and to make completely correct usage of GFP_ flags we should let 
caller to choose its GFP_ flag, which will introduce more modification 
and more possibility to cause problem.

So now I prefer the stack version a little more.

Thanks,
Qu

> 
> The hard cases help to understand the callchain patterns and it's easier
> to detect them in the future. For example, in your patch I already knew
> that it's a problem when I saw lock_extent_bits, because I had seen this
> pattern in a patch doing allocation in FIEMAP. Commit
> afce772e87c36c7f07f230a76d525025aaf09e41, discussion in thread
> http://lkml.kernel.org/r/1465362783-27078-1-git-send-email-lufq.fnst@cn.fujitsu.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-19  0:32         ` Qu Wenruo
@ 2017-05-29 15:51           ` David Sterba
  2017-05-31  0:31             ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2017-05-29 15:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Fri, May 19, 2017 at 08:32:18AM +0800, Qu Wenruo wrote:
> At 05/18/2017 09:45 PM, David Sterba wrote:
> > On Thu, May 18, 2017 at 08:24:26AM +0800, Qu Wenruo wrote:
> >>>> +static inline void extent_changeset_init(struct extent_changeset *changeset)
> >>>> +{
> >>>> +	changeset->bytes_changed = 0;
> >>>> +	ulist_init(&changeset->range_changed);
> >>>> +}
> >>>> +
> >>>> +static inline struct extent_changeset *extent_changeset_alloc(void)
> >>>> +{
> >>>> +	struct extent_changeset *ret;
> >>>> +
> >>>> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
> >>>
> >>> I don't remember if we'd discussed this before, but have you evaluated
> >>> if GFP_KERNEL is ok to use in this context?
> >>
> >> IIRC you have informed me that I shouldn't abuse GFP_NOFS.
> > 
> > Use of GFP_NOFS or _KERNEL has to be evaluated case by case. So if it's
> > "let's use NOFS because everybody else does" or "he said I should not
> > use NOFS, then I'll use KERNEL", then it's wrong and I'll complain.
> > 
> > A short notice in the changelog or a comment above the allocation would
> > better signify that the patch author spent some time thinking about the
> > consequences.
> > 
> > Sometimes it can become pretty hard to find the potential deadlock
> > scenarios. Using GFP_NOFS in such case is a matter of precaution, but at
> > least it would be nice to have it explicitly stated somewhere.
> 
> Yes it's hard to find such deadlock especially when lockdep will not 
> detect it.
> 
> And this makes the advantage of using stack memory in v3 patch more obvious.
> 
> I didn't realize the extra possible deadlock when memory pressure is 
> high, and to make completely correct usage of GFP_ flags we should let 
> caller to choose its GFP_ flag, which will introduce more modification 
> and more possibility to cause problem.
> 
> So now I prefer the stack version a little more.

The difference is that the stack version will always consume the stack
at runtime.  The dynamic allocation will not, but we have to add error
handling and make sure we use right gfp flags. So it's runtime vs review
trade off, I choose to spend time on review.

As catching all the gfp misuse is hard, we'll need some runtime
validation anyway, ie. marking the start and end of the context where
GFP_KERNEL must not be used.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-29 15:51           ` David Sterba
@ 2017-05-31  0:31             ` Qu Wenruo
  2017-05-31 14:30               ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-05-31  0:31 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 05/29/2017 11:51 PM, David Sterba wrote:
> On Fri, May 19, 2017 at 08:32:18AM +0800, Qu Wenruo wrote:
>> At 05/18/2017 09:45 PM, David Sterba wrote:
>>> On Thu, May 18, 2017 at 08:24:26AM +0800, Qu Wenruo wrote:
>>>>>> +static inline void extent_changeset_init(struct extent_changeset *changeset)
>>>>>> +{
>>>>>> +	changeset->bytes_changed = 0;
>>>>>> +	ulist_init(&changeset->range_changed);
>>>>>> +}
>>>>>> +
>>>>>> +static inline struct extent_changeset *extent_changeset_alloc(void)
>>>>>> +{
>>>>>> +	struct extent_changeset *ret;
>>>>>> +
>>>>>> +	ret = kmalloc(sizeof(*ret), GFP_KERNEL);
>>>>>
>>>>> I don't remember if we'd discussed this before, but have you evaluated
>>>>> if GFP_KERNEL is ok to use in this context?
>>>>
>>>> IIRC you have informed me that I shouldn't abuse GFP_NOFS.
>>>
>>> Use of GFP_NOFS or _KERNEL has to be evaluated case by case. So if it's
>>> "let's use NOFS because everybody else does" or "he said I should not
>>> use NOFS, then I'll use KERNEL", then it's wrong and I'll complain.
>>>
>>> A short notice in the changelog or a comment above the allocation would
>>> better signify that the patch author spent some time thinking about the
>>> consequences.
>>>
>>> Sometimes it can become pretty hard to find the potential deadlock
>>> scenarios. Using GFP_NOFS in such case is a matter of precaution, but at
>>> least it would be nice to have it explicitly stated somewhere.
>>
>> Yes it's hard to find such deadlock especially when lockdep will not
>> detect it.
>>
>> And this makes the advantage of using stack memory in v3 patch more obvious.
>>
>> I didn't realize the extra possible deadlock when memory pressure is
>> high, and to make completely correct usage of GFP_ flags we should let
>> caller to choose its GFP_ flag, which will introduce more modification
>> and more possibility to cause problem.
>>
>> So now I prefer the stack version a little more.
> 
> The difference is that the stack version will always consume the stack
> at runtime.  The dynamic allocation will not, but we have to add error
> handling and make sure we use right gfp flags. So it's runtime vs review
> trade off, I choose to spend time on review.

OK, then I'll update the patchset to allow passing gfp flags for each 
reservation.

> 
> As catching all the gfp misuse is hard, we'll need some runtime
> validation anyway, ie. marking the start and end of the context where
> GFP_KERNEL must not be used.

That's indeed a very nice feature.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-31  0:31             ` Qu Wenruo
@ 2017-05-31 14:30               ` David Sterba
  2017-06-01  1:01                 ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2017-05-31 14:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, May 31, 2017 at 08:31:35AM +0800, Qu Wenruo wrote:
> >> Yes it's hard to find such deadlock especially when lockdep will not
> >> detect it.
> >>
> >> And this makes the advantage of using stack memory in v3 patch more obvious.
> >>
> >> I didn't realize the extra possible deadlock when memory pressure is
> >> high, and to make completely correct usage of GFP_ flags we should let
> >> caller to choose its GFP_ flag, which will introduce more modification
> >> and more possibility to cause problem.
> >>
> >> So now I prefer the stack version a little more.
> > 
> > The difference is that the stack version will always consume the stack
> > at runtime.  The dynamic allocation will not, but we have to add error
> > handling and make sure we use right gfp flags. So it's runtime vs review
> > trade off, I choose to spend time on review.
> 
> OK, then I'll update the patchset to allow passing gfp flags for each 
> reservation.

You mean to add gfp flags to extent_changeset_alloc and update the
direct callers or to add gfp flags to the whole reservation codepath?
I strongly prefer to use GFP_NOFS for now, although it's not ideal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-05-31 14:30               ` David Sterba
@ 2017-06-01  1:01                 ` Qu Wenruo
  2017-06-02 14:16                   ` David Sterba
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-06-01  1:01 UTC (permalink / raw)
  To: dsterba, linux-btrfs



At 05/31/2017 10:30 PM, David Sterba wrote:
> On Wed, May 31, 2017 at 08:31:35AM +0800, Qu Wenruo wrote:
>>>> Yes it's hard to find such deadlock especially when lockdep will not
>>>> detect it.
>>>>
>>>> And this makes the advantage of using stack memory in v3 patch more obvious.
>>>>
>>>> I didn't realize the extra possible deadlock when memory pressure is
>>>> high, and to make completely correct usage of GFP_ flags we should let
>>>> caller to choose its GFP_ flag, which will introduce more modification
>>>> and more possibility to cause problem.
>>>>
>>>> So now I prefer the stack version a little more.
>>>
>>> The difference is that the stack version will always consume the stack
>>> at runtime.  The dynamic allocation will not, but we have to add error
>>> handling and make sure we use right gfp flags. So it's runtime vs review
>>> trade off, I choose to spend time on review.
>>
>> OK, then I'll update the patchset to allow passing gfp flags for each
>> reservation.
> 
> You mean to add gfp flags to extent_changeset_alloc and update the
> direct callers or to add gfp flags to the whole reservation codepath?

Yes, I was planning to do it.

> I strongly prefer to use GFP_NOFS for now, although it's not ideal.

OK, then keep GFP_NOFS.
But I also want to know the reason why.

Is it just because we don't have a good enough tool to detect possible 
deadlocks caused by wrong GFP_* flags in the write path?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  2017-06-01  1:01                 ` Qu Wenruo
@ 2017-06-02 14:16                   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2017-06-02 14:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Thu, Jun 01, 2017 at 09:01:26AM +0800, Qu Wenruo wrote:
> 
> 
> At 05/31/2017 10:30 PM, David Sterba wrote:
> > On Wed, May 31, 2017 at 08:31:35AM +0800, Qu Wenruo wrote:
> >>>> Yes it's hard to find such deadlock especially when lockdep will not
> >>>> detect it.
> >>>>
> >>>> And this makes the advantage of using stack memory in v3 patch more obvious.
> >>>>
> >>>> I didn't realize the extra possible deadlock when memory pressure is
> >>>> high, and to make completely correct usage of GFP_ flags we should let
> >>>> caller to choose its GFP_ flag, which will introduce more modification
> >>>> and more possibility to cause problem.
> >>>>
> >>>> So now I prefer the stack version a little more.
> >>>
> >>> The difference is that the stack version will always consume the stack
> >>> at runtime.  The dynamic allocation will not, but we have to add error
> >>> handling and make sure we use right gfp flags. So it's runtime vs review
> >>> trade off, I choose to spend time on review.
> >>
> >> OK, then I'll update the patchset to allow passing gfp flags for each
> >> reservation.
> > 
> > You mean to add gfp flags to extent_changeset_alloc and update the
> > direct callers or to add gfp flags to the whole reservation codepath?
> 
> Yes, I was planning to do it.
> 
> > I strongly prefer to use GFP_NOFS for now, although it's not ideal.
> 
> OK, then keep GFP_NOFS.
> But I also want to know the reason why.
> 
> Is it just because we don't have a good enough tool to detect possible 
> deadlocks caused by wrong GFP_* flags in the write path?

Yes, basically. It's either overzealous GFP_NOFS or a potential deadlock
with GFP_KERNEL. We'll deal with the NOFS eventually, so we want to be
safe until then.

Michal Hocko has a debugging patch that will report use of NOFS when
it's not needed, but we have to explicitly mark the sections for that.
This hasn't happened and is not easy to do as we have to audit lots of
codepaths.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version
  2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-05-17  2:56 ` [RFC PATCH v3.2 6/6] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
@ 2017-06-21 19:09 ` David Sterba
  6 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2017-06-21 19:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, dsterba

On Wed, May 17, 2017 at 10:56:22AM +0800, Qu Wenruo wrote:
> The remaining qgroup fixes patches, based on the Chris' for-linus-4.12
> branch with commit 9bcaaea7418d09691f1ffab5c49aacafe3eef9d0 as base.
> 
> Can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_fixes_non_stack
> 
> This update is commit message update only.
> 
> v2:
>   Add reviewed-by tag for 2nd patch
>   Update the first patch to follow the new trace point standard
> RFC v3:
>   Use non-stack (dyanamic allocation) for extent_changeset structure, in
>   5th patch, to reduce impact for quota disabled cases.
>   Rebase to latest for-linus-4.12 branch.
> RFC v3.1:
>   Update comment to include the newly introduced parameter
>   Use init/release function to replace open coded ulist_init/release().
> RFC v3.2:
>   Update commit message of 1st and 3rd patch.
> 
> Qu Wenruo (6):
>   btrfs: qgroup: Add quick exit for non-fs extents
>   btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
>   btrfs: qgroup: Return actually freed bytes for qgroup release or free
>     data
>   btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
>     write and quota enable
>   btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
>   btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
>     reserved ranges

Added to 4.13 queue. I haven't seen any new reviews, so I tried to do my
best with my limited understanding. The preparatory patches look good to
me; the remaining ones got some review and testing, so I rely on that.
some wording adjustments to changelogs or comments. The whole patchset
has been in for-next for the entire dev cycle, I haven't seen any
related failures. Also I think we need to move forward with this
patchset, there was enough time to complain or help.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-06-21 19:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-17  2:56 [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version Qu Wenruo
2017-05-17  2:56 ` [RFC PATCH v3.2 1/6] btrfs: qgroup: Add quick exit for non-fs extents Qu Wenruo
2017-05-17  2:56 ` [RFC PATCH v3.2 2/6] btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function Qu Wenruo
2017-05-17  2:56 ` [RFC PATCH v3.2 3/6] btrfs: qgroup: Return actually freed bytes for qgroup release or free data Qu Wenruo
2017-05-17  2:56 ` [RFC PATCH v3.2 4/6] btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quota enable Qu Wenruo
2017-05-17  2:56 ` [RFC PATCH v3.2 5/6] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions Qu Wenruo
2017-05-17 15:37   ` David Sterba
2017-05-18  0:24     ` Qu Wenruo
2017-05-18 13:45       ` David Sterba
2017-05-19  0:32         ` Qu Wenruo
2017-05-29 15:51           ` David Sterba
2017-05-31  0:31             ` Qu Wenruo
2017-05-31 14:30               ` David Sterba
2017-06-01  1:01                 ` Qu Wenruo
2017-06-02 14:16                   ` David Sterba
2017-05-17  2:56 ` [RFC PATCH v3.2 6/6] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges Qu Wenruo
2017-06-21 19:09 ` [RFC PATCH v3.2 0/6] Qgroup fixes, Non-stack version David Sterba
