* [PATCH v2 00/10] Use split qgroup rsv type
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

[Overall]
The previous rework of the qgroup reservation system put a lot of effort
into data reservation, which works quite well.

It paid less attention to metadata reservation, though, causing problems
like metadata reservation underflow and noisy kernel warnings.

This patchset addresses the remaining problems of metadata reservation.

The idea of the new qgroup metadata reservation is to use 2 types of
metadata reservation:
1) Per-transaction reservation
   Its lifespan is bound to a transaction; it is freed at transaction
   commit time.

2) Preallocated reservation
   For cases where we reserve space before starting a transaction.
   Operations like delalloc and delayed-inode/item belong to this type.

   This works similarly to block_rsv: the reservation can be taken or
   released whenever the caller likes.

   The only point to note is that if a preallocated reservation is used
   and finishes without problem, it should be converted to the
   per-transaction type instead of just being freed.
   This is to cooperate with the qgroup update at commit time.

For the preallocated type, this patchset integrates it into the
inode_rsv mechanism reworked by Josef, and into delayed-inode/item
reservation (see the sketch below).
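
To make the flow concrete, here is a minimal sketch of a hypothetical
caller (not taken from the patches) using the helpers this series
introduces; error handling is trimmed and @nbytes is assumed to be
nodesize-aligned:

    static int example_metadata_op(struct btrfs_root *root, int nbytes)
    {
            struct btrfs_trans_handle *trans;
            int ret;

            /* Preallocated type: reserve before any transaction exists */
            ret = btrfs_qgroup_reserve_meta_prealloc(root, nbytes, true);
            if (ret)
                    return ret;

            trans = btrfs_join_transaction(root);
            if (IS_ERR(trans)) {
                    /* Error path: just free the prealloc reservation */
                    btrfs_qgroup_free_meta_prealloc(root, nbytes);
                    return PTR_ERR(trans);
            }

            /* ... modify trees ... */

            /*
             * Success path: convert to the per-transaction type so the
             * reservation stays accounted until qgroup numbers are
             * updated at commit.
             */
            btrfs_qgroup_convert_reserved_meta(root, nbytes);
            return btrfs_end_transaction(trans);
    }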


[Problem: Over-reserve for metadata operations]
With the latest work on using more accurate, less over-estimated numbers
for delalloc, with a 128M limit we can now write about 123M.
(That is still worse than the previous 126M, but only because the
previous implementation could free prealloc reserved space.)

However, it still doesn't handle metadata operations, like inode
creation, well.

For a test case like btrfs/139, we can only create about 50+ 4M files
before hitting the 1G limit; that is, roughly 200M of data exhausts a
1G quota, with the rest consumed by metadata over-reservation.
(Double checked: there is no qgroup rsv leak.)

So there is still some room to improve the over-reserve behavior.

[Patch structure]
Patches 1~8 are mostly the same as in v1, with some small updates.
Patch 5 gets a small EDQUOT handler fix for a problem exposed by fstests.
Patches 9~10 are new and address the over-reserve behavior.

Changelog:
v2:
  Use independent qgroup rsv numbers instead of reusing the overkill
  numbers used by block_rsv, which greatly reduces the early EDQUOT
  problem for delalloc.

  Use transaction_kthread to do an early commit, in the hope of freeing
  up some pertrans space.


Qu Wenruo (10):
  btrfs: qgroup: Split meta rsv type into meta_prealloc and
    meta_pertrans
  btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
  btrfs: qgroup: Introduce function to convert META_PREALLOC into
    META_PERTRANS
  btrfs: qgroup: Use separate meta reservation type for delalloc
  btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and
    item
  btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta
    reserved space
  btrfs: qgroup: Update trace events for metadata reservation
  Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
  btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
  btrfs: qgroup: Use independent and accurate per inode qgroup rsv

 fs/btrfs/ctree.h             |  33 +++++-
 fs/btrfs/delayed-inode.c     |  56 +++++++---
 fs/btrfs/disk-io.c           |   2 +-
 fs/btrfs/extent-tree.c       |  98 ++++++++++++------
 fs/btrfs/file.c              |  15 +--
 fs/btrfs/free-space-cache.c  |   2 +-
 fs/btrfs/inode-map.c         |   4 +-
 fs/btrfs/inode.c             |  27 ++---
 fs/btrfs/ioctl.c             |  10 +-
 fs/btrfs/ordered-data.c      |   2 +-
 fs/btrfs/qgroup.c            | 241 ++++++++++++++++++++++++++++++++++---------
 fs/btrfs/qgroup.h            |  76 +++++++++++++-
 fs/btrfs/relocation.c        |   9 +-
 fs/btrfs/transaction.c       |   8 +-
 include/trace/events/btrfs.h |  60 ++++++++++-
 15 files changed, 500 insertions(+), 143 deletions(-)

-- 
2.15.1



* [PATCH v2 01/10] btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

Btrfs uses 2 different methods to reserve metadata qgroup space.
1) Reserve at btrfs_start_transaction() time
   This is quite straightforward: the caller uses the allocated trans
   handle to modify b-trees.

   In this case, reserved metadata should be kept until the qgroup
   numbers are updated.

2) Reserve by using block_rsv first, and later btrfs_join_transaction()
   This is more complicated: the caller reserves space using block_rsv
   first, and then later calls btrfs_join_transaction() to get a trans
   handle.

   In this case, before we modify trees the reserved space can be
   modified on demand, and after btrfs_join_transaction() such reserved
   space should also be kept until the qgroup numbers are updated.

Since these two types behave quite differently from each other, split
the original "META" reservation type into 2 sub-types:
  META_PERTRANS:
    For case 1) above.

  META_PREALLOC:
    For reservations made before btrfs_join_transaction() in case 2).

NOTE: This patch only converts existing qgroup meta reservation callers
according to their situation; it does not ensure all callers reserve at
the correct timing.
Such fixes will be addressed in later patches.
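
For reference, case 1) needs no explicit qgroup call from the caller at
all; a hypothetical user of the transaction API looks like this (the
META_PERTRANS reservation is taken inside start_transaction() and freed
in bulk at commit):

    trans = btrfs_start_transaction(root, num_items);
    if (IS_ERR(trans))
            return PTR_ERR(trans);
    /* ... modify b-trees using @trans ... */
    btrfs_end_transaction(trans);
    /*
     * The pertrans bytes stay accounted until the commit path calls
     * btrfs_qgroup_free_meta_all_pertrans() for this root.
     */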

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c       |  8 +++---
 fs/btrfs/qgroup.c            | 22 +++++++-------
 fs/btrfs/qgroup.h            | 68 ++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/transaction.c       |  8 +++---
 include/trace/events/btrfs.h |  5 ++--
 5 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2f4328511ac8..20923fce06c4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5989,7 +5989,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
 		/* One for parent inode, two for dir entries */
 		num_bytes = 3 * fs_info->nodesize;
-		ret = btrfs_qgroup_reserve_meta(root, num_bytes, true);
+		ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
 		if (ret)
 			return ret;
 	} else {
@@ -6008,7 +6008,7 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 		ret = btrfs_block_rsv_migrate(global_rsv, rsv, num_bytes, 1);
 
 	if (ret && *qgroup_reserved)
-		btrfs_qgroup_free_meta(root, *qgroup_reserved);
+		btrfs_qgroup_free_meta_prealloc(root, *qgroup_reserved);
 
 	return ret;
 }
@@ -6084,7 +6084,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	spin_unlock(&inode->lock);
 
 	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
-		ret = btrfs_qgroup_reserve_meta(root,
+		ret = btrfs_qgroup_reserve_meta_prealloc(root,
 				nr_extents * fs_info->nodesize, true);
 		if (ret)
 			goto out_fail;
@@ -6092,7 +6092,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 
 	ret = btrfs_inode_rsv_refill(inode, flush);
 	if (unlikely(ret)) {
-		btrfs_qgroup_free_meta(root,
+		btrfs_qgroup_free_meta_prealloc(root,
 				       nr_extents * fs_info->nodesize);
 		goto out_fail;
 	}
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 39bdf5341372..0b85ec21ac1d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -69,8 +69,10 @@ static const char *qgroup_rsv_type_str(enum btrfs_qgroup_rsv_type type)
 {
 	if (type == BTRFS_QGROUP_RSV_DATA)
 		return "data";
-	if (type == BTRFS_QGROUP_RSV_META)
-		return "meta";
+	if (type == BTRFS_QGROUP_RSV_META_PERTRANS)
+		return "meta_pertrans";
+	if (type == BTRFS_QGROUP_RSV_META_PREALLOC)
+		return "meta_prealloc";
 	return NULL;
 }
 #endif
@@ -3076,8 +3078,8 @@ int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
 	return __btrfs_qgroup_release_data(inode, NULL, start, len, 0);
 }
 
-int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
-			      bool enforce)
+int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+				enum btrfs_qgroup_rsv_type type, bool enforce)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int ret;
@@ -3088,14 +3090,14 @@ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
 	trace_qgroup_meta_reserve(root, (s64)num_bytes);
-	ret = qgroup_reserve(root, num_bytes, enforce, BTRFS_QGROUP_RSV_META);
+	ret = qgroup_reserve(root, num_bytes, enforce, type);
 	if (ret < 0)
 		return ret;
 	atomic64_add(num_bytes, &root->qgroup_meta_rsv);
 	return ret;
 }
 
-void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
+void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 reserved;
@@ -3109,10 +3111,11 @@ void btrfs_qgroup_free_meta_all(struct btrfs_root *root)
 		return;
 	trace_qgroup_meta_reserve(root, -(s64)reserved);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, reserved,
-				  BTRFS_QGROUP_RSV_META);
+				  BTRFS_QGROUP_RSV_META_PERTRANS);
 }
 
-void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
+void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
+			      enum btrfs_qgroup_rsv_type type)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 
@@ -3124,8 +3127,7 @@ void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes)
 	WARN_ON(atomic64_read(&root->qgroup_meta_rsv) < num_bytes);
 	atomic64_sub(num_bytes, &root->qgroup_meta_rsv);
 	trace_qgroup_meta_reserve(root, -(s64)num_bytes);
-	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes,
-				  BTRFS_QGROUP_RSV_META);
+	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes, type);
 }
 
 /*
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index c8c81b923674..b47740e2e017 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -61,9 +61,30 @@ struct btrfs_qgroup_extent_record {
 	struct ulist *old_roots;
 };
 
+/*
+ * Qgroup reservation types:
+ * DATA:
+ *	space reserved for data
+ *
+ * META_PERTRANS:
+ * 	Space reserved for metadata (per-transaction)
+ * 	Due to the fact that qgroup data is only updated at transaction commit
+ * 	time, reserved space for metadata must be kept until transaction
+ * 	commitment.
+ * 	Any metadata reserved used in btrfs_start_transaction() should be this
+ * 	type.
+ *
+ * META_PREALLOC:
+ *	There are cases where metadata space is reserved before starting
+ *	transaction, and then btrfs_join_transaction() to get a trans handler.
+ *	Any metadata reserved for such usage should be this type.
+ *	And after join_transaction() part (or all) of such reservation should
+ *	be converted into META_PERTRANS.
+ */
 enum btrfs_qgroup_rsv_type {
 	BTRFS_QGROUP_RSV_DATA = 0,
-	BTRFS_QGROUP_RSV_META,
+	BTRFS_QGROUP_RSV_META_PERTRANS,
+	BTRFS_QGROUP_RSV_META_PREALLOC,
 	BTRFS_QGROUP_RSV_LAST,
 };
 
@@ -270,9 +291,46 @@ int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len);
 int btrfs_qgroup_free_data(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 
-int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
-			      bool enforce);
-void btrfs_qgroup_free_meta_all(struct btrfs_root *root);
-void btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes);
+int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+				enum btrfs_qgroup_rsv_type type, bool enforce);
+/* Reserve metadata space for pertrans and prealloc type*/
+static inline int btrfs_qgroup_reserve_meta_pertrans(struct btrfs_root *root,
+				int num_bytes, bool enforce)
+{
+	return __btrfs_qgroup_reserve_meta(root, num_bytes,
+			BTRFS_QGROUP_RSV_META_PERTRANS, enforce);
+}
+static inline int btrfs_qgroup_reserve_meta_prealloc(struct btrfs_root *root,
+				int num_bytes, bool enforce)
+{
+	return __btrfs_qgroup_reserve_meta(root, num_bytes,
+			BTRFS_QGROUP_RSV_META_PREALLOC, enforce);
+}
+
+void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
+			     enum btrfs_qgroup_rsv_type type);
+
+/* Free per-transaction meta reservation for error handler */
+static inline void btrfs_qgroup_free_meta_pertrans(struct btrfs_root *root,
+						   int num_bytes)
+{
+	__btrfs_qgroup_free_meta(root, num_bytes,
+			BTRFS_QGROUP_RSV_META_PERTRANS);
+}
+
+/* Pre-allocated meta reservation can be freed at need */
+static inline void btrfs_qgroup_free_meta_prealloc(struct btrfs_root *root,
+						   int num_bytes)
+{
+	__btrfs_qgroup_free_meta(root, num_bytes,
+			BTRFS_QGROUP_RSV_META_PREALLOC);
+}
+
+/*
+ * Per-transaction meta reservation should be all freed at transaction commit
+ * time
+ */
+void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root);
+
 void btrfs_qgroup_check_reserved_leak(struct inode *inode);
 #endif /* __BTRFS_QGROUP__ */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5a8c2649af2f..ddae813c01dd 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -508,8 +508,8 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 */
 	if (num_items && root != fs_info->chunk_root) {
 		qgroup_reserved = num_items * fs_info->nodesize;
-		ret = btrfs_qgroup_reserve_meta(root, qgroup_reserved,
-						enforce_qgroups);
+		ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
+				enforce_qgroups);
 		if (ret)
 			return ERR_PTR(ret);
 
@@ -606,7 +606,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 		btrfs_block_rsv_release(fs_info, &fs_info->trans_block_rsv,
 					num_bytes);
 reserve_fail:
-	btrfs_qgroup_free_meta(root, qgroup_reserved);
+	btrfs_qgroup_free_meta_pertrans(root, qgroup_reserved);
 	return ERR_PTR(ret);
 }
 
@@ -1298,7 +1298,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
 			spin_lock(&fs_info->fs_roots_radix_lock);
 			if (err)
 				break;
-			btrfs_qgroup_free_meta_all(root);
+			btrfs_qgroup_free_meta_all_pertrans(root);
 		}
 	}
 	spin_unlock(&fs_info->fs_roots_radix_lock);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index b20b9f8e18a0..e3218e673e1d 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -73,8 +73,9 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS);
 
 #define show_qgroup_rsv_type(type)					\
 	__print_symbolic(type,						\
-		{ BTRFS_QGROUP_RSV_DATA,	"DATA"	},		\
-		{ BTRFS_QGROUP_RSV_META,	"META"	})
+		{ BTRFS_QGROUP_RSV_DATA,	  "DATA"	},	\
+		{ BTRFS_QGROUP_RSV_META_PERTRANS, "META_PERTRANS" },	\
+		{ BTRFS_QGROUP_RSV_META_PREALLOC, "META_PREALLOC" })
 
 #define BTRFS_GROUP_FLAGS	\
 	{ BTRFS_BLOCK_GROUP_DATA,	"DATA"},	\
-- 
2.15.1



* [PATCH v2 02/10] btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

Since qgroup now has separate metadata reservation types, we can
completely get rid of the old root->qgroup_meta_rsv, which mostly acted
as the current META_PERTRANS reservation type does.
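
As a note on the replacement mechanism: instead of the per-root
counter, the commit path now passes a (u64)-1 sentinel (added by this
patch) so btrfs_qgroup_free_refroot() frees whatever META_PERTRANS
bytes the level-0 qgroup currently holds:

    /*
     * Free all remaining META_PERTRANS bytes of this root at commit
     * time; (u64)-1 means "free everything" and is only valid for the
     * per-transaction type.
     */
    btrfs_qgroup_free_refroot(fs_info, root->objectid, (u64)-1,
                              BTRFS_QGROUP_RSV_META_PERTRANS);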

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |  3 ---
 fs/btrfs/disk-io.c |  1 -
 fs/btrfs/qgroup.c  | 34 +++++++++++++++++++++++++---------
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 13c260b525a1..bd2bf589e6c8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1260,9 +1260,6 @@ struct btrfs_root {
 	int send_in_progress;
 	struct btrfs_subvolume_writers *subv_writers;
 	atomic_t will_be_snapshotted;
-
-	/* For qgroup metadata space reserve */
-	atomic64_t qgroup_meta_rsv;
 };
 
 struct btrfs_file_private {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a8ecccfc36de..03ea3b926fac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1184,7 +1184,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	atomic_set(&root->orphan_inodes, 0);
 	refcount_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshotted, 0);
-	atomic64_set(&root->qgroup_meta_rsv, 0);
 	root->log_transid = 0;
 	root->log_transid_committed = -1;
 	root->last_log_commit = 0;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 0b85ec21ac1d..ce3d6c95d297 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2509,6 +2509,16 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 	return ret;
 }
 
+/*
+ * Free @num_bytes of reserved space with @type for qgroup.
+ * (normally level 0 qgroup).
+ *
+ * Will handle all higher level qgroup either.
+ *
+ * NOTE: If @num_bytes is (u64)-1, this means to free all bytes of
+ * this qgroup.
+ * This special case is only used for META_PERTRANS type.
+ */
 void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 			       u64 ref_root, u64 num_bytes,
 			       enum btrfs_qgroup_rsv_type type)
@@ -2525,6 +2535,10 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 	if (num_bytes == 0)
 		return;
 
+	if (num_bytes == (u64)-1 && type != BTRFS_QGROUP_RSV_META_PERTRANS) {
+		WARN(1, "%s: Invalid type to free", __func__);
+		return;
+	}
 	spin_lock(&fs_info->qgroup_lock);
 
 	quota_root = fs_info->quota_root;
@@ -2535,6 +2549,13 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 	if (!qgroup)
 		goto out;
 
+	/*
+	 * We're freeing all pertrans rsv, get current value from level 0
+	 * qgroup as real num_bytes to free.
+	 */
+	if (num_bytes == (u64)-1)
+		num_bytes = qgroup->rsv.values[type];
+
 	ulist_reinit(fs_info->qgroup_ulist);
 	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
 			(uintptr_t)qgroup, GFP_ATOMIC);
@@ -3093,24 +3114,21 @@ int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 	ret = qgroup_reserve(root, num_bytes, enforce, type);
 	if (ret < 0)
 		return ret;
-	atomic64_add(num_bytes, &root->qgroup_meta_rsv);
 	return ret;
 }
 
 void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	u64 reserved;
 
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
 	    !is_fstree(root->objectid))
 		return;
 
-	reserved = atomic64_xchg(&root->qgroup_meta_rsv, 0);
-	if (reserved == 0)
-		return;
-	trace_qgroup_meta_reserve(root, -(s64)reserved);
-	btrfs_qgroup_free_refroot(fs_info, root->objectid, reserved,
+	/* TODO: Update trace point to handle such free */
+	trace_qgroup_meta_reserve(root, 0);
+	/* Special value -1 means to free all reserved space */
+	btrfs_qgroup_free_refroot(fs_info, root->objectid, (u64)-1,
 				  BTRFS_QGROUP_RSV_META_PERTRANS);
 }
 
@@ -3124,8 +3142,6 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
 		return;
 
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
-	WARN_ON(atomic64_read(&root->qgroup_meta_rsv) < num_bytes);
-	atomic64_sub(num_bytes, &root->qgroup_meta_rsv);
 	trace_qgroup_meta_reserve(root, -(s64)num_bytes);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes, type);
 }
-- 
2.15.1



* [PATCH v2 03/10] btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

For meta_prealloc reservation users, the caller will modify trees after
btrfs_join_transaction(), so part (or even all) of the meta_prealloc
reservation should be converted to meta_pertrans and kept until
transaction commit time.

This patch introduces a new function,
btrfs_qgroup_convert_reserved_meta(), to do this for META_PREALLOC
reservation users.
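
The expected calling pattern, sketched with a hypothetical caller
(@reserved and @used are illustrative values, not names from the
patches):

    trans = btrfs_join_transaction(root);
    /* ... modify trees, consuming @used of the @reserved bytes ... */

    /* Keep what was actually used accounted until commit */
    btrfs_qgroup_convert_reserved_meta(root, used);
    /* Anything untouched can simply be freed */
    btrfs_qgroup_free_meta_prealloc(root, reserved - used);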

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/qgroup.h |  8 ++++++++
 2 files changed, 64 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ce3d6c95d297..24fc6e46f717 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3146,6 +3146,62 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes, type);
 }
 
+static void qgroup_convert_meta(struct btrfs_fs_info *fs_info, u64 ref_root,
+				int num_bytes)
+{
+	struct btrfs_root *quota_root = fs_info->quota_root;
+	struct btrfs_qgroup *qgroup;
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	int ret = 0;
+
+	if (num_bytes == 0)
+		return;
+	if (!quota_root)
+		return;
+
+	spin_lock(&fs_info->qgroup_lock);
+	qgroup = find_qgroup_rb(fs_info, ref_root);
+	if (!qgroup)
+		goto out;
+	ulist_reinit(fs_info->qgroup_ulist);
+	ret = ulist_add(fs_info->qgroup_ulist, qgroup->qgroupid,
+		       (uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		goto out;
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(fs_info->qgroup_ulist, &uiter))) {
+		struct btrfs_qgroup *qg;
+		struct btrfs_qgroup_list *glist;
+
+		qg = unode_aux_to_qgroup(unode);
+
+		qgroup_rsv_release(fs_info, qg, num_bytes,
+				BTRFS_QGROUP_RSV_META_PREALLOC);
+		qgroup_rsv_add(fs_info, qg, num_bytes,
+				BTRFS_QGROUP_RSV_META_PERTRANS);
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(fs_info->qgroup_ulist,
+					glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				goto out;
+		}
+	}
+out:
+	spin_unlock(&fs_info->qgroup_lock);
+}
+
+void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+
+	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
+	    !is_fstree(root->objectid))
+		return;
+	qgroup_convert_meta(fs_info, root->objectid, num_bytes);
+}
+
 /*
  * Check qgroup reserved space leaking, normally at destroy inode
  * time
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index b47740e2e017..4814d680c50f 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -332,5 +332,13 @@ static inline void btrfs_qgroup_free_meta_prealloc(struct btrfs_root *root,
  */
 void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root);
 
+/*
+ * Convert @num_bytes of META_PREALLOCATED reservation to META_PERTRANS.
+ *
+ * This is called when preallocated meta reservation needs to be used.
+ * Normally after btrfs_join_transaction() call.
+ */
+void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes);
+
 void btrfs_qgroup_check_reserved_leak(struct inode *inode);
 #endif /* __BTRFS_QGROUP__ */
-- 
2.15.1



* [PATCH v2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

Before this patch, btrfs qgroup mixed per-transaction meta rsv with
preallocated meta rsv, making it quite easy to underflow the qgroup
meta reservation.

Since we have the new qgroup meta rsv types, apply them to delalloc
reservation.

Now for delalloc, most of its reserved space uses the META_PREALLOC
qgroup rsv type.

Callers reducing outstanding extents, like btrfs_finish_ordered_io(),
convert the corresponding META_PREALLOC reservation to META_PERTRANS.

This is mainly because qgroup numbers are currently only updated in
btrfs_commit_transaction(); that is to say, if we don't keep such a
placeholder reservation, we can exceed the qgroup limitation.

Callers freeing outstanding extents in error handlers just free the
META_PREALLOC bytes.

This behavior requires callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are dealing
with.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to tell qgroup whether to convert or release the
meta reservation.

The good news is that even if we use the wrong type (convert vs. free),
it won't cause an obvious bug, as the prealloc type is always kept in
good shape; the type only affects whether per-trans meta is increased.

So the worst case is that the metadata limitation can sometimes be
exceeded (no convert at all) or reached too soon (no free at all).
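
In practice the new flag follows a simple convention, sketched here
with btrfs_delalloc_release_extents(); the same rule applies to the
other release helpers:

    /* Error path: nothing will reach disk, return the rsv entirely */
    btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);

    /*
     * Normal path: the extents will be accounted at commit, so convert
     * the META_PREALLOC bytes to META_PERTRANS instead of freeing.
     */
    btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, false);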

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h            |  9 ++++++---
 fs/btrfs/extent-tree.c      | 45 +++++++++++++++++++++++++--------------------
 fs/btrfs/file.c             | 15 ++++++++-------
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/inode-map.c        |  4 ++--
 fs/btrfs/inode.c            | 27 ++++++++++++++-------------
 fs/btrfs/ioctl.c            | 10 ++++++----
 fs/btrfs/ordered-data.c     |  2 +-
 fs/btrfs/relocation.c       |  9 +++++----
 9 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bd2bf589e6c8..091c0e06b32e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2728,7 +2728,8 @@ int btrfs_check_data_free_space(struct inode *inode,
 void btrfs_free_reserved_data_space(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode,
-			struct extent_changeset *reserved, u64 start, u64 len);
+				  struct extent_changeset *reserved,
+				  u64 start, u64 len, bool qgroup_free);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -2743,10 +2744,12 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 				     u64 *qgroup_reserved, bool use_global_rsv);
 void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_rsv *rsv);
-void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
+void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
+				    bool qgroup_free);
 
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
+void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
+				     bool qgroup_free);
 int btrfs_delalloc_reserve_space(struct inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 20923fce06c4..986660f301de 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5754,6 +5754,9 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 	if (num_bytes == 0)
 		return 0;
 
+	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+	if (ret)
+		return ret;
 	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
 	if (!ret) {
 		block_rsv_add_bytes(block_rsv, num_bytes, 0);
@@ -5766,11 +5769,16 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 /**
  * btrfs_inode_rsv_release - release any excessive reservation.
  * @inode - the inode we need to release from.
+ * @qgroup_free - free or convert qgroup meta.
+ *   Unlike normal operation, qgroup meta reservation needs to know if
+ *   we are freeing qgroup reservation or just convert them into per-trans.
+ *   Normally @qgroup_free is true for error handler, and false for normal
+ *   release.
  *
  * This is the same as btrfs_block_rsv_release, except that it handles the
  * tracepoint for the reservation.
  */
-void btrfs_inode_rsv_release(struct btrfs_inode *inode)
+void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
@@ -5786,6 +5794,10 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode)
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
+	if (qgroup_free)
+		btrfs_qgroup_free_meta_prealloc(inode->root, released);
+	else
+		btrfs_qgroup_convert_reserved_meta(inode->root, released);
 }
 
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
@@ -6045,7 +6057,6 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
 	unsigned nr_extents;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
 	int ret = 0;
@@ -6083,19 +6094,9 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
 	spin_unlock(&inode->lock);
 
-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
-		ret = btrfs_qgroup_reserve_meta_prealloc(root,
-				nr_extents * fs_info->nodesize, true);
-		if (ret)
-			goto out_fail;
-	}
-
 	ret = btrfs_inode_rsv_refill(inode, flush);
-	if (unlikely(ret)) {
-		btrfs_qgroup_free_meta_prealloc(root,
-				       nr_extents * fs_info->nodesize);
+	if (unlikely(ret))
 		goto out_fail;
-	}
 
 	if (delalloc_lock)
 		mutex_unlock(&inode->delalloc_mutex);
@@ -6109,7 +6110,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
 	spin_unlock(&inode->lock);
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, true);
 	if (delalloc_lock)
 		mutex_unlock(&inode->delalloc_mutex);
 	return ret;
@@ -6119,12 +6120,14 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * btrfs_delalloc_release_metadata - release a metadata reservation for an inode
  * @inode: the inode to release the reservation for.
  * @num_bytes: the number of bytes we are releasing.
+ * @qgroup_free: free qgroup reservation or convert it to per-trans reservation
  *
  * This will release the metadata reservation for an inode.  This can be called
  * once we complete IO for a given set of bytes to release their metadata
  * reservations, or on error for the same reason.
  */
-void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
+				     bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 
@@ -6137,13 +6140,14 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	if (btrfs_is_testing(fs_info))
 		return;
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, qgroup_free);
 }
 
 /**
  * btrfs_delalloc_release_extents - release our outstanding_extents
  * @inode: the inode to balance the reservation for.
  * @num_bytes: the number of bytes we originally reserved with
+ * @qgroup_free: do we need to free qgroup meta reservation or convert them.
  *
  * When we reserve space we increase outstanding_extents for the extents we may
  * add.  Once we've set the range as delalloc or created our ordered extents we
@@ -6151,7 +6155,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * temporarily tracked outstanding_extents.  This _must_ be used in conjunction
  * with btrfs_delalloc_reserve_metadata.
  */
-void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
+				    bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 	unsigned num_extents;
@@ -6165,7 +6170,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
 	if (btrfs_is_testing(fs_info))
 		return;
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, qgroup_free);
 }
 
 /**
@@ -6221,9 +6226,9 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
  */
 void btrfs_delalloc_release_space(struct inode *inode,
 				  struct extent_changeset *reserved,
-				  u64 start, u64 len)
+				  u64 start, u64 len, bool qgroup_free)
 {
-	btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
+	btrfs_delalloc_release_metadata(BTRFS_I(inode), len, qgroup_free);
 	btrfs_free_reserved_data_space(inode, reserved, start, len);
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..723cea16cd56 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1690,7 +1690,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				    force_page_uptodate);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
+						       reserve_bytes, true);
 			break;
 		}
 
@@ -1702,7 +1702,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			if (extents_locked == -EAGAIN)
 				goto again;
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
+						       reserve_bytes, true);
 			ret = extents_locked;
 			break;
 		}
@@ -1737,7 +1737,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 						fs_info->sb->s_blocksize_bits;
 			if (only_release_metadata) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-								release_bytes);
+							release_bytes, true);
 			} else {
 				u64 __pos;
 
@@ -1746,7 +1746,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 					(dirty_pages << PAGE_SHIFT);
 				btrfs_delalloc_release_space(inode,
 						data_reserved, __pos,
-						release_bytes);
+						release_bytes, true);
 			}
 		}
 
@@ -1760,7 +1760,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 					     lockstart, lockend, &cached_state,
 					     GFP_NOFS);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
+					       (ret != 0));
 		if (ret) {
 			btrfs_drop_pages(pages, num_pages);
 			break;
@@ -1800,11 +1801,11 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		if (only_release_metadata) {
 			btrfs_end_write_no_snapshotting(root);
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					release_bytes);
+					release_bytes, true);
 		} else {
 			btrfs_delalloc_release_space(inode, data_reserved,
 					round_down(pos, fs_info->sectorsize),
-					release_bytes);
+					release_bytes, true);
 		}
 	}
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4426d1c73e50..fbd831044a1d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3548,7 +3548,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 	if (ret) {
 		if (release_metadata)
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					inode->i_size);
+					inode->i_size, true);
 #ifdef DEBUG
 		btrfs_err(fs_info,
 			  "failed to write free ino cache for root %llu",
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 022b19336fee..9409dcc7020d 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -500,12 +500,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, true);
 		goto out_put;
 	}
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, false);
 out_put:
 	iput(inode);
 out_release:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..76a1b4819607 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1839,7 +1839,7 @@ static void btrfs_clear_bit_hook(void *private_data,
 		 */
 		if (*bits & EXTENT_CLEAR_META_RESV &&
 		    root != fs_info->tree_root)
-			btrfs_delalloc_release_metadata(inode, len);
+			btrfs_delalloc_release_metadata(inode, len, false);
 
 		/* For sanity tests. */
 		if (btrfs_is_testing(fs_info))
@@ -2102,7 +2102,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 				  0);
 	ClearPageChecked(page);
 	set_page_dirty(page);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, false);
 out:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end,
 			     &cached_state, GFP_NOFS);
@@ -4760,8 +4760,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
 		btrfs_delalloc_release_space(inode, data_reserved,
-					     block_start, blocksize);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
+					     block_start, blocksize, true);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -4829,8 +4829,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 out_unlock:
 	if (ret)
 		btrfs_delalloc_release_space(inode, data_reserved, block_start,
-					     blocksize);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
+					     blocksize, true);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
 	unlock_page(page);
 	put_page(page);
 out:
@@ -8814,7 +8814,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
 				btrfs_delalloc_release_space(inode, data_reserved,
-					offset, dio_data.reserve);
+					offset, dio_data.reserve, true);
 			/*
 			 * On error we might have left some ordered extents
 			 * without submitting corresponding bios for them, so
@@ -8830,8 +8830,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 					false);
 		} else if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode, data_reserved,
-					offset, count - (size_t)ret);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
+					offset, count - (size_t)ret, true);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), count, false);
 	}
 out:
 	if (wakeup)
@@ -9150,7 +9150,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 		if (reserved_space < PAGE_SIZE) {
 			end = page_start + reserved_space - 1;
 			btrfs_delalloc_release_space(inode, data_reserved,
-					page_start, PAGE_SIZE - reserved_space);
+					page_start, PAGE_SIZE - reserved_space,
+					true);
 		}
 	}
 
@@ -9200,16 +9201,16 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 
 out_unlock:
 	if (!ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
 		sb_end_pagefault(inode->i_sb);
 		extent_changeset_free(data_reserved);
 		return VM_FAULT_LOCKED;
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
 	btrfs_delalloc_release_space(inode, data_reserved, page_start,
-				     reserved_space);
+				     reserved_space, (ret != 0));
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..9c46b89de591 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1198,7 +1198,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		spin_unlock(&BTRFS_I(inode)->lock);
 		btrfs_delalloc_release_space(inode, data_reserved,
 				start_index << PAGE_SHIFT,
-				(page_cnt - i_done) << PAGE_SHIFT);
+				(page_cnt - i_done) << PAGE_SHIFT, true);
 	}
 
 
@@ -1217,7 +1217,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
+				       false);
 	extent_changeset_free(data_reserved);
 	return i_done;
 out:
@@ -1227,8 +1228,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	}
 	btrfs_delalloc_release_space(inode, data_reserved,
 			start_index << PAGE_SHIFT,
-			page_cnt << PAGE_SHIFT);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
+			page_cnt << PAGE_SHIFT, true);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
+				       true);
 	extent_changeset_free(data_reserved);
 	return ret;
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 5b311aeddcc8..c6f5f7e274a5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -610,7 +610,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
 	btrfs_mod_outstanding_extents(btrfs_inode, -1);
 	spin_unlock(&btrfs_inode->lock);
 	if (root != fs_info->tree_root)
-		btrfs_delalloc_release_metadata(btrfs_inode, entry->len);
+		btrfs_delalloc_release_metadata(btrfs_inode, entry->len, false);
 
 	tree = &btrfs_inode->ordered_tree;
 	spin_lock_irq(&tree->lock);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f0c3f00e97cb..632129024c5f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3226,7 +3226,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 						   mask);
 			if (!page) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							PAGE_SIZE);
+							PAGE_SIZE, true);
 				ret = -ENOMEM;
 				goto out;
 			}
@@ -3245,9 +3245,9 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				unlock_page(page);
 				put_page(page);
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							PAGE_SIZE);
+							PAGE_SIZE, true);
 				btrfs_delalloc_release_extents(BTRFS_I(inode),
-							       PAGE_SIZE);
+							       PAGE_SIZE, true);
 				ret = -EIO;
 				goto out;
 			}
@@ -3278,7 +3278,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		put_page(page);
 
 		index++;
-		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE,
+					       false);
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 		btrfs_throttle(fs_info);
 	}
-- 
2.15.1



* [PATCH v2 05/10] btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

Quite similar to delalloc: some modifications to delayed-inode and
delayed-item reservation.
The release path also needs an extra parameter to distinguish normal
release from error release.
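
The release convention matches the delalloc one, e.g. for the delayed
inode as updated by this patch:

    /*
     * qgroup_free is true only on error; a normal release converts the
     * META_PREALLOC bytes so they stay accounted until the qgroup
     * update at transaction commit.
     */
    btrfs_delayed_inode_release_metadata(fs_info, node, (ret < 0));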

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/delayed-inode.c | 56 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5d73f79ded8b..e8d571d2ba06 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -22,6 +22,7 @@
 #include "disk-io.h"
 #include "transaction.h"
 #include "ctree.h"
+#include "qgroup.h"
 
 #define BTRFS_DELAYED_WRITEBACK		512
 #define BTRFS_DELAYED_BACKGROUND	128
@@ -528,11 +529,12 @@ static struct btrfs_delayed_item *__btrfs_next_delayed_item(
 }
 
 static int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
-					       struct btrfs_fs_info *fs_info,
+					       struct btrfs_root *root,
 					       struct btrfs_delayed_item *item)
 {
 	struct btrfs_block_rsv *src_rsv;
 	struct btrfs_block_rsv *dst_rsv;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 num_bytes;
 	int ret;
 
@@ -543,6 +545,12 @@ static int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
 	dst_rsv = &fs_info->delayed_block_rsv;
 
 	num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
+
+	/*
+	 * Here we migrate space rsv from transaction rsv, since have
+	 * already reserved space when starting a transaction.
+	 * So no need to reserve qgroup space here.
+	 */
 	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
 	if (!ret) {
 		trace_btrfs_space_reservation(fs_info, "delayed_item",
@@ -554,15 +562,20 @@ static int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static void btrfs_delayed_item_release_metadata(struct btrfs_fs_info *fs_info,
+static void btrfs_delayed_item_release_metadata(struct btrfs_root *root,
 						struct btrfs_delayed_item *item)
 {
 	struct btrfs_block_rsv *rsv;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 
 	if (!item->bytes_reserved)
 		return;
 
 	rsv = &fs_info->delayed_block_rsv;
+	/*
+	 * Check btrfs_delayed_item_reserve_metadata() to see why we don't need
+	 * to release/reserve qgroup space.
+	 */
 	trace_btrfs_space_reservation(fs_info, "delayed_item",
 				      item->key.objectid, item->bytes_reserved,
 				      0);
@@ -598,6 +611,10 @@ static int btrfs_delayed_inode_reserve_metadata(
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
 			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
+		ret = btrfs_qgroup_reserve_meta_prealloc(root,
+				fs_info->nodesize, true);
+		if (ret < 0)
+			return ret;
 		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
 					  BTRFS_RESERVE_NO_FLUSH);
 		/*
@@ -614,7 +631,9 @@ static int btrfs_delayed_inode_reserve_metadata(
 						      "delayed_inode",
 						      btrfs_ino(inode),
 						      num_bytes, 1);
-		}
+		} else
+			btrfs_qgroup_free_meta_prealloc(root,
+							fs_info->nodesize);
 		return ret;
 	}
 
@@ -629,7 +648,8 @@ static int btrfs_delayed_inode_reserve_metadata(
 }
 
 static void btrfs_delayed_inode_release_metadata(struct btrfs_fs_info *fs_info,
-						struct btrfs_delayed_node *node)
+						struct btrfs_delayed_node *node,
+						bool qgroup_free)
 {
 	struct btrfs_block_rsv *rsv;
 
@@ -641,6 +661,12 @@ static void btrfs_delayed_inode_release_metadata(struct btrfs_fs_info *fs_info,
 				      node->inode_id, node->bytes_reserved, 0);
 	btrfs_block_rsv_release(fs_info, rsv,
 				node->bytes_reserved);
+	if (qgroup_free)
+		btrfs_qgroup_free_meta_prealloc(node->root,
+				node->bytes_reserved);
+	else
+		btrfs_qgroup_convert_reserved_meta(node->root,
+				node->bytes_reserved);
 	node->bytes_reserved = 0;
 }
 
@@ -742,7 +768,7 @@ static int btrfs_batch_insert_items(struct btrfs_root *root,
 				    curr->data_len);
 		slot++;
 
-		btrfs_delayed_item_release_metadata(fs_info, curr);
+		btrfs_delayed_item_release_metadata(root, curr);
 
 		list_del(&curr->tree_list);
 		btrfs_release_delayed_item(curr);
@@ -764,7 +790,6 @@ static int btrfs_insert_delayed_item(struct btrfs_trans_handle *trans,
 				     struct btrfs_path *path,
 				     struct btrfs_delayed_item *delayed_item)
 {
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct extent_buffer *leaf;
 	char *ptr;
 	int ret;
@@ -782,7 +807,7 @@ static int btrfs_insert_delayed_item(struct btrfs_trans_handle *trans,
 			    delayed_item->data_len);
 	btrfs_mark_buffer_dirty(leaf);
 
-	btrfs_delayed_item_release_metadata(fs_info, delayed_item);
+	btrfs_delayed_item_release_metadata(root, delayed_item);
 	return 0;
 }
 
@@ -834,7 +859,6 @@ static int btrfs_batch_delete_items(struct btrfs_trans_handle *trans,
 				    struct btrfs_path *path,
 				    struct btrfs_delayed_item *item)
 {
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_delayed_item *curr, *next;
 	struct extent_buffer *leaf;
 	struct btrfs_key key;
@@ -884,7 +908,7 @@ static int btrfs_batch_delete_items(struct btrfs_trans_handle *trans,
 		goto out;
 
 	list_for_each_entry_safe(curr, next, &head, tree_list) {
-		btrfs_delayed_item_release_metadata(fs_info, curr);
+		btrfs_delayed_item_release_metadata(root, curr);
 		list_del(&curr->tree_list);
 		btrfs_release_delayed_item(curr);
 	}
@@ -1027,7 +1051,7 @@ static int __btrfs_update_delayed_inode(struct btrfs_trans_handle *trans,
 no_iref:
 	btrfs_release_path(path);
 err_out:
-	btrfs_delayed_inode_release_metadata(fs_info, node);
+	btrfs_delayed_inode_release_metadata(fs_info, node, (ret < 0));
 	btrfs_release_delayed_inode(node);
 
 	return ret;
@@ -1420,7 +1444,7 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_dir_type(dir_item, type);
 	memcpy((char *)(dir_item + 1), name, name_len);
 
-	ret = btrfs_delayed_item_reserve_metadata(trans, fs_info, delayed_item);
+	ret = btrfs_delayed_item_reserve_metadata(trans, dir->root, delayed_item);
 	/*
 	 * we have reserved enough space when we start a new transaction,
 	 * so reserving metadata failure is impossible
@@ -1457,7 +1481,7 @@ static int btrfs_delete_delayed_insertion_item(struct btrfs_fs_info *fs_info,
 		return 1;
 	}
 
-	btrfs_delayed_item_release_metadata(fs_info, item);
+	btrfs_delayed_item_release_metadata(node->root, item);
 	btrfs_release_delayed_item(item);
 	mutex_unlock(&node->mutex);
 	return 0;
@@ -1492,7 +1516,7 @@ int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
 
 	item->key = item_key;
 
-	ret = btrfs_delayed_item_reserve_metadata(trans, fs_info, item);
+	ret = btrfs_delayed_item_reserve_metadata(trans, dir->root, item);
 	/*
 	 * we have reserved enough space when we start a new transaction,
 	 * so reserving metadata failure is impossible.
@@ -1865,7 +1889,7 @@ static void __btrfs_kill_delayed_node(struct btrfs_delayed_node *delayed_node)
 	mutex_lock(&delayed_node->mutex);
 	curr_item = __btrfs_first_delayed_insertion_item(delayed_node);
 	while (curr_item) {
-		btrfs_delayed_item_release_metadata(fs_info, curr_item);
+		btrfs_delayed_item_release_metadata(root, curr_item);
 		prev_item = curr_item;
 		curr_item = __btrfs_next_delayed_item(prev_item);
 		btrfs_release_delayed_item(prev_item);
@@ -1873,7 +1897,7 @@ static void __btrfs_kill_delayed_node(struct btrfs_delayed_node *delayed_node)
 
 	curr_item = __btrfs_first_delayed_deletion_item(delayed_node);
 	while (curr_item) {
-		btrfs_delayed_item_release_metadata(fs_info, curr_item);
+		btrfs_delayed_item_release_metadata(root, curr_item);
 		prev_item = curr_item;
 		curr_item = __btrfs_next_delayed_item(prev_item);
 		btrfs_release_delayed_item(prev_item);
@@ -1883,7 +1907,7 @@ static void __btrfs_kill_delayed_node(struct btrfs_delayed_node *delayed_node)
 		btrfs_release_delayed_iref(delayed_node);
 
 	if (test_bit(BTRFS_DELAYED_NODE_INODE_DIRTY, &delayed_node->flags)) {
-		btrfs_delayed_inode_release_metadata(fs_info, delayed_node);
+		btrfs_delayed_inode_release_metadata(fs_info, delayed_node, false);
 		btrfs_release_delayed_inode(delayed_node);
 	}
 	mutex_unlock(&delayed_node->mutex);
-- 
2.15.1



* [PATCH v2 06/10] btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta reserved space
From: Qu Wenruo @ 2017-12-22  6:18 UTC
  To: linux-btrfs; +Cc: dsterba

For the quota disabled->enabled case, it's possible that at reservation
time quota was not enabled, so no bytes were really reserved, while at
release time quota is enabled, so we try to release bytes we never
owned.

Such a situation can cause metadata reservation underflow for both
types, although it is less likely for the per-trans type since enabling
quota commits the transaction.

To address this, record qgroup meta reserved bytes in
root->qgroup_meta_rsv_pertrans/prealloc, so at release time we won't
free any bytes we didn't reserve.

For DATA, this is already handled by the io_tree, so nothing needs to
be done there.
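
The scenario the per-root record protects against, as a timeline (the
16K size is illustrative):

    /*
     * quota disabled: reserve_meta_prealloc(16K) -> no-op, nothing is
     *                 actually reserved in the qgroup
     * quota enabled:  (enabling quota commits the transaction)
     * later:          free_meta_prealloc(16K)    -> would underflow
     *
     * sub_root_meta_rsv() clamps the amount to what this root really
     * recorded, turning the final free into a no-op:
     */
    num_bytes = sub_root_meta_rsv(root, num_bytes, type);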

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h   |  5 +++++
 fs/btrfs/disk-io.c |  1 +
 fs/btrfs/qgroup.c  | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 091c0e06b32e..0c58f92c2d44 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1260,6 +1260,11 @@ struct btrfs_root {
 	int send_in_progress;
 	struct btrfs_subvolume_writers *subv_writers;
 	atomic_t will_be_snapshotted;
+
+	/* For qgroup metadata reserved space */
+	spinlock_t qgroup_meta_rsv_lock;
+	u64 qgroup_meta_rsv_pertrans;
+	u64 qgroup_meta_rsv_prealloc;
 };
 
 struct btrfs_file_private {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 03ea3b926fac..d8eaadac4330 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1168,6 +1168,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	spin_lock_init(&root->accounting_lock);
 	spin_lock_init(&root->log_extents_lock[0]);
 	spin_lock_init(&root->log_extents_lock[1]);
+	spin_lock_init(&root->qgroup_meta_rsv_lock);
 	mutex_init(&root->objectid_mutex);
 	mutex_init(&root->log_mutex);
 	mutex_init(&root->ordered_extent_mutex);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 24fc6e46f717..96ed678b3588 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2549,11 +2549,11 @@ void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
 	if (!qgroup)
 		goto out;
 
-	/*
-	 * We're freeing all pertrans rsv, get current value from level 0
-	 * qgroup as real num_bytes to free.
-	 */
 	if (num_bytes == (u64)-1)
+		/*
+		 * We're freeing all pertrans rsv, get reserved value from
+		 * level 0 qgroup as real num_bytes to free.
+		 */
 		num_bytes = qgroup->rsv.values[type];
 
 	ulist_reinit(fs_info->qgroup_ulist);
@@ -3099,6 +3099,46 @@ int btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len)
 	return __btrfs_qgroup_release_data(inode, NULL, start, len, 0);
 }
 
+static void add_root_meta_rsv(struct btrfs_root *root, int num_bytes,
+			      enum btrfs_qgroup_rsv_type type)
+{
+	if (type != BTRFS_QGROUP_RSV_META_PREALLOC &&
+	    type != BTRFS_QGROUP_RSV_META_PERTRANS)
+		return;
+	if (num_bytes == 0)
+		return;
+
+	spin_lock(&root->qgroup_meta_rsv_lock);
+	if (type == BTRFS_QGROUP_RSV_META_PREALLOC)
+		root->qgroup_meta_rsv_prealloc += num_bytes;
+	else
+		root->qgroup_meta_rsv_pertrans += num_bytes;
+	spin_unlock(&root->qgroup_meta_rsv_lock);
+}
+
+static int sub_root_meta_rsv(struct btrfs_root *root, int num_bytes,
+			     enum btrfs_qgroup_rsv_type type)
+{
+	if (type != BTRFS_QGROUP_RSV_META_PREALLOC &&
+	    type != BTRFS_QGROUP_RSV_META_PERTRANS)
+		return 0;
+	if (num_bytes == 0)
+		return 0;
+
+	spin_lock(&root->qgroup_meta_rsv_lock);
+	if (type == BTRFS_QGROUP_RSV_META_PREALLOC) {
+		num_bytes = min_t(u64, root->qgroup_meta_rsv_prealloc,
+				  num_bytes);
+		root->qgroup_meta_rsv_prealloc -= num_bytes;
+	} else {
+		num_bytes = min_t(u64, root->qgroup_meta_rsv_pertrans,
+				  num_bytes);
+		root->qgroup_meta_rsv_pertrans -= num_bytes;
+	}
+	spin_unlock(&root->qgroup_meta_rsv_lock);
+	return num_bytes;
+}
+
 int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 				enum btrfs_qgroup_rsv_type type, bool enforce)
 {
@@ -3114,6 +3154,15 @@ int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 	ret = qgroup_reserve(root, num_bytes, enforce, type);
 	if (ret < 0)
 		return ret;
+	/*
+	 * Record what we have reserved into root, to avoid underflow on a
+	 * quota disabled->enabled transition.
+	 *
+	 * In that case we may try to free space we haven't reserved (since
+	 * quota was disabled at reserve time), so recording the reservation
+	 * here lets the release path clamp to this number.
+	 */
+	add_root_meta_rsv(root, num_bytes, type);
 	return ret;
 }
 
@@ -3141,6 +3190,12 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
 	    !is_fstree(root->objectid))
 		return;
 
+	/*
+	 * A reservation for META_PREALLOC can happen before quota is
+	 * enabled, which can lead to underflow.
+	 * Ensure we only free what we have really reserved.
+	 */
+	num_bytes = sub_root_meta_rsv(root, num_bytes, type);
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
 	trace_qgroup_meta_reserve(root, -(s64)num_bytes);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes, type);
@@ -3199,6 +3254,9 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
 	if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags) ||
 	    !is_fstree(root->objectid))
 		return;
+	/* Same as btrfs_qgroup_free_meta_prealloc() */
+	num_bytes = sub_root_meta_rsv(root, num_bytes,
+				      BTRFS_QGROUP_RSV_META_PREALLOC);
 	qgroup_convert_meta(fs_info, root->objectid, num_bytes);
 }
 
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 07/10] btrfs: qgroup: Update trace events for metadata reservation
  2017-12-22  6:18 [PATCH v2 00/10] Use split qgroup rsv type Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-12-22  6:18 ` [PATCH v2 06/10] btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta reserved space Qu Wenruo
@ 2017-12-22  6:18 ` Qu Wenruo
  2017-12-22  6:18 ` [PATCH v2 08/10] Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" Qu Wenruo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2017-12-22  6:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Now trace_qgroup_meta_reserve() takes an extra type parameter.

And introduce two new trace events:
1) trace_qgroup_meta_free_all_pertrans()
   For btrfs_qgroup_free_meta_all_pertrans()

2) trace_qgroup_meta_convert()
   For btrfs_qgroup_convert_reserved_meta()
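
With the type parameter, reservation events in the trace output will
look roughly like the following (hypothetical values; the format is
taken from the TP_printk strings in the diff below):

  qgroup_meta_reserve: refroot=5(FS_TREE) type=META_PREALLOC diff=16384
  qgroup_meta_convert: refroot=5(FS_TREE) type=META_PREALLOC->META_PERTRANS diff=16384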

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c            |  8 ++++----
 include/trace/events/btrfs.h | 56 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 96ed678b3588..ee5b05dd10a9 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3150,7 +3150,7 @@ int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
 		return 0;
 
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
-	trace_qgroup_meta_reserve(root, (s64)num_bytes);
+	trace_qgroup_meta_reserve(root, type, (s64)num_bytes);
 	ret = qgroup_reserve(root, num_bytes, enforce, type);
 	if (ret < 0)
 		return ret;
@@ -3175,7 +3175,6 @@ void btrfs_qgroup_free_meta_all_pertrans(struct btrfs_root *root)
 		return;
 
-	/* TODO: Update trace point to handle such free */
-	trace_qgroup_meta_reserve(root, 0);
+	trace_qgroup_meta_free_all_pertrans(root);
 	/* Special value -1 means to free all reserved space */
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, (u64)-1,
 				  BTRFS_QGROUP_RSV_META_PERTRANS);
@@ -3197,7 +3196,7 @@ void __btrfs_qgroup_free_meta(struct btrfs_root *root, int num_bytes,
 	 */
 	num_bytes = sub_root_meta_rsv(root, num_bytes, type);
 	BUG_ON(num_bytes != round_down(num_bytes, fs_info->nodesize));
-	trace_qgroup_meta_reserve(root, -(s64)num_bytes);
+	trace_qgroup_meta_reserve(root, type, -(s64)num_bytes);
 	btrfs_qgroup_free_refroot(fs_info, root->objectid, num_bytes, type);
 }
 
@@ -3257,6 +3256,7 @@ void btrfs_qgroup_convert_reserved_meta(struct btrfs_root *root, int num_bytes)
 	/* Same as btrfs_qgroup_free_meta_prealloc() */
 	num_bytes = sub_root_meta_rsv(root, num_bytes,
 				      BTRFS_QGROUP_RSV_META_PREALLOC);
+	trace_qgroup_meta_convert(root, num_bytes);
 	qgroup_convert_meta(fs_info, root->objectid, num_bytes);
 }
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index e3218e673e1d..4460f1e84e45 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1629,6 +1629,29 @@ TRACE_EVENT(qgroup_update_reserve,
 
 TRACE_EVENT(qgroup_meta_reserve,
 
+	TP_PROTO(struct btrfs_root *root, int type, s64 diff),
+
+	TP_ARGS(root, type, diff),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	refroot			)
+		__field(	s64,	diff			)
+		__field(	int,	type			)
+	),
+
+	TP_fast_assign_btrfs(root->fs_info,
+		__entry->refroot	= root->objectid;
+		__entry->type		= type;
+		__entry->diff		= diff;
+	),
+
+	TP_printk_btrfs("refroot=%llu(%s) type=%s diff=%lld",
+		show_root_type(__entry->refroot),
+		show_qgroup_rsv_type(__entry->type), __entry->diff)
+);
+
+TRACE_EVENT(qgroup_meta_convert,
+
 	TP_PROTO(struct btrfs_root *root, s64 diff),
 
 	TP_ARGS(root, diff),
@@ -1636,6 +1659,7 @@ TRACE_EVENT(qgroup_meta_reserve,
 	TP_STRUCT__entry_btrfs(
 		__field(	u64,	refroot			)
 		__field(	s64,	diff			)
+		__field(	int,	type			)
 	),
 
 	TP_fast_assign_btrfs(root->fs_info,
@@ -1643,8 +1667,36 @@ TRACE_EVENT(qgroup_meta_reserve,
 		__entry->diff		= diff;
 	),
 
-	TP_printk_btrfs("refroot=%llu(%s) diff=%lld",
-		show_root_type(__entry->refroot), __entry->diff)
+	TP_printk_btrfs("refroot=%llu(%s) type=%s->%s diff=%lld",
+		show_root_type(__entry->refroot),
+		show_qgroup_rsv_type(BTRFS_QGROUP_RSV_META_PREALLOC),
+		show_qgroup_rsv_type(BTRFS_QGROUP_RSV_META_PERTRANS),
+		__entry->diff)
+);
+
+TRACE_EVENT(qgroup_meta_free_all_pertrans,
+
+	TP_PROTO(struct btrfs_root *root),
+
+	TP_ARGS(root),
+
+	TP_STRUCT__entry_btrfs(
+		__field(	u64,	refroot			)
+		__field(	s64,	diff			)
+		__field(	int,	type			)
+	),
+
+	TP_fast_assign_btrfs(root->fs_info,
+		__entry->refroot	= root->objectid;
+		spin_lock(&root->qgroup_meta_rsv_lock);
+		__entry->diff		= -(s64)root->qgroup_meta_rsv_pertrans;
+		spin_unlock(&root->qgroup_meta_rsv_lock);
+		__entry->type		= BTRFS_QGROUP_RSV_META_PERTRANS;
+	),
+
+	TP_printk_btrfs("refroot=%llu(%s) type=%s diff=%lld",
+		show_root_type(__entry->refroot),
+		show_qgroup_rsv_type(__entry->type), __entry->diff)
 );
 
 DECLARE_EVENT_CLASS(btrfs__prelim_ref,
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 08/10] Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
  2017-12-22  6:18 [PATCH v2 00/10] Use split qgroup rsv type Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-12-22  6:18 ` [PATCH v2 07/10] btrfs: qgroup: Update trace events for metadata reservation Qu Wenruo
@ 2017-12-22  6:18 ` Qu Wenruo
  2017-12-22  6:18 ` [PATCH v2 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT Qu Wenruo
  2017-12-22  6:18 ` [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv Qu Wenruo
  9 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2017-12-22  6:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

This reverts commit 48a89bc4f2ceab87bc858a8eb189636b09c846a7.

The idea of committing the transaction to free some space after hitting
the qgroup limit is good, but the problem is that it can easily cause
deadlocks.

One deadlock example is caused by trying to flush data while still
holding it:
Call Trace:
 __schedule+0x49d/0x10f0
 schedule+0xc6/0x290
 schedule_timeout+0x187/0x1c0
 wait_for_completion+0x204/0x3a0
 btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs]
 qgroup_reserve+0x913/0xa10 [btrfs]
 btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs]
 btrfs_check_data_free_space+0x96/0xd0 [btrfs]
 __btrfs_buffered_write+0x3ac/0xd40 [btrfs]
 btrfs_file_write_iter+0x62a/0xba0 [btrfs]
 __vfs_write+0x320/0x430
 vfs_write+0x107/0x270
 SyS_write+0xbf/0x150
 do_syscall_64+0x1b0/0x3d0
 entry_SYSCALL64_slow_path+0x25/0x25

Another case can be caused by trying to commit a transaction while
nested inside a transaction handle held by ourselves:
btrfs_start_transaction()
|- btrfs_qgroup_reserve_meta_pertrans()
   |- qgroup_reserve()
      |- btrfs_join_transaction()
      |- btrfs_commit_transaction()

The retry causes more problems than expected when the limit is enabled.
A graceful EDQUOT is way better than a kernel deadlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index ee5b05dd10a9..6d5b476feb08 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2416,7 +2416,6 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 ref_root = root->root_key.objectid;
 	int ret = 0;
-	int retried = 0;
 	struct ulist_node *unode;
 	struct ulist_iterator uiter;
 
@@ -2430,7 +2429,6 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 	    capable(CAP_SYS_RESOURCE))
 		enforce = false;
 
-retry:
 	spin_lock(&fs_info->qgroup_lock);
 	quota_root = fs_info->quota_root;
 	if (!quota_root)
@@ -2457,27 +2455,6 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 		qg = unode_aux_to_qgroup(unode);
 
 		if (enforce && !qgroup_check_limits(qg, num_bytes)) {
-			/*
-			 * Commit the tree and retry, since we may have
-			 * deletions which would free up space.
-			 */
-			if (!retried && qgroup_rsv_total(qg) > 0) {
-				struct btrfs_trans_handle *trans;
-
-				spin_unlock(&fs_info->qgroup_lock);
-				ret = btrfs_start_delalloc_inodes(root, 0);
-				if (ret)
-					return ret;
-				btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
-				trans = btrfs_join_transaction(root);
-				if (IS_ERR(trans))
-					return PTR_ERR(trans);
-				ret = btrfs_commit_transaction(trans);
-				if (ret)
-					return ret;
-				retried++;
-				goto retry;
-			}
 			ret = -EDQUOT;
 			goto out;
 		}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
  2017-12-22  6:18 [PATCH v2 00/10] Use split qgroup rsv type Qu Wenruo
                   ` (7 preceding siblings ...)
  2017-12-22  6:18 ` [PATCH v2 08/10] Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" Qu Wenruo
@ 2017-12-22  6:18 ` Qu Wenruo
  2017-12-22  8:06   ` [PATCH v2.1 " Qu Wenruo
  2017-12-22  6:18 ` [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv Qu Wenruo
  9 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2017-12-22  6:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Unlike the previous method of trying to commit the transaction inside
qgroup_reserve(), this time we try to commit the transaction using
fs_info->transaction_kthread, so there is no nested transaction and no
need to worry about lock context.

Since it's an asynchronous function call and we won't wait for the
transaction commit, unlike the previous method, we must trigger it
before we hit the qgroup limit.

So this patch uses the ratio and size of the qgroup meta_pertrans
reservation as an indicator to decide whether we should trigger a
transaction commit.
(meta_prealloc won't be freed by a transaction commit, so committing
for it would be useless anyway.)
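
As an example with hypothetical numbers: with max_rfer = 1GiB, 600MiB
of data rsv and 40MiB of prealloc meta rsv, the threshold is
min((1024M - 600M - 40M) / 32, 16M) = min(12M, 16M) = 12M, so the
commit is triggered once the pertrans rsv exceeds 12MiB, well before
the limit itself is hit.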

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6d5b476feb08..02eed4250815 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 #include <linux/btrfs.h>
+#include <linux/sizes.h>
 
 #include "ctree.h"
 #include "transaction.h"
@@ -2395,8 +2396,21 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
+/*
+ * Two limits for committing the transaction in advance.
+ *
+ * For RATIO, the threshold is 1/RATIO of the remaining limit
+ * (excluding data and prealloc meta).
+ * For SIZE, the threshold is capped at SIZE bytes.
+ */
+#define QGROUP_PERTRANS_RATIO		32
+#define QGROUP_PERTRANS_SIZE		SZ_16M
+static bool qgroup_check_limits(struct btrfs_fs_info *fs_info,
+				const struct btrfs_qgroup *qg, u64 num_bytes)
 {
+	u64 limit;
+	u64 threshold;
+
 	if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
 	    qgroup_rsv_total(qg) + (s64)qg->rfer + num_bytes > qg->max_rfer)
 		return false;
@@ -2405,6 +2419,31 @@ static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
 	    qgroup_rsv_total(qg) + (s64)qg->excl + num_bytes > qg->max_excl)
 		return false;
 
+	/*
+	 * Even if we passed the check, it's better to check whether the
+	 * meta_pertrans reservation is pushing us near the limit.
+	 * If there is too much pertrans reservation or it's near the limit,
+	 * try to commit the transaction to free some, via transaction_kthread.
+	 */
+	if ((qg->lim_flags & (BTRFS_QGROUP_LIMIT_MAX_RFER |
+			      BTRFS_QGROUP_LIMIT_MAX_EXCL))) {
+		if (qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
+			limit = qg->max_excl;
+		else
+			limit = qg->max_rfer;
+		threshold = (limit - qg->rsv.values[BTRFS_QGROUP_RSV_DATA] -
+			    qg->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC]) /
+			    QGROUP_PERTRANS_RATIO;
+		threshold = min_t(u64, threshold, QGROUP_PERTRANS_SIZE);
+
+		/*
+		 * Use transaction_kthread to commit the transaction, so we no
+		 * longer need to bother with nested transactions or lock context.
+		 */
+		if (qg->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > threshold)
+			wake_up_process(fs_info->transaction_kthread);
+	}
+
 	return true;
 }
 
@@ -2454,7 +2493,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 
 		qg = unode_aux_to_qgroup(unode);
 
-		if (enforce && !qgroup_check_limits(qg, num_bytes)) {
+		if (enforce && !qgroup_check_limits(fs_info, qg, num_bytes)) {
 			ret = -EDQUOT;
 			goto out;
 		}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2017-12-22  6:18 [PATCH v2 00/10] Use split qgroup rsv type Qu Wenruo
                   ` (8 preceding siblings ...)
  2017-12-22  6:18 ` [PATCH v2 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT Qu Wenruo
@ 2017-12-22  6:18 ` Qu Wenruo
  2018-02-22 22:44   ` Jeff Mahoney
  2018-04-03  7:30   ` Qu Wenruo
  9 siblings, 2 replies; 31+ messages in thread
From: Qu Wenruo @ 2017-12-22  6:18 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Unlike the reservation calculation used in the inode rsv for metadata,
qgroup doesn't really need to care about things like csum size or
extent usage for a whole-tree COW.

Qgroup cares more about the *net* change in extent usage.
That is to say, if we're going to insert one file extent, it will
mostly find its place in an already CoWed tree block, leaving no change
in extent usage. Or it can cause a leaf split, resulting in one new net
extent and increasing the qgroup number by nodesize.
(Or, in the even rarer case, it can increase the tree level, increasing
the qgroup number by 2 * nodesize.)

So instead of the deliberately over-estimated calculation used by the
extent allocator, which cares more about correctness than about tight
numbers, qgroup can use a much smaller reservation.

This patch maintains 2 new members in the btrfs_block_rsv structure for
qgroup, using a much smaller calculation for the qgroup rsv and
reducing false EDQUOT.
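
As an example with hypothetical numbers, assuming a 16K nodesize: an
inode with 4 outstanding extents gets a qgroup rsv of only
4 * 16K = 64K, while the block rsv side reserves roughly
nodesize * BTRFS_MAX_LEVEL * 2 per item (plus csum leaves), an order
of magnitude more. Charging that full amount to qgroup is what caused
the false EDQUOT.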

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h       | 18 +++++++++++++++++
 fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0c58f92c2d44..97783ba91e00 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -467,6 +467,24 @@ struct btrfs_block_rsv {
 	unsigned short full;
 	unsigned short type;
 	unsigned short failfast;
+
+	/*
+	 * Qgroup equivalents of @size and @reserved.
+	 *
+	 * Unlike the normal @size/@reserved for the inode rsv,
+	 * qgroup doesn't care about things like csum size nor how many tree
+	 * blocks it will need to reserve.
+	 *
+	 * Qgroup cares more about the *NET* change in extent usage.
+	 * So for ONE newly inserted file extent, which in the worst case
+	 * causes a leaf split and a level increase, charging nodesize for
+	 * each file extent is already a generous over-estimate.
+	 *
+	 * In short, qgroup_rsv_size/reserved is the upper limit of the
+	 * qgroup metadata reservation we may possibly need.
+	 */
+	u64 qgroup_rsv_size;
+	u64 qgroup_rsv_reserved;
 };
 
 /*
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 986660f301de..9d579c7bcf7f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
 
 static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
 				    struct btrfs_block_rsv *block_rsv,
-				    struct btrfs_block_rsv *dest, u64 num_bytes)
+				    struct btrfs_block_rsv *dest, u64 num_bytes,
+				    u64 *qgroup_to_release_ret)
 {
 	struct btrfs_space_info *space_info = block_rsv->space_info;
+	u64 qgroup_to_release = 0;
 	u64 ret;
 
 	spin_lock(&block_rsv->lock);
-	if (num_bytes == (u64)-1)
+	if (num_bytes == (u64)-1) {
 		num_bytes = block_rsv->size;
+		qgroup_to_release = block_rsv->qgroup_rsv_size;
+	}
 	block_rsv->size -= num_bytes;
 	if (block_rsv->reserved >= block_rsv->size) {
 		num_bytes = block_rsv->reserved - block_rsv->size;
@@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
 	} else {
 		num_bytes = 0;
 	}
+	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
+		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
+				    block_rsv->qgroup_rsv_size;
+		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
+	} else
+		qgroup_to_release = 0;
 	spin_unlock(&block_rsv->lock);
 
 	ret = num_bytes;
@@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
 			space_info_add_old_bytes(fs_info, space_info,
 						 num_bytes);
 	}
+	if (qgroup_to_release_ret)
+		*qgroup_to_release_ret = qgroup_to_release;
 	return ret;
 }
 
@@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 	struct btrfs_root *root = inode->root;
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 num_bytes = 0;
+	u64 qgroup_num_bytes = 0;
 	int ret = -ENOSPC;
 
 	spin_lock(&block_rsv->lock);
 	if (block_rsv->reserved < block_rsv->size)
 		num_bytes = block_rsv->size - block_rsv->reserved;
+	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
+		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
+				   block_rsv->qgroup_rsv_reserved;
 	spin_unlock(&block_rsv->lock);
 
 	if (num_bytes == 0)
 		return 0;
 
-	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
 	if (ret)
 		return ret;
 	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
@@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 		block_rsv_add_bytes(block_rsv, num_bytes, 0);
 		trace_btrfs_space_reservation(root->fs_info, "delalloc",
 					      btrfs_ino(inode), num_bytes, 1);
-	}
+
+		/* Don't forget to increase qgroup_rsv_reserved */
+		spin_lock(&block_rsv->lock);
+		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
+		spin_unlock(&block_rsv->lock);
+	} else
+		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
 	return ret;
 }
 
@@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 released = 0;
+	u64 qgroup_to_release = 0;
 
 	/*
 	 * Since we statically set the block_rsv->size we just want to say we
 	 * are releasing 0 bytes, and then we'll just get the reservation over
 	 * the size free'd.
 	 */
-	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
+	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
+					   &qgroup_to_release);
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
 	if (qgroup_free)
-		btrfs_qgroup_free_meta_prealloc(inode->root, released);
+		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
 	else
-		btrfs_qgroup_convert_reserved_meta(inode->root, released);
+		btrfs_qgroup_convert_reserved_meta(inode->root,
+						   qgroup_to_release);
 }
 
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
@@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
 	if (global_rsv == block_rsv ||
 	    block_rsv->space_info != global_rsv->space_info)
 		global_rsv = NULL;
-	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
+	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
 }
 
 static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
@@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
 {
 	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
-				(u64)-1);
+				(u64)-1, NULL);
 	WARN_ON(fs_info->trans_block_rsv.size > 0);
 	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
 	WARN_ON(fs_info->chunk_block_rsv.size > 0);
@@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
 	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
 
 	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
-				trans->chunk_bytes_reserved);
+				trans->chunk_bytes_reserved, NULL);
 	trans->chunk_bytes_reserved = 0;
 }
 
@@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 reserve_size = 0;
+	u64 qgroup_rsv_size = 0;
 	u64 csum_leaves;
 	unsigned outstanding_extents;
 
@@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 						 inode->csum_bytes);
 	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
 						       csum_leaves);
+	/*
+	 * For qgroup rsv, the calculation is very simple:
+	 * nodesize for each outstanding extent.
+	 * This is already overkill in most cases.
+	 */
+	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
 
 	spin_lock(&block_rsv->lock);
 	block_rsv->size = reserve_size;
+	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
 	spin_unlock(&block_rsv->lock);
 }
 
@@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
 			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
 {
 	block_rsv_add_bytes(block_rsv, blocksize, 0);
-	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
+	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
 }
 
 /*
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2.1 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT
  2017-12-22  6:18 ` [PATCH v2 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT Qu Wenruo
@ 2017-12-22  8:06   ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2017-12-22  8:06 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Unlike the previous method of trying to commit the transaction inside
qgroup_reserve(), this time we try to commit the transaction using
fs_info->transaction_kthread, so there is no nested transaction and no
need to worry about lock context.

Since it's an asynchronous function call and we won't wait for the
transaction commit, unlike the previous method, we must trigger it
before we hit the qgroup limit.

So this patch uses the ratio and size of the qgroup meta_pertrans
reservation as an indicator to decide whether we should trigger a
transaction commit.
(meta_prealloc won't be freed by a transaction commit, so committing
for it would be useless anyway.)

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
v2.1
  Allow transaction_kthread to commit without waiting for commit
  interval.
  In the previous version this didn't really work due to the commit
  interval limit.

  Now btrfs handles small file creation much better, although btrfs/139
  still hits EDQUOT earlier than expected.
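
  The mechanism is a simple flag handshake; a sketch of the flow the
  diff below implements (names as in the patch):

	qgroup_check_limits():
		set_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags);
		wake_up_process(fs_info->transaction_kthread);

	transaction_kthread():
		/* commit immediately when the bit is set, instead of
		 * waiting for fs_info->commit_interval;
		 * btrfs_commit_transaction() clears the bit when done */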
---
 fs/btrfs/ctree.h       |  6 ++++++
 fs/btrfs/disk-io.c     |  1 +
 fs/btrfs/qgroup.c      | 43 +++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/transaction.c |  1 +
 fs/btrfs/transaction.h | 14 ++++++++++++++
 5 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0c58f92c2d44..141a36571bd7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -724,6 +724,12 @@ struct btrfs_delayed_root;
  */
 #define BTRFS_FS_EXCL_OP			16
 
+/*
+ * To tell transaction_kthread that we need an immediate commit, so it
+ * doesn't need to wait for commit_interval
+ */
+#define BTRFS_FS_NEED_ASYNC_COMMIT		17
+
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d8eaadac4330..a33e1da195ec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1787,6 +1787,7 @@ static int transaction_kthread(void *arg)
 
 		now = get_seconds();
 		if (cur->state < TRANS_STATE_BLOCKED &&
+		    !test_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags) &&
 		    (now < cur->start_time ||
 		     now - cur->start_time < fs_info->commit_interval)) {
 			spin_unlock(&fs_info->trans_lock);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6d5b476feb08..57d8fd605655 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 #include <linux/btrfs.h>
+#include <linux/sizes.h>
 
 #include "ctree.h"
 #include "transaction.h"
@@ -2395,8 +2396,21 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
+/*
+ * Two limits for committing the transaction in advance.
+ *
+ * For RATIO, the threshold is 1/RATIO of the remaining limit
+ * (excluding data and prealloc meta).
+ * For SIZE, the threshold is capped at SIZE bytes.
+ */
+#define QGROUP_PERTRANS_RATIO		32
+#define QGROUP_PERTRANS_SIZE		SZ_32M
+static bool qgroup_check_limits(struct btrfs_fs_info *fs_info,
+				const struct btrfs_qgroup *qg, u64 num_bytes)
 {
+	u64 limit;
+	u64 threshold;
+
 	if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
 	    qgroup_rsv_total(qg) + (s64)qg->rfer + num_bytes > qg->max_rfer)
 		return false;
@@ -2405,6 +2419,31 @@ static bool qgroup_check_limits(const struct btrfs_qgroup *qg, u64 num_bytes)
 	    qgroup_rsv_total(qg) + (s64)qg->excl + num_bytes > qg->max_excl)
 		return false;
 
+	/*
+	 * Even if we passed the check, it's better to check whether the
+	 * meta_pertrans reservation is pushing us near the limit.
+	 * If there is too much pertrans reservation or it's near the limit,
+	 * try to commit the transaction to free some, via transaction_kthread.
+	 */
+	if ((qg->lim_flags & (BTRFS_QGROUP_LIMIT_MAX_RFER |
+			      BTRFS_QGROUP_LIMIT_MAX_EXCL))) {
+		if (qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
+			limit = qg->max_excl;
+		else
+			limit = qg->max_rfer;
+		threshold = (limit - qg->rsv.values[BTRFS_QGROUP_RSV_DATA] -
+			    qg->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC]) /
+			    QGROUP_PERTRANS_RATIO;
+		threshold = min_t(u64, threshold, QGROUP_PERTRANS_SIZE);
+
+		/*
+		 * Use transaction_kthread to commit the transaction, so we no
+		 * longer need to bother with nested transactions or lock context.
+		 */
+		if (qg->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > threshold)
+			btrfs_commit_transaction_locksafe(fs_info);
+	}
+
 	return true;
 }
 
@@ -2454,7 +2493,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce,
 
 		qg = unode_aux_to_qgroup(unode);
 
-		if (enforce && !qgroup_check_limits(qg, num_bytes)) {
+		if (enforce && !qgroup_check_limits(fs_info, qg, num_bytes)) {
 			ret = -EDQUOT;
 			goto out;
 		}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ddae813c01dd..5e1c1d2295eb 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2289,6 +2289,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 */
 	cur_trans->state = TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
+	clear_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags);
 
 	spin_lock(&fs_info->trans_lock);
 	list_del_init(&cur_trans->list);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index c55e44560103..63f1dc7f9268 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -204,6 +204,20 @@ int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
 int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
 				   int wait_for_unblock);
+
+/*
+ * Try to commit the transaction asynchronously; this is safe to call
+ * even while holding a spinlock.
+ *
+ * It works by telling transaction_kthread to commit the transaction
+ * without waiting for the commit interval.
+ */
+static inline void btrfs_commit_transaction_locksafe(
+		struct btrfs_fs_info *fs_info)
+{
+	set_bit(BTRFS_FS_NEED_ASYNC_COMMIT, &fs_info->flags);
+	wake_up_process(fs_info->transaction_kthread);
+}
 int btrfs_end_transaction_throttle(struct btrfs_trans_handle *trans);
 int btrfs_should_end_transaction(struct btrfs_trans_handle *trans);
 void btrfs_throttle(struct btrfs_fs_info *fs_info);
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2.2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc
  2017-12-22  6:18 ` [PATCH v2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc Qu Wenruo
@ 2017-12-26  5:37   ` Qu Wenruo
  2017-12-26  5:40     ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2017-12-26  5:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba

Before this patch, btrfs qgroup mixes the per-transaction meta rsv with
the preallocated meta rsv, making it quite easy to underflow the qgroup
meta reservation.

Since we have the new qgroup meta rsv types, apply them to the delalloc
reservation.

Now for delalloc, most of its reserved space will use the META_PREALLOC
qgroup rsv type.

And callers reducing outstanding extents, like btrfs_finish_ordered_io(),
will convert the corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers are only
updated in btrfs_commit_transaction(); that is to say, if we don't keep
such a placeholder reservation, we can exceed the qgroup limit.

And for callers freeing outstanding extents in an error handler, we just
free the META_PREALLOC bytes.

This behavior requires callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are handling.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter telling qgroup whether to convert or to release the
meta reservation.

The good news is, even if we use the wrong type (convert or free), it
won't cause an obvious bug, as the prealloc type is always in good
shape, and the choice only affects whether the per-trans meta is
increased or not.

So the worst case is that the metadata limit can sometimes be exceeded
(no conversion at all) or the metadata limit is reached too soon (no
freeing at all).
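
As a rule of thumb distilled from the callers changed below (not an API
contract; `bytes` stands for whatever was reserved), the new boolean is
chosen like this:

	/* error path: nothing will be committed, drop META_PREALLOC */
	btrfs_delalloc_release_extents(BTRFS_I(inode), bytes, true);

	/* success path: the change goes into the transaction, so convert
	 * to META_PERTRANS so commit-time accounting still sees it */
	btrfs_delalloc_release_extents(BTRFS_I(inode), bytes, false);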

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
changelog:
v2.2
  Fix a wrong parameter in __btrfs_buffered_write(), where the qgroup
  reservation should be freed, not converted.
  Wrongly converting the reserved meta easily floods the pertrans rsv,
  as each successful buffered write then increases pertrans usage,
  which results in too early EDQUOT.
---
 fs/btrfs/ctree.h            |  9 ++++++---
 fs/btrfs/extent-tree.c      | 45 +++++++++++++++++++++++++--------------------
 fs/btrfs/file.c             | 15 ++++++++-------
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/inode-map.c        |  4 ++--
 fs/btrfs/inode.c            | 27 ++++++++++++++-------------
 fs/btrfs/ioctl.c            | 10 ++++++----
 fs/btrfs/ordered-data.c     |  2 +-
 fs/btrfs/relocation.c       |  9 +++++----
 9 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bd2bf589e6c8..091c0e06b32e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2728,7 +2728,8 @@ int btrfs_check_data_free_space(struct inode *inode,
 void btrfs_free_reserved_data_space(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct inode *inode,
-			struct extent_changeset *reserved, u64 start, u64 len);
+				  struct extent_changeset *reserved,
+				  u64 start, u64 len, bool qgroup_free);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -2743,10 +2744,12 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 				     u64 *qgroup_reserved, bool use_global_rsv);
 void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_rsv *rsv);
-void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
+void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
+				    bool qgroup_free);
 
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
+void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
+				     bool qgroup_free);
 int btrfs_delalloc_reserve_space(struct inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 20923fce06c4..986660f301de 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5754,6 +5754,9 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 	if (num_bytes == 0)
 		return 0;
 
+	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+	if (ret)
+		return ret;
 	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
 	if (!ret) {
 		block_rsv_add_bytes(block_rsv, num_bytes, 0);
@@ -5766,11 +5769,16 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
 /**
  * btrfs_inode_rsv_release - release any excessive reservation.
  * @inode - the inode we need to release from.
+ * @qgroup_free - free or convert qgroup meta.
+ *   Unlike normal operation, qgroup meta reservation needs to know if
+ *   we are freeing the qgroup reservation or just converting it to per-trans.
+ *   Normally @qgroup_free is true for error handlers, and false for normal
+ *   release.
  *
  * This is the same as btrfs_block_rsv_release, except that it handles the
  * tracepoint for the reservation.
  */
-void btrfs_inode_rsv_release(struct btrfs_inode *inode)
+void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
@@ -5786,6 +5794,10 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode)
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
+	if (qgroup_free)
+		btrfs_qgroup_free_meta_prealloc(inode->root, released);
+	else
+		btrfs_qgroup_convert_reserved_meta(inode->root, released);
 }
 
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
@@ -6045,7 +6057,6 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
 	unsigned nr_extents;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
 	int ret = 0;
@@ -6083,19 +6094,9 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
 	spin_unlock(&inode->lock);
 
-	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
-		ret = btrfs_qgroup_reserve_meta_prealloc(root,
-				nr_extents * fs_info->nodesize, true);
-		if (ret)
-			goto out_fail;
-	}
-
 	ret = btrfs_inode_rsv_refill(inode, flush);
-	if (unlikely(ret)) {
-		btrfs_qgroup_free_meta_prealloc(root,
-				       nr_extents * fs_info->nodesize);
+	if (unlikely(ret))
 		goto out_fail;
-	}
 
 	if (delalloc_lock)
 		mutex_unlock(&inode->delalloc_mutex);
@@ -6109,7 +6110,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
 	spin_unlock(&inode->lock);
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, true);
 	if (delalloc_lock)
 		mutex_unlock(&inode->delalloc_mutex);
 	return ret;
@@ -6119,12 +6120,14 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * btrfs_delalloc_release_metadata - release a metadata reservation for an inode
  * @inode: the inode to release the reservation for.
  * @num_bytes: the number of bytes we are releasing.
+ * @qgroup_free: free qgroup reservation or convert it to per-trans reservation
  *
  * This will release the metadata reservation for an inode.  This can be called
  * once we complete IO for a given set of bytes to release their metadata
  * reservations, or on error for the same reason.
  */
-void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
+				     bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 
@@ -6137,13 +6140,14 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	if (btrfs_is_testing(fs_info))
 		return;
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, qgroup_free);
 }
 
 /**
  * btrfs_delalloc_release_extents - release our outstanding_extents
  * @inode: the inode to balance the reservation for.
  * @num_bytes: the number of bytes we originally reserved with
+ * @qgroup_free: whether to free the qgroup meta reservation or convert it.
  *
  * When we reserve space we increase outstanding_extents for the extents we may
  * add.  Once we've set the range as delalloc or created our ordered extents we
@@ -6151,7 +6155,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
  * temporarily tracked outstanding_extents.  This _must_ be used in conjunction
  * with btrfs_delalloc_reserve_metadata.
  */
-void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
+void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
+				    bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 	unsigned num_extents;
@@ -6165,7 +6170,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
 	if (btrfs_is_testing(fs_info))
 		return;
 
-	btrfs_inode_rsv_release(inode);
+	btrfs_inode_rsv_release(inode, qgroup_free);
 }
 
 /**
@@ -6221,9 +6226,9 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
  */
 void btrfs_delalloc_release_space(struct inode *inode,
 				  struct extent_changeset *reserved,
-				  u64 start, u64 len)
+				  u64 start, u64 len, bool qgroup_free)
 {
-	btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
+	btrfs_delalloc_release_metadata(BTRFS_I(inode), len, qgroup_free);
 	btrfs_free_reserved_data_space(inode, reserved, start, len);
 }
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..3bf186746119 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1690,7 +1690,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				    force_page_uptodate);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
+						       reserve_bytes, true);
 			break;
 		}
 
@@ -1702,7 +1702,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			if (extents_locked == -EAGAIN)
 				goto again;
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
-						       reserve_bytes);
+						       reserve_bytes, true);
 			ret = extents_locked;
 			break;
 		}
@@ -1737,7 +1737,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 						fs_info->sb->s_blocksize_bits;
 			if (only_release_metadata) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-								release_bytes);
+							release_bytes, true);
 			} else {
 				u64 __pos;
 
@@ -1746,7 +1746,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 					(dirty_pages << PAGE_SHIFT);
 				btrfs_delalloc_release_space(inode,
 						data_reserved, __pos,
-						release_bytes);
+						release_bytes, true);
 			}
 		}
 
@@ -1760,7 +1760,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
 					     lockstart, lockend, &cached_state,
 					     GFP_NOFS);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
+					       true);
 		if (ret) {
 			btrfs_drop_pages(pages, num_pages);
 			break;
@@ -1800,11 +1801,11 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 		if (only_release_metadata) {
 			btrfs_end_write_no_snapshotting(root);
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					release_bytes);
+					release_bytes, true);
 		} else {
 			btrfs_delalloc_release_space(inode, data_reserved,
 					round_down(pos, fs_info->sectorsize),
-					release_bytes);
+					release_bytes, true);
 		}
 	}
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4426d1c73e50..fbd831044a1d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3548,7 +3548,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
 	if (ret) {
 		if (release_metadata)
 			btrfs_delalloc_release_metadata(BTRFS_I(inode),
-					inode->i_size);
+					inode->i_size, true);
 #ifdef DEBUG
 		btrfs_err(fs_info,
 			  "failed to write free ino cache for root %llu",
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 022b19336fee..9409dcc7020d 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -500,12 +500,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, true);
 		goto out_put;
 	}
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, false);
 out_put:
 	iput(inode);
 out_release:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..76a1b4819607 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1839,7 +1839,7 @@ static void btrfs_clear_bit_hook(void *private_data,
 		 */
 		if (*bits & EXTENT_CLEAR_META_RESV &&
 		    root != fs_info->tree_root)
-			btrfs_delalloc_release_metadata(inode, len);
+			btrfs_delalloc_release_metadata(inode, len, false);
 
 		/* For sanity tests. */
 		if (btrfs_is_testing(fs_info))
@@ -2102,7 +2102,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 				  0);
 	ClearPageChecked(page);
 	set_page_dirty(page);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, false);
 out:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end,
 			     &cached_state, GFP_NOFS);
@@ -4760,8 +4760,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	page = find_or_create_page(mapping, index, mask);
 	if (!page) {
 		btrfs_delalloc_release_space(inode, data_reserved,
-					     block_start, blocksize);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
+					     block_start, blocksize, true);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -4829,8 +4829,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 out_unlock:
 	if (ret)
 		btrfs_delalloc_release_space(inode, data_reserved, block_start,
-					     blocksize);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
+					     blocksize, true);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
 	unlock_page(page);
 	put_page(page);
 out:
@@ -8814,7 +8814,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		if (ret < 0 && ret != -EIOCBQUEUED) {
 			if (dio_data.reserve)
 				btrfs_delalloc_release_space(inode, data_reserved,
-					offset, dio_data.reserve);
+					offset, dio_data.reserve, true);
 			/*
 			 * On error we might have left some ordered extents
 			 * without submitting corresponding bios for them, so
@@ -8830,8 +8830,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 					false);
 		} else if (ret >= 0 && (size_t)ret < count)
 			btrfs_delalloc_release_space(inode, data_reserved,
-					offset, count - (size_t)ret);
-		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
+					offset, count - (size_t)ret, true);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), count, false);
 	}
 out:
 	if (wakeup)
@@ -9150,7 +9150,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 		if (reserved_space < PAGE_SIZE) {
 			end = page_start + reserved_space - 1;
 			btrfs_delalloc_release_space(inode, data_reserved,
-					page_start, PAGE_SIZE - reserved_space);
+					page_start, PAGE_SIZE - reserved_space,
+					true);
 		}
 	}
 
@@ -9200,16 +9201,16 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 
 out_unlock:
 	if (!ret) {
-		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
 		sb_end_pagefault(inode->i_sb);
 		extent_changeset_free(data_reserved);
 		return VM_FAULT_LOCKED;
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
 	btrfs_delalloc_release_space(inode, data_reserved, page_start,
-				     reserved_space);
+				     reserved_space, (ret != 0));
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..9c46b89de591 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1198,7 +1198,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		spin_unlock(&BTRFS_I(inode)->lock);
 		btrfs_delalloc_release_space(inode, data_reserved,
 				start_index << PAGE_SHIFT,
-				(page_cnt - i_done) << PAGE_SHIFT);
+				(page_cnt - i_done) << PAGE_SHIFT, true);
 	}
 
 
@@ -1217,7 +1217,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
+				       false);
 	extent_changeset_free(data_reserved);
 	return i_done;
 out:
@@ -1227,8 +1228,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	}
 	btrfs_delalloc_release_space(inode, data_reserved,
 			start_index << PAGE_SHIFT,
-			page_cnt << PAGE_SHIFT);
-	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
+			page_cnt << PAGE_SHIFT, true);
+	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
+				       true);
 	extent_changeset_free(data_reserved);
 	return ret;
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 5b311aeddcc8..c6f5f7e274a5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -610,7 +610,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
 	btrfs_mod_outstanding_extents(btrfs_inode, -1);
 	spin_unlock(&btrfs_inode->lock);
 	if (root != fs_info->tree_root)
-		btrfs_delalloc_release_metadata(btrfs_inode, entry->len);
+		btrfs_delalloc_release_metadata(btrfs_inode, entry->len, false);
 
 	tree = &btrfs_inode->ordered_tree;
 	spin_lock_irq(&tree->lock);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f0c3f00e97cb..632129024c5f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3226,7 +3226,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
 						   mask);
 			if (!page) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							PAGE_SIZE);
+							PAGE_SIZE, true);
 				ret = -ENOMEM;
 				goto out;
 			}
@@ -3245,9 +3245,9 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				unlock_page(page);
 				put_page(page);
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
-							PAGE_SIZE);
+							PAGE_SIZE, true);
 				btrfs_delalloc_release_extents(BTRFS_I(inode),
-							       PAGE_SIZE);
+							       PAGE_SIZE, true);
 				ret = -EIO;
 				goto out;
 			}
@@ -3278,7 +3278,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
 		put_page(page);
 
 		index++;
-		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE,
+					       false);
 		balance_dirty_pages_ratelimited(inode->i_mapping);
 		btrfs_throttle(fs_info);
 	}
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2.2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc
  2017-12-26  5:37   ` [PATCH v2.2 " Qu Wenruo
@ 2017-12-26  5:40     ` Qu Wenruo
  2017-12-26  7:10       ` Lakshmipathi.G
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2017-12-26  5:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: dsterba


[-- Attachment #1.1: Type: text/plain, Size: 21404 bytes --]

And for anyone who wants to test the latest version of the patchset, it
can be fetched from the following branch:

https://github.com/adam900710/linux/tree/qgroup_meta_rework

And now the branch passes the existing qgroup tests without problems.

Thanks,
Qu


On 2017-12-26 13:37, Qu Wenruo wrote:
> Before this patch, btrfs qgroup mixes the per-transaction meta rsv with
> the preallocated meta rsv, making it quite easy to underflow the qgroup
> meta reservation.
> 
> Since we have the new qgroup meta rsv types, apply them to the delalloc
> reservation.
> 
> Now for delalloc, most of its reserved space will use the META_PREALLOC
> qgroup rsv type.
> 
> And callers reducing outstanding extents, like btrfs_finish_ordered_io(),
> will convert the corresponding META_PREALLOC reservation to
> META_PERTRANS.
> 
> This is mainly due to the fact that current qgroup numbers are only
> updated in btrfs_commit_transaction(); that is to say, if we don't keep
> such a placeholder reservation, we can exceed the qgroup limit.
> 
> And for callers freeing outstanding extents in an error handler, we just
> free the META_PREALLOC bytes.
> 
> This behavior requires callers of btrfs_qgroup_release_meta() or
> btrfs_qgroup_convert_meta() to be aware of which type they are handling.
> So in this patch, btrfs_delalloc_release_metadata() and its callers get
> an extra parameter telling qgroup whether to convert or to release the
> meta reservation.
> 
> The good news is, even if we use the wrong type (convert or free), it
> won't cause an obvious bug, as the prealloc type is always in good
> shape, and the choice only affects whether the per-trans meta is
> increased or not.
> 
> So the worst case is that the metadata limit can sometimes be exceeded
> (no conversion at all) or the metadata limit is reached too soon (no
> freeing at all).
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> changelog:
> v2.2
>   Fix a wrong parameter in __btrfs_buffered_write(), where the qgroup
>   reservation should be freed, not converted.
>   Wrongly converting the reserved meta easily floods the pertrans rsv,
>   as each successful buffered write then increases pertrans usage,
>   which results in too early EDQUOT.
> ---
>  fs/btrfs/ctree.h            |  9 ++++++---
>  fs/btrfs/extent-tree.c      | 45 +++++++++++++++++++++++++--------------------
>  fs/btrfs/file.c             | 15 ++++++++-------
>  fs/btrfs/free-space-cache.c |  2 +-
>  fs/btrfs/inode-map.c        |  4 ++--
>  fs/btrfs/inode.c            | 27 ++++++++++++++-------------
>  fs/btrfs/ioctl.c            | 10 ++++++----
>  fs/btrfs/ordered-data.c     |  2 +-
>  fs/btrfs/relocation.c       |  9 +++++----
>  9 files changed, 68 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index bd2bf589e6c8..091c0e06b32e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2728,7 +2728,8 @@ int btrfs_check_data_free_space(struct inode *inode,
>  void btrfs_free_reserved_data_space(struct inode *inode,
>  			struct extent_changeset *reserved, u64 start, u64 len);
>  void btrfs_delalloc_release_space(struct inode *inode,
> -			struct extent_changeset *reserved, u64 start, u64 len);
> +				  struct extent_changeset *reserved,
> +				  u64 start, u64 len, bool qgroup_free);
>  void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
>  					    u64 len);
>  void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
> @@ -2743,10 +2744,12 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>  				     u64 *qgroup_reserved, bool use_global_rsv);
>  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
>  				      struct btrfs_block_rsv *rsv);
> -void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
> +				    bool qgroup_free);
>  
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
> -void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
> +void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
> +				     bool qgroup_free);
>  int btrfs_delalloc_reserve_space(struct inode *inode,
>  			struct extent_changeset **reserved, u64 start, u64 len);
>  void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 20923fce06c4..986660f301de 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5754,6 +5754,9 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  	if (num_bytes == 0)
>  		return 0;
>  
> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
> +	if (ret)
> +		return ret;
>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
>  	if (!ret) {
>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
> @@ -5766,11 +5769,16 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  /**
>   * btrfs_inode_rsv_release - release any excessive reservation.
>   * @inode - the inode we need to release from.
> + * @qgroup_free - free or convert qgroup meta.
> + *   Unlike normal operation, qgroup meta reservation needs to know if
> + *   we are freeing the qgroup reservation or just converting it to per-trans.
> + *   Normally @qgroup_free is true for error handlers, and false for normal
> + *   release.
>   *
>   * This is the same as btrfs_block_rsv_release, except that it handles the
>   * tracepoint for the reservation.
>   */
> -void btrfs_inode_rsv_release(struct btrfs_inode *inode)
> +void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  {
>  	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
> @@ -5786,6 +5794,10 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode)
>  	if (released > 0)
>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>  					      btrfs_ino(inode), released, 0);
> +	if (qgroup_free)
> +		btrfs_qgroup_free_meta_prealloc(inode->root, released);
> +	else
> +		btrfs_qgroup_convert_reserved_meta(inode->root, released);
>  }
>  
>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> @@ -6045,7 +6057,6 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> -	struct btrfs_root *root = inode->root;
>  	unsigned nr_extents;
>  	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
>  	int ret = 0;
> @@ -6083,19 +6094,9 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>  	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
>  	spin_unlock(&inode->lock);
>  
> -	if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
> -		ret = btrfs_qgroup_reserve_meta_prealloc(root,
> -				nr_extents * fs_info->nodesize, true);
> -		if (ret)
> -			goto out_fail;
> -	}
> -
>  	ret = btrfs_inode_rsv_refill(inode, flush);
> -	if (unlikely(ret)) {
> -		btrfs_qgroup_free_meta_prealloc(root,
> -				       nr_extents * fs_info->nodesize);
> +	if (unlikely(ret))
>  		goto out_fail;
> -	}
>  
>  	if (delalloc_lock)
>  		mutex_unlock(&inode->delalloc_mutex);
> @@ -6109,7 +6110,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>  	btrfs_calculate_inode_block_rsv_size(fs_info, inode);
>  	spin_unlock(&inode->lock);
>  
> -	btrfs_inode_rsv_release(inode);
> +	btrfs_inode_rsv_release(inode, true);
>  	if (delalloc_lock)
>  		mutex_unlock(&inode->delalloc_mutex);
>  	return ret;
> @@ -6119,12 +6120,14 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
>   * btrfs_delalloc_release_metadata - release a metadata reservation for an inode
>   * @inode: the inode to release the reservation for.
>   * @num_bytes: the number of bytes we are releasing.
> + * @qgroup_free: free qgroup reservation or convert it to per-trans reservation
>   *
>   * This will release the metadata reservation for an inode.  This can be called
>   * once we complete IO for a given set of bytes to release their metadata
>   * reservations, or on error for the same reason.
>   */
> -void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
> +void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
> +				     bool qgroup_free)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
>  
> @@ -6137,13 +6140,14 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
>  	if (btrfs_is_testing(fs_info))
>  		return;
>  
> -	btrfs_inode_rsv_release(inode);
> +	btrfs_inode_rsv_release(inode, qgroup_free);
>  }
>  
>  /**
>   * btrfs_delalloc_release_extents - release our outstanding_extents
>   * @inode: the inode to balance the reservation for.
>   * @num_bytes: the number of bytes we originally reserved with
> + * @qgroup_free: whether to free the qgroup meta reservation or convert it.
>   *
>   * When we reserve space we increase outstanding_extents for the extents we may
>   * add.  Once we've set the range as delalloc or created our ordered extents we
> @@ -6151,7 +6155,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes)
>   * temporarily tracked outstanding_extents.  This _must_ be used in conjunction
>   * with btrfs_delalloc_reserve_metadata.
>   */
> -void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes,
> +				    bool qgroup_free)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
>  	unsigned num_extents;
> @@ -6165,7 +6170,7 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
>  	if (btrfs_is_testing(fs_info))
>  		return;
>  
> -	btrfs_inode_rsv_release(inode);
> +	btrfs_inode_rsv_release(inode, qgroup_free);
>  }
>  
>  /**
> @@ -6221,9 +6226,9 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
>   */
>  void btrfs_delalloc_release_space(struct inode *inode,
>  				  struct extent_changeset *reserved,
> -				  u64 start, u64 len)
> +				  u64 start, u64 len, bool qgroup_free)
>  {
> -	btrfs_delalloc_release_metadata(BTRFS_I(inode), len);
> +	btrfs_delalloc_release_metadata(BTRFS_I(inode), len, qgroup_free);
>  	btrfs_free_reserved_data_space(inode, reserved, start, len);
>  }
>  
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index eb1bac7c8553..3bf186746119 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1690,7 +1690,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  				    force_page_uptodate);
>  		if (ret) {
>  			btrfs_delalloc_release_extents(BTRFS_I(inode),
> -						       reserve_bytes);
> +						       reserve_bytes, true);
>  			break;
>  		}
>  
> @@ -1702,7 +1702,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  			if (extents_locked == -EAGAIN)
>  				goto again;
>  			btrfs_delalloc_release_extents(BTRFS_I(inode),
> -						       reserve_bytes);
> +						       reserve_bytes, true);
>  			ret = extents_locked;
>  			break;
>  		}
> @@ -1737,7 +1737,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  						fs_info->sb->s_blocksize_bits;
>  			if (only_release_metadata) {
>  				btrfs_delalloc_release_metadata(BTRFS_I(inode),
> -								release_bytes);
> +							release_bytes, true);
>  			} else {
>  				u64 __pos;
>  
> @@ -1746,7 +1746,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  					(dirty_pages << PAGE_SHIFT);
>  				btrfs_delalloc_release_space(inode,
>  						data_reserved, __pos,
> -						release_bytes);
> +						release_bytes, true);
>  			}
>  		}
>  
> @@ -1760,7 +1760,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  			unlock_extent_cached(&BTRFS_I(inode)->io_tree,
>  					     lockstart, lockend, &cached_state,
>  					     GFP_NOFS);
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes,
> +					       true);
>  		if (ret) {
>  			btrfs_drop_pages(pages, num_pages);
>  			break;
> @@ -1800,11 +1801,11 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  		if (only_release_metadata) {
>  			btrfs_end_write_no_snapshotting(root);
>  			btrfs_delalloc_release_metadata(BTRFS_I(inode),
> -					release_bytes);
> +					release_bytes, true);
>  		} else {
>  			btrfs_delalloc_release_space(inode, data_reserved,
>  					round_down(pos, fs_info->sectorsize),
> -					release_bytes);
> +					release_bytes, true);
>  		}
>  	}
>  
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 4426d1c73e50..fbd831044a1d 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -3548,7 +3548,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>  	if (ret) {
>  		if (release_metadata)
>  			btrfs_delalloc_release_metadata(BTRFS_I(inode),
> -					inode->i_size);
> +					inode->i_size, true);
>  #ifdef DEBUG
>  		btrfs_err(fs_info,
>  			  "failed to write free ino cache for root %llu",
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 022b19336fee..9409dcc7020d 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -500,12 +500,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
>  	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
>  					      prealloc, prealloc, &alloc_hint);
>  	if (ret) {
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, true);
>  		goto out_put;
>  	}
>  
>  	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), prealloc, false);
>  out_put:
>  	iput(inode);
>  out_release:
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e1a7f3cb5be9..76a1b4819607 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1839,7 +1839,7 @@ static void btrfs_clear_bit_hook(void *private_data,
>  		 */
>  		if (*bits & EXTENT_CLEAR_META_RESV &&
>  		    root != fs_info->tree_root)
> -			btrfs_delalloc_release_metadata(inode, len);
> +			btrfs_delalloc_release_metadata(inode, len, false);
>  
>  		/* For sanity tests. */
>  		if (btrfs_is_testing(fs_info))
> @@ -2102,7 +2102,7 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>  				  0);
>  	ClearPageChecked(page);
>  	set_page_dirty(page);
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, false);
>  out:
>  	unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start, page_end,
>  			     &cached_state, GFP_NOFS);
> @@ -4760,8 +4760,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>  	page = find_or_create_page(mapping, index, mask);
>  	if (!page) {
>  		btrfs_delalloc_release_space(inode, data_reserved,
> -					     block_start, blocksize);
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
> +					     block_start, blocksize, true);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> @@ -4829,8 +4829,8 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
>  out_unlock:
>  	if (ret)
>  		btrfs_delalloc_release_space(inode, data_reserved, block_start,
> -					     blocksize);
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
> +					     blocksize, true);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
>  	unlock_page(page);
>  	put_page(page);
>  out:
> @@ -8814,7 +8814,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  		if (ret < 0 && ret != -EIOCBQUEUED) {
>  			if (dio_data.reserve)
>  				btrfs_delalloc_release_space(inode, data_reserved,
> -					offset, dio_data.reserve);
> +					offset, dio_data.reserve, true);
>  			/*
>  			 * On error we might have left some ordered extents
>  			 * without submitting corresponding bios for them, so
> @@ -8830,8 +8830,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  					false);
>  		} else if (ret >= 0 && (size_t)ret < count)
>  			btrfs_delalloc_release_space(inode, data_reserved,
> -					offset, count - (size_t)ret);
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), count);
> +					offset, count - (size_t)ret, true);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), count, false);
>  	}
>  out:
>  	if (wakeup)
> @@ -9150,7 +9150,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  		if (reserved_space < PAGE_SIZE) {
>  			end = page_start + reserved_space - 1;
>  			btrfs_delalloc_release_space(inode, data_reserved,
> -					page_start, PAGE_SIZE - reserved_space);
> +					page_start, PAGE_SIZE - reserved_space,
> +					true);
>  		}
>  	}
>  
> @@ -9200,16 +9201,16 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
>  
>  out_unlock:
>  	if (!ret) {
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
>  		sb_end_pagefault(inode->i_sb);
>  		extent_changeset_free(data_reserved);
>  		return VM_FAULT_LOCKED;
>  	}
>  	unlock_page(page);
>  out:
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
>  	btrfs_delalloc_release_space(inode, data_reserved, page_start,
> -				     reserved_space);
> +				     reserved_space, (ret != 0));
>  out_noreserve:
>  	sb_end_pagefault(inode->i_sb);
>  	extent_changeset_free(data_reserved);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 2ef8acaac688..9c46b89de591 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1198,7 +1198,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  		spin_unlock(&BTRFS_I(inode)->lock);
>  		btrfs_delalloc_release_space(inode, data_reserved,
>  				start_index << PAGE_SHIFT,
> -				(page_cnt - i_done) << PAGE_SHIFT);
> +				(page_cnt - i_done) << PAGE_SHIFT, true);
>  	}
>  
>  
> @@ -1217,7 +1217,8 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  		unlock_page(pages[i]);
>  		put_page(pages[i]);
>  	}
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
> +				       false);
>  	extent_changeset_free(data_reserved);
>  	return i_done;
>  out:
> @@ -1227,8 +1228,9 @@ static int cluster_pages_for_defrag(struct inode *inode,
>  	}
>  	btrfs_delalloc_release_space(inode, data_reserved,
>  			start_index << PAGE_SHIFT,
> -			page_cnt << PAGE_SHIFT);
> -	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT);
> +			page_cnt << PAGE_SHIFT, true);
> +	btrfs_delalloc_release_extents(BTRFS_I(inode), page_cnt << PAGE_SHIFT,
> +				       true);
>  	extent_changeset_free(data_reserved);
>  	return ret;
>  
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 5b311aeddcc8..c6f5f7e274a5 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -610,7 +610,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
>  	btrfs_mod_outstanding_extents(btrfs_inode, -1);
>  	spin_unlock(&btrfs_inode->lock);
>  	if (root != fs_info->tree_root)
> -		btrfs_delalloc_release_metadata(btrfs_inode, entry->len);
> +		btrfs_delalloc_release_metadata(btrfs_inode, entry->len, false);
>  
>  	tree = &btrfs_inode->ordered_tree;
>  	spin_lock_irq(&tree->lock);
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index f0c3f00e97cb..632129024c5f 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3226,7 +3226,7 @@ static int relocate_file_extent_cluster(struct inode *inode,
>  						   mask);
>  			if (!page) {
>  				btrfs_delalloc_release_metadata(BTRFS_I(inode),
> -							PAGE_SIZE);
> +							PAGE_SIZE, true);
>  				ret = -ENOMEM;
>  				goto out;
>  			}
> @@ -3245,9 +3245,9 @@ static int relocate_file_extent_cluster(struct inode *inode,
>  				unlock_page(page);
>  				put_page(page);
>  				btrfs_delalloc_release_metadata(BTRFS_I(inode),
> -							PAGE_SIZE);
> +							PAGE_SIZE, true);
>  				btrfs_delalloc_release_extents(BTRFS_I(inode),
> -							       PAGE_SIZE);
> +							       PAGE_SIZE, true);
>  				ret = -EIO;
>  				goto out;
>  			}
> @@ -3278,7 +3278,8 @@ static int relocate_file_extent_cluster(struct inode *inode,
>  		put_page(page);
>  
>  		index++;
> -		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
> +		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE,
> +					       false);
>  		balance_dirty_pages_ratelimited(inode->i_mapping);
>  		btrfs_throttle(fs_info);
>  	}
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2.2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc
  2017-12-26  5:40     ` Qu Wenruo
@ 2017-12-26  7:10       ` Lakshmipathi.G
  0 siblings, 0 replies; 31+ messages in thread
From: Lakshmipathi.G @ 2017-12-26  7:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, btrfs, dsterba

> And for anyone who wants to test the latest version of the patchset, it
> can be fetched from the following branch:

I started a build with qgroup_meta_rework from [0] and it's currently
running [1]; I will update the results once it completes in a few
hours' time.

[0]: https://github.com/adam900710/linux :
[1]: http://btrfs.in:3000/r/Lakshmipathi

----
Cheers,
Lakshmipathi.G

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2017-12-22  6:18 ` [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv Qu Wenruo
@ 2018-02-22 22:44   ` Jeff Mahoney
  2018-02-22 23:34     ` Qu Wenruo
  2018-04-03  7:30   ` Qu Wenruo
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Mahoney @ 2018-02-22 22:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: dsterba


[-- Attachment #1.1: Type: text/plain, Size: 9850 bytes --]

On 12/22/17 1:18 AM, Qu Wenruo wrote:
> Unlike the reservation calculation used in the inode rsv for metadata,
> qgroup doesn't really need to care about things like csum size or
> extent usage for whole-tree COW.
> 
> Qgroup cares more about the net change in extent usage.
> That's to say, if we're going to insert one file extent, it will mostly
> find its place in an already CoWed tree block, leaving no change in
> extent usage.
> Or it causes a leaf split, resulting in one new net extent and
> increasing the qgroup number by nodesize.
> (Or, in the even rarer case, it increases the tree level, increasing
> the qgroup number by 2 * nodesize.)
> 
> So instead of using the heavily over-estimated calculation of the
> extent allocator, which cares more about accuracy and never failing,
> qgroup doesn't need that over-calculated reservation.
> 
> This patch maintains 2 new members in the btrfs_block_rsv structure
> for qgroup, using a much smaller calculation for the qgroup rsv,
> reducing false EDQUOT.


I think this is the right idea but, rather than being the last step in a
qgroup rework, it should be the first.  Don't qgroup reservation
lifetimes match the block reservation lifetimes?  We wouldn't want a
global reserve and we wouldn't track usage on a per-block basis, but
they should otherwise match.  We already have clear APIs surrounding the
use of block reservations, so it seems to me that it'd make sense to
extend those to cover the qgroup cases as well.  Otherwise, the rest of
your patchset makes a parallel reservation system with a different API.
That keeps the current state of qgroups as a bolt-on that always needs
to be tracked separately (and which developers need to ensure they get
right).

-Jeff

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.h       | 18 +++++++++++++++++
>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0c58f92c2d44..97783ba91e00 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>  	unsigned short full;
>  	unsigned short type;
>  	unsigned short failfast;
> +
> +	/*
> +	 * Qgroup equivalent for @size @reserved
> +	 *
> +	 * Unlike the normal @size/@reserved for inode rsv,
> +	 * qgroup doesn't care about things like csum size nor how many tree
> +	 * blocks it will need to reserve.
> +	 *
> +	 * Qgroup cares more about the *NET* change of extent usage.
> +	 * For ONE newly inserted file extent, the worst case is a leaf
> +	 * split and a level increase, so nodesize for each file extent
> +	 * is already far more than the common cost.
> +	 *
> +	 * In short, qgroup_rsv_size/reserved is the upper limit of the
> +	 * possibly needed qgroup metadata reservation.
> +	 */
> +	u64 qgroup_rsv_size;
> +	u64 qgroup_rsv_reserved;
>  };
>  
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 986660f301de..9d579c7bcf7f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
>  
>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  				    struct btrfs_block_rsv *block_rsv,
> -				    struct btrfs_block_rsv *dest, u64 num_bytes)
> +				    struct btrfs_block_rsv *dest, u64 num_bytes,
> +				    u64 *qgroup_to_release_ret)
>  {
>  	struct btrfs_space_info *space_info = block_rsv->space_info;
> +	u64 qgroup_to_release = 0;
>  	u64 ret;
>  
>  	spin_lock(&block_rsv->lock);
> -	if (num_bytes == (u64)-1)
> +	if (num_bytes == (u64)-1) {
>  		num_bytes = block_rsv->size;
> +		qgroup_to_release = block_rsv->qgroup_rsv_size;
> +	}
>  	block_rsv->size -= num_bytes;
>  	if (block_rsv->reserved >= block_rsv->size) {
>  		num_bytes = block_rsv->reserved - block_rsv->size;
> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  	} else {
>  		num_bytes = 0;
>  	}
> +	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
> +		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
> +				    block_rsv->qgroup_rsv_size;
> +		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
> +	} else
> +		qgroup_to_release = 0;
>  	spin_unlock(&block_rsv->lock);
>  
>  	ret = num_bytes;
> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  			space_info_add_old_bytes(fs_info, space_info,
>  						 num_bytes);
>  	}
> +	if (qgroup_to_release_ret)
> +		*qgroup_to_release_ret = qgroup_to_release;
>  	return ret;
>  }
>  
> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  	struct btrfs_root *root = inode->root;
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 num_bytes = 0;
> +	u64 qgroup_num_bytes = 0;
>  	int ret = -ENOSPC;
>  
>  	spin_lock(&block_rsv->lock);
>  	if (block_rsv->reserved < block_rsv->size)
>  		num_bytes = block_rsv->size - block_rsv->reserved;
> +	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
> +		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
> +				   block_rsv->qgroup_rsv_reserved;
>  	spin_unlock(&block_rsv->lock);
>  
>  	if (num_bytes == 0)
>  		return 0;
>  
> -	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
>  	if (ret)
>  		return ret;
>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
> @@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
>  		trace_btrfs_space_reservation(root->fs_info, "delalloc",
>  					      btrfs_ino(inode), num_bytes, 1);
> -	}
> +
> +		/* Don't forget to increase qgroup_rsv_reserved */
> +		spin_lock(&block_rsv->lock);
> +		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
> +		spin_unlock(&block_rsv->lock);
> +	} else
> +		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
>  	return ret;
>  }
>  
> @@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 released = 0;
> +	u64 qgroup_to_release = 0;
>  
>  	/*
>  	 * Since we statically set the block_rsv->size we just want to say we
>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>  	 * the size free'd.
>  	 */
> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
> +					   &qgroup_to_release);
>  	if (released > 0)
>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>  					      btrfs_ino(inode), released, 0);
>  	if (qgroup_free)
> -		btrfs_qgroup_free_meta_prealloc(inode->root, released);
> +		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
>  	else
> -		btrfs_qgroup_convert_reserved_meta(inode->root, released);
> +		btrfs_qgroup_convert_reserved_meta(inode->root,
> +						   qgroup_to_release);
>  }
>  
>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> @@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>  	if (global_rsv == block_rsv ||
>  	    block_rsv->space_info != global_rsv->space_info)
>  		global_rsv = NULL;
> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
> +	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
>  }
>  
>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> @@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>  static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>  {
>  	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
> -				(u64)-1);
> +				(u64)-1, NULL);
>  	WARN_ON(fs_info->trans_block_rsv.size > 0);
>  	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>  	WARN_ON(fs_info->chunk_block_rsv.size > 0);
> @@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>  	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
>  
>  	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
> -				trans->chunk_bytes_reserved);
> +				trans->chunk_bytes_reserved, NULL);
>  	trans->chunk_bytes_reserved = 0;
>  }
>  
> @@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  {
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 reserve_size = 0;
> +	u64 qgroup_rsv_size = 0;
>  	u64 csum_leaves;
>  	unsigned outstanding_extents;
>  
> @@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  						 inode->csum_bytes);
>  	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
>  						       csum_leaves);
> +	/*
> +	 * For qgroup rsv, the calculation is very simple:
> +	 * nodesize for each outstanding extent.
> +	 * This is already more than enough in most cases.
> +	 */
> +	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
>  
>  	spin_lock(&block_rsv->lock);
>  	block_rsv->size = reserve_size;
> +	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
>  	spin_unlock(&block_rsv->lock);
>  }
>  
> @@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
>  			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
>  {
>  	block_rsv_add_bytes(block_rsv, blocksize, 0);
> -	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
> +	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
>  }
>  
>  /*
> 


-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-22 22:44   ` Jeff Mahoney
@ 2018-02-22 23:34     ` Qu Wenruo
  2018-02-23  8:14       ` Nikolay Borisov
  2018-02-23 14:43       ` Jeff Mahoney
  0 siblings, 2 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-02-22 23:34 UTC (permalink / raw)
  To: Jeff Mahoney, Qu Wenruo, linux-btrfs; +Cc: dsterba


[-- Attachment #1.1: Type: text/plain, Size: 10920 bytes --]



On 2018-02-23 06:44, Jeff Mahoney wrote:
> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>> Unlike the reservation calculation used in the inode rsv for metadata,
>> qgroup doesn't really need to care about things like csum size or
>> extent usage for whole-tree COW.
>>
>> Qgroup cares more about the net change in extent usage.
>> That's to say, if we're going to insert one file extent, it will mostly
>> find its place in an already CoWed tree block, leaving no change in
>> extent usage.
>> Or it causes a leaf split, resulting in one new net extent and
>> increasing the qgroup number by nodesize.
>> (Or, in the even rarer case, it increases the tree level, increasing
>> the qgroup number by 2 * nodesize.)
>>
>> So instead of using the heavily over-estimated calculation of the
>> extent allocator, which cares more about accuracy and never failing,
>> qgroup doesn't need that over-calculated reservation.
>>
>> This patch maintains 2 new members in the btrfs_block_rsv structure
>> for qgroup, using a much smaller calculation for the qgroup rsv,
>> reducing false EDQUOT.
> 
> 
> I think this is the right idea but, rather than being the last step in a
> qgroup rework, it should be the first.

That's right.

Although putting it as the 1st patch needs extra work to cooperate with
the later type separation.

>  Don't qgroup reservation
> lifetimes match the block reservation lifetimes?

Also correct, but...

>  We wouldn't want a
> global reserve and we wouldn't track usage on a per-block basis, but
> they should otherwise match.  We already have clear APIs surrounding the
> use of block reservations, so it seems to me that it'd make sense to
> extend those to cover the qgroup cases as well.  Otherwise, the rest of
> your patchset makes a parallel reservation system with a different API.
> That keeps the current state of qgroups as a bolt-on that always needs
> to be tracked separately (and which developers need to ensure they get
> right).

The problem is, block reservation is designed to ensure that every CoW
can be fulfilled without error.

That's to say, for a case like a CoW write with csum, we need to care
about space reservation not only for the EXTENT_DATA in the fs tree,
but also for the later EXTENT_ITEM in the extent tree and the CSUM_ITEM
in the csum tree.

However, the extent tree and csum tree don't contribute to quota at
all. If we followed such over-reservation, it would be much easier to
hit false EDQUOT early.

That's the main reason for a separate (and slightly simplified) block
rsv tracking system.
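
To make the gap concrete, here is a minimal sketch of the two estimates
for a single item (the factor matches the 4.15-era
btrfs_calc_trans_metadata_size(); treat the exact numbers as
illustrative):

	/* worst-case block rsv estimate: CoW every level, twice over */
	u64 rsv_bytes    = (u64)fs_info->nodesize * BTRFS_MAX_LEVEL * 2;
	/* qgroup estimate: only the expected net growth, one leaf */
	u64 qgroup_bytes = (u64)fs_info->nodesize;

With a 16K nodesize that is 256K of quota charged per item versus 16K,
which is why reusing the block rsv numbers hits false EDQUOT so easily.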

And if there is a better solution, I'm all ears.

Thanks,
Qu

> 
> -Jeff
> 
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/ctree.h       | 18 +++++++++++++++++
>>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 0c58f92c2d44..97783ba91e00 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>  	unsigned short full;
>>  	unsigned short type;
>>  	unsigned short failfast;
>> +
>> +	/*
>> +	 * Qgroup equivalent for @size @reserved
>> +	 *
>> +	 * Unlike the normal @size/@reserved for inode rsv,
>> +	 * qgroup doesn't care about things like csum size nor how many tree
>> +	 * blocks it will need to reserve.
>> +	 *
>> +	 * Qgroup cares more about the *NET* change of extent usage.
>> +	 * For ONE newly inserted file extent, the worst case is a leaf
>> +	 * split and a level increase, so nodesize for each file extent
>> +	 * is already far more than the common cost.
>> +	 *
>> +	 * In short, qgroup_rsv_size/reserved is the upper limit of the
>> +	 * possibly needed qgroup metadata reservation.
>> +	 */
>> +	u64 qgroup_rsv_size;
>> +	u64 qgroup_rsv_reserved;
>>  };
>>  
>>  /*
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 986660f301de..9d579c7bcf7f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
>>  
>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>  				    struct btrfs_block_rsv *block_rsv,
>> -				    struct btrfs_block_rsv *dest, u64 num_bytes)
>> +				    struct btrfs_block_rsv *dest, u64 num_bytes,
>> +				    u64 *qgroup_to_release_ret)
>>  {
>>  	struct btrfs_space_info *space_info = block_rsv->space_info;
>> +	u64 qgroup_to_release = 0;
>>  	u64 ret;
>>  
>>  	spin_lock(&block_rsv->lock);
>> -	if (num_bytes == (u64)-1)
>> +	if (num_bytes == (u64)-1) {
>>  		num_bytes = block_rsv->size;
>> +		qgroup_to_release = block_rsv->qgroup_rsv_size;
>> +	}
>>  	block_rsv->size -= num_bytes;
>>  	if (block_rsv->reserved >= block_rsv->size) {
>>  		num_bytes = block_rsv->reserved - block_rsv->size;
>> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>  	} else {
>>  		num_bytes = 0;
>>  	}
>> +	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
>> +		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
>> +				    block_rsv->qgroup_rsv_size;
>> +		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
>> +	} else
>> +		qgroup_to_release = 0;
>>  	spin_unlock(&block_rsv->lock);
>>  
>>  	ret = num_bytes;
>> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>  			space_info_add_old_bytes(fs_info, space_info,
>>  						 num_bytes);
>>  	}
>> +	if (qgroup_to_release_ret)
>> +		*qgroup_to_release_ret = qgroup_to_release;
>>  	return ret;
>>  }
>>  
>> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>  	struct btrfs_root *root = inode->root;
>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>  	u64 num_bytes = 0;
>> +	u64 qgroup_num_bytes = 0;
>>  	int ret = -ENOSPC;
>>  
>>  	spin_lock(&block_rsv->lock);
>>  	if (block_rsv->reserved < block_rsv->size)
>>  		num_bytes = block_rsv->size - block_rsv->reserved;
>> +	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
>> +		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
>> +				   block_rsv->qgroup_rsv_reserved;
>>  	spin_unlock(&block_rsv->lock);
>>  
>>  	if (num_bytes == 0)
>>  		return 0;
>>  
>> -	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
>> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
>>  	if (ret)
>>  		return ret;
>>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
>> @@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
>>  		trace_btrfs_space_reservation(root->fs_info, "delalloc",
>>  					      btrfs_ino(inode), num_bytes, 1);
>> -	}
>> +
>> +		/* Don't forget to increase qgroup_rsv_reserved */
>> +		spin_lock(&block_rsv->lock);
>> +		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
>> +		spin_unlock(&block_rsv->lock);
>> +	} else
>> +		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
>>  	return ret;
>>  }
>>  
>> @@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>  	u64 released = 0;
>> +	u64 qgroup_to_release = 0;
>>  
>>  	/*
>>  	 * Since we statically set the block_rsv->size we just want to say we
>>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>>  	 * the size free'd.
>>  	 */
>> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
>> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
>> +					   &qgroup_to_release);
>>  	if (released > 0)
>>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>>  					      btrfs_ino(inode), released, 0);
>>  	if (qgroup_free)
>> -		btrfs_qgroup_free_meta_prealloc(inode->root, released);
>> +		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
>>  	else
>> -		btrfs_qgroup_convert_reserved_meta(inode->root, released);
>> +		btrfs_qgroup_convert_reserved_meta(inode->root,
>> +						   qgroup_to_release);
>>  }
>>  
>>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>> @@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>>  	if (global_rsv == block_rsv ||
>>  	    block_rsv->space_info != global_rsv->space_info)
>>  		global_rsv = NULL;
>> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
>> +	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
>>  }
>>  
>>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
>> @@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>>  static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>>  {
>>  	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
>> -				(u64)-1);
>> +				(u64)-1, NULL);
>>  	WARN_ON(fs_info->trans_block_rsv.size > 0);
>>  	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>>  	WARN_ON(fs_info->chunk_block_rsv.size > 0);
>> @@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>>  	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
>>  
>>  	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
>> -				trans->chunk_bytes_reserved);
>> +				trans->chunk_bytes_reserved, NULL);
>>  	trans->chunk_bytes_reserved = 0;
>>  }
>>  
>> @@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>  {
>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>  	u64 reserve_size = 0;
>> +	u64 qgroup_rsv_size = 0;
>>  	u64 csum_leaves;
>>  	unsigned outstanding_extents;
>>  
>> @@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>  						 inode->csum_bytes);
>>  	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
>>  						       csum_leaves);
>> +	/*
>> +	 * For qgroup rsv, the calculation is very simple:
>> +	 * nodesize for each outstanding extent.
>> +	 * This is already more than enough in most cases.
>> +	 */
>> +	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
>>  
>>  	spin_lock(&block_rsv->lock);
>>  	block_rsv->size = reserve_size;
>> +	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
>>  	spin_unlock(&block_rsv->lock);
>>  }
>>  
>> @@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
>>  			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
>>  {
>>  	block_rsv_add_bytes(block_rsv, blocksize, 0);
>> -	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
>> +	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
>>  }
>>  
>>  /*
>>
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-22 23:34     ` Qu Wenruo
@ 2018-02-23  8:14       ` Nikolay Borisov
  2018-02-23  9:06         ` Qu Wenruo
  2018-02-23 14:43       ` Jeff Mahoney
  1 sibling, 1 reply; 31+ messages in thread
From: Nikolay Borisov @ 2018-02-23  8:14 UTC (permalink / raw)
  To: Qu Wenruo, Jeff Mahoney, Qu Wenruo, linux-btrfs; +Cc: dsterba



On 23.02.2018 01:34, Qu Wenruo wrote:
> 
> 
> On 2018-02-23 06:44, Jeff Mahoney wrote:
>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>> qgroup doesn't really need to care about things like csum size or
>>> extent usage for whole-tree COW.
>>>
>>> Qgroup cares more about the net change in extent usage.
>>> That's to say, if we're going to insert one file extent, it will mostly
>>> find its place in an already CoWed tree block, leaving no change in
>>> extent usage.
>>> Or it causes a leaf split, resulting in one new net extent and
>>> increasing the qgroup number by nodesize.
>>> (Or, in the even rarer case, it increases the tree level, increasing
>>> the qgroup number by 2 * nodesize.)
>>>
>>> So instead of using the heavily over-estimated calculation of the
>>> extent allocator, which cares more about accuracy and never failing,
>>> qgroup doesn't need that over-calculated reservation.
>>>
>>> This patch maintains 2 new members in the btrfs_block_rsv structure
>>> for qgroup, using a much smaller calculation for the qgroup rsv,
>>> reducing false EDQUOT.
>>
>>
>> I think this is the right idea but, rather than being the last step in a
>> qgroup rework, it should be the first.
> 
> That's right.
> 
> Although putting it as the 1st patch needs extra work to cooperate with
> the later type separation.
> 
>>  Don't qgroup reservation
>> lifetimes match the block reservation lifetimes?
> 
> Also correct, but...
> 
>>  We wouldn't want a
>> global reserve and we wouldn't track usage on a per-block basis, but
>> they should otherwise match.  We already have clear APIs surrounding the
>> use of block reservations, so it seems to me that it'd make sense to
>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>> your patchset makes a parallel reservation system with a different API.
>> That keeps the current state of qgroups as a bolt-on that always needs
>> to be tracked separately (and which developers need to ensure they get
>> right).
> 
> The problem is, block reservation is designed to ensure that every CoW
> can be fulfilled without error.
> 
> That's to say, for a case like a CoW write with csum, we need to care
> about space reservation not only for the EXTENT_DATA in the fs tree,
> but also for the later EXTENT_ITEM in the extent tree and the CSUM_ITEM
> in the csum tree.
> 
> However, the extent tree and csum tree don't contribute to quota at
> all. If we followed such over-reservation, it would be much easier to
> hit false EDQUOT early.
> 
> That's the main reason for a separate (and slightly simplified) block
> rsv tracking system.
> 
> And if there is a better solution, I'm all ears.

So looking at the code, the main hurdle seems to be the fact that the
btrfs_block_rsv_* APIs take a raw byte count, which is derived from
btrfs_calc_trans_metadata_size, which in turn is passed the number of
items we want.

So what if we extended the block_rsv_* APIs to take an additional
'quota_bytes' or some such argument, which would represent the amount
we would like to be charged to the qgroups? This would force us to
revisit every call site of the block_rsv APIs and adjust the code
accordingly so that callers pass the correct number. It would also let
code such as:

if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }

be contained inside the block rsv APIs. This would ensure the lifetimes
of block rsv and qgroup reservations are always consistent. I think
this only applies to qgroup metadata reservations.
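
For instance, an extended signature could look like the following
sketch (the qgroup_bytes parameter is the hypothetical addition; the
other arguments match the existing btrfs_block_rsv_add()):

	/* Hypothetical: reserve space and charge qgroups in one call. */
	int btrfs_block_rsv_add(struct btrfs_root *root,
				struct btrfs_block_rsv *block_rsv,
				u64 num_bytes,    /* worst-case space estimate */
				u64 qgroup_bytes, /* net-change quota estimate */
				enum btrfs_reserve_flush_enum flush);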




> 
> Thanks,
> Qu
> 
>>
>> -Jeff
>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/ctree.h       | 18 +++++++++++++++++
>>>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
>>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 0c58f92c2d44..97783ba91e00 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>>  	unsigned short full;
>>>  	unsigned short type;
>>>  	unsigned short failfast;
>>> +
>>> +	/*
>>> +	 * Qgroup equivalent for @size @reserved
>>> +	 *
>>> +	 * Unlike the normal @size/@reserved for inode rsv,
>>> +	 * qgroup doesn't care about things like csum size nor how many tree
>>> +	 * blocks it will need to reserve.
>>> +	 *
>>> +	 * Qgroup cares more about the *NET* change of extent usage.
>>> +	 * For ONE newly inserted file extent, the worst case is a leaf
>>> +	 * split and a level increase, so nodesize for each file extent
>>> +	 * is already far more than the common cost.
>>> +	 *
>>> +	 * In short, qgroup_rsv_size/reserved is the upper limit of the
>>> +	 * possibly needed qgroup metadata reservation.
>>> +	 */
>>> +	u64 qgroup_rsv_size;
>>> +	u64 qgroup_rsv_reserved;
>>>  };
>>>  
>>>  /*
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 986660f301de..9d579c7bcf7f 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
>>>  
>>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>  				    struct btrfs_block_rsv *block_rsv,
>>> -				    struct btrfs_block_rsv *dest, u64 num_bytes)
>>> +				    struct btrfs_block_rsv *dest, u64 num_bytes,
>>> +				    u64 *qgroup_to_release_ret)
>>>  {
>>>  	struct btrfs_space_info *space_info = block_rsv->space_info;
>>> +	u64 qgroup_to_release = 0;
>>>  	u64 ret;
>>>  
>>>  	spin_lock(&block_rsv->lock);
>>> -	if (num_bytes == (u64)-1)
>>> +	if (num_bytes == (u64)-1) {
>>>  		num_bytes = block_rsv->size;
>>> +		qgroup_to_release = block_rsv->qgroup_rsv_size;
>>> +	}
>>>  	block_rsv->size -= num_bytes;
>>>  	if (block_rsv->reserved >= block_rsv->size) {
>>>  		num_bytes = block_rsv->reserved - block_rsv->size;
>>> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>  	} else {
>>>  		num_bytes = 0;
>>>  	}
>>> +	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
>>> +		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
>>> +				    block_rsv->qgroup_rsv_size;
>>> +		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
>>> +	} else
>>> +		qgroup_to_release = 0;
>>>  	spin_unlock(&block_rsv->lock);
>>>  
>>>  	ret = num_bytes;
>>> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>  			space_info_add_old_bytes(fs_info, space_info,
>>>  						 num_bytes);
>>>  	}
>>> +	if (qgroup_to_release_ret)
>>> +		*qgroup_to_release_ret = qgroup_to_release;
>>>  	return ret;
>>>  }
>>>  
>>> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>>  	struct btrfs_root *root = inode->root;
>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>  	u64 num_bytes = 0;
>>> +	u64 qgroup_num_bytes = 0;
>>>  	int ret = -ENOSPC;
>>>  
>>>  	spin_lock(&block_rsv->lock);
>>>  	if (block_rsv->reserved < block_rsv->size)
>>>  		num_bytes = block_rsv->size - block_rsv->reserved;
>>> +	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
>>> +		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
>>> +				   block_rsv->qgroup_rsv_reserved;
>>>  	spin_unlock(&block_rsv->lock);
>>>  
>>>  	if (num_bytes == 0)
>>>  		return 0;
>>>  
>>> -	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
>>> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
>>>  	if (ret)
>>>  		return ret;
>>>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
>>> @@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
>>>  		trace_btrfs_space_reservation(root->fs_info, "delalloc",
>>>  					      btrfs_ino(inode), num_bytes, 1);
>>> -	}
>>> +
>>> +		/* Don't forget to increase qgroup_rsv_reserved */
>>> +		spin_lock(&block_rsv->lock);
>>> +		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
>>> +		spin_unlock(&block_rsv->lock);
>>> +	} else
>>> +		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
>>>  	return ret;
>>>  }
>>>  
>>> @@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>>>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>  	u64 released = 0;
>>> +	u64 qgroup_to_release = 0;
>>>  
>>>  	/*
>>>  	 * Since we statically set the block_rsv->size we just want to say we
>>>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>>>  	 * the size free'd.
>>>  	 */
>>> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
>>> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
>>> +					   &qgroup_to_release);
>>>  	if (released > 0)
>>>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>>>  					      btrfs_ino(inode), released, 0);
>>>  	if (qgroup_free)
>>> -		btrfs_qgroup_free_meta_prealloc(inode->root, released);
>>> +		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
>>>  	else
>>> -		btrfs_qgroup_convert_reserved_meta(inode->root, released);
>>> +		btrfs_qgroup_convert_reserved_meta(inode->root,
>>> +						   qgroup_to_release);
>>>  }
>>>  
>>>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>>> @@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>>>  	if (global_rsv == block_rsv ||
>>>  	    block_rsv->space_info != global_rsv->space_info)
>>>  		global_rsv = NULL;
>>> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
>>> +	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
>>>  }
>>>  
>>>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
>>> @@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>>>  static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>>>  {
>>>  	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
>>> -				(u64)-1);
>>> +				(u64)-1, NULL);
>>>  	WARN_ON(fs_info->trans_block_rsv.size > 0);
>>>  	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>>>  	WARN_ON(fs_info->chunk_block_rsv.size > 0);
>>> @@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>>>  	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
>>>  
>>>  	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
>>> -				trans->chunk_bytes_reserved);
>>> +				trans->chunk_bytes_reserved, NULL);
>>>  	trans->chunk_bytes_reserved = 0;
>>>  }
>>>  
>>> @@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>>  {
>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>  	u64 reserve_size = 0;
>>> +	u64 qgroup_rsv_size = 0;
>>>  	u64 csum_leaves;
>>>  	unsigned outstanding_extents;
>>>  
>>> @@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>>  						 inode->csum_bytes);
>>>  	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
>>>  						       csum_leaves);
>>> +	/*
>>> +	 * For qgroup rsv, the calculation is very simple:
>>> +	 * nodesize for each outstanding extent.
>>> +	 * This is already more than enough in most cases.
>>> +	 */
>>> +	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
>>>  
>>>  	spin_lock(&block_rsv->lock);
>>>  	block_rsv->size = reserve_size;
>>> +	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
>>>  	spin_unlock(&block_rsv->lock);
>>>  }
>>>  
>>> @@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
>>>  			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
>>>  {
>>>  	block_rsv_add_bytes(block_rsv, blocksize, 0);
>>> -	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
>>> +	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
>>>  }
>>>  
>>>  /*
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-23  8:14       ` Nikolay Borisov
@ 2018-02-23  9:06         ` Qu Wenruo
  2018-02-23 11:00           ` Nikolay Borisov
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-02-23  9:06 UTC (permalink / raw)
  To: Nikolay Borisov, Jeff Mahoney, Qu Wenruo, linux-btrfs; +Cc: dsterba


[-- Attachment #1.1: Type: text/plain, Size: 12852 bytes --]



On 2018-02-23 16:14, Nikolay Borisov wrote:
> 
> 
> On 23.02.2018 01:34, Qu Wenruo wrote:
>>
>>
>> On 2018-02-23 06:44, Jeff Mahoney wrote:
>>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>>> qgroup doesn't really need to care about things like csum size or
>>>> extent usage for whole-tree COW.
>>>>
>>>> Qgroup cares more about the net change in extent usage.
>>>> That's to say, if we're going to insert one file extent, it will mostly
>>>> find its place in an already CoWed tree block, leaving no change in
>>>> extent usage.
>>>> Or it causes a leaf split, resulting in one new net extent and
>>>> increasing the qgroup number by nodesize.
>>>> (Or, in the even rarer case, it increases the tree level, increasing
>>>> the qgroup number by 2 * nodesize.)
>>>>
>>>> So instead of using the heavily over-estimated calculation of the
>>>> extent allocator, which cares more about accuracy and never failing,
>>>> qgroup doesn't need that over-calculated reservation.
>>>>
>>>> This patch maintains 2 new members in the btrfs_block_rsv structure
>>>> for qgroup, using a much smaller calculation for the qgroup rsv,
>>>> reducing false EDQUOT.
>>>
>>>
>>> I think this is the right idea but, rather than being the last step in a
>>> qgroup rework, it should be the first.
>>
>> That's right.
>>
>> Although putting it as the 1st patch needs extra work to cooperate with
>> the later type separation.
>>
>>>  Don't qgroup reservation
>>> lifetimes match the block reservation lifetimes?
>>
>> Also correct, but...
>>
>>>  We wouldn't want a
>>> global reserve and we wouldn't track usage on a per-block basis, but
>>> they should otherwise match.  We already have clear APIs surrounding the
>>> use of block reservations, so it seems to me that it'd make sense to
>>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>>> your patchset makes a parallel reservation system with a different API.
>>> That keeps the current state of qgroups as a bolt-on that always needs
>>> to be tracked separately (and which developers need to ensure they get
>>> right).
>>
>> The problem is, block reservation is designed to ensure that every CoW
>> can be fulfilled without error.
>>
>> That's to say, for a case like a CoW write with csum, we need to care
>> about space reservation not only for the EXTENT_DATA in the fs tree,
>> but also for the later EXTENT_ITEM in the extent tree and the CSUM_ITEM
>> in the csum tree.
>>
>> However, the extent tree and csum tree don't contribute to quota at
>> all. If we followed such over-reservation, it would be much easier to
>> hit false EDQUOT early.
>>
>> That's the main reason for a separate (and slightly simplified) block
>> rsv tracking system.
>>
>> And if there is a better solution, I'm all ears.
> 
> So looking at the code, the main hurdle seems to be the fact that the
> btrfs_block_rsv_* APIs take a raw byte count, which is derived from
> btrfs_calc_trans_metadata_size, which in turn is passed the number of
> items we want.
> 
> So what if we extended the block_rsv_* APIs to take an additional
> 'quota_bytes' or some such argument, which would represent the amount
> we would like to be charged to the qgroups? This would force us to
> revisit every call site of the block_rsv APIs and adjust the code
> accordingly so that callers pass the correct number. It would also let
> code such as:
> 
> if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }

We don't need to do such a check at the call site.

Just do the calculation (which should be really simple, as simple as
nodesize * nr_items_to_add) and pass it to the btrfs_qgroup_reserve_*
APIs, which already handle the quota-enabled check.
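
E.g. a minimal call-site sketch (btrfs_qgroup_reserve_meta_prealloc()
is the helper from this patchset; the item count is just an example):

	int nr_items = 3;	/* example: items this operation may insert */
	u64 qgroup_bytes = (u64)nr_items * fs_info->nodesize;

	/* No quota check needed here: the helper is a no-op with quota off. */
	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_bytes, true);
	if (ret)	/* -EDQUOT */
		return ret;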

> 
> be contained inside the block rsv APIs. This would ensure the lifetimes
> of block rsv and qgroup reservations are always consistent. I think
> this only applies to qgroup metadata reservations.

Yep, and it only applies to the PREALLOC type of metadata reservation.
Per-transaction rsv is handled by the PERTRANS type.
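
As a sketch of that split, using the helpers from this patchset:

	/* reserve as PREALLOC before the transaction starts */
	ret = btrfs_qgroup_reserve_meta_prealloc(root, nbytes, true);
	if (ret)
		return ret;	/* -EDQUOT */

	/* on success, convert to PERTRANS; transaction commit frees it */
	btrfs_qgroup_convert_reserved_meta(root, nbytes);

	/* on failure of the actual operation, return the bytes instead */
	btrfs_qgroup_free_meta_prealloc(root, nbytes);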

Thanks,
Qu

> 
> 
> 
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> -Jeff
>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/ctree.h       | 18 +++++++++++++++++
>>>>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
>>>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>> index 0c58f92c2d44..97783ba91e00 100644
>>>> --- a/fs/btrfs/ctree.h
>>>> +++ b/fs/btrfs/ctree.h
>>>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>>>  	unsigned short full;
>>>>  	unsigned short type;
>>>>  	unsigned short failfast;
>>>> +
>>>> +	/*
>>>> +	 * Qgroup equivalent for @size @reserved
>>>> +	 *
>>>> +	 * Unlike the normal @size/@reserved for inode rsv,
>>>> +	 * qgroup doesn't care about things like csum size nor how many tree
>>>> +	 * blocks it will need to reserve.
>>>> +	 *
>>>> +	 * Qgroup cares more about the *NET* change of extent usage.
>>>> +	 * For ONE newly inserted file extent, the worst case is a leaf
>>>> +	 * split and a level increase, so nodesize for each file extent
>>>> +	 * is already far more than the common cost.
>>>> +	 *
>>>> +	 * In short, qgroup_rsv_size/reserved is the upper limit of the
>>>> +	 * possibly needed qgroup metadata reservation.
>>>> +	 */
>>>> +	u64 qgroup_rsv_size;
>>>> +	u64 qgroup_rsv_reserved;
>>>>  };
>>>>  
>>>>  /*
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 986660f301de..9d579c7bcf7f 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
>>>>  
>>>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>>  				    struct btrfs_block_rsv *block_rsv,
>>>> -				    struct btrfs_block_rsv *dest, u64 num_bytes)
>>>> +				    struct btrfs_block_rsv *dest, u64 num_bytes,
>>>> +				    u64 *qgroup_to_release_ret)
>>>>  {
>>>>  	struct btrfs_space_info *space_info = block_rsv->space_info;
>>>> +	u64 qgroup_to_release = 0;
>>>>  	u64 ret;
>>>>  
>>>>  	spin_lock(&block_rsv->lock);
>>>> -	if (num_bytes == (u64)-1)
>>>> +	if (num_bytes == (u64)-1) {
>>>>  		num_bytes = block_rsv->size;
>>>> +		qgroup_to_release = block_rsv->qgroup_rsv_size;
>>>> +	}
>>>>  	block_rsv->size -= num_bytes;
>>>>  	if (block_rsv->reserved >= block_rsv->size) {
>>>>  		num_bytes = block_rsv->reserved - block_rsv->size;
>>>> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>>  	} else {
>>>>  		num_bytes = 0;
>>>>  	}
>>>> +	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
>>>> +		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
>>>> +				    block_rsv->qgroup_rsv_size;
>>>> +		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
>>>> +	} else
>>>> +		qgroup_to_release = 0;
>>>>  	spin_unlock(&block_rsv->lock);
>>>>  
>>>>  	ret = num_bytes;
>>>> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>>>  			space_info_add_old_bytes(fs_info, space_info,
>>>>  						 num_bytes);
>>>>  	}
>>>> +	if (qgroup_to_release_ret)
>>>> +		*qgroup_to_release_ret = qgroup_to_release;
>>>>  	return ret;
>>>>  }
>>>>  
>>>> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>>>  	struct btrfs_root *root = inode->root;
>>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>>  	u64 num_bytes = 0;
>>>> +	u64 qgroup_num_bytes = 0;
>>>>  	int ret = -ENOSPC;
>>>>  
>>>>  	spin_lock(&block_rsv->lock);
>>>>  	if (block_rsv->reserved < block_rsv->size)
>>>>  		num_bytes = block_rsv->size - block_rsv->reserved;
>>>> +	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
>>>> +		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
>>>> +				   block_rsv->qgroup_rsv_reserved;
>>>>  	spin_unlock(&block_rsv->lock);
>>>>  
>>>>  	if (num_bytes == 0)
>>>>  		return 0;
>>>>  
>>>> -	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
>>>> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
>>>>  	if (ret)
>>>>  		return ret;
>>>>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
>>>> @@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>>>>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
>>>>  		trace_btrfs_space_reservation(root->fs_info, "delalloc",
>>>>  					      btrfs_ino(inode), num_bytes, 1);
>>>> -	}
>>>> +
>>>> +		/* Don't forget to increase qgroup_rsv_reserved */
>>>> +		spin_lock(&block_rsv->lock);
>>>> +		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
>>>> +		spin_unlock(&block_rsv->lock);
>>>> +	} else
>>>> +		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
>>>>  	return ret;
>>>>  }
>>>>  
>>>> @@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>>>>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>>  	u64 released = 0;
>>>> +	u64 qgroup_to_release = 0;
>>>>  
>>>>  	/*
>>>>  	 * Since we statically set the block_rsv->size we just want to say we
>>>>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>>>>  	 * the size free'd.
>>>>  	 */
>>>> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
>>>> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
>>>> +					   &qgroup_to_release);
>>>>  	if (released > 0)
>>>>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>>>>  					      btrfs_ino(inode), released, 0);
>>>>  	if (qgroup_free)
>>>> -		btrfs_qgroup_free_meta_prealloc(inode->root, released);
>>>> +		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
>>>>  	else
>>>> -		btrfs_qgroup_convert_reserved_meta(inode->root, released);
>>>> +		btrfs_qgroup_convert_reserved_meta(inode->root,
>>>> +						   qgroup_to_release);
>>>>  }
>>>>  
>>>>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>>>> @@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>>>>  	if (global_rsv == block_rsv ||
>>>>  	    block_rsv->space_info != global_rsv->space_info)
>>>>  		global_rsv = NULL;
>>>> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
>>>> +	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
>>>>  }
>>>>  
>>>>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
>>>> @@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>>>>  static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>>>>  {
>>>>  	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
>>>> -				(u64)-1);
>>>> +				(u64)-1, NULL);
>>>>  	WARN_ON(fs_info->trans_block_rsv.size > 0);
>>>>  	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>>>>  	WARN_ON(fs_info->chunk_block_rsv.size > 0);
>>>> @@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>>>>  	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
>>>>  
>>>>  	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
>>>> -				trans->chunk_bytes_reserved);
>>>> +				trans->chunk_bytes_reserved, NULL);
>>>>  	trans->chunk_bytes_reserved = 0;
>>>>  }
>>>>  
>>>> @@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>>>  {
>>>>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>>>>  	u64 reserve_size = 0;
>>>> +	u64 qgroup_rsv_size = 0;
>>>>  	u64 csum_leaves;
>>>>  	unsigned outstanding_extents;
>>>>  
>>>> @@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>>>>  						 inode->csum_bytes);
>>>>  	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
>>>>  						       csum_leaves);
>>>> +	/*
>>>> +	 * For qgroup rsv, the calculation is very simple:
>>>> +	 * nodesize for each outstanding extent.
>>>> +	 * This is already overkilled under most case.
>>>> +	 */
>>>> +	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
>>>>  
>>>>  	spin_lock(&block_rsv->lock);
>>>>  	block_rsv->size = reserve_size;
>>>> +	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
>>>>  	spin_unlock(&block_rsv->lock);
>>>>  }
>>>>  
>>>> @@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
>>>>  			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
>>>>  {
>>>>  	block_rsv_add_bytes(block_rsv, blocksize, 0);
>>>> -	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
>>>> +	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
>>>>  }
>>>>  
>>>>  /*
>>>>
>>>
>>>
>>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-23  9:06         ` Qu Wenruo
@ 2018-02-23 11:00           ` Nikolay Borisov
  2018-02-23 11:22             ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Nikolay Borisov @ 2018-02-23 11:00 UTC (permalink / raw)
  To: Qu Wenruo, Jeff Mahoney, Qu Wenruo, linux-btrfs; +Cc: dsterba



On 23.02.2018 11:06, Qu Wenruo wrote:
> 
> 
> On 2018-02-23 16:14, Nikolay Borisov wrote:
>>
>>
>> On 23.02.2018 01:34, Qu Wenruo wrote:
>>>
>>>
>>> On 2018-02-23 06:44, Jeff Mahoney wrote:
>>>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>>>> qgroup doesn't really need to care about things like csum size or
>>>>> extent usage for whole-tree COW.
>>>>>
>>>>> Qgroup cares more about the net change of extent usage.
>>>>> That's to say, if we're going to insert one file extent, it will mostly
>>>>> find its place in an already CoWed tree block, leaving no change in
>>>>> extent usage.
>>>>> Or it causes a leaf split, resulting in one new net extent and
>>>>> increasing the qgroup number by nodesize.
>>>>> (Or, in the even rarer case, it increases the tree level, increasing
>>>>> the qgroup number by 2 * nodesize.)
>>>>>
>>>>> So here, instead of the way overkilled calculation used for the extent
>>>>> allocator, which cares more about accuracy and avoiding errors, qgroup
>>>>> doesn't need that over-calculated reservation.
>>>>>
>>>>> This patch maintains 2 new members in the btrfs_block_rsv structure
>>>>> for qgroup, using a much smaller calculation for the qgroup rsv and
>>>>> thus reducing false EDQUOT.
>>>>
>>>>
>>>> I think this is the right idea but, rather than being the last step in a
>>>> qgroup rework, it should be the first.
>>>
>>> That's right.
>>>
>>> Although putting it as the 1st patch needs extra work to cooperate with
>>> the later type separation.
>>>
>>>>  Don't qgroup reservation
>>>> lifetimes match the block reservation lifetimes?
>>>
>>> Also correct, but...
>>>
>>>>  We wouldn't want a
>>>> global reserve and we wouldn't track usage on a per-block basis, but
>>>> they should otherwise match.  We already have clear APIs surrounding the
>>>> use of block reservations, so it seems to me that it'd make sense to
>>>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>>>> your patchset makes a parallel reservation system with a different API.
>>>> That keeps the current state of qgroups as a bolt-on that always needs
>>>> to be tracked separately (and which developers need to ensure they get
>>>> right).
>>>
>>> The problem is, block reservation is designed to ensure every CoW could
>>> be fulfilled without error.
>>>
>>> That's to say, for a case like a CoW write with csum, we need to care
>>> about space reservation not only for the EXTENT_DATA in the fs tree,
>>> but also for the later EXTENT_ITEM in the extent tree and the
>>> CSUM_ITEM in the csum tree.
>>>
>>> However, the extent tree and csum tree don't contribute to quota at
>>> all. If we follow such over-reservation, it becomes much easier to
>>> hit a false EDQUOT early.
>>>
>>> That's the main reason for a separate (and slightly simplified) block
>>> rsv tracking system.
>>>
>>> And if there is a better solution, I'm all ears.
>>
>> So looking at the code, the main hurdle seems to be the fact that the
>> btrfs_block_rsv_* APIs take a raw byte count, which is derived from
>> btrfs_calc_trans_metadata_size, which in turn is passed the number of
>> items we want.
>>
>> So what if we extend the block_rsv_* APIs to take an additional
>> 'quota_bytes' or somesuch argument, which would represent the amount we
>> would like to be charged to the qgroups? This will force us to revisit
>> every call site of the block_rsv APIs and adjust the code accordingly so
>> that callers pass the correct number. This will lead to code such as:
>>
>> if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }
> 
> We don't need to do such a check at the call site.
> 
> Just do the calculation (which should be really simple, as simple as
> nodesize * nr_items_to_add) and pass the result to the
> btrfs_qgroup_reserve_* APIs, which handle the quota-enabled check
> themselves.
> 
>>
>> be contained into the block rsv APIs. This will ensure the lifetime of
>> block rsv/qgroups is always consistent. I think this only applies to
>> qgroup metadata reservations.
> 
> Yep, and it only applies to the PREALLOC type of metadata reservation.
> Per-transaction rsv is handled by the PERTRANS type.

And what about the btrfs_qgroup_reserve_meta()-type reservations? This
function doesn't take a transaction, so what kind of reservation is that o_O

To sum it up, we have PERTRANS, PREALLOC and btrfs_qgroup_reserve_meta,
and those are 3 distinct types of reservations, correct?

[snip]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-23 11:00           ` Nikolay Borisov
@ 2018-02-23 11:22             ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-02-23 11:22 UTC (permalink / raw)
  To: Nikolay Borisov, Jeff Mahoney, Qu Wenruo, linux-btrfs; +Cc: dsterba



[snip]
>>
>> We don't need to do such a check at the call site.
>>
>> Just do the calculation (which should be really simple, as simple as
>> nodesize * nr_items_to_add) and pass the result to the
>> btrfs_qgroup_reserve_* APIs, which handle the quota-enabled check
>> themselves.
>>
>>>
>>> be contained into the block rsv APIs. This will ensure the lifetime of
>>> block rsv/qgroups is always consistent. I think this only applies to
>>> qgroup metadata reservations.
>>
>> Yep, and it only applies to the PREALLOC type of metadata reservation.
>> Per-transaction rsv is handled by the PERTRANS type.
> 
> And what about the btrfs_qgroup_reserve_meta()-type reservations? This
> function doesn't take a transaction, so what kind of reservation is that o_O

We only have two functions to reserve metadata:
1) btrfs_qgroup_reserve_meta_pertrans
2) btrfs_qgroup_reserve_meta_prealloc

No 3rd option, and each function name should explain itself.

For the pertrans rsv we don't really need the @trans parameter in fact;
we only need to tell qgroup to reserve some meta space of the pertrans
type.

And there is only one caller for pertrans, the one in transaction.c.
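
A rough sketch of the PERTRANS side (the name of the commit-time helper,
btrfs_qgroup_free_meta_all_pertrans, is my reading of the series, so
treat it as illustrative):

	/* at transaction start, e.g. in start_transaction(): */
	ret = btrfs_qgroup_reserve_meta_pertrans(root, num_bytes, enforce);

	/* at transaction commit, the root's whole PERTRANS rsv is dropped
	 * in one go; no per-call free is needed */
	btrfs_qgroup_free_meta_all_pertrans(root);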

> 
> To sum it up, we have PERTRANS, PREALLOC and btrfs_qgroup_reserve_meta,
> and those are 3 distinct types of reservations, correct?

Only 2 in fact.
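
Both names are just thin wrappers over one implementation; a hedged
sketch of what that could look like (the __btrfs_qgroup_reserve_meta
name and the enum values are my reading of the series):

	static inline int btrfs_qgroup_reserve_meta_pertrans(
			struct btrfs_root *root, int num_bytes, bool enforce)
	{
		return __btrfs_qgroup_reserve_meta(root, num_bytes,
				BTRFS_QGROUP_RSV_META_PERTRANS, enforce);
	}

	static inline int btrfs_qgroup_reserve_meta_prealloc(
			struct btrfs_root *root, int num_bytes, bool enforce)
	{
		return __btrfs_qgroup_reserve_meta(root, num_bytes,
				BTRFS_QGROUP_RSV_META_PREALLOC, enforce);
	}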

Thanks,
Qu

[snip]



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-02-22 23:34     ` Qu Wenruo
  2018-02-23  8:14       ` Nikolay Borisov
@ 2018-02-23 14:43       ` Jeff Mahoney
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Mahoney @ 2018-02-23 14:43 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: dsterba

On 2/22/18 6:34 PM, Qu Wenruo wrote:
> 
> 
> On 2018-02-23 06:44, Jeff Mahoney wrote:
>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>> qgroup doesn't really need to care about things like csum size or
>>> extent usage for whole-tree COW.
>>>
>>> Qgroup cares more about the net change of extent usage.
>>> That's to say, if we're going to insert one file extent, it will mostly
>>> find its place in an already CoWed tree block, leaving no change in
>>> extent usage.
>>> Or it causes a leaf split, resulting in one new net extent and
>>> increasing the qgroup number by nodesize.
>>> (Or, in the even rarer case, it increases the tree level, increasing
>>> the qgroup number by 2 * nodesize.)
>>>
>>> So here, instead of the way overkilled calculation used for the extent
>>> allocator, which cares more about accuracy and avoiding errors, qgroup
>>> doesn't need that over-calculated reservation.
>>>
>>> This patch maintains 2 new members in the btrfs_block_rsv structure
>>> for qgroup, using a much smaller calculation for the qgroup rsv and
>>> thus reducing false EDQUOT.
>>
>>
>> I think this is the right idea but, rather than being the last step in a
>> qgroup rework, it should be the first.
> 
> That's right.
> 
> Although putting it as the 1st patch needs extra work to cooperate with
> the later type separation.
> 
>>  Don't qgroup reservation
>> lifetimes match the block reservation lifetimes?
> 
> Also correct, but...
> 
>>  We wouldn't want a
>> global reserve and we wouldn't track usage on a per-block basis, but
>> they should otherwise match.  We already have clear APIs surrounding the
>> use of block reservations, so it seems to me that it'd make sense to
>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>> your patchset makes a parallel reservation system with a different API.
>> That keeps the current state of qgroups as a bolt-on that always needs
>> to be tracked separately (and which developers need to ensure they get
>> right).
> 
> The problem is, block reservation is designed to ensure every CoW could
> be fulfilled without error.
> 
> That's to say, for a case like a CoW write with csum, we need to care
> about space reservation not only for the EXTENT_DATA in the fs tree,
> but also for the later EXTENT_ITEM in the extent tree and the
> CSUM_ITEM in the csum tree.
> 
> However, the extent tree and csum tree don't contribute to quota at
> all. If we follow such over-reservation, it becomes much easier to
> hit a false EDQUOT early.

I'm not suggesting a 1:1 mapping between block reservations and qgroup
reservations.  If that were possible, we wouldn't need separate
reservations at all.  What we can do is only use bytes from the qgroup
reservation when we allocate the leaf nodes belonging to the root we're
tracking.  Everywhere else we can migrate bytes normally between
reservations the same way we do for block reservations.  As we discussed
offline yesterday, I'll work up something along the lines of what I have
in mind and see if it works out.
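
Very roughly, one possible reading of that idea (illustrative only;
qgroup_rsv_use() is a hypothetical helper, and is_fstree() stands in
for "a root we're tracking"):

	/* in the tree block allocation path */
	if (is_fstree(root->root_key.objectid))
		/* block belongs to the tracked root: consume qgroup rsv */
		qgroup_rsv_use(root, fs_info->nodesize); /* hypothetical */
	/* otherwise only block rsv bytes migrate; qgroups stay untouched */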

-Jeff

[snip]


-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2017-12-22  6:18 ` [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv Qu Wenruo
  2018-02-22 22:44   ` Jeff Mahoney
@ 2018-04-03  7:30   ` Qu Wenruo
  2018-04-04  8:53     ` Nikolay Borisov
  2018-04-12 13:13     ` David Sterba
  1 sibling, 2 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-04-03  7:30 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: dsterba



Hi David,

I didn't see this patch merged in your misc-next branch, only the
remaining patches.

However, without this patch btrfs qgroup reserved space gets noticeably
inflated, as prealloc metadata reserved space is never freed until the
inode reserved space is freed.

This causes a lot of qgroup-related test cases to fail.

Would you please merge this patch along with the rest of the qgroup
patchset?

Thanks,
Qu

On 2017-12-22 14:18, Qu Wenruo wrote:
> Unlike the reservation calculation used in the inode rsv for metadata,
> qgroup doesn't really need to care about things like csum size or
> extent usage for whole-tree COW.
> 
> Qgroup cares more about the net change of extent usage.
> That's to say, if we're going to insert one file extent, it will mostly
> find its place in an already CoWed tree block, leaving no change in
> extent usage.
> Or it causes a leaf split, resulting in one new net extent and
> increasing the qgroup number by nodesize.
> (Or, in the even rarer case, it increases the tree level, increasing
> the qgroup number by 2 * nodesize.)
> 
> So here, instead of the way overkilled calculation used for the extent
> allocator, which cares more about accuracy and avoiding errors, qgroup
> doesn't need that over-calculated reservation.
> 
> This patch maintains 2 new members in the btrfs_block_rsv structure
> for qgroup, using a much smaller calculation for the qgroup rsv and
> thus reducing false EDQUOT.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.h       | 18 +++++++++++++++++
>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0c58f92c2d44..97783ba91e00 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>  	unsigned short full;
>  	unsigned short type;
>  	unsigned short failfast;
> +
> +	/*
> +	 * Qgroup equivalent for @size @reserved
> +	 *
> +	 * Unlike the normal @size/@reserved for the inode rsv,
> +	 * qgroup doesn't care about things like csum size nor how many tree
> +	 * blocks it will need to reserve.
> +	 *
> +	 * Qgroup cares more about the *NET* change of extent usage.
> +	 * So for ONE newly inserted file extent, the worst case is a leaf
> +	 * split plus a level increase, so nodesize for each file extent
> +	 * is already way overkill.
> +	 *
> +	 * In short, qgroup_rsv_size/reserved is the upper limit of the
> +	 * possibly needed qgroup metadata reservation.
> +	 */
> +	u64 qgroup_rsv_size;
> +	u64 qgroup_rsv_reserved;
>  };
>  
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 986660f301de..9d579c7bcf7f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct btrfs_fs_info *fs_info,
>  
>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  				    struct btrfs_block_rsv *block_rsv,
> -				    struct btrfs_block_rsv *dest, u64 num_bytes)
> +				    struct btrfs_block_rsv *dest, u64 num_bytes,
> +				    u64 *qgroup_to_release_ret)
>  {
>  	struct btrfs_space_info *space_info = block_rsv->space_info;
> +	u64 qgroup_to_release = 0;
>  	u64 ret;
>  
>  	spin_lock(&block_rsv->lock);
> -	if (num_bytes == (u64)-1)
> +	if (num_bytes == (u64)-1) {
>  		num_bytes = block_rsv->size;
> +		qgroup_to_release = block_rsv->qgroup_rsv_size;
> +	}
>  	block_rsv->size -= num_bytes;
>  	if (block_rsv->reserved >= block_rsv->size) {
>  		num_bytes = block_rsv->reserved - block_rsv->size;
> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  	} else {
>  		num_bytes = 0;
>  	}
> +	if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
> +		qgroup_to_release = block_rsv->qgroup_rsv_reserved -
> +				    block_rsv->qgroup_rsv_size;
> +		block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
> +	} else
> +		qgroup_to_release = 0;
>  	spin_unlock(&block_rsv->lock);
>  
>  	ret = num_bytes;
> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>  			space_info_add_old_bytes(fs_info, space_info,
>  						 num_bytes);
>  	}
> +	if (qgroup_to_release_ret)
> +		*qgroup_to_release_ret = qgroup_to_release;
>  	return ret;
>  }
>  
> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  	struct btrfs_root *root = inode->root;
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 num_bytes = 0;
> +	u64 qgroup_num_bytes = 0;
>  	int ret = -ENOSPC;
>  
>  	spin_lock(&block_rsv->lock);
>  	if (block_rsv->reserved < block_rsv->size)
>  		num_bytes = block_rsv->size - block_rsv->reserved;
> +	if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
> +		qgroup_num_bytes = block_rsv->qgroup_rsv_size -
> +				   block_rsv->qgroup_rsv_reserved;
>  	spin_unlock(&block_rsv->lock);
>  
>  	if (num_bytes == 0)
>  		return 0;
>  
> -	ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
> +	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
>  	if (ret)
>  		return ret;
>  	ret = reserve_metadata_bytes(root, block_rsv, num_bytes, flush);
> @@ -5762,7 +5778,13 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>  		block_rsv_add_bytes(block_rsv, num_bytes, 0);
>  		trace_btrfs_space_reservation(root->fs_info, "delalloc",
>  					      btrfs_ino(inode), num_bytes, 1);
> -	}
> +
> +		/* Don't forget to increase qgroup_rsv_reserved */
> +		spin_lock(&block_rsv->lock);
> +		block_rsv->qgroup_rsv_reserved += qgroup_num_bytes;
> +		spin_unlock(&block_rsv->lock);
> +	} else
> +		btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
>  	return ret;
>  }
>  
> @@ -5784,20 +5806,23 @@ void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
>  	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 released = 0;
> +	u64 qgroup_to_release = 0;
>  
>  	/*
>  	 * Since we statically set the block_rsv->size we just want to say we
>  	 * are releasing 0 bytes, and then we'll just get the reservation over
>  	 * the size free'd.
>  	 */
> -	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0);
> +	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
> +					   &qgroup_to_release);
>  	if (released > 0)
>  		trace_btrfs_space_reservation(fs_info, "delalloc",
>  					      btrfs_ino(inode), released, 0);
>  	if (qgroup_free)
> -		btrfs_qgroup_free_meta_prealloc(inode->root, released);
> +		btrfs_qgroup_free_meta_prealloc(inode->root, qgroup_to_release);
>  	else
> -		btrfs_qgroup_convert_reserved_meta(inode->root, released);
> +		btrfs_qgroup_convert_reserved_meta(inode->root,
> +						   qgroup_to_release);
>  }
>  
>  void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
> @@ -5809,7 +5834,7 @@ void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
>  	if (global_rsv == block_rsv ||
>  	    block_rsv->space_info != global_rsv->space_info)
>  		global_rsv = NULL;
> -	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes);
> +	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
>  }
>  
>  static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> @@ -5889,7 +5914,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
>  static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>  {
>  	block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
> -				(u64)-1);
> +				(u64)-1, NULL);
>  	WARN_ON(fs_info->trans_block_rsv.size > 0);
>  	WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>  	WARN_ON(fs_info->chunk_block_rsv.size > 0);
> @@ -5931,7 +5956,7 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>  	WARN_ON_ONCE(!list_empty(&trans->new_bgs));
>  
>  	block_rsv_release_bytes(fs_info, &fs_info->chunk_block_rsv, NULL,
> -				trans->chunk_bytes_reserved);
> +				trans->chunk_bytes_reserved, NULL);
>  	trans->chunk_bytes_reserved = 0;
>  }
>  
> @@ -6036,6 +6061,7 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  {
>  	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>  	u64 reserve_size = 0;
> +	u64 qgroup_rsv_size = 0;
>  	u64 csum_leaves;
>  	unsigned outstanding_extents;
>  
> @@ -6048,9 +6074,16 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
>  						 inode->csum_bytes);
>  	reserve_size += btrfs_calc_trans_metadata_size(fs_info,
>  						       csum_leaves);
> +	/*
> +	 * For qgroup rsv, the calculation is very simple:
> +	 * nodesize for each outstanding extent.
> +	 * This is already overkill in most cases.
> +	 */
> +	qgroup_rsv_size = outstanding_extents * fs_info->nodesize;
>  
>  	spin_lock(&block_rsv->lock);
>  	block_rsv->size = reserve_size;
> +	block_rsv->qgroup_rsv_size = qgroup_rsv_size;
>  	spin_unlock(&block_rsv->lock);
>  }
>  
> @@ -8405,7 +8438,7 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
>  			    struct btrfs_block_rsv *block_rsv, u32 blocksize)
>  {
>  	block_rsv_add_bytes(block_rsv, blocksize, 0);
> -	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0);
> +	block_rsv_release_bytes(fs_info, block_rsv, NULL, 0, NULL);
>  }
>  
>  /*
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-03  7:30   ` Qu Wenruo
@ 2018-04-04  8:53     ` Nikolay Borisov
  2018-04-04 12:17       ` Qu Wenruo
  2018-04-12 13:13     ` David Sterba
  1 sibling, 1 reply; 31+ messages in thread
From: Nikolay Borisov @ 2018-04-04  8:53 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: dsterba



On  3.04.2018 10:30, Qu Wenruo wrote:
> Hi David,
> 
> I didn't see this patch merged in your misc-next branch, only the
> remaining patches.
> 
> However, without this patch btrfs qgroup reserved space gets noticeably
> inflated, as prealloc metadata reserved space is never freed until the
> inode reserved space is freed.
> 
> This causes a lot of qgroup-related test cases to fail.
> 
> Would you please merge this patch along with the rest of the qgroup
> patchset?

I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
are qgroup-related tests.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-04  8:53     ` Nikolay Borisov
@ 2018-04-04 12:17       ` Qu Wenruo
  2018-04-12  0:03         ` Omar Sandoval
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-04-04 12:17 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs; +Cc: dsterba



On 2018-04-04 16:53, Nikolay Borisov wrote:
> 
> 
> On  3.04.2018 10:30, Qu Wenruo wrote:
>> Hi David,
>>
>> I didn't see this patch merged in your misc-next branch, only the
>> remaining patches.
>>
>> However, without this patch btrfs qgroup reserved space gets noticeably
>> inflated, as prealloc metadata reserved space is never freed until the
>> inode reserved space is freed.
>>
>> This causes a lot of qgroup-related test cases to fail.
>>
>> Would you please merge this patch along with the rest of the qgroup
>> patchset?
> 
> I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
> are qgroup-related tests.

Exactly.

Wondering why this patch was left out.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-04 12:17       ` Qu Wenruo
@ 2018-04-12  0:03         ` Omar Sandoval
  2018-04-12 12:46           ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Omar Sandoval @ 2018-04-12  0:03 UTC (permalink / raw)
  To: David Sterba; +Cc: Nikolay Borisov, Qu Wenruo, linux-btrfs

On Wed, Apr 04, 2018 at 08:17:22PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018-04-04 16:53, Nikolay Borisov wrote:
> > 
> > 
> > On  3.04.2018 10:30, Qu Wenruo wrote:
> >> Hi David,
> >>
> >> I didn't see this patch merged in your misc-next branch, only the
> >> remaining patches.
> >>
> >> However, without this patch btrfs qgroup reserved space gets noticeably
> >> inflated, as prealloc metadata reserved space is never freed until the
> >> inode reserved space is freed.
> >>
> >> This causes a lot of qgroup-related test cases to fail.
> >>
> >> Would you please merge this patch along with the rest of the qgroup
> >> patchset?
> > 
> > I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
> > are qgroup-related tests.
> 
> Exactly.
> 
> Wondering why this patch was left out.
> 
> Thanks,
> Qu

I'm also seeing several qgroup xfstests failures on Linus' master
branch. Dave?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-12  0:03         ` Omar Sandoval
@ 2018-04-12 12:46           ` David Sterba
  0 siblings, 0 replies; 31+ messages in thread
From: David Sterba @ 2018-04-12 12:46 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: David Sterba, Nikolay Borisov, Qu Wenruo, linux-btrfs

On Wed, Apr 11, 2018 at 05:03:15PM -0700, Omar Sandoval wrote:
> > 
> > On 2018-04-04 16:53, Nikolay Borisov wrote:
> > > On  3.04.2018 10:30, Qu Wenruo wrote:
> > >> I didn't see this patch merged in your misc-next branch, only the
> > >> remaining patches.
> > >>
> > >> However, without this patch, btrfs qgroup reserved space will increase
> > >> noticeably, as prealloc metadata reserved space is never freed until
> > >> the inode reserved space is freed.
> > >>
> > >> This would cause a lot of qgroup-related test cases to fail.
> > >>
> > >> Would you please merge this patch along with the rest of the qgroup
> > >> patchset?
> > > 
> > > I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
> > > are qgroup-related tests.
> > 
> > Exactly.
> > 
> > I'm wondering why this patch was left out.
> I'm also seeing several qgroup xfstests failures on Linus' master
> branch. Dave?

Hm, dunno where the last patch got lost. The intention was to merge the
whole series; I'll add the patch to the 2nd pull request.


* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-03  7:30   ` Qu Wenruo
  2018-04-04  8:53     ` Nikolay Borisov
@ 2018-04-12 13:13     ` David Sterba
  2018-04-16  7:53       ` Misono Tomohiro
  1 sibling, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-04-12 13:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, dsterba

On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
> I didn't see this patch merged in your misc-next branch, only the
> remaining patches.
> 
> However, without this patch, btrfs qgroup reserved space will increase
> noticeably, as prealloc metadata reserved space is never freed until
> the inode reserved space is freed.
> 
> This would cause a lot of qgroup-related test cases to fail.
> 
> Would you please merge this patch along with the rest of the qgroup
> patchset?

For the record: patches 9 and 10 are now in misc-next and will go to
4.17. I need to let them go through for-next and other testing, so they
will go in one of the post-rc1 pull requests.
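
For anyone wanting to verify this locally, one way (assuming the
maintainer tree on kernel.org) is:

	git fetch https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git misc-next
	# recent commits touching the qgroup code should include patches 9 and 10
	git log --oneline -10 FETCH_HEAD -- fs/btrfs/qgroup.c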


* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-12 13:13     ` David Sterba
@ 2018-04-16  7:53       ` Misono Tomohiro
  2018-04-16 17:27         ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Misono Tomohiro @ 2018-04-16  7:53 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2018/04/12 22:13, David Sterba wrote:
> On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
>> I didn't see this patch merged in your misc-next branch, only the
>> remaining patches.
>>
>> However, without this patch, btrfs qgroup reserved space will increase
>> noticeably, as prealloc metadata reserved space is never freed until
>> the inode reserved space is freed.
>>
>> This would cause a lot of qgroup-related test cases to fail.
>>
>> Would you please merge this patch along with the rest of the qgroup
>> patchset?
> 
> For the record: patches 9 and 10 are now in misc-next and will go to
> 4.17. I need to let them go through for-next and other testing, so they
> will go in one of the post-rc1 pull requests.

I still see qgroup-related xfstests (btrfs/022 etc.) failing in the
current misc-next branch, which includes the 9th and 10th patches.

I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
version, and this seems to be the reason the tests still fail.
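
One way to see which version of the 5th patch landed (it touches
fs/btrfs/delayed-inode.c) is to inspect the merged commit and compare
it against the version posted to the list, e.g.:

	# commits touching the file that were merged for 4.17
	git log --oneline v4.16..v4.17-rc1 -- fs/btrfs/delayed-inode.c
	# then: git show <commit> and compare against the list posting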



* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-16  7:53       ` Misono Tomohiro
@ 2018-04-16 17:27         ` David Sterba
  2018-04-17  0:14           ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-04-16 17:27 UTC (permalink / raw)
  To: Misono Tomohiro; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Mon, Apr 16, 2018 at 04:53:10PM +0900, Misono Tomohiro wrote:
> On 2018/04/12 22:13, David Sterba wrote:
> > On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
> >> I didn't see this patch merged in your misc-next branch, only the
> >> remaining patches.
> >>
> >> However, without this patch, btrfs qgroup reserved space will increase
> >> noticeably, as prealloc metadata reserved space is never freed until
> >> the inode reserved space is freed.
> >>
> >> This would cause a lot of qgroup-related test cases to fail.
> >>
> >> Would you please merge this patch along with the rest of the qgroup
> >> patchset?
> > 
> > For the record: patches 9 and 10 are now in misc-next and will go to
> > 4.17. I need to let them go through for-next and other testing, so they
> > will go in one of the post-rc1 pull requests.
> 
> I still see qgroup-related xfstests (btrfs/022 etc.) failing in the
> current misc-next branch, which includes the 9th and 10th patches.
> 
> I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
> version, and this seems to be the reason the tests still fail.

Qu, can you please have a look and send an incremental fixup? The
handling of this patchset was not very good on my side; I should have
asked for a fresh resend, as I applied the changes from the mailing
list and not from git.


* Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv
  2018-04-16 17:27         ` David Sterba
@ 2018-04-17  0:14           ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-04-17  0:14 UTC (permalink / raw)
  To: dsterba, Misono Tomohiro, Qu Wenruo, linux-btrfs





On 2018/04/17 01:27, David Sterba wrote:
> On Mon, Apr 16, 2018 at 04:53:10PM +0900, Misono Tomohiro wrote:
>> On 2018/04/12 22:13, David Sterba wrote:
>>> On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
>>>> I didn't see this patch merged in your misc-next branch, only the
>>>> remaining patches.
>>>>
>>>> However, without this patch, btrfs qgroup reserved space will increase
>>>> noticeably, as prealloc metadata reserved space is never freed until
>>>> the inode reserved space is freed.
>>>>
>>>> This would cause a lot of qgroup-related test cases to fail.
>>>>
>>>> Would you please merge this patch along with the rest of the qgroup
>>>> patchset?
>>>
>>> For the record: patches 9 and 10 are now in misc-next and will go to
>>> 4.17. I need to let them go through for-next and other testing, so they
>>> will go in one of the post-rc1 pull requests.
>>
>> I still see qgroup-related xfstests (btrfs/022 etc.) failing in the
>> current misc-next branch, which includes the 9th and 10th patches.
>>
>> I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
>> version, and this seems to be the reason the tests still fail.
> 
> Qu, can you please have a look and send an incremental fixup? The
> handling of this patchset was not very good on my side; I should have
> asked for a fresh resend, as I applied the changes from the mailing
> list and not from git.

No problem. Next time I'll double-check the misc-next branch for such
differences.

Thanks,
Qu
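
A quick way to do that double check might be to diff the qgroup files
between the maintainer branch and the local series (branch names here
are hypothetical):

	# show divergence in the qgroup code between the two branches
	git diff misc-next..my-qgroup-v2 -- fs/btrfs/qgroup.c fs/btrfs/qgroup.h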





end of thread

Thread overview: 31+ messages
2017-12-22  6:18 [PATCH v2 00/10] Use split qgroup rsv type Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 01/10] btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 02/10] btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 03/10] btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 04/10] btrfs: qgroup: Use separate meta reservation type for delalloc Qu Wenruo
2017-12-26  5:37   ` [PATCH v2.2 " Qu Wenruo
2017-12-26  5:40     ` Qu Wenruo
2017-12-26  7:10       ` Lakshmipathi.G
2017-12-22  6:18 ` [PATCH v2 05/10] btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 06/10] btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta reserved space Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 07/10] btrfs: qgroup: Update trace events for metadata reservation Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 08/10] Revert "btrfs: qgroups: Retry after commit on getting EDQUOT" Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 09/10] btrfs: qgroup: Commit transaction in advance to reduce early EDQUOT Qu Wenruo
2017-12-22  8:06   ` [PATCH v2.1 " Qu Wenruo
2017-12-22  6:18 ` [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv Qu Wenruo
2018-02-22 22:44   ` Jeff Mahoney
2018-02-22 23:34     ` Qu Wenruo
2018-02-23  8:14       ` Nikolay Borisov
2018-02-23  9:06         ` Qu Wenruo
2018-02-23 11:00           ` Nikolay Borisov
2018-02-23 11:22             ` Qu Wenruo
2018-02-23 14:43       ` Jeff Mahoney
2018-04-03  7:30   ` Qu Wenruo
2018-04-04  8:53     ` Nikolay Borisov
2018-04-04 12:17       ` Qu Wenruo
2018-04-12  0:03         ` Omar Sandoval
2018-04-12 12:46           ` David Sterba
2018-04-12 13:13     ` David Sterba
2018-04-16  7:53       ` Misono Tomohiro
2018-04-16 17:27         ` David Sterba
2018-04-17  0:14           ` Qu Wenruo
