All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Btrfs: fix total_bytes_pinned counter
@ 2017-06-06 23:45 Omar Sandoval
  2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

This series fixes several problems with the total_bytes_pinned counter.
Patches 1 and 2 are cleanups. Patches 3 and 4 are straightforward fixes.
Patch 5 is prep for patch 6. Patch 6 is the most complicated fix.
Patches 5 and 6 are ugly, I'd love any suggestions for a cleaner fix.
Finally, patch 7 adds a warning to catch similar issues in the future.

Tested with this xfstests diff, which exposes a bunch of the issues
without these fixes:

diff --git a/common/rc b/common/rc
index 743df427..95e7517b 100644
--- a/common/rc
+++ b/common/rc
@@ -484,6 +484,12 @@ _scratch_unmount()
 		_overlay_scratch_unmount
 		;;
 	btrfs)
+		if uuid="$(findmnt -n -o UUID "$SCRATCH_MNT")"; then
+			pushd "/sys/fs/btrfs/$uuid/allocation" >/dev/null
+			sync; sync; sync
+			grep -H -v '^0$' */total_bytes_pinned
+			popd >/dev/null
+		fi
 		$UMOUNT_PROG $SCRATCH_MNT
 		;;
 	*)

Omar Sandoval (7):
  Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
  Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
  Btrfs: update total_bytes_pinned when pinning down extents
  Btrfs: always account pinned bytes when dropping a tree block ref
  Btrfs: return old and new total ref mods when adding delayed refs
  Btrfs: rework delayed ref total_bytes_pinned accounting
  Btrfs: warn if total_bytes_pinned is non-zero on unmount

 fs/btrfs/delayed-ref.c |  29 +++++++---
 fs/btrfs/delayed-ref.h |   6 ++-
 fs/btrfs/extent-tree.c | 142 +++++++++++++++++++++++++++++--------------------
 3 files changed, 109 insertions(+), 68 deletions(-)

-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-12 13:39   ` David Sterba
  2017-06-12 17:34   ` Liu Bo
  2017-06-06 23:45 ` [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Omar Sandoval
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

There are a few places where we pass in a negative num_bytes, so make it
signed for clarity. Also move it up in the file since later patches will
need it there.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e390451c72e6..7c01b4e9e3b6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -766,6 +766,26 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 	return NULL;
 }
 
+static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes,
+			     u64 owner, u64 root_objectid)
+{
+	struct btrfs_space_info *space_info;
+	u64 flags;
+
+	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+		if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID)
+			flags = BTRFS_BLOCK_GROUP_SYSTEM;
+		else
+			flags = BTRFS_BLOCK_GROUP_METADATA;
+	} else {
+		flags = BTRFS_BLOCK_GROUP_DATA;
+	}
+
+	space_info = __find_space_info(fs_info, flags);
+	BUG_ON(!space_info); /* Logic bug */
+	percpu_counter_add(&space_info->total_bytes_pinned, num_bytes);
+}
+
 /*
  * after adding space to the filesystem, we need to clear the full flags
  * on all the space infos.
@@ -6793,27 +6813,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-static void add_pinned_bytes(struct btrfs_fs_info *fs_info, u64 num_bytes,
-			     u64 owner, u64 root_objectid)
-{
-	struct btrfs_space_info *space_info;
-	u64 flags;
-
-	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
-		if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID)
-			flags = BTRFS_BLOCK_GROUP_SYSTEM;
-		else
-			flags = BTRFS_BLOCK_GROUP_METADATA;
-	} else {
-		flags = BTRFS_BLOCK_GROUP_DATA;
-	}
-
-	space_info = __find_space_info(fs_info, flags);
-	BUG_ON(!space_info); /* Logic bug */
-	percpu_counter_add(&space_info->total_bytes_pinned, num_bytes);
-}
-
-
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				struct btrfs_fs_info *info,
 				struct btrfs_delayed_ref_node *node, u64 parent,
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
  2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-12 13:26   ` David Sterba
  2017-06-21 17:31   ` David Sterba
  2017-06-06 23:45 ` [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents Omar Sandoval
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7c01b4e9e3b6..6032e9a635f2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -782,7 +782,7 @@ static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes,
 	}
 
 	space_info = __find_space_info(fs_info, flags);
-	BUG_ON(!space_info); /* Logic bug */
+	ASSERT(space_info);
 	percpu_counter_add(&space_info->total_bytes_pinned, num_bytes);
 }
 
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
  2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
  2017-06-06 23:45 ` [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-12 17:37   ` Liu Bo
  2017-06-06 23:45 ` [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref Omar Sandoval
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

The extents marked in pin_down_extent() will be unpinned later in
unpin_extent_range(), which decrements total_bytes_pinned.
pin_down_extent() must increment the counter to avoid underflowing it.
Also adjust btrfs_free_tree_block() to avoid accounting for the same
extent twice.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6032e9a635f2..9cff937415cb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6343,6 +6343,7 @@ static int pin_down_extent(struct btrfs_fs_info *fs_info,
 
 	trace_btrfs_space_reservation(fs_info, "pinned",
 				      cache->space_info->flags, num_bytes, 1);
+	percpu_counter_add(&cache->space_info->total_bytes_pinned, num_bytes);
 	set_extent_dirty(fs_info->pinned_extents, bytenr,
 			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
 	return 0;
@@ -7189,6 +7190,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 				goto out;
 		}
 
+		pin = 0;
 		cache = btrfs_lookup_block_group(fs_info, buf->start);
 
 		if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
@@ -7204,7 +7206,6 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		btrfs_free_reserved_bytes(cache, buf->len, 0);
 		btrfs_put_block_group(cache);
 		trace_btrfs_reserved_extent_free(fs_info, buf->start, buf->len);
-		pin = 0;
 	}
 out:
 	if (pin)
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
                   ` (2 preceding siblings ...)
  2017-06-06 23:45 ` [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-07 20:20   ` Liu Bo
  2017-06-06 23:45 ` [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs Omar Sandoval
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, we only increment total_bytes_pinned in
btrfs_free_tree_block() when dropping the last reference on the block.
However, when the delayed ref is run later, we will decrement
total_bytes_pinned regardless of whether it was the last reference or
not. This causes the counter to underflow when the reference we dropped
was not the last reference. Fix it by incrementing the counter
unconditionally, which is what btrfs_free_extent() does. This makes
total_bytes_pinned an overestimate when references to shared extents are
dropped, but in the worst case this will just make us try to commit the
transaction to try to free up space and find we didn't free enough.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9cff937415cb..52f3a0486e64 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7178,10 +7178,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	if (!last_ref)
-		return;
-
-	if (btrfs_header_generation(buf) == trans->transid) {
+	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
 		struct btrfs_block_group_cache *cache;
 
 		if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
@@ -7212,11 +7209,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 		add_pinned_bytes(fs_info, buf->len, btrfs_header_level(buf),
 				 root->root_key.objectid);
 
-	/*
-	 * Deleting the buffer, clear the corrupt flag since it doesn't matter
-	 * anymore.
-	 */
-	clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags);
+	if (last_ref) {
+		/*
+		 * Deleting the buffer, clear the corrupt flag since it doesn't
+		 * matter anymore.
+		 */
+		clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags);
+	}
 }
 
 /* Can return -ENOMEM */
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
                   ` (3 preceding siblings ...)
  2017-06-06 23:45 ` [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-07 20:06   ` Liu Bo
  2017-06-06 23:45 ` [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting Omar Sandoval
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

We need this to decide when to account pinned bytes.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/delayed-ref.c | 29 ++++++++++++++++++++--------
 fs/btrfs/delayed-ref.h |  6 ++++--
 fs/btrfs/extent-tree.c | 51 ++++++++++++++++++++++++++------------------------
 3 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index be70d90dfee5..93ffa898df6d 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
 static noinline void
 update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
 			 struct btrfs_delayed_ref_node *existing,
-			 struct btrfs_delayed_ref_node *update)
+			 struct btrfs_delayed_ref_node *update,
+			 int *old_ref_mod_ret)
 {
 	struct btrfs_delayed_ref_head *existing_ref;
 	struct btrfs_delayed_ref_head *ref;
@@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
 	 * currently, for refs we just added we know we're a-ok.
 	 */
 	old_ref_mod = existing_ref->total_ref_mod;
+	if (old_ref_mod_ret)
+		*old_ref_mod_ret = old_ref_mod;
 	existing->ref_mod += update->ref_mod;
 	existing_ref->total_ref_mod += update->ref_mod;
 
@@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 		     struct btrfs_delayed_ref_node *ref,
 		     struct btrfs_qgroup_extent_record *qrecord,
 		     u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
-		     int action, int is_data, int *qrecord_inserted_ret)
+		     int action, int is_data, int *qrecord_inserted_ret,
+		     int *old_ref_mod, int *new_ref_mod)
 {
 	struct btrfs_delayed_ref_head *existing;
 	struct btrfs_delayed_ref_head *head_ref = NULL;
@@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	if (existing) {
 		WARN_ON(ref_root && reserved && existing->qgroup_ref_root
 			&& existing->qgroup_reserved);
-		update_existing_head_ref(delayed_refs, &existing->node, ref);
+		update_existing_head_ref(delayed_refs, &existing->node, ref,
+					 old_ref_mod);
 		/*
 		 * we've updated the existing ref, free the newly
 		 * allocated ref
@@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 		head_ref = existing;
 	} else {
+		if (old_ref_mod)
+			*old_ref_mod = 0;
 		if (is_data && count_mod < 0)
 			delayed_refs->pending_csums += num_bytes;
 		delayed_refs->num_heads++;
@@ -655,6 +662,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	}
 	if (qrecord_inserted_ret)
 		*qrecord_inserted_ret = qrecord_inserted;
+	if (new_ref_mod)
+		*new_ref_mod = head_ref->total_ref_mod;
 	return head_ref;
 }
 
@@ -778,7 +787,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes, u64 parent,
 			       u64 ref_root,  int level, int action,
-			       struct btrfs_delayed_extent_op *extent_op)
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod)
 {
 	struct btrfs_delayed_tree_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
@@ -813,7 +823,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
 					bytenr, num_bytes, 0, 0, action, 0,
-					&qrecord_inserted);
+					&qrecord_inserted, old_ref_mod,
+					new_ref_mod);
 
 	add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
 			     num_bytes, parent, ref_root, level, action);
@@ -838,7 +849,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes,
 			       u64 parent, u64 ref_root,
-			       u64 owner, u64 offset, u64 reserved, int action)
+			       u64 owner, u64 offset, u64 reserved, int action,
+			       int *old_ref_mod, int *new_ref_mod)
 {
 	struct btrfs_delayed_data_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
@@ -878,7 +890,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	 */
 	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
 					bytenr, num_bytes, ref_root, reserved,
-					action, 1, &qrecord_inserted);
+					action, 1, &qrecord_inserted,
+					old_ref_mod, new_ref_mod);
 
 	add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
 				   num_bytes, parent, ref_root, owner, offset,
@@ -909,7 +922,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 
 	add_delayed_ref_head(fs_info, trans, &head_ref->node, NULL, bytenr,
 			     num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
-			     extent_op->is_data, NULL);
+			     extent_op->is_data, NULL, NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
 	return 0;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index c0264ff01b53..ce88e4ac5276 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -247,12 +247,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes, u64 parent,
 			       u64 ref_root, int level, int action,
-			       struct btrfs_delayed_extent_op *extent_op);
+			       struct btrfs_delayed_extent_op *extent_op,
+			       int *old_ref_mod, int *new_ref_mod);
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 			       struct btrfs_trans_handle *trans,
 			       u64 bytenr, u64 num_bytes,
 			       u64 parent, u64 ref_root,
-			       u64 owner, u64 offset, u64 reserved, int action);
+			       u64 owner, u64 offset, u64 reserved, int action,
+			       int *old_ref_mod, int *new_ref_mod);
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_trans_handle *trans,
 				u64 bytenr, u64 num_bytes,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 52f3a0486e64..6dce7abafe84 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2119,14 +2119,16 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 
 	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
-					num_bytes,
-					parent, root_objectid, (int)owner,
-					BTRFS_ADD_DELAYED_REF, NULL);
+						 num_bytes, parent,
+						 root_objectid, (int)owner,
+						 BTRFS_ADD_DELAYED_REF, NULL,
+						 NULL, NULL);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
-					num_bytes, parent, root_objectid,
-					owner, offset, 0,
-					BTRFS_ADD_DELAYED_REF);
+						 num_bytes, parent,
+						 root_objectid, owner, offset,
+						 0, BTRFS_ADD_DELAYED_REF, NULL,
+						 NULL);
 	}
 	return ret;
 }
@@ -7169,12 +7171,12 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	int ret;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
-		ret = btrfs_add_delayed_tree_ref(fs_info, trans,
-						 buf->start, buf->len,
-						 parent,
+		ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
+						 buf->len, parent,
 						 root->root_key.objectid,
 						 btrfs_header_level(buf),
-						 BTRFS_DROP_DELAYED_REF, NULL);
+						 BTRFS_DROP_DELAYED_REF, NULL,
+						 NULL, NULL);
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
@@ -7242,15 +7244,16 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		ret = 0;
 	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
-					num_bytes,
-					parent, root_objectid, (int)owner,
-					BTRFS_DROP_DELAYED_REF, NULL);
+						 num_bytes, parent,
+						 root_objectid, (int)owner,
+						 BTRFS_DROP_DELAYED_REF, NULL,
+						 NULL, NULL);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
-						num_bytes,
-						parent, root_objectid, owner,
-						offset, 0,
-						BTRFS_DROP_DELAYED_REF);
+						 num_bytes, parent,
+						 root_objectid, owner, offset,
+						 0, BTRFS_DROP_DELAYED_REF,
+						 NULL, NULL);
 	}
 	return ret;
 }
@@ -8198,9 +8201,9 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 	BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID);
 
 	ret = btrfs_add_delayed_data_ref(fs_info, trans, ins->objectid,
-					 ins->offset, 0,
-					 root_objectid, owner, offset,
-					 ram_bytes, BTRFS_ADD_DELAYED_EXTENT);
+					 ins->offset, 0, root_objectid, owner,
+					 offset, ram_bytes,
+					 BTRFS_ADD_DELAYED_EXTENT, NULL, NULL);
 	return ret;
 }
 
@@ -8420,11 +8423,11 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		extent_op->is_data = false;
 		extent_op->level = level;
 
-		ret = btrfs_add_delayed_tree_ref(fs_info, trans,
-						 ins.objectid, ins.offset,
-						 parent, root_objectid, level,
+		ret = btrfs_add_delayed_tree_ref(fs_info, trans, ins.objectid,
+						 ins.offset, parent,
+						 root_objectid, level,
 						 BTRFS_ADD_DELAYED_EXTENT,
-						 extent_op);
+						 extent_op, NULL, NULL);
 		if (ret)
 			goto out_free_delayed;
 	}
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
                   ` (4 preceding siblings ...)
  2017-06-06 23:45 ` [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-07 20:18   ` Liu Bo
  2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
  2017-06-07 15:48 ` [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Holger Hoffstätte
  7 siblings, 1 reply; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

The total_bytes_pinned counter is completely broken when accounting
delayed refs:

- If two drops for the same extent are merged, we will decrement
  total_bytes_pinned twice but only increment it once.
- If an add is merged into a drop or vice versa, we will decrement the
  total_bytes_pinned counter but never increment it.
- If multiple references to an extent are dropped, we will account it
  multiple times, potentially vastly over-estimating the number of bytes
  that will be freed by a commit and doing unnecessary work when we're
  close to ENOSPC.

The last issue is relatively minor, but the first two make the
total_bytes_pinned counter leak or underflow very often. These
accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
to keep track of possibly pinned bytes"), but they were papered over by
zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
race of using total_bytes_pinned").

We need to make sure that an extent is accounted as pinned exactly once
if and only if we will drop references to it when when the transaction
is committed. Ideally we would only add to total_bytes_pinned when the
*last* reference is dropped, but this information isn't readily
available for data extents. Again, this over-estimation can lead to
extra commits when we're close to ENOSPC, but it's not as bad as before.

The fix implemented here is to increment total_bytes_pinned when the
total refmod count for an extent goes negative and decrement it if the
refmod count goes back to non-negative or after we've run all of the
delayed refs for that extent.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6dce7abafe84..75ad24f8d253 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2112,6 +2112,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner, u64 offset)
 {
+	int old_ref_mod, new_ref_mod;
 	int ret;
 
 	BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
@@ -2122,14 +2123,18 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 						 num_bytes, parent,
 						 root_objectid, (int)owner,
 						 BTRFS_ADD_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, owner, offset,
-						 0, BTRFS_ADD_DELAYED_REF, NULL,
-						 NULL);
+						 0, BTRFS_ADD_DELAYED_REF,
+						 &old_ref_mod, &new_ref_mod);
 	}
+
+	if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0)
+		add_pinned_bytes(fs_info, -num_bytes, owner, root_objectid);
+
 	return ret;
 }
 
@@ -2433,6 +2438,16 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 		head = btrfs_delayed_node_to_head(node);
 		trace_run_delayed_ref_head(fs_info, node, head, node->action);
 
+		if (head->total_ref_mod < 0) {
+			struct btrfs_block_group_cache *cache;
+
+			cache = btrfs_lookup_block_group(fs_info, node->bytenr);
+			ASSERT(cache);
+			percpu_counter_add(&cache->space_info->total_bytes_pinned,
+					   -node->num_bytes);
+			btrfs_put_block_group(cache);
+		}
+
 		if (insert_reserved) {
 			btrfs_pin_extent(fs_info, node->bytenr,
 					 node->num_bytes, 1);
@@ -6269,6 +6284,8 @@ static int update_block_group(struct btrfs_trans_handle *trans,
 			trace_btrfs_space_reservation(info, "pinned",
 						      cache->space_info->flags,
 						      num_bytes, 1);
+			percpu_counter_add(&cache->space_info->total_bytes_pinned,
+					   num_bytes);
 			set_extent_dirty(info->pinned_extents,
 					 bytenr, bytenr + num_bytes - 1,
 					 GFP_NOFS | __GFP_NOFAIL);
@@ -7038,8 +7055,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				goto out;
 			}
 		}
-		add_pinned_bytes(info, -num_bytes, owner_objectid,
-				 root_objectid);
 	} else {
 		if (found_extent) {
 			BUG_ON(is_data && refs_to_drop !=
@@ -7171,13 +7186,16 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	int ret;
 
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
+		int old_ref_mod, new_ref_mod;
+
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
 						 buf->len, parent,
 						 root->root_key.objectid,
 						 btrfs_header_level(buf),
 						 BTRFS_DROP_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 		BUG_ON(ret); /* -ENOMEM */
+		pin = old_ref_mod >= 0 && new_ref_mod < 0;
 	}
 
 	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
@@ -7226,12 +7244,12 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
 		      u64 owner, u64 offset)
 {
+	int old_ref_mod, new_ref_mod;
 	int ret;
 
 	if (btrfs_is_testing(fs_info))
 		return 0;
 
-	add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
 
 	/*
 	 * tree log blocks never actually go into the extent allocation
@@ -7241,20 +7259,25 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
 		/* unlocks the pinned mutex */
 		btrfs_pin_extent(fs_info, bytenr, num_bytes, 1);
+		old_ref_mod = new_ref_mod = 0;
 		ret = 0;
 	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, (int)owner,
 						 BTRFS_DROP_DELAYED_REF, NULL,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	} else {
 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
 						 num_bytes, parent,
 						 root_objectid, owner, offset,
 						 0, BTRFS_DROP_DELAYED_REF,
-						 NULL, NULL);
+						 &old_ref_mod, &new_ref_mod);
 	}
+
+	if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0)
+		add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
+
 	return ret;
 }
 
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
                   ` (5 preceding siblings ...)
  2017-06-06 23:45 ` [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting Omar Sandoval
@ 2017-06-06 23:45 ` Omar Sandoval
  2017-06-07 20:22   ` Liu Bo
                     ` (2 more replies)
  2017-06-07 15:48 ` [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Holger Hoffstätte
  7 siblings, 3 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-06 23:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

From: Omar Sandoval <osandov@fb.com>

Catch any future/remaining leaks or underflows of total_bytes_pinned.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 75ad24f8d253..5fb2fb27eda6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9860,6 +9860,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 			    space_info->bytes_reserved > 0 ||
 			    space_info->bytes_may_use > 0))
 			dump_space_info(info, space_info, 0, 0);
+		WARN_ON(percpu_counter_sum(&space_info->total_bytes_pinned) != 0);
 		list_del(&space_info->list);
 		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
 			struct kobject *kobj;
-- 
2.13.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] Btrfs: fix total_bytes_pinned counter
  2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
                   ` (6 preceding siblings ...)
  2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
@ 2017-06-07 15:48 ` Holger Hoffstätte
  2017-06-07 17:37   ` Omar Sandoval
  7 siblings, 1 reply; 23+ messages in thread
From: Holger Hoffstätte @ 2017-06-07 15:48 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team

On 06/07/17 01:45, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> This series fixes several problems with the total_bytes_pinned counter.
> Patches 1 and 2 are cleanups. Patches 3 and 4 are straightforward fixes.
> Patch 5 is prep for patch 6. Patch 6 is the most complicated fix.
> Patches 5 and 6 are ugly, I'd love any suggestions for a cleaner fix.
> Finally, patch 7 adds a warning to catch similar issues in the future.

Since this didn't really change the code significantly from the previous
version (except for the addition of minor patches 2+7) have a

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

Thanks!
Holger

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] Btrfs: fix total_bytes_pinned counter
  2017-06-07 15:48 ` [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Holger Hoffstätte
@ 2017-06-07 17:37   ` Omar Sandoval
  0 siblings, 0 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-07 17:37 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: linux-btrfs, Josef Bacik, Liu Bo, kernel-team

On Wed, Jun 07, 2017 at 05:48:05PM +0200, Holger Hoffstätte wrote:
> On 06/07/17 01:45, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > This series fixes several problems with the total_bytes_pinned counter.
> > Patches 1 and 2 are cleanups. Patches 3 and 4 are straightforward fixes.
> > Patch 5 is prep for patch 6. Patch 6 is the most complicated fix.
> > Patches 5 and 6 are ugly, I'd love any suggestions for a cleaner fix.
> > Finally, patch 7 adds a warning to catch similar issues in the future.
> 
> Since this didn't really change the code significantly from the previous
> version (except for the addition of minor patches 2+7) have a
> 
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

Thanks for testing, Holger!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs
  2017-06-06 23:45 ` [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs Omar Sandoval
@ 2017-06-07 20:06   ` Liu Bo
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Bo @ 2017-06-07 20:06 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:30PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> We need this to decide when to account pinned bytes.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/delayed-ref.c | 29 ++++++++++++++++++++--------
>  fs/btrfs/delayed-ref.h |  6 ++++--
>  fs/btrfs/extent-tree.c | 51 ++++++++++++++++++++++++++------------------------
>  3 files changed, 52 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index be70d90dfee5..93ffa898df6d 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
>  static noinline void
>  update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
>  			 struct btrfs_delayed_ref_node *existing,
> -			 struct btrfs_delayed_ref_node *update)
> +			 struct btrfs_delayed_ref_node *update,
> +			 int *old_ref_mod_ret)
>  {
>  	struct btrfs_delayed_ref_head *existing_ref;
>  	struct btrfs_delayed_ref_head *ref;
> @@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
>  	 * currently, for refs we just added we know we're a-ok.
>  	 */
>  	old_ref_mod = existing_ref->total_ref_mod;
> +	if (old_ref_mod_ret)
> +		*old_ref_mod_ret = old_ref_mod;
>  	existing->ref_mod += update->ref_mod;
>  	existing_ref->total_ref_mod += update->ref_mod;
>  
> @@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  		     struct btrfs_delayed_ref_node *ref,
>  		     struct btrfs_qgroup_extent_record *qrecord,
>  		     u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
> -		     int action, int is_data, int *qrecord_inserted_ret)
> +		     int action, int is_data, int *qrecord_inserted_ret,
> +		     int *old_ref_mod, int *new_ref_mod)
>  {
>  	struct btrfs_delayed_ref_head *existing;
>  	struct btrfs_delayed_ref_head *head_ref = NULL;
> @@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  	if (existing) {
>  		WARN_ON(ref_root && reserved && existing->qgroup_ref_root
>  			&& existing->qgroup_reserved);
> -		update_existing_head_ref(delayed_refs, &existing->node, ref);
> +		update_existing_head_ref(delayed_refs, &existing->node, ref,
> +					 old_ref_mod);
>  		/*
>  		 * we've updated the existing ref, free the newly
>  		 * allocated ref
> @@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
>  		head_ref = existing;
>  	} else {
> +		if (old_ref_mod)
> +			*old_ref_mod = 0;
>  		if (is_data && count_mod < 0)
>  			delayed_refs->pending_csums += num_bytes;
>  		delayed_refs->num_heads++;
> @@ -655,6 +662,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  	}
>  	if (qrecord_inserted_ret)
>  		*qrecord_inserted_ret = qrecord_inserted;
> +	if (new_ref_mod)
> +		*new_ref_mod = head_ref->total_ref_mod;

If we export add_pinned_bytes() to delayed_ref.c, we could modify the
counter here or after calling add_delayed_ref_head. 

thanks,
-liubo

>  	return head_ref;
>  }
>  
> @@ -778,7 +787,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>  			       struct btrfs_trans_handle *trans,
>  			       u64 bytenr, u64 num_bytes, u64 parent,
>  			       u64 ref_root,  int level, int action,
> -			       struct btrfs_delayed_extent_op *extent_op)
> +			       struct btrfs_delayed_extent_op *extent_op,
> +			       int *old_ref_mod, int *new_ref_mod)
>  {
>  	struct btrfs_delayed_tree_ref *ref;
>  	struct btrfs_delayed_ref_head *head_ref;
> @@ -813,7 +823,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>  	 */
>  	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
>  					bytenr, num_bytes, 0, 0, action, 0,
> -					&qrecord_inserted);
> +					&qrecord_inserted, old_ref_mod,
> +					new_ref_mod);
>  
>  	add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
>  			     num_bytes, parent, ref_root, level, action);
> @@ -838,7 +849,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>  			       struct btrfs_trans_handle *trans,
>  			       u64 bytenr, u64 num_bytes,
>  			       u64 parent, u64 ref_root,
> -			       u64 owner, u64 offset, u64 reserved, int action)
> +			       u64 owner, u64 offset, u64 reserved, int action,
> +			       int *old_ref_mod, int *new_ref_mod)
>  {
>  	struct btrfs_delayed_data_ref *ref;
>  	struct btrfs_delayed_ref_head *head_ref;
> @@ -878,7 +890,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>  	 */
>  	head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
>  					bytenr, num_bytes, ref_root, reserved,
> -					action, 1, &qrecord_inserted);
> +					action, 1, &qrecord_inserted,
> +					old_ref_mod, new_ref_mod);
>  
>  	add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
>  				   num_bytes, parent, ref_root, owner, offset,
> @@ -909,7 +922,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>  
>  	add_delayed_ref_head(fs_info, trans, &head_ref->node, NULL, bytenr,
>  			     num_bytes, 0, 0, BTRFS_UPDATE_DELAYED_HEAD,
> -			     extent_op->is_data, NULL);
> +			     extent_op->is_data, NULL, NULL, NULL);
>  
>  	spin_unlock(&delayed_refs->lock);
>  	return 0;
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index c0264ff01b53..ce88e4ac5276 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -247,12 +247,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
>  			       struct btrfs_trans_handle *trans,
>  			       u64 bytenr, u64 num_bytes, u64 parent,
>  			       u64 ref_root, int level, int action,
> -			       struct btrfs_delayed_extent_op *extent_op);
> +			       struct btrfs_delayed_extent_op *extent_op,
> +			       int *old_ref_mod, int *new_ref_mod);
>  int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
>  			       struct btrfs_trans_handle *trans,
>  			       u64 bytenr, u64 num_bytes,
>  			       u64 parent, u64 ref_root,
> -			       u64 owner, u64 offset, u64 reserved, int action);
> +			       u64 owner, u64 offset, u64 reserved, int action,
> +			       int *old_ref_mod, int *new_ref_mod);
>  int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
>  				struct btrfs_trans_handle *trans,
>  				u64 bytenr, u64 num_bytes,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 52f3a0486e64..6dce7abafe84 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2119,14 +2119,16 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  
>  	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
>  		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
> -					num_bytes,
> -					parent, root_objectid, (int)owner,
> -					BTRFS_ADD_DELAYED_REF, NULL);
> +						 num_bytes, parent,
> +						 root_objectid, (int)owner,
> +						 BTRFS_ADD_DELAYED_REF, NULL,
> +						 NULL, NULL);
>  	} else {
>  		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
> -					num_bytes, parent, root_objectid,
> -					owner, offset, 0,
> -					BTRFS_ADD_DELAYED_REF);
> +						 num_bytes, parent,
> +						 root_objectid, owner, offset,
> +						 0, BTRFS_ADD_DELAYED_REF, NULL,
> +						 NULL);
>  	}
>  	return ret;
>  }
> @@ -7169,12 +7171,12 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  	int ret;
>  
>  	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
> -		ret = btrfs_add_delayed_tree_ref(fs_info, trans,
> -						 buf->start, buf->len,
> -						 parent,
> +		ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
> +						 buf->len, parent,
>  						 root->root_key.objectid,
>  						 btrfs_header_level(buf),
> -						 BTRFS_DROP_DELAYED_REF, NULL);
> +						 BTRFS_DROP_DELAYED_REF, NULL,
> +						 NULL, NULL);
>  		BUG_ON(ret); /* -ENOMEM */
>  	}
>  
> @@ -7242,15 +7244,16 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
>  		ret = 0;
>  	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
>  		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
> -					num_bytes,
> -					parent, root_objectid, (int)owner,
> -					BTRFS_DROP_DELAYED_REF, NULL);
> +						 num_bytes, parent,
> +						 root_objectid, (int)owner,
> +						 BTRFS_DROP_DELAYED_REF, NULL,
> +						 NULL, NULL);
>  	} else {
>  		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
> -						num_bytes,
> -						parent, root_objectid, owner,
> -						offset, 0,
> -						BTRFS_DROP_DELAYED_REF);
> +						 num_bytes, parent,
> +						 root_objectid, owner, offset,
> +						 0, BTRFS_DROP_DELAYED_REF,
> +						 NULL, NULL);
>  	}
>  	return ret;
>  }
> @@ -8198,9 +8201,9 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  	BUG_ON(root_objectid == BTRFS_TREE_LOG_OBJECTID);
>  
>  	ret = btrfs_add_delayed_data_ref(fs_info, trans, ins->objectid,
> -					 ins->offset, 0,
> -					 root_objectid, owner, offset,
> -					 ram_bytes, BTRFS_ADD_DELAYED_EXTENT);
> +					 ins->offset, 0, root_objectid, owner,
> +					 offset, ram_bytes,
> +					 BTRFS_ADD_DELAYED_EXTENT, NULL, NULL);
>  	return ret;
>  }
>  
> @@ -8420,11 +8423,11 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>  		extent_op->is_data = false;
>  		extent_op->level = level;
>  
> -		ret = btrfs_add_delayed_tree_ref(fs_info, trans,
> -						 ins.objectid, ins.offset,
> -						 parent, root_objectid, level,
> +		ret = btrfs_add_delayed_tree_ref(fs_info, trans, ins.objectid,
> +						 ins.offset, parent,
> +						 root_objectid, level,
>  						 BTRFS_ADD_DELAYED_EXTENT,
> -						 extent_op);
> +						 extent_op, NULL, NULL);
>  		if (ret)
>  			goto out_free_delayed;
>  	}
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting
  2017-06-06 23:45 ` [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting Omar Sandoval
@ 2017-06-07 20:18   ` Liu Bo
  2017-06-09 23:38     ` Omar Sandoval
  0 siblings, 1 reply; 23+ messages in thread
From: Liu Bo @ 2017-06-07 20:18 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The total_bytes_pinned counter is completely broken when accounting
> delayed refs:
> 
> - If two drops for the same extent are merged, we will decrement
>   total_bytes_pinned twice but only increment it once.
> - If an add is merged into a drop or vice versa, we will decrement the
>   total_bytes_pinned counter but never increment it.
> - If multiple references to an extent are dropped, we will account it
>   multiple times, potentially vastly over-estimating the number of bytes
>   that will be freed by a commit and doing unnecessary work when we're
>   close to ENOSPC.
> 
> The last issue is relatively minor, but the first two make the
> total_bytes_pinned counter leak or underflow very often. These
> accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
> to keep track of possibly pinned bytes"), but they were papered over by
> zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
> race of using total_bytes_pinned").
> 
> We need to make sure that an extent is accounted as pinned exactly once
> if and only if we will drop references to it when when the transaction
> is committed. Ideally we would only add to total_bytes_pinned when the
> *last* reference is dropped, but this information isn't readily
> available for data extents. Again, this over-estimation can lead to
> extra commits when we're close to ENOSPC, but it's not as bad as before.
> 
> The fix implemented here is to increment total_bytes_pinned when the
> total refmod count for an extent goes negative and decrement it if the
> refmod count goes back to non-negative or after we've run all of the
> delayed refs for that extent.
>

The patch could be cleaner if we inc/dec %pinned inside delayed_ref.c.

The idea looks good to me.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo

> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6dce7abafe84..75ad24f8d253 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2112,6 +2112,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  			 u64 bytenr, u64 num_bytes, u64 parent,
>  			 u64 root_objectid, u64 owner, u64 offset)
>  {
> +	int old_ref_mod, new_ref_mod;
>  	int ret;
>  
>  	BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID &&
> @@ -2122,14 +2123,18 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  						 num_bytes, parent,
>  						 root_objectid, (int)owner,
>  						 BTRFS_ADD_DELAYED_REF, NULL,
> -						 NULL, NULL);
> +						 &old_ref_mod, &new_ref_mod);
>  	} else {
>  		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>  						 num_bytes, parent,
>  						 root_objectid, owner, offset,
> -						 0, BTRFS_ADD_DELAYED_REF, NULL,
> -						 NULL);
> +						 0, BTRFS_ADD_DELAYED_REF,
> +						 &old_ref_mod, &new_ref_mod);
>  	}
> +
> +	if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0)
> +		add_pinned_bytes(fs_info, -num_bytes, owner, root_objectid);
> +
>  	return ret;
>  }
>  
> @@ -2433,6 +2438,16 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>  		head = btrfs_delayed_node_to_head(node);
>  		trace_run_delayed_ref_head(fs_info, node, head, node->action);
>  
> +		if (head->total_ref_mod < 0) {
> +			struct btrfs_block_group_cache *cache;
> +
> +			cache = btrfs_lookup_block_group(fs_info, node->bytenr);
> +			ASSERT(cache);
> +			percpu_counter_add(&cache->space_info->total_bytes_pinned,
> +					   -node->num_bytes);
> +			btrfs_put_block_group(cache);
> +		}
> +
>  		if (insert_reserved) {
>  			btrfs_pin_extent(fs_info, node->bytenr,
>  					 node->num_bytes, 1);
> @@ -6269,6 +6284,8 @@ static int update_block_group(struct btrfs_trans_handle *trans,
>  			trace_btrfs_space_reservation(info, "pinned",
>  						      cache->space_info->flags,
>  						      num_bytes, 1);
> +			percpu_counter_add(&cache->space_info->total_bytes_pinned,
> +					   num_bytes);
>  			set_extent_dirty(info->pinned_extents,
>  					 bytenr, bytenr + num_bytes - 1,
>  					 GFP_NOFS | __GFP_NOFAIL);
> @@ -7038,8 +7055,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  				goto out;
>  			}
>  		}
> -		add_pinned_bytes(info, -num_bytes, owner_objectid,
> -				 root_objectid);
>  	} else {
>  		if (found_extent) {
>  			BUG_ON(is_data && refs_to_drop !=
> @@ -7171,13 +7186,16 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  	int ret;
>  
>  	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
> +		int old_ref_mod, new_ref_mod;
> +
>  		ret = btrfs_add_delayed_tree_ref(fs_info, trans, buf->start,
>  						 buf->len, parent,
>  						 root->root_key.objectid,
>  						 btrfs_header_level(buf),
>  						 BTRFS_DROP_DELAYED_REF, NULL,
> -						 NULL, NULL);
> +						 &old_ref_mod, &new_ref_mod);
>  		BUG_ON(ret); /* -ENOMEM */
> +		pin = old_ref_mod >= 0 && new_ref_mod < 0;
>  	}
>  
>  	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
> @@ -7226,12 +7244,12 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
>  		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
>  		      u64 owner, u64 offset)
>  {
> +	int old_ref_mod, new_ref_mod;
>  	int ret;
>  
>  	if (btrfs_is_testing(fs_info))
>  		return 0;
>  
> -	add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
>  
>  	/*
>  	 * tree log blocks never actually go into the extent allocation
> @@ -7241,20 +7259,25 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
>  		WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID);
>  		/* unlocks the pinned mutex */
>  		btrfs_pin_extent(fs_info, bytenr, num_bytes, 1);
> +		old_ref_mod = new_ref_mod = 0;
>  		ret = 0;
>  	} else if (owner < BTRFS_FIRST_FREE_OBJECTID) {
>  		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
>  						 num_bytes, parent,
>  						 root_objectid, (int)owner,
>  						 BTRFS_DROP_DELAYED_REF, NULL,
> -						 NULL, NULL);
> +						 &old_ref_mod, &new_ref_mod);
>  	} else {
>  		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
>  						 num_bytes, parent,
>  						 root_objectid, owner, offset,
>  						 0, BTRFS_DROP_DELAYED_REF,
> -						 NULL, NULL);
> +						 &old_ref_mod, &new_ref_mod);
>  	}
> +
> +	if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0)
> +		add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
> +
>  	return ret;
>  }
>  
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref
  2017-06-06 23:45 ` [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref Omar Sandoval
@ 2017-06-07 20:20   ` Liu Bo
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Bo @ 2017-06-07 20:20 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:29PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Currently, we only increment total_bytes_pinned in
> btrfs_free_tree_block() when dropping the last reference on the block.
> However, when the delayed ref is run later, we will decrement
> total_bytes_pinned regardless of whether it was the last reference or
> not. This causes the counter to underflow when the reference we dropped
> was not the last reference. Fix it by incrementing the counter
> unconditionally, which is what btrfs_free_extent() does. This makes
> total_bytes_pinned an overestimate when references to shared extents are
> dropped, but in the worst case this will just make us try to commit the
> transaction to try to free up space and find we didn't free enough.
>

I recognize the same problem, the patch looks good.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9cff937415cb..52f3a0486e64 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7178,10 +7178,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  		BUG_ON(ret); /* -ENOMEM */
>  	}
>  
> -	if (!last_ref)
> -		return;
> -
> -	if (btrfs_header_generation(buf) == trans->transid) {
> +	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
>  		struct btrfs_block_group_cache *cache;
>  
>  		if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
> @@ -7212,11 +7209,13 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  		add_pinned_bytes(fs_info, buf->len, btrfs_header_level(buf),
>  				 root->root_key.objectid);
>  
> -	/*
> -	 * Deleting the buffer, clear the corrupt flag since it doesn't matter
> -	 * anymore.
> -	 */
> -	clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags);
> +	if (last_ref) {
> +		/*
> +		 * Deleting the buffer, clear the corrupt flag since it doesn't
> +		 * matter anymore.
> +		 */
> +		clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags);
> +	}
>  }
>  
>  /* Can return -ENOMEM */
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount
  2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
@ 2017-06-07 20:22   ` Liu Bo
  2017-06-09 23:45     ` Omar Sandoval
  2017-06-13 18:35   ` Jeff Mahoney
  2017-06-21 17:40   ` David Sterba
  2 siblings, 1 reply; 23+ messages in thread
From: Liu Bo @ 2017-06-07 20:22 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:32PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Catch any future/remaining leaks or underflows of total_bytes_pinned.
>

This might be a little bit late, what about checking it after
btrfs_finish_extetn_commit()?

-liubo

> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 75ad24f8d253..5fb2fb27eda6 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9860,6 +9860,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>  			    space_info->bytes_reserved > 0 ||
>  			    space_info->bytes_may_use > 0))
>  			dump_space_info(info, space_info, 0, 0);
> +		WARN_ON(percpu_counter_sum(&space_info->total_bytes_pinned) != 0);
>  		list_del(&space_info->list);
>  		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
>  			struct kobject *kobj;
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting
  2017-06-07 20:18   ` Liu Bo
@ 2017-06-09 23:38     ` Omar Sandoval
  0 siblings, 0 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-09 23:38 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Wed, Jun 07, 2017 at 01:18:10PM -0700, Liu Bo wrote:
> On Tue, Jun 06, 2017 at 04:45:31PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > The total_bytes_pinned counter is completely broken when accounting
> > delayed refs:
> > 
> > - If two drops for the same extent are merged, we will decrement
> >   total_bytes_pinned twice but only increment it once.
> > - If an add is merged into a drop or vice versa, we will decrement the
> >   total_bytes_pinned counter but never increment it.
> > - If multiple references to an extent are dropped, we will account it
> >   multiple times, potentially vastly over-estimating the number of bytes
> >   that will be freed by a commit and doing unnecessary work when we're
> >   close to ENOSPC.
> > 
> > The last issue is relatively minor, but the first two make the
> > total_bytes_pinned counter leak or underflow very often. These
> > accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
> > to keep track of possibly pinned bytes"), but they were papered over by
> > zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
> > race of using total_bytes_pinned").
> > 
> > We need to make sure that an extent is accounted as pinned exactly once
> > if and only if we will drop references to it when when the transaction
> > is committed. Ideally we would only add to total_bytes_pinned when the
> > *last* reference is dropped, but this information isn't readily
> > available for data extents. Again, this over-estimation can lead to
> > extra commits when we're close to ENOSPC, but it's not as bad as before.
> > 
> > The fix implemented here is to increment total_bytes_pinned when the
> > total refmod count for an extent goes negative and decrement it if the
> > refmod count goes back to non-negative or after we've run all of the
> > delayed refs for that extent.
> >
> 
> The patch could be cleaner if we inc/dec %pinned inside delayed_ref.c.
> 
> The idea looks good to me.
> 
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

Yeah, I think that'll work. My first reaction was that it'd be a
layering violation, but I think it makes sense, this counter really is
necessary because of delayed refs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount
  2017-06-07 20:22   ` Liu Bo
@ 2017-06-09 23:45     ` Omar Sandoval
  0 siblings, 0 replies; 23+ messages in thread
From: Omar Sandoval @ 2017-06-09 23:45 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Wed, Jun 07, 2017 at 01:22:04PM -0700, Liu Bo wrote:
> On Tue, Jun 06, 2017 at 04:45:32PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Catch any future/remaining leaks or underflows of total_bytes_pinned.
> >
> 
> This might be a little bit late, what about checking it after
> btrfs_finish_extetn_commit()?
> 
> -liubo

Only reason I didn't put it there was because then I'd have to spend
time to convince myself that it couldn't be modified concurrently :)
I'll try it and make sure. Thanks for the review!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
  2017-06-06 23:45 ` [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Omar Sandoval
@ 2017-06-12 13:26   ` David Sterba
  2017-06-21 17:31   ` David Sterba
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2017-06-12 13:26 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, Liu Bo, kernel-team

On Tue, Jun 06, 2017 at 04:45:27PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 7c01b4e9e3b6..6032e9a635f2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -782,7 +782,7 @@ static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes,
>  	}
>  
>  	space_info = __find_space_info(fs_info, flags);
> -	BUG_ON(!space_info); /* Logic bug */
> +	ASSERT(space_info);

Why?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
  2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
@ 2017-06-12 13:39   ` David Sterba
  2017-06-12 17:34   ` Liu Bo
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2017-06-12 13:39 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, Liu Bo, kernel-team

On Tue, Jun 06, 2017 at 04:45:26PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> There are a few places where we pass in a negative num_bytes, so make it
> signed for clarity. Also move it up in the file since later patches will
> need it there.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
  2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
  2017-06-12 13:39   ` David Sterba
@ 2017-06-12 17:34   ` Liu Bo
  1 sibling, 0 replies; 23+ messages in thread
From: Liu Bo @ 2017-06-12 17:34 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:26PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> There are a few places where we pass in a negative num_bytes, so make it
> signed for clarity. Also move it up in the file since later patches will
> need it there.
> 

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 41 ++++++++++++++++++++---------------------
>  1 file changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index e390451c72e6..7c01b4e9e3b6 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -766,6 +766,26 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
>  	return NULL;
>  }
>  
> +static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes,
> +			     u64 owner, u64 root_objectid)
> +{
> +	struct btrfs_space_info *space_info;
> +	u64 flags;
> +
> +	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> +		if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID)
> +			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> +		else
> +			flags = BTRFS_BLOCK_GROUP_METADATA;
> +	} else {
> +		flags = BTRFS_BLOCK_GROUP_DATA;
> +	}
> +
> +	space_info = __find_space_info(fs_info, flags);
> +	BUG_ON(!space_info); /* Logic bug */
> +	percpu_counter_add(&space_info->total_bytes_pinned, num_bytes);
> +}
> +
>  /*
>   * after adding space to the filesystem, we need to clear the full flags
>   * on all the space infos.
> @@ -6793,27 +6813,6 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
>  	return 0;
>  }
>  
> -static void add_pinned_bytes(struct btrfs_fs_info *fs_info, u64 num_bytes,
> -			     u64 owner, u64 root_objectid)
> -{
> -	struct btrfs_space_info *space_info;
> -	u64 flags;
> -
> -	if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> -		if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID)
> -			flags = BTRFS_BLOCK_GROUP_SYSTEM;
> -		else
> -			flags = BTRFS_BLOCK_GROUP_METADATA;
> -	} else {
> -		flags = BTRFS_BLOCK_GROUP_DATA;
> -	}
> -
> -	space_info = __find_space_info(fs_info, flags);
> -	BUG_ON(!space_info); /* Logic bug */
> -	percpu_counter_add(&space_info->total_bytes_pinned, num_bytes);
> -}
> -
> -
>  static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  				struct btrfs_fs_info *info,
>  				struct btrfs_delayed_ref_node *node, u64 parent,
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents
  2017-06-06 23:45 ` [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents Omar Sandoval
@ 2017-06-12 17:37   ` Liu Bo
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Bo @ 2017-06-12 17:37 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, kernel-team

On Tue, Jun 06, 2017 at 04:45:28PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The extents marked in pin_down_extent() will be unpinned later in
> unpin_extent_range(), which decrements total_bytes_pinned.
> pin_down_extent() must increment the counter to avoid underflowing it.
> Also adjust btrfs_free_tree_block() to avoid accounting for the same
> extent twice.
> 

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6032e9a635f2..9cff937415cb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6343,6 +6343,7 @@ static int pin_down_extent(struct btrfs_fs_info *fs_info,
>  
>  	trace_btrfs_space_reservation(fs_info, "pinned",
>  				      cache->space_info->flags, num_bytes, 1);
> +	percpu_counter_add(&cache->space_info->total_bytes_pinned, num_bytes);
>  	set_extent_dirty(fs_info->pinned_extents, bytenr,
>  			 bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>  	return 0;
> @@ -7189,6 +7190,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  				goto out;
>  		}
>  
> +		pin = 0;
>  		cache = btrfs_lookup_block_group(fs_info, buf->start);
>  
>  		if (btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN)) {
> @@ -7204,7 +7206,6 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  		btrfs_free_reserved_bytes(cache, buf->len, 0);
>  		btrfs_put_block_group(cache);
>  		trace_btrfs_reserved_extent_free(fs_info, buf->start, buf->len);
> -		pin = 0;
>  	}
>  out:
>  	if (pin)
> -- 
> 2.13.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount
  2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
  2017-06-07 20:22   ` Liu Bo
@ 2017-06-13 18:35   ` Jeff Mahoney
  2017-06-21 17:40   ` David Sterba
  2 siblings, 0 replies; 23+ messages in thread
From: Jeff Mahoney @ 2017-06-13 18:35 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: Josef Bacik, Liu Bo, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 1240 bytes --]

On 6/6/17 7:45 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Catch any future/remaining leaks or underflows of total_bytes_pinned.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 75ad24f8d253..5fb2fb27eda6 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9860,6 +9860,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>  			    space_info->bytes_reserved > 0 ||
>  			    space_info->bytes_may_use > 0))
>  			dump_space_info(info, space_info, 0, 0);
> +		WARN_ON(percpu_counter_sum(&space_info->total_bytes_pinned) != 0);
>  		list_del(&space_info->list);
>  		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
>  			struct kobject *kobj;
> 

Can we group this in with the other WARN_ON and add printing
total_bytes_pinned to dump_space_info?  Understanding the magnitude and
whether we're underflowed or haven't released enough is helpful.  While
testing your patchset, I did this and it found a few bugs in cleanup
after error.  I'll post those patches shortly.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
  2017-06-06 23:45 ` [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Omar Sandoval
  2017-06-12 13:26   ` David Sterba
@ 2017-06-21 17:31   ` David Sterba
  1 sibling, 0 replies; 23+ messages in thread
From: David Sterba @ 2017-06-21 17:31 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, Liu Bo, kernel-team

On Tue, Jun 06, 2017 at 04:45:27PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: David Sterba <dsterba@suse.com>

Added some changelog.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount
  2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
  2017-06-07 20:22   ` Liu Bo
  2017-06-13 18:35   ` Jeff Mahoney
@ 2017-06-21 17:40   ` David Sterba
  2 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2017-06-21 17:40 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, Josef Bacik, Liu Bo, kernel-team, jeffm

On Tue, Jun 06, 2017 at 04:45:32PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Catch any future/remaining leaks or underflows of total_bytes_pinned.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

This patch received some objections. As it's a debugging aid, I'd rather
merge a patch that other agree is useful for that purpose.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-06-21 17:41 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-06 23:45 [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Omar Sandoval
2017-06-06 23:45 ` [PATCH 1/7] Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 Omar Sandoval
2017-06-12 13:39   ` David Sterba
2017-06-12 17:34   ` Liu Bo
2017-06-06 23:45 ` [PATCH 2/7] Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Omar Sandoval
2017-06-12 13:26   ` David Sterba
2017-06-21 17:31   ` David Sterba
2017-06-06 23:45 ` [PATCH 3/7] Btrfs: update total_bytes_pinned when pinning down extents Omar Sandoval
2017-06-12 17:37   ` Liu Bo
2017-06-06 23:45 ` [PATCH 4/7] Btrfs: always account pinned bytes when dropping a tree block ref Omar Sandoval
2017-06-07 20:20   ` Liu Bo
2017-06-06 23:45 ` [PATCH 5/7] Btrfs: return old and new total ref mods when adding delayed refs Omar Sandoval
2017-06-07 20:06   ` Liu Bo
2017-06-06 23:45 ` [PATCH 6/7] Btrfs: rework delayed ref total_bytes_pinned accounting Omar Sandoval
2017-06-07 20:18   ` Liu Bo
2017-06-09 23:38     ` Omar Sandoval
2017-06-06 23:45 ` [PATCH 7/7] Btrfs: warn if total_bytes_pinned is non-zero on unmount Omar Sandoval
2017-06-07 20:22   ` Liu Bo
2017-06-09 23:45     ` Omar Sandoval
2017-06-13 18:35   ` Jeff Mahoney
2017-06-21 17:40   ` David Sterba
2017-06-07 15:48 ` [PATCH 0/7] Btrfs: fix total_bytes_pinned counter Holger Hoffstätte
2017-06-07 17:37   ` Omar Sandoval

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.