All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] Btrfs: orphan and truncate fixes
@ 2018-05-11  0:11 Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate() Omar Sandoval
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Hi,

This is v2 of the fixes for the orphan item early ENOSPC issue we hit at
Facebook.

Changes since v1:

- Added two extra cleanups, patches 10 and 11
- Added a forgotten clear of the orphan bit in patch 8
- Reworded titles of patches 6 and 9
- Added people's reviewed-bys

Cover letter from v1:

At Facebook we hit an early ENOSPC issue which we tracked down to the
reservations for orphan items of deleted-but-still-open files. The
primary function of this series is to fix that bug, but I ended up
uncovering a pile of other issues in the process, most notably that the
orphan items we create for truncate are useless.

I've also posted an xfstest that reproduces this bug.

Thanks!

Omar Sandoval (12):
  Btrfs: remove stale comment referencing vmtruncate()
  Btrfs: fix error handling in btrfs_truncate_inode_items()
  Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  Btrfs: stop creating orphan items for truncate
  Btrfs: don't release reserve or decrement orphan count if orphan item
    already existed
  Btrfs: don't return ino to ino cache if inode item removal fails
  Btrfs: refactor btrfs_evict_inode() reserve refill dance
  Btrfs: fix ENOSPC caused by orphan items reservations
  Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock
  Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes
  Btrfs: simplify error handling in btrfs_evict_inode()
  Btrfs: reserve space for O_TMPFILE orphan item deletion

 fs/btrfs/btrfs_inode.h      |  19 +-
 fs/btrfs/ctree.h            |   8 -
 fs/btrfs/disk-io.c          |   9 -
 fs/btrfs/extent-tree.c      |  38 ---
 fs/btrfs/free-space-cache.c |   6 +-
 fs/btrfs/inode.c            | 585 +++++++++++-------------------------
 fs/btrfs/transaction.c      |   1 -
 7 files changed, 191 insertions(+), 475 deletions(-)

-- 
2.17.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate()
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Commit a41ad394a03b ("Btrfs: convert to the new truncate sequence")
changed vmtruncate() to truncate_setsize() but didn't update the comment
above it. truncate_setsize() never fails (the IS_SWAPFILE() check
happens elsewhere), so remove the comment.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..fef8dbb6a93f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5106,7 +5106,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		if (ret)
 			return ret;
 
-		/* we don't support swapfiles, so vmtruncate shouldn't fail */
 		truncate_setsize(inode, newsize);
 
 		/* Disable nonlocked read DIO to avoid the end less truncate */
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items()
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate() Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

btrfs_truncate_inode_items() uses two variables for error handling, ret
and err. These are not handled consistently, leading to a couple of
bugs.

- Errors from btrfs_del_items() are handled but not propagated to the
  caller
- If btrfs_run_delayed_refs() fails and aborts the transaction, we
  continue running

Just use ret everywhere and simplify things a bit, fixing both of these
issues.

Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling")
Fixes: 1262133b8d6f ("Btrfs: account for crcs in delayed ref processing")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 55 ++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fef8dbb6a93f..79d1da01a90d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4442,7 +4442,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	int pending_del_slot = 0;
 	int extent_type = -1;
 	int ret;
-	int err = 0;
 	u64 ino = btrfs_ino(BTRFS_I(inode));
 	u64 bytes_deleted = 0;
 	bool be_nice = false;
@@ -4494,22 +4493,19 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	 * up a huge file in a single leaf.  Most of the time that
 	 * bytes_deleted is > 0, it will be huge by the time we get here
 	 */
-	if (be_nice && bytes_deleted > SZ_32M) {
-		if (btrfs_should_end_transaction(trans)) {
-			err = -EAGAIN;
-			goto error;
-		}
+	if (be_nice && bytes_deleted > SZ_32M &&
+	    btrfs_should_end_transaction(trans)) {
+		ret = -EAGAIN;
+		goto out;
 	}
 
-
 	path->leave_spinning = 1;
 	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
-	if (ret < 0) {
-		err = ret;
+	if (ret < 0)
 		goto out;
-	}
 
 	if (ret > 0) {
+		ret = 0;
 		/* there are no items in the tree for us to truncate, we're
 		 * done
 		 */
@@ -4620,7 +4616,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 				 * We have to bail so the last_size is set to
 				 * just before this extent.
 				 */
-				err = NEED_TRUNCATE_BLOCK;
+				ret = NEED_TRUNCATE_BLOCK;
 				break;
 			}
 
@@ -4687,7 +4683,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						pending_del_nr);
 				if (ret) {
 					btrfs_abort_transaction(trans, ret);
-					goto error;
+					break;
 				}
 				pending_del_nr = 0;
 			}
@@ -4698,8 +4694,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 					trans->delayed_ref_updates = 0;
 					ret = btrfs_run_delayed_refs(trans,
 								   updates * 2);
-					if (ret && !err)
-						err = ret;
+					if (ret)
+						break;
 				}
 			}
 			/*
@@ -4707,8 +4703,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 			 * and let the transaction restart
 			 */
 			if (should_end) {
-				err = -EAGAIN;
-				goto error;
+				ret = -EAGAIN;
+				break;
 			}
 			goto search_again;
 		} else {
@@ -4716,32 +4712,37 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 		}
 	}
 out:
-	if (pending_del_nr) {
-		ret = btrfs_del_items(trans, root, path, pending_del_slot,
+	if (ret >= 0 && pending_del_nr) {
+		int err;
+
+		err = btrfs_del_items(trans, root, path, pending_del_slot,
 				      pending_del_nr);
-		if (ret)
-			btrfs_abort_transaction(trans, ret);
+		if (err) {
+			btrfs_abort_transaction(trans, err);
+			ret = err;
+		}
 	}
-error:
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ASSERT(last_size >= new_size);
-		if (!err && last_size > new_size)
+		if (!ret && last_size > new_size)
 			last_size = new_size;
 		btrfs_ordered_update_i_size(inode, last_size, NULL);
 	}
 
 	btrfs_free_path(path);
 
-	if (be_nice && bytes_deleted > SZ_32M) {
+	if (be_nice && bytes_deleted > SZ_32M && (ret >= 0 || ret == -EAGAIN)) {
 		unsigned long updates = trans->delayed_ref_updates;
+		int err;
+
 		if (updates) {
 			trans->delayed_ref_updates = 0;
-			ret = btrfs_run_delayed_refs(trans, updates * 2);
-			if (ret && !err)
-				err = ret;
+			err = btrfs_run_delayed_refs(trans, updates * 2);
+			if (err)
+				ret = err;
 		}
 	}
-	return err;
+	return ret;
 }
 
 /*
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/12] Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate() Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

btrfs_free_extent() can fail because of ENOMEM. There's no reason to
panic here, we can just abort the transaction.

Fixes: f4b9aa8d3b87 ("btrfs_truncate")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 79d1da01a90d..bd4975476f0e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4655,7 +4655,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 						extent_num_bytes, 0,
 						btrfs_header_owner(leaf),
 						ino, extent_offset);
-			BUG_ON(ret);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				break;
+			}
 			if (btrfs_should_throttle_delayed_refs(trans, fs_info))
 				btrfs_async_run_delayed_refs(fs_info,
 					trans->delayed_ref_updates * 2,
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 04/12] Btrfs: stop creating orphan items for truncate
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 05/12] Btrfs: don't release reserve or decrement orphan count if orphan item already existed Omar Sandoval
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Currently, we insert an orphan item during a truncate so that if there's
a crash, we don't leak extents past the on-disk i_size. However, since
commit 7f4f6e0a3f6d ("Btrfs: only update disk_i_size as we remove
extents"), we keep disk_i_size in sync with the extent items as we
truncate, so orphan cleanup will never have any extents to remove. Don't
bother with the superfluous orphan item.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/free-space-cache.c |   6 +-
 fs/btrfs/inode.c            | 159 +++++++++++-------------------------
 2 files changed, 51 insertions(+), 114 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index e5b569bebc73..d5f80cb300be 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -253,10 +253,8 @@ int btrfs_truncate_free_space_cache(struct btrfs_trans_handle *trans,
 	truncate_pagecache(inode, 0);
 
 	/*
-	 * We don't need an orphan item because truncating the free space cache
-	 * will never be split across transactions.
-	 * We don't need to check for -EAGAIN because we're a free space
-	 * cache inode
+	 * We skip the throttling logic for free space cache inodes, so we don't
+	 * need to check for -EAGAIN.
 	 */
 	ret = btrfs_truncate_inode_items(trans, root, inode,
 					 0, BTRFS_EXTENT_DATA_KEY);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bd4975476f0e..1460823951d7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3341,8 +3341,8 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 }
 
 /*
- * This creates an orphan entry for the given inode in case something goes
- * wrong in the middle of an unlink/truncate.
+ * This creates an orphan entry for the given inode in case something goes wrong
+ * in the middle of an unlink.
  *
  * NOTE: caller of this function should reserve 5 units of metadata for
  *	 this function.
@@ -3405,7 +3405,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		}
 	}
 
-	/* insert an orphan item to track this unlinked/truncated file */
+	/* insert an orphan item to track this unlinked file */
 	if (insert) {
 		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
 		if (ret) {
@@ -3434,8 +3434,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 }
 
 /*
- * We have done the truncate/delete so we can go ahead and remove the orphan
- * item for this particular inode.
+ * We have done the delete so we can go ahead and remove the orphan item for
+ * this particular inode.
  */
 static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
@@ -3479,7 +3479,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 	struct btrfs_trans_handle *trans;
 	struct inode *inode;
 	u64 last_objectid = 0;
-	int ret = 0, nr_unlink = 0, nr_truncate = 0;
+	int ret = 0, nr_unlink = 0;
 
 	if (cmpxchg(&root->orphan_cleanup_state, 0, ORPHAN_CLEANUP_STARTED))
 		return 0;
@@ -3579,12 +3579,31 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 				key.offset = found_key.objectid - 1;
 				continue;
 			}
+
 		}
+
 		/*
-		 * Inode is already gone but the orphan item is still there,
-		 * kill the orphan item.
+		 * If we have an inode with links, there are a couple of
+		 * possibilities. Old kernels (before v3.12) used to create an
+		 * orphan item for truncate indicating that there were possibly
+		 * extent items past i_size that needed to be deleted. In v3.12,
+		 * truncate was changed to update i_size in sync with the extent
+		 * items, but the (useless) orphan item was still created. Since
+		 * v4.18, we don't create the orphan item for truncate at all.
+		 *
+		 * So, this item could mean that we need to do a truncate, but
+		 * only if this filesystem was last used on a pre-v3.12 kernel
+		 * and was not cleanly unmounted. The odds of that are quite
+		 * slim, and it's a pain to do the truncate now, so just delete
+		 * the orphan item.
+		 *
+		 * It's also possible that this orphan item was supposed to be
+		 * deleted but wasn't. The inode number may have been reused,
+		 * but either way, we can delete the orphan item.
 		 */
-		if (ret == -ENOENT) {
+		if (ret == -ENOENT || inode->i_nlink) {
+			if (!ret)
+				iput(inode);
 			trans = btrfs_start_transaction(root, 1);
 			if (IS_ERR(trans)) {
 				ret = PTR_ERR(trans);
@@ -3608,34 +3627,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 			&BTRFS_I(inode)->runtime_flags);
 		atomic_inc(&root->orphan_inodes);
 
-		/* if we have links, this was a truncate, lets do that */
-		if (inode->i_nlink) {
-			if (WARN_ON(!S_ISREG(inode->i_mode))) {
-				iput(inode);
-				continue;
-			}
-			nr_truncate++;
-
-			/* 1 for the orphan item deletion. */
-			trans = btrfs_start_transaction(root, 1);
-			if (IS_ERR(trans)) {
-				iput(inode);
-				ret = PTR_ERR(trans);
-				goto out;
-			}
-			ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-			btrfs_end_transaction(trans);
-			if (ret) {
-				iput(inode);
-				goto out;
-			}
-
-			ret = btrfs_truncate(inode, false);
-			if (ret)
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-		} else {
-			nr_unlink++;
-		}
+		nr_unlink++;
 
 		/* this will do delete_inode and everything for us */
 		iput(inode);
@@ -3660,8 +3652,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 
 	if (nr_unlink)
 		btrfs_debug(fs_info, "unlinked %d orphans", nr_unlink);
-	if (nr_truncate)
-		btrfs_debug(fs_info, "truncated %d orphans", nr_truncate);
 
 out:
 	if (ret)
@@ -5087,29 +5077,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 			set_bit(BTRFS_INODE_ORDERED_DATA_CLOSE,
 				&BTRFS_I(inode)->runtime_flags);
 
-		/*
-		 * 1 for the orphan item we're going to add
-		 * 1 for the orphan item deletion.
-		 */
-		trans = btrfs_start_transaction(root, 2);
-		if (IS_ERR(trans))
-			return PTR_ERR(trans);
-
-		/*
-		 * We need to do this in case we fail at _any_ point during the
-		 * actual truncate.  Once we do the truncate_setsize we could
-		 * invalidate pages which forces any outstanding ordered io to
-		 * be instantly completed which will give us extents that need
-		 * to be truncated.  If we fail to get an orphan inode down we
-		 * could have left over extents that were never meant to live,
-		 * so we need to guarantee from this point on that everything
-		 * will be consistent.
-		 */
-		ret = btrfs_orphan_add(trans, BTRFS_I(inode));
-		btrfs_end_transaction(trans);
-		if (ret)
-			return ret;
-
 		truncate_setsize(inode, newsize);
 
 		/* Disable nonlocked read DIO to avoid the end less truncate */
@@ -5121,29 +5088,16 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		if (ret && inode->i_nlink) {
 			int err;
 
-			/* To get a stable disk_i_size */
-			err = btrfs_wait_ordered_range(inode, 0, (u64)-1);
-			if (err) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				return err;
-			}
-
 			/*
-			 * failed to truncate, disk_i_size is only adjusted down
-			 * as we remove extents, so it should represent the true
-			 * size of the inode, so reset the in memory size and
-			 * delete our orphan entry.
+			 * Truncate failed, so fix up the in-memory size. We
+			 * adjusted disk_i_size down as we removed extents, so
+			 * wait for disk_i_size to be stable and then update the
+			 * in-memory size to match.
 			 */
-			trans = btrfs_join_transaction(root);
-			if (IS_ERR(trans)) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				return ret;
-			}
-			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
-			err = btrfs_orphan_del(trans, BTRFS_I(inode));
+			err = btrfs_wait_ordered_range(inode, 0, (u64)-1);
 			if (err)
-				btrfs_abort_transaction(trans, err);
-			btrfs_end_transaction(trans);
+				return err;
+			i_size_write(inode, BTRFS_I(inode)->disk_i_size);
 		}
 	}
 
@@ -9048,39 +9002,31 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 	}
 
 	/*
-	 * Yes ladies and gentlemen, this is indeed ugly.  The fact is we have
-	 * 3 things going on here
-	 *
-	 * 1) We need to reserve space for our orphan item and the space to
-	 * delete our orphan item.  Lord knows we don't want to have a dangling
-	 * orphan item because we didn't reserve space to remove it.
+	 * Yes ladies and gentlemen, this is indeed ugly.  We have a couple of
+	 * things going on here:
 	 *
-	 * 2) We need to reserve space to update our inode.
+	 * 1) We need to reserve space to update our inode.
 	 *
-	 * 3) We need to have something to cache all the space that is going to
+	 * 2) We need to have something to cache all the space that is going to
 	 * be free'd up by the truncate operation, but also have some slack
 	 * space reserved in case it uses space during the truncate (thank you
 	 * very much snapshotting).
 	 *
-	 * And we need these to all be separate.  The fact is we can use a lot of
+	 * And we need these to be separate.  The fact is we can use a lot of
 	 * space doing the truncate, and we have no earthly idea how much space
 	 * we will use, so we need the truncate reservation to be separate so it
-	 * doesn't end up using space reserved for updating the inode or
-	 * removing the orphan item.  We also need to be able to stop the
-	 * transaction and start a new one, which means we need to be able to
-	 * update the inode several times, and we have no idea of knowing how
-	 * many times that will be, so we can't just reserve 1 item for the
-	 * entirety of the operation, so that has to be done separately as well.
-	 * Then there is the orphan item, which does indeed need to be held on
-	 * to for the whole operation, and we need nobody to touch this reserved
-	 * space except the orphan code.
+	 * doesn't end up using space reserved for updating the inode.  We also
+	 * need to be able to stop the transaction and start a new one, which
+	 * means we need to be able to update the inode several times, and we
+	 * have no idea of knowing how many times that will be, so we can't just
+	 * reserve 1 item for the entirety of the operation, so that has to be
+	 * done separately as well.
 	 *
 	 * So that leaves us with
 	 *
-	 * 1) root->orphan_block_rsv - for the orphan deletion.
-	 * 2) rsv - for the truncate reservation, which we will steal from the
+	 * 1) rsv - for the truncate reservation, which we will steal from the
 	 * transaction reservation.
-	 * 3) fs_info->trans_block_rsv - this will have 1 items worth left for
+	 * 2) fs_info->trans_block_rsv - this will have 1 items worth left for
 	 * updating the inode.
 	 */
 	rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
@@ -9168,13 +9114,6 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 		btrfs_ordered_update_i_size(inode, inode->i_size, NULL);
 	}
 
-	if (ret == 0 && inode->i_nlink > 0) {
-		trans->block_rsv = root->orphan_block_rsv;
-		ret = btrfs_orphan_del(trans, BTRFS_I(inode));
-		if (ret)
-			err = ret;
-	}
-
 	if (trans) {
 		trans->block_rsv = &fs_info->trans_block_rsv;
 		ret = btrfs_update_inode(trans, root, inode);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 05/12] Btrfs: don't release reserve or decrement orphan count if orphan item already existed
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 06/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Currently, if btrfs_insert_orphan_item() fails, we release the reserved
space and decrement root->orphan_inodes. However, we also ignore -EEXIST
errors, so we still want the space reservation for when we delete the
item, and we still need to count the orphan because we're still going to
decrement root->orphan_inodes from btrfs_orphan_del() later.

Fixes: 4ef31a45a009 ("Btrfs: fix the error handling wrt orphan items")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1460823951d7..e77df96de642 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3408,7 +3408,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 	/* insert an orphan item to track this unlinked file */
 	if (insert) {
 		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
-		if (ret) {
+		if (ret && ret != -EEXIST) {
 			if (reserve) {
 				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
 					  &inode->runtime_flags);
@@ -3420,14 +3420,11 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 			 * decrease ->orphan_inodes after everything is done.
 			 */
 			atomic_dec(&root->orphan_inodes);
-			if (ret != -EEXIST) {
-				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-					  &inode->runtime_flags);
-				btrfs_abort_transaction(trans, ret);
-				return ret;
-			}
+			clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
+				  &inode->runtime_flags);
+			btrfs_abort_transaction(trans, ret);
+			return ret;
 		}
-		ret = 0;
 	}
 
 	return 0;
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 06/12] Btrfs: don't return ino to ino cache if inode item removal fails
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (4 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 05/12] Btrfs: don't release reserve or decrement orphan count if orphan item already existed Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 07/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

In btrfs_evict_inode(), if btrfs_truncate_inode_items() fails, the inode
item will still be in the tree but we still return the ino to the ino
cache. That will blow up later when someone tries to allocate that ino,
so don't return it to the cache.

Fixes: 581bb050941b ("Btrfs: Cache free inode numbers in memory")
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e77df96de642..9a6a4e626e01 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5368,13 +5368,18 @@ void btrfs_evict_inode(struct inode *inode)
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-		if (ret != -ENOSPC && ret != -EAGAIN)
+		if (ret) {
+			trans->block_rsv = &fs_info->trans_block_rsv;
+			btrfs_end_transaction(trans);
+			btrfs_btree_balance_dirty(fs_info);
+			if (ret != -ENOSPC && ret != -EAGAIN) {
+				btrfs_orphan_del(NULL, BTRFS_I(inode));
+				btrfs_free_block_rsv(fs_info, rsv);
+				goto no_delete;
+			}
+		} else {
 			break;
-
-		trans->block_rsv = &fs_info->trans_block_rsv;
-		btrfs_end_transaction(trans);
-		trans = NULL;
-		btrfs_btree_balance_dirty(fs_info);
+		}
 	}
 
 	btrfs_free_block_rsv(fs_info, rsv);
@@ -5383,12 +5388,8 @@ void btrfs_evict_inode(struct inode *inode)
 	 * Errors here aren't a big deal, it just means we leave orphan items
 	 * in the tree.  They will be cleaned up on the next mount.
 	 */
-	if (ret == 0) {
-		trans->block_rsv = root->orphan_block_rsv;
-		btrfs_orphan_del(trans, BTRFS_I(inode));
-	} else {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-	}
+	trans->block_rsv = root->orphan_block_rsv;
+	btrfs_orphan_del(trans, BTRFS_I(inode));
 
 	trans->block_rsv = &fs_info->trans_block_rsv;
 	if (!(root == fs_info->tree_root ||
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 07/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (5 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 06/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

The truncate loop in btrfs_evict_inode() does two things at once:

- It refills the temporary block reserve, potentially stealing from the
  global reserve or committing
- It calls btrfs_truncate_inode_items()

The tangle of continues hides the fact that these two steps are actually
separate. Split the first step out into a separate function both for
clarity and so that we can reuse it in a later patch.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 113 ++++++++++++++++++-----------------------------
 1 file changed, 42 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9a6a4e626e01..348dc57920f5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5224,13 +5224,52 @@ static void evict_inode_truncate_pages(struct inode *inode)
 	spin_unlock(&io_tree->lock);
 }
 
+static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
+							struct btrfs_block_rsv *rsv,
+							u64 min_size)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	int failures = 0;
+
+	for (;;) {
+		struct btrfs_trans_handle *trans;
+		int ret;
+
+		ret = btrfs_block_rsv_refill(root, rsv, min_size,
+					     BTRFS_RESERVE_FLUSH_LIMIT);
+
+		if (ret && ++failures > 2) {
+			btrfs_warn(fs_info,
+				   "could not allocate space for a delete; will truncate on mount");
+			return ERR_PTR(-ENOSPC);
+		}
+
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans) || !ret)
+			return trans;
+
+		/*
+		 * Try to steal from the global reserve if there is space for
+		 * it.
+		 */
+		if (!btrfs_check_space_for_delayed_refs(trans, fs_info) &&
+		    !btrfs_block_rsv_migrate(global_rsv, rsv, min_size, 0))
+			return trans;
+
+		/* If not, commit and try again. */
+		ret = btrfs_commit_transaction(trans);
+		if (ret)
+			return ERR_PTR(ret);
+	}
+}
+
 void btrfs_evict_inode(struct inode *inode)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_block_rsv *rsv, *global_rsv;
-	int steal_from_global = 0;
+	struct btrfs_block_rsv *rsv;
 	u64 min_size;
 	int ret;
 
@@ -5286,85 +5325,17 @@ void btrfs_evict_inode(struct inode *inode)
 	}
 	rsv->size = min_size;
 	rsv->failfast = 1;
-	global_rsv = &fs_info->global_block_rsv;
 
 	btrfs_i_size_write(BTRFS_I(inode), 0);
 
-	/*
-	 * This is a bit simpler than btrfs_truncate since we've already
-	 * reserved our space for our orphan item in the unlink, so we just
-	 * need to reserve some slack space in case we add bytes and update
-	 * inode item when doing the truncate.
-	 */
 	while (1) {
-		ret = btrfs_block_rsv_refill(root, rsv, min_size,
-					     BTRFS_RESERVE_FLUSH_LIMIT);
-
-		/*
-		 * Try and steal from the global reserve since we will
-		 * likely not use this space anyway, we want to try as
-		 * hard as possible to get this to work.
-		 */
-		if (ret)
-			steal_from_global++;
-		else
-			steal_from_global = 0;
-		ret = 0;
-
-		/*
-		 * steal_from_global == 0: we reserved stuff, hooray!
-		 * steal_from_global == 1: we didn't reserve stuff, boo!
-		 * steal_from_global == 2: we've committed, still not a lot of
-		 * room but maybe we'll have room in the global reserve this
-		 * time.
-		 * steal_from_global == 3: abandon all hope!
-		 */
-		if (steal_from_global > 2) {
-			btrfs_warn(fs_info,
-				   "Could not get space for a delete, will truncate on mount %d",
-				   ret);
-			btrfs_orphan_del(NULL, BTRFS_I(inode));
-			btrfs_free_block_rsv(fs_info, rsv);
-			goto no_delete;
-		}
-
-		trans = btrfs_join_transaction(root);
+		trans = evict_refill_and_join(root, rsv, min_size);
 		if (IS_ERR(trans)) {
 			btrfs_orphan_del(NULL, BTRFS_I(inode));
 			btrfs_free_block_rsv(fs_info, rsv);
 			goto no_delete;
 		}
 
-		/*
-		 * We can't just steal from the global reserve, we need to make
-		 * sure there is room to do it, if not we need to commit and try
-		 * again.
-		 */
-		if (steal_from_global) {
-			if (!btrfs_check_space_for_delayed_refs(trans, fs_info))
-				ret = btrfs_block_rsv_migrate(global_rsv, rsv,
-							      min_size, 0);
-			else
-				ret = -ENOSPC;
-		}
-
-		/*
-		 * Couldn't steal from the global reserve, we have too much
-		 * pending stuff built up, commit the transaction and try it
-		 * again.
-		 */
-		if (ret) {
-			ret = btrfs_commit_transaction(trans);
-			if (ret) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				btrfs_free_block_rsv(fs_info, rsv);
-				goto no_delete;
-			}
-			continue;
-		} else {
-			steal_from_global = 0;
-		}
-
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (6 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 07/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  6:38   ` Nikolay Borisov
  2018-05-11  0:11 ` [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock Omar Sandoval
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Currently, we keep space reserved for all inode orphan items until the
inode is evicted (i.e., all references to it are dropped). We hit an
issue where an application would keep a bunch of deleted files open (by
design) and thus keep a large amount of space reserved, causing ENOSPC
errors when other operations tried to reserve space. This long-standing
reservation isn't absolutely necessary for a couple of reasons:

- We can almost always make the reservation we need or steal from the
  global reserve for the orphan item
- If we can't, it's not the end of the world if we drop the orphan item
  on the floor and let the next mount clean it up

So, get rid of persistent reservation and just reserve space in
btrfs_evict_inode().

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  19 +++---
 fs/btrfs/inode.c       | 142 ++++++++++++-----------------------------
 2 files changed, 50 insertions(+), 111 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 234bae55b85d..2f466cf55790 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -20,16 +20,15 @@
  * new data the application may have written before commit.
  */
 #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
-#define BTRFS_INODE_ORPHAN_META_RESERVED	1
-#define BTRFS_INODE_DUMMY			2
-#define BTRFS_INODE_IN_DEFRAG			3
-#define BTRFS_INODE_HAS_ORPHAN_ITEM		4
-#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
-#define BTRFS_INODE_NEEDS_FULL_SYNC		6
-#define BTRFS_INODE_COPY_EVERYTHING		7
-#define BTRFS_INODE_IN_DELALLOC_LIST		8
-#define BTRFS_INODE_READDIO_NEED_LOCK		9
-#define BTRFS_INODE_HAS_PROPS		        10
+#define BTRFS_INODE_DUMMY			1
+#define BTRFS_INODE_IN_DEFRAG			2
+#define BTRFS_INODE_HAS_ORPHAN_ITEM		3
+#define BTRFS_INODE_HAS_ASYNC_EXTENT		4
+#define BTRFS_INODE_NEEDS_FULL_SYNC		5
+#define BTRFS_INODE_COPY_EVERYTHING		6
+#define BTRFS_INODE_IN_DELALLOC_LIST		7
+#define BTRFS_INODE_READDIO_NEED_LOCK		8
+#define BTRFS_INODE_HAS_PROPS		        9
 
 /* in memory btrfs inode */
 struct btrfs_inode {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 348dc57920f5..b9a046b8c72c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3343,88 +3343,25 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
- *
- * NOTE: caller of this function should reserve 5 units of metadata for
- *	 this function.
  */
 int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		struct btrfs_inode *inode)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
 	struct btrfs_root *root = inode->root;
-	struct btrfs_block_rsv *block_rsv = NULL;
-	int reserve = 0;
-	bool insert = false;
 	int ret;
 
-	if (!root->orphan_block_rsv) {
-		block_rsv = btrfs_alloc_block_rsv(fs_info,
-						  BTRFS_BLOCK_RSV_TEMP);
-		if (!block_rsv)
-			return -ENOMEM;
-	}
-
-	if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-			      &inode->runtime_flags))
-		insert = true;
-
-	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-			      &inode->runtime_flags))
-		reserve = 1;
-
-	spin_lock(&root->orphan_lock);
-	/* If someone has created ->orphan_block_rsv, be happy to use it. */
-	if (!root->orphan_block_rsv) {
-		root->orphan_block_rsv = block_rsv;
-	} else if (block_rsv) {
-		btrfs_free_block_rsv(fs_info, block_rsv);
-		block_rsv = NULL;
-	}
+	if (test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
+			     &inode->runtime_flags))
+		return 0;
 
-	if (insert)
-		atomic_inc(&root->orphan_inodes);
-	spin_unlock(&root->orphan_lock);
+	atomic_inc(&root->orphan_inodes);
 
-	/* grab metadata reservation from transaction handle */
-	if (reserve) {
-		ret = btrfs_orphan_reserve_metadata(trans, inode);
-		ASSERT(!ret);
-		if (ret) {
-			/*
-			 * dec doesn't need spin_lock as ->orphan_block_rsv
-			 * would be released only if ->orphan_inodes is
-			 * zero.
-			 */
-			atomic_dec(&root->orphan_inodes);
-			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-				  &inode->runtime_flags);
-			if (insert)
-				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-					  &inode->runtime_flags);
-			return ret;
-		}
-	}
-
-	/* insert an orphan item to track this unlinked file */
-	if (insert) {
-		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
-		if (ret && ret != -EEXIST) {
-			if (reserve) {
-				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-					  &inode->runtime_flags);
-				btrfs_orphan_release_metadata(inode);
-			}
-			/*
-			 * btrfs_orphan_commit_root may race with us and set
-			 * ->orphan_block_rsv to zero, in order to avoid that,
-			 * decrease ->orphan_inodes after everything is done.
-			 */
-			atomic_dec(&root->orphan_inodes);
-			clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-				  &inode->runtime_flags);
-			btrfs_abort_transaction(trans, ret);
-			return ret;
-		}
+	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
+	if (ret && ret != -EEXIST) {
+		atomic_dec(&root->orphan_inodes);
+		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
+		btrfs_abort_transaction(trans, ret);
+		return ret;
 	}
 
 	return 0;
@@ -3438,27 +3375,16 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
 {
 	struct btrfs_root *root = inode->root;
-	int delete_item = 0;
 	int ret = 0;
 
-	if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-			       &inode->runtime_flags))
-		delete_item = 1;
+	if (!test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
+				&inode->runtime_flags))
+		return 0;
 
-	if (delete_item && trans)
+	if (trans)
 		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
 
-	if (test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-			       &inode->runtime_flags))
-		btrfs_orphan_release_metadata(inode);
-
-	/*
-	 * btrfs_orphan_commit_root may race with us and set ->orphan_block_rsv
-	 * to zero, in order to avoid that, decrease ->orphan_inodes after
-	 * everything is done.
-	 */
-	if (delete_item)
-		atomic_dec(&root->orphan_inodes);
+	atomic_dec(&root->orphan_inodes);
 
 	return ret;
 }
@@ -5339,10 +5265,10 @@ void btrfs_evict_inode(struct inode *inode)
 		trans->block_rsv = rsv;
 
 		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
+		trans->block_rsv = &fs_info->trans_block_rsv;
+		btrfs_end_transaction(trans);
+		btrfs_btree_balance_dirty(fs_info);
 		if (ret) {
-			trans->block_rsv = &fs_info->trans_block_rsv;
-			btrfs_end_transaction(trans);
-			btrfs_btree_balance_dirty(fs_info);
 			if (ret != -ENOSPC && ret != -EAGAIN) {
 				btrfs_orphan_del(NULL, BTRFS_I(inode));
 				btrfs_free_block_rsv(fs_info, rsv);
@@ -5353,22 +5279,36 @@ void btrfs_evict_inode(struct inode *inode)
 		}
 	}
 
-	btrfs_free_block_rsv(fs_info, rsv);
-
 	/*
-	 * Errors here aren't a big deal, it just means we leave orphan items
-	 * in the tree.  They will be cleaned up on the next mount.
+	 * Errors here aren't a big deal, it just means we leave orphan items in
+	 * the tree. They will be cleaned up on the next mount. If the inode
+	 * number gets reused, cleanup deletes the orphan item without doing
+	 * anything, and unlink reuses the existing orphan item.
+	 *
+	 * If it turns out that we are dropping too many of these, we might want
+	 * to add a mechanism for retrying these after a commit.
 	 */
-	trans->block_rsv = root->orphan_block_rsv;
-	btrfs_orphan_del(trans, BTRFS_I(inode));
+	trans = evict_refill_and_join(root, rsv, min_size);
+	if (IS_ERR(trans)) {
+		btrfs_orphan_del(NULL, BTRFS_I(inode));
+	} else {
+		trans->block_rsv = rsv;
+		btrfs_orphan_del(trans, BTRFS_I(inode));
+		trans->block_rsv = &fs_info->trans_block_rsv;
+		btrfs_end_transaction(trans);
+	}
+
+	btrfs_free_block_rsv(fs_info, rsv);
 
-	trans->block_rsv = &fs_info->trans_block_rsv;
 	if (!(root == fs_info->tree_root ||
 	      root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID))
 		btrfs_return_ino(root, btrfs_ino(BTRFS_I(inode)));
 
-	btrfs_end_transaction(trans);
-	btrfs_btree_balance_dirty(fs_info);
+	/*
+	 * If we didn't successfully delete, the orphan item will still be in
+	 * the tree and we'll retry on the next mount. Again, we might also want
+	 * to retry these periodically in the future.
+	 */
 no_delete:
 	btrfs_remove_delayed_node(BTRFS_I(inode));
 	clear_inode(inode);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (7 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  6:44   ` Nikolay Borisov
  2018-05-11  0:11 ` [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes Omar Sandoval
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

Now that we don't keep long-standing reservations for orphan items,
root->orphan_block_rsv isn't used. We can git rid of it and
root->orphan_lock, which was used to protect it.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h       |  5 -----
 fs/btrfs/disk-io.c     |  8 --------
 fs/btrfs/extent-tree.c | 38 --------------------------------------
 fs/btrfs/inode.c       | 41 ++++++-----------------------------------
 4 files changed, 6 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2771cc56a622..d66241a35fab 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1219,9 +1219,7 @@ struct btrfs_root {
 	spinlock_t log_extents_lock[2];
 	struct list_head logged_list[2];
 
-	spinlock_t orphan_lock;
 	atomic_t orphan_inodes;
-	struct btrfs_block_rsv *orphan_block_rsv;
 	int orphan_cleanup_state;
 
 	spinlock_t inode_lock;
@@ -2764,9 +2762,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
-int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_inode *inode);
-void btrfs_orphan_release_metadata(struct btrfs_inode *inode);
 int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 				     struct btrfs_block_rsv *rsv,
 				     int nitems,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..24e15e2080f4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1185,7 +1185,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	root->inode_tree = RB_ROOT;
 	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
 	root->block_rsv = NULL;
-	root->orphan_block_rsv = NULL;
 
 	INIT_LIST_HEAD(&root->dirty_list);
 	INIT_LIST_HEAD(&root->root_list);
@@ -1195,7 +1194,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	INIT_LIST_HEAD(&root->ordered_root);
 	INIT_LIST_HEAD(&root->logged_list[0]);
 	INIT_LIST_HEAD(&root->logged_list[1]);
-	spin_lock_init(&root->orphan_lock);
 	spin_lock_init(&root->inode_lock);
 	spin_lock_init(&root->delalloc_lock);
 	spin_lock_init(&root->ordered_extent_lock);
@@ -3674,8 +3672,6 @@ static void free_fs_root(struct btrfs_root *root)
 {
 	iput(root->ino_cache_inode);
 	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
-	btrfs_free_block_rsv(root->fs_info, root->orphan_block_rsv);
-	root->orphan_block_rsv = NULL;
 	if (root->anon_dev)
 		free_anon_bdev(root->anon_dev);
 	if (root->subv_writers)
@@ -3766,7 +3762,6 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
 
 void close_ctree(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_root *root = fs_info->tree_root;
 	int ret;
 
 	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
@@ -3861,9 +3856,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_free_stripe_hash_table(fs_info);
 	btrfs_free_ref_cache(fs_info);
 
-	__btrfs_free_block_rsv(root->orphan_block_rsv);
-	root->orphan_block_rsv = NULL;
-
 	while (!list_empty(&fs_info->pinned_chunks)) {
 		struct extent_map *em;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..3f2e026bc206 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5949,44 +5949,6 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
 	trans->chunk_bytes_reserved = 0;
 }
 
-/* Can only return 0 or -ENOSPC */
-int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_inode *inode)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
-	/*
-	 * We always use trans->block_rsv here as we will have reserved space
-	 * for our orphan when starting the transaction, using get_block_rsv()
-	 * here will sometimes make us choose the wrong block rsv as we could be
-	 * doing a reloc inode for a non refcounted root.
-	 */
-	struct btrfs_block_rsv *src_rsv = trans->block_rsv;
-	struct btrfs_block_rsv *dst_rsv = root->orphan_block_rsv;
-
-	/*
-	 * We need to hold space in order to delete our orphan item once we've
-	 * added it, so this takes the reservation so we can release it later
-	 * when we are truly done with the orphan item.
-	 */
-	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-
-	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
-			num_bytes, 1);
-	return btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
-}
-
-void btrfs_orphan_release_metadata(struct btrfs_inode *inode)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
-	struct btrfs_root *root = inode->root;
-	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-
-	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
-			num_bytes, 0);
-	btrfs_block_rsv_release(fs_info, root->orphan_block_rsv, num_bytes);
-}
-
 /*
  * btrfs_subvolume_reserve_metadata() - reserve space for subvolume operation
  * root: the root of the parent directory
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b9a046b8c72c..50f10882a715 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3293,37 +3293,18 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 }
 
 /*
- * This is called in transaction commit time. If there are no orphan
- * files in the subvolume, it removes orphan item and frees block_rsv
- * structure.
+ * This is called in transaction commit time. If there are no orphan files in
+ * the subvolume, it removes the orphan item.
  */
 void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct btrfs_block_rsv *block_rsv;
 	int ret;
 
-	if (atomic_read(&root->orphan_inodes) ||
-	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
-		return;
-
-	spin_lock(&root->orphan_lock);
-	if (atomic_read(&root->orphan_inodes)) {
-		spin_unlock(&root->orphan_lock);
-		return;
-	}
-
-	if (root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) {
-		spin_unlock(&root->orphan_lock);
-		return;
-	}
-
-	block_rsv = root->orphan_block_rsv;
-	root->orphan_block_rsv = NULL;
-	spin_unlock(&root->orphan_lock);
-
-	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
+	if (atomic_read(&root->orphan_inodes) == 0 &&
+	    root->orphan_cleanup_state == ORPHAN_CLEANUP_DONE &&
+	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
 	    btrfs_root_refs(&root->root_item) > 0) {
 		ret = btrfs_del_orphan_item(trans, fs_info->tree_root,
 					    root->root_key.objectid);
@@ -3333,11 +3314,6 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
 			clear_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
 				  &root->state);
 	}
-
-	if (block_rsv) {
-		WARN_ON(block_rsv->size > 0);
-		btrfs_free_block_rsv(fs_info, block_rsv);
-	}
 }
 
 /*
@@ -3562,12 +3538,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 
 	root->orphan_cleanup_state = ORPHAN_CLEANUP_DONE;
 
-	if (root->orphan_block_rsv)
-		btrfs_block_rsv_release(fs_info, root->orphan_block_rsv,
-					(u64)-1);
-
-	if (root->orphan_block_rsv ||
-	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
+	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
 		trans = btrfs_join_transaction(root);
 		if (!IS_ERR(trans))
 			btrfs_end_transaction(trans);
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (8 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  7:01   ` Nikolay Borisov
  2018-05-11  0:11 ` [PATCH v2 11/12] Btrfs: simplify error handling in btrfs_evict_inode() Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval
  11 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

btrfs_orphan_commit_root() tries to delete an orphan item for a
subvolume in the tree root, but we don't actually insert that item in
the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in
btrfs_orphan_add()"). We can get rid of it.

root->orphan_inodes was used to as a refcount for root->orphan_block_rsv
and for btrfs_orphan_commit_root(), but both are gone now, so we can get
rid of it, as well.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h       |  3 ---
 fs/btrfs/disk-io.c     |  1 -
 fs/btrfs/inode.c       | 40 ++++------------------------------------
 fs/btrfs/transaction.c |  1 -
 4 files changed, 4 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d66241a35fab..51408de11af2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1219,7 +1219,6 @@ struct btrfs_root {
 	spinlock_t log_extents_lock[2];
 	struct list_head logged_list[2];
 
-	atomic_t orphan_inodes;
 	int orphan_cleanup_state;
 
 	spinlock_t inode_lock;
@@ -3233,8 +3232,6 @@ int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
 int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 		struct btrfs_inode *inode);
 int btrfs_orphan_cleanup(struct btrfs_root *root);
-void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
-			      struct btrfs_root *root);
 int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
 void btrfs_invalidate_inodes(struct btrfs_root *root);
 void btrfs_add_delayed_iput(struct inode *inode);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 24e15e2080f4..4a40bfdddabc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1214,7 +1214,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
 	atomic_set(&root->log_batch, 0);
-	atomic_set(&root->orphan_inodes, 0);
 	refcount_set(&root->refs, 1);
 	atomic_set(&root->will_be_snapshotted, 0);
 	root->log_transid = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 50f10882a715..207e1d139b31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3292,30 +3292,6 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
-/*
- * This is called in transaction commit time. If there are no orphan files in
- * the subvolume, it removes the orphan item.
- */
-void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
-			      struct btrfs_root *root)
-{
-	struct btrfs_fs_info *fs_info = root->fs_info;
-	int ret;
-
-	if (atomic_read(&root->orphan_inodes) == 0 &&
-	    root->orphan_cleanup_state == ORPHAN_CLEANUP_DONE &&
-	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
-	    btrfs_root_refs(&root->root_item) > 0) {
-		ret = btrfs_del_orphan_item(trans, fs_info->tree_root,
-					    root->root_key.objectid);
-		if (ret)
-			btrfs_abort_transaction(trans, ret);
-		else
-			clear_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
-				  &root->state);
-	}
-}
-
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
@@ -3330,11 +3306,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
 			     &inode->runtime_flags))
 		return 0;
 
-	atomic_inc(&root->orphan_inodes);
-
 	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
 	if (ret && ret != -EEXIST) {
-		atomic_dec(&root->orphan_inodes);
 		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
 		btrfs_abort_transaction(trans, ret);
 		return ret;
@@ -3360,8 +3333,6 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 	if (trans)
 		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
 
-	atomic_dec(&root->orphan_inodes);
-
 	return ret;
 }
 
@@ -3519,12 +3490,11 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 		}
 
 		/*
-		 * add this inode to the orphan list so btrfs_orphan_del does
-		 * the proper thing when we hit it
+		 * Mark this inode as having an orphan item so
+		 * btrfs_orphan_del() does the proper thing when we hit it.
 		 */
 		set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 			&BTRFS_I(inode)->runtime_flags);
-		atomic_inc(&root->orphan_inodes);
 
 		nr_unlink++;
 
@@ -9146,11 +9116,9 @@ void btrfs_destroy_inode(struct inode *inode)
 		goto free;
 
 	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
-		     &BTRFS_I(inode)->runtime_flags)) {
-		btrfs_info(fs_info, "inode %llu still on the orphan list",
+		     &BTRFS_I(inode)->runtime_flags))
+		btrfs_info(fs_info, "inode %llu still has an orphan item",
 			   btrfs_ino(BTRFS_I(inode)));
-		atomic_dec(&root->orphan_inodes);
-	}
 
 	while (1) {
 		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c944b4769e3c..44af1edf15d1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1250,7 +1250,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 
 			btrfs_free_log(trans, root);
 			btrfs_update_reloc_root(trans, root);
-			btrfs_orphan_commit_root(trans, root);
 
 			btrfs_save_ino_cache(root, trans);
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 11/12] Btrfs: simplify error handling in btrfs_evict_inode()
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (9 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  2018-05-11  0:11 ` [PATCH v2 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

We have the same error handling code duplicated all over the place. Put
it all in one place, and while we're here, get rid of the weird
btrfs_orphan_del() trans == NULL case and just clear the orphan item bit
directly since that's all it does anymore.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 56 +++++++++++++++++-------------------------------
 1 file changed, 20 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 207e1d139b31..b28d5f251661 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3324,16 +3324,12 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
 			    struct btrfs_inode *inode)
 {
 	struct btrfs_root *root = inode->root;
-	int ret = 0;
 
 	if (!test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 				&inode->runtime_flags))
 		return 0;
 
-	if (trans)
-		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
-
-	return ret;
+	return btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
 }
 
 /*
@@ -5155,12 +5151,10 @@ void btrfs_evict_inode(struct inode *inode)
 	    ((btrfs_root_refs(&root->root_item) != 0 &&
 	      root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID) ||
 	     btrfs_is_free_space_inode(BTRFS_I(inode))))
-		goto no_delete;
+		goto out;
 
-	if (is_bad_inode(inode)) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-		goto no_delete;
-	}
+	if (is_bad_inode(inode))
+		goto clear_orphan;
 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
 	if (!special_file(inode->i_mode))
 		btrfs_wait_ordered_range(inode, 0, (u64)-1);
@@ -5170,26 +5164,22 @@ void btrfs_evict_inode(struct inode *inode)
 	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
 		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 				 &BTRFS_I(inode)->runtime_flags));
-		goto no_delete;
+		goto out;
 	}
 
 	if (inode->i_nlink > 0) {
 		BUG_ON(btrfs_root_refs(&root->root_item) != 0 &&
 		       root->root_key.objectid != BTRFS_ROOT_TREE_OBJECTID);
-		goto no_delete;
+		goto out;
 	}
 
 	ret = btrfs_commit_inode_delayed_inode(BTRFS_I(inode));
-	if (ret) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-		goto no_delete;
-	}
+	if (ret)
+		goto clear_orphan;
 
 	rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
-	if (!rsv) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-		goto no_delete;
-	}
+	if (!rsv)
+		goto clear_orphan;
 	rsv->size = min_size;
 	rsv->failfast = 1;
 
@@ -5197,11 +5187,8 @@ void btrfs_evict_inode(struct inode *inode)
 
 	while (1) {
 		trans = evict_refill_and_join(root, rsv, min_size);
-		if (IS_ERR(trans)) {
-			btrfs_orphan_del(NULL, BTRFS_I(inode));
-			btrfs_free_block_rsv(fs_info, rsv);
-			goto no_delete;
-		}
+		if (IS_ERR(trans))
+			goto free_rsv;
 
 		trans->block_rsv = rsv;
 
@@ -5210,11 +5197,8 @@ void btrfs_evict_inode(struct inode *inode)
 		btrfs_end_transaction(trans);
 		btrfs_btree_balance_dirty(fs_info);
 		if (ret) {
-			if (ret != -ENOSPC && ret != -EAGAIN) {
-				btrfs_orphan_del(NULL, BTRFS_I(inode));
-				btrfs_free_block_rsv(fs_info, rsv);
-				goto no_delete;
-			}
+			if (ret != -ENOSPC && ret != -EAGAIN)
+				goto free_rsv;
 		} else {
 			break;
 		}
@@ -5230,27 +5214,27 @@ void btrfs_evict_inode(struct inode *inode)
 	 * to add a mechanism for retrying these after a commit.
 	 */
 	trans = evict_refill_and_join(root, rsv, min_size);
-	if (IS_ERR(trans)) {
-		btrfs_orphan_del(NULL, BTRFS_I(inode));
-	} else {
+	if (!IS_ERR(trans)) {
 		trans->block_rsv = rsv;
 		btrfs_orphan_del(trans, BTRFS_I(inode));
 		trans->block_rsv = &fs_info->trans_block_rsv;
 		btrfs_end_transaction(trans);
 	}
 
-	btrfs_free_block_rsv(fs_info, rsv);
-
 	if (!(root == fs_info->tree_root ||
 	      root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID))
 		btrfs_return_ino(root, btrfs_ino(BTRFS_I(inode)));
 
+free_rsv:
+	btrfs_free_block_rsv(fs_info, rsv);
+clear_orphan:
 	/*
 	 * If we didn't successfully delete, the orphan item will still be in
 	 * the tree and we'll retry on the next mount. Again, we might also want
 	 * to retry these periodically in the future.
 	 */
-no_delete:
+	clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &BTRFS_I(inode)->runtime_flags);
+out:
 	btrfs_remove_delayed_node(BTRFS_I(inode));
 	clear_inode(inode);
 }
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion
  2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
                   ` (10 preceding siblings ...)
  2018-05-11  0:11 ` [PATCH v2 11/12] Btrfs: simplify error handling in btrfs_evict_inode() Omar Sandoval
@ 2018-05-11  0:11 ` Omar Sandoval
  11 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  0:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik

From: Omar Sandoval <osandov@fb.com>

btrfs_link() calls btrfs_orphan_del() if it's linking an O_TMPFILE but
it doesn't reserve space to do so. Even before the removal of the
orphan_block_rsv it wasn't using it.

Fixes: ef3b9af50bfa ("Btrfs: implement inode_operations callback tmpfile")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b28d5f251661..a23962105d8b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6491,8 +6491,9 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	 * 2 items for inode and inode ref
 	 * 2 items for dir items
 	 * 1 item for parent inode
+	 * 1 item for orphan item deletion if O_TMPFILE
 	 */
-	trans = btrfs_start_transaction(root, 5);
+	trans = btrfs_start_transaction(root, inode->i_nlink ? 5 : 6);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
 		trans = NULL;
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations
  2018-05-11  0:11 ` [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
@ 2018-05-11  6:38   ` Nikolay Borisov
  2018-05-11  6:51     ` Omar Sandoval
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Borisov @ 2018-05-11  6:38 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik



On 11.05.2018 03:11, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Currently, we keep space reserved for all inode orphan items until the
> inode is evicted (i.e., all references to it are dropped). We hit an
> issue where an application would keep a bunch of deleted files open (by
> design) and thus keep a large amount of space reserved, causing ENOSPC
> errors when other operations tried to reserve space. This long-standing
> reservation isn't absolutely necessary for a couple of reasons:
> 
> - We can almost always make the reservation we need or steal from the
>   global reserve for the orphan item
> - If we can't, it's not the end of the world if we drop the orphan item
>   on the floor and let the next mount clean it up
> 
> So, get rid of persistent reservation and just reserve space in
> btrfs_evict_inode().
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/btrfs_inode.h |  19 +++---
>  fs/btrfs/inode.c       | 142 ++++++++++++-----------------------------
>  2 files changed, 50 insertions(+), 111 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 234bae55b85d..2f466cf55790 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -20,16 +20,15 @@
>   * new data the application may have written before commit.
>   */
>  #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
> -#define BTRFS_INODE_ORPHAN_META_RESERVED	1
> -#define BTRFS_INODE_DUMMY			2
> -#define BTRFS_INODE_IN_DEFRAG			3
> -#define BTRFS_INODE_HAS_ORPHAN_ITEM		4
> -#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
> -#define BTRFS_INODE_NEEDS_FULL_SYNC		6
> -#define BTRFS_INODE_COPY_EVERYTHING		7
> -#define BTRFS_INODE_IN_DELALLOC_LIST		8
> -#define BTRFS_INODE_READDIO_NEED_LOCK		9
> -#define BTRFS_INODE_HAS_PROPS		        10
> +#define BTRFS_INODE_DUMMY			1
> +#define BTRFS_INODE_IN_DEFRAG			2
> +#define BTRFS_INODE_HAS_ORPHAN_ITEM		3
> +#define BTRFS_INODE_HAS_ASYNC_EXTENT		4
> +#define BTRFS_INODE_NEEDS_FULL_SYNC		5
> +#define BTRFS_INODE_COPY_EVERYTHING		6
> +#define BTRFS_INODE_IN_DELALLOC_LIST		7
> +#define BTRFS_INODE_READDIO_NEED_LOCK		8
> +#define BTRFS_INODE_HAS_PROPS		        9
>  
>  /* in memory btrfs inode */
>  struct btrfs_inode {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 348dc57920f5..b9a046b8c72c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3343,88 +3343,25 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
>  /*
>   * This creates an orphan entry for the given inode in case something goes wrong
>   * in the middle of an unlink.
> - *
> - * NOTE: caller of this function should reserve 5 units of metadata for
> - *	 this function.
>   */
>  int btrfs_orphan_add(struct btrfs_trans_handle *trans,
>  		struct btrfs_inode *inode)
>  {
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
>  	struct btrfs_root *root = inode->root;
> -	struct btrfs_block_rsv *block_rsv = NULL;
> -	int reserve = 0;
> -	bool insert = false;
>  	int ret;
>  
> -	if (!root->orphan_block_rsv) {
> -		block_rsv = btrfs_alloc_block_rsv(fs_info,
> -						  BTRFS_BLOCK_RSV_TEMP);
> -		if (!block_rsv)
> -			return -ENOMEM;
> -	}
> -
> -	if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> -			      &inode->runtime_flags))
> -		insert = true;
> -
> -	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> -			      &inode->runtime_flags))
> -		reserve = 1;
> -
> -	spin_lock(&root->orphan_lock);
> -	/* If someone has created ->orphan_block_rsv, be happy to use it. */
> -	if (!root->orphan_block_rsv) {
> -		root->orphan_block_rsv = block_rsv;
> -	} else if (block_rsv) {
> -		btrfs_free_block_rsv(fs_info, block_rsv);
> -		block_rsv = NULL;
> -	}
> +	if (test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> +			     &inode->runtime_flags))
> +		return 0;

How come this can return true? Shouldn't btrfs_orphan_add always be
called for an inode which doesn't have an orphan item? Having this check
seems to indicate there is some non-determinism in the lifetime of
orphan items.

>  
> -	if (insert)
> -		atomic_inc(&root->orphan_inodes);
> -	spin_unlock(&root->orphan_lock);
> +	atomic_inc(&root->orphan_inodes);
>  
> -	/* grab metadata reservation from transaction handle */
> -	if (reserve) {
> -		ret = btrfs_orphan_reserve_metadata(trans, inode);
> -		ASSERT(!ret);
> -		if (ret) {
> -			/*
> -			 * dec doesn't need spin_lock as ->orphan_block_rsv
> -			 * would be released only if ->orphan_inodes is
> -			 * zero.
> -			 */
> -			atomic_dec(&root->orphan_inodes);
> -			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> -				  &inode->runtime_flags);
> -			if (insert)
> -				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> -					  &inode->runtime_flags);
> -			return ret;
> -		}
> -	}
> -
> -	/* insert an orphan item to track this unlinked file */
> -	if (insert) {
> -		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> -		if (ret && ret != -EEXIST) {
> -			if (reserve) {
> -				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> -					  &inode->runtime_flags);
> -				btrfs_orphan_release_metadata(inode);
> -			}
> -			/*
> -			 * btrfs_orphan_commit_root may race with us and set
> -			 * ->orphan_block_rsv to zero, in order to avoid that,
> -			 * decrease ->orphan_inodes after everything is done.
> -			 */
> -			atomic_dec(&root->orphan_inodes);
> -			clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> -				  &inode->runtime_flags);
> -			btrfs_abort_transaction(trans, ret);
> -			return ret;
> -		}
> +	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> +	if (ret && ret != -EEXIST) {
> +		atomic_dec(&root->orphan_inodes);
> +		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
> +		btrfs_abort_transaction(trans, ret);
> +		return ret;
>  	}
>  
>  	return 0;
> @@ -3438,27 +3375,16 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
>  			    struct btrfs_inode *inode)
>  {
>  	struct btrfs_root *root = inode->root;
> -	int delete_item = 0;
>  	int ret = 0;
>  
> -	if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> -			       &inode->runtime_flags))
> -		delete_item = 1;
> +	if (!test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> +				&inode->runtime_flags))
> +		return 0;

Similar comment as in btrfs_orphan_del. Shouldn't this always follow a
successful btrfs_orphan_add, guaranteeing there is an item for this
inode? Are there some "benign" races that could happen in the mean time
which could trigger this check ?

>  
> -	if (delete_item && trans)
> +	if (trans)
>  		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
>  
> -	if (test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> -			       &inode->runtime_flags))
> -		btrfs_orphan_release_metadata(inode);
> -
> -	/*
> -	 * btrfs_orphan_commit_root may race with us and set ->orphan_block_rsv
> -	 * to zero, in order to avoid that, decrease ->orphan_inodes after
> -	 * everything is done.
> -	 */
> -	if (delete_item)
> -		atomic_dec(&root->orphan_inodes);
> +	atomic_dec(&root->orphan_inodes);
>  
>  	return ret;
>  }
> @@ -5339,10 +5265,10 @@ void btrfs_evict_inode(struct inode *inode)
>  		trans->block_rsv = rsv;
>  
>  		ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
> +		trans->block_rsv = &fs_info->trans_block_rsv;
> +		btrfs_end_transaction(trans);
> +		btrfs_btree_balance_dirty(fs_info);
>  		if (ret) {
> -			trans->block_rsv = &fs_info->trans_block_rsv;
> -			btrfs_end_transaction(trans);
> -			btrfs_btree_balance_dirty(fs_info);
>  			if (ret != -ENOSPC && ret != -EAGAIN) {
>  				btrfs_orphan_del(NULL, BTRFS_I(inode));
>  				btrfs_free_block_rsv(fs_info, rsv);
> @@ -5353,22 +5279,36 @@ void btrfs_evict_inode(struct inode *inode)
>  		}
>  	}
>  
> -	btrfs_free_block_rsv(fs_info, rsv);
> -
>  	/*
> -	 * Errors here aren't a big deal, it just means we leave orphan items
> -	 * in the tree.  They will be cleaned up on the next mount.
> +	 * Errors here aren't a big deal, it just means we leave orphan items in
> +	 * the tree. They will be cleaned up on the next mount. If the inode
> +	 * number gets reused, cleanup deletes the orphan item without doing
> +	 * anything, and unlink reuses the existing orphan item.
> +	 *
> +	 * If it turns out that we are dropping too many of these, we might want
> +	 * to add a mechanism for retrying these after a commit.
>  	 */
> -	trans->block_rsv = root->orphan_block_rsv;
> -	btrfs_orphan_del(trans, BTRFS_I(inode));
> +	trans = evict_refill_and_join(root, rsv, min_size);
> +	if (IS_ERR(trans)) {
> +		btrfs_orphan_del(NULL, BTRFS_I(inode));
> +	} else {
> +		trans->block_rsv = rsv;
> +		btrfs_orphan_del(trans, BTRFS_I(inode));
> +		trans->block_rsv = &fs_info->trans_block_rsv;
> +		btrfs_end_transaction(trans);
> +	}
> +
> +	btrfs_free_block_rsv(fs_info, rsv);
>  
> -	trans->block_rsv = &fs_info->trans_block_rsv;
>  	if (!(root == fs_info->tree_root ||
>  	      root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID))
>  		btrfs_return_ino(root, btrfs_ino(BTRFS_I(inode)));
>  
> -	btrfs_end_transaction(trans);
> -	btrfs_btree_balance_dirty(fs_info);
> +	/*
> +	 * If we didn't successfully delete, the orphan item will still be in
> +	 * the tree and we'll retry on the next mount. Again, we might also want
> +	 * to retry these periodically in the future.
> +	 */
>  no_delete:
>  	btrfs_remove_delayed_node(BTRFS_I(inode));
>  	clear_inode(inode);
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock
  2018-05-11  0:11 ` [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock Omar Sandoval
@ 2018-05-11  6:44   ` Nikolay Borisov
  2018-05-11  6:48     ` Omar Sandoval
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Borisov @ 2018-05-11  6:44 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik



On 11.05.2018 03:11, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Now that we don't keep long-standing reservations for orphan items,
> root->orphan_block_rsv isn't used. We can git rid of it and
> root->orphan_lock, which was used to protect it.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/ctree.h       |  5 -----
>  fs/btrfs/disk-io.c     |  8 --------
>  fs/btrfs/extent-tree.c | 38 --------------------------------------
>  fs/btrfs/inode.c       | 41 ++++++-----------------------------------
>  4 files changed, 6 insertions(+), 86 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2771cc56a622..d66241a35fab 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1219,9 +1219,7 @@ struct btrfs_root {
>  	spinlock_t log_extents_lock[2];
>  	struct list_head logged_list[2];
>  
> -	spinlock_t orphan_lock;
>  	atomic_t orphan_inodes;
> -	struct btrfs_block_rsv *orphan_block_rsv;
>  	int orphan_cleanup_state;
>  
>  	spinlock_t inode_lock;
> @@ -2764,9 +2762,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
>  void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
>  					    u64 len);
>  void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
> -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> -				  struct btrfs_inode *inode);
> -void btrfs_orphan_release_metadata(struct btrfs_inode *inode);
>  int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
>  				     struct btrfs_block_rsv *rsv,
>  				     int nitems,
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60caa68c3618..24e15e2080f4 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1185,7 +1185,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
>  	root->inode_tree = RB_ROOT;
>  	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
>  	root->block_rsv = NULL;
> -	root->orphan_block_rsv = NULL;
>  
>  	INIT_LIST_HEAD(&root->dirty_list);
>  	INIT_LIST_HEAD(&root->root_list);
> @@ -1195,7 +1194,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
>  	INIT_LIST_HEAD(&root->ordered_root);
>  	INIT_LIST_HEAD(&root->logged_list[0]);
>  	INIT_LIST_HEAD(&root->logged_list[1]);
> -	spin_lock_init(&root->orphan_lock);
>  	spin_lock_init(&root->inode_lock);
>  	spin_lock_init(&root->delalloc_lock);
>  	spin_lock_init(&root->ordered_extent_lock);
> @@ -3674,8 +3672,6 @@ static void free_fs_root(struct btrfs_root *root)
>  {
>  	iput(root->ino_cache_inode);
>  	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
> -	btrfs_free_block_rsv(root->fs_info, root->orphan_block_rsv);
> -	root->orphan_block_rsv = NULL;
>  	if (root->anon_dev)
>  		free_anon_bdev(root->anon_dev);
>  	if (root->subv_writers)
> @@ -3766,7 +3762,6 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
>  
>  void close_ctree(struct btrfs_fs_info *fs_info)
>  {
> -	struct btrfs_root *root = fs_info->tree_root;
>  	int ret;
>  
>  	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
> @@ -3861,9 +3856,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
>  	btrfs_free_stripe_hash_table(fs_info);
>  	btrfs_free_ref_cache(fs_info);
>  
> -	__btrfs_free_block_rsv(root->orphan_block_rsv);
> -	root->orphan_block_rsv = NULL;
> -
>  	while (!list_empty(&fs_info->pinned_chunks)) {
>  		struct extent_map *em;
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 51b5e2da708c..3f2e026bc206 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5949,44 +5949,6 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
>  	trans->chunk_bytes_reserved = 0;
>  }
>  
> -/* Can only return 0 or -ENOSPC */
> -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> -				  struct btrfs_inode *inode)
> -{
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> -	struct btrfs_root *root = inode->root;
> -	/*
> -	 * We always use trans->block_rsv here as we will have reserved space
> -	 * for our orphan when starting the transaction, using get_block_rsv()
> -	 * here will sometimes make us choose the wrong block rsv as we could be
> -	 * doing a reloc inode for a non refcounted root.
> -	 */
> -	struct btrfs_block_rsv *src_rsv = trans->block_rsv;
> -	struct btrfs_block_rsv *dst_rsv = root->orphan_block_rsv;
> -
> -	/*
> -	 * We need to hold space in order to delete our orphan item once we've
> -	 * added it, so this takes the reservation so we can release it later
> -	 * when we are truly done with the orphan item.
> -	 */
> -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> -
> -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> -			num_bytes, 1);
> -	return btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
> -}
> -
> -void btrfs_orphan_release_metadata(struct btrfs_inode *inode)
> -{
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> -	struct btrfs_root *root = inode->root;
> -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> -
> -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> -			num_bytes, 0);
> -	btrfs_block_rsv_release(fs_info, root->orphan_block_rsv, num_bytes);
> -}
> -
>  /*
>   * btrfs_subvolume_reserve_metadata() - reserve space for subvolume operation
>   * root: the root of the parent directory
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b9a046b8c72c..50f10882a715 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3293,37 +3293,18 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>  }
>  
>  /*
> - * This is called in transaction commit time. If there are no orphan
> - * files in the subvolume, it removes orphan item and frees block_rsv
> - * structure.
> + * This is called in transaction commit time. If there are no orphan files in
                     ^^
nit:               s/in/at
> + * the subvolume, it removes the orphan item.
>   */
>  void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
>  			      struct btrfs_root *root)
>  {

This function is called only in transaction.c, why not unexport it and
move it to transaction.c?

>  	struct btrfs_fs_info *fs_info = root->fs_info;
> -	struct btrfs_block_rsv *block_rsv;
>  	int ret;
>  
> -	if (atomic_read(&root->orphan_inodes) ||
> -	    root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
> -		return;
> -
> -	spin_lock(&root->orphan_lock);
> -	if (atomic_read(&root->orphan_inodes)) {
> -		spin_unlock(&root->orphan_lock);
> -		return;
> -	}
> -
> -	if (root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) {
> -		spin_unlock(&root->orphan_lock);
> -		return;
> -	}
> -
> -	block_rsv = root->orphan_block_rsv;
> -	root->orphan_block_rsv = NULL;
> -	spin_unlock(&root->orphan_lock);
> -
> -	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
> +	if (atomic_read(&root->orphan_inodes) == 0 &&
> +	    root->orphan_cleanup_state == ORPHAN_CLEANUP_DONE &&
> +	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
>  	    btrfs_root_refs(&root->root_item) > 0) {
>  		ret = btrfs_del_orphan_item(trans, fs_info->tree_root,
>  					    root->root_key.objectid);
> @@ -3333,11 +3314,6 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
>  			clear_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
>  				  &root->state);
>  	}
> -
> -	if (block_rsv) {
> -		WARN_ON(block_rsv->size > 0);
> -		btrfs_free_block_rsv(fs_info, block_rsv);
> -	}
>  }
>  
>  /*
> @@ -3562,12 +3538,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
>  
>  	root->orphan_cleanup_state = ORPHAN_CLEANUP_DONE;
>  
> -	if (root->orphan_block_rsv)
> -		btrfs_block_rsv_release(fs_info, root->orphan_block_rsv,
> -					(u64)-1);
> -
> -	if (root->orphan_block_rsv ||
> -	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
> +	if (test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state)) {
>  		trans = btrfs_join_transaction(root);
>  		if (!IS_ERR(trans))
>  			btrfs_end_transaction(trans);
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock
  2018-05-11  6:44   ` Nikolay Borisov
@ 2018-05-11  6:48     ` Omar Sandoval
  2018-05-11  7:20       ` Omar Sandoval
  0 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:48 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik

On Fri, May 11, 2018 at 09:44:36AM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 03:11, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Now that we don't keep long-standing reservations for orphan items,
> > root->orphan_block_rsv isn't used. We can git rid of it and
> > root->orphan_lock, which was used to protect it.
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  fs/btrfs/ctree.h       |  5 -----
> >  fs/btrfs/disk-io.c     |  8 --------
> >  fs/btrfs/extent-tree.c | 38 --------------------------------------
> >  fs/btrfs/inode.c       | 41 ++++++-----------------------------------
> >  4 files changed, 6 insertions(+), 86 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 2771cc56a622..d66241a35fab 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1219,9 +1219,7 @@ struct btrfs_root {
> >  	spinlock_t log_extents_lock[2];
> >  	struct list_head logged_list[2];
> >  
> > -	spinlock_t orphan_lock;
> >  	atomic_t orphan_inodes;
> > -	struct btrfs_block_rsv *orphan_block_rsv;
> >  	int orphan_cleanup_state;
> >  
> >  	spinlock_t inode_lock;
> > @@ -2764,9 +2762,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
> >  void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
> >  					    u64 len);
> >  void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
> > -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> > -				  struct btrfs_inode *inode);
> > -void btrfs_orphan_release_metadata(struct btrfs_inode *inode);
> >  int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
> >  				     struct btrfs_block_rsv *rsv,
> >  				     int nitems,
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 60caa68c3618..24e15e2080f4 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -1185,7 +1185,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
> >  	root->inode_tree = RB_ROOT;
> >  	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
> >  	root->block_rsv = NULL;
> > -	root->orphan_block_rsv = NULL;
> >  
> >  	INIT_LIST_HEAD(&root->dirty_list);
> >  	INIT_LIST_HEAD(&root->root_list);
> > @@ -1195,7 +1194,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
> >  	INIT_LIST_HEAD(&root->ordered_root);
> >  	INIT_LIST_HEAD(&root->logged_list[0]);
> >  	INIT_LIST_HEAD(&root->logged_list[1]);
> > -	spin_lock_init(&root->orphan_lock);
> >  	spin_lock_init(&root->inode_lock);
> >  	spin_lock_init(&root->delalloc_lock);
> >  	spin_lock_init(&root->ordered_extent_lock);
> > @@ -3674,8 +3672,6 @@ static void free_fs_root(struct btrfs_root *root)
> >  {
> >  	iput(root->ino_cache_inode);
> >  	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
> > -	btrfs_free_block_rsv(root->fs_info, root->orphan_block_rsv);
> > -	root->orphan_block_rsv = NULL;
> >  	if (root->anon_dev)
> >  		free_anon_bdev(root->anon_dev);
> >  	if (root->subv_writers)
> > @@ -3766,7 +3762,6 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
> >  
> >  void close_ctree(struct btrfs_fs_info *fs_info)
> >  {
> > -	struct btrfs_root *root = fs_info->tree_root;
> >  	int ret;
> >  
> >  	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
> > @@ -3861,9 +3856,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> >  	btrfs_free_stripe_hash_table(fs_info);
> >  	btrfs_free_ref_cache(fs_info);
> >  
> > -	__btrfs_free_block_rsv(root->orphan_block_rsv);
> > -	root->orphan_block_rsv = NULL;
> > -
> >  	while (!list_empty(&fs_info->pinned_chunks)) {
> >  		struct extent_map *em;
> >  
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 51b5e2da708c..3f2e026bc206 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -5949,44 +5949,6 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
> >  	trans->chunk_bytes_reserved = 0;
> >  }
> >  
> > -/* Can only return 0 or -ENOSPC */
> > -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> > -				  struct btrfs_inode *inode)
> > -{
> > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> > -	struct btrfs_root *root = inode->root;
> > -	/*
> > -	 * We always use trans->block_rsv here as we will have reserved space
> > -	 * for our orphan when starting the transaction, using get_block_rsv()
> > -	 * here will sometimes make us choose the wrong block rsv as we could be
> > -	 * doing a reloc inode for a non refcounted root.
> > -	 */
> > -	struct btrfs_block_rsv *src_rsv = trans->block_rsv;
> > -	struct btrfs_block_rsv *dst_rsv = root->orphan_block_rsv;
> > -
> > -	/*
> > -	 * We need to hold space in order to delete our orphan item once we've
> > -	 * added it, so this takes the reservation so we can release it later
> > -	 * when we are truly done with the orphan item.
> > -	 */
> > -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> > -
> > -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> > -			num_bytes, 1);
> > -	return btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
> > -}
> > -
> > -void btrfs_orphan_release_metadata(struct btrfs_inode *inode)
> > -{
> > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> > -	struct btrfs_root *root = inode->root;
> > -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> > -
> > -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> > -			num_bytes, 0);
> > -	btrfs_block_rsv_release(fs_info, root->orphan_block_rsv, num_bytes);
> > -}
> > -
> >  /*
> >   * btrfs_subvolume_reserve_metadata() - reserve space for subvolume operation
> >   * root: the root of the parent directory
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b9a046b8c72c..50f10882a715 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -3293,37 +3293,18 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> >  }
> >  
> >  /*
> > - * This is called in transaction commit time. If there are no orphan
> > - * files in the subvolume, it removes orphan item and frees block_rsv
> > - * structure.
> > + * This is called in transaction commit time. If there are no orphan files in
>                      ^^
> nit:               s/in/at
> > + * the subvolume, it removes the orphan item.
> >   */
> >  void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> >  			      struct btrfs_root *root)
> >  {
> 
> This function is called only in transaction.c, why not unexport it and
> move it to transaction.c?

It goes away entirely in the next patch :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations
  2018-05-11  6:38   ` Nikolay Borisov
@ 2018-05-11  6:51     ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  6:51 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik

On Fri, May 11, 2018 at 09:38:15AM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 03:11, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Currently, we keep space reserved for all inode orphan items until the
> > inode is evicted (i.e., all references to it are dropped). We hit an
> > issue where an application would keep a bunch of deleted files open (by
> > design) and thus keep a large amount of space reserved, causing ENOSPC
> > errors when other operations tried to reserve space. This long-standing
> > reservation isn't absolutely necessary for a couple of reasons:
> > 
> > - We can almost always make the reservation we need or steal from the
> >   global reserve for the orphan item
> > - If we can't, it's not the end of the world if we drop the orphan item
> >   on the floor and let the next mount clean it up
> > 
> > So, get rid of persistent reservation and just reserve space in
> > btrfs_evict_inode().
> > 
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  fs/btrfs/btrfs_inode.h |  19 +++---
> >  fs/btrfs/inode.c       | 142 ++++++++++++-----------------------------
> >  2 files changed, 50 insertions(+), 111 deletions(-)
> > 
> > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> > index 234bae55b85d..2f466cf55790 100644
> > --- a/fs/btrfs/btrfs_inode.h
> > +++ b/fs/btrfs/btrfs_inode.h
> > @@ -20,16 +20,15 @@
> >   * new data the application may have written before commit.
> >   */
> >  #define BTRFS_INODE_ORDERED_DATA_CLOSE		0
> > -#define BTRFS_INODE_ORPHAN_META_RESERVED	1
> > -#define BTRFS_INODE_DUMMY			2
> > -#define BTRFS_INODE_IN_DEFRAG			3
> > -#define BTRFS_INODE_HAS_ORPHAN_ITEM		4
> > -#define BTRFS_INODE_HAS_ASYNC_EXTENT		5
> > -#define BTRFS_INODE_NEEDS_FULL_SYNC		6
> > -#define BTRFS_INODE_COPY_EVERYTHING		7
> > -#define BTRFS_INODE_IN_DELALLOC_LIST		8
> > -#define BTRFS_INODE_READDIO_NEED_LOCK		9
> > -#define BTRFS_INODE_HAS_PROPS		        10
> > +#define BTRFS_INODE_DUMMY			1
> > +#define BTRFS_INODE_IN_DEFRAG			2
> > +#define BTRFS_INODE_HAS_ORPHAN_ITEM		3
> > +#define BTRFS_INODE_HAS_ASYNC_EXTENT		4
> > +#define BTRFS_INODE_NEEDS_FULL_SYNC		5
> > +#define BTRFS_INODE_COPY_EVERYTHING		6
> > +#define BTRFS_INODE_IN_DELALLOC_LIST		7
> > +#define BTRFS_INODE_READDIO_NEED_LOCK		8
> > +#define BTRFS_INODE_HAS_PROPS		        9
> >  
> >  /* in memory btrfs inode */
> >  struct btrfs_inode {
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 348dc57920f5..b9a046b8c72c 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -3343,88 +3343,25 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> >  /*
> >   * This creates an orphan entry for the given inode in case something goes wrong
> >   * in the middle of an unlink.
> > - *
> > - * NOTE: caller of this function should reserve 5 units of metadata for
> > - *	 this function.
> >   */
> >  int btrfs_orphan_add(struct btrfs_trans_handle *trans,
> >  		struct btrfs_inode *inode)
> >  {
> > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> >  	struct btrfs_root *root = inode->root;
> > -	struct btrfs_block_rsv *block_rsv = NULL;
> > -	int reserve = 0;
> > -	bool insert = false;
> >  	int ret;
> >  
> > -	if (!root->orphan_block_rsv) {
> > -		block_rsv = btrfs_alloc_block_rsv(fs_info,
> > -						  BTRFS_BLOCK_RSV_TEMP);
> > -		if (!block_rsv)
> > -			return -ENOMEM;
> > -	}
> > -
> > -	if (!test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -			      &inode->runtime_flags))
> > -		insert = true;
> > -
> > -	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -			      &inode->runtime_flags))
> > -		reserve = 1;
> > -
> > -	spin_lock(&root->orphan_lock);
> > -	/* If someone has created ->orphan_block_rsv, be happy to use it. */
> > -	if (!root->orphan_block_rsv) {
> > -		root->orphan_block_rsv = block_rsv;
> > -	} else if (block_rsv) {
> > -		btrfs_free_block_rsv(fs_info, block_rsv);
> > -		block_rsv = NULL;
> > -	}
> > +	if (test_and_set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > +			     &inode->runtime_flags))
> > +		return 0;
> 
> How come this can return true? Shouldn't btrfs_orphan_add always be
> called for an inode which doesn't have an orphan item? Having this check
> seems to indicate there is some non-determinism in the lifetime of
> orphan items.

Another great point, this was needed when we also inserted orphan items
for truncate, but it's not needed anymore. I'll clean that up, too.

> > -	if (insert)
> > -		atomic_inc(&root->orphan_inodes);
> > -	spin_unlock(&root->orphan_lock);
> > +	atomic_inc(&root->orphan_inodes);
> >  
> > -	/* grab metadata reservation from transaction handle */
> > -	if (reserve) {
> > -		ret = btrfs_orphan_reserve_metadata(trans, inode);
> > -		ASSERT(!ret);
> > -		if (ret) {
> > -			/*
> > -			 * dec doesn't need spin_lock as ->orphan_block_rsv
> > -			 * would be released only if ->orphan_inodes is
> > -			 * zero.
> > -			 */
> > -			atomic_dec(&root->orphan_inodes);
> > -			clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -				  &inode->runtime_flags);
> > -			if (insert)
> > -				clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -					  &inode->runtime_flags);
> > -			return ret;
> > -		}
> > -	}
> > -
> > -	/* insert an orphan item to track this unlinked file */
> > -	if (insert) {
> > -		ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> > -		if (ret && ret != -EEXIST) {
> > -			if (reserve) {
> > -				clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
> > -					  &inode->runtime_flags);
> > -				btrfs_orphan_release_metadata(inode);
> > -			}
> > -			/*
> > -			 * btrfs_orphan_commit_root may race with us and set
> > -			 * ->orphan_block_rsv to zero, in order to avoid that,
> > -			 * decrease ->orphan_inodes after everything is done.
> > -			 */
> > -			atomic_dec(&root->orphan_inodes);
> > -			clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -				  &inode->runtime_flags);
> > -			btrfs_abort_transaction(trans, ret);
> > -			return ret;
> > -		}
> > +	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
> > +	if (ret && ret != -EEXIST) {
> > +		atomic_dec(&root->orphan_inodes);
> > +		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
> > +		btrfs_abort_transaction(trans, ret);
> > +		return ret;
> >  	}
> >  
> >  	return 0;
> > @@ -3438,27 +3375,16 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
> >  			    struct btrfs_inode *inode)
> >  {
> >  	struct btrfs_root *root = inode->root;
> > -	int delete_item = 0;
> >  	int ret = 0;
> >  
> > -	if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > -			       &inode->runtime_flags))
> > -		delete_item = 1;
> > +	if (!test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> > +				&inode->runtime_flags))
> > +		return 0;
> 
> Similar comment as in btrfs_orphan_del. Shouldn't this always follow a
> successful btrfs_orphan_add, guaranteeing there is an item for this
> inode? Are there some "benign" races that could happen in the mean time
> which could trigger this check ?

Same thing here, and it's even easier to convince myself here that it's
unnecessary: btrfs_evict_inode() ignores errors if something magical did
happen, and we really expect the orphan item to be there for an
O_TMPFILE in btrfs_link().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes
  2018-05-11  0:11 ` [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes Omar Sandoval
@ 2018-05-11  7:01   ` Nikolay Borisov
  2018-05-11  7:05     ` Omar Sandoval
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Borisov @ 2018-05-11  7:01 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Chris Mason, Josef Bacik



On 11.05.2018 03:11, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> btrfs_orphan_commit_root() tries to delete an orphan item for a
> subvolume in the tree root, but we don't actually insert that item in
> the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in
> btrfs_orphan_add()"). We can get rid of it.
> 
> root->orphan_inodes was used to as a refcount for root->orphan_block_rsv
> and for btrfs_orphan_commit_root(), but both are gone now, so we can get
> rid of it, as well.

So this really means that the logic for orphaned root item is completely
defunct. Perhaps:

a) This commit should be expanded to delete all the logic surrounding
orphaned roots: btrfs_find_orphan_roots, BTRFS_ROOT_ORPHAN_ITEM_INSERTED
flag define and any accompanying logic that utilizes this flag (end of
btrfs_orphan_cleanup, orphan root logic at the end of btrfs_get_fs_root
as well)

b) Have those changes in a separate patch



> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/ctree.h       |  3 ---
>  fs/btrfs/disk-io.c     |  1 -
>  fs/btrfs/inode.c       | 40 ++++------------------------------------
>  fs/btrfs/transaction.c |  1 -
>  4 files changed, 4 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index d66241a35fab..51408de11af2 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1219,7 +1219,6 @@ struct btrfs_root {
>  	spinlock_t log_extents_lock[2];
>  	struct list_head logged_list[2];
>  
> -	atomic_t orphan_inodes;
>  	int orphan_cleanup_state;
>  
>  	spinlock_t inode_lock;
> @@ -3233,8 +3232,6 @@ int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
>  int btrfs_orphan_add(struct btrfs_trans_handle *trans,
>  		struct btrfs_inode *inode);
>  int btrfs_orphan_cleanup(struct btrfs_root *root);
> -void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> -			      struct btrfs_root *root);
>  int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
>  void btrfs_invalidate_inodes(struct btrfs_root *root);
>  void btrfs_add_delayed_iput(struct inode *inode);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 24e15e2080f4..4a40bfdddabc 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1214,7 +1214,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
>  	atomic_set(&root->log_commit[1], 0);
>  	atomic_set(&root->log_writers, 0);
>  	atomic_set(&root->log_batch, 0);
> -	atomic_set(&root->orphan_inodes, 0);
>  	refcount_set(&root->refs, 1);
>  	atomic_set(&root->will_be_snapshotted, 0);
>  	root->log_transid = 0;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 50f10882a715..207e1d139b31 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3292,30 +3292,6 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>  	spin_unlock(&fs_info->delayed_iput_lock);
>  }
>  
> -/*
> - * This is called in transaction commit time. If there are no orphan files in
> - * the subvolume, it removes the orphan item.
> - */
> -void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> -			      struct btrfs_root *root)
> -{
> -	struct btrfs_fs_info *fs_info = root->fs_info;
> -	int ret;
> -
> -	if (atomic_read(&root->orphan_inodes) == 0 &&
> -	    root->orphan_cleanup_state == ORPHAN_CLEANUP_DONE &&
> -	    test_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &root->state) &&
> -	    btrfs_root_refs(&root->root_item) > 0) {
> -		ret = btrfs_del_orphan_item(trans, fs_info->tree_root,
> -					    root->root_key.objectid);
> -		if (ret)
> -			btrfs_abort_transaction(trans, ret);
> -		else
> -			clear_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED,
> -				  &root->state);
> -	}
> -}
> -
>  /*
>   * This creates an orphan entry for the given inode in case something goes wrong
>   * in the middle of an unlink.
> @@ -3330,11 +3306,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
>  			     &inode->runtime_flags))
>  		return 0;
>  
> -	atomic_inc(&root->orphan_inodes);
> -
>  	ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
>  	if (ret && ret != -EEXIST) {
> -		atomic_dec(&root->orphan_inodes);
>  		clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, &inode->runtime_flags);
>  		btrfs_abort_transaction(trans, ret);
>  		return ret;
> @@ -3360,8 +3333,6 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
>  	if (trans)
>  		ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
>  
> -	atomic_dec(&root->orphan_inodes);
> -
>  	return ret;
>  }
>  
> @@ -3519,12 +3490,11 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
>  		}
>  
>  		/*
> -		 * add this inode to the orphan list so btrfs_orphan_del does
> -		 * the proper thing when we hit it
> +		 * Mark this inode as having an orphan item so
> +		 * btrfs_orphan_del() does the proper thing when we hit it.
>  		 */
>  		set_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>  			&BTRFS_I(inode)->runtime_flags);
> -		atomic_inc(&root->orphan_inodes);
>  
>  		nr_unlink++;
>  
> @@ -9146,11 +9116,9 @@ void btrfs_destroy_inode(struct inode *inode)
>  		goto free;
>  
>  	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> -		     &BTRFS_I(inode)->runtime_flags)) {
> -		btrfs_info(fs_info, "inode %llu still on the orphan list",
> +		     &BTRFS_I(inode)->runtime_flags))
> +		btrfs_info(fs_info, "inode %llu still has an orphan item",
>  			   btrfs_ino(BTRFS_I(inode)));
> -		atomic_dec(&root->orphan_inodes);
> -	}
>  
>  	while (1) {
>  		ordered = btrfs_lookup_first_ordered_extent(inode, (u64)-1);
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index c944b4769e3c..44af1edf15d1 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1250,7 +1250,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>  
>  			btrfs_free_log(trans, root);
>  			btrfs_update_reloc_root(trans, root);
> -			btrfs_orphan_commit_root(trans, root);
>  
>  			btrfs_save_ino_cache(root, trans);
>  
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes
  2018-05-11  7:01   ` Nikolay Borisov
@ 2018-05-11  7:05     ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  7:05 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik

On Fri, May 11, 2018 at 10:01:34AM +0300, Nikolay Borisov wrote:
> 
> 
> On 11.05.2018 03:11, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > btrfs_orphan_commit_root() tries to delete an orphan item for a
> > subvolume in the tree root, but we don't actually insert that item in
> > the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in
> > btrfs_orphan_add()"). We can get rid of it.
> > 
> > root->orphan_inodes was used to as a refcount for root->orphan_block_rsv
> > and for btrfs_orphan_commit_root(), but both are gone now, so we can get
> > rid of it, as well.
> 
> So this really means that the logic for orphaned root item is completely
> defunct. Perhaps:
> 
> a) This commit should be expanded to delete all the logic surrounding
> orphaned roots: btrfs_find_orphan_roots, BTRFS_ROOT_ORPHAN_ITEM_INSERTED
> flag define and any accompanying logic that utilizes this flag (end of
> btrfs_orphan_cleanup, orphan root logic at the end of btrfs_get_fs_root
> as well)
> 
> b) Have those changes in a separate patch

The root orphan handling is still necessary for subvolume deletion.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock
  2018-05-11  6:48     ` Omar Sandoval
@ 2018-05-11  7:20       ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-11  7:20 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs, kernel-team, Chris Mason, Josef Bacik

On Thu, May 10, 2018 at 11:48:28PM -0700, Omar Sandoval wrote:
> On Fri, May 11, 2018 at 09:44:36AM +0300, Nikolay Borisov wrote:
> > 
> > 
> > On 11.05.2018 03:11, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > Now that we don't keep long-standing reservations for orphan items,
> > > root->orphan_block_rsv isn't used. We can git rid of it and
> > > root->orphan_lock, which was used to protect it.
> > > 
> > > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > > ---
> > >  fs/btrfs/ctree.h       |  5 -----
> > >  fs/btrfs/disk-io.c     |  8 --------
> > >  fs/btrfs/extent-tree.c | 38 --------------------------------------
> > >  fs/btrfs/inode.c       | 41 ++++++-----------------------------------
> > >  4 files changed, 6 insertions(+), 86 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > index 2771cc56a622..d66241a35fab 100644
> > > --- a/fs/btrfs/ctree.h
> > > +++ b/fs/btrfs/ctree.h
> > > @@ -1219,9 +1219,7 @@ struct btrfs_root {
> > >  	spinlock_t log_extents_lock[2];
> > >  	struct list_head logged_list[2];
> > >  
> > > -	spinlock_t orphan_lock;
> > >  	atomic_t orphan_inodes;
> > > -	struct btrfs_block_rsv *orphan_block_rsv;
> > >  	int orphan_cleanup_state;
> > >  
> > >  	spinlock_t inode_lock;
> > > @@ -2764,9 +2762,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
> > >  void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
> > >  					    u64 len);
> > >  void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
> > > -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> > > -				  struct btrfs_inode *inode);
> > > -void btrfs_orphan_release_metadata(struct btrfs_inode *inode);
> > >  int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
> > >  				     struct btrfs_block_rsv *rsv,
> > >  				     int nitems,
> > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > > index 60caa68c3618..24e15e2080f4 100644
> > > --- a/fs/btrfs/disk-io.c
> > > +++ b/fs/btrfs/disk-io.c
> > > @@ -1185,7 +1185,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
> > >  	root->inode_tree = RB_ROOT;
> > >  	INIT_RADIX_TREE(&root->delayed_nodes_tree, GFP_ATOMIC);
> > >  	root->block_rsv = NULL;
> > > -	root->orphan_block_rsv = NULL;
> > >  
> > >  	INIT_LIST_HEAD(&root->dirty_list);
> > >  	INIT_LIST_HEAD(&root->root_list);
> > > @@ -1195,7 +1194,6 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
> > >  	INIT_LIST_HEAD(&root->ordered_root);
> > >  	INIT_LIST_HEAD(&root->logged_list[0]);
> > >  	INIT_LIST_HEAD(&root->logged_list[1]);
> > > -	spin_lock_init(&root->orphan_lock);
> > >  	spin_lock_init(&root->inode_lock);
> > >  	spin_lock_init(&root->delalloc_lock);
> > >  	spin_lock_init(&root->ordered_extent_lock);
> > > @@ -3674,8 +3672,6 @@ static void free_fs_root(struct btrfs_root *root)
> > >  {
> > >  	iput(root->ino_cache_inode);
> > >  	WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
> > > -	btrfs_free_block_rsv(root->fs_info, root->orphan_block_rsv);
> > > -	root->orphan_block_rsv = NULL;
> > >  	if (root->anon_dev)
> > >  		free_anon_bdev(root->anon_dev);
> > >  	if (root->subv_writers)
> > > @@ -3766,7 +3762,6 @@ int btrfs_commit_super(struct btrfs_fs_info *fs_info)
> > >  
> > >  void close_ctree(struct btrfs_fs_info *fs_info)
> > >  {
> > > -	struct btrfs_root *root = fs_info->tree_root;
> > >  	int ret;
> > >  
> > >  	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
> > > @@ -3861,9 +3856,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
> > >  	btrfs_free_stripe_hash_table(fs_info);
> > >  	btrfs_free_ref_cache(fs_info);
> > >  
> > > -	__btrfs_free_block_rsv(root->orphan_block_rsv);
> > > -	root->orphan_block_rsv = NULL;
> > > -
> > >  	while (!list_empty(&fs_info->pinned_chunks)) {
> > >  		struct extent_map *em;
> > >  
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 51b5e2da708c..3f2e026bc206 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -5949,44 +5949,6 @@ void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans)
> > >  	trans->chunk_bytes_reserved = 0;
> > >  }
> > >  
> > > -/* Can only return 0 or -ENOSPC */
> > > -int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
> > > -				  struct btrfs_inode *inode)
> > > -{
> > > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> > > -	struct btrfs_root *root = inode->root;
> > > -	/*
> > > -	 * We always use trans->block_rsv here as we will have reserved space
> > > -	 * for our orphan when starting the transaction, using get_block_rsv()
> > > -	 * here will sometimes make us choose the wrong block rsv as we could be
> > > -	 * doing a reloc inode for a non refcounted root.
> > > -	 */
> > > -	struct btrfs_block_rsv *src_rsv = trans->block_rsv;
> > > -	struct btrfs_block_rsv *dst_rsv = root->orphan_block_rsv;
> > > -
> > > -	/*
> > > -	 * We need to hold space in order to delete our orphan item once we've
> > > -	 * added it, so this takes the reservation so we can release it later
> > > -	 * when we are truly done with the orphan item.
> > > -	 */
> > > -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> > > -
> > > -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> > > -			num_bytes, 1);
> > > -	return btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
> > > -}
> > > -
> > > -void btrfs_orphan_release_metadata(struct btrfs_inode *inode)
> > > -{
> > > -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->vfs_inode.i_sb);
> > > -	struct btrfs_root *root = inode->root;
> > > -	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
> > > -
> > > -	trace_btrfs_space_reservation(fs_info, "orphan", btrfs_ino(inode),
> > > -			num_bytes, 0);
> > > -	btrfs_block_rsv_release(fs_info, root->orphan_block_rsv, num_bytes);
> > > -}
> > > -
> > >  /*
> > >   * btrfs_subvolume_reserve_metadata() - reserve space for subvolume operation
> > >   * root: the root of the parent directory
> > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > > index b9a046b8c72c..50f10882a715 100644
> > > --- a/fs/btrfs/inode.c
> > > +++ b/fs/btrfs/inode.c
> > > @@ -3293,37 +3293,18 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> > >  }
> > >  
> > >  /*
> > > - * This is called in transaction commit time. If there are no orphan
> > > - * files in the subvolume, it removes orphan item and frees block_rsv
> > > - * structure.
> > > + * This is called in transaction commit time. If there are no orphan files in
> >                      ^^
> > nit:               s/in/at
> > > + * the subvolume, it removes the orphan item.
> > >   */
> > >  void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
> > >  			      struct btrfs_root *root)
> > >  {
> > 
> > This function is called only in transaction.c, why not unexport it and
> > move it to transaction.c?
> 
> It goes away entirely in the next patch :)

I'm reordering things a bit so that some of the churn is unnecessary,
stay tuned.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-05-11  7:20 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-11  0:11 [PATCH v2 00/12] Btrfs: orphan and truncate fixes Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 01/12] Btrfs: remove stale comment referencing vmtruncate() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 02/12] Btrfs: fix error handling in btrfs_truncate_inode_items() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 03/12] Btrfs: don't BUG_ON() " Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 04/12] Btrfs: stop creating orphan items for truncate Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 05/12] Btrfs: don't release reserve or decrement orphan count if orphan item already existed Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 06/12] Btrfs: don't return ino to ino cache if inode item removal fails Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 07/12] Btrfs: refactor btrfs_evict_inode() reserve refill dance Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 08/12] Btrfs: fix ENOSPC caused by orphan items reservations Omar Sandoval
2018-05-11  6:38   ` Nikolay Borisov
2018-05-11  6:51     ` Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 09/12] Btrfs: get rid of root->orphan_block_rsv and root->orphan_lock Omar Sandoval
2018-05-11  6:44   ` Nikolay Borisov
2018-05-11  6:48     ` Omar Sandoval
2018-05-11  7:20       ` Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 10/12] Btrfs: get rid of btrfs_orphan_commit_root() and root->orphan_inodes Omar Sandoval
2018-05-11  7:01   ` Nikolay Borisov
2018-05-11  7:05     ` Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 11/12] Btrfs: simplify error handling in btrfs_evict_inode() Omar Sandoval
2018-05-11  0:11 ` [PATCH v2 12/12] Btrfs: reserve space for O_TMPFILE orphan item deletion Omar Sandoval

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.