linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Delayed iput fixes
@ 2018-11-21 19:09 Josef Bacik
  2018-11-21 19:09 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Here are some delayed iput fixes.  Delayed iputs can hold reservations for a
while and there's no real good way to make sure they were gone for good, which
means we could early enospc when in reality if we had just waited for the iput
we would have had plenty of space.  So fix this up by making us wait for delayed
iputs when deciding if we need to commit for enospc flushing, and then cleanup
and rework how we run delayed iputs to make it more straightforward to wait on
them and make sure we're all done using them.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik
@ 2018-11-21 19:09 ` Josef Bacik
  2018-11-26 14:44   ` Nikolay Borisov
  2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
  2018-11-21 19:09 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
  2 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 90423b6749b7..3a90dc1d6b31 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4833,6 +4833,15 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	if (!bytes)
 		return 0;
 
+	/*
+	 * If we have pending delayed iputs then we could free up a bunch of
+	 * pinned space, so make sure we run the iputs before we do our pinned
+	 * bytes check below.
+	 */
+	mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
+	btrfs_run_delayed_iputs(fs_info);
+	mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik
  2018-11-21 19:09 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
@ 2018-11-21 19:09 ` Josef Bacik
  2018-11-27  8:26   ` Nikolay Borisov
  2018-11-21 19:09 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
  2 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path.  The cleaner
thread only gets woken up every 30 seconds, so instead wake it up to do
it's work so that we can free up that space as quickly as possible.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3da9ac463344..3c42d8887183 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3264,6 +3264,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
 	ASSERT(list_empty(&binode->delayed_iput));
 	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
+	wake_up_process(fs_info->cleaner_kthread);
 }
 
 void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik
  2018-11-21 19:09 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
  2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
@ 2018-11-21 19:09 ` Josef Bacik
  2018-11-27  8:29   ` Nikolay Borisov
  2 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The throttle path doesn't take cleaner_delayed_iput_mutex, which means
we could think we're done flushing iputs in the data space reservation
path when we could have a throttler doing an iput.  There's no real
reason to serialize the delayed iput flushing, so instead of taking the
cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
replace it with an atomic counter and a waitqueue.  This removes the
short (or long depending on how big the inode is) window where we think
there are no more pending iputs when there really are some.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  4 +++-
 fs/btrfs/disk-io.c     |  5 ++---
 fs/btrfs/extent-tree.c |  9 +++++----
 fs/btrfs/inode.c       | 21 +++++++++++++++++++++
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 709de7471d86..a835fe7076eb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -912,7 +912,8 @@ struct btrfs_fs_info {
 
 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
-	struct mutex cleaner_delayed_iput_mutex;
+	atomic_t nr_delayed_iputs;
+	wait_queue_head_t delayed_iputs_wait;
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
@@ -3237,6 +3238,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root);
 int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
 void btrfs_add_delayed_iput(struct inode *inode);
 void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info);
+int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info);
 int btrfs_prealloc_file_range(struct inode *inode, int mode,
 			      u64 start, u64 num_bytes, u64 min_size,
 			      loff_t actual_len, u64 *alloc_hint);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c5918ff8241b..3f81dfaefa32 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1692,9 +1692,7 @@ static int cleaner_kthread(void *arg)
 			goto sleep;
 		}
 
-		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
 		btrfs_run_delayed_iputs(fs_info);
-		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
 
 		again = btrfs_clean_one_deleted_snapshot(root);
 		mutex_unlock(&fs_info->cleaner_mutex);
@@ -2651,7 +2649,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
-	mutex_init(&fs_info->cleaner_delayed_iput_mutex);
 	seqlock_init(&fs_info->profiles_lock);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
@@ -2673,6 +2670,7 @@ int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->defrag_running, 0);
 	atomic_set(&fs_info->qgroup_op_seq, 0);
 	atomic_set(&fs_info->reada_works_cnt, 0);
+	atomic_set(&fs_info->nr_delayed_iputs, 0);
 	atomic64_set(&fs_info->tree_mod_seq, 0);
 	fs_info->sb = sb;
 	fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
@@ -2750,6 +2748,7 @@ int open_ctree(struct super_block *sb,
 	init_waitqueue_head(&fs_info->transaction_wait);
 	init_waitqueue_head(&fs_info->transaction_blocked_wait);
 	init_waitqueue_head(&fs_info->async_submit_wait);
+	init_waitqueue_head(&fs_info->delayed_iputs_wait);
 
 	INIT_LIST_HEAD(&fs_info->pinned_chunks);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3a90dc1d6b31..36f43876be22 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4272,8 +4272,9 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 				 * operations. Wait for it to finish so that
 				 * more space is released.
 				 */
-				mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
-				mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+				ret = btrfs_wait_on_delayed_iputs(fs_info);
+				if (ret)
+					return ret;
 				goto again;
 			} else {
 				btrfs_end_transaction(trans);
@@ -4838,9 +4839,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
 	 * pinned space, so make sure we run the iputs before we do our pinned
 	 * bytes check below.
 	 */
-	mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
 	btrfs_run_delayed_iputs(fs_info);
-	mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+	wait_event(fs_info->delayed_iputs_wait,
+		   atomic_read(&fs_info->nr_delayed_iputs) == 0);
 
 	trans = btrfs_join_transaction(fs_info->extent_root);
 	if (IS_ERR(trans))
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3c42d8887183..57bf514a90eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
 	if (atomic_add_unless(&inode->i_count, -1, 1))
 		return;
 
+	atomic_inc(&fs_info->nr_delayed_iputs);
 	spin_lock(&fs_info->delayed_iput_lock);
 	ASSERT(list_empty(&binode->delayed_iput));
 	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
@@ -3279,11 +3280,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
+		if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
+			wake_up(&fs_info->delayed_iputs_wait);
 		spin_lock(&fs_info->delayed_iput_lock);
 	}
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
+/**
+ * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running
+ * @fs_info - the fs_info for this fs
+ * @return - EINTR if we were killed, 0 if nothing's pending
+ *
+ * This will wait on any delayed iputs that are currently running with KILLABLE
+ * set.  Once they are all done running we will return, unless we are killed in
+ * which case we return EINTR.
+ */
+int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info)
+{
+	int ret = wait_event_killable(fs_info->delayed_iputs_wait,
+			atomic_read(&fs_info->nr_delayed_iputs) == 0);
+	if (ret)
+		return -EINTR;
+	return 0;
+}
+
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-11-21 19:09 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
@ 2018-11-26 14:44   ` Nikolay Borisov
  0 siblings, 0 replies; 18+ messages in thread
From: Nikolay Borisov @ 2018-11-26 14:44 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> Delayed iputs means we can have final iputs of deleted inodes in the
> queue, which could potentially generate a lot of pinned space that could
> be free'd.  So before we decide to commit the transaction for ENOPSC
> reasons, run the delayed iputs so that any potential space is free'd up.
> If there is and we freed enough we can then commit the transaction and
> potentially be able to make our reservation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent-tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 90423b6749b7..3a90dc1d6b31 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4833,6 +4833,15 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	if (!bytes)
>  		return 0;
>  
> +	/*
> +	 * If we have pending delayed iputs then we could free up a bunch of
> +	 * pinned space, so make sure we run the iputs before we do our pinned
> +	 * bytes check below.
> +	 */
> +	mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> +	btrfs_run_delayed_iputs(fs_info);
> +	mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);

IMHO this code is better suited to be in case COMMIT_TRANS in
flush_space. Let's leave may_commit_Trans to just decide whether it
should commit or not.

> +
>  	trans = btrfs_join_transaction(fs_info->extent_root);
>  	if (IS_ERR(trans))
>  		return PTR_ERR(trans);
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
@ 2018-11-27  8:26   ` Nikolay Borisov
  2018-11-27 19:54     ` Josef Bacik
  0 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2018-11-27  8:26 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> The cleaner thread usually takes care of delayed iputs, with the
> exception of the btrfs_end_transaction_throttle path.  The cleaner
> thread only gets woken up every 30 seconds, so instead wake it up to do
> it's work so that we can free up that space as quickly as possible.

Have you done any measurements how this affects the overall system. I
suspect this introduces a lot of noise since now we are going to be
doing a thread wakeup on every iput, does this give a chance to have
nice, large batches of iputs that  the cost of wake up can be amortized
across?

> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/inode.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3da9ac463344..3c42d8887183 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3264,6 +3264,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
>  	ASSERT(list_empty(&binode->delayed_iput));
>  	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
>  	spin_unlock(&fs_info->delayed_iput_lock);
> +	wake_up_process(fs_info->cleaner_kthread);
>  }
>  
>  void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-11-21 19:09 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
@ 2018-11-27  8:29   ` Nikolay Borisov
  2018-11-27 20:01     ` Josef Bacik
  0 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2018-11-27  8:29 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> The throttle path doesn't take cleaner_delayed_iput_mutex, which means

Which one is the throttle path? btrfs_end_transaction_throttle is only
called during snapshot drop and relocation? What scenario triggered your
observation and prompted this fix?

> we could think we're done flushing iputs in the data space reservation
> path when we could have a throttler doing an iput.  There's no real
> reason to serialize the delayed iput flushing, so instead of taking the
> cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
> replace it with an atomic counter and a waitqueue.  This removes the
> short (or long depending on how big the inode is) window where we think
> there are no more pending iputs when there really are some.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  4 +++-
>  fs/btrfs/disk-io.c     |  5 ++---
>  fs/btrfs/extent-tree.c |  9 +++++----
>  fs/btrfs/inode.c       | 21 +++++++++++++++++++++
>  4 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 709de7471d86..a835fe7076eb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -912,7 +912,8 @@ struct btrfs_fs_info {
>  
>  	spinlock_t delayed_iput_lock;
>  	struct list_head delayed_iputs;
> -	struct mutex cleaner_delayed_iput_mutex;
> +	atomic_t nr_delayed_iputs;
> +	wait_queue_head_t delayed_iputs_wait;
>  
>  	/* this protects tree_mod_seq_list */
>  	spinlock_t tree_mod_seq_lock;
> @@ -3237,6 +3238,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root);
>  int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
>  void btrfs_add_delayed_iput(struct inode *inode);
>  void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info);
> +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info);
>  int btrfs_prealloc_file_range(struct inode *inode, int mode,
>  			      u64 start, u64 num_bytes, u64 min_size,
>  			      loff_t actual_len, u64 *alloc_hint);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c5918ff8241b..3f81dfaefa32 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1692,9 +1692,7 @@ static int cleaner_kthread(void *arg)
>  			goto sleep;
>  		}
>  
> -		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
>  		btrfs_run_delayed_iputs(fs_info);
> -		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
>  
>  		again = btrfs_clean_one_deleted_snapshot(root);
>  		mutex_unlock(&fs_info->cleaner_mutex);
> @@ -2651,7 +2649,6 @@ int open_ctree(struct super_block *sb,
>  	mutex_init(&fs_info->delete_unused_bgs_mutex);
>  	mutex_init(&fs_info->reloc_mutex);
>  	mutex_init(&fs_info->delalloc_root_mutex);
> -	mutex_init(&fs_info->cleaner_delayed_iput_mutex);
>  	seqlock_init(&fs_info->profiles_lock);
>  
>  	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
> @@ -2673,6 +2670,7 @@ int open_ctree(struct super_block *sb,
>  	atomic_set(&fs_info->defrag_running, 0);
>  	atomic_set(&fs_info->qgroup_op_seq, 0);
>  	atomic_set(&fs_info->reada_works_cnt, 0);
> +	atomic_set(&fs_info->nr_delayed_iputs, 0);
>  	atomic64_set(&fs_info->tree_mod_seq, 0);
>  	fs_info->sb = sb;
>  	fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
> @@ -2750,6 +2748,7 @@ int open_ctree(struct super_block *sb,
>  	init_waitqueue_head(&fs_info->transaction_wait);
>  	init_waitqueue_head(&fs_info->transaction_blocked_wait);
>  	init_waitqueue_head(&fs_info->async_submit_wait);
> +	init_waitqueue_head(&fs_info->delayed_iputs_wait);
>  
>  	INIT_LIST_HEAD(&fs_info->pinned_chunks);
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3a90dc1d6b31..36f43876be22 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4272,8 +4272,9 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
>  				 * operations. Wait for it to finish so that
>  				 * more space is released.
>  				 */
> -				mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> -				mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> +				ret = btrfs_wait_on_delayed_iputs(fs_info);
> +				if (ret)
> +					return ret;
>  				goto again;
>  			} else {
>  				btrfs_end_transaction(trans);
> @@ -4838,9 +4839,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
>  	 * pinned space, so make sure we run the iputs before we do our pinned
>  	 * bytes check below.
>  	 */
> -	mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
>  	btrfs_run_delayed_iputs(fs_info);
> -	mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> +	wait_event(fs_info->delayed_iputs_wait,
> +		   atomic_read(&fs_info->nr_delayed_iputs) == 0);

Why open code btrfs_wait_on_delayed_iputs(fs_info)? Just make it use
wait_event instead of the killable version and use the function
uniformally across the code base.

>  
>  	trans = btrfs_join_transaction(fs_info->extent_root);
>  	if (IS_ERR(trans))
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3c42d8887183..57bf514a90eb 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>  		return;
>  
> +	atomic_inc(&fs_info->nr_delayed_iputs);
>  	spin_lock(&fs_info->delayed_iput_lock);
>  	ASSERT(list_empty(&binode->delayed_iput));
>  	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> @@ -3279,11 +3280,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>  		list_del_init(&inode->delayed_iput);
>  		spin_unlock(&fs_info->delayed_iput_lock);
>  		iput(&inode->vfs_inode);
> +		if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
> +			wake_up(&fs_info->delayed_iputs_wait);
>  		spin_lock(&fs_info->delayed_iput_lock);
>  	}
>  	spin_unlock(&fs_info->delayed_iput_lock);
>  }
>  
> +/**
> + * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running
> + * @fs_info - the fs_info for this fs
> + * @return - EINTR if we were killed, 0 if nothing's pending
> + *
> + * This will wait on any delayed iputs that are currently running with KILLABLE
> + * set.  Once they are all done running we will return, unless we are killed in
> + * which case we return EINTR.
> + */
> +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info)
> +{
> +	int ret = wait_event_killable(fs_info->delayed_iputs_wait,
> +			atomic_read(&fs_info->nr_delayed_iputs) == 0);
> +	if (ret)
> +		return -EINTR;
> +	return 0;
> +}
> +
>  /*
>   * This creates an orphan entry for the given inode in case something goes wrong
>   * in the middle of an unlink.
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-27  8:26   ` Nikolay Borisov
@ 2018-11-27 19:54     ` Josef Bacik
  2018-11-27 19:59       ` Chris Mason
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-11-27 19:54 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
> 
> 
> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> > The cleaner thread usually takes care of delayed iputs, with the
> > exception of the btrfs_end_transaction_throttle path.  The cleaner
> > thread only gets woken up every 30 seconds, so instead wake it up to do
> > it's work so that we can free up that space as quickly as possible.
> 
> Have you done any measurements how this affects the overall system. I
> suspect this introduces a lot of noise since now we are going to be
> doing a thread wakeup on every iput, does this give a chance to have
> nice, large batches of iputs that  the cost of wake up can be amortized
> across?

I ran the whole patchset with our A/B testing stuff and the patchset was a 5%
win overall, so I'm inclined to think it's fine.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-27 19:54     ` Josef Bacik
@ 2018-11-27 19:59       ` Chris Mason
  2018-11-27 20:08         ` Josef Bacik
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2018-11-27 19:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Nikolay Borisov, linux-btrfs, Kernel Team

On 27 Nov 2018, at 14:54, Josef Bacik wrote:

> On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
>>
>>
>> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
>>> The cleaner thread usually takes care of delayed iputs, with the
>>> exception of the btrfs_end_transaction_throttle path.  The cleaner
>>> thread only gets woken up every 30 seconds, so instead wake it up to 
>>> do
>>> it's work so that we can free up that space as quickly as possible.
>>
>> Have you done any measurements how this affects the overall system. I
>> suspect this introduces a lot of noise since now we are going to be
>> doing a thread wakeup on every iput, does this give a chance to have
>> nice, large batches of iputs that  the cost of wake up can be 
>> amortized
>> across?
>
> I ran the whole patchset with our A/B testing stuff and the patchset 
> was a 5%
> win overall, so I'm inclined to think it's fine.  Thanks,

It's a good point though, a delayed wakeup may be less overhead.

-chris

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-11-27  8:29   ` Nikolay Borisov
@ 2018-11-27 20:01     ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2018-11-27 20:01 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Tue, Nov 27, 2018 at 10:29:57AM +0200, Nikolay Borisov wrote:
> 
> 
> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> > The throttle path doesn't take cleaner_delayed_iput_mutex, which means
> 
> Which one is the throttle path? btrfs_end_transaction_throttle is only
> called during snapshot drop and relocation? What scenario triggered your
> observation and prompted this fix?
> 

One of my enospc tests runs snapshot creation/deletion in the background.

> > we could think we're done flushing iputs in the data space reservation
> > path when we could have a throttler doing an iput.  There's no real
> > reason to serialize the delayed iput flushing, so instead of taking the
> > cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
> > replace it with an atomic counter and a waitqueue.  This removes the
> > short (or long depending on how big the inode is) window where we think
> > there are no more pending iputs when there really are some.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/ctree.h       |  4 +++-
> >  fs/btrfs/disk-io.c     |  5 ++---
> >  fs/btrfs/extent-tree.c |  9 +++++----
> >  fs/btrfs/inode.c       | 21 +++++++++++++++++++++
> >  4 files changed, 31 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 709de7471d86..a835fe7076eb 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -912,7 +912,8 @@ struct btrfs_fs_info {
> >  
> >  	spinlock_t delayed_iput_lock;
> >  	struct list_head delayed_iputs;
> > -	struct mutex cleaner_delayed_iput_mutex;
> > +	atomic_t nr_delayed_iputs;
> > +	wait_queue_head_t delayed_iputs_wait;
> >  
> >  	/* this protects tree_mod_seq_list */
> >  	spinlock_t tree_mod_seq_lock;
> > @@ -3237,6 +3238,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root);
> >  int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
> >  void btrfs_add_delayed_iput(struct inode *inode);
> >  void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info);
> > +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info);
> >  int btrfs_prealloc_file_range(struct inode *inode, int mode,
> >  			      u64 start, u64 num_bytes, u64 min_size,
> >  			      loff_t actual_len, u64 *alloc_hint);
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index c5918ff8241b..3f81dfaefa32 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -1692,9 +1692,7 @@ static int cleaner_kthread(void *arg)
> >  			goto sleep;
> >  		}
> >  
> > -		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> >  		btrfs_run_delayed_iputs(fs_info);
> > -		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> >  
> >  		again = btrfs_clean_one_deleted_snapshot(root);
> >  		mutex_unlock(&fs_info->cleaner_mutex);
> > @@ -2651,7 +2649,6 @@ int open_ctree(struct super_block *sb,
> >  	mutex_init(&fs_info->delete_unused_bgs_mutex);
> >  	mutex_init(&fs_info->reloc_mutex);
> >  	mutex_init(&fs_info->delalloc_root_mutex);
> > -	mutex_init(&fs_info->cleaner_delayed_iput_mutex);
> >  	seqlock_init(&fs_info->profiles_lock);
> >  
> >  	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
> > @@ -2673,6 +2670,7 @@ int open_ctree(struct super_block *sb,
> >  	atomic_set(&fs_info->defrag_running, 0);
> >  	atomic_set(&fs_info->qgroup_op_seq, 0);
> >  	atomic_set(&fs_info->reada_works_cnt, 0);
> > +	atomic_set(&fs_info->nr_delayed_iputs, 0);
> >  	atomic64_set(&fs_info->tree_mod_seq, 0);
> >  	fs_info->sb = sb;
> >  	fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
> > @@ -2750,6 +2748,7 @@ int open_ctree(struct super_block *sb,
> >  	init_waitqueue_head(&fs_info->transaction_wait);
> >  	init_waitqueue_head(&fs_info->transaction_blocked_wait);
> >  	init_waitqueue_head(&fs_info->async_submit_wait);
> > +	init_waitqueue_head(&fs_info->delayed_iputs_wait);
> >  
> >  	INIT_LIST_HEAD(&fs_info->pinned_chunks);
> >  
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 3a90dc1d6b31..36f43876be22 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4272,8 +4272,9 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
> >  				 * operations. Wait for it to finish so that
> >  				 * more space is released.
> >  				 */
> > -				mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> > -				mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> > +				ret = btrfs_wait_on_delayed_iputs(fs_info);
> > +				if (ret)
> > +					return ret;
> >  				goto again;
> >  			} else {
> >  				btrfs_end_transaction(trans);
> > @@ -4838,9 +4839,9 @@ static int may_commit_transaction(struct btrfs_fs_info *fs_info,
> >  	 * pinned space, so make sure we run the iputs before we do our pinned
> >  	 * bytes check below.
> >  	 */
> > -	mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> >  	btrfs_run_delayed_iputs(fs_info);
> > -	mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> > +	wait_event(fs_info->delayed_iputs_wait,
> > +		   atomic_read(&fs_info->nr_delayed_iputs) == 0);
> 
> Why open code btrfs_wait_on_delayed_iputs(fs_info)? Just make it use
> wait_event instead of the killable version and use the function
> uniformally across the code base.

I don't want the flusher thread to be killable, but now that I've said that out
loud I realize that's not possible, so I'll just use the helper.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-27 19:59       ` Chris Mason
@ 2018-11-27 20:08         ` Josef Bacik
  2018-11-28 19:06           ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-11-27 20:08 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, Nikolay Borisov, linux-btrfs, Kernel Team

On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote:
> On 27 Nov 2018, at 14:54, Josef Bacik wrote:
> 
> > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
> >>
> >>
> >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> >>> The cleaner thread usually takes care of delayed iputs, with the
> >>> exception of the btrfs_end_transaction_throttle path.  The cleaner
> >>> thread only gets woken up every 30 seconds, so instead wake it up to 
> >>> do
> >>> it's work so that we can free up that space as quickly as possible.
> >>
> >> Have you done any measurements how this affects the overall system. I
> >> suspect this introduces a lot of noise since now we are going to be
> >> doing a thread wakeup on every iput, does this give a chance to have
> >> nice, large batches of iputs that  the cost of wake up can be 
> >> amortized
> >> across?
> >
> > I ran the whole patchset with our A/B testing stuff and the patchset 
> > was a 5%
> > win overall, so I'm inclined to think it's fine.  Thanks,
> 
> It's a good point though, a delayed wakeup may be less overhead.

Sure, but how do we go about that without it sometimes messing up?  In practice
the only time we're doing this is at the end of finish_ordered_io, so likely to
not be a constant stream.  I suppose since we have places where we force it to
run that we don't really need this.  IDK, I'm fine with dropping it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-27 20:08         ` Josef Bacik
@ 2018-11-28 19:06           ` David Sterba
  2018-11-28 19:32             ` Chris Mason
  2018-11-28 20:08             ` Filipe Manana
  0 siblings, 2 replies; 18+ messages in thread
From: David Sterba @ 2018-11-28 19:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Chris Mason, Nikolay Borisov, linux-btrfs, Kernel Team

On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote:
> On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote:
> > On 27 Nov 2018, at 14:54, Josef Bacik wrote:
> > 
> > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
> > >>
> > >>
> > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> > >>> The cleaner thread usually takes care of delayed iputs, with the
> > >>> exception of the btrfs_end_transaction_throttle path.  The cleaner
> > >>> thread only gets woken up every 30 seconds, so instead wake it up to 
> > >>> do
> > >>> it's work so that we can free up that space as quickly as possible.
> > >>
> > >> Have you done any measurements how this affects the overall system. I
> > >> suspect this introduces a lot of noise since now we are going to be
> > >> doing a thread wakeup on every iput, does this give a chance to have
> > >> nice, large batches of iputs that  the cost of wake up can be 
> > >> amortized
> > >> across?
> > >
> > > I ran the whole patchset with our A/B testing stuff and the patchset 
> > > was a 5%
> > > win overall, so I'm inclined to think it's fine.  Thanks,
> > 
> > It's a good point though, a delayed wakeup may be less overhead.
> 
> Sure, but how do we go about that without it sometimes messing up?  In practice
> the only time we're doing this is at the end of finish_ordered_io, so likely to
> not be a constant stream.  I suppose since we have places where we force it to
> run that we don't really need this.  IDK, I'm fine with dropping it.  Thanks,

The transaction thread wakes up cleaner periodically (commit interval,
30s by default), so the time to process iputs is not unbounded.

I have the same concerns as Nikolay, coupling the wakeup to all delayed
iputs could result in smaller batches. But some of the callers of
btrfs_add_delayed_iput might benefit from the extra wakeup, like
btrfs_remove_block_group, so I don't want to leave the idea yet.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-28 19:06           ` David Sterba
@ 2018-11-28 19:32             ` Chris Mason
  2018-11-28 20:08             ` Filipe Manana
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Mason @ 2018-11-28 19:32 UTC (permalink / raw)
  To: David Sterba; +Cc: Josef Bacik, Nikolay Borisov, linux-btrfs, Kernel Team

On 28 Nov 2018, at 14:06, David Sterba wrote:

> On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote:
>> On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote:
>>> On 27 Nov 2018, at 14:54, Josef Bacik wrote:
>>>
>>>> On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
>>>>>> The cleaner thread usually takes care of delayed iputs, with the
>>>>>> exception of the btrfs_end_transaction_throttle path.  The 
>>>>>> cleaner
>>>>>> thread only gets woken up every 30 seconds, so instead wake it up 
>>>>>> to
>>>>>> do
>>>>>> it's work so that we can free up that space as quickly as 
>>>>>> possible.
>>>>>
>>>>> Have you done any measurements how this affects the overall 
>>>>> system. I
>>>>> suspect this introduces a lot of noise since now we are going to 
>>>>> be
>>>>> doing a thread wakeup on every iput, does this give a chance to 
>>>>> have
>>>>> nice, large batches of iputs that  the cost of wake up can be
>>>>> amortized
>>>>> across?
>>>>
>>>> I ran the whole patchset with our A/B testing stuff and the 
>>>> patchset
>>>> was a 5%
>>>> win overall, so I'm inclined to think it's fine.  Thanks,
>>>
>>> It's a good point though, a delayed wakeup may be less overhead.
>>
>> Sure, but how do we go about that without it sometimes messing up?  
>> In practice
>> the only time we're doing this is at the end of finish_ordered_io, so 
>> likely to
>> not be a constant stream.  I suppose since we have places where we 
>> force it to
>> run that we don't really need this.  IDK, I'm fine with dropping it.  
>> Thanks,
>
> The transaction thread wakes up cleaner periodically (commit interval,
> 30s by default), so the time to process iputs is not unbounded.
>
> I have the same concerns as Nikolay, coupling the wakeup to all 
> delayed
> iputs could result in smaller batches. But some of the callers of
> btrfs_add_delayed_iput might benefit from the extra wakeup, like
> btrfs_remove_block_group, so I don't want to leave the idea yet.

Yeah, I love the idea, I'm just worried about a tiny bit of rate 
limiting.  Since we're only waking up a fixed process and not creating 
new work queue items every time, it's probably fine, but a delay of HZ/2 
would probably be even better.

-chris

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-28 19:06           ` David Sterba
  2018-11-28 19:32             ` Chris Mason
@ 2018-11-28 20:08             ` Filipe Manana
  2018-11-29  0:30               ` Qu Wenruo
  1 sibling, 1 reply; 18+ messages in thread
From: Filipe Manana @ 2018-11-28 20:08 UTC (permalink / raw)
  To: dsterba, Josef Bacik, Chris Mason, Nikolay Borisov, linux-btrfs,
	Kernel-team

On Wed, Nov 28, 2018 at 7:09 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote:
> > On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote:
> > > On 27 Nov 2018, at 14:54, Josef Bacik wrote:
> > >
> > > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
> > > >>
> > > >>
> > > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
> > > >>> The cleaner thread usually takes care of delayed iputs, with the
> > > >>> exception of the btrfs_end_transaction_throttle path.  The cleaner
> > > >>> thread only gets woken up every 30 seconds, so instead wake it up to
> > > >>> do
> > > >>> it's work so that we can free up that space as quickly as possible.
> > > >>
> > > >> Have you done any measurements how this affects the overall system. I
> > > >> suspect this introduces a lot of noise since now we are going to be
> > > >> doing a thread wakeup on every iput, does this give a chance to have
> > > >> nice, large batches of iputs that  the cost of wake up can be
> > > >> amortized
> > > >> across?
> > > >
> > > > I ran the whole patchset with our A/B testing stuff and the patchset
> > > > was a 5%
> > > > win overall, so I'm inclined to think it's fine.  Thanks,
> > >
> > > It's a good point though, a delayed wakeup may be less overhead.
> >
> > Sure, but how do we go about that without it sometimes messing up?  In practice
> > the only time we're doing this is at the end of finish_ordered_io, so likely to
> > not be a constant stream.  I suppose since we have places where we force it to
> > run that we don't really need this.  IDK, I'm fine with dropping it.  Thanks,
>
> The transaction thread wakes up cleaner periodically (commit interval,
> 30s by default), so the time to process iputs is not unbounded.
>
> I have the same concerns as Nikolay, coupling the wakeup to all delayed
> iputs could result in smaller batches. But some of the callers of
> btrfs_add_delayed_iput might benefit from the extra wakeup, like
> btrfs_remove_block_group, so I don't want to leave the idea yet.

I'm curious, why do you think it would benefit btrfs_remove_block_group()?



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-11-28 20:08             ` Filipe Manana
@ 2018-11-29  0:30               ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2018-11-29  0:30 UTC (permalink / raw)
  To: fdmanana, dsterba, Josef Bacik, Chris Mason, Nikolay Borisov,
	linux-btrfs, Kernel-team



On 2018/11/29 上午4:08, Filipe Manana wrote:
> On Wed, Nov 28, 2018 at 7:09 PM David Sterba <dsterba@suse.cz> wrote:
>>
>> On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote:
>>> On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote:
>>>> On 27 Nov 2018, at 14:54, Josef Bacik wrote:
>>>>
>>>>> On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote:
>>>>>>
>>>>>>
>>>>>> On 21.11.18 г. 21:09 ч., Josef Bacik wrote:
>>>>>>> The cleaner thread usually takes care of delayed iputs, with the
>>>>>>> exception of the btrfs_end_transaction_throttle path.  The cleaner
>>>>>>> thread only gets woken up every 30 seconds, so instead wake it up to
>>>>>>> do
>>>>>>> it's work so that we can free up that space as quickly as possible.
>>>>>>
>>>>>> Have you done any measurements how this affects the overall system. I
>>>>>> suspect this introduces a lot of noise since now we are going to be
>>>>>> doing a thread wakeup on every iput, does this give a chance to have
>>>>>> nice, large batches of iputs that  the cost of wake up can be
>>>>>> amortized
>>>>>> across?
>>>>>
>>>>> I ran the whole patchset with our A/B testing stuff and the patchset
>>>>> was a 5%
>>>>> win overall, so I'm inclined to think it's fine.  Thanks,
>>>>
>>>> It's a good point though, a delayed wakeup may be less overhead.
>>>
>>> Sure, but how do we go about that without it sometimes messing up?  In practice
>>> the only time we're doing this is at the end of finish_ordered_io, so likely to
>>> not be a constant stream.  I suppose since we have places where we force it to
>>> run that we don't really need this.  IDK, I'm fine with dropping it.  Thanks,
>>
>> The transaction thread wakes up cleaner periodically (commit interval,
>> 30s by default), so the time to process iputs is not unbounded.
>>
>> I have the same concerns as Nikolay, coupling the wakeup to all delayed
>> iputs could result in smaller batches. But some of the callers of
>> btrfs_add_delayed_iput might benefit from the extra wakeup, like
>> btrfs_remove_block_group, so I don't want to leave the idea yet.
> 
> I'm curious, why do you think it would benefit btrfs_remove_block_group()?

Just as Filipe said, I'm not sure why btrfs_remove_block_group() would
get some benefit from more frequent cleaner thread wake up.

For an empty block group to really be removed, it also needs to have 0
pinned bytenr, which is only possible after a transaction get committed.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] btrfs: run delayed iputs before committing
  2019-01-11 15:21 [PATCH 0/3][V3] Delayed iput fixes Josef Bacik
@ 2019-01-11 15:21 ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-01-11 15:21 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8dfddfd3f315..0127d272cd2a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			ret = 0;
 		break;
 	case COMMIT_TRANS:
+		/*
+		 * If we have pending delayed iputs then we could free up a
+		 * bunch of pinned space, so make sure we run the iputs before
+		 * we do our pinned bytes check below.
+		 */
+		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
+		btrfs_run_delayed_iputs(fs_info);
+		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+
 		ret = may_commit_transaction(fs_info, space_info);
 		break;
 	default:
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
@ 2018-12-04  9:03   ` Nikolay Borisov
  0 siblings, 0 replies; 18+ messages in thread
From: Nikolay Borisov @ 2018-12-04  9:03 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> Delayed iputs means we can have final iputs of deleted inodes in the
> queue, which could potentially generate a lot of pinned space that could
> be free'd.  So before we decide to commit the transaction for ENOPSC
> reasons, run the delayed iputs so that any potential space is free'd up.
> If there is and we freed enough we can then commit the transaction and
> potentially be able to make our reservation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8dfddfd3f315..0127d272cd2a 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
>  			ret = 0;
>  		break;
>  	case COMMIT_TRANS:
> +		/*
> +		 * If we have pending delayed iputs then we could free up a
> +		 * bunch of pinned space, so make sure we run the iputs before
> +		 * we do our pinned bytes check below.
> +		 */
> +		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> +		btrfs_run_delayed_iputs(fs_info);
> +		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> +
>  		ret = may_commit_transaction(fs_info, space_info);
>  		break;
>  	default:
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
@ 2018-12-03 16:06 ` Josef Bacik
  2018-12-04  9:03   ` Nikolay Borisov
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8dfddfd3f315..0127d272cd2a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			ret = 0;
 		break;
 	case COMMIT_TRANS:
+		/*
+		 * If we have pending delayed iputs then we could free up a
+		 * bunch of pinned space, so make sure we run the iputs before
+		 * we do our pinned bytes check below.
+		 */
+		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
+		btrfs_run_delayed_iputs(fs_info);
+		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+
 		ret = may_commit_transaction(fs_info, space_info);
 		break;
 	default:
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-11 15:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik
2018-11-21 19:09 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
2018-11-26 14:44   ` Nikolay Borisov
2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
2018-11-27  8:26   ` Nikolay Borisov
2018-11-27 19:54     ` Josef Bacik
2018-11-27 19:59       ` Chris Mason
2018-11-27 20:08         ` Josef Bacik
2018-11-28 19:06           ` David Sterba
2018-11-28 19:32             ` Chris Mason
2018-11-28 20:08             ` Filipe Manana
2018-11-29  0:30               ` Qu Wenruo
2018-11-21 19:09 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
2018-11-27  8:29   ` Nikolay Borisov
2018-11-27 20:01     ` Josef Bacik
2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
2018-12-04  9:03   ` Nikolay Borisov
2019-01-11 15:21 [PATCH 0/3][V3] Delayed iput fixes Josef Bacik
2019-01-11 15:21 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).