linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3][V2] Delayed iput fixes
@ 2018-12-03 16:06 Josef Bacik
  2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- only wakeup if the cleaner isn't currently doing work.
- re-arranged some stuff for running delayed iputs during flushint.
- removed the open code wakeup in the waitqueue patch.

-- Original message --

Here are some delayed iput fixes.  Delayed iputs can hold reservations for a
while and there's no real good way to make sure they were gone for good, which
means we could early enospc when in reality if we had just waited for the iput
we would have had plenty of space.  So fix this up by making us wait for delayed
iputs when deciding if we need to commit for enospc flushing, and then cleanup
and rework how we run delayed iputs to make it more straightforward to wait on
them and make sure we're all done using them.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
@ 2018-12-03 16:06 ` Josef Bacik
  2018-12-04  9:03   ` Nikolay Borisov
  2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent-tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8dfddfd3f315..0127d272cd2a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			ret = 0;
 		break;
 	case COMMIT_TRANS:
+		/*
+		 * If we have pending delayed iputs then we could free up a
+		 * bunch of pinned space, so make sure we run the iputs before
+		 * we do our pinned bytes check below.
+		 */
+		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
+		btrfs_run_delayed_iputs(fs_info);
+		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+
 		ret = may_commit_transaction(fs_info, space_info);
 		break;
 	default:
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
  2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
@ 2018-12-03 16:06 ` Josef Bacik
  2018-12-04  9:21   ` Nikolay Borisov
  2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
  2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba
  3 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path.  The cleaner
thread only gets woken up every 30 seconds, so instead wake it up to do
it's work so that we can free up that space as quickly as possible.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h   | 3 +++
 fs/btrfs/disk-io.c | 3 +++
 fs/btrfs/inode.c   | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c8ddbacb6748..dc56a4d940c3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -769,6 +769,9 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
  */
 #define BTRFS_FS_BALANCE_RUNNING		18
 
+/* Indicate that the cleaner thread is awake and doing something. */
+#define BTRFS_FS_CLEANER_RUNNING		19
+
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c5918ff8241b..f40f6fdc1019 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1669,6 +1669,8 @@ static int cleaner_kthread(void *arg)
 	while (1) {
 		again = 0;
 
+		set_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
+
 		/* Make the cleaner go to sleep early. */
 		if (btrfs_need_cleaner_sleep(fs_info))
 			goto sleep;
@@ -1715,6 +1717,7 @@ static int cleaner_kthread(void *arg)
 		 */
 		btrfs_delete_unused_bgs(fs_info);
 sleep:
+		clear_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
 		if (kthread_should_park())
 			kthread_parkme();
 		if (kthread_should_stop())
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ac7abe2ae9b..0b9f3e482cea 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3264,6 +3264,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
 	ASSERT(list_empty(&binode->delayed_iput));
 	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
 	spin_unlock(&fs_info->delayed_iput_lock);
+	if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
+		wake_up_process(fs_info->cleaner_kthread);
 }
 
 void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
  2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
  2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
@ 2018-12-03 16:06 ` Josef Bacik
  2018-12-04 11:46   ` Nikolay Borisov
  2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba
  3 siblings, 1 reply; 11+ messages in thread
From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The throttle path doesn't take cleaner_delayed_iput_mutex, which means
we could think we're done flushing iputs in the data space reservation
path when we could have a throttler doing an iput.  There's no real
reason to serialize the delayed iput flushing, so instead of taking the
cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
replace it with an atomic counter and a waitqueue.  This removes the
short (or long depending on how big the inode is) window where we think
there are no more pending iputs when there really are some.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  4 +++-
 fs/btrfs/disk-io.c     |  5 ++---
 fs/btrfs/extent-tree.c | 13 ++++++++-----
 fs/btrfs/inode.c       | 21 +++++++++++++++++++++
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dc56a4d940c3..20af5d6d81f1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -915,7 +915,8 @@ struct btrfs_fs_info {
 
 	spinlock_t delayed_iput_lock;
 	struct list_head delayed_iputs;
-	struct mutex cleaner_delayed_iput_mutex;
+	atomic_t nr_delayed_iputs;
+	wait_queue_head_t delayed_iputs_wait;
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
@@ -3240,6 +3241,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root);
 int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size);
 void btrfs_add_delayed_iput(struct inode *inode);
 void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info);
+int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info);
 int btrfs_prealloc_file_range(struct inode *inode, int mode,
 			      u64 start, u64 num_bytes, u64 min_size,
 			      loff_t actual_len, u64 *alloc_hint);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f40f6fdc1019..238e0113f2d3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1694,9 +1694,7 @@ static int cleaner_kthread(void *arg)
 			goto sleep;
 		}
 
-		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
 		btrfs_run_delayed_iputs(fs_info);
-		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
 
 		again = btrfs_clean_one_deleted_snapshot(root);
 		mutex_unlock(&fs_info->cleaner_mutex);
@@ -2654,7 +2652,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->delete_unused_bgs_mutex);
 	mutex_init(&fs_info->reloc_mutex);
 	mutex_init(&fs_info->delalloc_root_mutex);
-	mutex_init(&fs_info->cleaner_delayed_iput_mutex);
 	seqlock_init(&fs_info->profiles_lock);
 
 	INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots);
@@ -2676,6 +2673,7 @@ int open_ctree(struct super_block *sb,
 	atomic_set(&fs_info->defrag_running, 0);
 	atomic_set(&fs_info->qgroup_op_seq, 0);
 	atomic_set(&fs_info->reada_works_cnt, 0);
+	atomic_set(&fs_info->nr_delayed_iputs, 0);
 	atomic64_set(&fs_info->tree_mod_seq, 0);
 	fs_info->sb = sb;
 	fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
@@ -2753,6 +2751,7 @@ int open_ctree(struct super_block *sb,
 	init_waitqueue_head(&fs_info->transaction_wait);
 	init_waitqueue_head(&fs_info->transaction_blocked_wait);
 	init_waitqueue_head(&fs_info->async_submit_wait);
+	init_waitqueue_head(&fs_info->delayed_iputs_wait);
 
 	INIT_LIST_HEAD(&fs_info->pinned_chunks);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0127d272cd2a..5b6c9fc227ff 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4280,10 +4280,14 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 				/*
 				 * The cleaner kthread might still be doing iput
 				 * operations. Wait for it to finish so that
-				 * more space is released.
+				 * more space is released.  We don't need to
+				 * explicitly run the delayed iputs here because
+				 * the commit_transaction would have woken up
+				 * the cleaner.
 				 */
-				mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
-				mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+				ret = btrfs_wait_on_delayed_iputs(fs_info);
+				if (ret)
+					return ret;
 				goto again;
 			} else {
 				btrfs_end_transaction(trans);
@@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 		 * bunch of pinned space, so make sure we run the iputs before
 		 * we do our pinned bytes check below.
 		 */
-		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
 		btrfs_run_delayed_iputs(fs_info);
-		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
+		btrfs_wait_on_delayed_iputs(fs_info);
 
 		ret = may_commit_transaction(fs_info, space_info);
 		break;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0b9f3e482cea..958e30c7c744 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
 	if (atomic_add_unless(&inode->i_count, -1, 1))
 		return;
 
+	atomic_inc(&fs_info->nr_delayed_iputs);
 	spin_lock(&fs_info->delayed_iput_lock);
 	ASSERT(list_empty(&binode->delayed_iput));
 	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
@@ -3280,11 +3281,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
 		list_del_init(&inode->delayed_iput);
 		spin_unlock(&fs_info->delayed_iput_lock);
 		iput(&inode->vfs_inode);
+		if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
+			wake_up(&fs_info->delayed_iputs_wait);
 		spin_lock(&fs_info->delayed_iput_lock);
 	}
 	spin_unlock(&fs_info->delayed_iput_lock);
 }
 
+/**
+ * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running
+ * @fs_info - the fs_info for this fs
+ * @return - EINTR if we were killed, 0 if nothing's pending
+ *
+ * This will wait on any delayed iputs that are currently running with KILLABLE
+ * set.  Once they are all done running we will return, unless we are killed in
+ * which case we return EINTR.
+ */
+int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info)
+{
+	int ret = wait_event_killable(fs_info->delayed_iputs_wait,
+			atomic_read(&fs_info->nr_delayed_iputs) == 0);
+	if (ret)
+		return -EINTR;
+	return 0;
+}
+
 /*
  * This creates an orphan entry for the given inode in case something goes wrong
  * in the middle of an unlink.
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] btrfs: run delayed iputs before committing
  2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
@ 2018-12-04  9:03   ` Nikolay Borisov
  0 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-12-04  9:03 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> Delayed iputs means we can have final iputs of deleted inodes in the
> queue, which could potentially generate a lot of pinned space that could
> be free'd.  So before we decide to commit the transaction for ENOPSC
> reasons, run the delayed iputs so that any potential space is free'd up.
> If there is and we freed enough we can then commit the transaction and
> potentially be able to make our reservation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8dfddfd3f315..0127d272cd2a 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info,
>  			ret = 0;
>  		break;
>  	case COMMIT_TRANS:
> +		/*
> +		 * If we have pending delayed iputs then we could free up a
> +		 * bunch of pinned space, so make sure we run the iputs before
> +		 * we do our pinned bytes check below.
> +		 */
> +		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> +		btrfs_run_delayed_iputs(fs_info);
> +		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> +
>  		ret = may_commit_transaction(fs_info, space_info);
>  		break;
>  	default:
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
@ 2018-12-04  9:21   ` Nikolay Borisov
  2018-12-04 18:18     ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Borisov @ 2018-12-04  9:21 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> The cleaner thread usually takes care of delayed iputs, with the
> exception of the btrfs_end_transaction_throttle path.  The cleaner
> thread only gets woken up every 30 seconds, so instead wake it up to do
> it's work so that we can free up that space as quickly as possible.

This description misses any rationale whatsoever about why the cleaner
needs to be woken up more frequently than 30 seconds (and IMO this is
the most important question that needs answering).

Also have you done any measurements of the number of processed delayed
inodes with this change. Given the behavior you so desire why not just
make delayed iputs to be run via schedule_work on the global workqueue
and be done with it? I'm sure the latency will be better than the
current 30 seconds one :)

> 
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h   | 3 +++
>  fs/btrfs/disk-io.c | 3 +++
>  fs/btrfs/inode.c   | 2 ++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index c8ddbacb6748..dc56a4d940c3 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -769,6 +769,9 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
>   */
>  #define BTRFS_FS_BALANCE_RUNNING		18
>  
> +/* Indicate that the cleaner thread is awake and doing something. */
> +#define BTRFS_FS_CLEANER_RUNNING		19
> +
>  struct btrfs_fs_info {
>  	u8 fsid[BTRFS_FSID_SIZE];
>  	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c5918ff8241b..f40f6fdc1019 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1669,6 +1669,8 @@ static int cleaner_kthread(void *arg)
>  	while (1) {
>  		again = 0;
>  
> +		set_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
> +
>  		/* Make the cleaner go to sleep early. */
>  		if (btrfs_need_cleaner_sleep(fs_info))
>  			goto sleep;
> @@ -1715,6 +1717,7 @@ static int cleaner_kthread(void *arg)
>  		 */
>  		btrfs_delete_unused_bgs(fs_info);
>  sleep:
> +		clear_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags);
>  		if (kthread_should_park())
>  			kthread_parkme();
>  		if (kthread_should_stop())
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8ac7abe2ae9b..0b9f3e482cea 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3264,6 +3264,8 @@ void btrfs_add_delayed_iput(struct inode *inode)
>  	ASSERT(list_empty(&binode->delayed_iput));
>  	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
>  	spin_unlock(&fs_info->delayed_iput_lock);
> +	if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags))
> +		wake_up_process(fs_info->cleaner_kthread);
>  }
>  
>  void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
@ 2018-12-04 11:46   ` Nikolay Borisov
  2018-12-04 18:21     ` Josef Bacik
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Borisov @ 2018-12-04 11:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> The throttle path doesn't take cleaner_delayed_iput_mutex, which means
> we could think we're done flushing iputs in the data space reservation
> path when we could have a throttler doing an iput.  There's no real
> reason to serialize the delayed iput flushing, so instead of taking the
> cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
> replace it with an atomic counter and a waitqueue.  This removes the
> short (or long depending on how big the inode is) window where we think
> there are no more pending iputs when there really are some.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.h       |  4 +++-
>  fs/btrfs/disk-io.c     |  5 ++---
>  fs/btrfs/extent-tree.c | 13 ++++++++-----
>  fs/btrfs/inode.c       | 21 +++++++++++++++++++++
>  4 files changed, 34 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index dc56a4d940c3..20af5d6d81f1 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -915,7 +915,8 @@ struct btrfs_fs_info {
>  
>  	spinlock_t delayed_iput_lock;
>  	struct list_head delayed_iputs;
> -	struct mutex cleaner_delayed_iput_mutex;
> +	atomic_t nr_delayed_iputs;
> +	wait_queue_head_t delayed_iputs_wait;
>  

Have you considered whether the same could be achieved with a completion
rather than an open-coded waitqueue? I tried prototyping it and it could
be done but it becomes messy regarding when the completion should be
initialised i.e only when it's not in btrfs_add_delayed_iput


<snip>

> @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info,
>  		 * bunch of pinned space, so make sure we run the iputs before
>  		 * we do our pinned bytes check below.
>  		 */
> -		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
>  		btrfs_run_delayed_iputs(fs_info);
> -		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> +		btrfs_wait_on_delayed_iputs(fs_info);

Waiting on delayed iputs here is pointless since they are run
synchronously form this context.

>  
>  		ret = may_commit_transaction(fs_info, space_info);
>  		break;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0b9f3e482cea..958e30c7c744 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode)
>  	if (atomic_add_unless(&inode->i_count, -1, 1))
>  		return;
>  
> +	atomic_inc(&fs_info->nr_delayed_iputs);
>  	spin_lock(&fs_info->delayed_iput_lock);
>  	ASSERT(list_empty(&binode->delayed_iput));
>  	list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs);
> @@ -3280,11 +3281,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info)
>  		list_del_init(&inode->delayed_iput);
>  		spin_unlock(&fs_info->delayed_iput_lock);
>  		iput(&inode->vfs_inode);
> +		if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
> +			wake_up(&fs_info->delayed_iputs_wait);
>  		spin_lock(&fs_info->delayed_iput_lock);
>  	}
>  	spin_unlock(&fs_info->delayed_iput_lock);
>  }
>  
> +/**
> + * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running
> + * @fs_info - the fs_info for this fs
> + * @return - EINTR if we were killed, 0 if nothing's pending
> + *
> + * This will wait on any delayed iputs that are currently running with KILLABLE
> + * set.  Once they are all done running we will return, unless we are killed in
> + * which case we return EINTR.
> + */
> +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info)
> +{
> +	int ret = wait_event_killable(fs_info->delayed_iputs_wait,
> +			atomic_read(&fs_info->nr_delayed_iputs) == 0);
> +	if (ret)
> +		return -EINTR;
> +	return 0;
> +}
> +
>  /*
>   * This creates an orphan entry for the given inode in case something goes wrong
>   * in the middle of an unlink.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput
  2018-12-04  9:21   ` Nikolay Borisov
@ 2018-12-04 18:18     ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2018-12-04 18:18 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Tue, Dec 04, 2018 at 11:21:14AM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> > The cleaner thread usually takes care of delayed iputs, with the
> > exception of the btrfs_end_transaction_throttle path.  The cleaner
> > thread only gets woken up every 30 seconds, so instead wake it up to do
> > it's work so that we can free up that space as quickly as possible.
> 
> This description misses any rationale whatsoever about why the cleaner
> needs to be woken up more frequently than 30 seconds (and IMO this is
> the most important question that needs answering).
> 

Yeah I'll add that.

> Also have you done any measurements of the number of processed delayed
> inodes with this change. Given the behavior you so desire why not just
> make delayed iputs to be run via schedule_work on the global workqueue
> and be done with it? I'm sure the latency will be better than the
> current 30 seconds one :)

We already have the cleaner thread to do this work, and it sets up for the
snapshot drop stuff to be run as well.  We could probably add another delayed
work thing, but I would rather do that in a different patch.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue
  2018-12-04 11:46   ` Nikolay Borisov
@ 2018-12-04 18:21     ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2018-12-04 18:21 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Tue, Dec 04, 2018 at 01:46:58PM +0200, Nikolay Borisov wrote:
> 
> 
> On 3.12.18 г. 18:06 ч., Josef Bacik wrote:
> > The throttle path doesn't take cleaner_delayed_iput_mutex, which means
> > we could think we're done flushing iputs in the data space reservation
> > path when we could have a throttler doing an iput.  There's no real
> > reason to serialize the delayed iput flushing, so instead of taking the
> > cleaner_delayed_iput_mutex whenever we flush the delayed iputs just
> > replace it with an atomic counter and a waitqueue.  This removes the
> > short (or long depending on how big the inode is) window where we think
> > there are no more pending iputs when there really are some.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/ctree.h       |  4 +++-
> >  fs/btrfs/disk-io.c     |  5 ++---
> >  fs/btrfs/extent-tree.c | 13 ++++++++-----
> >  fs/btrfs/inode.c       | 21 +++++++++++++++++++++
> >  4 files changed, 34 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index dc56a4d940c3..20af5d6d81f1 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -915,7 +915,8 @@ struct btrfs_fs_info {
> >  
> >  	spinlock_t delayed_iput_lock;
> >  	struct list_head delayed_iputs;
> > -	struct mutex cleaner_delayed_iput_mutex;
> > +	atomic_t nr_delayed_iputs;
> > +	wait_queue_head_t delayed_iputs_wait;
> >  
> 
> Have you considered whether the same could be achieved with a completion
> rather than an open-coded waitqueue? I tried prototyping it and it could
> be done but it becomes messy regarding when the completion should be
> initialised i.e only when it's not in btrfs_add_delayed_iput
> 

Yeah a waitqueue makes more sense here than a completion since it's not a
one-off.

> 
> <snip>
> 
> > @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info,
> >  		 * bunch of pinned space, so make sure we run the iputs before
> >  		 * we do our pinned bytes check below.
> >  		 */
> > -		mutex_lock(&fs_info->cleaner_delayed_iput_mutex);
> >  		btrfs_run_delayed_iputs(fs_info);
> > -		mutex_unlock(&fs_info->cleaner_delayed_iput_mutex);
> > +		btrfs_wait_on_delayed_iputs(fs_info);
> 
> Waiting on delayed iputs here is pointless since they are run
> synchronously form this context.
> 

Unless there are other threads (the cleaner thread) running iputs as well.  We
could be running an iput that is evicting the inode in another thread and we
really want that space, so we need to wait here to make sure everybody is truly
done.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3][V2] Delayed iput fixes
  2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
@ 2019-01-08 15:34 ` David Sterba
  2019-01-11 15:09   ` Josef Bacik
  3 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-01-08 15:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Dec 03, 2018 at 11:06:49AM -0500, Josef Bacik wrote:
> v1->v2:
> - only wakeup if the cleaner isn't currently doing work.
> - re-arranged some stuff for running delayed iputs during flushint.
> - removed the open code wakeup in the waitqueue patch.
> 
> -- Original message --
> 
> Here are some delayed iput fixes.  Delayed iputs can hold reservations for a
> while and there's no real good way to make sure they were gone for good, which
> means we could early enospc when in reality if we had just waited for the iput
> we would have had plenty of space.  So fix this up by making us wait for delayed
> iputs when deciding if we need to commit for enospc flushing, and then cleanup
> and rework how we run delayed iputs to make it more straightforward to wait on
> them and make sure we're all done using them.  Thanks,

I'd like to push this patchset to 5.0, can you please post the missing
rationale for 2/3? I'm not expecting any code changes so it's just the
text, I can update the patch, no need to resend the whole series.
Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3][V2] Delayed iput fixes
  2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba
@ 2019-01-11 15:09   ` Josef Bacik
  0 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2019-01-11 15:09 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Tue, Jan 08, 2019 at 04:34:39PM +0100, David Sterba wrote:
> On Mon, Dec 03, 2018 at 11:06:49AM -0500, Josef Bacik wrote:
> > v1->v2:
> > - only wakeup if the cleaner isn't currently doing work.
> > - re-arranged some stuff for running delayed iputs during flushint.
> > - removed the open code wakeup in the waitqueue patch.
> > 
> > -- Original message --
> > 
> > Here are some delayed iput fixes.  Delayed iputs can hold reservations for a
> > while and there's no real good way to make sure they were gone for good, which
> > means we could early enospc when in reality if we had just waited for the iput
> > we would have had plenty of space.  So fix this up by making us wait for delayed
> > iputs when deciding if we need to commit for enospc flushing, and then cleanup
> > and rework how we run delayed iputs to make it more straightforward to wait on
> > them and make sure we're all done using them.  Thanks,
> 
> I'd like to push this patchset to 5.0, can you please post the missing
> rationale for 2/3? I'm not expecting any code changes so it's just the
> text, I can update the patch, no need to resend the whole series.
> Thanks.

Sorry I saw this earlier this week and then promptly forgot about it, I'll send
it shortly.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-01-11 15:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik
2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik
2018-12-04  9:03   ` Nikolay Borisov
2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik
2018-12-04  9:21   ` Nikolay Borisov
2018-12-04 18:18     ` Josef Bacik
2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik
2018-12-04 11:46   ` Nikolay Borisov
2018-12-04 18:21     ` Josef Bacik
2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba
2019-01-11 15:09   ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).