* [PATCH 0/3][V2] Delayed iput fixes @ 2018-12-03 16:06 Josef Bacik 2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik ` (3 more replies) 0 siblings, 4 replies; 21+ messages in thread From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw) To: linux-btrfs, kernel-team v1->v2: - only wakeup if the cleaner isn't currently doing work. - re-arranged some stuff for running delayed iputs during flushint. - removed the open code wakeup in the waitqueue patch. -- Original message -- Here are some delayed iput fixes. Delayed iputs can hold reservations for a while and there's no real good way to make sure they were gone for good, which means we could early enospc when in reality if we had just waited for the iput we would have had plenty of space. So fix this up by making us wait for delayed iputs when deciding if we need to commit for enospc flushing, and then cleanup and rework how we run delayed iputs to make it more straightforward to wait on them and make sure we're all done using them. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/3] btrfs: run delayed iputs before committing 2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik @ 2018-12-03 16:06 ` Josef Bacik 2018-12-04 9:03 ` Nikolay Borisov 2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw) To: linux-btrfs, kernel-team Delayed iputs means we can have final iputs of deleted inodes in the queue, which could potentially generate a lot of pinned space that could be free'd. So before we decide to commit the transaction for ENOPSC reasons, run the delayed iputs so that any potential space is free'd up. If there is and we freed enough we can then commit the transaction and potentially be able to make our reservation. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Omar Sandoval <osandov@fb.com> --- fs/btrfs/extent-tree.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 8dfddfd3f315..0127d272cd2a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info, ret = 0; break; case COMMIT_TRANS: + /* + * If we have pending delayed iputs then we could free up a + * bunch of pinned space, so make sure we run the iputs before + * we do our pinned bytes check below. + */ + mutex_lock(&fs_info->cleaner_delayed_iput_mutex); + btrfs_run_delayed_iputs(fs_info); + mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); + ret = may_commit_transaction(fs_info, space_info); break; default: -- 2.14.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] btrfs: run delayed iputs before committing 2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik @ 2018-12-04 9:03 ` Nikolay Borisov 0 siblings, 0 replies; 21+ messages in thread From: Nikolay Borisov @ 2018-12-04 9:03 UTC (permalink / raw) To: Josef Bacik, linux-btrfs, kernel-team On 3.12.18 г. 18:06 ч., Josef Bacik wrote: > Delayed iputs means we can have final iputs of deleted inodes in the > queue, which could potentially generate a lot of pinned space that could > be free'd. So before we decide to commit the transaction for ENOPSC > reasons, run the delayed iputs so that any potential space is free'd up. > If there is and we freed enough we can then commit the transaction and > potentially be able to make our reservation. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> > --- > fs/btrfs/extent-tree.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 8dfddfd3f315..0127d272cd2a 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -4953,6 +4953,15 @@ static void flush_space(struct btrfs_fs_info *fs_info, > ret = 0; > break; > case COMMIT_TRANS: > + /* > + * If we have pending delayed iputs then we could free up a > + * bunch of pinned space, so make sure we run the iputs before > + * we do our pinned bytes check below. > + */ > + mutex_lock(&fs_info->cleaner_delayed_iput_mutex); > + btrfs_run_delayed_iputs(fs_info); > + mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); > + > ret = may_commit_transaction(fs_info, space_info); > break; > default: > ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik 2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik @ 2018-12-03 16:06 ` Josef Bacik 2018-12-04 9:21 ` Nikolay Borisov 2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik 2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba 3 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw) To: linux-btrfs, kernel-team The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. The cleaner thread only gets woken up every 30 seconds, so instead wake it up to do it's work so that we can free up that space as quickly as possible. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c | 3 +++ fs/btrfs/inode.c | 2 ++ 3 files changed, 8 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c8ddbacb6748..dc56a4d940c3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -769,6 +769,9 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr); */ #define BTRFS_FS_BALANCE_RUNNING 18 +/* Indicate that the cleaner thread is awake and doing something. */ +#define BTRFS_FS_CLEANER_RUNNING 19 + struct btrfs_fs_info { u8 fsid[BTRFS_FSID_SIZE]; u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c5918ff8241b..f40f6fdc1019 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1669,6 +1669,8 @@ static int cleaner_kthread(void *arg) while (1) { again = 0; + set_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); + /* Make the cleaner go to sleep early. */ if (btrfs_need_cleaner_sleep(fs_info)) goto sleep; @@ -1715,6 +1717,7 @@ static int cleaner_kthread(void *arg) */ btrfs_delete_unused_bgs(fs_info); sleep: + clear_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); if (kthread_should_park()) kthread_parkme(); if (kthread_should_stop()) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8ac7abe2ae9b..0b9f3e482cea 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3264,6 +3264,8 @@ void btrfs_add_delayed_iput(struct inode *inode) ASSERT(list_empty(&binode->delayed_iput)); list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); spin_unlock(&fs_info->delayed_iput_lock); + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) + wake_up_process(fs_info->cleaner_kthread); } void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) -- 2.14.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik @ 2018-12-04 9:21 ` Nikolay Borisov 2018-12-04 18:18 ` Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Nikolay Borisov @ 2018-12-04 9:21 UTC (permalink / raw) To: Josef Bacik, linux-btrfs, kernel-team On 3.12.18 г. 18:06 ч., Josef Bacik wrote: > The cleaner thread usually takes care of delayed iputs, with the > exception of the btrfs_end_transaction_throttle path. The cleaner > thread only gets woken up every 30 seconds, so instead wake it up to do > it's work so that we can free up that space as quickly as possible. This description misses any rationale whatsoever about why the cleaner needs to be woken up more frequently than 30 seconds (and IMO this is the most important question that needs answering). Also have you done any measurements of the number of processed delayed inodes with this change. Given the behavior you so desire why not just make delayed iputs to be run via schedule_work on the global workqueue and be done with it? I'm sure the latency will be better than the current 30 seconds one :) > > Reviewed-by: Filipe Manana <fdmanana@suse.com> > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/disk-io.c | 3 +++ > fs/btrfs/inode.c | 2 ++ > 3 files changed, 8 insertions(+) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index c8ddbacb6748..dc56a4d940c3 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -769,6 +769,9 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr); > */ > #define BTRFS_FS_BALANCE_RUNNING 18 > > +/* Indicate that the cleaner thread is awake and doing something. */ > +#define BTRFS_FS_CLEANER_RUNNING 19 > + > struct btrfs_fs_info { > u8 fsid[BTRFS_FSID_SIZE]; > u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index c5918ff8241b..f40f6fdc1019 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1669,6 +1669,8 @@ static int cleaner_kthread(void *arg) > while (1) { > again = 0; > > + set_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); > + > /* Make the cleaner go to sleep early. */ > if (btrfs_need_cleaner_sleep(fs_info)) > goto sleep; > @@ -1715,6 +1717,7 @@ static int cleaner_kthread(void *arg) > */ > btrfs_delete_unused_bgs(fs_info); > sleep: > + clear_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); > if (kthread_should_park()) > kthread_parkme(); > if (kthread_should_stop()) > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8ac7abe2ae9b..0b9f3e482cea 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3264,6 +3264,8 @@ void btrfs_add_delayed_iput(struct inode *inode) > ASSERT(list_empty(&binode->delayed_iput)); > list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); > spin_unlock(&fs_info->delayed_iput_lock); > + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) > + wake_up_process(fs_info->cleaner_kthread); > } > > void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-12-04 9:21 ` Nikolay Borisov @ 2018-12-04 18:18 ` Josef Bacik 0 siblings, 0 replies; 21+ messages in thread From: Josef Bacik @ 2018-12-04 18:18 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team On Tue, Dec 04, 2018 at 11:21:14AM +0200, Nikolay Borisov wrote: > > > On 3.12.18 г. 18:06 ч., Josef Bacik wrote: > > The cleaner thread usually takes care of delayed iputs, with the > > exception of the btrfs_end_transaction_throttle path. The cleaner > > thread only gets woken up every 30 seconds, so instead wake it up to do > > it's work so that we can free up that space as quickly as possible. > > This description misses any rationale whatsoever about why the cleaner > needs to be woken up more frequently than 30 seconds (and IMO this is > the most important question that needs answering). > Yeah I'll add that. > Also have you done any measurements of the number of processed delayed > inodes with this change. Given the behavior you so desire why not just > make delayed iputs to be run via schedule_work on the global workqueue > and be done with it? I'm sure the latency will be better than the > current 30 seconds one :) We already have the cleaner thread to do this work, and it sets up for the snapshot drop stuff to be run as well. We could probably add another delayed work thing, but I would rather do that in a different patch. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue 2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik 2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik 2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik @ 2018-12-03 16:06 ` Josef Bacik 2018-12-04 11:46 ` Nikolay Borisov 2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba 3 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-12-03 16:06 UTC (permalink / raw) To: linux-btrfs, kernel-team The throttle path doesn't take cleaner_delayed_iput_mutex, which means we could think we're done flushing iputs in the data space reservation path when we could have a throttler doing an iput. There's no real reason to serialize the delayed iput flushing, so instead of taking the cleaner_delayed_iput_mutex whenever we flush the delayed iputs just replace it with an atomic counter and a waitqueue. This removes the short (or long depending on how big the inode is) window where we think there are no more pending iputs when there really are some. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/ctree.h | 4 +++- fs/btrfs/disk-io.c | 5 ++--- fs/btrfs/extent-tree.c | 13 ++++++++----- fs/btrfs/inode.c | 21 +++++++++++++++++++++ 4 files changed, 34 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index dc56a4d940c3..20af5d6d81f1 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -915,7 +915,8 @@ struct btrfs_fs_info { spinlock_t delayed_iput_lock; struct list_head delayed_iputs; - struct mutex cleaner_delayed_iput_mutex; + atomic_t nr_delayed_iputs; + wait_queue_head_t delayed_iputs_wait; /* this protects tree_mod_seq_list */ spinlock_t tree_mod_seq_lock; @@ -3240,6 +3241,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root); int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size); void btrfs_add_delayed_iput(struct inode *inode); void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info); +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info); int btrfs_prealloc_file_range(struct inode *inode, int mode, u64 start, u64 num_bytes, u64 min_size, loff_t actual_len, u64 *alloc_hint); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f40f6fdc1019..238e0113f2d3 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1694,9 +1694,7 @@ static int cleaner_kthread(void *arg) goto sleep; } - mutex_lock(&fs_info->cleaner_delayed_iput_mutex); btrfs_run_delayed_iputs(fs_info); - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); again = btrfs_clean_one_deleted_snapshot(root); mutex_unlock(&fs_info->cleaner_mutex); @@ -2654,7 +2652,6 @@ int open_ctree(struct super_block *sb, mutex_init(&fs_info->delete_unused_bgs_mutex); mutex_init(&fs_info->reloc_mutex); mutex_init(&fs_info->delalloc_root_mutex); - mutex_init(&fs_info->cleaner_delayed_iput_mutex); seqlock_init(&fs_info->profiles_lock); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); @@ -2676,6 +2673,7 @@ int open_ctree(struct super_block *sb, atomic_set(&fs_info->defrag_running, 0); atomic_set(&fs_info->qgroup_op_seq, 0); atomic_set(&fs_info->reada_works_cnt, 0); + atomic_set(&fs_info->nr_delayed_iputs, 0); atomic64_set(&fs_info->tree_mod_seq, 0); fs_info->sb = sb; fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE; @@ -2753,6 +2751,7 @@ int open_ctree(struct super_block *sb, init_waitqueue_head(&fs_info->transaction_wait); init_waitqueue_head(&fs_info->transaction_blocked_wait); init_waitqueue_head(&fs_info->async_submit_wait); + init_waitqueue_head(&fs_info->delayed_iputs_wait); INIT_LIST_HEAD(&fs_info->pinned_chunks); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 0127d272cd2a..5b6c9fc227ff 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4280,10 +4280,14 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes) /* * The cleaner kthread might still be doing iput * operations. Wait for it to finish so that - * more space is released. + * more space is released. We don't need to + * explicitly run the delayed iputs here because + * the commit_transaction would have woken up + * the cleaner. */ - mutex_lock(&fs_info->cleaner_delayed_iput_mutex); - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); + ret = btrfs_wait_on_delayed_iputs(fs_info); + if (ret) + return ret; goto again; } else { btrfs_end_transaction(trans); @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info, * bunch of pinned space, so make sure we run the iputs before * we do our pinned bytes check below. */ - mutex_lock(&fs_info->cleaner_delayed_iput_mutex); btrfs_run_delayed_iputs(fs_info); - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); + btrfs_wait_on_delayed_iputs(fs_info); ret = may_commit_transaction(fs_info, space_info); break; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0b9f3e482cea..958e30c7c744 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode) if (atomic_add_unless(&inode->i_count, -1, 1)) return; + atomic_inc(&fs_info->nr_delayed_iputs); spin_lock(&fs_info->delayed_iput_lock); ASSERT(list_empty(&binode->delayed_iput)); list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); @@ -3280,11 +3281,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) list_del_init(&inode->delayed_iput); spin_unlock(&fs_info->delayed_iput_lock); iput(&inode->vfs_inode); + if (atomic_dec_and_test(&fs_info->nr_delayed_iputs)) + wake_up(&fs_info->delayed_iputs_wait); spin_lock(&fs_info->delayed_iput_lock); } spin_unlock(&fs_info->delayed_iput_lock); } +/** + * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running + * @fs_info - the fs_info for this fs + * @return - EINTR if we were killed, 0 if nothing's pending + * + * This will wait on any delayed iputs that are currently running with KILLABLE + * set. Once they are all done running we will return, unless we are killed in + * which case we return EINTR. + */ +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info) +{ + int ret = wait_event_killable(fs_info->delayed_iputs_wait, + atomic_read(&fs_info->nr_delayed_iputs) == 0); + if (ret) + return -EINTR; + return 0; +} + /* * This creates an orphan entry for the given inode in case something goes wrong * in the middle of an unlink. -- 2.14.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue 2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik @ 2018-12-04 11:46 ` Nikolay Borisov 2018-12-04 18:21 ` Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Nikolay Borisov @ 2018-12-04 11:46 UTC (permalink / raw) To: Josef Bacik, linux-btrfs, kernel-team On 3.12.18 г. 18:06 ч., Josef Bacik wrote: > The throttle path doesn't take cleaner_delayed_iput_mutex, which means > we could think we're done flushing iputs in the data space reservation > path when we could have a throttler doing an iput. There's no real > reason to serialize the delayed iput flushing, so instead of taking the > cleaner_delayed_iput_mutex whenever we flush the delayed iputs just > replace it with an atomic counter and a waitqueue. This removes the > short (or long depending on how big the inode is) window where we think > there are no more pending iputs when there really are some. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/ctree.h | 4 +++- > fs/btrfs/disk-io.c | 5 ++--- > fs/btrfs/extent-tree.c | 13 ++++++++----- > fs/btrfs/inode.c | 21 +++++++++++++++++++++ > 4 files changed, 34 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index dc56a4d940c3..20af5d6d81f1 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -915,7 +915,8 @@ struct btrfs_fs_info { > > spinlock_t delayed_iput_lock; > struct list_head delayed_iputs; > - struct mutex cleaner_delayed_iput_mutex; > + atomic_t nr_delayed_iputs; > + wait_queue_head_t delayed_iputs_wait; > Have you considered whether the same could be achieved with a completion rather than an open-coded waitqueue? I tried prototyping it and it could be done but it becomes messy regarding when the completion should be initialised i.e only when it's not in btrfs_add_delayed_iput <snip> > @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info, > * bunch of pinned space, so make sure we run the iputs before > * we do our pinned bytes check below. > */ > - mutex_lock(&fs_info->cleaner_delayed_iput_mutex); > btrfs_run_delayed_iputs(fs_info); > - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); > + btrfs_wait_on_delayed_iputs(fs_info); Waiting on delayed iputs here is pointless since they are run synchronously form this context. > > ret = may_commit_transaction(fs_info, space_info); > break; > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 0b9f3e482cea..958e30c7c744 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3260,6 +3260,7 @@ void btrfs_add_delayed_iput(struct inode *inode) > if (atomic_add_unless(&inode->i_count, -1, 1)) > return; > > + atomic_inc(&fs_info->nr_delayed_iputs); > spin_lock(&fs_info->delayed_iput_lock); > ASSERT(list_empty(&binode->delayed_iput)); > list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); > @@ -3280,11 +3281,31 @@ void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) > list_del_init(&inode->delayed_iput); > spin_unlock(&fs_info->delayed_iput_lock); > iput(&inode->vfs_inode); > + if (atomic_dec_and_test(&fs_info->nr_delayed_iputs)) > + wake_up(&fs_info->delayed_iputs_wait); > spin_lock(&fs_info->delayed_iput_lock); > } > spin_unlock(&fs_info->delayed_iput_lock); > } > > +/** > + * btrfs_wait_on_delayed_iputs - wait on the delayed iputs to be done running > + * @fs_info - the fs_info for this fs > + * @return - EINTR if we were killed, 0 if nothing's pending > + * > + * This will wait on any delayed iputs that are currently running with KILLABLE > + * set. Once they are all done running we will return, unless we are killed in > + * which case we return EINTR. > + */ > +int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info) > +{ > + int ret = wait_event_killable(fs_info->delayed_iputs_wait, > + atomic_read(&fs_info->nr_delayed_iputs) == 0); > + if (ret) > + return -EINTR; > + return 0; > +} > + > /* > * This creates an orphan entry for the given inode in case something goes wrong > * in the middle of an unlink. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue 2018-12-04 11:46 ` Nikolay Borisov @ 2018-12-04 18:21 ` Josef Bacik 0 siblings, 0 replies; 21+ messages in thread From: Josef Bacik @ 2018-12-04 18:21 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team On Tue, Dec 04, 2018 at 01:46:58PM +0200, Nikolay Borisov wrote: > > > On 3.12.18 г. 18:06 ч., Josef Bacik wrote: > > The throttle path doesn't take cleaner_delayed_iput_mutex, which means > > we could think we're done flushing iputs in the data space reservation > > path when we could have a throttler doing an iput. There's no real > > reason to serialize the delayed iput flushing, so instead of taking the > > cleaner_delayed_iput_mutex whenever we flush the delayed iputs just > > replace it with an atomic counter and a waitqueue. This removes the > > short (or long depending on how big the inode is) window where we think > > there are no more pending iputs when there really are some. > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > > --- > > fs/btrfs/ctree.h | 4 +++- > > fs/btrfs/disk-io.c | 5 ++--- > > fs/btrfs/extent-tree.c | 13 ++++++++----- > > fs/btrfs/inode.c | 21 +++++++++++++++++++++ > > 4 files changed, 34 insertions(+), 9 deletions(-) > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > > index dc56a4d940c3..20af5d6d81f1 100644 > > --- a/fs/btrfs/ctree.h > > +++ b/fs/btrfs/ctree.h > > @@ -915,7 +915,8 @@ struct btrfs_fs_info { > > > > spinlock_t delayed_iput_lock; > > struct list_head delayed_iputs; > > - struct mutex cleaner_delayed_iput_mutex; > > + atomic_t nr_delayed_iputs; > > + wait_queue_head_t delayed_iputs_wait; > > > > Have you considered whether the same could be achieved with a completion > rather than an open-coded waitqueue? I tried prototyping it and it could > be done but it becomes messy regarding when the completion should be > initialised i.e only when it's not in btrfs_add_delayed_iput > Yeah a waitqueue makes more sense here than a completion since it's not a one-off. > > <snip> > > > @@ -4958,9 +4962,8 @@ static void flush_space(struct btrfs_fs_info *fs_info, > > * bunch of pinned space, so make sure we run the iputs before > > * we do our pinned bytes check below. > > */ > > - mutex_lock(&fs_info->cleaner_delayed_iput_mutex); > > btrfs_run_delayed_iputs(fs_info); > > - mutex_unlock(&fs_info->cleaner_delayed_iput_mutex); > > + btrfs_wait_on_delayed_iputs(fs_info); > > Waiting on delayed iputs here is pointless since they are run > synchronously form this context. > Unless there are other threads (the cleaner thread) running iputs as well. We could be running an iput that is evicting the inode in another thread and we really want that space, so we need to wait here to make sure everybody is truly done. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3][V2] Delayed iput fixes 2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik ` (2 preceding siblings ...) 2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik @ 2019-01-08 15:34 ` David Sterba 2019-01-11 15:09 ` Josef Bacik 3 siblings, 1 reply; 21+ messages in thread From: David Sterba @ 2019-01-08 15:34 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs, kernel-team On Mon, Dec 03, 2018 at 11:06:49AM -0500, Josef Bacik wrote: > v1->v2: > - only wakeup if the cleaner isn't currently doing work. > - re-arranged some stuff for running delayed iputs during flushint. > - removed the open code wakeup in the waitqueue patch. > > -- Original message -- > > Here are some delayed iput fixes. Delayed iputs can hold reservations for a > while and there's no real good way to make sure they were gone for good, which > means we could early enospc when in reality if we had just waited for the iput > we would have had plenty of space. So fix this up by making us wait for delayed > iputs when deciding if we need to commit for enospc flushing, and then cleanup > and rework how we run delayed iputs to make it more straightforward to wait on > them and make sure we're all done using them. Thanks, I'd like to push this patchset to 5.0, can you please post the missing rationale for 2/3? I'm not expecting any code changes so it's just the text, I can update the patch, no need to resend the whole series. Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3][V2] Delayed iput fixes 2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba @ 2019-01-11 15:09 ` Josef Bacik 0 siblings, 0 replies; 21+ messages in thread From: Josef Bacik @ 2019-01-11 15:09 UTC (permalink / raw) To: dsterba, Josef Bacik, linux-btrfs, kernel-team On Tue, Jan 08, 2019 at 04:34:39PM +0100, David Sterba wrote: > On Mon, Dec 03, 2018 at 11:06:49AM -0500, Josef Bacik wrote: > > v1->v2: > > - only wakeup if the cleaner isn't currently doing work. > > - re-arranged some stuff for running delayed iputs during flushint. > > - removed the open code wakeup in the waitqueue patch. > > > > -- Original message -- > > > > Here are some delayed iput fixes. Delayed iputs can hold reservations for a > > while and there's no real good way to make sure they were gone for good, which > > means we could early enospc when in reality if we had just waited for the iput > > we would have had plenty of space. So fix this up by making us wait for delayed > > iputs when deciding if we need to commit for enospc flushing, and then cleanup > > and rework how we run delayed iputs to make it more straightforward to wait on > > them and make sure we're all done using them. Thanks, > > I'd like to push this patchset to 5.0, can you please post the missing > rationale for 2/3? I'm not expecting any code changes so it's just the > text, I can update the patch, no need to resend the whole series. > Thanks. Sorry I saw this earlier this week and then promptly forgot about it, I'll send it shortly. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 0/3][V3] Delayed iput fixes @ 2019-01-11 15:21 Josef Bacik 2019-01-11 15:21 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2019-01-11 15:21 UTC (permalink / raw) To: linux-btrfs, kernel-team v1->v2: - only wakeup if the cleaner isn't currently doing work. - re-arranged some stuff for running delayed iputs during flushint. - removed the open code wakeup in the waitqueue patch. -- Original message -- Here are some delayed iput fixes. Delayed iputs can hold reservations for a while and there's no real good way to make sure they were gone for good, which means we could early enospc when in reality if we had just waited for the iput we would have had plenty of space. So fix this up by making us wait for delayed iputs when deciding if we need to commit for enospc flushing, and then cleanup and rework how we run delayed iputs to make it more straightforward to wait on them and make sure we're all done using them. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2019-01-11 15:21 [PATCH 0/3][V3] " Josef Bacik @ 2019-01-11 15:21 ` Josef Bacik 0 siblings, 0 replies; 21+ messages in thread From: Josef Bacik @ 2019-01-11 15:21 UTC (permalink / raw) To: linux-btrfs, kernel-team The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. Delaying iputs means we are potentially delaying the eviction of an inode and it's respective space. The cleaner thread only gets woken up every 30 seconds, or when we require space. If there are a lot of inodes that need to be deleted we could induce a serious amount of latency while we wait for these inodes to be evicted. So instead wakeup the cleaner if it's not already awake to process any new delayed iputs we add to the list. If we suddenly need space we will less likely be backed up behind a bunch of inodes that are waiting to be deleted, and we could possibly free space before we need to get into the flushing logic which will save us some latency. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c | 3 +++ fs/btrfs/inode.c | 2 ++ 3 files changed, 8 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c8ddbacb6748..dc56a4d940c3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -769,6 +769,9 @@ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr); */ #define BTRFS_FS_BALANCE_RUNNING 18 +/* Indicate that the cleaner thread is awake and doing something. */ +#define BTRFS_FS_CLEANER_RUNNING 19 + struct btrfs_fs_info { u8 fsid[BTRFS_FSID_SIZE]; u8 chunk_tree_uuid[BTRFS_UUID_SIZE]; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c5918ff8241b..f40f6fdc1019 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1669,6 +1669,8 @@ static int cleaner_kthread(void *arg) while (1) { again = 0; + set_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); + /* Make the cleaner go to sleep early. */ if (btrfs_need_cleaner_sleep(fs_info)) goto sleep; @@ -1715,6 +1717,7 @@ static int cleaner_kthread(void *arg) */ btrfs_delete_unused_bgs(fs_info); sleep: + clear_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags); if (kthread_should_park()) kthread_parkme(); if (kthread_should_stop()) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8ac7abe2ae9b..0b9f3e482cea 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3264,6 +3264,8 @@ void btrfs_add_delayed_iput(struct inode *inode) ASSERT(list_empty(&binode->delayed_iput)); list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); spin_unlock(&fs_info->delayed_iput_lock); + if (!test_bit(BTRFS_FS_CLEANER_RUNNING, &fs_info->flags)) + wake_up_process(fs_info->cleaner_kthread); } void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) -- 2.14.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 0/3] Delayed iput fixes @ 2018-11-21 19:09 Josef Bacik 2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw) To: linux-btrfs, kernel-team Here are some delayed iput fixes. Delayed iputs can hold reservations for a while and there's no real good way to make sure they were gone for good, which means we could early enospc when in reality if we had just waited for the iput we would have had plenty of space. So fix this up by making us wait for delayed iputs when deciding if we need to commit for enospc flushing, and then cleanup and rework how we run delayed iputs to make it more straightforward to wait on them and make sure we're all done using them. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik @ 2018-11-21 19:09 ` Josef Bacik 2018-11-27 8:26 ` Nikolay Borisov 0 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-11-21 19:09 UTC (permalink / raw) To: linux-btrfs, kernel-team The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. The cleaner thread only gets woken up every 30 seconds, so instead wake it up to do it's work so that we can free up that space as quickly as possible. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3da9ac463344..3c42d8887183 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3264,6 +3264,7 @@ void btrfs_add_delayed_iput(struct inode *inode) ASSERT(list_empty(&binode->delayed_iput)); list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); spin_unlock(&fs_info->delayed_iput_lock); + wake_up_process(fs_info->cleaner_kthread); } void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) -- 2.14.3 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik @ 2018-11-27 8:26 ` Nikolay Borisov 2018-11-27 19:54 ` Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Nikolay Borisov @ 2018-11-27 8:26 UTC (permalink / raw) To: Josef Bacik, linux-btrfs, kernel-team On 21.11.18 г. 21:09 ч., Josef Bacik wrote: > The cleaner thread usually takes care of delayed iputs, with the > exception of the btrfs_end_transaction_throttle path. The cleaner > thread only gets woken up every 30 seconds, so instead wake it up to do > it's work so that we can free up that space as quickly as possible. Have you done any measurements how this affects the overall system. I suspect this introduces a lot of noise since now we are going to be doing a thread wakeup on every iput, does this give a chance to have nice, large batches of iputs that the cost of wake up can be amortized across? > > Reviewed-by: Filipe Manana <fdmanana@suse.com> > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/inode.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 3da9ac463344..3c42d8887183 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3264,6 +3264,7 @@ void btrfs_add_delayed_iput(struct inode *inode) > ASSERT(list_empty(&binode->delayed_iput)); > list_add_tail(&binode->delayed_iput, &fs_info->delayed_iputs); > spin_unlock(&fs_info->delayed_iput_lock); > + wake_up_process(fs_info->cleaner_kthread); > } > > void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info) > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-27 8:26 ` Nikolay Borisov @ 2018-11-27 19:54 ` Josef Bacik 2018-11-27 19:59 ` Chris Mason 0 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-11-27 19:54 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: > > > On 21.11.18 г. 21:09 ч., Josef Bacik wrote: > > The cleaner thread usually takes care of delayed iputs, with the > > exception of the btrfs_end_transaction_throttle path. The cleaner > > thread only gets woken up every 30 seconds, so instead wake it up to do > > it's work so that we can free up that space as quickly as possible. > > Have you done any measurements how this affects the overall system. I > suspect this introduces a lot of noise since now we are going to be > doing a thread wakeup on every iput, does this give a chance to have > nice, large batches of iputs that the cost of wake up can be amortized > across? I ran the whole patchset with our A/B testing stuff and the patchset was a 5% win overall, so I'm inclined to think it's fine. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-27 19:54 ` Josef Bacik @ 2018-11-27 19:59 ` Chris Mason 2018-11-27 20:08 ` Josef Bacik 0 siblings, 1 reply; 21+ messages in thread From: Chris Mason @ 2018-11-27 19:59 UTC (permalink / raw) To: Josef Bacik; +Cc: Nikolay Borisov, linux-btrfs, Kernel Team On 27 Nov 2018, at 14:54, Josef Bacik wrote: > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: >> >> >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: >>> The cleaner thread usually takes care of delayed iputs, with the >>> exception of the btrfs_end_transaction_throttle path. The cleaner >>> thread only gets woken up every 30 seconds, so instead wake it up to >>> do >>> it's work so that we can free up that space as quickly as possible. >> >> Have you done any measurements how this affects the overall system. I >> suspect this introduces a lot of noise since now we are going to be >> doing a thread wakeup on every iput, does this give a chance to have >> nice, large batches of iputs that the cost of wake up can be >> amortized >> across? > > I ran the whole patchset with our A/B testing stuff and the patchset > was a 5% > win overall, so I'm inclined to think it's fine. Thanks, It's a good point though, a delayed wakeup may be less overhead. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-27 19:59 ` Chris Mason @ 2018-11-27 20:08 ` Josef Bacik 2018-11-28 19:06 ` David Sterba 0 siblings, 1 reply; 21+ messages in thread From: Josef Bacik @ 2018-11-27 20:08 UTC (permalink / raw) To: Chris Mason; +Cc: Josef Bacik, Nikolay Borisov, linux-btrfs, Kernel Team On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote: > On 27 Nov 2018, at 14:54, Josef Bacik wrote: > > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: > >> > >> > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: > >>> The cleaner thread usually takes care of delayed iputs, with the > >>> exception of the btrfs_end_transaction_throttle path. The cleaner > >>> thread only gets woken up every 30 seconds, so instead wake it up to > >>> do > >>> it's work so that we can free up that space as quickly as possible. > >> > >> Have you done any measurements how this affects the overall system. I > >> suspect this introduces a lot of noise since now we are going to be > >> doing a thread wakeup on every iput, does this give a chance to have > >> nice, large batches of iputs that the cost of wake up can be > >> amortized > >> across? > > > > I ran the whole patchset with our A/B testing stuff and the patchset > > was a 5% > > win overall, so I'm inclined to think it's fine. Thanks, > > It's a good point though, a delayed wakeup may be less overhead. Sure, but how do we go about that without it sometimes messing up? In practice the only time we're doing this is at the end of finish_ordered_io, so likely to not be a constant stream. I suppose since we have places where we force it to run that we don't really need this. IDK, I'm fine with dropping it. Thanks, Josef ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-27 20:08 ` Josef Bacik @ 2018-11-28 19:06 ` David Sterba 2018-11-28 19:32 ` Chris Mason 2018-11-28 20:08 ` Filipe Manana 0 siblings, 2 replies; 21+ messages in thread From: David Sterba @ 2018-11-28 19:06 UTC (permalink / raw) To: Josef Bacik; +Cc: Chris Mason, Nikolay Borisov, linux-btrfs, Kernel Team On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote: > On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote: > > On 27 Nov 2018, at 14:54, Josef Bacik wrote: > > > > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: > > >> > > >> > > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: > > >>> The cleaner thread usually takes care of delayed iputs, with the > > >>> exception of the btrfs_end_transaction_throttle path. The cleaner > > >>> thread only gets woken up every 30 seconds, so instead wake it up to > > >>> do > > >>> it's work so that we can free up that space as quickly as possible. > > >> > > >> Have you done any measurements how this affects the overall system. I > > >> suspect this introduces a lot of noise since now we are going to be > > >> doing a thread wakeup on every iput, does this give a chance to have > > >> nice, large batches of iputs that the cost of wake up can be > > >> amortized > > >> across? > > > > > > I ran the whole patchset with our A/B testing stuff and the patchset > > > was a 5% > > > win overall, so I'm inclined to think it's fine. Thanks, > > > > It's a good point though, a delayed wakeup may be less overhead. > > Sure, but how do we go about that without it sometimes messing up? In practice > the only time we're doing this is at the end of finish_ordered_io, so likely to > not be a constant stream. I suppose since we have places where we force it to > run that we don't really need this. IDK, I'm fine with dropping it. Thanks, The transaction thread wakes up cleaner periodically (commit interval, 30s by default), so the time to process iputs is not unbounded. I have the same concerns as Nikolay, coupling the wakeup to all delayed iputs could result in smaller batches. But some of the callers of btrfs_add_delayed_iput might benefit from the extra wakeup, like btrfs_remove_block_group, so I don't want to leave the idea yet. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-28 19:06 ` David Sterba @ 2018-11-28 19:32 ` Chris Mason 2018-11-28 20:08 ` Filipe Manana 1 sibling, 0 replies; 21+ messages in thread From: Chris Mason @ 2018-11-28 19:32 UTC (permalink / raw) To: David Sterba; +Cc: Josef Bacik, Nikolay Borisov, linux-btrfs, Kernel Team On 28 Nov 2018, at 14:06, David Sterba wrote: > On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote: >> On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote: >>> On 27 Nov 2018, at 14:54, Josef Bacik wrote: >>> >>>> On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: >>>>> >>>>> >>>>> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: >>>>>> The cleaner thread usually takes care of delayed iputs, with the >>>>>> exception of the btrfs_end_transaction_throttle path. The >>>>>> cleaner >>>>>> thread only gets woken up every 30 seconds, so instead wake it up >>>>>> to >>>>>> do >>>>>> it's work so that we can free up that space as quickly as >>>>>> possible. >>>>> >>>>> Have you done any measurements how this affects the overall >>>>> system. I >>>>> suspect this introduces a lot of noise since now we are going to >>>>> be >>>>> doing a thread wakeup on every iput, does this give a chance to >>>>> have >>>>> nice, large batches of iputs that the cost of wake up can be >>>>> amortized >>>>> across? >>>> >>>> I ran the whole patchset with our A/B testing stuff and the >>>> patchset >>>> was a 5% >>>> win overall, so I'm inclined to think it's fine. Thanks, >>> >>> It's a good point though, a delayed wakeup may be less overhead. >> >> Sure, but how do we go about that without it sometimes messing up? >> In practice >> the only time we're doing this is at the end of finish_ordered_io, so >> likely to >> not be a constant stream. I suppose since we have places where we >> force it to >> run that we don't really need this. IDK, I'm fine with dropping it. >> Thanks, > > The transaction thread wakes up cleaner periodically (commit interval, > 30s by default), so the time to process iputs is not unbounded. > > I have the same concerns as Nikolay, coupling the wakeup to all > delayed > iputs could result in smaller batches. But some of the callers of > btrfs_add_delayed_iput might benefit from the extra wakeup, like > btrfs_remove_block_group, so I don't want to leave the idea yet. Yeah, I love the idea, I'm just worried about a tiny bit of rate limiting. Since we're only waking up a fixed process and not creating new work queue items every time, it's probably fine, but a delay of HZ/2 would probably be even better. -chris ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-28 19:06 ` David Sterba 2018-11-28 19:32 ` Chris Mason @ 2018-11-28 20:08 ` Filipe Manana 2018-11-29 0:30 ` Qu Wenruo 1 sibling, 1 reply; 21+ messages in thread From: Filipe Manana @ 2018-11-28 20:08 UTC (permalink / raw) To: dsterba, Josef Bacik, Chris Mason, Nikolay Borisov, linux-btrfs, Kernel-team On Wed, Nov 28, 2018 at 7:09 PM David Sterba <dsterba@suse.cz> wrote: > > On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote: > > On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote: > > > On 27 Nov 2018, at 14:54, Josef Bacik wrote: > > > > > > > On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: > > > >> > > > >> > > > >> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: > > > >>> The cleaner thread usually takes care of delayed iputs, with the > > > >>> exception of the btrfs_end_transaction_throttle path. The cleaner > > > >>> thread only gets woken up every 30 seconds, so instead wake it up to > > > >>> do > > > >>> it's work so that we can free up that space as quickly as possible. > > > >> > > > >> Have you done any measurements how this affects the overall system. I > > > >> suspect this introduces a lot of noise since now we are going to be > > > >> doing a thread wakeup on every iput, does this give a chance to have > > > >> nice, large batches of iputs that the cost of wake up can be > > > >> amortized > > > >> across? > > > > > > > > I ran the whole patchset with our A/B testing stuff and the patchset > > > > was a 5% > > > > win overall, so I'm inclined to think it's fine. Thanks, > > > > > > It's a good point though, a delayed wakeup may be less overhead. > > > > Sure, but how do we go about that without it sometimes messing up? In practice > > the only time we're doing this is at the end of finish_ordered_io, so likely to > > not be a constant stream. I suppose since we have places where we force it to > > run that we don't really need this. IDK, I'm fine with dropping it. Thanks, > > The transaction thread wakes up cleaner periodically (commit interval, > 30s by default), so the time to process iputs is not unbounded. > > I have the same concerns as Nikolay, coupling the wakeup to all delayed > iputs could result in smaller batches. But some of the callers of > btrfs_add_delayed_iput might benefit from the extra wakeup, like > btrfs_remove_block_group, so I don't want to leave the idea yet. I'm curious, why do you think it would benefit btrfs_remove_block_group()? -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput 2018-11-28 20:08 ` Filipe Manana @ 2018-11-29 0:30 ` Qu Wenruo 0 siblings, 0 replies; 21+ messages in thread From: Qu Wenruo @ 2018-11-29 0:30 UTC (permalink / raw) To: fdmanana, dsterba, Josef Bacik, Chris Mason, Nikolay Borisov, linux-btrfs, Kernel-team On 2018/11/29 上午4:08, Filipe Manana wrote: > On Wed, Nov 28, 2018 at 7:09 PM David Sterba <dsterba@suse.cz> wrote: >> >> On Tue, Nov 27, 2018 at 03:08:08PM -0500, Josef Bacik wrote: >>> On Tue, Nov 27, 2018 at 07:59:42PM +0000, Chris Mason wrote: >>>> On 27 Nov 2018, at 14:54, Josef Bacik wrote: >>>> >>>>> On Tue, Nov 27, 2018 at 10:26:15AM +0200, Nikolay Borisov wrote: >>>>>> >>>>>> >>>>>> On 21.11.18 г. 21:09 ч., Josef Bacik wrote: >>>>>>> The cleaner thread usually takes care of delayed iputs, with the >>>>>>> exception of the btrfs_end_transaction_throttle path. The cleaner >>>>>>> thread only gets woken up every 30 seconds, so instead wake it up to >>>>>>> do >>>>>>> it's work so that we can free up that space as quickly as possible. >>>>>> >>>>>> Have you done any measurements how this affects the overall system. I >>>>>> suspect this introduces a lot of noise since now we are going to be >>>>>> doing a thread wakeup on every iput, does this give a chance to have >>>>>> nice, large batches of iputs that the cost of wake up can be >>>>>> amortized >>>>>> across? >>>>> >>>>> I ran the whole patchset with our A/B testing stuff and the patchset >>>>> was a 5% >>>>> win overall, so I'm inclined to think it's fine. Thanks, >>>> >>>> It's a good point though, a delayed wakeup may be less overhead. >>> >>> Sure, but how do we go about that without it sometimes messing up? In practice >>> the only time we're doing this is at the end of finish_ordered_io, so likely to >>> not be a constant stream. I suppose since we have places where we force it to >>> run that we don't really need this. IDK, I'm fine with dropping it. Thanks, >> >> The transaction thread wakes up cleaner periodically (commit interval, >> 30s by default), so the time to process iputs is not unbounded. >> >> I have the same concerns as Nikolay, coupling the wakeup to all delayed >> iputs could result in smaller batches. But some of the callers of >> btrfs_add_delayed_iput might benefit from the extra wakeup, like >> btrfs_remove_block_group, so I don't want to leave the idea yet. > > I'm curious, why do you think it would benefit btrfs_remove_block_group()? Just as Filipe said, I'm not sure why btrfs_remove_block_group() would get some benefit from more frequent cleaner thread wake up. For an empty block group to really be removed, it also needs to have 0 pinned bytenr, which is only possible after a transaction get committed. Thanks, Qu ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2019-01-11 15:21 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-03 16:06 [PATCH 0/3][V2] Delayed iput fixes Josef Bacik 2018-12-03 16:06 ` [PATCH 1/3] btrfs: run delayed iputs before committing Josef Bacik 2018-12-04 9:03 ` Nikolay Borisov 2018-12-03 16:06 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik 2018-12-04 9:21 ` Nikolay Borisov 2018-12-04 18:18 ` Josef Bacik 2018-12-03 16:06 ` [PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue Josef Bacik 2018-12-04 11:46 ` Nikolay Borisov 2018-12-04 18:21 ` Josef Bacik 2019-01-08 15:34 ` [PATCH 0/3][V2] Delayed iput fixes David Sterba 2019-01-11 15:09 ` Josef Bacik -- strict thread matches above, loose matches on Subject: below -- 2019-01-11 15:21 [PATCH 0/3][V3] " Josef Bacik 2019-01-11 15:21 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik 2018-11-21 19:09 [PATCH 0/3] Delayed iput fixes Josef Bacik 2018-11-21 19:09 ` [PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput Josef Bacik 2018-11-27 8:26 ` Nikolay Borisov 2018-11-27 19:54 ` Josef Bacik 2018-11-27 19:59 ` Chris Mason 2018-11-27 20:08 ` Josef Bacik 2018-11-28 19:06 ` David Sterba 2018-11-28 19:32 ` Chris Mason 2018-11-28 20:08 ` Filipe Manana 2018-11-29 0:30 ` Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).