linux-kernel.vger.kernel.org archive mirror
* [PATCH block 1/2] writeback, cgroup: Adjust WB_FRN_TIME_CUT_DIV to accelerate foreign inode switching
@ 2019-08-02 19:07 Tejun Heo
From: Tejun Heo @ 2019-08-02 19:07 UTC
  To: Jens Axboe, Jan Kara; +Cc: linux-block, kernel-team, linux-kernel

WB_FRN_TIME_CUT_DIV is used to tell the foreign inode detection logic
to ignore short writeback rounds so that it doesn't get confused by a
burst of short writebacks.  The parameter is currently 2, meaning that
anything shorter than half of the running average writeback duration
is ignored.

This is unnecessarily aggressive.  The detection logic uses 16 history
slots and is already reasonably protected against short bursts
confusing it, while the current parameter can lead to tens of seconds
of missed detection depending on the writeback pattern.

Let's change the parameter to 8, so that it only ignores writeback
rounds that are shorter than 12.5% of the current running average.
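To make the effect of the divisor concrete, here is a minimal userspace sketch of the filter. It assumes the running average is updated with the shift-based formula from the WB_FRN_TIME_AVG_SHIFT comment (avg = avg * 7/8 + new * 1/8) and that each round is then compared against avg / WB_FRN_TIME_CUT_DIV. The helper frn_round_counts() is hypothetical and is not the kernel's wbc_detach_inode().

```c
#include <assert.h>
#include <stdbool.h>

#define WB_FRN_TIME_AVG_SHIFT	3	/* avg = avg * 7/8 + new * 1/8 */

/*
 * Hypothetical helper: fold one writeback round (in fixed-point time
 * units) into the running average and report whether the round is long
 * enough to be recorded in the foreign-detection history.
 */
static bool frn_round_counts(unsigned long *avg, unsigned long round,
			     unsigned long cut_div)
{
	assert(cut_div > 0);

	if (!*avg)
		*avg = round;	/* first round seeds the average */
	else	/* unsigned wraparound cancels out, leaving avg*7/8 + round/8 */
		*avg += (round >> WB_FRN_TIME_AVG_SHIFT) -
			(*avg >> WB_FRN_TIME_AVG_SHIFT);

	/* rounds shorter than avg / cut_div are ignored */
	return round >= *avg / cut_div;
}
```

Starting from an average of 8192 units (1s with WB_FRN_TIME_SHIFT = 13), a round of 1638 units (about 20% of the average) moves the average to 7372. That round is dropped with a divisor of 2 (cut-off 3686) but recorded with a divisor of 8 (cut-off 921), while a 512-unit round stays below the new cut-off of 904.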

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -227,7 +227,7 @@ static void wb_wait_for_completion(struc
 /* parameters for foreign inode detection, see wb_detach_inode() */
 #define WB_FRN_TIME_SHIFT	13	/* 1s = 2^13, upto 8 secs w/ 16bit */
 #define WB_FRN_TIME_AVG_SHIFT	3	/* avg = avg * 7/8 + new * 1/8 */
-#define WB_FRN_TIME_CUT_DIV	2	/* ignore rounds < avg / 2 */
+#define WB_FRN_TIME_CUT_DIV	8	/* ignore rounds < avg / 8 */
 #define WB_FRN_TIME_PERIOD	(2 * (1 << WB_FRN_TIME_SHIFT))	/* 2s */
 
 #define WB_FRN_HIST_SLOTS	16	/* inode->i_wb_frn_history is 16bit */
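The constants in the hunk above use a fixed-point time unit in which 1s is 2^13 units. The sketch below uses hypothetical conversion helpers (not part of fs-writeback.c) to show why a 16-bit field covers just under 8s and why WB_FRN_TIME_PERIOD works out to 2s.

```c
#include <assert.h>
#include <stdint.h>

#define WB_FRN_TIME_SHIFT	13	/* 1s = 2^13 units */
#define WB_FRN_TIME_PERIOD	(2 * (1 << WB_FRN_TIME_SHIFT))	/* 2s */
#define WB_FRN_HIST_SLOTS	16	/* history is 16 bits wide */

/* Hypothetical helper: milliseconds -> fixed-point time units. */
static uint32_t frn_ms_to_units(uint32_t ms)
{
	return (uint32_t)(((uint64_t)ms << WB_FRN_TIME_SHIFT) / 1000);
}

/* Hypothetical helper: fixed-point time units -> milliseconds (truncated). */
static uint32_t frn_units_to_ms(uint32_t units)
{
	return (uint32_t)(((uint64_t)units * 1000) >> WB_FRN_TIME_SHIFT);
}
```

With these helpers, frn_units_to_ms(UINT16_MAX) evaluates to 7999, and each of the 16 history slots spans WB_FRN_TIME_PERIOD / WB_FRN_HIST_SLOTS = 1024 units, i.e. 125 ms.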


* [PATCH block 2/2] writeback, cgroup: inode_switch_wbs() shouldn't give up on wb_switch_rwsem trylock fail
@ 2019-08-02 19:08 Tejun Heo
From: Tejun Heo @ 2019-08-02 19:08 UTC
  To: Jens Axboe, Jan Kara; +Cc: linux-block, kernel-team, linux-kernel

As inode wb switching may make sync(2) miss some inodes, the two are
synchronized using wb_switch_rwsem so that no wb switching happens
while sync(2) is in progress.  In addition to synchronizing the actual
switching, the rwsem is also used to prevent queueing new switch
attempts while sync(2) is in progress.  This is to avoid queueing too
many instances while sync(2) holds the rwsem.  Unfortunately, this is
too aggressive and can block wb switching for a long time if sync(2)
is frequent.

The goal is to avoid exploding the number of scheduled switches, not
to avoid scheduling anything.  Let's use wb_switch_rwsem only for
synchronizing the actual switching against sync(2), and use
isw_nr_in_flight instead to limit the maximum number of scheduled
switches.  The limit is set to 1024, which should be more than enough
while still avoiding extreme situations.
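A minimal sketch of the replacement admission check, using C11 atomics in place of the kernel's atomic_t. It assumes the counter is decremented when a queued switch completes, modeled here by the hypothetical switch_done(). As in the patch, the check and the increment are separate operations, so the limit is a soft one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WB_FRN_MAX_IN_FLIGHT	1024	/* don't queue too many concurrently */

/* Hypothetical userspace stand-in for isw_nr_in_flight. */
static atomic_int isw_nr_in_flight;

static bool try_queue_switch(void)
{
	/* soft limit: the load and the later increment are not one atomic op */
	if (atomic_load(&isw_nr_in_flight) > WB_FRN_MAX_IN_FLIGHT)
		return false;
	/* ... allocate and queue the switch here ... */
	atomic_fetch_add(&isw_nr_in_flight, 1);
	return true;
}

/* Assumed counterpart: called once a queued switch has finished. */
static void switch_done(void)
{
	atomic_fetch_sub(&isw_nr_in_flight, 1);
}
```

Because the comparison is a strict >, a single thread that keeps calling try_queue_switch() gets 1025 switches admitted before the first rejection, and in this sketch concurrent callers could overshoot slightly further. Neither matters for a limit whose only job is to prevent runaway queueing.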

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 fs/fs-writeback.c |   17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -237,6 +237,7 @@ static void wb_wait_for_completion(struc
 					/* if foreign slots >= 8, switch */
 #define WB_FRN_HIST_MAX_SLOTS	(WB_FRN_HIST_THR_SLOTS / 2 + 1)
 					/* one round can affect upto 5 slots */
+#define WB_FRN_MAX_IN_FLIGHT	1024	/* don't queue too many concurrently */
 
 static atomic_t isw_nr_in_flight = ATOMIC_INIT(0);
 static struct workqueue_struct *isw_wq;
@@ -489,18 +490,13 @@ static void inode_switch_wbs(struct inod
 	if (inode->i_state & I_WB_SWITCH)
 		return;
 
-	/*
-	 * Avoid starting new switches while sync_inodes_sb() is in
-	 * progress.  Otherwise, if the down_write protected issue path
-	 * blocks heavily, we might end up starting a large number of
-	 * switches which will block on the rwsem.
-	 */
-	if (!down_read_trylock(&bdi->wb_switch_rwsem))
+	/* avoid queueing a new switch if too many are already in flight */
+	if (atomic_read(&isw_nr_in_flight) > WB_FRN_MAX_IN_FLIGHT)
 		return;
 
 	isw = kzalloc(sizeof(*isw), GFP_ATOMIC);
 	if (!isw)
-		goto out_unlock;
+		return;
 
 	/* find and pin the new wb */
 	rcu_read_lock();
@@ -534,15 +530,12 @@ static void inode_switch_wbs(struct inod
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
 
 	atomic_inc(&isw_nr_in_flight);
-
-	goto out_unlock;
+	return;
 
 out_free:
 	if (isw->new_wb)
 		wb_put(isw->new_wb);
 	kfree(isw);
-out_unlock:
-	up_read(&bdi->wb_switch_rwsem);
 }
 
 /**


* Re: [PATCH block 1/2] writeback, cgroup: Adjust WB_FRN_TIME_CUT_DIV to accelerate foreign inode switching
@ 2019-08-15 13:48 Jan Kara
From: Jan Kara @ 2019-08-15 13:48 UTC
  To: Tejun Heo; +Cc: Jens Axboe, Jan Kara, linux-block, kernel-team, linux-kernel

On Fri 02-08-19 12:07:38, Tejun Heo wrote:
> WB_FRN_TIME_CUT_DIV is used to tell the foreign inode detection logic
> to ignore short writeback rounds so that it doesn't get confused by a
> burst of short writebacks.  The parameter is currently 2, meaning that
> anything shorter than half of the running average writeback duration
> is ignored.
>
> This is unnecessarily aggressive.  The detection logic uses 16 history
> slots and is already reasonably protected against short bursts
> confusing it, while the current parameter can lead to tens of seconds
> of missed detection depending on the writeback pattern.
>
> Let's change the parameter to 8, so that it only ignores writeback
> rounds that are shorter than 12.5% of the current running average.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Makes sense to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH block 2/2] writeback, cgroup: inode_switch_wbs() shouldn't give up on wb_switch_rwsem trylock fail
@ 2019-08-15 13:53 Jan Kara
From: Jan Kara @ 2019-08-15 13:53 UTC
  To: Tejun Heo; +Cc: Jens Axboe, Jan Kara, linux-block, kernel-team, linux-kernel

On Fri 02-08-19 12:08:13, Tejun Heo wrote:
> As inode wb switching may make sync(2) miss some inodes, the two are
> synchronized using wb_switch_rwsem so that no wb switching happens
> while sync(2) is in progress.  In addition to synchronizing the actual
> switching, the rwsem is also used to prevent queueing new switch
> attempts while sync(2) is in progress.  This is to avoid queueing too
> many instances while sync(2) holds the rwsem.  Unfortunately, this is
> too aggressive and can block wb switching for a long time if sync(2)
> is frequent.
>
> The goal is to avoid exploding the number of scheduled switches, not
> to avoid scheduling anything.  Let's use wb_switch_rwsem only for
> synchronizing the actual switching against sync(2), and use
> isw_nr_in_flight instead to limit the maximum number of scheduled
> switches.  The limit is set to 1024, which should be more than enough
> while still avoiding extreme situations.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH v2 block 1/2] writeback, cgroup: Adjust WB_FRN_TIME_CUT_DIV to accelerate foreign inode switching
@ 2019-08-15 19:25 Tejun Heo
From: Tejun Heo @ 2019-08-15 19:25 UTC
  To: Jens Axboe, Jan Kara; +Cc: linux-block, kernel-team, linux-kernel

WB_FRN_TIME_CUT_DIV is used to tell the foreign inode detection logic
to ignore short writeback rounds so that it doesn't get confused by a
burst of short writebacks.  The parameter is currently 2, meaning that
anything shorter than half of the running average writeback duration
is ignored.

This is unnecessarily aggressive.  The detection logic uses 16 history
slots and is already reasonably protected against short bursts
confusing it, while the current parameter can lead to tens of seconds
of missed detection depending on the writeback pattern.

Let's change the parameter to 8, so that it only ignores writeback
rounds that are shorter than 12.5% of the current running average.

v2: Add comment explaining what's going on with the foreign detection
    parameters.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |   22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -224,10 +224,28 @@ static void wb_wait_for_completion(struc
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 
-/* parameters for foreign inode detection, see wb_detach_inode() */
+/*
+ * Parameters for foreign inode detection; see wbc_detach_inode() for
+ * how they're used.
+ *
+ * These parameters are inherently heuristic as the detection target
+ * itself is fuzzy.  All we want to do is detach an inode from the
+ * current owner if other cgroups are writing to it too much.
+ *
+ * The current cgroup writeback is built on the assumption that multiple
+ * cgroups writing to the same inode concurrently is very rare and not a
+ * well-supported mode of operation.  As such, the goal is to not take
+ * too long when a different cgroup takes over an inode, while avoiding
+ * overly aggressive flip-flops from occasional foreign writes.
+ *
+ * We record, very roughly, 2s worth of IO time history, and if more than
+ * half of that is foreign, we trigger the switch.  The recording is
+ * quantized to 16 slots.  To keep tiny writes from swinging the decision
+ * too much, rounds shorter than 1/8 of the running average are ignored.
+ */
 #define WB_FRN_TIME_SHIFT	13	/* 1s = 2^13, upto 8 secs w/ 16bit */
 #define WB_FRN_TIME_AVG_SHIFT	3	/* avg = avg * 7/8 + new * 1/8 */
-#define WB_FRN_TIME_CUT_DIV	2	/* ignore rounds < avg / 2 */
+#define WB_FRN_TIME_CUT_DIV	8	/* ignore rounds < avg / 8 */
 #define WB_FRN_TIME_PERIOD	(2 * (1 << WB_FRN_TIME_SHIFT))	/* 2s */
 
 #define WB_FRN_HIST_SLOTS	16	/* inode->i_wb_frn_history is 16bit */
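The new comment describes a roughly 2s history quantized to 16 slots with a majority vote. Below is a simplified userspace model of that bookkeeping. WB_FRN_HIST_THR_SLOTS and WB_FRN_HIST_MAX_SLOTS follow the "if foreign slots >= 8, switch" and "one round can affect upto 5 slots" comments in patch 2's context lines; the slot-unit macro, the round-to-slots arithmetic, and the helpers frn_record_round() and popcount16() are illustrative assumptions, not code copied from wbc_detach_inode().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WB_FRN_TIME_SHIFT	13
#define WB_FRN_TIME_PERIOD	(2 * (1 << WB_FRN_TIME_SHIFT))	/* 2s */
#define WB_FRN_HIST_SLOTS	16	/* 16-bit history */
#define WB_FRN_HIST_UNIT	(WB_FRN_TIME_PERIOD / WB_FRN_HIST_SLOTS) /* 1/8s */
#define WB_FRN_HIST_THR_SLOTS	(WB_FRN_HIST_SLOTS / 2)		/* 8 */
#define WB_FRN_HIST_MAX_SLOTS	(WB_FRN_HIST_THR_SLOTS / 2 + 1)	/* 5 */

/* Count the bits set in a 16-bit history word. */
static int popcount16(uint16_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical model: record one already-filtered round of 'time' units.
 * The round shifts in up to WB_FRN_HIST_MAX_SLOTS bits, set to 1 if the
 * round was dominated by a foreign cgroup.  Returns true when the foreign
 * slots reach the switch threshold.
 */
static bool frn_record_round(uint16_t *history, unsigned long time,
			     bool foreign)
{
	unsigned long slots = (time + WB_FRN_HIST_UNIT - 1) / WB_FRN_HIST_UNIT;

	if (slots > WB_FRN_HIST_MAX_SLOTS)
		slots = WB_FRN_HIST_MAX_SLOTS;

	*history = (uint16_t)(*history << slots);	/* old slots age out */
	if (foreign)
		*history |= (uint16_t)((1u << slots) - 1);

	return popcount16(*history) >= WB_FRN_HIST_THR_SLOTS;
}
```

In this model a 1s round is capped at five slots, so a single foreign round sets five bits and a second one brings the count to ten, crossing the eight-slot threshold. A native round in between would only shift the earlier bits left without adding new ones.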


* Re: [PATCH v2 block 1/2] writeback, cgroup: Adjust WB_FRN_TIME_CUT_DIV to accelerate foreign inode switching
@ 2019-08-15 19:31 Jens Axboe
From: Jens Axboe @ 2019-08-15 19:31 UTC
  To: Tejun Heo, Jan Kara; +Cc: linux-block, kernel-team, linux-kernel

On 8/15/19 1:25 PM, Tejun Heo wrote:
> WB_FRN_TIME_CUT_DIV is used to tell the foreign inode detection logic
> to ignore short writeback rounds so that it doesn't get confused by a
> burst of short writebacks.  The parameter is currently 2, meaning that
> anything shorter than half of the running average writeback duration
> is ignored.
>
> This is unnecessarily aggressive.  The detection logic uses 16 history
> slots and is already reasonably protected against short bursts
> confusing it, while the current parameter can lead to tens of seconds
> of missed detection depending on the writeback pattern.
>
> Let's change the parameter to 8, so that it only ignores writeback
> rounds that are shorter than 12.5% of the current running average.
> 
> v2: Add comment explaining what's going on with the foreign detection
>      parameters.

Applied 1-2 for 5.4, thanks.

-- 
Jens Axboe


