linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] ext4: fix lazy initialization next schedule time computation in more granular unit
@ 2021-08-17 22:56 Shaoying Xu
  2021-08-17 22:56 ` [PATCH 1/1] " Shaoying Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Shaoying Xu @ 2021-08-17 22:56 UTC
  To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel, fllinden, benh, shaoyi

Description
===========
The ext4 lazy initialization thread computes its next schedule time from the
time one request takes to zero out an inode table. In the current
implementation every time-related value is in jiffies, and the current value
of jiffies is used to measure the interval. Around 11 years ago, Lukas
reported that “zeroing one single group takes approx. 28ms without any load”.
With today's hardware and software this number has become much smaller, often
less than 1 jiffy, which makes the computed next schedule time inaccurate.
This shows up on Amazon Linux 2 on x86_64 and arm64, which use different HZ
values (x86_64: 250, arm64: 100): the sleep time is 10 jiffies on both
architectures given the default multiplier of 10, but 10 jiffies correspond
to different amounts of real time (40 ms at HZ=250, 100 ms at HZ=100), which
noticeably degrades the performance of the ext4lazyinit thread.
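
The arithmetic above can be sketched in user space. This is only an
illustrative model, not kernel code: model_nsecs_to_jiffies() and
model_jiffies_to_ms() are hypothetical stand-ins for the kernel helpers, the
model assumes HZ divides one second evenly (true for 100 and 250), and the
elapsed times used below are assumed values rather than measurements.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical user-space stand-in for the kernel's nsecs_to_jiffies().
 * Assumes hz divides NSEC_PER_SEC evenly and truncates toward zero.
 */
static uint64_t model_nsecs_to_jiffies(uint64_t ns, unsigned int hz)
{
	assert(hz > 0 && NSEC_PER_SEC % hz == 0);
	return ns / (NSEC_PER_SEC / hz);
}

/* Real-time length, in milliseconds, of a timeout given in jiffies. */
static uint64_t model_jiffies_to_ms(uint64_t j, unsigned int hz)
{
	assert(hz > 0);
	return j * 1000 / hz;
}

/* Old scheme: elapsed time is counted in whole jiffies, then multiplied. */
static uint64_t old_timeout_jiffies(uint64_t elapsed_jiffies, unsigned int mult)
{
	return elapsed_jiffies * mult;
}

/* Patched scheme: elapsed time in ns is multiplied, then converted. */
static uint64_t new_timeout_jiffies(uint64_t elapsed_ns, unsigned int mult,
				    unsigned int hz)
{
	return model_nsecs_to_jiffies(elapsed_ns * mult, hz);
}
```

With an assumed measurement of 1 jiffy, old_timeout_jiffies(1, 10) is 10 on
both configurations, which is 40 ms at HZ=250 and 100 ms at HZ=100. With an
assumed elapsed time of 2 ms, new_timeout_jiffies(2000000, 10, hz) gives
5 jiffies at HZ=250 and 2 jiffies at HZ=100, and both correspond to 20 ms.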

Test
====
Tested the patch on stable kernel 5.10 with 2T and 3T file system volumes on
EC2 x86_64 and arm64 instances. Before the fix, x86_64 instances (HZ=250)
finished lazy initialization around 2.4x faster than arm64 instances
(HZ=100). After the fix, both finished in approximately the same time.
Changing the HZ value on arm64 to 250 then made no obvious difference in
completion time.

Patch
=====
Shaoying Xu (1):
  ext4: fix lazy initialization next schedule time computation in more
    granular unit

 fs/ext4/super.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
2.16.6



* [PATCH 1/1] ext4: fix lazy initialization next schedule time computation in more granular unit
  2021-08-17 22:56 [PATCH 0/1] ext4: fix lazy initialization next schedule time computation in more granular unit Shaoying Xu
@ 2021-08-17 22:56 ` Shaoying Xu
  2021-10-14  9:56   ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Shaoying Xu @ 2021-08-17 22:56 UTC
  To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel, fllinden, benh, shaoyi

The ext4 file system sets up lazy inode table initialization by default once
it is mounted. However, the next schedule time is computed in jiffies, so
the timeout is the same number of jiffies but a different amount of real
time under different HZ values. Fix this by measuring the elapsed time in a
more granular unit, nanoseconds, so that the next schedule time is
independent of the HZ value.

Fixes: bfff68738f1c ("ext4: add support for lazy inode table initialization")
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Cc: stable@vger.kernel.org
---
 fs/ext4/super.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dfa09a277b56..399cbe9c14a8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3421,9 +3421,9 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	struct super_block *sb = elr->lr_super;
 	ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
 	ext4_group_t group = elr->lr_next_group;
-	unsigned long timeout = 0;
 	unsigned int prefetch_ios = 0;
 	int ret = 0;
+	u64 start_time;
 
 	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
 		elr->lr_next_group = ext4_mb_prefetch(sb, group,
@@ -3460,14 +3460,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		ret = 1;
 
 	if (!ret) {
-		timeout = jiffies;
+		start_time = ktime_get_real_ns();
 		ret = ext4_init_inode_table(sb, group,
 					    elr->lr_timeout ? 0 : 1);
 		trace_ext4_lazy_itable_init(sb, group);
 		if (elr->lr_timeout == 0) {
-			timeout = (jiffies - timeout) *
-				EXT4_SB(elr->lr_super)->s_li_wait_mult;
-			elr->lr_timeout = timeout;
+			elr->lr_timeout = nsecs_to_jiffies((ktime_get_real_ns() - start_time) *
+				EXT4_SB(elr->lr_super)->s_li_wait_mult);
 		}
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
-- 
2.16.6



* Re: [PATCH 1/1] ext4: fix lazy initialization next schedule time computation in more granular unit
  2021-08-17 22:56 ` [PATCH 1/1] " Shaoying Xu
@ 2021-10-14  9:56   ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2021-10-14  9:56 UTC
  To: Shaoying Xu
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, fllinden, benh

On Tue 17-08-21 22:56:54, Shaoying Xu wrote:
> The ext4 file system sets up lazy inode table initialization by default once
> it is mounted. However, the next schedule time is computed in jiffies, so
> the timeout is the same number of jiffies but a different amount of real
> time under different HZ values. Fix this by measuring the elapsed time in a
> more granular unit, nanoseconds, so that the next schedule time is
> independent of the HZ value.
> 
> Fixes: bfff68738f1c ("ext4: add support for lazy inode table initialization")
> Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
> Cc: stable@vger.kernel.org

Thanks for the patch. It seems to have fallen through the cracks. It looks
good, just some nits: the timeout will still be dependent on the HZ value
because we use a jiffy-granular timer. But yes, I guess it is unnecessary
to make the imprecision 10x worse when we know we are likely dealing with
small numbers.
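
The point about imprecision can be sketched as follows. This is a simplified
user-space model with hypothetical helper names, not the kernel code: it
treats a jiffy-based measurement as plain truncation (ignoring where tick
boundaries actually fall) and assumes HZ divides one second evenly.
Truncating before multiplying can lose up to almost the multiplier's worth of
jiffies, while truncating after multiplying loses less than one jiffy.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Length of one jiffy in nanoseconds; assumes hz divides NSEC_PER_SEC. */
static uint64_t ns_per_jiffy(unsigned int hz)
{
	assert(hz > 0 && NSEC_PER_SEC % hz == 0);
	return NSEC_PER_SEC / hz;
}

/* Old ordering: truncate the elapsed time to jiffies, then multiply. */
static uint64_t timeout_quantize_first(uint64_t elapsed_ns, unsigned int mult,
				       unsigned int hz)
{
	return (elapsed_ns / ns_per_jiffy(hz)) * mult;
}

/* New ordering: multiply in nanoseconds, then truncate to jiffies. */
static uint64_t timeout_quantize_last(uint64_t elapsed_ns, unsigned int mult,
				      unsigned int hz)
{
	return (elapsed_ns * mult) / ns_per_jiffy(hz);
}

/*
 * Shortfall, in ns, between the ideal timeout (elapsed_ns * mult) and the
 * timeout actually produced. Both orderings truncate, so the produced
 * timeout never exceeds the ideal one.
 */
static uint64_t timeout_error_ns(uint64_t timeout_jiffies, uint64_t elapsed_ns,
				 unsigned int mult, unsigned int hz)
{
	uint64_t ideal = elapsed_ns * mult;
	uint64_t actual = timeout_jiffies * ns_per_jiffy(hz);

	assert(actual <= ideal);
	return ideal - actual;
}
```

For an assumed elapsed time of 3.9 ms at HZ=250 with multiplier 10, the first
ordering yields 0 jiffies (39 ms short of the ideal 39 ms), while the second
yields 9 jiffies (36 ms, 3 ms short, which is less than one 4 ms jiffy).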

> @@ -3460,14 +3460,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
>  		ret = 1;
>  
>  	if (!ret) {

Please add a comment here so that we don't forget. Like:
		/* Use ns-granular time as init can be really fast */

With this feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> -		timeout = jiffies;
> +		start_time = ktime_get_real_ns();
>  		ret = ext4_init_inode_table(sb, group,
>  					    elr->lr_timeout ? 0 : 1);
>  		trace_ext4_lazy_itable_init(sb, group);
>  		if (elr->lr_timeout == 0) {
> -			timeout = (jiffies - timeout) *
> -				EXT4_SB(elr->lr_super)->s_li_wait_mult;
> -			elr->lr_timeout = timeout;
> +			elr->lr_timeout = nsecs_to_jiffies((ktime_get_real_ns() - start_time) *
> +				EXT4_SB(elr->lr_super)->s_li_wait_mult);
>  		}
>  		elr->lr_next_sched = jiffies + elr->lr_timeout;
>  		elr->lr_next_group = group + 1;


								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 4+ messages
2021-08-17 22:56 [PATCH 0/1] ext4: fix lazy initialization next schedule time computation in more granular unit Shaoying Xu
2021-08-17 22:56 ` [PATCH 1/1] " Shaoying Xu
2021-10-14  9:56   ` Jan Kara
2021-09-02 16:44 [PATCH 0/1] [RESEND] " Shaoying Xu
2021-09-02 16:44 ` [PATCH 1/1] " Shaoying Xu