linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit
@ 2021-09-02 16:44 Shaoying Xu
  2021-09-02 16:44 ` [PATCH 1/1] " Shaoying Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Shaoying Xu @ 2021-09-02 16:44 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel, benh, shaoyi

Description
===========
Ext4 computes the next schedule time for lazy inode table initialization by using
jiffies to measure how long one request takes to zero out an inode table. This
makes the wait time effectively dependent on CONFIG_HZ, which is undesirable. As a
result, we have observed fairly long initialization delays on server systems with
HZ=100. We therefore propose measuring the time in a more granular unit when
calculating the next schedule time.
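
To illustrate the HZ dependence described above, here is a minimal user-space
sketch in C. It is not the kernel code. The helper names old_wait_ns() and
new_wait_ns() are hypothetical, and the sketch assumes that HZ divides one
second evenly (as 100 and 250 do) and that the nanosecond-to-jiffies
conversion rounds down.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Old scheme (sketch): the elapsed time is counted in whole jiffies and
 * scaled by the wait multiplier. Returns the resulting wait in nanoseconds
 * of real time, which depends on the tick length 1/hz.
 */
static uint64_t old_wait_ns(uint64_t elapsed_jiffies, unsigned int mult,
			    unsigned int hz)
{
	uint64_t tick_ns = NSEC_PER_SEC / hz;

	return elapsed_jiffies * mult * tick_ns;
}

/*
 * New scheme (sketch): the elapsed time is measured in nanoseconds, scaled,
 * and only then converted to whole jiffies (rounded down, an assumption).
 * Returns the resulting wait in nanoseconds of real time.
 */
static uint64_t new_wait_ns(uint64_t elapsed_ns, unsigned int mult,
			    unsigned int hz)
{
	uint64_t tick_ns = NSEC_PER_SEC / hz;
	uint64_t timeout_jiffies = (elapsed_ns * mult) / tick_ns;

	return timeout_jiffies * tick_ns;
}
```

As an example under assumed values (a multiplier of 10 and an elapsed time
that the old code sees as one jiffy), old_wait_ns(1, 10, 100) gives 100 ms
while old_wait_ns(1, 10, 250) gives 40 ms. With an assumed 3 ms of real
elapsed time, new_wait_ns(3000000, 10, 100) gives 30 ms and
new_wait_ns(3000000, 10, 250) gives 28 ms, so the wait no longer scales
with the tick length.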

Test
====
Tested the patch on stable kernel 5.10 with 2T and 3T FS volumes on EC2
instances. Before the fix, instances with HZ=250 finished lazy initialization
about 2.4x faster than instances with HZ=100.
After the fix, both finished in approximately the same time.

Patch
=====
Shaoying Xu (1):
  ext4: fix lazy initialization next schedule time computation in more
    granular unit

 fs/ext4/super.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
2.16.6



* [PATCH 1/1] ext4: fix lazy initialization next schedule time computation in more granular unit
  2021-09-02 16:44 [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit Shaoying Xu
@ 2021-09-02 16:44 ` Shaoying Xu
  2021-09-20 19:56 ` [PATCH 0/1] [RESEND] " Shaoying Xu
  2021-10-07 14:21 ` Theodore Ts'o
  2 siblings, 0 replies; 4+ messages in thread
From: Shaoying Xu @ 2021-09-02 16:44 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel, benh, shaoyi

Ext4 sets up lazy inode table initialization by default once a file system
is mounted. However, the next schedule time is computed in jiffies, so the
timeout is the same number of jiffies but a different amount of real time
under different HZ values. Fix this by measuring the elapsed time in a more
granular unit, nanoseconds, so that the next schedule time is independent
of the HZ value.

Fixes: bfff68738f1c ("ext4: add support for lazy inode table initialization")
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Cc: stable@vger.kernel.org
---
 fs/ext4/super.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dfa09a277b56..399cbe9c14a8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3421,9 +3421,9 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	struct super_block *sb = elr->lr_super;
 	ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
 	ext4_group_t group = elr->lr_next_group;
-	unsigned long timeout = 0;
 	unsigned int prefetch_ios = 0;
 	int ret = 0;
+	u64 start_time;
 
 	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
 		elr->lr_next_group = ext4_mb_prefetch(sb, group,
@@ -3460,14 +3460,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		ret = 1;
 
 	if (!ret) {
-		timeout = jiffies;
+		start_time = ktime_get_real_ns();
 		ret = ext4_init_inode_table(sb, group,
 					    elr->lr_timeout ? 0 : 1);
 		trace_ext4_lazy_itable_init(sb, group);
 		if (elr->lr_timeout == 0) {
-			timeout = (jiffies - timeout) *
-				EXT4_SB(elr->lr_super)->s_li_wait_mult;
-			elr->lr_timeout = timeout;
+			elr->lr_timeout = nsecs_to_jiffies((ktime_get_real_ns() - start_time) *
+				EXT4_SB(elr->lr_super)->s_li_wait_mult);
 		}
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
-- 
2.16.6



* Re: [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit
  2021-09-02 16:44 [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit Shaoying Xu
  2021-09-02 16:44 ` [PATCH 1/1] " Shaoying Xu
@ 2021-09-20 19:56 ` Shaoying Xu
  2021-10-07 14:21 ` Theodore Ts'o
  2 siblings, 0 replies; 4+ messages in thread
From: Shaoying Xu @ 2021-09-20 19:56 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: linux-ext4, linux-kernel, benh, shaoyi

Here is more context and testing detail:

This issue was originally identified on Amazon Linux 2 with kernel 5.10, where
CONFIG_HZ is 250 on x86_64 but 100 on arm64. It can be reproduced by launching
c5.2xlarge (x86_64) and c6g.2xlarge (arm64) EC2 instances and measuring the time
the ext4lazyinit thread takes to finish after mounting the ext4 FS.

w/o fix in kernel 5.10
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 1842 secs   | 743 secs   |
|----------------+-------------+------------|
| 3T             | 2690 secs   | 1110 secs  |
|----------------+-------------+------------|

w/ fix in kernel 5.10
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 660 secs    | 544 secs   |
|----------------+-------------+------------|
| 3T             | 1053 secs   | 932 secs   |
|----------------+-------------+------------|
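
The ratios implied by the tables above can be checked with a small helper.
This is only a sketch: ratio_permille() is a hypothetical name, and the
inputs are the times in seconds from the tables.

```c
#include <stdint.h>

/*
 * Ratio of two durations in thousandths, truncated toward zero, so that
 * 2479 means the first duration is about 2.479 times the second.
 */
static uint32_t ratio_permille(uint32_t slower_secs, uint32_t faster_secs)
{
	return (uint32_t)(((uint64_t)slower_secs * 1000) / faster_secs);
}
```

Without the fix, ratio_permille(1842, 743) is 2479 and
ratio_permille(2690, 1110) is 2423, so c6g.2xlarge took about 2.4x to 2.5x
as long as c5.2xlarge. With the fix, ratio_permille(660, 544) is 1213 and
ratio_permille(1053, 932) is 1129, so the gap shrinks to about 1.1x to 1.2x.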



* Re: [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit
  2021-09-02 16:44 [PATCH 0/1] [RESEND] ext4: fix lazy initialization next schedule time computation in more granular unit Shaoying Xu
  2021-09-02 16:44 ` [PATCH 1/1] " Shaoying Xu
  2021-09-20 19:56 ` [PATCH 0/1] [RESEND] " Shaoying Xu
@ 2021-10-07 14:21 ` Theodore Ts'o
  2 siblings, 0 replies; 4+ messages in thread
From: Theodore Ts'o @ 2021-10-07 14:21 UTC (permalink / raw)
  To: adilger.kernel, Shaoying Xu
  Cc: Theodore Ts'o, linux-kernel, linux-ext4, benh

On Thu, 2 Sep 2021 16:44:11 +0000, Shaoying Xu wrote:
> Description
> ===========
> Ext4 FS has inappropriate implementations on the next schedule time calculation
> that use jiffies to measure the time for one request to zero out inode table. This
> actually makes the wait time effectively dependent on CONFIG_HZ, which is
> undesirable. We have observed on server systems with 100HZ some fairly long delays
> in initialization as a result. Therefore, we propose to use more granular unit to
> calculate the next schedule time.
> 
> [...]

Applied, thanks!

[1/1] ext4: fix lazy initialization next schedule time computation in more granular unit
      commit: 3782027982881d2c1105ffe058aecb69cc780dfa

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

