linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/5] ext4: get discard out of jbd2 commit context
@ 2021-07-24  7:41 Wang Jianchao
  2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

Hi all

This is the version 3 patch set that attempts to get discard out of the jbd2
commit kthread. When the user delete a lot data and cause discard flooding,
the jbd2 commit kthread can be blocked for very long time and then all of
the metadata operations are blocked due to no journal space.

The xfstest with following parameters,
MODULAR=0
TEST_DIR=/mnt/test
TEST_DEV=/dev/nbd37p1
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/nbd37p2
MOUNT_OPTIONS="-o discard"
has passed. The result is consistent w/ or w/o this patch set.

There are 5 patches,

Patch 1 ~ 3, there are no functional changes in them, but just some preparation
for following patches

Patch 4 introduces a async kworker to do discard in fstrim fation which implements
the core idea of this patch set.

Patch 5 let the fallocate retry when err is ENOSPC. This fix the generic/371

Any comments are welcome ;)

V2 -> V3
 - Get rid of the per block group rb tree which carries freed entry. It is not neccesary
   because we have done aggregation when wait for journal commit. Just use a list
   to carry the free entries.

V1 -> V2
 - free the blocks back to mb buddy after commit and then do ftrim fashion discard

 fs/ext4/ext4.h    |   2 +
 fs/ext4/extents.c |   6 ++-
 fs/ext4/mballoc.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
 3 files changed, 151 insertions(+), 80 deletions(-)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-07-26  3:42   ` Guoqing Jiang
  2021-08-04 15:26   ` Jan Kara
  2021-07-24  7:41 ` [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

Get rid of the 'group' parameter of ext4_trim_extent as we can get
it from the 'e4b'.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 089c958aa2c3..018d5d3c6eeb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6183,19 +6183,19 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
  * @sb:		super block for the file system
  * @start:	starting block of the free extent in the alloc. group
  * @count:	number of blocks to TRIM
- * @group:	alloc. group we are working with
  * @e4b:	ext4 buddy for the group
  *
  * Trim "count" blocks starting at "start" in the "group". To assure that no
  * one will allocate those blocks, mark it as used in buddy bitmap. This must
  * be called with under the group lock.
  */
-static int ext4_trim_extent(struct super_block *sb, int start, int count,
-			     ext4_group_t group, struct ext4_buddy *e4b)
+static int ext4_trim_extent(struct super_block *sb,
+		int start, int count, struct ext4_buddy *e4b)
 __releases(bitlock)
 __acquires(bitlock)
 {
 	struct ext4_free_extent ex;
+	ext4_group_t group = e4b->bd_group;
 	int ret = 0;
 
 	trace_ext4_trim_extent(sb, group, start, count);
@@ -6271,8 +6271,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		next = mb_find_next_bit(bitmap, max + 1, start);
 
 		if ((next - start) >= minblocks) {
-			ret = ext4_trim_extent(sb, start,
-					       next - start, group, &e4b);
+			ret = ext4_trim_extent(sb, start, next - start, &e4b);
 			if (ret && ret != -EOPNOTSUPP)
 				break;
 			ret = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
  2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-08-04 15:29   ` Jan Kara
  2021-08-12 17:44   ` Theodore Ts'o
  2021-07-24  7:41 ` [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

There is no functional change in this patch but just split the
codes, which serachs free block and does trim, into a new function
ext4_try_to_trim_range. This is preparing for the following async
backgroup discard.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 018d5d3c6eeb..e3844152a643 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6218,6 +6218,54 @@ __acquires(bitlock)
 	return ret;
 }
 
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks)
+{
+	ext4_grpblk_t next, count, free_count;
+	void *bitmap;
+	int ret = 0;
+
+	bitmap = e4b->bd_bitmap;
+	start = (e4b->bd_info->bb_first_free > start) ?
+		e4b->bd_info->bb_first_free : start;
+	count = 0;
+	free_count = 0;
+
+	while (start <= max) {
+		start = mb_find_next_zero_bit(bitmap, max + 1, start);
+		if (start > max)
+			break;
+		next = mb_find_next_bit(bitmap, max + 1, start);
+
+		if ((next - start) >= minblocks) {
+			ret = ext4_trim_extent(sb, start, next - start, e4b);
+			if (ret && ret != -EOPNOTSUPP)
+				break;
+			ret = 0;
+			count += next - start;
+		}
+		free_count += next - start;
+		start = next + 1;
+
+		if (fatal_signal_pending(current)) {
+			count = -ERESTARTSYS;
+			break;
+		}
+
+		if (need_resched()) {
+			ext4_unlock_group(sb, e4b->bd_group);
+			cond_resched();
+			ext4_lock_group(sb, e4b->bd_group);
+		}
+
+		if ((e4b->bd_info->bb_free - free_count) < minblocks)
+			break;
+	}
+
+	return count;
+}
+
 /**
  * ext4_trim_all_free -- function to trim all free space in alloc. group
  * @sb:			super block for file system
@@ -6241,10 +6289,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 		   ext4_grpblk_t start, ext4_grpblk_t max,
 		   ext4_grpblk_t minblocks)
 {
-	void *bitmap;
-	ext4_grpblk_t next, count = 0, free_count = 0;
 	struct ext4_buddy e4b;
-	int ret = 0;
+	int ret;
 
 	trace_ext4_trim_all_free(sb, group, start, max);
 
@@ -6254,57 +6300,23 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
 			     ret, group);
 		return ret;
 	}
-	bitmap = e4b.bd_bitmap;
 
 	ext4_lock_group(sb, group);
-	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
-	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
-		goto out;
-
-	start = (e4b.bd_info->bb_first_free > start) ?
-		e4b.bd_info->bb_first_free : start;
 
-	while (start <= max) {
-		start = mb_find_next_zero_bit(bitmap, max + 1, start);
-		if (start > max)
-			break;
-		next = mb_find_next_bit(bitmap, max + 1, start);
-
-		if ((next - start) >= minblocks) {
-			ret = ext4_trim_extent(sb, start, next - start, &e4b);
-			if (ret && ret != -EOPNOTSUPP)
-				break;
-			ret = 0;
-			count += next - start;
-		}
-		free_count += next - start;
-		start = next + 1;
-
-		if (fatal_signal_pending(current)) {
-			count = -ERESTARTSYS;
-			break;
-		}
-
-		if (need_resched()) {
-			ext4_unlock_group(sb, group);
-			cond_resched();
-			ext4_lock_group(sb, group);
-		}
-
-		if ((e4b.bd_info->bb_free - free_count) < minblocks)
-			break;
+	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
+	    minblocks < atomic_read(&EXT4_SB(sb)->s_last_trim_minblks)) {
+		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
+		if (ret >= 0)
+			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+	} else {
+		ret = 0;
 	}
 
-	if (!ret) {
-		ret = count;
-		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
-	}
-out:
 	ext4_unlock_group(sb, group);
 	ext4_mb_unload_buddy(&e4b);
 
 	ext4_debug("trimmed %d blocks in the group %d\n",
-		count, group);
+		ret, group);
 
 	return ret;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
  2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
  2021-07-24  7:41 ` [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-08-04 15:27   ` Jan Kara
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
  2021-07-24  7:41 ` [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC Wang Jianchao
  4 siblings, 1 reply; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/mballoc.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e3844152a643..34be2f07449d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6274,15 +6274,10 @@ static int ext4_try_to_trim_range(struct super_block *sb,
  * @max:		last group block to examine
  * @minblocks:		minimum extent block count
  *
- * ext4_trim_all_free walks through group's buddy bitmap searching for free
- * extents. When the free block is found, ext4_trim_extent is called to TRIM
- * the extent.
- *
- *
  * ext4_trim_all_free walks through group's block bitmap searching for free
  * extents. When the free extent is found, mark it as used in group buddy
  * bitmap. Then issue a TRIM command on this extent and free the extent in
- * the group buddy bitmap. This is done until whole group is scanned.
+ * the group buddy bitmap.
  */
 static ext4_grpblk_t
 ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
                   ` (2 preceding siblings ...)
  2021-07-24  7:41 ` [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-08-04 15:45   ` Jan Kara
  2021-08-12 19:46   ` Theodore Ts'o
  2021-07-24  7:41 ` [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC Wang Jianchao
  4 siblings, 2 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

Right now, discard is issued and waited to be completed in jbd2
commit kthread context after the logs are committed. When large
amount of files are deleted and discard is flooding, jbd2 commit
kthread can be blocked for long time. Then all of the metadata
operations can be blocked to wait the log space.

One case is the page fault path with read mm->mmap_sem held, which
wants to update the file time but has to wait for the log space.
When other threads in the task wants to do mmap, then write mmap_sem
is blocked. Finally all of the following read mmap_sem requirements
are blocked, even the ps command which need to read the /proc/pid/
-cmdline. Our monitor service which needs to read /proc/pid/cmdline
used to be blocked for 5 mins.

This patch frees the blocks back to buddy after commit and then do
discard in a async kworker context in fstrim fashion, namely,
 - mark blocks to be discarded as used if they have not been allocated
 - do discard
 - mark them free
After this, jbd2 commit kthread won't be blocked any more by discard
and we won't get NOSPC even if the discard is slow or throttled.

Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/ext4.h    |   2 +
 fs/ext4/mballoc.c | 109 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..6b678b968d84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1536,6 +1536,8 @@ struct ext4_sb_info {
 	unsigned int s_mb_free_pending;
 	struct list_head s_freed_data_list;	/* List of blocks to be freed
 						   after commit completed */
+	struct list_head s_discard_list;
+	struct work_struct s_discard_work;
 	struct rb_root s_mb_avg_fragment_size_root;
 	rwlock_t s_mb_rb_lock;
 	struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 34be2f07449d..a496509e61b7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -386,6 +386,7 @@
 static struct kmem_cache *ext4_pspace_cachep;
 static struct kmem_cache *ext4_ac_cachep;
 static struct kmem_cache *ext4_free_data_cachep;
+static struct workqueue_struct *ext4_discard_wq;
 
 /* We create slab caches for groupinfo data structures based on the
  * superblock block size.  There will be one per mounted filesystem for
@@ -408,6 +409,10 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			       ext4_group_t group, int cr);
 
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3308,6 +3313,55 @@ static int ext4_groupinfo_create_slab(size_t size)
 	return 0;
 }
 
+static void ext4_discard_work(struct work_struct *work)
+{
+	struct ext4_sb_info *sbi = container_of(work,
+			struct ext4_sb_info, s_discard_work);
+	struct super_block *sb = sbi->s_sb;
+	struct ext4_free_data *fd, *nfd;
+	struct ext4_buddy e4b;
+	struct list_head discard_list;
+	ext4_group_t grp, load_grp;
+	int err = 0;
+
+	INIT_LIST_HEAD(&discard_list);
+	spin_lock(&sbi->s_md_lock);
+	list_splice_init(&sbi->s_discard_list, &discard_list);
+	spin_unlock(&sbi->s_md_lock);
+
+	load_grp = UINT_MAX;
+	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
+		/*
+		 * If filesystem is umounting or no memory, give up the discard
+		 */
+		if ((sb->s_flags & SB_ACTIVE) && !err) {
+			grp = fd->efd_group;
+			if (grp != load_grp) {
+				if (load_grp != UINT_MAX)
+					ext4_mb_unload_buddy(&e4b);
+
+				err = ext4_mb_load_buddy(sb, grp, &e4b);
+				if (err) {
+					kmem_cache_free(ext4_free_data_cachep, fd);
+					load_grp = UINT_MAX;
+					continue;
+				} else {
+					load_grp = grp;
+				}
+			}
+
+			ext4_lock_group(sb, grp);
+			ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
+						fd->efd_start_cluster + fd->efd_count - 1, 1);
+			ext4_unlock_group(sb, grp);
+		}
+		kmem_cache_free(ext4_free_data_cachep, fd);
+	}
+
+	if (load_grp != UINT_MAX)
+		ext4_mb_unload_buddy(&e4b);
+}
+
 int ext4_mb_init(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3376,6 +3430,8 @@ int ext4_mb_init(struct super_block *sb)
 	spin_lock_init(&sbi->s_md_lock);
 	sbi->s_mb_free_pending = 0;
 	INIT_LIST_HEAD(&sbi->s_freed_data_list);
+	INIT_LIST_HEAD(&sbi->s_discard_list);
+	INIT_WORK(&sbi->s_discard_work, ext4_discard_work);
 
 	sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
 	sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
@@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
 	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
 	int count;
 
+	if (test_opt(sb, DISCARD)) {
+		/*
+		 * wait the discard work to drain all of ext4_free_data
+		 */
+		queue_work(ext4_discard_wq, &sbi->s_discard_work);
+		flush_work(&sbi->s_discard_work);
+	}
+
 	if (sbi->s_group_info) {
 		for (i = 0; i < ngroups; i++) {
 			cond_resched();
@@ -3596,7 +3660,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 		put_page(e4b.bd_bitmap_page);
 	}
 	ext4_unlock_group(sb, entry->efd_group);
-	kmem_cache_free(ext4_free_data_cachep, entry);
 	ext4_mb_unload_buddy(&e4b);
 
 	mb_debug(sb, "freed %d blocks in %d structures\n", count,
@@ -3611,10 +3674,9 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_free_data *entry, *tmp;
-	struct bio *discard_bio = NULL;
 	struct list_head freed_data_list;
 	struct list_head *cut_pos = NULL;
-	int err;
+	bool wake;
 
 	INIT_LIST_HEAD(&freed_data_list);
 
@@ -3629,30 +3691,20 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 				  cut_pos);
 	spin_unlock(&sbi->s_md_lock);
 
-	if (test_opt(sb, DISCARD)) {
-		list_for_each_entry(entry, &freed_data_list, efd_list) {
-			err = ext4_issue_discard(sb, entry->efd_group,
-						 entry->efd_start_cluster,
-						 entry->efd_count,
-						 &discard_bio);
-			if (err && err != -EOPNOTSUPP) {
-				ext4_msg(sb, KERN_WARNING, "discard request in"
-					 " group:%d block:%d count:%d failed"
-					 " with %d", entry->efd_group,
-					 entry->efd_start_cluster,
-					 entry->efd_count, err);
-			} else if (err == -EOPNOTSUPP)
-				break;
-		}
+	list_for_each_entry(entry, &freed_data_list, efd_list)
+		ext4_free_data_in_buddy(sb, entry);
 
-		if (discard_bio) {
-			submit_bio_wait(discard_bio);
-			bio_put(discard_bio);
-		}
+	if (test_opt(sb, DISCARD)) {
+		spin_lock(&sbi->s_md_lock);
+		wake = list_empty(&sbi->s_discard_list);
+		list_splice_tail(&freed_data_list, &sbi->s_discard_list);
+		spin_unlock(&sbi->s_md_lock);
+		if (wake)
+			queue_work(ext4_discard_wq, &sbi->s_discard_work);
+	} else {
+		list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
+			kmem_cache_free(ext4_free_data_cachep, entry);
 	}
-
-	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
-		ext4_free_data_in_buddy(sb, entry);
 }
 
 int __init ext4_init_mballoc(void)
@@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
 	if (ext4_free_data_cachep == NULL)
 		goto out_ac_free;
 
+	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
+	if (!ext4_discard_wq)
+		goto out_free_data;
+
 	return 0;
 
+out_free_data:
+	kmem_cache_destroy(ext4_free_data_cachep);
 out_ac_free:
 	kmem_cache_destroy(ext4_ac_cachep);
 out_pa_free:
@@ -3693,6 +3751,7 @@ void ext4_exit_mballoc(void)
 	kmem_cache_destroy(ext4_ac_cachep);
 	kmem_cache_destroy(ext4_free_data_cachep);
 	ext4_groupinfo_destroy_slabs();
+	destroy_workqueue(ext4_discard_wq);
 }
 
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
                   ` (3 preceding siblings ...)
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-07-26  3:40   ` Guoqing Jiang
  2021-08-04 15:32   ` Jan Kara
  4 siblings, 2 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC (permalink / raw)
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

The blocks may be waiting for journal commit to be freed back to
mb buddy. Let fallocate wait and retry in that case.

Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/extents.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 92ad64b89d9b..ad0b874d3448 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	struct inode *inode = file_inode(file);
 	loff_t new_size = 0;
 	unsigned int max_blocks;
-	int ret = 0;
+	int ret = 0, retries = 0;
 	int flags;
 	ext4_lblk_t lblk;
 	unsigned int blkbits = inode->i_blkbits;
@@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		     FALLOC_FL_INSERT_RANGE))
 		return -EOPNOTSUPP;
 
+retry:
 	ext4_fc_start_update(inode);
 
 	if (mode & FALLOC_FL_PUNCH_HOLE) {
@@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
 exit:
 	ext4_fc_stop_update(inode);
+	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+		goto retry;
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-07-24  7:41 ` [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC Wang Jianchao
@ 2021-07-26  3:40   ` Guoqing Jiang
  2021-07-26  7:05     ` Wang Jianchao
  2021-08-04 15:32   ` Jan Kara
  1 sibling, 1 reply; 23+ messages in thread
From: Guoqing Jiang @ 2021-07-26  3:40 UTC (permalink / raw)
  To: Wang Jianchao, linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

Hi,

On 7/24/21 3:41 PM, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
>
> The blocks may be waiting for journal commit to be freed back to
> mb buddy. Let fallocate wait and retry in that case.
>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> ---
>   fs/ext4/extents.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..ad0b874d3448 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	struct inode *inode = file_inode(file);
>   	loff_t new_size = 0;
>   	unsigned int max_blocks;
> -	int ret = 0;
> +	int ret = 0, retries = 0;
>   	int flags;
>   	ext4_lblk_t lblk;
>   	unsigned int blkbits = inode->i_blkbits;
> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   		     FALLOC_FL_INSERT_RANGE))
>   		return -EOPNOTSUPP;
>   
> +retry:
>   	ext4_fc_start_update(inode);
>   
>   	if (mode & FALLOC_FL_PUNCH_HOLE) {
> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>   	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>   exit:
>   	ext4_fc_stop_update(inode);
> +	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> +		goto retry;
> +

Not sure if it is necessary since ext4_alloc_file_blocks already retries 
allocate.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent
  2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
@ 2021-07-26  3:42   ` Guoqing Jiang
  2021-08-04 15:26   ` Jan Kara
  1 sibling, 0 replies; 23+ messages in thread
From: Guoqing Jiang @ 2021-07-26  3:42 UTC (permalink / raw)
  To: Wang Jianchao, linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel


On 7/24/21 3:41 PM, Wang Jianchao wrote:
> -static int ext4_trim_extent(struct super_block *sb, int start, int count,
> -			     ext4_group_t group, struct ext4_buddy *e4b)
> +static int ext4_trim_extent(struct super_block *sb,
> +		int start, int count, struct ext4_buddy *e4b)

Nit, seems only need to change the second line.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-07-26  3:40   ` Guoqing Jiang
@ 2021-07-26  7:05     ` Wang Jianchao
  2021-08-04 15:52       ` Jan Kara
  0 siblings, 1 reply; 23+ messages in thread
From: Wang Jianchao @ 2021-07-26  7:05 UTC (permalink / raw)
  To: Guoqing Jiang, linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel



On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
> Hi,
> 
> On 7/24/21 3:41 PM, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>
>> The blocks may be waiting for journal commit to be freed back to
>> mb buddy. Let fallocate wait and retry in that case.
>>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
>> ---
>>   fs/ext4/extents.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 92ad64b89d9b..ad0b874d3448 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>       struct inode *inode = file_inode(file);
>>       loff_t new_size = 0;
>>       unsigned int max_blocks;
>> -    int ret = 0;
>> +    int ret = 0, retries = 0;
>>       int flags;
>>       ext4_lblk_t lblk;
>>       unsigned int blkbits = inode->i_blkbits;
>> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>                FALLOC_FL_INSERT_RANGE))
>>           return -EOPNOTSUPP;
>>   +retry:
>>       ext4_fc_start_update(inode);
>>         if (mode & FALLOC_FL_PUNCH_HOLE) {
>> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>       trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>   exit:
>>       ext4_fc_stop_update(inode);
>> +    if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>> +        goto retry;
>> +
> 
> Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.

Yes, this patch should be get rid of.
But it is indeed helpful to fix the xfstest generic/371 which does concurrently write/rm
and fallocate/rm. I'll figure out some other way to improve that

Thanks
Jianchao

> 
> Thanks,
> Guoqing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent
  2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
  2021-07-26  3:42   ` Guoqing Jiang
@ 2021-08-04 15:26   ` Jan Kara
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:26 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:20, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Get rid of the 'group' parameter of ext4_trim_extent as we can get
> it from the 'e4b'.
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 089c958aa2c3..018d5d3c6eeb 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6183,19 +6183,19 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
>   * @sb:		super block for the file system
>   * @start:	starting block of the free extent in the alloc. group
>   * @count:	number of blocks to TRIM
> - * @group:	alloc. group we are working with
>   * @e4b:	ext4 buddy for the group
>   *
>   * Trim "count" blocks starting at "start" in the "group". To assure that no
>   * one will allocate those blocks, mark it as used in buddy bitmap. This must
>   * be called with under the group lock.
>   */
> -static int ext4_trim_extent(struct super_block *sb, int start, int count,
> -			     ext4_group_t group, struct ext4_buddy *e4b)
> +static int ext4_trim_extent(struct super_block *sb,
> +		int start, int count, struct ext4_buddy *e4b)
>  __releases(bitlock)
>  __acquires(bitlock)
>  {
>  	struct ext4_free_extent ex;
> +	ext4_group_t group = e4b->bd_group;
>  	int ret = 0;
>  
>  	trace_ext4_trim_extent(sb, group, start, count);
> @@ -6271,8 +6271,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  		next = mb_find_next_bit(bitmap, max + 1, start);
>  
>  		if ((next - start) >= minblocks) {
> -			ret = ext4_trim_extent(sb, start,
> -					       next - start, group, &e4b);
> +			ret = ext4_trim_extent(sb, start, next - start, &e4b);
>  			if (ret && ret != -EOPNOTSUPP)
>  				break;
>  			ret = 0;
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free
  2021-07-24  7:41 ` [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
@ 2021-08-04 15:27   ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:27 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:22, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e3844152a643..34be2f07449d 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6274,15 +6274,10 @@ static int ext4_try_to_trim_range(struct super_block *sb,
>   * @max:		last group block to examine
>   * @minblocks:		minimum extent block count
>   *
> - * ext4_trim_all_free walks through group's buddy bitmap searching for free
> - * extents. When the free block is found, ext4_trim_extent is called to TRIM
> - * the extent.
> - *
> - *
>   * ext4_trim_all_free walks through group's block bitmap searching for free
>   * extents. When the free extent is found, mark it as used in group buddy
>   * bitmap. Then issue a TRIM command on this extent and free the extent in
> - * the group buddy bitmap. This is done until whole group is scanned.
> + * the group buddy bitmap.
>   */
>  static ext4_grpblk_t
>  ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()
  2021-07-24  7:41 ` [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
@ 2021-08-04 15:29   ` Jan Kara
  2021-08-12 17:44   ` Theodore Ts'o
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:29 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:21, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> There is no functional change in this patch but just split the
> codes, which serachs free block and does trim, into a new function
> ext4_try_to_trim_range. This is preparing for the following async
> backgroup discard.
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
>  1 file changed, 57 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 018d5d3c6eeb..e3844152a643 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
>  	return ret;
>  }
>  
> +static int ext4_try_to_trim_range(struct super_block *sb,
> +		struct ext4_buddy *e4b, ext4_grpblk_t start,
> +		ext4_grpblk_t max, ext4_grpblk_t minblocks)
> +{
> +	ext4_grpblk_t next, count, free_count;
> +	void *bitmap;
> +	int ret = 0;
> +
> +	bitmap = e4b->bd_bitmap;
> +	start = (e4b->bd_info->bb_first_free > start) ?
> +		e4b->bd_info->bb_first_free : start;
> +	count = 0;
> +	free_count = 0;
> +
> +	while (start <= max) {
> +		start = mb_find_next_zero_bit(bitmap, max + 1, start);
> +		if (start > max)
> +			break;
> +		next = mb_find_next_bit(bitmap, max + 1, start);
> +
> +		if ((next - start) >= minblocks) {
> +			ret = ext4_trim_extent(sb, start, next - start, e4b);
> +			if (ret && ret != -EOPNOTSUPP)
> +				break;
> +			ret = 0;
> +			count += next - start;
> +		}
> +		free_count += next - start;
> +		start = next + 1;
> +
> +		if (fatal_signal_pending(current)) {
> +			count = -ERESTARTSYS;
> +			break;
> +		}
> +
> +		if (need_resched()) {
> +			ext4_unlock_group(sb, e4b->bd_group);
> +			cond_resched();
> +			ext4_lock_group(sb, e4b->bd_group);
> +		}
> +
> +		if ((e4b->bd_info->bb_free - free_count) < minblocks)
> +			break;
> +	}
> +
> +	return count;
> +}
> +
>  /**
>   * ext4_trim_all_free -- function to trim all free space in alloc. group
>   * @sb:			super block for file system
> @@ -6241,10 +6289,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  		   ext4_grpblk_t start, ext4_grpblk_t max,
>  		   ext4_grpblk_t minblocks)
>  {
> -	void *bitmap;
> -	ext4_grpblk_t next, count = 0, free_count = 0;
>  	struct ext4_buddy e4b;
> -	int ret = 0;
> +	int ret;
>  
>  	trace_ext4_trim_all_free(sb, group, start, max);
>  
> @@ -6254,57 +6300,23 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>  			     ret, group);
>  		return ret;
>  	}
> -	bitmap = e4b.bd_bitmap;
>  
>  	ext4_lock_group(sb, group);
> -	if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
> -	    minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
> -		goto out;
> -
> -	start = (e4b.bd_info->bb_first_free > start) ?
> -		e4b.bd_info->bb_first_free : start;
>  
> -	while (start <= max) {
> -		start = mb_find_next_zero_bit(bitmap, max + 1, start);
> -		if (start > max)
> -			break;
> -		next = mb_find_next_bit(bitmap, max + 1, start);
> -
> -		if ((next - start) >= minblocks) {
> -			ret = ext4_trim_extent(sb, start, next - start, &e4b);
> -			if (ret && ret != -EOPNOTSUPP)
> -				break;
> -			ret = 0;
> -			count += next - start;
> -		}
> -		free_count += next - start;
> -		start = next + 1;
> -
> -		if (fatal_signal_pending(current)) {
> -			count = -ERESTARTSYS;
> -			break;
> -		}
> -
> -		if (need_resched()) {
> -			ext4_unlock_group(sb, group);
> -			cond_resched();
> -			ext4_lock_group(sb, group);
> -		}
> -
> -		if ((e4b.bd_info->bb_free - free_count) < minblocks)
> -			break;
> +	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> +	    minblocks < atomic_read(&EXT4_SB(sb)->s_last_trim_minblks)) {
> +		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
> +		if (ret >= 0)
> +			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> +	} else {
> +		ret = 0;
>  	}
>  
> -	if (!ret) {
> -		ret = count;
> -		EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> -	}
> -out:
>  	ext4_unlock_group(sb, group);
>  	ext4_mb_unload_buddy(&e4b);
>  
>  	ext4_debug("trimmed %d blocks in the group %d\n",
> -		count, group);
> +		ret, group);
>  
>  	return ret;
>  }
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-07-24  7:41 ` [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC Wang Jianchao
  2021-07-26  3:40   ` Guoqing Jiang
@ 2021-08-04 15:32   ` Jan Kara
  2021-08-04 15:46     ` Jan Kara
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:32 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:24, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> The blocks may be waiting for journal commit to be freed back to
> mb buddy. Let fallocate wait and retry in that case.
> 
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Did you really observe this? Because the retry is already handled in
ext4_alloc_file_blocks() that's used by ext4_fallocate(). So no retry
should be needed there.

								Honza

> ---
>  fs/ext4/extents.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..ad0b874d3448 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	struct inode *inode = file_inode(file);
>  	loff_t new_size = 0;
>  	unsigned int max_blocks;
> -	int ret = 0;
> +	int ret = 0, retries = 0;
>  	int flags;
>  	ext4_lblk_t lblk;
>  	unsigned int blkbits = inode->i_blkbits;
> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  		     FALLOC_FL_INSERT_RANGE))
>  		return -EOPNOTSUPP;
>  
> +retry:
>  	ext4_fc_start_update(inode);
>  
>  	if (mode & FALLOC_FL_PUNCH_HOLE) {
> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>  exit:
>  	ext4_fc_stop_update(inode);
> +	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> +		goto retry;
> +
>  	return ret;
>  }
>  
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-08-04 15:45   ` Jan Kara
  2021-08-26  7:15     ` Wang Jianchao
  2021-08-12 19:46   ` Theodore Ts'o
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:45 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Right now, discard is issued and waited to be completed in jbd2
> commit kthread context after the logs are committed. When large
> amount of files are deleted and discard is flooding, jbd2 commit
> kthread can be blocked for long time. Then all of the metadata
> operations can be blocked to wait the log space.
> 
> One case is the page fault path with read mm->mmap_sem held, which
> wants to update the file time but has to wait for the log space.
> When other threads in the task wants to do mmap, then write mmap_sem
> is blocked. Finally all of the following read mmap_sem requirements
> are blocked, even the ps command which need to read the /proc/pid/
> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
> used to be blocked for 5 mins.
> 
> This patch frees the blocks back to buddy after commit and then do
> discard in a async kworker context in fstrim fashion, namely,
>  - mark blocks to be discarded as used if they have not been allocated
>  - do discard
>  - mark them free
> After this, jbd2 commit kthread won't be blocked any more by discard
> and we won't get NOSPC even if the discard is slow or throttled.
> 
> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Looks good to me. Just one small comment below. With that addressed feel
free to add:

Reviewed-by: Jan Kara <jack@suse.cz>


> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>  	int count;
>  
> +	if (test_opt(sb, DISCARD)) {
> +		/*
> +		 * wait the discard work to drain all of ext4_free_data
> +		 */
> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);

Do we really need to queue the work here? The filesystem should be
quiescent by now, we take care to queue the work whenever we add item to
empty list. So it should be enough to have flush_work() here and then
possibly

	WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))

Or am I missing something?

								Honza

> +		flush_work(&sbi->s_discard_work);
> +	}
> +
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-08-04 15:32   ` Jan Kara
@ 2021-08-04 15:46     ` Jan Kara
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:46 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Wed 04-08-21 17:32:21, Jan Kara wrote:
> On Sat 24-07-21 15:41:24, Wang Jianchao wrote:
> > From: Wang Jianchao <wangjianchao@kuaishou.com>
> > 
> > The blocks may be waiting for journal commit to be freed back to
> > mb buddy. Let fallocate wait and retry in that case.
> > 
> > Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Did you really observe this? Because the retry is already handled in
> ext4_alloc_file_blocks() that's used by ext4_fallocate(). So no retry
> should be needed there.

Oh, I can see you've addressed these already in another reply. I'll comment
there.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-07-26  7:05     ` Wang Jianchao
@ 2021-08-04 15:52       ` Jan Kara
  2021-08-26 11:42         ` Wang Jianchao
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2021-08-04 15:52 UTC (permalink / raw)
  To: Wang Jianchao
  Cc: Guoqing Jiang, linux-ext4, linux-kernel, tytso, adilger.kernel

On Mon 26-07-21 15:05:41, Wang Jianchao wrote:
> 
> 
> On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
> > Hi,
> > 
> > On 7/24/21 3:41 PM, Wang Jianchao wrote:
> >> From: Wang Jianchao <wangjianchao@kuaishou.com>
> >>
> >> The blocks may be waiting for journal commit to be freed back to
> >> mb buddy. Let fallocate wait and retry in that case.
> >>
> >> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> >> ---
> >>   fs/ext4/extents.c | 6 +++++-
> >>   1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 92ad64b89d9b..ad0b874d3448 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >>       struct inode *inode = file_inode(file);
> >>       loff_t new_size = 0;
> >>       unsigned int max_blocks;
> >> -    int ret = 0;
> >> +    int ret = 0, retries = 0;
> >>       int flags;
> >>       ext4_lblk_t lblk;
> >>       unsigned int blkbits = inode->i_blkbits;
> >> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >>                FALLOC_FL_INSERT_RANGE))
> >>           return -EOPNOTSUPP;
> >>   +retry:
> >>       ext4_fc_start_update(inode);
> >>         if (mode & FALLOC_FL_PUNCH_HOLE) {
> >> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >>       trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> >>   exit:
> >>       ext4_fc_stop_update(inode);
> >> +    if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> >> +        goto retry;
> >> +
> > 
> > Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.
> 
> Yes, this patch should be get rid of.  But it is indeed helpful to fix
> the xfstest generic/371 which does concurrently write/rm and
> fallocate/rm. I'll figure out some other way to improve that

Note that the retry logic is only a heuristic. It is not guaranteed any
number of retries is enough, we just do three to not give up too easily...
Your patch effectively raised number of retries to 9 so that may have
masked the issue. But I don't think so high number of retries is a sensible
choice because that way it may take too long to return ENOSPC.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()
  2021-07-24  7:41 ` [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
  2021-08-04 15:29   ` Jan Kara
@ 2021-08-12 17:44   ` Theodore Ts'o
  2021-08-26  7:19     ` Wang Jianchao
  1 sibling, 1 reply; 23+ messages in thread
From: Theodore Ts'o @ 2021-08-12 17:44 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, adilger.kernel

On Sat, Jul 24, 2021 at 03:41:21PM +0800, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> There is no functional change in this patch but just split the
> codes, which serachs free block and does trim, into a new function
> ext4_try_to_trim_range. This is preparing for the following async
> backgroup discard.
> 
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> ---
>  fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
>  1 file changed, 57 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 018d5d3c6eeb..e3844152a643 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
>  	return ret;
>  }
>  
> +static int ext4_try_to_trim_range(struct super_block *sb,
> +		struct ext4_buddy *e4b, ext4_grpblk_t start,
> +		ext4_grpblk_t max, ext4_grpblk_t minblocks)
> +{
> +	ext4_grpblk_t next, count, free_count;
> +	void *bitmap;
> +	int ret = 0;
> +
> +	bitmap = e4b->bd_bitmap;
> +	start = (e4b->bd_info->bb_first_free > start) ?
> +		e4b->bd_info->bb_first_free : start;
> +	count = 0;
> +	free_count = 0;
> +
> +	while (start <= max) {
> +		start = mb_find_next_zero_bit(bitmap, max + 1, start);
> +		if (start > max)
> +			break;
> +		next = mb_find_next_bit(bitmap, max + 1, start);
> +
> +		if ((next - start) >= minblocks) {
> +			ret = ext4_trim_extent(sb, start, next - start, e4b);
> +			if (ret && ret != -EOPNOTSUPP)
> +				break;
> +			ret = 0;
> +			count += next - start;
> +		}

"ret" is only used inside the if statement, so this might be better as:

> +		if ((next - start) >= minblocks) {
> +			int ret = ext4_trim_extent(sb, start, next - start, e4b);
> +
> +			if (ret && ret != -EOPNOTSUPP)
> +				break;
> +			count += next - start;
> +		}

... and then drop the "int ret = 0" above.

Otherwise, looks good.

						- Ted

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
  2021-08-04 15:45   ` Jan Kara
@ 2021-08-12 19:46   ` Theodore Ts'o
  2021-08-26  7:51     ` Wang Jianchao
  1 sibling, 1 reply; 23+ messages in thread
From: Theodore Ts'o @ 2021-08-12 19:46 UTC (permalink / raw)
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, adilger.kernel

On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 34be2f07449d..a496509e61b7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>  	int count;
>  
> +	if (test_opt(sb, DISCARD)) {
> +		/*
> +		 * wait the discard work to drain all of ext4_free_data
> +		 */
> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
> +		flush_work(&sbi->s_discard_work);

I agree with Jan --- it's not clear to me why the call to queue_work()
is needed.  After the flush_work() call returns, if s_discard_work is
still non-empty, there must be something terribly wrong --- are we
missing something?

> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>  	if (ext4_free_data_cachep == NULL)
>  		goto out_ac_free;
>  
> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
> +	if (!ext4_discard_wq)
> +		goto out_free_data;
> +


Perhaps we should only allocate the workqueue when it's needed ---
e.g., when a file system is mounted or remounted with "-o discard"?

Then in ext4_exit_malloc(), we only free it if ext4_discard_wq is
non-NULL.

This would save a bit of memory on systems that wouldn't need the ext4
discard work queue.

					- Ted

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-04 15:45   ` Jan Kara
@ 2021-08-26  7:15     ` Wang Jianchao
  0 siblings, 0 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-08-26  7:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel



On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>
>> Right now, discard is issued and waited to be completed in jbd2
>> commit kthread context after the logs are committed. When large
>> amount of files are deleted and discard is flooding, jbd2 commit
>> kthread can be blocked for long time. Then all of the metadata
>> operations can be blocked to wait the log space.
>>
>> One case is the page fault path with read mm->mmap_sem held, which
>> wants to update the file time but has to wait for the log space.
>> When other threads in the task wants to do mmap, then write mmap_sem
>> is blocked. Finally all of the following read mmap_sem requirements
>> are blocked, even the ps command which need to read the /proc/pid/
>> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
>> used to be blocked for 5 mins.
>>
>> This patch frees the blocks back to buddy after commit and then do
>> discard in a async kworker context in fstrim fashion, namely,
>>  - mark blocks to be discarded as used if they have not been allocated
>>  - do discard
>>  - mark them free
>> After this, jbd2 commit kthread won't be blocked any more by discard
>> and we won't get NOSPC even if the discard is slow or throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@mit.edu>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>>  	int count;
>>  
>> +	if (test_opt(sb, DISCARD)) {
>> +		/*
>> +		 * wait the discard work to drain all of ext4_free_data
>> +		 */
>> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
> 
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, we take care to queue the work whenever we add item to
> empty list. So it should be enough to have flush_work() here and then
> possibly
> 
> 	WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
> 
> Or am I missing something?

queue_work here is indeed redundant.

Thanks so much for you point out this.
Jianchao

> 
> 								Honza
> 
>> +		flush_work(&sbi->s_discard_work);
>> +	}
>> +

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()
  2021-08-12 17:44   ` Theodore Ts'o
@ 2021-08-26  7:19     ` Wang Jianchao
  0 siblings, 0 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-08-26  7:19 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel



On 2021/8/13 1:44 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:21PM +0800, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>
>> There is no functional change in this patch but just split the
>> codes, which serachs free block and does trim, into a new function
>> ext4_try_to_trim_range. This is preparing for the following async
>> backgroup discard.
>>
>> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
>> ---
>>  fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
>>  1 file changed, 57 insertions(+), 45 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 018d5d3c6eeb..e3844152a643 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
>>  	return ret;
>>  }
>>  
>> +static int ext4_try_to_trim_range(struct super_block *sb,
>> +		struct ext4_buddy *e4b, ext4_grpblk_t start,
>> +		ext4_grpblk_t max, ext4_grpblk_t minblocks)
>> +{
>> +	ext4_grpblk_t next, count, free_count;
>> +	void *bitmap;
>> +	int ret = 0;
>> +
>> +	bitmap = e4b->bd_bitmap;
>> +	start = (e4b->bd_info->bb_first_free > start) ?
>> +		e4b->bd_info->bb_first_free : start;
>> +	count = 0;
>> +	free_count = 0;
>> +
>> +	while (start <= max) {
>> +		start = mb_find_next_zero_bit(bitmap, max + 1, start);
>> +		if (start > max)
>> +			break;
>> +		next = mb_find_next_bit(bitmap, max + 1, start);
>> +
>> +		if ((next - start) >= minblocks) {
>> +			ret = ext4_trim_extent(sb, start, next - start, e4b);
>> +			if (ret && ret != -EOPNOTSUPP)
>> +				break;
>> +			ret = 0;
>> +			count += next - start;
>> +		}
> 
> "ret" is only used inside the if statement, so this might be better as:
> 
>> +		if ((next - start) >= minblocks) {
>> +			int ret = ext4_trim_extent(sb, start, next - start, e4b);
>> +
>> +			if (ret && ret != -EOPNOTSUPP)
>> +				break;
>> +			count += next - start;
>> +		}
> 
> ... and then drop the "int ret = 0" above.
> 
> Otherwise, looks good.
> 

OK, I'll do it in next version

Thanks so much
Jianchao
> 						- Ted
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-12 19:46   ` Theodore Ts'o
@ 2021-08-26  7:51     ` Wang Jianchao
  2021-08-26  8:58       ` Wang Jianchao
  0 siblings, 1 reply; 23+ messages in thread
From: Wang Jianchao @ 2021-08-26  7:51 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel



On 2021/8/13 3:46 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 34be2f07449d..a496509e61b7 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>>  	int count;
>>  
>> +	if (test_opt(sb, DISCARD)) {
>> +		/*
>> +		 * wait the discard work to drain all of ext4_free_data
>> +		 */
>> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
>> +		flush_work(&sbi->s_discard_work);
> 
> I agree with Jan --- it's not clear to me why the call to queue_work()
> is needed.  After the flush_work() call returns, if s_discard_work is
> still non-empty, there must be something terribly wrong --- are we
> missing something?

Yes,the queue_work() is redundant.
I will get rid of it in next version.

> 
>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>>  	if (ext4_free_data_cachep == NULL)
>>  		goto out_ac_free;
>>  
>> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>> +	if (!ext4_discard_wq)
>> +		goto out_free_data;
>> +
> 
> 
> Perhaps we should only allocate the workqueue when it's needed ---
> e.g., when a file system is mounted or remounted with "-o discard"?
> 
> Then in ext4_exit_malloc(), we only free it if ext4_discard_wq is
> non-NULL.
> 
> This would save a bit of memory on systems that wouldn't need the ext4
> discard work queue.

Yes, it make sense to the system with pool memory

Thanks so much
Jianchao

> 
> 					- Ted
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-26  7:51     ` Wang Jianchao
@ 2021-08-26  8:58       ` Wang Jianchao
  0 siblings, 0 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-08-26  8:58 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel



On 2021/8/26 3:51 PM, Wang Jianchao wrote:

>>
>>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>>>  	if (ext4_free_data_cachep == NULL)
>>>  		goto out_ac_free;
>>>  
>>> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>>> +	if (!ext4_discard_wq)
>>> +		goto out_free_data;
>>> +
>>
>>
>> Perhaps we should only allocate the workqueue when it's needed ---
>> e.g., when a file system is mounted or remounted with "-o discard"?
>>
>> Then in ext4_exit_malloc(), we only free it if ext4_discard_wq is
>> non-NULL.
>>
>> This would save a bit of memory on systems that wouldn't need the ext4
>> discard work queue.
> 
> Yes, it make sense to the system with pool memory

s/pool/poor  :)

> 
> Thanks so much
> Jianchao
> 
>>
>> 					- Ted
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC
  2021-08-04 15:52       ` Jan Kara
@ 2021-08-26 11:42         ` Wang Jianchao
  0 siblings, 0 replies; 23+ messages in thread
From: Wang Jianchao @ 2021-08-26 11:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: Guoqing Jiang, linux-ext4, linux-kernel, tytso, adilger.kernel



On 2021/8/4 11:52 PM, Jan Kara wrote:
> On Mon 26-07-21 15:05:41, Wang Jianchao wrote:
>>
>>
>> On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
>>> Hi,
>>>
>>> On 7/24/21 3:41 PM, Wang Jianchao wrote:
>>>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>>>
>>>> The blocks may be waiting for journal commit to be freed back to
>>>> mb buddy. Let fallocate wait and retry in that case.
>>>>
>>>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
>>>> ---
>>>>   fs/ext4/extents.c | 6 +++++-
>>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>>> index 92ad64b89d9b..ad0b874d3448 100644
>>>> --- a/fs/ext4/extents.c
>>>> +++ b/fs/ext4/extents.c
>>>> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>       struct inode *inode = file_inode(file);
>>>>       loff_t new_size = 0;
>>>>       unsigned int max_blocks;
>>>> -    int ret = 0;
>>>> +    int ret = 0, retries = 0;
>>>>       int flags;
>>>>       ext4_lblk_t lblk;
>>>>       unsigned int blkbits = inode->i_blkbits;
>>>> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>                FALLOC_FL_INSERT_RANGE))
>>>>           return -EOPNOTSUPP;
>>>>   +retry:
>>>>       ext4_fc_start_update(inode);
>>>>         if (mode & FALLOC_FL_PUNCH_HOLE) {
>>>> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>       trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>>>   exit:
>>>>       ext4_fc_stop_update(inode);
>>>> +    if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>>>> +        goto retry;
>>>> +
>>>
>>> Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.
>>
>> Yes, this patch should be get rid of.  But it is indeed helpful to fix
>> the xfstest generic/371 which does concurrently write/rm and
>> fallocate/rm. I'll figure out some other way to improve that
> 
> Note that the retry logic is only a heuristic. It is not guaranteed any
> number of retries is enough, we just do three to not give up too easily...
> Your patch effectively raised number of retries to 9 so that may have
> masked the issue. But I don't think so high number of retries is a sensible
> choice because that way it may take too long to return ENOSPC.

The failure seems due to the background discard which marks the blocks used
before issue discard. 

The test make a 256M filesystem which has 59316 4K blocks.
There are two thread running concurrently,
 - write, rm 80M file
 - fallocate, rm 80M file

When the fallocate failed, I can observe there was a 80M on-going background trim

We seems to need to add a flush_work(sbi->s_discard_work) into ext4_should_retry_alloc()

Thanks so much
Jianchao
> 
> 								Honza
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-08-26 11:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
2021-07-24  7:41 ` [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent Wang Jianchao
2021-07-26  3:42   ` Guoqing Jiang
2021-08-04 15:26   ` Jan Kara
2021-07-24  7:41 ` [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range() Wang Jianchao
2021-08-04 15:29   ` Jan Kara
2021-08-12 17:44   ` Theodore Ts'o
2021-08-26  7:19     ` Wang Jianchao
2021-07-24  7:41 ` [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free Wang Jianchao
2021-08-04 15:27   ` Jan Kara
2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
2021-08-04 15:45   ` Jan Kara
2021-08-26  7:15     ` Wang Jianchao
2021-08-12 19:46   ` Theodore Ts'o
2021-08-26  7:51     ` Wang Jianchao
2021-08-26  8:58       ` Wang Jianchao
2021-07-24  7:41 ` [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC Wang Jianchao
2021-07-26  3:40   ` Guoqing Jiang
2021-07-26  7:05     ` Wang Jianchao
2021-08-04 15:52       ` Jan Kara
2021-08-26 11:42         ` Wang Jianchao
2021-08-04 15:32   ` Jan Kara
2021-08-04 15:46     ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).