* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
@ 2021-07-26 20:30 kernel test robot
0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-07-26 20:30 UTC (permalink / raw)
To: kbuild
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210724074124.25731-5-jianchao.wan9@gmail.com>
References: <20210724074124.25731-5-jianchao.wan9@gmail.com>
TO: Wang Jianchao <jianchao.wan9@gmail.com>
TO: linux-ext4(a)vger.kernel.org
TO: linux-kernel(a)vger.kernel.org
CC: tytso(a)mit.edu
CC: adilger.kernel(a)dilger.ca
Hi Wang,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on ext4/dev]
[also build test WARNING on linux/master linus/master v5.14-rc3 next-20210723]
[If your patch is applied to the wrong git tree, kindly drop us a note.
Also, when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: mips-randconfig-s031-20210726 (attached as .config)
compiler: mips-linux-gcc (GCC) 10.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/c1714c046fe748ad2324623d650c2dfe5b3b7a55
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
git checkout c1714c046fe748ad2324623d650c2dfe5b3b7a55
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash fs/ext4/
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
command-line: note: in included file:
builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
builtin:0:0: sparse: this was the original definition
builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
builtin:0:0: sparse: this was the original definition
builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
builtin:0:0: sparse: this was the original definition
builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
builtin:0:0: sparse: this was the original definition
fs/ext4/mballoc.c:994:9: sparse: sparse: context imbalance in 'ext4_mb_choose_next_group_cr1' - wrong count at exit
fs/ext4/mballoc.c:1264:9: sparse: sparse: context imbalance in 'ext4_mb_init_cache' - different lock contexts for basic block
fs/ext4/mballoc.c:2168:5: sparse: sparse: context imbalance in 'ext4_mb_try_best_found' - different lock contexts for basic block
fs/ext4/mballoc.c:2196:5: sparse: sparse: context imbalance in 'ext4_mb_find_by_goal' - different lock contexts for basic block
fs/ext4/mballoc.c:2483:12: sparse: sparse: context imbalance in 'ext4_mb_good_group_nolock' - wrong count at exit
fs/ext4/mballoc.c:2698:87: sparse: sparse: context imbalance in 'ext4_mb_regular_allocator' - different lock contexts for basic block
fs/ext4/mballoc.c:2972:13: sparse: sparse: context imbalance in 'ext4_mb_seq_structs_summary_start' - wrong count at exit
fs/ext4/mballoc.c:3044:13: sparse: sparse: context imbalance in 'ext4_mb_seq_structs_summary_stop' - unexpected unlock
>> fs/ext4/mballoc.c:3333:9: sparse: sparse: context imbalance in 'ext4_discard_work' - different lock contexts for basic block
fs/ext4/mballoc.c:3542:17: sparse: sparse: context imbalance in 'ext4_mb_release' - different lock contexts for basic block
fs/ext4/mballoc.c:3662:26: sparse: sparse: context imbalance in 'ext4_free_data_in_buddy' - wrong count at exit
fs/ext4/mballoc.c:3873:15: sparse: sparse: context imbalance in 'ext4_mb_mark_diskspace_used' - different lock contexts for basic block
fs/ext4/mballoc.c:3881:6: sparse: sparse: context imbalance in 'ext4_mb_mark_bb' - different lock contexts for basic block
fs/ext4/mballoc.c:4203:13: sparse: sparse: context imbalance in 'ext4_discard_allocated_blocks' - different lock contexts for basic block
fs/ext4/mballoc.c:4505:13: sparse: sparse: context imbalance in 'ext4_mb_put_pa' - different lock contexts for basic block
fs/ext4/mballoc.c:4842:9: sparse: sparse: context imbalance in 'ext4_mb_discard_group_preallocations' - different lock contexts for basic block
fs/ext4/mballoc.c:4995:9: sparse: sparse: context imbalance in 'ext4_discard_preallocations' - different lock contexts for basic block
fs/ext4/mballoc.c:5062:9: sparse: sparse: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block
fs/ext4/mballoc.c:5290:9: sparse: sparse: context imbalance in 'ext4_mb_discard_lg_preallocations' - different lock contexts for basic block
fs/ext4/mballoc.c:5062:9: sparse: sparse: context imbalance in 'ext4_mb_new_blocks' - different lock contexts for basic block
fs/ext4/mballoc.c:5935:9: sparse: sparse: context imbalance in 'ext4_free_blocks' - different lock contexts for basic block
fs/ext4/mballoc.c:6235:15: sparse: sparse: context imbalance in 'ext4_group_add_blocks' - different lock contexts for basic block
fs/ext4/mballoc.c:6275:24: sparse: sparse: context imbalance in 'ext4_trim_extent' - wrong count at exit
fs/ext4/mballoc.c:6325:9: sparse: sparse: context imbalance in 'ext4_try_to_trim_range' - different lock contexts for basic block
fs/ext4/mballoc.c:6342:1: sparse: sparse: context imbalance in 'ext4_trim_all_free' - different lock contexts for basic block
fs/ext4/mballoc.c:6471:1: sparse: sparse: context imbalance in 'ext4_mballoc_query_range' - different lock contexts for basic block
vim +/ext4_discard_work +3333 fs/ext4/mballoc.c
2892c15ddda6a7 Eric Sandeen 2011-02-12 3315
c1714c046fe748 Wang Jianchao 2021-07-24 3316 static void ext4_discard_work(struct work_struct *work)
c1714c046fe748 Wang Jianchao 2021-07-24 3317 {
c1714c046fe748 Wang Jianchao 2021-07-24 3318 struct ext4_sb_info *sbi = container_of(work,
c1714c046fe748 Wang Jianchao 2021-07-24 3319 struct ext4_sb_info, s_discard_work);
c1714c046fe748 Wang Jianchao 2021-07-24 3320 struct super_block *sb = sbi->s_sb;
c1714c046fe748 Wang Jianchao 2021-07-24 3321 struct ext4_free_data *fd, *nfd;
c1714c046fe748 Wang Jianchao 2021-07-24 3322 struct ext4_buddy e4b;
c1714c046fe748 Wang Jianchao 2021-07-24 3323 struct list_head discard_list;
c1714c046fe748 Wang Jianchao 2021-07-24 3324 ext4_group_t grp, load_grp;
c1714c046fe748 Wang Jianchao 2021-07-24 3325 int err = 0;
c1714c046fe748 Wang Jianchao 2021-07-24 3326
c1714c046fe748 Wang Jianchao 2021-07-24 3327 INIT_LIST_HEAD(&discard_list);
c1714c046fe748 Wang Jianchao 2021-07-24 3328 spin_lock(&sbi->s_md_lock);
c1714c046fe748 Wang Jianchao 2021-07-24 3329 list_splice_init(&sbi->s_discard_list, &discard_list);
c1714c046fe748 Wang Jianchao 2021-07-24 3330 spin_unlock(&sbi->s_md_lock);
c1714c046fe748 Wang Jianchao 2021-07-24 3331
c1714c046fe748 Wang Jianchao 2021-07-24 3332 load_grp = UINT_MAX;
c1714c046fe748 Wang Jianchao 2021-07-24 @3333 list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
c1714c046fe748 Wang Jianchao 2021-07-24 3334 /*
c1714c046fe748 Wang Jianchao 2021-07-24 3335 * If filesystem is umounting or no memory, give up the discard
c1714c046fe748 Wang Jianchao 2021-07-24 3336 */
c1714c046fe748 Wang Jianchao 2021-07-24 3337 if ((sb->s_flags & SB_ACTIVE) && !err) {
c1714c046fe748 Wang Jianchao 2021-07-24 3338 grp = fd->efd_group;
c1714c046fe748 Wang Jianchao 2021-07-24 3339 if (grp != load_grp) {
c1714c046fe748 Wang Jianchao 2021-07-24 3340 if (load_grp != UINT_MAX)
c1714c046fe748 Wang Jianchao 2021-07-24 3341 ext4_mb_unload_buddy(&e4b);
c1714c046fe748 Wang Jianchao 2021-07-24 3342
c1714c046fe748 Wang Jianchao 2021-07-24 3343 err = ext4_mb_load_buddy(sb, grp, &e4b);
c1714c046fe748 Wang Jianchao 2021-07-24 3344 if (err) {
c1714c046fe748 Wang Jianchao 2021-07-24 3345 kmem_cache_free(ext4_free_data_cachep, fd);
c1714c046fe748 Wang Jianchao 2021-07-24 3346 load_grp = UINT_MAX;
c1714c046fe748 Wang Jianchao 2021-07-24 3347 continue;
c1714c046fe748 Wang Jianchao 2021-07-24 3348 } else {
c1714c046fe748 Wang Jianchao 2021-07-24 3349 load_grp = grp;
c1714c046fe748 Wang Jianchao 2021-07-24 3350 }
c1714c046fe748 Wang Jianchao 2021-07-24 3351 }
c1714c046fe748 Wang Jianchao 2021-07-24 3352
c1714c046fe748 Wang Jianchao 2021-07-24 3353 ext4_lock_group(sb, grp);
c1714c046fe748 Wang Jianchao 2021-07-24 3354 ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
c1714c046fe748 Wang Jianchao 2021-07-24 3355 fd->efd_start_cluster + fd->efd_count - 1, 1);
c1714c046fe748 Wang Jianchao 2021-07-24 3356 ext4_unlock_group(sb, grp);
c1714c046fe748 Wang Jianchao 2021-07-24 3357 }
c1714c046fe748 Wang Jianchao 2021-07-24 3358 kmem_cache_free(ext4_free_data_cachep, fd);
c1714c046fe748 Wang Jianchao 2021-07-24 3359 }
c1714c046fe748 Wang Jianchao 2021-07-24 3360
c1714c046fe748 Wang Jianchao 2021-07-24 3361 if (load_grp != UINT_MAX)
c1714c046fe748 Wang Jianchao 2021-07-24 3362 ext4_mb_unload_buddy(&e4b);
c1714c046fe748 Wang Jianchao 2021-07-24 3363 }
c1714c046fe748 Wang Jianchao 2021-07-24 3364
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36123 bytes --]
* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-08-26 7:51 ` Wang Jianchao
@ 2021-08-26 8:58 ` Wang Jianchao
0 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26 8:58 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel
On 2021/8/26 3:51 PM, Wang Jianchao wrote:
>>
>>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>>> if (ext4_free_data_cachep == NULL)
>>> goto out_ac_free;
>>>
>>> + ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>>> + if (!ext4_discard_wq)
>>> + goto out_free_data;
>>> +
>>
>>
>> Perhaps we should only allocate the workqueue when it's needed ---
>> e.g., when a file system is mounted or remounted with "-o discard"?
>>
>> Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
>> non-NULL.
>>
>> This would save a bit of memory on systems that wouldn't need the ext4
>> discard work queue.
>
> Yes, it makes sense for systems with pool memory
s/pool/poor :)
>
> Thanks so much
> Jianchao
>
>>
>> - Ted
>>
* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-08-12 19:46 ` Theodore Ts'o
@ 2021-08-26 7:51 ` Wang Jianchao
2021-08-26 8:58 ` Wang Jianchao
0 siblings, 1 reply; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26 7:51 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel
On 2021/8/13 3:46 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 34be2f07449d..a496509e61b7 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>> int count;
>>
>> + if (test_opt(sb, DISCARD)) {
>> + /*
>> + * wait the discard work to drain all of ext4_free_data
>> + */
>> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
>> + flush_work(&sbi->s_discard_work);
>
> I agree with Jan --- it's not clear to me why the call to queue_work()
> is needed. After the flush_work() call returns, if s_discard_work is
> still non-empty, there must be something terribly wrong --- are we
> missing something?
Yes, the queue_work() is redundant.
I will get rid of it in the next version.
>
>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>> if (ext4_free_data_cachep == NULL)
>> goto out_ac_free;
>>
>> + ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>> + if (!ext4_discard_wq)
>> + goto out_free_data;
>> +
>
>
> Perhaps we should only allocate the workqueue when it's needed ---
> e.g., when a file system is mounted or remounted with "-o discard"?
>
> Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
> non-NULL.
>
> This would save a bit of memory on systems that wouldn't need the ext4
> discard work queue.
Yes, it makes sense for systems with pool memory
Thanks so much
Jianchao
>
> - Ted
>
* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-08-04 15:45 ` Jan Kara
@ 2021-08-26 7:15 ` Wang Jianchao
0 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26 7:15 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel
On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>
>> Right now, discard is issued and waited on in the jbd2 commit
>> kthread context after the logs are committed. When a large number
>> of files are deleted and discards are flooding, the jbd2 commit
>> kthread can be blocked for a long time. All metadata operations can
>> then be blocked waiting for log space.
>>
>> One case is the page fault path, which holds mm->mmap_sem for read
>> and wants to update the file time but has to wait for log space.
>> When another thread in the task wants to do mmap, its write
>> acquisition of mmap_sem blocks. After that, all subsequent read
>> acquisitions of mmap_sem block as well, even for the ps command,
>> which needs to read /proc/pid/cmdline. Our monitor service, which
>> needs to read /proc/pid/cmdline, was blocked for 5 minutes.
>>
>> This patch frees the blocks back to the buddy allocator after
>> commit and then does the discard in an async kworker context in
>> fstrim fashion, namely:
>>  - mark blocks to be discarded as used if they have not been allocated
>>  - do discard
>>  - mark them free
>> After this, the jbd2 commit kthread is no longer blocked by discard,
>> and we won't get NOSPC even if the discard is slow or throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@mit.edu>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
>
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
>
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>> int count;
>>
>> + if (test_opt(sb, DISCARD)) {
>> + /*
>> + * wait the discard work to drain all of ext4_free_data
>> + */
>> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
>
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, we take care to queue the work whenever we add item to
> empty list. So it should be enough to have flush_work() here and then
> possibly
>
> WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
>
> Or am I missing something?
The queue_work() here is indeed redundant.
Thanks so much for pointing this out.
Jianchao
>
> Honza
>
>> + flush_work(&sbi->s_discard_work);
>> + }
>> +
* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-07-24 7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
2021-08-04 15:45 ` Jan Kara
@ 2021-08-12 19:46 ` Theodore Ts'o
2021-08-26 7:51 ` Wang Jianchao
1 sibling, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2021-08-12 19:46 UTC (permalink / raw)
To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, adilger.kernel
On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 34be2f07449d..a496509e61b7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
> int count;
>
> + if (test_opt(sb, DISCARD)) {
> + /*
> + * wait the discard work to drain all of ext4_free_data
> + */
> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
> + flush_work(&sbi->s_discard_work);
I agree with Jan --- it's not clear to me why the call to queue_work()
is needed. After the flush_work() call returns, if s_discard_work is
still non-empty, there must be something terribly wrong --- are we
missing something?
> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
> if (ext4_free_data_cachep == NULL)
> goto out_ac_free;
>
> + ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
> + if (!ext4_discard_wq)
> + goto out_free_data;
> +
Perhaps we should only allocate the workqueue when it's needed ---
e.g., when a file system is mounted or remounted with "-o discard"?
Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
non-NULL.
This would save a bit of memory on systems that wouldn't need the ext4
discard work queue.
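A rough userspace sketch of the lazy allocation suggested here, under stated assumptions: the names `model_wq`, `model_discard_wq`, `model_mount` and `model_exit` are hypothetical, `malloc()`/`free()` merely stand in for `alloc_workqueue()`/`destroy_workqueue()`, and a real implementation would also have to serialize concurrent mounts, which this model ignores.

```c
/* Userspace model only; names are hypothetical, not ext4 code. */
#include <stdlib.h>

struct model_wq {
	int unused;	/* placeholder for workqueue state */
};

/* Stays NULL until the first mount that asks for discard. */
static struct model_wq *model_discard_wq;

/* Mount (or remount) path: allocate the queue only when it is needed. */
static int model_mount(int discard_enabled)
{
	if (!discard_enabled || model_discard_wq)
		return 0;	/* not needed, or already allocated */

	model_discard_wq = malloc(sizeof(*model_discard_wq));
	return model_discard_wq ? 0 : -1;
}

/* Module exit path: free the queue only if it was ever allocated. */
static void model_exit(void)
{
	if (model_discard_wq) {
		free(model_discard_wq);
		model_discard_wq = NULL;
	}
}
```

With this shape, a system that never mounts with "-o discard" never pays for the queue, and the exit path is safe whether or not an allocation happened.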
- Ted
* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-07-24 7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-08-04 15:45 ` Jan Kara
2021-08-26 7:15 ` Wang Jianchao
2021-08-12 19:46 ` Theodore Ts'o
1 sibling, 1 reply; 7+ messages in thread
From: Jan Kara @ 2021-08-04 15:45 UTC (permalink / raw)
To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel
On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
>
> Right now, discard is issued and waited on in the jbd2 commit
> kthread context after the logs are committed. When a large number
> of files are deleted and discards are flooding, the jbd2 commit
> kthread can be blocked for a long time. All metadata operations can
> then be blocked waiting for log space.
>
> One case is the page fault path, which holds mm->mmap_sem for read
> and wants to update the file time but has to wait for log space.
> When another thread in the task wants to do mmap, its write
> acquisition of mmap_sem blocks. After that, all subsequent read
> acquisitions of mmap_sem block as well, even for the ps command,
> which needs to read /proc/pid/cmdline. Our monitor service, which
> needs to read /proc/pid/cmdline, was blocked for 5 minutes.
>
> This patch frees the blocks back to the buddy allocator after
> commit and then does the discard in an async kworker context in
> fstrim fashion, namely:
>  - mark blocks to be discarded as used if they have not been allocated
>  - do discard
>  - mark them free
> After this, the jbd2 commit kthread is no longer blocked by discard,
> and we won't get NOSPC even if the discard is slow or throttled.
>
> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Looks good to me. Just one small comment below. With that addressed feel
free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
> int count;
>
> + if (test_opt(sb, DISCARD)) {
> + /*
> + * wait the discard work to drain all of ext4_free_data
> + */
> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
Do we really need to queue the work here? The filesystem should be
quiescent by now, we take care to queue the work whenever we add item to
empty list. So it should be enough to have flush_work() here and then
possibly
WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
Or am I missing something?
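To illustrate the reasoning, here is a minimal userspace model, not kernel code: all names (`model_sb`, `model_add_entries`, `model_flush_work`, `model_release`) are hypothetical, and a flag plus counters stand in for the work item and the list. It assumes the producer queues the work on every empty to non-empty transition, in which case a plain flush at release time is enough and the list must be empty afterwards.

```c
/* Userspace model only; names are hypothetical, not the workqueue API. */
#include <stdbool.h>

struct model_sb {
	int pending;		/* entries sitting on the discard list */
	bool work_queued;	/* work item queued but not yet run */
	int discarded;		/* entries the worker has processed */
};

/* Producer: queue the work only when the list goes empty -> non-empty. */
static void model_add_entries(struct model_sb *sb, int n)
{
	bool wake = (sb->pending == 0);

	sb->pending += n;
	if (wake && n > 0)
		sb->work_queued = true;
}

/* Worker: drain everything that is pending at the time it runs. */
static void model_run_work(struct model_sb *sb)
{
	sb->discarded += sb->pending;
	sb->pending = 0;
	sb->work_queued = false;
}

/* Stand-in for flush_work(): run the work only if it is queued. */
static void model_flush_work(struct model_sb *sb)
{
	if (sb->work_queued)
		model_run_work(sb);
}

/*
 * Release path as suggested: flush without queueing again, then check
 * the invariant. Returns true when the list is empty, which is where
 * the WARN_ON_ONCE() would stay silent.
 */
static bool model_release(struct model_sb *sb)
{
	model_flush_work(sb);
	return sb->pending == 0;
}
```

In this model a non-empty list always has a queued work item behind it, so the extra queue_work() before the flush adds nothing.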
Honza
> + flush_work(&sbi->s_discard_work);
> + }
> +
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
2021-07-24 7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
@ 2021-07-24 7:41 ` Wang Jianchao
2021-08-04 15:45 ` Jan Kara
2021-08-12 19:46 ` Theodore Ts'o
0 siblings, 2 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-07-24 7:41 UTC (permalink / raw)
To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel
From: Wang Jianchao <wangjianchao@kuaishou.com>
Right now, discard is issued and waited on in the jbd2 commit
kthread context after the logs are committed. When a large number
of files are deleted and discards are flooding, the jbd2 commit
kthread can be blocked for a long time. All metadata operations can
then be blocked waiting for log space.

One case is the page fault path, which holds mm->mmap_sem for read
and wants to update the file time but has to wait for log space.
When another thread in the task wants to do mmap, its write
acquisition of mmap_sem blocks. After that, all subsequent read
acquisitions of mmap_sem block as well, even for the ps command,
which needs to read /proc/pid/cmdline. Our monitor service, which
needs to read /proc/pid/cmdline, was blocked for 5 minutes.

This patch frees the blocks back to the buddy allocator after
commit and then does the discard in an async kworker context in
fstrim fashion, namely:
 - mark blocks to be discarded as used if they have not been allocated
 - do discard
 - mark them free
After this, the jbd2 commit kthread is no longer blocked by discard,
and we won't get NOSPC even if the discard is slow or throttled.
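
The three fstrim-style steps listed above can be sketched in a small userspace model. This is only an illustration under assumptions: `model_group`, `model_trim_range` and `MODEL_NBLOCKS` are hypothetical names, a bool array stands in for the buddy bitmap, and a counter stands in for the actual discard request.

```c
/* Userspace model only; names are hypothetical, not ext4 code. */
#include <stdbool.h>

#define MODEL_NBLOCKS 16

struct model_group {
	bool in_use[MODEL_NBLOCKS];	/* true = allocator must not hand it out */
	int discards_issued;		/* counts simulated discard requests */
};

/* Trim the free blocks in [start, end]; returns how many were discarded. */
static int model_trim_range(struct model_group *g, int start, int end)
{
	int trimmed = 0;

	if (start < 0)
		start = 0;
	for (int i = start; i <= end && i < MODEL_NBLOCKS; i++) {
		if (g->in_use[i])
			continue;	/* reallocated since it was freed: skip */
		g->in_use[i] = true;	/* 1. mark used so nobody allocates it */
		g->discards_issued++;	/* 2. stand-in for issuing the discard */
		g->in_use[i] = false;	/* 3. mark free again */
		trimmed++;
	}
	return trimmed;
}
```

Because the blocks are already free in the buddy before the trim starts, a slow discard only delays the trim itself and does not hold space back from the allocator.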
Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
fs/ext4/ext4.h | 2 +
fs/ext4/mballoc.c | 109 +++++++++++++++++++++++++++++++++++-----------
2 files changed, 86 insertions(+), 25 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..6b678b968d84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1536,6 +1536,8 @@ struct ext4_sb_info {
 	unsigned int s_mb_free_pending;
 	struct list_head s_freed_data_list;	/* List of blocks to be freed
 						   after commit completed */
+	struct list_head s_discard_list;
+	struct work_struct s_discard_work;
 	struct rb_root s_mb_avg_fragment_size_root;
 	rwlock_t s_mb_rb_lock;
 	struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 34be2f07449d..a496509e61b7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -386,6 +386,7 @@
 static struct kmem_cache *ext4_pspace_cachep;
 static struct kmem_cache *ext4_ac_cachep;
 static struct kmem_cache *ext4_free_data_cachep;
+static struct workqueue_struct *ext4_discard_wq;
 /* We create slab caches for groupinfo data structures based on the
  * superblock block size. There will be one per mounted filesystem for
@@ -408,6 +409,10 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			       ext4_group_t group, int cr);
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3308,6 +3313,55 @@ static int ext4_groupinfo_create_slab(size_t size)
 	return 0;
 }
+static void ext4_discard_work(struct work_struct *work)
+{
+	struct ext4_sb_info *sbi = container_of(work,
+			struct ext4_sb_info, s_discard_work);
+	struct super_block *sb = sbi->s_sb;
+	struct ext4_free_data *fd, *nfd;
+	struct ext4_buddy e4b;
+	struct list_head discard_list;
+	ext4_group_t grp, load_grp;
+	int err = 0;
+
+	INIT_LIST_HEAD(&discard_list);
+	spin_lock(&sbi->s_md_lock);
+	list_splice_init(&sbi->s_discard_list, &discard_list);
+	spin_unlock(&sbi->s_md_lock);
+
+	load_grp = UINT_MAX;
+	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
+		/*
+		 * If filesystem is umounting or no memory, give up the discard
+		 */
+		if ((sb->s_flags & SB_ACTIVE) && !err) {
+			grp = fd->efd_group;
+			if (grp != load_grp) {
+				if (load_grp != UINT_MAX)
+					ext4_mb_unload_buddy(&e4b);
+
+				err = ext4_mb_load_buddy(sb, grp, &e4b);
+				if (err) {
+					kmem_cache_free(ext4_free_data_cachep, fd);
+					load_grp = UINT_MAX;
+					continue;
+				} else {
+					load_grp = grp;
+				}
+			}
+
+			ext4_lock_group(sb, grp);
+			ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
+				fd->efd_start_cluster + fd->efd_count - 1, 1);
+			ext4_unlock_group(sb, grp);
+		}
+		kmem_cache_free(ext4_free_data_cachep, fd);
+	}
+
+	if (load_grp != UINT_MAX)
+		ext4_mb_unload_buddy(&e4b);
+}
+
 int ext4_mb_init(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3376,6 +3430,8 @@ int ext4_mb_init(struct super_block *sb)
 	spin_lock_init(&sbi->s_md_lock);
 	sbi->s_mb_free_pending = 0;
 	INIT_LIST_HEAD(&sbi->s_freed_data_list);
+	INIT_LIST_HEAD(&sbi->s_discard_list);
+	INIT_WORK(&sbi->s_discard_work, ext4_discard_work);
 	sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
 	sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
@@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
 	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
 	int count;
+	if (test_opt(sb, DISCARD)) {
+		/*
+		 * wait the discard work to drain all of ext4_free_data
+		 */
+		queue_work(ext4_discard_wq, &sbi->s_discard_work);
+		flush_work(&sbi->s_discard_work);
+	}
+
 	if (sbi->s_group_info) {
 		for (i = 0; i < ngroups; i++) {
 			cond_resched();
@@ -3596,7 +3660,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 		put_page(e4b.bd_bitmap_page);
 	}
 	ext4_unlock_group(sb, entry->efd_group);
-	kmem_cache_free(ext4_free_data_cachep, entry);
 	ext4_mb_unload_buddy(&e4b);
 	mb_debug(sb, "freed %d blocks in %d structures\n", count,
@@ -3611,10 +3674,9 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_free_data *entry, *tmp;
-	struct bio *discard_bio = NULL;
 	struct list_head freed_data_list;
 	struct list_head *cut_pos = NULL;
-	int err;
+	bool wake;
 	INIT_LIST_HEAD(&freed_data_list);
@@ -3629,30 +3691,20 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 				  cut_pos);
 	spin_unlock(&sbi->s_md_lock);
-	if (test_opt(sb, DISCARD)) {
-		list_for_each_entry(entry, &freed_data_list, efd_list) {
-			err = ext4_issue_discard(sb, entry->efd_group,
-						 entry->efd_start_cluster,
-						 entry->efd_count,
-						 &discard_bio);
-			if (err && err != -EOPNOTSUPP) {
-				ext4_msg(sb, KERN_WARNING, "discard request in"
-					 " group:%d block:%d count:%d failed"
-					 " with %d", entry->efd_group,
-					 entry->efd_start_cluster,
-					 entry->efd_count, err);
-			} else if (err == -EOPNOTSUPP)
-				break;
-		}
+	list_for_each_entry(entry, &freed_data_list, efd_list)
+		ext4_free_data_in_buddy(sb, entry);
-		if (discard_bio) {
-			submit_bio_wait(discard_bio);
-			bio_put(discard_bio);
-		}
+	if (test_opt(sb, DISCARD)) {
+		spin_lock(&sbi->s_md_lock);
+		wake = list_empty(&sbi->s_discard_list);
+		list_splice_tail(&freed_data_list, &sbi->s_discard_list);
+		spin_unlock(&sbi->s_md_lock);
+		if (wake)
+			queue_work(ext4_discard_wq, &sbi->s_discard_work);
+	} else {
+		list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
+			kmem_cache_free(ext4_free_data_cachep, entry);
 	}
-
-	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
-		ext4_free_data_in_buddy(sb, entry);
 }
 int __init ext4_init_mballoc(void)
@@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
 	if (ext4_free_data_cachep == NULL)
 		goto out_ac_free;
+	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
+	if (!ext4_discard_wq)
+		goto out_free_data;
+
 	return 0;
+out_free_data:
+	kmem_cache_destroy(ext4_free_data_cachep);
 out_ac_free:
 	kmem_cache_destroy(ext4_ac_cachep);
 out_pa_free:
@@ -3693,6 +3751,7 @@ void ext4_exit_mballoc(void)
 	kmem_cache_destroy(ext4_ac_cachep);
 	kmem_cache_destroy(ext4_free_data_cachep);
 	ext4_groupinfo_destroy_slabs();
+	destroy_workqueue(ext4_discard_wq);
 }
--
2.17.1