* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
@ 2021-07-26 20:30 kernel test robot
  0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-07-26 20:30 UTC
  To: kbuild

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210724074124.25731-5-jianchao.wan9@gmail.com>
References: <20210724074124.25731-5-jianchao.wan9@gmail.com>
TO: Wang Jianchao <jianchao.wan9@gmail.com>
TO: linux-ext4(a)vger.kernel.org
TO: linux-kernel(a)vger.kernel.org
CC: tytso(a)mit.edu
CC: adilger.kernel(a)dilger.ca

Hi Wang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ext4/dev]
[also build test WARNING on linux/master linus/master v5.14-rc3 next-20210723]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: mips-randconfig-s031-20210726 (attached as .config)
compiler: mips-linux-gcc (GCC) 10.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/c1714c046fe748ad2324623d650c2dfe5b3b7a55
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
        git checkout c1714c046fe748ad2324623d650c2dfe5b3b7a55
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash fs/ext4/

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   command-line: note: in included file:
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
   builtin:0:0: sparse: this was the original definition
   builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
   builtin:0:0: sparse: this was the original definition
   fs/ext4/mballoc.c:994:9: sparse: sparse: context imbalance in 'ext4_mb_choose_next_group_cr1' - wrong count at exit
   fs/ext4/mballoc.c:1264:9: sparse: sparse: context imbalance in 'ext4_mb_init_cache' - different lock contexts for basic block
   fs/ext4/mballoc.c:2168:5: sparse: sparse: context imbalance in 'ext4_mb_try_best_found' - different lock contexts for basic block
   fs/ext4/mballoc.c:2196:5: sparse: sparse: context imbalance in 'ext4_mb_find_by_goal' - different lock contexts for basic block
   fs/ext4/mballoc.c:2483:12: sparse: sparse: context imbalance in 'ext4_mb_good_group_nolock' - wrong count at exit
   fs/ext4/mballoc.c:2698:87: sparse: sparse: context imbalance in 'ext4_mb_regular_allocator' - different lock contexts for basic block
   fs/ext4/mballoc.c:2972:13: sparse: sparse: context imbalance in 'ext4_mb_seq_structs_summary_start' - wrong count at exit
   fs/ext4/mballoc.c:3044:13: sparse: sparse: context imbalance in 'ext4_mb_seq_structs_summary_stop' - unexpected unlock
>> fs/ext4/mballoc.c:3333:9: sparse: sparse: context imbalance in 'ext4_discard_work' - different lock contexts for basic block
   fs/ext4/mballoc.c:3542:17: sparse: sparse: context imbalance in 'ext4_mb_release' - different lock contexts for basic block
   fs/ext4/mballoc.c:3662:26: sparse: sparse: context imbalance in 'ext4_free_data_in_buddy' - wrong count at exit
   fs/ext4/mballoc.c:3873:15: sparse: sparse: context imbalance in 'ext4_mb_mark_diskspace_used' - different lock contexts for basic block
   fs/ext4/mballoc.c:3881:6: sparse: sparse: context imbalance in 'ext4_mb_mark_bb' - different lock contexts for basic block
   fs/ext4/mballoc.c:4203:13: sparse: sparse: context imbalance in 'ext4_discard_allocated_blocks' - different lock contexts for basic block
   fs/ext4/mballoc.c:4505:13: sparse: sparse: context imbalance in 'ext4_mb_put_pa' - different lock contexts for basic block
   fs/ext4/mballoc.c:4842:9: sparse: sparse: context imbalance in 'ext4_mb_discard_group_preallocations' - different lock contexts for basic block
   fs/ext4/mballoc.c:4995:9: sparse: sparse: context imbalance in 'ext4_discard_preallocations' - different lock contexts for basic block
   fs/ext4/mballoc.c:5062:9: sparse: sparse: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block
   fs/ext4/mballoc.c:5290:9: sparse: sparse: context imbalance in 'ext4_mb_discard_lg_preallocations' - different lock contexts for basic block
   fs/ext4/mballoc.c:5062:9: sparse: sparse: context imbalance in 'ext4_mb_new_blocks' - different lock contexts for basic block
   fs/ext4/mballoc.c:5935:9: sparse: sparse: context imbalance in 'ext4_free_blocks' - different lock contexts for basic block
   fs/ext4/mballoc.c:6235:15: sparse: sparse: context imbalance in 'ext4_group_add_blocks' - different lock contexts for basic block
   fs/ext4/mballoc.c:6275:24: sparse: sparse: context imbalance in 'ext4_trim_extent' - wrong count at exit
   fs/ext4/mballoc.c:6325:9: sparse: sparse: context imbalance in 'ext4_try_to_trim_range' - different lock contexts for basic block
   fs/ext4/mballoc.c:6342:1: sparse: sparse: context imbalance in 'ext4_trim_all_free' - different lock contexts for basic block
   fs/ext4/mballoc.c:6471:1: sparse: sparse: context imbalance in 'ext4_mballoc_query_range' - different lock contexts for basic block

vim +/ext4_discard_work +3333 fs/ext4/mballoc.c

2892c15ddda6a7 Eric Sandeen  2011-02-12  3315  
c1714c046fe748 Wang Jianchao 2021-07-24  3316  static void ext4_discard_work(struct work_struct *work)
c1714c046fe748 Wang Jianchao 2021-07-24  3317  {
c1714c046fe748 Wang Jianchao 2021-07-24  3318  	struct ext4_sb_info *sbi = container_of(work,
c1714c046fe748 Wang Jianchao 2021-07-24  3319  			struct ext4_sb_info, s_discard_work);
c1714c046fe748 Wang Jianchao 2021-07-24  3320  	struct super_block *sb = sbi->s_sb;
c1714c046fe748 Wang Jianchao 2021-07-24  3321  	struct ext4_free_data *fd, *nfd;
c1714c046fe748 Wang Jianchao 2021-07-24  3322  	struct ext4_buddy e4b;
c1714c046fe748 Wang Jianchao 2021-07-24  3323  	struct list_head discard_list;
c1714c046fe748 Wang Jianchao 2021-07-24  3324  	ext4_group_t grp, load_grp;
c1714c046fe748 Wang Jianchao 2021-07-24  3325  	int err = 0;
c1714c046fe748 Wang Jianchao 2021-07-24  3326  
c1714c046fe748 Wang Jianchao 2021-07-24  3327  	INIT_LIST_HEAD(&discard_list);
c1714c046fe748 Wang Jianchao 2021-07-24  3328  	spin_lock(&sbi->s_md_lock);
c1714c046fe748 Wang Jianchao 2021-07-24  3329  	list_splice_init(&sbi->s_discard_list, &discard_list);
c1714c046fe748 Wang Jianchao 2021-07-24  3330  	spin_unlock(&sbi->s_md_lock);
c1714c046fe748 Wang Jianchao 2021-07-24  3331  
c1714c046fe748 Wang Jianchao 2021-07-24  3332  	load_grp = UINT_MAX;
c1714c046fe748 Wang Jianchao 2021-07-24 @3333  	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
c1714c046fe748 Wang Jianchao 2021-07-24  3334  		/*
c1714c046fe748 Wang Jianchao 2021-07-24  3335  		 * If filesystem is umounting or no memory, give up the discard
c1714c046fe748 Wang Jianchao 2021-07-24  3336  		 */
c1714c046fe748 Wang Jianchao 2021-07-24  3337  		if ((sb->s_flags & SB_ACTIVE) && !err) {
c1714c046fe748 Wang Jianchao 2021-07-24  3338  			grp = fd->efd_group;
c1714c046fe748 Wang Jianchao 2021-07-24  3339  			if (grp != load_grp) {
c1714c046fe748 Wang Jianchao 2021-07-24  3340  				if (load_grp != UINT_MAX)
c1714c046fe748 Wang Jianchao 2021-07-24  3341  					ext4_mb_unload_buddy(&e4b);
c1714c046fe748 Wang Jianchao 2021-07-24  3342  
c1714c046fe748 Wang Jianchao 2021-07-24  3343  				err = ext4_mb_load_buddy(sb, grp, &e4b);
c1714c046fe748 Wang Jianchao 2021-07-24  3344  				if (err) {
c1714c046fe748 Wang Jianchao 2021-07-24  3345  					kmem_cache_free(ext4_free_data_cachep, fd);
c1714c046fe748 Wang Jianchao 2021-07-24  3346  					load_grp = UINT_MAX;
c1714c046fe748 Wang Jianchao 2021-07-24  3347  					continue;
c1714c046fe748 Wang Jianchao 2021-07-24  3348  				} else {
c1714c046fe748 Wang Jianchao 2021-07-24  3349  					load_grp = grp;
c1714c046fe748 Wang Jianchao 2021-07-24  3350  				}
c1714c046fe748 Wang Jianchao 2021-07-24  3351  			}
c1714c046fe748 Wang Jianchao 2021-07-24  3352  
c1714c046fe748 Wang Jianchao 2021-07-24  3353  			ext4_lock_group(sb, grp);
c1714c046fe748 Wang Jianchao 2021-07-24  3354  			ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
c1714c046fe748 Wang Jianchao 2021-07-24  3355  						fd->efd_start_cluster + fd->efd_count - 1, 1);
c1714c046fe748 Wang Jianchao 2021-07-24  3356  			ext4_unlock_group(sb, grp);
c1714c046fe748 Wang Jianchao 2021-07-24  3357  		}
c1714c046fe748 Wang Jianchao 2021-07-24  3358  		kmem_cache_free(ext4_free_data_cachep, fd);
c1714c046fe748 Wang Jianchao 2021-07-24  3359  	}
c1714c046fe748 Wang Jianchao 2021-07-24  3360  
c1714c046fe748 Wang Jianchao 2021-07-24  3361  	if (load_grp != UINT_MAX)
c1714c046fe748 Wang Jianchao 2021-07-24  3362  		ext4_mb_unload_buddy(&e4b);
c1714c046fe748 Wang Jianchao 2021-07-24  3363  }
c1714c046fe748 Wang Jianchao 2021-07-24  3364  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36123 bytes --]


* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-26  7:51     ` Wang Jianchao
@ 2021-08-26  8:58       ` Wang Jianchao
  0 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26  8:58 UTC
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel



On 2021/8/26 3:51 PM, Wang Jianchao wrote:

>>
>>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>>>  	if (ext4_free_data_cachep == NULL)
>>>  		goto out_ac_free;
>>>  
>>> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>>> +	if (!ext4_discard_wq)
>>> +		goto out_free_data;
>>> +
>>
>>
>> Perhaps we should only allocate the workqueue when it's needed ---
>> e.g., when a file system is mounted or remounted with "-o discard"?
>>
>> Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
>> non-NULL.
>>
>> This would save a bit of memory on systems that wouldn't need the ext4
>> discard work queue.
> 
> Yes, it makes sense for systems with pool memory

s/pool/poor  :)

> 
> Thanks so much
> Jianchao
> 
>>
>> 					- Ted
>>


* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-12 19:46   ` Theodore Ts'o
@ 2021-08-26  7:51     ` Wang Jianchao
  2021-08-26  8:58       ` Wang Jianchao
  0 siblings, 1 reply; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26  7:51 UTC
  To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, adilger.kernel



On 2021/8/13 3:46 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 34be2f07449d..a496509e61b7 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>>  	int count;
>>  
>> +	if (test_opt(sb, DISCARD)) {
>> +		/*
>> +		 * wait the discard work to drain all of ext4_free_data
>> +		 */
>> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
>> +		flush_work(&sbi->s_discard_work);
> 
> I agree with Jan --- it's not clear to me why the call to queue_work()
> is needed.  After the flush_work() call returns, if s_discard_list is
> still non-empty, there must be something terribly wrong --- are we
> missing something?

Yes, the queue_work() is redundant.
I will get rid of it in the next version.

> 
>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>>  	if (ext4_free_data_cachep == NULL)
>>  		goto out_ac_free;
>>  
>> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>> +	if (!ext4_discard_wq)
>> +		goto out_free_data;
>> +
> 
> 
> Perhaps we should only allocate the workqueue when it's needed ---
> e.g., when a file system is mounted or remounted with "-o discard"?
> 
> Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
> non-NULL.
> 
> This would save a bit of memory on systems that wouldn't need the ext4
> discard work queue.

Yes, it makes sense for systems with pool memory

Thanks so much
Jianchao

> 
> 					- Ted
> 


* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-08-04 15:45   ` Jan Kara
@ 2021-08-26  7:15     ` Wang Jianchao
  0 siblings, 0 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-08-26  7:15 UTC
  To: Jan Kara; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel



On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@kuaishou.com>
>>
>> Right now, discard is issued and waited on in the jbd2 commit
>> kthread context after the logs are committed. When a large number
>> of files is deleted and discards flood the device, the jbd2 commit
>> kthread can be blocked for a long time. All metadata operations can
>> then be blocked waiting for log space.
>>
>> One case is the page fault path, which holds mm->mmap_sem for read
>> and wants to update the file time but has to wait for log space.
>> When another thread in the task wants to do mmap, its write
>> acquisition of mmap_sem is blocked. Finally, all subsequent read
>> acquisitions of mmap_sem are blocked, even the ps command, which
>> needs to read /proc/pid/cmdline. Our monitor service, which needs
>> to read /proc/pid/cmdline, used to be blocked for 5 minutes.
>>
>> This patch frees the blocks back to buddy after commit and then does
>> the discard in an async kworker context in fstrim fashion, namely,
>>  - mark blocks to be discarded as used if they have not been allocated
>>  - do discard
>>  - mark them free
>> After this, the jbd2 commit kthread won't be blocked any more by
>> discard and we won't get NOSPC even if the discard is slow or
>> throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@mit.edu>
>> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>>  	int count;
>>  
>> +	if (test_opt(sb, DISCARD)) {
>> +		/*
>> +		 * wait the discard work to drain all of ext4_free_data
>> +		 */
>> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
> 
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, and we take care to queue the work whenever we add an
> item to an empty list. So it should be enough to have flush_work() here
> and then possibly
> 
> 	WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
> 
> Or am I missing something?

queue_work here is indeed redundant.

Thanks so much for pointing this out.
Jianchao

> 
> 								Honza
> 
>> +		flush_work(&sbi->s_discard_work);
>> +	}
>> +


* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
  2021-08-04 15:45   ` Jan Kara
@ 2021-08-12 19:46   ` Theodore Ts'o
  2021-08-26  7:51     ` Wang Jianchao
  1 sibling, 1 reply; 7+ messages in thread
From: Theodore Ts'o @ 2021-08-12 19:46 UTC
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, adilger.kernel

On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 34be2f07449d..a496509e61b7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>  	int count;
>  
> +	if (test_opt(sb, DISCARD)) {
> +		/*
> +		 * wait the discard work to drain all of ext4_free_data
> +		 */
> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
> +		flush_work(&sbi->s_discard_work);

I agree with Jan --- it's not clear to me why the call to queue_work()
is needed.  After the flush_work() call returns, if s_discard_list is
still non-empty, there must be something terribly wrong --- are we
missing something?

> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>  	if (ext4_free_data_cachep == NULL)
>  		goto out_ac_free;
>  
> +	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
> +	if (!ext4_discard_wq)
> +		goto out_free_data;
> +


Perhaps we should only allocate the workqueue when it's needed ---
e.g., when a file system is mounted or remounted with "-o discard"?

Then in ext4_exit_mballoc(), we only free it if ext4_discard_wq is
non-NULL.

This would save a bit of memory on systems that wouldn't need the ext4
discard work queue.

					- Ted
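
The allocate-on-demand idea above can be sketched in userspace C. This
is only an illustration under stated assumptions: `fake_wq`,
`discard_wq_get()` and `discard_wq_exit()` are hypothetical names,
malloc() stands in for alloc_workqueue(), and a real implementation
would also have to serialize concurrent mounts, which this sketch
ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct workqueue_struct. */
struct fake_wq {
	const char *name;
};

/* Mirrors the patch's global; stays NULL until a mount asks for discard. */
struct fake_wq *discard_wq;

/* Hypothetical mount/remount hook: allocate only when "-o discard" is set. */
int discard_wq_get(bool opt_discard)
{
	if (!opt_discard || discard_wq)
		return 0;	/* not needed, or already allocated */

	discard_wq = malloc(sizeof(*discard_wq));
	if (!discard_wq)
		return -1;	/* the kernel would return -ENOMEM here */
	discard_wq->name = "ext4discard";
	return 0;
}

/* Module-exit counterpart: free only if the queue was ever created. */
void discard_wq_exit(void)
{
	if (discard_wq) {
		free(discard_wq);
		discard_wq = NULL;
	}
}
```

A system that never mounts with "-o discard" never calls the allocating
branch, so the exit path finds a NULL pointer and does nothing.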


* Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 ` [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex Wang Jianchao
@ 2021-08-04 15:45   ` Jan Kara
  2021-08-26  7:15     ` Wang Jianchao
  2021-08-12 19:46   ` Theodore Ts'o
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Kara @ 2021-08-04 15:45 UTC
  To: Wang Jianchao; +Cc: linux-ext4, linux-kernel, tytso, adilger.kernel

On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
> From: Wang Jianchao <wangjianchao@kuaishou.com>
> 
> Right now, discard is issued and waited on in the jbd2 commit
> kthread context after the logs are committed. When a large number
> of files is deleted and discards flood the device, the jbd2 commit
> kthread can be blocked for a long time. All metadata operations can
> then be blocked waiting for log space.
> 
> One case is the page fault path, which holds mm->mmap_sem for read
> and wants to update the file time but has to wait for log space.
> When another thread in the task wants to do mmap, its write
> acquisition of mmap_sem is blocked. Finally, all subsequent read
> acquisitions of mmap_sem are blocked, even the ps command, which
> needs to read /proc/pid/cmdline. Our monitor service, which needs
> to read /proc/pid/cmdline, used to be blocked for 5 minutes.
> 
> This patch frees the blocks back to buddy after commit and then does
> the discard in an async kworker context in fstrim fashion, namely,
>  - mark blocks to be discarded as used if they have not been allocated
>  - do discard
>  - mark them free
> After this, the jbd2 commit kthread won't be blocked any more by
> discard and we won't get NOSPC even if the discard is slow or
> throttled.
> 
> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
> Suggested-by: Theodore Ts'o <tytso@mit.edu>
> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>

Looks good to me. Just one small comment below. With that addressed feel
free to add:

Reviewed-by: Jan Kara <jack@suse.cz>


> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>  	int count;
>  
> +	if (test_opt(sb, DISCARD)) {
> +		/*
> +		 * wait the discard work to drain all of ext4_free_data
> +		 */
> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);

Do we really need to queue the work here? The filesystem should be
quiescent by now, and we take care to queue the work whenever we add an
item to an empty list. So it should be enough to have flush_work() here
and then possibly

	WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))

Or am I missing something?

								Honza

> +		flush_work(&sbi->s_discard_work);
> +	}
> +
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
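
The reasoning in the review above (work is queued on every
empty-to-non-empty transition, so a flush alone must leave the list
empty) can be modelled in userspace C. This is a simplified sketch, not
kernel code: `fake_sbi`, `add_discard_entry()`, `fake_flush_work()` and
`release_discard()` are hypothetical names, and the "flush" simply runs
the pending work synchronously.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for s_discard_list and s_discard_work. */
struct fake_sbi {
	int pending_entries;	/* length of the discard list */
	bool work_queued;	/* true while the work item is pending */
};

/* Producer side: queue the work only when the list was empty. */
void add_discard_entry(struct fake_sbi *sbi)
{
	if (sbi->pending_entries == 0)
		sbi->work_queued = true;
	sbi->pending_entries++;
}

/* Stand-in for flush_work(): run the work now if it is pending. */
void fake_flush_work(struct fake_sbi *sbi)
{
	if (sbi->work_queued) {
		sbi->pending_entries = 0;	/* worker drains the whole list */
		sbi->work_queued = false;
	}
}

/*
 * Release path as suggested in the review: flush only, then check the
 * invariant. Returns true when the list is unexpectedly non-empty,
 * which is the case the suggested WARN_ON_ONCE() would report.
 */
bool release_discard(struct fake_sbi *sbi)
{
	fake_flush_work(sbi);
	return sbi->pending_entries != 0;
}
```

In this model no extra queueing step is needed before the flush: any
entry on the list implies that the work item was already queued when
the list stopped being empty.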


* [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex
  2021-07-24  7:41 [PATCH V3 0/5] ext4: get discard out of jbd2 commit context Wang Jianchao
@ 2021-07-24  7:41 ` Wang Jianchao
  2021-08-04 15:45   ` Jan Kara
  2021-08-12 19:46   ` Theodore Ts'o
  0 siblings, 2 replies; 7+ messages in thread
From: Wang Jianchao @ 2021-07-24  7:41 UTC
  To: linux-ext4, linux-kernel; +Cc: tytso, adilger.kernel

From: Wang Jianchao <wangjianchao@kuaishou.com>

Right now, discard is issued and waited on in the jbd2 commit
kthread context after the logs are committed. When a large number
of files is deleted and discards flood the device, the jbd2 commit
kthread can be blocked for a long time. All metadata operations can
then be blocked waiting for log space.

One case is the page fault path, which holds mm->mmap_sem for read
and wants to update the file time but has to wait for log space.
When another thread in the task wants to do mmap, its write
acquisition of mmap_sem is blocked. Finally, all subsequent read
acquisitions of mmap_sem are blocked, even the ps command, which
needs to read /proc/pid/cmdline. Our monitor service, which needs
to read /proc/pid/cmdline, used to be blocked for 5 minutes.

This patch frees the blocks back to buddy after commit and then does
the discard in an async kworker context in fstrim fashion, namely,
 - mark blocks to be discarded as used if they have not been allocated
 - do discard
 - mark them free
After this, the jbd2 commit kthread won't be blocked any more by
discard and we won't get NOSPC even if the discard is slow or
throttled.
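
The three steps listed above can be illustrated with a small userspace
C model. It is a sketch under stated assumptions, not ext4 code:
`fake_group`, `trim_range()` and `NBLOCKS` are hypothetical names, a
plain bool array stands in for the buddy bitmap, and incrementing a
counter stands in for issuing the discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 16

/* Hypothetical per-group state: true means the block is in use. */
struct fake_group {
	bool used[NBLOCKS];
	int discarded[NBLOCKS];	/* how many times each block was discarded */
};

/*
 * Trim the free blocks in [start, end]. Blocks that were reallocated
 * after being freed are skipped, because discarding them would destroy
 * live data. Returns the number of blocks discarded.
 */
int trim_range(struct fake_group *g, size_t start, size_t end)
{
	int count = 0;

	for (size_t i = start; i <= end && i < NBLOCKS; i++) {
		if (g->used[i])
			continue;	/* already reallocated: leave it alone */
		g->used[i] = true;	/* 1. mark used so nobody allocates it */
		g->discarded[i]++;	/* 2. "issue" the discard */
		g->used[i] = false;	/* 3. mark it free again */
		count++;
	}
	return count;
}
```

Because a block is marked used while its discard is in flight, the
allocator cannot hand it out in the middle of the operation, and a slow
discard only delays that one block rather than the journal commit.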

Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
---
 fs/ext4/ext4.h    |   2 +
 fs/ext4/mballoc.c | 109 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..6b678b968d84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1536,6 +1536,8 @@ struct ext4_sb_info {
 	unsigned int s_mb_free_pending;
 	struct list_head s_freed_data_list;	/* List of blocks to be freed
 						   after commit completed */
+	struct list_head s_discard_list;
+	struct work_struct s_discard_work;
 	struct rb_root s_mb_avg_fragment_size_root;
 	rwlock_t s_mb_rb_lock;
 	struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 34be2f07449d..a496509e61b7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -386,6 +386,7 @@
 static struct kmem_cache *ext4_pspace_cachep;
 static struct kmem_cache *ext4_ac_cachep;
 static struct kmem_cache *ext4_free_data_cachep;
+static struct workqueue_struct *ext4_discard_wq;
 
 /* We create slab caches for groupinfo data structures based on the
  * superblock block size.  There will be one per mounted filesystem for
@@ -408,6 +409,10 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			       ext4_group_t group, int cr);
 
+static int ext4_try_to_trim_range(struct super_block *sb,
+		struct ext4_buddy *e4b, ext4_grpblk_t start,
+		ext4_grpblk_t max, ext4_grpblk_t minblocks);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3308,6 +3313,55 @@ static int ext4_groupinfo_create_slab(size_t size)
 	return 0;
 }
 
+static void ext4_discard_work(struct work_struct *work)
+{
+	struct ext4_sb_info *sbi = container_of(work,
+			struct ext4_sb_info, s_discard_work);
+	struct super_block *sb = sbi->s_sb;
+	struct ext4_free_data *fd, *nfd;
+	struct ext4_buddy e4b;
+	struct list_head discard_list;
+	ext4_group_t grp, load_grp;
+	int err = 0;
+
+	INIT_LIST_HEAD(&discard_list);
+	spin_lock(&sbi->s_md_lock);
+	list_splice_init(&sbi->s_discard_list, &discard_list);
+	spin_unlock(&sbi->s_md_lock);
+
+	load_grp = UINT_MAX;
+	list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
+		/*
+		 * If filesystem is umounting or no memory, give up the discard
+		 */
+		if ((sb->s_flags & SB_ACTIVE) && !err) {
+			grp = fd->efd_group;
+			if (grp != load_grp) {
+				if (load_grp != UINT_MAX)
+					ext4_mb_unload_buddy(&e4b);
+
+				err = ext4_mb_load_buddy(sb, grp, &e4b);
+				if (err) {
+					kmem_cache_free(ext4_free_data_cachep, fd);
+					load_grp = UINT_MAX;
+					continue;
+				} else {
+					load_grp = grp;
+				}
+			}
+
+			ext4_lock_group(sb, grp);
+			ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
+						fd->efd_start_cluster + fd->efd_count - 1, 1);
+			ext4_unlock_group(sb, grp);
+		}
+		kmem_cache_free(ext4_free_data_cachep, fd);
+	}
+
+	if (load_grp != UINT_MAX)
+		ext4_mb_unload_buddy(&e4b);
+}
+
 int ext4_mb_init(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3376,6 +3430,8 @@ int ext4_mb_init(struct super_block *sb)
 	spin_lock_init(&sbi->s_md_lock);
 	sbi->s_mb_free_pending = 0;
 	INIT_LIST_HEAD(&sbi->s_freed_data_list);
+	INIT_LIST_HEAD(&sbi->s_discard_list);
+	INIT_WORK(&sbi->s_discard_work, ext4_discard_work);
 
 	sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
 	sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
@@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
 	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
 	int count;
 
+	if (test_opt(sb, DISCARD)) {
+		/*
+		 * wait the discard work to drain all of ext4_free_data
+		 */
+		queue_work(ext4_discard_wq, &sbi->s_discard_work);
+		flush_work(&sbi->s_discard_work);
+	}
+
 	if (sbi->s_group_info) {
 		for (i = 0; i < ngroups; i++) {
 			cond_resched();
@@ -3596,7 +3660,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 		put_page(e4b.bd_bitmap_page);
 	}
 	ext4_unlock_group(sb, entry->efd_group);
-	kmem_cache_free(ext4_free_data_cachep, entry);
 	ext4_mb_unload_buddy(&e4b);
 
 	mb_debug(sb, "freed %d blocks in %d structures\n", count,
@@ -3611,10 +3674,9 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_free_data *entry, *tmp;
-	struct bio *discard_bio = NULL;
 	struct list_head freed_data_list;
 	struct list_head *cut_pos = NULL;
-	int err;
+	bool wake;
 
 	INIT_LIST_HEAD(&freed_data_list);
 
@@ -3629,30 +3691,20 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
 				  cut_pos);
 	spin_unlock(&sbi->s_md_lock);
 
-	if (test_opt(sb, DISCARD)) {
-		list_for_each_entry(entry, &freed_data_list, efd_list) {
-			err = ext4_issue_discard(sb, entry->efd_group,
-						 entry->efd_start_cluster,
-						 entry->efd_count,
-						 &discard_bio);
-			if (err && err != -EOPNOTSUPP) {
-				ext4_msg(sb, KERN_WARNING, "discard request in"
-					 " group:%d block:%d count:%d failed"
-					 " with %d", entry->efd_group,
-					 entry->efd_start_cluster,
-					 entry->efd_count, err);
-			} else if (err == -EOPNOTSUPP)
-				break;
-		}
+	list_for_each_entry(entry, &freed_data_list, efd_list)
+		ext4_free_data_in_buddy(sb, entry);
 
-		if (discard_bio) {
-			submit_bio_wait(discard_bio);
-			bio_put(discard_bio);
-		}
+	if (test_opt(sb, DISCARD)) {
+		spin_lock(&sbi->s_md_lock);
+		wake = list_empty(&sbi->s_discard_list);
+		list_splice_tail(&freed_data_list, &sbi->s_discard_list);
+		spin_unlock(&sbi->s_md_lock);
+		if (wake)
+			queue_work(ext4_discard_wq, &sbi->s_discard_work);
+	} else {
+		list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
+			kmem_cache_free(ext4_free_data_cachep, entry);
 	}
-
-	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
-		ext4_free_data_in_buddy(sb, entry);
 }
 
 int __init ext4_init_mballoc(void)
@@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
 	if (ext4_free_data_cachep == NULL)
 		goto out_ac_free;
 
+	ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
+	if (!ext4_discard_wq)
+		goto out_free_data;
+
 	return 0;
 
+out_free_data:
+	kmem_cache_destroy(ext4_free_data_cachep);
 out_ac_free:
 	kmem_cache_destroy(ext4_ac_cachep);
 out_pa_free:
@@ -3693,6 +3751,7 @@ void ext4_exit_mballoc(void)
 	kmem_cache_destroy(ext4_ac_cachep);
 	kmem_cache_destroy(ext4_free_data_cachep);
 	ext4_groupinfo_destroy_slabs();
+	destroy_workqueue(ext4_discard_wq);
 }
 
 
-- 
2.17.1



