All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ritesh Harjani <riteshh@linux.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, linux-fsdevel@vger.kernel.org,
	aneesh.kumar@linux.ibm.com, sandeen@sandeen.net
Subject: Re: [RFCv2 1/1] ext4: Fix race in ext4 mb discard group preallocations
Date: Tue, 21 Apr 2020 09:54:57 +0530	[thread overview]
Message-ID: <20200421042458.40BB74C046@d06av22.portsmouth.uk.ibm.com> (raw)
In-Reply-To: <20200420143807.GE17130@quack2.suse.cz>

Hello Jan,

Thanks for your email.

On 4/20/20 8:08 PM, Jan Kara wrote:
> On Wed 15-04-20 22:53:01, Ritesh Harjani wrote:
>> There could be a race in function ext4_mb_discard_group_preallocations()
>> where the 1st thread may iterate through group's bb_prealloc_list and
>> remove all the PAs and add to function's local list head.
>> Now if the 2nd thread comes in to discard the group preallocations,
>> it will see that the group->bb_prealloc_list is empty and will return 0.
>>
>> Consider for a case where we have less number of groups (for e.g. just group 0),
>> this may even return an -ENOSPC error from ext4_mb_new_blocks()
>> (where we call for ext4_mb_discard_group_preallocations()).
>> But that is wrong, since 2nd thread should have waited for 1st thread to release
>> all the PAs and should have retried for allocation. Since 1st thread
>> was anyway going to discard the PAs.
>>
>> This patch fixes this race by introducing two paths (fastpath and
>> slowpath). We first try the fastpath via
>> ext4_mb_discard_preallocations(). So if any of the group's PA list is
>> empty then instead of waiting on the group_lock we continue to discard
>> other group's PA. This could help maintain the parallelism in trying to
>> discard multiple group's PA list. So if at the end some process is
>> not able to find any freed block, then we retry freeing all of the
>> groups PA list while holding the group_lock. And in case if the PA list
>> is empty, then we try return grp->bb_free which should tell us
>> whether there are any free blocks in the given group or not to make any
>> forward progress.
>>
>> Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> Ritesh, do you still want to push this patch as is or do you plan to change
> it after a discussion on Thursday?

I would need more time on this. I will get back on this, once I do some
study to see how your suggested approach works out.

-ritesh


> 
>> @@ -3967,9 +3986,15 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>>   		goto repeat;
>>   	}
>>   
>> -	/* found anything to free? */
>> +	/*
>> +	 * If this list is empty, then return the grp->bb_free. As someone
>> +	 * else may have freed the PAs and updated grp->bb_free.
>> +	 */
>>   	if (list_empty(&list)) {
>>   		BUG_ON(free != 0);
>> +		mb_debug(1, "Someone may have freed PA for this group %u, grp->bb_free %d\n",
>> +			 group, grp->bb_free);
>> +		free = grp->bb_free;
>>   		goto out;
>>   	}
> 
> I'm still somewhat concerned about the forward progress guarantee here...
> If you're convinced the allocation from goal is the only possibility of
> lockup and that logic can be removed, then please remove it and then write a
> good comment why lockup is not possible due to this.
> 
>> @@ -4464,17 +4492,39 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
>>   	return 0;
>>   }
>>   
>> +/*
>> + * ext4_mb_discard_preallocations: This function loop over each group's prealloc
>> + * list and try to free it. It may so happen that more than 1 process try to
>> + * call this function in parallel. That's why we initially take a fastpath
>> + * approach in which we first check if the grp->bb_prealloc_list is empty,
>> + * that could mean that, someone else may have removed all of it's PA and added
>> + * into it's local list. So we quickly return from there and try to discard
>> + * next group's PAs. This way we try to parallelize discarding of multiple group
>> + * PAs. But in case if any of the process is unfortunate to not able to free
>> + * any of group's PA, then we retry with slow path which will gurantee that
>> + * either some PAs will be made free or we will get group->bb_free blocks
>> + * (grp->bb_free if non-zero gurantees forward progress in ext4_mb_new_blocks())
>> + */
>>   static int ext4_mb_discard_preallocations(struct super_block *sb, int needed)
>>   {
>>   	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
>>   	int ret;
>>   	int freed = 0;
>> +	bool fastpath = true;
>> +	int tmp_needed;
>>   
>> -	trace_ext4_mb_discard_preallocations(sb, needed);
>> -	for (i = 0; i < ngroups && needed > 0; i++) {
>> -		ret = ext4_mb_discard_group_preallocations(sb, i, needed);
>> +repeat:
>> +	tmp_needed = needed;
>> +	trace_ext4_mb_discard_preallocations(sb, tmp_needed);
>> +	for (i = 0; i < ngroups && tmp_needed > 0; i++) {
>> +		ret = ext4_mb_discard_group_preallocations(sb, i, tmp_needed,
>> +							   fastpath);
>>   		freed += ret;
>> -		needed -= ret;
>> +		tmp_needed -= ret;
>> +	}
> 
> Why do you need 'tmp_needed'? When freed is 0, tmp_needed == needed, right?
> 
>> +	if (!freed && fastpath) {
>> +		fastpath = false;
>> +		goto repeat;
>>   	}
> 
> 								Honza
> 


  reply	other threads:[~2020-04-21  4:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-08 16:54 [RFC 0/1] ext4: Fix mballoc race in freeing up group preallocations Ritesh Harjani
2020-04-08 16:54 ` [RFC 1/1] ext4: Fix race in ext4_mb_discard_group_preallocations() Ritesh Harjani
2020-04-09 13:37   ` Jan Kara
2020-04-09 18:45     ` Ritesh Harjani
2020-04-10 15:52       ` Jan Kara
2020-04-15 17:20         ` Ritesh Harjani
2020-04-15 17:23           ` [RFCv2 1/1] ext4: Fix race in ext4 mb discard group preallocations Ritesh Harjani
2020-04-20 14:38             ` Jan Kara
2020-04-21  4:24               ` Ritesh Harjani [this message]
2020-04-16 10:27           ` [RFC 1/1] ext4: Fix race in ext4_mb_discard_group_preallocations() Jan Kara
2020-04-08 17:06 ` [RFC 0/1] ext4: Fix mballoc race in freeing up group preallocations Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200421042458.40BB74C046@d06av22.portsmouth.uk.ibm.com \
    --to=riteshh@linux.ibm.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.