All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: yangerkun <yangerkun@huawei.com>, NeilBrown <neilb@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: kernel test robot <rong.a.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, Bruce Fields <bfields@fieldses.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Tue, 10 Mar 2020 13:27:55 -0400	[thread overview]
Message-ID: <36c58a6d07b67aac751fca27a4938dc1759d9267.camel@kernel.org> (raw)
In-Reply-To: <923487db2c9396c79f8e8dd4f846b2b1762635c8.camel@kernel.org>

On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:

[snip]

> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > > 
> > Something others. I think there is no need to call locks_delete_block 
> > for all case in function like flock_lock_inode_wait. What we should do 
> > as the patch '16306a61d3b7 ("fs/locks: always delete_block after 
> > waiting.")' describes is that we need call locks_delete_block not only 
> > for error equal to -ERESTARTSYS(please point out if I am wrong). And 
> > this patch may fix the regression too since simple lock that success or 
> > unlock will not try to acquire blocked_lock_lock.
> > 
> > 
> 
> Nice! This looks like it would work too, and it's a simpler fix.
> 
> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> statements to make sure we never exit with one still queued. Also, I
> think we can do a similar optimization in __break_lease.
> 
> There are some other callers of locks_delete_block:
> 
> cifs_posix_lock_set: already only calls it in these cases
> 
> nlmsvc_unlink_block: I think we need to call this in most cases, and
> they're not going to be high-performance codepaths in general
> 
> nfsd4 callback handling: Several calls here, most need to always be
> called. find_blocked_lock could be reworked to take the
> blocked_lock_lock only once (I'll do that in a separate patch).
> 
> How about something like this (
> 
> ----------------------8<---------------------
> 
> From: yangerkun <yangerkun@huawei.com>
> 
> [PATCH] filelock: fix regression in unlock performance
> 
> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")' introduces a regression since we will acquire
> blocked_lock_lock every time locks_delete_block is called.
> 
> In many cases we can just avoid calling locks_delete_block at all,
> when we know that the wait was awoken by the condition becoming true.
> Change several callers of locks_delete_block to only call it when
> waking up due to signal or other error condition.
> 
> [ jlayton: add similar optimization to __break_lease, reword changelog,
> 	   add WARN_ON_ONCE calls ]
> 
> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: yangerkun <yangerkun@huawei.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/locks.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..b88a5b11c464 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
> +
>  	return error;
>  }
>  
> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>  
>  		break;
>  	}
> -	locks_delete_block(&fl);
> +	if (error)
> +		locks_delete_block(&fl);
> +	WARN_ON_ONCE(fl.fl_blocker);
>  
>  	return error;
>  }
> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  
>  	locks_dispose_list(&dispose);
>  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
> -						!new_fl->fl_blocker, break_time);
> +						 !new_fl->fl_blocker,
> +						 break_time);
>  
>  	percpu_down_read(&file_rwsem);
>  	spin_lock(&ctx->flc_lock);
>  	trace_break_lease_unblock(inode, new_fl);
> -	locks_delete_block(new_fl);
>  	if (error >= 0) {
>  		/*
>  		 * Wait for the next conflicting lease that has not been
>  		 * broken yet
>  		 */
> -		if (error == 0)
> +		if (error == 0) {
> +			locks_delete_block(new_fl);
>  			time_out_leases(inode, &dispose);
> +		}
>  		if (any_leases_conflict(inode, new_fl))
>  			goto restart;
>  		error = 0;
> +	} else {
> +		locks_delete_block(new_fl);
>  	}
> +	WARN_ON_ONCE(fl->fl_blocker);
>  out:
>  	spin_unlock(&ctx->flc_lock);
>  	percpu_up_read(&file_rwsem);
> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
> +
>  	return error;
>  }
>  
> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
>  
>  	return error;
>  }

I've gone ahead and added the above patch to linux-next. Linus, Neil,
are you ok with this one? I think this is probably the simplest
approach.

Assuming so and that this tests out OK, I'll a PR in a few days, after
it has had a bit of soak time in next.

Thanks for the effort everyone! 
-- 
Jeff Layton <jlayton@kernel.org>


WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jlayton@kernel.org>
To: lkp@lists.01.org
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Tue, 10 Mar 2020 13:27:55 -0400	[thread overview]
Message-ID: <36c58a6d07b67aac751fca27a4938dc1759d9267.camel@kernel.org> (raw)
In-Reply-To: <923487db2c9396c79f8e8dd4f846b2b1762635c8.camel@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 5101 bytes --]

On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:

[snip]

> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > > 
> > Something others. I think there is no need to call locks_delete_block 
> > for all case in function like flock_lock_inode_wait. What we should do 
> > as the patch '16306a61d3b7 ("fs/locks: always delete_block after 
> > waiting.")' describes is that we need call locks_delete_block not only 
> > for error equal to -ERESTARTSYS(please point out if I am wrong). And 
> > this patch may fix the regression too since simple lock that success or 
> > unlock will not try to acquire blocked_lock_lock.
> > 
> > 
> 
> Nice! This looks like it would work too, and it's a simpler fix.
> 
> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> statements to make sure we never exit with one still queued. Also, I
> think we can do a similar optimization in __break_lease.
> 
> There are some other callers of locks_delete_block:
> 
> cifs_posix_lock_set: already only calls it in these cases
> 
> nlmsvc_unlink_block: I think we need to call this in most cases, and
> they're not going to be high-performance codepaths in general
> 
> nfsd4 callback handling: Several calls here, most need to always be
> called. find_blocked_lock could be reworked to take the
> blocked_lock_lock only once (I'll do that in a separate patch).
> 
> How about something like this (
> 
> ----------------------8<---------------------
> 
> From: yangerkun <yangerkun@huawei.com>
> 
> [PATCH] filelock: fix regression in unlock performance
> 
> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")' introduces a regression since we will acquire
> blocked_lock_lock every time locks_delete_block is called.
> 
> In many cases we can just avoid calling locks_delete_block at all,
> when we know that the wait was awoken by the condition becoming true.
> Change several callers of locks_delete_block to only call it when
> waking up due to signal or other error condition.
> 
> [ jlayton: add similar optimization to __break_lease, reword changelog,
> 	   add WARN_ON_ONCE calls ]
> 
> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: yangerkun <yangerkun@huawei.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/locks.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..b88a5b11c464 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
> +
>  	return error;
>  }
>  
> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>  
>  		break;
>  	}
> -	locks_delete_block(&fl);
> +	if (error)
> +		locks_delete_block(&fl);
> +	WARN_ON_ONCE(fl.fl_blocker);
>  
>  	return error;
>  }
> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  
>  	locks_dispose_list(&dispose);
>  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
> -						!new_fl->fl_blocker, break_time);
> +						 !new_fl->fl_blocker,
> +						 break_time);
>  
>  	percpu_down_read(&file_rwsem);
>  	spin_lock(&ctx->flc_lock);
>  	trace_break_lease_unblock(inode, new_fl);
> -	locks_delete_block(new_fl);
>  	if (error >= 0) {
>  		/*
>  		 * Wait for the next conflicting lease that has not been
>  		 * broken yet
>  		 */
> -		if (error == 0)
> +		if (error == 0) {
> +			locks_delete_block(new_fl);
>  			time_out_leases(inode, &dispose);
> +		}
>  		if (any_leases_conflict(inode, new_fl))
>  			goto restart;
>  		error = 0;
> +	} else {
> +		locks_delete_block(new_fl);
>  	}
> +	WARN_ON_ONCE(fl->fl_blocker);
>  out:
>  	spin_unlock(&ctx->flc_lock);
>  	percpu_up_read(&file_rwsem);
> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
> +
>  	return error;
>  }
>  
> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>  		if (error)
>  			break;
>  	}
> -	locks_delete_block(fl);
> +	if (error)
> +		locks_delete_block(fl);
> +	WARN_ON_ONCE(fl->fl_blocker);
>  
>  	return error;
>  }

I've gone ahead and added the above patch to linux-next. Linus, Neil,
are you ok with this one? I think this is probably the simplest
approach.

Assuming so and that this tests out OK, I'll a PR in a few days, after
it has had a bit of soak time in next.

Thanks for the effort everyone! 
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2020-03-10 17:27 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-08 14:03 [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression kernel test robot
2020-03-08 14:03 ` kernel test robot
2020-03-09 14:36 ` Jeff Layton
2020-03-09 14:36   ` Jeff Layton
2020-03-09 15:52   ` Linus Torvalds
2020-03-09 15:52     ` Linus Torvalds
2020-03-09 17:22     ` Jeff Layton
2020-03-09 17:22       ` Jeff Layton
2020-03-09 19:09       ` Jeff Layton
2020-03-09 19:09         ` Jeff Layton
2020-03-09 19:53         ` Jeff Layton
2020-03-09 19:53           ` Jeff Layton
2020-03-09 21:42         ` NeilBrown
2020-03-09 21:42           ` NeilBrown
2020-03-09 21:58           ` Jeff Layton
2020-03-09 21:58             ` Jeff Layton
2020-03-10  7:52             ` kernel test robot
2020-03-10  7:52               ` kernel test robot
2020-03-09 22:11           ` Jeff Layton
2020-03-09 22:11             ` Jeff Layton
2020-03-10  3:24             ` yangerkun
2020-03-10  3:24               ` yangerkun
2020-03-10  7:54               ` kernel test robot
2020-03-10  7:54                 ` kernel test robot
2020-03-10 12:52               ` Jeff Layton
2020-03-10 12:52                 ` Jeff Layton
2020-03-10 14:18                 ` yangerkun
2020-03-10 14:18                   ` yangerkun
2020-03-10 15:06                   ` Jeff Layton
2020-03-10 15:06                     ` Jeff Layton
2020-03-10 17:27                 ` Jeff Layton [this message]
2020-03-10 17:27                   ` Jeff Layton
2020-03-10 21:01                   ` NeilBrown
2020-03-10 21:01                     ` NeilBrown
2020-03-10 21:14                     ` Jeff Layton
2020-03-10 21:14                       ` Jeff Layton
2020-03-10 21:21                       ` NeilBrown
2020-03-10 21:21                         ` NeilBrown
2020-03-10 21:47                         ` Linus Torvalds
2020-03-10 21:47                           ` Linus Torvalds
2020-03-10 22:07                           ` Jeff Layton
2020-03-10 22:07                             ` Jeff Layton
2020-03-10 22:31                             ` Linus Torvalds
2020-03-10 22:31                               ` Linus Torvalds
2020-03-11 22:22                               ` NeilBrown
2020-03-11 22:22                                 ` NeilBrown
2020-03-12  0:38                                 ` Linus Torvalds
2020-03-12  0:38                                   ` Linus Torvalds
2020-03-12  4:42                                   ` NeilBrown
2020-03-12  4:42                                     ` NeilBrown
2020-03-12 12:31                                     ` Jeff Layton
2020-03-12 12:31                                       ` Jeff Layton
2020-03-12 22:19                                       ` NeilBrown
2020-03-12 22:19                                         ` NeilBrown
2020-03-14  1:11                                         ` Jeff Layton
2020-03-14  1:11                                           ` Jeff Layton
2020-03-12 16:07                                     ` Linus Torvalds
2020-03-12 16:07                                       ` Linus Torvalds
2020-03-14  1:31                                       ` Jeff Layton
2020-03-14  1:31                                         ` Jeff Layton
2020-03-14  2:31                                         ` NeilBrown
2020-03-14  2:31                                           ` NeilBrown
2020-03-14 15:58                                           ` Linus Torvalds
2020-03-14 15:58                                             ` Linus Torvalds
2020-03-15 13:54                                             ` Jeff Layton
2020-03-15 13:54                                               ` Jeff Layton
2020-03-16  5:06                                               ` NeilBrown
2020-03-16  5:06                                                 ` NeilBrown
2020-03-16 11:07                                                 ` Jeff Layton
2020-03-16 11:07                                                   ` Jeff Layton
2020-03-16 17:26                                                   ` Linus Torvalds
2020-03-16 17:26                                                     ` Linus Torvalds
2020-03-17  1:41                                                     ` yangerkun
2020-03-17  1:41                                                       ` yangerkun
2020-03-17 14:05                                                       ` yangerkun
2020-03-17 14:05                                                         ` yangerkun
2020-03-17 16:07                                                         ` Jeff Layton
2020-03-17 16:07                                                           ` Jeff Layton
2020-03-18  1:09                                                           ` yangerkun
2020-03-18  1:09                                                             ` yangerkun
2020-03-19 17:51                                                     ` Jeff Layton
2020-03-19 17:51                                                       ` Jeff Layton
2020-03-19 19:23                                                       ` Linus Torvalds
2020-03-19 19:23                                                         ` Linus Torvalds
2020-03-19 19:24                                                         ` Jeff Layton
2020-03-19 19:24                                                           ` Jeff Layton
2020-03-19 19:35                                                           ` Linus Torvalds
2020-03-19 19:35                                                             ` Linus Torvalds
2020-03-19 20:10                                                             ` Jeff Layton
2020-03-19 20:10                                                               ` Jeff Layton
2020-03-16 22:45                                                   ` NeilBrown
2020-03-16 22:45                                                     ` NeilBrown
2020-03-17 15:59                                                     ` Jeff Layton
2020-03-17 15:59                                                       ` Jeff Layton
2020-03-17 21:27                                                       ` NeilBrown
2020-03-17 21:27                                                         ` NeilBrown
2020-03-18  5:12                                                   ` kernel test robot
2020-03-18  5:12                                                     ` kernel test robot
2020-03-16  4:26                                             ` NeilBrown
2020-03-16  4:26                                               ` NeilBrown
2020-03-11  1:57                     ` yangerkun
2020-03-11  1:57                       ` yangerkun
2020-03-11 12:52                       ` Jeff Layton
2020-03-11 12:52                         ` Jeff Layton
2020-03-11 13:26                         ` yangerkun
2020-03-11 13:26                           ` yangerkun
2020-03-11 22:15                       ` NeilBrown
2020-03-11 22:15                         ` NeilBrown
2020-03-10  7:50           ` kernel test robot
2020-03-10  7:50             ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36c58a6d07b67aac751fca27a4938dc1759d9267.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@lists.01.org \
    --cc=neilb@suse.de \
    --cc=rong.a.chen@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yangerkun@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.