All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: kernel test robot <rong.a.chen@intel.com>,
	yangerkun <yangerkun@huawei.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, Bruce Fields <bfields@fieldses.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Mon, 09 Mar 2020 18:11:51 -0400	[thread overview]
Message-ID: <41c83d34ae4c166f48e7969b2b71e43a0f69028d.camel@kernel.org> (raw)
In-Reply-To: <87blp5urwq.fsf@notabene.neil.brown.name>

On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> On Mon, Mar 09 2020, Jeff Layton wrote:
> 
> > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > 
> > > > > This is not completely unexpected as we're banging on the global
> > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > locks and unlocks without doing anything in between, so the workload
> > > > > looks pretty artificial [1].
> > > > > 
> > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > workloads.
> > > > 
> > > > That is a _huge_ regression, though.
> > > > 
> > > > What about something like the attached? Wouldn't that work? And make
> > > > the code actually match the old comment about wow "fl_blocker" being
> > > > NULL being special.
> > > > 
> > > > The old code seemed to not know about things like memory ordering either.
> > > > 
> > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > means I'm done and not going to touch it any more", making that
> > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > 
> > > Yeah, something along those lines maybe. I don't think we can use
> > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > cleared.
> > > 
> > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > instead of testing for !fl_blocker to see whether we can avoid the
> > > blocked_lock_lock?
> > >   
> > 
> > How about something like this instead? (untested other than for
> > compilation)
> > 
> > Basically, this just switches the waiters over to wait for
> > fl_blocked_member to go empty. That still happens before the wakeup, so
> > it should be ok to wait on that.
> > 
> > I think we can also eliminate the lockless list_empty check in
> > locks_delete_block, as the fl_blocker check should be sufficient now.
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > From: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > 
> > ...by using smp_load_acquire and smp_store_release to close the race
> > window.
> > 
> > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > 	   the fl_blocker pointer to clear. Remove the list_empty check
> > 	   from locks_delete_lock shortcut. ]
> 
> Why do you think it is OK to remove that list_empty check?  I don't
> think it is.  There might be locked requests that need to be woken up.
> 
> As the problem here is a use-after-free due to a race, one option would
> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> test/use.
> 
> Another option is to use a different lock.  The fl_wait contains a
> spinlock, and we have wake_up_locked() which is provided for exactly
> these sorts of situations where the wake_up call can race with a thread
> waking up.
> 
> So my compile-tested-only proposal is below.
> I can probably a proper change-log entry if you think the patch is a
> good way to go.
> 
> NeilBrown
> 
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..8aa04d5ac8b3 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>  
>  		waiter = list_first_entry(&blocker->fl_blocked_requests,
>  					  struct file_lock, fl_blocked_member);
> +		spin_lock(&waiter->fl_wait.lock);
>  		__locks_delete_block(waiter);
>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>  			waiter->fl_lmops->lm_notify(waiter);
>  		else
> -			wake_up(&waiter->fl_wait);
> +			wake_up_locked(&waiter->fl_wait);
> +		spin_unlock(&waiter->fl_wait.lock);
>  	}
>  }
>  
> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
>  {
>  	int status = -ENOENT;
>  
> +	/*
> +	 * If fl_blocker is NULL, it won't be set again as this thread
> +	 * "owns" the lock and is the only one that might try to claim
> +	 * the lock.  So it is safe to test fl_blocker locklessly.
> +	 * Also if fl_blocker is NULL, this waiter is not listed on
> +	 * fl_blocked_requests for some lock, so no other request can
> +	 * be added to the list of fl_blocked_requests for this
> +	 * request.  So if fl_blocker is NULL, it is safe to
> +	 * locklessly check if fl_blocked_requests is empty.  If both
> +	 * of these checks succeed, there is no need to take the lock.
> +	 * However, some other thread might have only *just* set
> +	 * fl_blocker to NULL and it about to send a wakeup on
> +	 * fl_wait, so we mustn't return too soon or we might free waiter
> +	 * before that wakeup can be sent.  So take the fl_wait.lock
> +	 * to serialize with the wakeup in __locks_wake_up_blocks().
> +	 */
> +	if (waiter->fl_blocker == NULL) {
> +		spin_lock(&waiter->fl_wait.lock);
> +		if (waiter->fl_blocker == NULL &&
> +		    list_empty(&waiter->fl_blocked_requests)) {
> +			spin_unlock(&waiter->fl_wait.lock);
> +			return status;
> +		}
> +		spin_unlock(&waiter->fl_wait.lock);
> +	}
>  	spin_lock(&blocked_lock_lock);
>  	if (waiter->fl_blocker)
>  		status = 0;
> 

Looks good on a cursory check, and I'm inclined to go with this since
it's less fiddly for people to backport.

One other difference to note -- we are holding the fl_wait lock when
calling lm_notify, but I don't think it will matter to any of the
existing lm_notify functions.

If you want to clean up the changelog and resend that would be great.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>


WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jlayton@kernel.org>
To: lkp@lists.01.org
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Mon, 09 Mar 2020 18:11:51 -0400	[thread overview]
Message-ID: <41c83d34ae4c166f48e7969b2b71e43a0f69028d.camel@kernel.org> (raw)
In-Reply-To: <87blp5urwq.fsf@notabene.neil.brown.name>

[-- Attachment #1: Type: text/plain, Size: 6270 bytes --]

On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> On Mon, Mar 09 2020, Jeff Layton wrote:
> 
> > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > 
> > > > > This is not completely unexpected as we're banging on the global
> > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > locks and unlocks without doing anything in between, so the workload
> > > > > looks pretty artificial [1].
> > > > > 
> > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > workloads.
> > > > 
> > > > That is a _huge_ regression, though.
> > > > 
> > > > What about something like the attached? Wouldn't that work? And make
> > > > the code actually match the old comment about wow "fl_blocker" being
> > > > NULL being special.
> > > > 
> > > > The old code seemed to not know about things like memory ordering either.
> > > > 
> > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > means I'm done and not going to touch it any more", making that
> > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > 
> > > Yeah, something along those lines maybe. I don't think we can use
> > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > cleared.
> > > 
> > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > instead of testing for !fl_blocker to see whether we can avoid the
> > > blocked_lock_lock?
> > >   
> > 
> > How about something like this instead? (untested other than for
> > compilation)
> > 
> > Basically, this just switches the waiters over to wait for
> > fl_blocked_member to go empty. That still happens before the wakeup, so
> > it should be ok to wait on that.
> > 
> > I think we can also eliminate the lockless list_empty check in
> > locks_delete_block, as the fl_blocker check should be sufficient now.
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > From: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > 
> > ...by using smp_load_acquire and smp_store_release to close the race
> > window.
> > 
> > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > 	   the fl_blocker pointer to clear. Remove the list_empty check
> > 	   from locks_delete_lock shortcut. ]
> 
> Why do you think it is OK to remove that list_empty check?  I don't
> think it is.  There might be locked requests that need to be woken up.
> 
> As the problem here is a use-after-free due to a race, one option would
> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> test/use.
> 
> Another option is to use a different lock.  The fl_wait contains a
> spinlock, and we have wake_up_locked() which is provided for exactly
> these sorts of situations where the wake_up call can race with a thread
> waking up.
> 
> So my compile-tested-only proposal is below.
> I can probably a proper change-log entry if you think the patch is a
> good way to go.
> 
> NeilBrown
> 
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..8aa04d5ac8b3 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>  
>  		waiter = list_first_entry(&blocker->fl_blocked_requests,
>  					  struct file_lock, fl_blocked_member);
> +		spin_lock(&waiter->fl_wait.lock);
>  		__locks_delete_block(waiter);
>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>  			waiter->fl_lmops->lm_notify(waiter);
>  		else
> -			wake_up(&waiter->fl_wait);
> +			wake_up_locked(&waiter->fl_wait);
> +		spin_unlock(&waiter->fl_wait.lock);
>  	}
>  }
>  
> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
>  {
>  	int status = -ENOENT;
>  
> +	/*
> +	 * If fl_blocker is NULL, it won't be set again as this thread
> +	 * "owns" the lock and is the only one that might try to claim
> +	 * the lock.  So it is safe to test fl_blocker locklessly.
> +	 * Also if fl_blocker is NULL, this waiter is not listed on
> +	 * fl_blocked_requests for some lock, so no other request can
> +	 * be added to the list of fl_blocked_requests for this
> +	 * request.  So if fl_blocker is NULL, it is safe to
> +	 * locklessly check if fl_blocked_requests is empty.  If both
> +	 * of these checks succeed, there is no need to take the lock.
> +	 * However, some other thread might have only *just* set
> +	 * fl_blocker to NULL and it about to send a wakeup on
> +	 * fl_wait, so we mustn't return too soon or we might free waiter
> +	 * before that wakeup can be sent.  So take the fl_wait.lock
> +	 * to serialize with the wakeup in __locks_wake_up_blocks().
> +	 */
> +	if (waiter->fl_blocker == NULL) {
> +		spin_lock(&waiter->fl_wait.lock);
> +		if (waiter->fl_blocker == NULL &&
> +		    list_empty(&waiter->fl_blocked_requests)) {
> +			spin_unlock(&waiter->fl_wait.lock);
> +			return status;
> +		}
> +		spin_unlock(&waiter->fl_wait.lock);
> +	}
>  	spin_lock(&blocked_lock_lock);
>  	if (waiter->fl_blocker)
>  		status = 0;
> 

Looks good on a cursory check, and I'm inclined to go with this since
it's less fiddly for people to backport.

One other difference to note -- we are holding the fl_wait lock when
calling lm_notify, but I don't think it will matter to any of the
existing lm_notify functions.

If you want to clean up the changelog and resend that would be great.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2020-03-09 22:11 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-08 14:03 [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression kernel test robot
2020-03-08 14:03 ` kernel test robot
2020-03-09 14:36 ` Jeff Layton
2020-03-09 14:36   ` Jeff Layton
2020-03-09 15:52   ` Linus Torvalds
2020-03-09 15:52     ` Linus Torvalds
2020-03-09 17:22     ` Jeff Layton
2020-03-09 17:22       ` Jeff Layton
2020-03-09 19:09       ` Jeff Layton
2020-03-09 19:09         ` Jeff Layton
2020-03-09 19:53         ` Jeff Layton
2020-03-09 19:53           ` Jeff Layton
2020-03-09 21:42         ` NeilBrown
2020-03-09 21:42           ` NeilBrown
2020-03-09 21:58           ` Jeff Layton
2020-03-09 21:58             ` Jeff Layton
2020-03-10  7:52             ` kernel test robot
2020-03-10  7:52               ` kernel test robot
2020-03-09 22:11           ` Jeff Layton [this message]
2020-03-09 22:11             ` Jeff Layton
2020-03-10  3:24             ` yangerkun
2020-03-10  3:24               ` yangerkun
2020-03-10  7:54               ` kernel test robot
2020-03-10  7:54                 ` kernel test robot
2020-03-10 12:52               ` Jeff Layton
2020-03-10 12:52                 ` Jeff Layton
2020-03-10 14:18                 ` yangerkun
2020-03-10 14:18                   ` yangerkun
2020-03-10 15:06                   ` Jeff Layton
2020-03-10 15:06                     ` Jeff Layton
2020-03-10 17:27                 ` Jeff Layton
2020-03-10 17:27                   ` Jeff Layton
2020-03-10 21:01                   ` NeilBrown
2020-03-10 21:01                     ` NeilBrown
2020-03-10 21:14                     ` Jeff Layton
2020-03-10 21:14                       ` Jeff Layton
2020-03-10 21:21                       ` NeilBrown
2020-03-10 21:21                         ` NeilBrown
2020-03-10 21:47                         ` Linus Torvalds
2020-03-10 21:47                           ` Linus Torvalds
2020-03-10 22:07                           ` Jeff Layton
2020-03-10 22:07                             ` Jeff Layton
2020-03-10 22:31                             ` Linus Torvalds
2020-03-10 22:31                               ` Linus Torvalds
2020-03-11 22:22                               ` NeilBrown
2020-03-11 22:22                                 ` NeilBrown
2020-03-12  0:38                                 ` Linus Torvalds
2020-03-12  0:38                                   ` Linus Torvalds
2020-03-12  4:42                                   ` NeilBrown
2020-03-12  4:42                                     ` NeilBrown
2020-03-12 12:31                                     ` Jeff Layton
2020-03-12 12:31                                       ` Jeff Layton
2020-03-12 22:19                                       ` NeilBrown
2020-03-12 22:19                                         ` NeilBrown
2020-03-14  1:11                                         ` Jeff Layton
2020-03-14  1:11                                           ` Jeff Layton
2020-03-12 16:07                                     ` Linus Torvalds
2020-03-12 16:07                                       ` Linus Torvalds
2020-03-14  1:31                                       ` Jeff Layton
2020-03-14  1:31                                         ` Jeff Layton
2020-03-14  2:31                                         ` NeilBrown
2020-03-14  2:31                                           ` NeilBrown
2020-03-14 15:58                                           ` Linus Torvalds
2020-03-14 15:58                                             ` Linus Torvalds
2020-03-15 13:54                                             ` Jeff Layton
2020-03-15 13:54                                               ` Jeff Layton
2020-03-16  5:06                                               ` NeilBrown
2020-03-16  5:06                                                 ` NeilBrown
2020-03-16 11:07                                                 ` Jeff Layton
2020-03-16 11:07                                                   ` Jeff Layton
2020-03-16 17:26                                                   ` Linus Torvalds
2020-03-16 17:26                                                     ` Linus Torvalds
2020-03-17  1:41                                                     ` yangerkun
2020-03-17  1:41                                                       ` yangerkun
2020-03-17 14:05                                                       ` yangerkun
2020-03-17 14:05                                                         ` yangerkun
2020-03-17 16:07                                                         ` Jeff Layton
2020-03-17 16:07                                                           ` Jeff Layton
2020-03-18  1:09                                                           ` yangerkun
2020-03-18  1:09                                                             ` yangerkun
2020-03-19 17:51                                                     ` Jeff Layton
2020-03-19 17:51                                                       ` Jeff Layton
2020-03-19 19:23                                                       ` Linus Torvalds
2020-03-19 19:23                                                         ` Linus Torvalds
2020-03-19 19:24                                                         ` Jeff Layton
2020-03-19 19:24                                                           ` Jeff Layton
2020-03-19 19:35                                                           ` Linus Torvalds
2020-03-19 19:35                                                             ` Linus Torvalds
2020-03-19 20:10                                                             ` Jeff Layton
2020-03-19 20:10                                                               ` Jeff Layton
2020-03-16 22:45                                                   ` NeilBrown
2020-03-16 22:45                                                     ` NeilBrown
2020-03-17 15:59                                                     ` Jeff Layton
2020-03-17 15:59                                                       ` Jeff Layton
2020-03-17 21:27                                                       ` NeilBrown
2020-03-17 21:27                                                         ` NeilBrown
2020-03-18  5:12                                                   ` kernel test robot
2020-03-18  5:12                                                     ` kernel test robot
2020-03-16  4:26                                             ` NeilBrown
2020-03-16  4:26                                               ` NeilBrown
2020-03-11  1:57                     ` yangerkun
2020-03-11  1:57                       ` yangerkun
2020-03-11 12:52                       ` Jeff Layton
2020-03-11 12:52                         ` Jeff Layton
2020-03-11 13:26                         ` yangerkun
2020-03-11 13:26                           ` yangerkun
2020-03-11 22:15                       ` NeilBrown
2020-03-11 22:15                         ` NeilBrown
2020-03-10  7:50           ` kernel test robot
2020-03-10  7:50             ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41c83d34ae4c166f48e7969b2b71e43a0f69028d.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@lists.01.org \
    --cc=neilb@suse.de \
    --cc=rong.a.chen@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yangerkun@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.