From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jlayton@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: yangerkun <yangerkun@huawei.com>,
	kernel test robot <rong.a.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, Bruce Fields <bfields@fieldses.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Sat, 14 Mar 2020 13:31:03 +1100
Message-ID: <877dznu0pk.fsf@notabene.neil.brown.name>
In-Reply-To: <f000e352d9e103b3ade3506aac225920420d2323.camel@kernel.org>

On Fri, Mar 13 2020, Jeff Layton wrote:

> On Thu, 2020-03-12 at 09:07 -0700, Linus Torvalds wrote:
>> On Wed, Mar 11, 2020 at 9:42 PM NeilBrown <neilb@suse.de> wrote:
>> > It seems that test_and_set_bit_lock() is the preferred way to handle
>> > flags when memory ordering is important
>> 
>> That looks better.
>> 
>> The _preferred_ way is actually the one I already posted: do a
>> "smp_store_release()" to store the flag (like a NULL pointer), and a
>> smp_load_acquire() to load it.
>> 
>> That's basically optimal on most architectures (all modern ones -
>> there are bad architectures from before people figured out that
>> release/acquire is better than separate memory barriers), not needing
>> any atomics and only minimal memory ordering.
>> 
>> I wonder if a special flags value (keeping it "unsigned int" to avoid
>> the issue Jeff pointed out) might be acceptable?
>> 
>> IOW, could we do just
>> 
>>         smp_store_release(&waiter->fl_flags, FL_RELEASED);
>> 
>> to say that we're done with the lock? Or do people still look at and
>> depend on the flag values at that point?
>
> I think nlmsvc_grant_block does. We could probably work around it
> there, but we'd need to couple this change with some clear
> documentation to make it clear that you can't rely on fl_flags after
> locks_delete_block returns.
>
> If avoiding new locks is preferred here (and I'm fine with that), then
> maybe we should just go with the patch you sent originally (along with
> changing the waiters to wait on fl_blocked_member going empty instead
> of the fl_blocker going NULL)?

I agree.  I've poked at this for a while and come to the conclusion that
I cannot really come up with anything that is structurally better than
your patch.
The idea of list_del_init_release() and list_empty_acquire() is growing
on me though.  See below.

list_empty_acquire() might be appropriate for waitqueue_active(), which
is documented as requiring a memory barrier, but in practice seems to
often be used without one.

But I'm happy for you to go with your patch that changes all the wait
calls.

NeilBrown



diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..2e5eb677c324 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -174,6 +174,20 @@
 
 #include <linux/uaccess.h>
 
+/* Should go in list.h */
+static inline int list_empty_acquire(const struct list_head *head)
+{
+	return smp_load_acquire(&head->next) == head;
+}
+
+static inline void list_del_init_release(struct list_head *entry)
+{
+	__list_del_entry(entry);
+	entry->prev = entry;
+	smp_store_release(&entry->next, entry);
+}
+
+
 #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
 #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
 #define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
@@ -724,7 +738,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
 static void __locks_delete_block(struct file_lock *waiter)
 {
 	locks_delete_global_blocked(waiter);
-	list_del_init(&waiter->fl_blocked_member);
 	waiter->fl_blocker = NULL;
 }
 
@@ -740,6 +753,11 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 			waiter->fl_lmops->lm_notify(waiter);
 		else
 			wake_up(&waiter->fl_wait);
+		/*
+		 * Tell the world that we're done with it - see comment at
+		 * top of locks_delete_block().
+		 */
+		list_del_init_release(&waiter->fl_blocked_member);
 	}
 }
 
@@ -753,6 +771,25 @@ int locks_delete_block(struct file_lock *waiter)
 {
 	int status = -ENOENT;
 
+	/*
+	 * If fl_blocker is NULL, it won't be set again as this thread
+	 * "owns" the lock and is the only one that might try to claim
+	 * the lock.  So it is safe to test fl_blocker locklessly.
+	 * Also if fl_blocker is NULL, this waiter is not listed on
+	 * fl_blocked_requests for some lock, so no other request can
+	 * be added to the list of fl_blocked_requests for this
+	 * request.  So if fl_blocker is NULL, it is safe to
+	 * locklessly check if fl_blocked_requests is empty.  If both
+	 * of these checks succeed, there is no need to take the lock.
+	 * However, some other thread could still be in __locks_wake_up_blocks()
+	 * and may yet access 'waiter', so we cannot return and possibly
+	 * free the 'waiter' unless we check that __locks_wake_up_blocks()
+	 * is done.  For that we carefully test fl_blocked_member.
+	 */
+	if (waiter->fl_blocker == NULL &&
+	    list_empty(&waiter->fl_blocked_requests) &&
+	    list_empty_acquire(&waiter->fl_blocked_member))
+		return status;
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_blocker)
 		status = 0;

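For readers without a kernel tree at hand, the pairing that the two helpers
in the patch rely on can be sketched in userspace C11. This is only an
illustration under stated assumptions: atomic_store_explicit() with
memory_order_release stands in for smp_store_release(),
atomic_load_explicit() with memory_order_acquire stands in for
smp_load_acquire(), and every sketch_* name (including the wake_count
field) is hypothetical and does not appear in fs/locks.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct sketch_list {
	_Atomic(struct sketch_list *) next;	/* the field that is published */
	struct sketch_list *prev;
};

/* Assumed waiter layout: some state the waker touches, plus list membership. */
struct sketch_waiter {
	int wake_count;
	struct sketch_list blocked_member;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->prev = head;
	atomic_store_explicit(&head->next, head, memory_order_relaxed);
}

/* Insert 'entry' right after 'head' (callers are assumed to hold a lock). */
static void sketch_list_add(struct sketch_list *entry, struct sketch_list *head)
{
	struct sketch_list *first =
		atomic_load_explicit(&head->next, memory_order_relaxed);

	entry->prev = head;
	atomic_store_explicit(&entry->next, first, memory_order_relaxed);
	first->prev = entry;
	atomic_store_explicit(&head->next, entry, memory_order_relaxed);
}

/* Analogue of list_del_init_release(): unlink, then publish "empty" last. */
static void sketch_list_del_init_release(struct sketch_list *entry)
{
	struct sketch_list *next =
		atomic_load_explicit(&entry->next, memory_order_relaxed);
	struct sketch_list *prev = entry->prev;

	next->prev = prev;
	atomic_store_explicit(&prev->next, next, memory_order_relaxed);
	entry->prev = entry;
	/* Release: all earlier writes are visible to a matching acquire load. */
	atomic_store_explicit(&entry->next, entry, memory_order_release);
}

/* Analogue of list_empty_acquire(): pairs with the release store above. */
static bool sketch_list_empty_acquire(struct sketch_list *head)
{
	return atomic_load_explicit(&head->next, memory_order_acquire) == head;
}

/* Waker side: finish touching the waiter, then signal that we are done. */
static void sketch_wake(struct sketch_waiter *w)
{
	w->wake_count++;
	sketch_list_del_init_release(&w->blocked_member);
}

/* Waiter side: true only once the waker will no longer access 'w'. */
static bool sketch_waker_done(struct sketch_waiter *w)
{
	return sketch_list_empty_acquire(&w->blocked_member);
}
```

The point of the ordering is that the release store is the last access the
waker makes to the waiter, so a thread that sees the list as empty through
the acquire load also sees everything the waker wrote before it, and can
free the waiter without taking the lock.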