From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.de>, Linus Torvalds <torvalds@linux-foundation.org>
Cc: yangerkun <yangerkun@huawei.com>,
	kernel test robot <rong.a.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>, lkp@lists.01.org,
	Bruce Fields <bfields@fieldses.org>, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Date: Mon, 16 Mar 2020 07:07:24 -0400	[thread overview]
Message-ID: <ce48ed9e48eda3c0f27d2f417314bd00cb1a68db.camel@kernel.org> (raw)
In-Reply-To: <87pndcsxc6.fsf@notabene.neil.brown.name>

[-- Attachment #1: Type: text/plain, Size: 7214 bytes --]

On Mon, 2020-03-16 at 16:06 +1100, NeilBrown wrote:
[...]
> No, we really do need fl_blocked_requests to be empty.
> After fl_blocker is cleared, the owner might check for other blockers
> and might queue behind them, leaving the blocked requests in place.
> Or it might have to detach all those blocked requests and wake them up
> so they can go and fend for themselves.
>
> I think the worst-case scenario could go something like this.
> Process A gets a lock - Al
> Process B tries to get a conflicting lock and blocks: Bl -> Al
> Process C tries to get a conflicting lock and blocks on B:
>   Cl -> Bl -> Al
>
> At much the same time that C goes to attach Cl to Bl, A
> calls unlock and B gets signaled.
>
> So A is calling locks_wake_up_blocks(Al) - which takes blocked_lock_lock.
> C is calling locks_insert_block(Bl, Cl) - which also takes the lock.
> B is calling locks_delete_block(Bl), which might not take the lock.
>
> Assume C gets the lock first.
>
> Before C calls locks_insert_block, Bl->fl_blocked_requests is empty.
> After A finishes in locks_wake_up_blocks, Bl->fl_blocker is NULL.
>
> If B sees that fl_blocker is NULL, we need it to see that
> fl_blocked_requests is no longer empty, so that it takes the lock and
> cleans up fl_blocked_requests.
>
> If the list_empty test on fl_blocked_requests goes after the fl_blocker
> test, the memory barriers we have should ensure that. I had thought
> that it would need an extra barrier, but as a spinlock places the change
> to fl_blocked_requests *before* the change to fl_blocker, I no longer
> think that is needed.

Got it. I was thinking that all of a blocker's waiters would already have
been awoken once fl_blocker was set to NULL, but you're correct: they
haven't. How about this?

-----------------8<------------------
From f40e865842ae84a9d465ca9edb66f0985c1587d4 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 9 Mar 2020 14:35:43 -0400
Subject: [PATCH] locks: reinstate locks_delete_block optimization

There is a measurable performance impact in some synthetic tests due to
commit 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter"). Fix the race condition instead by clearing the
fl_blocker pointer after the wake_up, using explicit acquire/release
semantics.

This does mean that we can no longer use the clearing of fl_blocker as
the wait condition, so switch the waiters over to checking whether the
fl_blocked_member list_head is empty.
Cc: yangerkun <yangerkun@huawei.com>
Cc: NeilBrown <neilb@suse.de>
Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/cifs/file.c |  3 ++-
 fs/locks.c     | 41 +++++++++++++++++++++++++++++++++++------
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3b942ecdd4be..8f9d849a0012 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
 		rc = posix_lock_file(file, flock, NULL);
 		up_write(&cinode->lock_sem);
 		if (rc == FILE_LOCK_DEFERRED) {
-			rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
+			rc = wait_event_interruptible(flock->fl_wait,
+					list_empty(&flock->fl_blocked_member));
 			if (!rc)
 				goto try_again;
 			locks_delete_block(flock);
diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..eaf754ecdaa8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
 {
 	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_blocked_member);
-	waiter->fl_blocker = NULL;
 }
 
 static void __locks_wake_up_blocks(struct file_lock *blocker)
@@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 			waiter->fl_lmops->lm_notify(waiter);
 		else
 			wake_up(&waiter->fl_wait);
+
+		/*
+		 * Tell the world we're done with it - see comment at
+		 * top of locks_delete_block().
+		 */
+		smp_store_release(&waiter->fl_blocker, NULL);
 	}
 }
 
@@ -753,11 +758,30 @@ int locks_delete_block(struct file_lock *waiter)
 {
 	int status = -ENOENT;
 
+	/*
+	 * If fl_blocker is NULL, it won't be set again as this thread "owns"
+	 * the lock and is the only one that might try to claim the lock.
+	 * Because fl_blocker is explicitly set last during a delete, it's
+	 * safe to locklessly test to see if it's NULL. If it is, then we know
+	 * that no new locks can be inserted into its fl_blocked_requests list,
+	 * and we can therefore avoid doing anything further as long as that
+	 * list is empty.
+	 */
+	if (!smp_load_acquire(&waiter->fl_blocker) &&
+	    list_empty(&waiter->fl_blocked_requests))
+		return status;
+
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_blocker)
 		status = 0;
 	__locks_wake_up_blocks(waiter);
 	__locks_delete_block(waiter);
+
+	/*
+	 * Tell the world we're done with it - see comment at top
+	 * of this function
+	 */
+	smp_store_release(&waiter->fl_blocker, NULL);
 	spin_unlock(&blocked_lock_lock);
 	return status;
 }
@@ -1350,7 +1374,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = posix_lock_inode(inode, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+		error = wait_event_interruptible(fl->fl_wait,
+					list_empty(&fl->fl_blocked_member));
 		if (error)
 			break;
 	}
@@ -1435,7 +1460,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
 		error = posix_lock_inode(inode, &fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
+		error = wait_event_interruptible(fl.fl_wait,
+					list_empty(&fl.fl_blocked_member));
 		if (!error) {
 			/*
 			 * If we've been sleeping someone might have
@@ -1638,7 +1664,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 		locks_dispose_list(&dispose);
 		error = wait_event_interruptible_timeout(new_fl->fl_wait,
-						!new_fl->fl_blocker, break_time);
+					list_empty(&new_fl->fl_blocked_member),
+					break_time);
 
 		percpu_down_read(&file_rwsem);
 		spin_lock(&ctx->flc_lock);
@@ -2122,7 +2149,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = flock_lock_inode(inode, fl);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+		error = wait_event_interruptible(fl->fl_wait,
+					list_empty(&fl->fl_blocked_member));
 		if (error)
 			break;
 	}
@@ -2399,7 +2427,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		error = vfs_lock_file(filp, cmd, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+		error = wait_event_interruptible(fl->fl_wait,
+					list_empty(&fl->fl_blocked_member));
 		if (error)
 			break;
 	}
-- 
2.24.1

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]