linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups
@ 2018-08-14  3:56 NeilBrown
  2018-08-14  3:56 ` [PATCH 5/5] fs/locks: create a tree of dependent requests NeilBrown
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel


V2, which added wake_non_conflicts(), was more broken than V1 - as
Bruce explained, there is no transitivity in the blocking relation
between locks.
So this series takes a simpler approach.
It still attaches waiters beneath other waiters as necessary to ensure
that:
  - a waiter is blocked by its parent (fl->fl_blocker) and all further
    ancestors, and
  - the waiters on any fl_blocked list are mutually non-conflicting.

When a lock (the root of a tree of requests) is released, only its
immediate children (fl_blocked) are woken.
When any waiter is woken (either because its fl_blocker was released
or due to a signal or similar) it will either:
 - be granted,
 - be aborted, or
 - be re-queued beneath some other lock.

In the first case the tree of blocked requests is moved across to the
newly created lock, and the invariants still hold.
In the other two cases, the blocked waiters in the tree are all
detached and woken.
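
For illustration, here is a rough user-space model of that wake-up rule
(a sketch only - the struct and helpers below are invented for this
example; only the blocker/blocked naming mirrors the series):

#include <stdio.h>

/* Toy model of the waiter tree: each request waits on one blocker
 * and may itself block further requests. */
struct req {
	const char *name;
	struct req *blocker;	/* what this request waits on */
	struct req *child;	/* first request blocked on us */
	struct req *sibling;	/* next request with the same blocker */
};

/* On release, wake only the direct children; each child's own
 * subtree stays attached and asleep until that child is granted
 * or requeued. */
static void release(struct req *lock)
{
	for (struct req *w = lock->child; w; w = w->sibling) {
		w->blocker = NULL;
		printf("woken: %s\n", w->name);
	}
	lock->child = NULL;
}

int main(void)
{
	struct req c = { .name = "C" };
	struct req b = { .name = "B", .child = &c };
	struct req a = { .name = "A", .child = &b };

	c.blocker = &b;
	b.blocker = &a;
	release(&a);	/* wakes B only; C stays queued under B */
	return 0;
}

Here only B is woken; C either moves across with B's tree when B is
granted, or is detached and woken when B is aborted or requeued.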

Note that this series has not received much testing yet.

Original description:
If you have a many-core machine, and have many threads all wanting to
briefly lock a given file (udev is known to do this), you can get
quite poor performance.

When one thread releases a lock, it wakes up all other threads that
are waiting (classic thundering-herd) - one will get the lock and the
others go to sleep.
When you have few cores, this is not very noticeable: by the time the
4th or 5th thread gets enough CPU time to try to claim the lock, the
earlier threads have claimed it, done what was needed, and released.
With 50+ cores, the contention can easily be measured.

This patchset creates a tree of pending lock requests in which siblings
don't conflict and each lock request does conflict with its parent.
When a lock is released, only requests which don't conflict with each
other are woken.

Testing shows that lock-acquisitions-per-second is now fairly stable even
as the number of contending processes grows to 1000.  Without this patch,
locks-per-second drops off steeply after a few tens of processes.

There is a small cost to this extra complexity.
With 20 processes running a particular test on 72 cores, lock
acquisitions per second drop from 1.8 million to 1.4 million with
this patch.  For 100 processes, this patch still provides 1.4 million,
while without it the rate is about 700,000.

NeilBrown

---

NeilBrown (5):
      fs/locks: rename some lists and pointers.
      fs/locks: split out __locks_wake_up_blocks().
      fs/locks: allow a lock request to block other requests.
      fs/locks: change all *_conflict() functions to return bool.
      fs/locks: create a tree of dependent requests.


 fs/cifs/file.c                  |    2 -
 fs/locks.c                      |  156 ++++++++++++++++++++++++++-------------
 include/linux/fs.h              |    7 +-
 include/trace/events/filelock.h |   16 ++--
 4 files changed, 119 insertions(+), 62 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/5] fs/locks: allow a lock request to block other requests.
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
  2018-08-14  3:56 ` [PATCH 5/5] fs/locks: create a tree of dependent requests NeilBrown
@ 2018-08-14  3:56 ` NeilBrown
  2018-08-14  3:56 ` [PATCH 2/5] fs/locks: split out __locks_wake_up_blocks() NeilBrown
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Currently, a lock can block pending requests, but all pending
requests are equal.  If lots of pending requests are
mutually exclusive, this means they will all be woken up
and all but one will fail.  This can hurt performance.

So we will allow pending requests to block other requests.
Only the first request will be woken, and it will wake the others.

This patch doesn't implement this fully, but prepares the way.

- It acknowledges that a request might be blocking other requests,
  and when the request is converted to a lock, those blocked
  requests are moved across.
- When a request is requeued or discarded, all blocked requests are
  woken.
- When deadlock-detection looks for the lock which blocks a
  given request, we follow the chain of ->fl_blocker all
  the way to the top.
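
The last point can be pictured with a stand-alone sketch (the struct
and helper below are invented for illustration; only the while-loop
mirrors the what_owner_is_waiting_for() change in this patch):

#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct file_lock, for this sketch only. */
struct node {
	struct node *fl_blocker;
};

/* A waiter queued under another waiter is transitively waiting on
 * whatever lock sits at the top of the chain, so deadlock detection
 * must follow ->fl_blocker to the root. */
static struct node *top_blocker(struct node *fl)
{
	while (fl->fl_blocker)
		fl = fl->fl_blocker;
	return fl;
}

int main(void)
{
	struct node root = { NULL }, mid = { &root }, leaf = { &mid };

	assert(top_blocker(&leaf) == &root);
	return 0;
}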

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index de0b9276f23d..9439eebd4eb9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -408,9 +408,19 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 			fl->fl_ops->fl_copy_lock(new, fl);
 	}
 }
-
 EXPORT_SYMBOL(locks_copy_lock);
 
+static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
+{
+	struct file_lock *f;
+
+	spin_lock(&blocked_lock_lock);
+	list_splice_init(&fl->fl_blocked, &new->fl_blocked);
+	list_for_each_entry(f, &new->fl_blocked, fl_block)
+		f->fl_blocker = new;
+	spin_unlock(&blocked_lock_lock);
+}
+
 static inline int flock_translate_cmd(int cmd) {
 	if (cmd & LOCK_MAND)
 		return cmd & (LOCK_MAND | LOCK_RW);
@@ -683,11 +693,15 @@ static void __locks_delete_block(struct file_lock *waiter)
 
 static void __locks_wake_up_blocks(struct file_lock *blocker)
 {
+	/* Wake up all requests in blocker->fl_blocked, including
+	 * requests blocked by those requests.
+	 */
 	while (!list_empty(&blocker->fl_blocked)) {
 		struct file_lock *waiter;
 
 		waiter = list_first_entry(&blocker->fl_blocked,
 					  struct file_lock, fl_block);
+		list_splice_init(&waiter->fl_blocked, &blocker->fl_blocked);
 		__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
 			waiter->fl_lmops->lm_notify(waiter);
@@ -699,6 +713,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 static void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
+	__locks_wake_up_blocks(waiter);
 	__locks_delete_block(waiter);
 	spin_unlock(&blocked_lock_lock);
 }
@@ -714,13 +729,19 @@ static void locks_delete_block(struct file_lock *waiter)
  * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+				 struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
 	waiter->fl_blocker = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
+
+	/* The requests in waiter->fl_blocked are known to conflict with
+	 * waiter, but might not conflict with blocker, or with the
+	 * requests and locks which block it.  So they all need to be woken.
+	 */
+	__locks_wake_up_blocks(waiter);
 }
 
 /* Must be called with flc_lock held. */
@@ -893,8 +914,11 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	struct file_lock *fl;
 
 	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
-		if (posix_same_owner(fl, block_fl))
-			return fl->fl_blocker;
+		if (posix_same_owner(fl, block_fl)) {
+			while (fl->fl_blocker)
+				fl = fl->fl_blocker;
+			return fl;
+		}
 	}
 	return NULL;
 }
@@ -987,6 +1011,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 	if (request->fl_flags & FL_ACCESS)
 		goto out;
 	locks_copy_lock(new_fl, request);
+	locks_move_blocks(new_fl, request);
 	locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
 	new_fl = NULL;
 	error = 0;
@@ -1179,6 +1204,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			goto out;
 		}
 		locks_copy_lock(new_fl, request);
+		locks_move_blocks(new_fl, request);
 		locks_insert_lock_ctx(new_fl, &fl->fl_list);
 		fl = new_fl;
 		new_fl = NULL;
@@ -2587,13 +2613,14 @@ void locks_remove_file(struct file *filp)
 int
 posix_unblock_lock(struct file_lock *waiter)
 {
-	int status = 0;
+	int status = -ENOENT;
 
 	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_blocker)
+	if (waiter->fl_blocker) {
+		__locks_wake_up_blocks(waiter);
 		__locks_delete_block(waiter);
-	else
-		status = -ENOENT;
+		status = 0;
+	}
 	spin_unlock(&blocked_lock_lock);
 	return status;
 }

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] fs/locks: change all *_conflict() functions to return bool.
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
                   ` (3 preceding siblings ...)
  2018-08-14  3:56 ` [PATCH 1/5] fs/locks: rename some lists and pointers NeilBrown
@ 2018-08-14  3:56 ` NeilBrown
  2018-08-14 18:41 ` [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups J. Bruce Fields
  5 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

posix_locks_conflict() and flock_locks_conflict() both return int.
leases_conflict() returns bool.

This inconsistency will cause problems for the next patch if not
fixed.

So change posix_locks_conflict() and flock_locks_conflict() to return
bool.
Also change the locks_conflict() helper.

And convert some
   return (foo);
to
   return foo;

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9439eebd4eb9..c7a372cebff1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -803,47 +803,50 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
  * checks for shared/exclusive status of overlapping locks.
  */
-static int locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool locks_conflict(struct file_lock *caller_fl,
+			   struct file_lock *sys_fl)
 {
 	if (sys_fl->fl_type == F_WRLCK)
-		return 1;
+		return true;
 	if (caller_fl->fl_type == F_WRLCK)
-		return 1;
-	return 0;
+		return true;
+	return false;
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
  * checking before calling the locks_conflict().
  */
-static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool posix_locks_conflict(struct file_lock *caller_fl,
+				 struct file_lock *sys_fl)
 {
 	/* POSIX locks owned by the same process do not conflict with
 	 * each other.
 	 */
 	if (posix_same_owner(caller_fl, sys_fl))
-		return (0);
+		return false;
 
 	/* Check whether they overlap */
 	if (!locks_overlap(caller_fl, sys_fl))
-		return 0;
+		return false;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	return locks_conflict(caller_fl, sys_fl);
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
  * checking before calling the locks_conflict().
  */
-static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool flock_locks_conflict(struct file_lock *caller_fl,
+				 struct file_lock *sys_fl)
 {
 	/* FLOCK locks referring to the same filp do not conflict with
 	 * each other.
 	 */
 	if (caller_fl->fl_file == sys_fl->fl_file)
-		return (0);
+		return false;
 	if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
-		return 0;
+		return false;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	return locks_conflict(caller_fl, sys_fl);
 }
 
 void

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 1/5] fs/locks: rename some lists and pointers.
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
                   ` (2 preceding siblings ...)
  2018-08-14  3:56 ` [PATCH 2/5] fs/locks: split out __locks_wake_up_blocks() NeilBrown
@ 2018-08-14  3:56 ` NeilBrown
  2018-08-14  3:56 ` [PATCH 4/5] fs/locks: change all *_conflict() functions to return bool NeilBrown
  2018-08-14 18:41 ` [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups J. Bruce Fields
  5 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

struct file_lock contains an 'fl_next' pointer which
is used to point to the lock that this request is blocked
waiting for.  So rename it to fl_blocker.

The fl_block list_head in an active lock is the head of a list of
blocked requests.  In a request it is a node in that list.
These are two distinct uses, so replace it with two list_heads
with different names:
fl_blocked is the head of a list of blocked requests;
fl_block is a node on that list.

The two different list_heads are never used at the same time, but that
will change in a future patch.

Note that a tracepoint is changed to report fl_blocker instead
of fl_next.
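
For readers less familiar with the pattern, a minimal user-space
analogue of the head/node split (the list implementation and struct
below are invented; the kernel uses struct list_head and container_of()):

#include <stdio.h>
#include <stddef.h>

struct list { struct list *prev, *next; };

static void list_init(struct list *h) { h->prev = h->next = h; }

static void list_add_tail(struct list *n, struct list *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

struct waiter {
	const char *name;
	struct list fl_blocked;	/* head: requests blocked on us */
	struct list fl_block;	/* node: our entry in the blocker's list */
};

int main(void)
{
	struct waiter lock = { "lock" }, req = { "req" };

	list_init(&lock.fl_blocked); list_init(&lock.fl_block);
	list_init(&req.fl_blocked);  list_init(&req.fl_block);

	/* req blocks on lock: req's node joins lock's head. */
	list_add_tail(&req.fl_block, &lock.fl_blocked);

	for (struct list *p = lock.fl_blocked.next;
	     p != &lock.fl_blocked; p = p->next) {
		struct waiter *w = (struct waiter *)
			((char *)p - offsetof(struct waiter, fl_block));
		printf("blocked: %s\n", w->name);
	}
	return 0;
}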

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/cifs/file.c                  |    2 +-
 fs/locks.c                      |   40 ++++++++++++++++++++-------------------
 include/linux/fs.h              |    7 +++++--
 include/trace/events/filelock.h |   16 ++++++++--------
 4 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 8d41ca7bfcf1..066ed2e4ba96 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1092,7 +1092,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
 	rc = posix_lock_file(file, flock, NULL);
 	up_write(&cinode->lock_sem);
 	if (rc == FILE_LOCK_DEFERRED) {
-		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
+		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
 		if (!rc)
 			goto try_again;
 		posix_unblock_lock(flock);
diff --git a/fs/locks.c b/fs/locks.c
index db7b6917d9c5..322491e70e41 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -194,7 +194,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * This lock protects the blocked_hash. Generally, if you're accessing it, you
  * want to be holding this lock.
  *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker
  * pointer for file_lock structures that are acting as lock requests (in
  * contrast to those that are acting as records of acquired locks).
  *
@@ -203,7 +203,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * protected by this lock, we can skip acquiring it iff we already hold the
  * flc_lock.
  *
- * In particular, adding an entry to the fl_block list requires that you hold
+ * In particular, adding an entry to the fl_blocked list requires that you hold
  * both the flc_lock and the blocked_lock_lock (acquired in that order).
  * Deleting an entry from the list however only requires the file_lock_lock.
  */
@@ -302,6 +302,7 @@ static void locks_init_lock_heads(struct file_lock *fl)
 {
 	INIT_HLIST_NODE(&fl->fl_link);
 	INIT_LIST_HEAD(&fl->fl_list);
+	INIT_LIST_HEAD(&fl->fl_blocked);
 	INIT_LIST_HEAD(&fl->fl_block);
 	init_waitqueue_head(&fl->fl_wait);
 }
@@ -341,6 +342,7 @@ void locks_free_lock(struct file_lock *fl)
 {
 	BUG_ON(waitqueue_active(&fl->fl_wait));
 	BUG_ON(!list_empty(&fl->fl_list));
+	BUG_ON(!list_empty(&fl->fl_blocked));
 	BUG_ON(!list_empty(&fl->fl_block));
 	BUG_ON(!hlist_unhashed(&fl->fl_link));
 
@@ -676,7 +678,7 @@ static void __locks_delete_block(struct file_lock *waiter)
 {
 	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_block);
-	waiter->fl_next = NULL;
+	waiter->fl_blocker = NULL;
 }
 
 static void locks_delete_block(struct file_lock *waiter)
@@ -692,16 +694,16 @@ static void locks_delete_block(struct file_lock *waiter)
  * it seems like the reasonable thing to do.
  *
  * Must be called with both the flc_lock and blocked_lock_lock held. The
- * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
+ * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
- * blocked_lock_lock in some cases when we see that the fl_block list is empty.
+ * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
-	waiter->fl_next = blocker;
-	list_add_tail(&waiter->fl_block, &blocker->fl_block);
+	waiter->fl_blocker = blocker;
+	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
 }
@@ -725,18 +727,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 	/*
 	 * Avoid taking global lock if list is empty. This is safe since new
 	 * blocked requests are only added to the list under the flc_lock, and
-	 * the flc_lock is always held here. Note that removal from the fl_block
+	 * the flc_lock is always held here. Note that removal from the fl_blocked
 	 * list does not require the flc_lock, so we must recheck list_empty()
 	 * after acquiring the blocked_lock_lock.
 	 */
-	if (list_empty(&blocker->fl_block))
+	if (list_empty(&blocker->fl_blocked))
 		return;
 
 	spin_lock(&blocked_lock_lock);
-	while (!list_empty(&blocker->fl_block)) {
+	while (!list_empty(&blocker->fl_blocked)) {
 		struct file_lock *waiter;
 
-		waiter = list_first_entry(&blocker->fl_block,
+		waiter = list_first_entry(&blocker->fl_blocked,
 				struct file_lock, fl_block);
 		__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
@@ -887,7 +889,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 
 	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
 		if (posix_same_owner(fl, block_fl))
-			return fl->fl_next;
+			return fl->fl_blocker;
 	}
 	return NULL;
 }
@@ -1245,7 +1247,7 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = posix_lock_inode(inode, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -1332,7 +1334,7 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
 		error = posix_lock_inode(inode, &fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl.fl_wait, !fl.fl_next);
+		error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
 		if (!error) {
 			/*
 			 * If we've been sleeping someone might have
@@ -1526,7 +1528,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 
 	locks_dispose_list(&dispose);
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
-						!new_fl->fl_next, break_time);
+						!new_fl->fl_blocker, break_time);
 
 	percpu_down_read_preempt_disable(&file_rwsem);
 	spin_lock(&ctx->flc_lock);
@@ -1940,7 +1942,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = flock_lock_inode(inode, fl);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -2212,7 +2214,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		error = vfs_lock_file(filp, cmd, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -2583,7 +2585,7 @@ posix_unblock_lock(struct file_lock *waiter)
 	int status = 0;
 
 	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_next)
+	if (waiter->fl_blocker)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
@@ -2711,7 +2713,7 @@ static int locks_show(struct seq_file *f, void *v)
 
 	lock_get_status(f, fl, iter->li_pos, "");
 
-	list_for_each_entry(bfl, &fl->fl_block, fl_block)
+	list_for_each_entry(bfl, &fl->fl_blocked, fl_block)
 		lock_get_status(f, bfl, iter->li_pos, " ->");
 
 	return 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 805bf22898cf..b168013ef99e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1002,10 +1002,13 @@ bool opens_in_grace(struct net *);
  * Obviously, the last two criteria only matter for POSIX locks.
  */
 struct file_lock {
-	struct file_lock *fl_next;	/* singly linked list for this inode  */
+	struct file_lock *fl_blocker;	/* The lock, that is blocking us */
 	struct list_head fl_list;	/* link into file_lock_context */
 	struct hlist_node fl_link;	/* node in global lists */
-	struct list_head fl_block;	/* circular list of blocked processes */
+	struct list_head fl_blocked;	/* list of requests with
+					 * ->fl_blocker pointing here */
+	struct list_head fl_block;	/* node in
+					 * ->fl_blocker->fl_blocked */
 	fl_owner_t fl_owner;
 	unsigned int fl_flags;
 	unsigned char fl_type;
diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index d1faf3597b9d..0e50f985a390 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__field(struct file_lock *, fl)
 		__field(unsigned long, i_ino)
 		__field(dev_t, s_dev)
-		__field(struct file_lock *, fl_next)
+		__field(struct file_lock *, fl_blocker)
 		__field(fl_owner_t, fl_owner)
 		__field(unsigned int, fl_pid)
 		__field(unsigned int, fl_flags)
@@ -82,7 +82,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__entry->fl = fl ? fl : NULL;
 		__entry->s_dev = inode->i_sb->s_dev;
 		__entry->i_ino = inode->i_ino;
-		__entry->fl_next = fl ? fl->fl_next : NULL;
+		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
 		__entry->fl_owner = fl ? fl->fl_owner : NULL;
 		__entry->fl_pid = fl ? fl->fl_pid : 0;
 		__entry->fl_flags = fl ? fl->fl_flags : 0;
@@ -92,9 +92,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__entry->ret = ret;
 	),
 
-	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
+	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
 		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
+		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
 		__entry->fl_pid, show_fl_flags(__entry->fl_flags),
 		show_fl_type(__entry->fl_type),
 		__entry->fl_start, __entry->fl_end, __entry->ret)
@@ -122,7 +122,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__field(struct file_lock *, fl)
 		__field(unsigned long, i_ino)
 		__field(dev_t, s_dev)
-		__field(struct file_lock *, fl_next)
+		__field(struct file_lock *, fl_blocker)
 		__field(fl_owner_t, fl_owner)
 		__field(unsigned int, fl_flags)
 		__field(unsigned char, fl_type)
@@ -134,7 +134,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__entry->fl = fl ? fl : NULL;
 		__entry->s_dev = inode->i_sb->s_dev;
 		__entry->i_ino = inode->i_ino;
-		__entry->fl_next = fl ? fl->fl_next : NULL;
+		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
 		__entry->fl_owner = fl ? fl->fl_owner : NULL;
 		__entry->fl_flags = fl ? fl->fl_flags : 0;
 		__entry->fl_type = fl ? fl->fl_type : 0;
@@ -142,9 +142,9 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
 	),
 
-	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
+	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
 		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
+		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
 		show_fl_flags(__entry->fl_flags),
 		show_fl_type(__entry->fl_type),
 		__entry->fl_break_time, __entry->fl_downgrade_time)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] fs/locks: create a tree of dependent requests.
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
@ 2018-08-14  3:56 ` NeilBrown
  2018-08-14  3:56 ` [PATCH 3/5] fs/locks: allow a lock request to block other requests NeilBrown
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

When we find an existing lock which conflicts with a request,
and the request wants to wait, we currently add the request
to a list.  When the lock is removed, the whole list is woken.
This can cause the thundering-herd problem.
To reduce the problem, we make use of the (new) fact that
a pending request can itself have a list of blocked requests.
When we find a conflict, we look through the existing blocked requests.
If any one of them blocks the new request, the new request is attached
below that request; otherwise it is added to the list of blocked
requests, which are now known to be mutually non-conflicting.

This way, when the lock is released, only a set of non-conflicting
locks will be woken; the rest can stay asleep.
If the lock request cannot be granted and the request needs to be
requeued, all the other requests it blocks will then be woken.
Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index c7a372cebff1..af250afceff4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -727,11 +727,25 @@ static void locks_delete_block(struct file_lock *waiter)
  * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
  * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
+ *
+ * Rather than just adding to the list, we check for conflicts with any existing
+ * waiters, and add beneath any waiter that blocks the new waiter.
+ * Thus wakeups don't happen until needed.
  */
 static void __locks_insert_block(struct file_lock *blocker,
-				 struct file_lock *waiter)
+				 struct file_lock *waiter,
+				 bool conflict(struct file_lock *,
+					       struct file_lock *))
 {
+	struct file_lock *fl;
 	BUG_ON(!list_empty(&waiter->fl_block));
+
+new_blocker:
+	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
+		if (conflict(fl, waiter)) {
+			blocker = fl;
+			goto new_blocker;
+		}
 	waiter->fl_blocker = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
@@ -746,10 +760,12 @@ static void __locks_insert_block(struct file_lock *blocker,
 
 /* Must be called with flc_lock held. */
 static void locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+			       struct file_lock *waiter,
+			       bool conflict(struct file_lock *,
+					     struct file_lock *))
 {
 	spin_lock(&blocked_lock_lock);
-	__locks_insert_block(blocker, waiter);
+	__locks_insert_block(blocker, waiter, conflict);
 	spin_unlock(&blocked_lock_lock);
 }
 
@@ -1008,7 +1024,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 		if (!(request->fl_flags & FL_SLEEP))
 			goto out;
 		error = FILE_LOCK_DEFERRED;
-		locks_insert_block(fl, request);
+		locks_insert_block(fl, request, flock_locks_conflict);
 		goto out;
 	}
 	if (request->fl_flags & FL_ACCESS)
@@ -1082,7 +1098,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
-				__locks_insert_block(fl, request);
+				__locks_insert_block(fl, request,
+						     posix_locks_conflict);
 			}
 			spin_unlock(&blocked_lock_lock);
 			goto out;
@@ -1555,7 +1572,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 		break_time -= jiffies;
 	if (break_time == 0)
 		break_time++;
-	locks_insert_block(fl, new_fl);
+	locks_insert_block(fl, new_fl, leases_conflict);
 	trace_break_lease_block(inode, new_fl);
 	spin_unlock(&ctx->flc_lock);
 	percpu_up_read_preempt_enable(&file_rwsem);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] fs/locks: split out __locks_wake_up_blocks().
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
  2018-08-14  3:56 ` [PATCH 5/5] fs/locks: create a tree of dependent requests NeilBrown
  2018-08-14  3:56 ` [PATCH 3/5] fs/locks: allow a lock request to block other requests NeilBrown
@ 2018-08-14  3:56 ` NeilBrown
  2018-08-14  3:56 ` [PATCH 1/5] fs/locks: rename some lists and pointers NeilBrown
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2018-08-14  3:56 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

This functionality will be useful in future patches, so
split it out from locks_wake_up_blocks().

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 322491e70e41..de0b9276f23d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -681,6 +681,21 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_blocker = NULL;
 }
 
+static void __locks_wake_up_blocks(struct file_lock *blocker)
+{
+	while (!list_empty(&blocker->fl_blocked)) {
+		struct file_lock *waiter;
+
+		waiter = list_first_entry(&blocker->fl_blocked,
+					  struct file_lock, fl_block);
+		__locks_delete_block(waiter);
+		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+			waiter->fl_lmops->lm_notify(waiter);
+		else
+			wake_up(&waiter->fl_wait);
+	}
+}
+
 static void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
@@ -735,17 +750,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		return;
 
 	spin_lock(&blocked_lock_lock);
-	while (!list_empty(&blocker->fl_blocked)) {
-		struct file_lock *waiter;
-
-		waiter = list_first_entry(&blocker->fl_blocked,
-				struct file_lock, fl_block);
-		__locks_delete_block(waiter);
-		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-			waiter->fl_lmops->lm_notify(waiter);
-		else
-			wake_up(&waiter->fl_wait);
-	}
+	__locks_wake_up_blocks(blocker);
 	spin_unlock(&blocked_lock_lock);
 }
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups
  2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
                   ` (4 preceding siblings ...)
  2018-08-14  3:56 ` [PATCH 4/5] fs/locks: change all *_conflict() functions to return bool NeilBrown
@ 2018-08-14 18:41 ` J. Bruce Fields
  2018-08-14 19:12   ` Jeff Layton
  5 siblings, 1 reply; 8+ messages in thread
From: J. Bruce Fields @ 2018-08-14 18:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

This version looks correct to me, and simpler.  I'll be curious to hear
whatever you learn from testing!

--b.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups
  2018-08-14 18:41 ` [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups J. Bruce Fields
@ 2018-08-14 19:12   ` Jeff Layton
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2018-08-14 19:12 UTC (permalink / raw)
  To: J. Bruce Fields, NeilBrown
  Cc: Alexander Viro, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

On Tue, 2018-08-14 at 14:41 -0400, J. Bruce Fields wrote:
> This version looks correct to me, and simpler.  I'll be curious to hear
> whatever you learn from testing!
> 
> --b.
> 

Agreed. I'll go ahead and put this in linux-next with an eye toward
merging in v4.20 if we don't hit any major problems with it.

Thanks again and nice work, Neil!

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-08-14 22:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-14  3:56 [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups NeilBrown
2018-08-14  3:56 ` [PATCH 5/5] fs/locks: create a tree of dependent requests NeilBrown
2018-08-14  3:56 ` [PATCH 3/5] fs/locks: allow a lock request to block other requests NeilBrown
2018-08-14  3:56 ` [PATCH 2/5] fs/locks: split out __locks_wake_up_blocks() NeilBrown
2018-08-14  3:56 ` [PATCH 1/5] fs/locks: rename some lists and pointers NeilBrown
2018-08-14  3:56 ` [PATCH 4/5] fs/locks: change all *_conflict() functions to return bool NeilBrown
2018-08-14 18:41 ` [PATCH 0/5 v2] locks: avoid thundering-herd wake-ups J. Bruce Fields
2018-08-14 19:12   ` Jeff Layton
