linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] Series short description
@ 2018-11-05  1:30 NeilBrown
  2018-11-05  1:30 ` [PATCH 02/12] fs/locks: split out __locks_wake_up_blocks() NeilBrown
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Here is the respin on this series with the file_lock properly
initlized for unlock requests.
I found one that I had missed before - in locks_remove_flock()
The change makes this code smaller!

Original series description:

If you have a many-core machine, and have many threads all wanting to
briefly lock a give file (udev is known to do this), you can get quite
poor performance.

When one thread releases a lock, it wakes up all other threads that
are waiting (classic thundering-herd) - one will get the lock and the
others go to sleep.
When you have few cores, this is not very noticeable: by the time the
4th or 5th thread gets enough CPU time to try to claim the lock, the
earlier threads have claimed it, done what was needed, and released.
With 50+ cores, the contention can easily be measured.

This patchset creates a tree of pending lock request in which siblings
don't conflict and each lock request does conflict with its parent.
When a lock is released, only requests which don't conflict with each
other a woken.

Testing shows that lock-acquisitions-per-second is now fairly stable even
as number of contending process goes to 1000.  Without this patch,
locks-per-second drops off steeply after a few 10s of processes.

There is a small cost to this extra complexity.
At 20 processes running a particular test on 72 cores, the lock
acquisitions per second drops from 1.8 million to 1.4 million with
this patch.  For 100 processes, this patch still provides 1.4 million
while without this patch there are about 700,000.

NeilBrown


---

NeilBrown (12):
      fs/locks: rename some lists and pointers.
      fs/locks: split out __locks_wake_up_blocks().
      NFS: use locks_copy_lock() to copy locks.
      gfs2: properly initial file_lock used for unlock.
      ocfs2: properly initial file_lock used for unlock.
      locks: use properly initialized file_lock when unlocking.
      fs/locks: allow a lock request to block other requests.
      fs/locks: always delete_block after waiting.
      fs/locks: change all *_conflict() functions to return bool.
      fs/locks: create a tree of dependent requests.
      locks: merge posix_unblock_lock() and locks_delete_block()
      VFS: locks: remove unnecessary white space.


 fs/cifs/file.c                  |    4 -
 fs/gfs2/file.c                  |   10 +-
 fs/lockd/svclock.c              |    2 
 fs/locks.c                      |  253 +++++++++++++++++++++------------------
 fs/nfs/nfs4proc.c               |    6 +
 fs/nfsd/nfs4state.c             |    6 -
 fs/ocfs2/locks.c                |   10 +-
 include/linux/fs.h              |   11 +-
 include/trace/events/filelock.h |   16 +-
 9 files changed, 173 insertions(+), 145 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 02/12] fs/locks: split out __locks_wake_up_blocks().
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 08/12] fs/locks: always delete_block after waiting NeilBrown
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

This functionality will be useful in future patches, so
split it out from locks_wake_up_blocks().

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a6c6d601286c..b8f33792a0a6 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -672,6 +672,21 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_blocker = NULL;
 }
 
+static void __locks_wake_up_blocks(struct file_lock *blocker)
+{
+	while (!list_empty(&blocker->fl_blocked)) {
+		struct file_lock *waiter;
+
+		waiter = list_first_entry(&blocker->fl_blocked,
+					  struct file_lock, fl_block);
+		__locks_delete_block(waiter);
+		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+			waiter->fl_lmops->lm_notify(waiter);
+		else
+			wake_up(&waiter->fl_wait);
+	}
+}
+
 static void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
@@ -726,17 +741,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		return;
 
 	spin_lock(&blocked_lock_lock);
-	while (!list_empty(&blocker->fl_blocked)) {
-		struct file_lock *waiter;
-
-		waiter = list_first_entry(&blocker->fl_blocked,
-				struct file_lock, fl_block);
-		__locks_delete_block(waiter);
-		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-			waiter->fl_lmops->lm_notify(waiter);
-		else
-			wake_up(&waiter->fl_wait);
-	}
+	__locks_wake_up_blocks(blocker);
 	spin_unlock(&blocked_lock_lock);
 }
 



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 01/12] fs/locks: rename some lists and pointers.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (2 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 06/12] locks: use properly initialized file_lock when unlocking NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-08 20:26   ` J. Bruce Fields
  2018-11-05  1:30 ` [PATCH 04/12] gfs2: properly initial file_lock used for unlock NeilBrown
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

struct file lock contains an 'fl_next' pointer which
is used to point to the lock that this request is blocked
waiting for.  So rename it to fl_blocker.

The fl_blocked list_head in an active lock is the head of a list of
blocked requests.  In a request it is a node in that list.
These are two distinct uses, so replace with two list_heads
with different names.
fl_blocked is the head of a list of blocked requests
fl_block is a node on that list.

The two different list_heads are never used at the same time, but that
will change in a future patch.

Note that a tracepoint is changed to report fl_blocker instead
of fl_next.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/cifs/file.c                  |    2 +-
 fs/locks.c                      |   38 ++++++++++++++++++++------------------
 include/linux/fs.h              |    7 +++++--
 include/trace/events/filelock.h |   16 ++++++++--------
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 74c33d5fafc8..d7ed895e05d1 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1103,7 +1103,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
 	rc = posix_lock_file(file, flock, NULL);
 	up_write(&cinode->lock_sem);
 	if (rc == FILE_LOCK_DEFERRED) {
-		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
+		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
 		if (!rc)
 			goto try_again;
 		posix_unblock_lock(flock);
diff --git a/fs/locks.c b/fs/locks.c
index 2ecb4db8c840..a6c6d601286c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -189,7 +189,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * This lock protects the blocked_hash. Generally, if you're accessing it, you
  * want to be holding this lock.
  *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker
  * pointer for file_lock structures that are acting as lock requests (in
  * contrast to those that are acting as records of acquired locks).
  *
@@ -293,6 +293,7 @@ static void locks_init_lock_heads(struct file_lock *fl)
 {
 	INIT_HLIST_NODE(&fl->fl_link);
 	INIT_LIST_HEAD(&fl->fl_list);
+	INIT_LIST_HEAD(&fl->fl_blocked);
 	INIT_LIST_HEAD(&fl->fl_block);
 	init_waitqueue_head(&fl->fl_wait);
 }
@@ -332,6 +333,7 @@ void locks_free_lock(struct file_lock *fl)
 {
 	BUG_ON(waitqueue_active(&fl->fl_wait));
 	BUG_ON(!list_empty(&fl->fl_list));
+	BUG_ON(!list_empty(&fl->fl_blocked));
 	BUG_ON(!list_empty(&fl->fl_block));
 	BUG_ON(!hlist_unhashed(&fl->fl_link));
 
@@ -667,7 +669,7 @@ static void __locks_delete_block(struct file_lock *waiter)
 {
 	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_block);
-	waiter->fl_next = NULL;
+	waiter->fl_blocker = NULL;
 }
 
 static void locks_delete_block(struct file_lock *waiter)
@@ -683,16 +685,16 @@ static void locks_delete_block(struct file_lock *waiter)
  * it seems like the reasonable thing to do.
  *
  * Must be called with both the flc_lock and blocked_lock_lock held. The
- * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
+ * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
- * blocked_lock_lock in some cases when we see that the fl_block list is empty.
+ * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
-	waiter->fl_next = blocker;
-	list_add_tail(&waiter->fl_block, &blocker->fl_block);
+	waiter->fl_blocker = blocker;
+	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
 }
@@ -716,18 +718,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 	/*
 	 * Avoid taking global lock if list is empty. This is safe since new
 	 * blocked requests are only added to the list under the flc_lock, and
-	 * the flc_lock is always held here. Note that removal from the fl_block
+	 * the flc_lock is always held here. Note that removal from the fl_blocked
 	 * list does not require the flc_lock, so we must recheck list_empty()
 	 * after acquiring the blocked_lock_lock.
 	 */
-	if (list_empty(&blocker->fl_block))
+	if (list_empty(&blocker->fl_blocked))
 		return;
 
 	spin_lock(&blocked_lock_lock);
-	while (!list_empty(&blocker->fl_block)) {
+	while (!list_empty(&blocker->fl_blocked)) {
 		struct file_lock *waiter;
 
-		waiter = list_first_entry(&blocker->fl_block,
+		waiter = list_first_entry(&blocker->fl_blocked,
 				struct file_lock, fl_block);
 		__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
@@ -878,7 +880,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 
 	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
 		if (posix_same_owner(fl, block_fl))
-			return fl->fl_next;
+			return fl->fl_blocker;
 	}
 	return NULL;
 }
@@ -1237,7 +1239,7 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = posix_lock_inode(inode, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -1324,7 +1326,7 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
 		error = posix_lock_inode(inode, &fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl.fl_wait, !fl.fl_next);
+		error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
 		if (!error) {
 			/*
 			 * If we've been sleeping someone might have
@@ -1518,7 +1520,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 
 	locks_dispose_list(&dispose);
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
-						!new_fl->fl_next, break_time);
+						!new_fl->fl_blocker, break_time);
 
 	percpu_down_read_preempt_disable(&file_rwsem);
 	spin_lock(&ctx->flc_lock);
@@ -1931,7 +1933,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		error = flock_lock_inode(inode, fl);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -2210,7 +2212,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		error = vfs_lock_file(filp, cmd, fl, NULL);
 		if (error != FILE_LOCK_DEFERRED)
 			break;
-		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
+		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
 		if (!error)
 			continue;
 
@@ -2581,7 +2583,7 @@ posix_unblock_lock(struct file_lock *waiter)
 	int status = 0;
 
 	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_next)
+	if (waiter->fl_blocker)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
@@ -2707,7 +2709,7 @@ static int locks_show(struct seq_file *f, void *v)
 
 	lock_get_status(f, fl, iter->li_pos, "");
 
-	list_for_each_entry(bfl, &fl->fl_block, fl_block)
+	list_for_each_entry(bfl, &fl->fl_blocked, fl_block)
 		lock_get_status(f, bfl, iter->li_pos, " ->");
 
 	return 0;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c0807471f..a161bcdca8a2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1044,10 +1044,13 @@ bool opens_in_grace(struct net *);
  * Obviously, the last two criteria only matter for POSIX locks.
  */
 struct file_lock {
-	struct file_lock *fl_next;	/* singly linked list for this inode  */
+	struct file_lock *fl_blocker;	/* The lock, that is blocking us */
 	struct list_head fl_list;	/* link into file_lock_context */
 	struct hlist_node fl_link;	/* node in global lists */
-	struct list_head fl_block;	/* circular list of blocked processes */
+	struct list_head fl_blocked;	/* list of requests with
+					 * ->fl_blocker pointing here */
+	struct list_head fl_block;	/* node in
+					 * ->fl_blocker->fl_blocked */
 	fl_owner_t fl_owner;
 	unsigned int fl_flags;
 	unsigned char fl_type;
diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index 68b17c116907..fad7befa612d 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__field(struct file_lock *, fl)
 		__field(unsigned long, i_ino)
 		__field(dev_t, s_dev)
-		__field(struct file_lock *, fl_next)
+		__field(struct file_lock *, fl_blocker)
 		__field(fl_owner_t, fl_owner)
 		__field(unsigned int, fl_pid)
 		__field(unsigned int, fl_flags)
@@ -82,7 +82,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__entry->fl = fl ? fl : NULL;
 		__entry->s_dev = inode->i_sb->s_dev;
 		__entry->i_ino = inode->i_ino;
-		__entry->fl_next = fl ? fl->fl_next : NULL;
+		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
 		__entry->fl_owner = fl ? fl->fl_owner : NULL;
 		__entry->fl_pid = fl ? fl->fl_pid : 0;
 		__entry->fl_flags = fl ? fl->fl_flags : 0;
@@ -92,9 +92,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
 		__entry->ret = ret;
 	),
 
-	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
+	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
 		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
+		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
 		__entry->fl_pid, show_fl_flags(__entry->fl_flags),
 		show_fl_type(__entry->fl_type),
 		__entry->fl_start, __entry->fl_end, __entry->ret)
@@ -125,7 +125,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__field(struct file_lock *, fl)
 		__field(unsigned long, i_ino)
 		__field(dev_t, s_dev)
-		__field(struct file_lock *, fl_next)
+		__field(struct file_lock *, fl_blocker)
 		__field(fl_owner_t, fl_owner)
 		__field(unsigned int, fl_flags)
 		__field(unsigned char, fl_type)
@@ -137,7 +137,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__entry->fl = fl ? fl : NULL;
 		__entry->s_dev = inode->i_sb->s_dev;
 		__entry->i_ino = inode->i_ino;
-		__entry->fl_next = fl ? fl->fl_next : NULL;
+		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
 		__entry->fl_owner = fl ? fl->fl_owner : NULL;
 		__entry->fl_flags = fl ? fl->fl_flags : 0;
 		__entry->fl_type = fl ? fl->fl_type : 0;
@@ -145,9 +145,9 @@ DECLARE_EVENT_CLASS(filelock_lease,
 		__entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
 	),
 
-	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
+	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
 		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
+		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
 		show_fl_flags(__entry->fl_flags),
 		show_fl_type(__entry->fl_type),
 		__entry->fl_break_time, __entry->fl_downgrade_time)



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/12] NFS: use locks_copy_lock() to copy locks.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (7 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 09/12] fs/locks: change all *_conflict() functions to return bool NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 11/12] locks: merge posix_unblock_lock() and locks_delete_block() NeilBrown
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Using memcpy() to copy lock requests leave the various
list_head in an inconsistent state.
As we will soon attach a list of conflicting request to
another pending request, we need these lists to be consistent.
So change NFS to use locks_init_lock/locks_copy_lock instead
of memcpy.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/nfs/nfs4proc.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 867457d6dfbe..0ba2b0fb8ff3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6311,7 +6311,8 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
 	/* Ensure we don't close file until we're done freeing locks! */
 	p->ctx = get_nfs_open_context(ctx);
 	p->l_ctx = nfs_get_lock_context(ctx);
-	memcpy(&p->fl, fl, sizeof(p->fl));
+	locks_init_lock(&p->fl);
+	locks_copy_lock(&p->fl, fl);
 	p->server = NFS_SERVER(inode);
 	return p;
 }
@@ -6533,7 +6534,8 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct file_lock *fl,
 	p->server = server;
 	refcount_inc(&lsp->ls_count);
 	p->ctx = get_nfs_open_context(ctx);
-	memcpy(&p->fl, fl, sizeof(p->fl));
+	locks_init_lock(&p->fl);
+	locks_copy_lock(&p->fl, fl);
 	return p;
 out_free_seqid:
 	nfs_free_seqid(p->arg.open_seqid);



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/12] gfs2: properly initial file_lock used for unlock.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (3 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 01/12] fs/locks: rename some lists and pointers NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05 12:18   ` Jeff Layton
  2018-11-05  1:30 ` [PATCH 05/12] ocfs2: " NeilBrown
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Rather than assuming all-zeros is sufficient, use the available API to
initialize the file_lock structure use for unlock.
VFS-level changes will soon make it important that the
list_heads in file_lock are always properly initialized.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/gfs2/file.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 45a17b770d97..271f847705e3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1199,13 +1199,13 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
 	mutex_lock(&fp->f_fl_mutex);
 
 	if (gfs2_holder_initialized(fl_gh)) {
+		struct file_lock request;
 		if (fl_gh->gh_state == state)
 			goto out;
-		locks_lock_file_wait(file,
-				     &(struct file_lock) {
-					     .fl_type = F_UNLCK,
-					     .fl_flags = FL_FLOCK
-				     });
+		locks_init_lock(&request);
+		request.fl_type = F_UNLOCK;
+		request.fl_flags = FL_FLOCK;
+		locks_lock_file_wait(file, &request);
 		gfs2_glock_dq(fl_gh);
 		gfs2_holder_reinit(state, flags, fl_gh);
 	} else {



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/12] ocfs2: properly initial file_lock used for unlock.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (4 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 04/12] gfs2: properly initial file_lock used for unlock NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 07/12] fs/locks: allow a lock request to block other requests NeilBrown
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Rather than assuming all-zeros is sufficient, use the available API to
initialize the file_lock structure use for unlock.
VFS-level changes will soon make it important that the
list_heads in file_lock are always properly initialized.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/ocfs2/locks.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/locks.c b/fs/ocfs2/locks.c
index d56f0079b858..9bb978d57c7d 100644
--- a/fs/ocfs2/locks.c
+++ b/fs/ocfs2/locks.c
@@ -52,6 +52,7 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
 	if (lockres->l_flags & OCFS2_LOCK_ATTACHED &&
 	    lockres->l_level > LKM_NLMODE) {
 		int old_level = 0;
+		struct file_lock request;
 
 		if (lockres->l_level == LKM_EXMODE)
 			old_level = 1;
@@ -66,11 +67,10 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
 		 * level.
 		 */
 
-		locks_lock_file_wait(file,
-				&(struct file_lock) {
-					.fl_type = F_UNLCK,
-					.fl_flags = FL_FLOCK
-				});
+		locks_init_lock(&request);
+		request.fl_type = F_UNLOCK;
+		request.fl_flags = FL_FLOCK;
+		locks_lock_file_wait(file, &request);
 
 		ocfs2_file_unlock(file);
 	}



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/12] locks: use properly initialized file_lock when unlocking.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
  2018-11-05  1:30 ` [PATCH 02/12] fs/locks: split out __locks_wake_up_blocks() NeilBrown
  2018-11-05  1:30 ` [PATCH 08/12] fs/locks: always delete_block after waiting NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 01/12] fs/locks: rename some lists and pointers NeilBrown
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Both locks_remove_posix() and locks_remove_flock() use a
struct file_lock without calling locks_init_lock() on it.
This means the various list_heads are not initialized, which
will become a problem with a later patch.

So change them both to initialize properly.  For flock locks,
this involves using flock_make_lock(), and changing it to
allow a file_lock to be passed in, so memory allocation isn't
always needed.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index b8f33792a0a6..ea999ce0e93c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -418,15 +418,15 @@ static inline int flock_translate_cmd(int cmd) {
 
 /* Fill in a file_lock structure with an appropriate FLOCK lock. */
 static struct file_lock *
-flock_make_lock(struct file *filp, unsigned int cmd)
+flock_make_lock(struct file *filp, unsigned int cmd, struct file_lock *fl)
 {
-	struct file_lock *fl;
 	int type = flock_translate_cmd(cmd);
 
 	if (type < 0)
 		return ERR_PTR(type);
 	
-	fl = locks_alloc_lock();
+	if (fl == NULL)
+		fl = locks_alloc_lock();
 	if (fl == NULL)
 		return ERR_PTR(-ENOMEM);
 
@@ -2008,7 +2008,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	    !(f.file->f_mode & (FMODE_READ|FMODE_WRITE)))
 		goto out_putf;
 
-	lock = flock_make_lock(f.file, cmd);
+	lock = flock_make_lock(f.file, cmd, NULL);
 	if (IS_ERR(lock)) {
 		error = PTR_ERR(lock);
 		goto out_putf;
@@ -2483,6 +2483,7 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner)
 	if (!ctx || list_empty(&ctx->flc_posix))
 		return;
 
+	locks_init_lock(&lock);
 	lock.fl_type = F_UNLCK;
 	lock.fl_flags = FL_POSIX | FL_CLOSE;
 	lock.fl_start = 0;
@@ -2506,19 +2507,15 @@ EXPORT_SYMBOL(locks_remove_posix);
 static void
 locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
 {
-	struct file_lock fl = {
-		.fl_owner = filp,
-		.fl_pid = current->tgid,
-		.fl_file = filp,
-		.fl_flags = FL_FLOCK | FL_CLOSE,
-		.fl_type = F_UNLCK,
-		.fl_end = OFFSET_MAX,
-	};
+	struct file_lock fl;
 	struct inode *inode = locks_inode(filp);
 
 	if (list_empty(&flctx->flc_flock))
 		return;
 
+	flock_make_lock(filp, LOCK_UN, &fl);
+	fl.fl_flags |= FL_CLOSE;
+
 	if (filp->f_op->flock)
 		filp->f_op->flock(filp, F_SETLKW, &fl);
 	else



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/12] fs/locks: allow a lock request to block other requests.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (5 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 05/12] ocfs2: " NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 09/12] fs/locks: change all *_conflict() functions to return bool NeilBrown
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Currently, a lock can block pending requests, but all pending
requests are equal.  If lots of pending requests are
mutually exclusive, this means they will all be woken up
and all but one will fail.  This can hurt performance.

So we will allow pending requests to block other requests.
Only the first request will be woken, and it will wake the others.

This patch doesn't implement this fully, but prepares the way.

- It acknowledges that a request might be blocking other requests,
  and when the request is converted to a lock, those blocked
  requests are moved across.
- When a request is requeued or discarded, all blocked requests are
  woken.
- When deadlock-detection looks for the lock which blocks a
  given request, we follow the chain of ->fl_blocker all
  the way to the top.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ea999ce0e93c..7a6df4f25b8b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -402,6 +402,17 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 
 EXPORT_SYMBOL(locks_copy_lock);
 
+static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
+{
+	struct file_lock *f;
+
+	spin_lock(&blocked_lock_lock);
+	list_splice_init(&fl->fl_blocked, &new->fl_blocked);
+	list_for_each_entry(f, &fl->fl_blocked, fl_block)
+		f->fl_blocker = new;
+	spin_unlock(&blocked_lock_lock);
+}
+
 static inline int flock_translate_cmd(int cmd) {
 	if (cmd & LOCK_MAND)
 		return cmd & (LOCK_MAND | LOCK_RW);
@@ -690,6 +701,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 static void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
+	__locks_wake_up_blocks(waiter);
 	__locks_delete_block(waiter);
 	spin_unlock(&blocked_lock_lock);
 }
@@ -712,6 +724,12 @@ static void __locks_insert_block(struct file_lock *blocker,
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
+
+	/* The requests in waiter->fl_blocked are known to conflict with
+	 * waiter, but might not conflict with blocker, or the requests
+	 * and lock which block it.  So they all need to be woken.
+	 */
+	__locks_wake_up_blocks(waiter);
 }
 
 /* Must be called with flc_lock held. */
@@ -884,8 +902,11 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	struct file_lock *fl;
 
 	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
-		if (posix_same_owner(fl, block_fl))
-			return fl->fl_blocker;
+		if (posix_same_owner(fl, block_fl)) {
+			while (fl->fl_blocker)
+				fl = fl->fl_blocker;
+			return fl;
+		}
 	}
 	return NULL;
 }
@@ -978,6 +999,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 	if (request->fl_flags & FL_ACCESS)
 		goto out;
 	locks_copy_lock(new_fl, request);
+	locks_move_blocks(new_fl, request);
 	locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
 	new_fl = NULL;
 	error = 0;
@@ -1171,6 +1193,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			goto out;
 		}
 		locks_copy_lock(new_fl, request);
+		locks_move_blocks(new_fl, request);
 		locks_insert_lock_ctx(new_fl, &fl->fl_list);
 		fl = new_fl;
 		new_fl = NULL;
@@ -2582,13 +2605,14 @@ void locks_remove_file(struct file *filp)
 int
 posix_unblock_lock(struct file_lock *waiter)
 {
-	int status = 0;
+	int status = -ENOENT;
 
 	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_blocker)
+	if (waiter->fl_blocker) {
+		__locks_wake_up_blocks(waiter);
 		__locks_delete_block(waiter);
-	else
-		status = -ENOENT;
+		status = 0;
+	}
 	spin_unlock(&blocked_lock_lock);
 	return status;
 }



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/12] fs/locks: always delete_block after waiting.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
  2018-11-05  1:30 ` [PATCH 02/12] fs/locks: split out __locks_wake_up_blocks() NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 06/12] locks: use properly initialized file_lock when unlocking NeilBrown
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Now that requests can block other requests, we
need to be careful to always clean up those blocked
requests.
Any time that we wait for a request, we might have
other requests attached, and when we stop waiting,
we must clean them up.
If the lock was granted, the requests might have been
moved to the new lock, though when merged with a
pre-exiting lock, this might not happen.
In all cases we don't want blocked locks to remain
attached, so we remove them to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 7a6df4f25b8b..a323160290f6 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1268,12 +1268,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -1962,12 +1960,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 	return error;
 }
 
@@ -2241,12 +2237,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		if (error != FILE_LOCK_DEFERRED)
 			break;
 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
-		if (!error)
-			continue;
-
-		locks_delete_block(fl);
-		break;
+		if (error)
+			break;
 	}
+	locks_delete_block(fl);
 
 	return error;
 }



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/12] fs/locks: change all *_conflict() functions to return bool.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (6 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 07/12] fs/locks: allow a lock request to block other requests NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 03/12] NFS: use locks_copy_lock() to copy locks NeilBrown
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

posix_locks_conflict() and flock_locks_conflict() both return int.
leases_conflict() returns bool.

This inconsistency will cause problems for the next patch if not
fixed.

So change posix_locks_conflict() and flock_locks_conflict() to return
bool.
Also change the locks_conflict() helper.

And convert some
   return (foo);
to
   return foo;

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a323160290f6..802d5853acd5 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -791,47 +791,50 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
  * checks for shared/exclusive status of overlapping locks.
  */
-static int locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool locks_conflict(struct file_lock *caller_fl,
+			   struct file_lock *sys_fl)
 {
 	if (sys_fl->fl_type == F_WRLCK)
-		return 1;
+		return true;
 	if (caller_fl->fl_type == F_WRLCK)
-		return 1;
-	return 0;
+		return true;
+	return false;
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
  * checking before calling the locks_conflict().
  */
-static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool posix_locks_conflict(struct file_lock *caller_fl,
+				 struct file_lock *sys_fl)
 {
 	/* POSIX locks owned by the same process do not conflict with
 	 * each other.
 	 */
 	if (posix_same_owner(caller_fl, sys_fl))
-		return (0);
+		return false;
 
 	/* Check whether they overlap */
 	if (!locks_overlap(caller_fl, sys_fl))
-		return 0;
+		return false;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	return locks_conflict(caller_fl, sys_fl);
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
  * checking before calling the locks_conflict().
  */
-static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static bool flock_locks_conflict(struct file_lock *caller_fl,
+				 struct file_lock *sys_fl)
 {
 	/* FLOCK locks referring to the same filp do not conflict with
 	 * each other.
 	 */
 	if (caller_fl->fl_file == sys_fl->fl_file)
-		return (0);
+		return false;
 	if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
-		return 0;
+		return false;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	return locks_conflict(caller_fl, sys_fl);
 }
 
 void



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (9 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 11/12] locks: merge posix_unblock_lock() and locks_delete_block() NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-08 21:30   ` J. Bruce Fields
  2018-11-05  1:30 ` [PATCH 12/12] VFS: locks: remove unnecessary white space NeilBrown
  2018-11-08 21:35 ` [PATCH 00/12] Series short description J. Bruce Fields
  12 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

When we find an existing lock which conflicts with a request,
and the request wants to wait, we currently add the request
to a list.  When the lock is removed, the whole list is woken.
This can cause the thundering-herd problem.
To reduce the problem, we make use of the (new) fact that
a pending request can itself have a list of blocked requests.
When we find a conflict, we look through the existing blocked requests.
If any one of them blocks the new request, the new request is attached
below that request, otherwise it is added to the list of blocked
requests, which are now known to be mutually non-conflicting.

This way, when the lock is released, only a set of non-conflicting
locks will be woken, the rest can stay asleep.
If the lock request cannot be granted and the request needs to be
requeued, all the other requests it blocks will then be woken

Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 802d5853acd5..1b0eac6b2918 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -715,11 +715,25 @@ static void locks_delete_block(struct file_lock *waiter)
  * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
  * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
+ *
+ * Rather than just adding to the list, we check for conflicts with any existing
+ * waiters, and add beneath any waiter that blocks the new waiter.
+ * Thus wakeups don't happen until needed.
  */
 static void __locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+				 struct file_lock *waiter,
+				 bool conflict(struct file_lock *,
+					       struct file_lock *))
 {
+	struct file_lock *fl;
 	BUG_ON(!list_empty(&waiter->fl_block));
+
+new_blocker:
+	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
+		if (conflict(fl, waiter)) {
+			blocker =  fl;
+			goto new_blocker;
+		}
 	waiter->fl_blocker = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
@@ -734,10 +748,12 @@ static void __locks_insert_block(struct file_lock *blocker,
 
 /* Must be called with flc_lock held. */
 static void locks_insert_block(struct file_lock *blocker,
-					struct file_lock *waiter)
+			       struct file_lock *waiter,
+			       bool conflict(struct file_lock *,
+					     struct file_lock *))
 {
 	spin_lock(&blocked_lock_lock);
-	__locks_insert_block(blocker, waiter);
+	__locks_insert_block(blocker, waiter, conflict);
 	spin_unlock(&blocked_lock_lock);
 }
 
@@ -996,7 +1012,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 		if (!(request->fl_flags & FL_SLEEP))
 			goto out;
 		error = FILE_LOCK_DEFERRED;
-		locks_insert_block(fl, request);
+		locks_insert_block(fl, request, flock_locks_conflict);
 		goto out;
 	}
 	if (request->fl_flags & FL_ACCESS)
@@ -1071,7 +1087,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
-				__locks_insert_block(fl, request);
+				__locks_insert_block(fl, request,
+						     posix_locks_conflict);
 			}
 			spin_unlock(&blocked_lock_lock);
 			goto out;
@@ -1542,7 +1559,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 		break_time -= jiffies;
 	if (break_time == 0)
 		break_time++;
-	locks_insert_block(fl, new_fl);
+	locks_insert_block(fl, new_fl, leases_conflict);
 	trace_break_lease_block(inode, new_fl);
 	spin_unlock(&ctx->flc_lock);
 	percpu_up_read_preempt_enable(&file_rwsem);



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/12] locks: merge posix_unblock_lock() and locks_delete_block()
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (8 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 03/12] NFS: use locks_copy_lock() to copy locks NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-05  1:30 ` [PATCH 10/12] fs/locks: create a tree of dependent requests NeilBrown
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

posix_unblock_lock() is not specific to posix locks, and behaves
nearly identically to locks_delete_block() - the former returning a
status while the later doesn't.

So discard posix_unblock_lock() and use locks_delete_block() instead,
after giving that function an appropriate return value.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/cifs/file.c      |    2 +-
 fs/lockd/svclock.c  |    2 +-
 fs/locks.c          |   36 +++++++++++++-----------------------
 fs/nfsd/nfs4state.c |    6 +++---
 include/linux/fs.h  |    4 ++--
 5 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index d7ed895e05d1..94c3575e850c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1106,7 +1106,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
 		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
 		if (!rc)
 			goto try_again;
-		posix_unblock_lock(flock);
+		locks_delete_block(flock);
 	}
 	return rc;
 }
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 74330daeab71..ea719cdd6a36 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -276,7 +276,7 @@ static int nlmsvc_unlink_block(struct nlm_block *block)
 	dprintk("lockd: unlinking block %p...\n", block);
 
 	/* Remove block from list */
-	status = posix_unblock_lock(&block->b_call->a_args.lock.fl);
+	status = locks_delete_block(&block->b_call->a_args.lock.fl);
 	nlmsvc_remove_block(block);
 	return status;
 }
diff --git a/fs/locks.c b/fs/locks.c
index 1b0eac6b2918..d00a36523c3f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -698,13 +698,25 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 	}
 }
 
-static void locks_delete_block(struct file_lock *waiter)
+/**
+ *	locks_delete_lock - stop waiting for a file lock
+ *	@waiter: the lock which was waiting
+ *
+ *	lockd/nfsd need to disconnect the lock while working on it.
+ */
+int locks_delete_block(struct file_lock *waiter)
 {
+	int status = -ENOENT;
+
 	spin_lock(&blocked_lock_lock);
+	if (waiter->fl_blocker)
+		status = 0;
 	__locks_wake_up_blocks(waiter);
 	__locks_delete_block(waiter);
 	spin_unlock(&blocked_lock_lock);
+	return status;
 }
+EXPORT_SYMBOL(locks_delete_block);
 
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
@@ -2610,28 +2622,6 @@ void locks_remove_file(struct file *filp)
 	spin_unlock(&ctx->flc_lock);
 }
 
-/**
- *	posix_unblock_lock - stop waiting for a file lock
- *	@waiter: the lock which was waiting
- *
- *	lockd needs to block waiting for locks.
- */
-int
-posix_unblock_lock(struct file_lock *waiter)
-{
-	int status = -ENOENT;
-
-	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_blocker) {
-		__locks_wake_up_blocks(waiter);
-		__locks_delete_block(waiter);
-		status = 0;
-	}
-	spin_unlock(&blocked_lock_lock);
-	return status;
-}
-EXPORT_SYMBOL(posix_unblock_lock);
-
 /**
  * vfs_cancel_lock - file byte range unblock lock
  * @filp: The file to apply the unblock to
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f093fbe47133..a334828723fa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -238,7 +238,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh,
 	}
 	spin_unlock(&nn->blocked_locks_lock);
 	if (found)
-		posix_unblock_lock(&found->nbl_lock);
+		locks_delete_block(&found->nbl_lock);
 	return found;
 }
 
@@ -293,7 +293,7 @@ remove_blocked_locks(struct nfs4_lockowner *lo)
 		nbl = list_first_entry(&reaplist, struct nfsd4_blocked_lock,
 					nbl_lru);
 		list_del_init(&nbl->nbl_lru);
-		posix_unblock_lock(&nbl->nbl_lock);
+		locks_delete_block(&nbl->nbl_lock);
 		free_blocked_lock(nbl);
 	}
 }
@@ -4863,7 +4863,7 @@ nfs4_laundromat(struct nfsd_net *nn)
 		nbl = list_first_entry(&reaplist,
 					struct nfsd4_blocked_lock, nbl_lru);
 		list_del_init(&nbl->nbl_lru);
-		posix_unblock_lock(&nbl->nbl_lock);
+		locks_delete_block(&nbl->nbl_lock);
 		free_blocked_lock(nbl);
 	}
 out:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a161bcdca8a2..a47bf97cb964 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1122,7 +1122,7 @@ extern void locks_remove_file(struct file *);
 extern void locks_release_private(struct file_lock *);
 extern void posix_test_lock(struct file *, struct file_lock *);
 extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *);
-extern int posix_unblock_lock(struct file_lock *);
+extern int locks_delete_block(struct file_lock *);
 extern int vfs_test_lock(struct file *, struct file_lock *);
 extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
@@ -1212,7 +1212,7 @@ static inline int posix_lock_file(struct file *filp, struct file_lock *fl,
 	return -ENOLCK;
 }
 
-static inline int posix_unblock_lock(struct file_lock *waiter)
+static inline int locks_delete_block(struct file_lock *waiter)
 {
 	return -ENOENT;
 }



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/12] VFS: locks: remove unnecessary white space.
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (10 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 10/12] fs/locks: create a tree of dependent requests NeilBrown
@ 2018-11-05  1:30 ` NeilBrown
  2018-11-08 21:35 ` [PATCH 00/12] Series short description J. Bruce Fields
  12 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-05  1:30 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

 - spaces before tabs,
 - spaces at the end of lines,
 - multiple blank lines,
 - blank lines before EXPORT_SYMBOL,
can all go.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index d00a36523c3f..17a72875b65c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -11,11 +11,11 @@
  *
  *  Miscellaneous edits, and a total rewrite of posix_lock_file() code.
  *  Kai Petzke (wpp@marie.physik.tu-berlin.de), 1994
- *  
+ *
  *  Converted file_lock_table to a linked list from an array, which eliminates
  *  the limits on how many active file locks are open.
  *  Chad Page (pageone@netcom.com), November 27, 1994
- * 
+ *
  *  Removed dependency on file descriptors. dup()'ed file descriptors now
  *  get the same locks as the original file descriptors, and a close() on
  *  any file descriptor removes ALL the locks on the file for the current
@@ -41,7 +41,7 @@
  *  with a file pointer (filp). As a result they can be shared by a parent
  *  process and its children after a fork(). They are removed when the last
  *  file descriptor referring to the file pointer is closed (unless explicitly
- *  unlocked). 
+ *  unlocked).
  *
  *  FL_FLOCK locks never deadlock, an existing lock is always removed before
  *  upgrading from shared to exclusive (or vice versa). When this happens
@@ -50,7 +50,7 @@
  *  Andy Walker (andy@lysaker.kvaerner.no), June 09, 1995
  *
  *  Removed some race conditions in flock_lock_file(), marked other possible
- *  races. Just grep for FIXME to see them. 
+ *  races. Just grep for FIXME to see them.
  *  Dmitry Gorodchanin (pgmdsg@ibi.com), February 09, 1996.
  *
  *  Addressed Dmitry's concerns. Deadlock checking no longer recursive.
@@ -359,7 +359,6 @@ void locks_init_lock(struct file_lock *fl)
 	memset(fl, 0, sizeof(struct file_lock));
 	locks_init_lock_heads(fl);
 }
-
 EXPORT_SYMBOL(locks_init_lock);
 
 /*
@@ -399,7 +398,6 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 			fl->fl_ops->fl_copy_lock(new, fl);
 	}
 }
-
 EXPORT_SYMBOL(locks_copy_lock);
 
 static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
@@ -435,7 +433,7 @@ flock_make_lock(struct file *filp, unsigned int cmd, struct file_lock *fl)
 
 	if (type < 0)
 		return ERR_PTR(type);
-	
+
 	if (fl == NULL)
 		fl = locks_alloc_lock();
 	if (fl == NULL)
@@ -447,7 +445,7 @@ flock_make_lock(struct file *filp, unsigned int cmd, struct file_lock *fl)
 	fl->fl_flags = FL_FLOCK;
 	fl->fl_type = type;
 	fl->fl_end = OFFSET_MAX;
-	
+
 	return fl;
 }
 
@@ -1104,8 +1102,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			}
 			spin_unlock(&blocked_lock_lock);
 			goto out;
-  		}
-  	}
+		}
+	}
 
 	/* If we're just looking for a conflict, we're done. */
 	error = 0;
@@ -1400,7 +1398,6 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
 
 	return error;
 }
-
 EXPORT_SYMBOL(locks_mandatory_area);
 #endif /* CONFIG_MANDATORY_FILE_LOCKING */
 
@@ -1602,7 +1599,6 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 	locks_free_lock(new_fl);
 	return error;
 }
-
 EXPORT_SYMBOL(__break_lease);
 
 /**
@@ -1633,7 +1629,6 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time)
 	if (has_lease)
 		*time = current_time(inode);
 }
-
 EXPORT_SYMBOL(lease_get_mtime);
 
 /**
@@ -1688,8 +1683,8 @@ int fcntl_getlease(struct file *filp)
 
 /**
  * check_conflicting_open - see if the given dentry points to a file that has
- * 			    an existing open that would conflict with the
- * 			    desired lease.
+ *			    an existing open that would conflict with the
+ *			    desired lease.
  * @dentry:	dentry to check
  * @arg:	type of lease that we're trying to acquire
  * @flags:	current lock flags
@@ -1913,7 +1908,7 @@ EXPORT_SYMBOL(generic_setlease);
  * @arg:	type of lease to obtain
  * @lease:	file_lock to use when adding a lease
  * @priv:	private info for lm_setup when adding a lease (may be
- * 		NULL if lm_setup doesn't require it)
+ *		NULL if lm_setup doesn't require it)
  *
  * Call this to establish a lease on the file. The "lease" argument is not
  * used for F_UNLCK requests and may be NULL. For commands that set or alter
@@ -2201,7 +2196,7 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
 	error = vfs_test_lock(filp, fl);
 	if (error)
 		goto out;
- 
+
 	flock->l_type = fl->fl_type;
 	if (fl->fl_type != F_UNLCK) {
 		error = posix_lock_to_flock(flock, fl);
@@ -2549,7 +2544,6 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner)
 		lock.fl_ops->fl_release_private(&lock);
 	trace_locks_remove_posix(inode, &lock, error);
 }
-
 EXPORT_SYMBOL(locks_remove_posix);
 
 /* The i_flctx must be valid when calling into here */
@@ -2635,7 +2629,6 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 		return filp->f_op->lock(filp, F_CANCELLK, fl);
 	return 0;
 }
-
 EXPORT_SYMBOL_GPL(vfs_cancel_lock);
 
 #ifdef CONFIG_PROC_FS
@@ -2835,7 +2828,6 @@ static int __init filelock_init(void)
 	filelock_cache = kmem_cache_create("file_lock_cache",
 			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
 
-
 	for_each_possible_cpu(i) {
 		struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
 
@@ -2845,5 +2837,4 @@ static int __init filelock_init(void)
 
 	return 0;
 }
-
 core_initcall(filelock_init);



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/12] gfs2: properly initial file_lock used for unlock.
  2018-11-05  1:30 ` [PATCH 04/12] gfs2: properly initial file_lock used for unlock NeilBrown
@ 2018-11-05 12:18   ` Jeff Layton
  2018-11-06  1:48     ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2018-11-05 12:18 UTC (permalink / raw)
  To: NeilBrown, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

On Mon, 2018-11-05 at 12:30 +1100, NeilBrown wrote:
> Rather than assuming all-zeros is sufficient, use the available API to
> initialize the file_lock structure use for unlock.
> VFS-level changes will soon make it important that the
> list_heads in file_lock are always properly initialized.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/gfs2/file.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 45a17b770d97..271f847705e3 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -1199,13 +1199,13 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
>  	mutex_lock(&fp->f_fl_mutex);
>  
>  	if (gfs2_holder_initialized(fl_gh)) {
> +		struct file_lock request;
>  		if (fl_gh->gh_state == state)
>  			goto out;
> -		locks_lock_file_wait(file,
> -				     &(struct file_lock) {
> -					     .fl_type = F_UNLCK,
> -					     .fl_flags = FL_FLOCK
> -				     });
> +		locks_init_lock(&request);
> +		request.fl_type = F_UNLOCK;

F_UNLCK ?

The ocfs2 patch has the same bug.

> +		request.fl_flags = FL_FLOCK;
> +		locks_lock_file_wait(file, &request);
>  		gfs2_glock_dq(fl_gh);
>  		gfs2_holder_reinit(state, flags, fl_gh);
>  	} else {
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/12] gfs2: properly initial file_lock used for unlock.
  2018-11-05 12:18   ` Jeff Layton
@ 2018-11-06  1:48     ` NeilBrown
  2018-11-06 13:20       ` Jeff Layton
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-06  1:48 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On Mon, Nov 05 2018, Jeff Layton wrote:

> On Mon, 2018-11-05 at 12:30 +1100, NeilBrown wrote:
>> Rather than assuming all-zeros is sufficient, use the available API to
>> initialize the file_lock structure use for unlock.
>> VFS-level changes will soon make it important that the
>> list_heads in file_lock are always properly initialized.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  fs/gfs2/file.c |   10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
>> index 45a17b770d97..271f847705e3 100644
>> --- a/fs/gfs2/file.c
>> +++ b/fs/gfs2/file.c
>> @@ -1199,13 +1199,13 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
>>  	mutex_lock(&fp->f_fl_mutex);
>>  
>>  	if (gfs2_holder_initialized(fl_gh)) {
>> +		struct file_lock request;
>>  		if (fl_gh->gh_state == state)
>>  			goto out;
>> -		locks_lock_file_wait(file,
>> -				     &(struct file_lock) {
>> -					     .fl_type = F_UNLCK,
>> -					     .fl_flags = FL_FLOCK
>> -				     });
>> +		locks_init_lock(&request);
>> +		request.fl_type = F_UNLOCK;
>
> F_UNLCK ?
>
> The ocfs2 patch has the same bug.

Anyone would think that I hadn't even compile tested.....

This is true for OCFS2 :-( but I had actually compile-tested with GFS2
enabled.
But CONFIG_DLM *wasn't* enabled, so GFS2 was compiled without locking
support.
I guess there is a good reason that GFS2 doesn't require DLM.

Do you want me to resend the series, to will you just update those
patches.

Sorry about that,
NeilBrown


>
>> +		request.fl_flags = FL_FLOCK;
>> +		locks_lock_file_wait(file, &request);
>>  		gfs2_glock_dq(fl_gh);
>>  		gfs2_holder_reinit(state, flags, fl_gh);
>>  	} else {
>> 
>> 
>
> -- 
> Jeff Layton <jlayton@kernel.org>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/12] gfs2: properly initial file_lock used for unlock.
  2018-11-06  1:48     ` NeilBrown
@ 2018-11-06 13:20       ` Jeff Layton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2018-11-06 13:20 UTC (permalink / raw)
  To: NeilBrown, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

On Tue, 2018-11-06 at 12:48 +1100, NeilBrown wrote:
> On Mon, Nov 05 2018, Jeff Layton wrote:
> 
> > On Mon, 2018-11-05 at 12:30 +1100, NeilBrown wrote:
> > > Rather than assuming all-zeros is sufficient, use the available API to
> > > initialize the file_lock structure use for unlock.
> > > VFS-level changes will soon make it important that the
> > > list_heads in file_lock are always properly initialized.
> > > 
> > > Signed-off-by: NeilBrown <neilb@suse.com>
> > > ---
> > >  fs/gfs2/file.c |   10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> > > index 45a17b770d97..271f847705e3 100644
> > > --- a/fs/gfs2/file.c
> > > +++ b/fs/gfs2/file.c
> > > @@ -1199,13 +1199,13 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
> > >  	mutex_lock(&fp->f_fl_mutex);
> > >  
> > >  	if (gfs2_holder_initialized(fl_gh)) {
> > > +		struct file_lock request;
> > >  		if (fl_gh->gh_state == state)
> > >  			goto out;
> > > -		locks_lock_file_wait(file,
> > > -				     &(struct file_lock) {
> > > -					     .fl_type = F_UNLCK,
> > > -					     .fl_flags = FL_FLOCK
> > > -				     });
> > > +		locks_init_lock(&request);
> > > +		request.fl_type = F_UNLOCK;
> > 
> > F_UNLCK ?
> > 
> > The ocfs2 patch has the same bug.
> 
> Anyone would think that I hadn't even compile tested.....
> 
> This is true for OCFS2 :-( but I had actually compile-tested with GFS2
> enabled.
> But CONFIG_DLM *wasn't* enabled, so GFS2 was compiled without locking
> support.
> I guess there is a good reason that GFS2 doesn't require DLM.
> 

I think you can run GFS2 in a single-node configuration with local
locking.

> Do you want me to resend the series, to will you just update those
> patches.
> 
> Sorry about that,
> NeilBrown
> 

No worries. I'll just fix those patches up and note it in the
changelogs. Other than the build failure, this seems to be doing fine in
testing so far. I'll likely push them to linux-next later this week.


> > 
> > > +		request.fl_flags = FL_FLOCK;
> > > +		locks_lock_file_wait(file, &request);
> > >  		gfs2_glock_dq(fl_gh);
> > >  		gfs2_holder_reinit(state, flags, fl_gh);
> > >  	} else {
> > > 
> > > 

Thanks again!
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/12] fs/locks: rename some lists and pointers.
  2018-11-05  1:30 ` [PATCH 01/12] fs/locks: rename some lists and pointers NeilBrown
@ 2018-11-08 20:26   ` J. Bruce Fields
  2018-11-09  0:32     ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-08 20:26 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Mon, Nov 05, 2018 at 12:30:47PM +1100, NeilBrown wrote:
> struct file lock contains an 'fl_next' pointer which
> is used to point to the lock that this request is blocked
> waiting for.  So rename it to fl_blocker.
> 
> The fl_blocked list_head in an active lock is the head of a list of
> blocked requests.  In a request it is a node in that list.
> These are two distinct uses, so replace with two list_heads
> with different names.
> fl_blocked is the head of a list of blocked requests
> fl_block is a node on that list.

Reading these, I have a lot of trouble keeping fl_blocked, fl_block, and
fl_blocker straight.  Is it just me?

I guess they form a tree, so fl_children, fl_siblings, and fl_parent
might be easier for me to keep straight.

--b.

> 
> The two different list_heads are never used at the same time, but that
> will change in a future patch.
> 
> Note that a tracepoint is changed to report fl_blocker instead
> of fl_next.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/cifs/file.c                  |    2 +-
>  fs/locks.c                      |   38 ++++++++++++++++++++------------------
>  include/linux/fs.h              |    7 +++++--
>  include/trace/events/filelock.h |   16 ++++++++--------
>  4 files changed, 34 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 74c33d5fafc8..d7ed895e05d1 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1103,7 +1103,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
>  	rc = posix_lock_file(file, flock, NULL);
>  	up_write(&cinode->lock_sem);
>  	if (rc == FILE_LOCK_DEFERRED) {
> -		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
> +		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
>  		if (!rc)
>  			goto try_again;
>  		posix_unblock_lock(flock);
> diff --git a/fs/locks.c b/fs/locks.c
> index 2ecb4db8c840..a6c6d601286c 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -189,7 +189,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
>   * This lock protects the blocked_hash. Generally, if you're accessing it, you
>   * want to be holding this lock.
>   *
> - * In addition, it also protects the fl->fl_block list, and the fl->fl_next
> + * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker
>   * pointer for file_lock structures that are acting as lock requests (in
>   * contrast to those that are acting as records of acquired locks).
>   *
> @@ -293,6 +293,7 @@ static void locks_init_lock_heads(struct file_lock *fl)
>  {
>  	INIT_HLIST_NODE(&fl->fl_link);
>  	INIT_LIST_HEAD(&fl->fl_list);
> +	INIT_LIST_HEAD(&fl->fl_blocked);
>  	INIT_LIST_HEAD(&fl->fl_block);
>  	init_waitqueue_head(&fl->fl_wait);
>  }
> @@ -332,6 +333,7 @@ void locks_free_lock(struct file_lock *fl)
>  {
>  	BUG_ON(waitqueue_active(&fl->fl_wait));
>  	BUG_ON(!list_empty(&fl->fl_list));
> +	BUG_ON(!list_empty(&fl->fl_blocked));
>  	BUG_ON(!list_empty(&fl->fl_block));
>  	BUG_ON(!hlist_unhashed(&fl->fl_link));
>  
> @@ -667,7 +669,7 @@ static void __locks_delete_block(struct file_lock *waiter)
>  {
>  	locks_delete_global_blocked(waiter);
>  	list_del_init(&waiter->fl_block);
> -	waiter->fl_next = NULL;
> +	waiter->fl_blocker = NULL;
>  }
>  
>  static void locks_delete_block(struct file_lock *waiter)
> @@ -683,16 +685,16 @@ static void locks_delete_block(struct file_lock *waiter)
>   * it seems like the reasonable thing to do.
>   *
>   * Must be called with both the flc_lock and blocked_lock_lock held. The
> - * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
> + * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
>   * that the flc_lock is also held on insertions we can avoid taking the
> - * blocked_lock_lock in some cases when we see that the fl_block list is empty.
> + * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
>   */
>  static void __locks_insert_block(struct file_lock *blocker,
>  					struct file_lock *waiter)
>  {
>  	BUG_ON(!list_empty(&waiter->fl_block));
> -	waiter->fl_next = blocker;
> -	list_add_tail(&waiter->fl_block, &blocker->fl_block);
> +	waiter->fl_blocker = blocker;
> +	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
>  	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
>  		locks_insert_global_blocked(waiter);
>  }
> @@ -716,18 +718,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
>  	/*
>  	 * Avoid taking global lock if list is empty. This is safe since new
>  	 * blocked requests are only added to the list under the flc_lock, and
> -	 * the flc_lock is always held here. Note that removal from the fl_block
> +	 * the flc_lock is always held here. Note that removal from the fl_blocked
>  	 * list does not require the flc_lock, so we must recheck list_empty()
>  	 * after acquiring the blocked_lock_lock.
>  	 */
> -	if (list_empty(&blocker->fl_block))
> +	if (list_empty(&blocker->fl_blocked))
>  		return;
>  
>  	spin_lock(&blocked_lock_lock);
> -	while (!list_empty(&blocker->fl_block)) {
> +	while (!list_empty(&blocker->fl_blocked)) {
>  		struct file_lock *waiter;
>  
> -		waiter = list_first_entry(&blocker->fl_block,
> +		waiter = list_first_entry(&blocker->fl_blocked,
>  				struct file_lock, fl_block);
>  		__locks_delete_block(waiter);
>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> @@ -878,7 +880,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
>  
>  	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
>  		if (posix_same_owner(fl, block_fl))
> -			return fl->fl_next;
> +			return fl->fl_blocker;
>  	}
>  	return NULL;
>  }
> @@ -1237,7 +1239,7 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		error = posix_lock_inode(inode, fl, NULL);
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>  		if (!error)
>  			continue;
>  
> @@ -1324,7 +1326,7 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>  		error = posix_lock_inode(inode, &fl, NULL);
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
> -		error = wait_event_interruptible(fl.fl_wait, !fl.fl_next);
> +		error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
>  		if (!error) {
>  			/*
>  			 * If we've been sleeping someone might have
> @@ -1518,7 +1520,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  
>  	locks_dispose_list(&dispose);
>  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
> -						!new_fl->fl_next, break_time);
> +						!new_fl->fl_blocker, break_time);
>  
>  	percpu_down_read_preempt_disable(&file_rwsem);
>  	spin_lock(&ctx->flc_lock);
> @@ -1931,7 +1933,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>  		error = flock_lock_inode(inode, fl);
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>  		if (!error)
>  			continue;
>  
> @@ -2210,7 +2212,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>  		error = vfs_lock_file(filp, cmd, fl, NULL);
>  		if (error != FILE_LOCK_DEFERRED)
>  			break;
> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>  		if (!error)
>  			continue;
>  
> @@ -2581,7 +2583,7 @@ posix_unblock_lock(struct file_lock *waiter)
>  	int status = 0;
>  
>  	spin_lock(&blocked_lock_lock);
> -	if (waiter->fl_next)
> +	if (waiter->fl_blocker)
>  		__locks_delete_block(waiter);
>  	else
>  		status = -ENOENT;
> @@ -2707,7 +2709,7 @@ static int locks_show(struct seq_file *f, void *v)
>  
>  	lock_get_status(f, fl, iter->li_pos, "");
>  
> -	list_for_each_entry(bfl, &fl->fl_block, fl_block)
> +	list_for_each_entry(bfl, &fl->fl_blocked, fl_block)
>  		lock_get_status(f, bfl, iter->li_pos, " ->");
>  
>  	return 0;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c95c0807471f..a161bcdca8a2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1044,10 +1044,13 @@ bool opens_in_grace(struct net *);
>   * Obviously, the last two criteria only matter for POSIX locks.
>   */
>  struct file_lock {
> -	struct file_lock *fl_next;	/* singly linked list for this inode  */
> +	struct file_lock *fl_blocker;	/* The lock, that is blocking us */
>  	struct list_head fl_list;	/* link into file_lock_context */
>  	struct hlist_node fl_link;	/* node in global lists */
> -	struct list_head fl_block;	/* circular list of blocked processes */
> +	struct list_head fl_blocked;	/* list of requests with
> +					 * ->fl_blocker pointing here */
> +	struct list_head fl_block;	/* node in
> +					 * ->fl_blocker->fl_blocked */
>  	fl_owner_t fl_owner;
>  	unsigned int fl_flags;
>  	unsigned char fl_type;
> diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
> index 68b17c116907..fad7befa612d 100644
> --- a/include/trace/events/filelock.h
> +++ b/include/trace/events/filelock.h
> @@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
>  		__field(struct file_lock *, fl)
>  		__field(unsigned long, i_ino)
>  		__field(dev_t, s_dev)
> -		__field(struct file_lock *, fl_next)
> +		__field(struct file_lock *, fl_blocker)
>  		__field(fl_owner_t, fl_owner)
>  		__field(unsigned int, fl_pid)
>  		__field(unsigned int, fl_flags)
> @@ -82,7 +82,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
>  		__entry->fl = fl ? fl : NULL;
>  		__entry->s_dev = inode->i_sb->s_dev;
>  		__entry->i_ino = inode->i_ino;
> -		__entry->fl_next = fl ? fl->fl_next : NULL;
> +		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
>  		__entry->fl_owner = fl ? fl->fl_owner : NULL;
>  		__entry->fl_pid = fl ? fl->fl_pid : 0;
>  		__entry->fl_flags = fl ? fl->fl_flags : 0;
> @@ -92,9 +92,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
>  		__entry->ret = ret;
>  	),
>  
> -	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
> +	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
>  		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
> -		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
> +		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
>  		__entry->fl_pid, show_fl_flags(__entry->fl_flags),
>  		show_fl_type(__entry->fl_type),
>  		__entry->fl_start, __entry->fl_end, __entry->ret)
> @@ -125,7 +125,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
>  		__field(struct file_lock *, fl)
>  		__field(unsigned long, i_ino)
>  		__field(dev_t, s_dev)
> -		__field(struct file_lock *, fl_next)
> +		__field(struct file_lock *, fl_blocker)
>  		__field(fl_owner_t, fl_owner)
>  		__field(unsigned int, fl_flags)
>  		__field(unsigned char, fl_type)
> @@ -137,7 +137,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
>  		__entry->fl = fl ? fl : NULL;
>  		__entry->s_dev = inode->i_sb->s_dev;
>  		__entry->i_ino = inode->i_ino;
> -		__entry->fl_next = fl ? fl->fl_next : NULL;
> +		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
>  		__entry->fl_owner = fl ? fl->fl_owner : NULL;
>  		__entry->fl_flags = fl ? fl->fl_flags : 0;
>  		__entry->fl_type = fl ? fl->fl_type : 0;
> @@ -145,9 +145,9 @@ DECLARE_EVENT_CLASS(filelock_lease,
>  		__entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
>  	),
>  
> -	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
> +	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
>  		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
> -		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
> +		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
>  		show_fl_flags(__entry->fl_flags),
>  		show_fl_type(__entry->fl_type),
>  		__entry->fl_break_time, __entry->fl_downgrade_time)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-05  1:30 ` [PATCH 10/12] fs/locks: create a tree of dependent requests NeilBrown
@ 2018-11-08 21:30   ` J. Bruce Fields
  2018-11-09  0:38     ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-08 21:30 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Mon, Nov 05, 2018 at 12:30:48PM +1100, NeilBrown wrote:
> When we find an existing lock which conflicts with a request,
> and the request wants to wait, we currently add the request
> to a list.  When the lock is removed, the whole list is woken.
> This can cause the thundering-herd problem.
> To reduce the problem, we make use of the (new) fact that
> a pending request can itself have a list of blocked requests.
> When we find a conflict, we look through the existing blocked requests.
> If any one of them blocks the new request, the new request is attached
> below that request, otherwise it is added to the list of blocked
> requests, which are now known to be mutually non-conflicting.
> 
> This way, when the lock is released, only a set of non-conflicting
> locks will be woken, the rest can stay asleep.
> If the lock request cannot be granted and the request needs to be
> requeued, all the other requests it blocks will then be woken

So, to make sure I understand: the tree of blocking locks only ever has
three levels (the active lock, the locks blocking on it, and their
children?)

--b.

> 
> Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/locks.c |   29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 802d5853acd5..1b0eac6b2918 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -715,11 +715,25 @@ static void locks_delete_block(struct file_lock *waiter)
>   * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
>   * that the flc_lock is also held on insertions we can avoid taking the
>   * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
> + *
> + * Rather than just adding to the list, we check for conflicts with any existing
> + * waiters, and add beneath any waiter that blocks the new waiter.
> + * Thus wakeups don't happen until needed.
>   */
>  static void __locks_insert_block(struct file_lock *blocker,
> -					struct file_lock *waiter)
> +				 struct file_lock *waiter,
> +				 bool conflict(struct file_lock *,
> +					       struct file_lock *))
>  {
> +	struct file_lock *fl;
>  	BUG_ON(!list_empty(&waiter->fl_block));
> +
> +new_blocker:
> +	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
> +		if (conflict(fl, waiter)) {
> +			blocker =  fl;
> +			goto new_blocker;
> +		}
>  	waiter->fl_blocker = blocker;
>  	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
>  	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> @@ -734,10 +748,12 @@ static void __locks_insert_block(struct file_lock *blocker,
>  
>  /* Must be called with flc_lock held. */
>  static void locks_insert_block(struct file_lock *blocker,
> -					struct file_lock *waiter)
> +			       struct file_lock *waiter,
> +			       bool conflict(struct file_lock *,
> +					     struct file_lock *))
>  {
>  	spin_lock(&blocked_lock_lock);
> -	__locks_insert_block(blocker, waiter);
> +	__locks_insert_block(blocker, waiter, conflict);
>  	spin_unlock(&blocked_lock_lock);
>  }
>  
> @@ -996,7 +1012,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
>  		if (!(request->fl_flags & FL_SLEEP))
>  			goto out;
>  		error = FILE_LOCK_DEFERRED;
> -		locks_insert_block(fl, request);
> +		locks_insert_block(fl, request, flock_locks_conflict);
>  		goto out;
>  	}
>  	if (request->fl_flags & FL_ACCESS)
> @@ -1071,7 +1087,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>  			spin_lock(&blocked_lock_lock);
>  			if (likely(!posix_locks_deadlock(request, fl))) {
>  				error = FILE_LOCK_DEFERRED;
> -				__locks_insert_block(fl, request);
> +				__locks_insert_block(fl, request,
> +						     posix_locks_conflict);
>  			}
>  			spin_unlock(&blocked_lock_lock);
>  			goto out;
> @@ -1542,7 +1559,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  		break_time -= jiffies;
>  	if (break_time == 0)
>  		break_time++;
> -	locks_insert_block(fl, new_fl);
> +	locks_insert_block(fl, new_fl, leases_conflict);
>  	trace_break_lease_block(inode, new_fl);
>  	spin_unlock(&ctx->flc_lock);
>  	percpu_up_read_preempt_enable(&file_rwsem);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/12] Series short description
  2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
                   ` (11 preceding siblings ...)
  2018-11-05  1:30 ` [PATCH 12/12] VFS: locks: remove unnecessary white space NeilBrown
@ 2018-11-08 21:35 ` J. Bruce Fields
  12 siblings, 0 replies; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-08 21:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Mon, Nov 05, 2018 at 12:30:47PM +1100, NeilBrown wrote:
> Here is the respin on this series with the file_lock properly
> initlized for unlock requests.
> I found one that I had missed before - in locks_remove_flock()
> The change makes this code smaller!
> 
> Original series description:
> 
> If you have a many-core machine, and have many threads all wanting to
> briefly lock a give file (udev is known to do this), you can get quite
> poor performance.
> 
> When one thread releases a lock, it wakes up all other threads that
> are waiting (classic thundering-herd) - one will get the lock and the
> others go to sleep.
> When you have few cores, this is not very noticeable: by the time the
> 4th or 5th thread gets enough CPU time to try to claim the lock, the
> earlier threads have claimed it, done what was needed, and released.
> With 50+ cores, the contention can easily be measured.
> 
> This patchset creates a tree of pending lock request in which siblings
> don't conflict and each lock request does conflict with its parent.
> When a lock is released, only requests which don't conflict with each
> other a woken.
> 
> Testing shows that lock-acquisitions-per-second is now fairly stable even
> as number of contending process goes to 1000.  Without this patch,
> locks-per-second drops off steeply after a few 10s of processes.
> 
> There is a small cost to this extra complexity.
> At 20 processes running a particular test on 72 cores, the lock
> acquisitions per second drops from 1.8 million to 1.4 million with
> this patch.  For 100 processes, this patch still provides 1.4 million
> while without this patch there are about 700,000.

These details are all really useful motivation for the patches.  It'd be
nice to have them in the permanent record somehow.  Maybe just merge it
into the changelog on "fs/locks: create a tree of dependent requests."?

--b.

> 
> NeilBrown
> 
> 
> ---
> 
> NeilBrown (12):
>       fs/locks: rename some lists and pointers.
>       fs/locks: split out __locks_wake_up_blocks().
>       NFS: use locks_copy_lock() to copy locks.
>       gfs2: properly initial file_lock used for unlock.
>       ocfs2: properly initial file_lock used for unlock.
>       locks: use properly initialized file_lock when unlocking.
>       fs/locks: allow a lock request to block other requests.
>       fs/locks: always delete_block after waiting.
>       fs/locks: change all *_conflict() functions to return bool.
>       fs/locks: create a tree of dependent requests.
>       locks: merge posix_unblock_lock() and locks_delete_block()
>       VFS: locks: remove unnecessary white space.
> 
> 
>  fs/cifs/file.c                  |    4 -
>  fs/gfs2/file.c                  |   10 +-
>  fs/lockd/svclock.c              |    2 
>  fs/locks.c                      |  253 +++++++++++++++++++++------------------
>  fs/nfs/nfs4proc.c               |    6 +
>  fs/nfsd/nfs4state.c             |    6 -
>  fs/ocfs2/locks.c                |   10 +-
>  include/linux/fs.h              |   11 +-
>  include/trace/events/filelock.h |   16 +-
>  9 files changed, 173 insertions(+), 145 deletions(-)
> 
> --
> Signature

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/12] fs/locks: rename some lists and pointers.
  2018-11-08 20:26   ` J. Bruce Fields
@ 2018-11-09  0:32     ` NeilBrown
  2018-11-09  3:11       ` J. Bruce Fields
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-09  0:32 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 14379 bytes --]

On Thu, Nov 08 2018, J. Bruce Fields wrote:

> On Mon, Nov 05, 2018 at 12:30:47PM +1100, NeilBrown wrote:
>> struct file lock contains an 'fl_next' pointer which
>> is used to point to the lock that this request is blocked
>> waiting for.  So rename it to fl_blocker.
>> 
>> The fl_blocked list_head in an active lock is the head of a list of
>> blocked requests.  In a request it is a node in that list.
>> These are two distinct uses, so replace with two list_heads
>> with different names.
>> fl_blocked is the head of a list of blocked requests
>> fl_block is a node on that list.
>
> Reading these, I have a lot of trouble keeping fl_blocked, fl_block, and
> fl_blocker straight.  Is it just me?

"Naming is hard" - but that is no excuse.
I suspect it isn't just you.

I particularly like "fl_blocker".

		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);

reads well to me - wait until this lock has a no blocker - i.e. until
nothing blocks it.

fl_blocked could be fl_blockees (the things that I block), but I doubt
that is an improvement.

>
> I guess they form a tree, so fl_children, fl_siblings, and fl_parent
> might be easier for me to keep straight.

This requires one to know a priori that the tree records which locks
block which requests, which is obvious to us now, but might not be so
obvious in 5 years time when we look at this code again.

An I have never really liked the "siblings" naming. 'struct dentry' uses
"d_child", which is possibly ever more confusing.
I would like it to be obvious that this is a list-member, not a
list-head.  Rusty once posted patches to allow the list head to be a
different type to the members, but that fell on deaf ears.
So
   fl_blocked_member
might be an improvement - this is a member of the fl_blocked list.
It would be easier to search for than fl_block - which needs
fl_block[^a-z] to avoid false positives.

I'd be quite happy to change fl_block is any two people can agree on a
better name.  I'm less inclined to change the others without a really
good proposal.

Hmmm. what is the inverse of "Block"?  If I block you then you ....  I
know, you are a usurper.
So
  fl_blocker points to the "parent"
  fl_usurpers is a list of "children"
  fl_usurpers_member is my linkage in that list.
or not.

Thanks,
NeilBrown


>
> --b.
>
>> 
>> The two different list_heads are never used at the same time, but that
>> will change in a future patch.
>> 
>> Note that a tracepoint is changed to report fl_blocker instead
>> of fl_next.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  fs/cifs/file.c                  |    2 +-
>>  fs/locks.c                      |   38 ++++++++++++++++++++------------------
>>  include/linux/fs.h              |    7 +++++--
>>  include/trace/events/filelock.h |   16 ++++++++--------
>>  4 files changed, 34 insertions(+), 29 deletions(-)
>> 
>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>> index 74c33d5fafc8..d7ed895e05d1 100644
>> --- a/fs/cifs/file.c
>> +++ b/fs/cifs/file.c
>> @@ -1103,7 +1103,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
>>  	rc = posix_lock_file(file, flock, NULL);
>>  	up_write(&cinode->lock_sem);
>>  	if (rc == FILE_LOCK_DEFERRED) {
>> -		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
>> +		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
>>  		if (!rc)
>>  			goto try_again;
>>  		posix_unblock_lock(flock);
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 2ecb4db8c840..a6c6d601286c 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -189,7 +189,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
>>   * This lock protects the blocked_hash. Generally, if you're accessing it, you
>>   * want to be holding this lock.
>>   *
>> - * In addition, it also protects the fl->fl_block list, and the fl->fl_next
>> + * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker
>>   * pointer for file_lock structures that are acting as lock requests (in
>>   * contrast to those that are acting as records of acquired locks).
>>   *
>> @@ -293,6 +293,7 @@ static void locks_init_lock_heads(struct file_lock *fl)
>>  {
>>  	INIT_HLIST_NODE(&fl->fl_link);
>>  	INIT_LIST_HEAD(&fl->fl_list);
>> +	INIT_LIST_HEAD(&fl->fl_blocked);
>>  	INIT_LIST_HEAD(&fl->fl_block);
>>  	init_waitqueue_head(&fl->fl_wait);
>>  }
>> @@ -332,6 +333,7 @@ void locks_free_lock(struct file_lock *fl)
>>  {
>>  	BUG_ON(waitqueue_active(&fl->fl_wait));
>>  	BUG_ON(!list_empty(&fl->fl_list));
>> +	BUG_ON(!list_empty(&fl->fl_blocked));
>>  	BUG_ON(!list_empty(&fl->fl_block));
>>  	BUG_ON(!hlist_unhashed(&fl->fl_link));
>>  
>> @@ -667,7 +669,7 @@ static void __locks_delete_block(struct file_lock *waiter)
>>  {
>>  	locks_delete_global_blocked(waiter);
>>  	list_del_init(&waiter->fl_block);
>> -	waiter->fl_next = NULL;
>> +	waiter->fl_blocker = NULL;
>>  }
>>  
>>  static void locks_delete_block(struct file_lock *waiter)
>> @@ -683,16 +685,16 @@ static void locks_delete_block(struct file_lock *waiter)
>>   * it seems like the reasonable thing to do.
>>   *
>>   * Must be called with both the flc_lock and blocked_lock_lock held. The
>> - * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
>> + * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
>>   * that the flc_lock is also held on insertions we can avoid taking the
>> - * blocked_lock_lock in some cases when we see that the fl_block list is empty.
>> + * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
>>   */
>>  static void __locks_insert_block(struct file_lock *blocker,
>>  					struct file_lock *waiter)
>>  {
>>  	BUG_ON(!list_empty(&waiter->fl_block));
>> -	waiter->fl_next = blocker;
>> -	list_add_tail(&waiter->fl_block, &blocker->fl_block);
>> +	waiter->fl_blocker = blocker;
>> +	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
>>  	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
>>  		locks_insert_global_blocked(waiter);
>>  }
>> @@ -716,18 +718,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
>>  	/*
>>  	 * Avoid taking global lock if list is empty. This is safe since new
>>  	 * blocked requests are only added to the list under the flc_lock, and
>> -	 * the flc_lock is always held here. Note that removal from the fl_block
>> +	 * the flc_lock is always held here. Note that removal from the fl_blocked
>>  	 * list does not require the flc_lock, so we must recheck list_empty()
>>  	 * after acquiring the blocked_lock_lock.
>>  	 */
>> -	if (list_empty(&blocker->fl_block))
>> +	if (list_empty(&blocker->fl_blocked))
>>  		return;
>>  
>>  	spin_lock(&blocked_lock_lock);
>> -	while (!list_empty(&blocker->fl_block)) {
>> +	while (!list_empty(&blocker->fl_blocked)) {
>>  		struct file_lock *waiter;
>>  
>> -		waiter = list_first_entry(&blocker->fl_block,
>> +		waiter = list_first_entry(&blocker->fl_blocked,
>>  				struct file_lock, fl_block);
>>  		__locks_delete_block(waiter);
>>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> @@ -878,7 +880,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
>>  
>>  	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
>>  		if (posix_same_owner(fl, block_fl))
>> -			return fl->fl_next;
>> +			return fl->fl_blocker;
>>  	}
>>  	return NULL;
>>  }
>> @@ -1237,7 +1239,7 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		error = posix_lock_inode(inode, fl, NULL);
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
>> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>>  		if (!error)
>>  			continue;
>>  
>> @@ -1324,7 +1326,7 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>>  		error = posix_lock_inode(inode, &fl, NULL);
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>> -		error = wait_event_interruptible(fl.fl_wait, !fl.fl_next);
>> +		error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
>>  		if (!error) {
>>  			/*
>>  			 * If we've been sleeping someone might have
>> @@ -1518,7 +1520,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>  
>>  	locks_dispose_list(&dispose);
>>  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
>> -						!new_fl->fl_next, break_time);
>> +						!new_fl->fl_blocker, break_time);
>>  
>>  	percpu_down_read_preempt_disable(&file_rwsem);
>>  	spin_lock(&ctx->flc_lock);
>> @@ -1931,7 +1933,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		error = flock_lock_inode(inode, fl);
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
>> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>>  		if (!error)
>>  			continue;
>>  
>> @@ -2210,7 +2212,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>>  		error = vfs_lock_file(filp, cmd, fl, NULL);
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>> -		error = wait_event_interruptible(fl->fl_wait, !fl->fl_next);
>> +		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>>  		if (!error)
>>  			continue;
>>  
>> @@ -2581,7 +2583,7 @@ posix_unblock_lock(struct file_lock *waiter)
>>  	int status = 0;
>>  
>>  	spin_lock(&blocked_lock_lock);
>> -	if (waiter->fl_next)
>> +	if (waiter->fl_blocker)
>>  		__locks_delete_block(waiter);
>>  	else
>>  		status = -ENOENT;
>> @@ -2707,7 +2709,7 @@ static int locks_show(struct seq_file *f, void *v)
>>  
>>  	lock_get_status(f, fl, iter->li_pos, "");
>>  
>> -	list_for_each_entry(bfl, &fl->fl_block, fl_block)
>> +	list_for_each_entry(bfl, &fl->fl_blocked, fl_block)
>>  		lock_get_status(f, bfl, iter->li_pos, " ->");
>>  
>>  	return 0;
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index c95c0807471f..a161bcdca8a2 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1044,10 +1044,13 @@ bool opens_in_grace(struct net *);
>>   * Obviously, the last two criteria only matter for POSIX locks.
>>   */
>>  struct file_lock {
>> -	struct file_lock *fl_next;	/* singly linked list for this inode  */
>> +	struct file_lock *fl_blocker;	/* The lock, that is blocking us */
>>  	struct list_head fl_list;	/* link into file_lock_context */
>>  	struct hlist_node fl_link;	/* node in global lists */
>> -	struct list_head fl_block;	/* circular list of blocked processes */
>> +	struct list_head fl_blocked;	/* list of requests with
>> +					 * ->fl_blocker pointing here */
>> +	struct list_head fl_block;	/* node in
>> +					 * ->fl_blocker->fl_blocked */
>>  	fl_owner_t fl_owner;
>>  	unsigned int fl_flags;
>>  	unsigned char fl_type;
>> diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
>> index 68b17c116907..fad7befa612d 100644
>> --- a/include/trace/events/filelock.h
>> +++ b/include/trace/events/filelock.h
>> @@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
>>  		__field(struct file_lock *, fl)
>>  		__field(unsigned long, i_ino)
>>  		__field(dev_t, s_dev)
>> -		__field(struct file_lock *, fl_next)
>> +		__field(struct file_lock *, fl_blocker)
>>  		__field(fl_owner_t, fl_owner)
>>  		__field(unsigned int, fl_pid)
>>  		__field(unsigned int, fl_flags)
>> @@ -82,7 +82,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
>>  		__entry->fl = fl ? fl : NULL;
>>  		__entry->s_dev = inode->i_sb->s_dev;
>>  		__entry->i_ino = inode->i_ino;
>> -		__entry->fl_next = fl ? fl->fl_next : NULL;
>> +		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
>>  		__entry->fl_owner = fl ? fl->fl_owner : NULL;
>>  		__entry->fl_pid = fl ? fl->fl_pid : 0;
>>  		__entry->fl_flags = fl ? fl->fl_flags : 0;
>> @@ -92,9 +92,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
>>  		__entry->ret = ret;
>>  	),
>>  
>> -	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
>> +	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
>>  		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
>> -		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
>> +		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
>>  		__entry->fl_pid, show_fl_flags(__entry->fl_flags),
>>  		show_fl_type(__entry->fl_type),
>>  		__entry->fl_start, __entry->fl_end, __entry->ret)
>> @@ -125,7 +125,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
>>  		__field(struct file_lock *, fl)
>>  		__field(unsigned long, i_ino)
>>  		__field(dev_t, s_dev)
>> -		__field(struct file_lock *, fl_next)
>> +		__field(struct file_lock *, fl_blocker)
>>  		__field(fl_owner_t, fl_owner)
>>  		__field(unsigned int, fl_flags)
>>  		__field(unsigned char, fl_type)
>> @@ -137,7 +137,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
>>  		__entry->fl = fl ? fl : NULL;
>>  		__entry->s_dev = inode->i_sb->s_dev;
>>  		__entry->i_ino = inode->i_ino;
>> -		__entry->fl_next = fl ? fl->fl_next : NULL;
>> +		__entry->fl_blocker = fl ? fl->fl_blocker : NULL;
>>  		__entry->fl_owner = fl ? fl->fl_owner : NULL;
>>  		__entry->fl_flags = fl ? fl->fl_flags : 0;
>>  		__entry->fl_type = fl ? fl->fl_type : 0;
>> @@ -145,9 +145,9 @@ DECLARE_EVENT_CLASS(filelock_lease,
>>  		__entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
>>  	),
>>  
>> -	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
>> +	TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
>>  		__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
>> -		__entry->i_ino, __entry->fl_next, __entry->fl_owner,
>> +		__entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
>>  		show_fl_flags(__entry->fl_flags),
>>  		show_fl_type(__entry->fl_type),
>>  		__entry->fl_break_time, __entry->fl_downgrade_time)
>> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-08 21:30   ` J. Bruce Fields
@ 2018-11-09  0:38     ` NeilBrown
  2018-11-09  3:09       ` J. Bruce Fields
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-09  0:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5173 bytes --]

On Thu, Nov 08 2018, J. Bruce Fields wrote:

> On Mon, Nov 05, 2018 at 12:30:48PM +1100, NeilBrown wrote:
>> When we find an existing lock which conflicts with a request,
>> and the request wants to wait, we currently add the request
>> to a list.  When the lock is removed, the whole list is woken.
>> This can cause the thundering-herd problem.
>> To reduce the problem, we make use of the (new) fact that
>> a pending request can itself have a list of blocked requests.
>> When we find a conflict, we look through the existing blocked requests.
>> If any one of them blocks the new request, the new request is attached
>> below that request, otherwise it is added to the list of blocked
>> requests, which are now known to be mutually non-conflicting.
>> 
>> This way, when the lock is released, only a set of non-conflicting
>> locks will be woken, the rest can stay asleep.
>> If the lock request cannot be granted and the request needs to be
>> requeued, all the other requests it blocks will then be woken
>
> So, to make sure I understand: the tree of blocking locks only ever has
> three levels (the active lock, the locks blocking on it, and their
> children?)

Not correct.
Blocks is only vertical, never horizontal.  Siblings never block each
other.
So one process hold a lock on a byte, and 27 other process want a lock
on that byte, then there will be 28 levels in a narrow tree - it is
effectively a queue.
Branching (via siblings) only happens when a child conflict with only
part of the lock held by the parent.
So if one process locks 32K, then two other processes request locks on
the 2 16K halves, then 4 processes request locks on the 8K quarters, and
so-on, then you could end up with 32767 processes in a binary tree, with
half of them all waiting on different individual bytes.

NeilBrown

>
> --b.
>
>> 
>> Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  fs/locks.c |   29 +++++++++++++++++++++++------
>>  1 file changed, 23 insertions(+), 6 deletions(-)
>> 
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 802d5853acd5..1b0eac6b2918 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -715,11 +715,25 @@ static void locks_delete_block(struct file_lock *waiter)
>>   * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
>>   * that the flc_lock is also held on insertions we can avoid taking the
>>   * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
>> + *
>> + * Rather than just adding to the list, we check for conflicts with any existing
>> + * waiters, and add beneath any waiter that blocks the new waiter.
>> + * Thus wakeups don't happen until needed.
>>   */
>>  static void __locks_insert_block(struct file_lock *blocker,
>> -					struct file_lock *waiter)
>> +				 struct file_lock *waiter,
>> +				 bool conflict(struct file_lock *,
>> +					       struct file_lock *))
>>  {
>> +	struct file_lock *fl;
>>  	BUG_ON(!list_empty(&waiter->fl_block));
>> +
>> +new_blocker:
>> +	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
>> +		if (conflict(fl, waiter)) {
>> +			blocker =  fl;
>> +			goto new_blocker;
>> +		}
>>  	waiter->fl_blocker = blocker;
>>  	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
>>  	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
>> @@ -734,10 +748,12 @@ static void __locks_insert_block(struct file_lock *blocker,
>>  
>>  /* Must be called with flc_lock held. */
>>  static void locks_insert_block(struct file_lock *blocker,
>> -					struct file_lock *waiter)
>> +			       struct file_lock *waiter,
>> +			       bool conflict(struct file_lock *,
>> +					     struct file_lock *))
>>  {
>>  	spin_lock(&blocked_lock_lock);
>> -	__locks_insert_block(blocker, waiter);
>> +	__locks_insert_block(blocker, waiter, conflict);
>>  	spin_unlock(&blocked_lock_lock);
>>  }
>>  
>> @@ -996,7 +1012,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
>>  		if (!(request->fl_flags & FL_SLEEP))
>>  			goto out;
>>  		error = FILE_LOCK_DEFERRED;
>> -		locks_insert_block(fl, request);
>> +		locks_insert_block(fl, request, flock_locks_conflict);
>>  		goto out;
>>  	}
>>  	if (request->fl_flags & FL_ACCESS)
>> @@ -1071,7 +1087,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
>>  			spin_lock(&blocked_lock_lock);
>>  			if (likely(!posix_locks_deadlock(request, fl))) {
>>  				error = FILE_LOCK_DEFERRED;
>> -				__locks_insert_block(fl, request);
>> +				__locks_insert_block(fl, request,
>> +						     posix_locks_conflict);
>>  			}
>>  			spin_unlock(&blocked_lock_lock);
>>  			goto out;
>> @@ -1542,7 +1559,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>  		break_time -= jiffies;
>>  	if (break_time == 0)
>>  		break_time++;
>> -	locks_insert_block(fl, new_fl);
>> +	locks_insert_block(fl, new_fl, leases_conflict);
>>  	trace_break_lease_block(inode, new_fl);
>>  	spin_unlock(&ctx->flc_lock);
>>  	percpu_up_read_preempt_enable(&file_rwsem);
>> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-09  0:38     ` NeilBrown
@ 2018-11-09  3:09       ` J. Bruce Fields
  2018-11-09  6:24         ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-09  3:09 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Fri, Nov 09, 2018 at 11:38:19AM +1100, NeilBrown wrote:
> On Thu, Nov 08 2018, J. Bruce Fields wrote:
> 
> > On Mon, Nov 05, 2018 at 12:30:48PM +1100, NeilBrown wrote:
> >> When we find an existing lock which conflicts with a request,
> >> and the request wants to wait, we currently add the request
> >> to a list.  When the lock is removed, the whole list is woken.
> >> This can cause the thundering-herd problem.
> >> To reduce the problem, we make use of the (new) fact that
> >> a pending request can itself have a list of blocked requests.
> >> When we find a conflict, we look through the existing blocked requests.
> >> If any one of them blocks the new request, the new request is attached
> >> below that request, otherwise it is added to the list of blocked
> >> requests, which are now known to be mutually non-conflicting.
> >> 
> >> This way, when the lock is released, only a set of non-conflicting
> >> locks will be woken, the rest can stay asleep.
> >> If the lock request cannot be granted and the request needs to be
> >> requeued, all the other requests it blocks will then be woken
> >
> > So, to make sure I understand: the tree of blocking locks only ever has
> > three levels (the active lock, the locks blocking on it, and their
> > children?)
> 
> Not correct.
> Blocks is only vertical, never horizontal.  Siblings never block each
> other.
> So one process hold a lock on a byte, and 27 other process want a lock
> on that byte, then there will be 28 levels in a narrow tree - it is
> effectively a queue.
> Branching (via siblings) only happens when a child conflict with only
> part of the lock held by the parent.
> So if one process locks 32K, then two other processes request locks on
> the 2 16K halves, then 4 processes request locks on the 8K quarters, and
> so-on, then you could end up with 32767 processes in a binary tree, with
> half of them all waiting on different individual bytes.

Maybe I should actually read the code carefully instead of just skimming
the changelog and jumping to conclusions.

I think this is correct, but I wish we had an actual written-out
argument that it's correct, because intuition isn't a great guide for
posix file locks.

Maybe:

Waiting and applied locks are all kept in trees whose properties are:

	- the root of a tree may be an applied or unapplied lock.
	- every other node in the tree is an unapplied lock that
	  conflicts with every ancestor of that node.

Every such tree begins life as an unapplied singleton which obviously
satisfies the above properties.

The only ways we modify trees preserve these properties:

	1. We may add a new child, but only after first verifying that it
	   conflicts with all of its ancestors.
	2. We may remove the root of a tree, creating a new singleton
	   tree from the root and N new trees rooted in the immediate
	   children.
	3. If the root of a tree is not currently an applied lock, we may
	   apply it (if possible).
	4. We may upgrade the root of the tree (either extend its range,
	   or upgrade its entire range from read to write).

When an applied lock is modified in a way that reduces or downgrades any
part of its range, we remove all its children (2 above).

For each of those child trees: if the root of the tree applies, we do so
(3).  If it doesn't, it must conflict with some applied lock.  We remove
all of its children (2), and add it is a new leaf to the tree rooted in
the applied lock (1).  We then repeat the process recursively with those
children.

Something like that.

--b.

> 
> NeilBrown
> 
> >
> > --b.
> >
> >> 
> >> Reported-and-tested-by: Martin Wilck <mwilck@suse.de>
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >>  fs/locks.c |   29 +++++++++++++++++++++++------
> >>  1 file changed, 23 insertions(+), 6 deletions(-)
> >> 
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index 802d5853acd5..1b0eac6b2918 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -715,11 +715,25 @@ static void locks_delete_block(struct file_lock *waiter)
> >>   * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
> >>   * that the flc_lock is also held on insertions we can avoid taking the
> >>   * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
> >> + *
> >> + * Rather than just adding to the list, we check for conflicts with any existing
> >> + * waiters, and add beneath any waiter that blocks the new waiter.
> >> + * Thus wakeups don't happen until needed.
> >>   */
> >>  static void __locks_insert_block(struct file_lock *blocker,
> >> -					struct file_lock *waiter)
> >> +				 struct file_lock *waiter,
> >> +				 bool conflict(struct file_lock *,
> >> +					       struct file_lock *))
> >>  {
> >> +	struct file_lock *fl;
> >>  	BUG_ON(!list_empty(&waiter->fl_block));
> >> +
> >> +new_blocker:
> >> +	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
> >> +		if (conflict(fl, waiter)) {
> >> +			blocker =  fl;
> >> +			goto new_blocker;
> >> +		}
> >>  	waiter->fl_blocker = blocker;
> >>  	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
> >>  	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> >> @@ -734,10 +748,12 @@ static void __locks_insert_block(struct file_lock *blocker,
> >>  
> >>  /* Must be called with flc_lock held. */
> >>  static void locks_insert_block(struct file_lock *blocker,
> >> -					struct file_lock *waiter)
> >> +			       struct file_lock *waiter,
> >> +			       bool conflict(struct file_lock *,
> >> +					     struct file_lock *))
> >>  {
> >>  	spin_lock(&blocked_lock_lock);
> >> -	__locks_insert_block(blocker, waiter);
> >> +	__locks_insert_block(blocker, waiter, conflict);
> >>  	spin_unlock(&blocked_lock_lock);
> >>  }
> >>  
> >> @@ -996,7 +1012,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
> >>  		if (!(request->fl_flags & FL_SLEEP))
> >>  			goto out;
> >>  		error = FILE_LOCK_DEFERRED;
> >> -		locks_insert_block(fl, request);
> >> +		locks_insert_block(fl, request, flock_locks_conflict);
> >>  		goto out;
> >>  	}
> >>  	if (request->fl_flags & FL_ACCESS)
> >> @@ -1071,7 +1087,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
> >>  			spin_lock(&blocked_lock_lock);
> >>  			if (likely(!posix_locks_deadlock(request, fl))) {
> >>  				error = FILE_LOCK_DEFERRED;
> >> -				__locks_insert_block(fl, request);
> >> +				__locks_insert_block(fl, request,
> >> +						     posix_locks_conflict);
> >>  			}
> >>  			spin_unlock(&blocked_lock_lock);
> >>  			goto out;
> >> @@ -1542,7 +1559,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >>  		break_time -= jiffies;
> >>  	if (break_time == 0)
> >>  		break_time++;
> >> -	locks_insert_block(fl, new_fl);
> >> +	locks_insert_block(fl, new_fl, leases_conflict);
> >>  	trace_break_lease_block(inode, new_fl);
> >>  	spin_unlock(&ctx->flc_lock);
> >>  	percpu_up_read_preempt_enable(&file_rwsem);
> >> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/12] fs/locks: rename some lists and pointers.
  2018-11-09  0:32     ` NeilBrown
@ 2018-11-09  3:11       ` J. Bruce Fields
  0 siblings, 0 replies; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-09  3:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Fri, Nov 09, 2018 at 11:32:16AM +1100, NeilBrown wrote:
> On Thu, Nov 08 2018, J. Bruce Fields wrote:
> 
> > On Mon, Nov 05, 2018 at 12:30:47PM +1100, NeilBrown wrote:
> >> struct file lock contains an 'fl_next' pointer which
> >> is used to point to the lock that this request is blocked
> >> waiting for.  So rename it to fl_blocker.
> >> 
> >> The fl_blocked list_head in an active lock is the head of a list of
> >> blocked requests.  In a request it is a node in that list.
> >> These are two distinct uses, so replace with two list_heads
> >> with different names.
> >> fl_blocked is the head of a list of blocked requests
> >> fl_block is a node on that list.
> >
> > Reading these, I have a lot of trouble keeping fl_blocked, fl_block, and
> > fl_blocker straight.  Is it just me?
> 
> "Naming is hard" - but that is no excuse.
> I suspect it isn't just you.
> 
> I particularly like "fl_blocker".
> 
> 		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> 
> reads well to me - wait until this lock has a no blocker - i.e. until
> nothing blocks it.
> 
> fl_blocked could be fl_blockees (the things that I block), but I doubt
> that is an improvement.

Maybe.  The plural might help me remember that it's the head of a list?

> > I guess they form a tree, so fl_children, fl_siblings, and fl_parent
> > might be easier for me to keep straight.
> 
> This requires one to know a priori that the tree records which locks
> block which requests, which is obvious to us now, but might not be so
> obvious in 5 years time when we look at this code again.
> 
> An I have never really liked the "siblings" naming. 'struct dentry' uses
> "d_child", which is possibly ever more confusing.
> I would like it to be obvious that this is a list-member, not a
> list-head.  Rusty once posted patches to allow the list head to be a
> different type to the members, but that fell on deaf ears.
> So
>    fl_blocked_member
> might be an improvement - this is a member of the fl_blocked list.
> It would be easier to search for than fl_block - which needs
> fl_block[^a-z] to avoid false positives.

Yeah, maybe, if it's not too long.

> I'd be quite happy to change fl_block is any two people can agree on a
> better name.  I'm less inclined to change the others without a really
> good proposal.
> 
> Hmmm. what is the inverse of "Block"?  If I block you then you ....  I
> know, you are a usurper.
> So
>   fl_blocker points to the "parent"
>   fl_usurpers is a list of "children"
>   fl_usurpers_member is my linkage in that list.
> or not.

"Usurper" isn't doing it for me.

Yeah, I've got no clever scheme.

--b.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-09  3:09       ` J. Bruce Fields
@ 2018-11-09  6:24         ` NeilBrown
  2018-11-09 15:08           ` J. Bruce Fields
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2018-11-09  6:24 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

On Thu, Nov 08 2018, J. Bruce Fields wrote:

> On Fri, Nov 09, 2018 at 11:38:19AM +1100, NeilBrown wrote:
>> On Thu, Nov 08 2018, J. Bruce Fields wrote:
>> 
>> > On Mon, Nov 05, 2018 at 12:30:48PM +1100, NeilBrown wrote:
>> >> When we find an existing lock which conflicts with a request,
>> >> and the request wants to wait, we currently add the request
>> >> to a list.  When the lock is removed, the whole list is woken.
>> >> This can cause the thundering-herd problem.
>> >> To reduce the problem, we make use of the (new) fact that
>> >> a pending request can itself have a list of blocked requests.
>> >> When we find a conflict, we look through the existing blocked requests.
>> >> If any one of them blocks the new request, the new request is attached
>> >> below that request, otherwise it is added to the list of blocked
>> >> requests, which are now known to be mutually non-conflicting.
>> >> 
>> >> This way, when the lock is released, only a set of non-conflicting
>> >> locks will be woken, the rest can stay asleep.
>> >> If the lock request cannot be granted and the request needs to be
>> >> requeued, all the other requests it blocks will then be woken
>> >
>> > So, to make sure I understand: the tree of blocking locks only ever has
>> > three levels (the active lock, the locks blocking on it, and their
>> > children?)
>> 
>> Not correct.
>> Blocks is only vertical, never horizontal.  Siblings never block each
>> other.
>> So one process hold a lock on a byte, and 27 other process want a lock
>> on that byte, then there will be 28 levels in a narrow tree - it is
>> effectively a queue.
>> Branching (via siblings) only happens when a child conflict with only
>> part of the lock held by the parent.
>> So if one process locks 32K, then two other processes request locks on
>> the 2 16K halves, then 4 processes request locks on the 8K quarters, and
>> so-on, then you could end up with 32767 processes in a binary tree, with
>> half of them all waiting on different individual bytes.
>
> Maybe I should actually read the code carefully instead of just skimming
> the changelog and jumping to conclusions.
>
> I think this is correct, but I wish we had an actual written-out
> argument that it's correct, because intuition isn't a great guide for
> posix file locks.
>
> Maybe:
>
> Waiting and applied locks are all kept in trees whose properties are:
>
> 	- the root of a tree may be an applied or unapplied lock.
> 	- every other node in the tree is an unapplied lock that
> 	  conflicts with every ancestor of that node.
>
> Every such tree begins life as an unapplied singleton which obviously
> satisfies the above properties.
>
> The only ways we modify trees preserve these properties:
>
> 	1. We may add a new child, but only after first verifying that it
> 	   conflicts with all of its ancestors.
> 	2. We may remove the root of a tree, creating a new singleton
> 	   tree from the root and N new trees rooted in the immediate
> 	   children.
> 	3. If the root of a tree is not currently an applied lock, we may
> 	   apply it (if possible).
> 	4. We may upgrade the root of the tree (either extend its range,
> 	   or upgrade its entire range from read to write).
>
> When an applied lock is modified in a way that reduces or downgrades any
> part of its range, we remove all its children (2 above).
>
> For each of those child trees: if the root of the tree applies, we do so
> (3).  If it doesn't, it must conflict with some applied lock.  We remove
> all of its children (2), and add it is a new leaf to the tree rooted in
> the applied lock (1).  We then repeat the process recursively with those
> children.
>

Thanks pretty thorough - and even looks correct.
I'll re-reading some time when it isn't late, and maybe make it into a
comment in the code.
I agree, this sort of documentation can be quite helpful.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 10/12] fs/locks: create a tree of dependent requests.
  2018-11-09  6:24         ` NeilBrown
@ 2018-11-09 15:08           ` J. Bruce Fields
  0 siblings, 0 replies; 26+ messages in thread
From: J. Bruce Fields @ 2018-11-09 15:08 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Alexander Viro, Martin Wilck, linux-fsdevel,
	Frank Filz, linux-kernel

On Fri, Nov 09, 2018 at 05:24:04PM +1100, NeilBrown wrote:
> Thanks pretty thorough - and even looks correct.
> I'll re-reading some time when it isn't late, and maybe make it into a
> comment in the code.
> I agree, this sort of documentation can be quite helpful.

OK.  The idea looks sound to me, and the only problems I found were
documentation.  (I'd like this and the details in the 0/12 mail captured
somewhere.  And if we came up with better blocker/blocked/block naming
I'd be happier, though I definitely wouldn't want the patches help up
for that.)  Feel free to add my ACK to the series.

--b.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 06/12] locks: use properly initialized file_lock when unlocking.
  2018-11-12  1:14 [PATCH 00/12 v5] locks: avoid thundering-herd wake-ups NeilBrown
@ 2018-11-12  1:14 ` NeilBrown
  0 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2018-11-12  1:14 UTC (permalink / raw)
  To: Jeff Layton, Alexander Viro
  Cc: J. Bruce Fields, Martin Wilck, linux-fsdevel, Frank Filz, linux-kernel

Both locks_remove_posix() and locks_remove_flock() use a
struct file_lock without calling locks_init_lock() on it.
This means the various list_heads are not initialized, which
will become a problem with a later patch.

So change them both to initialize properly.  For flock locks,
this involves using flock_make_lock(), and changing it to
allow a file_lock to be passed in, so memory allocation isn't
always needed.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/locks.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 6adb22a21a92..492d970c67d7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -418,17 +418,20 @@ static inline int flock_translate_cmd(int cmd) {
 
 /* Fill in a file_lock structure with an appropriate FLOCK lock. */
 static struct file_lock *
-flock_make_lock(struct file *filp, unsigned int cmd)
+flock_make_lock(struct file *filp, unsigned int cmd, struct file_lock *fl)
 {
-	struct file_lock *fl;
 	int type = flock_translate_cmd(cmd);
 
 	if (type < 0)
 		return ERR_PTR(type);
 	
-	fl = locks_alloc_lock();
-	if (fl == NULL)
-		return ERR_PTR(-ENOMEM);
+	if (fl == NULL) {
+		fl = locks_alloc_lock();
+		if (fl == NULL)
+			return ERR_PTR(-ENOMEM);
+	} else {
+		locks_init_lock(fl);
+	}
 
 	fl->fl_file = filp;
 	fl->fl_owner = filp;
@@ -2009,7 +2012,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	    !(f.file->f_mode & (FMODE_READ|FMODE_WRITE)))
 		goto out_putf;
 
-	lock = flock_make_lock(f.file, cmd);
+	lock = flock_make_lock(f.file, cmd, NULL);
 	if (IS_ERR(lock)) {
 		error = PTR_ERR(lock);
 		goto out_putf;
@@ -2484,6 +2487,7 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner)
 	if (!ctx || list_empty(&ctx->flc_posix))
 		return;
 
+	locks_init_lock(&lock);
 	lock.fl_type = F_UNLCK;
 	lock.fl_flags = FL_POSIX | FL_CLOSE;
 	lock.fl_start = 0;
@@ -2507,19 +2511,15 @@ EXPORT_SYMBOL(locks_remove_posix);
 static void
 locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
 {
-	struct file_lock fl = {
-		.fl_owner = filp,
-		.fl_pid = current->tgid,
-		.fl_file = filp,
-		.fl_flags = FL_FLOCK | FL_CLOSE,
-		.fl_type = F_UNLCK,
-		.fl_end = OFFSET_MAX,
-	};
+	struct file_lock fl;
 	struct inode *inode = locks_inode(filp);
 
 	if (list_empty(&flctx->flc_flock))
 		return;
 
+	flock_make_lock(filp, LOCK_UN, &fl);
+	fl.fl_flags |= FL_CLOSE;
+
 	if (filp->f_op->flock)
 		filp->f_op->flock(filp, F_SETLKW, &fl);
 	else



^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-11-12  1:17 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-05  1:30 [PATCH 00/12] Series short description NeilBrown
2018-11-05  1:30 ` [PATCH 02/12] fs/locks: split out __locks_wake_up_blocks() NeilBrown
2018-11-05  1:30 ` [PATCH 08/12] fs/locks: always delete_block after waiting NeilBrown
2018-11-05  1:30 ` [PATCH 06/12] locks: use properly initialized file_lock when unlocking NeilBrown
2018-11-05  1:30 ` [PATCH 01/12] fs/locks: rename some lists and pointers NeilBrown
2018-11-08 20:26   ` J. Bruce Fields
2018-11-09  0:32     ` NeilBrown
2018-11-09  3:11       ` J. Bruce Fields
2018-11-05  1:30 ` [PATCH 04/12] gfs2: properly initial file_lock used for unlock NeilBrown
2018-11-05 12:18   ` Jeff Layton
2018-11-06  1:48     ` NeilBrown
2018-11-06 13:20       ` Jeff Layton
2018-11-05  1:30 ` [PATCH 05/12] ocfs2: " NeilBrown
2018-11-05  1:30 ` [PATCH 07/12] fs/locks: allow a lock request to block other requests NeilBrown
2018-11-05  1:30 ` [PATCH 09/12] fs/locks: change all *_conflict() functions to return bool NeilBrown
2018-11-05  1:30 ` [PATCH 03/12] NFS: use locks_copy_lock() to copy locks NeilBrown
2018-11-05  1:30 ` [PATCH 11/12] locks: merge posix_unblock_lock() and locks_delete_block() NeilBrown
2018-11-05  1:30 ` [PATCH 10/12] fs/locks: create a tree of dependent requests NeilBrown
2018-11-08 21:30   ` J. Bruce Fields
2018-11-09  0:38     ` NeilBrown
2018-11-09  3:09       ` J. Bruce Fields
2018-11-09  6:24         ` NeilBrown
2018-11-09 15:08           ` J. Bruce Fields
2018-11-05  1:30 ` [PATCH 12/12] VFS: locks: remove unnecessary white space NeilBrown
2018-11-08 21:35 ` [PATCH 00/12] Series short description J. Bruce Fields
2018-11-12  1:14 [PATCH 00/12 v5] locks: avoid thundering-herd wake-ups NeilBrown
2018-11-12  1:14 ` [PATCH 06/12] locks: use properly initialized file_lock when unlocking NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).