All of lore.kernel.org
* [PATCH v4 00/14] locks: scalability improvements for file locking
@ 2013-06-21 12:58 ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

This is the fourth iteration of this patchset; at this point, I think
it's probably ready for merge. There are a few small cleanups in this
set, but it's almost identical functionally to the v3 set. The cover
letter below is basically equivalent to the one I sent with the v3 set
as well. There was no measurable performance difference between this
set and that one, so I've left the results in there as-is.

Summary of Significant Changes:
-------------------------------
v4:
- eliminate unused argument to posix_unblock_lock
- fix potential race in locks_wake_up_blocks
- more comment cleanups and clarifications

v3:
- Change spinlock handling to avoid the need to traverse the global
  blocked_hash when generating /proc/locks output. This means that the
  fl_block list must continue to be protected by a global lock, but
  because the i_lock is also held in most cases, we can avoid taking
  that global lock in certain situations.

v2:
- Fix potential races in deadlock detection. Manipulation of global
  blocked_hash and deadlock detection are now atomic. This is a
  little slower than the earlier set, but is provably correct. Also,
  the patch that converts to using the i_lock has been split out from
  most of the other changes. That should make it easier to review, but
  it does leave a potential race in the deadlock detection that is fixed
  up by the following patch. It may make sense to fold patches 7 and 8
  together before merging.

- Add percpu hlists and lglocks for global file_lock_list. This gives
  us some speedup since this list is seldom read.

Abstract (tl;dr version):
-------------------------
This patchset represents an overhaul of the file locking code with an
aim toward improving its scalability and making the code a bit easier to
understand.

Longer version:
---------------
When the BKL was finally ripped out of the kernel in 2010, the strategy
taken for the file locking code was simply to replace it with a new
file_lock_lock spinlock. That was an expedient way to deal with the file
locking code at the time, but having a giant spinlock around all of this
code is clearly not great for scalability. Red Hat has bug reports going
back to the 2.6.18 era that point to BKL scalability problems in the
file locking code, and the file_lock_lock suffers from the same issues.

This patchset is my first attempt to make this code less dependent on
global locking. The main change is to switch most of the file locking
code to be protected by the inode->i_lock instead of the file_lock_lock.
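
To make the intent concrete, here's a minimal sketch of the pattern this
conversion moves toward (illustrative only, not code from the series; the
helper name is made up): per-inode lock state is walked under
inode->i_lock rather than under a global lock.

#include <linux/fs.h>
#include <linux/spinlock.h>

/*
 * Sketch only (hypothetical helper): walk this inode's lock list under
 * inode->i_lock instead of the old global file_lock_lock.
 */
static void example_walk_inode_locks(struct inode *inode)
{
        struct file_lock *fl;

        spin_lock(&inode->i_lock);
        for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
                /* examine or modify per-inode lock state here */
        }
        spin_unlock(&inode->i_lock);
}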

While that works for most things, there are a couple of global data
structures (lists in the current code) that need a global lock to
protect them. So we still need a global lock in order to deal with
those. The remaining patches are intended to make that global locking
less painful. The big gains come from turning the blocked_list into a
hashtable, which greatly speeds up the deadlock detection code, and from
making the file_lock_list percpu.
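
As a rough sketch of the percpu idea (illustrative only; the names are
made up, the real patches differ in detail, and this assumes fl_link has
already been converted to an hlist_node as a later patch in the series
does): each CPU gets its own hlist_head guarded by an lglock, so writers
only touch their local lock, while the rare reader (/proc/locks) takes
all of them.

#include <linux/lglock.h>
#include <linux/list.h>
#include <linux/percpu.h>

/* Sketch only: percpu "all locks" list guarded by an lglock. */
static DEFINE_PER_CPU(struct hlist_head, example_file_lock_list);
static DEFINE_STATIC_LGLOCK(example_file_lock_lglock);

/* Writers touch only the local CPU's list and lock. */
static void example_insert_lock(struct file_lock *fl)
{
        lg_local_lock(&example_file_lock_lglock);
        hlist_add_head(&fl->fl_link, this_cpu_ptr(&example_file_lock_list));
        lg_local_unlock(&example_file_lock_lglock);
}

/* The rare reader (e.g. /proc/locks) must lock every CPU's list. */
static void example_walk_all_locks(void)
{
        struct file_lock *fl;
        int cpu;

        lg_global_lock(&example_file_lock_lglock);
        for_each_possible_cpu(cpu) {
                hlist_for_each_entry(fl, per_cpu_ptr(&example_file_lock_list, cpu), fl_link) {
                        /* examine fl */
                }
        }
        lg_global_unlock(&example_file_lock_lglock);
}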

This is not the first attempt at doing this. The conversion to the
i_lock was originally attempted by Bruce Fields a few years ago. His
approach was NAK'ed since it involved ripping out the deadlock
detection. People also really seem to like /proc/locks for debugging, so
keeping that in is probably worthwhile.

There's more work to be done in this area and this patchset is just a
start. There's a horrible thundering herd problem when a blocking lock
is released, for instance. There was also interest in solving the goofy
"unlock on any close" POSIX lock semantics at this year's LSF. I think
this patchset will help lay the groundwork for those changes as well.

While file locking is not usually considered to be a high-performance
codepath, it *is* an IPC mechanism and I think it behooves us to try to
make it as fast as possible.

I'd like to see this considered for 3.11, but some soak time in -next
would be good. Comments and suggestions welcome.

Performance testing and results:
--------------------------------
In order to measure the benefit of this set, I've written some locking
performance tests that I've made available here:

    git://git.samba.org/jlayton/lockperf.git

Here are the results from the same 32-way, 4 NUMA node machine that I
used to generate the v2 patch results. The first number is the mean
time spent in locking for the test. The number in parentheses is the
standard deviation.

		3.10.0-rc5-00219-ga2648eb	3.10.0-rc5-00231-g7569869
---------------------------------------------------------------------------
flock01		24119.96 (266.08)		24542.51 (254.89)
flock02		 1345.09  (37.37)		    8.60   (0.31)
posix01		31217.14 (320.91)		24899.20 (254.27)
posix02		 1348.60  (36.83)		   12.70   (0.44)

I wasn't able to reserve the exact same smaller machine for testing this
set, but this one is comparable, with 4 CPUs and a UMA architecture:

		3.10.0-rc5-00219-ga2648eb	3.10.0-rc5-00231-g7569869
---------------------------------------------------------------------------
flock01		1787.51 (11.23)			1797.75  (9.27)
flock02		 314.90	 (8.84)			  34.87  (2.82)
posix01		1843.43 (11.63)			1880.47 (13.47)
posix02		 325.13  (8.53)			  54.09  (4.02)

I think the conclusion we can draw here is that this patchset is roughly
as fast as the previous one. In addition, the posix02 test saw a vast
increase in performance.

I believe that's mostly due to the fact that with this set I added a
patch that allows the code to avoid taking the global blocked_lock_lock
when waking up waiters if there aren't any. With that, the
blocked_lock_lock never has to be taken at all if there's no contention
for the file_lock (as is the case in the posix02 and flock02 tests).
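
For illustration only, here's a sketch of that optimization (not the
actual patch, though it uses names mentioned elsewhere in this series):
the wakeup path checks for waiters locklessly and only takes the global
blocked_lock_lock when there is actually someone to wake.

/*
 * Sketch only: skip blocked_lock_lock entirely when there are no waiters.
 * The lockless list_empty() check is safe here because a racing waiter
 * re-checks for conflicting locks before it sleeps, so a wakeup missed
 * by this fast path cannot strand it.
 */
static void example_wake_up_blocks(struct file_lock *blocker)
{
        struct file_lock *waiter;

        /* Fast path: nobody is waiting, so don't touch the global lock. */
        if (list_empty(&blocker->fl_block))
                return;

        spin_lock(&blocked_lock_lock);
        while (!list_empty(&blocker->fl_block)) {
                waiter = list_first_entry(&blocker->fl_block,
                                          struct file_lock, fl_block);
                __locks_delete_block(waiter);
                if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
                        waiter->fl_lmops->lm_notify(waiter);
                else
                        wake_up(&waiter->fl_wait);
        }
        spin_unlock(&blocked_lock_lock);
}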

Jeff Layton (14):
  locks: drop the unused filp argument to posix_unblock_lock
  cifs: use posix_unblock_lock instead of locks_delete_block
  locks: make generic_add_lease and generic_delete_lease static
  locks: comment cleanups and clarifications
  locks: make "added" in __posix_lock_file a bool
  locks: encapsulate the fl_link list handling
  locks: protect most of the file_lock handling with i_lock
  locks: avoid taking global lock if possible when waking up blocked
    waiters
  locks: convert fl_link to a hlist_node
  locks: turn the blocked_list into a hashtable
  locks: add a new "lm_owner_key" lock operation
  locks: give the blocked_hash its own spinlock
  seq_file: add seq_list_*_percpu helpers
  locks: move file_lock_list to a set of percpu hlist_heads and convert
    file_lock_lock to an lglock

 Documentation/filesystems/Locking |   27 +++-
 fs/afs/flock.c                    |    5 +-
 fs/ceph/locks.c                   |    2 +-
 fs/ceph/mds_client.c              |    8 +-
 fs/cifs/cifsfs.c                  |    2 +-
 fs/cifs/file.c                    |   15 +-
 fs/gfs2/file.c                    |    2 +-
 fs/lockd/svclock.c                |   14 ++-
 fs/lockd/svcsubs.c                |   12 +-
 fs/locks.c                        |  326 ++++++++++++++++++++++++++-----------
 fs/nfs/delegation.c               |   10 +-
 fs/nfs/nfs4state.c                |    8 +-
 fs/nfsd/nfs4state.c               |    8 +-
 fs/seq_file.c                     |   54 ++++++
 include/linux/fs.h                |   43 +++---
 include/linux/seq_file.h          |    6 +
 16 files changed, 384 insertions(+), 158 deletions(-)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4 01/14] locks: drop the unused filp argument to posix_unblock_lock
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/lockd/svclock.c |    2 +-
 fs/locks.c         |    4 +---
 include/linux/fs.h |    5 ++---
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index e703318..a469098 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -276,7 +276,7 @@ static int nlmsvc_unlink_block(struct nlm_block *block)
 	dprintk("lockd: unlinking block %p...\n", block);
 
 	/* Remove block from list */
-	status = posix_unblock_lock(block->b_file->f_file, &block->b_call->a_args.lock.fl);
+	status = posix_unblock_lock(&block->b_call->a_args.lock.fl);
 	nlmsvc_remove_block(block);
 	return status;
 }
diff --git a/fs/locks.c b/fs/locks.c
index cb424a4..72fb2b7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2099,13 +2099,12 @@ void locks_remove_flock(struct file *filp)
 
 /**
  *	posix_unblock_lock - stop waiting for a file lock
- *      @filp:   how the file was opened
  *	@waiter: the lock which was waiting
  *
  *	lockd needs to block waiting for locks.
  */
 int
-posix_unblock_lock(struct file *filp, struct file_lock *waiter)
+posix_unblock_lock(struct file_lock *waiter)
 {
 	int status = 0;
 
@@ -2117,7 +2116,6 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
 	unlock_flocks();
 	return status;
 }
-
 EXPORT_SYMBOL(posix_unblock_lock);
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43db02e..9f160c6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -994,7 +994,7 @@ extern void locks_release_private(struct file_lock *);
 extern void posix_test_lock(struct file *, struct file_lock *);
 extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *);
 extern int posix_lock_file_wait(struct file *, struct file_lock *);
-extern int posix_unblock_lock(struct file *, struct file_lock *);
+extern int posix_unblock_lock(struct file_lock *);
 extern int vfs_test_lock(struct file *, struct file_lock *);
 extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
@@ -1084,8 +1084,7 @@ static inline int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
 	return -ENOLCK;
 }
 
-static inline int posix_unblock_lock(struct file *filp,
-				     struct file_lock *waiter)
+static inline int posix_unblock_lock(struct file_lock *waiter)
 {
 	return -ENOENT;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 02/14] cifs: use posix_unblock_lock instead of locks_delete_block
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
the locks_delete_block symbol. However, there's already an exported
helper function that provides this capability, so make cifs use that
instead and turn locks_delete_block back into a static function.

Note that if fl->fl_next == NULL then this lock has already been through
locks_delete_block(), so we should be OK to ignore an ENOENT error here
and simply not retry the lock.

Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/cifs/file.c     |    2 +-
 fs/locks.c         |    3 +--
 include/linux/fs.h |    5 -----
 3 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 48b29d2..1686e40 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -999,7 +999,7 @@ try_again:
 		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
 		if (!rc)
 			goto try_again;
-		locks_delete_block(flock);
+		posix_unblock_lock(flock);
 	}
 	return rc;
 }
diff --git a/fs/locks.c b/fs/locks.c
index 72fb2b7..d732e22 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -496,13 +496,12 @@ static void __locks_delete_block(struct file_lock *waiter)
 
 /*
  */
-void locks_delete_block(struct file_lock *waiter)
+static void locks_delete_block(struct file_lock *waiter)
 {
 	lock_flocks();
 	__locks_delete_block(waiter);
 	unlock_flocks();
 }
-EXPORT_SYMBOL(locks_delete_block);
 
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9f160c6..cabc08a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1006,7 +1006,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void locks_delete_block(struct file_lock *waiter);
 extern void lock_flocks(void);
 extern void unlock_flocks(void);
 #else /* !CONFIG_FILE_LOCKING */
@@ -1150,10 +1149,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
 	return 1;
 }
 
-static inline void locks_delete_block(struct file_lock *waiter)
-{
-}
-
 static inline void lock_flocks(void)
 {
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 03/14] locks: make generic_add_lease and generic_delete_lease static
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index d732e22..804bb9e 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1337,7 +1337,7 @@ int fcntl_getlease(struct file *filp)
 	return type;
 }
 
-int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
+static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 {
 	struct file_lock *fl, **before, **my_before = NULL, *lease;
 	struct dentry *dentry = filp->f_path.dentry;
@@ -1402,7 +1402,7 @@ out:
 	return error;
 }
 
-int generic_delete_lease(struct file *filp, struct file_lock **flp)
+static int generic_delete_lease(struct file *filp, struct file_lock **flp)
 {
 	struct file_lock *fl, **before;
 	struct dentry *dentry = filp->f_path.dentry;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 04/14] locks: comment cleanups and clarifications
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c         |   21 +++++++++++++--------
 include/linux/fs.h |   18 ++++++++++++++++++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 804bb9e..ddeab49 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -518,9 +518,10 @@ static void locks_insert_block(struct file_lock *blocker,
 		list_add(&waiter->fl_link, &blocked_list);
 }
 
-/* Wake up processes blocked waiting for blocker.
- * If told to wait then schedule the processes until the block list
- * is empty, otherwise empty the block list ourselves.
+/*
+ * Wake up processes blocked waiting for blocker.
+ *
+ * Must be called with the file_lock_lock held!
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
@@ -806,6 +807,11 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 	}
 
 	lock_flocks();
+	/*
+	 * New lock request. Walk all POSIX locks and look for conflicts. If
+	 * there are any, either return error or put the request on the
+	 * blocker's list of waiters and the global blocked_list.
+	 */
 	if (request->fl_type != F_UNLCK) {
 		for_each_lock(inode, before) {
 			fl = *before;
@@ -844,7 +850,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		before = &fl->fl_next;
 	}
 
-	/* Process locks with this owner.  */
+	/* Process locks with this owner. */
 	while ((fl = *before) && posix_same_owner(request, fl)) {
 		/* Detect adjacent or overlapping regions (if same lock type)
 		 */
@@ -930,10 +936,9 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 	}
 
 	/*
-	 * The above code only modifies existing locks in case of
-	 * merging or replacing.  If new lock(s) need to be inserted
-	 * all modifications are done bellow this, so it's safe yet to
-	 * bail out.
+	 * The above code only modifies existing locks in case of merging or
+	 * replacing. If new lock(s) need to be inserted all modifications are
+	 * done below this, so it's safe yet to bail out.
 	 */
 	error = -ENOLCK; /* "no luck" */
 	if (right && left == right && !new_fl2)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cabc08a..cba0fb5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -926,6 +926,24 @@ int locks_in_grace(struct net *);
 /* that will die - we need it for nfs_lock_info */
 #include <linux/nfs_fs_i.h>
 
+/*
+ * struct file_lock represents a generic "file lock". It's used to represent
+ * POSIX byte range locks, BSD (flock) locks, and leases. It's important to
+ * note that the same struct is used to represent both a request for a lock and
+ * the lock itself, but the same object is never used for both.
+ *
+ * FIXME: should we create a separate "struct lock_request" to help distinguish
+ * these two uses?
+ *
+ * The i_flock list is ordered by:
+ *
+ * 1) lock type -- FL_LEASEs first, then FL_FLOCK, and finally FL_POSIX
+ * 2) lock owner
+ * 3) lock range start
+ * 4) lock range end
+ *
+ * Obviously, the last two criteria only matter for POSIX locks.
+ */
 struct file_lock {
 	struct file_lock *fl_next;	/* singly linked list for this inode  */
 	struct list_head fl_link;	/* doubly linked list of all locks */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 05/14] locks: make "added" in __posix_lock_file a bool
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: dhowells, sage, smfrench, swhiteho, Trond.Myklebust, akpm,
	linux-kernel, linux-afs, ceph-devel, linux-cifs, samba-technical,
	cluster-devel, linux-nfs, linux-fsdevel, piastryyy

...save 3 bytes of stack space.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ddeab49..1d6cb28 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -791,7 +791,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 	struct file_lock *left = NULL;
 	struct file_lock *right = NULL;
 	struct file_lock **before;
-	int error, added = 0;
+	int error;
+	bool added = false;
 
 	/*
 	 * We may need two file_lock structures for this operation,
@@ -885,7 +886,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 				continue;
 			}
 			request = fl;
-			added = 1;
+			added = true;
 		}
 		else {
 			/* Processing for different lock types is a bit
@@ -896,7 +897,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			if (fl->fl_start > request->fl_end)
 				break;
 			if (request->fl_type == F_UNLCK)
-				added = 1;
+				added = true;
 			if (fl->fl_start < request->fl_start)
 				left = fl;
 			/* If the next lock in the list has a higher end
@@ -926,7 +927,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 				locks_release_private(fl);
 				locks_copy_private(fl, request);
 				request = fl;
-				added = 1;
+				added = true;
 			}
 		}
 		/* Go on to next lock.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 06/14] locks: encapsulate the fl_link list handling
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: dhowells, sage, smfrench, swhiteho, Trond.Myklebust, akpm,
	linux-kernel, linux-afs, ceph-devel, linux-cifs, samba-technical,
	cluster-devel, linux-nfs, linux-fsdevel, piastryyy

Move the fl_link list handling routines into a separate set of helpers.
Also ensure that locks and requests are always put on global lists
last (after fully initializing them) and are taken off before uninitializing
them.
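
For illustration, a minimal sketch (not part of the patch) of the ordering
the new helpers are meant to enforce; the example_* functions are
hypothetical, while locks_insert_global_locks() and
locks_delete_global_locks() are the helpers introduced below:

	static void example_insert(struct file_lock **pos, struct file_lock *fl)
	{
		/* fully initialize fl and link it into the per-inode list first */
		fl->fl_next = *pos;
		*pos = fl;
		/* only then make it visible on the global list */
		locks_insert_global_locks(fl);
	}

	static void example_delete(struct file_lock **pos)
	{
		struct file_lock *fl = *pos;

		/* drop global visibility before any teardown */
		locks_delete_global_locks(fl);
		*pos = fl->fl_next;
		fl->fl_next = NULL;
	}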

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c |   45 ++++++++++++++++++++++++++++++++++++---------
 1 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 1d6cb28..89d898b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -153,13 +153,15 @@ int lease_break_time = 45;
 #define for_each_lock(inode, lockp) \
 	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
 
+/* The global file_lock_list is only used for displaying /proc/locks. */
 static LIST_HEAD(file_lock_list);
+
+/* The blocked_list is used to find POSIX lock loops for deadlock detection. */
 static LIST_HEAD(blocked_list);
+
+/* Protects the two list heads above, plus the inode->i_flock list */
 static DEFINE_SPINLOCK(file_lock_lock);
 
-/*
- * Protects the two list heads above, plus the inode->i_flock list
- */
 void lock_flocks(void)
 {
 	spin_lock(&file_lock_lock);
@@ -484,13 +486,37 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 	return fl1->fl_owner == fl2->fl_owner;
 }
 
+static inline void
+locks_insert_global_locks(struct file_lock *fl)
+{
+	list_add_tail(&fl->fl_link, &file_lock_list);
+}
+
+static inline void
+locks_delete_global_locks(struct file_lock *fl)
+{
+	list_del_init(&fl->fl_link);
+}
+
+static inline void
+locks_insert_global_blocked(struct file_lock *waiter)
+{
+	list_add(&waiter->fl_link, &blocked_list);
+}
+
+static inline void
+locks_delete_global_blocked(struct file_lock *waiter)
+{
+	list_del_init(&waiter->fl_link);
+}
+
 /* Remove waiter from blocker's block list.
  * When blocker ends up pointing to itself then the list is empty.
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
+	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_block);
-	list_del_init(&waiter->fl_link);
 	waiter->fl_next = NULL;
 }
 
@@ -512,10 +538,10 @@ static void locks_insert_block(struct file_lock *blocker,
 			       struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
-	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	waiter->fl_next = blocker;
+	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	if (IS_POSIX(blocker))
-		list_add(&waiter->fl_link, &blocked_list);
+		locks_insert_global_blocked(request);
 }
 
 /*
@@ -543,13 +569,13 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
  */
 static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
 {
-	list_add(&fl->fl_link, &file_lock_list);
-
 	fl->fl_nspid = get_pid(task_tgid(current));
 
 	/* insert into file's list */
 	fl->fl_next = *pos;
 	*pos = fl;
+
+	locks_insert_global_locks(fl);
 }
 
 /*
@@ -562,9 +588,10 @@ static void locks_delete_lock(struct file_lock **thisfl_p)
 {
 	struct file_lock *fl = *thisfl_p;
 
+	locks_delete_global_locks(fl);
+
 	*thisfl_p = fl->fl_next;
 	fl->fl_next = NULL;
-	list_del_init(&fl->fl_link);
 
 	if (fl->fl_nspid) {
 		put_pid(fl->fl_nspid);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 07/14] locks: protect most of the file_lock handling with i_lock
  2013-06-21 12:58 ` Jeff Layton
  (?)
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Having a global lock that protects all of this code is a clear
scalability problem. Instead, move most of the code to be protected by
the i_lock. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list are done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
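
For illustration, a minimal sketch of the resulting lock ordering when a
conflicting POSIX request is queued. queue_blocked_waiter() is a
hypothetical wrapper, not code from this patch; it assumes the caller
already holds inode->i_lock and simply restates the __posix_lock_file
change below:

	/* caller holds inode->i_lock; i_lock -> file_lock_lock is the order */
	static int queue_blocked_waiter(struct file_lock *request,
					struct file_lock *fl)
	{
		int error = -EDEADLK;

		spin_lock(&file_lock_lock);
		/* deadlock check and insertion under one hold of the lock */
		if (!posix_locks_deadlock(request, fl)) {
			error = FILE_LOCK_DEFERRED;
			__locks_insert_block(fl, request);
		}
		spin_unlock(&file_lock_lock);
		return error;
	}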

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/Locking |   21 +++--
 fs/afs/flock.c                    |    5 +-
 fs/ceph/locks.c                   |    2 +-
 fs/ceph/mds_client.c              |    8 +-
 fs/cifs/cifsfs.c                  |    2 +-
 fs/cifs/file.c                    |   13 ++--
 fs/gfs2/file.c                    |    2 +-
 fs/lockd/svcsubs.c                |   12 ++--
 fs/locks.c                        |  164 +++++++++++++++++++++++--------------
 fs/nfs/delegation.c               |   10 +-
 fs/nfs/nfs4state.c                |    8 +-
 fs/nfsd/nfs4state.c               |    8 +-
 include/linux/fs.h                |   11 ---
 13 files changed, 154 insertions(+), 112 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..413685f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -344,7 +344,7 @@ prototypes:
 
 
 locking rules:
-			file_lock_lock	may block
+			inode->i_lock	may block
 fl_copy_lock:		yes		no
 fl_release_private:	maybe		no
 
@@ -357,12 +357,19 @@ prototypes:
 	int (*lm_change)(struct file_lock **, int);
 
 locking rules:
-			file_lock_lock	may block
-lm_compare_owner:	yes		no
-lm_notify:		yes		no
-lm_grant:		no		no
-lm_break:		yes		no
-lm_change		yes		no
+
+			inode->i_lock	file_lock_lock	may block
+lm_compare_owner:	yes[1]		maybe		no
+lm_notify:		yes		yes		no
+lm_grant:		no		no		no
+lm_break:		yes		no		no
+lm_change		yes		no		no
+
+[1]:	->lm_compare_owner is generally called with *an* inode->i_lock held. It
+may not be the i_lock of the inode for either file_lock being compared! This is
+the case with deadlock detection, since the code has to chase down the owners
+of locks that may be entirely unrelated to the one on which the lock is being
+acquired. When doing a search for deadlocks, the file_lock_lock is also held.
 
 --------------------------- buffer_head -----------------------------------
 prototypes:
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 2497bf3..03fc0d1 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -252,6 +252,7 @@ static void afs_defer_unlock(struct afs_vnode *vnode, struct key *key)
  */
 static int afs_do_setlk(struct file *file, struct file_lock *fl)
 {
+	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(file->f_mapping->host);
 	afs_lock_type_t type;
 	struct key *key = file->private_data;
@@ -273,7 +274,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
 
 	type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 
 	/* make sure we've got a callback on this file and that our view of the
 	 * data version is up to date */
@@ -420,7 +421,7 @@ given_lock:
 	afs_vnode_fetch_status(vnode, NULL, key);
 
 error:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	_leave(" = %d", ret);
 	return ret;
 
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ebbf680..690f73f 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -192,7 +192,7 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
 
 /**
  * Encode the flock and fcntl locks for the given inode into the ceph_filelock
- * array. Must be called with lock_flocks() already held.
+ * array. Must be called with inode->i_lock already held.
  * If we encounter more of a specific lock type than expected, return -ENOSPC.
  */
 int ceph_encode_locks_to_buffer(struct inode *inode,
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4d29203..74fd289 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2481,20 +2481,20 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
 		struct ceph_filelock *flocks;
 
 encode_again:
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 		ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		flocks = kmalloc((num_fcntl_locks+num_flock_locks) *
 				 sizeof(struct ceph_filelock), GFP_NOFS);
 		if (!flocks) {
 			err = -ENOMEM;
 			goto out_free;
 		}
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 		err = ceph_encode_locks_to_buffer(inode, flocks,
 						  num_fcntl_locks,
 						  num_flock_locks);
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		if (err) {
 			kfree(flocks);
 			if (err == -ENOSPC)
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 3752b9f..e81655c 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -765,7 +765,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
 
 static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
 {
-	/* note that this is called by vfs setlease with lock_flocks held
+	/* note that this is called by vfs setlease with i_lock held
 	   to protect *lease from going away */
 	struct inode *inode = file_inode(file);
 	struct cifsFileInfo *cfile = file->private_data;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1686e40..0630710 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1092,6 +1092,7 @@ struct lock_to_push {
 static int
 cifs_push_posix_locks(struct cifsFileInfo *cfile)
 {
+	struct inode *inode = cfile->dentry->d_inode;
 	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
 	struct file_lock *flock, **before;
 	unsigned int count = 0, i = 0;
@@ -1102,12 +1103,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 
 	xid = get_xid();
 
-	lock_flocks();
-	cifs_for_each_lock(cfile->dentry->d_inode, before) {
+	spin_lock(&inode->i_lock);
+	cifs_for_each_lock(inode, before) {
 		if ((*before)->fl_flags & FL_POSIX)
 			count++;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	INIT_LIST_HEAD(&locks_to_send);
 
@@ -1126,8 +1127,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 	}
 
 	el = locks_to_send.next;
-	lock_flocks();
-	cifs_for_each_lock(cfile->dentry->d_inode, before) {
+	spin_lock(&inode->i_lock);
+	cifs_for_each_lock(inode, before) {
 		flock = *before;
 		if ((flock->fl_flags & FL_POSIX) == 0)
 			continue;
@@ -1152,7 +1153,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 		lck->offset = flock->fl_start;
 		el = el->next;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
 		int stored_rc;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ad0dc38..2398b41 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -896,7 +896,7 @@ out_uninit:
  * cluster; until we do, disable leases (by just returning -EINVAL),
  * unless the administrator has requested purely local locking.
  *
- * Locking: called under lock_flocks
+ * Locking: called under i_lock
  *
  * Returns: errno
  */
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index 97e8741..dc5c759 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -169,7 +169,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
 
 again:
 	file->f_locks = 0;
-	lock_flocks(); /* protects i_flock list */
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl; fl = fl->fl_next) {
 		if (fl->fl_lmops != &nlmsvc_lock_operations)
 			continue;
@@ -181,7 +181,7 @@ again:
 		if (match(lockhost, host)) {
 			struct file_lock lock = *fl;
 
-			unlock_flocks();
+			spin_unlock(&inode->i_lock);
 			lock.fl_type  = F_UNLCK;
 			lock.fl_start = 0;
 			lock.fl_end   = OFFSET_MAX;
@@ -193,7 +193,7 @@ again:
 			goto again;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	return 0;
 }
@@ -228,14 +228,14 @@ nlm_file_inuse(struct nlm_file *file)
 	if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
 		return 1;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl; fl = fl->fl_next) {
 		if (fl->fl_lmops == &nlmsvc_lock_operations) {
-			unlock_flocks();
+			spin_unlock(&inode->i_lock);
 			return 1;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	file->f_locks = 0;
 	return 0;
 }
diff --git a/fs/locks.c b/fs/locks.c
index 89d898b..ce302d4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -153,27 +153,37 @@ int lease_break_time = 45;
 #define for_each_lock(inode, lockp) \
 	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
 
-/* The global file_lock_list is only used for displaying /proc/locks. */
+/*
+ * The global file_lock_list is only used for displaying /proc/locks. Protected
+ * by the file_lock_lock.
+ */
 static LIST_HEAD(file_lock_list);
 
-/* The blocked_list is used to find POSIX lock loops for deadlock detection. */
+/*
+ * The blocked_list is used to find POSIX lock loops for deadlock detection.
+ * Protected by file_lock_lock.
+ */
 static LIST_HEAD(blocked_list);
 
-/* Protects the two list heads above, plus the inode->i_flock list */
+/*
+ * This lock protects the blocked_list, and the file_lock_list. Generally, if
+ * you're accessing one of those lists, you want to be holding this lock.
+ *
+ * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * pointer for file_lock structures that are acting as lock requests (in
+ * contrast to those that are acting as records of acquired locks).
+ *
+ * Note that when we acquire this lock in order to change the above fields,
+ * we often hold the i_lock as well. In certain cases, when reading the fields
+ * protected by this lock, we can skip acquiring it iff we already hold the
+ * i_lock.
+ *
+ * In particular, adding an entry to the fl_block list requires that you hold
+ * both the i_lock and the file_lock_lock (acquired in that order). Deleting
+ * an entry from the list however only requires the file_lock_lock.
+ */
 static DEFINE_SPINLOCK(file_lock_lock);
 
-void lock_flocks(void)
-{
-	spin_lock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(lock_flocks);
-
-void unlock_flocks(void)
-{
-	spin_unlock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(unlock_flocks);
-
 static struct kmem_cache *filelock_cache __read_mostly;
 
 static void locks_init_lock_heads(struct file_lock *fl)
@@ -489,13 +499,17 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 static inline void
 locks_insert_global_locks(struct file_lock *fl)
 {
+	spin_lock(&file_lock_lock);
 	list_add_tail(&fl->fl_link, &file_lock_list);
+	spin_unlock(&file_lock_lock);
 }
 
 static inline void
 locks_delete_global_locks(struct file_lock *fl)
 {
+	spin_lock(&file_lock_lock);
 	list_del_init(&fl->fl_link);
+	spin_unlock(&file_lock_lock);
 }
 
 static inline void
@@ -512,6 +526,8 @@ locks_delete_global_blocked(struct file_lock *waiter)
 
 /* Remove waiter from blocker's block list.
  * When blocker ends up pointing to itself then the list is empty.
+ *
+ * Must be called with file_lock_lock held.
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
@@ -520,37 +536,47 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_next = NULL;
 }
 
-/*
- */
 static void locks_delete_block(struct file_lock *waiter)
 {
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	__locks_delete_block(waiter);
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 }
 
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
+ *
+ * Must be called with file_lock_lock held!
  */
-static void locks_insert_block(struct file_lock *blocker, 
-			       struct file_lock *waiter)
+static void __locks_insert_block(struct file_lock *blocker,
+					struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
 	waiter->fl_next = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	if (IS_POSIX(blocker))
-		locks_insert_global_blocked(request);
+		locks_insert_global_blocked(waiter);
+}
+
+/* Must be called with i_lock held. */
+static void locks_insert_block(struct file_lock *blocker,
+					struct file_lock *waiter)
+{
+	spin_lock(&file_lock_lock);
+	__locks_insert_block(blocker, waiter);
+	spin_unlock(&file_lock_lock);
 }
 
 /*
  * Wake up processes blocked waiting for blocker.
  *
- * Must be called with the file_lock_lock held!
+ * Must be called with the inode->i_lock held!
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
+	spin_lock(&file_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
 
@@ -562,10 +588,13 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		else
 			wake_up(&waiter->fl_wait);
 	}
+	spin_unlock(&file_lock_lock);
 }
 
 /* Insert file lock fl into an inode's lock list at the position indicated
  * by pos. At the same time add the lock to the global file lock list.
+ *
+ * Must be called with the i_lock held!
  */
 static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
 {
@@ -583,6 +612,8 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
  * Wake up processes that are blocked waiting for this lock,
  * notify the FS that the lock has been cleared and
  * finally free the lock.
+ *
+ * Must be called with the i_lock held!
  */
 static void locks_delete_lock(struct file_lock **thisfl_p)
 {
@@ -652,8 +683,9 @@ void
 posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
+	struct inode *inode = file_inode(filp);
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (cfl = file_inode(filp)->i_flock; cfl; cfl = cfl->fl_next) {
 		if (!IS_POSIX(cfl))
 			continue;
@@ -666,7 +698,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 			fl->fl_pid = pid_vnr(cfl->fl_nspid);
 	} else
 		fl->fl_type = F_UNLCK;
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return;
 }
 EXPORT_SYMBOL(posix_test_lock);
@@ -710,6 +742,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	return NULL;
 }
 
+/* Must be called with the file_lock_lock held! */
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
@@ -745,7 +778,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 			return -ENOMEM;
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	if (request->fl_flags & FL_ACCESS)
 		goto find_conflict;
 
@@ -775,9 +808,9 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	 * give it the opportunity to lock the file.
 	 */
 	if (found) {
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		cond_resched();
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
 
 find_conflict:
@@ -804,7 +837,7 @@ find_conflict:
 	error = 0;
 
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	if (new_fl)
 		locks_free_lock(new_fl);
 	return error;
@@ -834,7 +867,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		new_fl2 = locks_alloc_lock();
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	/*
 	 * New lock request. Walk all POSIX locks and look for conflicts. If
 	 * there are any, either return error or put the request on the
@@ -852,11 +885,17 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			error = -EAGAIN;
 			if (!(request->fl_flags & FL_SLEEP))
 				goto out;
+			/*
+			 * Deadlock detection and insertion into the blocked
+			 * locks list must be done while holding the same lock!
+			 */
 			error = -EDEADLK;
-			if (posix_locks_deadlock(request, fl))
-				goto out;
-			error = FILE_LOCK_DEFERRED;
-			locks_insert_block(fl, request);
+			spin_lock(&file_lock_lock);
+			if (likely(!posix_locks_deadlock(request, fl))) {
+				error = FILE_LOCK_DEFERRED;
+				__locks_insert_block(fl, request);
+			}
+			spin_unlock(&file_lock_lock);
 			goto out;
   		}
   	}
@@ -1006,7 +1045,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		locks_wake_up_blocks(left);
 	}
  out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	/*
 	 * Free any unused locks.
 	 */
@@ -1081,14 +1120,14 @@ int locks_mandatory_locked(struct inode *inode)
 	/*
 	 * Search the lock list for this inode for any POSIX locks.
 	 */
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!IS_POSIX(fl))
 			continue;
 		if (fl->fl_owner != owner)
 			break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return fl ? -EAGAIN : 0;
 }
 
@@ -1231,7 +1270,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 	if (IS_ERR(new_fl))
 		return PTR_ERR(new_fl);
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 
 	time_out_leases(inode);
 
@@ -1281,11 +1320,11 @@ restart:
 			break_time++;
 	}
 	locks_insert_block(flock, new_fl);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
 						!new_fl->fl_next, break_time);
-	lock_flocks();
-	__locks_delete_block(new_fl);
+	spin_lock(&inode->i_lock);
+	locks_delete_block(new_fl);
 	if (error >= 0) {
 		if (error == 0)
 			time_out_leases(inode);
@@ -1302,7 +1341,7 @@ restart:
 	}
 
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	locks_free_lock(new_fl);
 	return error;
 }
@@ -1355,9 +1394,10 @@ EXPORT_SYMBOL(lease_get_mtime);
 int fcntl_getlease(struct file *filp)
 {
 	struct file_lock *fl;
+	struct inode *inode = file_inode(filp);
 	int type = F_UNLCK;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	time_out_leases(file_inode(filp));
 	for (fl = file_inode(filp)->i_flock; fl && IS_LEASE(fl);
 			fl = fl->fl_next) {
@@ -1366,7 +1406,7 @@ int fcntl_getlease(struct file *filp)
 			break;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return type;
 }
 
@@ -1460,7 +1500,7 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp)
  *	The (input) flp->fl_lmops->lm_break function is required
  *	by break_lease().
  *
- *	Called with file_lock_lock held.
+ *	Called with inode->i_lock held.
  */
 int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 {
@@ -1529,11 +1569,12 @@ static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 
 int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 {
+	struct inode *inode = file_inode(filp);
 	int error;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	error = __vfs_setlease(filp, arg, lease);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	return error;
 }
@@ -1551,6 +1592,7 @@ static int do_fcntl_delete_lease(struct file *filp)
 static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 {
 	struct file_lock *fl, *ret;
+	struct inode *inode = file_inode(filp);
 	struct fasync_struct *new;
 	int error;
 
@@ -1564,10 +1606,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 		return -ENOMEM;
 	}
 	ret = fl;
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	error = __vfs_setlease(filp, arg, &ret);
 	if (error) {
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		locks_free_lock(fl);
 		goto out_free_fasync;
 	}
@@ -1584,7 +1626,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 		new = NULL;
 
 	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 out_free_fasync:
 	if (new)
@@ -2108,7 +2150,7 @@ void locks_remove_flock(struct file *filp)
 			fl.fl_ops->fl_release_private(&fl);
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	before = &inode->i_flock;
 
 	while ((fl = *before) != NULL) {
@@ -2126,7 +2168,7 @@ void locks_remove_flock(struct file *filp)
  		}
 		before = &fl->fl_next;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 }
 
 /**
@@ -2140,12 +2182,12 @@ posix_unblock_lock(struct file_lock *waiter)
 {
 	int status = 0;
 
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	if (waiter->fl_next)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 	return status;
 }
 EXPORT_SYMBOL(posix_unblock_lock);
@@ -2259,7 +2301,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
 {
 	loff_t *p = f->private;
 
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	*p = (*pos + 1);
 	return seq_list_start(&file_lock_list, *pos);
 }
@@ -2273,7 +2315,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 
 static void locks_stop(struct seq_file *f, void *v)
 {
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 }
 
 static const struct seq_operations locks_seq_operations = {
@@ -2320,7 +2362,8 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_flocks();
+
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if (fl->fl_type == F_RDLCK)
@@ -2337,7 +2380,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return result;
 }
 
@@ -2360,7 +2403,8 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_flocks();
+
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
@@ -2375,7 +2419,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return result;
 }
 
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 57db324..7ec4814 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -73,20 +73,20 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
 	if (inode->i_flock == NULL)
 		goto out;
 
-	/* Protect inode->i_flock using the file locks lock */
-	lock_flocks();
+	/* Protect inode->i_flock using the i_lock */
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file) != ctx)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = nfs4_lock_delegation_recall(fl, state, stateid);
 		if (status < 0)
 			goto out;
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	return status;
 }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 1fab140..ff10b4a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 	/* Guard against delegation returns and new lock/unlock calls */
 	down_write(&nfsi->rwsem);
 	/* Protect inode->i_flock using the BKL */
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file)->state != state)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = ops->recover_lock(state, fl);
 		switch (status) {
 			case 0:
@@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 				/* kill_proc(fl->fl_pid, SIGLOST, 1); */
 				status = 0;
 		}
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	up_write(&nfsi->rwsem);
 	return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..f170518 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 
 	list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
 
-	/* only place dl_time is set. protected by lock_flocks*/
+	/* Only place dl_time is set; protected by i_lock: */
 	dp->dl_time = get_seconds();
 
 	nfsd4_cb_recall(dp);
 }
 
-/* Called from break_lease() with lock_flocks() held. */
+/* Called from break_lease() with i_lock held. */
 static void nfsd_break_deleg_cb(struct file_lock *fl)
 {
 	struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
@@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 	struct inode *inode = filp->fi_inode;
 	int status = 0;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
 		if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
 			status = 1;
@@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 		}
 	}
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return status;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cba0fb5..805320b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1024,8 +1024,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void lock_flocks(void);
-extern void unlock_flocks(void);
 #else /* !CONFIG_FILE_LOCKING */
 static inline int fcntl_getlk(struct file *file, struct flock __user *user)
 {
@@ -1166,15 +1164,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
 {
 	return 1;
 }
-
-static inline void lock_flocks(void)
-{
-}
-
-static inline void unlock_flocks(void)
-{
-}
-
 #endif /* !CONFIG_FILE_LOCKING */
 
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

 	if (inode->i_flock == NULL)
 		goto out;
 
-	/* Protect inode->i_flock using the file locks lock */
-	lock_flocks();
+	/* Protect inode->i_flock using the i_lock */
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file) != ctx)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = nfs4_lock_delegation_recall(fl, state, stateid);
 		if (status < 0)
 			goto out;
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	return status;
 }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 1fab140..ff10b4a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 	/* Guard against delegation returns and new lock/unlock calls */
 	down_write(&nfsi->rwsem);
 	/* Protect inode->i_flock using the BKL */
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file)->state != state)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = ops->recover_lock(state, fl);
 		switch (status) {
 			case 0:
@@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 				/* kill_proc(fl->fl_pid, SIGLOST, 1); */
 				status = 0;
 		}
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	up_write(&nfsi->rwsem);
 	return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..f170518 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 
 	list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
 
-	/* only place dl_time is set. protected by lock_flocks*/
+	/* Only place dl_time is set; protected by i_lock: */
 	dp->dl_time = get_seconds();
 
 	nfsd4_cb_recall(dp);
 }
 
-/* Called from break_lease() with lock_flocks() held. */
+/* Called from break_lease() with i_lock held. */
 static void nfsd_break_deleg_cb(struct file_lock *fl)
 {
 	struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
@@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 	struct inode *inode = filp->fi_inode;
 	int status = 0;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
 		if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
 			status = 1;
@@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 		}
 	}
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return status;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cba0fb5..805320b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1024,8 +1024,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void lock_flocks(void);
-extern void unlock_flocks(void);
 #else /* !CONFIG_FILE_LOCKING */
 static inline int fcntl_getlk(struct file *file, struct flock __user *user)
 {
@@ -1166,15 +1164,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
 {
 	return 1;
 }
-
-static inline void lock_flocks(void)
-{
-}
-
-static inline void unlock_flocks(void)
-{
-}
-
 #endif /* !CONFIG_FILE_LOCKING */
 
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Cluster-devel] [PATCH v4 07/14] locks: protect most of the file_lock handling with i_lock
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Having a global lock that protects all of this code is a clear
scalability problem. Instead, move most of the code to be protected by
the i_lock. The exceptions are the global lists that the ->fl_link sits
on, and the ->fl_block list.

->fl_link is what connects these structures to the global lists, so we
must hold the global file_lock_lock whenever we iterate over or update
those lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
the check and the insertion into the blocked_list are done without
dropping the lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race-free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
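
To make the lock ordering concrete, here is a minimal user-space analogue
(not part of the patch; pthread mutexes stand in for inode->i_lock and the
global file_lock_lock, and all names are made up). It mirrors the flow in
__posix_lock_file(): the per-inode lock is taken first, and the global lock
is nested inside it, so the deadlock check and the insertion into the
blocked list cannot race with each other.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t file_lock_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_inode {
	pthread_mutex_t i_lock;			/* stand-in for inode->i_lock */
};

static void queue_blocked_waiter(struct fake_inode *inode)
{
	pthread_mutex_lock(&inode->i_lock);	/* i_lock first... */
	pthread_mutex_lock(&file_lock_lock);	/* ...then the global lock */
	/* deadlock check + __locks_insert_block() would happen here */
	pthread_mutex_unlock(&file_lock_lock);
	pthread_mutex_unlock(&inode->i_lock);
}

int main(void)
{
	struct fake_inode ino = { .i_lock = PTHREAD_MUTEX_INITIALIZER };

	queue_blocked_waiter(&ino);
	printf("ordering: i_lock -> file_lock_lock\n");
	return 0;
}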

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/Locking |   21 +++--
 fs/afs/flock.c                    |    5 +-
 fs/ceph/locks.c                   |    2 +-
 fs/ceph/mds_client.c              |    8 +-
 fs/cifs/cifsfs.c                  |    2 +-
 fs/cifs/file.c                    |   13 ++--
 fs/gfs2/file.c                    |    2 +-
 fs/lockd/svcsubs.c                |   12 ++--
 fs/locks.c                        |  164 +++++++++++++++++++++++--------------
 fs/nfs/delegation.c               |   10 +-
 fs/nfs/nfs4state.c                |    8 +-
 fs/nfsd/nfs4state.c               |    8 +-
 include/linux/fs.h                |   11 ---
 13 files changed, 154 insertions(+), 112 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 0706d32..413685f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -344,7 +344,7 @@ prototypes:
 
 
 locking rules:
-			file_lock_lock	may block
+			inode->i_lock	may block
 fl_copy_lock:		yes		no
 fl_release_private:	maybe		no
 
@@ -357,12 +357,19 @@ prototypes:
 	int (*lm_change)(struct file_lock **, int);
 
 locking rules:
-			file_lock_lock	may block
-lm_compare_owner:	yes		no
-lm_notify:		yes		no
-lm_grant:		no		no
-lm_break:		yes		no
-lm_change		yes		no
+
+			inode->i_lock	file_lock_lock	may block
+lm_compare_owner:	yes[1]		maybe		no
+lm_notify:		yes		yes		no
+lm_grant:		no		no		no
+lm_break:		yes		no		no
+lm_change		yes		no		no
+
+[1]:	->lm_compare_owner is generally called with *an* inode->i_lock held. It
+may not be the i_lock of the inode for either file_lock being compared! This is
+the case with deadlock detection, since the code has to chase down the owners
+of locks that may be entirely unrelated to the one on which the lock is being
+acquired. When doing a search for deadlocks, the file_lock_lock is also held.
 
 --------------------------- buffer_head -----------------------------------
 prototypes:
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 2497bf3..03fc0d1 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -252,6 +252,7 @@ static void afs_defer_unlock(struct afs_vnode *vnode, struct key *key)
  */
 static int afs_do_setlk(struct file *file, struct file_lock *fl)
 {
+	struct inode *inode = file_inode(file);
 	struct afs_vnode *vnode = AFS_FS_I(file->f_mapping->host);
 	afs_lock_type_t type;
 	struct key *key = file->private_data;
@@ -273,7 +274,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
 
 	type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 
 	/* make sure we've got a callback on this file and that our view of the
 	 * data version is up to date */
@@ -420,7 +421,7 @@ given_lock:
 	afs_vnode_fetch_status(vnode, NULL, key);
 
 error:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	_leave(" = %d", ret);
 	return ret;
 
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ebbf680..690f73f 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -192,7 +192,7 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
 
 /**
  * Encode the flock and fcntl locks for the given inode into the ceph_filelock
- * array. Must be called with lock_flocks() already held.
+ * array. Must be called with inode->i_lock already held.
  * If we encounter more of a specific lock type than expected, return -ENOSPC.
  */
 int ceph_encode_locks_to_buffer(struct inode *inode,
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4d29203..74fd289 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2481,20 +2481,20 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
 		struct ceph_filelock *flocks;
 
 encode_again:
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 		ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		flocks = kmalloc((num_fcntl_locks+num_flock_locks) *
 				 sizeof(struct ceph_filelock), GFP_NOFS);
 		if (!flocks) {
 			err = -ENOMEM;
 			goto out_free;
 		}
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 		err = ceph_encode_locks_to_buffer(inode, flocks,
 						  num_fcntl_locks,
 						  num_flock_locks);
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		if (err) {
 			kfree(flocks);
 			if (err == -ENOSPC)
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 3752b9f..e81655c 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -765,7 +765,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
 
 static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
 {
-	/* note that this is called by vfs setlease with lock_flocks held
+	/* note that this is called by vfs setlease with i_lock held
 	   to protect *lease from going away */
 	struct inode *inode = file_inode(file);
 	struct cifsFileInfo *cfile = file->private_data;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1686e40..0630710 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1092,6 +1092,7 @@ struct lock_to_push {
 static int
 cifs_push_posix_locks(struct cifsFileInfo *cfile)
 {
+	struct inode *inode = cfile->dentry->d_inode;
 	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
 	struct file_lock *flock, **before;
 	unsigned int count = 0, i = 0;
@@ -1102,12 +1103,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 
 	xid = get_xid();
 
-	lock_flocks();
-	cifs_for_each_lock(cfile->dentry->d_inode, before) {
+	spin_lock(&inode->i_lock);
+	cifs_for_each_lock(inode, before) {
 		if ((*before)->fl_flags & FL_POSIX)
 			count++;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	INIT_LIST_HEAD(&locks_to_send);
 
@@ -1126,8 +1127,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 	}
 
 	el = locks_to_send.next;
-	lock_flocks();
-	cifs_for_each_lock(cfile->dentry->d_inode, before) {
+	spin_lock(&inode->i_lock);
+	cifs_for_each_lock(inode, before) {
 		flock = *before;
 		if ((flock->fl_flags & FL_POSIX) == 0)
 			continue;
@@ -1152,7 +1153,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 		lck->offset = flock->fl_start;
 		el = el->next;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
 		int stored_rc;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index ad0dc38..2398b41 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -896,7 +896,7 @@ out_uninit:
  * cluster; until we do, disable leases (by just returning -EINVAL),
  * unless the administrator has requested purely local locking.
  *
- * Locking: called under lock_flocks
+ * Locking: called under i_lock
  *
  * Returns: errno
  */
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index 97e8741..dc5c759 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -169,7 +169,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
 
 again:
 	file->f_locks = 0;
-	lock_flocks(); /* protects i_flock list */
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl; fl = fl->fl_next) {
 		if (fl->fl_lmops != &nlmsvc_lock_operations)
 			continue;
@@ -181,7 +181,7 @@ again:
 		if (match(lockhost, host)) {
 			struct file_lock lock = *fl;
 
-			unlock_flocks();
+			spin_unlock(&inode->i_lock);
 			lock.fl_type  = F_UNLCK;
 			lock.fl_start = 0;
 			lock.fl_end   = OFFSET_MAX;
@@ -193,7 +193,7 @@ again:
 			goto again;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	return 0;
 }
@@ -228,14 +228,14 @@ nlm_file_inuse(struct nlm_file *file)
 	if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
 		return 1;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl; fl = fl->fl_next) {
 		if (fl->fl_lmops == &nlmsvc_lock_operations) {
-			unlock_flocks();
+			spin_unlock(&inode->i_lock);
 			return 1;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	file->f_locks = 0;
 	return 0;
 }
diff --git a/fs/locks.c b/fs/locks.c
index 89d898b..ce302d4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -153,27 +153,37 @@ int lease_break_time = 45;
 #define for_each_lock(inode, lockp) \
 	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
 
-/* The global file_lock_list is only used for displaying /proc/locks. */
+/*
+ * The global file_lock_list is only used for displaying /proc/locks. Protected
+ * by the file_lock_lock.
+ */
 static LIST_HEAD(file_lock_list);
 
-/* The blocked_list is used to find POSIX lock loops for deadlock detection. */
+/*
+ * The blocked_list is used to find POSIX lock loops for deadlock detection.
+ * Protected by file_lock_lock.
+ */
 static LIST_HEAD(blocked_list);
 
-/* Protects the two list heads above, plus the inode->i_flock list */
+/*
+ * This lock protects the blocked_list, and the file_lock_list. Generally, if
+ * you're accessing one of those lists, you want to be holding this lock.
+ *
+ * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * pointer for file_lock structures that are acting as lock requests (in
+ * contrast to those that are acting as records of acquired locks).
+ *
+ * Note that when we acquire this lock in order to change the above fields,
+ * we often hold the i_lock as well. In certain cases, when reading the fields
+ * protected by this lock, we can skip acquiring it iff we already hold the
+ * i_lock.
+ *
+ * In particular, adding an entry to the fl_block list requires that you hold
+ * both the i_lock and the file_lock_lock (acquired in that order). Deleting
+ * an entry from the list, however, only requires the file_lock_lock.
+ */
 static DEFINE_SPINLOCK(file_lock_lock);
 
-void lock_flocks(void)
-{
-	spin_lock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(lock_flocks);
-
-void unlock_flocks(void)
-{
-	spin_unlock(&file_lock_lock);
-}
-EXPORT_SYMBOL_GPL(unlock_flocks);
-
 static struct kmem_cache *filelock_cache __read_mostly;
 
 static void locks_init_lock_heads(struct file_lock *fl)
@@ -489,13 +499,17 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 static inline void
 locks_insert_global_locks(struct file_lock *fl)
 {
+	spin_lock(&file_lock_lock);
 	list_add_tail(&fl->fl_link, &file_lock_list);
+	spin_unlock(&file_lock_lock);
 }
 
 static inline void
 locks_delete_global_locks(struct file_lock *fl)
 {
+	spin_lock(&file_lock_lock);
 	list_del_init(&fl->fl_link);
+	spin_unlock(&file_lock_lock);
 }
 
 static inline void
@@ -512,6 +526,8 @@ locks_delete_global_blocked(struct file_lock *waiter)
 
 /* Remove waiter from blocker's block list.
  * When blocker ends up pointing to itself then the list is empty.
+ *
+ * Must be called with file_lock_lock held.
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
@@ -520,37 +536,47 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_next = NULL;
 }
 
-/*
- */
 static void locks_delete_block(struct file_lock *waiter)
 {
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	__locks_delete_block(waiter);
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 }
 
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
+ *
+ * Must be called with file_lock_lock held!
  */
-static void locks_insert_block(struct file_lock *blocker, 
-			       struct file_lock *waiter)
+static void __locks_insert_block(struct file_lock *blocker,
+					struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
 	waiter->fl_next = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	if (IS_POSIX(blocker))
-		locks_insert_global_blocked(request);
+		locks_insert_global_blocked(waiter);
+}
+
+/* Must be called with i_lock held. */
+static void locks_insert_block(struct file_lock *blocker,
+					struct file_lock *waiter)
+{
+	spin_lock(&file_lock_lock);
+	__locks_insert_block(blocker, waiter);
+	spin_unlock(&file_lock_lock);
 }
 
 /*
  * Wake up processes blocked waiting for blocker.
  *
- * Must be called with the file_lock_lock held!
+ * Must be called with the inode->i_lock held!
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
+	spin_lock(&file_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
 
@@ -562,10 +588,13 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		else
 			wake_up(&waiter->fl_wait);
 	}
+	spin_unlock(&file_lock_lock);
 }
 
 /* Insert file lock fl into an inode's lock list at the position indicated
  * by pos. At the same time add the lock to the global file lock list.
+ *
+ * Must be called with the i_lock held!
  */
 static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
 {
@@ -583,6 +612,8 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
  * Wake up processes that are blocked waiting for this lock,
  * notify the FS that the lock has been cleared and
  * finally free the lock.
+ *
+ * Must be called with the i_lock held!
  */
 static void locks_delete_lock(struct file_lock **thisfl_p)
 {
@@ -652,8 +683,9 @@ void
 posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
+	struct inode *inode = file_inode(filp);
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (cfl = file_inode(filp)->i_flock; cfl; cfl = cfl->fl_next) {
 		if (!IS_POSIX(cfl))
 			continue;
@@ -666,7 +698,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 			fl->fl_pid = pid_vnr(cfl->fl_nspid);
 	} else
 		fl->fl_type = F_UNLCK;
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return;
 }
 EXPORT_SYMBOL(posix_test_lock);
@@ -710,6 +742,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	return NULL;
 }
 
+/* Must be called with the file_lock_lock held! */
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
@@ -745,7 +778,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 			return -ENOMEM;
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	if (request->fl_flags & FL_ACCESS)
 		goto find_conflict;
 
@@ -775,9 +808,9 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	 * give it the opportunity to lock the file.
 	 */
 	if (found) {
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		cond_resched();
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
 
 find_conflict:
@@ -804,7 +837,7 @@ find_conflict:
 	error = 0;
 
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	if (new_fl)
 		locks_free_lock(new_fl);
 	return error;
@@ -834,7 +867,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		new_fl2 = locks_alloc_lock();
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	/*
 	 * New lock request. Walk all POSIX locks and look for conflicts. If
 	 * there are any, either return error or put the request on the
@@ -852,11 +885,17 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			error = -EAGAIN;
 			if (!(request->fl_flags & FL_SLEEP))
 				goto out;
+			/*
+			 * Deadlock detection and insertion into the blocked
+			 * locks list must be done while holding the same lock!
+			 */
 			error = -EDEADLK;
-			if (posix_locks_deadlock(request, fl))
-				goto out;
-			error = FILE_LOCK_DEFERRED;
-			locks_insert_block(fl, request);
+			spin_lock(&file_lock_lock);
+			if (likely(!posix_locks_deadlock(request, fl))) {
+				error = FILE_LOCK_DEFERRED;
+				__locks_insert_block(fl, request);
+			}
+			spin_unlock(&file_lock_lock);
 			goto out;
   		}
   	}
@@ -1006,7 +1045,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		locks_wake_up_blocks(left);
 	}
  out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	/*
 	 * Free any unused locks.
 	 */
@@ -1081,14 +1120,14 @@ int locks_mandatory_locked(struct inode *inode)
 	/*
 	 * Search the lock list for this inode for any POSIX locks.
 	 */
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!IS_POSIX(fl))
 			continue;
 		if (fl->fl_owner != owner)
 			break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return fl ? -EAGAIN : 0;
 }
 
@@ -1231,7 +1270,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 	if (IS_ERR(new_fl))
 		return PTR_ERR(new_fl);
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 
 	time_out_leases(inode);
 
@@ -1281,11 +1320,11 @@ restart:
 			break_time++;
 	}
 	locks_insert_block(flock, new_fl);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
 						!new_fl->fl_next, break_time);
-	lock_flocks();
-	__locks_delete_block(new_fl);
+	spin_lock(&inode->i_lock);
+	locks_delete_block(new_fl);
 	if (error >= 0) {
 		if (error == 0)
 			time_out_leases(inode);
@@ -1302,7 +1341,7 @@ restart:
 	}
 
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	locks_free_lock(new_fl);
 	return error;
 }
@@ -1355,9 +1394,10 @@ EXPORT_SYMBOL(lease_get_mtime);
 int fcntl_getlease(struct file *filp)
 {
 	struct file_lock *fl;
+	struct inode *inode = file_inode(filp);
 	int type = F_UNLCK;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	time_out_leases(file_inode(filp));
 	for (fl = file_inode(filp)->i_flock; fl && IS_LEASE(fl);
 			fl = fl->fl_next) {
@@ -1366,7 +1406,7 @@ int fcntl_getlease(struct file *filp)
 			break;
 		}
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return type;
 }
 
@@ -1460,7 +1500,7 @@ static int generic_delete_lease(struct file *filp, struct file_lock **flp)
  *	The (input) flp->fl_lmops->lm_break function is required
  *	by break_lease().
  *
- *	Called with file_lock_lock held.
+ *	Called with inode->i_lock held.
  */
 int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 {
@@ -1529,11 +1569,12 @@ static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 
 int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 {
+	struct inode *inode = file_inode(filp);
 	int error;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	error = __vfs_setlease(filp, arg, lease);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 	return error;
 }
@@ -1551,6 +1592,7 @@ static int do_fcntl_delete_lease(struct file *filp)
 static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 {
 	struct file_lock *fl, *ret;
+	struct inode *inode = file_inode(filp);
 	struct fasync_struct *new;
 	int error;
 
@@ -1564,10 +1606,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 		return -ENOMEM;
 	}
 	ret = fl;
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	error = __vfs_setlease(filp, arg, &ret);
 	if (error) {
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		locks_free_lock(fl);
 		goto out_free_fasync;
 	}
@@ -1584,7 +1626,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 		new = NULL;
 
 	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 
 out_free_fasync:
 	if (new)
@@ -2108,7 +2150,7 @@ void locks_remove_flock(struct file *filp)
 			fl.fl_ops->fl_release_private(&fl);
 	}
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	before = &inode->i_flock;
 
 	while ((fl = *before) != NULL) {
@@ -2126,7 +2168,7 @@ void locks_remove_flock(struct file *filp)
  		}
 		before = &fl->fl_next;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 }
 
 /**
@@ -2140,12 +2182,12 @@ posix_unblock_lock(struct file_lock *waiter)
 {
 	int status = 0;
 
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	if (waiter->fl_next)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 	return status;
 }
 EXPORT_SYMBOL(posix_unblock_lock);
@@ -2259,7 +2301,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
 {
 	loff_t *p = f->private;
 
-	lock_flocks();
+	spin_lock(&file_lock_lock);
 	*p = (*pos + 1);
 	return seq_list_start(&file_lock_list, *pos);
 }
@@ -2273,7 +2315,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 
 static void locks_stop(struct seq_file *f, void *v)
 {
-	unlock_flocks();
+	spin_unlock(&file_lock_lock);
 }
 
 static const struct seq_operations locks_seq_operations = {
@@ -2320,7 +2362,8 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_flocks();
+
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if (fl->fl_type == F_RDLCK)
@@ -2337,7 +2380,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return result;
 }
 
@@ -2360,7 +2403,8 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_flocks();
+
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
@@ -2375,7 +2419,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return result;
 }
 
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 57db324..7ec4814 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -73,20 +73,20 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
 	if (inode->i_flock == NULL)
 		goto out;
 
-	/* Protect inode->i_flock using the file locks lock */
-	lock_flocks();
+	/* Protect inode->i_flock using the i_lock */
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file) != ctx)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = nfs4_lock_delegation_recall(fl, state, stateid);
 		if (status < 0)
 			goto out;
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	return status;
 }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 1fab140..ff10b4a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1373,13 +1373,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 	/* Guard against delegation returns and new lock/unlock calls */
 	down_write(&nfsi->rwsem);
 	/* Protect inode->i_flock using the BKL */
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file)->state != state)
 			continue;
-		unlock_flocks();
+		spin_unlock(&inode->i_lock);
 		status = ops->recover_lock(state, fl);
 		switch (status) {
 			case 0:
@@ -1406,9 +1406,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 				/* kill_proc(fl->fl_pid, SIGLOST, 1); */
 				status = 0;
 		}
-		lock_flocks();
+		spin_lock(&inode->i_lock);
 	}
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 out:
 	up_write(&nfsi->rwsem);
 	return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..f170518 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2645,13 +2645,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 
 	list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
 
-	/* only place dl_time is set. protected by lock_flocks*/
+	/* Only place dl_time is set; protected by i_lock: */
 	dp->dl_time = get_seconds();
 
 	nfsd4_cb_recall(dp);
 }
 
-/* Called from break_lease() with lock_flocks() held. */
+/* Called from break_lease() with i_lock held. */
 static void nfsd_break_deleg_cb(struct file_lock *fl)
 {
 	struct nfs4_file *fp = (struct nfs4_file *)fl->fl_owner;
@@ -4520,7 +4520,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 	struct inode *inode = filp->fi_inode;
 	int status = 0;
 
-	lock_flocks();
+	spin_lock(&inode->i_lock);
 	for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
 		if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
 			status = 1;
@@ -4528,7 +4528,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 		}
 	}
 out:
-	unlock_flocks();
+	spin_unlock(&inode->i_lock);
 	return status;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cba0fb5..805320b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1024,8 +1024,6 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void lock_flocks(void);
-extern void unlock_flocks(void);
 #else /* !CONFIG_FILE_LOCKING */
 static inline int fcntl_getlk(struct file *file, struct flock __user *user)
 {
@@ -1166,15 +1164,6 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
 {
 	return 1;
 }
-
-static inline void lock_flocks(void)
-{
-}
-
-static inline void unlock_flocks(void)
-{
-}
-
 #endif /* !CONFIG_FILE_LOCKING */
 
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 08/14] locks: avoid taking global lock if possible when waking up blocked waiters
  2013-06-21 12:58 ` Jeff Layton
  (?)
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Since we always hold the i_lock when inserting a new waiter onto the
fl_block list, we can avoid taking the global lock at all if we find
that it's empty when we go to wake up blocked waiters.
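
As a rough illustration (not part of the patch; all names below are made
up), the same invariant can be modelled in user space: because insertions
always hold both the per-file lock and the global lock, a waker that already
holds the per-file lock can test for emptiness without the global lock, and
only falls back to taking it when there is real work to do.

#include <pthread.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_lock {
	pthread_mutex_t i_lock;		/* stand-in for inode->i_lock */
	int nr_blocked;			/* stand-in for the fl_block list */
};

/* Caller holds fl->i_lock, mirroring locks_wake_up_blocks(). */
static void wake_up_blocks(struct fake_lock *fl)
{
	if (fl->nr_blocked == 0)	/* fast path: skip the global lock */
		return;

	pthread_mutex_lock(&global_lock);
	while (fl->nr_blocked > 0)	/* re-checked under the global lock */
		fl->nr_blocked--;	/* "wake" one waiter */
	pthread_mutex_unlock(&global_lock);
}

int main(void)
{
	struct fake_lock fl = { .i_lock = PTHREAD_MUTEX_INITIALIZER };

	pthread_mutex_lock(&fl.i_lock);
	wake_up_blocks(&fl);		/* empty list: takes the fast path */
	pthread_mutex_unlock(&fl.i_lock);
	return 0;
}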

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ce302d4..84e269f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -548,7 +548,10 @@ static void locks_delete_block(struct file_lock *waiter)
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  *
- * Must be called with file_lock_lock held!
+ * Must be called with both the i_lock and file_lock_lock held. The fl_block
+ * list itself is protected by the file_lock_lock, but by ensuring that the
+ * i_lock is also held on insertions we can avoid taking the file_lock_lock
+ * in some cases when we see that the fl_block list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
@@ -576,6 +579,16 @@ static void locks_insert_block(struct file_lock *blocker,
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
+	/*
+	 * Avoid taking global lock if list is empty. This is safe since new
+	 * blocked requests are only added to the list under the i_lock, and
+	 * the i_lock is always held here. Note that removal from the fl_block
+	 * list does not require the i_lock, so we must recheck list_empty()
+	 * after acquiring the file_lock_lock.
+	 */
+	if (list_empty(&blocker->fl_block))
+		return;
+
 	spin_lock(&file_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 08/14] locks: avoid taking global lock if possible when waking up blocked waiters
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: dhowells, sage, smfrench, swhiteho, Trond.Myklebust, akpm,
	linux-kernel, linux-afs, ceph-devel, linux-cifs, samba-technical,
	cluster-devel, linux-nfs, linux-fsdevel, piastryyy

Since we always hold the i_lock when inserting a new waiter onto the
fl_block list, we can avoid taking the global lock at all if we find
that it's empty when we go to wake up blocked waiters.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ce302d4..84e269f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -548,7 +548,10 @@ static void locks_delete_block(struct file_lock *waiter)
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  *
- * Must be called with file_lock_lock held!
+ * Must be called with both the i_lock and file_lock_lock held. The fl_block
+ * list itself is protected by the file_lock_lock, but by ensuring that the
+ * i_lock is also held on insertions we can avoid taking the file_lock_lock
+ * in some cases when we see that the fl_block list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
@@ -576,6 +579,16 @@ static void locks_insert_block(struct file_lock *blocker,
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
+	/*
+	 * Avoid taking global lock if list is empty. This is safe since new
+	 * blocked requests are only added to the list under the i_lock, and
+	 * the i_lock is always held here. Note that removal from the fl_block
+	 * list does not require the i_lock, so we must recheck list_empty()
+	 * after acquiring the file_lock_lock.
+	 */
+	if (list_empty(&blocker->fl_block))
+		return;
+
 	spin_lock(&file_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Cluster-devel] [PATCH v4 08/14] locks: avoid taking global lock if possible when waking up blocked waiters
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Since we always hold the i_lock when inserting a new waiter onto the
fl_block list, we can avoid taking the global lock at all if we find
that it's empty when we go to wake up blocked waiters.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ce302d4..84e269f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -548,7 +548,10 @@ static void locks_delete_block(struct file_lock *waiter)
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  *
- * Must be called with file_lock_lock held!
+ * Must be called with both the i_lock and file_lock_lock held. The fl_block
+ * list itself is protected by the file_lock_lock, but by ensuring that the
+ * i_lock is also held on insertions we can avoid taking the file_lock_lock
+ * in some cases when we see that the fl_block list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
@@ -576,6 +579,16 @@ static void locks_insert_block(struct file_lock *blocker,
  */
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
+	/*
+	 * Avoid taking global lock if list is empty. This is safe since new
+	 * blocked requests are only added to the list under the i_lock, and
+	 * the i_lock is always held here. Note that removal from the fl_block
+	 * list does not require the i_lock, so we must recheck list_empty()
+	 * after acquiring the file_lock_lock.
+	 */
+	if (list_empty(&blocker->fl_block))
+		return;
+
 	spin_lock(&file_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 09/14] locks: convert fl_link to a hlist_node
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: dhowells, sage, smfrench, swhiteho, Trond.Myklebust, akpm,
	linux-kernel, linux-afs, ceph-devel, linux-cifs, samba-technical,
	cluster-devel, linux-nfs, linux-fsdevel, piastryyy

Testing has shown that iterating over the blocked_list for deadlock
detection turns out to be a bottleneck. In order to alleviate that,
begin the process of turning it into a hashtable. We start by turning
the fl_link into a hlist_node and the global lists into hlists. A later
patch will do the conversion of the blocked_list to a hashtable.
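
For reference, the hlist types this moves to look like this (as defined in
include/linux/types.h). The head is a single pointer, half the size of a
list_head, which is what makes per-bucket heads cheap once this list becomes
a hashtable, while the pprev back-pointer keeps unhashing O(1):

struct hlist_head {
	struct hlist_node *first;
};

struct hlist_node {
	struct hlist_node *next, **pprev;
};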

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c         |   24 ++++++++++++------------
 include/linux/fs.h |    2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 84e269f..941b714 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -157,13 +157,13 @@ int lease_break_time = 45;
  * The global file_lock_list is only used for displaying /proc/locks. Protected
  * by the file_lock_lock.
  */
-static LIST_HEAD(file_lock_list);
+static HLIST_HEAD(file_lock_list);
 
 /*
  * The blocked_list is used to find POSIX lock loops for deadlock detection.
  * Protected by file_lock_lock.
  */
-static LIST_HEAD(blocked_list);
+static HLIST_HEAD(blocked_list);
 
 /*
  * This lock protects the blocked_list, and the file_lock_list. Generally, if
@@ -188,7 +188,7 @@ static struct kmem_cache *filelock_cache __read_mostly;
 
 static void locks_init_lock_heads(struct file_lock *fl)
 {
-	INIT_LIST_HEAD(&fl->fl_link);
+	INIT_HLIST_NODE(&fl->fl_link);
 	INIT_LIST_HEAD(&fl->fl_block);
 	init_waitqueue_head(&fl->fl_wait);
 }
@@ -222,7 +222,7 @@ void locks_free_lock(struct file_lock *fl)
 {
 	BUG_ON(waitqueue_active(&fl->fl_wait));
 	BUG_ON(!list_empty(&fl->fl_block));
-	BUG_ON(!list_empty(&fl->fl_link));
+	BUG_ON(!hlist_unhashed(&fl->fl_link));
 
 	locks_release_private(fl);
 	kmem_cache_free(filelock_cache, fl);
@@ -500,7 +500,7 @@ static inline void
 locks_insert_global_locks(struct file_lock *fl)
 {
 	spin_lock(&file_lock_lock);
-	list_add_tail(&fl->fl_link, &file_lock_list);
+	hlist_add_head(&fl->fl_link, &file_lock_list);
 	spin_unlock(&file_lock_lock);
 }
 
@@ -508,20 +508,20 @@ static inline void
 locks_delete_global_locks(struct file_lock *fl)
 {
 	spin_lock(&file_lock_lock);
-	list_del_init(&fl->fl_link);
+	hlist_del_init(&fl->fl_link);
 	spin_unlock(&file_lock_lock);
 }
 
 static inline void
 locks_insert_global_blocked(struct file_lock *waiter)
 {
-	list_add(&waiter->fl_link, &blocked_list);
+	hlist_add_head(&waiter->fl_link, &blocked_list);
 }
 
 static inline void
 locks_delete_global_blocked(struct file_lock *waiter)
 {
-	list_del_init(&waiter->fl_link);
+	hlist_del_init(&waiter->fl_link);
 }
 
 /* Remove waiter from blocker's block list.
@@ -748,7 +748,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 {
 	struct file_lock *fl;
 
-	list_for_each_entry(fl, &blocked_list, fl_link) {
+	hlist_for_each_entry(fl, &blocked_list, fl_link) {
 		if (posix_same_owner(fl, block_fl))
 			return fl->fl_next;
 	}
@@ -2300,7 +2300,7 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct file_lock *fl, *bfl;
 
-	fl = list_entry(v, struct file_lock, fl_link);
+	fl = hlist_entry(v, struct file_lock, fl_link);
 
 	lock_get_status(f, fl, *((loff_t *)f->private), "");
 
@@ -2316,14 +2316,14 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
 
 	spin_lock(&file_lock_lock);
 	*p = (*pos + 1);
-	return seq_list_start(&file_lock_list, *pos);
+	return seq_hlist_start(&file_lock_list, *pos);
 }
 
 static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 {
 	loff_t *p = f->private;
 	++*p;
-	return seq_list_next(v, &file_lock_list, pos);
+	return seq_hlist_next(v, &file_lock_list, pos);
 }
 
 static void locks_stop(struct seq_file *f, void *v)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 805320b..85ef56a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -946,7 +946,7 @@ int locks_in_grace(struct net *);
  */
 struct file_lock {
 	struct file_lock *fl_next;	/* singly linked list for this inode  */
-	struct list_head fl_link;	/* doubly linked list of all locks */
+	struct hlist_node fl_link;	/* node in global lists */
 	struct list_head fl_block;	/* circular list of blocked processes */
 	fl_owner_t fl_owner;
 	unsigned int fl_flags;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Cluster-devel] [PATCH v4 09/14] locks: convert fl_link to a hlist_node
@ 2013-06-21 12:58   ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Testing has shown that iterating over the blocked_list for deadlock
detection turns out to be a bottleneck. In order to alleviate that,
begin the process of turning it into a hashtable. We start by turning
the fl_link into a hlist_node and the global lists into hlists. A later
patch will do the conversion of the blocked_list to a hashtable.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c         |   24 ++++++++++++------------
 include/linux/fs.h |    2 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 84e269f..941b714 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -157,13 +157,13 @@ int lease_break_time = 45;
  * The global file_lock_list is only used for displaying /proc/locks. Protected
  * by the file_lock_lock.
  */
-static LIST_HEAD(file_lock_list);
+static HLIST_HEAD(file_lock_list);
 
 /*
  * The blocked_list is used to find POSIX lock loops for deadlock detection.
  * Protected by file_lock_lock.
  */
-static LIST_HEAD(blocked_list);
+static HLIST_HEAD(blocked_list);
 
 /*
  * This lock protects the blocked_list, and the file_lock_list. Generally, if
@@ -188,7 +188,7 @@ static struct kmem_cache *filelock_cache __read_mostly;
 
 static void locks_init_lock_heads(struct file_lock *fl)
 {
-	INIT_LIST_HEAD(&fl->fl_link);
+	INIT_HLIST_NODE(&fl->fl_link);
 	INIT_LIST_HEAD(&fl->fl_block);
 	init_waitqueue_head(&fl->fl_wait);
 }
@@ -222,7 +222,7 @@ void locks_free_lock(struct file_lock *fl)
 {
 	BUG_ON(waitqueue_active(&fl->fl_wait));
 	BUG_ON(!list_empty(&fl->fl_block));
-	BUG_ON(!list_empty(&fl->fl_link));
+	BUG_ON(!hlist_unhashed(&fl->fl_link));
 
 	locks_release_private(fl);
 	kmem_cache_free(filelock_cache, fl);
@@ -500,7 +500,7 @@ static inline void
 locks_insert_global_locks(struct file_lock *fl)
 {
 	spin_lock(&file_lock_lock);
-	list_add_tail(&fl->fl_link, &file_lock_list);
+	hlist_add_head(&fl->fl_link, &file_lock_list);
 	spin_unlock(&file_lock_lock);
 }
 
@@ -508,20 +508,20 @@ static inline void
 locks_delete_global_locks(struct file_lock *fl)
 {
 	spin_lock(&file_lock_lock);
-	list_del_init(&fl->fl_link);
+	hlist_del_init(&fl->fl_link);
 	spin_unlock(&file_lock_lock);
 }
 
 static inline void
 locks_insert_global_blocked(struct file_lock *waiter)
 {
-	list_add(&waiter->fl_link, &blocked_list);
+	hlist_add_head(&waiter->fl_link, &blocked_list);
 }
 
 static inline void
 locks_delete_global_blocked(struct file_lock *waiter)
 {
-	list_del_init(&waiter->fl_link);
+	hlist_del_init(&waiter->fl_link);
 }
 
 /* Remove waiter from blocker's block list.
@@ -748,7 +748,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 {
 	struct file_lock *fl;
 
-	list_for_each_entry(fl, &blocked_list, fl_link) {
+	hlist_for_each_entry(fl, &blocked_list, fl_link) {
 		if (posix_same_owner(fl, block_fl))
 			return fl->fl_next;
 	}
@@ -2300,7 +2300,7 @@ static int locks_show(struct seq_file *f, void *v)
 {
 	struct file_lock *fl, *bfl;
 
-	fl = list_entry(v, struct file_lock, fl_link);
+	fl = hlist_entry(v, struct file_lock, fl_link);
 
 	lock_get_status(f, fl, *((loff_t *)f->private), "");
 
@@ -2316,14 +2316,14 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
 
 	spin_lock(&file_lock_lock);
 	*p = (*pos + 1);
-	return seq_list_start(&file_lock_list, *pos);
+	return seq_hlist_start(&file_lock_list, *pos);
 }
 
 static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 {
 	loff_t *p = f->private;
 	++*p;
-	return seq_list_next(v, &file_lock_list, pos);
+	return seq_hlist_next(v, &file_lock_list, pos);
 }
 
 static void locks_stop(struct seq_file *f, void *v)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 805320b..85ef56a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -946,7 +946,7 @@ int locks_in_grace(struct net *);
  */
 struct file_lock {
 	struct file_lock *fl_next;	/* singly linked list for this inode  */
-	struct list_head fl_link;	/* doubly linked list of all locks */
+	struct hlist_node fl_link;	/* node in global lists */
 	struct list_head fl_block;	/* circular list of blocked processes */
 	fl_owner_t fl_owner;
 	unsigned int fl_flags;
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 10/14] locks: turn the blocked_list into a hashtable
  2013-06-21 12:58 ` Jeff Layton
  (?)
@ 2013-06-21 12:58     ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, matthew-Ztpu424NOJ8,
	bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, sage-4GqslpFJ+cxBDgjK7y7TUQ,
	smfrench-Re5JQEeQqe8AvxtiuMwx3w, swhiteho-H+wXaHxf7aLQT0dZR+AlfA,
	Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ,
	cluster-devel-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	piastryyy-Re5JQEeQqe8AvxtiuMwx3w

Break up the blocked_list into a hashtable, using the fl_owner as a key.
This speeds up searching the hash chains, which is especially significant
for deadlock detection.

Note that the initial implementation assumes that hashing on fl_owner is
sufficient. In most cases it should be, with the notable exception being
server-side lockd, which compares ownership using a tuple of the
nlm_host and the pid sent in the lock request. So, this may degrade to a
single hash bucket when you only have a single NFS client. That will be
addressed in a later patch.

The careful observer may note that this patch leaves the file_lock_list
alone. There's much less of a case for turning the file_lock_list into a
hashtable. The only user of that list is the code that generates
/proc/locks, and it always walks the entire list.
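
To illustrate why hashing by owner helps (not part of the patch; the kernel
side uses DEFINE_HASHTABLE()/hash_add()/hash_for_each_possible(), and every
name below is made up), here is a user-space sketch: a deadlock check only
has to walk the single bucket that the owner hashes to, instead of scanning
every blocked lock in the system.

#include <stddef.h>

#define NBUCKETS 128			/* 2^BLOCKED_HASH_BITS in this series */

struct blocked_entry {
	void *owner;			/* stand-in for fl_owner */
	struct blocked_entry *next;
};

static struct blocked_entry *blocked_hash[NBUCKETS];

static unsigned int hash_owner(void *owner)
{
	return ((unsigned long)owner >> 4) % NBUCKETS;
}

static void insert_blocked(struct blocked_entry *e)
{
	unsigned int b = hash_owner(e->owner);

	e->next = blocked_hash[b];
	blocked_hash[b] = e;
}

/* Only one bucket is searched, not the whole blocked list. */
static struct blocked_entry *find_blocked(void *owner)
{
	struct blocked_entry *e;

	for (e = blocked_hash[hash_owner(owner)]; e; e = e->next)
		if (e->owner == owner)
			return e;
	return NULL;
}

int main(void)
{
	struct blocked_entry e = { .owner = (void *)0x1234, .next = NULL };

	insert_blocked(&e);
	return find_blocked((void *)0x1234) == &e ? 0 : 1;
}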

Signed-off-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
---
 fs/locks.c |   25 +++++++++++++++++--------
 1 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 941b714..71d847c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -126,6 +126,7 @@
 #include <linux/time.h>
 #include <linux/rcupdate.h>
 #include <linux/pid_namespace.h>
+#include <linux/hashtable.h>
 
 #include <asm/uaccess.h>
 
@@ -160,13 +161,21 @@ int lease_break_time = 45;
 static HLIST_HEAD(file_lock_list);
 
 /*
- * The blocked_list is used to find POSIX lock loops for deadlock detection.
- * Protected by file_lock_lock.
+ * The blocked_hash is used to find POSIX lock loops for deadlock detection.
+ * It is protected by file_lock_lock.
+ *
+ * We hash locks by lockowner in order to optimize searching for the lock a
+ * particular lockowner is waiting on.
+ *
+ * FIXME: make this value scale via some heuristic? We generally will want more
+ * buckets when we have more lockowners holding locks, but that's a little
+ * difficult to determine without knowing what the workload will look like.
  */
-static HLIST_HEAD(blocked_list);
+#define BLOCKED_HASH_BITS	7
+static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
 
 /*
- * This lock protects the blocked_list, and the file_lock_list. Generally, if
+ * This lock protects the blocked_hash and the file_lock_list. Generally, if
  * you're accessing one of those lists, you want to be holding this lock.
  *
  * In addition, it also protects the fl->fl_block list, and the fl->fl_next
@@ -515,13 +524,13 @@ locks_delete_global_locks(struct file_lock *fl)
 static inline void
 locks_insert_global_blocked(struct file_lock *waiter)
 {
-	hlist_add_head(&waiter->fl_link, &blocked_list);
+	hash_add(blocked_hash, &waiter->fl_link, (unsigned long)waiter->fl_owner);
 }
 
 static inline void
 locks_delete_global_blocked(struct file_lock *waiter)
 {
-	hlist_del_init(&waiter->fl_link);
+	hash_del(&waiter->fl_link);
 }
 
 /* Remove waiter from blocker's block list.
@@ -748,7 +757,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 {
 	struct file_lock *fl;
 
-	hlist_for_each_entry(fl, &blocked_list, fl_link) {
+	hash_for_each_possible(blocked_hash, fl, fl_link, (unsigned long)block_fl->fl_owner) {
 		if (posix_same_owner(fl, block_fl))
 			return fl->fl_next;
 	}
@@ -884,7 +893,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 	/*
 	 * New lock request. Walk all POSIX locks and look for conflicts. If
 	 * there are any, either return error or put the request on the
-	 * blocker's list of waiters and the global blocked_list.
+	 * blocker's list of waiters and the global blocked_hash.
 	 */
 	if (request->fl_type != F_UNLCK) {
 		for_each_lock(inode, before) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 11/14] locks: add a new "lm_owner_key" lock operation
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

Currently, the key used to hash a blocked request into the blocked_hash
is calculated from the fl_owner field alone. That's valid in most cases,
with the notable exception of server-side lockd, which compares lock
ownership using both fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.
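
As an illustration only (not part of the patch): a lock manager that keeps
its own per-owner bookkeeping behind fl_owner could wire up the new
operation along the same lines. The example_owner structure and helpers
below are hypothetical.

#include <linux/fs.h>

/* hypothetical per-owner bookkeeping stored in fl->fl_owner */
struct example_owner {
	unsigned long id;
};

static int example_same_owner(struct file_lock *fl1, struct file_lock *fl2)
{
	struct example_owner *o1 = fl1->fl_owner;
	struct example_owner *o2 = fl2->fl_owner;

	return o1->id == o2->id;
}

/* hash blocked requests by the owner's id rather than the pointer value */
static unsigned long example_owner_key(struct file_lock *fl)
{
	struct example_owner *owner = fl->fl_owner;

	return owner->id;
}

static const struct lock_manager_operations example_lock_operations = {
	.lm_compare_owner	= example_same_owner,
	.lm_owner_key		= example_owner_key,
};

The important property is that any two locks for which lm_compare_owner
returns true must also hash to the same key, just as nlmsvc_owner_key
below folds together the same two fields that nlmsvc_same_owner compares.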

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 Documentation/filesystems/Locking |   16 +++++++++++-----
 fs/lockd/svclock.c                |   12 ++++++++++++
 fs/locks.c                        |   12 ++++++++++--
 include/linux/fs.h                |    1 +
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 413685f..dfeb01b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -351,6 +351,7 @@ fl_release_private:	maybe		no
 ----------------------- lock_manager_operations ---------------------------
 prototypes:
 	int (*lm_compare_owner)(struct file_lock *, struct file_lock *);
+	unsigned long (*lm_owner_key)(struct file_lock *);
 	void (*lm_notify)(struct file_lock *);  /* unblock callback */
 	int (*lm_grant)(struct file_lock *, struct file_lock *, int);
 	void (*lm_break)(struct file_lock *); /* break_lease callback */
@@ -360,16 +361,21 @@ locking rules:
 
 			inode->i_lock	file_lock_lock	may block
 lm_compare_owner:	yes[1]		maybe		no
+lm_owner_key		yes[1]		yes		no
 lm_notify:		yes		yes		no
 lm_grant:		no		no		no
 lm_break:		yes		no		no
 lm_change		yes		no		no
 
-[1]:	->lm_compare_owner is generally called with *an* inode->i_lock held. It
-may not be the i_lock of the inode for either file_lock being compared! This is
-the case with deadlock detection, since the code has to chase down the owners
-of locks that may be entirely unrelated to the one on which the lock is being
-acquired. When doing a search for deadlocks, the file_lock_lock is also held.
+[1]:	->lm_compare_owner and ->lm_owner_key are generally called with
+*an* inode->i_lock held. It may not be the i_lock of the inode
+associated with either file_lock argument! This is the case with deadlock
+detection, since the code has to chase down the owners of locks that may
+be entirely unrelated to the one on which the lock is being acquired.
+For deadlock detection however, the file_lock_lock is also held. The
+fact that these locks are held ensures that the file_locks do not
+disappear out from under you while doing the comparison or generating an
+owner key.
 
 --------------------------- buffer_head -----------------------------------
 prototypes:
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index a469098..067778b 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -744,8 +744,20 @@ static int nlmsvc_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 	return fl1->fl_owner == fl2->fl_owner && fl1->fl_pid == fl2->fl_pid;
 }
 
+/*
+ * Since NLM uses two "keys" for tracking locks, we need to hash them down
+ * to one for the blocked_hash. Here, we're just xor'ing the host address
+ * with the pid in order to create a key value for picking a hash bucket.
+ */
+static unsigned long
+nlmsvc_owner_key(struct file_lock *fl)
+{
+	return (unsigned long)fl->fl_owner ^ (unsigned long)fl->fl_pid;
+}
+
 const struct lock_manager_operations nlmsvc_lock_operations = {
 	.lm_compare_owner = nlmsvc_same_owner,
+	.lm_owner_key = nlmsvc_owner_key,
 	.lm_notify = nlmsvc_notify_blocked,
 	.lm_grant = nlmsvc_grant_deferred,
 };
diff --git a/fs/locks.c b/fs/locks.c
index 71d847c..6242e0b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -521,10 +521,18 @@ locks_delete_global_locks(struct file_lock *fl)
 	spin_unlock(&file_lock_lock);
 }
 
+static unsigned long
+posix_owner_key(struct file_lock *fl)
+{
+	if (fl->fl_lmops && fl->fl_lmops->lm_owner_key)
+		return fl->fl_lmops->lm_owner_key(fl);
+	return (unsigned long)fl->fl_owner;
+}
+
 static inline void
 locks_insert_global_blocked(struct file_lock *waiter)
 {
-	hash_add(blocked_hash, &waiter->fl_link, (unsigned long)waiter->fl_owner);
+	hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
 }
 
 static inline void
@@ -757,7 +765,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 {
 	struct file_lock *fl;
 
-	hash_for_each_possible(blocked_hash, fl, fl_link, (unsigned long)block_fl->fl_owner) {
+	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
 		if (posix_same_owner(fl, block_fl))
 			return fl->fl_next;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 85ef56a..e42e04f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -908,6 +908,7 @@ struct file_lock_operations {
 
 struct lock_manager_operations {
 	int (*lm_compare_owner)(struct file_lock *, struct file_lock *);
+	unsigned long (*lm_owner_key)(struct file_lock *);
 	void (*lm_notify)(struct file_lock *);	/* unblock callback */
 	int (*lm_grant)(struct file_lock *, struct file_lock *, int);
 	void (*lm_break)(struct file_lock *);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 12/14] locks: give the blocked_hash its own spinlock
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.
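
Summarizing the hierarchy that results (a sketch inferred from the hunks
below, not text taken from the patch):

/*
 * Lock ordering after this patch:
 *
 *   inode->i_lock  -> blocked_lock_lock    (blocking a conflicting request)
 *   file_lock_lock -> blocked_lock_lock    (/proc/locks traversal)
 *
 * blocked_lock_lock covers the blocked_hash plus the fl_block list and
 * fl_next pointer of blocked requests; file_lock_lock covers only the
 * global file_lock_list used for /proc/locks.
 */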

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/Locking |   16 +++++++-------
 fs/locks.c                        |   41 +++++++++++++++++++-----------------
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index dfeb01b..cf04448 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -359,20 +359,20 @@ prototypes:
 
 locking rules:
 
-			inode->i_lock	file_lock_lock	may block
-lm_compare_owner:	yes[1]		maybe		no
-lm_owner_key		yes[1]		yes		no
-lm_notify:		yes		yes		no
-lm_grant:		no		no		no
-lm_break:		yes		no		no
-lm_change		yes		no		no
+			inode->i_lock	blocked_lock_lock	may block
+lm_compare_owner:	yes[1]		maybe			no
+lm_owner_key		yes[1]		yes			no
+lm_notify:		yes		yes			no
+lm_grant:		no		no			no
+lm_break:		yes		no			no
+lm_change		yes		no			no
 
 [1]:	->lm_compare_owner and ->lm_owner_key are generally called with
 *an* inode->i_lock held. It may not be the i_lock of the inode
 associated with either file_lock argument! This is the case with deadlock
 detection, since the code has to chase down the owners of locks that may
 be entirely unrelated to the one on which the lock is being acquired.
-For deadlock detection however, the file_lock_lock is also held. The
+For deadlock detection however, the blocked_lock_lock is also held. The
 fact that these locks are held ensures that the file_locks do not
 disappear out from under you while doing the comparison or generating an
 owner key.
diff --git a/fs/locks.c b/fs/locks.c
index 6242e0b..04e2c1f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -159,10 +159,11 @@ int lease_break_time = 45;
  * by the file_lock_lock.
  */
 static HLIST_HEAD(file_lock_list);
+static DEFINE_SPINLOCK(file_lock_lock);
 
 /*
  * The blocked_hash is used to find POSIX lock loops for deadlock detection.
- * It is protected by file_lock_lock.
+ * It is protected by blocked_lock_lock.
  *
  * We hash locks by lockowner in order to optimize searching for the lock a
  * particular lockowner is waiting on.
@@ -175,8 +176,8 @@ static HLIST_HEAD(file_lock_list);
 static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
 
 /*
- * This lock protects the blocked_hash and the file_lock_list. Generally, if
- * you're accessing one of those lists, you want to be holding this lock.
+ * This lock protects the blocked_hash. Generally, if you're accessing it, you
+ * want to be holding this lock.
  *
  * In addition, it also protects the fl->fl_block list, and the fl->fl_next
  * pointer for file_lock structures that are acting as lock requests (in
@@ -191,7 +192,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * both the i_lock and the blocked_lock_lock (acquired in that order). Deleting
  * an entry from the list however only requires the file_lock_lock.
  */
-static DEFINE_SPINLOCK(file_lock_lock);
+static DEFINE_SPINLOCK(blocked_lock_lock);
 
 static struct kmem_cache *filelock_cache __read_mostly;
 
@@ -544,7 +545,7 @@ locks_delete_global_blocked(struct file_lock *waiter)
 /* Remove waiter from blocker's block list.
  * When blocker ends up pointing to itself then the list is empty.
  *
- * Must be called with file_lock_lock held.
+ * Must be called with blocked_lock_lock held.
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
@@ -555,9 +556,9 @@ static void __locks_delete_block(struct file_lock *waiter)
 
 static void locks_delete_block(struct file_lock *waiter)
 {
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_lock_lock);
 	__locks_delete_block(waiter);
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_lock_lock);
 }
 
 /* Insert waiter into blocker's block list.
@@ -565,9 +566,9 @@ static void locks_delete_block(struct file_lock *waiter)
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  *
- * Must be called with both the i_lock and file_lock_lock held. The fl_block
+ * Must be called with both the i_lock and blocked_lock_lock held. The fl_block
  * list itself is protected by the file_lock_list, but by ensuring that the
- * i_lock is also held on insertions we can avoid taking the file_lock_lock
+ * i_lock is also held on insertions we can avoid taking the blocked_lock_lock
  * in some cases when we see that the fl_block list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
@@ -584,9 +585,9 @@ static void __locks_insert_block(struct file_lock *blocker,
 static void locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
 {
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_lock_lock);
 	__locks_insert_block(blocker, waiter);
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_lock_lock);
 }
 
 /*
@@ -601,12 +602,12 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 	 * blocked requests are only added to the list under the i_lock, and
 	 * the i_lock is always held here. Note that removal from the fl_block
 	 * list does not require the i_lock, so we must recheck list_empty()
-	 * after acquiring the file_lock_lock.
+	 * after acquiring the blocked_lock_lock.
 	 */
 	if (list_empty(&blocker->fl_block))
 		return;
 
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_lock_lock);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
 
@@ -618,7 +619,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		else
 			wake_up(&waiter->fl_wait);
 	}
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_lock_lock);
 }
 
 /* Insert file lock fl into an inode's lock list at the position indicated
@@ -772,7 +773,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	return NULL;
 }
 
-/* Must be called with the file_lock_lock held! */
+/* Must be called with the blocked_lock_lock held! */
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
@@ -920,12 +921,12 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			 * locks list must be done while holding the same lock!
 			 */
 			error = -EDEADLK;
-			spin_lock(&file_lock_lock);
+			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
 				__locks_insert_block(fl, request);
 			}
-			spin_unlock(&file_lock_lock);
+			spin_unlock(&blocked_lock_lock);
 			goto out;
   		}
   	}
@@ -2212,12 +2213,12 @@ posix_unblock_lock(struct file_lock *waiter)
 {
 	int status = 0;
 
-	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_next)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
-	spin_unlock(&file_lock_lock);
+	spin_unlock(&blocked_lock_lock);
 	return status;
 }
 EXPORT_SYMBOL(posix_unblock_lock);
@@ -2332,6 +2333,7 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
 	loff_t *p = f->private;
 
 	spin_lock(&file_lock_lock);
+	spin_lock(&blocked_lock_lock);
 	*p = (*pos + 1);
 	return seq_hlist_start(&file_lock_list, *pos);
 }
@@ -2345,6 +2347,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 
 static void locks_stop(struct seq_file *f, void *v)
 {
+	spin_unlock(&blocked_lock_lock);
 	spin_unlock(&file_lock_lock);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 13/14] seq_file: add seq_list_*_percpu helpers
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: linux-cifs, linux-nfs, cluster-devel, sage, samba-technical,
	Trond.Myklebust, linux-kernel, piastryyy, linux-afs, dhowells,
	smfrench, linux-fsdevel, ceph-devel, akpm

When we convert the file_lock_list to a set of percpu lists, we'll need
a way to iterate over them in order to output /proc/locks info. Add
some seq_hlist_*_percpu helpers to handle that.
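
A usage sketch (hypothetical names; the real consumer is the
file_lock_list conversion in the next patch): a seq_file iterator over a
percpu hlist keeps a cpu cursor next to the position, typically in its
private data.

#include <linux/percpu.h>
#include <linux/seq_file.h>

static DEFINE_PER_CPU(struct hlist_head, example_list);

struct example_iter {
	int cpu;
};

static void *example_seq_start(struct seq_file *f, loff_t *pos)
{
	struct example_iter *iter = f->private;

	/* skip *pos entries, walking each CPU's list in turn */
	return seq_hlist_start_percpu(&example_list, &iter->cpu, *pos);
}

static void *example_seq_next(struct seq_file *f, void *v, loff_t *pos)
{
	struct example_iter *iter = f->private;

	/* advance within this CPU's list, or move on to the next CPU's */
	return seq_hlist_next_percpu(v, &example_list, &iter->cpu, pos);
}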

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/seq_file.c            |   54 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/seq_file.h |    6 +++++
 2 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 774c1eb..3135c25 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -921,3 +921,57 @@ struct hlist_node *seq_hlist_next_rcu(void *v,
 		return rcu_dereference(node->next);
 }
 EXPORT_SYMBOL(seq_hlist_next_rcu);
+
+/**
+ * seq_hlist_start_percpu - start an iteration of a percpu hlist array
+ * @head: pointer to percpu array of struct hlist_heads
+ * @cpu:  pointer to cpu "cursor"
+ * @pos:  start position of sequence
+ *
+ * Called at seq_file->op->start().
+ */
+struct hlist_node *
+seq_hlist_start_percpu(struct hlist_head __percpu *head, int *cpu, loff_t pos)
+{
+	struct hlist_node *node;
+
+	for_each_possible_cpu(*cpu) {
+		hlist_for_each(node, per_cpu_ptr(head, *cpu)) {
+			if (pos-- == 0)
+				return node;
+		}
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(seq_hlist_start_percpu);
+
+/**
+ * seq_hlist_next_percpu - move to the next position of the percpu hlist array
+ * @v:    pointer to current hlist_node
+ * @head: pointer to percpu array of struct hlist_heads
+ * @cpu:  pointer to cpu "cursor"
+ * @pos:  start position of sequence
+ *
+ * Called at seq_file->op->next().
+ */
+struct hlist_node *
+seq_hlist_next_percpu(void *v, struct hlist_head __percpu *head,
+			int *cpu, loff_t *pos)
+{
+	struct hlist_node *node = v;
+
+	++*pos;
+
+	if (node->next)
+		return node->next;
+
+	for (*cpu = cpumask_next(*cpu, cpu_possible_mask); *cpu < nr_cpu_ids;
+	     *cpu = cpumask_next(*cpu, cpu_possible_mask)) {
+		struct hlist_head *bucket = per_cpu_ptr(head, *cpu);
+
+		if (!hlist_empty(bucket))
+			return bucket->first;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(seq_hlist_next_percpu);
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index 2da29ac..4e32edc 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -173,4 +173,10 @@ extern struct hlist_node *seq_hlist_start_head_rcu(struct hlist_head *head,
 extern struct hlist_node *seq_hlist_next_rcu(void *v,
 						   struct hlist_head *head,
 						   loff_t *ppos);
+
+/* Helpers for iterating over per-cpu hlist_head-s in seq_files */
+extern struct hlist_node *seq_hlist_start_percpu(struct hlist_head __percpu *head, int *cpu, loff_t pos);
+
+extern struct hlist_node *seq_hlist_next_percpu(void *v, struct hlist_head __percpu *head, int *cpu, loff_t *pos);
+
 #endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 14/14] locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock
  2013-06-21 12:58 ` Jeff Layton
@ 2013-06-21 12:58   ` Jeff Layton
  -1 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-21 12:58 UTC (permalink / raw)
  To: viro, matthew, bfields
  Cc: dhowells, sage, smfrench, swhiteho, Trond.Myklebust, akpm,
	linux-kernel, linux-afs, ceph-devel, linux-cifs, samba-technical,
	cluster-devel, linux-nfs, linux-fsdevel, piastryyy

The file_lock_list is only used for /proc/locks. By far the most common
case is for locks to be put onto the list and then taken off again without
the list ever being traversed.

Help optimize for this use-case by moving to percpu hlist_head-s. At the
same time, we can make the locking less contentious by moving to an
lglock. When iterating over the lists for /proc/locks, we must take the
global lock and then iterate over each CPU's list in turn.

This change necessitates a new fl_link_cpu field to keep track of which
CPU the entry is on. On x86_64 at least, this field is placed within an
existing hole in the struct to avoid growing the size.
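
In outline, the scheme this patch implements looks like the sketch below. This is
illustrative only, with hypothetical demo_* names; the real code in the diff
additionally requires the relevant i_lock around insert/delete and sets up the
lglock and per-cpu list heads in filelock_init().

/* Illustrative sketch only -- see the actual fs/locks.c changes below.
 * Assumes lg_lock_init() and per-cpu INIT_HLIST_HEAD() ran at init time. */
#include <linux/lglock.h>
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/smp.h>

DEFINE_STATIC_LGLOCK(demo_lglock);
static DEFINE_PER_CPU(struct hlist_head, demo_list);

struct demo_entry {
	struct hlist_node	link;
	int			link_cpu;  /* which cpu's list holds this entry */
};

static void demo_insert(struct demo_entry *e)
{
	lg_local_lock(&demo_lglock);		/* only this cpu's spinlock */
	e->link_cpu = smp_processor_id();
	hlist_add_head(&e->link, this_cpu_ptr(&demo_list));
	lg_local_unlock(&demo_lglock);
}

static void demo_delete(struct demo_entry *e)
{
	/* lock whichever cpu's list the entry was originally put on */
	lg_local_lock_cpu(&demo_lglock, e->link_cpu);
	hlist_del_init(&e->link);
	lg_local_unlock_cpu(&demo_lglock, e->link_cpu);
}

static void demo_walk_all(void)
{
	int cpu;
	unsigned long n = 0;
	struct demo_entry *e;

	lg_global_lock(&demo_lglock);		/* take every cpu's spinlock */
	for_each_possible_cpu(cpu)
		hlist_for_each_entry(e, per_cpu_ptr(&demo_list, cpu), link)
			n++;	/* e.g. emit one /proc line per entry */
	lg_global_unlock(&demo_lglock);
}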

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
---
 fs/locks.c         |   69 +++++++++++++++++++++++++++++++++++++---------------
 include/linux/fs.h |    1 +
 2 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 04e2c1f..0f544cd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -127,6 +127,8 @@
 #include <linux/rcupdate.h>
 #include <linux/pid_namespace.h>
 #include <linux/hashtable.h>
+#include <linux/percpu.h>
+#include <linux/lglock.h>
 
 #include <asm/uaccess.h>
 
@@ -155,11 +157,13 @@ int lease_break_time = 45;
 	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
 
 /*
- * The global file_lock_list is only used for displaying /proc/locks. Protected
- * by the file_lock_lock.
+ * The global file_lock_list is only used for displaying /proc/locks, so we
+ * keep a list on each CPU, with each list protected by its own spinlock via
+ * the file_lock_lglock. Note that alterations to the list also require that
+ * the relevant i_lock is held.
  */
-static HLIST_HEAD(file_lock_list);
-static DEFINE_SPINLOCK(file_lock_lock);
+DEFINE_STATIC_LGLOCK(file_lock_lglock);
+static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
 
 /*
  * The blocked_hash is used to find POSIX lock loops for deadlock detection.
@@ -506,20 +510,30 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 	return fl1->fl_owner == fl2->fl_owner;
 }
 
+/* Must be called with the i_lock held! */
 static inline void
 locks_insert_global_locks(struct file_lock *fl)
 {
-	spin_lock(&file_lock_lock);
-	hlist_add_head(&fl->fl_link, &file_lock_list);
-	spin_unlock(&file_lock_lock);
+	lg_local_lock(&file_lock_lglock);
+	fl->fl_link_cpu = smp_processor_id();
+	hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
+	lg_local_unlock(&file_lock_lglock);
 }
 
+/* Must be called with the i_lock held! */
 static inline void
 locks_delete_global_locks(struct file_lock *fl)
 {
-	spin_lock(&file_lock_lock);
+	/*
+	 * Avoid taking lock if already unhashed. This is safe since this check
+	 * is done while holding the i_lock, and new insertions into the list
+	 * also require that it be held.
+	 */
+	if (hlist_unhashed(&fl->fl_link))
+		return;
+	lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
 	hlist_del_init(&fl->fl_link);
-	spin_unlock(&file_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
 }
 
 static unsigned long
@@ -2243,6 +2257,11 @@ EXPORT_SYMBOL_GPL(vfs_cancel_lock);
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
+struct locks_iterator {
+	int	li_cpu;
+	loff_t	li_pos;
+};
+
 static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 			    loff_t id, char *pfx)
 {
@@ -2316,39 +2335,41 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 
 static int locks_show(struct seq_file *f, void *v)
 {
+	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
 
 	fl = hlist_entry(v, struct file_lock, fl_link);
 
-	lock_get_status(f, fl, *((loff_t *)f->private), "");
+	lock_get_status(f, fl, iter->li_pos, "");
 
 	list_for_each_entry(bfl, &fl->fl_block, fl_block)
-		lock_get_status(f, bfl, *((loff_t *)f->private), " ->");
+		lock_get_status(f, bfl, iter->li_pos, " ->");
 
 	return 0;
 }
 
 static void *locks_start(struct seq_file *f, loff_t *pos)
 {
-	loff_t *p = f->private;
+	struct locks_iterator *iter = f->private;
 
-	spin_lock(&file_lock_lock);
+	iter->li_pos = *pos + 1;
+	lg_global_lock(&file_lock_lglock);
 	spin_lock(&blocked_lock_lock);
-	*p = (*pos + 1);
-	return seq_hlist_start(&file_lock_list, *pos);
+	return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
 }
 
 static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 {
-	loff_t *p = f->private;
-	++*p;
-	return seq_hlist_next(v, &file_lock_list, pos);
+	struct locks_iterator *iter = f->private;
+
+	++iter->li_pos;
+	return seq_hlist_next_percpu(v, &file_lock_list, &iter->li_cpu, pos);
 }
 
 static void locks_stop(struct seq_file *f, void *v)
 {
 	spin_unlock(&blocked_lock_lock);
-	spin_unlock(&file_lock_lock);
+	lg_global_unlock(&file_lock_lglock);
 }
 
 static const struct seq_operations locks_seq_operations = {
@@ -2360,7 +2381,8 @@ static const struct seq_operations locks_seq_operations = {
 
 static int locks_open(struct inode *inode, struct file *filp)
 {
-	return seq_open_private(filp, &locks_seq_operations, sizeof(loff_t));
+	return seq_open_private(filp, &locks_seq_operations,
+					sizeof(struct locks_iterator));
 }
 
 static const struct file_operations proc_locks_operations = {
@@ -2460,9 +2482,16 @@ EXPORT_SYMBOL(lock_may_write);
 
 static int __init filelock_init(void)
 {
+	int i;
+
 	filelock_cache = kmem_cache_create("file_lock_cache",
 			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
 
+	lg_lock_init(&file_lock_lglock, "file_lock_lglock");
+
+	for_each_possible_cpu(i)
+		INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
+
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e42e04f..039f2ae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -953,6 +953,7 @@ struct file_lock {
 	unsigned int fl_flags;
 	unsigned char fl_type;
 	unsigned int fl_pid;
+	int fl_link_cpu;		/* what cpu's list is this on? */
 	struct pid *fl_nspid;
 	wait_queue_head_t fl_wait;
 	struct file *fl_file;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 06/14] locks: encapsulate the fl_link list handling
@ 2013-06-25  1:37       ` Stephen Rothwell
  0 siblings, 0 replies; 47+ messages in thread
From: Stephen Rothwell @ 2013-06-25  1:37 UTC (permalink / raw)
  To: Jeff Layton
  Cc: viro, matthew, bfields, dhowells, sage, smfrench, swhiteho,
	Trond.Myklebust, akpm, linux-kernel, linux-afs, ceph-devel,
	linux-cifs, samba-technical, cluster-devel, linux-nfs,
	linux-fsdevel, piastryyy

Hi Jeff,

Thanks for doing all this work!

Trivial comments below.

On Fri, 21 Jun 2013 08:58:14 -0400 Jeff Layton <jlayton@redhat.com> wrote:
>
> +static inline void
> +locks_insert_global_locks(struct file_lock *fl)
> +{
> +	list_add_tail(&fl->fl_link, &file_lock_list);
> +}

We generally do not use "inline" in C files any more and leave it to the
compiler to do that.  Also, without the "inline" these function headers
should all be able to fit on single lines like the others here i.e.

static void locks_insert_global_locks(struct file_lock *fl)

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 06/14] locks: encapsulate the fl_link list handling
@ 2013-06-25 10:32           ` Jeff Layton
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Layton @ 2013-06-25 10:32 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: viro, matthew, bfields, dhowells, sage, smfrench, swhiteho,
	Trond.Myklebust, akpm, linux-kernel, linux-afs, ceph-devel,
	linux-cifs, samba-technical, cluster-devel, linux-nfs,
	linux-fsdevel, piastryyy

On Tue, 25 Jun 2013 11:37:04 +1000
Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi Jeff,
> 
> Thanks for doing all this work!
> 
> Trivial comments below.
> 
> On Fri, 21 Jun 2013 08:58:14 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> >
> > +static inline void
> > +locks_insert_global_locks(struct file_lock *fl)
> > +{
> > +	list_add_tail(&fl->fl_link, &file_lock_list);
> > +}
> 
> We generally do not use "inline" in C files any more and leave it to the
> compiler to do that.  Also, without the "inline" these function headers
> should all be able to fit on single lines like the others here i.e.
> 
> static void locks_insert_global_locks(struct file_lock *fl)
> 

Thanks for helping review.

Usually that makes sense, but doesn't the compiler generally determine
that by counting the number of call sites? In this case, we'll have
several call sites and it probably wouldn't inline the function. That
makes this a little less efficient since we'll have to jump to this
routine, do the list_add_tail and then jump back.

That said, I'm not opposed to doing that since these routines grow a
bit in size later and we'll only do this when a lock is acquired or
dropped. I think Al has already merged most of this set into his
for-next branch though. Perhaps I can do a patch on top of that set
that removes the inline keywords from those functions?
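
(For illustration only, such a follow-up cleanup might contain hunks along the
lines below. This is a hypothetical sketch based on the snippet quoted above, not
a patch that was actually posted; the functions look different by the end of the
series.)

-static inline void
-locks_insert_global_locks(struct file_lock *fl)
+static void locks_insert_global_locks(struct file_lock *fl)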

-- 
Jeff Layton <jlayton@redhat.com>


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2013-06-25 10:33 UTC | newest]

Thread overview: 47+ messages
2013-06-21 12:58 [PATCH v4 00/14] locks: scalability improvements for file locking Jeff Layton
2013-06-21 12:58 ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58 ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 01/14] locks: drop the unused filp argument to posix_unblock_lock Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 02/14] cifs: use posix_unblock_lock instead of locks_delete_block Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 03/14] locks: make generic_add_lease and generic_delete_lease static Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 04/14] locks: comment cleanups and clarifications Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 05/14] locks: make "added" in __posix_lock_file a bool Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58 ` [PATCH v4 06/14] locks: encapsulate the fl_link list handling Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
     [not found]   ` <1371819502-26363-7-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-06-25  1:37     ` Stephen Rothwell
2013-06-25  1:37       ` [Cluster-devel] " Stephen Rothwell
2013-06-25  1:37       ` Stephen Rothwell
     [not found]       ` <20130625113704.89a686a55dcec684e5e99434-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
2013-06-25 10:32         ` Jeff Layton
2013-06-25 10:32           ` [Cluster-devel] " Jeff Layton
2013-06-25 10:32           ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 07/14] locks: protect most of the file_lock handling with i_lock Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 08/14] locks: avoid taking global lock if possible when waking up blocked waiters Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 09/14] locks: convert fl_link to a hlist_node Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
     [not found] ` <1371819502-26363-1-git-send-email-jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-06-21 12:58   ` [PATCH v4 10/14] locks: turn the blocked_list into a hashtable Jeff Layton
2013-06-21 12:58     ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58     ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 11/14] locks: add a new "lm_owner_key" lock operation Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 12/14] locks: give the blocked_hash its own spinlock Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 13/14] seq_file: add seq_list_*_percpu helpers Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
2013-06-21 12:58   ` Jeff Layton
2013-06-21 12:58 ` [PATCH v4 14/14] locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock Jeff Layton
2013-06-21 12:58   ` [Cluster-devel] " Jeff Layton
