All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Remove BKL from fs/locks.c
@ 2010-09-14 20:06 Arnd Bergmann
  2010-09-14 20:20 ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-14 20:06 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Matthew Wilcox, Christoph Hellwig, Trond Myklebust,
	J. Bruce Fields, Miklos Szeredi, Frederic Weisbecker,
	Ingo Molnar, John Kacur, Stephen Rothwell, Linus Torvalds,
	Andrew Morton, Thomas Gleixner

From: Matthew Wilcox <willy@linux.intel.com>

For reference, this is the patch I use for testing the BKL
removal tree, which does not really remove the BKL at all
as long as it is still required by fs/locks.c.

Christoph already made it clear that it cannot go upstream
like this and Stephen said he consequently didn't want it
in -next either. It may still be material for -mm or -rt,
or just anyone who wants to test a kernel with CONFIG_BKL
disabled.

It is also available in the a branch of my git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl.git nasty

I've taken a patch originally written by Matthew Wilcox and
ported it to the current version. Unfortunately, the change
conflicts with the use of lockd, which still heavily uses
the big kernel lock.

As a workaround, I've made the behaviour configurable,
it either uses the BKL when it's enabled or a spinlock
when the BKL (and consequently nfs and lockd) are
disabled.

Original introduction from Willy:

   I've been promising to do this for about seven years now.

   It seems to work well enough, but I haven't run any serious stress
   tests on it.  This implementation uses one spinlock to protect both lock
   lists and all the i_flock chains.  It doesn't seem worth splitting up
   the locking any further.

   I had to move one memory allocation out from under the file_lock_lock.
   I hope I got that logic right.  I'm rather tempted to split out the
   find_conflict algorithm from that function into something that can be
   called separately for the FL_ACCESS case.

   I also have to drop and reacquire the file_lock_lock around the call
   to cond_resched().  This was done automatically for us before by the
   special BKL semantics.

   I had to change vfs_setlease() as it relied on the special BKL ability
   to recursively acquire the same lock.  The internal caller now calls
   __vfs_setlease and the exported interface acquires and releases the
   file_lock_lock around calling __vfs_setlease.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Nacked-by: Christoph Hellwig <hch@lst.de>
---
 fs/Kconfig |    4 ++
 fs/locks.c |  116 +++++++++++++++++++++++++++++++++++++-----------------------
 2 files changed, 76 insertions(+), 44 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 3d18530..a424bf7 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -55,6 +55,10 @@ config FILE_LOCKING
           for filesystems like NFS and for the flock() system
           call. Disabling this option saves about 11k.
 
+config FLOCKS_SPINLOCK
+	bool
+	default FILE_LOCKING && !LOCK_KERNEL
+
 source "fs/notify/Kconfig"
 
 source "fs/quota/Kconfig"
diff --git a/fs/locks.c b/fs/locks.c
index ab24d49..77c3f00 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -140,9 +140,29 @@ int lease_break_time = 45;
 #define for_each_lock(inode, lockp) \
 	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
 
+/*
+ * Protects the two list heads below, plus the inode->i_flock list
+ */
 static LIST_HEAD(file_lock_list);
 static LIST_HEAD(blocked_list);
 
+#ifdef CONFIG_FLOCKS_SPINLOCK
+static DEFINE_SPINLOCK(file_lock_lock);
+
+static inline void lock_flocks(void)
+{
+	spin_lock(&file_lock_lock);
+}
+
+static inline void unlock_flocks(void)
+{
+	spin_unlock(&file_lock_lock);
+}
+#else
+#define lock_flocks() lock_kernel()
+#define unlock_flocks() unlock_kernel()
+#endif
+
 static struct kmem_cache *filelock_cache __read_mostly;
 
 /* Allocate an empty lock structure. */
@@ -511,9 +531,9 @@ static void __locks_delete_block(struct file_lock *waiter)
  */
 static void locks_delete_block(struct file_lock *waiter)
 {
-	lock_kernel();
+	lock_flocks();
 	__locks_delete_block(waiter);
-	unlock_kernel();
+	unlock_flocks();
 }
 
 /* Insert waiter into blocker's block list.
@@ -644,7 +664,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
 
-	lock_kernel();
+	lock_flocks();
 	for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
 		if (!IS_POSIX(cfl))
 			continue;
@@ -657,7 +677,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 			fl->fl_pid = pid_vnr(cfl->fl_nspid);
 	} else
 		fl->fl_type = F_UNLCK;
-	unlock_kernel();
+	unlock_flocks();
 	return;
 }
 EXPORT_SYMBOL(posix_test_lock);
@@ -730,18 +750,16 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	int error = 0;
 	int found = 0;
 
-	lock_kernel();
-	if (request->fl_flags & FL_ACCESS)
-		goto find_conflict;
-
-	if (request->fl_type != F_UNLCK) {
-		error = -ENOMEM;
+	if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
 		new_fl = locks_alloc_lock();
-		if (new_fl == NULL)
-			goto out;
-		error = 0;
+		if (!new_fl)
+			return -ENOMEM;
 	}
 
+	lock_flocks();
+	if (request->fl_flags & FL_ACCESS)
+		goto find_conflict;
+
 	for_each_lock(inode, before) {
 		struct file_lock *fl = *before;
 		if (IS_POSIX(fl))
@@ -767,8 +785,11 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	 * If a higher-priority process was blocked on the old file lock,
 	 * give it the opportunity to lock the file.
 	 */
-	if (found)
+	if (found) {
+		unlock_flocks();
 		cond_resched();
+		lock_flocks();
+	}
 
 find_conflict:
 	for_each_lock(inode, before) {
@@ -794,7 +815,7 @@ find_conflict:
 	error = 0;
 
 out:
-	unlock_kernel();
+	unlock_flocks();
 	if (new_fl)
 		locks_free_lock(new_fl);
 	return error;
@@ -823,7 +844,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		new_fl2 = locks_alloc_lock();
 	}
 
-	lock_kernel();
+	lock_flocks();
 	if (request->fl_type != F_UNLCK) {
 		for_each_lock(inode, before) {
 			fl = *before;
@@ -991,7 +1012,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		locks_wake_up_blocks(left);
 	}
  out:
-	unlock_kernel();
+	unlock_flocks();
 	/*
 	 * Free any unused locks.
 	 */
@@ -1066,14 +1087,14 @@ int locks_mandatory_locked(struct inode *inode)
 	/*
 	 * Search the lock list for this inode for any POSIX locks.
 	 */
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!IS_POSIX(fl))
 			continue;
 		if (fl->fl_owner != owner)
 			break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return fl ? -EAGAIN : 0;
 }
 
@@ -1186,7 +1207,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
 
-	lock_kernel();
+	lock_flocks();
 
 	time_out_leases(inode);
 
@@ -1247,8 +1268,10 @@ restart:
 			break_time++;
 	}
 	locks_insert_block(flock, new_fl);
+	unlock_flocks();
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
 						!new_fl->fl_next, break_time);
+	lock_flocks();
 	__locks_delete_block(new_fl);
 	if (error >= 0) {
 		if (error == 0)
@@ -1263,7 +1286,7 @@ restart:
 	}
 
 out:
-	unlock_kernel();
+	unlock_flocks();
 	if (!IS_ERR(new_fl))
 		locks_free_lock(new_fl);
 	return error;
@@ -1319,7 +1342,7 @@ int fcntl_getlease(struct file *filp)
 	struct file_lock *fl;
 	int type = F_UNLCK;
 
-	lock_kernel();
+	lock_flocks();
 	time_out_leases(filp->f_path.dentry->d_inode);
 	for (fl = filp->f_path.dentry->d_inode->i_flock; fl && IS_LEASE(fl);
 			fl = fl->fl_next) {
@@ -1328,7 +1351,7 @@ int fcntl_getlease(struct file *filp)
 			break;
 		}
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return type;
 }
 
@@ -1341,7 +1364,7 @@ int fcntl_getlease(struct file *filp)
  *	The (input) flp->fl_lmops->fl_break function is required
  *	by break_lease().
  *
- *	Called with kernel lock held.
+ *	Called with file_lock_lock held.
  */
 int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 {
@@ -1436,7 +1459,15 @@ out:
 }
 EXPORT_SYMBOL(generic_setlease);
 
- /**
+static int __vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
+{
+	if (filp->f_op && filp->f_op->setlease)
+		return filp->f_op->setlease(filp, arg, lease);
+	else
+		return generic_setlease(filp, arg, lease);
+}
+
+/**
  *	vfs_setlease        -       sets a lease on an open file
  *	@filp: file pointer
  *	@arg: type of lease to obtain
@@ -1467,12 +1498,9 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 {
 	int error;
 
-	lock_kernel();
-	if (filp->f_op && filp->f_op->setlease)
-		error = filp->f_op->setlease(filp, arg, lease);
-	else
-		error = generic_setlease(filp, arg, lease);
-	unlock_kernel();
+	lock_flocks();
+	error = __vfs_setlease(filp, arg, lease);
+	unlock_flocks();
 
 	return error;
 }
@@ -1499,9 +1527,9 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 	if (error)
 		return error;
 
-	lock_kernel();
+	lock_flocks();
 
-	error = vfs_setlease(filp, arg, &flp);
+	error = __vfs_setlease(filp, arg, &flp);
 	if (error || arg == F_UNLCK)
 		goto out_unlock;
 
@@ -1516,7 +1544,7 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 
 	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
 out_unlock:
-	unlock_kernel();
+	unlock_flocks();
 	return error;
 }
 
@@ -2020,7 +2048,7 @@ void locks_remove_flock(struct file *filp)
 			fl.fl_ops->fl_release_private(&fl);
 	}
 
-	lock_kernel();
+	lock_flocks();
 	before = &inode->i_flock;
 
 	while ((fl = *before) != NULL) {
@@ -2038,7 +2066,7 @@ void locks_remove_flock(struct file *filp)
  		}
 		before = &fl->fl_next;
 	}
-	unlock_kernel();
+	unlock_flocks();
 }
 
 /**
@@ -2053,12 +2081,12 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
 {
 	int status = 0;
 
-	lock_kernel();
+	lock_flocks();
 	if (waiter->fl_next)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
-	unlock_kernel();
+	unlock_flocks();
 	return status;
 }
 
@@ -2172,7 +2200,7 @@ static int locks_show(struct seq_file *f, void *v)
 
 static void *locks_start(struct seq_file *f, loff_t *pos)
 {
-	lock_kernel();
+	lock_flocks();
 	f->private = (void *)1;
 	return seq_list_start(&file_lock_list, *pos);
 }
@@ -2184,7 +2212,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 
 static void locks_stop(struct seq_file *f, void *v)
 {
-	unlock_kernel();
+	unlock_flocks();
 }
 
 static const struct seq_operations locks_seq_operations = {
@@ -2231,7 +2259,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if (fl->fl_type == F_RDLCK)
@@ -2248,7 +2276,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return result;
 }
 
@@ -2271,7 +2299,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
@@ -2286,7 +2314,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return result;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 20:06 [PATCH] Remove BKL from fs/locks.c Arnd Bergmann
@ 2010-09-14 20:20 ` Linus Torvalds
  2010-09-14 20:39   ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2010-09-14 20:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, Matthew Wilcox, Christoph Hellwig,
	Trond Myklebust, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tue, Sep 14, 2010 at 1:06 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Christoph already made it clear that it cannot go upstream
> like this and Stephen said he consequently didn't want it
> in -next either. It may still be material for -mm or -rt,
> or just anyone who wants to test a kernel with CONFIG_BKL
> disabled.

I think something like this would be acceptable with that whole config
option removed.

That would not help your particular "test without BKL at all" case,
but basically I would suggest something like the following:

 - do the unconditional conversion to the new spinlock, so that
fs/locks.c is internally consistent and conceptually works without the
BKL at all.

 - remove all the EXPORT_SYMBOL()s

 - create wrapper functions named xyzzy_bkl() (perhaps in a different
fs/bkl.c file or whatever - or maybe just in the same file that you do
the bkl for CONFIG_BKL in the first place) that wrap the fs/locks.c
symbols with the kernel lock (so now those wrappers will take _both_
the BKL and the real spinlock)

 - fix up all modular users mindlessly to call the bkl version, and
ship it that way for one release

End result: anybody who uses the old exports will just not link at
all, and all in-kernel users (ie lockd etc) will now very explicitly
have that xyz_bkl() use.

After that one release, we can re-introduce the old names without the
bkl, and let filesystems and other subsystems just decide that they
don't want the bkl version one by one. So that first phase would be
basically a no-op with an interface rename just so that the rest could
then be done piece-meal.

                           Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 20:20 ` Linus Torvalds
@ 2010-09-14 20:39   ` Arnd Bergmann
  2010-09-14 20:53     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-14 20:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-fsdevel, Matthew Wilcox, Christoph Hellwig,
	Trond Myklebust, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tuesday 14 September 2010 22:20:22 Linus Torvalds wrote:
> End result: anybody who uses the old exports will just not link at
> all, and all in-kernel users (ie lockd etc) will now very explicitly
> have that xyz_bkl() use.
> 
> After that one release, we can re-introduce the old names without the
> bkl, and let filesystems and other subsystems just decide that they
> don't want the bkl version one by one. So that first phase would be
> basically a no-op with an interface rename just so that the rest could
> then be done piece-meal.

Hmm, maybe I'm misunderstanding part of that plan, but I think it
won't work because the BKL in there seems to protect all struct file_lock
instances and they are passed around to other code, most importantly
lockd. The file_locks themselves can be accessed both using sys_flock()
etc locally and using afs/nfs/coda/... exported file systems from other
systems.

AFAICT we could end up with lockd changing data under BKL while a sys_flock()
accesses the same data under file_lock_lock, with no mutual exclusion
between the two.

	Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 20:39   ` Arnd Bergmann
@ 2010-09-14 20:53     ` Linus Torvalds
  2010-09-14 21:24       ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2010-09-14 20:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-fsdevel, Matthew Wilcox, Christoph Hellwig,
	Trond Myklebust, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tue, Sep 14, 2010 at 1:39 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Hmm, maybe I'm misunderstanding part of that plan, but I think it
> won't work because the BKL in there seems to protect all struct file_lock
> instances and they are passed around to other code, most importantly
> lockd. The file_locks themselves can be accessed both using sys_flock()
> etc locally and using afs/nfs/coda/... exported file systems from other
> systems.

Ahh, so lockd actually walks those things itself too, not just using
the fs/locks.c interfaces. Ok, scratch that then.

Oh well. I guess there's no incremental way to do things sanely. And
nobody has patches to fix those users, I guess.

                                Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 20:53     ` Linus Torvalds
@ 2010-09-14 21:24       ` Arnd Bergmann
  2010-09-14 21:55         ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-14 21:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-fsdevel, Matthew Wilcox, Christoph Hellwig,
	Trond Myklebust, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tuesday 14 September 2010 22:53:31 Linus Torvalds wrote:
> Oh well. I guess there's no incremental way to do things sanely. And
> nobody has patches to fix those users, I guess.

The only critical user is fs/lockd, I can easily handle the rest.
When I talked to Bruce and Trond during LinuxCon, they told me that
it should be possible to separate the bits of fs/lockd that lock
against fs/locks.c and convert the former to use lock_flocks().

That would be enough to actually apply this patch without the
nasty CONFIG_BKL check and with changes to the few other
consumers of file locks. My original plan was to have my current
patch in -next until that happens.

	Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 21:24       ` Arnd Bergmann
@ 2010-09-14 21:55         ` Trond Myklebust
  2010-09-15 16:30           ` Arnd Bergmann
  2010-09-15 20:42           ` J. Bruce Fields
  0 siblings, 2 replies; 13+ messages in thread
From: Trond Myklebust @ 2010-09-14 21:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, linux-kernel, linux-fsdevel, Matthew Wilcox,
	Christoph Hellwig, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tue, 2010-09-14 at 23:24 +0200, Arnd Bergmann wrote:
> On Tuesday 14 September 2010 22:53:31 Linus Torvalds wrote:
> > Oh well. I guess there's no incremental way to do things sanely. And
> > nobody has patches to fix those users, I guess.
> 
> The only critical user is fs/lockd, I can easily handle the rest.
> When I talked to Bruce and Trond during LinuxCon, they told me that
> it should be possible to separate the bits of fs/lockd that lock
> against fs/locks.c and convert the former to use lock_flocks().


The NFSv4 client is quite ready to just switch to using lock_flocks()
now. There is nothing that depends on the magical properties of the BKL.
I do also have someone working on BKL removal for the NLM/lockd client
and am hoping that should be ready in a couple of weeks time.

The timeline for the server is up to Bruce, however.

> That would be enough to actually apply this patch without the
> nasty CONFIG_BKL check and with changes to the few other
> consumers of file locks. My original plan was to have my current
> patch in -next until that happens.

Would it be possible to at least merge a patch that defines
lock_flocks() (even if it just is an alias for lock_kernel()), so that
we can convert those bits of the lock recovery code that need to peek
into the inode->i_flock list ASAP?

Cheers
  Trond

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 21:55         ` Trond Myklebust
@ 2010-09-15 16:30           ` Arnd Bergmann
  2010-09-15 17:02             ` Sage Weil
  2010-09-15 20:42           ` J. Bruce Fields
  1 sibling, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-15 16:30 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Linus Torvalds, linux-kernel, linux-fsdevel, Matthew Wilcox,
	Christoph Hellwig, J. Bruce Fields, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tuesday 14 September 2010, Trond Myklebust wrote:
> Would it be possible to at least merge a patch that defines
> lock_flocks() (even if it just is an alias for lock_kernel()), so that
> we can convert those bits of the lock recovery code that need to peek
> into the inode->i_flock list ASAP?

Fine with me.

Another alternative would be to keep a series of all these patches
together, like

1. fs/locks.c: prepare for BKL removal
2. fs/lockd: use lock/flocks
3. fs/locks.c: lock_flocks using spinlock (trivial)

Feel free to take the patch below into a tree of yours when
adding patches 2 and 3 on top, or send me that patch once
it's done and I'll add both into my tree.

Linus, Christoph: Does the patch below look ok? I've removed
the CONFIG_BKL stuff in it now and changed over everything
besides lockd to use lock_flocks(). I'd like to put this into
the -next queue (after some basic testing).
---
Subject: [PATCH] fs/locks.c: prepare for BKL removal

This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it, but flip the definition to a private spinlock
as soon as that's done.

All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.

Based on an earlier patch from Matthew Wilcox.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
---

 fs/afs/flock.c       |    5 +-
 fs/ceph/mds_client.c |    6 +-
 fs/locks.c           |  118 ++++++++++++++++++++++++++++---------------------
 fs/nfs/delegation.c  |   10 ++--
 fs/nfs/nfs4state.c   |   10 ++--
 fs/nfsd/nfs4state.c  |    6 +-
 include/linux/fs.h   |   10 ++++
 7 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 0931bc1..757d664 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -9,7 +9,6 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#include <linux/smp_lock.h>
 #include "internal.h"
 
 #define AFS_LOCK_GRANTED	0
@@ -274,7 +273,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
 
 	type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
 
-	lock_kernel();
+	lock_flocks();
 
 	/* make sure we've got a callback on this file and that our view of the
 	 * data version is up to date */
@@ -421,7 +420,7 @@ given_lock:
 	afs_vnode_fetch_status(vnode, NULL, key);
 
 error:
-	unlock_kernel();
+	unlock_flocks();
 	_leave(" = %d", ret);
 	return ret;
 
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index f091b13..bda5211 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3,7 +3,7 @@
 #include <linux/wait.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
-#include <linux/smp_lock.h>
+#include <linux/fs.h>
 
 #include "mds_client.h"
 #include "mon_client.h"
@@ -2362,7 +2362,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
 	if (recon_state->flock) {
 		int num_fcntl_locks, num_flock_locks;
 
-		lock_kernel();
+		lock_flocks();
 		ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
 		rec.v2.flock_len = (2*sizeof(u32) +
 				    (num_fcntl_locks+num_flock_locks) *
@@ -2373,7 +2373,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
 			err = ceph_encode_locks(inode, pagelist,
 						num_fcntl_locks,
 						num_flock_locks);
-		unlock_kernel();
+		unlock_flocks();
 	}
 
 out_free:
diff --git a/fs/locks.c b/fs/locks.c
index ab24d49..8c6935f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -143,6 +143,21 @@ int lease_break_time = 45;
 static LIST_HEAD(file_lock_list);
 static LIST_HEAD(blocked_list);
 
+/*
+ * Protects the two list heads above, plus the inode->i_flock list
+ * FIXME: should use a spinlock, once fs/lockd uses lock_flocks().
+ */
+void lock_flocks(void)
+{
+	lock_kernel();
+}
+EXPORT_SYMBOL(lock_flocks);
+
+void unlock_flocks(void)
+{
+	unlock_kernel();
+}
+
 static struct kmem_cache *filelock_cache __read_mostly;
 
 /* Allocate an empty lock structure. */
@@ -511,9 +526,9 @@ static void __locks_delete_block(struct file_lock *waiter)
  */
 static void locks_delete_block(struct file_lock *waiter)
 {
-	lock_kernel();
+	lock_flocks();
 	__locks_delete_block(waiter);
-	unlock_kernel();
+	unlock_flocks();
 }
 
 /* Insert waiter into blocker's block list.
@@ -644,7 +659,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
 
-	lock_kernel();
+	lock_flocks();
 	for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
 		if (!IS_POSIX(cfl))
 			continue;
@@ -657,7 +672,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 			fl->fl_pid = pid_vnr(cfl->fl_nspid);
 	} else
 		fl->fl_type = F_UNLCK;
-	unlock_kernel();
+	unlock_flocks();
 	return;
 }
 EXPORT_SYMBOL(posix_test_lock);
@@ -730,18 +745,16 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	int error = 0;
 	int found = 0;
 
-	lock_kernel();
-	if (request->fl_flags & FL_ACCESS)
-		goto find_conflict;
-
-	if (request->fl_type != F_UNLCK) {
-		error = -ENOMEM;
+	if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
 		new_fl = locks_alloc_lock();
-		if (new_fl == NULL)
-			goto out;
-		error = 0;
+		if (!new_fl)
+			return -ENOMEM;
 	}
 
+	lock_flocks();
+	if (request->fl_flags & FL_ACCESS)
+		goto find_conflict;
+
 	for_each_lock(inode, before) {
 		struct file_lock *fl = *before;
 		if (IS_POSIX(fl))
@@ -767,8 +780,11 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	 * If a higher-priority process was blocked on the old file lock,
 	 * give it the opportunity to lock the file.
 	 */
-	if (found)
+	if (found) {
+		unlock_flocks();
 		cond_resched();
+		lock_flocks();
+	}
 
 find_conflict:
 	for_each_lock(inode, before) {
@@ -794,7 +810,7 @@ find_conflict:
 	error = 0;
 
 out:
-	unlock_kernel();
+	unlock_flocks();
 	if (new_fl)
 		locks_free_lock(new_fl);
 	return error;
@@ -823,7 +839,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		new_fl2 = locks_alloc_lock();
 	}
 
-	lock_kernel();
+	lock_flocks();
 	if (request->fl_type != F_UNLCK) {
 		for_each_lock(inode, before) {
 			fl = *before;
@@ -991,7 +1007,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 		locks_wake_up_blocks(left);
 	}
  out:
-	unlock_kernel();
+	unlock_flocks();
 	/*
 	 * Free any unused locks.
 	 */
@@ -1066,14 +1082,14 @@ int locks_mandatory_locked(struct inode *inode)
 	/*
 	 * Search the lock list for this inode for any POSIX locks.
 	 */
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!IS_POSIX(fl))
 			continue;
 		if (fl->fl_owner != owner)
 			break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return fl ? -EAGAIN : 0;
 }
 
@@ -1186,7 +1202,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
 
-	lock_kernel();
+	lock_flocks();
 
 	time_out_leases(inode);
 
@@ -1247,8 +1263,10 @@ restart:
 			break_time++;
 	}
 	locks_insert_block(flock, new_fl);
+	unlock_flocks();
 	error = wait_event_interruptible_timeout(new_fl->fl_wait,
 						!new_fl->fl_next, break_time);
+	lock_flocks();
 	__locks_delete_block(new_fl);
 	if (error >= 0) {
 		if (error == 0)
@@ -1263,7 +1281,7 @@ restart:
 	}
 
 out:
-	unlock_kernel();
+	unlock_flocks();
 	if (!IS_ERR(new_fl))
 		locks_free_lock(new_fl);
 	return error;
@@ -1319,7 +1337,7 @@ int fcntl_getlease(struct file *filp)
 	struct file_lock *fl;
 	int type = F_UNLCK;
 
-	lock_kernel();
+	lock_flocks();
 	time_out_leases(filp->f_path.dentry->d_inode);
 	for (fl = filp->f_path.dentry->d_inode->i_flock; fl && IS_LEASE(fl);
 			fl = fl->fl_next) {
@@ -1328,7 +1346,7 @@ int fcntl_getlease(struct file *filp)
 			break;
 		}
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return type;
 }
 
@@ -1341,7 +1359,7 @@ int fcntl_getlease(struct file *filp)
  *	The (input) flp->fl_lmops->fl_break function is required
  *	by break_lease().
  *
- *	Called with kernel lock held.
+ *	Called with file_lock_lock held.
  */
 int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 {
@@ -1359,6 +1377,13 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 	if (error)
 		return error;
 
+	if (arg != F_UNLCK) {
+		new_fl = locks_alloc_lock();
+		if (new_fl == NULL)
+			return -ENOMEM;
+	}
+
+	lock_flocks();
 	time_out_leases(inode);
 
 	BUG_ON(!(*flp)->fl_lmops->fl_break);
@@ -1366,11 +1391,6 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 	lease = *flp;
 
 	if (arg != F_UNLCK) {
-		error = -ENOMEM;
-		new_fl = locks_alloc_lock();
-		if (new_fl == NULL)
-			goto out;
-
 		error = -EAGAIN;
 		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 			goto out;
@@ -1427,16 +1447,18 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 	locks_insert_lock(before, new_fl);
 
 	*flp = new_fl;
+	unlock_flocks();
 	return 0;
 
 out:
 	if (new_fl != NULL)
 		locks_free_lock(new_fl);
+	unlock_flocks();
 	return error;
 }
 EXPORT_SYMBOL(generic_setlease);
 
- /**
+/**
  *	vfs_setlease        -       sets a lease on an open file
  *	@filp: file pointer
  *	@arg: type of lease to obtain
@@ -1465,16 +1487,10 @@ EXPORT_SYMBOL(generic_setlease);
 
 int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 {
-	int error;
-
-	lock_kernel();
 	if (filp->f_op && filp->f_op->setlease)
-		error = filp->f_op->setlease(filp, arg, lease);
+		return filp->f_op->setlease(filp, arg, lease);
 	else
-		error = generic_setlease(filp, arg, lease);
-	unlock_kernel();
-
-	return error;
+		return generic_setlease(filp, arg, lease);
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
 
@@ -1499,12 +1515,12 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 	if (error)
 		return error;
 
-	lock_kernel();
-
 	error = vfs_setlease(filp, arg, &flp);
 	if (error || arg == F_UNLCK)
 		goto out_unlock;
 
+	lock_flocks();
+
 	error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
 	if (error < 0) {
 		/* remove lease just inserted by setlease */
@@ -1516,7 +1532,7 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 
 	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
 out_unlock:
-	unlock_kernel();
+	unlock_flocks();
 	return error;
 }
 
@@ -2020,7 +2036,7 @@ void locks_remove_flock(struct file *filp)
 			fl.fl_ops->fl_release_private(&fl);
 	}
 
-	lock_kernel();
+	lock_flocks();
 	before = &inode->i_flock;
 
 	while ((fl = *before) != NULL) {
@@ -2038,7 +2054,7 @@ void locks_remove_flock(struct file *filp)
  		}
 		before = &fl->fl_next;
 	}
-	unlock_kernel();
+	unlock_flocks();
 }
 
 /**
@@ -2053,12 +2069,12 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
 {
 	int status = 0;
 
-	lock_kernel();
+	lock_flocks();
 	if (waiter->fl_next)
 		__locks_delete_block(waiter);
 	else
 		status = -ENOENT;
-	unlock_kernel();
+	unlock_flocks();
 	return status;
 }
 
@@ -2172,7 +2188,7 @@ static int locks_show(struct seq_file *f, void *v)
 
 static void *locks_start(struct seq_file *f, loff_t *pos)
 {
-	lock_kernel();
+	lock_flocks();
 	f->private = (void *)1;
 	return seq_list_start(&file_lock_list, *pos);
 }
@@ -2184,7 +2200,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 
 static void locks_stop(struct seq_file *f, void *v)
 {
-	unlock_kernel();
+	unlock_flocks();
 }
 
 static const struct seq_operations locks_seq_operations = {
@@ -2231,7 +2247,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if (fl->fl_type == F_RDLCK)
@@ -2248,7 +2264,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return result;
 }
 
@@ -2271,7 +2287,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 {
 	struct file_lock *fl;
 	int result = 1;
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (IS_POSIX(fl)) {
 			if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
@@ -2286,7 +2302,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
 		result = 0;
 		break;
 	}
-	unlock_kernel();
+	unlock_flocks();
 	return result;
 }
 
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index b9c3c43..232a7ee 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -71,20 +71,20 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
 	if (inode->i_flock == NULL)
 		goto out;
 
-	/* Protect inode->i_flock using the BKL */
-	lock_kernel();
+	/* Protect inode->i_flock using the file locks lock */
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file) != ctx)
 			continue;
-		unlock_kernel();
+		unlock_flocks();
 		status = nfs4_lock_delegation_recall(state, fl);
 		if (status < 0)
 			goto out;
-		lock_kernel();
+		lock_flocks();
 	}
-	unlock_kernel();
+	unlock_flocks();
 out:
 	return status;
 }
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3e2f19b..96524c5 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -40,7 +40,7 @@
 
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <linux/smp_lock.h>
+#include <linux/fs.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_idmap.h>
 #include <linux/kthread.h>
@@ -970,13 +970,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 	/* Guard against delegation returns and new lock/unlock calls */
 	down_write(&nfsi->rwsem);
 	/* Protect inode->i_flock using the BKL */
-	lock_kernel();
+	lock_flocks();
 	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
 		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
 			continue;
 		if (nfs_file_open_context(fl->fl_file)->state != state)
 			continue;
-		unlock_kernel();
+		unlock_flocks();
 		status = ops->recover_lock(state, fl);
 		switch (status) {
 			case 0:
@@ -1003,9 +1003,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
 				/* kill_proc(fl->fl_pid, SIGLOST, 1); */
 				status = 0;
 		}
-		lock_kernel();
+		lock_flocks();
 	}
-	unlock_kernel();
+	unlock_flocks();
 out:
 	up_write(&nfsi->rwsem);
 	return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index cf0d2ff..a7292fc 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -33,7 +33,7 @@
 */
 
 #include <linux/file.h>
-#include <linux/smp_lock.h>
+#include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/namei.h>
 #include <linux/swap.h>
@@ -3895,7 +3895,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_stateowner *lowner)
 	struct inode *inode = filp->fi_inode;
 	int status = 0;
 
-	lock_kernel();
+	lock_flocks();
 	for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
 		if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
 			status = 1;
@@ -3903,7 +3903,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_stateowner *lowner)
 		}
 	}
 out:
-	unlock_kernel();
+	unlock_flocks();
 	return status;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0ec4d60..fd85cd2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1127,6 +1127,8 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
+extern void lock_flocks(void);
+extern void unlock_flocks(void);
 #else /* !CONFIG_FILE_LOCKING */
 static inline int fcntl_getlk(struct file *file, struct flock __user *user)
 {
@@ -1269,6 +1271,14 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
 	return 1;
 }
 
+static inline void lock_flocks()
+{
+}
+
+static inline void unlock_flocks()
+{
+}
+
 #endif /* !CONFIG_FILE_LOCKING */
 
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-15 16:30           ` Arnd Bergmann
@ 2010-09-15 17:02             ` Sage Weil
  2010-09-15 17:17               ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2010-09-15 17:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Trond Myklebust, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, J. Bruce Fields,
	Miklos Szeredi, Frederic Weisbecker, Ingo Molnar, John Kacur,
	Stephen Rothwell, Andrew Morton, Thomas Gleixner, gregf

Hi Arnd,

On Wed, 15 Sep 2010, Arnd Bergmann wrote:

> On Tuesday 14 September 2010, Trond Myklebust wrote:
> > Would it be possible to at least merge a patch that defines
> > lock_flocks() (even if it just is an alias for lock_kernel()), so that
> > we can convert those bits of the lock recovery code that need to peek
> > into the inode->i_flock list ASAP?
> 
> Fine with me.

That's my preference.  See below:

> 
> Another alternative would be to keep a series of all these patches
> together, like
> 
> 1. fs/locks.c: prepare for BKL removal
> 2. fs/lockd: use lock/flocks
> 3. fs/locks.c: lock_flocks using spinlock (trivial)
>
> Feel free to take the patch below into a tree of yours when
> adding patches 2 and 3 on top, or send me that patch once
> it's done and I'll add both into my tree.
> 
> Linus, Christoph: Does the patch below look ok? I've removed
> the CONFIG_BKL stuff in it now and changed over everything
> besides lockd to use lock_flocks(). I'd like to put this into
> the -next queue (after some basic testing).
> ---
> Subject: [PATCH] fs/locks.c: prepare for BKL removal
> 
> This prepares the removal of the big kernel lock from the
> file locking code. We still use the BKL as long as fs/lockd
> uses it, but flip the definition to a private spinlock
> as soon as that's done.
> 
> All users outside of fs/lockd get converted to use
> lock_flocks() instead of lock_kernel() where appropriate.
> 
> Based on an earlier patch from Matthew Wilcox.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Cc: Matthew Wilcox <willy@linux.intel.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Miklos Szeredi <mszeredi@suse.cz>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: John Kacur <jkacur@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> ---
> 
>  fs/afs/flock.c       |    5 +-
>  fs/ceph/mds_client.c |    6 +-
>  fs/locks.c           |  118 ++++++++++++++++++++++++++++---------------------
>  fs/nfs/delegation.c  |   10 ++--
>  fs/nfs/nfs4state.c   |   10 ++--
>  fs/nfsd/nfs4state.c  |    6 +-
>  include/linux/fs.h   |   10 ++++
>  7 files changed, 95 insertions(+), 70 deletions(-)
> 
> diff --git a/fs/afs/flock.c b/fs/afs/flock.c
> index 0931bc1..757d664 100644
> --- a/fs/afs/flock.c
> +++ b/fs/afs/flock.c
> @@ -9,7 +9,6 @@
>   * 2 of the License, or (at your option) any later version.
>   */
>  
> -#include <linux/smp_lock.h>
>  #include "internal.h"
>  
>  #define AFS_LOCK_GRANTED	0
> @@ -274,7 +273,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
>  
>  	type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
>  
> -	lock_kernel();
> +	lock_flocks();
>  
>  	/* make sure we've got a callback on this file and that our view of the
>  	 * data version is up to date */
> @@ -421,7 +420,7 @@ given_lock:
>  	afs_vnode_fetch_status(vnode, NULL, key);
>  
>  error:
> -	unlock_kernel();
> +	unlock_flocks();
>  	_leave(" = %d", ret);
>  	return ret;
>  
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index f091b13..bda5211 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3,7 +3,7 @@
>  #include <linux/wait.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
> -#include <linux/smp_lock.h>
> +#include <linux/fs.h>
>  
>  #include "mds_client.h"
>  #include "mon_client.h"
> @@ -2362,7 +2362,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
>  	if (recon_state->flock) {
>  		int num_fcntl_locks, num_flock_locks;
>  
> -		lock_kernel();
> +		lock_flocks();
>  		ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
>  		rec.v2.flock_len = (2*sizeof(u32) +
>  				    (num_fcntl_locks+num_flock_locks) *
> @@ -2373,7 +2373,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
>  			err = ceph_encode_locks(inode, pagelist,
>  						num_fcntl_locks,
>  						num_flock_locks);
> -		unlock_kernel();
> +		unlock_flocks();
>  	}

The Ceph code won't currently behave with lock_flocks() taking a spinlock.  
We're preparing a patch to fix that now.  As long as there is a window 
between lock_flocks() being defined and the spinlock conversion, I can 
send the fix upstream then and avoid any breakage.  Or send the patches 
your way to include in your tree, whatever you prefer!

>  
>  out_free:
> diff --git a/fs/locks.c b/fs/locks.c
> index ab24d49..8c6935f 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -143,6 +143,21 @@ int lease_break_time = 45;
>  static LIST_HEAD(file_lock_list);
>  static LIST_HEAD(blocked_list);
>  
> +/*
> + * Protects the two list heads above, plus the inode->i_flock list
> + * FIXME: should use a spinlock, once fs/lockd uses lock_flocks().
> + */
> +void lock_flocks(void)
> +{
> +	lock_kernel();
> +}
> +EXPORT_SYMBOL(lock_flocks);
> +
> +void unlock_flocks(void)
> +{
> +	unlock_kernel();
> +}

EXPORT_SYMBOL(unlock_flocks); ?

Thanks!
sage


> +
>  static struct kmem_cache *filelock_cache __read_mostly;
>  
>  /* Allocate an empty lock structure. */
> @@ -511,9 +526,9 @@ static void __locks_delete_block(struct file_lock *waiter)
>   */
>  static void locks_delete_block(struct file_lock *waiter)
>  {
> -	lock_kernel();
> +	lock_flocks();
>  	__locks_delete_block(waiter);
> -	unlock_kernel();
> +	unlock_flocks();
>  }
>  
>  /* Insert waiter into blocker's block list.
> @@ -644,7 +659,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>  {
>  	struct file_lock *cfl;
>  
> -	lock_kernel();
> +	lock_flocks();
>  	for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
>  		if (!IS_POSIX(cfl))
>  			continue;
> @@ -657,7 +672,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>  			fl->fl_pid = pid_vnr(cfl->fl_nspid);
>  	} else
>  		fl->fl_type = F_UNLCK;
> -	unlock_kernel();
> +	unlock_flocks();
>  	return;
>  }
>  EXPORT_SYMBOL(posix_test_lock);
> @@ -730,18 +745,16 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
>  	int error = 0;
>  	int found = 0;
>  
> -	lock_kernel();
> -	if (request->fl_flags & FL_ACCESS)
> -		goto find_conflict;
> -
> -	if (request->fl_type != F_UNLCK) {
> -		error = -ENOMEM;
> +	if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
>  		new_fl = locks_alloc_lock();
> -		if (new_fl == NULL)
> -			goto out;
> -		error = 0;
> +		if (!new_fl)
> +			return -ENOMEM;
>  	}
>  
> +	lock_flocks();
> +	if (request->fl_flags & FL_ACCESS)
> +		goto find_conflict;
> +
>  	for_each_lock(inode, before) {
>  		struct file_lock *fl = *before;
>  		if (IS_POSIX(fl))
> @@ -767,8 +780,11 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
>  	 * If a higher-priority process was blocked on the old file lock,
>  	 * give it the opportunity to lock the file.
>  	 */
> -	if (found)
> +	if (found) {
> +		unlock_flocks();
>  		cond_resched();
> +		lock_flocks();
> +	}
>  
>  find_conflict:
>  	for_each_lock(inode, before) {
> @@ -794,7 +810,7 @@ find_conflict:
>  	error = 0;
>  
>  out:
> -	unlock_kernel();
> +	unlock_flocks();
>  	if (new_fl)
>  		locks_free_lock(new_fl);
>  	return error;
> @@ -823,7 +839,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
>  		new_fl2 = locks_alloc_lock();
>  	}
>  
> -	lock_kernel();
> +	lock_flocks();
>  	if (request->fl_type != F_UNLCK) {
>  		for_each_lock(inode, before) {
>  			fl = *before;
> @@ -991,7 +1007,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
>  		locks_wake_up_blocks(left);
>  	}
>   out:
> -	unlock_kernel();
> +	unlock_flocks();
>  	/*
>  	 * Free any unused locks.
>  	 */
> @@ -1066,14 +1082,14 @@ int locks_mandatory_locked(struct inode *inode)
>  	/*
>  	 * Search the lock list for this inode for any POSIX locks.
>  	 */
> -	lock_kernel();
> +	lock_flocks();
>  	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
>  		if (!IS_POSIX(fl))
>  			continue;
>  		if (fl->fl_owner != owner)
>  			break;
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  	return fl ? -EAGAIN : 0;
>  }
>  
> @@ -1186,7 +1202,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
>  
>  	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
>  
> -	lock_kernel();
> +	lock_flocks();
>  
>  	time_out_leases(inode);
>  
> @@ -1247,8 +1263,10 @@ restart:
>  			break_time++;
>  	}
>  	locks_insert_block(flock, new_fl);
> +	unlock_flocks();
>  	error = wait_event_interruptible_timeout(new_fl->fl_wait,
>  						!new_fl->fl_next, break_time);
> +	lock_flocks();
>  	__locks_delete_block(new_fl);
>  	if (error >= 0) {
>  		if (error == 0)
> @@ -1263,7 +1281,7 @@ restart:
>  	}
>  
>  out:
> -	unlock_kernel();
> +	unlock_flocks();
>  	if (!IS_ERR(new_fl))
>  		locks_free_lock(new_fl);
>  	return error;
> @@ -1319,7 +1337,7 @@ int fcntl_getlease(struct file *filp)
>  	struct file_lock *fl;
>  	int type = F_UNLCK;
>  
> -	lock_kernel();
> +	lock_flocks();
>  	time_out_leases(filp->f_path.dentry->d_inode);
>  	for (fl = filp->f_path.dentry->d_inode->i_flock; fl && IS_LEASE(fl);
>  			fl = fl->fl_next) {
> @@ -1328,7 +1346,7 @@ int fcntl_getlease(struct file *filp)
>  			break;
>  		}
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  	return type;
>  }
>  
> @@ -1341,7 +1359,7 @@ int fcntl_getlease(struct file *filp)
>   *	The (input) flp->fl_lmops->fl_break function is required
>   *	by break_lease().
>   *
> - *	Called with kernel lock held.
> + *	Called with file_lock_lock held.
>   */
>  int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  {
> @@ -1359,6 +1377,13 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	if (error)
>  		return error;
>  
> +	if (arg != F_UNLCK) {
> +		new_fl = locks_alloc_lock();
> +		if (new_fl == NULL)
> +			return -ENOMEM;
> +	}
> +
> +	lock_flocks();
>  	time_out_leases(inode);
>  
>  	BUG_ON(!(*flp)->fl_lmops->fl_break);
> @@ -1366,11 +1391,6 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	lease = *flp;
>  
>  	if (arg != F_UNLCK) {
> -		error = -ENOMEM;
> -		new_fl = locks_alloc_lock();
> -		if (new_fl == NULL)
> -			goto out;
> -
>  		error = -EAGAIN;
>  		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
>  			goto out;
> @@ -1427,16 +1447,18 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	locks_insert_lock(before, new_fl);
>  
>  	*flp = new_fl;
> +	unlock_flocks();
>  	return 0;
>  
>  out:
>  	if (new_fl != NULL)
>  		locks_free_lock(new_fl);
> +	unlock_flocks();
>  	return error;
>  }
>  EXPORT_SYMBOL(generic_setlease);
>  
> - /**
> +/**
>   *	vfs_setlease        -       sets a lease on an open file
>   *	@filp: file pointer
>   *	@arg: type of lease to obtain
> @@ -1465,16 +1487,10 @@ EXPORT_SYMBOL(generic_setlease);
>  
>  int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
>  {
> -	int error;
> -
> -	lock_kernel();
>  	if (filp->f_op && filp->f_op->setlease)
> -		error = filp->f_op->setlease(filp, arg, lease);
> +		return filp->f_op->setlease(filp, arg, lease);
>  	else
> -		error = generic_setlease(filp, arg, lease);
> -	unlock_kernel();
> -
> -	return error;
> +		return generic_setlease(filp, arg, lease);
>  }
>  EXPORT_SYMBOL_GPL(vfs_setlease);
>  
> @@ -1499,12 +1515,12 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
>  	if (error)
>  		return error;
>  
> -	lock_kernel();
> -
>  	error = vfs_setlease(filp, arg, &flp);
>  	if (error || arg == F_UNLCK)
>  		goto out_unlock;
>  
> +	lock_flocks();
> +
>  	error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
>  	if (error < 0) {
>  		/* remove lease just inserted by setlease */
> @@ -1516,7 +1532,7 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
>  
>  	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
>  out_unlock:
> -	unlock_kernel();
> +	unlock_flocks();
>  	return error;
>  }
>  
> @@ -2020,7 +2036,7 @@ void locks_remove_flock(struct file *filp)
>  			fl.fl_ops->fl_release_private(&fl);
>  	}
>  
> -	lock_kernel();
> +	lock_flocks();
>  	before = &inode->i_flock;
>  
>  	while ((fl = *before) != NULL) {
> @@ -2038,7 +2054,7 @@ void locks_remove_flock(struct file *filp)
>   		}
>  		before = &fl->fl_next;
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  }
>  
>  /**
> @@ -2053,12 +2069,12 @@ posix_unblock_lock(struct file *filp, struct file_lock *waiter)
>  {
>  	int status = 0;
>  
> -	lock_kernel();
> +	lock_flocks();
>  	if (waiter->fl_next)
>  		__locks_delete_block(waiter);
>  	else
>  		status = -ENOENT;
> -	unlock_kernel();
> +	unlock_flocks();
>  	return status;
>  }
>  
> @@ -2172,7 +2188,7 @@ static int locks_show(struct seq_file *f, void *v)
>  
>  static void *locks_start(struct seq_file *f, loff_t *pos)
>  {
> -	lock_kernel();
> +	lock_flocks();
>  	f->private = (void *)1;
>  	return seq_list_start(&file_lock_list, *pos);
>  }
> @@ -2184,7 +2200,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
>  
>  static void locks_stop(struct seq_file *f, void *v)
>  {
> -	unlock_kernel();
> +	unlock_flocks();
>  }
>  
>  static const struct seq_operations locks_seq_operations = {
> @@ -2231,7 +2247,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
>  {
>  	struct file_lock *fl;
>  	int result = 1;
> -	lock_kernel();
> +	lock_flocks();
>  	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
>  		if (IS_POSIX(fl)) {
>  			if (fl->fl_type == F_RDLCK)
> @@ -2248,7 +2264,7 @@ int lock_may_read(struct inode *inode, loff_t start, unsigned long len)
>  		result = 0;
>  		break;
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  	return result;
>  }
>  
> @@ -2271,7 +2287,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
>  {
>  	struct file_lock *fl;
>  	int result = 1;
> -	lock_kernel();
> +	lock_flocks();
>  	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
>  		if (IS_POSIX(fl)) {
>  			if ((fl->fl_end < start) || (fl->fl_start > (start + len)))
> @@ -2286,7 +2302,7 @@ int lock_may_write(struct inode *inode, loff_t start, unsigned long len)
>  		result = 0;
>  		break;
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  	return result;
>  }
>  
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index b9c3c43..232a7ee 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -71,20 +71,20 @@ static int nfs_delegation_claim_locks(struct nfs_open_context *ctx, struct nfs4_
>  	if (inode->i_flock == NULL)
>  		goto out;
>  
> -	/* Protect inode->i_flock using the BKL */
> -	lock_kernel();
> +	/* Protect inode->i_flock using the file locks lock */
> +	lock_flocks();
>  	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
>  		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
>  			continue;
>  		if (nfs_file_open_context(fl->fl_file) != ctx)
>  			continue;
> -		unlock_kernel();
> +		unlock_flocks();
>  		status = nfs4_lock_delegation_recall(state, fl);
>  		if (status < 0)
>  			goto out;
> -		lock_kernel();
> +		lock_flocks();
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  out:
>  	return status;
>  }
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 3e2f19b..96524c5 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -40,7 +40,7 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/slab.h>
> -#include <linux/smp_lock.h>
> +#include <linux/fs.h>
>  #include <linux/nfs_fs.h>
>  #include <linux/nfs_idmap.h>
>  #include <linux/kthread.h>
> @@ -970,13 +970,13 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
>  	/* Guard against delegation returns and new lock/unlock calls */
>  	down_write(&nfsi->rwsem);
>  	/* Protect inode->i_flock using the BKL */
> -	lock_kernel();
> +	lock_flocks();
>  	for (fl = inode->i_flock; fl != NULL; fl = fl->fl_next) {
>  		if (!(fl->fl_flags & (FL_POSIX|FL_FLOCK)))
>  			continue;
>  		if (nfs_file_open_context(fl->fl_file)->state != state)
>  			continue;
> -		unlock_kernel();
> +		unlock_flocks();
>  		status = ops->recover_lock(state, fl);
>  		switch (status) {
>  			case 0:
> @@ -1003,9 +1003,9 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
>  				/* kill_proc(fl->fl_pid, SIGLOST, 1); */
>  				status = 0;
>  		}
> -		lock_kernel();
> +		lock_flocks();
>  	}
> -	unlock_kernel();
> +	unlock_flocks();
>  out:
>  	up_write(&nfsi->rwsem);
>  	return status;
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index cf0d2ff..a7292fc 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -33,7 +33,7 @@
>  */
>  
>  #include <linux/file.h>
> -#include <linux/smp_lock.h>
> +#include <linux/fs.h>
>  #include <linux/slab.h>
>  #include <linux/namei.h>
>  #include <linux/swap.h>
> @@ -3895,7 +3895,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_stateowner *lowner)
>  	struct inode *inode = filp->fi_inode;
>  	int status = 0;
>  
> -	lock_kernel();
> +	lock_flocks();
>  	for (flpp = &inode->i_flock; *flpp != NULL; flpp = &(*flpp)->fl_next) {
>  		if ((*flpp)->fl_owner == (fl_owner_t)lowner) {
>  			status = 1;
> @@ -3903,7 +3903,7 @@ check_for_locks(struct nfs4_file *filp, struct nfs4_stateowner *lowner)
>  		}
>  	}
>  out:
> -	unlock_kernel();
> +	unlock_flocks();
>  	return status;
>  }
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0ec4d60..fd85cd2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1127,6 +1127,8 @@ extern int vfs_setlease(struct file *, long, struct file_lock **);
>  extern int lease_modify(struct file_lock **, int);
>  extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
>  extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> +extern void lock_flocks(void);
> +extern void unlock_flocks(void);
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int fcntl_getlk(struct file *file, struct flock __user *user)
>  {
> @@ -1269,6 +1271,14 @@ static inline int lock_may_write(struct inode *inode, loff_t start,
>  	return 1;
>  }
>  
> +static inline void lock_flocks()
> +{
> +}
> +
> +static inline void unlock_flocks()
> +{
> +}
> +
>  #endif /* !CONFIG_FILE_LOCKING */
>  
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-15 17:02             ` Sage Weil
@ 2010-09-15 17:17               ` Arnd Bergmann
  2010-09-15 18:22                 ` Sage Weil
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-15 17:17 UTC (permalink / raw)
  To: Sage Weil
  Cc: Trond Myklebust, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, J. Bruce Fields,
	Miklos Szeredi, Frederic Weisbecker, Ingo Molnar, John Kacur,
	Stephen Rothwell, Andrew Morton, Thomas Gleixner, gregf

On Wednesday 15 September 2010, Sage Weil wrote:
> > @@ -2362,7 +2362,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >       if (recon_state->flock) {
> >               int num_fcntl_locks, num_flock_locks;
> >  
> > -             lock_kernel();
> > +             lock_flocks();
> >               ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
> >               rec.v2.flock_len = (2*sizeof(u32) +
> >                                   (num_fcntl_locks+num_flock_locks) *
> > @@ -2373,7 +2373,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> >                       err = ceph_encode_locks(inode, pagelist,
> >                                               num_fcntl_locks,
> >                                               num_flock_locks);
> > -             unlock_kernel();
> > +             unlock_flocks();
> >       }
> 
> The Ceph code won't currently behave with lock_flocks() taking a spinlock.  
> We're preparing a patch to fix that now.  As long as there is a window 
> between lock_flocks() being defined and the spinlock conversion, I can 
> send the fix upstream then and avoid any breakage.  Or send the patches 
> your way to include in your tree, whatever you prefer!

I'd be happy to just integrate the fix in this patch, or as a separate patch
in the series.

I certainly don't want to break any file system in the middle of the series,
I'm sure we can find a way to do it right.

What is the problem? I just saw ceph_pagelist_addpage potentially sleeping,
is that what you are thinking of?

> > +void unlock_flocks(void)
> > +{
> > +     unlock_kernel();
> > +}
> 
> EXPORT_SYMBOL(unlock_flocks); ?

Right, thanks!

	Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-15 17:17               ` Arnd Bergmann
@ 2010-09-15 18:22                 ` Sage Weil
  0 siblings, 0 replies; 13+ messages in thread
From: Sage Weil @ 2010-09-15 18:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Trond Myklebust, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, J. Bruce Fields,
	Miklos Szeredi, Frederic Weisbecker, Ingo Molnar, John Kacur,
	Stephen Rothwell, Andrew Morton, Thomas Gleixner, gregf

On Wed, 15 Sep 2010, Arnd Bergmann wrote:

> On Wednesday 15 September 2010, Sage Weil wrote:
> > > @@ -2362,7 +2362,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> > >       if (recon_state->flock) {
> > >               int num_fcntl_locks, num_flock_locks;
> > >  
> > > -             lock_kernel();
> > > +             lock_flocks();
> > >               ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
> > >               rec.v2.flock_len = (2*sizeof(u32) +
> > >                                   (num_fcntl_locks+num_flock_locks) *
> > > @@ -2373,7 +2373,7 @@ static int encode_caps_cb(struct inode *inode, struct ceph_cap *cap,
> > >                       err = ceph_encode_locks(inode, pagelist,
> > >                                               num_fcntl_locks,
> > >                                               num_flock_locks);
> > > -             unlock_kernel();
> > > +             unlock_flocks();
> > >       }
> > 
> > The Ceph code won't currently behave with lock_flocks() taking a spinlock.  
> > We're preparing a patch to fix that now.  As long as there is a window 
> > between lock_flocks() being defined and the spinlock conversion, I can 
> > send the fix upstream then and avoid any breakage.  Or send the patches 
> > your way to include in your tree, whatever you prefer!
> 
> I'd be happy to just integrate the fix in this patch, or as a separate patch
> in the series.
> 
> I certainly don't want to break any file system in the middle of the series,
> I'm sure we can find a way to do it right.
> 
> What is the problem? I just saw ceph_pagelist_addpage potentially sleeping,
> is that what you are thinking of?

Yeah.  The pagelist code needs to be adjusted to handle preallocation of 
potentially multiple pages.  We'll send the fix your way shortly.

Thanks!
sage


> 
> > > +void unlock_flocks(void)
> > > +{
> > > +     unlock_kernel();
> > > +}
> > 
> > EXPORT_SYMBOL(unlock_flocks); ?
> 
> Right, thanks!
> 
> 	Arnd
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-14 21:55         ` Trond Myklebust
  2010-09-15 16:30           ` Arnd Bergmann
@ 2010-09-15 20:42           ` J. Bruce Fields
  2010-09-16  8:28             ` Arnd Bergmann
  1 sibling, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-15 20:42 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Arnd Bergmann, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Tue, Sep 14, 2010 at 05:55:56PM -0400, Trond Myklebust wrote:
> On Tue, 2010-09-14 at 23:24 +0200, Arnd Bergmann wrote:
> > On Tuesday 14 September 2010 22:53:31 Linus Torvalds wrote:
> > > Oh well. I guess there's no incremental way to do things sanely. And
> > > nobody has patches to fix those users, I guess.
> > 
> > The only critical user is fs/lockd, I can easily handle the rest.
> > When I talked to Bruce and Trond during LinuxCon, they told me that
> > it should be possible to separate the bits of fs/lockd that lock
> > against fs/locks.c and convert the former to use lock_flocks().
> 
> 
> The NFSv4 client is quite ready to just switch to using lock_flocks()
> now. There is nothing that depends on the magical properties of the BKL.
> I do also have someone working on BKL removal for the NLM/lockd client
> and am hoping that should be ready in a couple of weeks time.
> 
> The timeline for the server is up to Bruce, however.

Looking over the server code....  The only code I see under the BKL is:

	- a few lease callbacks in fs/nfsd/nfs4state.c, none of which
	  sleep at this point, so all should be fine under a spinlock or
	  whatever we want.
	- fs/nfsd/nfs4state.c:check_for_locks(), which explicitly takes
	  the BKL itself.  All it does, though, is walk the lock list
	  for a given file and check whether any of them have a given
	  owner.  It would be trivial to put it under some other lock
	  and/or move it to locks.c.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-15 20:42           ` J. Bruce Fields
@ 2010-09-16  8:28             ` Arnd Bergmann
  2010-09-16 14:10               ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2010-09-16  8:28 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Wednesday 15 September 2010 22:42:03 J. Bruce Fields wrote:
> Looking over the server code....  The only code I see under the BKL is:
> 
>         - a few lease callbacks in fs/nfsd/nfs4state.c, none of which
>           sleep at this point, so all should be fine under a spinlock or
>           whatever we want.
>         - fs/nfsd/nfs4state.c:check_for_locks(), which explicitly takes
>           the BKL itself.  All it does, though, is walk the lock list
>           for a given file and check whether any of them have a given
>           owner.  It would be trivial to put it under some other lock
>           and/or move it to locks.c.

Ok. In the version of the patch I sent out yesterday, I came to the
same conclusion and put both of these under lock_flocks(), which
is still the BKL but can be converted to a spinlock after we have
sorted out ceph and lockd. If you and others are fine with this patch,
I'll add it to my bkl/config series.

Note that walking the i_flock list needs to use lock_flocks()
(or the BKL), not a private lock. I guess that's what you are saying
anyway, just making sure.

	Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Remove BKL from fs/locks.c
  2010-09-16  8:28             ` Arnd Bergmann
@ 2010-09-16 14:10               ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-09-16 14:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Trond Myklebust, Linus Torvalds, linux-kernel, linux-fsdevel,
	Matthew Wilcox, Christoph Hellwig, Miklos Szeredi,
	Frederic Weisbecker, Ingo Molnar, John Kacur, Stephen Rothwell,
	Andrew Morton, Thomas Gleixner

On Thu, Sep 16, 2010 at 10:28:56AM +0200, Arnd Bergmann wrote:
> On Wednesday 15 September 2010 22:42:03 J. Bruce Fields wrote:
> > Looking over the server code....  The only code I see under the BKL is:
> > 
> >         - a few lease callbacks in fs/nfsd/nfs4state.c, none of which
> >           sleep at this point, so all should be fine under a spinlock or
> >           whatever we want.
> >         - fs/nfsd/nfs4state.c:check_for_locks(), which explicitly takes
> >           the BKL itself.  All it does, though, is walk the lock list
> >           for a given file and check whether any of them have a given
> >           owner.  It would be trivial to put it under some other lock
> >           and/or move it to locks.c.
> 
> Ok. In the version of the patch I sent out yesterday, I came to the
> same conclusion and put both of these under lock_flocks(), which
> is still the BKL but can be converted to a spinlock after we have
> sorted out ceph and lockd. If you and others are fine with this patch,
> I'll add it to my bkl/config series.
> 
> Note that walking the i_flock list needs to use lock_flocks()
> (or the BKL), not a private lock. I guess that's what you are saying
> anyway, just making sure.

Yes, that's fine.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-09-16 14:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-14 20:06 [PATCH] Remove BKL from fs/locks.c Arnd Bergmann
2010-09-14 20:20 ` Linus Torvalds
2010-09-14 20:39   ` Arnd Bergmann
2010-09-14 20:53     ` Linus Torvalds
2010-09-14 21:24       ` Arnd Bergmann
2010-09-14 21:55         ` Trond Myklebust
2010-09-15 16:30           ` Arnd Bergmann
2010-09-15 17:02             ` Sage Weil
2010-09-15 17:17               ` Arnd Bergmann
2010-09-15 18:22                 ` Sage Weil
2010-09-15 20:42           ` J. Bruce Fields
2010-09-16  8:28             ` Arnd Bergmann
2010-09-16 14:10               ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.