linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks
@ 2014-01-09 14:19 Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 01/14] locks: close potential race between setlease and open Jeff Layton
                   ` (13 more replies)
  0 siblings, 14 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

This patchset is the fifth posting of this set. The big change in this
set is that it adds new fcntl cmd values instead of l_type values. I
think that will make the interface cleaner for userland, but it does
mean wading through the morass of arch-specific cmd definitions.

Here is a summary of the more notable changes:

v5:
- switch to adding new cmd values instead of new l_type values
- corrected patch by Bruce to fix up range overflow handling
- patch to add Bruce and I as maintainers for fs/locks.c
- disable deadlock detection for file-private locks

v4:
- prefixed the set with the rest of the locks.c patches I have.
- added patch to get rid of BUG() call in locks_remove_flock. I think
  a WARN  + delete the lock is sufficient there.
- added patches from Bruce to consolidate the struct flock/flock64
  to file_lock conversion code
- fixed locks_remove_file not to assume that filp-private locks won't
  be added on filesystems that have a ->lock method.

v3:
- more consolidation of common code between flock_to_posix_lock and
  flock_to_posix_lock64
- better bisectability by reordering changes, such that partial
  implementation won't be exposed
- s/filp/file/ s/FILP/FILE/ in symbol names

v2:
- inheritance semantics have been change to be more BSD-lock like
- patchset size has been reduced by changing how lock ownership
  is handled
- new F_UNLCKP l_type value has been added

Note too that I've gone ahead and opened a request for the POSIX folks
to consider adding this to the POSIX spec once we have something
mergeable. They seem amenable to the idea but don't want to enshrine it
into the standard until there's a real implementation of it:

    http://austingroupbugs.net/view.php?id=768

I also owe them a better writeup of the semantics for these new locks,
but have been holding off on doing that until they're a little more
settled.

I have written a fcntl(2) manpage update for this that I will send to
the list.

J. Bruce Fields (1):
  locks: fix posix lock range overflow handling

Jeff Layton (13):
  locks: close potential race between setlease and open
  locks: clean up comment typo
  locks: remove "inline" qualifier from fl_link manipulation functions
  locks: add __acquires and __releases annotations to locks_start and
    locks_stop
  locks: eliminate BUG() call when there's an unexpected lock on file
    close
  locks: consolidate checks for compatible filp->f_mode values in setlk
    handlers
  MAINTAINERS: add Bruce and myself to list of maintainers for file
    locking code
  locks: rename locks_remove_flock to locks_remove_file
  locks: make /proc/locks show IS_FILE_PVT locks with a P suffix
  locks: report l_pid as -1 for FL_FILE_PVT locks
  locks: pass the cmd value to fcntl_getlk/getlk64
  locks: skip deadlock detection on FL_FILE_PVT locks
  locks: add new fcntl cmd values for handling file private locks

 MAINTAINERS                          |   2 +
 arch/arm/kernel/sys_oabi-compat.c    |   6 +
 arch/arm64/include/asm/compat.h      |   3 +
 arch/mips/include/asm/compat.h       |   3 +
 arch/mips/include/uapi/asm/fcntl.h   |   3 +
 arch/parisc/include/uapi/asm/fcntl.h |   3 +
 arch/powerpc/include/asm/compat.h    |   3 +
 arch/s390/include/asm/compat.h       |   9 +-
 arch/sparc/include/asm/compat.h      |   3 +
 arch/tile/include/asm/compat.h       |   3 +
 arch/x86/include/asm/compat.h        |   3 +
 fs/compat.c                          |  48 ++++-
 fs/fcntl.c                           |  30 +--
 fs/file_table.c                      |   2 +-
 fs/locks.c                           | 342 ++++++++++++++++++++---------------
 include/linux/fs.h                   |  21 ++-
 include/uapi/asm-generic/fcntl.h     |  27 ++-
 security/selinux/hooks.c             |   6 +
 18 files changed, 345 insertions(+), 172 deletions(-)

-- 
1.8.4.2


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v5 01/14] locks: close potential race between setlease and open
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 02/14] locks: clean up comment typo Jeff Layton
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

As Al Viro points out, there is an unlikely, but possible race between
opening a file and setting a lease on it. generic_add_lease is done with
the i_lock held, but the inode->i_flock check in break_lease is
lockless. It's possible for another task doing an open to do the entire
pathwalk and call break_lease between the point where generic_add_lease
checks for a conflicting open and adds the lease to the list. If this
occurs, we can end up with a lease set on the file with a conflicting
open.

To guard against that, check again for a conflicting open after adding
the lease to the i_flock list. If the above race occurs, then we can
simply unwind the lease setting and return -EAGAIN.

Because we take dentry references and acquire write access on the file
before calling break_lease, we know that if the i_flock list is empty
when the open caller goes to check it then the necessary refcounts have
already been incremented. Thus the additional check for a conflicting
open will see that there is one and the setlease call will fail.

Cc: Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/locks.c         | 75 ++++++++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h |  6 +++++
 2 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 92a0f0a..2cfeea6 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -652,15 +652,18 @@ static void locks_insert_lock(struct file_lock **pos, struct file_lock *fl)
 	locks_insert_global_locks(fl);
 }
 
-/*
- * Delete a lock and then free it.
- * Wake up processes that are blocked waiting for this lock,
- * notify the FS that the lock has been cleared and
- * finally free the lock.
+/**
+ * locks_delete_lock - Delete a lock and then free it.
+ * @thisfl_p: pointer that points to the fl_next field of the previous
+ * 	      inode->i_flock list entry
+ *
+ * Unlink a lock from all lists and free the namespace reference, but don't
+ * free it yet. Wake up processes that are blocked waiting for this lock and
+ * notify the FS that the lock has been cleared.
  *
  * Must be called with the i_lock held!
  */
-static void locks_delete_lock(struct file_lock **thisfl_p)
+static void locks_unlink_lock(struct file_lock **thisfl_p)
 {
 	struct file_lock *fl = *thisfl_p;
 
@@ -675,6 +678,18 @@ static void locks_delete_lock(struct file_lock **thisfl_p)
 	}
 
 	locks_wake_up_blocks(fl);
+}
+
+/*
+ * Unlink a lock from all lists and free it.
+ *
+ * Must be called with i_lock held!
+ */
+static void locks_delete_lock(struct file_lock **thisfl_p)
+{
+	struct file_lock *fl = *thisfl_p;
+
+	locks_unlink_lock(thisfl_p);
 	locks_free_lock(fl);
 }
 
@@ -1472,6 +1487,32 @@ int fcntl_getlease(struct file *filp)
 	return type;
 }
 
+/**
+ * check_conflicting_open - see if the given dentry points to a file that has
+ * 			    an existing open that would conflict with the
+ * 			    desired lease.
+ * @dentry:	dentry to check
+ * @arg:	type of lease that we're trying to acquire
+ *
+ * Check to see if there's an existing open fd on this file that would
+ * conflict with the lease we're trying to set.
+ */
+static int
+check_conflicting_open(const struct dentry *dentry, const long arg)
+{
+	int ret = 0;
+	struct inode *inode = dentry->d_inode;
+
+	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
+		return -EAGAIN;
+
+	if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
+	    (atomic_read(&inode->i_count) > 1)))
+		ret = -EAGAIN;
+
+	return ret;
+}
+
 static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 {
 	struct file_lock *fl, **before, **my_before = NULL, *lease;
@@ -1499,12 +1540,8 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp
 		return -EINVAL;
 	}
 
-	error = -EAGAIN;
-	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
-		goto out;
-	if ((arg == F_WRLCK)
-	    && ((d_count(dentry) > 1)
-		|| (atomic_read(&inode->i_count) > 1)))
+	error = check_conflicting_open(dentry, arg);
+	if (error)
 		goto out;
 
 	/*
@@ -1549,7 +1586,19 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp
 		goto out;
 
 	locks_insert_lock(before, lease);
-	error = 0;
+	/*
+	 * The check in break_lease() is lockless. It's possible for another
+	 * open to race in after we did the earlier check for a conflicting
+	 * open but before the lease was inserted. Check again for a
+	 * conflicting open and cancel the lease if there is one.
+	 *
+	 * We also add a barrier here to ensure that the insertion of the lock
+	 * precedes these checks.
+	 */
+	smp_mb();
+	error = check_conflicting_open(dentry, arg);
+	if (error)
+		locks_unlink_lock(flp);
 out:
 	if (is_deleg)
 		mutex_unlock(&inode->i_mutex);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 121f11f..04be202 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1963,6 +1963,12 @@ static inline int locks_verify_truncate(struct inode *inode,
 
 static inline int break_lease(struct inode *inode, unsigned int mode)
 {
+	/*
+	 * Since this check is lockless, we must ensure that any refcounts
+	 * taken are done before checking inode->i_flock. Otherwise, we could
+	 * end up racing with tasks trying to set a new lease on this file.
+	 */
+	smp_mb();
 	if (inode->i_flock)
 		return __break_lease(inode, mode, FL_LEASE);
 	return 0;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 02/14] locks: clean up comment typo
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 01/14] locks: close potential race between setlease and open Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 03/14] locks: remove "inline" qualifier from fl_link manipulation functions Jeff Layton
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 2cfeea6..5e28612 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -581,7 +581,7 @@ static void locks_delete_block(struct file_lock *waiter)
  * it seems like the reasonable thing to do.
  *
  * Must be called with both the i_lock and blocked_lock_lock held. The fl_block
- * list itself is protected by the file_lock_list, but by ensuring that the
+ * list itself is protected by the blocked_lock_lock, but by ensuring that the
  * i_lock is also held on insertions we can avoid taking the blocked_lock_lock
  * in some cases when we see that the fl_block list is empty.
  */
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 03/14] locks: remove "inline" qualifier from fl_link manipulation functions
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 01/14] locks: close potential race between setlease and open Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 02/14] locks: clean up comment typo Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 04/14] locks: add __acquires and __releases annotations to locks_start and locks_stop Jeff Layton
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

It's best to let the compiler decide that.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 5e28612..049a144 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -511,8 +511,7 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
 }
 
 /* Must be called with the i_lock held! */
-static inline void
-locks_insert_global_locks(struct file_lock *fl)
+static void locks_insert_global_locks(struct file_lock *fl)
 {
 	lg_local_lock(&file_lock_lglock);
 	fl->fl_link_cpu = smp_processor_id();
@@ -521,8 +520,7 @@ locks_insert_global_locks(struct file_lock *fl)
 }
 
 /* Must be called with the i_lock held! */
-static inline void
-locks_delete_global_locks(struct file_lock *fl)
+static void locks_delete_global_locks(struct file_lock *fl)
 {
 	/*
 	 * Avoid taking lock if already unhashed. This is safe since this check
@@ -544,14 +542,12 @@ posix_owner_key(struct file_lock *fl)
 	return (unsigned long)fl->fl_owner;
 }
 
-static inline void
-locks_insert_global_blocked(struct file_lock *waiter)
+static void locks_insert_global_blocked(struct file_lock *waiter)
 {
 	hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
 }
 
-static inline void
-locks_delete_global_blocked(struct file_lock *waiter)
+static void locks_delete_global_blocked(struct file_lock *waiter)
 {
 	hash_del(&waiter->fl_link);
 }
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 04/14] locks: add __acquires and __releases annotations to locks_start and locks_stop
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (2 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 03/14] locks: remove "inline" qualifier from fl_link manipulation functions Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 05/14] locks: eliminate BUG() call when there's an unexpected lock on file close Jeff Layton
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

...to make sparse happy.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 049a144..6084f5a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2430,6 +2430,7 @@ static int locks_show(struct seq_file *f, void *v)
 }
 
 static void *locks_start(struct seq_file *f, loff_t *pos)
+	__acquires(&blocked_lock_lock)
 {
 	struct locks_iterator *iter = f->private;
 
@@ -2448,6 +2449,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 }
 
 static void locks_stop(struct seq_file *f, void *v)
+	__releases(&blocked_lock_lock)
 {
 	spin_unlock(&blocked_lock_lock);
 	lg_global_unlock(&file_lock_lglock);
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 05/14] locks: eliminate BUG() call when there's an unexpected lock on file close
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (3 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 04/14] locks: add __acquires and __releases annotations to locks_start and locks_stop Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 06/14] locks: fix posix lock range overflow handling Jeff Layton
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

A leftover lock on the list is surely a sign of a problem of some sort,
but it's not necessarily a reason to panic the box. Instead, just log a
warning with some info about the lock, and then delete it like we would
any other lock.

In the event that the filesystem declares a ->lock f_op, we may end up
leaking something, but that's generally preferable to an immediate
panic.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 6084f5a..dd30933 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2281,16 +2281,28 @@ void locks_remove_flock(struct file *filp)
 
 	while ((fl = *before) != NULL) {
 		if (fl->fl_file == filp) {
-			if (IS_FLOCK(fl)) {
-				locks_delete_lock(before);
-				continue;
-			}
 			if (IS_LEASE(fl)) {
 				lease_modify(before, F_UNLCK);
 				continue;
 			}
-			/* What? */
-			BUG();
+
+			/*
+			 * There's a leftover lock on the list of a type that
+			 * we didn't expect to see. Most likely a classic
+			 * POSIX lock that ended up not getting released
+			 * properly, or that raced onto the list somehow. Log
+			 * some info about it and then just remove it from
+			 * the list.
+			 */
+			WARN(!IS_FLOCK(fl),
+				"leftover lock: dev=%u:%u ino=%lu type=%hhd flags=0x%x start=%lld end=%lld\n",
+				MAJOR(inode->i_sb->s_dev),
+				MINOR(inode->i_sb->s_dev), inode->i_ino,
+				fl->fl_type, fl->fl_flags,
+				fl->fl_start, fl->fl_end);
+
+			locks_delete_lock(before);
+			continue;
  		}
 		before = &fl->fl_next;
 	}
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 06/14] locks: fix posix lock range overflow handling
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (4 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 05/14] locks: eliminate BUG() call when there's an unexpected lock on file close Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 07/14] locks: consolidate checks for compatible filp->f_mode values in setlk handlers Jeff Layton
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

From: "J. Bruce Fields" <bfields@redhat.com>

In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit
off_t.

The existing range checks also seem to depend on signed arithmetic
wrapping when it overflows.  In practice maybe that works, but we can be
more careful.  That also allows us to make a more reliable distinction
between -EINVAL and -EOVERFLOW.

Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller
to set a lock with starting point no longer representable as a 32-bit
value.  We could return -EOVERFLOW in such cases, but the locks code is
capable of handling such ranges, so we choose to be lenient here.  The
only problem is that subsequent GETLK calls on such a lock will fail
with EOVERFLOW.

While we're here, do some cleanup including consolidating code for the
flock and flock64 cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/locks.c                       | 100 +++++++++++++--------------------------
 include/uapi/asm-generic/fcntl.h |   3 --
 2 files changed, 32 insertions(+), 71 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index dd30933..b49e853 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -344,48 +344,43 @@ static int assign_type(struct file_lock *fl, long type)
 	return 0;
 }
 
-/* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
- * style lock.
- */
-static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
-			       struct flock *l)
+static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
+				 struct flock64 *l)
 {
-	off_t start, end;
-
 	switch (l->l_whence) {
 	case SEEK_SET:
-		start = 0;
+		fl->fl_start = 0;
 		break;
 	case SEEK_CUR:
-		start = filp->f_pos;
+		fl->fl_start = filp->f_pos;
 		break;
 	case SEEK_END:
-		start = i_size_read(file_inode(filp));
+		fl->fl_start = i_size_read(file_inode(filp));
 		break;
 	default:
 		return -EINVAL;
 	}
+	if (l->l_start > OFFSET_MAX - fl->fl_start)
+		return -EOVERFLOW;
+	fl->fl_start += l->l_start;
+	if (fl->fl_start < 0)
+		return -EINVAL;
 
 	/* POSIX-1996 leaves the case l->l_len < 0 undefined;
 	   POSIX-2001 defines it. */
-	start += l->l_start;
-	if (start < 0)
-		return -EINVAL;
-	fl->fl_end = OFFSET_MAX;
 	if (l->l_len > 0) {
-		end = start + l->l_len - 1;
-		fl->fl_end = end;
+		if (l->l_len - 1 > OFFSET_MAX - fl->fl_start)
+			return -EOVERFLOW;
+		fl->fl_end = fl->fl_start + l->l_len - 1;
+
 	} else if (l->l_len < 0) {
-		end = start - 1;
-		fl->fl_end = end;
-		start += l->l_len;
-		if (start < 0)
+		if (fl->fl_start + l->l_len < 0)
 			return -EINVAL;
-	}
-	fl->fl_start = start;	/* we record the absolute position */
-	if (fl->fl_end < fl->fl_start)
-		return -EOVERFLOW;
-	
+		fl->fl_end = fl->fl_start - 1;
+		fl->fl_start += l->l_len;
+	} else
+		fl->fl_end = OFFSET_MAX;
+
 	fl->fl_owner = current->files;
 	fl->fl_pid = current->tgid;
 	fl->fl_file = filp;
@@ -396,52 +391,21 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
 	return assign_type(fl, l->l_type);
 }
 
-#if BITS_PER_LONG == 32
-static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
-				 struct flock64 *l)
+/* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
+ * style lock.
+ */
+static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
+			       struct flock *l)
 {
-	loff_t start;
-
-	switch (l->l_whence) {
-	case SEEK_SET:
-		start = 0;
-		break;
-	case SEEK_CUR:
-		start = filp->f_pos;
-		break;
-	case SEEK_END:
-		start = i_size_read(file_inode(filp));
-		break;
-	default:
-		return -EINVAL;
-	}
+	struct flock64 ll = {
+		.l_type = l->l_type,
+		.l_whence = l->l_whence,
+		.l_start = l->l_start,
+		.l_len = l->l_len,
+	};
 
-	start += l->l_start;
-	if (start < 0)
-		return -EINVAL;
-	fl->fl_end = OFFSET_MAX;
-	if (l->l_len > 0) {
-		fl->fl_end = start + l->l_len - 1;
-	} else if (l->l_len < 0) {
-		fl->fl_end = start - 1;
-		start += l->l_len;
-		if (start < 0)
-			return -EINVAL;
-	}
-	fl->fl_start = start;	/* we record the absolute position */
-	if (fl->fl_end < fl->fl_start)
-		return -EOVERFLOW;
-	
-	fl->fl_owner = current->files;
-	fl->fl_pid = current->tgid;
-	fl->fl_file = filp;
-	fl->fl_flags = FL_POSIX;
-	fl->fl_ops = NULL;
-	fl->fl_lmops = NULL;
-
-	return assign_type(fl, l->l_type);
+	return flock64_to_posix_lock(filp, fl, &ll);
 }
-#endif
 
 /* default lease lock manager operations */
 static void lease_break_callback(struct file_lock *fl)
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 95e46c8..36025f7 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -186,8 +186,6 @@ struct flock {
 };
 #endif
 
-#ifndef CONFIG_64BIT
-
 #ifndef HAVE_ARCH_STRUCT_FLOCK64
 #ifndef __ARCH_FLOCK64_PAD
 #define __ARCH_FLOCK64_PAD
@@ -202,6 +200,5 @@ struct flock64 {
 	__ARCH_FLOCK64_PAD
 };
 #endif
-#endif /* !CONFIG_64BIT */
 
 #endif /* _ASM_GENERIC_FCNTL_H */
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 07/14] locks: consolidate checks for compatible filp->f_mode values in setlk handlers
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (5 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 06/14] locks: fix posix lock range overflow handling Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 08/14] MAINTAINERS: add Bruce and myself to list of maintainers for file locking code Jeff Layton
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Move this check into flock64_to_posix_lock instead of duplicating it in
two places. This also fixes a minor wart in the code where we continue
referring to the struct flock after converting it to struct file_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 46 ++++++++++++----------------------------------
 1 file changed, 12 insertions(+), 34 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index b49e853..4cd2578 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -388,6 +388,18 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
 	fl->fl_ops = NULL;
 	fl->fl_lmops = NULL;
 
+	/* Ensure that fl->fl_filp has compatible f_mode */
+	switch (l->l_type) {
+	case F_RDLCK:
+		if (!(filp->f_mode & FMODE_READ))
+			return -EBADF;
+		break;
+	case F_WRLCK:
+		if (!(filp->f_mode & FMODE_WRITE))
+			return -EBADF;
+		break;
+	}
+
 	return assign_type(fl, l->l_type);
 }
 
@@ -2025,23 +2037,6 @@ again:
 		file_lock->fl_flags |= FL_SLEEP;
 	}
 	
-	error = -EBADF;
-	switch (flock.l_type) {
-	case F_RDLCK:
-		if (!(filp->f_mode & FMODE_READ))
-			goto out;
-		break;
-	case F_WRLCK:
-		if (!(filp->f_mode & FMODE_WRITE))
-			goto out;
-		break;
-	case F_UNLCK:
-		break;
-	default:
-		error = -EINVAL;
-		goto out;
-	}
-
 	error = do_lock_file_wait(filp, cmd, file_lock);
 
 	/*
@@ -2143,23 +2138,6 @@ again:
 		file_lock->fl_flags |= FL_SLEEP;
 	}
 	
-	error = -EBADF;
-	switch (flock.l_type) {
-	case F_RDLCK:
-		if (!(filp->f_mode & FMODE_READ))
-			goto out;
-		break;
-	case F_WRLCK:
-		if (!(filp->f_mode & FMODE_WRITE))
-			goto out;
-		break;
-	case F_UNLCK:
-		break;
-	default:
-		error = -EINVAL;
-		goto out;
-	}
-
 	error = do_lock_file_wait(filp, cmd, file_lock);
 
 	/*
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 08/14] MAINTAINERS: add Bruce and myself to list of maintainers for file locking code
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (6 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 07/14] locks: consolidate checks for compatible filp->f_mode values in setlk handlers Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 09/14] locks: rename locks_remove_flock to locks_remove_file Jeff Layton
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Both Bruce and I have done a fair bit of work in these files recently,
and would like to be notified if anyone is proposing changes to it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6c20792..ea43f99 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3410,6 +3410,8 @@ F:	include/uapi/scsi/fc/
 
 FILE LOCKING (flock() and fcntl()/lockf())
 M:	Matthew Wilcox <matthew@wil.cx>
+M:	Jeff Layton <jlayton@redhat.com>
+M:	J. Bruce Fields <bfields@fieldses.org>
 L:	linux-fsdevel@vger.kernel.org
 S:	Maintained
 F:	include/linux/fcntl.h
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 09/14] locks: rename locks_remove_flock to locks_remove_file
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (7 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 08/14] MAINTAINERS: add Bruce and myself to list of maintainers for file locking code Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 10/14] locks: make /proc/locks show IS_FILE_PVT locks with a P suffix Jeff Layton
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with a new type of POSIX lock too.
Rename it to locks_remove_file to indicate that it removes locks that
are associated with a particular struct file, and not just flock locks.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/file_table.c    | 2 +-
 fs/locks.c         | 2 +-
 include/linux/fs.h | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 5fff903..468543c 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -234,7 +234,7 @@ static void __fput(struct file *file)
 	 * in the file cleanup chain.
 	 */
 	eventpoll_release(file);
-	locks_remove_flock(file);
+	locks_remove_file(file);
 
 	if (unlikely(file->f_flags & FASYNC)) {
 		if (file->f_op->fasync)
diff --git a/fs/locks.c b/fs/locks.c
index 4cd2578..4d4e790 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2196,7 +2196,7 @@ EXPORT_SYMBOL(locks_remove_posix);
 /*
  * This function is called on the last close of an open file.
  */
-void locks_remove_flock(struct file *filp)
+void locks_remove_file(struct file *filp)
 {
 	struct inode * inode = file_inode(filp);
 	struct file_lock *fl;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 04be202..51974e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1012,7 +1012,7 @@ extern struct file_lock * locks_alloc_lock(void);
 extern void locks_copy_lock(struct file_lock *, struct file_lock *);
 extern void __locks_copy_lock(struct file_lock *, const struct file_lock *);
 extern void locks_remove_posix(struct file *, fl_owner_t);
-extern void locks_remove_flock(struct file *);
+extern void locks_remove_file(struct file *);
 extern void locks_release_private(struct file_lock *);
 extern void posix_test_lock(struct file *, struct file_lock *);
 extern int posix_lock_file(struct file *, struct file_lock *, struct file_lock *);
@@ -1083,7 +1083,7 @@ static inline void locks_remove_posix(struct file *filp, fl_owner_t owner)
 	return;
 }
 
-static inline void locks_remove_flock(struct file *filp)
+static inline void locks_remove_file(struct file *filp)
 {
 	return;
 }
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 10/14] locks: make /proc/locks show IS_FILE_PVT locks with a P suffix
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (8 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 09/14] locks: rename locks_remove_flock to locks_remove_file Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 11/14] locks: report l_pid as -1 for FL_FILE_PVT locks Jeff Layton
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Remove the check for FL_ACCESS when generating /proc/locks output. It's
dead code as those locks never end up on the fl_hash or fl_block lists.

In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new IS_FILE_PVT flag.

Add a "P" suffix to the POSIX lock output in /proc/locks for locks that
have FL_FILE_PVT set.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c         | 9 +++++++--
 include/linux/fs.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 4d4e790..f1b9b14 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -135,6 +135,7 @@
 #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
 #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
 #define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
+#define IS_FILE_PVT(fl)	(fl->fl_flags & FL_FILE_PVT)
 
 static bool lease_breaking(struct file_lock *fl)
 {
@@ -2313,8 +2314,12 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 
 	seq_printf(f, "%lld:%s ", id, pfx);
 	if (IS_POSIX(fl)) {
-		seq_printf(f, "%6s %s ",
-			     (fl->fl_flags & FL_ACCESS) ? "ACCESS" : "POSIX ",
+		if (fl->fl_flags & FL_ACCESS)
+			seq_printf(f, "ACCESS");
+		else
+			seq_printf(f, "POSIX%c", IS_FILE_PVT(fl) ? 'P' : ' ');
+
+		seq_printf(f, " %s ",
 			     (inode == NULL) ? "*NOINODE*" :
 			     mandatory_lock(inode) ? "MANDATORY" : "ADVISORY ");
 	} else if (IS_FLOCK(fl)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 51974e0..639a3b7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -888,6 +888,7 @@ static inline int file_check_writeable(struct file *filp)
 #define FL_SLEEP	128	/* A blocking lock */
 #define FL_DOWNGRADE_PENDING	256 /* Lease is being downgraded */
 #define FL_UNLOCK_PENDING	512 /* Lease is being broken */
+#define FL_FILE_PVT	1024	/* lock is private to the file */
 
 /*
  * Special return value from posix_lock_file() and vfs_lock_file() for
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 11/14] locks: report l_pid as -1 for FL_FILE_PVT locks
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (9 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 10/14] locks: make /proc/locks show IS_FILE_PVT locks with a P suffix Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 12/14] locks: pass the cmd value to fcntl_getlk/getlk64 Jeff Layton
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report a l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.

This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index f1b9b14..5953b2c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1864,7 +1864,7 @@ EXPORT_SYMBOL_GPL(vfs_test_lock);
 
 static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
 {
-	flock->l_pid = fl->fl_pid;
+	flock->l_pid = IS_FILE_PVT(fl) ? -1 : fl->fl_pid;
 #if BITS_PER_LONG == 32
 	/*
 	 * Make sure we can represent the posix lock via
@@ -1886,7 +1886,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
 #if BITS_PER_LONG == 32
 static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
 {
-	flock->l_pid = fl->fl_pid;
+	flock->l_pid = IS_FILE_PVT(fl) ? -1 : fl->fl_pid;
 	flock->l_start = fl->fl_start;
 	flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
 		fl->fl_end - fl->fl_start + 1;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 12/14] locks: pass the cmd value to fcntl_getlk/getlk64
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (10 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 11/14] locks: report l_pid as -1 for FL_FILE_PVT locks Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks Jeff Layton
  2014-01-09 14:19 ` [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks Jeff Layton
  13 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/fcntl.c         |  4 ++--
 fs/locks.c         |  4 ++--
 include/linux/fs.h | 10 ++++++----
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index ef68665..7ef7f2d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -273,7 +273,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		err = setfl(fd, filp, arg);
 		break;
 	case F_GETLK:
-		err = fcntl_getlk(filp, (struct flock __user *) arg);
+		err = fcntl_getlk(filp, cmd, (struct flock __user *) arg);
 		break;
 	case F_SETLK:
 	case F_SETLKW:
@@ -389,7 +389,7 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 	
 	switch (cmd) {
 		case F_GETLK64:
-			err = fcntl_getlk64(f.file, (struct flock64 __user *) arg);
+			err = fcntl_getlk64(f.file, cmd, (struct flock64 __user *) arg);
 			break;
 		case F_SETLK64:
 		case F_SETLKW64:
diff --git a/fs/locks.c b/fs/locks.c
index 5953b2c..f8cd6d7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1898,7 +1898,7 @@ static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
 /* Report the first existing lock that would conflict with l.
  * This implements the F_GETLK command of fcntl().
  */
-int fcntl_getlk(struct file *filp, struct flock __user *l)
+int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock __user *l)
 {
 	struct file_lock file_lock;
 	struct flock flock;
@@ -2066,7 +2066,7 @@ out:
 /* Report the first existing lock that would conflict with l.
  * This implements the F_GETLK command of fcntl().
  */
-int fcntl_getlk64(struct file *filp, struct flock64 __user *l)
+int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 __user *l)
 {
 	struct file_lock file_lock;
 	struct flock64 flock;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 639a3b7..d9bc9b5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -993,12 +993,12 @@ struct file_lock {
 extern void send_sigio(struct fown_struct *fown, int fd, int band);
 
 #ifdef CONFIG_FILE_LOCKING
-extern int fcntl_getlk(struct file *, struct flock __user *);
+extern int fcntl_getlk(struct file *, unsigned int, struct flock __user *);
 extern int fcntl_setlk(unsigned int, struct file *, unsigned int,
 			struct flock __user *);
 
 #if BITS_PER_LONG == 32
-extern int fcntl_getlk64(struct file *, struct flock64 __user *);
+extern int fcntl_getlk64(struct file *, unsigned int, struct flock64 __user *);
 extern int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 			struct flock64 __user *);
 #endif
@@ -1031,7 +1031,8 @@ extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
 #else /* !CONFIG_FILE_LOCKING */
-static inline int fcntl_getlk(struct file *file, struct flock __user *user)
+static inline int fcntl_getlk(struct file *file, unsigned int cmd,
+			      struct flock __user *user)
 {
 	return -EINVAL;
 }
@@ -1043,7 +1044,8 @@ static inline int fcntl_setlk(unsigned int fd, struct file *file,
 }
 
 #if BITS_PER_LONG == 32
-static inline int fcntl_getlk64(struct file *file, struct flock64 __user *user)
+static inline int fcntl_getlk64(struct file *file, unsigned int cmd,
+				struct flock64 __user *user)
 {
 	return -EINVAL;
 }
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (11 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 12/14] locks: pass the cmd value to fcntl_getlk/getlk64 Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 20:25   ` Andy Lutomirski
  2014-01-09 14:19 ` [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks Jeff Layton
  13 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

It's not really feasible to do deadlock detection with FL_FILE_PVT
locks since they aren't owned by a single task, per-se. Deadlock
detection also tends to be rather expensive so just skip it for
these sorts of locks.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/locks.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index f8cd6d7..94bcca6 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -564,7 +564,7 @@ static void __locks_insert_block(struct file_lock *blocker,
 	BUG_ON(!list_empty(&waiter->fl_block));
 	waiter->fl_next = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
-	if (IS_POSIX(blocker))
+	if (IS_POSIX(blocker) && !IS_FILE_PVT(blocker))
 		locks_insert_global_blocked(waiter);
 }
 
@@ -757,8 +757,12 @@ EXPORT_SYMBOL(posix_test_lock);
  * Note: the above assumption may not be true when handling lock
  * requests from a broken NFS client. It may also fail in the presence
  * of tasks (such as posix threads) sharing the same open file table.
- *
  * To handle those cases, we just bail out after a few iterations.
+ *
+ * For FL_FILE_PVT locks, the owner is the filp, not the files_struct.
+ * Because the owner is not even nominally tied to a thread of
+ * execution, deadlock detection can't reasonably work well. Just skip
+ * it for those.
  */
 
 #define MAX_DEADLK_ITERATIONS 10
@@ -781,6 +785,10 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
 {
 	int i = 0;
 
+	/* Can't reasonably detect deadlocks with FL_FILE_PVT locks */
+	if (IS_FILE_PVT(caller_fl))
+		return 0;
+
 	while ((block_fl = what_owner_is_waiting_for(block_fl))) {
 		if (i++ > MAX_DEADLK_ITERATIONS)
 			return 0;
-- 
1.8.4.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks
  2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
                   ` (12 preceding siblings ...)
  2014-01-09 14:19 ` [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks Jeff Layton
@ 2014-01-09 14:19 ` Jeff Layton
  2014-01-09 20:29   ` Andy Lutomirski
  13 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2014-01-09 14:19 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: nfs-ganesha-devel, samba-technical, linux-kernel

Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.

This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.

Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.

This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.

This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.

These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.

The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 arch/arm/kernel/sys_oabi-compat.c    |  6 +++++
 arch/arm64/include/asm/compat.h      |  3 +++
 arch/mips/include/asm/compat.h       |  3 +++
 arch/mips/include/uapi/asm/fcntl.h   |  3 +++
 arch/parisc/include/uapi/asm/fcntl.h |  3 +++
 arch/powerpc/include/asm/compat.h    |  3 +++
 arch/s390/include/asm/compat.h       |  9 ++++---
 arch/sparc/include/asm/compat.h      |  3 +++
 arch/tile/include/asm/compat.h       |  3 +++
 arch/x86/include/asm/compat.h        |  3 +++
 fs/compat.c                          | 48 ++++++++++++++++++++++++++++-----
 fs/fcntl.c                           | 28 +++++++++++--------
 fs/locks.c                           | 52 +++++++++++++++++++++++++++++++++---
 include/uapi/asm-generic/fcntl.h     | 24 +++++++++++++++++
 security/selinux/hooks.c             |  6 +++++
 15 files changed, 174 insertions(+), 23 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index 3e94811..2fdda97 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -206,6 +206,9 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 	case F_GETLK64:
 	case F_SETLK64:
 	case F_SETLKW64:
+	case F_GETLKP64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
 		if (copy_from_user(&user, (struct oabi_flock64 __user *)arg,
 				   sizeof(user)))
 			return -EFAULT;
@@ -223,6 +226,7 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 
 	switch (cmd) {
 	case F_GETLK64:
+	case F_GETLKP64:
 		if (!ret) {
 			user.l_type	= kernel.l_type;
 			user.l_whence	= kernel.l_whence;
@@ -235,6 +239,8 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 		}
 	case F_SETLK64:
 	case F_SETLKW64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
 		set_fs(fs);
 	}
 
diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index fda2704..ee46cc4 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -117,6 +117,9 @@ struct compat_flock {
 #define F_GETLK64	12	/*  using 'struct flock64' */
 #define F_SETLK64	13
 #define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/mips/include/asm/compat.h b/arch/mips/include/asm/compat.h
index c4bd54a..9cd8c3c 100644
--- a/arch/mips/include/asm/compat.h
+++ b/arch/mips/include/asm/compat.h
@@ -92,6 +92,9 @@ struct compat_flock {
 #define F_GETLK64	33
 #define F_SETLK64	34
 #define F_SETLKW64	35
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/mips/include/uapi/asm/fcntl.h b/arch/mips/include/uapi/asm/fcntl.h
index 6ca432f..ce6f5e3 100644
--- a/arch/mips/include/uapi/asm/fcntl.h
+++ b/arch/mips/include/uapi/asm/fcntl.h
@@ -47,6 +47,9 @@
 #define F_GETLK64	33	/*  using 'struct flock64' */
 #define F_SETLK64	34
 #define F_SETLKW64	35
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 #endif
 
 /*
diff --git a/arch/parisc/include/uapi/asm/fcntl.h b/arch/parisc/include/uapi/asm/fcntl.h
index 34a46cb..9289e88 100644
--- a/arch/parisc/include/uapi/asm/fcntl.h
+++ b/arch/parisc/include/uapi/asm/fcntl.h
@@ -25,6 +25,9 @@
 #define F_GETLK64	8
 #define F_SETLK64	9
 #define F_SETLKW64	10
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 #define F_GETOWN	11	/*  for sockets. */
 #define F_SETOWN	12	/*  for sockets. */
diff --git a/arch/powerpc/include/asm/compat.h b/arch/powerpc/include/asm/compat.h
index 84fdf68..ccc1ae2 100644
--- a/arch/powerpc/include/asm/compat.h
+++ b/arch/powerpc/include/asm/compat.h
@@ -81,6 +81,9 @@ struct compat_flock {
 #define F_GETLK64	12	/*  using 'struct flock64' */
 #define F_SETLK64	13
 #define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/s390/include/asm/compat.h b/arch/s390/include/asm/compat.h
index 4bf9da0..f18c209 100644
--- a/arch/s390/include/asm/compat.h
+++ b/arch/s390/include/asm/compat.h
@@ -130,9 +130,12 @@ struct compat_flock {
 	compat_pid_t	l_pid;
 };
 
-#define F_GETLK64       12
-#define F_SETLK64       13
-#define F_SETLKW64      14    
+#define F_GETLK64	12
+#define F_SETLK64	13
+#define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h
index 830502fe..8af1898 100644
--- a/arch/sparc/include/asm/compat.h
+++ b/arch/sparc/include/asm/compat.h
@@ -114,6 +114,9 @@ struct compat_flock {
 #define F_GETLK64	12
 #define F_SETLK64	13
 #define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h
index 78f1f2d..8ca0ccb 100644
--- a/arch/tile/include/asm/compat.h
+++ b/arch/tile/include/asm/compat.h
@@ -94,6 +94,9 @@ struct compat_flock {
 #define F_GETLK64	12	/*  using 'struct flock64' */
 #define F_SETLK64	13
 #define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 struct compat_flock64 {
 	short		l_type;
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 59c6c40..08a448b 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -87,6 +87,9 @@ struct compat_flock {
 #define F_GETLK64	12	/*  using 'struct flock64' */
 #define F_SETLK64	13
 #define F_SETLKW64	14
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
 
 /*
  * IA32 uses 4 byte alignment for 64 bit quantities,
diff --git a/fs/compat.c b/fs/compat.c
index 6af20de..43a0973 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -399,17 +399,44 @@ static int put_compat_flock64(struct flock *kfl, struct compat_flock64 __user *u
 }
 #endif
 
+static unsigned int
+convert_fcntl_cmd(unsigned int cmd)
+{
+	switch (cmd) {
+	case F_GETLK64:
+		return F_GETLK;
+	case F_SETLK64:
+		return F_SETLK;
+	case F_SETLKW64:
+		return F_SETLKW;
+	case F_GETLKP64:
+		return F_GETLKP;
+	case F_SETLKP64:
+		return F_SETLKP;
+	case F_SETLKPW64:
+		return F_SETLKPW;
+	}
+
+	/* Should never happen! */
+	WARN(1, "%s: cmd=%u\n", __func__, cmd);
+	return cmd;
+}
+
 asmlinkage long compat_sys_fcntl64(unsigned int fd, unsigned int cmd,
 		unsigned long arg)
 {
 	mm_segment_t old_fs;
 	struct flock f;
 	long ret;
+	unsigned int conv_cmd;
 
 	switch (cmd) {
 	case F_GETLK:
+	case F_GETLKP:
 	case F_SETLK:
+	case F_SETLKP:
 	case F_SETLKW:
+	case F_SETLKPW:
 		ret = get_compat_flock(&f, compat_ptr(arg));
 		if (ret != 0)
 			break;
@@ -417,7 +444,7 @@ asmlinkage long compat_sys_fcntl64(unsigned int fd, unsigned int cmd,
 		set_fs(KERNEL_DS);
 		ret = sys_fcntl(fd, cmd, (unsigned long)&f);
 		set_fs(old_fs);
-		if (cmd == F_GETLK && ret == 0) {
+		if ((cmd == F_GETLK || cmd == F_GETLKP) && ret == 0) {
 			/* GETLK was successful and we need to return the data...
 			 * but it needs to fit in the compat structure.
 			 * l_start shouldn't be too big, unless the original
@@ -441,16 +468,18 @@ asmlinkage long compat_sys_fcntl64(unsigned int fd, unsigned int cmd,
 	case F_GETLK64:
 	case F_SETLK64:
 	case F_SETLKW64:
+	case F_GETLKP64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
 		ret = get_compat_flock64(&f, compat_ptr(arg));
 		if (ret != 0)
 			break;
 		old_fs = get_fs();
 		set_fs(KERNEL_DS);
-		ret = sys_fcntl(fd, (cmd == F_GETLK64) ? F_GETLK :
-				((cmd == F_SETLK64) ? F_SETLK : F_SETLKW),
-				(unsigned long)&f);
+		conv_cmd = convert_fcntl_cmd(cmd);
+		ret = sys_fcntl(fd, conv_cmd, (unsigned long)&f);
 		set_fs(old_fs);
-		if (cmd == F_GETLK64 && ret == 0) {
+		if ((conv_cmd == F_GETLK || conv_cmd == F_GETLKP) && ret == 0) {
 			/* need to return lock information - see above for commentary */
 			if (f.l_start > COMPAT_LOFF_T_MAX)
 				ret = -EOVERFLOW;
@@ -471,8 +500,15 @@ asmlinkage long compat_sys_fcntl64(unsigned int fd, unsigned int cmd,
 asmlinkage long compat_sys_fcntl(unsigned int fd, unsigned int cmd,
 		unsigned long arg)
 {
-	if ((cmd == F_GETLK64) || (cmd == F_SETLK64) || (cmd == F_SETLKW64))
+	switch (cmd) {
+	case F_GETLK64:
+	case F_SETLK64:
+	case F_SETLKW64:
+	case F_GETLKP64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
 		return -EINVAL;
+	}
 	return compat_sys_fcntl64(fd, cmd, arg);
 }
 
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 7ef7f2d..886817f 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -273,10 +273,13 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		err = setfl(fd, filp, arg);
 		break;
 	case F_GETLK:
+	case F_GETLKP:
 		err = fcntl_getlk(filp, cmd, (struct flock __user *) arg);
 		break;
 	case F_SETLK:
 	case F_SETLKW:
+	case F_SETLKP:
+	case F_SETLKPW:
 		err = fcntl_setlk(fd, filp, cmd, (struct flock __user *) arg);
 		break;
 	case F_GETOWN:
@@ -388,17 +391,20 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		goto out1;
 	
 	switch (cmd) {
-		case F_GETLK64:
-			err = fcntl_getlk64(f.file, cmd, (struct flock64 __user *) arg);
-			break;
-		case F_SETLK64:
-		case F_SETLKW64:
-			err = fcntl_setlk64(fd, f.file, cmd,
-					(struct flock64 __user *) arg);
-			break;
-		default:
-			err = do_fcntl(fd, cmd, arg, f.file);
-			break;
+	case F_GETLK64:
+	case F_GETLKP64:
+		err = fcntl_getlk64(f.file, cmd, (struct flock64 __user *) arg);
+		break;
+	case F_SETLK64:
+	case F_SETLKW64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
+		err = fcntl_setlk64(fd, f.file, cmd,
+				(struct flock64 __user *) arg);
+		break;
+	default:
+		err = do_fcntl(fd, cmd, arg, f.file);
+		break;
 	}
 out1:
 	fdput(f);
diff --git a/fs/locks.c b/fs/locks.c
index 94bcca6..23c9d16 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1923,6 +1923,12 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock __user *l)
 	if (error)
 		goto out;
 
+	if (cmd == F_GETLKP) {
+		cmd = F_GETLK;
+		file_lock.fl_owner = (fl_owner_t)filp;
+		file_lock.fl_flags |= FL_FILE_PVT;
+	}
+
 	error = vfs_test_lock(filp, &file_lock);
 	if (error)
 		goto out;
@@ -2042,10 +2048,26 @@ again:
 	error = flock_to_posix_lock(filp, file_lock, &flock);
 	if (error)
 		goto out;
-	if (cmd == F_SETLKW) {
+
+	/*
+	 * If the cmd is requesting file-private locks, then set the
+	 * FL_FILE_PVT flag and override the owner.
+	 */
+	switch (cmd) {
+	case F_SETLKP:
+		cmd = F_SETLK;
+		file_lock->fl_flags |= FL_FILE_PVT;
+		file_lock->fl_owner = (fl_owner_t)filp;
+		break;
+	case F_SETLKPW:
+		cmd = F_SETLKW;
+		file_lock->fl_flags |= FL_FILE_PVT;
+		file_lock->fl_owner = (fl_owner_t)filp;
+		/* Fallthrough */
+	case F_SETLKW:
 		file_lock->fl_flags |= FL_SLEEP;
 	}
-	
+
 	error = do_lock_file_wait(filp, cmd, file_lock);
 
 	/*
@@ -2091,6 +2113,12 @@ int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 __user *l)
 	if (error)
 		goto out;
 
+	if (cmd == F_GETLKP64) {
+		cmd = F_GETLK64;
+		file_lock.fl_owner = (fl_owner_t)filp;
+		file_lock.fl_flags |= FL_FILE_PVT;
+	}
+
 	error = vfs_test_lock(filp, &file_lock);
 	if (error)
 		goto out;
@@ -2143,7 +2171,23 @@ again:
 	error = flock64_to_posix_lock(filp, file_lock, &flock);
 	if (error)
 		goto out;
-	if (cmd == F_SETLKW64) {
+
+	/*
+	 * If the cmd is requesting file-private locks, then set the
+	 * FL_FILE_PVT flag and override the owner.
+	 */
+	switch (cmd) {
+	case F_SETLKP64:
+		cmd = F_SETLK64;
+		file_lock->fl_flags |= FL_FILE_PVT;
+		file_lock->fl_owner = (fl_owner_t)filp;
+		break;
+	case F_SETLKPW64:
+		cmd = F_SETLKW64;
+		file_lock->fl_flags |= FL_FILE_PVT;
+		file_lock->fl_owner = (fl_owner_t)filp;
+		/* Fallthrough */
+	case F_SETLKW64:
 		file_lock->fl_flags |= FL_SLEEP;
 	}
 	
@@ -2214,6 +2258,8 @@ void locks_remove_file(struct file *filp)
 	if (!inode->i_flock)
 		return;
 
+	locks_remove_posix(filp, (fl_owner_t)filp);
+
 	if (filp->f_op->flock) {
 		struct file_lock fl = {
 			.fl_pid = current->tgid,
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 36025f7..952e26b 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -132,6 +132,30 @@
 #define F_GETOWNER_UIDS	17
 #endif
 
+/*
+ * fd "private" POSIX locks.
+ *
+ * Usually POSIX locks held by a process are released on *any* close and are
+ * not inherited across a fork().
+ *
+ * These cmd values will set locks that conflict with normal POSIX locks, but
+ * are "owned" by the opened file, not the process. This means that they are
+ * inherited across fork() like BSD (flock) locks, and they are only released
+ * automatically when the last reference to the the open file against which
+ * they were acquired is put.
+ */
+#define F_GETLKP	36
+#define F_SETLKP	37
+#define F_SETLKPW	38
+
+#ifndef CONFIG_64BIT
+#ifndef F_GETLK64
+#define F_GETLKP64	39
+#define F_SETLKP64	40
+#define F_SETLKPW64	41
+#endif
+#endif
+
 #define F_OWNER_TID	0
 #define F_OWNER_PID	1
 #define F_OWNER_PGRP	2
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6625699..488d449 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3287,10 +3287,16 @@ static int selinux_file_fcntl(struct file *file, unsigned int cmd,
 	case F_GETLK:
 	case F_SETLK:
 	case F_SETLKW:
+	case F_GETLKP:
+	case F_SETLKP:
+	case F_SETLKPW:
 #if BITS_PER_LONG == 32
 	case F_GETLK64:
 	case F_SETLK64:
 	case F_SETLKW64:
+	case F_GETLKP64:
+	case F_SETLKP64:
+	case F_SETLKPW64:
 #endif
 		err = file_has_perm(cred, file, FILE__LOCK);
 		break;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-09 14:19 ` [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks Jeff Layton
@ 2014-01-09 20:25   ` Andy Lutomirski
  2014-01-10  0:49     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-09 20:25 UTC (permalink / raw)
  To: Jeff Layton, linux-fsdevel
  Cc: nfs-ganesha-devel, samba-technical, linux-kernel

On 01/09/2014 06:19 AM, Jeff Layton wrote:
> It's not really feasible to do deadlock detection with FL_FILE_PVT
> locks since they aren't owned by a single task, per-se. Deadlock
> detection also tends to be rather expensive so just skip it for
> these sorts of locks.

I just looked at the existing deadlock detector, and... eww.

When I think of deadlocks caused by r/w locks (which these are), I think
of two kinds.  First is what the current code tries to detect: two
processes that are each waiting for each other.  I don't know whether
POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
considering how awful the old POSIX locks are.

The sensible kind of detectable deadlock involves just one lock, and it
happens when two processes both hold read locks and try to upgrade to
write locks.  This should be efficiently detectable and makes upgrading
locks safe(r).

I think I'd be happier if it's at least documented that the new fcntl
calls might (some day) detect that kind of deadlock.


All that being said, this patch series is awesome.  I've lost count of
the number of explosions I've seen to due POSIX lock crap.  Thanks!

--Andy


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks
  2014-01-09 14:19 ` [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks Jeff Layton
@ 2014-01-09 20:29   ` Andy Lutomirski
  2014-01-10  0:55     ` Jeff Layton
  0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-09 20:29 UTC (permalink / raw)
  To: Jeff Layton, linux-fsdevel
  Cc: nfs-ganesha-devel, samba-technical, linux-kernel

On 01/09/2014 06:19 AM, Jeff Layton wrote:
> Due to some unfortunate history, POSIX locks have very strange and
> unhelpful semantics. The thing that usually catches people by surprise
> is that they are dropped whenever the process closes any file descriptor
> associated with the inode.
> 

[...]

> +#define F_GETLKP	36
> +#define F_SETLKP	37
> +#define F_SETLKPW	38
> +
> +#ifndef CONFIG_64BIT
> +#ifndef F_GETLK64
> +#define F_GETLKP64	39
> +#define F_SETLKP64	40
> +#define F_SETLKPW64	41
> +#endif
> +#endif
> +

Since there are no existing callers of these fcntls, can you get rid of
the non-64-bit variants?  The implementation might be a bit more of
departure from current code, but it should make everything a lot cleaner
and make it easier (aka automatic) for new architectures to support this
feature.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-09 20:25   ` Andy Lutomirski
@ 2014-01-10  0:49     ` Jeff Layton
  2014-01-10  0:58       ` Andy Lutomirski
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2014-01-10  0:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-fsdevel, nfs-ganesha-devel, samba-technical, linux-kernel

On Thu, 09 Jan 2014 12:25:25 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On 01/09/2014 06:19 AM, Jeff Layton wrote:
> > It's not really feasible to do deadlock detection with FL_FILE_PVT
> > locks since they aren't owned by a single task, per-se. Deadlock
> > detection also tends to be rather expensive so just skip it for
> > these sorts of locks.
> 
> I just looked at the existing deadlock detector, and... eww.
> 

Actually, I find it to be pretty clever, but it's only useful in some
very limited cases. Also, the performance penalty it imposes on a
lock-heavy workload is pretty significant (I had some numbers when I
did the overhaul of the spinlocking around that code, but don't have
them handy).

> When I think of deadlocks caused by r/w locks (which these are), I think
> of two kinds.  First is what the current code tries to detect: two
> processes that are each waiting for each other.  I don't know whether
> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
> considering how awful the old POSIX locks are.
> 

It can walk down a chain of dependencies (which is the cool part), so
the two processes don't even need to be working on the same inode at
all in order for it to detect a deadlock. The catch is that there is no
real way to deal with stuff like threads with this mechanism.

In any case, the spec is here:

    http://pubs.opengroup.org/onlinepubs/009696899/functions/fcntl.html

...and it just says: "A potential for deadlock occurs if a process
controlling a locked region is put to sleep by attempting to lock
another process' locked region. If the system detects that sleeping
until a locked region is unlocked would cause a deadlock, fcntl() shall
fail with an [EDEADLK] error."

Since it just says "if the system detects", I take it to mean that all
of this deadlock detection stuff is entirely optional.

> The sensible kind of detectable deadlock involves just one lock, and it
> happens when two processes both hold read locks and try to upgrade to
> write locks.  This should be efficiently detectable and makes upgrading
> locks safe(r).
> 
> I think I'd be happier if it's at least documented that the new fcntl
> calls might (some day) detect that kind of deadlock.
> 
> 

Sure, I can toss a comment in to that effect.

Personally, I'm rather keen to avoid dealing with deadlock detection
here since it's a rather large pain to deal with. Deadlock detection is
the only reason we have a global spinlock in this code anymore. I
seriously considered ripping it all out when I was overhauling this a
few months ago.

I was able to get it to perform more reasonably by turning the global
list into a hash table, but I'd have preferred to remove it altogether.

> All that being said, this patch series is awesome.  I've lost count of
> the number of explosions I've seen to due POSIX lock crap.  Thanks!
> 

Thanks for having a look!

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks
  2014-01-09 20:29   ` Andy Lutomirski
@ 2014-01-10  0:55     ` Jeff Layton
  2014-01-10  1:01       ` Andy Lutomirski
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff Layton @ 2014-01-10  0:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-fsdevel, nfs-ganesha-devel, samba-technical, linux-kernel

On Thu, 09 Jan 2014 12:29:04 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On 01/09/2014 06:19 AM, Jeff Layton wrote:
> > Due to some unfortunate history, POSIX locks have very strange and
> > unhelpful semantics. The thing that usually catches people by surprise
> > is that they are dropped whenever the process closes any file descriptor
> > associated with the inode.
> > 
> 
> [...]
> 
> > +#define F_GETLKP	36
> > +#define F_SETLKP	37
> > +#define F_SETLKPW	38
> > +
> > +#ifndef CONFIG_64BIT
> > +#ifndef F_GETLK64
> > +#define F_GETLKP64	39
> > +#define F_SETLKP64	40
> > +#define F_SETLKPW64	41
> > +#endif
> > +#endif
> > +
> 
> Since there are no existing callers of these fcntls, can you get rid of
> the non-64-bit variants?  The implementation might be a bit more of
> departure from current code, but it should make everything a lot cleaner
> and make it easier (aka automatic) for new architectures to support this
> feature.
> 

That sounds reasonable, but I'll admit I had some trouble slogging
through the morass of fcntl/fcntl64 syscall handling code. I mostly did
the cargo-cult thing on this patch to get something that worked.

So, to make sure I understand...

You're basically suggesting that we just require that 32-bit userland
always use fcntl64() to access these new cmd values?

I'll try to do that, but I'll probably need someone to carefully review
what I come up with (hint, hint).

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-10  0:49     ` Jeff Layton
@ 2014-01-10  0:58       ` Andy Lutomirski
  2014-01-14 19:27         ` J. Bruce Fields
  0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-10  0:58 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linux FS Devel, nfs-ganesha-devel, samba-technical, linux-kernel

On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Thu, 09 Jan 2014 12:25:25 -0800
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On 01/09/2014 06:19 AM, Jeff Layton wrote:
>> > It's not really feasible to do deadlock detection with FL_FILE_PVT
>> > locks since they aren't owned by a single task, per-se. Deadlock
>> > detection also tends to be rather expensive so just skip it for
>> > these sorts of locks.
>>
>> I just looked at the existing deadlock detector, and... eww.
>>
>
> Actually, I find it to be pretty clever, but it's only useful in some
> very limited cases. Also, the performance penalty it imposes on a
> lock-heavy workload is pretty significant (I had some numbers when I
> did the overhaul of the spinlocking around that code, but don't have
> them handy).
>
>> When I think of deadlocks caused by r/w locks (which these are), I think
>> of two kinds.  First is what the current code tries to detect: two
>> processes that are each waiting for each other.  I don't know whether
>> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
>> considering how awful the old POSIX locks are.
>>
>
> It can walk down a chain of dependencies (which is the cool part), so
> the two processes don't even need to be working on the same inode at
> all in order for it to detect a deadlock. The catch is that there is no
> real way to deal with stuff like threads with this mechanism.
>
> In any case, the spec is here:
>
>     http://pubs.opengroup.org/onlinepubs/009696899/functions/fcntl.html
>
> ...and it just says: "A potential for deadlock occurs if a process
> controlling a locked region is put to sleep by attempting to lock
> another process' locked region. If the system detects that sleeping
> until a locked region is unlocked would cause a deadlock, fcntl() shall
> fail with an [EDEADLK] error."
>
> Since it just says "if the system detects", I take it to mean that all
> of this deadlock detection stuff is entirely optional.
>
>> The sensible kind of detectable deadlock involves just one lock, and it
>> happens when two processes both hold read locks and try to upgrade to
>> write locks.  This should be efficiently detectable and makes upgrading
>> locks safe(r).
>>
>> I think I'd be happier if it's at least documented that the new fcntl
>> calls might (some day) detect that kind of deadlock.
>>
>>
>
> Sure, I can toss a comment in to that effect.
>
> Personally, I'm rather keen to avoid dealing with deadlock detection
> here since it's a rather large pain to deal with. Deadlock detection is
> the only reason we have a global spinlock in this code anymore. I
> seriously considered ripping it all out when I was overhauling this a
> few months ago.
>
> I was able to get it to perform more reasonably by turning the global
> list into a hash table, but I'd have preferred to remove it altogether.

For this kind of deadlock detection, nothing global is needed -- I'm
only talking about detecting deadlocks due to two tasks upgrading
locks on the same file (with overlapping ranges) at the same time.

This is actually useful for SQL-like things.  Imagine this scenario:

Program 1:

Open a file
BEGIN;
SELECT whatever;  -- acquires a read lock

Program 2:

Open the same file
BEGIN;
SELECT whatever;  -- acquires a read lock

Program 1:
UPDATE something;  -- upgrades to write

Now program 1 is waiting for program 2 to release its lock.  But if
program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
implementation (which, sadly, does not include sqlite) will fail the
abort the transaction instead.  It would be nice if the kernel
supported this.

Note that unlocking and then re-locking for write is incorrect -- it
would allow program 2 to write inconsistent data.

I think that implementing this could be as simple as having some way
to check if a struct file_lock is currently trying to upgrade from
read to write and, if you try to upgrade and end up waiting for such a
lock, aborting.  The nasty case, though, is if you try to write-lock a
range while holding a read-lock on only part of the range -- you could
end up acquiring part of the range and deadlocking on the rest.  Now
you need to remember enough state to be able to abort.

(Actually, what happens if you receive a signal which waiting on a file lock?)

I would personally be okay with removing the existing deadlock
detector entirely.  I wouldn't be surprised if no one relies on it.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks
  2014-01-10  0:55     ` Jeff Layton
@ 2014-01-10  1:01       ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-10  1:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linux FS Devel, nfs-ganesha-devel, samba-technical, linux-kernel

On Thu, Jan 9, 2014 at 4:55 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Thu, 09 Jan 2014 12:29:04 -0800
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On 01/09/2014 06:19 AM, Jeff Layton wrote:
>> > Due to some unfortunate history, POSIX locks have very strange and
>> > unhelpful semantics. The thing that usually catches people by surprise
>> > is that they are dropped whenever the process closes any file descriptor
>> > associated with the inode.
>> >
>>
>> [...]
>>
>> > +#define F_GETLKP   36
>> > +#define F_SETLKP   37
>> > +#define F_SETLKPW  38
>> > +
>> > +#ifndef CONFIG_64BIT
>> > +#ifndef F_GETLK64
>> > +#define F_GETLKP64 39
>> > +#define F_SETLKP64 40
>> > +#define F_SETLKPW64        41
>> > +#endif
>> > +#endif
>> > +
>>
>> Since there are no existing callers of these fcntls, can you get rid of
>> the non-64-bit variants?  The implementation might be a bit more of
>> departure from current code, but it should make everything a lot cleaner
>> and make it easier (aka automatic) for new architectures to support this
>> feature.
>>
>
> That sounds reasonable, but I'll admit I had some trouble slogging
> through the morass of fcntl/fcntl64 syscall handling code. I mostly did
> the cargo-cult thing on this patch to get something that worked.
>
> So, to make sure I understand...
>
> You're basically suggesting that we just require that 32-bit userland
> always use fcntl64() to access these new cmd values?

Exactly.

It's general Linux policy these days to avoid introducing new
interfaces that need compat support.  (The OABI thing is probably
unavoidable here, but at least the compat mess could be restricted to
ARM.)

You may be able to do pretty well by undoing the compat changes and
leaving everything else alone.  I suspect that everything will just
work.

>
> I'll try to do that, but I'll probably need someone to carefully review
> what I come up with (hint, hint).

I'll try.  I'm not an expert on this particular morass :)

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-10  0:58       ` Andy Lutomirski
@ 2014-01-14 19:27         ` J. Bruce Fields
  2014-01-14 20:29           ` Andy Lutomirski
  0 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 19:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, nfs-ganesha-devel, samba-technical, linux-kernel,
	Jeff Layton

On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Thu, 09 Jan 2014 12:25:25 -0800
> > Andy Lutomirski <luto@amacapital.net> wrote:
> >> When I think of deadlocks caused by r/w locks (which these are), I think
> >> of two kinds.  First is what the current code tries to detect: two
> >> processes that are each waiting for each other.  I don't know whether
> >> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
> >> considering how awful the old POSIX locks are.
...
> >> The sensible kind of detectable deadlock involves just one lock, and it
> >> happens when two processes both hold read locks and try to upgrade to
> >> write locks.  This should be efficiently detectable and makes upgrading
> >> locks safe(r).

This also involves two processes waiting on each other, and the current
code should detect either case equally well.

...
> For this kind of deadlock detection, nothing global is needed -- I'm
> only talking about detecting deadlocks due to two tasks upgrading
> locks on the same file (with overlapping ranges) at the same time.
> 
> This is actually useful for SQL-like things.  Imagine this scenario:
> 
> Program 1:
> 
> Open a file
> BEGIN;
> SELECT whatever;  -- acquires a read lock
> 
> Program 2:
> 
> Open the same file
> BEGIN;
> SELECT whatever;  -- acquires a read lock
> 
> Program 1:
> UPDATE something;  -- upgrades to write
> 
> Now program 1 is waiting for program 2 to release its lock.  But if
> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> implementation (which, sadly, does not include sqlite) will fail the
> abort the transaction instead.

And then I suppose you'd need to get an exclusive lock when you retry,
to guarantee forward progress in the face of multiple processes retrying
at once.

I don't know, is this so useful?

> It would be nice if the kernel
> supported this.
> 
> Note that unlocking and then re-locking for write is incorrect -- it
> would allow program 2 to write inconsistent data.
> 
> I think that implementing this could be as simple as having some way
> to check if a struct file_lock is currently trying to upgrade from
> read to write and, if you try to upgrade and end up waiting for such a
> lock, aborting.

You have to be clear what you mean by "such a lock".  What you really
want to know is whether you'd be waiting on a lock that might be waiting
on a lock you hold.

To a first approximation, the current works with a graph with tasks as
nodes and an arrow from node X to node Y if X is waiting on a lock held
by node Y.  And it follows arrows in that graph looking for cycles.

And sure I guess it would be a bit nicer if you only bothered checking
for cycles that touch this one file.

But I'd really rather avoid the complication of deadlock detection
unless somebody can make a really strong case that they need it.

> The nasty case, though, is if you try to write-lock a
> range while holding a read-lock on only part of the range -- you could
> end up acquiring part of the range and deadlocking on the rest.  Now
> you need to remember enough state to be able to abort.

We wait until the entire lock can be applied, and then apply it all
atomically (under i_lock).

> (Actually, what happens if you receive a signal which waiting on a file lock?)

Return -EINTR.

> I would personally be okay with removing the existing deadlock
> detector entirely.  I wouldn't be surprised if no one relies on it.

I'd be in favor.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 19:27         ` J. Bruce Fields
@ 2014-01-14 20:29           ` Andy Lutomirski
  2014-01-14 21:10             ` J. Bruce Fields
  2014-01-14 21:21             ` Richard Hipp
  0 siblings, 2 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 20:29 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Linux FS Devel, nfs-ganesha-devel, samba-technical,
	linux-kernel, drh

[cc: drh, who I suspect is responsible for the most widespread
userspace software that uses this stuff]

On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
>> > On Thu, 09 Jan 2014 12:25:25 -0800
>> > Andy Lutomirski <luto@amacapital.net> wrote:
>> >> When I think of deadlocks caused by r/w locks (which these are), I think
>> >> of two kinds.  First is what the current code tries to detect: two
>> >> processes that are each waiting for each other.  I don't know whether
>> >> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
>> >> considering how awful the old POSIX locks are.
> ...
>> >> The sensible kind of detectable deadlock involves just one lock, and it
>> >> happens when two processes both hold read locks and try to upgrade to
>> >> write locks.  This should be efficiently detectable and makes upgrading
>> >> locks safe(r).
>
> This also involves two processes waiting on each other, and the current
> code should detect either case equally well.
>
> ...
>> For this kind of deadlock detection, nothing global is needed -- I'm
>> only talking about detecting deadlocks due to two tasks upgrading
>> locks on the same file (with overlapping ranges) at the same time.
>>
>> This is actually useful for SQL-like things.  Imagine this scenario:
>>
>> Program 1:
>>
>> Open a file
>> BEGIN;
>> SELECT whatever;  -- acquires a read lock
>>
>> Program 2:
>>
>> Open the same file
>> BEGIN;
>> SELECT whatever;  -- acquires a read lock
>>
>> Program 1:
>> UPDATE something;  -- upgrades to write
>>
>> Now program 1 is waiting for program 2 to release its lock.  But if
>> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
>> implementation (which, sadly, does not include sqlite) will fail the
>> abort the transaction instead.
>
> And then I suppose you'd need to get an exclusive lock when you retry,
> to guarantee forward progress in the face of multiple processes retrying
> at once.

I don't think so -- as long as deadlock detection is 100% reliable and
if you have writer priority, then all that readers need to do is to
drop and re-acquire the read lock.  (This property is critical to
avoid livelocks in SQL.  I rely on it here: a deadlocked UPDATE just
retries the entire transaction without any special exclusive locks.)

>
> I don't know, is this so useful?
>
>> It would be nice if the kernel
>> supported this.
>>
>> Note that unlocking and then re-locking for write is incorrect -- it
>> would allow program 2 to write inconsistent data.
>>
>> I think that implementing this could be as simple as having some way
>> to check if a struct file_lock is currently trying to upgrade from
>> read to write and, if you try to upgrade and end up waiting for such a
>> lock, aborting.
>
> You have to be clear what you mean by "such a lock".  What you really
> want to know is whether you'd be waiting on a lock that might be waiting
> on a lock you hold.

By "such a lock" I mean a read lock on the same file that's trying to
upgrade to write.  I think that's the main (only?) interesting case.
Checking for this has the nice property that you don't need to iterate
and you don't care whom the holder of that lock is waiting for -- if
it's upgrading and you overlap with it, you are certainly in the
deadlock case.

>
> To a first approximation, the current works with a graph with tasks as
> nodes and an arrow from node X to node Y if X is waiting on a lock held
> by node Y.  And it follows arrows in that graph looking for cycles.
>
> And sure I guess it would be a bit nicer if you only bothered checking
> for cycles that touch this one file.
>
> But I'd really rather avoid the complication of deadlock detection
> unless somebody can make a really strong case that they need it.

TBH, I suspect that the person you really want to ask is drh, who
writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
of fcntl locks.

>
>> The nasty case, though, is if you try to write-lock a
>> range while holding a read-lock on only part of the range -- you could
>> end up acquiring part of the range and deadlocking on the rest.  Now
>> you need to remember enough state to be able to abort.
>
> We wait until the entire lock can be applied, and then apply it all
> atomically (under i_lock).
>
>> (Actually, what happens if you receive a signal which waiting on a file lock?)
>
> Return -EINTR.
>
>> I would personally be okay with removing the existing deadlock
>> detector entirely.  I wouldn't be surprised if no one relies on it.
>
> I'd be in favor.
>
> --b.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 20:29           ` Andy Lutomirski
@ 2014-01-14 21:10             ` J. Bruce Fields
  2014-01-14 21:17               ` Andy Lutomirski
                                 ` (2 more replies)
  2014-01-14 21:21             ` Richard Hipp
  1 sibling, 3 replies; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 21:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jeff Layton, drh, samba-technical, linux-kernel, Linux FS Devel,
	nfs-ganesha-devel

On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
> [cc: drh, who I suspect is responsible for the most widespread
> userspace software that uses this stuff]
> 
> On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
> >> > On Thu, 09 Jan 2014 12:25:25 -0800
> >> > Andy Lutomirski <luto@amacapital.net> wrote:
> >> >> When I think of deadlocks caused by r/w locks (which these are), I think
> >> >> of two kinds.  First is what the current code tries to detect: two
> >> >> processes that are each waiting for each other.  I don't know whether
> >> >> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
> >> >> considering how awful the old POSIX locks are.
> > ...
> >> >> The sensible kind of detectable deadlock involves just one lock, and it
> >> >> happens when two processes both hold read locks and try to upgrade to
> >> >> write locks.  This should be efficiently detectable and makes upgrading
> >> >> locks safe(r).
> >
> > This also involves two processes waiting on each other, and the current
> > code should detect either case equally well.
> >
> > ...
> >> For this kind of deadlock detection, nothing global is needed -- I'm
> >> only talking about detecting deadlocks due to two tasks upgrading
> >> locks on the same file (with overlapping ranges) at the same time.
> >>
> >> This is actually useful for SQL-like things.  Imagine this scenario:
> >>
> >> Program 1:
> >>
> >> Open a file
> >> BEGIN;
> >> SELECT whatever;  -- acquires a read lock
> >>
> >> Program 2:
> >>
> >> Open the same file
> >> BEGIN;
> >> SELECT whatever;  -- acquires a read lock
> >>
> >> Program 1:
> >> UPDATE something;  -- upgrades to write
> >>
> >> Now program 1 is waiting for program 2 to release its lock.  But if
> >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> >> implementation (which, sadly, does not include sqlite) will fail the
> >> abort the transaction instead.
> >
> > And then I suppose you'd need to get an exclusive lock when you retry,
> > to guarantee forward progress in the face of multiple processes retrying
> > at once.
> 
> I don't think so -- as long as deadlock detection is 100% reliable and
> if you have writer priority,

We don't have writer priority.  Depending on how it worked I'm not
convinced it would help.  E.g. consider the above but with 3 processes:

	processes 1, 2, and 3 each get a whole-file read lock.

	process 1 requests a write lock, blocks because it conflicts
	with read locks held by 2 and 3.

	process 2 requests a write lock, gets -EDEADLK, unlocks and
	requests a new read lock.  That request succeeds because there
	is no conflicting lock.  (Note the lock manager had no
	opportunity to upgrade 1's lock here thanks to the conflict with
	3's lock.)

	process 3 requests a write lock, gets -EDEADLOCK, unlocks and
	requests a new read lock.

	Etc.

> then all that readers need to do is to
> drop and re-acquire the read lock.  (This property is critical to
> avoid livelocks in SQL.  I rely on it here: a deadlocked UPDATE just
> retries the entire transaction without any special exclusive locks.)
> 
> >
> > I don't know, is this so useful?
> >
> >> It would be nice if the kernel
> >> supported this.
> >>
> >> Note that unlocking and then re-locking for write is incorrect -- it
> >> would allow program 2 to write inconsistent data.
> >>
> >> I think that implementing this could be as simple as having some way
> >> to check if a struct file_lock is currently trying to upgrade from
> >> read to write and, if you try to upgrade and end up waiting for such a
> >> lock, aborting.
> >
> > You have to be clear what you mean by "such a lock".  What you really
> > want to know is whether you'd be waiting on a lock that might be waiting
> > on a lock you hold.
> 
> By "such a lock" I mean a read lock on the same file that's trying to
> upgrade to write.  I think that's the main (only?) interesting case.
> Checking for this has the nice property that you don't need to iterate
> and you don't care whom the holder of that lock is waiting for -- if
> it's upgrading and you overlap with it, you are certainly in the
> deadlock case.

OK, I think.

> > To a first approximation, the current works with a graph with tasks as
> > nodes and an arrow from node X to node Y if X is waiting on a lock held
> > by node Y.  And it follows arrows in that graph looking for cycles.
> >
> > And sure I guess it would be a bit nicer if you only bothered checking
> > for cycles that touch this one file.
> >
> > But I'd really rather avoid the complication of deadlock detection
> > unless somebody can make a really strong case that they need it.
> 
> TBH, I suspect that the person you really want to ask is drh, who
> writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
> of fcntl locks.

A quick check of the sqlite source shows some complaints about posix
locks in src/os.c.

Looks like all he's asking for his for the lock owner to be the file
descriptor not the pid, and for locks not to be thrown away on first
close.  Those are the two things jlayton addresses.

So yes I think it would be interesting to know whether some of the extra
layer of internal sqlite locking could have been chucked if it could
have been based on jlayton's locks.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:10             ` J. Bruce Fields
@ 2014-01-14 21:17               ` Andy Lutomirski
  2014-01-14 21:25                 ` J. Bruce Fields
  2014-01-14 21:18               ` Jeff Layton
  2014-01-14 21:19               ` Frank Filz
  2 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 21:17 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Linux FS Devel, nfs-ganesha-devel, samba-technical,
	linux-kernel, drh

On Tue, Jan 14, 2014 at 1:10 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
>> [cc: drh, who I suspect is responsible for the most widespread
>> userspace software that uses this stuff]
>>
>> On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
>> >> > On Thu, 09 Jan 2014 12:25:25 -0800
>> >> > Andy Lutomirski <luto@amacapital.net> wrote:
>> >> >> When I think of deadlocks caused by r/w locks (which these are), I think
>> >> >> of two kinds.  First is what the current code tries to detect: two
>> >> >> processes that are each waiting for each other.  I don't know whether
>> >> >> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
>> >> >> considering how awful the old POSIX locks are.
>> > ...
>> >> >> The sensible kind of detectable deadlock involves just one lock, and it
>> >> >> happens when two processes both hold read locks and try to upgrade to
>> >> >> write locks.  This should be efficiently detectable and makes upgrading
>> >> >> locks safe(r).
>> >
>> > This also involves two processes waiting on each other, and the current
>> > code should detect either case equally well.
>> >
>> > ...
>> >> For this kind of deadlock detection, nothing global is needed -- I'm
>> >> only talking about detecting deadlocks due to two tasks upgrading
>> >> locks on the same file (with overlapping ranges) at the same time.
>> >>
>> >> This is actually useful for SQL-like things.  Imagine this scenario:
>> >>
>> >> Program 1:
>> >>
>> >> Open a file
>> >> BEGIN;
>> >> SELECT whatever;  -- acquires a read lock
>> >>
>> >> Program 2:
>> >>
>> >> Open the same file
>> >> BEGIN;
>> >> SELECT whatever;  -- acquires a read lock
>> >>
>> >> Program 1:
>> >> UPDATE something;  -- upgrades to write
>> >>
>> >> Now program 1 is waiting for program 2 to release its lock.  But if
>> >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
>> >> implementation (which, sadly, does not include sqlite) will fail the
>> >> abort the transaction instead.
>> >
>> > And then I suppose you'd need to get an exclusive lock when you retry,
>> > to guarantee forward progress in the face of multiple processes retrying
>> > at once.
>>
>> I don't think so -- as long as deadlock detection is 100% reliable and
>> if you have writer priority,
>
> We don't have writer priority.  Depending on how it worked I'm not
> convinced it would help.  E.g. consider the above but with 3 processes:
>
>         processes 1, 2, and 3 each get a whole-file read lock.
>
>         process 1 requests a write lock, blocks because it conflicts
>         with read locks held by 2 and 3.
>
>         process 2 requests a write lock, gets -EDEADLK, unlocks and
>         requests a new read lock.  That request succeeds because there
>         is no conflicting lock.  (Note the lock manager had no
>         opportunity to upgrade 1's lock here thanks to the conflict with
>         3's lock.)

Writer priority here would detect that someone's waiting for write
access and would cause new readers to block.

>
>         process 3 requests a write lock, gets -EDEADLOCK, unlocks and
>         requests a new read lock.
>
>         Etc.
>
>> then all that readers need to do is to
>> drop and re-acquire the read lock.  (This property is critical to
>> avoid livelocks in SQL.  I rely on it here: a deadlocked UPDATE just
>> retries the entire transaction without any special exclusive locks.)
>>
>> >
>> > I don't know, is this so useful?
>> >
>> >> It would be nice if the kernel
>> >> supported this.
>> >>
>> >> Note that unlocking and then re-locking for write is incorrect -- it
>> >> would allow program 2 to write inconsistent data.
>> >>
>> >> I think that implementing this could be as simple as having some way
>> >> to check if a struct file_lock is currently trying to upgrade from
>> >> read to write and, if you try to upgrade and end up waiting for such a
>> >> lock, aborting.
>> >
>> > You have to be clear what you mean by "such a lock".  What you really
>> > want to know is whether you'd be waiting on a lock that might be waiting
>> > on a lock you hold.
>>
>> By "such a lock" I mean a read lock on the same file that's trying to
>> upgrade to write.  I think that's the main (only?) interesting case.
>> Checking for this has the nice property that you don't need to iterate
>> and you don't care whom the holder of that lock is waiting for -- if
>> it's upgrading and you overlap with it, you are certainly in the
>> deadlock case.
>
> OK, I think.
>
>> > To a first approximation, the current works with a graph with tasks as
>> > nodes and an arrow from node X to node Y if X is waiting on a lock held
>> > by node Y.  And it follows arrows in that graph looking for cycles.
>> >
>> > And sure I guess it would be a bit nicer if you only bothered checking
>> > for cycles that touch this one file.
>> >
>> > But I'd really rather avoid the complication of deadlock detection
>> > unless somebody can make a really strong case that they need it.
>>
>> TBH, I suspect that the person you really want to ask is drh, who
>> writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
>> of fcntl locks.
>
> A quick check of the sqlite source shows some complaints about posix
> locks in src/os.c.
>
> Looks like all he's asking for his for the lock owner to be the file
> descriptor not the pid, and for locks not to be thrown away on first
> close.  Those are the two things jlayton addresses.
>
> So yes I think it would be interesting to know whether some of the extra
> layer of internal sqlite locking could have been chucked if it could
> have been based on jlayton's locks.

FWIW, at least last time I checked, sqlite didn't implement deadlock
detection (it uses timeouts instead).  That was one of my least
favorite things about sqlite.

With this feature in fcntl, I think that sqlite could add deadlock
detection and a true blocking mode without changing the file/locking
format, at least if it still works the way I remember it working.

Anyway, I still don't think that this feature should be a prerequisite
for the new lock types.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:10             ` J. Bruce Fields
  2014-01-14 21:17               ` Andy Lutomirski
@ 2014-01-14 21:18               ` Jeff Layton
  2014-01-14 21:19               ` Frank Filz
  2 siblings, 0 replies; 39+ messages in thread
From: Jeff Layton @ 2014-01-14 21:18 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Florian Weimer, drh, samba-technical, linux-kernel,
	Andy Lutomirski, Linux FS Devel, nfs-ganesha-devel

On Tue, 14 Jan 2014 16:10:09 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
> > [cc: drh, who I suspect is responsible for the most widespread
> > userspace software that uses this stuff]
> > 
> > On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> > >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > >> > On Thu, 09 Jan 2014 12:25:25 -0800
> > >> > Andy Lutomirski <luto@amacapital.net> wrote:
> > >> >> When I think of deadlocks caused by r/w locks (which these are), I think
> > >> >> of two kinds.  First is what the current code tries to detect: two
> > >> >> processes that are each waiting for each other.  I don't know whether
> > >> >> POSIX enshrines the idea of detecting that, but I wouldn't be surprised,
> > >> >> considering how awful the old POSIX locks are.
> > > ...
> > >> >> The sensible kind of detectable deadlock involves just one lock, and it
> > >> >> happens when two processes both hold read locks and try to upgrade to
> > >> >> write locks.  This should be efficiently detectable and makes upgrading
> > >> >> locks safe(r).
> > >
> > > This also involves two processes waiting on each other, and the current
> > > code should detect either case equally well.
> > >
> > > ...
> > >> For this kind of deadlock detection, nothing global is needed -- I'm
> > >> only talking about detecting deadlocks due to two tasks upgrading
> > >> locks on the same file (with overlapping ranges) at the same time.
> > >>
> > >> This is actually useful for SQL-like things.  Imagine this scenario:
> > >>
> > >> Program 1:
> > >>
> > >> Open a file
> > >> BEGIN;
> > >> SELECT whatever;  -- acquires a read lock
> > >>
> > >> Program 2:
> > >>
> > >> Open the same file
> > >> BEGIN;
> > >> SELECT whatever;  -- acquires a read lock
> > >>
> > >> Program 1:
> > >> UPDATE something;  -- upgrades to write
> > >>
> > >> Now program 1 is waiting for program 2 to release its lock.  But if
> > >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> > >> implementation (which, sadly, does not include sqlite) will fail the
> > >> abort the transaction instead.
> > >
> > > And then I suppose you'd need to get an exclusive lock when you retry,
> > > to guarantee forward progress in the face of multiple processes retrying
> > > at once.
> > 
> > I don't think so -- as long as deadlock detection is 100% reliable and
> > if you have writer priority,
> 
> We don't have writer priority.  Depending on how it worked I'm not
> convinced it would help.  E.g. consider the above but with 3 processes:
> 
> 	processes 1, 2, and 3 each get a whole-file read lock.
> 
> 	process 1 requests a write lock, blocks because it conflicts
> 	with read locks held by 2 and 3.
> 
> 	process 2 requests a write lock, gets -EDEADLK, unlocks and
> 	requests a new read lock.  That request succeeds because there
> 	is no conflicting lock.  (Note the lock manager had no
> 	opportunity to upgrade 1's lock here thanks to the conflict with
> 	3's lock.)
> 
> 	process 3 requests a write lock, gets -EDEADLOCK, unlocks and
> 	requests a new read lock.
> 
> 	Etc.
> 
> > then all that readers need to do is to
> > drop and re-acquire the read lock.  (This property is critical to
> > avoid livelocks in SQL.  I rely on it here: a deadlocked UPDATE just
> > retries the entire transaction without any special exclusive locks.)
> > 
> > >
> > > I don't know, is this so useful?
> > >
> > >> It would be nice if the kernel
> > >> supported this.
> > >>
> > >> Note that unlocking and then re-locking for write is incorrect -- it
> > >> would allow program 2 to write inconsistent data.
> > >>
> > >> I think that implementing this could be as simple as having some way
> > >> to check if a struct file_lock is currently trying to upgrade from
> > >> read to write and, if you try to upgrade and end up waiting for such a
> > >> lock, aborting.
> > >
> > > You have to be clear what you mean by "such a lock".  What you really
> > > want to know is whether you'd be waiting on a lock that might be waiting
> > > on a lock you hold.
> > 
> > By "such a lock" I mean a read lock on the same file that's trying to
> > upgrade to write.  I think that's the main (only?) interesting case.
> > Checking for this has the nice property that you don't need to iterate
> > and you don't care whom the holder of that lock is waiting for -- if
> > it's upgrading and you overlap with it, you are certainly in the
> > deadlock case.
> 
> OK, I think.
> 
> > > To a first approximation, the current works with a graph with tasks as
> > > nodes and an arrow from node X to node Y if X is waiting on a lock held
> > > by node Y.  And it follows arrows in that graph looking for cycles.
> > >
> > > And sure I guess it would be a bit nicer if you only bothered checking
> > > for cycles that touch this one file.
> > >
> > > But I'd really rather avoid the complication of deadlock detection
> > > unless somebody can make a really strong case that they need it.
> > 
> > TBH, I suspect that the person you really want to ask is drh, who
> > writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
> > of fcntl locks.
> 
> A quick check of the sqlite source shows some complaints about posix
> locks in src/os.c.
> 
> Looks like all he's asking for his for the lock owner to be the file
> descriptor not the pid, and for locks not to be thrown away on first
> close.  Those are the two things jlayton addresses.
> 
> So yes I think it would be interesting to know whether some of the extra
> layer of internal sqlite locking could have been chucked if it could
> have been based on jlayton's locks.
> 

FWIW, Florian (cc'ed here) emailed me recently asking about the
status of these patches. He specifically mentioned hacking up the sqlite
code to use these locks instead of the existing posix lock layer.

At the time, the patches were still pretty rough but I think they're
getting closer to their final form now.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:10             ` J. Bruce Fields
  2014-01-14 21:17               ` Andy Lutomirski
  2014-01-14 21:18               ` Jeff Layton
@ 2014-01-14 21:19               ` Frank Filz
  2014-01-14 21:24                 ` Andy Lutomirski
  2 siblings, 1 reply; 39+ messages in thread
From: Frank Filz @ 2014-01-14 21:19 UTC (permalink / raw)
  To: 'J. Bruce Fields', 'Andy Lutomirski'
  Cc: 'Jeff Layton', 'Linux FS Devel',
	nfs-ganesha-devel, samba-technical, linux-kernel, drh

> On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
> > [cc: drh, who I suspect is responsible for the most widespread
> > userspace software that uses this stuff]
> >
> > On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org>
> wrote:
> > > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> > >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com>
> wrote:
> > >> > On Thu, 09 Jan 2014 12:25:25 -0800 Andy Lutomirski
> > >> > <luto@amacapital.net> wrote:
> > >> >> When I think of deadlocks caused by r/w locks (which these are),
> > >> >> I think of two kinds.  First is what the current code tries to
> > >> >> detect: two processes that are each waiting for each other.  I
> > >> >> don't know whether POSIX enshrines the idea of detecting that,
> > >> >> but I wouldn't be surprised, considering how awful the old POSIX
> locks are.
> > > ...
> > >> >> The sensible kind of detectable deadlock involves just one lock,
> > >> >> and it happens when two processes both hold read locks and try
> > >> >> to upgrade to write locks.  This should be efficiently
> > >> >> detectable and makes upgrading locks safe(r).
> > >
> > > This also involves two processes waiting on each other, and the
> > > current code should detect either case equally well.
> > >
> > > ...
> > >> For this kind of deadlock detection, nothing global is needed --
> > >> I'm only talking about detecting deadlocks due to two tasks
> > >> upgrading locks on the same file (with overlapping ranges) at the
same
> time.
> > >>
> > >> This is actually useful for SQL-like things.  Imagine this scenario:
> > >>
> > >> Program 1:
> > >>
> > >> Open a file
> > >> BEGIN;
> > >> SELECT whatever;  -- acquires a read lock
> > >>
> > >> Program 2:
> > >>
> > >> Open the same file
> > >> BEGIN;
> > >> SELECT whatever;  -- acquires a read lock
> > >>
> > >> Program 1:
> > >> UPDATE something;  -- upgrades to write
> > >>
> > >> Now program 1 is waiting for program 2 to release its lock.  But if
> > >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> > >> implementation (which, sadly, does not include sqlite) will fail
> > >> the abort the transaction instead.
> > >
> > > And then I suppose you'd need to get an exclusive lock when you
> > > retry, to guarantee forward progress in the face of multiple
> > > processes retrying at once.
> >
> > I don't think so -- as long as deadlock detection is 100% reliable and
> > if you have writer priority,
> 
> We don't have writer priority.  Depending on how it worked I'm not
> convinced it would help.  E.g. consider the above but with 3 processes:
> 
> 	processes 1, 2, and 3 each get a whole-file read lock.
> 
> 	process 1 requests a write lock, blocks because it conflicts
> 	with read locks held by 2 and 3.
> 
> 	process 2 requests a write lock, gets -EDEADLK, unlocks and
> 	requests a new read lock.  That request succeeds because there
> 	is no conflicting lock.  (Note the lock manager had no
> 	opportunity to upgrade 1's lock here thanks to the conflict with
> 	3's lock.)

As I understand write lock priority, process 2 requesting a new read lock
would block, once there is a write lock waiter, no further read locks would
be granted that would conflict with that waiting write lock.

> 	process 3 requests a write lock, gets -EDEADLOCK, unlocks and
> 	requests a new read lock.

And then at this point, all read locks have been dropped, and there are
pending read locks blocking. The blocked write lock can now be granted. When
that lock is done, the read locks are again granted, and then we go through
the cycle again, this time with just two processes. And then one of those
processes wins the write lock race and blocks while the other gets EDEADLK
and drops it's read lock at which point of course the blocked write lock is
immediately granted. And then finally process 3 is granted it's read lock
and is able to upgrade to a write lock.

Frank

> 	Etc.
> 
> > then all that readers need to do is to drop and re-acquire the read
> > lock.  (This property is critical to avoid livelocks in SQL.  I rely
> > on it here: a deadlocked UPDATE just retries the entire transaction
> > without any special exclusive locks.)
> >
> > >
> > > I don't know, is this so useful?
> > >
> > >> It would be nice if the kernel
> > >> supported this.
> > >>
> > >> Note that unlocking and then re-locking for write is incorrect --
> > >> it would allow program 2 to write inconsistent data.
> > >>
> > >> I think that implementing this could be as simple as having some
> > >> way to check if a struct file_lock is currently trying to upgrade
> > >> from read to write and, if you try to upgrade and end up waiting
> > >> for such a lock, aborting.
> > >
> > > You have to be clear what you mean by "such a lock".  What you
> > > really want to know is whether you'd be waiting on a lock that might
> > > be waiting on a lock you hold.
> >
> > By "such a lock" I mean a read lock on the same file that's trying to
> > upgrade to write.  I think that's the main (only?) interesting case.
> > Checking for this has the nice property that you don't need to iterate
> > and you don't care whom the holder of that lock is waiting for -- if
> > it's upgrading and you overlap with it, you are certainly in the
> > deadlock case.
> 
> OK, I think.
> 
> > > To a first approximation, the current works with a graph with tasks
> > > as nodes and an arrow from node X to node Y if X is waiting on a
> > > lock held by node Y.  And it follows arrows in that graph looking for
cycles.
> > >
> > > And sure I guess it would be a bit nicer if you only bothered
> > > checking for cycles that touch this one file.
> > >
> > > But I'd really rather avoid the complication of deadlock detection
> > > unless somebody can make a really strong case that they need it.
> >
> > TBH, I suspect that the person you really want to ask is drh, who
> > writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
> > of fcntl locks.
> 
> A quick check of the sqlite source shows some complaints about posix locks
> in src/os.c.
> 
> Looks like all he's asking for his for the lock owner to be the file
descriptor not
> the pid, and for locks not to be thrown away on first close.  Those are
the two
> things jlayton addresses.
> 
> So yes I think it would be interesting to know whether some of the extra
> layer of internal sqlite locking could have been chucked if it could have
been
> based on jlayton's locks.
> 
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the
> body of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 20:29           ` Andy Lutomirski
  2014-01-14 21:10             ` J. Bruce Fields
@ 2014-01-14 21:21             ` Richard Hipp
  2014-01-14 21:24               ` Andy Lutomirski
  2014-01-14 21:30               ` J. Bruce Fields
  1 sibling, 2 replies; 39+ messages in thread
From: Richard Hipp @ 2014-01-14 21:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Hipp, Jeff Layton, samba-technical, linux-kernel,
	Linux FS Devel, nfs-ganesha-devel

I have no context here.  I'm not sure what you are discussing or what
questions you have or what SQLite has to do with any of it.  Nevertheless,
I have injected a few remarks inline....


On Tue, Jan 14, 2014 at 3:29 PM, Andy Lutomirski <luto@amacapital.net>wrote:

> [cc: drh, who I suspect is responsible for the most widespread
> userspace software that uses this stuff]
>
> On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org>
> wrote:
> > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com> wrote:
> >> > On Thu, 09 Jan 2014 12:25:25 -0800
> >> > Andy Lutomirski <luto@amacapital.net> wrote:
> >> >> When I think of deadlocks caused by r/w locks (which these are), I
> think
> >> >> of two kinds.  First is what the current code tries to detect: two
> >> >> processes that are each waiting for each other.  I don't know whether
> >> >> POSIX enshrines the idea of detecting that, but I wouldn't be
> surprised,
> >> >> considering how awful the old POSIX locks are.
> > ...
> >> >> The sensible kind of detectable deadlock involves just one lock, and
> it
> >> >> happens when two processes both hold read locks and try to upgrade to
> >> >> write locks.  This should be efficiently detectable and makes
> upgrading
> >> >> locks safe(r).
> >
> > This also involves two processes waiting on each other, and the current
> > code should detect either case equally well.
> >
> > ...
> >> For this kind of deadlock detection, nothing global is needed -- I'm
> >> only talking about detecting deadlocks due to two tasks upgrading
> >> locks on the same file (with overlapping ranges) at the same time.
> >>
> >> This is actually useful for SQL-like things.  Imagine this scenario:
> >>
> >> Program 1:
> >>
> >> Open a file
> >> BEGIN;
> >> SELECT whatever;  -- acquires a read lock
> >>
> >> Program 2:
> >>
> >> Open the same file
> >> BEGIN;
> >> SELECT whatever;  -- acquires a read lock
> >>
> >> Program 1:
> >> UPDATE something;  -- upgrades to write
>

Technicality:  The lock upgrade is deferred until content must be written
to disk, which in the usual case does not happen until COMMIT.


> >>
> >> Now program 1 is waiting for program 2 to release its lock.  But if
> >> program 2 tries to UPDATE, then it deadlocks.  A friendly ... SQL
> >> implementation (which, sadly, does not include sqlite) will ...
> >> abort the transaction instead.
>

SQLite fails the particular statement that needed the write lock (usually a
COMMIT) which gives the application an opportunity to either retry the
statement after a delay or to abort the transaction, as it sees fit.  The
usually practice is to pause for a few milliseconds and retry, aborting the
transaction only after many repeated failures.  There are convenience
functions (and a PRAGMA) in SQLite that make it easy for applications to
specify whatever behavior they want in this regard.



> >
> > And then I suppose you'd need to get an exclusive lock when you retry,
> > to guarantee forward progress in the face of multiple processes retrying
> > at once.
>
> I don't think so -- as long as deadlock detection is 100% reliable and
> if you have writer priority, then all that readers need to do is to
> drop and re-acquire the read lock.  (This property is critical to
> avoid livelocks in SQL.  I rely on it here: a deadlocked UPDATE just
> retries the entire transaction without any special exclusive locks.)
>

SQLite uses only F_SETLK, never F_SETLKW.  Doesn't that mean that SQLite
will work the same with or without deadlock detection?  Doesn't deadlock
detection only come into play with F_SETLKW?  Or am I missing something?


>
> >
> > I don't know, is this so useful?
> >
> >> It would be nice if the kernel
> >> supported this.
> >>
> >> Note that unlocking and then re-locking for write is incorrect -- it
> >> would allow program 2 to write inconsistent data.
> >>
> >> I think that implementing this could be as simple as having some way
> >> to check if a struct file_lock is currently trying to upgrade from
> >> read to write and, if you try to upgrade and end up waiting for such a
> >> lock, aborting.
> >
> > You have to be clear what you mean by "such a lock".  What you really
> > want to know is whether you'd be waiting on a lock that might be waiting
> > on a lock you hold.
>
> By "such a lock" I mean a read lock on the same file that's trying to
> upgrade to write.  I think that's the main (only?) interesting case.
> Checking for this has the nice property that you don't need to iterate
> and you don't care whom the holder of that lock is waiting for -- if
> it's upgrading and you overlap with it, you are certainly in the
> deadlock case.
>
> >
> > To a first approximation, the current works with a graph with tasks as
> > nodes and an arrow from node X to node Y if X is waiting on a lock held
> > by node Y.  And it follows arrows in that graph looking for cycles.
> >
> > And sure I guess it would be a bit nicer if you only bothered checking
> > for cycles that touch this one file.
> >
> > But I'd really rather avoid the complication of deadlock detection
> > unless somebody can make a really strong case that they need it.
>
> TBH, I suspect that the person you really want to ask is drh, who
> writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
> of fcntl locks.
>

See above.  SQLite uses only F_SETLK, never F_SETLKW.



>
> >
> >> The nasty case, though, is if you try to write-lock a
> >> range while holding a read-lock on only part of the range -- you could
> >> end up acquiring part of the range and deadlocking on the rest.  Now
> >> you need to remember enough state to be able to abort.
> >
> > We wait until the entire lock can be applied, and then apply it all
> > atomically (under i_lock).
> >
> >> (Actually, what happens if you receive a signal which waiting on a file
> lock?)
> >
> > Return -EINTR.
>

Huh.  SQLite is not checking for EINTR if fcntl(F_SETLK,...) fails.  Should
it be?  Or does EINTR only come up for F_SETLKW?

We do check for EINTR and retry for other system calls (read(), write(),
fallocate(), ftruncate(), close(), chmod(), open(), maybe others too).


> >
> >> I would personally be okay with removing the existing deadlock
> >> detector entirely.  I wouldn't be surprised if no one relies on it.
> >
> > I'd be in favor.
> >
> > --b.
>
> --Andy
>



-- 
D. Richard Hipp
drh@sqlite.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:19               ` Frank Filz
@ 2014-01-14 21:24                 ` Andy Lutomirski
  2014-01-14 21:26                   ` Andy Lutomirski
  2014-01-14 21:26                   ` J. Bruce Fields
  0 siblings, 2 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 21:24 UTC (permalink / raw)
  To: Frank Filz
  Cc: J. Bruce Fields, Jeff Layton, Linux FS Devel, nfs-ganesha-devel,
	samba-technical, linux-kernel, Richard Hipp

On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com> wrote:
>> On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
>> > [cc: drh, who I suspect is responsible for the most widespread
>> > userspace software that uses this stuff]
>> >
>> > On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org>
>> wrote:
>> > > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
>> > >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com>
>> wrote:
>> > >> > On Thu, 09 Jan 2014 12:25:25 -0800 Andy Lutomirski
>> > >> > <luto@amacapital.net> wrote:
>> > >> >> When I think of deadlocks caused by r/w locks (which these are),
>> > >> >> I think of two kinds.  First is what the current code tries to
>> > >> >> detect: two processes that are each waiting for each other.  I
>> > >> >> don't know whether POSIX enshrines the idea of detecting that,
>> > >> >> but I wouldn't be surprised, considering how awful the old POSIX
>> locks are.
>> > > ...
>> > >> >> The sensible kind of detectable deadlock involves just one lock,
>> > >> >> and it happens when two processes both hold read locks and try
>> > >> >> to upgrade to write locks.  This should be efficiently
>> > >> >> detectable and makes upgrading locks safe(r).
>> > >
>> > > This also involves two processes waiting on each other, and the
>> > > current code should detect either case equally well.
>> > >
>> > > ...
>> > >> For this kind of deadlock detection, nothing global is needed --
>> > >> I'm only talking about detecting deadlocks due to two tasks
>> > >> upgrading locks on the same file (with overlapping ranges) at the
> same
>> time.
>> > >>
>> > >> This is actually useful for SQL-like things.  Imagine this scenario:
>> > >>
>> > >> Program 1:
>> > >>
>> > >> Open a file
>> > >> BEGIN;
>> > >> SELECT whatever;  -- acquires a read lock
>> > >>
>> > >> Program 2:
>> > >>
>> > >> Open the same file
>> > >> BEGIN;
>> > >> SELECT whatever;  -- acquires a read lock
>> > >>
>> > >> Program 1:
>> > >> UPDATE something;  -- upgrades to write
>> > >>
>> > >> Now program 1 is waiting for program 2 to release its lock.  But if
>> > >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
>> > >> implementation (which, sadly, does not include sqlite) will fail
>> > >> the abort the transaction instead.
>> > >
>> > > And then I suppose you'd need to get an exclusive lock when you
>> > > retry, to guarantee forward progress in the face of multiple
>> > > processes retrying at once.
>> >
>> > I don't think so -- as long as deadlock detection is 100% reliable and
>> > if you have writer priority,
>>
>> We don't have writer priority.  Depending on how it worked I'm not
>> convinced it would help.  E.g. consider the above but with 3 processes:
>>
>>       processes 1, 2, and 3 each get a whole-file read lock.
>>
>>       process 1 requests a write lock, blocks because it conflicts
>>       with read locks held by 2 and 3.
>>
>>       process 2 requests a write lock, gets -EDEADLK, unlocks and
>>       requests a new read lock.  That request succeeds because there
>>       is no conflicting lock.  (Note the lock manager had no
>>       opportunity to upgrade 1's lock here thanks to the conflict with
>>       3's lock.)
>
> As I understand write lock priority, process 2 requesting a new read lock
> would block, once there is a write lock waiter, no further read locks would
> be granted that would conflict with that waiting write lock.

...which reminds me -- if anyone implements writer priority, please
make it optional (either w/ a writer-priority-ignoring read lock or a
non-priority-granting write lock).  I have an application for which
writer priority would be really annoying.

Even better: Have read-lock-and-wait-for-pending-writers

(Writer priority a
>
>>       process 3 requests a write lock, gets -EDEADLOCK, unlocks and
>>       requests a new read lock.
>
> And then at this point, all read locks have been dropped, and there are
> pending read locks blocking. The blocked write lock can now be granted. When
> that lock is done, the read locks are again granted, and then we go through
> the cycle again, this time with just two processes. And then one of those
> processes wins the write lock race and blocks while the other gets EDEADLK
> and drops it's read lock at which point of course the blocked write lock is
> immediately granted. And then finally process 3 is granted it's read lock
> and is able to upgrade to a write lock.
>
> Frank
>
>>       Etc.
>>
>> > then all that readers need to do is to drop and re-acquire the read
>> > lock.  (This property is critical to avoid livelocks in SQL.  I rely
>> > on it here: a deadlocked UPDATE just retries the entire transaction
>> > without any special exclusive locks.)
>> >
>> > >
>> > > I don't know, is this so useful?
>> > >
>> > >> It would be nice if the kernel
>> > >> supported this.
>> > >>
>> > >> Note that unlocking and then re-locking for write is incorrect --
>> > >> it would allow program 2 to write inconsistent data.
>> > >>
>> > >> I think that implementing this could be as simple as having some
>> > >> way to check if a struct file_lock is currently trying to upgrade
>> > >> from read to write and, if you try to upgrade and end up waiting
>> > >> for such a lock, aborting.
>> > >
>> > > You have to be clear what you mean by "such a lock".  What you
>> > > really want to know is whether you'd be waiting on a lock that might
>> > > be waiting on a lock you hold.
>> >
>> > By "such a lock" I mean a read lock on the same file that's trying to
>> > upgrade to write.  I think that's the main (only?) interesting case.
>> > Checking for this has the nice property that you don't need to iterate
>> > and you don't care whom the holder of that lock is waiting for -- if
>> > it's upgrading and you overlap with it, you are certainly in the
>> > deadlock case.
>>
>> OK, I think.
>>
>> > > To a first approximation, the current works with a graph with tasks
>> > > as nodes and an arrow from node X to node Y if X is waiting on a
>> > > lock held by node Y.  And it follows arrows in that graph looking for
> cycles.
>> > >
>> > > And sure I guess it would be a bit nicer if you only bothered
>> > > checking for cycles that touch this one file.
>> > >
>> > > But I'd really rather avoid the complication of deadlock detection
>> > > unless somebody can make a really strong case that they need it.
>> >
>> > TBH, I suspect that the person you really want to ask is drh, who
>> > writes/maintains sqlite (cc'd).  sqlite has fancy locks built on top
>> > of fcntl locks.
>>
>> A quick check of the sqlite source shows some complaints about posix locks
>> in src/os.c.
>>
>> Looks like all he's asking for his for the lock owner to be the file
> descriptor not
>> the pid, and for locks not to be thrown away on first close.  Those are
> the two
>> things jlayton addresses.
>>
>> So yes I think it would be interesting to know whether some of the extra
>> layer of internal sqlite locking could have been chucked if it could have
> been
>> based on jlayton's locks.
>>
>> --b.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> in the
>> body of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:21             ` Richard Hipp
@ 2014-01-14 21:24               ` Andy Lutomirski
  2014-01-14 21:43                 ` Richard Hipp
  2014-01-14 21:30               ` J. Bruce Fields
  1 sibling, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 21:24 UTC (permalink / raw)
  To: Richard Hipp
  Cc: J. Bruce Fields, Jeff Layton, Linux FS Devel, nfs-ganesha-devel,
	samba-technical, linux-kernel, Richard Hipp

On Tue, Jan 14, 2014 at 1:21 PM, Richard Hipp <drh@sqlite.org> wrote:
> I have no context here.  I'm not sure what you are discussing or what
> questions you have or what SQLite has to do with any of it.  Nevertheless, I
> have injected a few remarks inline....
>

The discussion is about a new set of fcntl locking commands that are
are respect locks acquired with the old ones but suck less.  This
seems like the right time to discuss what would make them even better.
 The immediate change is to let them be owned by the fd instead of the
process.

This might end up in POSIX and could make sqlite (and lots of other
things') lives easier.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:17               ` Andy Lutomirski
@ 2014-01-14 21:25                 ` J. Bruce Fields
  0 siblings, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 21:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jeff Layton, drh, samba-technical, linux-kernel, Linux FS Devel,
	nfs-ganesha-devel

On Tue, Jan 14, 2014 at 01:17:20PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 14, 2014 at 1:10 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
> >> [cc: drh, who I suspect is responsible for the most widespread
> >> userspace software that uses this stuff]
> >>
> >> On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >> > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> >> >> For this kind of deadlock detection, nothing global is needed -- I'm
> >> >> only talking about detecting deadlocks due to two tasks upgrading
> >> >> locks on the same file (with overlapping ranges) at the same time.
> >> >>
> >> >> This is actually useful for SQL-like things.  Imagine this scenario:
> >> >>
> >> >> Program 1:
> >> >>
> >> >> Open a file
> >> >> BEGIN;
> >> >> SELECT whatever;  -- acquires a read lock
> >> >>
> >> >> Program 2:
> >> >>
> >> >> Open the same file
> >> >> BEGIN;
> >> >> SELECT whatever;  -- acquires a read lock
> >> >>
> >> >> Program 1:
> >> >> UPDATE something;  -- upgrades to write
> >> >>
> >> >> Now program 1 is waiting for program 2 to release its lock.  But if
> >> >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> >> >> implementation (which, sadly, does not include sqlite) will fail the
> >> >> abort the transaction instead.
> >> >
> >> > And then I suppose you'd need to get an exclusive lock when you retry,
> >> > to guarantee forward progress in the face of multiple processes retrying
> >> > at once.
> >>
> >> I don't think so -- as long as deadlock detection is 100% reliable and
> >> if you have writer priority,
> >
> > We don't have writer priority.  Depending on how it worked I'm not
> > convinced it would help.  E.g. consider the above but with 3 processes:
> >
> >         processes 1, 2, and 3 each get a whole-file read lock.
> >
> >         process 1 requests a write lock, blocks because it conflicts
> >         with read locks held by 2 and 3.
> >
> >         process 2 requests a write lock, gets -EDEADLK, unlocks and
> >         requests a new read lock.  That request succeeds because there
> >         is no conflicting lock.  (Note the lock manager had no
> >         opportunity to upgrade 1's lock here thanks to the conflict with
> >         3's lock.)
> 
> Writer priority here would detect that someone's waiting for write
> access and would cause new readers to block.

OK, got it, thanks.

I don't *think* traditional posix locks are allowed to block on a lock
that's not yet applied.  But I haven't thought it through.  If not, it
might be worth adding to a new lock type, though that could be a
slightly subtle distinction to document.

...
> FWIW, at least last time I checked, sqlite didn't implement deadlock
> detection (it uses timeouts instead).  That was one of my least
> favorite things about sqlite.
> 
> With this feature in fcntl, I think that sqlite could add deadlock
> detection and a true blocking mode without changing the file/locking
> format, at least if it still works the way I remember it working.
> 
> Anyway, I still don't think that this feature should be a prerequisite
> for the new lock types.

Well, if there's another posix lock brokenness here that we could easily
fix at the same time, it might make sense to.  At least it seems worth
understanding.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:24                 ` Andy Lutomirski
@ 2014-01-14 21:26                   ` Andy Lutomirski
  2014-01-14 21:34                     ` Frank Filz
  2014-01-14 21:51                     ` J. Bruce Fields
  2014-01-14 21:26                   ` J. Bruce Fields
  1 sibling, 2 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 21:26 UTC (permalink / raw)
  To: Frank Filz
  Cc: J. Bruce Fields, Jeff Layton, Linux FS Devel, nfs-ganesha-devel,
	samba-technical, linux-kernel, Richard Hipp

[grr, gmail -- I didn't actually intend to send that.]

On Tue, Jan 14, 2014 at 1:24 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com> wrote:
>>>       process 2 requests a write lock, gets -EDEADLK, unlocks and
>>>       requests a new read lock.  That request succeeds because there
>>>       is no conflicting lock.  (Note the lock manager had no
>>>       opportunity to upgrade 1's lock here thanks to the conflict with
>>>       3's lock.)
>>
>> As I understand write lock priority, process 2 requesting a new read lock
>> would block, once there is a write lock waiter, no further read locks would
>> be granted that would conflict with that waiting write lock.
>
> ...which reminds me -- if anyone implements writer priority, please
> make it optional (either w/ a writer-priority-ignoring read lock or a
> non-priority-granting write lock).  I have an application for which
> writer priority would be really annoying.
>
> Even better: Have read-lock-and-wait-for-pending-writers be an explicit new operation.
>
> (Writer priority a

Writer priority can introduce new deadlocks.  Suppose that a reader
(holding a read lock) starts a subprocess that takes a new read lock
and waits for that subprocess.  Throw an unrelated process in that
tries to take a write lock and you have an instant deadlock.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:24                 ` Andy Lutomirski
  2014-01-14 21:26                   ` Andy Lutomirski
@ 2014-01-14 21:26                   ` J. Bruce Fields
  1 sibling, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 21:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Hipp, Jeff Layton, Frank Filz, samba-technical,
	linux-kernel, Linux FS Devel, nfs-ganesha-devel

On Tue, Jan 14, 2014 at 01:24:23PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com> wrote:
> >> On Tue, Jan 14, 2014 at 12:29:17PM -0800, Andy Lutomirski wrote:
> >> > [cc: drh, who I suspect is responsible for the most widespread
> >> > userspace software that uses this stuff]
> >> >
> >> > On Tue, Jan 14, 2014 at 11:27 AM, J. Bruce Fields <bfields@fieldses.org>
> >> wrote:
> >> > > On Thu, Jan 09, 2014 at 04:58:59PM -0800, Andy Lutomirski wrote:
> >> > >> On Thu, Jan 9, 2014 at 4:49 PM, Jeff Layton <jlayton@redhat.com>
> >> wrote:
> >> > >> > On Thu, 09 Jan 2014 12:25:25 -0800 Andy Lutomirski
> >> > >> > <luto@amacapital.net> wrote:
> >> > >> >> When I think of deadlocks caused by r/w locks (which these are),
> >> > >> >> I think of two kinds.  First is what the current code tries to
> >> > >> >> detect: two processes that are each waiting for each other.  I
> >> > >> >> don't know whether POSIX enshrines the idea of detecting that,
> >> > >> >> but I wouldn't be surprised, considering how awful the old POSIX
> >> locks are.
> >> > > ...
> >> > >> >> The sensible kind of detectable deadlock involves just one lock,
> >> > >> >> and it happens when two processes both hold read locks and try
> >> > >> >> to upgrade to write locks.  This should be efficiently
> >> > >> >> detectable and makes upgrading locks safe(r).
> >> > >
> >> > > This also involves two processes waiting on each other, and the
> >> > > current code should detect either case equally well.
> >> > >
> >> > > ...
> >> > >> For this kind of deadlock detection, nothing global is needed --
> >> > >> I'm only talking about detecting deadlocks due to two tasks
> >> > >> upgrading locks on the same file (with overlapping ranges) at the
> > same
> >> time.
> >> > >>
> >> > >> This is actually useful for SQL-like things.  Imagine this scenario:
> >> > >>
> >> > >> Program 1:
> >> > >>
> >> > >> Open a file
> >> > >> BEGIN;
> >> > >> SELECT whatever;  -- acquires a read lock
> >> > >>
> >> > >> Program 2:
> >> > >>
> >> > >> Open the same file
> >> > >> BEGIN;
> >> > >> SELECT whatever;  -- acquires a read lock
> >> > >>
> >> > >> Program 1:
> >> > >> UPDATE something;  -- upgrades to write
> >> > >>
> >> > >> Now program 1 is waiting for program 2 to release its lock.  But if
> >> > >> program 2 tries to UPDATE, then it deadlocks.  A friendly MySQL
> >> > >> implementation (which, sadly, does not include sqlite) will fail
> >> > >> the abort the transaction instead.
> >> > >
> >> > > And then I suppose you'd need to get an exclusive lock when you
> >> > > retry, to guarantee forward progress in the face of multiple
> >> > > processes retrying at once.
> >> >
> >> > I don't think so -- as long as deadlock detection is 100% reliable and
> >> > if you have writer priority,
> >>
> >> We don't have writer priority.  Depending on how it worked I'm not
> >> convinced it would help.  E.g. consider the above but with 3 processes:
> >>
> >>       processes 1, 2, and 3 each get a whole-file read lock.
> >>
> >>       process 1 requests a write lock, blocks because it conflicts
> >>       with read locks held by 2 and 3.
> >>
> >>       process 2 requests a write lock, gets -EDEADLK, unlocks and
> >>       requests a new read lock.  That request succeeds because there
> >>       is no conflicting lock.  (Note the lock manager had no
> >>       opportunity to upgrade 1's lock here thanks to the conflict with
> >>       3's lock.)
> >
> > As I understand write lock priority, process 2 requesting a new read lock
> > would block, once there is a write lock waiter, no further read locks would
> > be granted that would conflict with that waiting write lock.
> 
> ...which reminds me -- if anyone implements writer priority, please
> make it optional (either w/ a writer-priority-ignoring read lock or a
> non-priority-granting write lock).  I have an application for which
> writer priority would be really annoying.

Is it something you could describe briefly?

--b.

> 
> Even better: Have read-lock-and-wait-for-pending-writers

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:21             ` Richard Hipp
  2014-01-14 21:24               ` Andy Lutomirski
@ 2014-01-14 21:30               ` J. Bruce Fields
  1 sibling, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 21:30 UTC (permalink / raw)
  To: Richard Hipp
  Cc: Richard Hipp, Jeff Layton, samba-technical, linux-kernel,
	Andy Lutomirski, Linux FS Devel, nfs-ganesha-devel

On Tue, Jan 14, 2014 at 04:21:53PM -0500, Richard Hipp wrote:
> SQLite uses only F_SETLK, never F_SETLKW.  Doesn't that mean that SQLite
> will work the same with or without deadlock detection?  Doesn't deadlock
> detection only come into play with F_SETLKW?

That's correct.

> > >> (Actually, what happens if you receive a signal which waiting on a file
> > lock?)
> > >
> > > Return -EINTR.
> >
> 
> Huh.  SQLite is not checking for EINTR if fcntl(F_SETLK,...) fails.  Should
> it be?  Or does EINTR only come up for F_SETLKW?

I don't know--I wonder if a distributed filesystem, for example, might
allow even a non-blocking lock request to be interrupted?  Might be
interesting to check what nfs does.

--b.

> 
> We do check for EINTR and retry for other system calls (read(), write(),
> fallocate(), ftruncate(), close(), chmod(), open(), maybe others too).

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:26                   ` Andy Lutomirski
@ 2014-01-14 21:34                     ` Frank Filz
  2014-01-14 21:51                     ` J. Bruce Fields
  1 sibling, 0 replies; 39+ messages in thread
From: Frank Filz @ 2014-01-14 21:34 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: 'J. Bruce Fields', 'Jeff Layton',
	'Linux FS Devel',
	nfs-ganesha-devel, samba-technical, linux-kernel,
	'Richard Hipp'

> [grr, gmail -- I didn't actually intend to send that.]
> 
> On Tue, Jan 14, 2014 at 1:24 PM, Andy Lutomirski <luto@amacapital.net>
> wrote:
> > On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com>
> wrote:
> >>>       process 2 requests a write lock, gets -EDEADLK, unlocks and
> >>>       requests a new read lock.  That request succeeds because there
> >>>       is no conflicting lock.  (Note the lock manager had no
> >>>       opportunity to upgrade 1's lock here thanks to the conflict with
> >>>       3's lock.)
> >>
> >> As I understand write lock priority, process 2 requesting a new read
> >> lock would block, once there is a write lock waiter, no further read
> >> locks would be granted that would conflict with that waiting write
lock.
> >
> > ...which reminds me -- if anyone implements writer priority, please
> > make it optional (either w/ a writer-priority-ignoring read lock or a
> > non-priority-granting write lock).  I have an application for which
> > writer priority would be really annoying.
> >
> > Even better: Have read-lock-and-wait-for-pending-writers be an explicit
> new operation.
> >
> > (Writer priority a
> 
> Writer priority can introduce new deadlocks.  Suppose that a reader
(holding
> a read lock) starts a subprocess that takes a new read lock and waits for
that
> subprocess.  Throw an unrelated process in that tries to take a write lock
and
> you have an instant deadlock.

Hmm, that's an interesting one.

With the new private locks, you could avoid that, because you can pass the
read lock you already hold to that sub-process, such that the sub-process
doesn't have to get it's own lock on the record in question.

Frank

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:24               ` Andy Lutomirski
@ 2014-01-14 21:43                 ` Richard Hipp
  2014-01-15  4:10                   ` [Nfs-ganesha-devel] " Frank Filz
  0 siblings, 1 reply; 39+ messages in thread
From: Richard Hipp @ 2014-01-14 21:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Hipp, Jeff Layton, samba-technical, linux-kernel,
	Linux FS Devel, nfs-ganesha-devel

On Tue, Jan 14, 2014 at 4:24 PM, Andy Lutomirski <luto@amacapital.net>wrote:

> On Tue, Jan 14, 2014 at 1:21 PM, Richard Hipp <drh@sqlite.org> wrote:
> > I have no context here.  I'm not sure what you are discussing or what
> > questions you have or what SQLite has to do with any of it.
>  Nevertheless, I
> > have injected a few remarks inline....
> >
>
> The discussion is about a new set of fcntl locking commands that are
> are respect locks acquired with the old ones but suck less.  This
> seems like the right time to discuss what would make them even better.
>  The immediate change is to let them be owned by the fd instead of the
> process.
>
> This might end up in POSIX and could make sqlite (and lots of other
> things') lives easier.
>
>
Sounds great.

A common cause of SQLite database corruption is when another application
thread does close(open(zDatabaseFile)) and deletes all of SQLite's locks
out from under it.  (Usually that happens in a misguided attempt to backup
the database.).

Any help we can get in that area will be appreciated!


-- 
D. Richard Hipp
drh@sqlite.org

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:26                   ` Andy Lutomirski
  2014-01-14 21:34                     ` Frank Filz
@ 2014-01-14 21:51                     ` J. Bruce Fields
  2014-01-14 22:26                       ` Andy Lutomirski
  1 sibling, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2014-01-14 21:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Richard Hipp, Jeff Layton, Frank Filz, samba-technical,
	linux-kernel, Linux FS Devel, nfs-ganesha-devel

On Tue, Jan 14, 2014 at 01:26:26PM -0800, Andy Lutomirski wrote:
> [grr, gmail -- I didn't actually intend to send that.]
> 
> On Tue, Jan 14, 2014 at 1:24 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com> wrote:
> >>>       process 2 requests a write lock, gets -EDEADLK, unlocks and
> >>>       requests a new read lock.  That request succeeds because there
> >>>       is no conflicting lock.  (Note the lock manager had no
> >>>       opportunity to upgrade 1's lock here thanks to the conflict with
> >>>       3's lock.)
> >>
> >> As I understand write lock priority, process 2 requesting a new read lock
> >> would block, once there is a write lock waiter, no further read locks would
> >> be granted that would conflict with that waiting write lock.
> >
> > ...which reminds me -- if anyone implements writer priority, please
> > make it optional (either w/ a writer-priority-ignoring read lock or a
> > non-priority-granting write lock).  I have an application for which
> > writer priority would be really annoying.
> >
> > Even better: Have read-lock-and-wait-for-pending-writers be an explicit new operation.
> >
> > (Writer priority a
> 
> Writer priority can introduce new deadlocks.  Suppose that a reader
> (holding a read lock) starts a subprocess that takes a new read lock
> and waits for that subprocess.  Throw an unrelated process in that
> tries to take a write lock and you have an instant deadlock.

OK, so we definitely can't silently change existing lock behavior to
prioritize writes in this way.

A remaining interesting question is whether we'd like the new locks to
support either behavior or both.

I'd still be inclined to stick to the existing (unprioritized) behavior
just to minimize the scope of the project.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:51                     ` J. Bruce Fields
@ 2014-01-14 22:26                       ` Andy Lutomirski
  0 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-01-14 22:26 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Frank Filz, Jeff Layton, Linux FS Devel, nfs-ganesha-devel,
	samba-technical, linux-kernel, Richard Hipp

On Tue, Jan 14, 2014 at 1:51 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Tue, Jan 14, 2014 at 01:26:26PM -0800, Andy Lutomirski wrote:
>> [grr, gmail -- I didn't actually intend to send that.]
>>
>> On Tue, Jan 14, 2014 at 1:24 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> > On Tue, Jan 14, 2014 at 1:19 PM, Frank Filz <ffilzlnx@mindspring.com> wrote:
>> >>>       process 2 requests a write lock, gets -EDEADLK, unlocks and
>> >>>       requests a new read lock.  That request succeeds because there
>> >>>       is no conflicting lock.  (Note the lock manager had no
>> >>>       opportunity to upgrade 1's lock here thanks to the conflict with
>> >>>       3's lock.)
>> >>
>> >> As I understand write lock priority, process 2 requesting a new read lock
>> >> would block, once there is a write lock waiter, no further read locks would
>> >> be granted that would conflict with that waiting write lock.
>> >
>> > ...which reminds me -- if anyone implements writer priority, please
>> > make it optional (either w/ a writer-priority-ignoring read lock or a
>> > non-priority-granting write lock).  I have an application for which
>> > writer priority would be really annoying.
>> >
>> > Even better: Have read-lock-and-wait-for-pending-writers be an explicit new operation.
>> >
>> > (Writer priority a
>>
>> Writer priority can introduce new deadlocks.  Suppose that a reader
>> (holding a read lock) starts a subprocess that takes a new read lock
>> and waits for that subprocess.  Throw an unrelated process in that
>> tries to take a write lock and you have an instant deadlock.
>
> OK, so we definitely can't silently change existing lock behavior to
> prioritize writes in this way.
>
> A remaining interesting question is whether we'd like the new locks to
> support either behavior or both.
>
> I'd still be inclined to stick to the existing (unprioritized) behavior
> just to minimize the scope of the project.

I think it would be silly to change the behavior at all (other than
probably documenting that -EDEADLK is a valid return value) until this
stuff is merged.  None of this has identified anything that's either
wrong or unnecessarily limiting about the current proposal, so I see
no reason to try to do anything fancy right now.

Long term, I'd advocate for a new l_type value
F_RDLCK_WAIT_FOR_WRITERS (or the equivalent with a better name) and
implementing -EDEADLK, for the case where two overlapping upgrade
attempts conflict.

If it's indeed true that a failed F_SETLK (or F_SETLKW) does not
change lock state, documenting that would be nice, too.

Finally, on a completely unrelated note, IIRC lock positions are
treated as *signed* integers and can't be negative.  Documenting that
(or the reverse) would be nice, too.  This bit me once, and it's
probably briefly confused other people, too.

--Andy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [Nfs-ganesha-devel] [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks
  2014-01-14 21:43                 ` Richard Hipp
@ 2014-01-15  4:10                   ` Frank Filz
  0 siblings, 0 replies; 39+ messages in thread
From: Frank Filz @ 2014-01-15  4:10 UTC (permalink / raw)
  To: 'Richard Hipp', 'Andy Lutomirski'
  Cc: 'Richard Hipp', 'Jeff Layton',
	'samba-technical', linux-kernel, 'Linux FS Devel',
	'nfs-ganesha-devel'

That is exactly the behavior we are trying to avoid with the new private
locks. We are also adding behavior that allows an application to have
multiple threads of the same process to have their own locking context for a
file.

 

Frank

 

From: Richard Hipp [mailto:drh@sqlite.org] 
Sent: Tuesday, January 14, 2014 1:44 PM
To: Andy Lutomirski
Cc: Richard Hipp; Jeff Layton; samba-technical;
linux-kernel@vger.kernel.org; Linux FS Devel; nfs-ganesha-devel
Subject: Re: [Nfs-ganesha-devel] [PATCH v5 13/14] locks: skip deadlock
detection on FL_FILE_PVT locks

 

 

 

On Tue, Jan 14, 2014 at 4:24 PM, Andy Lutomirski <luto@amacapital.net
<mailto:luto@amacapital.net> > wrote:

On Tue, Jan 14, 2014 at 1:21 PM, Richard Hipp <drh@sqlite.org
<mailto:drh@sqlite.org> > wrote:
> I have no context here.  I'm not sure what you are discussing or what
> questions you have or what SQLite has to do with any of it.  Nevertheless,
I
> have injected a few remarks inline....
>

The discussion is about a new set of fcntl locking commands that are
are respect locks acquired with the old ones but suck less.  This
seems like the right time to discuss what would make them even better.
 The immediate change is to let them be owned by the fd instead of the
process.

This might end up in POSIX and could make sqlite (and lots of other
things') lives easier.



 

Sounds great. 

A common cause of SQLite database corruption is when another application
thread does close(open(zDatabaseFile)) and deletes all of SQLite's locks out
from under it.  (Usually that happens in a misguided attempt to backup the
database.).

Any help we can get in that area will be appreciated!



-- 
D. Richard Hipp
drh@sqlite.org <mailto:drh@sqlite.org>  

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2014-01-15  4:10 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-09 14:19 [PATCH v5 00/14] locks: implement "file-private" (aka UNPOSIX) locks Jeff Layton
2014-01-09 14:19 ` [PATCH v5 01/14] locks: close potential race between setlease and open Jeff Layton
2014-01-09 14:19 ` [PATCH v5 02/14] locks: clean up comment typo Jeff Layton
2014-01-09 14:19 ` [PATCH v5 03/14] locks: remove "inline" qualifier from fl_link manipulation functions Jeff Layton
2014-01-09 14:19 ` [PATCH v5 04/14] locks: add __acquires and __releases annotations to locks_start and locks_stop Jeff Layton
2014-01-09 14:19 ` [PATCH v5 05/14] locks: eliminate BUG() call when there's an unexpected lock on file close Jeff Layton
2014-01-09 14:19 ` [PATCH v5 06/14] locks: fix posix lock range overflow handling Jeff Layton
2014-01-09 14:19 ` [PATCH v5 07/14] locks: consolidate checks for compatible filp->f_mode values in setlk handlers Jeff Layton
2014-01-09 14:19 ` [PATCH v5 08/14] MAINTAINERS: add Bruce and myself to list of maintainers for file locking code Jeff Layton
2014-01-09 14:19 ` [PATCH v5 09/14] locks: rename locks_remove_flock to locks_remove_file Jeff Layton
2014-01-09 14:19 ` [PATCH v5 10/14] locks: make /proc/locks show IS_FILE_PVT locks with a P suffix Jeff Layton
2014-01-09 14:19 ` [PATCH v5 11/14] locks: report l_pid as -1 for FL_FILE_PVT locks Jeff Layton
2014-01-09 14:19 ` [PATCH v5 12/14] locks: pass the cmd value to fcntl_getlk/getlk64 Jeff Layton
2014-01-09 14:19 ` [PATCH v5 13/14] locks: skip deadlock detection on FL_FILE_PVT locks Jeff Layton
2014-01-09 20:25   ` Andy Lutomirski
2014-01-10  0:49     ` Jeff Layton
2014-01-10  0:58       ` Andy Lutomirski
2014-01-14 19:27         ` J. Bruce Fields
2014-01-14 20:29           ` Andy Lutomirski
2014-01-14 21:10             ` J. Bruce Fields
2014-01-14 21:17               ` Andy Lutomirski
2014-01-14 21:25                 ` J. Bruce Fields
2014-01-14 21:18               ` Jeff Layton
2014-01-14 21:19               ` Frank Filz
2014-01-14 21:24                 ` Andy Lutomirski
2014-01-14 21:26                   ` Andy Lutomirski
2014-01-14 21:34                     ` Frank Filz
2014-01-14 21:51                     ` J. Bruce Fields
2014-01-14 22:26                       ` Andy Lutomirski
2014-01-14 21:26                   ` J. Bruce Fields
2014-01-14 21:21             ` Richard Hipp
2014-01-14 21:24               ` Andy Lutomirski
2014-01-14 21:43                 ` Richard Hipp
2014-01-15  4:10                   ` [Nfs-ganesha-devel] " Frank Filz
2014-01-14 21:30               ` J. Bruce Fields
2014-01-09 14:19 ` [PATCH v5 14/14] locks: add new fcntl cmd values for handling file private locks Jeff Layton
2014-01-09 20:29   ` Andy Lutomirski
2014-01-10  0:55     ` Jeff Layton
2014-01-10  1:01       ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).