linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* DoS with POSIX file locks?
@ 2006-03-20 11:41 Miklos Szeredi
  2006-03-20 12:11 ` Matthew Wilcox
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 11:41 UTC (permalink / raw)
  To: matthew; +Cc: linux-fsdevel, linux-kernel

Unlike open files there doesn't seem to be any limit on the number of
locks being held either globally or by a single process.

Hence an unprivileged process can consume large amounts of unswappable
kernel memory even if there are strict limits on other resources.

Is there a reason why this shouldn't be a problem?

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 11:41 DoS with POSIX file locks? Miklos Szeredi
@ 2006-03-20 12:11 ` Matthew Wilcox
  2006-03-20 12:19   ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2006-03-20 12:11 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 20, 2006 at 12:41:21PM +0100, Miklos Szeredi wrote:
> Unlike open files there doesn't seem to be any limit on the number of
> locks being held either globally or by a single process.

RLIMIT_LOCKS

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 12:11 ` Matthew Wilcox
@ 2006-03-20 12:19   ` Miklos Szeredi
  2006-03-20 12:39     ` Matthew Wilcox
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 12:19 UTC (permalink / raw)
  To: matthew; +Cc: linux-fsdevel, linux-kernel

> > Unlike open files there doesn't seem to be any limit on the number of
> > locks being held either globally or by a single process.
> 
> RLIMIT_LOCKS

Which is not actually used anywhere.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 12:19   ` Miklos Szeredi
@ 2006-03-20 12:39     ` Matthew Wilcox
  2006-03-20 12:52       ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2006-03-20 12:39 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 20, 2006 at 01:19:12PM +0100, Miklos Szeredi wrote:
> > > Unlike open files there doesn't seem to be any limit on the number of
> > > locks being held either globally or by a single process.
> > 
> > RLIMIT_LOCKS
> 
> Which is not actually used anywhere.

Right.  Um.  I took it out back in March 2003 after enough people
convinced me it wasn't worth trying to account for all the memory
processes use, and the userbeans project would take care of it anyway.
Haha.

It's hard to fix the accounting.  You have to deal with one thread
allocating the lock, and then a different thread freeing it.  We never
actually accounted for posix locks (which are the ones we really needed
to!) and on occasion had current->locks go negative, with all kinds of
associated badness.

So you're welcome to try and fix it, but you may well go mad doing so.
Fortunately, I'm no longer maintaining locks.c, but I'd be happy to
answer questions.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 12:39     ` Matthew Wilcox
@ 2006-03-20 12:52       ` Miklos Szeredi
  2006-03-20 13:13         ` Arjan van de Ven
  2006-03-20 15:32         ` Matthew Wilcox
  0 siblings, 2 replies; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 12:52 UTC (permalink / raw)
  To: matthew; +Cc: linux-fsdevel, linux-kernel

> Right.  Um.  I took it out back in March 2003 after enough people
> convinced me it wasn't worth trying to account for all the memory
> processes use, and the userbeans project would take care of it anyway.
> Haha.
> 
> It's hard to fix the accounting.  You have to deal with one thread
> allocating the lock, and then a different thread freeing it.  We never
> actually accounted for posix locks (which are the ones we really needed
> to!) and on occasion had current->locks go negative, with all kinds of
> associated badness.

Things look fairly straightforward if the accounting is done in
files_struct instead of task_struct.  At least for POSIX locks.  I
haven't looked at flocks or leases yet.

steal_locks() might cause problems, but that function should be gotten
rid of anyway.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 12:52       ` Miklos Szeredi
@ 2006-03-20 13:13         ` Arjan van de Ven
  2006-03-20 13:24           ` Miklos Szeredi
  2006-03-20 15:32         ` Matthew Wilcox
  1 sibling, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-20 13:13 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: matthew, linux-fsdevel, linux-kernel

On Mon, 2006-03-20 at 13:52 +0100, Miklos Szeredi wrote:
> > Right.  Um.  I took it out back in March 2003 after enough people
> > convinced me it wasn't worth trying to account for all the memory
> > processes use, and the userbeans project would take care of it anyway.
> > Haha.
> > 
> > It's hard to fix the accounting.  You have to deal with one thread
> > allocating the lock, and then a different thread freeing it.  We never
> > actually accounted for posix locks (which are the ones we really needed
> > to!) and on occasion had current->locks go negative, with all kinds of
> > associated badness.
> 
> Things look fairly straightforward if the accounting is done in
> files_struct instead of task_struct. 

that's the wrong place; you can send fd's over unix sockets to other
processes....



the better solution is to account per user struct, and keep a pointer
(and a refcount) of that user struct inside your lock data somehow.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 13:13         ` Arjan van de Ven
@ 2006-03-20 13:24           ` Miklos Szeredi
  2006-03-20 13:30             ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 13:24 UTC (permalink / raw)
  To: arjan; +Cc: matthew, linux-fsdevel, linux-kernel

> > > Right.  Um.  I took it out back in March 2003 after enough people
> > > convinced me it wasn't worth trying to account for all the memory
> > > processes use, and the userbeans project would take care of it anyway.
> > > Haha.
> > > 
> > > It's hard to fix the accounting.  You have to deal with one thread
> > > allocating the lock, and then a different thread freeing it.  We never
> > > actually accounted for posix locks (which are the ones we really needed
> > > to!) and on occasion had current->locks go negative, with all kinds of
> > > associated badness.
> > 
> > Things look fairly straightforward if the accounting is done in
> > files_struct instead of task_struct. 
> 
> that's the wrong place; you can send fd's over unix sockets to other
> processes....

POSIX locks have no association with fd's.  Only the inode and the
"owner" is relevant, where owner is derived from the files_struct
pointer for local locks.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 13:24           ` Miklos Szeredi
@ 2006-03-20 13:30             ` Arjan van de Ven
  2006-03-20 13:39               ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2006-03-20 13:30 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: matthew, linux-fsdevel, linux-kernel

On Mon, 2006-03-20 at 14:24 +0100, Miklos Szeredi wrote:
> > > > Right.  Um.  I took it out back in March 2003 after enough people
> > > > convinced me it wasn't worth trying to account for all the memory
> > > > processes use, and the userbeans project would take care of it anyway.
> > > > Haha.
> > > > 
> > > > It's hard to fix the accounting.  You have to deal with one thread
> > > > allocating the lock, and then a different thread freeing it.  We never
> > > > actually accounted for posix locks (which are the ones we really needed
> > > > to!) and on occasion had current->locks go negative, with all kinds of
> > > > associated badness.
> > > 
> > > Things look fairly straightforward if the accounting is done in
> > > files_struct instead of task_struct. 
> > 
> > that's the wrong place; you can send fd's over unix sockets to other
> > processes....
> 
> POSIX locks have no association with fd's.  Only the inode and the
> "owner" is relevant

but the point is that with unix sockets you can send inodes to other
processes.. who don't share files_struct



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 13:30             ` Arjan van de Ven
@ 2006-03-20 13:39               ` Miklos Szeredi
  0 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 13:39 UTC (permalink / raw)
  To: arjan; +Cc: matthew, linux-fsdevel, linux-kernel

> but the point is that with unix sockets you can send inodes to other
> processes.. who don't share files_struct

In which case the other process won't be able to manipulate
(unlock/upgrade/etc) the lock, since it doesn't own the lock (unless
it shares files_struct with the original process).

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 12:52       ` Miklos Szeredi
  2006-03-20 13:13         ` Arjan van de Ven
@ 2006-03-20 15:32         ` Matthew Wilcox
  2006-03-20 16:41           ` Miklos Szeredi
  2006-03-20 18:22           ` Trond Myklebust
  1 sibling, 2 replies; 26+ messages in thread
From: Matthew Wilcox @ 2006-03-20 15:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel

On Mon, Mar 20, 2006 at 01:52:39PM +0100, Miklos Szeredi wrote:
> Things look fairly straightforward if the accounting is done in
> files_struct instead of task_struct.  At least for POSIX locks.  I
> haven't looked at flocks or leases yet.

I was thinking that would work, yes.  It might not be worth worrying
about accounting for leases/flocks since each process can only have one
of those per open file anyway.

> steal_locks() might cause problems, but that function should be gotten
> rid of anyway.

I quite agree.  Now we need to find a better way to solve the problem it
papers over.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 15:32         ` Matthew Wilcox
@ 2006-03-20 16:41           ` Miklos Szeredi
  2006-03-20 20:35             ` J. Bruce Fields
  2006-03-20 18:22           ` Trond Myklebust
  1 sibling, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-20 16:41 UTC (permalink / raw)
  To: matthew; +Cc: linux-fsdevel, linux-kernel

> > Things look fairly straightforward if the accounting is done in
> > files_struct instead of task_struct.  At least for POSIX locks.  I
> > haven't looked at flocks or leases yet.
> 
> I was thinking that would work, yes.  It might not be worth worrying
> about accounting for leases/flocks since each process can only have one
> of those per open file anyway.

Here's a minimally tested patch.  The only tricky part is when the
unlock splits an existing lock in two.

Also the limit checking is sloppy when the lock is split, and in that
case allows the counter to go one above the limit.

> > steal_locks() might cause problems, but that function should be gotten
> > rid of anyway.
> 
> I quite agree.  Now we need to find a better way to solve the problem it
> papers over.

steal_locks() is not yet handled in this patch.  Any ideas on what to
do with it?

Miklos

Index: linux/fs/locks.c
===================================================================
--- linux.orig/fs/locks.c	2006-03-20 06:53:29.000000000 +0100
+++ linux/fs/locks.c	2006-03-20 17:25:11.000000000 +0100
@@ -539,6 +539,11 @@ static void locks_wake_up_blocks(struct 
 	}
 }
 
+static int posix_lock_account(struct file_lock *fl)
+{
+	return IS_POSIX(fl) && fl->fl_owner == current->files;
+}
+
 /* Insert file lock fl into an inode's lock list at the position indicated
  * by pos. At the same time add the lock to the global file lock list.
  */
@@ -550,6 +555,9 @@ static void locks_insert_lock(struct fil
 	fl->fl_next = *pos;
 	*pos = fl;
 
+	if (posix_lock_account(fl))
+		atomic_inc(&current->files->posix_lock_ctr);
+
 	if (fl->fl_ops && fl->fl_ops->fl_insert)
 		fl->fl_ops->fl_insert(fl);
 }
@@ -564,6 +572,9 @@ static void locks_delete_lock(struct fil
 {
 	struct file_lock *fl = *thisfl_p;
 
+	if (posix_lock_account(fl))
+		atomic_dec(&current->files->posix_lock_ctr);
+
 	*thisfl_p = fl->fl_next;
 	fl->fl_next = NULL;
 	list_del_init(&fl->fl_link);
@@ -769,6 +780,15 @@ out:
 	return error;
 }
 
+static int posix_lock_allow(struct file_lock *request)
+{
+	if (!posix_lock_account(request))
+		return 1;
+
+	return atomic_read(&current->files->posix_lock_ctr) <
+		current->signal->rlim[RLIMIT_LOCKS].rlim_cur;
+}
+
 EXPORT_SYMBOL(posix_lock_file);
 
 static int __posix_lock_file(struct inode *inode, struct file_lock *request)
@@ -816,13 +836,16 @@ static int __posix_lock_file(struct inod
 	if (!(new_fl && new_fl2))
 		goto out;
 
+	/* If request is not unlock, check against resource limits.
+	 * If an existing lock is split, then the lock counter might
+	 * go one above the limit
+	 */
+	if (request->fl_type != F_UNLCK && !posix_lock_allow(request))
+		goto out;
+
 	/*
-	 * We've allocated the new locks in advance, so there are no
-	 * errors possible (and no blocking operations) from here on.
-	 * 
 	 * Find the first old lock with the same owner as the new lock.
 	 */
-	
 	before = &inode->i_flock;
 
 	/* First skip locks owned by other processes.  */
@@ -927,7 +950,18 @@ static int __posix_lock_file(struct inod
 		if (left == right) {
 			/* The new lock breaks the old one in two pieces,
 			 * so we have to use the second new lock.
+			 *
+			 * This is the only case, when an unlock
+			 * operation increases the number of used
+			 * locks, luckily nothing has yet been
+			 * touched.  So check rlimit...
 			 */
+			if (request->fl_type == F_UNLCK &&
+			    !posix_lock_allow(request)) {
+				error = -ENOLCK;
+				goto out;
+			}
+
 			left = new_fl2;
 			new_fl2 = NULL;
 			locks_copy_lock(left, right);
Index: linux/include/linux/file.h
===================================================================
--- linux.orig/include/linux/file.h	2006-03-20 06:53:29.000000000 +0100
+++ linux/include/linux/file.h	2006-03-20 15:33:05.000000000 +0100
@@ -40,6 +40,7 @@ struct files_struct {
 	fd_set open_fds_init;
 	struct file * fd_array[NR_OPEN_DEFAULT];
 	spinlock_t file_lock;     /* Protects concurrent writers.  Nests inside tsk->alloc_lock */
+	atomic_t posix_lock_ctr;
 };
 
 #define files_fdtable(files) (rcu_dereference((files)->fdt))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c	2006-03-20 06:53:29.000000000 +0100
+++ linux/kernel/fork.c	2006-03-20 17:14:30.000000000 +0100
@@ -618,6 +618,7 @@ static struct files_struct *alloc_files(
 	fdt->free_files = NULL;
 	fdt->next = NULL;
 	rcu_assign_pointer(newf->fdt, fdt);
+	atomic_set(&newf->posix_lock_ctr, 0);
 out:
 	return newf;
 }




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 15:32         ` Matthew Wilcox
  2006-03-20 16:41           ` Miklos Szeredi
@ 2006-03-20 18:22           ` Trond Myklebust
  2006-03-21  9:44             ` Miklos Szeredi
  1 sibling, 1 reply; 26+ messages in thread
From: Trond Myklebust @ 2006-03-20 18:22 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel

On Mon, 2006-03-20 at 08:32 -0700, Matthew Wilcox wrote:
> On Mon, Mar 20, 2006 at 01:52:39PM +0100, Miklos Szeredi wrote:
> > Things look fairly straightforward if the accounting is done in
> > files_struct instead of task_struct.  At least for POSIX locks.  I
> > haven't looked at flocks or leases yet.
> 
> I was thinking that would work, yes.  It might not be worth worrying
> about accounting for leases/flocks since each process can only have one
> of those per open file anyway.
> 
> > steal_locks() might cause problems, but that function should be gotten
> > rid of anyway.
> 
> I quite agree.  Now we need to find a better way to solve the problem it
> papers over.

The _only_ sane way to solve it is to decree that you lose your locks if
you clone(CLONE_FILES) and then call exec(). If there are any
applications out there that rely on the current behaviour, then they are
doomed anyway since there is no way to implement it safely on non-local
filesystems.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 16:41           ` Miklos Szeredi
@ 2006-03-20 20:35             ` J. Bruce Fields
  2006-03-21  6:38               ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: J. Bruce Fields @ 2006-03-20 20:35 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: matthew, linux-fsdevel, linux-kernel

On Mon, Mar 20, 2006 at 05:41:30PM +0100, Miklos Szeredi wrote:
> > > Things look fairly straightforward if the accounting is done in
> > > files_struct instead of task_struct.  At least for POSIX locks.  I
> > > haven't looked at flocks or leases yet.
> > 
> > I was thinking that would work, yes.  It might not be worth worrying
> > about accounting for leases/flocks since each process can only have one
> > of those per open file anyway.
> 
> Here's a minimally tested patch.  The only tricky part is when the
> unlock splits an existing lock in two.
> 
> Also the limit checking is sloppy when the lock is split, and in that
> case allows the counter to go one above the limit.

Do you need to handle blocks as well as applied locks?

--b.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 20:35             ` J. Bruce Fields
@ 2006-03-21  6:38               ` Miklos Szeredi
  0 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-21  6:38 UTC (permalink / raw)
  To: bfields; +Cc: matthew, linux-fsdevel, linux-kernel

> > > > Things look fairly straightforward if the accounting is done in
> > > > files_struct instead of task_struct.  At least for POSIX locks.  I
> > > > haven't looked at flocks or leases yet.
> > > 
> > > I was thinking that would work, yes.  It might not be worth worrying
> > > about accounting for leases/flocks since each process can only have one
> > > of those per open file anyway.
> > 
> > Here's a minimally tested patch.  The only tricky part is when the
> > unlock splits an existing lock in two.
> > 
> > Also the limit checking is sloppy when the lock is split, and in that
> > case allows the counter to go one above the limit.
> 
> Do you need to handle blocks as well as applied locks?

I don't think so, because each task may only have at most one block at
a time.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-20 18:22           ` Trond Myklebust
@ 2006-03-21  9:44             ` Miklos Szeredi
  2006-03-21 17:28               ` Trond Myklebust
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-21  9:44 UTC (permalink / raw)
  To: trond.myklebust; +Cc: matthew, miklos, linux-fsdevel, linux-kernel

> The _only_ sane way to solve it is to decree that you lose your locks if
> you clone(CLONE_FILES) and then call exec(). If there are any
> applications out there that rely on the current behaviour,

Apps using LinuxThreads seem to be candidates:

     According to POSIX 1003.1c, a successful `exec*' in one of the
     threads should automatically terminate all other threads in the
     program.  This behavior is not yet implemented in LinuxThreads.
     Calling `pthread_kill_other_threads_np' before `exec*' achieves
     much of the same behavior, except that if `exec*' ultimately
     fails, then all other threads are already killed.

steal_locks() was probably added as a workaround for this case, no?

I wonder how many installations still run LinuxThreads with 2.6 kernels.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-21  9:44             ` Miklos Szeredi
@ 2006-03-21 17:28               ` Trond Myklebust
  2006-03-21 17:58                 ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Trond Myklebust @ 2006-03-21 17:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: matthew, linux-fsdevel, linux-kernel

On Tue, 2006-03-21 at 10:44 +0100, Miklos Szeredi wrote:
> > The _only_ sane way to solve it is to decree that you lose your locks if
> > you clone(CLONE_FILES) and then call exec(). If there are any
> > applications out there that rely on the current behaviour,
> 
> Apps using LinuxThreads seem to be candidates:
> 
>      According to POSIX 1003.1c, a successful `exec*' in one of the
>      threads should automatically terminate all other threads in the
>      program.  This behavior is not yet implemented in LinuxThreads.
>      Calling `pthread_kill_other_threads_np' before `exec*' achieves
>      much of the same behavior, except that if `exec*' ultimately
>      fails, then all other threads are already killed.
> 
> steal_locks() was probably added as a workaround for this case, no?

Possibly, but LinuxThreads were never really POSIX thread compliant
anyway. Anyhow, the problem isn't really LinuxThreads, it is rather that
the existence of the standalone CLONE_FILES flag allows you to do a lot
of weird inheritance crap with 'posix locks' that the POSIX standards
committees never even had to consider.

The fact remains, though, that steal_locks() is not a solution: it is
inherently broken for several of the most common filesystems that are
out there.

Cheers,
   Trond


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-21 17:28               ` Trond Myklebust
@ 2006-03-21 17:58                 ` Miklos Szeredi
  2006-03-21 18:16                   ` Trond Myklebust
  2006-03-21 19:16                   ` Chris Wright
  0 siblings, 2 replies; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-21 17:58 UTC (permalink / raw)
  To: trond.myklebust; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

> > Apps using LinuxThreads seem to be candidates:
> > 
> >      According to POSIX 1003.1c, a successful `exec*' in one of the
> >      threads should automatically terminate all other threads in the
> >      program.  This behavior is not yet implemented in LinuxThreads.
> >      Calling `pthread_kill_other_threads_np' before `exec*' achieves
> >      much of the same behavior, except that if `exec*' ultimately
> >      fails, then all other threads are already killed.
> > 
> > steal_locks() was probably added as a workaround for this case, no?
> 
> Possibly, but LinuxThreads were never really POSIX thread compliant
> anyway. Anyhow, the problem isn't really LinuxThreads, it is rather that
> the existence of the standalone CLONE_FILES flag allows you to do a lot
> of weird inheritance crap with 'posix locks' that the POSIX standards
> committees never even had to consider.

Yes.  The execve-with-multiple-threads/posix-locks interaction is not
documented for LinuxThreads but removing steal_locks() makes that
implementation slighly differently incompatible to POSIX.  Some
application _might_ be relying on the current behavior.

It's just a question of how much confidence do we have, that no app
will break if steal_locks() is removed.  This function was added by
Chris Wright on 2003-12-29 (Cset 1.1371.111.3):

  Add steal_locks helper for use in conjunction with unshare_files to
  make sure POSIX file lock semantics aren't broken due to
  unshare_files.

Chris, do you remember if this was due to some concrete breakage or
just a preemtive measure?

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-21 17:58                 ` Miklos Szeredi
@ 2006-03-21 18:16                   ` Trond Myklebust
  2006-03-21 19:16                   ` Chris Wright
  1 sibling, 0 replies; 26+ messages in thread
From: Trond Myklebust @ 2006-03-21 18:16 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

On Tue, 2006-03-21 at 18:58 +0100, Miklos Szeredi wrote:
> > > Apps using LinuxThreads seem to be candidates:
> > > 
> > >      According to POSIX 1003.1c, a successful `exec*' in one of the
> > >      threads should automatically terminate all other threads in the
> > >      program.  This behavior is not yet implemented in LinuxThreads.
> > >      Calling `pthread_kill_other_threads_np' before `exec*' achieves
> > >      much of the same behavior, except that if `exec*' ultimately
> > >      fails, then all other threads are already killed.
> > > 
> > > steal_locks() was probably added as a workaround for this case, no?
> > 
> > Possibly, but LinuxThreads were never really POSIX thread compliant
> > anyway. Anyhow, the problem isn't really LinuxThreads, it is rather that
> > the existence of the standalone CLONE_FILES flag allows you to do a lot
> > of weird inheritance crap with 'posix locks' that the POSIX standards
> > committees never even had to consider.
> 
> Yes.  The execve-with-multiple-threads/posix-locks interaction is not
> documented for LinuxThreads but removing steal_locks() makes that
> implementation slighly differently incompatible to POSIX.  Some
> application _might_ be relying on the current behavior.

That really doesn't matter: the current behaviour isn't ever going to
work for some of the most commonly used Linux filesystems out there.
IOW: Such an application isn't even going to be portable from one Linux
installation to another.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-21 17:58                 ` Miklos Szeredi
  2006-03-21 18:16                   ` Trond Myklebust
@ 2006-03-21 19:16                   ` Chris Wright
  2006-03-22  6:21                     ` Miklos Szeredi
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Wright @ 2006-03-21 19:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: trond.myklebust, chrisw, matthew, linux-fsdevel, linux-kernel

* Miklos Szeredi (miklos@szeredi.hu) wrote:
> > > Apps using LinuxThreads seem to be candidates:
> > > 
> > >      According to POSIX 1003.1c, a successful `exec*' in one of the
> > >      threads should automatically terminate all other threads in the
> > >      program.  This behavior is not yet implemented in LinuxThreads.
> > >      Calling `pthread_kill_other_threads_np' before `exec*' achieves
> > >      much of the same behavior, except that if `exec*' ultimately
> > >      fails, then all other threads are already killed.
> > > 
> > > steal_locks() was probably added as a workaround for this case, no?
> > 
> > Possibly, but LinuxThreads were never really POSIX thread compliant
> > anyway. Anyhow, the problem isn't really LinuxThreads, it is rather that
> > the existence of the standalone CLONE_FILES flag allows you to do a lot
> > of weird inheritance crap with 'posix locks' that the POSIX standards
> > committees never even had to consider.
> 
> Yes.  The execve-with-multiple-threads/posix-locks interaction is not
> documented for LinuxThreads but removing steal_locks() makes that
> implementation slighly differently incompatible to POSIX.  Some
> application _might_ be relying on the current behavior.
> 
> It's just a question of how much confidence do we have, that no app
> will break if steal_locks() is removed.  This function was added by
> Chris Wright on 2003-12-29 (Cset 1.1371.111.3):
> 
>   Add steal_locks helper for use in conjunction with unshare_files to
>   make sure POSIX file lock semantics aren't broken due to
>   unshare_files.
> 
> Chris, do you remember if this was due to some concrete breakage or
> just a preemtive measure?

Concrete breakage.  Something like:

clone(CLONE_FILES)
  /* in child */
  lock
  execve
  lock

w/out the kludge[1], the lock fails.  I should have a test program about
that I wrote to test this, although it was originally triggered via some
LTP or LSB type of test (don't recall which).

thanks,
-chris

[1] happy to see it go.  i concur with Trond, there's no sane way to get
rid of it w/out formalizing CLONE_FILES and locks on exec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-21 19:16                   ` Chris Wright
@ 2006-03-22  6:21                     ` Miklos Szeredi
  2006-03-22 11:12                       ` Trond Myklebust
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-22  6:21 UTC (permalink / raw)
  To: chrisw; +Cc: trond.myklebust, matthew, linux-fsdevel, linux-kernel

> Concrete breakage.  Something like:
> 
> clone(CLONE_FILES)
>   /* in child */
>   lock
>   execve
>   lock
> 
> w/out the kludge[1], the lock fails.  I should have a test program about
> that I wrote to test this, although it was originally triggered via some
> LTP or LSB type of test (don't recall which).
> 
> thanks,
> -chris
> 
> [1] happy to see it go.

We all agree on this then.

I'm just little paranoid about a real-world app (LTP/LSB don't matter)
relying on the current semantics.

But maybe there's no other way to find out, than to remove
steal_locks() and see if anybody complains.

> i concur with Trond, there's no sane way to get rid of it w/out
> formalizing CLONE_FILES and locks on exec

Probably there is.  It would involve allocating a separate
lock-owner-ID stored in files_struct but separate from it.  But it's
more complicated than simply not propagating locks on exec in the
CLONE_FILES case.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22  6:21                     ` Miklos Szeredi
@ 2006-03-22 11:12                       ` Trond Myklebust
  2006-03-22 12:16                         ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Trond Myklebust @ 2006-03-22 11:12 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

On Wed, 2006-03-22 at 07:21 +0100, Miklos Szeredi wrote:
> > i concur with Trond, there's no sane way to get rid of it w/out
> > formalizing CLONE_FILES and locks on exec
> 
> Probably there is.  It would involve allocating a separate
> lock-owner-ID stored in files_struct but separate from it.  But it's
> more complicated than simply not propagating locks on exec in the
> CLONE_FILES case.

That doesn't solve the fundamental problem.

You would still have to be able to tell a remote server that some locks
which previously belonged to one owner are being reallocated to several
owners. That is not something that NFS, CIFS, AFS,... support, and I'd
be surprised if any of the clustered filesystems support it either.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22 11:12                       ` Trond Myklebust
@ 2006-03-22 12:16                         ` Miklos Szeredi
  2006-03-22 15:56                           ` Trond Myklebust
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-22 12:16 UTC (permalink / raw)
  To: trond.myklebust; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

> > > i concur with Trond, there's no sane way to get rid of it w/out
> > > formalizing CLONE_FILES and locks on exec
> > 
> > Probably there is.  It would involve allocating a separate
> > lock-owner-ID stored in files_struct but separate from it.  But it's
> > more complicated than simply not propagating locks on exec in the
> > CLONE_FILES case.
> 
> That doesn't solve the fundamental problem.
> 
> You would still have to be able to tell a remote server that some locks
> which previously belonged to one owner are being reallocated to several
> owners.

No changing of lock owner is involved, that's the whole point.

Instead of trying to explain, here's a untested/unfinished patch
showing the idea.

Miklos

Index: linux/fs/locks.c
===================================================================
--- linux.orig/fs/locks.c	2006-03-22 11:20:15.000000000 +0100
+++ linux/fs/locks.c	2006-03-22 13:03:47.000000000 +0100
@@ -333,7 +333,7 @@ static int flock_to_posix_lock(struct fi
 	if (fl->fl_end < fl->fl_start)
 		return -EOVERFLOW;
 	
-	fl->fl_owner = current->files;
+	fl->fl_owner = current->files->lock_owner;
 	fl->fl_pid = current->tgid;
 	fl->fl_file = filp;
 	fl->fl_flags = FL_POSIX;
@@ -379,7 +379,7 @@ static int flock64_to_posix_lock(struct 
 	if (fl->fl_end < fl->fl_start)
 		return -EOVERFLOW;
 	
-	fl->fl_owner = current->files;
+	fl->fl_owner = current->files->lock_owner;
 	fl->fl_pid = current->tgid;
 	fl->fl_file = filp;
 	fl->fl_flags = FL_POSIX;
@@ -432,7 +432,7 @@ static struct lock_manager_operations le
  */
 static int lease_init(struct file *filp, int type, struct file_lock *fl)
  {
-	fl->fl_owner = current->files;
+	fl->fl_owner = current->files->lock_owner;
 	fl->fl_pid = current->tgid;
 
 	fl->fl_file = filp;
@@ -1003,7 +1003,7 @@ EXPORT_SYMBOL(posix_lock_file_wait);
  */
 int locks_mandatory_locked(struct inode *inode)
 {
-	fl_owner_t owner = current->files;
+	fl_owner_t owner = current->files->lock_owner;
 	struct file_lock *fl;
 
 	/*
@@ -1041,7 +1041,7 @@ int locks_mandatory_area(int read_write,
 	int error;
 
 	locks_init_lock(&fl);
-	fl.fl_owner = current->files;
+	fl.fl_owner = current->files->lock_owner;
 	fl.fl_pid = current->tgid;
 	fl.fl_file = filp;
 	fl.fl_flags = FL_POSIX | FL_ACCESS;
@@ -1141,7 +1141,7 @@ int __break_lease(struct inode *inode, u
 		goto out;
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
-		if (fl->fl_owner == current->files)
+		if (fl->fl_owner == current->files->lock_owner)
 			i_have_this_lease = 1;
 
 	if (mode & FMODE_WRITE) {
@@ -2183,55 +2183,23 @@ int lock_may_write(struct inode *inode, 
 
 EXPORT_SYMBOL(lock_may_write);
 
-static inline void __steal_locks(struct file *file, fl_owner_t from)
-{
-	struct inode *inode = file->f_dentry->d_inode;
-	struct file_lock *fl = inode->i_flock;
-
-	while (fl) {
-		if (fl->fl_file == file && fl->fl_owner == from)
-			fl->fl_owner = current->files;
-		fl = fl->fl_next;
-	}
-}
-
 /* When getting ready for executing a binary, we make sure that current
  * has a files_struct on its own. Before dropping the old files_struct,
  * we take over ownership of all locks for all file descriptors we own.
  * Note that we may accidentally steal a lock for a file that a sibling
  * has created since the unshare_files() call.
  */
-void steal_locks(fl_owner_t from)
+void steal_locks(struct files_struct *from)
 {
 	struct files_struct *files = current->files;
-	int i, j;
-	struct fdtable *fdt;
+	struct lock_owner *tmp;
 
 	if (from == files)
 		return;
 
-	lock_kernel();
-	j = 0;
-	rcu_read_lock();
-	fdt = files_fdtable(files);
-	for (;;) {
-		unsigned long set;
-		i = j * __NFDBITS;
-		if (i >= fdt->max_fdset || i >= fdt->max_fds)
-			break;
-		set = fdt->open_fds->fds_bits[j++];
-		while (set) {
-			if (set & 1) {
-				struct file *file = fdt->fd[i];
-				if (file)
-					__steal_locks(file, from);
-			}
-			i++;
-			set >>= 1;
-		}
-	}
-	rcu_read_unlock();
-	unlock_kernel();
+	tmp = from->lock_owner;
+	from->lock_owner = files->lock_owner;
+	files->lock_owner = tmp;
 }
 EXPORT_SYMBOL(steal_locks);
 
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2006-03-22 11:20:15.000000000 +0100
+++ linux/include/linux/fs.h	2006-03-22 12:59:09.000000000 +0100
@@ -678,7 +678,7 @@ extern spinlock_t files_lock;
  *
  * Lockd stuffs a "host" pointer into this.
  */
-typedef struct files_struct *fl_owner_t;
+typedef struct lock_owner *fl_owner_t;
 
 struct file_lock_operations {
 	void (*fl_insert)(struct file_lock *);	/* lock insertion callback */
@@ -767,7 +767,7 @@ extern int setlease(struct file *, long,
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void steal_locks(fl_owner_t from);
+extern void steal_locks(struct files_struct *from);
 
 struct fasync_struct {
 	int	magic;
Index: linux/fs/fcntl.c
===================================================================
--- linux.orig/fs/fcntl.c	2006-03-20 06:53:29.000000000 +0100
+++ linux/fs/fcntl.c	2006-03-22 13:02:36.000000000 +0100
@@ -177,7 +177,7 @@ asmlinkage long sys_dup2(unsigned int ol
 	spin_unlock(&files->file_lock);
 
 	if (tofree)
-		filp_close(tofree, files);
+		filp_close(tofree, files->lock_owner);
 	err = newfd;
 out:
 	return err;
Index: linux/fs/open.c
===================================================================
--- linux.orig/fs/open.c	2006-03-20 06:53:29.000000000 +0100
+++ linux/fs/open.c	2006-03-22 13:01:39.000000000 +0100
@@ -1159,7 +1159,7 @@ asmlinkage long sys_close(unsigned int f
 	FD_CLR(fd, fdt->close_on_exec);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
-	return filp_close(filp, files);
+	return filp_close(filp, files->lock_owner);
 
 out_unlock:
 	spin_unlock(&files->file_lock);
Index: linux/include/linux/file.h
===================================================================
--- linux.orig/include/linux/file.h	2006-03-22 11:20:15.000000000 +0100
+++ linux/include/linux/file.h	2006-03-22 12:28:21.000000000 +0100
@@ -29,6 +29,11 @@ struct fdtable {
 	struct fdtable *next;
 };
 
+struct lock_owner {
+	/* lock accounting will go in here */
+	int dummy;
+};
+
 /*
  * Open file table structure
  */
@@ -40,6 +45,7 @@ struct files_struct {
 	fd_set open_fds_init;
 	struct file * fd_array[NR_OPEN_DEFAULT];
 	spinlock_t file_lock;     /* Protects concurrent writers.  Nests inside tsk->alloc_lock */
+	struct lock_owner *lock_owner;
 };
 
 #define files_fdtable(files) (rcu_dereference((files)->fdt))
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c	2006-03-22 11:20:15.000000000 +0100
+++ linux/kernel/fork.c	2006-03-22 12:48:51.000000000 +0100
@@ -605,6 +605,7 @@ static struct files_struct *alloc_files(
 		goto out;
 
 	atomic_set(&newf->count, 1);
+	newf->lock_owner = kmalloc(sizeof(struct lock_owner), SLAB_KERNEL);
 
 	spin_lock_init(&newf->file_lock);
 	fdt = &newf->fdtab;
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c	2006-03-22 13:12:20.000000000 +0100
+++ linux/kernel/exit.c	2006-03-22 13:12:50.000000000 +0100
@@ -395,7 +395,7 @@ static void close_files(struct files_str
 			if (set & 1) {
 				struct file * file = xchg(&fdt->fd[i], NULL);
 				if (file)
-					filp_close(file, files);
+					filp_close(file, files->lock_owner);
 			}
 			i++;
 			set >>= 1;
@@ -422,6 +422,7 @@ void fastcall put_files_struct(struct fi
 
 	if (atomic_dec_and_test(&files->count)) {
 		close_files(files);
+		kfree(files->lock_owner);
 		/*
 		 * Free the fd and fdset arrays if we expanded them.
 		 * If the fdtable was embedded, pass files for freeing







^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22 12:16                         ` Miklos Szeredi
@ 2006-03-22 15:56                           ` Trond Myklebust
  2006-03-22 16:34                             ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Trond Myklebust @ 2006-03-22 15:56 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

On Wed, 2006-03-22 at 13:16 +0100, Miklos Szeredi wrote:
> > > > i concur with Trond, there's no sane way to get rid of it w/out
> > > > formalizing CLONE_FILES and locks on exec
> > > 
> > > Probably there is.  It would involve allocating a separate
> > > lock-owner-ID stored in files_struct but separate from it.  But it's
> > > more complicated than simply not propagating locks on exec in the
> > > CLONE_FILES case.
> > 
> > That doesn't solve the fundamental problem.
> > 
> > You would still have to be able to tell a remote server that some locks
> > which previously belonged to one owner are being reallocated to several
> > owners.
> 
> No changing of lock owner is involved, that's the whole point.

You still don't get it. For NFS/CIFS/... the locks on the server _also_
have a lock owner. The local lockowner is completely and utterly
irrelevant.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22 15:56                           ` Trond Myklebust
@ 2006-03-22 16:34                             ` Miklos Szeredi
  2006-03-22 20:07                               ` Trond Myklebust
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-22 16:34 UTC (permalink / raw)
  To: trond.myklebust; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

> > > > > i concur with Trond, there's no sane way to get rid of it w/out
> > > > > formalizing CLONE_FILES and locks on exec
> > > > 
> > > > Probably there is.  It would involve allocating a separate
> > > > lock-owner-ID stored in files_struct but separate from it.  But it's
> > > > more complicated than simply not propagating locks on exec in the
> > > > CLONE_FILES case.
> > > 
> > > That doesn't solve the fundamental problem.
> > > 
> > > You would still have to be able to tell a remote server that some locks
> > > which previously belonged to one owner are being reallocated to several
> > > owners.
> > 
> > No changing of lock owner is involved, that's the whole point.
> 
> You still don't get it.

No!  You don't get it! ;)

> For NFS/CIFS/... the locks on the server _also_ have a lock
> owner. The local lockowner is completely and utterly irrelevant.

You mean the "local lockowner being stable" is irrelevant.

Yes that is true, but the patch not only makes the local lockowner
stable, it makes the "owner" stable.  And that is the important part
for NFS, etc.

The remote lockowner has to be derived from the owner, which used to
be current->files, but is changed to current->file->owner.

The fact that current->file->owner will remain stable across the exec
will mean that locking will behave consistently for local _and_ remote
filesystems.

Now I'm not saying I want to keep this weird semantics of always
inheriting locks on exec.  All I'm saying that it's _possible_.

Miklos


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22 16:34                             ` Miklos Szeredi
@ 2006-03-22 20:07                               ` Trond Myklebust
  2006-03-22 20:19                                 ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Trond Myklebust @ 2006-03-22 20:07 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

On Wed, 2006-03-22 at 17:34 +0100, Miklos Szeredi wrote:

> You mean the "local lockowner being stable" is irrelevant.
> 
> Yes that is true, but the patch not only makes the local lockowner
> stable, it makes the "owner" stable.  And that is the important part
> for NFS, etc.
> 
> The remote lockowner has to be derived from the owner, which used to
> be current->files, but is changed to current->file->owner.
> 
> The fact that current->file->owner will remain stable across the exec
> will mean that locking will behave consistently for local _and_ remote
> filesystems.
> 
> Now I'm not saying I want to keep this weird semantics of always
> inheriting locks on exec.  All I'm saying that it's _possible_.

You'd have to ensure that none of the threads involved are able to grab
new posix locks in the period between the unsharing of current->files to
the moment when current->files->owner is swapped.

If not, one thread could in theory open a new file and grab a lock that
can never be unlocked because its lockowner gets stolen away from it by
another execing thread.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: DoS with POSIX file locks?
  2006-03-22 20:07                               ` Trond Myklebust
@ 2006-03-22 20:19                                 ` Miklos Szeredi
  0 siblings, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2006-03-22 20:19 UTC (permalink / raw)
  To: trond.myklebust; +Cc: chrisw, matthew, linux-fsdevel, linux-kernel

> You'd have to ensure that none of the threads involved are able to grab
> new posix locks in the period between the unsharing of current->files to
> the moment when current->files->owner is swapped.
> 
> If not, one thread could in theory open a new file and grab a lock that
> can never be unlocked because its lockowner gets stolen away from it by
> another execing thread.

This race is already there.  Header comment on steal_locks() documents
it.

The patch does open this race window much wider, because pending locks
are also transfered to the task doing the exec.  The original
steal_locks() only stole already held locks.  But I don't think this
fundamentaly changes things.  It just shows more clearly how ugly the
current semantics are.

Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2006-03-22 20:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-20 11:41 DoS with POSIX file locks? Miklos Szeredi
2006-03-20 12:11 ` Matthew Wilcox
2006-03-20 12:19   ` Miklos Szeredi
2006-03-20 12:39     ` Matthew Wilcox
2006-03-20 12:52       ` Miklos Szeredi
2006-03-20 13:13         ` Arjan van de Ven
2006-03-20 13:24           ` Miklos Szeredi
2006-03-20 13:30             ` Arjan van de Ven
2006-03-20 13:39               ` Miklos Szeredi
2006-03-20 15:32         ` Matthew Wilcox
2006-03-20 16:41           ` Miklos Szeredi
2006-03-20 20:35             ` J. Bruce Fields
2006-03-21  6:38               ` Miklos Szeredi
2006-03-20 18:22           ` Trond Myklebust
2006-03-21  9:44             ` Miklos Szeredi
2006-03-21 17:28               ` Trond Myklebust
2006-03-21 17:58                 ` Miklos Szeredi
2006-03-21 18:16                   ` Trond Myklebust
2006-03-21 19:16                   ` Chris Wright
2006-03-22  6:21                     ` Miklos Szeredi
2006-03-22 11:12                       ` Trond Myklebust
2006-03-22 12:16                         ` Miklos Szeredi
2006-03-22 15:56                           ` Trond Myklebust
2006-03-22 16:34                             ` Miklos Szeredi
2006-03-22 20:07                               ` Trond Myklebust
2006-03-22 20:19                                 ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).