linux-kernel.vger.kernel.org archive mirror
* [PATCH] af_unix: Fix splice-bind deadlock
@ 2015-12-27 20:13 Rainer Weikusat
  2015-12-29 10:58 ` Hannes Frederic Sowa
  2016-01-03 18:04 ` Rainer Weikusat
  0 siblings, 2 replies; 9+ messages in thread
From: Rainer Weikusat @ 2015-12-27 20:13 UTC (permalink / raw)
  To: David Miller; +Cc: dvyukov, netdev, linux-kernel, viro

On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets, 

http://lists.openwall.net/netdev/2015/11/06/24

The situation was analyzed as

(a while ago) A: socketpair()
B: splice() from a pipe to /mnt/regular_file
	does sb_start_write() on /mnt
C: try to freeze /mnt
	wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
	lock our socket, see it not bound yet
	decide that it needs to create something in /mnt
	try to do sb_start_write() on /mnt, block (it's
	waiting for C).
D: splice() from the same pipe to our socket
	lock the pipe, see that socket is connected
	try to lock the socket, block waiting for A
B:	get around to actually feeding a chunk from
	pipe to file, try to lock the pipe.  Deadlock.

on 2015/11/10 by Al Viro,

http://lists.openwall.net/netdev/2015/11/10/4

The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior to acquiring the
readlock of the socket in question. This means that A (as used above)
will sb_start_write on /mnt before it acquires the readlock, hence, it
won't indirectly block B which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, both A and B acquire two locks in opposite order in the
situation described above).
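
The opposite-order acquisition mentioned above can be modelled in a
few lines of userspace C. This is only an illustrative sketch, not
kernel code: the lock labels, the helper names and the three sequences
are assumptions made up for the example, with B's sequence standing
for its effective order (sb_start_write, then the pipe lock whose
holder D waits for the readlock).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the locks in the scenario above. */
enum lock_id { SB_WRITE, PIPE_LOCK, READLOCK, NLOCKS };

/* Position of 'lock' in an acquisition sequence, or -1 if absent. */
static int position(const enum lock_id *seq, size_t n, enum lock_id lock)
{
	for (size_t i = 0; i < n; i++)
		if (seq[i] == lock)
			return (int)i;
	return -1;
}

/*
 * Return 1 if the two paths take some pair of locks in opposite
 * order (the precondition for an ABBA-style deadlock), else 0.
 */
static int order_conflict(const enum lock_id *a, size_t na,
			  const enum lock_id *b, size_t nb)
{
	assert(a && b);
	for (int x = 0; x < NLOCKS; x++) {
		for (int y = 0; y < NLOCKS; y++) {
			int ax = position(a, na, (enum lock_id)x);
			int ay = position(a, na, (enum lock_id)y);
			int bx = position(b, nb, (enum lock_id)x);
			int by = position(b, nb, (enum lock_id)y);

			if (ax < 0 || ay < 0 || bx < 0 || by < 0)
				continue;
			if (ax < ay && by < bx)
				return 1;
		}
	}
	return 0;
}

/* bind() before the patch: readlock first, then sb_start_write(). */
static const enum lock_id old_bind[] = { READLOCK, SB_WRITE };
/* bind() after the patch: sb_start_write() first, then readlock. */
static const enum lock_id new_bind[] = { SB_WRITE, READLOCK };
/* B's effective order, via the pipe lock held by D. */
static const enum lock_id splice_to_file[] = { SB_WRITE, PIPE_LOCK, READLOCK };
```

Under these assumptions, order_conflict(old_bind, 2, splice_to_file, 3)
reports a conflict while order_conflict(new_bind, 2, splice_to_file, 3)
does not.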

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
---

I also think this is a better (or at least more correct) solution than
the pretty obvious idea to record that the socket is in the process of
being bound and perform the mknod without the lock. Assuming the
first bind fails with -EADDRINUSE, a concurrent bind which might have
succeeded had it waited for the ultimate outcome of the first will
meanwhile have failed with -EINVAL even though the socket will end up
unbound.
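
To illustrate the problem with that alternative, here is a small
sequential model in userspace C. It is a sketch of the rejected idea
only; the state names, struct sock_model, bind_begin and bind_finish
are hypothetical and do not exist in af_unix.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical socket states for the "binding in progress" design. */
enum bind_state { SOCK_UNBOUND, SOCK_BINDING, SOCK_BOUND };

struct sock_model {
	enum bind_state state;
};

/*
 * Would run under the socket lock: claim the socket for binding, or
 * fail with -EINVAL if a bind is already done or in progress.
 */
static int bind_begin(struct sock_model *s)
{
	assert(s);
	if (s->state != SOCK_UNBOUND)
		return -EINVAL;
	s->state = SOCK_BINDING;
	return 0;
}

/*
 * Would run after the mknod was performed without the lock: commit
 * on success, fall back to unbound on failure.
 */
static int bind_finish(struct sock_model *s, int mknod_err)
{
	assert(s && s->state == SOCK_BINDING);
	s->state = mknod_err ? SOCK_UNBOUND : SOCK_BOUND;
	return mknod_err;
}
```

If a second bind_begin is attempted between the first bind_begin and a
bind_finish(&s, -EADDRINUSE), it gets -EINVAL although the socket is
unbound again afterwards, which is the outcome described above.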

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b1314c0..9b3d268 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -953,32 +953,20 @@ fail:
 	return NULL;
 }
 
-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
+		      struct path *res)
 {
-	struct dentry *dentry;
-	struct path path;
-	int err = 0;
-	/*
-	 * Get the parent directory, calculate the hash for last
-	 * component.
-	 */
-	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
-	err = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
-		return err;
+	int err;
 
-	/*
-	 * All right, let's create it.
-	 */
-	err = security_path_mknod(&path, dentry, mode, 0);
+	err = security_path_mknod(path, dentry, mode, 0);
 	if (!err) {
-		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
 		if (!err) {
-			res->mnt = mntget(path.mnt);
+			res->mnt = mntget(path->mnt);
 			res->dentry = dget(dentry);
 		}
 	}
-	done_path_create(&path, dentry);
+
 	return err;
 }
 
@@ -993,6 +981,8 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	unsigned int hash;
 	struct unix_address *addr;
 	struct hlist_head *list;
+	struct path path;
+	struct dentry *dentry;
 
 	err = -EINVAL;
 	if (sunaddr->sun_family != AF_UNIX)
@@ -1008,9 +998,21 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	addr_len = err;
 
+	dentry = NULL;
+	if (sun_path[0]) {
+		/* Get the parent directory, calculate the hash for last
+		 * component.
+		 */
+		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
+
+		err = PTR_ERR(dentry);
+		if (IS_ERR(dentry))
+			goto out;
+	}
+
 	err = mutex_lock_interruptible(&u->readlock);
 	if (err)
-		goto out;
+		goto out_path;
 
 	err = -EINVAL;
 	if (u->addr)
@@ -1026,11 +1028,11 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	addr->hash = hash ^ sk->sk_type;
 	atomic_set(&addr->refcnt, 1);
 
-	if (sun_path[0]) {
-		struct path path;
+	if (dentry) {
+		struct path u_path;
 		umode_t mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current_umask());
-		err = unix_mknod(sun_path, mode, &path);
+		err = unix_mknod(dentry, &path, mode, &u_path);
 		if (err) {
 			if (err == -EEXIST)
 				err = -EADDRINUSE;
@@ -1038,9 +1040,9 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out_up;
 		}
 		addr->hash = UNIX_HASH_SIZE;
-		hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE-1);
+		hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
 		spin_lock(&unix_table_lock);
-		u->path = path;
+		u->path = u_path;
 		list = &unix_socket_table[hash];
 	} else {
 		spin_lock(&unix_table_lock);
@@ -1063,6 +1065,10 @@ out_unlock:
 	spin_unlock(&unix_table_lock);
 out_up:
 	mutex_unlock(&u->readlock);
+out_path:
+	if (dentry)
+		done_path_create(&path, dentry);
+
 out:
 	return err;
 }


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2015-12-27 20:13 [PATCH] af_unix: Fix splice-bind deadlock Rainer Weikusat
@ 2015-12-29 10:58 ` Hannes Frederic Sowa
  2015-12-31 19:36   ` Rainer Weikusat
  2016-01-03 18:04 ` Rainer Weikusat
  1 sibling, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-29 10:58 UTC (permalink / raw)
  To: Rainer Weikusat, David Miller; +Cc: dvyukov, netdev, linux-kernel, viro

Hello Rainer,

On 27.12.2015 21:13, Rainer Weikusat wrote:
> -static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
> +static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
> +		      struct path *res)
>   {
> -	struct dentry *dentry;
> -	struct path path;
> -	int err = 0;
> -	/*
> -	 * Get the parent directory, calculate the hash for last
> -	 * component.
> -	 */
> -	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
> -	err = PTR_ERR(dentry);
> -	if (IS_ERR(dentry))
> -		return err;
> +	int err;
>
> -	/*
> -	 * All right, let's create it.
> -	 */
> -	err = security_path_mknod(&path, dentry, mode, 0);
> +	err = security_path_mknod(path, dentry, mode, 0);
>   	if (!err) {
> -		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
> +		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
>   		if (!err) {
> -			res->mnt = mntget(path.mnt);
> +			res->mnt = mntget(path->mnt);
>   			res->dentry = dget(dentry);
>   		}
>   	}
> -	done_path_create(&path, dentry);
> +

The reordered call to done_path_create will change the lock ordering
between the i_mutexes and the unix readlock. Can you comment on this? At
first sight this looks like a much more dangerous change than the
original deadlock report. Can't this also conflict with splice code deep
down in the vfs layer?

Thanks,
   Hannes



* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2015-12-29 10:58 ` Hannes Frederic Sowa
@ 2015-12-31 19:36   ` Rainer Weikusat
  2016-01-03 18:03     ` Rainer Weikusat
  0 siblings, 1 reply; 9+ messages in thread
From: Rainer Weikusat @ 2015-12-31 19:36 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Rainer Weikusat, David Miller, dvyukov, netdev, linux-kernel, viro

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On 27.12.2015 21:13, Rainer Weikusat wrote:
>> -static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
>> +static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
>> +		      struct path *res)
>>   {
>> -	struct dentry *dentry;
>> -	struct path path;
>> -	int err = 0;
>> -	/*
>> -	 * Get the parent directory, calculate the hash for last
>> -	 * component.
>> -	 */
>> -	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
>> -	err = PTR_ERR(dentry);
>> -	if (IS_ERR(dentry))
>> -		return err;
>> +	int err;
>>
>> -	/*
>> -	 * All right, let's create it.
>> -	 */
>> -	err = security_path_mknod(&path, dentry, mode, 0);
>> +	err = security_path_mknod(path, dentry, mode, 0);
>>   	if (!err) {
>> -		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
>> +		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
>>   		if (!err) {
>> -			res->mnt = mntget(path.mnt);
>> +			res->mnt = mntget(path->mnt);
>>   			res->dentry = dget(dentry);
>>   		}
>>   	}
>> -	done_path_create(&path, dentry);
>> +
>
> The reordered call to done_path_create will change the locking
> ordering between the i_mutexes and the unix readlock. Can you comment
> on this? On a first sight this looks like a much more dangerous change
> than the original deadlock report. Can't this also conflict with
> splice code deep down in vfs layer?

Practical consideration
-----------------------

kern_path_create acquires the i_mutex of the parent directory of the
to-be-created directory entry (via filename_create/ namei.c), as
required for reading a directory or creating a new entry in a directory
(as per Documentation/filesystems/directory-locking). A deadlock was
possible here if the thread doing the bind then blocked when trying to
acquire the readlock while the thread holding the readlock is blocked on
another lock held by a thread trying to perform an operation on the same
directory as the bind (possibly with some indirection). The only 'other
lock' which could come into play here is the pipe lock of a pipe
partaking in a splice_to_pipe from the same AF_UNIX socket. But the idea
that some thread would need to take a pipe lock prior to performing a
directory operation is quite odd (splice_from_pipe_to_directory?
openatparentoffifo?). I've also checked all existing users
of pipe_lock and didn't find one performing a directory
operation.


Theoretical consideration
-------------------------

NB: The text below represents my opinion on this after spending a few
days thinking about it (on and off, of course). Making an argument for
the opposite position is also possible.

The filesystem (namespace) is a shared namespace accessible to all
currently running threads/processes. Whoever uses the filesystem may
have to wait for other filesystem users but threads not using it
shouldn't have to. Because of this and because the filesystem is a
pretty central facility, an operation needing 'some filesystem lock' and
also some other lock (or locks) should always acquire the filesystem
ones before any more specialized locks (as do_splice does when splicing
to a file). If 'filesystem locks' are always acquired first, there's
also no risk of a deadlock because code holding a filesystem lock is
blocked on a more specialized lock (eg, a pipe lock or the readlock
mutex) while some other thread holding the/a more specialized lock wants
the already held filesystem lock.
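
As an illustration of the "filesystem locks first" rule, a rank-based
order check could look roughly like the userspace sketch below. The
ranks, struct lock_tracker and both helpers are hypothetical; in
particular, putting the pipe lock before the socket lock among the
specialized locks is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical ranks: a lower rank must be acquired first. */
enum lock_rank { RANK_FS = 0, RANK_PIPE = 1, RANK_SOCKET = 2 };

#define MAX_HELD 8

/* Per-thread record of the ranks currently held, innermost last. */
struct lock_tracker {
	int depth;
	enum lock_rank held[MAX_HELD];
};

/* Return 0 on success, -1 if taking 'rank' now would break the order. */
static int tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	assert(t && t->depth >= 0 && t->depth < MAX_HELD);
	if (t->depth > 0 && t->held[t->depth - 1] > rank)
		return -1;
	t->held[t->depth++] = rank;
	return 0;
}

/* Drop the most recently acquired lock. */
static void tracker_release(struct lock_tracker *t)
{
	assert(t && t->depth > 0);
	t->depth--;
}
```

With this model, acquiring RANK_FS and then RANK_SOCKET succeeds,
while trying to acquire RANK_FS with RANK_SOCKET already held is
rejected.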


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2015-12-31 19:36   ` Rainer Weikusat
@ 2016-01-03 18:03     ` Rainer Weikusat
  2016-01-04 23:25       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 9+ messages in thread
From: Rainer Weikusat @ 2016-01-03 18:03 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Rainer Weikusat, David Miller, dvyukov, netdev, linux-kernel, viro

Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:

> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>> On 27.12.2015 21:13, Rainer Weikusat wrote:
>>> -static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
>>> +static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
>>> +		      struct path *res)
>>>   {
>>> -	struct dentry *dentry;
>>> -	struct path path;
>>> -	int err = 0;
>>> -	/*
>>> -	 * Get the parent directory, calculate the hash for last
>>> -	 * component.
>>> -	 */
>>> -	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
>>> -	err = PTR_ERR(dentry);
>>> -	if (IS_ERR(dentry))
>>> -		return err;
>>> +	int err;
>>>
>>> -	/*
>>> -	 * All right, let's create it.
>>> -	 */
>>> -	err = security_path_mknod(&path, dentry, mode, 0);
>>> +	err = security_path_mknod(path, dentry, mode, 0);
>>>   	if (!err) {
>>> -		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
>>> +		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
>>>   		if (!err) {
>>> -			res->mnt = mntget(path.mnt);
>>> +			res->mnt = mntget(path->mnt);
>>>   			res->dentry = dget(dentry);
>>>   		}
>>>   	}
>>> -	done_path_create(&path, dentry);
>>> +
>>
>> The reordered call to done_path_create will change the locking
>> ordering between the i_mutexes and the unix readlock. Can you comment
>> on this? On a first sight this looks like a much more dangerous change
>> than the original deadlock report. Can't this also conflict with
>> splice code deep down in vfs layer?
>
> Practical consideration

[...]

> A deadlock was possible here if the thread doing the bind then blocked
> when trying to acquire the readlock while the thread holding the
> readlock is blocked on another lock held by a thread trying to perform
> an operation on the same directory as the bind (possibly with some
> indirection).

Since this was probably pretty much a "write only" sentence, I think I
should try this again (with apologies in case I now err on the other
side and rather explain too much --- my abilities to express myself such
that people understand what I mean to express instead of just getting
mad at me are not great).

For a deadlock to happen here, there needs to be a cycle (circle?) of
threads each holding one lock and blocking while trying to acquire
another lock which ultimately ends with a thread trying to acquire the
i_mutex of the directory where the socket name is to be created. The
binding thread would need to block when trying to acquire the
readlock. But (contrary to what I originally wrote[*]) this cannot happen
because the af_unix code doesn't lock anything non-socket related while
holding the readlock. The only instance of that was in _bind and caused
the deadlock.

[*] I misread

static ssize_t skb_unix_socket_splice(struct sock *sk,
				      struct pipe_inode_info *pipe,
				      struct splice_pipe_desc *spd)
{
	int ret;
	struct unix_sock *u = unix_sk(sk);

	mutex_unlock(&u->readlock);
	ret = splice_to_pipe(pipe, spd);
	mutex_lock(&u->readlock);

	return ret;
}

as 'lock followed by unlock' instead of 'unlock followed by lock'.
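
The cycle condition described above can be checked mechanically on a
wait-for graph. The following is a minimal userspace sketch, assuming
each thread waits for at most one other thread; has_cycle and NO_WAIT
are names invented for this example.

```c
#include <assert.h>

#define NO_WAIT (-1)

/*
 * waits_for[i] is the index of the thread that thread i is blocked
 * on, or NO_WAIT. Every entry must be NO_WAIT or in [0, n).
 * Return 1 if some chain of waits never terminates, else 0.
 */
static int has_cycle(const int *waits_for, int n)
{
	assert(waits_for && n >= 0);
	for (int start = 0; start < n; start++) {
		int cur = start;

		/* A chain of n steps over n threads must revisit one. */
		for (int steps = 0; steps < n && cur != NO_WAIT; steps++)
			cur = waits_for[cur];
		if (cur != NO_WAIT)
			return 1;
	}
	return 0;
}
```

Numbering the threads of the original report as A=0, B=1, C=2, D=3,
the reported deadlock is { 2, 3, 1, 0 } (A waits for C, B for D, C for
B, D for A), which has_cycle flags. If D no longer waits for A and B
therefore no longer waits for D, ie { 2, NO_WAIT, 1, NO_WAIT }, no
cycle remains.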


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2015-12-27 20:13 [PATCH] af_unix: Fix splice-bind deadlock Rainer Weikusat
  2015-12-29 10:58 ` Hannes Frederic Sowa
@ 2016-01-03 18:04 ` Rainer Weikusat
  2016-01-03 18:56   ` Rainer Weikusat
  1 sibling, 1 reply; 9+ messages in thread
From: Rainer Weikusat @ 2016-01-03 18:04 UTC (permalink / raw)
  To: David Miller; +Cc: dvyukov, netdev, linux-kernel, viro

Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:

[...]

> +	dentry = NULL;
> +	if (sun_path[0]) {
> +		/* Get the parent directory, calculate the hash for last
> +		 * component.
> +		 */
> +		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
> +
> +		err = PTR_ERR(dentry);
> +		if (IS_ERR(dentry))
> +			goto out;
> +	}
> +

This is wrong because kern_path_create can return with -EEXIST which
needs to be translated to -EADDRINUSE for this case. I'll fix that.


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2016-01-03 18:04 ` Rainer Weikusat
@ 2016-01-03 18:56   ` Rainer Weikusat
  2016-01-05  4:23     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Rainer Weikusat @ 2016-01-03 18:56 UTC (permalink / raw)
  To: David Miller; +Cc: dvyukov, netdev, linux-kernel, viro

On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
system call and AF_UNIX sockets, 

http://lists.openwall.net/netdev/2015/11/06/24

The situation was analyzed as

(a while ago) A: socketpair()
B: splice() from a pipe to /mnt/regular_file
	does sb_start_write() on /mnt
C: try to freeze /mnt
	wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
	lock our socket, see it not bound yet
	decide that it needs to create something in /mnt
	try to do sb_start_write() on /mnt, block (it's
	waiting for C).
D: splice() from the same pipe to our socket
	lock the pipe, see that socket is connected
	try to lock the socket, block waiting for A
B:	get around to actually feeding a chunk from
	pipe to file, try to lock the pipe.  Deadlock.

on 2015/11/10 by Al Viro,

http://lists.openwall.net/netdev/2015/11/10/4

The patch fixes this by removing the kern_path_create related code from
unix_mknod and executing it as part of unix_bind prior to acquiring the
readlock of the socket in question. This means that A (as used above)
will sb_start_write on /mnt before it acquires the readlock, hence, it
won't indirectly block B which first did a sb_start_write and then
waited for a thread trying to acquire the readlock. Consequently, A
being blocked by C waiting for B won't cause a deadlock anymore
(effectively, both A and B acquire two locks in opposite order in the
situation described above).

Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---

This fixes two 'wrong' error returns: it returns -EADDRINUSE if
kern_path_create returned -EEXIST, and it delays returning an error from
kern_path_create until after the u->addr check, as the -EINVAL should IMO
take precedence here.
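
The intended precedence can be summarized as a small pure function.
This is a userspace sketch for illustration only; bind_result and its
parameters are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/*
 * already_bound: non-zero if the socket already has an address.
 * name_err: 0 or the negative errno from the earlier name lookup.
 * Return the error bind should report, or 0 to carry on.
 */
static int bind_result(int already_bound, int name_err)
{
	assert(name_err <= 0);

	/* 'Already bound' wins over any name-related error. */
	if (already_bound)
		return -EINVAL;

	/* An existing name is reported as an address in use. */
	if (name_err)
		return name_err == -EEXIST ? -EADDRINUSE : name_err;

	return 0;
}
```

For example, bind_result(1, -EEXIST) yields -EINVAL, while
bind_result(0, -EEXIST) yields -EADDRINUSE.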

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b1314c0..e6d3556 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -953,32 +953,20 @@ fail:
 	return NULL;
 }
 
-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
+		      struct path *res)
 {
-	struct dentry *dentry;
-	struct path path;
-	int err = 0;
-	/*
-	 * Get the parent directory, calculate the hash for last
-	 * component.
-	 */
-	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
-	err = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
-		return err;
+	int err;
 
-	/*
-	 * All right, let's create it.
-	 */
-	err = security_path_mknod(&path, dentry, mode, 0);
+	err = security_path_mknod(path, dentry, mode, 0);
 	if (!err) {
-		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
 		if (!err) {
-			res->mnt = mntget(path.mnt);
+			res->mnt = mntget(path->mnt);
 			res->dentry = dget(dentry);
 		}
 	}
-	done_path_create(&path, dentry);
+
 	return err;
 }
 
@@ -989,10 +977,12 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	struct unix_sock *u = unix_sk(sk);
 	struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
 	char *sun_path = sunaddr->sun_path;
-	int err;
+	int err, name_err;
 	unsigned int hash;
 	struct unix_address *addr;
 	struct hlist_head *list;
+	struct path path;
+	struct dentry *dentry;
 
 	err = -EINVAL;
 	if (sunaddr->sun_family != AF_UNIX)
@@ -1008,14 +998,34 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	addr_len = err;
 
+	name_err = 0;
+	dentry = NULL;
+	if (sun_path[0]) {
+		/* Get the parent directory, calculate the hash for last
+		 * component.
+		 */
+		dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
+
+		if (IS_ERR(dentry)) {
+			/* delay report until after 'already bound' check */
+			name_err = PTR_ERR(dentry);
+			dentry = NULL;
+		}
+	}
+
 	err = mutex_lock_interruptible(&u->readlock);
 	if (err)
-		goto out;
+		goto out_path;
 
 	err = -EINVAL;
 	if (u->addr)
 		goto out_up;
 
+	if (name_err) {
+		err = name_err == -EEXIST ? -EADDRINUSE : name_err;
+		goto out_up;
+	}
+
 	err = -ENOMEM;
 	addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
 	if (!addr)
@@ -1026,11 +1036,11 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	addr->hash = hash ^ sk->sk_type;
 	atomic_set(&addr->refcnt, 1);
 
-	if (sun_path[0]) {
-		struct path path;
+	if (dentry) {
+		struct path u_path;
 		umode_t mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current_umask());
-		err = unix_mknod(sun_path, mode, &path);
+		err = unix_mknod(dentry, &path, mode, &u_path);
 		if (err) {
 			if (err == -EEXIST)
 				err = -EADDRINUSE;
@@ -1038,9 +1048,9 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out_up;
 		}
 		addr->hash = UNIX_HASH_SIZE;
-		hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE-1);
+		hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
 		spin_lock(&unix_table_lock);
-		u->path = path;
+		u->path = u_path;
 		list = &unix_socket_table[hash];
 	} else {
 		spin_lock(&unix_table_lock);
@@ -1063,6 +1073,10 @@ out_unlock:
 	spin_unlock(&unix_table_lock);
 out_up:
 	mutex_unlock(&u->readlock);
+out_path:
+	if (dentry)
+		done_path_create(&path, dentry);
+
 out:
 	return err;
 }


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2016-01-03 18:03     ` Rainer Weikusat
@ 2016-01-04 23:25       ` Hannes Frederic Sowa
  2016-01-06 14:45         ` Rainer Weikusat
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2016-01-04 23:25 UTC (permalink / raw)
  To: Rainer Weikusat; +Cc: David Miller, dvyukov, netdev, linux-kernel, viro

Hello,

On Sun, Jan 3, 2016, at 19:03, Rainer Weikusat wrote:
> Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:
> 
> > Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> >> On 27.12.2015 21:13, Rainer Weikusat wrote:
> >>> -static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
> >>> +static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
> >>> +		      struct path *res)
> >>>   {
> >>> -	struct dentry *dentry;
> >>> -	struct path path;
> >>> -	int err = 0;
> >>> -	/*
> >>> -	 * Get the parent directory, calculate the hash for last
> >>> -	 * component.
> >>> -	 */
> >>> -	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
> >>> -	err = PTR_ERR(dentry);
> >>> -	if (IS_ERR(dentry))
> >>> -		return err;
> >>> +	int err;
> >>>
> >>> -	/*
> >>> -	 * All right, let's create it.
> >>> -	 */
> >>> -	err = security_path_mknod(&path, dentry, mode, 0);
> >>> +	err = security_path_mknod(path, dentry, mode, 0);
> >>>   	if (!err) {
> >>> -		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
> >>> +		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
> >>>   		if (!err) {
> >>> -			res->mnt = mntget(path.mnt);
> >>> +			res->mnt = mntget(path->mnt);
> >>>   			res->dentry = dget(dentry);
> >>>   		}
> >>>   	}
> >>> -	done_path_create(&path, dentry);
> >>> +
> >>
> >> The reordered call to done_path_create will change the locking
> >> ordering between the i_mutexes and the unix readlock. Can you comment
> >> on this? On a first sight this looks like a much more dangerous change
> >> than the original deadlock report. Can't this also conflict with
> >> splice code deep down in vfs layer?
> >
> > Practical consideration
> 
> [...]
> 
> > A deadlock was possible here if the thread doing the bind then blocked
> > when trying to acquire the readlock while the thread holding the
> > readlock is blocked on another lock held by a thread trying to perform
> > an operation on the same directory as the bind (possibly with some
> > indirection).
> 
> Since this was probably pretty much a "write only" sentence, I think I
> should try this again (with apologies in case a now err on the other
> side and rather explain to much --- my abilities to express myself such
> that people understand what I mean to express instead of just getting
> mad at me are not great).
> 
> For a deadlock to happen here, there needs to be a cycle (circle?) of
> threads each holding one lock and blocking while trying to acquire
> another lock which ultimatively ends with a thread trying to acquire the
> i_mutex of the directory where the socket name is to be created. The
> binding thread would need to block when trying to acquire the
> readlock. But (contrary to what I originally wrote[*]) this cannot happen
> because the af_unix code doesn't lock anything non-socket related while
> holding the readlock. The only instance of that was in _bind and caused
> the deadlock.
> 
> [*] I misread
> 
> static ssize_t skb_unix_socket_splice(struct sock *sk,
> 				      struct pipe_inode_info *pipe,
> 				      struct splice_pipe_desc *spd)
> {
> 	int ret;
> 	struct unix_sock *u = unix_sk(sk);
> 
> 	mutex_unlock(&u->readlock);
> 	ret = splice_to_pipe(pipe, spd);
> 	mutex_lock(&u->readlock);
> 
> 	return ret;
> }
> 
> as 'lock followed by unlock' instead of 'unlock followed by lock'.

I agree with your arguments but haven't finished researching enough of
how i_mutex is handled in all regards. I will have to do further
research on this.

I was concerned because of the comment in skb_socket_splice:

        /* Drop the socket lock, otherwise we have reverse
         * locking dependencies between sk_lock and i_mutex
         * here as compared to sendfile(). We enter here
         * with the socket lock held, and splice_to_pipe() will
         * grab the pipe inode lock. For sendfile() emulation,
         * we call into ->sendpage() with the i_mutex lock held
         * and networking will grab the socket lock.
         */

Thanks,
Hannes


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2016-01-03 18:56   ` Rainer Weikusat
@ 2016-01-05  4:23     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2016-01-05  4:23 UTC (permalink / raw)
  To: rweikusat; +Cc: dvyukov, netdev, linux-kernel, viro

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Sun, 03 Jan 2016 18:56:38 +0000

> On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
> system call and AF_UNIX sockets, 
> 
> http://lists.openwall.net/netdev/2015/11/06/24
> 
> The situation was analyzed as
> 
> (a while ago) A: socketpair()
> B: splice() from a pipe to /mnt/regular_file
> 	does sb_start_write() on /mnt
> C: try to freeze /mnt
> 	wait for B to finish with /mnt
> A: bind() try to bind our socket to /mnt/new_socket_name
> 	lock our socket, see it not bound yet
> 	decide that it needs to create something in /mnt
> 	try to do sb_start_write() on /mnt, block (it's
> 	waiting for C).
> D: splice() from the same pipe to our socket
> 	lock the pipe, see that socket is connected
> 	try to lock the socket, block waiting for A
> B:	get around to actually feeding a chunk from
> 	pipe to file, try to lock the pipe.  Deadlock.
> 
> on 2015/11/10 by Al Viro,
> 
> http://lists.openwall.net/netdev/2015/11/10/4
> 
> The patch fixes this by removing the kern_path_create related code from
> unix_mknod and executing it as part of unix_bind prior acquiring the
> readlock of the socket in question. This means that A (as used above)
> will sb_start_write on /mnt before it acquires the readlock, hence, it
> won't indirectly block B which first did a sb_start_write and then
> waited for a thread trying to acquire the readlock. Consequently, A
> being blocked by C waiting for B won't cause a deadlock anymore
> (effectively, both A and B acquire two locks in opposite order in the
> situation described above).
> 
> Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.
> 
> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>

Applied and queued up for -stable, thanks Rainer.


* Re: [PATCH] af_unix: Fix splice-bind deadlock
  2016-01-04 23:25       ` Hannes Frederic Sowa
@ 2016-01-06 14:45         ` Rainer Weikusat
  0 siblings, 0 replies; 9+ messages in thread
From: Rainer Weikusat @ 2016-01-06 14:45 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Rainer Weikusat, David Miller, dvyukov, netdev, linux-kernel, viro

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On Sun, Jan 3, 2016, at 19:03, Rainer Weikusat wrote:

[reorder i_mutex and readlock locking]

> I was concerned because of the comment in skb_socket_splice:
>
>         /* Drop the socket lock, otherwise we have reverse
>          * locking dependencies between sk_lock and i_mutex
>          * here as compared to sendfile(). We enter here
>          * with the socket lock held, and splice_to_pipe() will
>          * grab the pipe inode lock. For sendfile() emulation,
>          * we call into ->sendpage() with the i_mutex lock held
>          * and networking will grab the socket lock.
>          */

AFAICT, this comment is "a bit misleading": sendfile (from file to
socket) is internally implemented as 'splice from file to pipe' +
'splice from pipe to socket'. The latter acquires the pipe lock of the
pipe and then invokes the sendpage method of the socket which acquires
the appropriate socket lock (for an AF_UNIX socket, it's
u->readlock). But 'pipe lock' and 'i_mutex' are two completely
different things: The former is the mutex in a struct pipe_inode_info
(pipe_fs_i.h), the latter is the i_mutex in a struct inode (fs.h).

"Code explains comment" :-)
