linux-kernel.vger.kernel.org archive mirror
* [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: Rainer Weikusat @ 2015-12-16 20:09 UTC
  To: David Miller; +Cc: netdev, linux-kernel

With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem with the stream receive code (as opposed to its datagram
counterpart), because it never went to sleep waiting for new messages
with the mutex held and thus wouldn't cause secondary readers to block
on the mutex while waiting for the sleeping primary reader. As the
interruptible locking makes the code more complicated in exchange for
no benefit, change it back to using mutex_lock.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---

Considering that the datagram receive routine also doesn't go to sleep
with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
change to unix_autobind is now similarly purposeless.
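
The commit message's argument rests on the stream reader dropping
u->readlock before it sleeps waiting for new data, so a second reader
queued on the mutex only ever waits for a short critical section. A rough
userspace model of that pattern, assuming pthread primitives as stand-ins
(a mutex for u->readlock, a mutex plus condition variable for
unix_state_lock() and the socket wait queue). struct model_sock and the
model_* functions are hypothetical, and the kernel's actual sequence
differs in detail:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of an AF_UNIX stream socket's receive side. */
struct model_sock {
	pthread_mutex_t readlock;   /* stands in for u->readlock */
	pthread_mutex_t state;      /* stands in for unix_state_lock() */
	pthread_cond_t  data_ready; /* stands in for the socket wait queue */
	int queued;                 /* bytes in the receive queue */
	int sleepers;               /* readers currently waiting for data */
};

#define MODEL_SOCK_INIT { PTHREAD_MUTEX_INITIALIZER, \
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Blocking receive of up to 'want' bytes. */
static int model_recv(struct model_sock *s, int want)
{
	int got;

	assert(want > 0);
	pthread_mutex_lock(&s->readlock);
	pthread_mutex_lock(&s->state);
	while (s->queued == 0) {
		/* Drop readlock before sleeping: a second reader blocked
		 * on readlock never waits for data to arrive. */
		pthread_mutex_unlock(&s->readlock);
		s->sleepers++;
		pthread_cond_wait(&s->data_ready, &s->state);
		s->sleepers--;
		/* Re-take both locks in the usual order: readlock, state. */
		pthread_mutex_unlock(&s->state);
		pthread_mutex_lock(&s->readlock);
		pthread_mutex_lock(&s->state);
	}
	got = s->queued < want ? s->queued : want;
	s->queued -= got;
	pthread_mutex_unlock(&s->state);
	pthread_mutex_unlock(&s->readlock);
	return got;
}

/* Queue n bytes and wake any sleeping reader. */
static void model_send(struct model_sock *s, int n)
{
	pthread_mutex_lock(&s->state);
	s->queued += n;
	pthread_cond_broadcast(&s->data_ready);
	pthread_mutex_unlock(&s->state);
}

/* Thread body for the usage example: one blocking read of 16 bytes. */
struct model_reader {
	struct model_sock *s;
	int got;
};

static void *model_reader_thread(void *arg)
{
	struct model_reader *r = arg;

	r->got = model_recv(r->s, 16);
	return NULL;
}
```

Build with -pthread. While a model_reader_thread() sleeps on an empty
model socket, pthread_mutex_trylock() on its readlock succeeds, and a
later model_send() wakes it. In this model no reader sleeps for data with
the readlock held, which is why a plain mutex_lock is enough.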

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1c3c1f3..b1314c0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2263,14 +2263,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
 	 */
-	err = mutex_lock_interruptible(&u->readlock);
-	if (unlikely(err)) {
-		/* recvmsg() in non blocking mode is supposed to return -EAGAIN
-		 * sk_rcvtimeo is not honored by mutex_lock_interruptible()
-		 */
-		err = noblock ? -EAGAIN : -ERESTARTSYS;
-		goto out;
-	}
+	mutex_lock(&u->readlock);
 
 	if (flags & MSG_PEEK)
 		skip = sk_peek_offset(sk, flags);
@@ -2314,12 +2307,12 @@ again:
 			timeo = unix_stream_data_wait(sk, timeo, last,
 						      last_len);
 
-			if (signal_pending(current) ||
-			    mutex_lock_interruptible(&u->readlock)) {
+			if (signal_pending(current)) {
 				err = sock_intr_errno(timeo);
 				goto out;
 			}
 
+			mutex_lock(&u->readlock);
 			continue;
 unlock:
 			unix_state_unlock(sk);


* Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: Hannes Frederic Sowa @ 2015-12-17  9:22 UTC
  To: Rainer Weikusat, David Miller; +Cc: netdev, linux-kernel, Al Viro

On 16.12.2015 21:09, Rainer Weikusat wrote:
> [...]
>
> Considering that the datagram receive routine also doesn't go to sleep
> with the mutex held anymore, the 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490
> change to unix_autobind is now similarly purposeless.

I wouldn't do this conversion yet. There is still a deadlock lingering
around which should be solved first:

http://lists.openwall.net/netdev/2015/11/10/4

Unfortunately, I haven't found a good way to solve it yet.

Thanks,
Hannes



* Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: Rainer Weikusat @ 2015-12-17 15:28 UTC
  To: Hannes Frederic Sowa; +Cc: David Miller, netdev, linux-kernel, Al Viro

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On 16.12.2015 21:09, Rainer Weikusat wrote:
>> [...]
>
> I wouldn't do this conversion yet. There is still a deadlock lingering
> around which should be solved first:
>
> http://lists.openwall.net/netdev/2015/11/10/4
>
> Unfortunately, I haven't found a good way to solve it yet.

Judging from the link, that's not related to the stream receive code.


* Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: Hannes Frederic Sowa @ 2015-12-17 15:43 UTC
  To: Rainer Weikusat; +Cc: David Miller, netdev, linux-kernel, Al Viro

On 17.12.2015 16:28, Rainer Weikusat wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>> On 16.12.2015 21:09, Rainer Weikusat wrote:
>>> [...]
>>
>> I wouldn't do this conversion yet. There is still a deadlock lingering
>> around which should be solved first:
>>
>> http://lists.openwall.net/netdev/2015/11/10/4
>>
>> Unfortunately, I haven't found a good way to solve it yet.
> 
> Judging from the link, that's not related to the stream receive code.
> 

No, but it is related to commit 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490,
which changed how unix_bind and unix_autobind take their mutexes.

The unix_stream_read_generic conversion is fine.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,
Hannes



* Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: David Miller @ 2015-12-17 20:34 UTC
  To: rweikusat; +Cc: netdev, linux-kernel

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Date: Wed, 16 Dec 2015 20:09:25 +0000

> [...]
>
> Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>

Applied, thanks Rainer.


* Re: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code
From: Rainer Weikusat @ 2015-12-17 23:26 UTC
  To: Hannes Frederic Sowa; +Cc: David Miller, netdev, linux-kernel, Al Viro

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

[...]

> There is still a deadlock lingering around

[...]

> http://lists.openwall.net/netdev/2015/11/10/4

Interesting problem. Assuming the description

	(a while ago) A: socketpair()
        
	B: splice() from a pipe to /mnt/regular_file
 	   does sb_start_write() on /mnt
           
	C: try to freeze /mnt
	   wait for B to finish with /mnt
           
	A: bind() try to bind our socket to /mnt/new_socket_name
	   lock our socket, see it not bound yet
	   decide that it needs to create something in /mnt
	   try to do sb_start_write() on /mnt, block (it's
	   waiting for C).
           
	D: splice() from the same pipe to our socket
	   lock the pipe, see that socket is connected
	   try to lock the socket, block waiting for A
           
	B: get around to actually feeding a chunk from
	   pipe to file, try to lock the pipe.

is correct, the sequence of events could be described as

Given
	a/b	- acquire a, blocking b (e.g., take the read lock on the
		  superblock rwsem)

	b/a	- acquire b, blocking a

	c	- u->readlock

	d	- pipe lock

	[*y]	- blocks waiting for y

        
B	a/b

C	b/a[*B]

A	c
A	a/b[*C]

D	d
D	c[*A]

B	d[*D]
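
The circular wait in the table above can be checked mechanically. A
small sketch, assuming the four tasks and the 'X waits for Y' edges are
exactly those in the table (C waits for B, A for C, D for A, B for D);
the enum, the wait_graph type and has_deadlock() are hypothetical names
used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* The four tasks of the scenario (hypothetical names). */
enum task { TASK_A, TASK_B, TASK_C, TASK_D, NTASKS };

/* g[x][y] is true when task x is blocked waiting for task y. */
typedef bool wait_graph[NTASKS][NTASKS];

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool on_cycle_from(wait_graph g, int x, int colour[NTASKS])
{
	colour[x] = 1;
	for (int y = 0; y < NTASKS; y++) {
		if (!g[x][y])
			continue;
		if (colour[y] == 1)
			return true;	/* back edge: circular wait */
		if (colour[y] == 0 && on_cycle_from(g, y, colour))
			return true;
	}
	colour[x] = 2;
	return false;
}

/* A circular wait among blocked tasks means none of them can proceed. */
static bool has_deadlock(wait_graph g)
{
	int colour[NTASKS] = { 0 };

	for (int x = 0; x < NTASKS; x++)
		if (colour[x] == 0 && on_cycle_from(g, x, colour))
			return true;
	return false;
}
```

With all four edges, has_deadlock() finds the cycle A -> C -> B -> D -> A.
If A blocks in a/b before it takes c, D is no longer stuck behind A, the
D -> A and B -> D edges disappear, and only the chain A -> C -> B
remains, which drains once B finishes.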

Considering that C waits for B, the situation is: A blocked by B, D
blocked by A, and B blocked by D, a cycle. This could be avoided by
making A do the a/b[*C] before acquiring c. D then wouldn't end up
blocked waiting for A; hence B would complete after D completed,
enabling C to complete and, finally, A. The present unix_mknod is

static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
{
        struct dentry *dentry;
        struct path path;
        int err = 0;
        /*
         * Get the parent directory, calculate the hash for last
         * component.
         */
        dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
        err = PTR_ERR(dentry);
        if (IS_ERR(dentry))
                return err;

        /*
         * All right, let's create it.
         */
        err = security_path_mknod(&path, dentry, mode, 0);
        if (!err) {
                err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
                if (!err) {
                        res->mnt = mntget(path.mnt);
                        res->dentry = dget(dentry);
                }
        }
        done_path_create(&path, dentry);
        return err;
}

The a/b[*C] is a side effect of kern_path_create. unix_mknod is
called with u->readlock held because an already bound socket must not
be bound (binded?) again. As far as I understand the above, the actual
filesystem manipulation is performed by vfs_mknod. It should be possible
to split this function in two so that the sequence of 'bind events'
becomes

1. kern_path_create (acquires superblock rw sem)

2. lock u->readlock

3. already bound? yes goto 5

4. create directory entry

5. done_path_create ... / unlock u->readlock

Below is a patch changing the code as described. I've tested that
creating sockets with names in the filesystem still works, but nothing
else (at least not systematically; my 'workstation' didn't blow up in
the 21 minutes I've been running the modified kernel on it).

---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1c3c1f3..ed3d380 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -953,32 +953,30 @@ fail:
 	return NULL;
 }
 
-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static struct dentry *unix_path_create(const char *sun_path, struct path *path)
 {
-	struct dentry *dentry;
-	struct path path;
-	int err = 0;
 	/*
 	 * Get the parent directory, calculate the hash for last
 	 * component.
 	 */
-	dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
-	err = PTR_ERR(dentry);
-	if (IS_ERR(dentry))
-		return err;
 
-	/*
-	 * All right, let's create it.
-	 */
-	err = security_path_mknod(&path, dentry, mode, 0);
+	return kern_path_create(AT_FDCWD, sun_path, path, 0);
+}
+
+static int unix_mknod(struct dentry *dentry, struct path *path, umode_t mode,
+		      struct path *res)
+{
+	int err;
+
+	err = security_path_mknod(path, dentry, mode, 0);
 	if (!err) {
-		err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+		err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
 		if (!err) {
-			res->mnt = mntget(path.mnt);
+			res->mnt = mntget(path->mnt);
 			res->dentry = dget(dentry);
 		}
 	}
-	done_path_create(&path, dentry);
+
 	return err;
 }
 
@@ -993,6 +991,8 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	unsigned int hash;
 	struct unix_address *addr;
 	struct hlist_head *list;
+	struct path parent_path;
+	struct dentry *parent;
 
 	err = -EINVAL;
 	if (sunaddr->sun_family != AF_UNIX)
@@ -1008,9 +1008,18 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	addr_len = err;
 
+	parent = NULL;
+	if (sun_path[0]) {
+		parent = unix_path_create(sun_path, &parent_path);
+
+		err = PTR_ERR(parent);
+		if (IS_ERR(parent))
+			goto out;
+	}
+
 	err = mutex_lock_interruptible(&u->readlock);
 	if (err)
-		goto out;
+		goto out_parent;
 
 	err = -EINVAL;
 	if (u->addr)
@@ -1026,11 +1035,11 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	addr->hash = hash ^ sk->sk_type;
 	atomic_set(&addr->refcnt, 1);
 
-	if (sun_path[0]) {
+	if (parent) {
 		struct path path;
 		umode_t mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current_umask());
-		err = unix_mknod(sun_path, mode, &path);
+		err = unix_mknod(parent, &parent_path, mode, &path);
 		if (err) {
 			if (err == -EEXIST)
 				err = -EADDRINUSE;
@@ -1063,6 +1072,10 @@ out_unlock:
 	spin_unlock(&unix_table_lock);
 out_up:
 	mutex_unlock(&u->readlock);
+out_parent:
+	if (parent)
+		done_path_create(&parent_path, parent);
+
 out:
 	return err;
 }


* splice-bind deadlock (was: [PATCH] af_unix: Revert 'lock_interruptible' in stream receive code)
From: Rainer Weikusat @ 2015-12-18 16:04 UTC
  To: Hannes Frederic Sowa; +Cc: David Miller, netdev, linux-kernel, Al Viro

Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>
> [...]
>
>> There is still a deadlock lingering around
>
> [...]
>
>> http://lists.openwall.net/netdev/2015/11/10/4

[...]

> 	(a while ago) A: socketpair()
>         
> 	B: splice() from a pipe to /mnt/regular_file
>  	   does sb_start_write() on /mnt
>            
> 	C: try to freeze /mnt
> 	   wait for B to finish with /mnt
>            
> 	A: bind() try to bind our socket to /mnt/new_socket_name
> 	   lock our socket, see it not bound yet
> 	   decide that it needs to create something in /mnt
> 	   try to do sb_start_write() on /mnt, block (it's
> 	   waiting for C).
>            
> 	D: splice() from the same pipe to our socket
> 	   lock the pipe, see that socket is connected
> 	   try to lock the socket, block waiting for A
>            
> 	B: get around to actually feeding a chunk from
> 	   pipe to file, try to lock the pipe.
	[from the page]

[...]

> Given
> 	a/b	- acquire a, blocking b (e.g., take the read lock on the
> 		  superblock rwsem)
>
> 	b/a	- acquire b, blocking a
>
>         c	- u->readlock
>
>         d	- pipe lock
>
> 	[*y]   - blocks waiting for y
>
>         
> B	a/b
>
> C	b/a[*B]
>
> A	c
> A	a/b[*C]
>
> D	d
> D	c[*A]
>
> B	d[*D]

Some more explanation on this: there are two groups of three in the
above (X <- Y is supposed to mean 'Y waits for X'): B <- C <- A and
A <- D <- B. 'B blocking C blocking A' is really the same as if B were
holding an abstract mutex m0 that A wants. Likewise, A <- D <- B is
equivalent to A holding an abstract mutex m1 that B wants.
Conceptually, there are two threads and two locks here:

B: acquires m0 then m1
A: acquires m1 then m0

and because of the conflicting locking orders, the whole
shoggoth deadlocks sooner or later (Fhtagn!).

The obvious idea to fix this is to reverse either A or B. I think A
should be reversed because that's probably easier (unless there's some
technical problem with that I don't yet know of) and because this avoids
a situation where some other thread which wants the readlock mutex has to
wait until some completely unrelated filesystem operations have
completed.
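
The two-thread reduction above is the classic AB-BA pattern, and
reversing A means giving both paths the same lock order. A minimal
pthread sketch of that remedy, assuming m0 and m1 are the abstract locks
from the reduction (they are not real kernel objects; ordered_worker()
and run_a_and_b() are hypothetical helpers):

```c
#include <assert.h>
#include <pthread.h>

#define ROUNDS 100000

/* Abstract locks from the reduction, not kernel objects:
 * m0 stands for the sb_start_write()/freeze chain B holds,
 * m1 for the u->readlock/pipe lock chain A holds. */
static pthread_mutex_t m0 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static long rounds_done;

/* Every thread takes m0 before m1. If one thread took m1 first (the
 * original A), each could end up holding one lock while waiting
 * forever for the other. */
static void *ordered_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ROUNDS; i++) {
		pthread_mutex_lock(&m0);
		pthread_mutex_lock(&m1);
		rounds_done++;
		pthread_mutex_unlock(&m1);
		pthread_mutex_unlock(&m0);
	}
	return NULL;
}

/* Runs "A" and "B" concurrently with the same lock order and returns
 * the number of completed critical sections. */
static long run_a_and_b(void)
{
	pthread_t a, b;
	int rc;

	rounds_done = 0;
	rc = pthread_create(&a, NULL, ordered_worker, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, ordered_worker, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return rounds_done;
}
```

Build with -pthread; run_a_and_b() returns 2 * ROUNDS. Because neither
thread ever holds m1 while waiting for m0, no circular wait can form. A
completed run shows the shape of the fix; it is not a proof about the
kernel code.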

But theory only gets one so far and it would be good if someone capable
of reproducing the problem tested this.

