* [RFC] nasty corner case in unix_dgram_sendmsg()
@ 2019-02-25  3:51 Al Viro
  2019-02-26  6:28 ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-02-25  3:51 UTC (permalink / raw)
  To: netdev

	Consider the following scenario: sendmsg() with explicit ->msg_name
on an unconnected SOCK_DGRAM AF_UNIX socket finds the recipient just about
to die.  We go through
        sk_locked = 0;
        unix_state_lock(other);
restart_locked:
        err = -EPERM;
        if (!unix_may_send(sk, other))
                goto out_unlock;
OK, since unix_peer(other) is already NULL
        if (unlikely(sock_flag(other, SOCK_DEAD))) {
Yes, it is.
                /*
                 *      Check with 1003.1g - what should
                 *      datagram error
                 */
                unix_state_unlock(other);
no locks held now...
                sock_put(other);
... and there goes the last reference to other.  Suppose we get preempted
here (that just widens the window - the race would still exist without
preemption, but it would be much harder to hit).

Memory that used to hold *other gets reused for another AF_UNIX socket,
which gets bound to the same address, *and* another thread does connect(2)
on our socket to that address.  Now unix_peer(sk) is equal to other.
Our thread gets to run again, and
                if (!sk_locked)
                        unix_state_lock(sk);
grabs sk->lock
                err = 0;
                if (unix_peer(sk) == other) {
... yes, it compares equal - same pointer value, but no longer the same object
                        unix_peer(sk) = NULL;
... and it gets disconnected
                        unix_dgram_peer_wake_disconnect_wakeup(sk, other);

                        unix_state_unlock(sk);

                        unix_dgram_disconnected(sk, other);
... with receive queue purged.

AFAICS, that's bogus.  And easily prevented - all we need here is to do
the first sock_put() *after* the "have we just found the peer dead?"
logic, so the memory can't be reused while we are still looking at the pointer.
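
Roughly, as an untested sketch of that branch (reconstructed from memory of
the surrounding code, not an actual patch - just to show where the first
sock_put() would move):

        if (unlikely(sock_flag(other, SOCK_DEAD))) {
                unix_state_unlock(other);
                /* hang on to our reference to other for now - that is
                 * what keeps its memory from being reused for another
                 * socket while we compare it with unix_peer(sk) below
                 */
                if (!sk_locked)
                        unix_state_lock(sk);

                err = 0;
                if (unix_peer(sk) == other) {
                        unix_peer(sk) = NULL;
                        unix_dgram_peer_wake_disconnect_wakeup(sk, other);
                        unix_state_unlock(sk);
                        unix_dgram_disconnected(sk, other);
                        sock_put(other);        /* reference held by unix_peer(sk) */
                        err = -ECONNREFUSED;
                } else {
                        unix_state_unlock(sk);
                }
                sock_put(other);        /* the "first" sock_put(), now done last */

                other = NULL;
                if (err)
                        goto out_free;
                goto restart;
        }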

Objections?

PS: unix_dgram_sendmsg() is really much too subtle for its own good -
AFAICS, it *does* avoid blocking operations under sk->lock, but proof
is considerably more complex than one would like it to be...  And
I'm still not convinced that no codepath in it could end up doing
something unpleasant to SOCK_SEQPACKET sockets ;-/


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-25  3:51 [RFC] nasty corner case in unix_dgram_sendmsg() Al Viro
@ 2019-02-26  6:28 ` Al Viro
  2019-02-26  6:38   ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-02-26  6:28 UTC (permalink / raw)
  To: Rainer Weikusat, Jason Baron; +Cc: netdev

On Mon, Feb 25, 2019 at 03:51:21AM +0000, Al Viro wrote:

> PS: unix_dgram_sendmsg() is really much too subtle for its own good -
> AFAICS, it *does* avoid blocking operations under sk->lock, but proof
> is considerably more complex than one would like it to be...

Several questions about the whole peer_wake mechanism:

1) why do we even bother with that for SOCK_SEQPACKET?  AFAICS, there
we do *not* hit unix_wait_for_peer() on send - it's conditional upon
        if (other != sk &&
            unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
after we'd checked that other is non-NULL and not dead.  For seqpacket
we start with both ends of the connection having unix_peer() pointing to
each other, and that only changes to NULL when one of the ends gets closed.
The socket we'd passed to sendmsg is obviously not dead yet, and we'd
verified that other isn't dead either.  So unix_peer(other) must be
equal to sk and we don't go into that if.

In unix_dgram_poll() we might get to unix_dgram_peer_wake_me() for
seqpacket, but only in the case when other is dead.  In that case
unix_dgram_peer_wake_me() will insert our peer_wake into the queue,
see SOCK_DEAD, remove peer_wake from the queue and return false.
AFAICS, that serves no purpose whatsoever...
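
For reference, that function is roughly (paraphrasing the current
unix_dgram_peer_wake_me() from memory; comments are mine):

        static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
        {
                int connected;

                connected = unix_dgram_peer_wake_connect(sk, other);

                /* a dead peer is treated as "writable", so that poll()
                 * reports EPOLLOUT and the next send gets -ECONNREFUSED
                 */
                if (unix_recvq_full(other) && !sock_flag(other, SOCK_DEAD))
                        return 1;

                if (connected)
                        unix_dgram_peer_wake_disconnect(sk, other);

                return 0;
        }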

2) the logic in sendmsg is very odd:
	* if we'd detected n:1 send *and* found that we need to
wait and still have some timeout left, do so (not using the peer_wake -
other's peer_wait is not going away).  No questions there.
	* if we'd detected n:1 send *and* found that we need to
wait, but don't have any remaining timeout, we lock both ends
and check if unix_peer(sk) is (now) not equal to other, either
because it never was or because we'd been hit by connect(2) while
we'd been doing the locking.  In that case we fail with -EAGAIN.
Fair enough, but
	* if after relocking we see that unix_peer(sk) now
is equal to other, we arrange for wakeup forwarding from other's
peer_wait *and* if that has (likely) succeeded we fail with -EAGAIN.
Huh?  What's the point?  The only thing that depends upon that
wakeup forwarding is poll, and _that_ will set the forwarding up
on its own.
	* if we'd failed (either because other is dead now or
because its queue is not full), we go back to restart_locked.
If it's dead, we'll sod off with ECONNREFUSED, if it's not
full anymore, we'll send the damn thing.

All of that comes at the cost of considerable headache in
unix_dgram_sendmsg() - flag-conditional locking, etc.  Why do
we bother?  What's wrong with simple "n:1 case detected, queue
full, no timeout left, return -EAGAIN and be done with that"?

IDGI...  Am I missing something subtle going on here?

I understand what peer_wake is for, and the poll side of things
is fine; it's the sendmsg one that looks weird.  What's going on
there?


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26  6:28 ` Al Viro
@ 2019-02-26  6:38   ` Al Viro
  2019-02-26 15:31     ` Rainer Weikusat
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-02-26  6:38 UTC (permalink / raw)
  To: Rainer Weikusat, Jason Baron; +Cc: netdev

On Tue, Feb 26, 2019 at 06:28:17AM +0000, Al Viro wrote:
> 2) the logic in sendmsg is very odd:
> 	* if we'd detected n:1 send *and* found that we need to
> wait and still have some timeout left, do so (not using the peer_wake -
> other's peer_wait is not going away).  No questions there.
> 	* if we'd detected n:1 send *and* found that we need to
> wait, but don't have any remaining timeout, we lock both ends
> and check if unix_peer(sk) is (now) not equal to other, either
> because it never was or because we'd been hit by connect(2) while
> we'd been doing the locking.  In that case we fail with -EAGAIN.
> Fair enough, but
> 	* if after relocking we see that unix_peer(sk) now
> is equal to other, we arrange for wakeup forwarding from other's
> peer_wait *and* if that has (likely) succeeded we fail with -EAGAIN.
> Huh?  What's the point?  The only thing that depends upon that
> wakeup forwarding is poll, and _that_ will set the forwarding up
> on its own.
> 	* if we'd failed (either because other is dead now or
> because its queue is not full), we go back to restart_locked.
> If it's dead, we'll sod off with ECONNREFUSED, if it's not
> full anymore, we'll send the damn thing.
> 
> All of that comes at the cost of considerable headache in
> unix_dgram_sendmsg() - flag-conditional locking, etc.  Why do
> we bother?  What's wrong with simple "n:1 case detected, queue
> full, no timeout left, return -EAGAIN and be done with that"?
> 
> IDGI...  Am I missing something subtle going on here?
> 
> I understand what peer_wake is for, and the poll side of things
> is fine; it's the sendmsg one that looks weird.  What's going on
> there?

What I have in mind is something like this (for that part of the
issues):

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 74d1eed7cbd4..85d72ea79924 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1635,7 +1635,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	long timeo;
 	struct scm_cookie scm;
 	int data_len = 0;
-	int sk_locked;
 
 	wait_for_unix_gc();
 	err = scm_send(sock, msg, &scm, false);
@@ -1713,9 +1712,8 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out_free;
 	}
 
-	sk_locked = 0;
 	unix_state_lock(other);
-restart_locked:
+
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
 		goto out_unlock;
@@ -1728,8 +1726,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		unix_state_unlock(other);
 		sock_put(other);
 
-		if (!sk_locked)
-			unix_state_lock(sk);
+		unix_state_lock(sk);
 
 		err = 0;
 		if (unix_peer(sk) == other) {
@@ -1767,36 +1764,19 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	 */
 	if (other != sk &&
 	    unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
-		if (timeo) {
-			timeo = unix_wait_for_peer(other, timeo);
-
-			err = sock_intr_errno(timeo);
-			if (signal_pending(current))
-				goto out_free;
-
-			goto restart;
-		}
-
-		if (!sk_locked) {
-			unix_state_unlock(other);
-			unix_state_double_lock(sk, other);
-		}
-
-		if (unix_peer(sk) != other ||
-		    unix_dgram_peer_wake_me(sk, other)) {
+		if (!timeo) {
 			err = -EAGAIN;
-			sk_locked = 1;
 			goto out_unlock;
 		}
 
-		if (!sk_locked) {
-			sk_locked = 1;
-			goto restart_locked;
-		}
-	}
+		timeo = unix_wait_for_peer(other, timeo);
 
-	if (unlikely(sk_locked))
-		unix_state_unlock(sk);
+		err = sock_intr_errno(timeo);
+		if (signal_pending(current))
+			goto out_free;
+
+		goto restart;
+	}
 
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
@@ -1809,8 +1789,6 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	return len;
 
 out_unlock:
-	if (sk_locked)
-		unix_state_unlock(sk);
 	unix_state_unlock(other);
 out_free:
 	kfree_skb(skb);


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26  6:38   ` Al Viro
@ 2019-02-26 15:31     ` Rainer Weikusat
  2019-02-26 19:03       ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Rainer Weikusat @ 2019-02-26 15:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Jason Baron, netdev

Al Viro <viro@zeniv.linux.org.uk> writes:
> On Tue, Feb 26, 2019 at 06:28:17AM +0000, Al Viro wrote:

[...]


>> 	* if after relocking we see that unix_peer(sk) now
>> is equal to other, we arrange for wakeup forwarding from other's
>> peer_wait *and* if that has (likely) succeeded we fail with -EAGAIN.
>> Huh?

unix_dgram_peer_wake_me() returns 1 if sending isn't possible at the moment,
i.e., if the process which tries to send has to wait.


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26 15:31     ` Rainer Weikusat
@ 2019-02-26 19:03       ` Al Viro
  2019-02-26 20:35         ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-02-26 19:03 UTC (permalink / raw)
  To: Rainer Weikusat; +Cc: Jason Baron, netdev

On Tue, Feb 26, 2019 at 03:31:32PM +0000, Rainer Weikusat wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
> > On Tue, Feb 26, 2019 at 06:28:17AM +0000, Al Viro wrote:
> 
> [...]
> 
> 
> >> 	* if after relocking we see that unix_peer(sk) now
> >> is equal to other, we arrange for wakeup forwarding from other's
> >> peer_wait *and* if that has (likely) succeeded we fail with -EAGAIN.
> >> Huh?
> 
> unix_dgram_peer_wake_me() returns 1 if sending isn't possible at the moment,
> i.e., if the process which tries to send has to wait.

Except that in _this_ case we won't be waiting at all - we'll just
return -EAGAIN (as one could expect, what with no timeout given/left).
So what's the point of forwarding wakeups?  IOW, what is it that we
expect to be waiting on sk_sleep(sk)?  Note that it won't be this
call of sendmsg(2) (it'll bugger off without any further waiting).
It won't be subsequent calls of sendmsg(2) either - they either
sleep on skb allocation (which has nothing to do with destination)
_or_ they sleep directly on other->peer_wait.  And poll(), while it
will be sleeping on sk_sleep(sk), will make sure to set the forwarding 
up.

I understand what the unix_dgram_peer_wake_me() is doing; I understand
what unix_dgram_poll() is using it for.  What I do not understand is
what's the point of doing that in unix_dgram_sendmsg()...


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26 19:03       ` Al Viro
@ 2019-02-26 20:35         ` Jason Baron
  2019-02-26 23:59           ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2019-02-26 20:35 UTC (permalink / raw)
  To: Al Viro, Rainer Weikusat; +Cc: netdev



On 2/26/19 2:03 PM, Al Viro wrote:
> On Tue, Feb 26, 2019 at 03:31:32PM +0000, Rainer Weikusat wrote:
>> Al Viro <viro@zeniv.linux.org.uk> writes:
>>> On Tue, Feb 26, 2019 at 06:28:17AM +0000, Al Viro wrote:
>>
>> [...]
>>
>>
>>>> 	* if after relocking we see that unix_peer(sk) now
>>>> is equal to other, we arrange for wakeup forwarding from other's
>>>> peer_wait *and* if that has (likely) succeeded we fail with -EAGAIN.
>>>> Huh?
>>
>> unix_dgram_peer_wake_me() returns 1 if sending isn't possible at the moment,
>> i.e., if the process which tries to send has to wait.
> 
> Except that in _this_ case we won't be waiting at all - we'll just
> return -EAGAIN (as one could expect, what with no timeout given/left).
> So what's the point of forwarding wakeups?  IOW, what is it that we
> expect to be waiting on sk_sleep(sk)?  Note that it won't be this
> call of sendmsg(2) (it'll bugger off without any further waiting).
> It won't be subsequent calls of sendmsg(2) either - they either
> sleep on skb allocation (which has nothing to do with destination)
> _or_ they sleep directly on other->peer_wait.  And poll(), while it
> will be sleeping on sk_sleep(sk), will make sure to set the forwarding 
> up.
> 
> I understand what the unix_dgram_peer_wake_me() is doing; I understand
> what unix_dgram_poll() is using it for.  What I do not understand is
> what's the point of doing that in unix_dgram_sendmsg()...
> 

Hi,

So the unix_dgram_peer_wake_me() in unix_dgram_sendmsg() is there for
epoll in edge-triggered mode. In that case, we want to ensure that if
-EAGAIN is returned a subsequent epoll_wait() is not stuck indefinitely.
Probably could use a comment...

Thanks,

-Jason


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26 20:35         ` Jason Baron
@ 2019-02-26 23:59           ` Al Viro
  2019-02-27 16:45             ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Al Viro @ 2019-02-26 23:59 UTC (permalink / raw)
  To: Jason Baron; +Cc: Rainer Weikusat, netdev

On Tue, Feb 26, 2019 at 03:35:39PM -0500, Jason Baron wrote:

> > I understand what the unix_dgram_peer_wake_me() is doing; I understand
> > what unix_dgram_poll() is using it for.  What I do not understand is
> > what's the point of doing that in unix_dgram_sendmsg()...
> > 
> 
> Hi,
> 
> So the unix_dgram_peer_wake_me() in unix_dgram_sendmsg() is there for
> epoll in edge-triggered mode. In that case, we want to ensure that if
> -EAGAIN is returned a subsequent epoll_wait() is not stuck indefinitely.
> Probably could use a comment...

*owwww*

Let me see if I've got it straight - you want the forwarding rearmed,
so that it would match the behaviour of ep_poll_callback() (i.e.
removing only when POLLFREE is passed)?  Looks like an odd way to
do it, if that's what's happening...

While we are at it, why disarm a forwarder upon noticing that the peer
is dead?  Wouldn't it be simpler to move that
        wake_up_interruptible_all(&u->peer_wait);
in unix_release_sock() one line earlier, to just before
        unix_state_unlock(sk);
instead?  Then anyone seeing SOCK_DEAD on a (locked) peer
would be guaranteed that all forwarders are gone...
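
I.e., something along these lines in unix_release_sock() (sketch from
memory of that function, not a patch):

        unix_state_lock(sk);
        sock_orphan(sk);                /* this is what sets SOCK_DEAD */
        sk->sk_shutdown = SHUTDOWN_MASK;
        /* ... */
        skpair = unix_peer(sk);
        unix_peer(sk) = NULL;
        wake_up_interruptible_all(&u->peer_wait);  /* moved under the lock */
        unix_state_unlock(sk);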

Another fun question about the same dgram sendmsg:
                if (unix_peer(sk) == other) {
                        unix_peer(sk) = NULL;
                        unix_dgram_peer_wake_disconnect_wakeup(sk, other);

                        unix_state_unlock(sk);

                        unix_dgram_disconnected(sk, other);

... and we are not holding any locks at the last line.  What happens
if we have thread A doing
	decide which address to talk to
	connect(fd, that address)
	send request over fd (with send(2) or write(2))
	read reply from fd (recv(2) or read(2))
in a loop, with thread B doing explicit sendto(2) over the same
socket?

Suppose B happens to send to the last server thread A was talking
to and finds it just closed (e.g. because the last request from
A had been "shut down", which the server has honoured).  B gets ECONNREFUSED,
as it ought to, but it can also end up disrupting the next exchange
of A.

Shouldn't we rather extract the skbs from that queue *before*
dropping sk->lock?  E.g. move them to a temporary queue, and flush
that queue after we'd unlocked sk...
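
Something like this, perhaps (untested sketch of the idea only; "hitlist" is
just a local name, and the wakeup/ECONNRESET side of
unix_dgram_disconnected() is left out):

        struct sk_buff_head hitlist;

        __skb_queue_head_init(&hitlist);
        /* ... */
        if (unix_peer(sk) == other) {
                unix_peer(sk) = NULL;
                unix_dgram_peer_wake_disconnect_wakeup(sk, other);

                /* grab the queued skbs while we still hold sk's state lock,
                 * i.e. before another connect() can repoint unix_peer(sk)
                 * and replies from the new peer can land in the queue
                 */
                spin_lock(&sk->sk_receive_queue.lock);
                skb_queue_splice_init(&sk->sk_receive_queue, &hitlist);
                spin_unlock(&sk->sk_receive_queue.lock);

                unix_state_unlock(sk);

                /* now free what we took, with no locks held */
                __skb_queue_purge(&hitlist);
                err = -ECONNREFUSED;
        } else {
                unix_state_unlock(sk);
        }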


* Re: [RFC] nasty corner case in unix_dgram_sendmsg()
  2019-02-26 23:59           ` Al Viro
@ 2019-02-27 16:45             ` Jason Baron
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Baron @ 2019-02-27 16:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Rainer Weikusat, netdev



On 2/26/19 6:59 PM, Al Viro wrote:
> On Tue, Feb 26, 2019 at 03:35:39PM -0500, Jason Baron wrote:
> 
>>> I understand what the unix_dgram_peer_wake_me() is doing; I understand
>>> what unix_dgram_poll() is using it for.  What I do not understand is
>>> what's the point of doing that in unix_dgram_sendmsg()...
>>>
>>
>> Hi,
>>
>> So the unix_dgram_peer_wake_me() in unix_dgram_sendmsg() is there for
>> epoll in edge-triggered mode. In that case, we want to ensure that if
>> -EAGAIN is returned a subsequent epoll_wait() is not stuck indefinitely.
>> Probably could use a comment...
> 
> *owwww*
> 
> Let me see if I've got it straight - you want the forwarding rearmed,
> so that it would match the behaviour of ep_poll_callback() (i.e.
> removing only when POLLFREE is passed)?  Looks like an odd way to
> do it, if that's what's happening...

If unix_dgram_sendmsg() returns -EAGAIN in this case, then a subsequent call
to poll()/select()/epoll_wait() is normally going to do the forwarding rearm
via unix_dgram_poll() (unless it's already writeable). However, in the
special case of epoll in edge-triggered mode, the call to epoll_wait() does
not call unix_dgram_poll(), and thus the re-arm has to happen in
unix_dgram_sendmsg().
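
For illustration, the userspace pattern that relies on this looks roughly
like the following (hypothetical sketch; epfd, fd, buf, len and peer are
assumed to be set up elsewhere):

        struct epoll_event ev = { .events = EPOLLOUT | EPOLLET, .data.fd = fd };

        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
        for (;;) {
                while (sendto(fd, buf, len, 0,
                              (struct sockaddr *)&peer, sizeof(peer)) >= 0)
                        ;       /* keep sending until the receiver's queue fills */
                if (errno != EAGAIN)
                        break;
                /* edge-triggered: epoll_wait() does not re-poll the socket,
                 * so without the rearm in unix_dgram_sendmsg() this could
                 * block forever even after the receiver drains its queue
                 */
                epoll_wait(epfd, &ev, 1, -1);
        }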

> 
> While we are at it, why disarm a forwarder upon noticing that the peer
> is dead?  Wouldn't it be simpler to move that
>         wake_up_interruptible_all(&u->peer_wait);
> in unix_release_sock() one line earlier, to just before
>         unix_state_unlock(sk);
> instead?  Then anyone seeing SOCK_DEAD on a (locked) peer
> would be guaranteed that all forwarders are gone...
>

The condition we are checking here is unix_recvq_full(), so even if
the wakeup happens under the lock, we could end up waking up the
waiter that still sees unix_recvq_full(), because the skbs aren't
freed until *after* the wakeup call. The race is described here:

51f7e95 af_unix: ensure POLLOUT on remote close() for connected dgram socket

Note that I did have an earlier version of that patch that moved
the wake up call (instead of checking for SOCK_DEAD), see:
https://patchwork.ozlabs.org/patch/944593/

However, I thought that the explicit check for SOCK_DEAD made the
intent clearer, i.e. we don't wait on a SOCK_DEAD socket.

> Another fun question about the same dgram sendmsg:
>                 if (unix_peer(sk) == other) {
>                         unix_peer(sk) = NULL;
>                         unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> 
>                         unix_state_unlock(sk);
> 
>                         unix_dgram_disconnected(sk, other);
> 
> ... and we are not holding any locks at the last line.  What happens
> if we have thread A doing
> 	decide which address to talk to
> 	connect(fd, that address)
> 	send request over fd (with send(2) or write(2))
> 	read reply from fd (recv(2) or read(2))
> in a loop, with thread B doing explicit sendto(2) over the same
> socket?
> 
> Suppose B happens to send to the last server thread A was talking
> to and finds it just closed (e.g. because the last request from
> A had been "shut down", which the server has honoured).  B gets ECONNREFUSED,
> as it ought to, but it can also end up disrupting the next exchange
> of A.
> 
> Shouldn't we rather extract the skbs from that queue *before*
> dropping sk->lock?  E.g. move them to a temporary queue, and flush
> that queue after we'd unlocked sk...
> 

If I understand your concern, B drops the lock as above, then
A does a connect() to somewhere else, and then B drops skbs from the
new source. Looks plausible. I think in general, A and B would probably
be co-ordinating if they are both reading/writing the same socket,
but I think it probably would make sense to fix this case. Note that
unix_dgram_disconnected() is also called in unix_dgram_connect() after
the lock is dropped, so that would need a similar fix.

Thanks,

-Jason

