Re: [RFC] nasty corner case in unix_dgram_sendmsg()

From: Jason Baron <jbaron@akamai.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rainer Weikusat <rweikusat@talktalk.net>, netdev@vger.kernel.org
Subject: Re: [RFC] nasty corner case in unix_dgram_sendmsg()
Date: Wed, 27 Feb 2019 11:45:40 -0500
Message-ID: <59657502-5154-a2ff-ab5f-a432b217f9d6@akamai.com>
In-Reply-To: <20190226235912.GL2217@ZenIV.linux.org.uk>

On 2/26/19 6:59 PM, Al Viro wrote:
> On Tue, Feb 26, 2019 at 03:35:39PM -0500, Jason Baron wrote:
> 
>>> I understand what the unix_dgram_peer_wake_me() is doing; I understand
>>> what unix_dgram_poll() is using it for.  What I do not understand is
>>> what's the point of doing that in unix_dgram_sendmsg()...
>>>
>>
>> Hi,
>>
>> So the unix_dgram_peer_wake_me() in unix_dgram_sendmsg() is there for
>> epoll in edge-triggered mode. In that case, we want to ensure that if
>> -EAGAIN is returned a subsequent epoll_wait() is not stuck indefinitely.
>> Probably could use a comment...
> 
> *owwww*
> 
> Let me see if I've got it straight - you want the forwarding rearmed,
> so that it would match the behaviour of ep_poll_callback() (i.e.
> removing only when POLLFREE is passed)?  Looks like an odd way to
> do it, if that's what's happening...

If unix_dgram_sendmsg() returns -EAGAIN in this case, then a subsequent call
to poll()/select()/epoll_wait() is normally going to re-arm the forwarding
via unix_dgram_poll() (unless it's already writable). However, in the
special case of epoll in edge-triggered mode, the call to epoll_wait() does
not call unix_dgram_poll(), and thus the re-arm has to happen in
unix_dgram_sendmsg().
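
For reference, here is a rough userspace sketch of the sequence in
question: a connected, nonblocking sender is registered with
EPOLLOUT | EPOLLET, a second sender fills the receiver's queue, the first
sender gets EAGAIN, the receiver drains one datagram, and epoll_wait() is
then expected to report EPOLLOUT rather than time out. The function and
variable names are mine, it relies on Linux autobind for the receiver's
address, and it only checks the userspace-visible behaviour; it does not
prove which kernel path produced the wakeup.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Nonblocking datagram socket connected to addr, or -1 on failure. */
static int connected_sender(const struct sockaddr_un *addr, socklen_t len)
{
	int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);

	if (fd >= 0 && connect(fd, (const struct sockaddr *)addr, len) < 0) {
		close(fd);
		fd = -1;
	}
	return fd;
}

static void close_if_open(int fd)
{
	if (fd >= 0)
		close(fd);
}

/*
 * Returns 1 if EPOLLOUT is reported after the EAGAIN, 0 if epoll_wait()
 * timed out, and -1 if the scenario could not be set up.
 */
int et_rearm_demo(void)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	struct epoll_event ev = { .events = EPOLLOUT | EPOLLET }, got;
	socklen_t len = sizeof(addr);
	int rx, filler = -1, tx = -1, ep = -1, i, ret = -1;
	char c = 'x';

	rx = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (rx < 0)
		return -1;
	/* addrlen == sizeof(sa_family_t) requests an autobound abstract name */
	if (bind(rx, (struct sockaddr *)&addr, sizeof(sa_family_t)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&addr, &len) < 0)
		goto out;

	filler = connected_sender(&addr, len);
	tx = connected_sender(&addr, len);
	ep = epoll_create1(0);
	if (filler < 0 || tx < 0 || ep < 0)
		goto out;

	/* Register tx while it is writable and consume the initial edge. */
	ev.data.fd = tx;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, tx, &ev) < 0 ||
	    epoll_wait(ep, &got, 1, 0) != 1)
		goto out;

	/* A different sender fills rx's queue (bounded loop for safety). */
	for (i = 0; i < 100000 && send(filler, &c, 1, 0) == 1; i++)
		;

	/* tx now runs into the full queue: the -EAGAIN case. */
	if (send(tx, &c, 1, 0) != -1 || errno != EAGAIN)
		goto out;

	/* The receiver makes room for one more datagram. */
	if (recv(rx, &c, 1, 0) != 1)
		goto out;

	/* Edge-triggered wait: this must not hang once there is room. */
	ret = epoll_wait(ep, &got, 1, 1000) == 1 && (got.events & EPOLLOUT);
out:
	close_if_open(ep);
	close_if_open(tx);
	close_if_open(filler);
	close_if_open(rx);
	return ret;
}
```

Calling et_rearm_demo() from a small main() and printing the result is
enough to try it. A return of 0 would mean epoll_wait() timed out after
the -EAGAIN, which is the hang we want to avoid.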

> 
> While we are at it, why disarm a forwarder upon noticing that peer
> is dead?  Wouldn't it be simpler to move that
>         wake_up_interruptible_all(&u->peer_wait);
> in unix_release_sock() to just before
>         unix_state_unlock(sk);
> a line prior?  Then anyone seeing SOCK_DEAD on (locked) peer
> would be guaranteed that all forwarders are gone...
>

The condition we are checking here is unix_recvq_full(), so even if
the wakeup happens under the lock, we could end up waking a waiter
that still sees unix_recvq_full() return true, because the skbs aren't
freed until *after* the wakeup call. The race is described here:

51f7e95 af_unix: ensure POLLOUT on remote close() for connected dgram socket

Note that I did have an earlier version of that patch that moved
the wakeup call (instead of checking for SOCK_DEAD), see:
https://patchwork.ozlabs.org/patch/944593/

However, I thought that the explicit check for SOCK_DEAD made the intent
clearer, i.e. we don't wait on a SOCK_DEAD socket.
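
To illustrate the difference, here is a toy model of the two wait
conditions. This is not kernel code: struct toy_peer and its fields are
made-up stand-ins, with "dead" playing the role of SOCK_DEAD and the
queue-length test playing the role of unix_recvq_full().

```c
#include <stdbool.h>

/* Toy stand-in for the peer socket; none of these are kernel names. */
struct toy_peer {
	int qlen;	/* datagrams still sitting in the receive queue */
	int max_qlen;	/* queue limit */
	bool dead;	/* models SOCK_DEAD */
};

static bool toy_recvq_full(const struct toy_peer *p)
{
	return p->qlen > p->max_qlen;
}

/*
 * Queue-length check only: a sender woken before the dying peer's
 * queue has been purged still sees "full" and goes back to waiting.
 */
bool toy_must_wait_qlen_only(const struct toy_peer *p)
{
	return toy_recvq_full(p);
}

/* With the explicit check: never wait on a dead peer. */
bool toy_must_wait(const struct toy_peer *p)
{
	return toy_recvq_full(p) && !p->dead;
}
```

For a peer that is already dead but whose queue has not been freed yet,
the first predicate still says "wait" while the second does not, so the
sender can go on to see the error from its next write.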

> Another fun question about the same dgram sendmsg:
>                 if (unix_peer(sk) == other) {
>                         unix_peer(sk) = NULL;
>                         unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> 
>                         unix_state_unlock(sk);
> 
>                         unix_dgram_disconnected(sk, other);
> 
> ... and we are not holding any locks at the last line.  What happens
> if we have thread A doing
> 	decide which address to talk to
> 	connect(fd, that address)
> 	send request over fd (with send(2) or write(2))
> 	read reply from fd (recv(2) or read(2))
> in a loop, with thread B doing explicit sendto(2) over the same
> socket?
> 
> Suppose B happens to send to the last server thread A was talking
> to and finds it just closed (e.g. because the last request from
> A had been "shut down", which server has honoured).  B gets ECONNREFUSED,
> as it ought to, but it can also end up disrupting the next exchange
> of A.
> 
> Shouldn't we rather extract the skbs from that queue *before*
> dropping sk->lock?  E.g. move them to a temporary queue, and flush
> that queue after we'd unlocked sk...
> 

If I understand your concern, B drops the lock as above, A then does a
connect() to somewhere else, and B then drops skbs from the new
source. Looks plausible. In general, I think A and B would probably be
coordinating if they are both reading/writing the same socket, but it
probably makes sense to fix this case. Note that
unix_dgram_disconnected() is also called in unix_dgram_connect() after
the lock is dropped, so that would need a similar fix.
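
A userspace sketch of the "move to a temporary queue, flush after
unlock" idea, using a plain linked list and a pthread mutex instead of
the kernel's skb queues. All names here (struct chan, chan_detach_stale()
and so on) are hypothetical and only meant to show the locking pattern.

```c
#include <pthread.h>
#include <stdlib.h>

struct msg {
	struct msg *next;
	int payload;
};

struct chan {
	pthread_mutex_t lock;
	struct msg *head, *tail;	/* receive queue, protected by lock */
};

void chan_init(struct chan *ch)
{
	pthread_mutex_init(&ch->lock, NULL);
	ch->head = ch->tail = NULL;
}

/* Append one message; returns 0 on success, -1 if allocation fails. */
int chan_enqueue(struct chan *ch, int payload)
{
	struct msg *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->next = NULL;
	m->payload = payload;

	pthread_mutex_lock(&ch->lock);
	if (ch->tail)
		ch->tail->next = m;
	else
		ch->head = m;
	ch->tail = m;
	pthread_mutex_unlock(&ch->lock);
	return 0;
}

/* Under the lock, move everything queued so far onto a private list. */
struct msg *chan_detach_stale(struct chan *ch)
{
	struct msg *stale;

	pthread_mutex_lock(&ch->lock);
	stale = ch->head;
	ch->head = ch->tail = NULL;
	pthread_mutex_unlock(&ch->lock);
	return stale;
}

/* Free a detached list with no lock held; returns how many were freed. */
int msg_list_free(struct msg *list)
{
	int n = 0;

	while (list) {
		struct msg *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}
```

The point of the split is that anything enqueued after
chan_detach_stale() returns (e.g. from a newly connected peer) is not on
the private list, so the later msg_list_free() cannot drop it by mistake.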

Thanks,

-Jason