From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8AC93C43381 for ; Wed, 27 Feb 2019 16:45:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 410542184A for ; Wed, 27 Feb 2019 16:45:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=akamai.com header.i=@akamai.com header.b="cnKCYe7x" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726877AbfB0Qpw (ORCPT ); Wed, 27 Feb 2019 11:45:52 -0500 Received: from mx0b-00190b01.pphosted.com ([67.231.157.127]:53278 "EHLO mx0b-00190b01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726181AbfB0Qpw (ORCPT ); Wed, 27 Feb 2019 11:45:52 -0500 Received: from pps.filterd (m0122330.ppops.net [127.0.0.1]) by mx0b-00190b01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x1RGbtdt023694; Wed, 27 Feb 2019 16:45:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=akamai.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=jan2016.eng; bh=MlvHY+kJbMuu7KV62M/6xBonL+yJTdRb1ajJzxwYp5Q=; b=cnKCYe7xhPt5zXE1WF54DkCgvbSLOKlduzkd83NQBtAD7NpSR8rKLR64gOPkNXyw2Wf4 TkmH9gX33/BpUeqvZmnoW3rRHZnsn7oRIjz6FE0FLh2UuXyKo+V/GelSoizGZzqTnH2z lWNkiRJQbyKEqHj3UOrbQ/5zDfKtjwF2VFEEGF+WLswo8DPFy+FZpJmU7cVjg/QoE1KF YCm9/Xh62lDy4X/oJhnerBsAleAlpy8T8FFee4s/Bpy3KeGhJtgsAGpE7rOcv5tQ8S/7 EpfeJUamk1+SI9L3AndxXg1z4KPVUabVcf738CtkHFqnrSLiDQLkNHYeYUw5T2jeC6ZS 4A== Received: from prod-mail-ppoint3 (a96-6-114-86.deploy.static.akamaitechnologies.com [96.6.114.86] (may be forged)) by mx0b-00190b01.pphosted.com with ESMTP id 2qvj5d07un-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 27 Feb 2019 16:45:42 +0000 Received: from pps.filterd (prod-mail-ppoint3.akamai.com [127.0.0.1]) by prod-mail-ppoint3.akamai.com (8.16.0.27/8.16.0.27) with SMTP id x1RGWTQw006933; Wed, 27 Feb 2019 11:45:42 -0500 Received: from prod-mail-relay14.akamai.com ([172.27.17.39]) by prod-mail-ppoint3.akamai.com with ESMTP id 2qu2d2cvbr-1; Wed, 27 Feb 2019 11:45:42 -0500 Received: from [0.0.0.0] (prod-ssh-gw01.bos01.corp.akamai.com [172.27.119.138]) by prod-mail-relay14.akamai.com (Postfix) with ESMTP id 6300C81668; Wed, 27 Feb 2019 16:45:41 +0000 (GMT) Subject: Re: [RFC] nasty corner case in unix_dgram_sendmsg() To: Al Viro Cc: Rainer Weikusat , netdev@vger.kernel.org References: <20190225035121.GH2217@ZenIV.linux.org.uk> <20190226062817.GA17962@ZenIV.linux.org.uk> <20190226063804.GI2217@ZenIV.linux.org.uk> <878sy2k1m3.fsf@doppelsaurus.mobileactivedefense.com> <20190226190316.GJ2217@ZenIV.linux.org.uk> <552b3d67-2f43-5831-e4ea-666827de54fe@akamai.com> <20190226235912.GL2217@ZenIV.linux.org.uk> From: Jason Baron Message-ID: <59657502-5154-a2ff-ab5f-a432b217f9d6@akamai.com> Date: Wed, 27 Feb 2019 11:45:40 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <20190226235912.GL2217@ZenIV.linux.org.uk> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-02-27_11:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1902270112 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-02-27_11:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1902270113 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 2/26/19 6:59 PM, Al Viro wrote: > On Tue, Feb 26, 2019 at 03:35:39PM -0500, Jason Baron wrote: > >>> I understand what the unix_dgram_peer_wake_me() is doing; I understand >>> what unix_dgram_poll() is using it for. What I do not understand is >>> what's the point of doing that in unix_dgram_sendmsg()... >>> >> >> Hi, >> >> So the unix_dgram_peer_wake_me() in unix_dgram_sendmsg() is there for >> epoll in edge-triggered mode. In that case, we want to ensure that if >> -EAGAIN is returned a subsequent epoll_wait() is not stuck indefinitely. >> Probably could use a comment... > > *owwww* > > Let me see if I've got it straight - you want the forwarding rearmed, > so that it would match the behaviour of ep_poll_callback() (i.e. > removing only when POLLFREE is passed)? Looks like an odd way to > do it, if that's what's happening... If unix_dgram_sendmsg() return -EAGAIN in this case, then a subsequent call to poll()/select()/epoll_wait() is normally going to do the forwarding rearm via unix_dgram_poll() (unless its already writeable). However, in the special case of epoll with edge-trigger, the call to epoll_wait does not call unix_dgram_poll() and thus the re-arm has to happen in unix_dgram_sendmsg(). > > While we are at it, why disarm a forwarder upon noticing that peer > is dead? Wouldn't it be simpler to move that > wake_up_interruptible_all(&u->peer_wait); > in unix_release_sock() to just before > unix_state_unlock(sk); > a line prior? Then anyone seeing SOCK_DEAD on (locked) peer > would be guaranteed that all forwarders are gone... > The condition we are checking here is unix_recvq_full(), so even if the wakeup happens under the lock, we could end up waking up the waiter that still sees unix_recvq_full() because the skb's aren't freed until *after* the wakeup call. The race is described here: 51f7e95 af_unix: ensure POLLOUT on remote close() for connected dgram socket Note, that I did have an earlier version of that patch that moved the wake up call (instead of checking for SOCK_DEAD), see: https://patchwork.ozlabs.org/patch/944593/ However, I thought that the explicit check for SOCK_DEAD made things more explicit. IE we don't wait on a SOCK_DEAD socket. > Another fun question about the same dgram sendmsg: > if (unix_peer(sk) == other) { > unix_peer(sk) = NULL; > unix_dgram_peer_wake_disconnect_wakeup(sk, other); > > unix_state_unlock(sk); > > unix_dgram_disconnected(sk, other); > > ... and we are holding any locks at the last line. What happens > if we have thread A doing > decide which address to talk to > connect(fd, that address) > send request over fd (with send(2) or write(2)) > read reply from fd (recv(2) or read(2)) > in a loop, with thread B doing explicit sendto(2) over the same > socket? > > Suppose B happens to send to the last server thread A was talking > to and finds it just closed (e.g. because the last request from > A had been "shut down", which server has honoured). B gets ECONNREFUSED, > as it ought to, but it can also ends up disrupting the next exchange > of A. > > Shouldn't we rather extract the skbs from that queue *before* > dropping sk->lock? E.g. move them to a temporary queue, and flush > that queue after we'd unlocked sk... > If I understand your concern, B drops the lock as above and then A does a connect() to somewhere else and then B drops skbs from the new source. Looks plausible. I think in general, A and B would probably be co-ordinating if they are both reading/writing the same socket, but I think it probably would make sense to fix this case. Note that, unix_dgram_disconnected() is also called in unix_dgram_connect() after the lock is dropped so that would need a similar fix. Thanks, -Jason