From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lorenzo Colitti
Subject: Re: Add a SOCK_DESTROY operation to close sockets from userspace
Date: Thu, 19 Nov 2015 14:13:48 +0900
Message-ID:
References: <20151118.153508.123902005995190872.davem@davemloft.net>
 <1447879416.562854.443622857.62708268@webmail.messagingengine.com>
 <20151118.224919.452852815199526735.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: Hannes Frederic Sowa, Eric Dumazet, Stephen Hemminger,
 "netdev@vger.kernel.org", Eric Dumazet, Erik Kline,
 Maciej Żenczykowski, Dmitry Torokhov
To: David Miller
Return-path:
Received: from mail-yk0-f174.google.com ([209.85.160.174]:32923 "EHLO
 mail-yk0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1750715AbbKSFOJ (ORCPT ); Thu, 19 Nov 2015 00:14:09 -0500
Received: by ykdv3 with SMTP id v3so96171464ykd.0 for ;
 Wed, 18 Nov 2015 21:14:08 -0800 (PST)
In-Reply-To: <20151118.224919.452852815199526735.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, Nov 19, 2015 at 12:49 PM, David Miller wrote:
> What if we implemented this the other way. The operations that make
> the sockets no longer connected to the world, close them. The route
> delete during address removal does the socket scan and then the done
> calls on those sockets.

In many cases it's not that simple. Routing can be as complex as the
RPDB allows it to be, and in general the kernel cannot know if a socket
is routable or not. As an example, a system might use mark-based
routing, like so:

    100 from all fwmark aaaa/0xffff lookup wifi
    200 from all fwmark bbbb/0xffff lookup cell
    9999 from all lookup wifi

(This is the basic idea of what Android >= 5.0 does).

Suppose that a VPN connects and routing needs to be moved to the VPN.
The system might implement this by adding the following rule:

    50 from all fwmark 0x0/0x10000 lookup vpn

Now all sockets where the fwmark matches aaaa/0x1ffff are dead in the
water.
They have the wifi source address, but they are routed to the VPN
and go nowhere. The system can't remove the wifi rule or take wifi
down, because the VPN socket itself (which will have a mark of
0x1aaaa/0x1ffff, i.e. with the 0x10000 bit set) needs to continue to
work on wifi. It can't route those sockets over wifi, because the user
expects that the VPN is securing all network traffic.

In this situation, even if the kernel were to examine all sockets when
the rule is added, how would it know that sockets with a mark of aaaa
should now be closed? The IP address is still there. Routing lookups on
those sockets will succeed just fine - they just now point to the VPN,
which doesn't work.

> The more I think about it more the more I agree with him and dislike
> having user space make sure "it's ok", that isn't where TCP protocol
> semantic rules are implemented. It belongs in the kernel.

Today any app can always, on one of its sockets, set SO_LINGER with a
timeout of 0 and call close(), which ends up in tcp_close. That results
in immediately sending a RST and forgetting about local state. (Those
semantics are the ones of RFC 793 ABORT.)

If SOCK_DESTROY did that instead of just calling tcp_done, would that
be acceptable?
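
For reference, the SO_LINGER behaviour mentioned above can be seen from userspace with a small sketch like the one below. It is only an illustration, not part of any proposed patch: the helper names abort_connection and peer_sees_reset are made up for this example, and the "ii" packing of struct linger assumes Linux (two ints: l_onoff, l_linger).

```python
import socket
import struct


def abort_connection(sock):
    """Abort a TCP connection: enable SO_LINGER with a zero timeout, then close."""
    # struct linger { int l_onoff; int l_linger; } on Linux (assumed layout):
    # lingering enabled, timeout of 0 seconds.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset():
    """Return True if the peer of an aborted loopback connection sees a reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5)
    try:
        abort_connection(client)
        try:
            # An orderly close (FIN) would make recv() return b"";
            # an abort (RST) surfaces as ECONNRESET instead.
            server.recv(1)
        except ConnectionResetError:
            return True
        return False
    finally:
        server.close()
        listener.close()


if __name__ == "__main__":
    print("peer saw RST:", peer_sees_reset())
```

The point of the sketch is only that the peer observes a reset rather than an orderly shutdown, which is the ABORT behaviour being asked about for SOCK_DESTROY.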