From mboxrd@z Thu Jan  1 00:00:00 1970
From: Guillaume Nault <g.nault@alphalink.fr>
Subject: Re: [PATCH net-next] kcm: do not attach sockets if sk_user_data is
 already used
Date: Thu, 18 Jan 2018 16:18:53 +0100
Message-ID: <20180118151853.GL1422@alphalink.fr>
References: <CALx6S35gZbfY-SwLjrvdazMWAMpC+fVhO8+O7C0QzoVaTdtKjg@mail.gmail.com>
 <20180116.140052.88231511056432866.davem@davemloft.net>
 <CAEwTi7QAYdiYdtWv+btWpDRGYr1oQXkF6zJm2tDnO-tNA5wU2g@mail.gmail.com>
 <20180117.142538.1972806008716856078.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: jchapman@katalix.com, tom@herbertland.com, netdev@vger.kernel.org
To: David Miller <davem@davemloft.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from zimbra.alphalink.fr ([217.15.80.77]:47607 "EHLO
        zimbra.alphalink.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932124AbeARPS7 (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 18 Jan 2018 10:18:59 -0500
Content-Disposition: inline
In-Reply-To: <20180117.142538.1972806008716856078.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Wed, Jan 17, 2018 at 02:25:38PM -0500, David Miller wrote:
> From: James Chapman <jchapman@katalix.com>
> Date: Wed, 17 Jan 2018 11:13:33 +0000
> 
> > On 16 January 2018 at 19:00, David Miller <davem@davemloft.net> wrote:
> >> From: Tom Herbert <tom@herbertland.com>
> >> Date: Tue, 16 Jan 2018 09:36:41 -0800
> >>
> >>> sk_user_data is set with the sk_callback lock held in code below.
> >>> Should be able to take the lock earlier can do this check under the
> >>> lock.
> >>
> >> csock, and this csk, is obtained from an arbitrary one of the
> >> process's FDs.  It can be any socket type or family, and that socket's
> >> family might set sk_user_data without the callback lock.
> >>
> >> The only socket type check is making sure it is not another PF_KCM
> >> socket.  So that doesn't help with this problem.
> > 
> > Is it the intention to update all socket code over time to write
> > sk_user_data within the sk_callback lock? If so, I'm happy to address
> > that in the l2tp code (and update the kcm patch to check sk_user_data
> > within the sk_callback lock). Or is the preferred solution to restrict
> > KCM to specific socket families, as suggested by Guillaume earlier in
> > the thread?
> 
> I think we have a more fundamental issue here.
> 
> sk->sk_user_data is a place where RPC layer specific data is hung off
> of.  By this definition SunRPC, RXRPC, RDS, TIPC, and KCM are all
> using it correctly.
> 
> Phonet has a similar issue to the one seen here, it tests and changes
> sk_user_data under lock_sock().  The only requirement it makes is
> that the socket type is not SOCK_STREAM.  However, this one might be OK
> since only pep_sock sockets can be passed down into gprs_attach().
> 
But, if I read it correctly, that doesn't prevent it from being passed
to kcm_attach() later on, which will overwrite sk_user_data (unless we
update the locking scheme and refuse to overwrite sk_user_data in a
race-free way).

BTW couldn't the gprs_dev pointer be embedded in struct pep_sock?
This way pep_sk(sk)->gp could be used instead of sk->sk_user_data.
That'd probably be a violation of the phonet's layering, as that'd
tie gprs_dev to pep sockets. OTOH, only pep sockets can currently be
attached to gprs_dev, so in practice that might be a reasonable
compromise.

> Most of these cases like SunRPC, RXRPC, etc. are fine because they
> only graft on top of TCP and UDP sockets.
> 
> The weird situation here is that L2TP does tunneling and stores it's
> private state in sk->sk_user_data like an RPC layer would.  And KCM
> allows basically any socket type to be attached.
> 
> The RPC layers create their sockets internally, so I cannot see a way
> that those can be sent to a KCM attach operations.  And I think that
> is why this RPC invariant is important for sk_user_data usage.
> 
SunRPC seems to possibly set sk_user_data on user sockets: svc_addsock()
gets a socket using sockfd_lookup() then passes it to svc_setup_socket()
which in turn sets sk_user_data. I don't know anything about SunRPC, so
I might very well have missed important details, but I believe such a
socket could be passed to KCM which could lead to the same kind of
issues as for L2TP. Other RPCs look safe to me.

> If all else was equal, even though it doesn't make much sense to KCM
> attach L2TP sockets to KCM, I would suggest to change L2TP to store
> it's private stuff elsewhere.
> 
> But that is not the case.  Anything using the generic UDP
> encapsulation layer is going to make use of sk->sk_user_data like this
> (see setup_udp_tunnel_sock).
> 
Most UDP encapsulations only use kernel sockets though. It seems that
only L2TP and GTP use setup_udp_tunnel_sock() with userpsace sockets.
So it might be feasible to restrict usage of sk_user_data to kernel
sockets only.

For L2TP, we probably can adapt l2tp_sock_to_tunnel() so that it does
a lookup in a hashtable indexed by the socket pointer, rather than
dereferencing sk_user_data. That doesn't look very satisfying to me,
but that's the only way I found so far.

We also have another user of sk_user_data in l2tp_ppp, but since it
uses its own socket type, I guess we could simply embed the pointer in
its parent structure.

> It looks like over time we've accumulated this new class of uses
> of sk->sk_user_data, ho hum...
> 
> And it's not like we can add a test to KCM to avoid these socket
> types, because they will look like normal UDP datagram sockets.
> 
> What a mess...
> 
> Furthermore, even if you add a test to KCM, you will now need to
> add the same test to L2TP and anything else which uses sk_user_data
> for tunneling and for which userspace has access to the socket fd.
> 
> And it will be racy, indeed, until all such users align to the same
> precise locking scheme for tests and updates to sk_user_data.
> 
> Again, what a mess...
> 
So, if I understand correctly, we can either restrict sk_user_data to
kernel sockets so that KCM couldn't act on them (but then why would we
make an exception for KCM and allow it to set sk_user_data on
non-kernel sockets?).
Or we could agree on a locking scheme for sk_user_data and update all
users so that they'd fail instead of overwriting it when it's not NULL.

Assuming my understanding is correct, do you have any preference for
fixing this issue? Or any other ideas?