Re: [RFCv2 1/9] tcp: authopt: Initial support and key management

From: Leonard Crestez <cdleonard@gmail.com>
To: Dmitry Safonov <0x7f454c46@gmail.com>,
	Leonard Crestez <cdleonard@gmail.com>,
	David Ahern <dsahern@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Francesco Ruggeri <fruggeri@arista.com>,
	Mat Martineau <mathew.j.martineau@linux.intel.com>,
	Christoph Paasch <cpaasch@apple.com>,
	Ivan Delalande <colona@arista.com>,
	Priyaranjan Jha <priyarjha@google.com>,
	Menglong Dong <dong.menglong@zte.com.cn>,
	open list <linux-kernel@vger.kernel.org>,
	linux-crypto@vger.kernel.org,
	Network Development <netdev@vger.kernel.org>,
	Dmitry Safonov <dima@arista.com>
Subject: Re: [RFCv2 1/9] tcp: authopt: Initial support and key management
Date: Wed, 11 Aug 2021 22:08:57 +0300	[thread overview]
Message-ID: <44e5ae2b-0a9c-b5bc-19d2-f037de061944@gmail.com> (raw)
In-Reply-To: <68749e37-8e29-7a51-2186-7692f5fd6a79@gmail.com>

On 11.08.2021 17:31, Dmitry Safonov wrote:
> On 8/11/21 9:29 AM, Leonard Crestez wrote:
>> On 8/10/21 11:41 PM, Dmitry Safonov wrote:
>>> I wonder if it's also worth saving some bytes by something like
>>> struct tcp_ao_key *key;
>>>
>>> With
>>> struct tcp_ao_key {
>>>         u8 keylen;
>>>         u8 key[0];
>>> };
>>>
>>> Hmm?
>>
>> This increases memory management complexity for very minor gains. Very
>> few tcp_authopt_key will ever be created.
> 
> The change doesn't seem to be big, like:
> --- a/net/ipv4/tcp_authopt.c
> +++ b/net/ipv4/tcp_authopt.c
> @@ -422,8 +422,16 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t
> optval, unsig>
>          key_info = __tcp_authopt_key_info_lookup(sk, info, opt.local_id);
>          if (key_info)
>                  tcp_authopt_key_del(sk, info, key_info);
> +
> +       key = sock_kmalloc(sk, sizeof(*key) + opt.keylen, GFP_KERNEL |
> __GFP_ZERO);
> +       if (!key) {
> +               tcp_authopt_alg_release(alg);
> +               return -ENOMEM;
> +       }
> +
>          key_info = sock_kmalloc(sk, sizeof(*key_info), GFP_KERNEL |
> __GFP_ZERO);
>          if (!key_info) {
> +               sock_kfree_s(sk, key, sizeof(*key) + opt.keylen);
>                  tcp_authopt_alg_release(alg);
>                  return -ENOMEM;
>          }
> 
> I don't know, probably it'll be enough for every user to limit their
> keys by length of 80, but if one day it won't be enough - this ABI will
> be painful to extend.

struct tcp_authopt_key also needs to be modified and a separate 
copy_from_user is required. It's not very complicated but not very 
useful either.

>>>> +struct tcp_authopt_key {
>>>> +       /** @flags: Combination of &enum tcp_authopt_key_flag */
>>>> +       __u32   flags;
>>>> +       /** @local_id: Local identifier */
>>>> +       __u32   local_id;
>>>> +       /** @send_id: keyid value for send */
>>>> +       __u8    send_id;
>>>> +       /** @recv_id: keyid value for receive */
>>>> +       __u8    recv_id;
>>>> +       /** @alg: One of &enum tcp_authopt_alg */
>>>> +       __u8    alg;
>>>> +       /** @keylen: Length of the key buffer */
>>>> +       __u8    keylen;
>>>> +       /** @key: Secret key */
>>>> +       __u8    key[TCP_AUTHOPT_MAXKEYLEN];
>>>> +       /**
>>>> +        * @addr: Key is only valid for this address
>>>> +        *
>>>> +        * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set
>>>> +        */
>>>> +       struct __kernel_sockaddr_storage addr;
>>>> +};
>>>
>>> It'll be an ABI if this is accepted. As it is - it can't support
>>> RFC5925 fully.
>>> Extending syscall ABI is painful. I think, even the initial ABI *must*
>>> support
>>> all possible features of the RFC.
>>> In other words, there must be src_addr, src_port, dst_addr and dst_port.
>>> I can see from docs you've written you don't want to support a mix of
>>> different
>>> addr/port MKTs, so you can return -EINVAL or -ENOTSUPP for any value
>>> but zero.
>>> This will create an ABI that can be later extended to fully support RFC.
>>
>> RFC states that MKT connection identifiers can be specified using ranges
>> and wildcards and the details are up to the implementation. Keys are
>> *NOT* just bound to a classical TCP 4-tuple.
>>
>> * src_addr and src_port is implicit in socket binding. Maybe in theory
>> src_addr they could apply for a server socket bound to 0.0.0.0:PORT but
>> userspace can just open more sockets.
>> * dst_port is not supported by MD5 and I can't think of any useful
>> usecase. This is either well known (179 for BGP) or auto-allocated.
>> * tcp_md5 was recently enhanced allow a "prefixlen" for addr and
>> "l3mdev" ifindex binding.
>>
>> This last point shows that the binding features people require can't be
>> easily predicted in advance so it's better to allow the rules to be
>> extended.
> 
> Yeah, I see both changes you mention went on easy way as they reused
> existing paddings in the ABI structures.
> Ok, if you don't want to reserve src_addr/src_port/dst_addr/dst_port,
> than how about reserving some space for those instead?

My idea was that each additional binding featurs can be added as a new 
bit flag in tcp_authopt_key_flag and the structure extended. Older 
applications won't pass the flag and kernel will silently accept the 
shorter optval and you get compatibility.

As far as I can tell MD5 supports binding in 3 ways:

1) By dst ip address
2) By dst ip address and prefixlen
3) By ifindex for vrfs

Current version of tcp_authopt only supports (1) but I think adding the 
other methods isn't actually difficult at all.

I'd rather not guess at future features by adding unused fields.

>>> The same is about key: I don't think you need to define/use
>>> tcp_authopt_alg.
>>> Just use algo name - that way TCP-AO will automatically be able to use
>>> any algo supported by crypto engine.
>>> See how xfrm does it, e.g.:
>>> struct xfrm_algo_auth {
>>>       char        alg_name[64];
>>>       unsigned int    alg_key_len;    /* in bits */
>>>       unsigned int    alg_trunc_len;  /* in bits */
>>>       char        alg_key[0];
>>> };
>>>
>>> So you can let a user chose maclen instead of hard-coding it.
>>> Much more extendable than what you propose.
>>
>> This complicates ABI and implementation with features that are not
>> needed. I'd much rather only expose an enum of real-world tcp-ao
>> algorithms.
> 
> I see it exactly the opposite way: a new enum unnecessary complicates
> ABI, instead of passing alg_name[] to crypto engine. No need to add any
> support in tcp-ao as the algorithms are already provided by kernel.
> That is how it transparently works for ipsec, why not for tcp-ao?

The TCP Authentication Option standard has been out there for many many 
years now and alternative algorithms are not widely used. I think cisco 
has a third algorithm? What you're asking for is a large extension of 
the IETF standard.

If you look at the rest of this series I had a lot of trouble with 
crypto tfm allocation context so I had to create per-cpu pool, similar 
to tcp-md5. Should I potentially create a pool for each alg known to 
crypto-api?

Letting use control algorithms and traffickey and mac lengths creates 
new potential avenues for privilege escalation that need to be checked.

As much as possible I would like to avoid exposing the linux crypto api 
through TCP sockopts.

I was also thinking of having a non-namespaced sysctl to disable 
tcp_authopt by default for security reasons. Unless you're running a 
router you should never let userspace touch these options.

>>> [..]
>>>> +#ifdef CONFIG_TCP_AUTHOPT
>>>> +       case TCP_AUTHOPT: {
>>>> +               struct tcp_authopt info;
>>>> +
>>>> +               if (get_user(len, optlen))
>>>> +                       return -EFAULT;
>>>> +
>>>> +               lock_sock(sk);
>>>> +               tcp_get_authopt_val(sk, &info);
>>>> +               release_sock(sk);
>>>> +
>>>> +               len = min_t(unsigned int, len, sizeof(info));
>>>> +               if (put_user(len, optlen))
>>>> +                       return -EFAULT;
>>>> +               if (copy_to_user(optval, &info, len))
>>>> +                       return -EFAULT;
>>>> +               return 0;
>>>> +       }
>>>
>>> I'm pretty sure it's not a good choice to write partly tcp_authopt.
>>> And a user has no way to check what's the correct len on this kernel.
>>> Instead of len = min_t(unsigned int, len, sizeof(info)), it should be
>>> if (len != sizeof(info))
>>>       return -EINVAL;
>>
>> Purpose is to allow sockopts to grow as md5 has grown.
> 
> md5 has not grown. See above.
> 
> Another issue with your approach
> 
> +       /* If userspace optlen is too short fill the rest with zeros */
> +       if (optlen > sizeof(opt))
> +               return -EINVAL;
> +       memset(&opt, 0, sizeof(opt));
> +       if (copy_from_sockptr(&opt, optval, optlen))
> +               return -EFAULT;
> 
> is that userspace compiled with updated/grew structure will fail on
> older kernel. So, no extension without breaking something is possible.

Userspace that needs new features and also compatibility with older 
kernels could check something like uname.

I think this is already a problem with md5: passing 
TCP_MD5SIG_FLAG_PREFIX on an old kernel simply won't take effect and 
that's fine.

The bigger concern is to ensure that old binaries work without changes.

> Which also reminds me that currently you don't validate (opt.flags) for
> unknown by kernel flags.

Not sure what you mean, it is explicitly only known flags that are 
copied from userspace. I can make setsockopt return an error on unknown 
flags, this will make new apps fail more explicitly on old kernels.

> Extending syscalls is impossible without breaking userspace if ABI is
> not designed with extensibility in mind. That was quite a big problem,
> and still is. Please, read this carefully:
> https://lwn.net/Articles/830666/
> 
> That is why I'm suggesting you all those changes that will be harder to
> fix when/if your patches get accepted.

Both of the sockopt structs have a "flags" field and structure expansion 
will be accompanied by new flags. Is this not sufficient for compatibility?