[net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy

All of lore.kernel.org
 help / color / mirror / Atom feed

* [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
@ 2021-02-06 20:36 Arjun Roy
  2021-02-06 23:28 ` Jakub Kicinski
  2021-02-07 17:49 ` David Ahern
  0 siblings, 2 replies; 17+ messages in thread
From: Arjun Roy @ 2021-02-06 20:36 UTC (permalink / raw)
  To: davem, netdev
  Cc: arjunroy, edumazet, soheil, David Ahern, Leon Romanovsky, Jakub Kicinski

From: Arjun Roy <arjunroy@google.com>

Explicitly define reserved field and require it to be 0-valued.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
---
 include/uapi/linux/tcp.h | 2 +-
 net/ipv4/tcp.c           | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 42fc5a640df4..8fc09e8638b3 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -357,6 +357,6 @@ struct tcp_zerocopy_receive {
 	__u64 msg_control; /* ancillary data */
 	__u64 msg_controllen;
 	__u32 msg_flags;
-	/* __u32 hole;  Next we must add >1 u32 otherwise length checks fail. */
+	__u32 reserved; /* set to 0 for now */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e1a17c6b473c..c8469c579ed8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		}
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
+		if (zc.reserved)
+			return -EINVAL;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc, &tss);
 		release_sock(sk);
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-06 20:36 [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args Arjun Roy
@ 2021-02-06 23:28 ` Jakub Kicinski
  2021-02-07  8:26   ` Leon Romanovsky
  2021-02-07 17:49 ` David Ahern
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2021-02-06 23:28 UTC (permalink / raw)
  To: Arjun Roy
  Cc: davem, netdev, arjunroy, edumazet, soheil, David Ahern, Leon Romanovsky

On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:
> From: Arjun Roy <arjunroy@google.com>
> 
> Explicitly define reserved field and require it to be 0-valued.

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e1a17c6b473c..c8469c579ed8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>  		}
>  		if (copy_from_user(&zc, optval, len))
>  			return -EFAULT;
> +		if (zc.reserved)
> +			return -EINVAL;
>  		lock_sock(sk);
>  		err = tcp_zerocopy_receive(sk, &zc, &tss);
>  		release_sock(sk);

I was expecting we'd also throw in a check_zeroed_user().
Either we can check if the buffer is zeroed all the way,
or we can't and we shouldn't validate reserved either

	check_zeroed_user(optval + offsetof(reserved), 
			  len - offsetof(reserved))
?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-06 23:28 ` Jakub Kicinski
@ 2021-02-07  8:26   ` Leon Romanovsky
  2021-02-08 18:41     ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2021-02-07  8:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil, David Ahern

On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:
> On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:
> > From: Arjun Roy <arjunroy@google.com>
> >
> > Explicitly define reserved field and require it to be 0-valued.
>
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index e1a17c6b473c..c8469c579ed8 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
> >  		}
> >  		if (copy_from_user(&zc, optval, len))
> >  			return -EFAULT;
> > +		if (zc.reserved)
> > +			return -EINVAL;
> >  		lock_sock(sk);
> >  		err = tcp_zerocopy_receive(sk, &zc, &tss);
> >  		release_sock(sk);
>
> I was expecting we'd also throw in a check_zeroed_user().
> Either we can check if the buffer is zeroed all the way,
> or we can't and we shouldn't validate reserved either
>
> 	check_zeroed_user(optval + offsetof(reserved),
> 			  len - offsetof(reserved))
> ?

There is a check that len is not larger than zs and users can't give
large buffer.

I would say that is pretty safe to write "if (zc.reserved)".

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-06 20:36 [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args Arjun Roy
  2021-02-06 23:28 ` Jakub Kicinski
@ 2021-02-07 17:49 ` David Ahern
  2021-02-07 17:53   ` David Ahern
  1 sibling, 1 reply; 17+ messages in thread
From: David Ahern @ 2021-02-07 17:49 UTC (permalink / raw)
  To: Arjun Roy, davem, netdev
  Cc: arjunroy, edumazet, soheil, Leon Romanovsky, Jakub Kicinski

On 2/6/21 1:36 PM, Arjun Roy wrote:
> From: Arjun Roy <arjunroy@google.com>
> 
> Explicitly define reserved field and require it to be 0-valued.
> 
> Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Suggested-by: David Ahern <dsahern@gmail.com>
> Suggested-by: Leon Romanovsky <leon@kernel.org>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/uapi/linux/tcp.h | 2 +-
>  net/ipv4/tcp.c           | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 42fc5a640df4..8fc09e8638b3 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -357,6 +357,6 @@ struct tcp_zerocopy_receive {
>  	__u64 msg_control; /* ancillary data */
>  	__u64 msg_controllen;
>  	__u32 msg_flags;
> -	/* __u32 hole;  Next we must add >1 u32 otherwise length checks fail. */
> +	__u32 reserved; /* set to 0 for now */
>  };
>  #endif /* _UAPI_LINUX_TCP_H */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e1a17c6b473c..c8469c579ed8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>  		}
>  		if (copy_from_user(&zc, optval, len))
>  			return -EFAULT;
> +		if (zc.reserved)
> +			return -EINVAL;
>  		lock_sock(sk);
>  		err = tcp_zerocopy_receive(sk, &zc, &tss);
>  		release_sock(sk);
> 


The 'switch (len)' statement needs to be updated now that 'len' is not
going to end on the 'msg_flags' boundary? But then, how did that work
before if there was 4 byte padding?

Maybe I am missing something here. You currently have:

	switch (len) {
	case offsetofend(struct tcp_zerocopy_receive, msg_flags):

which should == 60, yet the struct size is 64 with 4-bytes of padding. A
user doing

	int optlen = sizeof(struct tcp_zerocopy_receive);

	getsockopt(...., &optlen)

would pass in a value of 64, right?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-07 17:49 ` David Ahern
@ 2021-02-07 17:53   ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2021-02-07 17:53 UTC (permalink / raw)
  To: Arjun Roy, davem, netdev
  Cc: arjunroy, edumazet, soheil, Leon Romanovsky, Jakub Kicinski

On 2/7/21 10:49 AM, David Ahern wrote:
> On 2/6/21 1:36 PM, Arjun Roy wrote:
>> From: Arjun Roy <arjunroy@google.com>
>>
>> Explicitly define reserved field and require it to be 0-valued.
>>
>> Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
>> Signed-off-by: Arjun Roy <arjunroy@google.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
>> Suggested-by: David Ahern <dsahern@gmail.com>
>> Suggested-by: Leon Romanovsky <leon@kernel.org>
>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>>  include/uapi/linux/tcp.h | 2 +-
>>  net/ipv4/tcp.c           | 2 ++
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index 42fc5a640df4..8fc09e8638b3 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -357,6 +357,6 @@ struct tcp_zerocopy_receive {
>>  	__u64 msg_control; /* ancillary data */
>>  	__u64 msg_controllen;
>>  	__u32 msg_flags;
>> -	/* __u32 hole;  Next we must add >1 u32 otherwise length checks fail. */
>> +	__u32 reserved; /* set to 0 for now */
>>  };
>>  #endif /* _UAPI_LINUX_TCP_H */
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index e1a17c6b473c..c8469c579ed8 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>>  		}
>>  		if (copy_from_user(&zc, optval, len))
>>  			return -EFAULT;
>> +		if (zc.reserved)
>> +			return -EINVAL;
>>  		lock_sock(sk);
>>  		err = tcp_zerocopy_receive(sk, &zc, &tss);
>>  		release_sock(sk);
>>
> 
> 
> The 'switch (len)' statement needs to be updated now that 'len' is not
> going to end on the 'msg_flags' boundary? But then, how did that work
> before if there was 4 byte padding?
> 
> Maybe I am missing something here. You currently have:
> 
> 	switch (len) {
> 	case offsetofend(struct tcp_zerocopy_receive, msg_flags):
> 

Ah, I missed the lines before it:

                if (len >= offsetofend(struct tcp_zerocopy_receive,
msg_flags))
                        goto zerocopy_rcv_cmsg;


Also, I see a check on zc.msg_flags for a specific flag option being
set. What about other invalid bits in msg_flags? I do not see a check like:

#define TCP_VALID_ZC_MSG_FLAGS   (TCP_CMSG_TS)

	if (zc.msg_flags & ~(TCP_VALID_ZC_MSG_FLAGS))
		return -EINVAL;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-07  8:26   ` Leon Romanovsky
@ 2021-02-08 18:41     ` Jakub Kicinski
  2021-02-09  2:24       ` David Ahern
  2021-02-09  6:15       ` Leon Romanovsky
  0 siblings, 2 replies; 17+ messages in thread
From: Jakub Kicinski @ 2021-02-08 18:41 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil, David Ahern

On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:
> On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:
> > On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:  
> > > From: Arjun Roy <arjunroy@google.com>
> > >
> > > Explicitly define reserved field and require it to be 0-valued.  
> >  
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index e1a17c6b473c..c8469c579ed8 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
> > >  		}
> > >  		if (copy_from_user(&zc, optval, len))
> > >  			return -EFAULT;
> > > +		if (zc.reserved)
> > > +			return -EINVAL;
> > >  		lock_sock(sk);
> > >  		err = tcp_zerocopy_receive(sk, &zc, &tss);
> > >  		release_sock(sk);  
> >
> > I was expecting we'd also throw in a check_zeroed_user().
> > Either we can check if the buffer is zeroed all the way,
> > or we can't and we shouldn't validate reserved either
> >
> > 	check_zeroed_user(optval + offsetof(reserved),
> > 			  len - offsetof(reserved))
> > ?  
> 
> There is a check that len is not larger than zs and users can't give
> large buffer.
> 
> I would say that is pretty safe to write "if (zc.reserved)".

Which check? There's a check which truncates (writes back to user space
len = min(len, sizeof(zc)). Application can still pass garbage beyond
sizeof(zc) and syscall may start failing in the future if sizeof(zc)
changes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-08 18:41     ` Jakub Kicinski
@ 2021-02-09  2:24       ` David Ahern
  2021-02-09  2:53         ` Jakub Kicinski
  2021-02-09  6:15       ` Leon Romanovsky
  1 sibling, 1 reply; 17+ messages in thread
From: David Ahern @ 2021-02-09  2:24 UTC (permalink / raw)
  To: Jakub Kicinski, Leon Romanovsky
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil

On 2/8/21 11:41 AM, Jakub Kicinski wrote:
> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:
>> On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:
>>> On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:  
>>>> From: Arjun Roy <arjunroy@google.com>
>>>>
>>>> Explicitly define reserved field and require it to be 0-valued.  
>>>  
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index e1a17c6b473c..c8469c579ed8 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>>>>  		}
>>>>  		if (copy_from_user(&zc, optval, len))
>>>>  			return -EFAULT;
>>>> +		if (zc.reserved)
>>>> +			return -EINVAL;
>>>>  		lock_sock(sk);
>>>>  		err = tcp_zerocopy_receive(sk, &zc, &tss);
>>>>  		release_sock(sk);  
>>>
>>> I was expecting we'd also throw in a check_zeroed_user().
>>> Either we can check if the buffer is zeroed all the way,
>>> or we can't and we shouldn't validate reserved either
>>>
>>> 	check_zeroed_user(optval + offsetof(reserved),
>>> 			  len - offsetof(reserved))
>>> ?  
>>
>> There is a check that len is not larger than zs and users can't give
>> large buffer.
>>
>> I would say that is pretty safe to write "if (zc.reserved)".
> 
> Which check? There's a check which truncates (writes back to user space
> len = min(len, sizeof(zc)). Application can still pass garbage beyond
> sizeof(zc) and syscall may start failing in the future if sizeof(zc)
> changes.
> 

That would be the case for new userspace on old kernel. Extending the
check to the end of the struct would guarantee new userspace can not ask
for something that the running kernel does not understand.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09  2:24       ` David Ahern
@ 2021-02-09  2:53         ` Jakub Kicinski
  2021-02-09  3:20           ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2021-02-09  2:53 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, Arjun Roy, davem, netdev, arjunroy, edumazet, soheil

On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> On 2/8/21 11:41 AM, Jakub Kicinski wrote:
> > On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:  
> >> There is a check that len is not larger than zs and users can't give
> >> large buffer.
> >>
> >> I would say that is pretty safe to write "if (zc.reserved)".  
> > 
> > Which check? There's a check which truncates (writes back to user space
> > len = min(len, sizeof(zc)). Application can still pass garbage beyond
> > sizeof(zc) and syscall may start failing in the future if sizeof(zc)
> > changes.
> 
> That would be the case for new userspace on old kernel. Extending the
> check to the end of the struct would guarantee new userspace can not ask
> for something that the running kernel does not understand.

Indeed, so we're agreeing that check_zeroed_user() is needed before
original optlen from user space gets truncated?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09  2:53         ` Jakub Kicinski
@ 2021-02-09  3:20           ` David Ahern
  2021-02-09  6:29             ` Leon Romanovsky
  2021-02-09 16:59             ` Jakub Kicinski
  0 siblings, 2 replies; 17+ messages in thread
From: David Ahern @ 2021-02-09  3:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Leon Romanovsky, Arjun Roy, davem, netdev, arjunroy, edumazet, soheil

On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
>> On 2/8/21 11:41 AM, Jakub Kicinski wrote:
>>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:  
>>>> There is a check that len is not larger than zs and users can't give
>>>> large buffer.
>>>>
>>>> I would say that is pretty safe to write "if (zc.reserved)".  
>>>
>>> Which check? There's a check which truncates (writes back to user space
>>> len = min(len, sizeof(zc)). Application can still pass garbage beyond
>>> sizeof(zc) and syscall may start failing in the future if sizeof(zc)
>>> changes.
>>
>> That would be the case for new userspace on old kernel. Extending the
>> check to the end of the struct would guarantee new userspace can not ask
>> for something that the running kernel does not understand.
> 
> Indeed, so we're agreeing that check_zeroed_user() is needed before
> original optlen from user space gets truncated?
> 

I thought so, but maybe not. To think through this ...

If current kernel understands a struct of size N, it can only copy that
amount from user to kernel. Anything beyond is ignored in these
multiplexed uAPIs, and that is where the new userspace on old kernel falls.

Known value checks can only be done up to size N. In this case, the
reserved field is at the end of the known struct size, so checking just
the field is fine. Going beyond the reserved field has implications for
extensions to the API which should be handled when those extensions are
added.

So, in short I think the "if (zc.reserved)" is correct as Leon noted.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-08 18:41     ` Jakub Kicinski
  2021-02-09  2:24       ` David Ahern
@ 2021-02-09  6:15       ` Leon Romanovsky
  2021-02-09 16:59         ` Jakub Kicinski
  1 sibling, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2021-02-09  6:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil, David Ahern

On Mon, Feb 08, 2021 at 10:41:43AM -0800, Jakub Kicinski wrote:
> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:
> > On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:
> > > On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:
> > > > From: Arjun Roy <arjunroy@google.com>
> > > >
> > > > Explicitly define reserved field and require it to be 0-valued.
> > >
> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > index e1a17c6b473c..c8469c579ed8 100644
> > > > --- a/net/ipv4/tcp.c
> > > > +++ b/net/ipv4/tcp.c
> > > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
> > > >  		}
> > > >  		if (copy_from_user(&zc, optval, len))
> > > >  			return -EFAULT;
> > > > +		if (zc.reserved)
> > > > +			return -EINVAL;
> > > >  		lock_sock(sk);
> > > >  		err = tcp_zerocopy_receive(sk, &zc, &tss);
> > > >  		release_sock(sk);
> > >
> > > I was expecting we'd also throw in a check_zeroed_user().
> > > Either we can check if the buffer is zeroed all the way,
> > > or we can't and we shouldn't validate reserved either
> > >
> > > 	check_zeroed_user(optval + offsetof(reserved),
> > > 			  len - offsetof(reserved))
> > > ?
> >
> > There is a check that len is not larger than zs and users can't give
> > large buffer.
> >
> > I would say that is pretty safe to write "if (zc.reserved)".
>
> Which check? There's a check which truncates (writes back to user space
> len = min(len, sizeof(zc)). Application can still pass garbage beyond
> sizeof(zc) and syscall may start failing in the future if sizeof(zc)
> changes.

At least in my tree, we have the length check:
  4155                 if (len > sizeof(zc)) {
  4156                         len = sizeof(zc);
  4157                         if (put_user(len, optlen))
  4158                                 return -EFAULT;
  4159                 }


Ad David wrote below, the "if (zc.reserved)" is enough.

We have following options:
1. Old kernel that have sizeof(sz) upto .reserved and old userspace
-> len <= sizeof(sz) - works correctly.
2. Old kernel that have sizeof(sz) upto .reserved and new userspace that
sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT
3. New kernel that have sizeof(sz) beyond reserved and old userspace
-> any new added field to struct sz should be checked and anyway it is the same as item 1.
4. New kernel and new userspace
-> standard flow.

Thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09  3:20           ` David Ahern
@ 2021-02-09  6:29             ` Leon Romanovsky
  2021-02-09 16:59             ` Jakub Kicinski
  1 sibling, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2021-02-09  6:29 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Arjun Roy, davem, netdev, arjunroy, edumazet, soheil

On Mon, Feb 08, 2021 at 08:20:29PM -0700, David Ahern wrote:
> On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> >> On 2/8/21 11:41 AM, Jakub Kicinski wrote:
> >>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:
> >>>> There is a check that len is not larger than zs and users can't give
> >>>> large buffer.
> >>>>
> >>>> I would say that is pretty safe to write "if (zc.reserved)".
> >>>
> >>> Which check? There's a check which truncates (writes back to user space
> >>> len = min(len, sizeof(zc)). Application can still pass garbage beyond
> >>> sizeof(zc) and syscall may start failing in the future if sizeof(zc)
> >>> changes.
> >>
> >> That would be the case for new userspace on old kernel. Extending the
> >> check to the end of the struct would guarantee new userspace can not ask
> >> for something that the running kernel does not understand.
> >
> > Indeed, so we're agreeing that check_zeroed_user() is needed before
> > original optlen from user space gets truncated?
> >
>
> I thought so, but maybe not. To think through this ...
>
> If current kernel understands a struct of size N, it can only copy that
> amount from user to kernel. Anything beyond is ignored in these
> multiplexed uAPIs, and that is where the new userspace on old kernel falls.
>
> Known value checks can only be done up to size N. In this case, the
> reserved field is at the end of the known struct size, so checking just
> the field is fine. Going beyond the reserved field has implications for
> extensions to the API which should be handled when those extensions are
> added.

It is handled.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/tcp.c#n4155
		if (len > sizeof(zc)) {
			len = sizeof(zc);
			if (put_user(len, optlen))
				return -EFAULT;
		}

Thanks

>
> So, in short I think the "if (zc.reserved)" is correct as Leon noted.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09  3:20           ` David Ahern
  2021-02-09  6:29             ` Leon Romanovsky
@ 2021-02-09 16:59             ` Jakub Kicinski
  2021-02-09 23:46               ` Arjun Roy
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2021-02-09 16:59 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, Arjun Roy, davem, netdev, arjunroy, edumazet, soheil

On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
> On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:  
> >> That would be the case for new userspace on old kernel. Extending the
> >> check to the end of the struct would guarantee new userspace can not ask
> >> for something that the running kernel does not understand.  
> > 
> > Indeed, so we're agreeing that check_zeroed_user() is needed before
> > original optlen from user space gets truncated?
> 
> I thought so, but maybe not. To think through this ...
> 
> If current kernel understands a struct of size N, it can only copy that
> amount from user to kernel. Anything beyond is ignored in these
> multiplexed uAPIs, and that is where the new userspace on old kernel falls.
> 
> Known value checks can only be done up to size N. In this case, the
> reserved field is at the end of the known struct size, so checking just
> the field is fine. Going beyond the reserved field has implications for
> extensions to the API which should be handled when those extensions are
> added.

Let me try one last time.

There is no check in the kernels that len <= N. User can pass any
length _already_. check_zeroed_user() forces the values beyond the
structure length to be known (0) rather than anything. It can only 
avoid breakages in the future.

> So, in short I think the "if (zc.reserved)" is correct as Leon noted.

If it's correct to check some arbitrary part of the buffer is zeroed 
it should be correct to check the entire tail is zeroed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09  6:15       ` Leon Romanovsky
@ 2021-02-09 16:59         ` Jakub Kicinski
  2021-02-09 19:01           ` Leon Romanovsky
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2021-02-09 16:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil, David Ahern

On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote:
> At least in my tree, we have the length check:
>   4155                 if (len > sizeof(zc)) {
>   4156                         len = sizeof(zc);
>   4157                         if (put_user(len, optlen))
>   4158                                 return -EFAULT;
>   4159                 }
> 
> 
> Ad David wrote below, the "if (zc.reserved)" is enough.
> 
> We have following options:
> 1. Old kernel that have sizeof(sz) upto .reserved and old userspace
> -> len <= sizeof(sz) - works correctly.  
> 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that
> sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT

Based on the code you quoted? I don't see how. Maybe I need a vacation.
put_user() just copies len back to user space after truncation.

> 3. New kernel that have sizeof(sz) beyond reserved and old userspace
> -> any new added field to struct sz should be checked and anyway it is the same as item 1.  
> 4. New kernel and new userspace
> -> standard flow.  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09 16:59         ` Jakub Kicinski
@ 2021-02-09 19:01           ` Leon Romanovsky
  0 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2021-02-09 19:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Arjun Roy, davem, netdev, arjunroy, edumazet, soheil, David Ahern

On Tue, Feb 09, 2021 at 08:59:38AM -0800, Jakub Kicinski wrote:
> On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote:
> > At least in my tree, we have the length check:
> >   4155                 if (len > sizeof(zc)) {
> >   4156                         len = sizeof(zc);
> >   4157                         if (put_user(len, optlen))
> >   4158                                 return -EFAULT;
> >   4159                 }
> >
> >
> > Ad David wrote below, the "if (zc.reserved)" is enough.
> >
> > We have following options:
> > 1. Old kernel that have sizeof(sz) upto .reserved and old userspace
> > -> len <= sizeof(sz) - works correctly.
> > 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that
> > sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT
>
> Based on the code you quoted? I don't see how. Maybe I need a vacation.
> put_user() just copies len back to user space after truncation.

It is me who needs to go to vacation, you are right it doesn't return
for lengths larger than sizeof(zc).

>
> > 3. New kernel that have sizeof(sz) beyond reserved and old userspace
> > -> any new added field to struct sz should be checked and anyway it is the same as item 1.
> > 4. New kernel and new userspace
> > -> standard flow.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09 16:59             ` Jakub Kicinski
@ 2021-02-09 23:46               ` Arjun Roy
  2021-02-10  4:35                 ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Arjun Roy @ 2021-02-09 23:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, Leon Romanovsky, Arjun Roy, David Miller, netdev,
	Eric Dumazet, Soheil Hassas Yeganeh

On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
> > On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> > > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> > >> That would be the case for new userspace on old kernel. Extending the
> > >> check to the end of the struct would guarantee new userspace can not ask
> > >> for something that the running kernel does not understand.
> > >
> > > Indeed, so we're agreeing that check_zeroed_user() is needed before
> > > original optlen from user space gets truncated?
> >
> > I thought so, but maybe not. To think through this ...
> >
> > If current kernel understands a struct of size N, it can only copy that
> > amount from user to kernel. Anything beyond is ignored in these
> > multiplexed uAPIs, and that is where the new userspace on old kernel falls.
> >
> > Known value checks can only be done up to size N. In this case, the
> > reserved field is at the end of the known struct size, so checking just
> > the field is fine. Going beyond the reserved field has implications for
> > extensions to the API which should be handled when those extensions are
> > added.
>
> Let me try one last time.
>
> There is no check in the kernels that len <= N. User can pass any
> length _already_. check_zeroed_user() forces the values beyond the
> structure length to be known (0) rather than anything. It can only
> avoid breakages in the future.
>
> > So, in short I think the "if (zc.reserved)" is correct as Leon noted.
>
> If it's correct to check some arbitrary part of the buffer is zeroed
> it should be correct to check the entire tail is zeroed.

So, coming back to the thread, I think the following appears to be the
current thoughts:

1. It is requested that, on the kernel as it stands today, fields
beyond zc.msg_flags (including zc.reserved, the only such field as of
this patch) are zero'd out. So a new userspace asking to do specific
things would fail on this old kernel with EINVAL. Old userspace would
work on old or new kernels. New of course works on new kernels.
2. If it's correct to check some arbitrary field (zc.reserved) to be
0, then it should be fine to check this for all future fields >=
reserved in the struct. So some advanced userspace down the line
doesn't get confused.

Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes
struct right now, suppose userspace of the future gives us 96 bytes of
which the last 32 are non-zero for some feature or the other. We, in
the here and now kernel, truncate that length to 64 (as in we only
copy to kernel those first 64 bytes) and set the returned length to
64. The understanding being, any (future, past or present) userspace
consults the output value; and considers anything byte >= the returned
len to be untouched by the kernel executing the call (ie. garbage,
unacted upon).

So, how would this work for old+new userspace on old+new kernel?

A) old+old, new+new: sizes match, no issue
B) new kernel, old userspace: That's not an issue. We have the
switch(len) statement for that.
C) old kernel, new userspace: that's the 96 vs. 64 B example above -
new userspace would see that the kernel only operated on 64 B and
treat the last 32 B as garbage/unacted on.

In this case, we would not give EINVAL on case C, as we would if we
returned EINVAL on a check_zeroed_user() case for fields past
zc.reserved. We'd do a zerocopy operating on just the features we know
about, and communicate to the user that we only acted on features up
until this byte offset.

Now, given this is the case, we still have the padding confusion with
zc.reserved and the current struct size, so we have to force it to 0
as we are doing. But I think we don't need to go beyond this so far.

Thus, my personal preference is to not have the check_zeroed_user()
check. But if the consensus demands it, then it's an easy enough fix.
What are your thoughts?

Thanks,
-Arjun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-09 23:46               ` Arjun Roy
@ 2021-02-10  4:35                 ` David Ahern
  2021-02-10 19:23                   ` Arjun Roy
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2021-02-10  4:35 UTC (permalink / raw)
  To: Arjun Roy, Jakub Kicinski
  Cc: Leon Romanovsky, Arjun Roy, David Miller, netdev, Eric Dumazet,
	Soheil Hassas Yeganeh

On 2/9/21 4:46 PM, Arjun Roy wrote:
> On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
>>> On 2/8/21 7:53 PM, Jakub Kicinski wrote:
>>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
>>>>> That would be the case for new userspace on old kernel. Extending the
>>>>> check to the end of the struct would guarantee new userspace can not ask
>>>>> for something that the running kernel does not understand.
>>>>
>>>> Indeed, so we're agreeing that check_zeroed_user() is needed before
>>>> original optlen from user space gets truncated?
>>>
>>> I thought so, but maybe not. To think through this ...
>>>
>>> If current kernel understands a struct of size N, it can only copy that
>>> amount from user to kernel. Anything beyond is ignored in these
>>> multiplexed uAPIs, and that is where the new userspace on old kernel falls.
>>>
>>> Known value checks can only be done up to size N. In this case, the
>>> reserved field is at the end of the known struct size, so checking just
>>> the field is fine. Going beyond the reserved field has implications for
>>> extensions to the API which should be handled when those extensions are
>>> added.
>>
>> Let me try one last time.
>>
>> There is no check in the kernels that len <= N. User can pass any
>> length _already_. check_zeroed_user() forces the values beyond the
>> structure length to be known (0) rather than anything. It can only
>> avoid breakages in the future.
>>
>>> So, in short I think the "if (zc.reserved)" is correct as Leon noted.
>>
>> If it's correct to check some arbitrary part of the buffer is zeroed
>> it should be correct to check the entire tail is zeroed.
> 
> So, coming back to the thread, I think the following appears to be the
> current thoughts:
> 
> 1. It is requested that, on the kernel as it stands today, fields
> beyond zc.msg_flags (including zc.reserved, the only such field as of
> this patch) are zero'd out. So a new userspace asking to do specific
> things would fail on this old kernel with EINVAL. Old userspace would
> work on old or new kernels. New of course works on new kernels.
> 2. If it's correct to check some arbitrary field (zc.reserved) to be
> 0, then it should be fine to check this for all future fields >=
> reserved in the struct. So some advanced userspace down the line
> doesn't get confused.
> 
> Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes
> struct right now, suppose userspace of the future gives us 96 bytes of
> which the last 32 are non-zero for some feature or the other. We, in
> the here and now kernel, truncate that length to 64 (as in we only
> copy to kernel those first 64 bytes) and set the returned length to
> 64. The understanding being, any (future, past or present) userspace
> consults the output value; and considers anything byte >= the returned
> len to be untouched by the kernel executing the call (ie. garbage,
> unacted upon).
> 
> So, how would this work for old+new userspace on old+new kernel?
> 
> A) old+old, new+new: sizes match, no issue
> B) new kernel, old userspace: That's not an issue. We have the
> switch(len) statement for that.
> C) old kernel, new userspace: that's the 96 vs. 64 B example above -
> new userspace would see that the kernel only operated on 64 B and
> treat the last 32 B as garbage/unacted on.
> 
> In this case, we would not give EINVAL on case C, as we would if we
> returned EINVAL on a check_zeroed_user() case for fields past
> zc.reserved. We'd do a zerocopy operating on just the features we know
> about, and communicate to the user that we only acted on features up
> until this byte offset.
> 
> Now, given this is the case, we still have the padding confusion with
> zc.reserved and the current struct size, so we have to force it to 0
> as we are doing. But I think we don't need to go beyond this so far.
> 
> Thus, my personal preference is to not have the check_zeroed_user()
> check. But if the consensus demands it, then it's an easy enough fix.
> What are your thoughts?
> 

bpf uses check_zeroed_user to make sure extensions to its structs are
compatible, so yes, this is required.

Also, you need to address legitimate msg_flags as I mentioned in another
response.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.
  2021-02-10  4:35                 ` David Ahern
@ 2021-02-10 19:23                   ` Arjun Roy
  0 siblings, 0 replies; 17+ messages in thread
From: Arjun Roy @ 2021-02-10 19:23 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Leon Romanovsky, Arjun Roy, David Miller, netdev,
	Eric Dumazet, Soheil Hassas Yeganeh

On Tue, Feb 9, 2021 at 8:35 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/9/21 4:46 PM, Arjun Roy wrote:
> > On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
> >>> On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> >>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> >>>>> That would be the case for new userspace on old kernel. Extending the
> >>>>> check to the end of the struct would guarantee new userspace can not ask
> >>>>> for something that the running kernel does not understand.
> >>>>
> >>>> Indeed, so we're agreeing that check_zeroed_user() is needed before
> >>>> original optlen from user space gets truncated?
> >>>
> >>> I thought so, but maybe not. To think through this ...
> >>>
> >>> If current kernel understands a struct of size N, it can only copy that
> >>> amount from user to kernel. Anything beyond is ignored in these
> >>> multiplexed uAPIs, and that is where the new userspace on old kernel falls.
> >>>
> >>> Known value checks can only be done up to size N. In this case, the
> >>> reserved field is at the end of the known struct size, so checking just
> >>> the field is fine. Going beyond the reserved field has implications for
> >>> extensions to the API which should be handled when those extensions are
> >>> added.
> >>
> >> Let me try one last time.
> >>
> >> There is no check in the kernels that len <= N. User can pass any
> >> length _already_. check_zeroed_user() forces the values beyond the
> >> structure length to be known (0) rather than anything. It can only
> >> avoid breakages in the future.
> >>
> >>> So, in short I think the "if (zc.reserved)" is correct as Leon noted.
> >>
> >> If it's correct to check some arbitrary part of the buffer is zeroed
> >> it should be correct to check the entire tail is zeroed.
> >
> > So, coming back to the thread, I think the following appears to be the
> > current thoughts:
> >
> > 1. It is requested that, on the kernel as it stands today, fields
> > beyond zc.msg_flags (including zc.reserved, the only such field as of
> > this patch) are zero'd out. So a new userspace asking to do specific
> > things would fail on this old kernel with EINVAL. Old userspace would
> > work on old or new kernels. New of course works on new kernels.
> > 2. If it's correct to check some arbitrary field (zc.reserved) to be
> > 0, then it should be fine to check this for all future fields >=
> > reserved in the struct. So some advanced userspace down the line
> > doesn't get confused.
> >
> > Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes
> > struct right now, suppose userspace of the future gives us 96 bytes of
> > which the last 32 are non-zero for some feature or the other. We, in
> > the here and now kernel, truncate that length to 64 (as in we only
> > copy to kernel those first 64 bytes) and set the returned length to
> > 64. The understanding being, any (future, past or present) userspace
> > consults the output value; and considers anything byte >= the returned
> > len to be untouched by the kernel executing the call (ie. garbage,
> > unacted upon).
> >
> > So, how would this work for old+new userspace on old+new kernel?
> >
> > A) old+old, new+new: sizes match, no issue
> > B) new kernel, old userspace: That's not an issue. We have the
> > switch(len) statement for that.
> > C) old kernel, new userspace: that's the 96 vs. 64 B example above -
> > new userspace would see that the kernel only operated on 64 B and
> > treat the last 32 B as garbage/unacted on.
> >
> > In this case, we would not give EINVAL on case C, as we would if we
> > returned EINVAL on a check_zeroed_user() case for fields past
> > zc.reserved. We'd do a zerocopy operating on just the features we know
> > about, and communicate to the user that we only acted on features up
> > until this byte offset.
> >
> > Now, given this is the case, we still have the padding confusion with
> > zc.reserved and the current struct size, so we have to force it to 0
> > as we are doing. But I think we don't need to go beyond this so far.
> >
> > Thus, my personal preference is to not have the check_zeroed_user()
> > check. But if the consensus demands it, then it's an easy enough fix.
> > What are your thoughts?
> >
>
> bpf uses check_zeroed_user to make sure extensions to its structs are
> compatible, so yes, this is required.

Very well; I shall send out a v3 patch with this.

>
> Also, you need to address legitimate msg_flags as I mentioned in another
> response.

I meant to respond to this earlier but forgot. v3 will address this as well.

-Arjun

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-02-10 19:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-06 20:36 [net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args Arjun Roy
2021-02-06 23:28 ` Jakub Kicinski
2021-02-07  8:26   ` Leon Romanovsky
2021-02-08 18:41     ` Jakub Kicinski
2021-02-09  2:24       ` David Ahern
2021-02-09  2:53         ` Jakub Kicinski
2021-02-09  3:20           ` David Ahern
2021-02-09  6:29             ` Leon Romanovsky
2021-02-09 16:59             ` Jakub Kicinski
2021-02-09 23:46               ` Arjun Roy
2021-02-10  4:35                 ` David Ahern
2021-02-10 19:23                   ` Arjun Roy
2021-02-09  6:15       ` Leon Romanovsky
2021-02-09 16:59         ` Jakub Kicinski
2021-02-09 19:01           ` Leon Romanovsky
2021-02-07 17:49 ` David Ahern
2021-02-07 17:53   ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.