netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind
@ 2023-07-03 17:53 Ilya Maximets
  2023-07-03 21:19 ` John Fastabend
  2023-07-04  2:31 ` Jason Wang
  0 siblings, 2 replies; 4+ messages in thread
From: Ilya Maximets @ 2023-07-03 17:53 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-doc, linux-kernel, Jason Wang, Stefan Hajnoczi,
	Ilya Maximets

Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
A privileged process might create the socket and pass it to a
non-privileged process for later use.  However, that process will be
able to bind the socket to any network interface.  Even though it will
not be able to receive any traffic without modification of the BPF map,
the situation is not ideal.

Sockets already have a mechanism that can be used to restrict what
interface they can be attached to.  That is SO_BINDTODEVICE.

To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.

Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
workflow when non-privileged process is using AF_XDP.

The intended workflow is following:

  1. First process creates a bare socket with socket(AF_XDP, ...).
  2. First process loads the XSK program to the interface.
  3. First process adds the socket fd to a BPF map.
  4. First process ties socket fd to a particular interface using
     SO_BINDTODEVICE.
  5. First process sends socket fd to a second process.
  6. Second process allocates UMEM.
  7. Second process binds socket to the interface with bind(...).
  8. Second process sends/receives the traffic.

All the steps above are possible today if the first process is
privileged and the second one has sufficient RLIMIT_MEMLOCK and no
capabilities.  However, the second process will be able to bind the
socket to any interface it wants on step 7 and send traffic from it.
With the proposed change, the second process will be able to bind
the socket only to a specific interface chosen by the first process
at step 4.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

RFC --> PATCH:
  * Better explained intended workflow in a commit message.
  * Added ACK from Magnus.

 Documentation/networking/af_xdp.rst | 9 +++++++++
 net/xdp/xsk.c                       | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 247c6c4127e9..1cc35de336a4 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the
 application to use. The final option is the flags field, but it will
 be dealt with in separate sections for each UMEM flag.
 
+SO_BINDTODEVICE setsockopt
+--------------------------
+
+This is a generic SOL_SOCKET option that can be used to tie AF_XDP
+socket to a particular network interface.  It is useful when a socket
+is created by a privileged process and passed to a non-privileged one.
+Once the option is set, kernel will refuse attempts to bind that socket
+to a different interface.  Updating the value requires CAP_NET_RAW.
+
 XDP_STATISTICS getsockopt
 -------------------------
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5a8c0dd250af..386ff641db0f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 	struct net_device *dev;
+	int bound_dev_if;
 	u32 flags, qid;
 	int err = 0;
 
@@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		      XDP_USE_NEED_WAKEUP))
 		return -EINVAL;
 
+	bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
+
+	if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex)
+		return -EINVAL;
+
 	rtnl_lock();
 	mutex_lock(&xs->mutex);
 	if (xs->state != XSK_READY) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind
  2023-07-03 17:53 [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind Ilya Maximets
@ 2023-07-03 21:19 ` John Fastabend
  2023-07-04  2:31 ` Jason Wang
  1 sibling, 0 replies; 4+ messages in thread
From: John Fastabend @ 2023-07-03 21:19 UTC (permalink / raw)
  To: Ilya Maximets, netdev, bpf
  Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-doc, linux-kernel, Jason Wang, Stefan Hajnoczi,
	Ilya Maximets

Ilya Maximets wrote:
> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
> A privileged process might create the socket and pass it to a
> non-privileged process for later use.  However, that process will be
> able to bind the socket to any network interface.  Even though it will
> not be able to receive any traffic without modification of the BPF map,
> the situation is not ideal.
> 
> Sockets already have a mechanism that can be used to restrict what
> interface they can be attached to.  That is SO_BINDTODEVICE.
> 
> To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
> 
> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
> workflow when non-privileged process is using AF_XDP.
> 
> The intended workflow is following:
> 
>   1. First process creates a bare socket with socket(AF_XDP, ...).
>   2. First process loads the XSK program to the interface.
>   3. First process adds the socket fd to a BPF map.
>   4. First process ties socket fd to a particular interface using
>      SO_BINDTODEVICE.
>   5. First process sends socket fd to a second process.
>   6. Second process allocates UMEM.
>   7. Second process binds socket to the interface with bind(...).
>   8. Second process sends/receives the traffic.
> 
> All the steps above are possible today if the first process is
> privileged and the second one has sufficient RLIMIT_MEMLOCK and no
> capabilities.  However, the second process will be able to bind the
> socket to any interface it wants on step 7 and send traffic from it.
> With the proposed change, the second process will be able to bind
> the socket only to a specific interface chosen by the first process
> at step 4.
> 
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

LGTM.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind
  2023-07-03 17:53 [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind Ilya Maximets
  2023-07-03 21:19 ` John Fastabend
@ 2023-07-04  2:31 ` Jason Wang
  2023-07-04  9:16   ` Daniel Borkmann
  1 sibling, 1 reply; 4+ messages in thread
From: Jason Wang @ 2023-07-04  2:31 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: netdev, bpf, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-doc, linux-kernel,
	Stefan Hajnoczi

On Tue, Jul 4, 2023 at 1:53 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
> A privileged process might create the socket and pass it to a
> non-privileged process for later use.  However, that process will be
> able to bind the socket to any network interface.  Even though it will
> not be able to receive any traffic without modification of the BPF map,
> the situation is not ideal.
>
> Sockets already have a mechanism that can be used to restrict what
> interface they can be attached to.  That is SO_BINDTODEVICE.
>
> To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
>
> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
> workflow when non-privileged process is using AF_XDP.
>
> The intended workflow is following:
>
>   1. First process creates a bare socket with socket(AF_XDP, ...).
>   2. First process loads the XSK program to the interface.
>   3. First process adds the socket fd to a BPF map.
>   4. First process ties socket fd to a particular interface using
>      SO_BINDTODEVICE.
>   5. First process sends socket fd to a second process.
>   6. Second process allocates UMEM.
>   7. Second process binds socket to the interface with bind(...).
>   8. Second process sends/receives the traffic.
>
> All the steps above are possible today if the first process is
> privileged and the second one has sufficient RLIMIT_MEMLOCK and no
> capabilities.  However, the second process will be able to bind the
> socket to any interface it wants on step 7 and send traffic from it.
> With the proposed change, the second process will be able to bind
> the socket only to a specific interface chosen by the first process
> at step 4.
>
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Acked-by: Jason Wang <jasowang@redhat.com>

Is this a stable material or not?

Thanks

> ---
>
> RFC --> PATCH:
>   * Better explained intended workflow in a commit message.
>   * Added ACK from Magnus.
>
>  Documentation/networking/af_xdp.rst | 9 +++++++++
>  net/xdp/xsk.c                       | 6 ++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index 247c6c4127e9..1cc35de336a4 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the
>  application to use. The final option is the flags field, but it will
>  be dealt with in separate sections for each UMEM flag.
>
> +SO_BINDTODEVICE setsockopt
> +--------------------------
> +
> +This is a generic SOL_SOCKET option that can be used to tie AF_XDP
> +socket to a particular network interface.  It is useful when a socket
> +is created by a privileged process and passed to a non-privileged one.
> +Once the option is set, kernel will refuse attempts to bind that socket
> +to a different interface.  Updating the value requires CAP_NET_RAW.
> +
>  XDP_STATISTICS getsockopt
>  -------------------------
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5a8c0dd250af..386ff641db0f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>         struct sock *sk = sock->sk;
>         struct xdp_sock *xs = xdp_sk(sk);
>         struct net_device *dev;
> +       int bound_dev_if;
>         u32 flags, qid;
>         int err = 0;
>
> @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>                       XDP_USE_NEED_WAKEUP))
>                 return -EINVAL;
>
> +       bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
> +
> +       if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex)
> +               return -EINVAL;
> +
>         rtnl_lock();
>         mutex_lock(&xs->mutex);
>         if (xs->state != XSK_READY) {
> --
> 2.40.1
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind
  2023-07-04  2:31 ` Jason Wang
@ 2023-07-04  9:16   ` Daniel Borkmann
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Borkmann @ 2023-07-04  9:16 UTC (permalink / raw)
  To: Jason Wang, Ilya Maximets
  Cc: netdev, bpf, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-doc, linux-kernel,
	Stefan Hajnoczi

On 7/4/23 4:31 AM, Jason Wang wrote:
> On Tue, Jul 4, 2023 at 1:53 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
>> A privileged process might create the socket and pass it to a
>> non-privileged process for later use.  However, that process will be
>> able to bind the socket to any network interface.  Even though it will
>> not be able to receive any traffic without modification of the BPF map,
>> the situation is not ideal.
>>
>> Sockets already have a mechanism that can be used to restrict what
>> interface they can be attached to.  That is SO_BINDTODEVICE.
>>
>> To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW.
>>
>> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
>> workflow when non-privileged process is using AF_XDP.
>>
>> The intended workflow is following:
>>
>>    1. First process creates a bare socket with socket(AF_XDP, ...).
>>    2. First process loads the XSK program to the interface.
>>    3. First process adds the socket fd to a BPF map.
>>    4. First process ties socket fd to a particular interface using
>>       SO_BINDTODEVICE.
>>    5. First process sends socket fd to a second process.
>>    6. Second process allocates UMEM.
>>    7. Second process binds socket to the interface with bind(...).
>>    8. Second process sends/receives the traffic.
>>
>> All the steps above are possible today if the first process is
>> privileged and the second one has sufficient RLIMIT_MEMLOCK and no
>> capabilities.  However, the second process will be able to bind the
>> socket to any interface it wants on step 7 and send traffic from it.
>> With the proposed change, the second process will be able to bind
>> the socket only to a specific interface chosen by the first process
>> at step 4.
>>
>> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> Is this a stable material or not?

To me this is a bug rather than 'feature', so I applied it to bpf tree and
also added Fixes tag. Thanks everyone!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-04  9:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-03 17:53 [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind Ilya Maximets
2023-07-03 21:19 ` John Fastabend
2023-07-04  2:31 ` Jason Wang
2023-07-04  9:16   ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).