netdev.vger.kernel.org archive mirror
* Re: Netlink mmap tx security?
       [not found] <CALCETrWsCqy7KEKPTOeHjrO1c2JE1E_seO_J+yA=sYFhS11NXQ@mail.gmail.com>
@ 2014-05-12 21:08 ` Andy Lutomirski
  2014-10-11 22:29   ` Andy Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2014-05-12 21:08 UTC (permalink / raw)
  To: David S. Miller, Network Development, Linus Torvalds, Patrick McHardy

[moving to netdev -- this is much lower impact than I thought, since
you can't set up a netlink mmap ring without global CAP_NET_ADMIN]

On Wed, May 7, 2014 at 5:46 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> Disclaimer: I haven't tried to write a proof-of-concept, so I could be
> wrong here.
>
>     /* Netlink messages are validated by the receiver before processing.
>      * In order to avoid userspace changing the contents of the message
>      * after validation, the socket and the ring may only be used by a
>      * single process, otherwise we fall back to copying.
>      */
>     if (atomic_long_read(&sk->sk_socket->file->f_count) > 2 ||
>         atomic_read(&nlk->mapped) > 1)
>         excl = false;
>
> How is this possibly safe?  I think it's broken for at least three reasons:
>
> 1. Shouldn't that be atomic_long_read(&f_count) > 1?
>
> 2. threads
>
> 3. process_vm_writev
>
> I wouldn't be surprised if RDMA and AIO also break this assumption.
>
> Does anything rely on mmapped netlink tx being fast?  If not, can this
> code just be deleted?
>
> For that matter, does anything rely on mmapped netlink tx at all?
>
> (On a non-KASLR, non-SMEP system, this probably gives reliable kernel
> code execution.)
>
> For that matter, netlink_mmap_sendmsg also appears to be vulnerable to a
> TOCTOU attack via hdr.
>
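The three failure modes listed above (threads, process_vm_writev, RDMA/AIO) all have the same shape: data in a shared writable mapping is validated once and then trusted. A minimal standalone C sketch of that check-then-use hazard (illustrative only — the struct and helper names here are invented for the example, not kernel code):

```c
#include <string.h>

/* Toy stand-in for a netlink frame living in a shared, writable mapping. */
struct msg {
	unsigned int len;
	char data[32];
};

/* "Receiver validates before processing" — the check the TX ring relied on. */
static int validate(const struct msg *m)
{
	return m->len <= sizeof(m->data);
}

/* Deterministic replay of the race: check, concurrent write, then use. */
int toctou_demo(void)
{
	struct msg shared = { .len = 8 };
	memcpy(shared.data, "harmless", 8);

	if (!validate(&shared))
		return -1;	/* passed validation: looks safe */

	/* Another thread of control (a second thread, process_vm_writev,
	 * AIO, RDMA...) rewrites the mapping between check and use: */
	shared.len = 1u << 30;

	/* The consumer now holds attacker-controlled, unvalidated data;
	 * re-checking shows the earlier verdict no longer holds. */
	return validate(&shared) ? 0 : 1;
}
```

In the real code the second write comes from userspace racing the kernel, so no amount of re-checking helps unless the kernel copies the data first — which is exactly the fallback the `excl` heuristic was trying to avoid.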

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Netlink mmap tx security?
  2014-05-12 21:08 ` Netlink mmap tx security? Andy Lutomirski
@ 2014-10-11 22:29   ` Andy Lutomirski
  2014-10-11 23:09     ` David Miller
  2014-10-14 19:19     ` David Miller
  0 siblings, 2 replies; 26+ messages in thread
From: Andy Lutomirski @ 2014-10-11 22:29 UTC (permalink / raw)
  To: Linus Torvalds, David S. Miller, Patrick McHardy, Network Development

On May 12, 2014 3:08 PM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>
> [moving to netdev -- this is much lower impact than I thought, since
> you can't set up a netlink mmap ring without global CAP_NET_ADMIN]

Did anything ever happen here?  Despite not being a privilege
escalation in the normal sense, it's still a bug, and it's still a
fairly easy bypass of module signatures.


--Andy

>
> On Wed, May 7, 2014 at 5:46 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > Disclaimer: I haven't tried to write a proof-of-concept, so I could be
> > wrong here.
> >
> >     /* Netlink messages are validated by the receiver before processing.
> >      * In order to avoid userspace changing the contents of the message
> >      * after validation, the socket and the ring may only be used by a
> >      * single process, otherwise we fall back to copying.
> >      */
> >     if (atomic_long_read(&sk->sk_socket->file->f_count) > 2 ||
> >         atomic_read(&nlk->mapped) > 1)
> >         excl = false;
> >
> > How is this possibly safe?  I think it's broken for at least three reasons:
> >
> > 1. Shouldn't that be atomic_long_read(&f_count) > 1?
> >
> > 2. threads
> >
> > 3. process_vm_writev
> >
> > I wouldn't be surprised if RDMA and AIO also break this assumption.
> >
> > Does anything rely on mmapped netlink tx being fast?  If not, can this
> > code just be deleted?
> >
> > For that matter, does anything rely on mmapped netlink tx at all?
> >
> > (On a non-KASLR, non-SMEP system, this probably gives reliable kernel
> > code execution.)
> >
> > For that matter, netlink_mmap_sendmsg also appears to be vulnerable to a
> > TOCTOU attack via hdr.
> >


* Re: Netlink mmap tx security?
  2014-10-11 22:29   ` Andy Lutomirski
@ 2014-10-11 23:09     ` David Miller
  2014-10-14 19:19     ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2014-10-11 23:09 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev

From: Andy Lutomirski <luto@amacapital.net>
Date: Sat, 11 Oct 2014 15:29:17 -0700

> On May 12, 2014 3:08 PM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>
>> [moving to netdev -- this is much lower impact than I thought, since
>> you can't set up a netlink mmap ring without global CAP_NET_ADMIN]
> 
> Did anything ever happen here?  Despite not being a privilege
> escalation in the normal sense, it's still a bug, and it's still a
> fairly easy bypass of module signatures.

We definitely still need to address this, and since Patrick has still
not responded on this issue after being given 5 months to do so, the
only reasonable path is to get rid of this feature altogether.

Thanks for reminding me.


* Re: Netlink mmap tx security?
  2014-10-11 22:29   ` Andy Lutomirski
  2014-10-11 23:09     ` David Miller
@ 2014-10-14 19:19     ` David Miller
  2014-10-14 19:33       ` Andy Lutomirski
  1 sibling, 1 reply; 26+ messages in thread
From: David Miller @ 2014-10-14 19:19 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev

From: Andy Lutomirski <luto@amacapital.net>
Date: Sat, 11 Oct 2014 15:29:17 -0700

> On May 12, 2014 3:08 PM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>
>> [moving to netdev -- this is much lower impact than I thought, since
>> you can't set up a netlink mmap ring without global CAP_NET_ADMIN]
> 
> Did anything ever happen here?  Despite not being a privilege
> escalation in the normal sense, it's still a bug, and it's still a
> fairly easy bypass of module signatures.

Andy, please review:

====================
[PATCH] netlink: Remove TX mmap support.

There is no reasonable manner in which to absolutely make sure that another
thread of control cannot write to the pages in the mmap()'d area and thus
make sure that netlink messages do not change underneath us after we've
performed verifications.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 135 ++---------------------------------------------
 1 file changed, 5 insertions(+), 130 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c416725..771e6c0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -289,11 +289,6 @@ static bool netlink_rx_is_mmaped(struct sock *sk)
 	return nlk_sk(sk)->rx_ring.pg_vec != NULL;
 }
 
-static bool netlink_tx_is_mmaped(struct sock *sk)
-{
-	return nlk_sk(sk)->tx_ring.pg_vec != NULL;
-}
-
 static __pure struct page *pgvec_to_page(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
@@ -662,13 +657,6 @@ static unsigned int netlink_poll(struct file *file, struct socket *sock,
 	}
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
 
-	spin_lock_bh(&sk->sk_write_queue.lock);
-	if (nlk->tx_ring.pg_vec) {
-		if (netlink_current_frame(&nlk->tx_ring, NL_MMAP_STATUS_UNUSED))
-			mask |= POLLOUT | POLLWRNORM;
-	}
-	spin_unlock_bh(&sk->sk_write_queue.lock);
-
 	return mask;
 }
 
@@ -698,104 +686,6 @@ static void netlink_ring_setup_skb(struct sk_buff *skb, struct sock *sk,
 	NETLINK_CB(skb).sk = sk;
 }
 
-static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
-				u32 dst_portid, u32 dst_group,
-				struct sock_iocb *siocb)
-{
-	struct netlink_sock *nlk = nlk_sk(sk);
-	struct netlink_ring *ring;
-	struct nl_mmap_hdr *hdr;
-	struct sk_buff *skb;
-	unsigned int maxlen;
-	bool excl = true;
-	int err = 0, len = 0;
-
-	/* Netlink messages are validated by the receiver before processing.
-	 * In order to avoid userspace changing the contents of the message
-	 * after validation, the socket and the ring may only be used by a
-	 * single process, otherwise we fall back to copying.
-	 */
-	if (atomic_long_read(&sk->sk_socket->file->f_count) > 2 ||
-	    atomic_read(&nlk->mapped) > 1)
-		excl = false;
-
-	mutex_lock(&nlk->pg_vec_lock);
-
-	ring   = &nlk->tx_ring;
-	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
-
-	do {
-		hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID);
-		if (hdr == NULL) {
-			if (!(msg->msg_flags & MSG_DONTWAIT) &&
-			    atomic_read(&nlk->tx_ring.pending))
-				schedule();
-			continue;
-		}
-		if (hdr->nm_len > maxlen) {
-			err = -EINVAL;
-			goto out;
-		}
-
-		netlink_frame_flush_dcache(hdr);
-
-		if (likely(dst_portid == 0 && dst_group == 0 && excl)) {
-			skb = alloc_skb_head(GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			sock_hold(sk);
-			netlink_ring_setup_skb(skb, sk, ring, hdr);
-			NETLINK_CB(skb).flags |= NETLINK_SKB_TX;
-			__skb_put(skb, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED);
-			atomic_inc(&ring->pending);
-		} else {
-			skb = alloc_skb(hdr->nm_len, GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			__skb_put(skb, hdr->nm_len);
-			memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
-		}
-
-		netlink_increment_head(ring);
-
-		NETLINK_CB(skb).portid	  = nlk->portid;
-		NETLINK_CB(skb).dst_group = dst_group;
-		NETLINK_CB(skb).creds	  = siocb->scm->creds;
-
-		err = security_netlink_send(sk, skb);
-		if (err) {
-			kfree_skb(skb);
-			goto out;
-		}
-
-		if (unlikely(dst_group)) {
-			atomic_inc(&skb->users);
-			netlink_broadcast(sk, skb, dst_portid, dst_group,
-					  GFP_KERNEL);
-		}
-		err = netlink_unicast(sk, skb, dst_portid,
-				      msg->msg_flags & MSG_DONTWAIT);
-		if (err < 0)
-			goto out;
-		len += err;
-
-	} while (hdr != NULL ||
-		 (!(msg->msg_flags & MSG_DONTWAIT) &&
-		  atomic_read(&nlk->tx_ring.pending)));
-
-	if (len > 0)
-		err = len;
-out:
-	mutex_unlock(&nlk->pg_vec_lock);
-	return err;
-}
-
 static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct nl_mmap_hdr *hdr;
@@ -842,10 +732,8 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
 #else /* CONFIG_NETLINK_MMAP */
 #define netlink_skb_is_mmaped(skb)	false
 #define netlink_rx_is_mmaped(sk)	false
-#define netlink_tx_is_mmaped(sk)	false
 #define netlink_mmap			sock_no_mmap
 #define netlink_poll			datagram_poll
-#define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, siocb)	0
 #endif /* CONFIG_NETLINK_MMAP */
 
 static void netlink_skb_destructor(struct sk_buff *skb)
@@ -864,16 +752,11 @@ static void netlink_skb_destructor(struct sk_buff *skb)
 		hdr = netlink_mmap_hdr(skb);
 		sk = NETLINK_CB(skb).sk;
 
-		if (NETLINK_CB(skb).flags & NETLINK_SKB_TX) {
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
-			ring = &nlk_sk(sk)->tx_ring;
-		} else {
-			if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
-				hdr->nm_len = 0;
-				netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
-			}
-			ring = &nlk_sk(sk)->rx_ring;
+		if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
+			hdr->nm_len = 0;
+			netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
 		}
+		ring = &nlk_sk(sk)->rx_ring;
 
 		WARN_ON(atomic_read(&ring->pending) == 0);
 		atomic_dec(&ring->pending);
@@ -2165,8 +2048,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 #ifdef CONFIG_NETLINK_MMAP
-	case NETLINK_RX_RING:
-	case NETLINK_TX_RING: {
+	case NETLINK_RX_RING: {
 		struct nl_mmap_req req;
 
 		/* Rings might consume more memory than queue limits, require
@@ -2295,13 +2177,6 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			goto out;
 	}
 
-	if (netlink_tx_is_mmaped(sk) &&
-	    msg->msg_iov->iov_base == NULL) {
-		err = netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group,
-					   siocb);
-		goto out;
-	}
-
 	err = -EMSGSIZE;
 	if (len > sk->sk_sndbuf - 32)
 		goto out;
-- 
1.7.11.7


* Re: Netlink mmap tx security?
  2014-10-14 19:19     ` David Miller
@ 2014-10-14 19:33       ` Andy Lutomirski
  2014-10-14 20:00         ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2014-10-14 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: Linus Torvalds, Patrick McHardy, Network Development

On Tue, Oct 14, 2014 at 12:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Sat, 11 Oct 2014 15:29:17 -0700
>
>> On May 12, 2014 3:08 PM, "Andy Lutomirski" <luto@amacapital.net> wrote:
>>>
>>> [moving to netdev -- this is much lower impact than I thought, since
>>> you can't set up a netlink mmap ring without global CAP_NET_ADMIN]
>>
>> Did anything ever happen here?  Despite not being a privilege
>> escalation in the normal sense, it's still a bug, and it's still a
>> fairly easy bypass of module signatures.
>
> Andy, please review:
>
> ====================
> [PATCH] netlink: Remove TX mmap support.
>
> There is no reasonable manner in which to absolutely make sure that another
> thread of control cannot write to the pages in the mmap()'d area and thus
> make sure that netlink messages do not change underneath us after we've
> performed verifications.

For full honesty, there is now the machinery developed for memfd
sealing, but I can't imagine that this is ever faster than just
copying the buffer.
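The sealing machinery referred to here can be sketched from userspace: once F_SEAL_WRITE is applied to a memfd, neither write() nor any new writable shared mapping can change the contents, which is precisely the immutability guarantee the TX ring lacked. A hedged, Linux-only sketch (assumes memfd_create and fcntl file sealing are available; not something the thread actually proposes implementing):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/* Returns 0 if sealing blocked both write paths as expected, -1 otherwise. */
int seal_blocks_writes(void)
{
	int fd = syscall(SYS_memfd_create, "demo", MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	const char payload[] = "netlink payload";
	if (write(fd, payload, sizeof(payload)) != (ssize_t)sizeof(payload))
		return -1;

	/* Seal the contents: after this, the data cannot change through
	 * any interface, so a consumer may validate it exactly once. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SEAL) < 0)
		return -1;

	/* A write attempt after sealing fails with EPERM... */
	if (write(fd, "x", 1) >= 0)
		return -1;
	int write_blocked = (errno == EPERM);

	/* ...and so does creating a new writable shared mapping. */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	int mmap_blocked = (p == MAP_FAILED);

	close(fd);
	return (write_blocked && mmap_blocked) ? 0 : -1;
}
```

As Andy notes, though, requiring a seal/unseal round trip per message would almost certainly cost more than the single copy it saves.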

I think that the NETLINK_SKB_TX declaration in include/linux/netlink.h
should probably go, too.  And there's the last parameter to
netlink_set_ring, too, and possibly even the nlk->tx_ring struct
itself.

--Andy

>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/netlink/af_netlink.c | 135 ++---------------------------------------------
>  1 file changed, 5 insertions(+), 130 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index c416725..771e6c0 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -289,11 +289,6 @@ static bool netlink_rx_is_mmaped(struct sock *sk)
>         return nlk_sk(sk)->rx_ring.pg_vec != NULL;
>  }
>
> -static bool netlink_tx_is_mmaped(struct sock *sk)
> -{
> -       return nlk_sk(sk)->tx_ring.pg_vec != NULL;
> -}
> -
>  static __pure struct page *pgvec_to_page(const void *addr)
>  {
>         if (is_vmalloc_addr(addr))
> @@ -662,13 +657,6 @@ static unsigned int netlink_poll(struct file *file, struct socket *sock,
>         }
>         spin_unlock_bh(&sk->sk_receive_queue.lock);
>
> -       spin_lock_bh(&sk->sk_write_queue.lock);
> -       if (nlk->tx_ring.pg_vec) {
> -               if (netlink_current_frame(&nlk->tx_ring, NL_MMAP_STATUS_UNUSED))
> -                       mask |= POLLOUT | POLLWRNORM;
> -       }
> -       spin_unlock_bh(&sk->sk_write_queue.lock);
> -
>         return mask;
>  }
>
> @@ -698,104 +686,6 @@ static void netlink_ring_setup_skb(struct sk_buff *skb, struct sock *sk,
>         NETLINK_CB(skb).sk = sk;
>  }
>
> -static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
> -                               u32 dst_portid, u32 dst_group,
> -                               struct sock_iocb *siocb)
> -{
> -       struct netlink_sock *nlk = nlk_sk(sk);
> -       struct netlink_ring *ring;
> -       struct nl_mmap_hdr *hdr;
> -       struct sk_buff *skb;
> -       unsigned int maxlen;
> -       bool excl = true;
> -       int err = 0, len = 0;
> -
> -       /* Netlink messages are validated by the receiver before processing.
> -        * In order to avoid userspace changing the contents of the message
> -        * after validation, the socket and the ring may only be used by a
> -        * single process, otherwise we fall back to copying.
> -        */
> -       if (atomic_long_read(&sk->sk_socket->file->f_count) > 2 ||
> -           atomic_read(&nlk->mapped) > 1)
> -               excl = false;
> -
> -       mutex_lock(&nlk->pg_vec_lock);
> -
> -       ring   = &nlk->tx_ring;
> -       maxlen = ring->frame_size - NL_MMAP_HDRLEN;
> -
> -       do {
> -               hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID);
> -               if (hdr == NULL) {
> -                       if (!(msg->msg_flags & MSG_DONTWAIT) &&
> -                           atomic_read(&nlk->tx_ring.pending))
> -                               schedule();
> -                       continue;
> -               }
> -               if (hdr->nm_len > maxlen) {
> -                       err = -EINVAL;
> -                       goto out;
> -               }
> -
> -               netlink_frame_flush_dcache(hdr);
> -
> -               if (likely(dst_portid == 0 && dst_group == 0 && excl)) {
> -                       skb = alloc_skb_head(GFP_KERNEL);
> -                       if (skb == NULL) {
> -                               err = -ENOBUFS;
> -                               goto out;
> -                       }
> -                       sock_hold(sk);
> -                       netlink_ring_setup_skb(skb, sk, ring, hdr);
> -                       NETLINK_CB(skb).flags |= NETLINK_SKB_TX;
> -                       __skb_put(skb, hdr->nm_len);
> -                       netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED);
> -                       atomic_inc(&ring->pending);
> -               } else {
> -                       skb = alloc_skb(hdr->nm_len, GFP_KERNEL);
> -                       if (skb == NULL) {
> -                               err = -ENOBUFS;
> -                               goto out;
> -                       }
> -                       __skb_put(skb, hdr->nm_len);
> -                       memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
> -                       netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> -               }
> -
> -               netlink_increment_head(ring);
> -
> -               NETLINK_CB(skb).portid    = nlk->portid;
> -               NETLINK_CB(skb).dst_group = dst_group;
> -               NETLINK_CB(skb).creds     = siocb->scm->creds;
> -
> -               err = security_netlink_send(sk, skb);
> -               if (err) {
> -                       kfree_skb(skb);
> -                       goto out;
> -               }
> -
> -               if (unlikely(dst_group)) {
> -                       atomic_inc(&skb->users);
> -                       netlink_broadcast(sk, skb, dst_portid, dst_group,
> -                                         GFP_KERNEL);
> -               }
> -               err = netlink_unicast(sk, skb, dst_portid,
> -                                     msg->msg_flags & MSG_DONTWAIT);
> -               if (err < 0)
> -                       goto out;
> -               len += err;
> -
> -       } while (hdr != NULL ||
> -                (!(msg->msg_flags & MSG_DONTWAIT) &&
> -                 atomic_read(&nlk->tx_ring.pending)));
> -
> -       if (len > 0)
> -               err = len;
> -out:
> -       mutex_unlock(&nlk->pg_vec_lock);
> -       return err;
> -}
> -
>  static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb)
>  {
>         struct nl_mmap_hdr *hdr;
> @@ -842,10 +732,8 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
>  #else /* CONFIG_NETLINK_MMAP */
>  #define netlink_skb_is_mmaped(skb)     false
>  #define netlink_rx_is_mmaped(sk)       false
> -#define netlink_tx_is_mmaped(sk)       false
>  #define netlink_mmap                   sock_no_mmap
>  #define netlink_poll                   datagram_poll
> -#define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, siocb)    0
>  #endif /* CONFIG_NETLINK_MMAP */
>
>  static void netlink_skb_destructor(struct sk_buff *skb)
> @@ -864,16 +752,11 @@ static void netlink_skb_destructor(struct sk_buff *skb)
>                 hdr = netlink_mmap_hdr(skb);
>                 sk = NETLINK_CB(skb).sk;
>
> -               if (NETLINK_CB(skb).flags & NETLINK_SKB_TX) {
> -                       netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> -                       ring = &nlk_sk(sk)->tx_ring;
> -               } else {
> -                       if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
> -                               hdr->nm_len = 0;
> -                               netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
> -                       }
> -                       ring = &nlk_sk(sk)->rx_ring;
> +               if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
> +                       hdr->nm_len = 0;
> +                       netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
>                 }
> +               ring = &nlk_sk(sk)->rx_ring;
>
>                 WARN_ON(atomic_read(&ring->pending) == 0);
>                 atomic_dec(&ring->pending);
> @@ -2165,8 +2048,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
>                 err = 0;
>                 break;
>  #ifdef CONFIG_NETLINK_MMAP
> -       case NETLINK_RX_RING:
> -       case NETLINK_TX_RING: {
> +       case NETLINK_RX_RING: {
>                 struct nl_mmap_req req;
>
>                 /* Rings might consume more memory than queue limits, require
> @@ -2295,13 +2177,6 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
>                         goto out;
>         }
>
> -       if (netlink_tx_is_mmaped(sk) &&
> -           msg->msg_iov->iov_base == NULL) {
> -               err = netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group,
> -                                          siocb);
> -               goto out;
> -       }
> -
>         err = -EMSGSIZE;
>         if (len > sk->sk_sndbuf - 32)
>                 goto out;
> --
> 1.7.11.7
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: Netlink mmap tx security?
  2014-10-14 19:33       ` Andy Lutomirski
@ 2014-10-14 20:00         ` David Miller
  2014-10-14 22:16           ` Andy Lutomirski
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2014-10-14 20:00 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 12:33:43 -0700

> For full honesty, there is now the machinery developed for memfd
> sealing, but I can't imagine that this is ever faster than just
> copying the buffer.

I don't have much motivation to even check if it's a worthwhile
pursuit at this point.  

Someone who wants to can :-)

> I think that the NETLINK_SKB_TX declaration in include/linux/netlink.h
> should probably go, too.  And there's the last parameter to
> netlink_set_ring, too, and possibly even the nlk->tx_ring struct
> itself.

Agreed on all counts, here is the new patch:

====================
[PATCH] netlink: Remove TX mmap support.

There is no reasonable manner in which to absolutely make sure that another
thread of control cannot write to the pages in the mmap()'d area and thus
make sure that netlink messages do not change underneath us after we've
performed verifications.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netlink.h  |   5 +-
 net/netlink/af_netlink.c | 161 +++++------------------------------------------
 net/netlink/af_netlink.h |   1 -
 3 files changed, 16 insertions(+), 151 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 9e572da..57080a9 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -17,9 +17,8 @@ static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
 
 enum netlink_skb_flags {
 	NETLINK_SKB_MMAPED	= 0x1,	/* Packet data is mmaped */
-	NETLINK_SKB_TX		= 0x2,	/* Packet was sent by userspace */
-	NETLINK_SKB_DELIVERED	= 0x4,	/* Packet was delivered */
-	NETLINK_SKB_DST		= 0x8,	/* Dst set in sendto or sendmsg */
+	NETLINK_SKB_DELIVERED	= 0x2,	/* Packet was delivered */
+	NETLINK_SKB_DST		= 0x4,	/* Dst set in sendto or sendmsg */
 };
 
 struct netlink_skb_parms {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c416725..07ef0c9 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -289,11 +289,6 @@ static bool netlink_rx_is_mmaped(struct sock *sk)
 	return nlk_sk(sk)->rx_ring.pg_vec != NULL;
 }
 
-static bool netlink_tx_is_mmaped(struct sock *sk)
-{
-	return nlk_sk(sk)->tx_ring.pg_vec != NULL;
-}
-
 static __pure struct page *pgvec_to_page(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
@@ -359,7 +354,7 @@ err1:
 }
 
 static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
-			    bool closing, bool tx_ring)
+			    bool closing)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct netlink_ring *ring;
@@ -368,8 +363,8 @@ static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
 	unsigned int order = 0;
 	int err;
 
-	ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
-	queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
+	ring  = &nlk->rx_ring;
+	queue = &sk->sk_receive_queue;
 
 	if (!closing) {
 		if (atomic_read(&nlk->mapped))
@@ -476,11 +471,9 @@ static int netlink_mmap(struct file *file, struct socket *sock,
 	mutex_lock(&nlk->pg_vec_lock);
 
 	expected = 0;
-	for (ring = &nlk->rx_ring; ring <= &nlk->tx_ring; ring++) {
-		if (ring->pg_vec == NULL)
-			continue;
+	ring = &nlk->rx_ring;
+	if (ring->pg_vec)
 		expected += ring->pg_vec_len * ring->pg_vec_pages * PAGE_SIZE;
-	}
 
 	if (expected == 0)
 		goto out;
@@ -490,10 +483,8 @@ static int netlink_mmap(struct file *file, struct socket *sock,
 		goto out;
 
 	start = vma->vm_start;
-	for (ring = &nlk->rx_ring; ring <= &nlk->tx_ring; ring++) {
-		if (ring->pg_vec == NULL)
-			continue;
-
+	ring = &nlk->rx_ring;
+	if (ring->pg_vec) {
 		for (i = 0; i < ring->pg_vec_len; i++) {
 			struct page *page;
 			void *kaddr = ring->pg_vec[i];
@@ -662,13 +653,6 @@ static unsigned int netlink_poll(struct file *file, struct socket *sock,
 	}
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
 
-	spin_lock_bh(&sk->sk_write_queue.lock);
-	if (nlk->tx_ring.pg_vec) {
-		if (netlink_current_frame(&nlk->tx_ring, NL_MMAP_STATUS_UNUSED))
-			mask |= POLLOUT | POLLWRNORM;
-	}
-	spin_unlock_bh(&sk->sk_write_queue.lock);
-
 	return mask;
 }
 
@@ -698,104 +682,6 @@ static void netlink_ring_setup_skb(struct sk_buff *skb, struct sock *sk,
 	NETLINK_CB(skb).sk = sk;
 }
 
-static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
-				u32 dst_portid, u32 dst_group,
-				struct sock_iocb *siocb)
-{
-	struct netlink_sock *nlk = nlk_sk(sk);
-	struct netlink_ring *ring;
-	struct nl_mmap_hdr *hdr;
-	struct sk_buff *skb;
-	unsigned int maxlen;
-	bool excl = true;
-	int err = 0, len = 0;
-
-	/* Netlink messages are validated by the receiver before processing.
-	 * In order to avoid userspace changing the contents of the message
-	 * after validation, the socket and the ring may only be used by a
-	 * single process, otherwise we fall back to copying.
-	 */
-	if (atomic_long_read(&sk->sk_socket->file->f_count) > 2 ||
-	    atomic_read(&nlk->mapped) > 1)
-		excl = false;
-
-	mutex_lock(&nlk->pg_vec_lock);
-
-	ring   = &nlk->tx_ring;
-	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
-
-	do {
-		hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID);
-		if (hdr == NULL) {
-			if (!(msg->msg_flags & MSG_DONTWAIT) &&
-			    atomic_read(&nlk->tx_ring.pending))
-				schedule();
-			continue;
-		}
-		if (hdr->nm_len > maxlen) {
-			err = -EINVAL;
-			goto out;
-		}
-
-		netlink_frame_flush_dcache(hdr);
-
-		if (likely(dst_portid == 0 && dst_group == 0 && excl)) {
-			skb = alloc_skb_head(GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			sock_hold(sk);
-			netlink_ring_setup_skb(skb, sk, ring, hdr);
-			NETLINK_CB(skb).flags |= NETLINK_SKB_TX;
-			__skb_put(skb, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED);
-			atomic_inc(&ring->pending);
-		} else {
-			skb = alloc_skb(hdr->nm_len, GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			__skb_put(skb, hdr->nm_len);
-			memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
-		}
-
-		netlink_increment_head(ring);
-
-		NETLINK_CB(skb).portid	  = nlk->portid;
-		NETLINK_CB(skb).dst_group = dst_group;
-		NETLINK_CB(skb).creds	  = siocb->scm->creds;
-
-		err = security_netlink_send(sk, skb);
-		if (err) {
-			kfree_skb(skb);
-			goto out;
-		}
-
-		if (unlikely(dst_group)) {
-			atomic_inc(&skb->users);
-			netlink_broadcast(sk, skb, dst_portid, dst_group,
-					  GFP_KERNEL);
-		}
-		err = netlink_unicast(sk, skb, dst_portid,
-				      msg->msg_flags & MSG_DONTWAIT);
-		if (err < 0)
-			goto out;
-		len += err;
-
-	} while (hdr != NULL ||
-		 (!(msg->msg_flags & MSG_DONTWAIT) &&
-		  atomic_read(&nlk->tx_ring.pending)));
-
-	if (len > 0)
-		err = len;
-out:
-	mutex_unlock(&nlk->pg_vec_lock);
-	return err;
-}
-
 static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct nl_mmap_hdr *hdr;
@@ -842,10 +728,8 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
 #else /* CONFIG_NETLINK_MMAP */
 #define netlink_skb_is_mmaped(skb)	false
 #define netlink_rx_is_mmaped(sk)	false
-#define netlink_tx_is_mmaped(sk)	false
 #define netlink_mmap			sock_no_mmap
 #define netlink_poll			datagram_poll
-#define netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group, siocb)	0
 #endif /* CONFIG_NETLINK_MMAP */
 
 static void netlink_skb_destructor(struct sk_buff *skb)
@@ -864,16 +748,11 @@ static void netlink_skb_destructor(struct sk_buff *skb)
 		hdr = netlink_mmap_hdr(skb);
 		sk = NETLINK_CB(skb).sk;
 
-		if (NETLINK_CB(skb).flags & NETLINK_SKB_TX) {
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
-			ring = &nlk_sk(sk)->tx_ring;
-		} else {
-			if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
-				hdr->nm_len = 0;
-				netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
-			}
-			ring = &nlk_sk(sk)->rx_ring;
+		if (!(NETLINK_CB(skb).flags & NETLINK_SKB_DELIVERED)) {
+			hdr->nm_len = 0;
+			netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
 		}
+		ring = &nlk_sk(sk)->rx_ring;
 
 		WARN_ON(atomic_read(&ring->pending) == 0);
 		atomic_dec(&ring->pending);
@@ -921,10 +800,7 @@ static void netlink_sock_destruct(struct sock *sk)
 
 		memset(&req, 0, sizeof(req));
 		if (nlk->rx_ring.pg_vec)
-			netlink_set_ring(sk, &req, true, false);
-		memset(&req, 0, sizeof(req));
-		if (nlk->tx_ring.pg_vec)
-			netlink_set_ring(sk, &req, true, true);
+			netlink_set_ring(sk, &req, true);
 	}
 #endif /* CONFIG_NETLINK_MMAP */
 
@@ -2165,8 +2041,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		err = 0;
 		break;
 #ifdef CONFIG_NETLINK_MMAP
-	case NETLINK_RX_RING:
-	case NETLINK_TX_RING: {
+	case NETLINK_RX_RING: {
 		struct nl_mmap_req req;
 
 		/* Rings might consume more memory than queue limits, require
@@ -2178,8 +2053,7 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 			return -EINVAL;
 		if (copy_from_user(&req, optval, sizeof(req)))
 			return -EFAULT;
-		err = netlink_set_ring(sk, &req, false,
-				       optname == NETLINK_TX_RING);
+		err = netlink_set_ring(sk, &req, false);
 		break;
 	}
 #endif /* CONFIG_NETLINK_MMAP */
@@ -2295,13 +2169,6 @@ static int netlink_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			goto out;
 	}
 
-	if (netlink_tx_is_mmaped(sk) &&
-	    msg->msg_iov->iov_base == NULL) {
-		err = netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group,
-					   siocb);
-		goto out;
-	}
-
 	err = -EMSGSIZE;
 	if (len > sk->sk_sndbuf - 32)
 		goto out;
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index b20a173..4741b88 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -45,7 +45,6 @@ struct netlink_sock {
 #ifdef CONFIG_NETLINK_MMAP
 	struct mutex		pg_vec_lock;
 	struct netlink_ring	rx_ring;
-	struct netlink_ring	tx_ring;
 	atomic_t		mapped;
 #endif /* CONFIG_NETLINK_MMAP */
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Netlink mmap tx security?
  2014-10-14 20:00         ` David Miller
@ 2014-10-14 22:16           ` Andy Lutomirski
  2014-10-15  2:01             ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2014-10-14 22:16 UTC (permalink / raw)
  To: David Miller; +Cc: Linus Torvalds, Patrick McHardy, Network Development

On Tue, Oct 14, 2014 at 1:00 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 12:33:43 -0700
>
>> For full honesty, there is now the machinery developed for memfd
>> sealing, but I can't imagine that this is ever faster than just
>> copying the buffer.
>
> I don't have much motivation to even check if it's a worthwhile
> pursuit at this point.
>
> Someone who wants to can :-)
>
>> I think that the NETLINK_SKB_TX declaration in include/linux/netlink.h
>> should probably go, too.  And there's the last parameter to
>> netlink_set_ring, too, and possibly even the nlk->tx_ring struct
>> itself.
>
> Agreed on all counts, here is the new patch:
>
> ====================
> [PATCH] netlink: Remove TX mmap support.
>
> There is no reasonable manner in which to absolutely make sure that another
> thread of control cannot write to the pages in the mmap()'d area and thus
> make sure that netlink messages do not change underneath us after we've
> performed verifications.
>
> Reported-by: Andy Lutomirski <luto@amacapital.net>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Looks sensible to me, but I have no idea how to test it.

It's at least remotely possible that there's something that assumes
that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
which would be unfortunate.

--Andy


* Re: Netlink mmap tx security?
  2014-10-14 22:16           ` Andy Lutomirski
@ 2014-10-15  2:01             ` David Miller
  2014-10-15  2:03               ` Andy Lutomirski
  2014-10-15 23:45               ` Netlink mmap tx security? Daniel Borkmann
  0 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2014-10-15  2:01 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev, tgraf

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 15:16:46 -0700

> It's at least remotely possible that there's something that assumes
> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
> which would be unfortunate.

I already found one such case, nlmon :-/

It also reminds me that I'll have to update
Documentation/networking/netlink_mmap.txt

Thomas, the context is that we have to remove NETLINK_TX_RING support
(there is absolutely no way whatsoever to reliably keep some thread of
control from modifying the underlying pages while we parse and
validate the netlink request).

I'd like to be able to do so while retaining NETLINK_RX_RING because
that works fine and is great for monitoring when the rate of events
is high.

But I already have found userland pieces of code, like nlmon, which
assume that if one is present then both must be present.

I really think this means I'll have to remove all of the netlink
mmap() support in order to prevent from breaking applications. :(

The other option is to keep NETLINK_TX_RING, but copy the data into
a kernel side buffer before acting upon it.
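The copy-then-validate option can be sketched in plain userspace C. This is a simplified illustration, not the kernel code: `struct frame` and `snapshot_frame()` are hypothetical names standing in for the mmap'd ring frame and the sendmsg() path. The point is that the length is sampled once and all validation and parsing happen on a private copy, so a concurrent writer to the shared pages can no longer influence what gets parsed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a user-mapped ring frame. */
struct frame {
	size_t len;
	unsigned char data[256];
};

/* Snapshot the frame into private memory *before* validating or
 * parsing it.  The length is read exactly once, and the bounds check
 * applies to that single sample, never to the (still writable)
 * shared copy. */
static unsigned char *snapshot_frame(const struct frame *shared,
				     size_t maxlen, size_t *out_len)
{
	size_t len = shared->len;	/* single sample of the length */
	unsigned char *copy;

	if (len > maxlen)
		return NULL;
	copy = malloc(len ? len : 1);
	if (!copy)
		return NULL;
	memcpy(copy, shared->data, len);
	*out_len = len;
	return copy;	/* caller parses this copy, not the ring frame */
}
```

Once the copy exists, later writes to the shared frame are harmless, which is exactly why the unconditional-copy variant of TX mmap is safe where the zero-copy 'exclusive' path was not.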


* Re: Netlink mmap tx security?
  2014-10-15  2:01             ` David Miller
@ 2014-10-15  2:03               ` Andy Lutomirski
  2014-10-15  2:09                 ` David Miller
  2014-10-15 23:45               ` Netlink mmap tx security? Daniel Borkmann
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2014-10-15  2:03 UTC (permalink / raw)
  To: David Miller
  Cc: Linus Torvalds, Patrick McHardy, Network Development, Thomas Graf

On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 15:16:46 -0700
>
>> It's at least remotely possible that there's something that assumes
>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
>> which would be unfortunate.
>
> I already found one such case, nlmon :-/
>
> It also reminds me that I'll have to update
> Documentation/networking/netlink_mmap.txt
>
> Thomas, the context is that we have to remove NETLINK_TX_RING support
> (there is absolutely no way whatsoever to reliably keep some thread of
> control from modifying the underlying pages while we parse and
> validate the netlink request).
>
> I'd like to be able to do so while retaining NETLINK_RX_RING because
> that works fine and is great for monitoring when the rate of events
> is high.
>
> But I already have found userland pieces of code, like nlmon, which
> assume that if one is present then both must be present.
>
> I really think this means I'll have to remove all of the netlink
> mmap() support in order to prevent from breaking applications. :(
>
> The other option is to keep NETLINK_TX_RING, but copy the data into
> a kernel side buffer before acting upon it.

Option 3, which sucks but maybe not that badly: change the value of
NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
that.)

--Andy


* Re: Netlink mmap tx security?
  2014-10-15  2:03               ` Andy Lutomirski
@ 2014-10-15  2:09                 ` David Miller
  2014-10-16  6:45                   ` Daniel Borkmann
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2014-10-15  2:09 UTC (permalink / raw)
  To: luto; +Cc: torvalds, kaber, netdev, tgraf

From: Andy Lutomirski <luto@amacapital.net>
Date: Tue, 14 Oct 2014 19:03:11 -0700

> On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>> I really think this means I'll have to remove all of the netlink
>> mmap() support in order to prevent from breaking applications. :(
>>
>> The other option is to keep NETLINK_TX_RING, but copy the data into
>> a kernel side buffer before acting upon it.
> 
> Option 3, which sucks but maybe not that badly: change the value of
> NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
> that.)

That would work as well.

There are pros and cons to all of these approaches.

I was thinking that if we do the "TX mmap --> copy to kernel buffer"
approach, then if in the future we find a way to make it work
reliably, we can avoid the copy.  And frankly performance wise it's no
worse than what happens via normal sendmsg() calls.

And all applications using NETLINK_RX_RING keep working and keep
getting the performance boost.


* Re: Netlink mmap tx security?
  2014-10-15  2:01             ` David Miller
  2014-10-15  2:03               ` Andy Lutomirski
@ 2014-10-15 23:45               ` Daniel Borkmann
  2014-10-15 23:57                 ` David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Borkmann @ 2014-10-15 23:45 UTC (permalink / raw)
  To: David Miller; +Cc: luto, torvalds, kaber, netdev, tgraf

On 10/15/2014 04:01 AM, David Miller wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 15:16:46 -0700
>
>> It's at least remotely possible that there's something that assumes
>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
>> which would be unfortunate.
>
> I already found one such case, nlmon :-/

Hmm, can you elaborate? I currently don't think that nlmon cares
actually.


* Re: Netlink mmap tx security?
  2014-10-15 23:45               ` Netlink mmap tx security? Daniel Borkmann
@ 2014-10-15 23:57                 ` David Miller
  2014-10-15 23:58                   ` Andy Lutomirski
  2014-10-16  5:52                   ` Thomas Graf
  0 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2014-10-15 23:57 UTC (permalink / raw)
  To: dborkman; +Cc: luto, torvalds, kaber, netdev, tgraf

From: Daniel Borkmann <dborkman@redhat.com>
Date: Thu, 16 Oct 2014 01:45:22 +0200

> On 10/15/2014 04:01 AM, David Miller wrote:
>> From: Andy Lutomirski <luto@amacapital.net>
>> Date: Tue, 14 Oct 2014 15:16:46 -0700
>>
>>> It's at least remotely possible that there's something that assumes
>>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
>>> which would be unfortunate.
>>
>> I already found one such case, nlmon :-/
> 
> Hmm, can you elaborate? I currently don't think that nlmon cares
> actually.

nlmon cares, openvswitch cares, etc:

http://openvswitch.org/pipermail/dev/2013-December/034496.html


* Re: Netlink mmap tx security?
  2014-10-15 23:57                 ` David Miller
@ 2014-10-15 23:58                   ` Andy Lutomirski
  2014-10-16  3:34                     ` David Miller
  2014-10-16  5:52                   ` Thomas Graf
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2014-10-15 23:58 UTC (permalink / raw)
  To: David Miller
  Cc: Daniel Borkmann, Linus Torvalds, Patrick McHardy,
	Network Development, Thomas Graf

On Wed, Oct 15, 2014 at 4:57 PM, David Miller <davem@davemloft.net> wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Thu, 16 Oct 2014 01:45:22 +0200
>
>> On 10/15/2014 04:01 AM, David Miller wrote:
>>> From: Andy Lutomirski <luto@amacapital.net>
>>> Date: Tue, 14 Oct 2014 15:16:46 -0700
>>>
>>>> It's at least remotely possible that there's something that assumes
>>>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
>>>> which would be unfortunate.
>>>
>>> I already found one such case, nlmon :-/
>>
>> Hmm, can you elaborate? I currently don't think that nlmon cares
>> actually.
>
> nlmon cares, openvswitch cares, etc:

It'll still work, just slower, right?

+    if (setsockopt(sock->fd, SOL_NETLINK, NETLINK_RX_RING, &req,
+                   sizeof(req)) < 0
+        || setsockopt(sock->fd, SOL_NETLINK, NETLINK_TX_RING, &req,
+                      sizeof(req)) < 0) {
+        VLOG_INFO("mmap netlink is not supported");
+        return 0;
+    }
+

--Andy


* Re: Netlink mmap tx security?
  2014-10-15 23:58                   ` Andy Lutomirski
@ 2014-10-16  3:34                     ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2014-10-16  3:34 UTC (permalink / raw)
  To: luto; +Cc: dborkman, torvalds, kaber, netdev, tgraf

From: Andy Lutomirski <luto@amacapital.net>
Date: Wed, 15 Oct 2014 16:58:39 -0700

> On Wed, Oct 15, 2014 at 4:57 PM, David Miller <davem@davemloft.net> wrote:
>> From: Daniel Borkmann <dborkman@redhat.com>
>> Date: Thu, 16 Oct 2014 01:45:22 +0200
>>
>>> On 10/15/2014 04:01 AM, David Miller wrote:
>>>> From: Andy Lutomirski <luto@amacapital.net>
>>>> Date: Tue, 14 Oct 2014 15:16:46 -0700
>>>>
>>>>> It's at least remotely possible that there's something that assumes
>>>>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
>>>>> which would be unfortunate.
>>>>
>>>> I already found one such case, nlmon :-/
>>>
>>> Hmm, can you elaborate? I currently don't think that nlmon cares
>>> actually.
>>
>> nlmon cares, openvswitch cares, etc:
> 
> It'll still work, just slower, right?
> 
> +    if (setsockopt(sock->fd, SOL_NETLINK, NETLINK_RX_RING, &req,
> +                   sizeof(req)) < 0
> +        || setsockopt(sock->fd, SOL_NETLINK, NETLINK_TX_RING, &req,
> +                      sizeof(req)) < 0) {
> +        VLOG_INFO("mmap netlink is not supported");
> +        return 0;
> +    }

So NETLINK_RX_RING succeeds but NETLINK_TX_RING fails, so now the
recvmsg() path will take the RX ring code path because this code
doesn't clean up and undo the setsockopt().

This is the common coding paradigm I've seen in all NETLINK_*_RING
uses.
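The cleanup issue David describes reduces to a small decision function. This is a hypothetical userspace sketch (the booleans model the results of the two setsockopt() calls and the counter models tearing the RX ring back down); the real fix in an application would be to undo NETLINK_RX_RING whenever NETLINK_TX_RING fails, before falling back to plain syscalls.

```c
#include <stdbool.h>

/* rx_ok / tx_ok model whether the NETLINK_RX_RING and NETLINK_TX_RING
 * setsockopt() calls succeeded; *teardowns counts how often the RX
 * ring had to be undone.  Only commit to the mmap path when both
 * rings are up. */
static bool use_mmap_path(bool rx_ok, bool tx_ok, int *teardowns)
{
	if (!rx_ok)
		return false;		/* nothing set up, plain syscalls */
	if (!tx_ok) {
		(*teardowns)++;		/* undo RX so recvmsg() does not
					 * take a half-configured ring path */
		return false;
	}
	return true;
}
```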


* Re: Netlink mmap tx security?
  2014-10-15 23:57                 ` David Miller
  2014-10-15 23:58                   ` Andy Lutomirski
@ 2014-10-16  5:52                   ` Thomas Graf
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2014-10-16  5:52 UTC (permalink / raw)
  To: David Miller; +Cc: dborkman, luto, torvalds, kaber, netdev

On 10/15/14 at 07:57pm, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Thu, 16 Oct 2014 01:45:22 +0200
> 
> > On 10/15/2014 04:01 AM, David Miller wrote:
> >> From: Andy Lutomirski <luto@amacapital.net>
> >> Date: Tue, 14 Oct 2014 15:16:46 -0700
> >>
> >>> It's at least remotely possible that there's something that assumes
> >>> that the availability of NETLINK_RX_RING implies NETLINK_TX_RING,
> >>> which would be unfortunate.
> >>
> >> I already found one such case, nlmon :-/
> > 
> > Hmm, can you elaborate? I currently don't think that nlmon cares
> > actually.
> 
> nlmon cares, openvswitch cares, etc:
> 
> http://openvswitch.org/pipermail/dev/2013-December/034496.html

(Fortunately) the OVS patch has not been merged yet, because the number
of Netlink sockets created per vport in the current architecture
makes it a non-scalable approach.

I think introducing a NETLINK_RX_RING2 and having NETLINK_RX_RING fail
is not a bad way out of this.


* Re: Netlink mmap tx security?
  2014-10-15  2:09                 ` David Miller
@ 2014-10-16  6:45                   ` Daniel Borkmann
  2014-10-16  7:07                     ` Thomas Graf
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Borkmann @ 2014-10-16  6:45 UTC (permalink / raw)
  To: David Miller; +Cc: luto, torvalds, kaber, netdev, tgraf

On 10/15/2014 04:09 AM, David Miller wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 19:03:11 -0700
>
>> On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>>> I really think this means I'll have to remove all of the netlink
>>> mmap() support in order to prevent from breaking applications. :(
>>>
>>> The other option is to keep NETLINK_TX_RING, but copy the data into
>>> a kernel side buffer before acting upon it.
>>
>> Option 3, which sucks but maybe not that badly: change the value of
>> NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
>> that.)
>
> That would work as well.
>
> There are pros and cons to all of these approaches.
>
> I was thinking that if we do the "TX mmap --> copy to kernel buffer"
> approach, then if in the future we find a way to make it work
> reliably, we can avoid the copy.  And frankly performance wise it's no
> worse than what happens via normal sendmsg() calls.
>
> And all applications using NETLINK_RX_RING keep working and keep
> getting the performance boost.

That would be better, yes. It would avoid the kind of TPACKET_V*
API chaos we have in packet sockets if this could be fixed for netlink
eventually.


* Re: Netlink mmap tx security?
  2014-10-16  6:45                   ` Daniel Borkmann
@ 2014-10-16  7:07                     ` Thomas Graf
  2014-12-16 22:58                       ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2014-10-16  7:07 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, luto, torvalds, kaber, netdev

On 10/16/14 at 08:45am, Daniel Borkmann wrote:
> On 10/15/2014 04:09 AM, David Miller wrote:
> >That would work as well.
> >
> >There are pros and cons to all of these approaches.
> >
> >I was thinking that if we do the "TX mmap --> copy to kernel buffer"
> >approach, then if in the future we find a way to make it work
> >reliably, we can avoid the copy.  And frankly performance wise it's no
> >worse than what happens via normal sendmsg() calls.
> >
> >And all applications using NETLINK_RX_RING keep working and keep
> >getting the performance boost.
> 
> That would be better, yes. It would avoid the kind of TPACKET_V*
> API chaos we have in packet sockets if this could be fixed for netlink
> eventually.

Only saw the second part of Dave's message now. I agree that this
is even a better option.


* Re: Netlink mmap tx security?
  2014-10-16  7:07                     ` Thomas Graf
@ 2014-12-16 22:58                       ` David Miller
  2014-12-16 23:58                         ` Daniel Borkmann
  2014-12-17  0:02                         ` Eric Dumazet
  0 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2014-12-16 22:58 UTC (permalink / raw)
  To: tgraf; +Cc: dborkman, luto, torvalds, kaber, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 16 Oct 2014 08:07:53 +0100

> On 10/16/14 at 08:45am, Daniel Borkmann wrote:
>> On 10/15/2014 04:09 AM, David Miller wrote:
>> >That would work as well.
>> >
>> >There are pros and cons to all of these approaches.
>> >
>> >I was thinking that if we do the "TX mmap --> copy to kernel buffer"
>> >approach, then if in the future we find a way to make it work
>> >reliably, we can avoid the copy.  And frankly performance wise it's no
>> >worse than what happens via normal sendmsg() calls.
>> >
>> >And all applications using NETLINK_RX_RING keep working and keep
>> >getting the performance boost.
>> 
>> That would be better, yes. It would avoid the kind of TPACKET_V*
>> API chaos we have in packet sockets if this could be fixed for netlink
>> eventually.
> 
> Only saw the second part of Dave's message now. I agree that this
> is even a better option.

Sorry for dropping the ball on this, here is the patch I have right
now, any comments?

====================
[PATCH] netlink: Always copy on mmap TX.

Checking the file f_count and the nlk->mapped count is not completely
sufficient to prevent the mmap'd area contents from changing from
under us during netlink mmap sendmsg() operations.

Be careful to sample the header's length field only once, because this
could change from under us as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 52 +++++++++++++++---------------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ef5f77b..a64680a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -525,14 +525,14 @@ out:
 	return err;
 }
 
-static void netlink_frame_flush_dcache(const struct nl_mmap_hdr *hdr)
+static void netlink_frame_flush_dcache(const struct nl_mmap_hdr *hdr, unsigned int nm_len)
 {
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
 	struct page *p_start, *p_end;
 
 	/* First page is flushed through netlink_{get,set}_status */
 	p_start = pgvec_to_page(hdr + PAGE_SIZE);
-	p_end   = pgvec_to_page((void *)hdr + NL_MMAP_HDRLEN + hdr->nm_len - 1);
+	p_end   = pgvec_to_page((void *)hdr + NL_MMAP_HDRLEN + nm_len - 1);
 	while (p_start <= p_end) {
 		flush_dcache_page(p_start);
 		p_start++;
@@ -714,24 +714,16 @@ static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
 	struct nl_mmap_hdr *hdr;
 	struct sk_buff *skb;
 	unsigned int maxlen;
-	bool excl = true;
 	int err = 0, len = 0;
 
-	/* Netlink messages are validated by the receiver before processing.
-	 * In order to avoid userspace changing the contents of the message
-	 * after validation, the socket and the ring may only be used by a
-	 * single process, otherwise we fall back to copying.
-	 */
-	if (atomic_long_read(&sk->sk_socket->file->f_count) > 1 ||
-	    atomic_read(&nlk->mapped) > 1)
-		excl = false;
-
 	mutex_lock(&nlk->pg_vec_lock);
 
 	ring   = &nlk->tx_ring;
 	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
 
 	do {
+		unsigned int nm_len;
+
 		hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID);
 		if (hdr == NULL) {
 			if (!(msg->msg_flags & MSG_DONTWAIT) &&
@@ -739,35 +731,23 @@ static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
 				schedule();
 			continue;
 		}
-		if (hdr->nm_len > maxlen) {
+
+		nm_len = ACCESS_ONCE(hdr->nm_len);
+		if (nm_len > maxlen) {
 			err = -EINVAL;
 			goto out;
 		}
 
-		netlink_frame_flush_dcache(hdr);
+		netlink_frame_flush_dcache(hdr, nm_len);
 
-		if (likely(dst_portid == 0 && dst_group == 0 && excl)) {
-			skb = alloc_skb_head(GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			sock_hold(sk);
-			netlink_ring_setup_skb(skb, sk, ring, hdr);
-			NETLINK_CB(skb).flags |= NETLINK_SKB_TX;
-			__skb_put(skb, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED);
-			atomic_inc(&ring->pending);
-		} else {
-			skb = alloc_skb(hdr->nm_len, GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			__skb_put(skb, hdr->nm_len);
-			memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
+		skb = alloc_skb(nm_len, GFP_KERNEL);
+		if (skb == NULL) {
+			err = -ENOBUFS;
+			goto out;
 		}
+		__skb_put(skb, nm_len);
+		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
+		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
 
 		netlink_increment_head(ring);
 
@@ -813,7 +793,7 @@ static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb)
 	hdr->nm_pid	= NETLINK_CB(skb).creds.pid;
 	hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
 	hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
-	netlink_frame_flush_dcache(hdr);
+	netlink_frame_flush_dcache(hdr, hdr->nm_len);
 	netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
 
 	NETLINK_CB(skb).flags |= NETLINK_SKB_DELIVERED;
-- 
2.1.0


* Re: Netlink mmap tx security?
  2014-12-16 22:58                       ` David Miller
@ 2014-12-16 23:58                         ` Daniel Borkmann
  2014-12-17 16:27                           ` Thomas Graf
  2014-12-17  0:02                         ` Eric Dumazet
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Borkmann @ 2014-12-16 23:58 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, luto, torvalds, kaber, netdev

On 12/16/2014 11:58 PM, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Thu, 16 Oct 2014 08:07:53 +0100
>
>> On 10/16/14 at 08:45am, Daniel Borkmann wrote:
>>> On 10/15/2014 04:09 AM, David Miller wrote:
>>>> That would work as well.
>>>>
>>>> There are pros and cons to all of these approaches.
>>>>
>>>> I was thinking that if we do the "TX mmap --> copy to kernel buffer"
>>>> approach, then if in the future we find a way to make it work
>>>> reliably, we can avoid the copy.  And frankly performance wise it's no
>>>> worse than what happens via normal sendmsg() calls.
>>>>
>>>> And all applications using NETLINK_RX_RING keep working and keep
>>>> getting the performance boost.
>>>
>>> That would be better, yes. This would avoid having such a TPACKET_V*
>>> API chaos we have in packet sockets if this could be fixed for netlink
>>> eventually.
>>
>> Only saw the second part of Dave's message now. I agree that this
>> is even a better option.
>
> Sorry for dropping the ball on this, here is the patch I have right
> now, any comments?
>
> ====================
> [PATCH] netlink: Always copy on mmap TX.
>
> Checking the file f_count and the nlk->mapped count is not completely
> sufficient to prevent the mmap'd area contents from changing from
> under us during netlink mmap sendmsg() operations.
>
> Be careful to sample the header's length field only once, because this
> could change from under us as well.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Looks good to me, thanks for following up on this Dave!

Now this also leaves NETLINK_SKB_TX unused afterwards, so we should
get rid of these bits, too. It's only set in the broken 'exclusive'
path and tested for resetting the slot status in the skb destructor,
but _now_ we set it back immediately to NL_MMAP_STATUS_UNUSED instead
of a NL_MMAP_STATUS_RESERVED -> NL_MMAP_STATUS_UNUSED transition via
destructor.

The atomic (tx)ring->pending logic inside the send path also seems
unused now; similarly to (and taken from) packet sockets, it was
there to let blocking syscalls wait for completion, i.e. until
netlink_skb_destructor() has been invoked and the slot thus given
back to user space.

Anyway, the above could probably be handled in follow-ups; other than
that:

Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
Acked-by: Daniel Borkmann <dborkman@redhat.com>


* Re: Netlink mmap tx security?
  2014-12-16 22:58                       ` David Miller
  2014-12-16 23:58                         ` Daniel Borkmann
@ 2014-12-17  0:02                         ` Eric Dumazet
  2014-12-17 16:26                           ` Thomas Graf
  2014-12-18 10:30                           ` [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available Thomas Graf
  1 sibling, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2014-12-17  0:02 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, dborkman, luto, torvalds, kaber, netdev

On Tue, 2014-12-16 at 17:58 -0500, David Miller wrote:

> +		__skb_put(skb, nm_len);
> +		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
> +		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
>  

Not related to this patch, but it looks like netlink_set_status()
barrier is wrong ?

It seems we need smp_wmb() after the memcpy() and before the
hdr->nm_status = status; 


* Re: Netlink mmap tx security?
  2014-12-17  0:02                         ` Eric Dumazet
@ 2014-12-17 16:26                           ` Thomas Graf
  2014-12-18 10:30                           ` [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available Thomas Graf
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Graf @ 2014-12-17 16:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, dborkman, luto, torvalds, kaber, netdev

On 12/16/14 at 04:02pm, Eric Dumazet wrote:
> On Tue, 2014-12-16 at 17:58 -0500, David Miller wrote:
> 
> > +		__skb_put(skb, nm_len);
> > +		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
> > +		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> >  
> 
> Not related to this patch, but it looks like netlink_set_status()
> barrier is wrong ?
> 
> It seems we need smp_wmb() after the memcpy() and before the
> hdr->nm_status = status; 

Yes, definitely wrong as-is. I'll send a patch. For the particular
case we'd need a smp_rmb() after the memcpy() to complete the loads.
The skb destructor needs a smp_wmb() after setting nm_len. We could
get away with a smp_wmb() first thing in netlink_set_status() with
the code as-is but smp_mb() might be the less fragile thing to do.
Objections?
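The pairing under discussion can be illustrated with a userspace analogue. This is a hedged sketch using C11 release/acquire atomics in place of the kernel's smp_wmb()/smp_rmb(): the writer's payload stores must complete before the status becomes visible, and the reader must observe the status before touching the payload. The names (`demo_frame`, `publish`, `consume`) are invented for the illustration.

```c
#include <stdatomic.h>
#include <string.h>

/* Userspace analogue of the nm_status handshake. */
struct demo_frame {
	_Atomic int status;		/* 0 = unused, 1 = valid */
	char payload[16];
};

static void publish(struct demo_frame *f, const char *msg)
{
	strcpy(f->payload, msg);
	/* release: payload stores complete before status flips to 1,
	 * playing the role of smp_wmb() before setting nm_status */
	atomic_store_explicit(&f->status, 1, memory_order_release);
}

static int consume(struct demo_frame *f, char *out)
{
	/* acquire: payload loads cannot be hoisted above this check,
	 * playing the role of smp_rmb() after reading nm_status */
	if (atomic_load_explicit(&f->status, memory_order_acquire) != 1)
		return 0;
	strcpy(out, f->payload);
	return 1;
}
```

A full smp_mb() in netlink_set_status(), as proposed here, is simply the heavier hammer that covers both the load and the store direction at once.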


* Re: Netlink mmap tx security?
  2014-12-16 23:58                         ` Daniel Borkmann
@ 2014-12-17 16:27                           ` Thomas Graf
  2014-12-18 17:36                             ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Graf @ 2014-12-17 16:27 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, luto, torvalds, kaber, netdev

On 12/17/14 at 12:58am, Daniel Borkmann wrote:
> Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
> Acked-by: Daniel Borkmann <dborkman@redhat.com>

Nothing to add to Daniel's excellent feedback.

Acked-by: Thomas Graf <tgraf@suug.ch>


* [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  2014-12-17  0:02                         ` Eric Dumazet
  2014-12-17 16:26                           ` Thomas Graf
@ 2014-12-18 10:30                           ` Thomas Graf
  2014-12-18 17:36                             ` David Miller
  2014-12-18 19:13                             ` Linus Torvalds
  1 sibling, 2 replies; 26+ messages in thread
From: Thomas Graf @ 2014-12-18 10:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, dborkman, luto, torvalds, kaber, netdev

Each mmap Netlink frame contains a status field which indicates
whether the frame is unused, reserved, contains data or needs to
be skipped. Neither loads nor stores may be reordered; both must
complete before the status field is changed, since another CPU might
then pick up the frame for use. Use an smp_mb() to cover the needs of
both types of callers to netlink_set_status(): callers which have been
reading data from the frame, and callers which have been
filling or releasing, and thus writing to, the frame.

- Example code path requiring a smp_rmb():
  memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
  netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);

- Example code path requiring a smp_wmb():
  hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
  hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
  netlink_frame_flush_dcache(hdr);
  netlink_set_status(hdr, NL_MMAP_STATUS_VALID);

Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ef5f77b..2662821 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -550,9 +550,9 @@ static enum nl_mmap_status netlink_get_status(const struct nl_mmap_hdr *hdr)
 static void netlink_set_status(struct nl_mmap_hdr *hdr,
 			       enum nl_mmap_status status)
 {
+	smp_mb();
 	hdr->nm_status = status;
 	flush_dcache_page(pgvec_to_page(hdr));
-	smp_wmb();
 }
 
 static struct nl_mmap_hdr *


* Re: Netlink mmap tx security?
  2014-12-17 16:27                           ` Thomas Graf
@ 2014-12-18 17:36                             ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2014-12-18 17:36 UTC (permalink / raw)
  To: tgraf; +Cc: dborkman, luto, torvalds, kaber, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 17 Dec 2014 16:27:49 +0000

> On 12/17/14 at 12:58am, Daniel Borkmann wrote:
>> Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
>> Acked-by: Daniel Borkmann <dborkman@redhat.com>
> 
> Nothing to add to Daniel's excellent feedback.
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>

Applied and queued up for -stable.


* Re: [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  2014-12-18 10:30                           ` [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available Thomas Graf
@ 2014-12-18 17:36                             ` David Miller
  2014-12-18 19:13                             ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2014-12-18 17:36 UTC (permalink / raw)
  To: tgraf; +Cc: eric.dumazet, dborkman, luto, torvalds, kaber, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 18 Dec 2014 10:30:26 +0000

> Each mmap Netlink frame contains a status field which indicates
> whether the frame is unused, reserved, contains data or needs to
> be skipped. Neither loads nor stores may be reordered; both must
> complete before the status field is changed, since another CPU might
> then pick up the frame for use. Use an smp_mb() to cover the needs of
> both types of callers to netlink_set_status(): callers which have been
> reading data from the frame, and callers which have been
> filling or releasing, and thus writing to, the frame.
> 
> - Example code path requiring a smp_rmb():
>   memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
>   netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> 
> - Example code path requiring a smp_wmb():
>   hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
>   hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
>   netlink_frame_flush_dcache(hdr);
>   netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
> 
> Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Also applied and queued up for -stable, thanks Thomas.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  2014-12-18 10:30                           ` [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available Thomas Graf
  2014-12-18 17:36                             ` David Miller
@ 2014-12-18 19:13                             ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2014-12-18 19:13 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Eric Dumazet, David Miller, Daniel Borkmann, Andy Lutomirski,
	Patrick McHardy, Network Development

On Thu, Dec 18, 2014 at 2:30 AM, Thomas Graf <tgraf@suug.ch> wrote:
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index ef5f77b..2662821 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -550,9 +550,9 @@ static enum nl_mmap_status netlink_get_status(const struct nl_mmap_hdr *hdr)
>  static void netlink_set_status(struct nl_mmap_hdr *hdr,
>                                enum nl_mmap_status status)
>  {
> +       smp_mb();
>         hdr->nm_status = status;
>         flush_dcache_page(pgvec_to_page(hdr));
> -       smp_wmb();
>  }

If this is performance-critical code (I have no idea), then a better
idea might be to use

        smp_store_release(&hdr->nm_status, status);

instead of using explicit memory barriers. That's generally going to
be much cheaper than a memory barrier.

                      Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-12-18 19:13 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALCETrWsCqy7KEKPTOeHjrO1c2JE1E_seO_J+yA=sYFhS11NXQ@mail.gmail.com>
2014-05-12 21:08 ` Netlink mmap tx security? Andy Lutomirski
2014-10-11 22:29   ` Andy Lutomirski
2014-10-11 23:09     ` David Miller
2014-10-14 19:19     ` David Miller
2014-10-14 19:33       ` Andy Lutomirski
2014-10-14 20:00         ` David Miller
2014-10-14 22:16           ` Andy Lutomirski
2014-10-15  2:01             ` David Miller
2014-10-15  2:03               ` Andy Lutomirski
2014-10-15  2:09                 ` David Miller
2014-10-16  6:45                   ` Daniel Borkmann
2014-10-16  7:07                     ` Thomas Graf
2014-12-16 22:58                       ` David Miller
2014-12-16 23:58                         ` Daniel Borkmann
2014-12-17 16:27                           ` Thomas Graf
2014-12-18 17:36                             ` David Miller
2014-12-17  0:02                         ` Eric Dumazet
2014-12-17 16:26                           ` Thomas Graf
2014-12-18 10:30                           ` [PATCH net] netlink: Don't reorder loads/stores before marking mmap netlink frame as available Thomas Graf
2014-12-18 17:36                             ` David Miller
2014-12-18 19:13                             ` Linus Torvalds
2014-10-15 23:45               ` Netlink mmap tx security? Daniel Borkmann
2014-10-15 23:57                 ` David Miller
2014-10-15 23:58                   ` Andy Lutomirski
2014-10-16  3:34                     ` David Miller
2014-10-16  5:52                   ` Thomas Graf
