All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 1/2] fou: clean up socket with kfree_rcu
@ 2015-12-15 20:01 Hannes Frederic Sowa
  2015-12-15 20:01 ` [PATCH net 2/2] udp: restrict offloads to one namespace Hannes Frederic Sowa
  2015-12-17  0:03 ` [PATCH net 1/2] fou: clean up socket with kfree_rcu David Miller
  0 siblings, 2 replies; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-15 20:01 UTC (permalink / raw)
  To: netdev; +Cc: Tom Herbert

fou->udp_offloads is managed by RCU. As it is actually included inside
the fou sockets, we cannot let the memory go out of scope before a grace
period. We either can synchronize_rcu or switch over to kfree_rcu to
manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
and geneve.

Fixes: 23461551c00628c ("fou: Support for foo-over-udp RX path")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/fou.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index e0fcbbbcfe54d0..bd903fe0f7508d 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -24,6 +24,7 @@ struct fou {
 	u16 type;
 	struct udp_offload udp_offloads;
 	struct list_head list;
+	struct rcu_head rcu;
 };
 
 #define FOU_F_REMCSUM_NOPARTIAL BIT(0)
@@ -417,7 +418,7 @@ static void fou_release(struct fou *fou)
 	list_del(&fou->list);
 	udp_tunnel_sock_release(sock);
 
-	kfree(fou);
+	kfree_rcu(fou, rcu);
 }
 
 static int fou_encap_init(struct sock *sk, struct fou *fou, struct fou_cfg *cfg)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 20:01 [PATCH net 1/2] fou: clean up socket with kfree_rcu Hannes Frederic Sowa
@ 2015-12-15 20:01 ` Hannes Frederic Sowa
  2015-12-15 20:26   ` Tom Herbert
  2015-12-17  0:04   ` David Miller
  2015-12-17  0:03 ` [PATCH net 1/2] fou: clean up socket with kfree_rcu David Miller
  1 sibling, 2 replies; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-15 20:01 UTC (permalink / raw)
  To: netdev; +Cc: Tom Herbert, Eric Dumazet

udp tunnel offloads tend to aggregate datagrams based on inner
headers. gro engine gets notified by tunnel implementations about
possible offloads. The match is solely based on the port number.

Imagine a tunnel bound to port 53, the offloading will look into all
DNS packets and tries to aggregate them based on the inner data found
within. This could lead to data corruption and malformed DNS packets.

While this patch minimizes the problem and helps an administrator to find
the issue by querying ip tunnel/fou, a better way would be to match on
the specific destination ip address so if a user space socket is bound
to the same address it will conflict.

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 drivers/net/geneve.c   |  2 +-
 drivers/net/vxlan.c    |  2 +-
 include/net/protocol.h |  2 +-
 net/ipv4/fou.c         |  2 +-
 net/ipv4/udp_offload.c | 10 +++++++---
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index c2b79f5d1c89b6..b70ce46a5e7040 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -376,7 +376,7 @@ static void geneve_notify_add_rx_port(struct geneve_sock *gs)
 	int err;
 
 	if (sa_family == AF_INET) {
-		err = udp_add_offload(&gs->udp_offloads);
+		err = udp_add_offload(sock_net(sk), &gs->udp_offloads);
 		if (err)
 			pr_warn("geneve: udp_add_offload failed with status %d\n",
 				err);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ba363cedef8082..2428175c4dd4bb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -621,7 +621,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
 	int err;
 
 	if (sa_family == AF_INET) {
-		err = udp_add_offload(&vs->udp_offloads);
+		err = udp_add_offload(net, &vs->udp_offloads);
 		if (err)
 			pr_warn("vxlan: udp_add_offload failed with status %d\n", err);
 	}
diff --git a/include/net/protocol.h b/include/net/protocol.h
index d6fcc1fcdb5b09..da689f5432dee2 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -107,7 +107,7 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
 
-int  udp_add_offload(struct udp_offload *prot);
+int  udp_add_offload(struct net *net, struct udp_offload *prot);
 void udp_del_offload(struct udp_offload *prot);
 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index bd903fe0f7508d..976f0dcf699197 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -498,7 +498,7 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 	sk->sk_allocation = GFP_ATOMIC;
 
 	if (cfg->udp_config.family == AF_INET) {
-		err = udp_add_offload(&fou->udp_offloads);
+		err = udp_add_offload(net, &fou->udp_offloads);
 		if (err)
 			goto error;
 	}
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f9386160cbee02..5d396b96ae8bb9 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -21,6 +21,7 @@ static struct udp_offload_priv __rcu *udp_offload_base __read_mostly;
 
 struct udp_offload_priv {
 	struct udp_offload	*offload;
+	possible_net_t	net;
 	struct rcu_head		rcu;
 	struct udp_offload_priv __rcu *next;
 };
@@ -241,13 +242,14 @@ out:
 	return segs;
 }
 
-int udp_add_offload(struct udp_offload *uo)
+int udp_add_offload(struct net *net, struct udp_offload *uo)
 {
 	struct udp_offload_priv *new_offload = kzalloc(sizeof(*new_offload), GFP_ATOMIC);
 
 	if (!new_offload)
 		return -ENOMEM;
 
+	write_pnet(&new_offload->net, net);
 	new_offload->offload = uo;
 
 	spin_lock(&udp_offload_lock);
@@ -311,7 +313,8 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 	rcu_read_lock();
 	uo_priv = rcu_dereference(udp_offload_base);
 	for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) {
-		if (uo_priv->offload->port == uh->dest &&
+		if (net_eq(read_pnet(&uo_priv->net), dev_net(skb->dev)) &&
+		    uo_priv->offload->port == uh->dest &&
 		    uo_priv->offload->callbacks.gro_receive)
 			goto unflush;
 	}
@@ -389,7 +392,8 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff)
 
 	uo_priv = rcu_dereference(udp_offload_base);
 	for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) {
-		if (uo_priv->offload->port == uh->dest &&
+		if (net_eq(read_pnet(&uo_priv->net), dev_net(skb->dev)) &&
+		    uo_priv->offload->port == uh->dest &&
 		    uo_priv->offload->callbacks.gro_complete)
 			break;
 	}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 20:01 ` [PATCH net 2/2] udp: restrict offloads to one namespace Hannes Frederic Sowa
@ 2015-12-15 20:26   ` Tom Herbert
  2015-12-15 20:46     ` Hannes Frederic Sowa
  2015-12-17  0:04   ` David Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2015-12-15 20:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Linux Kernel Network Developers, Eric Dumazet

On Tue, Dec 15, 2015 at 12:01 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> udp tunnel offloads tend to aggregate datagrams based on inner
> headers. gro engine gets notified by tunnel implementations about
> possible offloads. The match is solely based on the port number.
>
> Imagine a tunnel bound to port 53, the offloading will look into all
> DNS packets and tries to aggregate them based on the inner data found
> within. This could lead to data corruption and malformed DNS packets.
>
> While this patch minimizes the problem and helps an administrator to find
> the issue by querying ip tunnel/fou, a better way would be to match on
> the specific destination ip address so if a user space socket is bound
> to the same address it will conflict.
>
I don't know... seems like this is more likely to add code into the
critical path rather than solve a problem impacting anyone yet. No
other GRO code needs to be namespace aware and none of these fancy HW
offloads for UDP encapsulations are going to care anything about
namespaces.

I think you point out the real underlying problem though, the UDP
offloads are restricted only be done by destination port and nothing
else. A more flexible method would be to allow matching on based
addresses, four tuples, interfaces etc. (latter may be needed to
offload connected UDP).

Tom

> Cc: Tom Herbert <tom@herbertland.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  drivers/net/geneve.c   |  2 +-
>  drivers/net/vxlan.c    |  2 +-
>  include/net/protocol.h |  2 +-
>  net/ipv4/fou.c         |  2 +-
>  net/ipv4/udp_offload.c | 10 +++++++---
>  5 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index c2b79f5d1c89b6..b70ce46a5e7040 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -376,7 +376,7 @@ static void geneve_notify_add_rx_port(struct geneve_sock *gs)
>         int err;
>
>         if (sa_family == AF_INET) {
> -               err = udp_add_offload(&gs->udp_offloads);
> +               err = udp_add_offload(sock_net(sk), &gs->udp_offloads);
>                 if (err)
>                         pr_warn("geneve: udp_add_offload failed with status %d\n",
>                                 err);
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index ba363cedef8082..2428175c4dd4bb 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -621,7 +621,7 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>         int err;
>
>         if (sa_family == AF_INET) {
> -               err = udp_add_offload(&vs->udp_offloads);
> +               err = udp_add_offload(net, &vs->udp_offloads);
>                 if (err)
>                         pr_warn("vxlan: udp_add_offload failed with status %d\n", err);
>         }
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index d6fcc1fcdb5b09..da689f5432dee2 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -107,7 +107,7 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
>  void inet_register_protosw(struct inet_protosw *p);
>  void inet_unregister_protosw(struct inet_protosw *p);
>
> -int  udp_add_offload(struct udp_offload *prot);
> +int  udp_add_offload(struct net *net, struct udp_offload *prot);
>  void udp_del_offload(struct udp_offload *prot);
>
>  #if IS_ENABLED(CONFIG_IPV6)
> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
> index bd903fe0f7508d..976f0dcf699197 100644
> --- a/net/ipv4/fou.c
> +++ b/net/ipv4/fou.c
> @@ -498,7 +498,7 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
>         sk->sk_allocation = GFP_ATOMIC;
>
>         if (cfg->udp_config.family == AF_INET) {
> -               err = udp_add_offload(&fou->udp_offloads);
> +               err = udp_add_offload(net, &fou->udp_offloads);
>                 if (err)
>                         goto error;
>         }
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index f9386160cbee02..5d396b96ae8bb9 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -21,6 +21,7 @@ static struct udp_offload_priv __rcu *udp_offload_base __read_mostly;
>
>  struct udp_offload_priv {
>         struct udp_offload      *offload;
> +       possible_net_t  net;
>         struct rcu_head         rcu;
>         struct udp_offload_priv __rcu *next;
>  };
> @@ -241,13 +242,14 @@ out:
>         return segs;
>  }
>
> -int udp_add_offload(struct udp_offload *uo)
> +int udp_add_offload(struct net *net, struct udp_offload *uo)
>  {
>         struct udp_offload_priv *new_offload = kzalloc(sizeof(*new_offload), GFP_ATOMIC);
>
>         if (!new_offload)
>                 return -ENOMEM;
>
> +       write_pnet(&new_offload->net, net);
>         new_offload->offload = uo;
>
>         spin_lock(&udp_offload_lock);
> @@ -311,7 +313,8 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>         rcu_read_lock();
>         uo_priv = rcu_dereference(udp_offload_base);
>         for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) {
> -               if (uo_priv->offload->port == uh->dest &&
> +               if (net_eq(read_pnet(&uo_priv->net), dev_net(skb->dev)) &&
> +                   uo_priv->offload->port == uh->dest &&
>                     uo_priv->offload->callbacks.gro_receive)
>                         goto unflush;
>         }
> @@ -389,7 +392,8 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff)
>
>         uo_priv = rcu_dereference(udp_offload_base);
>         for (; uo_priv != NULL; uo_priv = rcu_dereference(uo_priv->next)) {
> -               if (uo_priv->offload->port == uh->dest &&
> +               if (net_eq(read_pnet(&uo_priv->net), dev_net(skb->dev)) &&
> +                   uo_priv->offload->port == uh->dest &&
>                     uo_priv->offload->callbacks.gro_complete)
>                         break;
>         }
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 20:26   ` Tom Herbert
@ 2015-12-15 20:46     ` Hannes Frederic Sowa
  2015-12-15 22:39       ` Tom Herbert
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-15 20:46 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, Eric Dumazet

On 15.12.2015 21:26, Tom Herbert wrote:
> On Tue, Dec 15, 2015 at 12:01 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> udp tunnel offloads tend to aggregate datagrams based on inner
>> headers. gro engine gets notified by tunnel implementations about
>> possible offloads. The match is solely based on the port number.
>>
>> Imagine a tunnel bound to port 53, the offloading will look into all
>> DNS packets and tries to aggregate them based on the inner data found
>> within. This could lead to data corruption and malformed DNS packets.
>>
>> While this patch minimizes the problem and helps an administrator to find
>> the issue by querying ip tunnel/fou, a better way would be to match on
>> the specific destination ip address so if a user space socket is bound
>> to the same address it will conflict.
>>
> I don't know... seems like this is more likely to add code into the
> critical path rather than solve a problem impacting anyone yet. No
> other GRO code needs to be namespace aware and none of these fancy HW
> offloads for UDP encapsulations are going to care anything about
> namespaces.

HW encapsulation actually already respects namespaces, they only iterate
over the net_devices in the namespace the tunnel is created in to push
down the udp port information.

I would like to extend this to destination addresses, too. I am not sure
this is possible and if hw offloads actually corrupt packets.

> I think you point out the real underlying problem though, the UDP
> offloads are restricted only be done by destination port and nothing
> else. A more flexible method would be to allow matching on based
> addresses, four tuples, interfaces etc. (latter may be needed to
> offload connected UDP).

With net namespaces a quadruple does not uniquely identify a socket
anymore, as different netns could have the same ip address bound. So
separation by netns seems to be the first and easy implementable
solution to protect against those problems. I am already working to push
the local address to gro, too.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 20:46     ` Hannes Frederic Sowa
@ 2015-12-15 22:39       ` Tom Herbert
  2015-12-16 16:43         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2015-12-15 22:39 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Linux Kernel Network Developers, Eric Dumazet

On Tue, Dec 15, 2015 at 12:46 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 15.12.2015 21:26, Tom Herbert wrote:
>> On Tue, Dec 15, 2015 at 12:01 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> udp tunnel offloads tend to aggregate datagrams based on inner
>>> headers. gro engine gets notified by tunnel implementations about
>>> possible offloads. The match is solely based on the port number.
>>>
>>> Imagine a tunnel bound to port 53, the offloading will look into all
>>> DNS packets and tries to aggregate them based on the inner data found
>>> within. This could lead to data corruption and malformed DNS packets.
>>>
>>> While this patch minimizes the problem and helps an administrator to find
>>> the issue by querying ip tunnel/fou, a better way would be to match on
>>> the specific destination ip address so if a user space socket is bound
>>> to the same address it will conflict.
>>>
>> I don't know... seems like this is more likely to add code into the
>> critical path rather than solve a problem impacting anyone yet. No
>> other GRO code needs to be namespace aware and none of these fancy HW
>> offloads for UDP encapsulations are going to care anything about
>> namespaces.
>
> HW encapsulation actually already respects namespaces, they only iterate
> over the net_devices in the namespace the tunnel is created in to push
> down the udp port information.
>
> I would like to extend this to destination addresses, too. I am not sure
> this is possible and if hw offloads actually corrupt packets.
>
>> I think you point out the real underlying problem though, the UDP
>> offloads are restricted only be done by destination port and nothing
>> else. A more flexible method would be to allow matching on based
>> addresses, four tuples, interfaces etc. (latter may be needed to
>> offload connected UDP).
>
> With net namespaces a quadruple does not uniquely identify a socket
> anymore, as different netns could have the same ip address bound. So
> separation by netns seems to be the first and easy implementable
> solution to protect against those problems. I am already working to push
> the local address to gro, too.
>
Consider the following scenario with netns:

1) VXLAN is loaded with port number 7777.
2) add_rx_port is caller, driver gets this and then programs device
that port number 7777 means VXLAN.
3) A network name space is added using L3 IPVLAN
4) Application in network space now binds an application (not VXLAN)
to port 7777
5) Packets sent to the application at port 7777 are misinterpreted as
VLXAN by the device

Hopefully, the misinterpretation won't result in corrupted packet (RSS
and checksum offload should not). However, LRO would have the
potential for corruption... Unfortunately,  this is potentially a
problem on the host today with GRO since it appears we are doing GRO
before identifying the packet as IPVLAN :-(

Tom

> Bye,
> Hannes
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 22:39       ` Tom Herbert
@ 2015-12-16 16:43         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-16 16:43 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, Eric Dumazet

On 15.12.2015 23:39, Tom Herbert wrote:
> On Tue, Dec 15, 2015 at 12:46 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On 15.12.2015 21:26, Tom Herbert wrote:
>>> On Tue, Dec 15, 2015 at 12:01 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>>> udp tunnel offloads tend to aggregate datagrams based on inner
>>>> headers. gro engine gets notified by tunnel implementations about
>>>> possible offloads. The match is solely based on the port number.
>>>>
>>>> Imagine a tunnel bound to port 53, the offloading will look into all
>>>> DNS packets and tries to aggregate them based on the inner data found
>>>> within. This could lead to data corruption and malformed DNS packets.
>>>>
>>>> While this patch minimizes the problem and helps an administrator to find
>>>> the issue by querying ip tunnel/fou, a better way would be to match on
>>>> the specific destination ip address so if a user space socket is bound
>>>> to the same address it will conflict.
>>>>
>>> I don't know... seems like this is more likely to add code into the
>>> critical path rather than solve a problem impacting anyone yet. No
>>> other GRO code needs to be namespace aware and none of these fancy HW
>>> offloads for UDP encapsulations are going to care anything about
>>> namespaces.
>>
>> HW encapsulation actually already respects namespaces, they only iterate
>> over the net_devices in the namespace the tunnel is created in to push
>> down the udp port information.
>>
>> I would like to extend this to destination addresses, too. I am not sure
>> this is possible and if hw offloads actually corrupt packets.
>>
>>> I think you point out the real underlying problem though, the UDP
>>> offloads are restricted only be done by destination port and nothing
>>> else. A more flexible method would be to allow matching on based
>>> addresses, four tuples, interfaces etc. (latter may be needed to
>>> offload connected UDP).
>>
>> With net namespaces a quadruple does not uniquely identify a socket
>> anymore, as different netns could have the same ip address bound. So
>> separation by netns seems to be the first and easy implementable
>> solution to protect against those problems. I am already working to push
>> the local address to gro, too.
>>
> Consider the following scenario with netns:
> 
> 1) VXLAN is loaded with port number 7777.
> 2) add_rx_port is caller, driver gets this and then programs device
> that port number 7777 means VXLAN.
> 3) A network name space is added using L3 IPVLAN
> 4) Application in network space now binds an application (not VXLAN)
> to port 7777
> 5) Packets sent to the application at port 7777 are misinterpreted as
> VLXAN by the device
> 
> Hopefully, the misinterpretation won't result in corrupted packet (RSS
> and checksum offload should not). However, LRO would have the
> potential for corruption... Unfortunately,  this is potentially a
> problem on the host today with GRO since it appears we are doing GRO
> before identifying the packet as IPVLAN :-(

Yes, this is also a possible scenario.

In regard to moving interfaces which have enabled hw offloading across
namespaces this becomes funnier.

I think we should start to add a some more offloading querying
facilities either with procfs or netlink.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 1/2] fou: clean up socket with kfree_rcu
  2015-12-15 20:01 [PATCH net 1/2] fou: clean up socket with kfree_rcu Hannes Frederic Sowa
  2015-12-15 20:01 ` [PATCH net 2/2] udp: restrict offloads to one namespace Hannes Frederic Sowa
@ 2015-12-17  0:03 ` David Miller
  1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2015-12-17  0:03 UTC (permalink / raw)
  To: hannes; +Cc: netdev, tom

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 15 Dec 2015 21:01:53 +0100

> fou->udp_offloads is managed by RCU. As it is actually included inside
> the fou sockets, we cannot let the memory go out of scope before a grace
> period. We either can synchronize_rcu or switch over to kfree_rcu to
> manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
> and geneve.
> 
> Fixes: 23461551c00628c ("fou: Support for foo-over-udp RX path")
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-15 20:01 ` [PATCH net 2/2] udp: restrict offloads to one namespace Hannes Frederic Sowa
  2015-12-15 20:26   ` Tom Herbert
@ 2015-12-17  0:04   ` David Miller
  2015-12-17  8:49     ` Hannes Frederic Sowa
  1 sibling, 1 reply; 14+ messages in thread
From: David Miller @ 2015-12-17  0:04 UTC (permalink / raw)
  To: hannes; +Cc: netdev, tom, edumazet

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 15 Dec 2015 21:01:54 +0100

> udp tunnel offloads tend to aggregate datagrams based on inner
> headers. gro engine gets notified by tunnel implementations about
> possible offloads. The match is solely based on the port number.
> 
> Imagine a tunnel bound to port 53, the offloading will look into all
> DNS packets and tries to aggregate them based on the inner data found
> within. This could lead to data corruption and malformed DNS packets.
> 
> While this patch minimizes the problem and helps an administrator to find
> the issue by querying ip tunnel/fou, a better way would be to match on
> the specific destination ip address so if a user space socket is bound
> to the same address it will conflict.
> 
> Cc: Tom Herbert <tom@herbertland.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

It looks this issue is still being hashed out so I've marked this
patch as deferred for now.

THanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17  0:04   ` David Miller
@ 2015-12-17  8:49     ` Hannes Frederic Sowa
  2015-12-17 17:32       ` Tom Herbert
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-17  8:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, tom, edumazet

Hi all,

On 17.12.2015 01:04, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Tue, 15 Dec 2015 21:01:54 +0100
> 
>> udp tunnel offloads tend to aggregate datagrams based on inner
>> headers. gro engine gets notified by tunnel implementations about
>> possible offloads. The match is solely based on the port number.
>>
>> Imagine a tunnel bound to port 53, the offloading will look into all
>> DNS packets and tries to aggregate them based on the inner data found
>> within. This could lead to data corruption and malformed DNS packets.
>>
>> While this patch minimizes the problem and helps an administrator to find
>> the issue by querying ip tunnel/fou, a better way would be to match on
>> the specific destination ip address so if a user space socket is bound
>> to the same address it will conflict.
>>
>> Cc: Tom Herbert <tom@herbertland.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> 
> It looks this issue is still being hashed out so I've marked this
> patch as deferred for now.


I think we need this patch. We later can decide to add more
classification attributes, like dst ip down to gro, but the netns marks
are important.

With user namespaces a normal user can start a new network namespace
with all privileges and thus add new offloads, letting the other stack
interpret this garbage. Because the user namespace can also add
arbitrary ip addresses to its interface, solely matching those is not
enough.

Tom any further comments?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17  8:49     ` Hannes Frederic Sowa
@ 2015-12-17 17:32       ` Tom Herbert
  2015-12-17 17:40         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2015-12-17 17:32 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet

On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi all,
>
> On 17.12.2015 01:04, David Miller wrote:
>> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Date: Tue, 15 Dec 2015 21:01:54 +0100
>>
>>> udp tunnel offloads tend to aggregate datagrams based on inner
>>> headers. gro engine gets notified by tunnel implementations about
>>> possible offloads. The match is solely based on the port number.
>>>
>>> Imagine a tunnel bound to port 53, the offloading will look into all
>>> DNS packets and tries to aggregate them based on the inner data found
>>> within. This could lead to data corruption and malformed DNS packets.
>>>
>>> While this patch minimizes the problem and helps an administrator to find
>>> the issue by querying ip tunnel/fou, a better way would be to match on
>>> the specific destination ip address so if a user space socket is bound
>>> to the same address it will conflict.
>>>
>>> Cc: Tom Herbert <tom@herbertland.com>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>
>> It looks this issue is still being hashed out so I've marked this
>> patch as deferred for now.
>
>
> I think we need this patch. We later can decide to add more
> classification attributes, like dst ip down to gro, but the netns marks
> are important.
>
> With user namespaces a normal user can start a new network namespace
> with all privileges and thus add new offloads, letting the other stack
> interpret this garbage. Because the user namespace can also add
> arbitrary ip addresses to its interface, solely matching those is not
> enough.
>
> Tom any further comments?
>
I still don't think this addresses the core problem. If we're just
worried about offloads being added in a user namespace that conflict
with the those in the root space, it might be just as easy to disallow
setting offloads except in default namespace.

The core problem is that UDP port numbers don't have global meaning,
and don't really have any meaning to anyone except the sender and
receiver. This is different from IP protocol numbers, where IP
protocol number 6 is always interpreted as TCP anywhere in the
network. From RFC7605:

"It is important to recognize that any interpretation of port numbers
-- except at the endpoints -- may be incorrect, because port numbers
are meaningful only at the endpoints."

In the case of device offloads the device is not an endpoint so
interpretation of port numbers may be incorrect. This is also true in
GRO since it happens before it has been determined that packet is
being received at the local endpoint. The possibility of
misinterpretation based on destination port in the stack occurs when
we process packets that are later be forwarded as opposed to received
which can happen with netns or even with just forwarding enabled. If
the misinterpretation causes corruption or mis-delivery the fault lies
in the *implementation* not the protocol!

To address this in the host stack the solution is pretty
straightforward, we need to decide that the packet is going to be
received before applying any offloads. Essentially we want to do an
early_demux _really_ early. If we demux and get UDP socket for
instance, then the protocol specific GRO function can be retrieved
from the socket. So this will work with single listener port like
encaps do today,  and also if encapsulation is being used over a
connected socket. This also works if we want to support a user defined
GRO function like I mentioned we might want to do for QUIC etc.

For hardware offloads the problem is harder to solve to be completely
correct (or a least correct approaching 100% probability).
Possibilities are:
1) Use protocol agnostic offloads since they don't care about UDP or
port numbers (we've already discussed this!)
2) Use magic numbers in the protocol
(https://www.ietf.org/id/draft-herbert-udp-magic-numbers-01.txt).
3) Use ntuple filters identify the packets to be subject to offload
based on more than just. This really should have been the interface
for VXLAN offload from the beginning anyway!

Tom

> Thanks,
> Hannes
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17 17:32       ` Tom Herbert
@ 2015-12-17 17:40         ` Hannes Frederic Sowa
  2015-12-17 18:10           ` Tom Herbert
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-17 17:40 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet

On 17.12.2015 18:32, Tom Herbert wrote:
> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> With user namespaces a normal user can start a new network namespace
>> with all privileges and thus add new offloads, letting the other stack
>> interpret this garbage. Because the user namespace can also add
>> arbitrary ip addresses to its interface, solely matching those is not
>> enough.
>>
>> Tom any further comments?
>>
> I still don't think this addresses the core problem. If we're just
> worried about offloads being added in a user namespace that conflict
> with the those in the root space, it might be just as easy to disallow
> setting offloads except in default namespace.

I am fine with that solution, too.

> [...]
>
> To address this in the host stack the solution is pretty
> straightforward, we need to decide that the packet is going to be
> received before applying any offloads. Essentially we want to do an
> early_demux _really_ early. If we demux and get UDP socket for
> instance, then the protocol specific GRO function can be retrieved
> from the socket. So this will work with single listener port like
> encaps do today,  and also if encapsulation is being used over a
> connected socket. This also works if we want to support a user defined
> GRO function like I mentioned we might want to do for QUIC etc.

An approximation can be done, but I don't think it is feasible to
implement this kind of checks across namespace borders, ip rules and
netfilter rulesets, which could all change the outcome of the process.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17 17:40         ` Hannes Frederic Sowa
@ 2015-12-17 18:10           ` Tom Herbert
  2015-12-17 20:33             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Herbert @ 2015-12-17 18:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet

On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 17.12.2015 18:32, Tom Herbert wrote:
>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> With user namespaces a normal user can start a new network namespace
>>> with all privileges and thus add new offloads, letting the other stack
>>> interpret this garbage. Because the user namespace can also add
>>> arbitrary ip addresses to its interface, solely matching those is not
>>> enough.
>>>
>>> Tom any further comments?
>>>
>> I still don't think this addresses the core problem. If we're just
>> worried about offloads being added in a user namespace that conflict
>> with the those in the root space, it might be just as easy to disallow
>> setting offloads except in default namespace.
>
> I am fine with that solution, too.
>
>> [...]
>>
>> To address this in the host stack the solution is pretty
>> straightforward, we need to decide that the packet is going to be
>> received before applying any offloads. Essentially we want to do an
>> early_demux _really_ early. If we demux and get UDP socket for
>> instance, then the protocol specific GRO function can be retrieved
>> from the socket. So this will work with single listener port like
>> encaps do today,  and also if encapsulation is being used over a
>> connected socket. This also works if we want to support a user defined
>> GRO function like I mentioned we might want to do for QUIC etc.
>
> An approximation can be done, but I don't think it is feasible to
> implement this kind of checks across namespace borders, ip rules and
> netfilter rulesets, which could all change the outcome of the process.
>
For receive offloads we don't need to worry about checking other namespaces.

> Bye,
> Hannes
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17 18:10           ` Tom Herbert
@ 2015-12-17 20:33             ` Hannes Frederic Sowa
  2015-12-17 21:31               ` Tom Herbert
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-17 20:33 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet

On 17.12.2015 19:10, Tom Herbert wrote:
> On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On 17.12.2015 18:32, Tom Herbert wrote:
>>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>>> With user namespaces a normal user can start a new network namespace
>>>> with all privileges and thus add new offloads, letting the other stack
>>>> interpret this garbage. Because the user namespace can also add
>>>> arbitrary ip addresses to its interface, solely matching those is not
>>>> enough.
>>>>
>>>> Tom any further comments?
>>>>
>>> I still don't think this addresses the core problem. If we're just
>>> worried about offloads being added in a user namespace that conflict
>>> with the those in the root space, it might be just as easy to disallow
>>> setting offloads except in default namespace.
>>
>> I am fine with that solution, too.
>>
>>> [...]
>>>
>>> To address this in the host stack the solution is pretty
>>> straightforward, we need to decide that the packet is going to be
>>> received before applying any offloads. Essentially we want to do an
>>> early_demux _really_ early. If we demux and get UDP socket for
>>> instance, then the protocol specific GRO function can be retrieved
>>> from the socket. So this will work with single listener port like
>>> encaps do today,  and also if encapsulation is being used over a
>>> connected socket. This also works if we want to support a user defined
>>> GRO function like I mentioned we might want to do for QUIC etc.
>>
>> An approximation can be done, but I don't think it is feasible to
>> implement this kind of checks across namespace borders, ip rules and
>> netfilter rulesets, which could all change the outcome of the process.
>>
> For receive offloads we don't need to worry about checking other namespaces.

That is true. Albeit for net-branch/stable I would still suggest either
this patch or restricting udp offloads just to the initial net namespace.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net 2/2] udp: restrict offloads to one namespace
  2015-12-17 20:33             ` Hannes Frederic Sowa
@ 2015-12-17 21:31               ` Tom Herbert
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Herbert @ 2015-12-17 21:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet

On Thu, Dec 17, 2015 at 12:33 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 17.12.2015 19:10, Tom Herbert wrote:
>> On Thu, Dec 17, 2015 at 9:40 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> On 17.12.2015 18:32, Tom Herbert wrote:
>>>> On Thu, Dec 17, 2015 at 12:49 AM, Hannes Frederic Sowa
>>>> <hannes@stressinduktion.org> wrote:
>>>>> With user namespaces a normal user can start a new network namespace
>>>>> with all privileges and thus add new offloads, letting the other stack
>>>>> interpret this garbage. Because the user namespace can also add
>>>>> arbitrary ip addresses to its interface, solely matching those is not
>>>>> enough.
>>>>>
>>>>> Tom any further comments?
>>>>>
>>>> I still don't think this addresses the core problem. If we're just
>>>> worried about offloads being added in a user namespace that conflict
>>>> with the those in the root space, it might be just as easy to disallow
>>>> setting offloads except in default namespace.
>>>
>>> I am fine with that solution, too.
>>>
>>>> [...]
>>>>
>>>> To address this in the host stack the solution is pretty
>>>> straightforward, we need to decide that the packet is going to be
>>>> received before applying any offloads. Essentially we want to do an
>>>> early_demux _really_ early. If we demux and get UDP socket for
>>>> instance, then the protocol specific GRO function can be retrieved
>>>> from the socket. So this will work with single listener port like
>>>> encaps do today,  and also if encapsulation is being used over a
>>>> connected socket. This also works if we want to support a user defined
>>>> GRO function like I mentioned we might want to do for QUIC etc.
>>>
>>> An approximation can be done, but I don't think it is feasible to
>>> implement this kind of checks across namespace borders, ip rules and
>>> netfilter rulesets, which could all change the outcome of the process.
>>>
>> For receive offloads we don't need to worry about checking other namespaces.
>
> That is true. Albeit for net-branch/stable I would still suggest either
> this patch or restricting udp offloads just to the initial net namespace.
>
I would opt for the latter then.

> Bye,
> Hannes
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-12-17 21:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-15 20:01 [PATCH net 1/2] fou: clean up socket with kfree_rcu Hannes Frederic Sowa
2015-12-15 20:01 ` [PATCH net 2/2] udp: restrict offloads to one namespace Hannes Frederic Sowa
2015-12-15 20:26   ` Tom Herbert
2015-12-15 20:46     ` Hannes Frederic Sowa
2015-12-15 22:39       ` Tom Herbert
2015-12-16 16:43         ` Hannes Frederic Sowa
2015-12-17  0:04   ` David Miller
2015-12-17  8:49     ` Hannes Frederic Sowa
2015-12-17 17:32       ` Tom Herbert
2015-12-17 17:40         ` Hannes Frederic Sowa
2015-12-17 18:10           ` Tom Herbert
2015-12-17 20:33             ` Hannes Frederic Sowa
2015-12-17 21:31               ` Tom Herbert
2015-12-17  0:03 ` [PATCH net 1/2] fou: clean up socket with kfree_rcu David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.