All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Restore UFO support to virtio_net devices
@ 2015-01-26 14:37 Vladislav Yasevich
  2015-01-26 14:37 ` [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set Vladislav Yasevich
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Vladislav Yasevich @ 2015-01-26 14:37 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, mst, ben, virtualization

commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4
Author: Ben Hutchings <ben@decadent.org.uk>
Date:   Thu Oct 30 18:27:12 2014 +0000

    drivers/net: Disable UFO through virtio

Turned off UFO support to virtio-net based devices due to issues
with IPv6 fragment id generation for UFO packets.  The issue
was that IPv6 UFO/GSO implementation expects the fragment id
to be supplied in skb_shinfo().  However, for packets generated
by the VMs, the fragment id is not supplied which causes all
IPv6 fragments to have the id of 0.

The problem is that turning off UFO support on tap/macvtap
as well as virtio devices caused issues with migrations.  
Migrations would fail when moving a vm from a kernel supporting
expecting UFO to work to the newer kernels that disabled UFO.

This series provides a partial solution to address the migration
issue.  The series reserves a bit in the skb and sets the bit
with the ipv6 fragment id has been generated for the packet.
UFO/GSO code then checks the bit to see if the fragment id
is already present or if a new fragment id needs to be generated.
This solution allows host-originated UFO packets to keep a
better randomized fragment id, as well as generating a randomized
id for VM generated traffic (solving the fragment id 0 issue).

Vladislav Yasevich (3):
  ipv6: Select fragment id during UFO/GSO segmentation if not set.
  Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO
    packets"
  Revert "drivers/net: Disable UFO through virtio"

 drivers/net/macvtap.c    | 16 ++++++++--------
 drivers/net/tun.c        | 25 +++++++++----------------
 drivers/net/virtio_net.c | 24 ++++++++++--------------
 include/linux/skbuff.h   |  3 ++-
 include/net/ipv6.h       |  2 ++
 net/ipv6/ip6_output.c    |  4 ++--
 net/ipv6/output_core.c   |  9 ++++++++-
 net/ipv6/udp_offload.c   | 10 +++++++++-
 8 files changed, 50 insertions(+), 43 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
@ 2015-01-26 14:37 ` Vladislav Yasevich
  2015-01-27  2:47   ` Ben Hutchings
  2015-01-26 14:37 ` [PATCH 2/3] Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" Vladislav Yasevich
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Vladislav Yasevich @ 2015-01-26 14:37 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, mst, ben, virtualization

If the IPv6 fragment id has not been set and we perform
fragmentation due to UFO, select a new fragment id.
When we store the fragment id into skb_shinfo, set the bit
in the skb so we can re-use the selected id.
This preserves the behavior of UFO packets generated on the
host and solves the issue of id generation for packet sockets
and tap/macvtap devices.

This patch moves ipv6_select_ident() back in to the header file.  
It also provides the helper function that sets skb_shinfo() frag
id and sets the bit.

It also makes sure that we select the fragment id when doing
just gso validation, since it's possible for the packet to
come from an untrusted source (VM) and be forwarded through
a UFO enabled device which will expect the fragment id.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 include/linux/skbuff.h |  3 ++-
 include/net/ipv6.h     |  2 ++
 net/ipv6/ip6_output.c  |  4 ++--
 net/ipv6/output_core.c |  9 ++++++++-
 net/ipv6/udp_offload.c | 10 +++++++++-
 5 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 85ab7d7..3ad5203 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -605,7 +605,8 @@ struct sk_buff {
 	__u8			ipvs_property:1;
 	__u8			inner_protocol_type:1;
 	__u8			remcsum_offload:1;
-	/* 3 or 5 bit hole */
+	__u8			ufo_fragid_set:1;
+	/* 2 or 4 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 4292929..ca6137b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -671,7 +671,9 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
+void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
 void ipv6_proxy_select_ident(struct sk_buff *skb);
+void ipv6_skb_set_fragid(struct sk_buff *skb, __be32 frag_id);
 
 int ip6_dst_hoplimit(struct dst_entry *dst);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ce69a12..b940b3f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -537,7 +537,7 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	skb_copy_secmark(to, from);
 }
 
-static void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
 {
 	static u32 ip6_idents_hashrnd __read_mostly;
 	u32 hash, id;
@@ -1092,7 +1092,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 				     sizeof(struct frag_hdr)) & ~7;
 	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 	ipv6_select_ident(&fhdr, rt);
-	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
+	ipv6_skb_set_fragid(skb, fhdr.identification);
 
 append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 97f41a3..68f879b 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -38,10 +38,17 @@ void ipv6_proxy_select_ident(struct sk_buff *skb)
 	hash = __ipv6_addr_jhash(&addrs[0], hash);
 
 	id = ip_idents_reserve(hash, 1);
-	skb_shinfo(skb)->ip6_frag_id = htonl(id);
+	ipv6_skb_set_fragid(skb, htonl(id));
 }
 EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);
 
+void ipv6_skb_set_fragid(struct sk_buff *skb, __be32 frag_id)
+{
+	skb_shinfo(skb)->ip6_frag_id = frag_id;
+	skb->ufo_fragid_set = 1;
+}
+EXPORT_SYMBOL_GPL(ipv6_skb_set_fragid);
+
 int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 {
 	u16 offset = sizeof(struct ipv6hdr);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index b6aa8ed..7cda88d 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -52,6 +52,10 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 
 		skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
 
+		/* Set the IPv6 fragment id if not set yet */
+		if (!skb->ufo_fragid_set)
+			ipv6_proxy_select_ident(skb);
+
 		segs = NULL;
 		goto out;
 	}
@@ -108,7 +112,11 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
 		fptr->nexthdr = nexthdr;
 		fptr->reserved = 0;
-		fptr->identification = skb_shinfo(skb)->ip6_frag_id;
+		if (!skb->ufo_fragid_set)
+			fptr->identification = skb_shinfo(skb)->ip6_frag_id;
+		else
+			ipv6_select_ident(fptr,
+					  (struct rt6_info *)skb_dst(skb));
 
 		/* Fragment the skb. ipv6 header and the remaining fields of the
 		 * fragment header are updated in ipv6_gso_segment()
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/3] Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
  2015-01-26 14:37 ` [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set Vladislav Yasevich
  2015-01-26 14:37 ` [PATCH 2/3] Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" Vladislav Yasevich
@ 2015-01-26 14:37 ` Vladislav Yasevich
  2015-01-26 14:37 ` [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio" Vladislav Yasevich
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Vladislav Yasevich @ 2015-01-26 14:37 UTC (permalink / raw)
  To: netdev; +Cc: virtualization, mst, ben, edumazet, Vladislav Yasevich

This reverts commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7.

Now that GSO layer can track if fragment id has been selected
and can allocate on if necessary, we don't need to do this in
tap and macvtap.  This reverts most of the code and only keeps
the new ipv6 fragment id generation function that is still needed.

It also solves this issues for packet sockets which were
missed before.

Fixes: 3d0ad09412ff (drivers/net: Disable UFO through virtio)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 drivers/net/macvtap.c | 3 ---
 drivers/net/tun.c     | 6 +-----
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 7df2217..0b86e46 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -17,7 +17,6 @@
 #include <linux/fs.h>
 #include <linux/uio.h>
 
-#include <net/ipv6.h>
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
@@ -589,8 +588,6 @@ static int macvtap_skb_from_vnet_hdr(struct macvtap_queue *q,
 			pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
 				     current->comm);
 			gso_type = SKB_GSO_UDP;
-			if (skb->protocol == htons(ETH_P_IPV6))
-				ipv6_proxy_select_ident(skb);
 			break;
 		default:
 			return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8c8dc16..5ca42b7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,7 +65,6 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
-#include <net/ipv6.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -1167,8 +1166,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		break;
 	}
 
-	skb_reset_network_header(skb);
-
 	if (gso.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		pr_debug("GSO!\n");
 		switch (gso.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -1189,8 +1186,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 					    current->comm);
 			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-			if (skb->protocol == htons(ETH_P_IPV6))
-				ipv6_proxy_select_ident(skb);
 			break;
 		}
 		default:
@@ -1221,6 +1216,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 	}
 
+	skb_reset_network_header(skb);
 	skb_probe_transport_header(skb, 0);
 
 	rxhash = skb_get_hash(skb);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/3] Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
  2015-01-26 14:37 ` [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set Vladislav Yasevich
@ 2015-01-26 14:37 ` Vladislav Yasevich
  2015-01-26 14:37 ` Vladislav Yasevich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Vladislav Yasevich @ 2015-01-26 14:37 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, mst, ben, virtualization

This reverts commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7.

Now that GSO layer can track if fragment id has been selected
and can allocate on if necessary, we don't need to do this in
tap and macvtap.  This reverts most of the code and only keeps
the new ipv6 fragment id generation function that is still needed.

It also solves this issues for packet sockets which were
missed before.

Fixes: 3d0ad09412ff (drivers/net: Disable UFO through virtio)
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 drivers/net/macvtap.c | 3 ---
 drivers/net/tun.c     | 6 +-----
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 7df2217..0b86e46 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -17,7 +17,6 @@
 #include <linux/fs.h>
 #include <linux/uio.h>
 
-#include <net/ipv6.h>
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
 #include <net/sock.h>
@@ -589,8 +588,6 @@ static int macvtap_skb_from_vnet_hdr(struct macvtap_queue *q,
 			pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
 				     current->comm);
 			gso_type = SKB_GSO_UDP;
-			if (skb->protocol == htons(ETH_P_IPV6))
-				ipv6_proxy_select_ident(skb);
 			break;
 		default:
 			return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8c8dc16..5ca42b7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,7 +65,6 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
-#include <net/ipv6.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -1167,8 +1166,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		break;
 	}
 
-	skb_reset_network_header(skb);
-
 	if (gso.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		pr_debug("GSO!\n");
 		switch (gso.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -1189,8 +1186,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 					    current->comm);
 			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-			if (skb->protocol == htons(ETH_P_IPV6))
-				ipv6_proxy_select_ident(skb);
 			break;
 		}
 		default:
@@ -1221,6 +1216,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 	}
 
+	skb_reset_network_header(skb);
 	skb_probe_transport_header(skb, 0);
 
 	rxhash = skb_get_hash(skb);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio"
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
                   ` (2 preceding siblings ...)
  2015-01-26 14:37 ` Vladislav Yasevich
@ 2015-01-26 14:37 ` Vladislav Yasevich
  2015-01-27  2:51   ` Ben Hutchings
  2015-01-26 15:28 ` [PATCH 0/3] Restore UFO support to virtio_net devices Michael S. Tsirkin
  2015-01-26 15:28 ` Michael S. Tsirkin
  5 siblings, 1 reply; 36+ messages in thread
From: Vladislav Yasevich @ 2015-01-26 14:37 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, mst, ben, virtualization

This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4.

Now that UFO functionality can correctly track the fragment
id has been selected and select a fragment id if necessary,
we can re-enable UFO on tap/macvap and virtio devices.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 drivers/net/macvtap.c    | 13 ++++++++-----
 drivers/net/tun.c        | 19 ++++++++-----------
 drivers/net/virtio_net.c | 24 ++++++++++--------------
 3 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0b86e46..919f4fc 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -80,7 +80,7 @@ static struct cdev macvtap_cdev;
 static const struct proto_ops macvtap_socket_ops;
 
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
-		      NETIF_F_TSO6)
+		      NETIF_F_TSO6 | NETIF_F_UFO)
 #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
 #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
 
@@ -585,8 +585,6 @@ static int macvtap_skb_from_vnet_hdr(struct macvtap_queue *q,
 			gso_type = SKB_GSO_TCPV6;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-			pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
-				     current->comm);
 			gso_type = SKB_GSO_UDP;
 			break;
 		default:
@@ -633,6 +631,8 @@ static void macvtap_skb_to_vnet_hdr(struct macvtap_queue *q,
 			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (sinfo->gso_type & SKB_GSO_TCPV6)
 			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+		else if (sinfo->gso_type & SKB_GSO_UDP)
+			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
 		else
 			BUG();
 		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -962,6 +962,9 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
 			if (arg & TUN_F_TSO6)
 				feature_mask |= NETIF_F_TSO6;
 		}
+
+		if (arg & TUN_F_UFO)
+			feature_mask |= NETIF_F_UFO;
 	}
 
 	/* tun/tap driver inverts the usage for TSO offloads, where
@@ -972,7 +975,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
 	 * When user space turns off TSO, we turn off GSO/LRO so that
 	 * user-space will not receive TSO frames.
 	 */
-	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
+	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
 		features |= RX_OFFLOADS;
 	else
 		features &= ~RX_OFFLOADS;
@@ -1087,7 +1090,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
-			    TUN_F_TSO_ECN))
+			    TUN_F_TSO_ECN | TUN_F_UFO))
 			return -EINVAL;
 
 		rtnl_lock();
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 5ca42b7..10f9e40 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -186,7 +186,7 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6)
+			  NETIF_F_TSO6|NETIF_F_UFO)
 
 	int			vnet_hdr_sz;
 	int			sndbuf;
@@ -1176,18 +1176,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-		{
-			static bool warned;
-
-			if (!warned) {
-				warned = true;
-				netdev_warn(tun->dev,
-					    "%s: using disabled UFO feature; please fix this program\n",
-					    current->comm);
-			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 			break;
-		}
 		default:
 			tun->dev->stats.rx_frame_errors++;
 			kfree_skb(skb);
@@ -1294,6 +1284,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 			else if (sinfo->gso_type & SKB_GSO_TCPV6)
 				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+			else if (sinfo->gso_type & SKB_GSO_UDP)
+				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
 			else {
 				pr_err("unexpected GSO type: "
 				       "0x%x, gso_size %d, hdr_len %d\n",
@@ -1742,6 +1734,11 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 				features |= NETIF_F_TSO6;
 			arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
 		}
+
+		if (arg & TUN_F_UFO) {
+			features |= NETIF_F_UFO;
+			arg &= ~TUN_F_UFO;
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5ca9771..059fdf1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -490,17 +490,8 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-		{
-			static bool warned;
-
-			if (!warned) {
-				warned = true;
-				netdev_warn(dev,
-					    "host using disabled UFO feature; please fix it\n");
-			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 			break;
-		}
 		case VIRTIO_NET_HDR_GSO_TCPV6:
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
 			break;
@@ -888,6 +879,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
+			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
 		else
 			BUG();
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
@@ -1748,7 +1741,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 			dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
 
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
-			dev->hw_features |= NETIF_F_TSO
+			dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
 				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
 		}
 		/* Individual feature bits: what can host handle? */
@@ -1758,9 +1751,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 			dev->hw_features |= NETIF_F_TSO6;
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
 			dev->hw_features |= NETIF_F_TSO_ECN;
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
+			dev->hw_features |= NETIF_F_UFO;
 
 		if (gso)
-			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
@@ -1798,7 +1793,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
-	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
 		vi->big_packets = true;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
@@ -1994,9 +1990,9 @@ static struct virtio_device_id id_table[] = {
 static unsigned int features[] = {
 	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
-	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
+	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
 	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
-	VIRTIO_NET_F_GUEST_ECN,
+	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Restore UFO support to virtio_net devices
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
                   ` (4 preceding siblings ...)
  2015-01-26 15:28 ` [PATCH 0/3] Restore UFO support to virtio_net devices Michael S. Tsirkin
@ 2015-01-26 15:28 ` Michael S. Tsirkin
  2015-01-26 15:32   ` Michael S. Tsirkin
  5 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-26 15:28 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, virtualization, ben, edumazet, Vladislav Yasevich, David Miller

On Mon, Jan 26, 2015 at 09:37:03AM -0500, Vladislav Yasevich wrote:
> commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4
> Author: Ben Hutchings <ben@decadent.org.uk>
> Date:   Thu Oct 30 18:27:12 2014 +0000
> 
>     drivers/net: Disable UFO through virtio
> 
> Turned off UFO support to virtio-net based devices due to issues
> with IPv6 fragment id generation for UFO packets.  The issue
> was that IPv6 UFO/GSO implementation expects the fragment id
> to be supplied in skb_shinfo().  However, for packets generated
> by the VMs, the fragment id is not supplied which causes all
> IPv6 fragments to have the id of 0.
> 
> The problem is that turning off UFO support on tap/macvtap
> as well as virtio devices caused issues with migrations.  
> Migrations would fail when moving a vm from a kernel supporting
> expecting UFO to work to the newer kernels that disabled UFO.
> 
> This series provides a partial solution to address the migration
> issue.  The series reserves a bit in the skb and sets the bit
> with the ipv6 fragment id has been generated for the packet.
> UFO/GSO code then checks the bit to see if the fragment id
> is already present or if a new fragment id needs to be generated.
> This solution allows host-originated UFO packets to keep a
> better randomized fragment id, as well as generating a randomized
> id for VM generated traffic (solving the fragment id 0 issue).
> 
> Vladislav Yasevich (3):
>   ipv6: Select fragment id during UFO/GSO segmentation if not set.
>   Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO
>     packets"
>   Revert "drivers/net: Disable UFO through virtio"
> 
>  drivers/net/macvtap.c    | 16 ++++++++--------
>  drivers/net/tun.c        | 25 +++++++++----------------
>  drivers/net/virtio_net.c | 24 ++++++++++--------------
>  include/linux/skbuff.h   |  3 ++-
>  include/net/ipv6.h       |  2 ++
>  net/ipv6/ip6_output.c    |  4 ++--
>  net/ipv6/output_core.c   |  9 ++++++++-
>  net/ipv6/udp_offload.c   | 10 +++++++++-
>  8 files changed, 50 insertions(+), 43 deletions(-)

Should also help a lot to put UDP performance where it was.

Acked-by: Michael S. Tsirkin <mst@redhat.com>



> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Restore UFO support to virtio_net devices
  2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
                   ` (3 preceding siblings ...)
  2015-01-26 14:37 ` [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio" Vladislav Yasevich
@ 2015-01-26 15:28 ` Michael S. Tsirkin
  2015-01-26 15:28 ` Michael S. Tsirkin
  5 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-26 15:28 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, virtualization, edumazet, ben, David Miller

On Mon, Jan 26, 2015 at 09:37:03AM -0500, Vladislav Yasevich wrote:
> commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4
> Author: Ben Hutchings <ben@decadent.org.uk>
> Date:   Thu Oct 30 18:27:12 2014 +0000
> 
>     drivers/net: Disable UFO through virtio
> 
> Turned off UFO support to virtio-net based devices due to issues
> with IPv6 fragment id generation for UFO packets.  The issue
> was that IPv6 UFO/GSO implementation expects the fragment id
> to be supplied in skb_shinfo().  However, for packets generated
> by the VMs, the fragment id is not supplied which causes all
> IPv6 fragments to have the id of 0.
> 
> The problem is that turning off UFO support on tap/macvtap
> as well as virtio devices caused issues with migrations.  
> Migrations would fail when moving a vm from a kernel supporting
> expecting UFO to work to the newer kernels that disabled UFO.
> 
> This series provides a partial solution to address the migration
> issue.  The series reserves a bit in the skb and sets the bit
> with the ipv6 fragment id has been generated for the packet.
> UFO/GSO code then checks the bit to see if the fragment id
> is already present or if a new fragment id needs to be generated.
> This solution allows host-originated UFO packets to keep a
> better randomized fragment id, as well as generating a randomized
> id for VM generated traffic (solving the fragment id 0 issue).
> 
> Vladislav Yasevich (3):
>   ipv6: Select fragment id during UFO/GSO segmentation if not set.
>   Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO
>     packets"
>   Revert "drivers/net: Disable UFO through virtio"
> 
>  drivers/net/macvtap.c    | 16 ++++++++--------
>  drivers/net/tun.c        | 25 +++++++++----------------
>  drivers/net/virtio_net.c | 24 ++++++++++--------------
>  include/linux/skbuff.h   |  3 ++-
>  include/net/ipv6.h       |  2 ++
>  net/ipv6/ip6_output.c    |  4 ++--
>  net/ipv6/output_core.c   |  9 ++++++++-
>  net/ipv6/udp_offload.c   | 10 +++++++++-
>  8 files changed, 50 insertions(+), 43 deletions(-)

Should also help a lot to put UDP performance where it was.

Acked-by: Michael S. Tsirkin <mst@redhat.com>



> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/3] Restore UFO support to virtio_net devices
  2015-01-26 15:28 ` Michael S. Tsirkin
@ 2015-01-26 15:32   ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-26 15:32 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, virtualization, edumazet, ben, David Miller

On Mon, Jan 26, 2015 at 05:28:42PM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 26, 2015 at 09:37:03AM -0500, Vladislav Yasevich wrote:
> > commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4
> > Author: Ben Hutchings <ben@decadent.org.uk>
> > Date:   Thu Oct 30 18:27:12 2014 +0000
> > 
> >     drivers/net: Disable UFO through virtio
> > 
> > Turned off UFO support to virtio-net based devices due to issues
> > with IPv6 fragment id generation for UFO packets.  The issue
> > was that IPv6 UFO/GSO implementation expects the fragment id
> > to be supplied in skb_shinfo().  However, for packets generated
> > by the VMs, the fragment id is not supplied which causes all
> > IPv6 fragments to have the id of 0.
> > 
> > The problem is that turning off UFO support on tap/macvtap
> > as well as virtio devices caused issues with migrations.  
> > Migrations would fail when moving a vm from a kernel supporting
> > expecting UFO to work to the newer kernels that disabled UFO.
> > 
> > This series provides a partial solution to address the migration
> > issue.  The series reserves a bit in the skb and sets the bit
> > with the ipv6 fragment id has been generated for the packet.
> > UFO/GSO code then checks the bit to see if the fragment id
> > is already present or if a new fragment id needs to be generated.
> > This solution allows host-originated UFO packets to keep a
> > better randomized fragment id, as well as generating a randomized
> > id for VM generated traffic (solving the fragment id 0 issue).
> > 
> > Vladislav Yasevich (3):
> >   ipv6: Select fragment id during UFO/GSO segmentation if not set.
> >   Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO
> >     packets"
> >   Revert "drivers/net: Disable UFO through virtio"
> > 
> >  drivers/net/macvtap.c    | 16 ++++++++--------
> >  drivers/net/tun.c        | 25 +++++++++----------------
> >  drivers/net/virtio_net.c | 24 ++++++++++--------------
> >  include/linux/skbuff.h   |  3 ++-
> >  include/net/ipv6.h       |  2 ++
> >  net/ipv6/ip6_output.c    |  4 ++--
> >  net/ipv6/output_core.c   |  9 ++++++++-
> >  net/ipv6/udp_offload.c   | 10 +++++++++-
> >  8 files changed, 50 insertions(+), 43 deletions(-)
> 
> Should also help a lot to put UDP performance where it was.
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 

BTW these might also be a good stable candidate.

> 
> > -- 
> > 1.9.3

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-26 14:37 ` [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set Vladislav Yasevich
@ 2015-01-27  2:47   ` Ben Hutchings
  2015-01-27  8:27     ` David Miller
                       ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Ben Hutchings @ 2015-01-27  2:47 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, edumazet, mst, virtualization


[-- Attachment #1.1: Type: text/plain, Size: 1772 bytes --]

On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> If the IPv6 fragment id has not been set and we perform
> fragmentation due to UFO, select a new fragment id.
> When we store the fragment id into skb_shinfo, set the bit
> in the skb so we can re-use the selected id.
> This preserves the behavior of UFO packets generated on the
> host and solves the issue of id generation for packet sockets
> and tap/macvtap devices.
> 
> This patch moves ipv6_select_ident() back in to the header file.  
> It also provides the helper function that sets skb_shinfo() frag
> id and sets the bit.
> 
> It also makes sure that we select the fragment id when doing
> just gso validation, since it's possible for the packet to
> come from an untrusted source (VM) and be forwarded through
> a UFO enabled device which will expect the fragment id.
> 
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
>  include/linux/skbuff.h |  3 ++-
>  include/net/ipv6.h     |  2 ++
>  net/ipv6/ip6_output.c  |  4 ++--
>  net/ipv6/output_core.c |  9 ++++++++-
>  net/ipv6/udp_offload.c | 10 +++++++++-
>  5 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 85ab7d7..3ad5203 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -605,7 +605,8 @@ struct sk_buff {
>  	__u8			ipvs_property:1;
>  	__u8			inner_protocol_type:1;
>  	__u8			remcsum_offload:1;
> -	/* 3 or 5 bit hole */
> +	__u8			ufo_fragid_set:1;
[...]

Doesn't the flag belong in struct skb_shared_info, rather than struct
sk_buff?  Otherwise this looks fine.

Ben.

-- 
Ben Hutchings
When in doubt, use brute force. - Ken Thompson

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio"
  2015-01-26 14:37 ` [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio" Vladislav Yasevich
@ 2015-01-27  2:51   ` Ben Hutchings
  0 siblings, 0 replies; 36+ messages in thread
From: Ben Hutchings @ 2015-01-27  2:51 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: netdev, edumazet, mst, virtualization


[-- Attachment #1.1: Type: text/plain, Size: 572 bytes --]

On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4.
> 
> Now that UFO functionality can correctly track the fragment
> id has been selected and select a fragment id if necessary,
> we can re-enable UFO on tap/macvap and virtio devices.
> 
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
[...]

Acked-by: Ben Hutchings <ben@decadent.org.uk>

I think this is sensible even before your other patches.

Ben.

-- 
Ben Hutchings
When in doubt, use brute force. - Ken Thompson

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  2:47   ` Ben Hutchings
  2015-01-27  8:27     ` David Miller
@ 2015-01-27  8:27     ` David Miller
  2015-01-27  8:42     ` Michael S. Tsirkin
  2015-01-27  8:42     ` Michael S. Tsirkin
  3 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2015-01-27  8:27 UTC (permalink / raw)
  To: ben; +Cc: vyasevich, netdev, virtualization, mst, edumazet, vyasevic

From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 27 Jan 2015 02:47:54 +0000

> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>> If the IPv6 fragment id has not been set and we perform
>> fragmentation due to UFO, select a new fragment id.
>> When we store the fragment id into skb_shinfo, set the bit
>> in the skb so we can re-use the selected id.
>> This preserves the behavior of UFO packets generated on the
>> host and solves the issue of id generation for packet sockets
>> and tap/macvtap devices.
>> 
>> This patch moves ipv6_select_ident() back in to the header file.  
>> It also provides the helper function that sets skb_shinfo() frag
>> id and sets the bit.
>> 
>> It also makes sure that we select the fragment id when doing
>> just gso validation, since it's possible for the packet to
>> come from an untrusted source (VM) and be forwarded through
>> a UFO enabled device which will expect the fragment id.
>> 
>> CC: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>

Applied, thanks Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  2:47   ` Ben Hutchings
@ 2015-01-27  8:27     ` David Miller
  2015-01-27  8:27     ` David Miller
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2015-01-27  8:27 UTC (permalink / raw)
  To: ben; +Cc: mst, netdev, vyasevich, virtualization, edumazet

From: Ben Hutchings <ben@decadent.org.uk>
Date: Tue, 27 Jan 2015 02:47:54 +0000

> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>> If the IPv6 fragment id has not been set and we perform
>> fragmentation due to UFO, select a new fragment id.
>> When we store the fragment id into skb_shinfo, set the bit
>> in the skb so we can re-use the selected id.
>> This preserves the behavior of UFO packets generated on the
>> host and solves the issue of id generation for packet sockets
>> and tap/macvtap devices.
>> 
>> This patch moves ipv6_select_ident() back in to the header file.  
>> It also provides the helper function that sets skb_shinfo() frag
>> id and sets the bit.
>> 
>> It also makes sure that we select the fragment id when doing
>> just gso validation, since it's possible for the packet to
>> come from an untrusted source (VM) and be forwarded through
>> a UFO enabled device which will expect the fragment id.
>> 
>> CC: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>

Applied, thanks Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  2:47   ` Ben Hutchings
                       ` (2 preceding siblings ...)
  2015-01-27  8:42     ` Michael S. Tsirkin
@ 2015-01-27  8:42     ` Michael S. Tsirkin
  2015-01-27 13:47       ` Hannes Frederic Sowa
  2015-01-27 13:47       ` Hannes Frederic Sowa
  3 siblings, 2 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-27  8:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Vladislav Yasevich, netdev, virtualization, edumazet, Vladislav Yasevich

On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > If the IPv6 fragment id has not been set and we perform
> > fragmentation due to UFO, select a new fragment id.
> > When we store the fragment id into skb_shinfo, set the bit
> > in the skb so we can re-use the selected id.
> > This preserves the behavior of UFO packets generated on the
> > host and solves the issue of id generation for packet sockets
> > and tap/macvtap devices.
> > 
> > This patch moves ipv6_select_ident() back in to the header file.  
> > It also provides the helper function that sets skb_shinfo() frag
> > id and sets the bit.
> > 
> > It also makes sure that we select the fragment id when doing
> > just gso validation, since it's possible for the packet to
> > come from an untrusted source (VM) and be forwarded through
> > a UFO enabled device which will expect the fragment id.
> > 
> > CC: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > ---
> >  include/linux/skbuff.h |  3 ++-
> >  include/net/ipv6.h     |  2 ++
> >  net/ipv6/ip6_output.c  |  4 ++--
> >  net/ipv6/output_core.c |  9 ++++++++-
> >  net/ipv6/udp_offload.c | 10 +++++++++-
> >  5 files changed, 23 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 85ab7d7..3ad5203 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -605,7 +605,8 @@ struct sk_buff {
> >  	__u8			ipvs_property:1;
> >  	__u8			inner_protocol_type:1;
> >  	__u8			remcsum_offload:1;
> > -	/* 3 or 5 bit hole */
> > +	__u8			ufo_fragid_set:1;
> [...]
> 
> Doesn't the flag belong in struct skb_shared_info, rather than struct
> sk_buff?  Otherwise this looks fine.
> 
> Ben.

Hmm we seem to be out of tx flags.
Maybe ip6_frag_id == 0 should mean "not set".


> -- 
> Ben Hutchings
> When in doubt, use brute force. - Ken Thompson

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  2:47   ` Ben Hutchings
  2015-01-27  8:27     ` David Miller
  2015-01-27  8:27     ` David Miller
@ 2015-01-27  8:42     ` Michael S. Tsirkin
  2015-01-27  8:42     ` Michael S. Tsirkin
  3 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-27  8:42 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, Vladislav Yasevich, edumazet, virtualization

On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > If the IPv6 fragment id has not been set and we perform
> > fragmentation due to UFO, select a new fragment id.
> > When we store the fragment id into skb_shinfo, set the bit
> > in the skb so we can re-use the selected id.
> > This preserves the behavior of UFO packets generated on the
> > host and solves the issue of id generation for packet sockets
> > and tap/macvtap devices.
> > 
> > This patch moves ipv6_select_ident() back in to the header file.  
> > It also provides the helper function that sets skb_shinfo() frag
> > id and sets the bit.
> > 
> > It also makes sure that we select the fragment id when doing
> > just gso validation, since it's possible for the packet to
> > come from an untrusted source (VM) and be forwarded through
> > a UFO enabled device which will expect the fragment id.
> > 
> > CC: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > ---
> >  include/linux/skbuff.h |  3 ++-
> >  include/net/ipv6.h     |  2 ++
> >  net/ipv6/ip6_output.c  |  4 ++--
> >  net/ipv6/output_core.c |  9 ++++++++-
> >  net/ipv6/udp_offload.c | 10 +++++++++-
> >  5 files changed, 23 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 85ab7d7..3ad5203 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -605,7 +605,8 @@ struct sk_buff {
> >  	__u8			ipvs_property:1;
> >  	__u8			inner_protocol_type:1;
> >  	__u8			remcsum_offload:1;
> > -	/* 3 or 5 bit hole */
> > +	__u8			ufo_fragid_set:1;
> [...]
> 
> Doesn't the flag belong in struct skb_shared_info, rather than struct
> sk_buff?  Otherwise this looks fine.
> 
> Ben.

Hmm we seem to be out of tx flags.
Maybe ip6_frag_id == 0 should mean "not set".


> -- 
> Ben Hutchings
> When in doubt, use brute force. - Ken Thompson

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  8:42     ` Michael S. Tsirkin
@ 2015-01-27 13:47       ` Hannes Frederic Sowa
  2015-01-27 14:26         ` Vlad Yasevich
  2015-01-27 13:47       ` Hannes Frederic Sowa
  1 sibling, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-27 13:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ben Hutchings, Vladislav Yasevich, netdev, virtualization,
	edumazet, Vladislav Yasevich

On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > If the IPv6 fragment id has not been set and we perform
> > > fragmentation due to UFO, select a new fragment id.
> > > When we store the fragment id into skb_shinfo, set the bit
> > > in the skb so we can re-use the selected id.
> > > This preserves the behavior of UFO packets generated on the
> > > host and solves the issue of id generation for packet sockets
> > > and tap/macvtap devices.
> > > 
> > > This patch moves ipv6_select_ident() back in to the header file.  
> > > It also provides the helper function that sets skb_shinfo() frag
> > > id and sets the bit.
> > > 
> > > It also makes sure that we select the fragment id when doing
> > > just gso validation, since it's possible for the packet to
> > > come from an untrusted source (VM) and be forwarded through
> > > a UFO enabled device which will expect the fragment id.
> > > 
> > > CC: Eric Dumazet <edumazet@google.com>
> > > Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > ---
> > >  include/linux/skbuff.h |  3 ++-
> > >  include/net/ipv6.h     |  2 ++
> > >  net/ipv6/ip6_output.c  |  4 ++--
> > >  net/ipv6/output_core.c |  9 ++++++++-
> > >  net/ipv6/udp_offload.c | 10 +++++++++-
> > >  5 files changed, 23 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 85ab7d7..3ad5203 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -605,7 +605,8 @@ struct sk_buff {
> > >  	__u8			ipvs_property:1;
> > >  	__u8			inner_protocol_type:1;
> > >  	__u8			remcsum_offload:1;
> > > -	/* 3 or 5 bit hole */
> > > +	__u8			ufo_fragid_set:1;
> > [...]
> > 
> > Doesn't the flag belong in struct skb_shared_info, rather than struct
> > sk_buff?  Otherwise this looks fine.
> > 
> > Ben.
> 
> Hmm we seem to be out of tx flags.
> Maybe ip6_frag_id == 0 should mean "not set".

Maybe that is the best idea. Definitely the ufo_fragid_set bit should
move into the skb_shared_info area.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27  8:42     ` Michael S. Tsirkin
  2015-01-27 13:47       ` Hannes Frederic Sowa
@ 2015-01-27 13:47       ` Hannes Frederic Sowa
  1 sibling, 0 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-27 13:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > If the IPv6 fragment id has not been set and we perform
> > > fragmentation due to UFO, select a new fragment id.
> > > When we store the fragment id into skb_shinfo, set the bit
> > > in the skb so we can re-use the selected id.
> > > This preserves the behavior of UFO packets generated on the
> > > host and solves the issue of id generation for packet sockets
> > > and tap/macvtap devices.
> > > 
> > > This patch moves ipv6_select_ident() back in to the header file.  
> > > It also provides the helper function that sets skb_shinfo() frag
> > > id and sets the bit.
> > > 
> > > It also makes sure that we select the fragment id when doing
> > > just gso validation, since it's possible for the packet to
> > > come from an untrusted source (VM) and be forwarded through
> > > a UFO enabled device which will expect the fragment id.
> > > 
> > > CC: Eric Dumazet <edumazet@google.com>
> > > Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > ---
> > >  include/linux/skbuff.h |  3 ++-
> > >  include/net/ipv6.h     |  2 ++
> > >  net/ipv6/ip6_output.c  |  4 ++--
> > >  net/ipv6/output_core.c |  9 ++++++++-
> > >  net/ipv6/udp_offload.c | 10 +++++++++-
> > >  5 files changed, 23 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 85ab7d7..3ad5203 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -605,7 +605,8 @@ struct sk_buff {
> > >  	__u8			ipvs_property:1;
> > >  	__u8			inner_protocol_type:1;
> > >  	__u8			remcsum_offload:1;
> > > -	/* 3 or 5 bit hole */
> > > +	__u8			ufo_fragid_set:1;
> > [...]
> > 
> > Doesn't the flag belong in struct skb_shared_info, rather than struct
> > sk_buff?  Otherwise this looks fine.
> > 
> > Ben.
> 
> Hmm we seem to be out of tx flags.
> Maybe ip6_frag_id == 0 should mean "not set".

Maybe that is the best idea. Definitely the ufo_fragid_set bit should
move into the skb_shared_info area.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 13:47       ` Hannes Frederic Sowa
@ 2015-01-27 14:26         ` Vlad Yasevich
  2015-01-27 14:38           ` Eric Dumazet
  2015-01-27 16:02           ` Hannes Frederic Sowa
  0 siblings, 2 replies; 36+ messages in thread
From: Vlad Yasevich @ 2015-01-27 14:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, Ben Hutchings, edumazet, virtualization

On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>>>> If the IPv6 fragment id has not been set and we perform
>>>> fragmentation due to UFO, select a new fragment id.
>>>> When we store the fragment id into skb_shinfo, set the bit
>>>> in the skb so we can re-use the selected id.
>>>> This preserves the behavior of UFO packets generated on the
>>>> host and solves the issue of id generation for packet sockets
>>>> and tap/macvtap devices.
>>>>
>>>> This patch moves ipv6_select_ident() back in to the header file.  
>>>> It also provides the helper function that sets skb_shinfo() frag
>>>> id and sets the bit.
>>>>
>>>> It also makes sure that we select the fragment id when doing
>>>> just gso validation, since it's possible for the packet to
>>>> come from an untrusted source (VM) and be forwarded through
>>>> a UFO enabled device which will expect the fragment id.
>>>>
>>>> CC: Eric Dumazet <edumazet@google.com>
>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>> ---
>>>>  include/linux/skbuff.h |  3 ++-
>>>>  include/net/ipv6.h     |  2 ++
>>>>  net/ipv6/ip6_output.c  |  4 ++--
>>>>  net/ipv6/output_core.c |  9 ++++++++-
>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 85ab7d7..3ad5203 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -605,7 +605,8 @@ struct sk_buff {
>>>>  	__u8			ipvs_property:1;
>>>>  	__u8			inner_protocol_type:1;
>>>>  	__u8			remcsum_offload:1;
>>>> -	/* 3 or 5 bit hole */
>>>> +	__u8			ufo_fragid_set:1;
>>> [...]
>>>
>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
>>> sk_buff?  Otherwise this looks fine.
>>>
>>> Ben.
>>
>> Hmm we seem to be out of tx flags.
>> Maybe ip6_frag_id == 0 should mean "not set".
> 
> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> move into the skb_shared_info area.

That's what I originally wanted to do, but had to move and grow txflags thus
skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.

I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
from the protocol perspective and could actually be generated by the id generator
functions.  This may cause us to call the id generation multiple times.

-vlad
> 
> Thanks,
> Hannes
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 14:26         ` Vlad Yasevich
@ 2015-01-27 14:38           ` Eric Dumazet
  2015-01-27 16:02           ` Hannes Frederic Sowa
  1 sibling, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2015-01-27 14:38 UTC (permalink / raw)
  To: vyasevic
  Cc: Michael S. Tsirkin, netdev, Vladislav Yasevich, virtualization,
	edumazet, Hannes Frederic Sowa, Ben Hutchings

On Tue, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:

> That's what I originally wanted to do, but had to move and grow txflags thus
> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> 
> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> from the protocol perspective and could actually be generated by the id generator
> functions.  This may cause us to call the id generation multiple times.

With 32bit ID, you certainly can replace fragid=0 by fragid=0x80000000
and nobody will notice.

I certainly vote for not adding an extra bit in skb or skb_shared_info,
considering we already consume 32bits for this thing.

fragid are best effort, otherwise they would have 128bits.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 14:26         ` Vlad Yasevich
  2015-01-27 14:38           ` Eric Dumazet
@ 2015-01-27 16:02           ` Hannes Frederic Sowa
  2015-01-27 16:08             ` Michael S. Tsirkin
  2015-01-27 16:25             ` Vlad Yasevich
  1 sibling, 2 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-27 16:02 UTC (permalink / raw)
  To: vyasevic
  Cc: Michael S. Tsirkin, netdev, Vladislav Yasevich, virtualization,
	edumazet, Ben Hutchings

On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> >>>> If the IPv6 fragment id has not been set and we perform
> >>>> fragmentation due to UFO, select a new fragment id.
> >>>> When we store the fragment id into skb_shinfo, set the bit
> >>>> in the skb so we can re-use the selected id.
> >>>> This preserves the behavior of UFO packets generated on the
> >>>> host and solves the issue of id generation for packet sockets
> >>>> and tap/macvtap devices.
> >>>>
> >>>> This patch moves ipv6_select_ident() back in to the header file.  
> >>>> It also provides the helper function that sets skb_shinfo() frag
> >>>> id and sets the bit.
> >>>>
> >>>> It also makes sure that we select the fragment id when doing
> >>>> just gso validation, since it's possible for the packet to
> >>>> come from an untrusted source (VM) and be forwarded through
> >>>> a UFO enabled device which will expect the fragment id.
> >>>>
> >>>> CC: Eric Dumazet <edumazet@google.com>
> >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> >>>> ---
> >>>>  include/linux/skbuff.h |  3 ++-
> >>>>  include/net/ipv6.h     |  2 ++
> >>>>  net/ipv6/ip6_output.c  |  4 ++--
> >>>>  net/ipv6/output_core.c |  9 ++++++++-
> >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >>>> index 85ab7d7..3ad5203 100644
> >>>> --- a/include/linux/skbuff.h
> >>>> +++ b/include/linux/skbuff.h
> >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> >>>>  	__u8			ipvs_property:1;
> >>>>  	__u8			inner_protocol_type:1;
> >>>>  	__u8			remcsum_offload:1;
> >>>> -	/* 3 or 5 bit hole */
> >>>> +	__u8			ufo_fragid_set:1;
> >>> [...]
> >>>
> >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> >>> sk_buff?  Otherwise this looks fine.
> >>>
> >>> Ben.
> >>
> >> Hmm we seem to be out of tx flags.
> >> Maybe ip6_frag_id == 0 should mean "not set".
> > 
> > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > move into the skb_shared_info area.
> 
> That's what I originally wanted to do, but had to move and grow txflags thus
> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> 
> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> from the protocol perspective and could actually be generated by the id generator
> functions.  This may cause us to call the id generation multiple times.

Are there plans in the long run to let virtio_net transmit auxiliary
data to the other end so we can clean all of this this up one day?

I don't like the whole situation: looking into the virtio_net headers
just adding a field for ipv6 fragmentation ids to those small structs
seems bloated, not doing it feels incorrect. :/

Thoughts?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 16:02           ` Hannes Frederic Sowa
@ 2015-01-27 16:08             ` Michael S. Tsirkin
  2015-01-28  8:25               ` Hannes Frederic Sowa
  2015-01-27 16:25             ` Vlad Yasevich
  1 sibling, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-27 16:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > >>>> If the IPv6 fragment id has not been set and we perform
> > >>>> fragmentation due to UFO, select a new fragment id.
> > >>>> When we store the fragment id into skb_shinfo, set the bit
> > >>>> in the skb so we can re-use the selected id.
> > >>>> This preserves the behavior of UFO packets generated on the
> > >>>> host and solves the issue of id generation for packet sockets
> > >>>> and tap/macvtap devices.
> > >>>>
> > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > >>>> It also provides the helper function that sets skb_shinfo() frag
> > >>>> id and sets the bit.
> > >>>>
> > >>>> It also makes sure that we select the fragment id when doing
> > >>>> just gso validation, since it's possible for the packet to
> > >>>> come from an untrusted source (VM) and be forwarded through
> > >>>> a UFO enabled device which will expect the fragment id.
> > >>>>
> > >>>> CC: Eric Dumazet <edumazet@google.com>
> > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > >>>> ---
> > >>>>  include/linux/skbuff.h |  3 ++-
> > >>>>  include/net/ipv6.h     |  2 ++
> > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > >>>>
> > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > >>>> index 85ab7d7..3ad5203 100644
> > >>>> --- a/include/linux/skbuff.h
> > >>>> +++ b/include/linux/skbuff.h
> > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > >>>>  	__u8			ipvs_property:1;
> > >>>>  	__u8			inner_protocol_type:1;
> > >>>>  	__u8			remcsum_offload:1;
> > >>>> -	/* 3 or 5 bit hole */
> > >>>> +	__u8			ufo_fragid_set:1;
> > >>> [...]
> > >>>
> > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > >>> sk_buff?  Otherwise this looks fine.
> > >>>
> > >>> Ben.
> > >>
> > >> Hmm we seem to be out of tx flags.
> > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > 
> > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > move into the skb_shared_info area.
> > 
> > That's what I originally wanted to do, but had to move and grow txflags thus
> > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > 
> > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > from the protocol perspective and could actually be generated by the id generator
> > functions.  This may cause us to call the id generation multiple times.
> 
> Are there plans in the long run to let virtio_net transmit auxiliary
> data to the other end so we can clean all of this this up one day?
> 
> I don't like the whole situation: looking into the virtio_net headers
> just adding a field for ipv6 fragmentation ids to those small structs
> seems bloated, not doing it feels incorrect. :/
> 
> Thoughts?
> 
> Bye,
> Hannes

I'm not sure - what will be achieved by generating the IDs guest side as
opposed to host side?  It's certainly harder to get hold of entropy
guest-side.

-- 
MST

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 16:02           ` Hannes Frederic Sowa
  2015-01-27 16:08             ` Michael S. Tsirkin
@ 2015-01-27 16:25             ` Vlad Yasevich
  1 sibling, 0 replies; 36+ messages in thread
From: Vlad Yasevich @ 2015-01-27 16:25 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Michael S. Tsirkin, netdev, Vladislav Yasevich, virtualization,
	edumazet, Ben Hutchings

On 01/27/2015 11:02 AM, Hannes Frederic Sowa wrote:
> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
>> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
>>> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
>>>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>>>>>> If the IPv6 fragment id has not been set and we perform
>>>>>> fragmentation due to UFO, select a new fragment id.
>>>>>> When we store the fragment id into skb_shinfo, set the bit
>>>>>> in the skb so we can re-use the selected id.
>>>>>> This preserves the behavior of UFO packets generated on the
>>>>>> host and solves the issue of id generation for packet sockets
>>>>>> and tap/macvtap devices.
>>>>>>
>>>>>> This patch moves ipv6_select_ident() back in to the header file.  
>>>>>> It also provides the helper function that sets skb_shinfo() frag
>>>>>> id and sets the bit.
>>>>>>
>>>>>> It also makes sure that we select the fragment id when doing
>>>>>> just gso validation, since it's possible for the packet to
>>>>>> come from an untrusted source (VM) and be forwarded through
>>>>>> a UFO enabled device which will expect the fragment id.
>>>>>>
>>>>>> CC: Eric Dumazet <edumazet@google.com>
>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>>>> ---
>>>>>>  include/linux/skbuff.h |  3 ++-
>>>>>>  include/net/ipv6.h     |  2 ++
>>>>>>  net/ipv6/ip6_output.c  |  4 ++--
>>>>>>  net/ipv6/output_core.c |  9 ++++++++-
>>>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
>>>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>>> index 85ab7d7..3ad5203 100644
>>>>>> --- a/include/linux/skbuff.h
>>>>>> +++ b/include/linux/skbuff.h
>>>>>> @@ -605,7 +605,8 @@ struct sk_buff {
>>>>>>  	__u8			ipvs_property:1;
>>>>>>  	__u8			inner_protocol_type:1;
>>>>>>  	__u8			remcsum_offload:1;
>>>>>> -	/* 3 or 5 bit hole */
>>>>>> +	__u8			ufo_fragid_set:1;
>>>>> [...]
>>>>>
>>>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
>>>>> sk_buff?  Otherwise this looks fine.
>>>>>
>>>>> Ben.
>>>>
>>>> Hmm we seem to be out of tx flags.
>>>> Maybe ip6_frag_id == 0 should mean "not set".
>>>
>>> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
>>> move into the skb_shared_info area.
>>
>> That's what I originally wanted to do, but had to move and grow txflags thus
>> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
>>
>> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
>> from the protocol perspective and could actually be generated by the id generator
>> functions.  This may cause us to call the id generation multiple times.
> 
> Are there plans in the long run to let virtio_net transmit auxiliary
> data to the other end so we can clean all of this this up one day?

Yes, and I am working on this.  Part of that is UFO/UFO6 split so that the
fragment id is carried for UFO6.

> 
> I don't like the whole situation: looking into the virtio_net headers
> just adding a field for ipv6 fragmentation ids to those small structs
> seems bloated, not doing it feels incorrect. :/
> 

We are thinking right now how to extend virtio_net header as this is
not the only extension people have thought off.  It'll probably only
happen for virtio 1.0 spec, so we may still have to support legacy
devices that may still rely on UFO being available.

-vlad
> Thoughts?
> 
> Bye,
> Hannes
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-27 16:08             ` Michael S. Tsirkin
@ 2015-01-28  8:25               ` Hannes Frederic Sowa
  2015-01-28  9:46                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28  8:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

Hello,

On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > >>>> If the IPv6 fragment id has not been set and we perform
> > > >>>> fragmentation due to UFO, select a new fragment id.
> > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > >>>> in the skb so we can re-use the selected id.
> > > >>>> This preserves the behavior of UFO packets generated on the
> > > >>>> host and solves the issue of id generation for packet sockets
> > > >>>> and tap/macvtap devices.
> > > >>>>
> > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > >>>> id and sets the bit.
> > > >>>>
> > > >>>> It also makes sure that we select the fragment id when doing
> > > >>>> just gso validation, since it's possible for the packet to
> > > >>>> come from an untrusted source (VM) and be forwarded through
> > > >>>> a UFO enabled device which will expect the fragment id.
> > > >>>>
> > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > >>>> ---
> > > >>>>  include/linux/skbuff.h |  3 ++-
> > > >>>>  include/net/ipv6.h     |  2 ++
> > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > >>>>
> > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > >>>> index 85ab7d7..3ad5203 100644
> > > >>>> --- a/include/linux/skbuff.h
> > > >>>> +++ b/include/linux/skbuff.h
> > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > >>>>  	__u8			ipvs_property:1;
> > > >>>>  	__u8			inner_protocol_type:1;
> > > >>>>  	__u8			remcsum_offload:1;
> > > >>>> -	/* 3 or 5 bit hole */
> > > >>>> +	__u8			ufo_fragid_set:1;
> > > >>> [...]
> > > >>>
> > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > >>> sk_buff?  Otherwise this looks fine.
> > > >>>
> > > >>> Ben.
> > > >>
> > > >> Hmm we seem to be out of tx flags.
> > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > 
> > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > move into the skb_shared_info area.
> > > 
> > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > 
> > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > from the protocol perspective and could actually be generated by the id generator
> > > functions.  This may cause us to call the id generation multiple times.
> > 
> > Are there plans in the long run to let virtio_net transmit auxiliary
> > data to the other end so we can clean all of this this up one day?
> > 
> > I don't like the whole situation: looking into the virtio_net headers
> > just adding a field for ipv6 fragmentation ids to those small structs
> > seems bloated, not doing it feels incorrect. :/
> > 
> > Thoughts?
> > 
> > Bye,
> > Hannes
> 
> I'm not sure - what will be achieved by generating the IDs guest side as
> opposed to host side?  It's certainly harder to get hold of entropy
> guest-side.

It is not only about entropy but about uniqueness. Also fragmentation
ids should not be discoverable, so there are several aspects:

I see fragmentation id generation still as security critical:
When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
identifiers less predictable") I could patch my kernels and use the
patch regardless of the machine being virtualized or not. It was not
dependent on the hypervisor. I think that is the same reasoning why we
don't support TOE.

If we use one generator in the hypervisor in an openstack alike setting,
the host deals with quite a lot of overlay networks. A lot of default
configurations use the same addresses internally, so on the hypervisor
the frag id generators would interfere by design.

I could come up with an attack scenario for DNS servers (again :) ):

You are sitting next to a DNS server on the same hypervisor and can send
packets without source validation (because that is handled later on in
case of openvswitch when the packet is put into the corresponding
overlay network). You emit a gso packet with the same source and
destination addresses as the DNS server would do and would get an
fragmentation id which is linearly (+ time delta) incremented depending
on the source and destination address. With such a leak you could start
trying attack and spoof DNS responses (fragmentation attacks etc.).

See also details on such kind of attacks in the description of commit
04ca6973f7c1a0d.

AFAIK IETF tried with IPv6 to push fragmentation id generation to the
end hosts, that's also the reason for the introduction of atomic
fragments (which are now being rolled back ;) ).

Still it is better to generate a frag id on the hypervisor than just
sending a 0, so I am ok with this change, albeit not happy.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28  8:25               ` Hannes Frederic Sowa
@ 2015-01-28  9:46                 ` Michael S. Tsirkin
  2015-01-28 10:34                   ` Hannes Frederic Sowa
  2015-01-28 17:24                   ` Ben Hutchings
  0 siblings, 2 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-28  9:46 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> Hello,
> 
> On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > >>>> in the skb so we can re-use the selected id.
> > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > >>>> host and solves the issue of id generation for packet sockets
> > > > >>>> and tap/macvtap devices.
> > > > >>>>
> > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > >>>> id and sets the bit.
> > > > >>>>
> > > > >>>> It also makes sure that we select the fragment id when doing
> > > > >>>> just gso validation, since it's possible for the packet to
> > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > >>>>
> > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > >>>> ---
> > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > >>>>
> > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > >>>> index 85ab7d7..3ad5203 100644
> > > > >>>> --- a/include/linux/skbuff.h
> > > > >>>> +++ b/include/linux/skbuff.h
> > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > >>>>  	__u8			ipvs_property:1;
> > > > >>>>  	__u8			inner_protocol_type:1;
> > > > >>>>  	__u8			remcsum_offload:1;
> > > > >>>> -	/* 3 or 5 bit hole */
> > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > >>> [...]
> > > > >>>
> > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > >>> sk_buff?  Otherwise this looks fine.
> > > > >>>
> > > > >>> Ben.
> > > > >>
> > > > >> Hmm we seem to be out of tx flags.
> > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > 
> > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > move into the skb_shared_info area.
> > > > 
> > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > 
> > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > from the protocol perspective and could actually be generated by the id generator
> > > > functions.  This may cause us to call the id generation multiple times.
> > > 
> > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > data to the other end so we can clean all of this this up one day?
> > > 
> > > I don't like the whole situation: looking into the virtio_net headers
> > > just adding a field for ipv6 fragmentation ids to those small structs
> > > seems bloated, not doing it feels incorrect. :/
> > > 
> > > Thoughts?
> > > 
> > > Bye,
> > > Hannes
> > 
> > I'm not sure - what will be achieved by generating the IDs guest side as
> > opposed to host side?  It's certainly harder to get hold of entropy
> > guest-side.
> 
> It is not only about entropy but about uniqueness.  Also fragmentation
> ids should not be discoverable,

I belive "predictable" is the language used by the IETF draft.

> so there are several aspects:
> 
> I see fragmentation id generation still as security critical:
> When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> identifiers less predictable") I could patch my kernels and use the
> patch regardless of the machine being virtualized or not. It was not
> dependent on the hypervisor.

And now it's even easier - just patch the hypervisor, and all VMs
automatically benefit.

> I think that is the same reasoning why we
> don't support TOE.
> If we use one generator in the hypervisor in an openstack alike setting,
> the host deals with quite a lot of overlay networks. A lot of default
> configurations use the same addresses internally, so on the hypervisor
> the frag id generators would interfere by design.
> I could come up with an attack scenario for DNS servers (again :) ):
> 
> You are sitting next to a DNS server on the same hypervisor and can send
> packets without source validation (because that is handled later on in
> case of openvswitch when the packet is put into the corresponding
> overlay network). You emit a gso packet with the same source and
> destination addresses as the DNS server would do and would get an
> fragmentation id which is linearly (+ time delta) incremented depending
> on the source and destination address. With such a leak you could start
> trying attack and spoof DNS responses (fragmentation attacks etc.).
> See also details on such kind of attacks in the description of commit
> 04ca6973f7c1a0d.
> 
> AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> end hosts, that's also the reason for the introduction of atomic
> fragments (which are now being rolled back ;) ).
> 
> Still it is better to generate a frag id on the hypervisor than just
> sending a 0, so I am ok with this change, albeit not happy.
> 
> Thanks,
> Hannes
> 

OK so to summarize, identifiers are only re-randomized once per jiffy,
so you worry that within this window, an external observer can discover
past fragment ID values and so predict the future ones.
All that's required is that two paths go through the same box performing
fragmentation.

Is that a fair summary?

If yes, we can make this a bit harder by mixing in some
data per input and/or output devices.

For example, just to give you the idea:

diff --git a/net/core/dev.c b/net/core/dev.c
index 683d493..4faa7ef 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 	trace_netif_receive_skb(skb);
 
 	orig_dev = skb->dev;
+	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
 
 	skb_reset_network_header(skb);
 	if (!skb_transport_header_was_set(skb))
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ce69a12..819a821 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 				     sizeof(struct frag_hdr)) & ~7;
 	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 	ipv6_select_ident(&fhdr, rt);
-	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
+	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
+						   fhdr.identification);
 
 append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,

-- 
MST

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28  9:46                 ` Michael S. Tsirkin
@ 2015-01-28 10:34                   ` Hannes Frederic Sowa
  2015-01-28 10:39                     ` Hannes Frederic Sowa
                                       ` (3 more replies)
  2015-01-28 17:24                   ` Ben Hutchings
  1 sibling, 4 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 10:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

Hi,

On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > Hello,
> > 
> > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > >>>> in the skb so we can re-use the selected id.
> > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > >>>> and tap/macvtap devices.
> > > > > >>>>
> > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > >>>> id and sets the bit.
> > > > > >>>>
> > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > >>>>
> > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > >>>> ---
> > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > >>>>
> > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > >>>> --- a/include/linux/skbuff.h
> > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > >>>>  	__u8			ipvs_property:1;
> > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > >>> [...]
> > > > > >>>
> > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > >>>
> > > > > >>> Ben.
> > > > > >>
> > > > > >> Hmm we seem to be out of tx flags.
> > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > 
> > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > move into the skb_shared_info area.
> > > > > 
> > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > 
> > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > functions.  This may cause us to call the id generation multiple times.
> > > > 
> > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > data to the other end so we can clean all of this this up one day?
> > > > 
> > > > I don't like the whole situation: looking into the virtio_net headers
> > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > seems bloated, not doing it feels incorrect. :/
> > > > 
> > > > Thoughts?
> > > > 
> > > > Bye,
> > > > Hannes
> > > 
> > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > opposed to host side?  It's certainly harder to get hold of entropy
> > > guest-side.
> > 
> > It is not only about entropy but about uniqueness.  Also fragmentation
> > ids should not be discoverable,
> 
> I belive "predictable" is the language used by the IETF draft.
> 
> > so there are several aspects:
> > 
> > I see fragmentation id generation still as security critical:
> > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > identifiers less predictable") I could patch my kernels and use the
> > patch regardless of the machine being virtualized or not. It was not
> > dependent on the hypervisor.
> 
> And now it's even easier - just patch the hypervisor, and all VMs
> automatically benefit.

Sometimes the hypervisor is not under my control. You would need to
patch both kernels in your case - non gso frames would still get the
fragmentation id generated in the host kernel.

> > I think that is the same reasoning why we
> > don't support TOE.
> > If we use one generator in the hypervisor in an openstack alike setting,
> > the host deals with quite a lot of overlay networks. A lot of default
> > configurations use the same addresses internally, so on the hypervisor
> > the frag id generators would interfere by design.
> > I could come up with an attack scenario for DNS servers (again :) ):
> > 
> > You are sitting next to a DNS server on the same hypervisor and can send
> > packets without source validation (because that is handled later on in
> > case of openvswitch when the packet is put into the corresponding
> > overlay network). You emit a gso packet with the same source and
> > destination addresses as the DNS server would do and would get an
> > fragmentation id which is linearly (+ time delta) incremented depending
> > on the source and destination address. With such a leak you could start
> > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > See also details on such kind of attacks in the description of commit
> > 04ca6973f7c1a0d.
> > 
> > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > end hosts, that's also the reason for the introduction of atomic
> > fragments (which are now being rolled back ;) ).
> > 
> > Still it is better to generate a frag id on the hypervisor than just
> > sending a 0, so I am ok with this change, albeit not happy.
> > 
> > Thanks,
> > Hannes
> > 
> 
> OK so to summarize, identifiers are only re-randomized once per jiffy,
> so you worry that within this window, an external observer can discover
> past fragment ID values and so predict the future ones.
> All that's required is that two paths go through the same box performing
> fragmentation.
> 
> Is that a fair summary?
> 
> If yes, we can make this a bit harder by mixing in some
> data per input and/or output devices.
> 
> For example, just to give you the idea:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 683d493..4faa7ef 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>  	trace_netif_receive_skb(skb);
>  
>  	orig_dev = skb->dev;
> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
>  
>  	skb_reset_network_header(skb);
>  	if (!skb_transport_header_was_set(skb))
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index ce69a12..819a821 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>  				     sizeof(struct frag_hdr)) & ~7;
>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>  	ipv6_select_ident(&fhdr, rt);
> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> +						   fhdr.identification);
>  
>  append:
>  	return skb_append_datato_frags(sk, skb, getfrag, from,
> 

I thought about mixing in the incoming interface identifier into the
frag id generation, but that could hurt us badly as soon as a VM has
more than one interface to the outside world and uses e.g. ECMP. We need
to make sure that those frag ids are unique and the kernel needs to be
better than just using a random number generator.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 10:34                   ` Hannes Frederic Sowa
@ 2015-01-28 10:39                     ` Hannes Frederic Sowa
  2015-01-28 13:43                     ` Michael S. Tsirkin
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 10:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Mi, 2015-01-28 at 11:34 +0100, Hannes Frederic Sowa wrote:
> 
> > And now it's even easier - just patch the hypervisor, and all VMs
> > automatically benefit.
> 
> Sometimes the hypervisor is not under my control. You would need to
> patch both kernels in your case - non gso frames would still get the
> fragmentation id generated in the host kernel.

Actually this now became my biggest concern. :/

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 10:34                   ` Hannes Frederic Sowa
  2015-01-28 10:39                     ` Hannes Frederic Sowa
@ 2015-01-28 13:43                     ` Michael S. Tsirkin
  2015-01-28 14:17                       ` Hannes Frederic Sowa
  2015-01-28 14:16                     ` Vlad Yasevich
  2015-01-28 16:00                     ` Michael S. Tsirkin
  3 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-28 13:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > Hello,
> > > 
> > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > >>>> and tap/macvtap devices.
> > > > > > >>>>
> > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > > >>>> id and sets the bit.
> > > > > > >>>>
> > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > >>>>
> > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > >>>> ---
> > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > >>>>
> > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > >>> [...]
> > > > > > >>>
> > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > >>>
> > > > > > >>> Ben.
> > > > > > >>
> > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > 
> > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > move into the skb_shared_info area.
> > > > > > 
> > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > 
> > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > 
> > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > data to the other end so we can clean all of this this up one day?
> > > > > 
> > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > seems bloated, not doing it feels incorrect. :/
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > Bye,
> > > > > Hannes
> > > > 
> > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > guest-side.
> > > 
> > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > ids should not be discoverable,
> > 
> > I belive "predictable" is the language used by the IETF draft.
> > 
> > > so there are several aspects:
> > > 
> > > I see fragmentation id generation still as security critical:
> > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > identifiers less predictable") I could patch my kernels and use the
> > > patch regardless of the machine being virtualized or not. It was not
> > > dependent on the hypervisor.
> > 
> > And now it's even easier - just patch the hypervisor, and all VMs
> > automatically benefit.
> 
> Sometimes the hypervisor is not under my control. You would need to
> patch both kernels in your case - non gso frames would still get the
> fragmentation id generated in the host kernel.

Confused. You would have to patch both kernels *in your case*.
If it's all done by host, then it's in a single place, on host.

> > > I think that is the same reasoning why we
> > > don't support TOE.
> > > If we use one generator in the hypervisor in an openstack alike setting,
> > > the host deals with quite a lot of overlay networks. A lot of default
> > > configurations use the same addresses internally, so on the hypervisor
> > > the frag id generators would interfere by design.
> > > I could come up with an attack scenario for DNS servers (again :) ):
> > > 
> > > You are sitting next to a DNS server on the same hypervisor and can send
> > > packets without source validation (because that is handled later on in
> > > case of openvswitch when the packet is put into the corresponding
> > > overlay network). You emit a gso packet with the same source and
> > > destination addresses as the DNS server would do and would get an
> > > fragmentation id which is linearly (+ time delta) incremented depending
> > > on the source and destination address. With such a leak you could start
> > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > See also details on such kind of attacks in the description of commit
> > > 04ca6973f7c1a0d.
> > > 
> > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > end hosts, that's also the reason for the introduction of atomic
> > > fragments (which are now being rolled back ;) ).
> > > 
> > > Still it is better to generate a frag id on the hypervisor than just
> > > sending a 0, so I am ok with this change, albeit not happy.
> > > 
> > > Thanks,
> > > Hannes
> > > 
> > 
> > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > so you worry that within this window, an external observer can discover
> > past fragment ID values and so predict the future ones.
> > All that's required is that two paths go through the same box performing
> > fragmentation.
> > 
> > Is that a fair summary?
> > 
> > If yes, we can make this a bit harder by mixing in some
> > data per input and/or output devices.
> > 
> > For example, just to give you the idea:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 683d493..4faa7ef 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >  	trace_netif_receive_skb(skb);
> >  
> >  	orig_dev = skb->dev;
> > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> >  
> >  	skb_reset_network_header(skb);
> >  	if (!skb_transport_header_was_set(skb))
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index ce69a12..819a821 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> >  				     sizeof(struct frag_hdr)) & ~7;
> >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >  	ipv6_select_ident(&fhdr, rt);
> > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > +						   fhdr.identification);
> >  
> >  append:
> >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > 
> 
> I thought about mixing in the incoming interface identifier into the
> frag id generation, but that could hurt us badly as soon as a VM has
> more than one interface to the outside world and uses e.g. ECMP.

I don't understand. Fragmentation is done after routing,
isn't it? So all fragments always go out on the same device.

> We need
> to make sure that those frag ids are unique and the kernel needs to be
> better than just using a random number generator.
> 
> Bye,
> Hannes

32 bit numbers can't be unique. They just shouldn't be discoverable
by an off-path observer.

-- 
MST

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 10:34                   ` Hannes Frederic Sowa
  2015-01-28 10:39                     ` Hannes Frederic Sowa
  2015-01-28 13:43                     ` Michael S. Tsirkin
@ 2015-01-28 14:16                     ` Vlad Yasevich
  2015-01-28 14:45                       ` Hannes Frederic Sowa
  2015-01-28 16:00                     ` Michael S. Tsirkin
  3 siblings, 1 reply; 36+ messages in thread
From: Vlad Yasevich @ 2015-01-28 14:16 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, Ben Hutchings, edumazet, virtualization

On 01/28/2015 05:34 AM, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
>> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
>>> Hello,
>>>
>>> On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
>>>>> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
>>>>>> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
>>>>>>> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
>>>>>>>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
>>>>>>>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>>>>>>>>>> If the IPv6 fragment id has not been set and we perform
>>>>>>>>>> fragmentation due to UFO, select a new fragment id.
>>>>>>>>>> When we store the fragment id into skb_shinfo, set the bit
>>>>>>>>>> in the skb so we can re-use the selected id.
>>>>>>>>>> This preserves the behavior of UFO packets generated on the
>>>>>>>>>> host and solves the issue of id generation for packet sockets
>>>>>>>>>> and tap/macvtap devices.
>>>>>>>>>>
>>>>>>>>>> This patch moves ipv6_select_ident() back in to the header file.  
>>>>>>>>>> It also provides the helper function that sets skb_shinfo() frag
>>>>>>>>>> id and sets the bit.
>>>>>>>>>>
>>>>>>>>>> It also makes sure that we select the fragment id when doing
>>>>>>>>>> just gso validation, since it's possible for the packet to
>>>>>>>>>> come from an untrusted source (VM) and be forwarded through
>>>>>>>>>> a UFO enabled device which will expect the fragment id.
>>>>>>>>>>
>>>>>>>>>> CC: Eric Dumazet <edumazet@google.com>
>>>>>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>  include/linux/skbuff.h |  3 ++-
>>>>>>>>>>  include/net/ipv6.h     |  2 ++
>>>>>>>>>>  net/ipv6/ip6_output.c  |  4 ++--
>>>>>>>>>>  net/ipv6/output_core.c |  9 ++++++++-
>>>>>>>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
>>>>>>>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>>>>>>> index 85ab7d7..3ad5203 100644
>>>>>>>>>> --- a/include/linux/skbuff.h
>>>>>>>>>> +++ b/include/linux/skbuff.h
>>>>>>>>>> @@ -605,7 +605,8 @@ struct sk_buff {
>>>>>>>>>>  	__u8			ipvs_property:1;
>>>>>>>>>>  	__u8			inner_protocol_type:1;
>>>>>>>>>>  	__u8			remcsum_offload:1;
>>>>>>>>>> -	/* 3 or 5 bit hole */
>>>>>>>>>> +	__u8			ufo_fragid_set:1;
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
>>>>>>>>> sk_buff?  Otherwise this looks fine.
>>>>>>>>>
>>>>>>>>> Ben.
>>>>>>>>
>>>>>>>> Hmm we seem to be out of tx flags.
>>>>>>>> Maybe ip6_frag_id == 0 should mean "not set".
>>>>>>>
>>>>>>> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
>>>>>>> move into the skb_shared_info area.
>>>>>>
>>>>>> That's what I originally wanted to do, but had to move and grow txflags thus
>>>>>> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
>>>>>>
>>>>>> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
>>>>>> from the protocol perspective and could actually be generated by the id generator
>>>>>> functions.  This may cause us to call the id generation multiple times.
>>>>>
>>>>> Are there plans in the long run to let virtio_net transmit auxiliary
>>>>> data to the other end so we can clean all of this this up one day?
>>>>>
>>>>> I don't like the whole situation: looking into the virtio_net headers
>>>>> just adding a field for ipv6 fragmentation ids to those small structs
>>>>> seems bloated, not doing it feels incorrect. :/
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Bye,
>>>>> Hannes
>>>>
>>>> I'm not sure - what will be achieved by generating the IDs guest side as
>>>> opposed to host side?  It's certainly harder to get hold of entropy
>>>> guest-side.
>>>
>>> It is not only about entropy but about uniqueness.  Also fragmentation
>>> ids should not be discoverable,
>>
>> I belive "predictable" is the language used by the IETF draft.
>>
>>> so there are several aspects:
>>>
>>> I see fragmentation id generation still as security critical:
>>> When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
>>> identifiers less predictable") I could patch my kernels and use the
>>> patch regardless of the machine being virtualized or not. It was not
>>> dependent on the hypervisor.
>>
>> And now it's even easier - just patch the hypervisor, and all VMs
>> automatically benefit.
> 
> Sometimes the hypervisor is not under my control. You would need to
> patch both kernels in your case - non gso frames would still get the
> fragmentation id generated in the host kernel.

Why would non-gso frames need a frag id?  We are talking only UDP IPv6
here, so there is no frag id generation if the packet does't need to
be fragmented.

> 
>>> I think that is the same reasoning why we
>>> don't support TOE.
>>> If we use one generator in the hypervisor in an openstack alike setting,
>>> the host deals with quite a lot of overlay networks. A lot of default
>>> configurations use the same addresses internally, so on the hypervisor
>>> the frag id generators would interfere by design.
>>> I could come up with an attack scenario for DNS servers (again :) ):
>>>
>>> You are sitting next to a DNS server on the same hypervisor and can send
>>> packets without source validation (because that is handled later on in
>>> case of openvswitch when the packet is put into the corresponding
>>> overlay network). You emit a gso packet with the same source and
>>> destination addresses as the DNS server would do and would get an
>>> fragmentation id which is linearly (+ time delta) incremented depending
>>> on the source and destination address. With such a leak you could start
>>> trying attack and spoof DNS responses (fragmentation attacks etc.).
>>> See also details on such kind of attacks in the description of commit
>>> 04ca6973f7c1a0d.
>>>
>>> AFAIK IETF tried with IPv6 to push fragmentation id generation to the
>>> end hosts, that's also the reason for the introduction of atomic
>>> fragments (which are now being rolled back ;) ).
>>>
>>> Still it is better to generate a frag id on the hypervisor than just
>>> sending a 0, so I am ok with this change, albeit not happy.
>>>
>>> Thanks,
>>> Hannes
>>>
>>
>> OK so to summarize, identifiers are only re-randomized once per jiffy,
>> so you worry that within this window, an external observer can discover
>> past fragment ID values and so predict the future ones.
>> All that's required is that two paths go through the same box performing
>> fragmentation.
>>
>> Is that a fair summary?
>>
>> If yes, we can make this a bit harder by mixing in some
>> data per input and/or output devices.
>>
>> For example, just to give you the idea:
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 683d493..4faa7ef 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>>  	trace_netif_receive_skb(skb);
>>  
>>  	orig_dev = skb->dev;
>> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
>>  
>>  	skb_reset_network_header(skb);
>>  	if (!skb_transport_header_was_set(skb))
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index ce69a12..819a821 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>>  				     sizeof(struct frag_hdr)) & ~7;
>>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>>  	ipv6_select_ident(&fhdr, rt);
>> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
>> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
>> +						   fhdr.identification);
>>  
>>  append:
>>  	return skb_append_datato_frags(sk, skb, getfrag, from,
>>
> 
> I thought about mixing in the incoming interface identifier into the
> frag id generation, but that could hurt us badly as soon as a VM has
> more than one interface to the outside world and uses e.g. ECMP. We need
> to make sure that those frag ids are unique and the kernel needs to be
> better than just using a random number generator.
>

So the goal behind this series of patches is to restore VM functionality to
pre-916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data").

-vlad

> Bye,
> Hannes
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 13:43                     ` Michael S. Tsirkin
@ 2015-01-28 14:17                       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 14:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Mi, 2015-01-28 at 15:43 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > > Hello,
> > > > 
> > > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > > >>>> and tap/macvtap devices.
> > > > > > > >>>>
> > > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > > >>>> It also provides the helper function that sets skb_shinfo() fragd have to patch both kernels *in your case*.
If it's all done by host, then it's in a single place, on host.
> > > > > > > >>>> id and sets the bit.
> > > > > > > >>>>
> > > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > > >>>>
> > > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > > >>>> ---
> > > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > > >>>>
> > > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > > >>> [...]
> > > > > > > >>>
> > > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > > >>>
> > > > > > > >>> Ben.
> > > > > > > >>
> > > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > > 
> > > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > > move into the skb_shared_info area.
> > > > > > > 
> > > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > > 
> > > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > > 
> > > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > > data to the other end so we can clean all of this this up one day?
> > > > > > 
> > > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > > seems bloated, not doing it feels incorrect. :/
> > > > > > 
> > > > > > Thoughts?
> > > > > > 
> > > > > > Bye,
> > > > > > Hannes
> > > > > 
> > > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > > guest-side.
> > > > 
> > > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > > ids should not be discoverable,
> > > 
> > > I belive "predictable" is the language used by the IETF draft.
> > > 
> > > > so there are several aspects:
> > > > 
> > > > I see fragmentation id generation still as security critical:
> > > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > > identifiers less predictable") I could patch my kernels and use the
> > > > patch regardless of the machine being virtualized or not. It was not
> > > > dependent on the hypervisor.
> > > 
> > > And now it's even easier - just patch the hypervisor, and all VMs
> > > automatically benefit.
> > 
> > Sometimes the hypervisor is not under my control. You would need to
> > patch both kernels in your case - non gso frames would still get the
> > fragmentation id generated in the host kernel.
> 
> Confused. You would have to patch both kernels *in your case*.
> If it's all done by host, then it's in a single place, on host.

host is the hypervisor?

Anyway, we would have to patch both kernels now anyway. :)

We still have a working ipv6_fragment routine in the virtualized kernel
which can embed fragmentation extension headers in frames, thus
generating the fragmentation id in the virtualized kernel.

> > > > I think that is the same reasoning why we
> > > > don't support TOE.
> > > > If we use one generator in the hypervisor in an openstack alike setting,
> > > > the host deals with quite a lot of overlay networks. A lot of default
> > > > configurations use the same addresses internally, so on the hypervisor
> > > > the frag id generators would interfere by design.
> > > > I could come up with an attack scenario for DNS servers (again :) ):
> > > > 
> > > > You are sitting next to a DNS server on the same hypervisor and can send
> > > > packets without source validation (because that is handled later on in
> > > > case of openvswitch when the packet is put into the corresponding
> > > > overlay network). You emit a gso packet with the same source and
> > > > destination addresses as the DNS server would do and would get an
> > > > fragmentation id which is linearly (+ time delta) incremented depending
> > > > on the source and destination address. With such a leak you could start
> > > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > > See also details on such kind of attacks in the description of commit
> > > > 04ca6973f7c1a0d.
> > > > 
> > > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > > end hosts, that's also the reason for the introduction of atomic
> > > > fragments (which are now being rolled back ;) ).
> > > > 
> > > > Still it is better to generate a frag id on the hypervisor than just
> > > > sending a 0, so I am ok with this change, albeit not happy.
> > > > 
> > > > Thanks,
> > > > Hannes
> > > > 
> > > 
> > > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > > so you worry that within this window, an external observer can discover
> > > past fragment ID values and so predict the future ones.
> > > All that's required is that two paths go through the same box performing
> > > fragmentation.
> > > 
> > > Is that a fair summary?
> > > 
> > > If yes, we can make this a bit harder by mixing in some
> > > data per input and/or output devices.
> > > 
> > > For example, just to give you the idea:
> > > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 683d493..4faa7ef 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> > >  	trace_netif_receive_skb(skb);
> > >  
> > >  	orig_dev = skb->dev;
> > > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> > >  
> > >  	skb_reset_network_header(skb);
> > >  	if (!skb_transport_header_was_set(skb))
> > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > index ce69a12..819a821 100644
> > > --- a/net/ipv6/ip6_output.c
> > > +++ b/net/ipv6/ip6_output.c
> > > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > >  				     sizeof(struct frag_hdr)) & ~7;
> > >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > >  	ipv6_select_ident(&fhdr, rt);
> > > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > > +						   fhdr.identification);
> > >  
> > >  append:
> > >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > > 
> > 
> > I thought about mixing in the incoming interface identifier into the
> > frag id generation, but that could hurt us badly as soon as a VM has
> > more than one interface to the outside world and uses e.g. ECMP.
> 
> I don't understand. Fragmentation is done after routing,
> isn't it? So all fragments always go out on the same device.

It is the other way around:

(Source, Dest) is the key to lookup the next fragmentation id. We send
two fragments and use (Source, Dest, ifindex) as key, then the first
packet leaves the host on ifindex 1 with fragid x (because it was a big
gso packet it got segments), second packet leaves host on ifindex 2 with
same Source and Dest and results in fragid y. Ideally both those packets
should use the same bucket to improve uniqueness. Without ifindex we
would have y = x+<number segments>, otherwise it would be (I guess)
random.

> > We need
> > to make sure that those frag ids are unique and the kernel needs to be
> > better than just using a random number generator.
> > 
> > Bye,
> > Hannes
> 
> 32 bit numbers can't be unique. They just shouldn't be discoverable
> by an off-path observer.

No, they can't. But they should be reasonable unique and, as you said,
not discoverable.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 14:16                     ` Vlad Yasevich
@ 2015-01-28 14:45                       ` Hannes Frederic Sowa
  2015-01-28 15:27                         ` Vlad Yasevich
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 14:45 UTC (permalink / raw)
  To: vyasevic
  Cc: Michael S. Tsirkin, netdev, Vladislav Yasevich, virtualization,
	edumazet, Ben Hutchings

Hi,

On Mi, 2015-01-28 at 09:16 -0500, Vlad Yasevich wrote:
> On 01/28/2015 05:34 AM, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> >> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> >>> Hello,
> >>>
> >>> On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> >>>> On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> >>>>> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> >>>>>> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> >>>>>>> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> >>>>>>>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> >>>>>>>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> >>>>>>>>>> If the IPv6 fragment id has not been set and we perform
> >>>>>>>>>> fragmentation due to UFO, select a new fragment id.
> >>>>>>>>>> When we store the fragment id into skb_shinfo, set the bit
> >>>>>>>>>> in the skb so we can re-use the selected id.
> >>>>>>>>>> This preserves the behavior of UFO packets generated on the
> >>>>>>>>>> host and solves the issue of id generation for packet sockets
> >>>>>>>>>> and tap/macvtap devices.
> >>>>>>>>>>
> >>>>>>>>>> This patch moves ipv6_select_ident() back in to the header file.  
> >>>>>>>>>> It also provides the helper function that sets skb_shinfo() frag
> >>>>>>>>>> id and sets the bit.
> >>>>>>>>>>
> >>>>>>>>>> It also makes sure that we select the fragment id when doing
> >>>>>>>>>> just gso validation, since it's possible for the packet to
> >>>>>>>>>> come from an untrusted source (VM) and be forwarded through
> >>>>>>>>>> a UFO enabled device which will expect the fragment id.
> >>>>>>>>>>
> >>>>>>>>>> CC: Eric Dumazet <edumazet@google.com>
> >>>>>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> >>>>>>>>>> ---
> >>>>>>>>>>  include/linux/skbuff.h |  3 ++-
> >>>>>>>>>>  include/net/ipv6.h     |  2 ++
> >>>>>>>>>>  net/ipv6/ip6_output.c  |  4 ++--
> >>>>>>>>>>  net/ipv6/output_core.c |  9 ++++++++-
> >>>>>>>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> >>>>>>>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >>>>>>>>>> index 85ab7d7..3ad5203 100644
> >>>>>>>>>> --- a/include/linux/skbuff.h
> >>>>>>>>>> +++ b/include/linux/skbuff.h
> >>>>>>>>>> @@ -605,7 +605,8 @@ struct sk_buff {
> >>>>>>>>>>  	__u8			ipvs_property:1;
> >>>>>>>>>>  	__u8			inner_protocol_type:1;
> >>>>>>>>>>  	__u8			remcsum_offload:1;
> >>>>>>>>>> -	/* 3 or 5 bit hole */
> >>>>>>>>>> +	__u8			ufo_fragid_set:1;
> >>>>>>>>> [...]
> >>>>>>>>>
> >>>>>>>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> >>>>>>>>> sk_buff?  Otherwise this looks fine.
> >>>>>>>>>
> >>>>>>>>> Ben.
> >>>>>>>>
> >>>>>>>> Hmm we seem to be out of tx flags.
> >>>>>>>> Maybe ip6_frag_id == 0 should mean "not set".
> >>>>>>>
> >>>>>>> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> >>>>>>> move into the skb_shared_info area.
> >>>>>>
> >>>>>> That's what I originally wanted to do, but had to move and grow txflags thus
> >>>>>> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> >>>>>>
> >>>>>> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> >>>>>> from the protocol perspective and could actually be generated by the id generator
> >>>>>> functions.  This may cause us to call the id generation multiple times.
> >>>>>
> >>>>> Are there plans in the long run to let virtio_net transmit auxiliary
> >>>>> data to the other end so we can clean all of this this up one day?
> >>>>>
> >>>>> I don't like the whole situation: looking into the virtio_net headers
> >>>>> just adding a field for ipv6 fragmentation ids to those small structs
> >>>>> seems bloated, not doing it feels incorrect. :/
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Bye,
> >>>>> Hannes
> >>>>
> >>>> I'm not sure - what will be achieved by generating the IDs guest side as
> >>>> opposed to host side?  It's certainly harder to get hold of entropy
> >>>> guest-side.
> >>>
> >>> It is not only about entropy but about uniqueness.  Also fragmentation
> >>> ids should not be discoverable,
> >>
> >> I belive "predictable" is the language used by the IETF draft.
> >>
> >>> so there are several aspects:
> >>>
> >>> I see fragmentation id generation still as security critical:
> >>> When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> >>> identifiers less predictable") I could patch my kernels and use the
> >>> patch regardless of the machine being virtualized or not. It was not
> >>> dependent on the hypervisor.
> >>
> >> And now it's even easier - just patch the hypervisor, and all VMs
> >> automatically benefit.
> > 
> > Sometimes the hypervisor is not under my control. You would need to
> > patch both kernels in your case - non gso frames would still get the
> > fragmentation id generated in the host kernel.
> 
> Why would non-gso frames need a frag id?  We are talking only UDP IPv6
> here, so there is no frag id generation if the packet does't need to
> be fragmented.

E.g. raw sockets still can generate fragments locally. It is also a
valid setup to have multiple interfaces in one machine, one that is UFO
enabled and one that isn't. In that case, fragmentation id generation
happens on different hosts which I want to avoid.

I haven't looked closely but mismatch of MTUs on interfaces seems like
it could lead to unwanted fragmentation, e.g. see is_skb_forwardable
which is mostly always true for gso frames, so we never stop them on
bridges etc.

> >>> I think that is the same reasoning why we
> >>> don't support TOE.
> >>> If we use one generator in the hypervisor in an openstack alike setting,
> >>> the host deals with quite a lot of overlay networks. A lot of default
> >>> configurations use the same addresses internally, so on the hypervisor
> >>> the frag id generators would interfere by design.
> >>> I could come up with an attack scenario for DNS servers (again :) ):
> >>>
> >>> You are sitting next to a DNS server on the same hypervisor and can send
> >>> packets without source validation (because that is handled later on in
> >>> case of openvswitch when the packet is put into the corresponding
> >>> overlay network). You emit a gso packet with the same source and
> >>> destination addresses as the DNS server would do and would get an
> >>> fragmentation id which is linearly (+ time delta) incremented depending
> >>> on the source and destination address. With such a leak you could start
> >>> trying attack and spoof DNS responses (fragmentation attacks etc.).
> >>> See also details on such kind of attacks in the description of commit
> >>> 04ca6973f7c1a0d.
> >>>
> >>> AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> >>> end hosts, that's also the reason for the introduction of atomic
> >>> fragments (which are now being rolled back ;) ).
> >>>
> >>> Still it is better to generate a frag id on the hypervisor than just
> >>> sending a 0, so I am ok with this change, albeit not happy.
> >>>
> >>> Thanks,
> >>> Hannes
> >>>
> >>
> >> OK so to summarize, identifiers are only re-randomized once per jiffy,
> >> so you worry that within this window, an external observer can discover
> >> past fragment ID values and so predict the future ones.
> >> All that's required is that two paths go through the same box performing
> >> fragmentation.
> >>
> >> Is that a fair summary?
> >>
> >> If yes, we can make this a bit harder by mixing in some
> >> data per input and/or output devices.
> >>
> >> For example, just to give you the idea:
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 683d493..4faa7ef 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >>  	trace_netif_receive_skb(skb);
> >>  
> >>  	orig_dev = skb->dev;
> >> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> >>  
> >>  	skb_reset_network_header(skb);
> >>  	if (!skb_transport_header_was_set(skb))
> >> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >> index ce69a12..819a821 100644
> >> --- a/net/ipv6/ip6_output.c
> >> +++ b/net/ipv6/ip6_output.c
> >> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> >>  				     sizeof(struct frag_hdr)) & ~7;
> >>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >>  	ipv6_select_ident(&fhdr, rt);
> >> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> >> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> >> +						   fhdr.identification);
> >>  
> >>  append:
> >>  	return skb_append_datato_frags(sk, skb, getfrag, from,
> >>
> > 
> > I thought about mixing in the incoming interface identifier into the
> > frag id generation, but that could hurt us badly as soon as a VM has
> > more than one interface to the outside world and uses e.g. ECMP. We need
> > to make sure that those frag ids are unique and the kernel needs to be
> > better than just using a random number generator.
> >
> 
> So the goal behind this series of patches is to restore VM functionality to
> pre-916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data").

I understand (the patch fixed a NULL ptr deref btw.).

As I said, I don't want to stop this series (hopefully the flag can be
moved into skb_shared_info etc.), would look after that IMHO
(skb flags/IPCB and skb_shared_info have different semantics on
__skb_clone).

I think it is very much worth to try to move the fragmentation id
generation back to the end host and only use this as a fallback.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 14:45                       ` Hannes Frederic Sowa
@ 2015-01-28 15:27                         ` Vlad Yasevich
  2015-01-28 15:49                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Vlad Yasevich @ 2015-01-28 15:27 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Michael S. Tsirkin, netdev, Vladislav Yasevich, virtualization,
	edumazet, Ben Hutchings

On 01/28/2015 09:45 AM, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mi, 2015-01-28 at 09:16 -0500, Vlad Yasevich wrote:
>> On 01/28/2015 05:34 AM, Hannes Frederic Sowa wrote:
>>> Hi,
>>>
>>> On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
>>>> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
>>>>> Hello,
>>>>>
>>>>> On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
>>>>>> On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
>>>>>>> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
>>>>>>>> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
>>>>>>>>> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
>>>>>>>>>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
>>>>>>>>>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
>>>>>>>>>>>> If the IPv6 fragment id has not been set and we perform
>>>>>>>>>>>> fragmentation due to UFO, select a new fragment id.
>>>>>>>>>>>> When we store the fragment id into skb_shinfo, set the bit
>>>>>>>>>>>> in the skb so we can re-use the selected id.
>>>>>>>>>>>> This preserves the behavior of UFO packets generated on the
>>>>>>>>>>>> host and solves the issue of id generation for packet sockets
>>>>>>>>>>>> and tap/macvtap devices.
>>>>>>>>>>>>
>>>>>>>>>>>> This patch moves ipv6_select_ident() back in to the header file.  
>>>>>>>>>>>> It also provides the helper function that sets skb_shinfo() frag
>>>>>>>>>>>> id and sets the bit.
>>>>>>>>>>>>
>>>>>>>>>>>> It also makes sure that we select the fragment id when doing
>>>>>>>>>>>> just gso validation, since it's possible for the packet to
>>>>>>>>>>>> come from an untrusted source (VM) and be forwarded through
>>>>>>>>>>>> a UFO enabled device which will expect the fragment id.
>>>>>>>>>>>>
>>>>>>>>>>>> CC: Eric Dumazet <edumazet@google.com>
>>>>>>>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  include/linux/skbuff.h |  3 ++-
>>>>>>>>>>>>  include/net/ipv6.h     |  2 ++
>>>>>>>>>>>>  net/ipv6/ip6_output.c  |  4 ++--
>>>>>>>>>>>>  net/ipv6/output_core.c |  9 ++++++++-
>>>>>>>>>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
>>>>>>>>>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>>>>>>>>> index 85ab7d7..3ad5203 100644
>>>>>>>>>>>> --- a/include/linux/skbuff.h
>>>>>>>>>>>> +++ b/include/linux/skbuff.h
>>>>>>>>>>>> @@ -605,7 +605,8 @@ struct sk_buff {
>>>>>>>>>>>>  	__u8			ipvs_property:1;
>>>>>>>>>>>>  	__u8			inner_protocol_type:1;
>>>>>>>>>>>>  	__u8			remcsum_offload:1;
>>>>>>>>>>>> -	/* 3 or 5 bit hole */
>>>>>>>>>>>> +	__u8			ufo_fragid_set:1;
>>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
>>>>>>>>>>> sk_buff?  Otherwise this looks fine.
>>>>>>>>>>>
>>>>>>>>>>> Ben.
>>>>>>>>>>
>>>>>>>>>> Hmm we seem to be out of tx flags.
>>>>>>>>>> Maybe ip6_frag_id == 0 should mean "not set".
>>>>>>>>>
>>>>>>>>> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
>>>>>>>>> move into the skb_shared_info area.
>>>>>>>>
>>>>>>>> That's what I originally wanted to do, but had to move and grow txflags thus
>>>>>>>> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
>>>>>>>>
>>>>>>>> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
>>>>>>>> from the protocol perspective and could actually be generated by the id generator
>>>>>>>> functions.  This may cause us to call the id generation multiple times.
>>>>>>>
>>>>>>> Are there plans in the long run to let virtio_net transmit auxiliary
>>>>>>> data to the other end so we can clean all of this this up one day?
>>>>>>>
>>>>>>> I don't like the whole situation: looking into the virtio_net headers
>>>>>>> just adding a field for ipv6 fragmentation ids to those small structs
>>>>>>> seems bloated, not doing it feels incorrect. :/
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Bye,
>>>>>>> Hannes
>>>>>>
>>>>>> I'm not sure - what will be achieved by generating the IDs guest side as
>>>>>> opposed to host side?  It's certainly harder to get hold of entropy
>>>>>> guest-side.
>>>>>
>>>>> It is not only about entropy but about uniqueness.  Also fragmentation
>>>>> ids should not be discoverable,
>>>>
>>>> I belive "predictable" is the language used by the IETF draft.
>>>>
>>>>> so there are several aspects:
>>>>>
>>>>> I see fragmentation id generation still as security critical:
>>>>> When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
>>>>> identifiers less predictable") I could patch my kernels and use the
>>>>> patch regardless of the machine being virtualized or not. It was not
>>>>> dependent on the hypervisor.
>>>>
>>>> And now it's even easier - just patch the hypervisor, and all VMs
>>>> automatically benefit.
>>>
>>> Sometimes the hypervisor is not under my control. You would need to
>>> patch both kernels in your case - non gso frames would still get the
>>> fragmentation id generated in the host kernel.
>>
>> Why would non-gso frames need a frag id?  We are talking only UDP IPv6
>> here, so there is no frag id generation if the packet does't need to
>> be fragmented.
> 
> E.g. raw sockets still can generate fragments locally. It is also a
> valid setup to have multiple interfaces in one machine, one that is UFO
> enabled and one that isn't. In that case, fragmentation id generation
> happens on different hosts which I want to avoid.

OK, so you are concerned about both host and guest generating fragment
ids.  Host would do it for GSO frames and guest would do it for fragmented
frames.  Yes, there is room for collision, which is why we are aiming to
fix this with fragment id passing through virtio_net.  However, I am still
trying to figure the best way to do this as it extends the virtio_net header
and we want to do it right.

> 
> I haven't looked closely but mismatch of MTUs on interfaces seems like
> it could lead to unwanted fragmentation, e.g. see is_skb_forwardable
> which is mostly always true for gso frames, so we never stop them on
> bridges etc.

Yes,  this is one of the cases that gets triggered with VMs.

> 
>>>>> I think that is the same reasoning why we
>>>>> don't support TOE.
>>>>> If we use one generator in the hypervisor in an openstack alike setting,
>>>>> the host deals with quite a lot of overlay networks. A lot of default
>>>>> configurations use the same addresses internally, so on the hypervisor
>>>>> the frag id generators would interfere by design.
>>>>> I could come up with an attack scenario for DNS servers (again :) ):
>>>>>
>>>>> You are sitting next to a DNS server on the same hypervisor and can send
>>>>> packets without source validation (because that is handled later on in
>>>>> case of openvswitch when the packet is put into the corresponding
>>>>> overlay network). You emit a gso packet with the same source and
>>>>> destination addresses as the DNS server would do and would get an
>>>>> fragmentation id which is linearly (+ time delta) incremented depending
>>>>> on the source and destination address. With such a leak you could start
>>>>> trying attack and spoof DNS responses (fragmentation attacks etc.).
>>>>> See also details on such kind of attacks in the description of commit
>>>>> 04ca6973f7c1a0d.
>>>>>
>>>>> AFAIK IETF tried with IPv6 to push fragmentation id generation to the
>>>>> end hosts, that's also the reason for the introduction of atomic
>>>>> fragments (which are now being rolled back ;) ).
>>>>>
>>>>> Still it is better to generate a frag id on the hypervisor than just
>>>>> sending a 0, so I am ok with this change, albeit not happy.
>>>>>
>>>>> Thanks,
>>>>> Hannes
>>>>>
>>>>
>>>> OK so to summarize, identifiers are only re-randomized once per jiffy,
>>>> so you worry that within this window, an external observer can discover
>>>> past fragment ID values and so predict the future ones.
>>>> All that's required is that two paths go through the same box performing
>>>> fragmentation.
>>>>
>>>> Is that a fair summary?
>>>>
>>>> If yes, we can make this a bit harder by mixing in some
>>>> data per input and/or output devices.
>>>>
>>>> For example, just to give you the idea:
>>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 683d493..4faa7ef 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>>>>  	trace_netif_receive_skb(skb);
>>>>  
>>>>  	orig_dev = skb->dev;
>>>> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
>>>>  
>>>>  	skb_reset_network_header(skb);
>>>>  	if (!skb_transport_header_was_set(skb))
>>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>>> index ce69a12..819a821 100644
>>>> --- a/net/ipv6/ip6_output.c
>>>> +++ b/net/ipv6/ip6_output.c
>>>> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>>>>  				     sizeof(struct frag_hdr)) & ~7;
>>>>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>>>>  	ipv6_select_ident(&fhdr, rt);
>>>> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
>>>> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
>>>> +						   fhdr.identification);
>>>>  
>>>>  append:
>>>>  	return skb_append_datato_frags(sk, skb, getfrag, from,
>>>>
>>>
>>> I thought about mixing in the incoming interface identifier into the
>>> frag id generation, but that could hurt us badly as soon as a VM has
>>> more than one interface to the outside world and uses e.g. ECMP. We need
>>> to make sure that those frag ids are unique and the kernel needs to be
>>> better than just using a random number generator.
>>>
>>
>> So the goal behind this series of patches is to restore VM functionality to
>> pre-916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data").
> 
> I understand (the patch fixed a NULL ptr deref btw.).
> 
> As I said, I don't want to stop this series (hopefully the flag can be
> moved into skb_shared_info etc.), would look after that IMHO
> (skb flags/IPCB and skb_shared_info have different semantics on
> __skb_clone).
> 
> I think it is very much worth to try to move the fragmentation id
> generation back to the end host and only use this as a fallback.

I think we are in agreement here.

-vlad
> 
> Bye,
> Hannes
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 15:27                         ` Vlad Yasevich
@ 2015-01-28 15:49                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-28 15:49 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet,
	Hannes Frederic Sowa, Ben Hutchings

On Wed, Jan 28, 2015 at 10:27:47AM -0500, Vlad Yasevich wrote:
> On 01/28/2015 09:45 AM, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On Mi, 2015-01-28 at 09:16 -0500, Vlad Yasevich wrote:
> >> On 01/28/2015 05:34 AM, Hannes Frederic Sowa wrote:
> >>> Hi,
> >>>
> >>> On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> >>>> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> >>>>> Hello,
> >>>>>
> >>>>> On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> >>>>>> On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> >>>>>>> On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> >>>>>>>> On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> >>>>>>>>> On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> >>>>>>>>>>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> >>>>>>>>>>>> If the IPv6 fragment id has not been set and we perform
> >>>>>>>>>>>> fragmentation due to UFO, select a new fragment id.
> >>>>>>>>>>>> When we store the fragment id into skb_shinfo, set the bit
> >>>>>>>>>>>> in the skb so we can re-use the selected id.
> >>>>>>>>>>>> This preserves the behavior of UFO packets generated on the
> >>>>>>>>>>>> host and solves the issue of id generation for packet sockets
> >>>>>>>>>>>> and tap/macvtap devices.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This patch moves ipv6_select_ident() back in to the header file.  
> >>>>>>>>>>>> It also provides the helper function that sets skb_shinfo() frag
> >>>>>>>>>>>> id and sets the bit.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It also makes sure that we select the fragment id when doing
> >>>>>>>>>>>> just gso validation, since it's possible for the packet to
> >>>>>>>>>>>> come from an untrusted source (VM) and be forwarded through
> >>>>>>>>>>>> a UFO enabled device which will expect the fragment id.
> >>>>>>>>>>>>
> >>>>>>>>>>>> CC: Eric Dumazet <edumazet@google.com>
> >>>>>>>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  include/linux/skbuff.h |  3 ++-
> >>>>>>>>>>>>  include/net/ipv6.h     |  2 ++
> >>>>>>>>>>>>  net/ipv6/ip6_output.c  |  4 ++--
> >>>>>>>>>>>>  net/ipv6/output_core.c |  9 ++++++++-
> >>>>>>>>>>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> >>>>>>>>>>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >>>>>>>>>>>> index 85ab7d7..3ad5203 100644
> >>>>>>>>>>>> --- a/include/linux/skbuff.h
> >>>>>>>>>>>> +++ b/include/linux/skbuff.h
> >>>>>>>>>>>> @@ -605,7 +605,8 @@ struct sk_buff {
> >>>>>>>>>>>>  	__u8			ipvs_property:1;
> >>>>>>>>>>>>  	__u8			inner_protocol_type:1;
> >>>>>>>>>>>>  	__u8			remcsum_offload:1;
> >>>>>>>>>>>> -	/* 3 or 5 bit hole */
> >>>>>>>>>>>> +	__u8			ufo_fragid_set:1;
> >>>>>>>>>>> [...]
> >>>>>>>>>>>
> >>>>>>>>>>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> >>>>>>>>>>> sk_buff?  Otherwise this looks fine.
> >>>>>>>>>>>
> >>>>>>>>>>> Ben.
> >>>>>>>>>>
> >>>>>>>>>> Hmm we seem to be out of tx flags.
> >>>>>>>>>> Maybe ip6_frag_id == 0 should mean "not set".
> >>>>>>>>>
> >>>>>>>>> Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> >>>>>>>>> move into the skb_shared_info area.
> >>>>>>>>
> >>>>>>>> That's what I originally wanted to do, but had to move and grow txflags thus
> >>>>>>>> skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> >>>>>>>>
> >>>>>>>> I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> >>>>>>>> from the protocol perspective and could actually be generated by the id generator
> >>>>>>>> functions.  This may cause us to call the id generation multiple times.
> >>>>>>>
> >>>>>>> Are there plans in the long run to let virtio_net transmit auxiliary
> >>>>>>> data to the other end so we can clean all of this this up one day?
> >>>>>>>
> >>>>>>> I don't like the whole situation: looking into the virtio_net headers
> >>>>>>> just adding a field for ipv6 fragmentation ids to those small structs
> >>>>>>> seems bloated, not doing it feels incorrect. :/
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> Bye,
> >>>>>>> Hannes
> >>>>>>
> >>>>>> I'm not sure - what will be achieved by generating the IDs guest side as
> >>>>>> opposed to host side?  It's certainly harder to get hold of entropy
> >>>>>> guest-side.
> >>>>>
> >>>>> It is not only about entropy but about uniqueness.  Also fragmentation
> >>>>> ids should not be discoverable,
> >>>>
> >>>> I belive "predictable" is the language used by the IETF draft.
> >>>>
> >>>>> so there are several aspects:
> >>>>>
> >>>>> I see fragmentation id generation still as security critical:
> >>>>> When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> >>>>> identifiers less predictable") I could patch my kernels and use the
> >>>>> patch regardless of the machine being virtualized or not. It was not
> >>>>> dependent on the hypervisor.
> >>>>
> >>>> And now it's even easier - just patch the hypervisor, and all VMs
> >>>> automatically benefit.
> >>>
> >>> Sometimes the hypervisor is not under my control. You would need to
> >>> patch both kernels in your case - non gso frames would still get the
> >>> fragmentation id generated in the host kernel.
> >>
> >> Why would non-gso frames need a frag id?  We are talking only UDP IPv6
> >> here, so there is no frag id generation if the packet does't need to
> >> be fragmented.
> > 
> > E.g. raw sockets still can generate fragments locally. It is also a
> > valid setup to have multiple interfaces in one machine, one that is UFO
> > enabled and one that isn't. In that case, fragmentation id generation
> > happens on different hosts which I want to avoid.
> 
> OK, so you are concerned about both host and guest generating fragment
> ids.  Host would do it for GSO frames and guest would do it for fragmented
> frames.  Yes, there is room for collision,

collision is not a problem. It is in fact unavoidable.

> which is why we are aiming to
> fix this with fragment id passing through virtio_net.  However, I am still
> trying to figure the best way to do this as it extends the virtio_net header
> and we want to do it right.
> 
> > 
> > I haven't looked closely but mismatch of MTUs on interfaces seems like
> > it could lead to unwanted fragmentation, e.g. see is_skb_forwardable
> > which is mostly always true for gso frames, so we never stop them on
> > bridges etc.
> 
> Yes,  this is one of the cases that gets triggered with VMs.
> 
> > 
> >>>>> I think that is the same reasoning why we
> >>>>> don't support TOE.
> >>>>> If we use one generator in the hypervisor in an openstack alike setting,
> >>>>> the host deals with quite a lot of overlay networks. A lot of default
> >>>>> configurations use the same addresses internally, so on the hypervisor
> >>>>> the frag id generators would interfere by design.
> >>>>> I could come up with an attack scenario for DNS servers (again :) ):
> >>>>>
> >>>>> You are sitting next to a DNS server on the same hypervisor and can send
> >>>>> packets without source validation (because that is handled later on in
> >>>>> case of openvswitch when the packet is put into the corresponding
> >>>>> overlay network). You emit a gso packet with the same source and
> >>>>> destination addresses as the DNS server would do and would get an
> >>>>> fragmentation id which is linearly (+ time delta) incremented depending
> >>>>> on the source and destination address. With such a leak you could start
> >>>>> trying attack and spoof DNS responses (fragmentation attacks etc.).
> >>>>> See also details on such kind of attacks in the description of commit
> >>>>> 04ca6973f7c1a0d.
> >>>>>
> >>>>> AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> >>>>> end hosts, that's also the reason for the introduction of atomic
> >>>>> fragments (which are now being rolled back ;) ).
> >>>>>
> >>>>> Still it is better to generate a frag id on the hypervisor than just
> >>>>> sending a 0, so I am ok with this change, albeit not happy.
> >>>>>
> >>>>> Thanks,
> >>>>> Hannes
> >>>>>
> >>>>
> >>>> OK so to summarize, identifiers are only re-randomized once per jiffy,
> >>>> so you worry that within this window, an external observer can discover
> >>>> past fragment ID values and so predict the future ones.
> >>>> All that's required is that two paths go through the same box performing
> >>>> fragmentation.
> >>>>
> >>>> Is that a fair summary?
> >>>>
> >>>> If yes, we can make this a bit harder by mixing in some
> >>>> data per input and/or output devices.
> >>>>
> >>>> For example, just to give you the idea:
> >>>>
> >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>> index 683d493..4faa7ef 100644
> >>>> --- a/net/core/dev.c
> >>>> +++ b/net/core/dev.c
> >>>> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >>>>  	trace_netif_receive_skb(skb);
> >>>>  
> >>>>  	orig_dev = skb->dev;
> >>>> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> >>>>  
> >>>>  	skb_reset_network_header(skb);
> >>>>  	if (!skb_transport_header_was_set(skb))
> >>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >>>> index ce69a12..819a821 100644
> >>>> --- a/net/ipv6/ip6_output.c
> >>>> +++ b/net/ipv6/ip6_output.c
> >>>> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> >>>>  				     sizeof(struct frag_hdr)) & ~7;
> >>>>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >>>>  	ipv6_select_ident(&fhdr, rt);
> >>>> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> >>>> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> >>>> +						   fhdr.identification);
> >>>>  
> >>>>  append:
> >>>>  	return skb_append_datato_frags(sk, skb, getfrag, from,
> >>>>
> >>>
> >>> I thought about mixing in the incoming interface identifier into the
> >>> frag id generation, but that could hurt us badly as soon as a VM has
> >>> more than one interface to the outside world and uses e.g. ECMP. We need
> >>> to make sure that those frag ids are unique and the kernel needs to be
> >>> better than just using a random number generator.
> >>>
> >>
> >> So the goal behind this series of patches is to restore VM functionality to
> >> pre-916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data").
> > 
> > I understand (the patch fixed a NULL ptr deref btw.).
> > 
> > As I said, I don't want to stop this series (hopefully the flag can be
> > moved into skb_shared_info etc.), would look after that IMHO
> > (skb flags/IPCB and skb_shared_info have different semantics on
> > __skb_clone).
> > 
> > I think it is very much worth to try to move the fragmentation id
> > generation back to the end host and only use this as a fallback.
> 
> I think we are in agreement here.
> 
> -vlad
> > 
> > Bye,
> > Hannes
> > 
> > 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 10:34                   ` Hannes Frederic Sowa
                                       ` (2 preceding siblings ...)
  2015-01-28 14:16                     ` Vlad Yasevich
@ 2015-01-28 16:00                     ` Michael S. Tsirkin
  2015-01-28 16:15                       ` Hannes Frederic Sowa
  3 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-28 16:00 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > Hello,
> > > 
> > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > >>>> and tap/macvtap devices.
> > > > > > >>>>
> > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > > >>>> id and sets the bit.
> > > > > > >>>>
> > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > >>>>
> > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > >>>> ---
> > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > >>>>
> > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > >>> [...]
> > > > > > >>>
> > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > >>>
> > > > > > >>> Ben.
> > > > > > >>
> > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > 
> > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > move into the skb_shared_info area.
> > > > > > 
> > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > 
> > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > 
> > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > data to the other end so we can clean all of this this up one day?
> > > > > 
> > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > seems bloated, not doing it feels incorrect. :/
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > > Bye,
> > > > > Hannes
> > > > 
> > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > guest-side.
> > > 
> > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > ids should not be discoverable,
> > 
> > I belive "predictable" is the language used by the IETF draft.
> > 
> > > so there are several aspects:
> > > 
> > > I see fragmentation id generation still as security critical:
> > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > identifiers less predictable") I could patch my kernels and use the
> > > patch regardless of the machine being virtualized or not. It was not
> > > dependent on the hypervisor.
> > 
> > And now it's even easier - just patch the hypervisor, and all VMs
> > automatically benefit.
> 
> Sometimes the hypervisor is not under my control.

In that case doing things like extending virtio
is out of the question too, isn't it?
It needs hypervisor changes.

> You would need to
> patch both kernels in your case - non gso frames would still get the
> fragmentation id generated in the host kernel.
> 
> > > I think that is the same reasoning why we
> > > don't support TOE.
> > > If we use one generator in the hypervisor in an openstack alike setting,
> > > the host deals with quite a lot of overlay networks. A lot of default
> > > configurations use the same addresses internally, so on the hypervisor
> > > the frag id generators would interfere by design.
> > > I could come up with an attack scenario for DNS servers (again :) ):
> > > 
> > > You are sitting next to a DNS server on the same hypervisor and can send
> > > packets without source validation (because that is handled later on in
> > > case of openvswitch when the packet is put into the corresponding
> > > overlay network). You emit a gso packet with the same source and
> > > destination addresses as the DNS server would do and would get an
> > > fragmentation id which is linearly (+ time delta) incremented depending
> > > on the source and destination address. With such a leak you could start
> > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > See also details on such kind of attacks in the description of commit
> > > 04ca6973f7c1a0d.
> > > 
> > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > end hosts, that's also the reason for the introduction of atomic
> > > fragments (which are now being rolled back ;) ).
> > > 
> > > Still it is better to generate a frag id on the hypervisor than just
> > > sending a 0, so I am ok with this change, albeit not happy.
> > > 
> > > Thanks,
> > > Hannes
> > > 
> > 
> > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > so you worry that within this window, an external observer can discover
> > past fragment ID values and so predict the future ones.
> > All that's required is that two paths go through the same box performing
> > fragmentation.
> > 
> > Is that a fair summary?

No answer here?

> > If yes, we can make this a bit harder by mixing in some
> > data per input and/or output devices.
> > 
> > For example, just to give you the idea:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 683d493..4faa7ef 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >  	trace_netif_receive_skb(skb);
> >  
> >  	orig_dev = skb->dev;
> > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> >  
> >  	skb_reset_network_header(skb);
> >  	if (!skb_transport_header_was_set(skb))
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index ce69a12..819a821 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> >  				     sizeof(struct frag_hdr)) & ~7;
> >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >  	ipv6_select_ident(&fhdr, rt);
> > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > +						   fhdr.identification);
> >  
> >  append:
> >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > 
> 
> I thought about mixing in the incoming interface identifier into the
> frag id generation, but that could hurt us badly as soon as a VM has
> more than one interface to the outside world and uses e.g. ECMP.
> We need
> to make sure that those frag ids are unique and the kernel needs to be
> better than just using a random number generator.
> 
> Bye,
> Hannes

OK then. Like this:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 679e6e9..1ee9a3a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1508,6 +1508,9 @@ struct net_device {
 	 *	part of the usual set specified in Space.c.
 	 */
 
+	/* Extra hash to mix into IPv6 frag ID on packets received from here. */
+	unsigned int		frag_id_hash;
+
 	unsigned long		state;
 
 	struct list_head	dev_list;
diff --git a/net/core/dev.c b/net/core/dev.c
index 683d493..56f1898 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 	trace_netif_receive_skb(skb);
 
 	orig_dev = skb->dev;
+	skb_shinfo(skb)->ip6_frag_id = skb->dev->frag_id_hash;
 
 	skb_reset_network_header(skb);
 	if (!skb_transport_header_was_set(skb))
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ce69a12..819a821 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 				     sizeof(struct frag_hdr)) & ~7;
 	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 	ipv6_select_ident(&fhdr, rt);
-	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
+	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
+						   fhdr.identification);
 
 append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,


Add to this a netlink/sysfs API to set the frag_id_hash for
devices.

Now, user can set identical frag id hash for all devices
for a given VM.

We can even expose this to guests: each guest would generate
the ID on boot and send it to host, host would set it
in sysfs.



-- 
MST

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 16:00                     ` Michael S. Tsirkin
@ 2015-01-28 16:15                       ` Hannes Frederic Sowa
  2015-01-28 16:48                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 16:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

Hi,

On Mi, 2015-01-28 at 18:00 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > > Hello,
> > > > 
> > > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > > >>>> and tap/macvtap devices.
> > > > > > > >>>>
> > > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > > > >>>> id and sets the bit.
> > > > > > > >>>>
> > > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > > >>>>
> > > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > > >>>> ---
> > > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > > >>>>
> > > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > > >>> [...]
> > > > > > > >>>
> > > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > > >>>
> > > > > > > >>> Ben.
> > > > > > > >>
> > > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > > 
> > > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > > move into the skb_shared_info area.
> > > > > > > 
> > > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > > 
> > > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > > 
> > > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > > data to the other end so we can clean all of this this up one day?
> > > > > > 
> > > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > > seems bloated, not doing it feels incorrect. :/
> > > > > > 
> > > > > > Thoughts?
> > > > > > 
> > > > > > Bye,
> > > > > > Hannes
> > > > > 
> > > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > > guest-side.
> > > > 
> > > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > > ids should not be discoverable,
> > > 
> > > I belive "predictable" is the language used by the IETF draft.
> > > 
> > > > so there are several aspects:
> > > > 
> > > > I see fragmentation id generation still as security critical:
> > > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > > identifiers less predictable") I could patch my kernels and use the
> > > > patch regardless of the machine being virtualized or not. It was not
> > > > dependent on the hypervisor.
> > > 
> > > And now it's even easier - just patch the hypervisor, and all VMs
> > > automatically benefit.
> > 
> > Sometimes the hypervisor is not under my control.
> 
> In that case doing things like extending virtio
> is out of the question too, isn't it?
> It needs hypervisor changes.

Sure, but I would like to have the fragmentation id generator to reside
inside the end-host kernel. Hypervisor needs to carry the frag id along,
sure, and needs to be changed accordingly.

So in either case we need to change both kernels. ;)

> 
> > You would need to
> > patch both kernels in your case - non gso frames would still get the
> > fragmentation id generated in the host kernel.
> > 
> > > > I think that is the same reasoning why we
> > > > don't support TOE.
> > > > If we use one generator in the hypervisor in an openstack alike setting,
> > > > the host deals with quite a lot of overlay networks. A lot of default
> > > > configurations use the same addresses internally, so on the hypervisor
> > > > the frag id generators would interfere by design.
> > > > I could come up with an attack scenario for DNS servers (again :) ):
> > > > 
> > > > You are sitting next to a DNS server on the same hypervisor and can send
> > > > packets without source validation (because that is handled later on in
> > > > case of openvswitch when the packet is put into the corresponding
> > > > overlay network). You emit a gso packet with the same source and
> > > > destination addresses as the DNS server would do and would get an
> > > > fragmentation id which is linearly (+ time delta) incremented depending
> > > > on the source and destination address. With such a leak you could start
> > > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > > See also details on such kind of attacks in the description of commit
> > > > 04ca6973f7c1a0d.
> > > > 
> > > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > > end hosts, that's also the reason for the introduction of atomic
> > > > fragments (which are now being rolled back ;) ).
> > > > 
> > > > Still it is better to generate a frag id on the hypervisor than just
> > > > sending a 0, so I am ok with this change, albeit not happy.
> > > > 
> > > > Thanks,
> > > > Hannes
> > > > 
> > > 
> > > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > > so you worry that within this window, an external observer can discover
> > > past fragment ID values and so predict the future ones.
> > > All that's required is that two paths go through the same box performing
> > > fragmentation.
> > > 
> > > Is that a fair summary?
> 
> No answer here?

Ups, sorry.

It is not re-randomized but only biased by a time delta (note the
prandom_u32_max). So even after such an increment happens you can still
guess the range of the current fragmentation ids for a longer time.

Otherwise it is a fair summary.

> 
> > > If yes, we can make this a bit harder by mixing in some
> > > data per input and/or output devices.
> > > 
> > > For example, just to give you the idea:
> > > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 683d493..4faa7ef 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> > >  	trace_netif_receive_skb(skb);
> > >  
> > >  	orig_dev = skb->dev;
> > > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> > >  
> > >  	skb_reset_network_header(skb);
> > >  	if (!skb_transport_header_was_set(skb))
> > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > index ce69a12..819a821 100644
> > > --- a/net/ipv6/ip6_output.c
> > > +++ b/net/ipv6/ip6_output.c
> > > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > >  				     sizeof(struct frag_hdr)) & ~7;
> > >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > >  	ipv6_select_ident(&fhdr, rt);
> > > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > > +						   fhdr.identification);
> > >  
> > >  append:
> > >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > > 
> > 
> > I thought about mixing in the incoming interface identifier into the
> > frag id generation, but that could hurt us badly as soon as a VM has
> > more than one interface to the outside world and uses e.g. ECMP.
> > We need
> > to make sure that those frag ids are unique and the kernel needs to be
> > better than just using a random number generator.
> > 
> > Bye,
> > Hannes
> 
> OK then. Like this:
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 679e6e9..1ee9a3a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1508,6 +1508,9 @@ struct net_device {
>  	 *	part of the usual set specified in Space.c.
>  	 */
>  
> +	/* Extra hash to mix into IPv6 frag ID on packets received from here. */
> +	unsigned int		frag_id_hash;
> +
>  	unsigned long		state;
>  
>  	struct list_head	dev_list;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 683d493..56f1898 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>  	trace_netif_receive_skb(skb);
>  
>  	orig_dev = skb->dev;
> +	skb_shinfo(skb)->ip6_frag_id = skb->dev->frag_id_hash;
>  
>  	skb_reset_network_header(skb);
>  	if (!skb_transport_header_was_set(skb))
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index ce69a12..819a821 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
>  				     sizeof(struct frag_hdr)) & ~7;
>  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>  	ipv6_select_ident(&fhdr, rt);
> -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> +						   fhdr.identification);
>  
>  append:
>  	return skb_append_datato_frags(sk, skb, getfrag, from,
> 
> 
> Add to this a netlink/sysfs API to set the frag_id_hash for
> devices.
> 
> Now, user can set identical frag id hash for all devices
> for a given VM.
> 
> We can even expose this to guests: each guest would generate
> the ID on boot and send it to host, host would set it
> in sysfs.

jhash_1word shouldn't be a bijection, so we are randomizing here and are
increasing the probability of collisions. Instead of jhash_1word you
would need to take a simple block cipher with the hash as key.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 16:15                       ` Hannes Frederic Sowa
@ 2015-01-28 16:48                         ` Michael S. Tsirkin
  2015-01-28 17:34                           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2015-01-28 16:48 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Wed, Jan 28, 2015 at 05:15:49PM +0100, Hannes Frederic Sowa wrote:
> Hi,
> 
> On Mi, 2015-01-28 at 18:00 +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> > > Hi,
> > > 
> > > On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > > > Hello,
> > > > > 
> > > > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > > > >>>> and tap/macvtap devices.
> > > > > > > > >>>>
> > > > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > > > > >>>> id and sets the bit.
> > > > > > > > >>>>
> > > > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > > > >>>>
> > > > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > > > >>>> ---
> > > > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > > > >>>>
> > > > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > > > >>> [...]
> > > > > > > > >>>
> > > > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > > > >>>
> > > > > > > > >>> Ben.
> > > > > > > > >>
> > > > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > > > 
> > > > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > > > move into the skb_shared_info area.
> > > > > > > > 
> > > > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > > > 
> > > > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > > > 
> > > > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > > > data to the other end so we can clean all of this this up one day?
> > > > > > > 
> > > > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > > > seems bloated, not doing it feels incorrect. :/
> > > > > > > 
> > > > > > > Thoughts?
> > > > > > > 
> > > > > > > Bye,
> > > > > > > Hannes
> > > > > > 
> > > > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > > > guest-side.
> > > > > 
> > > > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > > > ids should not be discoverable,
> > > > 
> > > > I belive "predictable" is the language used by the IETF draft.
> > > > 
> > > > > so there are several aspects:
> > > > > 
> > > > > I see fragmentation id generation still as security critical:
> > > > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > > > identifiers less predictable") I could patch my kernels and use the
> > > > > patch regardless of the machine being virtualized or not. It was not
> > > > > dependent on the hypervisor.
> > > > 
> > > > And now it's even easier - just patch the hypervisor, and all VMs
> > > > automatically benefit.
> > > 
> > > Sometimes the hypervisor is not under my control.
> > 
> > In that case doing things like extending virtio
> > is out of the question too, isn't it?
> > It needs hypervisor changes.
> 
> Sure, but I would like to have the fragmentation id generator to reside
> inside the end-host kernel. Hypervisor needs to carry the frag id along,
> sure, and needs to be changed accordingly.
> 
> So in either case we need to change both kernels. ;)
> 
> > 
> > > You would need to
> > > patch both kernels in your case - non gso frames would still get the
> > > fragmentation id generated in the host kernel.
> > > 
> > > > > I think that is the same reasoning why we
> > > > > don't support TOE.
> > > > > If we use one generator in the hypervisor in an openstack alike setting,
> > > > > the host deals with quite a lot of overlay networks. A lot of default
> > > > > configurations use the same addresses internally, so on the hypervisor
> > > > > the frag id generators would interfere by design.
> > > > > I could come up with an attack scenario for DNS servers (again :) ):
> > > > > 
> > > > > You are sitting next to a DNS server on the same hypervisor and can send
> > > > > packets without source validation (because that is handled later on in
> > > > > case of openvswitch when the packet is put into the corresponding
> > > > > overlay network). You emit a gso packet with the same source and
> > > > > destination addresses as the DNS server would do and would get an
> > > > > fragmentation id which is linearly (+ time delta) incremented depending
> > > > > on the source and destination address. With such a leak you could start
> > > > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > > > See also details on such kind of attacks in the description of commit
> > > > > 04ca6973f7c1a0d.
> > > > > 
> > > > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > > > end hosts, that's also the reason for the introduction of atomic
> > > > > fragments (which are now being rolled back ;) ).
> > > > > 
> > > > > Still it is better to generate a frag id on the hypervisor than just
> > > > > sending a 0, so I am ok with this change, albeit not happy.
> > > > > 
> > > > > Thanks,
> > > > > Hannes
> > > > > 
> > > > 
> > > > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > > > so you worry that within this window, an external observer can discover
> > > > past fragment ID values and so predict the future ones.
> > > > All that's required is that two paths go through the same box performing
> > > > fragmentation.
> > > > 
> > > > Is that a fair summary?
> > 
> > No answer here?
> 
> Ups, sorry.
> 
> It is not re-randomized but only biased by a time delta (note the
> prandom_u32_max). So even after such an increment happens you can still
> guess the range of the current fragmentation ids for a longer time.
> 
> Otherwise it is a fair summary.
> 
> > 
> > > > If yes, we can make this a bit harder by mixing in some
> > > > data per input and/or output devices.
> > > > 
> > > > For example, just to give you the idea:
> > > > 
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 683d493..4faa7ef 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> > > >  	trace_netif_receive_skb(skb);
> > > >  
> > > >  	orig_dev = skb->dev;
> > > > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> > > >  
> > > >  	skb_reset_network_header(skb);
> > > >  	if (!skb_transport_header_was_set(skb))
> > > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > > index ce69a12..819a821 100644
> > > > --- a/net/ipv6/ip6_output.c
> > > > +++ b/net/ipv6/ip6_output.c
> > > > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > > >  				     sizeof(struct frag_hdr)) & ~7;
> > > >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > > >  	ipv6_select_ident(&fhdr, rt);
> > > > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > > > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > > > +						   fhdr.identification);
> > > >  
> > > >  append:
> > > >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > > > 
> > > 
> > > I thought about mixing in the incoming interface identifier into the
> > > frag id generation, but that could hurt us badly as soon as a VM has
> > > more than one interface to the outside world and uses e.g. ECMP.
> > > We need
> > > to make sure that those frag ids are unique and the kernel needs to be
> > > better than just using a random number generator.
> > > 
> > > Bye,
> > > Hannes
> > 
> > OK then. Like this:
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 679e6e9..1ee9a3a 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1508,6 +1508,9 @@ struct net_device {
> >  	 *	part of the usual set specified in Space.c.
> >  	 */
> >  
> > +	/* Extra hash to mix into IPv6 frag ID on packets received from here. */
> > +	unsigned int		frag_id_hash;
> > +
> >  	unsigned long		state;
> >  
> >  	struct list_head	dev_list;
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 683d493..56f1898 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >  	trace_netif_receive_skb(skb);
> >  
> >  	orig_dev = skb->dev;
> > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->frag_id_hash;
> >  
> >  	skb_reset_network_header(skb);
> >  	if (!skb_transport_header_was_set(skb))
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index ce69a12..819a821 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> >  				     sizeof(struct frag_hdr)) & ~7;
> >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >  	ipv6_select_ident(&fhdr, rt);
> > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > +						   fhdr.identification);
> >  
> >  append:
> >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > 
> > 
> > Add to this a netlink/sysfs API to set the frag_id_hash for
> > devices.
> > 
> > Now, user can set identical frag id hash for all devices
> > for a given VM.
> > 
> > We can even expose this to guests: each guest would generate
> > the ID on boot and send it to host, host would set it
> > in sysfs.
> 
> jhash_1word shouldn't be a bijection, so we are randomizing here and are
> increasing the probability of collisions.
> Instead of jhash_1word you
> would need to take a simple block cipher with the hash as key.
> 
> Bye,
> Hannes

fhdr.identification is coming from jhash_3word itself, how is this
different?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28  9:46                 ` Michael S. Tsirkin
  2015-01-28 10:34                   ` Hannes Frederic Sowa
@ 2015-01-28 17:24                   ` Ben Hutchings
  1 sibling, 0 replies; 36+ messages in thread
From: Ben Hutchings @ 2015-01-28 17:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet,
	Hannes Frederic Sowa


[-- Attachment #1.1: Type: text/plain, Size: 1094 bytes --]

On Wed, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
[...]
> > I see fragmentation id generation still as security critical:
> > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > identifiers less predictable") I could patch my kernels and use the
> > patch regardless of the machine being virtualized or not. It was not
> > dependent on the hypervisor.
> 
> And now it's even easier - just patch the hypervisor, and all VMs
> automatically benefit.
[...]

You are advocating that the hypervisor should continue to act as a
middle-box that quietly modifies packets.  This may be useful to protect
guests that have poor fragment ID generation, but then that should be an
optional netfilter module and *not* the default.  The default should be
that UFO has no effect on the packet headers on the wire, and therefore
that the fragment ID is chosen by the IPv6 stack in the guest.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set.
  2015-01-28 16:48                         ` Michael S. Tsirkin
@ 2015-01-28 17:34                           ` Hannes Frederic Sowa
  0 siblings, 0 replies; 36+ messages in thread
From: Hannes Frederic Sowa @ 2015-01-28 17:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Vladislav Yasevich, virtualization, edumazet, Ben Hutchings

On Mi, 2015-01-28 at 18:48 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 28, 2015 at 05:15:49PM +0100, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On Mi, 2015-01-28 at 18:00 +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 28, 2015 at 11:34:02AM +0100, Hannes Frederic Sowa wrote:
> > > > Hi,
> > > > 
> > > > On Mi, 2015-01-28 at 11:46 +0200, Michael S. Tsirkin wrote:
> > > > > On Wed, Jan 28, 2015 at 09:25:08AM +0100, Hannes Frederic Sowa wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > On Di, 2015-01-27 at 18:08 +0200, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jan 27, 2015 at 05:02:31PM +0100, Hannes Frederic Sowa wrote:
> > > > > > > > On Di, 2015-01-27 at 09:26 -0500, Vlad Yasevich wrote:
> > > > > > > > > On 01/27/2015 08:47 AM, Hannes Frederic Sowa wrote:
> > > > > > > > > > On Di, 2015-01-27 at 10:42 +0200, Michael S. Tsirkin wrote:
> > > > > > > > > >> On Tue, Jan 27, 2015 at 02:47:54AM +0000, Ben Hutchings wrote:
> > > > > > > > > >>> On Mon, 2015-01-26 at 09:37 -0500, Vladislav Yasevich wrote:
> > > > > > > > > >>>> If the IPv6 fragment id has not been set and we perform
> > > > > > > > > >>>> fragmentation due to UFO, select a new fragment id.
> > > > > > > > > >>>> When we store the fragment id into skb_shinfo, set the bit
> > > > > > > > > >>>> in the skb so we can re-use the selected id.
> > > > > > > > > >>>> This preserves the behavior of UFO packets generated on the
> > > > > > > > > >>>> host and solves the issue of id generation for packet sockets
> > > > > > > > > >>>> and tap/macvtap devices.
> > > > > > > > > >>>>
> > > > > > > > > >>>> This patch moves ipv6_select_ident() back in to the header file.  
> > > > > > > > > >>>> It also provides the helper function that sets skb_shinfo() frag
> > > > > > > > > >>>> id and sets the bit.
> > > > > > > > > >>>>
> > > > > > > > > >>>> It also makes sure that we select the fragment id when doing
> > > > > > > > > >>>> just gso validation, since it's possible for the packet to
> > > > > > > > > >>>> come from an untrusted source (VM) and be forwarded through
> > > > > > > > > >>>> a UFO enabled device which will expect the fragment id.
> > > > > > > > > >>>>
> > > > > > > > > >>>> CC: Eric Dumazet <edumazet@google.com>
> > > > > > > > > >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > > > > > > > >>>> ---
> > > > > > > > > >>>>  include/linux/skbuff.h |  3 ++-
> > > > > > > > > >>>>  include/net/ipv6.h     |  2 ++
> > > > > > > > > >>>>  net/ipv6/ip6_output.c  |  4 ++--
> > > > > > > > > >>>>  net/ipv6/output_core.c |  9 ++++++++-
> > > > > > > > > >>>>  net/ipv6/udp_offload.c | 10 +++++++++-
> > > > > > > > > >>>>  5 files changed, 23 insertions(+), 5 deletions(-)
> > > > > > > > > >>>>
> > > > > > > > > >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > > > > > > >>>> index 85ab7d7..3ad5203 100644
> > > > > > > > > >>>> --- a/include/linux/skbuff.h
> > > > > > > > > >>>> +++ b/include/linux/skbuff.h
> > > > > > > > > >>>> @@ -605,7 +605,8 @@ struct sk_buff {
> > > > > > > > > >>>>  	__u8			ipvs_property:1;
> > > > > > > > > >>>>  	__u8			inner_protocol_type:1;
> > > > > > > > > >>>>  	__u8			remcsum_offload:1;
> > > > > > > > > >>>> -	/* 3 or 5 bit hole */
> > > > > > > > > >>>> +	__u8			ufo_fragid_set:1;
> > > > > > > > > >>> [...]
> > > > > > > > > >>>
> > > > > > > > > >>> Doesn't the flag belong in struct skb_shared_info, rather than struct
> > > > > > > > > >>> sk_buff?  Otherwise this looks fine.
> > > > > > > > > >>>
> > > > > > > > > >>> Ben.
> > > > > > > > > >>
> > > > > > > > > >> Hmm we seem to be out of tx flags.
> > > > > > > > > >> Maybe ip6_frag_id == 0 should mean "not set".
> > > > > > > > > > 
> > > > > > > > > > Maybe that is the best idea. Definitely the ufo_fragid_set bit should
> > > > > > > > > > move into the skb_shared_info area.
> > > > > > > > > 
> > > > > > > > > That's what I originally wanted to do, but had to move and grow txflags thus
> > > > > > > > > skb_shinfo ended up growing.  I wanted to avoid that, so stole an skb flag.
> > > > > > > > > 
> > > > > > > > > I considered treating fragid == 0 as unset, but a 0 fragid is perfectly valid
> > > > > > > > > from the protocol perspective and could actually be generated by the id generator
> > > > > > > > > functions.  This may cause us to call the id generation multiple times.
> > > > > > > > 
> > > > > > > > Are there plans in the long run to let virtio_net transmit auxiliary
> > > > > > > > data to the other end so we can clean all of this this up one day?
> > > > > > > > 
> > > > > > > > I don't like the whole situation: looking into the virtio_net headers
> > > > > > > > just adding a field for ipv6 fragmentation ids to those small structs
> > > > > > > > seems bloated, not doing it feels incorrect. :/
> > > > > > > > 
> > > > > > > > Thoughts?
> > > > > > > > 
> > > > > > > > Bye,
> > > > > > > > Hannes
> > > > > > > 
> > > > > > > I'm not sure - what will be achieved by generating the IDs guest side as
> > > > > > > opposed to host side?  It's certainly harder to get hold of entropy
> > > > > > > guest-side.
> > > > > > 
> > > > > > It is not only about entropy but about uniqueness.  Also fragmentation
> > > > > > ids should not be discoverable,
> > > > > 
> > > > > I belive "predictable" is the language used by the IETF draft.
> > > > > 
> > > > > > so there are several aspects:
> > > > > > 
> > > > > > I see fragmentation id generation still as security critical:
> > > > > > When Eric patched the frag id generator in 04ca6973f7c1a0d ("ip: make IP
> > > > > > identifiers less predictable") I could patch my kernels and use the
> > > > > > patch regardless of the machine being virtualized or not. It was not
> > > > > > dependent on the hypervisor.
> > > > > 
> > > > > And now it's even easier - just patch the hypervisor, and all VMs
> > > > > automatically benefit.
> > > > 
> > > > Sometimes the hypervisor is not under my control.
> > > 
> > > In that case doing things like extending virtio
> > > is out of the question too, isn't it?
> > > It needs hypervisor changes.
> > 
> > Sure, but I would like to have the fragmentation id generator to reside
> > inside the end-host kernel. Hypervisor needs to carry the frag id along,
> > sure, and needs to be changed accordingly.
> > 
> > So in either case we need to change both kernels. ;)
> > 
> > > 
> > > > You would need to
> > > > patch both kernels in your case - non gso frames would still get the
> > > > fragmentation id generated in the host kernel.
> > > > 
> > > > > > I think that is the same reasoning why we
> > > > > > don't support TOE.
> > > > > > If we use one generator in the hypervisor in an openstack alike setting,
> > > > > > the host deals with quite a lot of overlay networks. A lot of default
> > > > > > configurations use the same addresses internally, so on the hypervisor
> > > > > > the frag id generators would interfere by design.
> > > > > > I could come up with an attack scenario for DNS servers (again :) ):
> > > > > > 
> > > > > > You are sitting next to a DNS server on the same hypervisor and can send
> > > > > > packets without source validation (because that is handled later on in
> > > > > > case of openvswitch when the packet is put into the corresponding
> > > > > > overlay network). You emit a gso packet with the same source and
> > > > > > destination addresses as the DNS server would do and would get an
> > > > > > fragmentation id which is linearly (+ time delta) incremented depending
> > > > > > on the source and destination address. With such a leak you could start
> > > > > > trying attack and spoof DNS responses (fragmentation attacks etc.).
> > > > > > See also details on such kind of attacks in the description of commit
> > > > > > 04ca6973f7c1a0d.
> > > > > > 
> > > > > > AFAIK IETF tried with IPv6 to push fragmentation id generation to the
> > > > > > end hosts, that's also the reason for the introduction of atomic
> > > > > > fragments (which are now being rolled back ;) ).
> > > > > > 
> > > > > > Still it is better to generate a frag id on the hypervisor than just
> > > > > > sending a 0, so I am ok with this change, albeit not happy.
> > > > > > 
> > > > > > Thanks,
> > > > > > Hannes
> > > > > > 
> > > > > 
> > > > > OK so to summarize, identifiers are only re-randomized once per jiffy,
> > > > > so you worry that within this window, an external observer can discover
> > > > > past fragment ID values and so predict the future ones.
> > > > > All that's required is that two paths go through the same box performing
> > > > > fragmentation.
> > > > > 
> > > > > Is that a fair summary?
> > > 
> > > No answer here?
> > 
> > Ups, sorry.
> > 
> > It is not re-randomized but only biased by a time delta (note the
> > prandom_u32_max). So even after such an increment happens you can still
> > guess the range of the current fragmentation ids for a longer time.
> > 
> > Otherwise it is a fair summary.
> > 
> > > 
> > > > > If yes, we can make this a bit harder by mixing in some
> > > > > data per input and/or output devices.
> > > > > 
> > > > > For example, just to give you the idea:
> > > > > 
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 683d493..4faa7ef 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> > > > >  	trace_netif_receive_skb(skb);
> > > > >  
> > > > >  	orig_dev = skb->dev;
> > > > > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->ifindex;
> > > > >  
> > > > >  	skb_reset_network_header(skb);
> > > > >  	if (!skb_transport_header_was_set(skb))
> > > > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > > > index ce69a12..819a821 100644
> > > > > --- a/net/ipv6/ip6_output.c
> > > > > +++ b/net/ipv6/ip6_output.c
> > > > > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > > > >  				     sizeof(struct frag_hdr)) & ~7;
> > > > >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > > > >  	ipv6_select_ident(&fhdr, rt);
> > > > > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > > > > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > > > > +						   fhdr.identification);
> > > > >  
> > > > >  append:
> > > > >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > > > > 
> > > > 
> > > > I thought about mixing in the incoming interface identifier into the
> > > > frag id generation, but that could hurt us badly as soon as a VM has
> > > > more than one interface to the outside world and uses e.g. ECMP.
> > > > We need
> > > > to make sure that those frag ids are unique and the kernel needs to be
> > > > better than just using a random number generator.
> > > > 
> > > > Bye,
> > > > Hannes
> > > 
> > > OK then. Like this:
> > > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 679e6e9..1ee9a3a 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1508,6 +1508,9 @@ struct net_device {
> > >  	 *	part of the usual set specified in Space.c.
> > >  	 */
> > >  
> > > +	/* Extra hash to mix into IPv6 frag ID on packets received from here. */
> > > +	unsigned int		frag_id_hash;
> > > +
> > >  	unsigned long		state;
> > >  
> > >  	struct list_head	dev_list;
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 683d493..56f1898 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3625,6 +3625,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> > >  	trace_netif_receive_skb(skb);
> > >  
> > >  	orig_dev = skb->dev;
> > > +	skb_shinfo(skb)->ip6_frag_id = skb->dev->frag_id_hash;
> > >  
> > >  	skb_reset_network_header(skb);
> > >  	if (!skb_transport_header_was_set(skb))
> > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > > index ce69a12..819a821 100644
> > > --- a/net/ipv6/ip6_output.c
> > > +++ b/net/ipv6/ip6_output.c
> > > @@ -1092,7 +1092,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > >  				     sizeof(struct frag_hdr)) & ~7;
> > >  	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > >  	ipv6_select_ident(&fhdr, rt);
> > > -	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > > +	skb_shinfo(skb)->ip6_frag_id = jhash_1word(skb_shinfo(skb)->ip6_frag_id,
> > > +						   fhdr.identification);
> > >  
> > >  append:
> > >  	return skb_append_datato_frags(sk, skb, getfrag, from,
> > > 
> > > 
> > > Add to this a netlink/sysfs API to set the frag_id_hash for
> > > devices.
> > > 
> > > Now, user can set identical frag id hash for all devices
> > > for a given VM.
> > > 
> > > We can even expose this to guests: each guest would generate
> > > the ID on boot and send it to host, host would set it
> > > in sysfs.
> > 
> > jhash_1word shouldn't be a bijection, so we are randomizing here and are
> > increasing the probability of collisions.
> > Instead of jhash_1word you
> > would need to take a simple block cipher with the hash as key.
> > 
> > Bye,
> > Hannes
> 
> fhdr.identification is coming from jhash_3word itself, how is this
> different?
> 

Sorry, I currently cannot follow. Does it? We hash the ipv6 addresses
and the hash is used as an index into the ip_idents array.

Sorry, maybe I have overlooked something?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-01-28 17:34 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-26 14:37 [PATCH 0/3] Restore UFO support to virtio_net devices Vladislav Yasevich
2015-01-26 14:37 ` [PATCH 1/3] ipv6: Select fragment id during UFO/GSO segmentation if not set Vladislav Yasevich
2015-01-27  2:47   ` Ben Hutchings
2015-01-27  8:27     ` David Miller
2015-01-27  8:27     ` David Miller
2015-01-27  8:42     ` Michael S. Tsirkin
2015-01-27  8:42     ` Michael S. Tsirkin
2015-01-27 13:47       ` Hannes Frederic Sowa
2015-01-27 14:26         ` Vlad Yasevich
2015-01-27 14:38           ` Eric Dumazet
2015-01-27 16:02           ` Hannes Frederic Sowa
2015-01-27 16:08             ` Michael S. Tsirkin
2015-01-28  8:25               ` Hannes Frederic Sowa
2015-01-28  9:46                 ` Michael S. Tsirkin
2015-01-28 10:34                   ` Hannes Frederic Sowa
2015-01-28 10:39                     ` Hannes Frederic Sowa
2015-01-28 13:43                     ` Michael S. Tsirkin
2015-01-28 14:17                       ` Hannes Frederic Sowa
2015-01-28 14:16                     ` Vlad Yasevich
2015-01-28 14:45                       ` Hannes Frederic Sowa
2015-01-28 15:27                         ` Vlad Yasevich
2015-01-28 15:49                           ` Michael S. Tsirkin
2015-01-28 16:00                     ` Michael S. Tsirkin
2015-01-28 16:15                       ` Hannes Frederic Sowa
2015-01-28 16:48                         ` Michael S. Tsirkin
2015-01-28 17:34                           ` Hannes Frederic Sowa
2015-01-28 17:24                   ` Ben Hutchings
2015-01-27 16:25             ` Vlad Yasevich
2015-01-27 13:47       ` Hannes Frederic Sowa
2015-01-26 14:37 ` [PATCH 2/3] Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" Vladislav Yasevich
2015-01-26 14:37 ` Vladislav Yasevich
2015-01-26 14:37 ` [PATCH 3/3] Revert "drivers/net: Disable UFO through virtio" Vladislav Yasevich
2015-01-27  2:51   ` Ben Hutchings
2015-01-26 15:28 ` [PATCH 0/3] Restore UFO support to virtio_net devices Michael S. Tsirkin
2015-01-26 15:28 ` Michael S. Tsirkin
2015-01-26 15:32   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.