* [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support
@ 2015-03-10 19:07 Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 1/5] net: add 'const' to __ipv6_select_ident()'s input address parameter Andy Zhou
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

These patches refactor the core APIs for both IPv4 and IPv6.

The intended use for openvswitch has been posted in the earlier RFC:

http://lists.openwall.net/netdev/2015/03/02/220

Andy Zhou (5):
  net: add 'const' to __ipv6_select_ident()'s input address parameter
  net: add ipv6_select_ident_by_addr() API
  net: refactor IPv4 and IPv6 fragmentation APIs
  net: refactor IPv4 and IPv6 defragmentation APIs
  net: export symbol nf_ct_frag6_consume_orig

 include/net/ip.h                            |   7 +-
 include/net/ip6_route.h                     |   3 +-
 include/net/ipv6.h                          |   9 +-
 include/net/netfilter/ipv6/nf_defrag_ipv6.h |   2 +
 net/bridge/br_netfilter.c                   |   7 +-
 net/ipv4/ip_fragment.c                      |  15 ++-
 net/ipv4/ip_output.c                        | 126 ++++++++++++---------
 net/ipv6/ip6_output.c                       | 165 +++++++++++++++++-----------
 net/ipv6/netfilter/nf_conntrack_reasm.c     |  18 ++-
 net/ipv6/output_core.c                      |  18 ++-
 net/ipv6/xfrm6_output.c                     |   9 +-
 11 files changed, 242 insertions(+), 137 deletions(-)

-- 
v1->v2: change the output callback functions of both ip_fragment() and
        ip6_fragment() to take two arguments: skb and an output argument.

        split the export of nf_ct_frag6_consume_orig into its own patch.

        fix style issues.

1.9.1

* [patch net-next v2 1/5] net: add 'const' to __ipv6_select_ident()'s input address parameter
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
@ 2015-03-10 19:07 ` Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 2/5] net: add ipv6_select_ident_by_addr() API Andy Zhou
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

This function does not change those addresses.

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 net/ipv6/output_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 74581f7..86ff1cf 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -9,8 +9,8 @@
 #include <net/addrconf.h>
 #include <net/secure_seq.h>
 
-static u32 __ipv6_select_ident(u32 hashrnd, struct in6_addr *dst,
-			       struct in6_addr *src)
+static u32 __ipv6_select_ident(u32 hashrnd, const struct in6_addr *dst,
+			       const struct in6_addr *src)
 {
 	u32 hash, id;
 
-- 
1.9.1

* [patch net-next v2 2/5] net: add ipv6_select_ident_by_addr() API
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 1/5] net: add 'const' to __ipv6_select_ident()'s input address parameter Andy Zhou
@ 2015-03-10 19:07 ` Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 3/5] net: refactor IPv4 and IPv6 fragmentation APIs Andy Zhou
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

This new API can be used when the route information is not available
but the IPv6 identification field still needs to be calculated, for
example, in IPv6 fragmentation.

The current ipv6_select_ident() is kept as is. Its implementation now
calls the new API.
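
As a rough illustration of the split described above, the following
userspace sketch shows a route-based helper delegating to an
address-based one. The types addr6, route6 and frag_hdr_sketch, and the
mixing function toy_ident(), are hypothetical stand-ins and not the
kernel's definitions; the real identification computation and the
byte-order conversion are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for in6_addr, rt6_info and frag_hdr. */
struct addr6 { uint8_t b[16]; };
struct route6 { struct addr6 dst, src; };
struct frag_hdr_sketch { uint32_t identification; };

/* Toy mixing function; an assumption used only to make the sketch run. */
static uint32_t toy_ident(uint32_t seed, const struct addr6 *dst,
			  const struct addr6 *src)
{
	uint32_t h = seed;

	for (size_t i = 0; i < sizeof(dst->b); i++)
		h = h * 31u + dst->b[i];
	for (size_t i = 0; i < sizeof(src->b); i++)
		h = h * 31u + src->b[i];
	return h;
}

/* Address-based entry point: needs no route, only the two addresses. */
void select_ident_by_addr(struct frag_hdr_sketch *fh,
			  const struct addr6 *dst, const struct addr6 *src)
{
	assert(fh != NULL && dst != NULL && src != NULL);
	fh->identification = toy_ident(0x12345678u, dst, src);
}

/* Route-based entry point: keeps its signature and simply delegates. */
void select_ident(struct frag_hdr_sketch *fh, const struct route6 *rt)
{
	select_ident_by_addr(fh, &rt->dst, &rt->src);
}
```

Both entry points yield the same identification for the same address
pair, so callers that have a route and callers that only have addresses
stay consistent.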

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 include/net/ipv6.h     |  5 ++++-
 net/ipv6/output_core.c | 14 +++++++++++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b767306..db92dc7 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -671,7 +671,10 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
 	return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
 }
 
-void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);
+void ipv6_select_ident(struct frag_hdr *fhdr, const struct rt6_info *rt);
+void ipv6_select_ident_by_addr(struct frag_hdr *fhdr,
+			       const struct in6_addr *dst,
+			       const struct in6_addr *src);
 void ipv6_proxy_select_ident(struct sk_buff *skb);
 
 int ip6_dst_hoplimit(struct dst_entry *dst);
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 86ff1cf..6dd4ab2 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -59,17 +59,25 @@ void ipv6_proxy_select_ident(struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);
 
-void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+void ipv6_select_ident_by_addr(struct frag_hdr *fhdr,
+			       const struct in6_addr *dst,
+			       const struct in6_addr *src)
 {
 	static u32 ip6_idents_hashrnd __read_mostly;
 	u32 id;
 
 	net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd));
 
-	id = __ipv6_select_ident(ip6_idents_hashrnd, &rt->rt6i_dst.addr,
-				 &rt->rt6i_src.addr);
+	id = __ipv6_select_ident(ip6_idents_hashrnd, dst, src);
 	fhdr->identification = htonl(id);
 }
+EXPORT_SYMBOL(ipv6_select_ident_by_addr);
+
+void ipv6_select_ident(struct frag_hdr *fhdr, const struct rt6_info *rt)
+{
+	ipv6_select_ident_by_addr(fhdr, &rt->rt6i_dst.addr,
+				  &rt->rt6i_src.addr);
+}
 EXPORT_SYMBOL(ipv6_select_ident);
 
 int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
-- 
1.9.1

* [patch net-next v2 3/5] net: refactor IPv4 and IPv6 fragmentation APIs
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 1/5] net: add 'const' to __ipv6_select_ident()'s input address parameter Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 2/5] net: add ipv6_select_ident_by_addr() API Andy Zhou
@ 2015-03-10 19:07 ` Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 4/5] net: refactor IPv4 and IPv6 defragmentation APIs Andy Zhou
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

Both ip_fragment() and ip6_fragment() APIs assume skb has an
attached netdev device, from which the MTU size can be derived.
However, skbs incoming from OVS vports do not have an attached
netdev device.

This patch splits each original function into two parts. The core
fragmentation logic is now provided by
ip_fragment_mtu()/ip6_fragment_mtu().

The original APIs are kept as is. Their implementation now calls
the new APIs. Any information derived from the attached netdev
device is first derived in the original APIs and passed into the
new APIs.
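
To make the role of the MTU argument concrete, here is a minimal
userspace sketch of how an MTU and a header length determine IPv4
fragment payload sizes. It assumes only the general IPv4 rule that
fragment offsets are expressed in 8-byte units, so every fragment except
the last carries a payload that is a multiple of 8 bytes. The function
plan_fragments() is hypothetical and is not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill sizes[] with the payload length of each fragment and return the
 * number of fragments. Returns 0 when the inputs cannot be fragmented
 * or when more than cap fragments would be needed.
 */
size_t plan_fragments(unsigned int total_len, unsigned int hlen,
		      unsigned int mtu, unsigned int *sizes, size_t cap)
{
	unsigned int space, left;
	size_t n = 0;

	assert(sizes != NULL);
	if (total_len <= hlen || mtu <= hlen)
		return 0;

	space = mtu - hlen;		/* data space per fragment */
	left = total_len - hlen;	/* payload still to be sent */
	if (space < 8 && left > space)
		return 0;		/* no aligned progress possible */

	while (left > 0) {
		unsigned int len = left < space ? left : space;

		if (len < left)		/* not the last fragment */
			len &= ~7u;	/* round down to 8-byte multiple */
		if (n == cap)
			return 0;
		sizes[n++] = len;
		left -= len;
	}
	return n;
}
```

For a 4000-byte datagram with a 20-byte header and an MTU of 1500, this
yields payloads of 1480, 1480 and 1020 bytes.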

In addition, the output callback passed to the new APIs now accepts
two arguments: an skb and an application-specific pointer that
carries additional information not directly associated with the
skb, such as an OVS flow, to the output function.
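
The two-argument callback convention can be sketched in userspace as
follows. The struct pkt type, the flow_ctx context and all function
names are hypothetical stand-ins chosen for illustration; only the shape
of the callback, a packet plus an opaque pointer, follows the
description above.

```c
#include <assert.h>
#include <stddef.h>

struct pkt { unsigned int len; };	/* stand-in for struct sk_buff */

typedef int (*output_fn)(struct pkt *p, void *output_arg);

/* An existing single-argument output function. */
static int legacy_xmit(struct pkt *p)
{
	return p->len ? 0 : -1;
}

/* Adapter for callers that have no extra context to pass. */
int legacy_xmit_adapter(struct pkt *p, void *unused)
{
	(void)unused;
	return legacy_xmit(p);
}

/* Caller-specific context handed through the opaque pointer. */
struct flow_ctx { unsigned int packets, bytes; };

int flow_xmit(struct pkt *p, void *output_arg)
{
	struct flow_ctx *ctx = output_arg;

	ctx->packets++;
	ctx->bytes += p->len;
	return 0;
}

/* Hand each packet to output() and stop at the first error. */
int emit_all(struct pkt *pkts, size_t n, void *output_arg, output_fn output)
{
	assert(output != NULL);
	for (size_t i = 0; i < n; i++) {
		int err = output(&pkts[i], output_arg);

		if (err)
			return err;
	}
	return 0;
}
```

Existing callers can pass NULL together with a thin adapter, while a
caller that needs per-flow state passes its own context and callback.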

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 include/net/ip.h          |   6 +-
 include/net/ip6_route.h   |   3 +-
 include/net/ipv6.h        |   4 ++
 net/bridge/br_netfilter.c |   7 +-
 net/ipv4/ip_output.c      | 126 ++++++++++++++++++++---------------
 net/ipv6/ip6_output.c     | 165 ++++++++++++++++++++++++++++------------------
 net/ipv6/xfrm6_output.c   |   9 ++-
 7 files changed, 198 insertions(+), 122 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 025c61c..bd49ecc 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -108,7 +108,11 @@ int ip_local_deliver(struct sk_buff *skb);
 int ip_mr_input(struct sk_buff *skb);
 int ip_output(struct sock *sk, struct sk_buff *skb);
 int ip_mc_output(struct sock *sk, struct sk_buff *skb);
-int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
+int ip_fragment(struct sk_buff *skb, void *output_arg,
+		int (*output)(struct sk_buff *, void *output_arg));
+int ip_fragment_mtu(struct sk_buff *skb, unsigned int mtu, unsigned int ll_rs,
+		    struct net_device *dev, void *output_arg,
+		    int (*output)(struct sk_buff *, void *output_arg));
 int ip_do_nat(struct sk_buff *skb);
 void ip_send_check(struct iphdr *ip);
 int __ip_local_out(struct sk_buff *skb);
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 1d09b46..7e2c2c5 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -170,7 +170,8 @@ static inline bool ipv6_anycast_destination(const struct sk_buff *skb)
 	return rt->rt6i_flags & RTF_ANYCAST;
 }
 
-int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
+int ip6_fragment(struct sk_buff *skb, void *output_arg,
+		 int (*output)(struct sk_buff *, void *output_arg));
 
 static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
 {
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index db92dc7..d35e769 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -910,6 +910,10 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
 int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
 int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 		  struct group_filter __user *optval, int __user *optlen);
+int ip6_fragment_mtu(struct sk_buff *skb, unsigned int mtu, int hroom,
+		     int troom, struct net_device *dev, __be32 frag_id,
+		     void *output_arg,
+		     int (*output)(struct sk_buff *, void *));
 
 #ifdef CONFIG_PROC_FS
 int ac6_proc_init(struct net *net);
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 0ee453f..6bef77b 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -764,6 +764,11 @@ static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops,
 }
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
+static int br_dev_queue_push_xmit__(struct sk_buff *skb, void *__unused)
+{
+	return br_dev_queue_push_xmit(skb);
+}
+
 static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 {
 	int ret;
@@ -780,7 +785,7 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb)
 			/* Drop invalid packet */
 			return NF_DROP;
 		IPCB(skb)->frag_max_size = frag_max_size;
-		ret = ip_fragment(skb, br_dev_queue_push_xmit);
+		ret = ip_fragment(skb, NULL, br_dev_queue_push_xmit__);
 	} else
 		ret = br_dev_queue_push_xmit(skb);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a7aea20..a0dfe18 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -211,6 +211,11 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 	return -EINVAL;
 }
 
+static inline int __ip_finish_output2(struct sk_buff *skb, void *__unused)
+{
+	return ip_finish_output2(skb);
+}
+
 static int ip_finish_output_gso(struct sk_buff *skb)
 {
 	netdev_features_t features;
@@ -243,7 +248,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 		int err;
 
 		segs->next = NULL;
-		err = ip_fragment(segs, ip_finish_output2);
+		err = ip_fragment(segs, NULL, __ip_finish_output2);
 
 		if (err && ret == 0)
 			ret = err;
@@ -266,7 +271,7 @@ static int ip_finish_output(struct sk_buff *skb)
 		return ip_finish_output_gso(skb);
 
 	if (skb->len > ip_skb_dst_mtu(skb))
-		return ip_fragment(skb, ip_finish_output2);
+		return ip_fragment(skb, NULL, __ip_finish_output2);
 
 	return ip_finish_output2(skb);
 }
@@ -472,54 +477,22 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	skb_copy_secmark(to, from);
 }
 
-/*
- *	This IP datagram is too large to be sent in one piece.  Break it up into
- *	smaller pieces (each of size equal to IP header plus
- *	a block of the data of the original IP data part) that will yet fit in a
- *	single device frame, and queue such a frame for sending.
- */
-
-int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
+int ip_fragment_mtu(struct sk_buff *skb, unsigned int mtu, unsigned int ll_rs,
+		    struct net_device *dev, void *output_arg,
+		    int (*output)(struct sk_buff *, void *output_arg))
 {
+	unsigned int hlen, left, len;
+	__be16 not_last_frag;
+	struct sk_buff *skb2;
 	struct iphdr *iph;
 	int ptr;
-	struct net_device *dev;
-	struct sk_buff *skb2;
-	unsigned int mtu, hlen, left, len, ll_rs;
 	int offset;
-	__be16 not_last_frag;
-	struct rtable *rt = skb_rtable(skb);
 	int err = 0;
 
-	dev = rt->dst.dev;
-
-	/*
-	 *	Point into the IP datagram header.
-	 */
-
 	iph = ip_hdr(skb);
-
-	mtu = ip_skb_dst_mtu(skb);
-	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
-		     (IPCB(skb)->frag_max_size &&
-		      IPCB(skb)->frag_max_size > mtu))) {
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-			  htonl(mtu));
-		kfree_skb(skb);
-		return -EMSGSIZE;
-	}
-
-	/*
-	 *	Setup starting values.
-	 */
-
 	hlen = iph->ihl * 4;
 	mtu = mtu - hlen;	/* Size of data space */
-#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
-	if (skb->nf_bridge)
-		mtu -= nf_bridge_mtu_reduction(skb);
-#endif
+
 	IPCB(skb)->flags |= IPSKB_FRAG_COMPLETE;
 
 	/* When frag_list is given, use it. First, check its validity:
@@ -592,10 +565,11 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 				ip_send_check(iph);
 			}
 
-			err = output(skb);
+			err = output(skb, output_arg);
 
-			if (!err)
-				IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
+			if (!err && dev)
+				IP_INC_STATS(dev_net(dev),
+					     IPSTATS_MIB_FRAGCREATES);
 			if (err || !frag)
 				break;
 
@@ -605,7 +579,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		}
 
 		if (err == 0) {
-			IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
+			if (dev)
+				IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
 			return 0;
 		}
 
@@ -614,7 +589,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			kfree_skb(frag);
 			frag = skb;
 		}
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		if (dev)
+			IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 		return err;
 
 slow_path_clean:
@@ -636,10 +612,6 @@ slow_path:
 	left = skb->len - hlen;		/* Space per frame */
 	ptr = hlen;		/* Where to start from */
 
-	/* for bridged IP traffic encapsulated inside f.e. a vlan header,
-	 * we need to make room for the encapsulating header
-	 */
-	ll_rs = LL_RESERVED_SPACE_EXTRA(rt->dst.dev, nf_bridge_pad(skb));
 
 	/*
 	 *	Fragment the datagram.
@@ -732,21 +704,67 @@ slow_path:
 
 		ip_send_check(iph);
 
-		err = output(skb2);
+		err = output(skb2, output_arg);
 		if (err)
 			goto fail;
 
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
+		if (dev)
+			IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
 	}
 	consume_skb(skb);
-	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
+	if (dev)
+		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
 	return err;
 
 fail:
 	kfree_skb(skb);
-	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+	if (dev)
+		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 	return err;
 }
+EXPORT_SYMBOL(ip_fragment_mtu);
+
+/* This IP datagram is too large to be sent in one piece.  Break it up into
+ * smaller pieces (each of size equal to IP header plus
+ * a block of the data of the original IP data part) that will yet fit in a
+ * single device frame, and queue such a frame for sending.
+ */
+int ip_fragment(struct sk_buff *skb, void *output_arg,
+		int (*output)(struct sk_buff *, void *output_arg))
+{
+	struct rtable *rt = skb_rtable(skb);
+	unsigned int mtu, ll_rs;
+	struct net_device *dev;
+	struct iphdr *iph;
+
+	dev = rt->dst.dev;
+
+	/* Point into the IP datagram header.  */
+	iph = ip_hdr(skb);
+
+	mtu = ip_skb_dst_mtu(skb);
+	if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) ||
+		     (IPCB(skb)->frag_max_size &&
+		      IPCB(skb)->frag_max_size > mtu))) {
+		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+			  htonl(mtu));
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	/* Setup starting values.  */
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	if (skb->nf_bridge)
+		mtu -= nf_bridge_mtu_reduction(skb);
+#endif
+	/* for bridged IP traffic encapsulated inside f.e. a vlan header,
+	 * we need to make room for the encapsulating header
+	 */
+	ll_rs = LL_RESERVED_SPACE_EXTRA(rt->dst.dev, nf_bridge_pad(skb));
+
+	return ip_fragment_mtu(skb, mtu, ll_rs, dev, output_arg, output);
+}
 EXPORT_SYMBOL(ip_fragment);
 
 int
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0a04a37..a6714f5 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -122,12 +122,17 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	return -EINVAL;
 }
 
+static int __ip6_finish_output2(struct sk_buff *skb, void *__unused)
+{
+	return ip6_finish_output2(skb);
+}
+
 static int ip6_finish_output(struct sk_buff *skb)
 {
 	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
 	    dst_allfrag(skb_dst(skb)) ||
 	    (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
-		return ip6_fragment(skb, ip6_finish_output2);
+		return ip6_fragment(skb, NULL, __ip6_finish_output2);
 	else
 		return ip6_finish_output2(skb);
 }
@@ -537,46 +542,30 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	skb_copy_secmark(to, from);
 }
 
-int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
+int ip6_fragment_mtu(struct sk_buff *skb, unsigned int mtu,
+		     int hroom, int troom, struct net_device *dev,
+		     __be32 frag_id, void *output_arg,
+		     int (*output)(struct sk_buff *, void *))
 {
-	struct sk_buff *frag;
-	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
-	struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL;
-	struct ipv6hdr *tmp_hdr;
-	struct frag_hdr *fh;
-	unsigned int mtu, hlen, left, len;
-	int hroom, troom;
-	__be32 frag_id = 0;
 	int ptr, offset = 0, err = 0;
+	unsigned int hlen, left, len;
 	u8 *prevhdr, nexthdr = 0;
-	struct net *net = dev_net(skb_dst(skb)->dev);
-
-	hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	nexthdr = *prevhdr;
-
-	mtu = ip6_skb_dst_mtu(skb);
-
-	/* We must not fragment if the socket is set to force MTU discovery
-	 * or if the skb it not generated by a local socket.
-	 */
-	if (unlikely(!skb->ignore_df && skb->len > mtu) ||
-		     (IP6CB(skb)->frag_max_size &&
-		      IP6CB(skb)->frag_max_size > mtu)) {
-		if (skb->sk && dst_allfrag(skb_dst(skb)))
-			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
+	struct ipv6hdr *tmp_hdr;
+	struct sk_buff *frag;
+	struct frag_hdr *fh;
+	struct rt6_info *rt;
+	struct net *net;
 
-		skb->dev = skb_dst(skb)->dev;
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-			      IPSTATS_MIB_FRAGFAILS);
-		kfree_skb(skb);
-		return -EMSGSIZE;
+	if (dev) {
+		net = dev_net(skb_dst(skb)->dev);
+		rt = (struct rt6_info *)skb_dst(skb);
+	} else {
+		net = NULL;
+		rt = NULL;
 	}
 
-	if (np && np->frag_size < mtu) {
-		if (np->frag_size)
-			mtu = np->frag_size;
-	}
+	hlen = ip6_find_1stfragopt(skb, &prevhdr);
+	nexthdr = *prevhdr;
 	mtu -= hlen + sizeof(struct frag_hdr);
 
 	if (skb_has_frag_list(skb)) {
@@ -616,8 +605,9 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		*prevhdr = NEXTHDR_FRAGMENT;
 		tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC);
 		if (!tmp_hdr) {
-			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-				      IPSTATS_MIB_FRAGFAILS);
+			if (dev)
+				IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+					      IPSTATS_MIB_FRAGFAILS);
 			return -ENOMEM;
 		}
 
@@ -627,11 +617,10 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		skb_reset_network_header(skb);
 		memcpy(skb_network_header(skb), tmp_hdr, hlen);
 
-		ipv6_select_ident(fh, rt);
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
 		fh->frag_off = htons(IP6_MF);
-		frag_id = fh->identification;
+		fh->identification = frag_id;
 
 		first_len = skb_pagelen(skb);
 		skb->data_len = first_len - skb_headlen(skb);
@@ -639,7 +628,8 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		ipv6_hdr(skb)->payload_len = htons(first_len -
 						   sizeof(struct ipv6hdr));
 
-		dst_hold(&rt->dst);
+		if (dev)
+			dst_hold(&rt->dst);
 
 		for (;;) {
 			/* Prepare header of the next frame,
@@ -665,8 +655,8 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 				ip6_copy_metadata(frag, skb);
 			}
 
-			err = output(skb);
-			if (!err)
+			err = output(skb, output_arg);
+			if (!err && dev)
 				IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
 					      IPSTATS_MIB_FRAGCREATES);
 
@@ -681,17 +671,21 @@ int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		kfree(tmp_hdr);
 
 		if (err == 0) {
-			IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
-				      IPSTATS_MIB_FRAGOKS);
-			ip6_rt_put(rt);
+			if (dev) {
+				IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
+					      IPSTATS_MIB_FRAGOKS);
+				ip6_rt_put(rt);
+			}
 			return 0;
 		}
 
 		kfree_skb_list(frag);
 
-		IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
-			      IPSTATS_MIB_FRAGFAILS);
-		ip6_rt_put(rt);
+		if (dev) {
+			IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
+				      IPSTATS_MIB_FRAGFAILS);
+			ip6_rt_put(rt);
+		}
 		return err;
 
 slow_path_clean:
@@ -717,8 +711,6 @@ slow_path:
 	 */
 
 	*prevhdr = NEXTHDR_FRAGMENT;
-	hroom = LL_RESERVED_SPACE(rt->dst.dev);
-	troom = rt->dst.dev->needed_tailroom;
 
 	/*
 	 *	Keep copying data until we run out.
@@ -738,8 +730,10 @@ slow_path:
 		frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) +
 				 hroom + troom, GFP_ATOMIC);
 		if (!frag) {
-			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-				      IPSTATS_MIB_FRAGFAILS);
+			if (dev)
+				IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+					      IPSTATS_MIB_FRAGFAILS);
+
 			err = -ENOMEM;
 			goto fail;
 		}
@@ -773,11 +767,7 @@ slow_path:
 		 */
 		fh->nexthdr = nexthdr;
 		fh->reserved = 0;
-		if (!frag_id) {
-			ipv6_select_ident(fh, rt);
-			frag_id = fh->identification;
-		} else
-			fh->identification = frag_id;
+		fh->identification = frag_id;
 
 		/*
 		 *	Copy a block of the IP datagram.
@@ -798,24 +788,71 @@ slow_path:
 		/*
 		 *	Put this fragment into the sending queue.
 		 */
-		err = output(frag);
+		err = output(frag, output_arg);
 		if (err)
 			goto fail;
 
-		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-			      IPSTATS_MIB_FRAGCREATES);
+		if (dev)
+			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+				      IPSTATS_MIB_FRAGCREATES);
 	}
-	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-		      IPSTATS_MIB_FRAGOKS);
+	if (dev)
+		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+			      IPSTATS_MIB_FRAGOKS);
 	consume_skb(skb);
 	return err;
 
 fail:
-	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
-		      IPSTATS_MIB_FRAGFAILS);
+	if (dev)
+		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+			      IPSTATS_MIB_FRAGFAILS);
 	kfree_skb(skb);
 	return err;
 }
+EXPORT_SYMBOL(ip6_fragment_mtu);
+
+int ip6_fragment(struct sk_buff *skb, void *output_arg,
+		 int (*output)(struct sk_buff *, void *))
+{
+	struct ipv6_pinfo *np = skb->sk ? inet6_sk(skb->sk) : NULL;
+	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+	struct net *net = dev_net(skb_dst(skb)->dev);
+	struct net_device *dev = skb_dst(skb)->dev;
+	struct frag_hdr fh;
+	unsigned int mtu;
+	int hroom, troom;
+
+	hroom = LL_RESERVED_SPACE(rt->dst.dev);
+	troom = rt->dst.dev->needed_tailroom;
+	mtu = ip6_skb_dst_mtu(skb);
+
+	/* We must not fragment if the socket is set to force MTU discovery
+	 * or if the skb it not generated by a local socket.
+	 */
+	if (unlikely(!skb->ignore_df && skb->len > mtu) ||
+	    (IP6CB(skb)->frag_max_size &&
+	     IP6CB(skb)->frag_max_size > mtu)) {
+		if (skb->sk && dst_allfrag(skb_dst(skb)))
+			sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
+
+		skb->dev = skb_dst(skb)->dev;
+		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+			      IPSTATS_MIB_FRAGFAILS);
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	if (np && np->frag_size < mtu) {
+		if (np->frag_size)
+			mtu = np->frag_size;
+	}
+
+	dev = skb_dst(skb)->dev;
+	ipv6_select_ident(&fh, rt);
+	return ip6_fragment_mtu(skb, mtu, hroom, troom, dev,
+				fh.identification, output_arg, output);
+}
 
 static inline int ip6_rt_check(const struct rt6key *rt_key,
 			       const struct in6_addr *fl_addr,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index ca3f29b..ce52f2f 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -131,6 +131,13 @@ int xfrm6_output_finish(struct sk_buff *skb)
 	return xfrm_output(skb);
 }
 
+static int __xfrm6_output_finish(struct sk_buff *skb, void *_x)
+{
+	struct xfrm_state *x = (struct xfrm_state *)_x;
+
+	return x->outer_mode->afinfo->output_finish(skb);
+}
+
 static int __xfrm6_output(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
@@ -160,7 +167,7 @@ static int __xfrm6_output(struct sk_buff *skb)
 	if (x->props.mode == XFRM_MODE_TUNNEL &&
 	    ((skb->len > mtu && !skb_is_gso(skb)) ||
 		dst_allfrag(skb_dst(skb)))) {
-			return ip6_fragment(skb, x->outer_mode->afinfo->output_finish);
+			return ip6_fragment(skb, x, __xfrm6_output_finish);
 	}
 	return x->outer_mode->afinfo->output_finish(skb);
 }
-- 
1.9.1

* [patch net-next v2 4/5] net: refactor IPv4 and IPv6 defragmentation APIs
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
                   ` (2 preceding siblings ...)
  2015-03-10 19:07 ` [patch net-next v2 3/5] net: refactor IPv4 and IPv6 fragmentation APIs Andy Zhou
@ 2015-03-10 19:07 ` Andy Zhou
  2015-03-10 19:07 ` [patch net-next v2 5/5] net: export symbol nf_ct_frag6_consume_orig Andy Zhou
  2015-03-11 20:45 ` [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

Both ip_defrag() and nf_ct_frag6_gather() derive the network namespace
from the netdev device attached to the skb. However, packets processed
by openvswitch may not have a netdev device.

This patch adds new ip_defrag_net() and nf_ct_frag6_gather_net()
APIs that accept net as an argument.
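
The wrapper pattern described above can be sketched in userspace as
follows. The netns, device and pkt types and the reasm_reqds counter are
hypothetical stand-ins; the sketch only shows a legacy entry point
deriving the namespace and delegating to a variant that takes it
explicitly.

```c
#include <assert.h>
#include <stddef.h>

struct netns { unsigned long reasm_reqds; };	/* stand-in for struct net */
struct device { struct netns *net; };
struct pkt { struct device *dev; struct device *route_dev; };

/* Explicit-namespace variant: works even when the packet has no device. */
int defrag_net(struct netns *net, struct pkt *p)
{
	assert(net != NULL);
	(void)p;
	net->reasm_reqds++;	/* per-namespace accounting */
	return 0;
}

/* Legacy variant: derives the namespace from the packet, then delegates. */
int defrag(struct pkt *p)
{
	struct netns *net = p->dev ? p->dev->net : p->route_dev->net;

	return defrag_net(net, p);
}
```

A caller that already knows its namespace calls defrag_net() directly,
and the legacy entry point keeps its behavior for everyone else.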

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 include/net/ip.h                            |  1 +
 include/net/netfilter/ipv6/nf_defrag_ipv6.h |  2 ++
 net/ipv4/ip_fragment.c                      | 15 +++++++++++----
 net/ipv6/netfilter/nf_conntrack_reasm.c     | 17 ++++++++++++-----
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index bd49ecc..80f41d2 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -496,6 +496,7 @@ enum ip_defrag_users {
 };
 
 int ip_defrag(struct sk_buff *skb, u32 user);
+int ip_defrag_net(struct net *net, struct sk_buff *skb, u32 user);
 #ifdef CONFIG_INET
 struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user);
 #else
diff --git a/include/net/netfilter/ipv6/nf_defrag_ipv6.h b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
index 27666d8..bf3b9e1 100644
--- a/include/net/netfilter/ipv6/nf_defrag_ipv6.h
+++ b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
@@ -6,6 +6,8 @@ void nf_defrag_ipv6_enable(void);
 int nf_ct_frag6_init(void);
 void nf_ct_frag6_cleanup(void);
 struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user);
+struct sk_buff *nf_ct_frag6_gather_net(struct net *net, struct sk_buff *skb,
+				   u32 user);
 void nf_ct_frag6_consume_orig(struct sk_buff *skb);
 
 struct inet_frags_ctl;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 2c8d98e..fc7ae79 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -628,13 +628,10 @@ out_fail:
 	return err;
 }
 
-/* Process an incoming IP datagram fragment. */
-int ip_defrag(struct sk_buff *skb, u32 user)
+int ip_defrag_net(struct net *net, struct sk_buff *skb, u32 user)
 {
 	struct ipq *qp;
-	struct net *net;
 
-	net = skb->dev ? dev_net(skb->dev) : dev_net(skb_dst(skb)->dev);
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
 
 	/* Lookup (or create) queue header */
@@ -654,6 +651,16 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 	kfree_skb(skb);
 	return -ENOMEM;
 }
+EXPORT_SYMBOL(ip_defrag_net);
+
+/* Process an incoming IP datagram fragment. */
+int ip_defrag(struct sk_buff *skb, u32 user)
+{
+	struct net *net;
+
+	net = skb->dev ? dev_net(skb->dev) : dev_net(skb_dst(skb)->dev);
+	return ip_defrag_net(net, skb, user);
+}
 EXPORT_SYMBOL(ip_defrag);
 
 struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6f187c8..9d36db9 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -563,12 +563,10 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff)
 	return 0;
 }
 
-struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
+struct sk_buff *nf_ct_frag6_gather_net(struct net *net, struct sk_buff *skb,
+				       u32 user)
 {
 	struct sk_buff *clone;
-	struct net_device *dev = skb->dev;
-	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev)
-				       : dev_net(skb->dev);
 	struct frag_hdr *fhdr;
 	struct frag_queue *fq;
 	struct ipv6hdr *hdr;
@@ -620,7 +618,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
 
 	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
 	    fq->q.meat == fq->q.len) {
-		ret_skb = nf_ct_frag6_reasm(fq, dev);
+		ret_skb = nf_ct_frag6_reasm(fq, skb->dev);
 		if (ret_skb == NULL)
 			pr_debug("Can't reassemble fragmented packets\n");
 	}
@@ -633,6 +631,15 @@ ret_orig:
 	kfree_skb(clone);
 	return skb;
 }
+EXPORT_SYMBOL(nf_ct_frag6_gather_net);
+
+struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
+{
+	struct net *net = skb_dst(skb) ? dev_net(skb_dst(skb)->dev)
+				       : dev_net(skb->dev);
+
+	return nf_ct_frag6_gather_net(net, skb, user);
+}
 
 void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 {
-- 
1.9.1

* [patch net-next v2 5/5] net: export symbol nf_ct_frag6_consume_orig
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
                   ` (3 preceding siblings ...)
  2015-03-10 19:07 ` [patch net-next v2 4/5] net: refactor IPv4 and IPv6 defragmentation APIs Andy Zhou
@ 2015-03-10 19:07 ` Andy Zhou
  2015-03-11 20:45 ` [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Andy Zhou @ 2015-03-10 19:07 UTC
  To: davem; +Cc: netdev, Andy Zhou

Openvswitch conntrack support needs to access this function.

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 9d36db9..b8ffde9 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -652,6 +652,7 @@ void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 		s = s2;
 	}
 }
+EXPORT_SYMBOL(nf_ct_frag6_consume_orig);
 
 static int nf_ct_net_init(struct net *net)
 {
-- 
1.9.1

* Re: [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support
  2015-03-10 19:07 [patch net-next v2 0/5] Refactor IP defragmentation and fragmenation APIs for OVS conntrack support Andy Zhou
                   ` (4 preceding siblings ...)
  2015-03-10 19:07 ` [patch net-next v2 5/5] net: export symbol nf_ct_frag6_consume_orig Andy Zhou
@ 2015-03-11 20:45 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2015-03-11 20:45 UTC
  To: azhou; +Cc: netdev

From: Andy Zhou <azhou@nicira.com>
Date: Tue, 10 Mar 2015 12:07:13 -0700

> These patches refactor the core APIs for both IPv4 and IPv6.
> 
> The intended use for openvswitch has been posted in the earlier RFC:
> 
> http://lists.openwall.net/netdev/2015/03/02/220

Yet another hack for openvswitch.

I'm tired of this frankenstein subsystem deciding what the invariants
actually are for our core APIs.

It is completely natural to expect that there is some device to
interrogate regarding link layer capabilities, such as MTU.

And it is openvswitch's problem that they decided to break this
invariant.

I'm not accepting hacks in ipv4/ipv6 to workaround a mistake that
the OVS folks decided to make, sorry.

Stop making OVS such a damn mess, please.
