netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
@ 2015-10-17 20:14 Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 1/4] netfilter: ipv6: remove extra clone/free operations Florian Westphal
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-17 20:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, azhou, joestringer

[ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
  I can resubmit that one after the next nf-pull request; let me know if
  you would prefer that ].

Openvswitch seems broken wrt. to defragmentation, it doesn't call
nf_ct_frag6_consume_orig to free the original fragments.

Moreover, openvswitch design seems to require that it can reuse current
skb rather than work with a new skb pointer (it uses skb_morph for this).

Instead of OVS-side fix this series tries to alter netfilter ipv6 defrag
accordingly.

1. nf_ct_frag6_consume_orig is removed, since
commit 6aafeef03b9d9ecf ("netfilter: push reasm skb through instead of
original frag skbs") nothing needs the original fragments so there is no
reason why we need to clone+store original skb -- just stash original
skbs in the frag_list.

2. Use skb_morph to make the last skb processed (not necessarily last
fragment) the reassembled one.

3. remove the no-longer needed recursion into nf_iterate, we can now just
return ACCEPT/STOLEN as needed instead of NF_HOOK_THRESH()+NF_STOLEN.

Tested with flood-ping6+ fault-injection framework 'failslab' type.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH nf-next 1/4] netfilter: ipv6: remove extra clone/free operations
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
@ 2015-10-17 20:14 ` Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 2/4] inet: kill obsolete skb_free op Florian Westphal
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-17 20:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, azhou, joestringer, Florian Westphal, Jiri Pirko

commit 6aafeef03b9d9ecf
("netfilter: push reasm skb through instead of original frag skbs")
changed ipv6 defrag to not use the original skbs anymore.

So rather than keeping the original skbs around just to discard them
afterwards just use the original skbs directly for the fraglist of
the newly assemled skb.

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/ipv6/nf_defrag_ipv6.h |  1 -
 net/ipv6/netfilter/nf_conntrack_reasm.c     | 71 ++++-------------------------
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c   |  2 -
 3 files changed, 9 insertions(+), 65 deletions(-)

diff --git a/include/net/netfilter/ipv6/nf_defrag_ipv6.h b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
index fb7da5b..fcd20cf 100644
--- a/include/net/netfilter/ipv6/nf_defrag_ipv6.h
+++ b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
@@ -6,7 +6,6 @@ void nf_defrag_ipv6_enable(void);
 int nf_ct_frag6_init(void);
 void nf_ct_frag6_cleanup(void);
 struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user);
-void nf_ct_frag6_consume_orig(struct sk_buff *skb);
 
 struct inet_frags_ctl;
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 056f5d4..1b1a851 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -56,7 +56,6 @@ struct nf_ct_frag6_skb_cb
 {
 	struct inet6_skb_parm	h;
 	int			offset;
-	struct sk_buff		*orig;
 };
 
 #define NFCT_FRAG6_CB(skb)	((struct nf_ct_frag6_skb_cb *)((skb)->cb))
@@ -170,12 +169,6 @@ static unsigned int nf_hashfn(const struct inet_frag_queue *q)
 	return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
 }
 
-static void nf_skb_free(struct sk_buff *skb)
-{
-	if (NFCT_FRAG6_CB(skb)->orig)
-		kfree_skb(NFCT_FRAG6_CB(skb)->orig);
-}
-
 static void nf_ct_frag6_expire(unsigned long data)
 {
 	struct frag_queue *fq;
@@ -378,7 +371,7 @@ err:
 static struct sk_buff *
 nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 {
-	struct sk_buff *fp, *op, *head = fq->q.fragments;
+	struct sk_buff *fp, *head = fq->q.fragments;
 	int    payload_len;
 	u8 ecn;
 
@@ -429,7 +422,6 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
 
-		NFCT_FRAG6_CB(clone)->orig = NULL;
 		add_frag_mem_limit(fq->q.net, clone->truesize);
 	}
 
@@ -473,21 +465,6 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 	fq->q.fragments = NULL;
 	fq->q.fragments_tail = NULL;
 
-	/* all original skbs are linked into the NFCT_FRAG6_CB(head).orig */
-	fp = skb_shinfo(head)->frag_list;
-	if (fp && NFCT_FRAG6_CB(fp)->orig == NULL)
-		/* at above code, head skb is divided into two skbs. */
-		fp = fp->next;
-
-	op = NFCT_FRAG6_CB(head)->orig;
-	for (; fp; fp = fp->next) {
-		struct sk_buff *orig = NFCT_FRAG6_CB(fp)->orig;
-
-		op->next = orig;
-		op = orig;
-		NFCT_FRAG6_CB(fp)->orig = NULL;
-	}
-
 	return head;
 
 out_oversize:
@@ -565,7 +542,6 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff)
 
 struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
 {
-	struct sk_buff *clone;
 	struct net_device *dev = skb->dev;
 	struct frag_hdr *fhdr;
 	struct frag_queue *fq;
@@ -583,37 +559,25 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
 	if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0)
 		return skb;
 
-	clone = skb_clone(skb, GFP_ATOMIC);
-	if (clone == NULL) {
-		pr_debug("Can't clone skb\n");
+	if (!pskb_may_pull(skb, fhoff + sizeof(*fhdr)))
 		return skb;
-	}
-
-	NFCT_FRAG6_CB(clone)->orig = skb;
-
-	if (!pskb_may_pull(clone, fhoff + sizeof(*fhdr))) {
-		pr_debug("message is too short.\n");
-		goto ret_orig;
-	}
 
-	skb_set_transport_header(clone, fhoff);
-	hdr = ipv6_hdr(clone);
-	fhdr = (struct frag_hdr *)skb_transport_header(clone);
+	skb_set_transport_header(skb, fhoff);
+	hdr = ipv6_hdr(skb);
+	fhdr = (struct frag_hdr *)skb_transport_header(skb);
 
 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
 		     ip6_frag_ecn(hdr));
-	if (fq == NULL) {
-		pr_debug("Can't find and can't create new queue\n");
-		goto ret_orig;
-	}
+	if (fq == NULL)
+		return skb;
 
 	spin_lock_bh(&fq->q.lock);
 
-	if (nf_ct_frag6_queue(fq, clone, fhdr, nhoff) < 0) {
+	if (nf_ct_frag6_queue(fq, skb, fhdr, nhoff) < 0) {
 		spin_unlock_bh(&fq->q.lock);
 		pr_debug("Can't insert skb to queue\n");
 		inet_frag_put(&fq->q, &nf_frags);
-		goto ret_orig;
+		return skb;
 	}
 
 	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
@@ -626,25 +590,9 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
 
 	inet_frag_put(&fq->q, &nf_frags);
 	return ret_skb;
-
-ret_orig:
-	kfree_skb(clone);
-	return skb;
 }
 EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
-void nf_ct_frag6_consume_orig(struct sk_buff *skb)
-{
-	struct sk_buff *s, *s2;
-
-	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
-		s2 = s->next;
-		s->next = NULL;
-		consume_skb(s);
-		s = s2;
-	}
-}
-
 static int nf_ct_net_init(struct net *net)
 {
 	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
@@ -673,7 +621,6 @@ int nf_ct_frag6_init(void)
 	nf_frags.hashfn = nf_hashfn;
 	nf_frags.constructor = ip6_frag_init;
 	nf_frags.destructor = NULL;
-	nf_frags.skb_free = nf_skb_free;
 	nf_frags.qsize = sizeof(struct frag_queue);
 	nf_frags.match = ip6_frag_match;
 	nf_frags.frag_expire = nf_ct_frag6_expire;
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index 4fdbed5e..313c1d0 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -73,8 +73,6 @@ static unsigned int ipv6_defrag(void *priv,
 	if (reasm == skb)
 		return NF_ACCEPT;
 
-	nf_ct_frag6_consume_orig(reasm);
-
 	NF_HOOK_THRESH(NFPROTO_IPV6, state->hook, state->net, state->sk, reasm,
 		       state->in, state->out,
 		       state->okfn, NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
-- 
2.0.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH nf-next 2/4] inet: kill obsolete skb_free op
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 1/4] netfilter: ipv6: remove extra clone/free operations Florian Westphal
@ 2015-10-17 20:14 ` Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb Florian Westphal
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-17 20:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, azhou, joestringer, Florian Westphal

The only user was removed in preceeding commit.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/inet_frag.h             |  1 -
 net/ieee802154/6lowpan/reassembly.c |  1 -
 net/ipv4/inet_fragment.c            | 10 +---------
 net/ipv4/ip_fragment.c              |  1 -
 net/ipv6/reassembly.c               |  1 -
 5 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 53eead2..9f9aa4d 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -99,7 +99,6 @@ struct inet_frags {
 	void			(*constructor)(struct inet_frag_queue *q,
 					       const void *arg);
 	void			(*destructor)(struct inet_frag_queue *);
-	void			(*skb_free)(struct sk_buff *);
 	void			(*frag_expire)(unsigned long data);
 	struct kmem_cache	*frags_cachep;
 	const char		*frags_cache_name;
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 12e8cf4..f85b08b 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -619,7 +619,6 @@ int __init lowpan_net_frag_init(void)
 	lowpan_frags.hashfn = lowpan_hashfn;
 	lowpan_frags.constructor = lowpan_frag_init;
 	lowpan_frags.destructor = NULL;
-	lowpan_frags.skb_free = NULL;
 	lowpan_frags.qsize = sizeof(struct frag_queue);
 	lowpan_frags.match = lowpan_frag_match;
 	lowpan_frags.frag_expire = lowpan_frag_expire;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index d0a7c03..bf63ea4 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -291,14 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 }
 EXPORT_SYMBOL(inet_frag_kill);
 
-static inline void frag_kfree_skb(struct netns_frags *nf, struct inet_frags *f,
-				  struct sk_buff *skb)
-{
-	if (f->skb_free)
-		f->skb_free(skb);
-	kfree_skb(skb);
-}
-
 void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
 {
 	struct sk_buff *fp;
@@ -315,7 +307,7 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
 		struct sk_buff *xp = fp->next;
 
 		sum_truesize += fp->truesize;
-		frag_kfree_skb(nf, f, fp);
+		kfree_skb(fp);
 		fp = xp;
 	}
 	sum = sum_truesize + f->qsize;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 5482745..325106d 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -885,7 +885,6 @@ void __init ipfrag_init(void)
 	ip4_frags.hashfn = ip4_hashfn;
 	ip4_frags.constructor = ip4_frag_init;
 	ip4_frags.destructor = ip4_frag_free;
-	ip4_frags.skb_free = NULL;
 	ip4_frags.qsize = sizeof(struct ipq);
 	ip4_frags.match = ip4_frag_match;
 	ip4_frags.frag_expire = ip_expire;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index f1159bb..10bbcd1 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -745,7 +745,6 @@ int __init ipv6_frag_init(void)
 	ip6_frags.hashfn = ip6_hashfn;
 	ip6_frags.constructor = ip6_frag_init;
 	ip6_frags.destructor = NULL;
-	ip6_frags.skb_free = NULL;
 	ip6_frags.qsize = sizeof(struct frag_queue);
 	ip6_frags.match = ip6_frag_match;
 	ip6_frags.frag_expire = ip6_frag_expire;
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 1/4] netfilter: ipv6: remove extra clone/free operations Florian Westphal
  2015-10-17 20:14 ` [PATCH nf-next 2/4] inet: kill obsolete skb_free op Florian Westphal
@ 2015-10-17 20:14 ` Florian Westphal
  2015-10-20 18:39   ` Joe Stringer
  2015-10-17 20:14 ` [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion Florian Westphal
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Florian Westphal @ 2015-10-17 20:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, azhou, joestringer, Florian Westphal

openvswitch attempts to morph the reassembled skb with the currently
processed one.  But this looks broken -- the currently processed skb
is part of the reassembled skbs frag_list.

IOW, we morph an element of reasms frag_list into reasm itself, then
free said frag_list element.

This allows callers to process skb as intended by openvswitch: we either
return NULL (skb queued for reassembly), or turn the provided skb into
a reassembled one.

A followup patch will change nf_defrag to avoid the NF_HOOK recursion
which is now no longer needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c   | 33 +++++++++++++++++++++++++++++--
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c |  4 ----
 net/openvswitch/conntrack.c               |  5 -----
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 1b1a851..72ac916 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -369,7 +369,7 @@ err:
  *	the last and the first frames arrived and all the bits are here.
  */
 static struct sk_buff *
-nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
+nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_device *dev)
 {
 	struct sk_buff *fp, *head = fq->q.fragments;
 	int    payload_len;
@@ -425,6 +425,35 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
 		add_frag_mem_limit(fq->q.net, clone->truesize);
 	}
 
+	/* morph head into last received skb: prev.
+	 *
+	 * This allows callers of ipv6 conntrack defrag to continue
+	 * to use the last skb(frag) passed into the reasm engine.
+	 * The last skb frag 'silently' turns into the full reassembled skb.
+	 *
+	 * Since prev is also part of q->fragments we have to clone it first.
+	 */
+	if (head != prev) {
+		struct sk_buff *iter;
+
+		fp = skb_clone(prev, GFP_ATOMIC);
+		if (!fp)
+			goto out_oom;
+
+		fp->next = prev->next;
+		skb_queue_walk(head, iter) {
+			if (iter->next != prev)
+				continue;
+			iter->next = fp;
+			break;
+		}
+
+		skb_morph(prev, head);
+		prev->next = head->next;
+		consume_skb(head);
+		head = prev;
+	}
+
 	/* We have to remove fragment header from datagram and to relocate
 	 * header in order to calculate ICV correctly. */
 	skb_network_header(head)[fq->nhoffset] = skb_transport_header(head)[0];
@@ -582,7 +611,7 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
 
 	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
 	    fq->q.meat == fq->q.len) {
-		ret_skb = nf_ct_frag6_reasm(fq, dev);
+		ret_skb = nf_ct_frag6_reasm(fq, skb, dev);
 		if (ret_skb == NULL)
 			pr_debug("Can't reassemble fragmented packets\n");
 	}
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index 313c1d0..fb96b10 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -69,10 +69,6 @@ static unsigned int ipv6_defrag(void *priv,
 	if (reasm == NULL)
 		return NF_STOLEN;
 
-	/* error occurred or not fragmented */
-	if (reasm == skb)
-		return NF_ACCEPT;
-
 	NF_HOOK_THRESH(NFPROTO_IPV6, state->hook, state->net, state->sk, reasm,
 		       state->in, state->out,
 		       state->okfn, NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index ad61426..30ece1d 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -319,12 +319,7 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 		if (!reasm)
 			return -EINPROGRESS;
 
-		if (skb == reasm)
-			return -EINVAL;
-
 		key->ip.proto = ipv6_hdr(reasm)->nexthdr;
-		skb_morph(skb, reasm);
-		consume_skb(reasm);
 		ovs_cb.mru = IP6CB(skb)->frag_max_size;
 #else
 		return -EPFNOSUPPORT;
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
                   ` (2 preceding siblings ...)
  2015-10-17 20:14 ` [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb Florian Westphal
@ 2015-10-17 20:14 ` Florian Westphal
  2015-10-20  6:25   ` Joe Stringer
  2015-10-20  6:16 ` [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Joe Stringer
  2015-10-21 14:34 ` David Miller
  5 siblings, 1 reply; 18+ messages in thread
From: Florian Westphal @ 2015-10-17 20:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, azhou, joestringer, Florian Westphal

The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
with the previous one.

This means that the return value is always NULL or the skb argument.
So change it to an err value.

Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
we can now just return NF_ACCEPT to move on to the next hook except for
-EINPROGRESS (which means skb has been queued for reassembly), in which case we
return NF_STOLEN.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/ipv6/nf_defrag_ipv6.h |  2 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c     | 68 ++++++++++++++---------------
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c   | 14 +++---
 net/openvswitch/conntrack.c                 | 11 +++--
 4 files changed, 43 insertions(+), 52 deletions(-)

diff --git a/include/net/netfilter/ipv6/nf_defrag_ipv6.h b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
index fcd20cf..ddf162f 100644
--- a/include/net/netfilter/ipv6/nf_defrag_ipv6.h
+++ b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
@@ -5,7 +5,7 @@ void nf_defrag_ipv6_enable(void);
 
 int nf_ct_frag6_init(void);
 void nf_ct_frag6_cleanup(void);
-struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user);
+int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user);
 
 struct inet_frags_ctl;
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 72ac916..b87dd75 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -361,14 +361,18 @@ err:
 
 /*
  *	Check if this packet is complete.
- *	Returns NULL on failure by any reason, and pointer
- *	to current nexthdr field in reassembled frame.
  *
  *	It is called with locked fq, and caller must check that
  *	queue is eligible for reassembly i.e. it is not COMPLETE,
  *	the last and the first frames arrived and all the bits are here.
+ *
+ *	returns true if *prev skb has been transformed into the reassembled
+ *	skb, false otherwise.
+ *
+ *	Note: If false is returned, *prev is still on the fragment queue, freeing
+ *	the queue is enough to discard *prev, too.
  */
-static struct sk_buff *
+static bool
 nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_device *dev)
 {
 	struct sk_buff *fp, *head = fq->q.fragments;
@@ -382,22 +386,21 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
 
 	ecn = ip_frag_ecn_table[fq->ecn];
 	if (unlikely(ecn == 0xff))
-		goto out_fail;
+		return false;
 
 	/* Unfragmented part is taken from the first segment. */
 	payload_len = ((head->data - skb_network_header(head)) -
 		       sizeof(struct ipv6hdr) + fq->q.len -
 		       sizeof(struct frag_hdr));
 	if (payload_len > IPV6_MAXPLEN) {
-		pr_debug("payload len is too large.\n");
-		goto out_oversize;
+		net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n",
+				    payload_len);
+		return false;
 	}
 
 	/* Head of list must not be cloned. */
-	if (skb_unclone(head, GFP_ATOMIC)) {
-		pr_debug("skb is cloned but can't expand head");
-		goto out_oom;
-	}
+	if (skb_unclone(head, GFP_ATOMIC))
+		return false;
 
 	/* If the first fragment is fragmented itself, we split
 	 * it to two chunks: the first with data and paged part
@@ -408,7 +411,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
 
 		clone = alloc_skb(0, GFP_ATOMIC);
 		if (clone == NULL)
-			goto out_oom;
+			return false;
 
 		clone->next = head->next;
 		head->next = clone;
@@ -438,7 +441,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
 
 		fp = skb_clone(prev, GFP_ATOMIC);
 		if (!fp)
-			goto out_oom;
+			return false;
 
 		fp->next = prev->next;
 		skb_queue_walk(head, iter) {
@@ -494,16 +497,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *prev,  struct net_devic
 	fq->q.fragments = NULL;
 	fq->q.fragments_tail = NULL;
 
-	return head;
-
-out_oversize:
-	net_dbg_ratelimited("nf_ct_frag6_reasm: payload len = %d\n",
-			    payload_len);
-	goto out_fail;
-out_oom:
-	net_dbg_ratelimited("nf_ct_frag6_reasm: no memory for reassembly\n");
-out_fail:
-	return NULL;
+	return true;
 }
 
 /*
@@ -569,27 +563,26 @@ find_prev_fhdr(struct sk_buff *skb, u8 *prevhdrp, int *prevhoff, int *fhoff)
 	return 0;
 }
 
-struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
+int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
 {
 	struct net_device *dev = skb->dev;
+	int fhoff, nhoff, ret;
 	struct frag_hdr *fhdr;
 	struct frag_queue *fq;
 	struct ipv6hdr *hdr;
-	int fhoff, nhoff;
 	u8 prevhdr;
-	struct sk_buff *ret_skb = NULL;
 
 	/* Jumbo payload inhibits frag. header */
 	if (ipv6_hdr(skb)->payload_len == 0) {
 		pr_debug("payload len = 0\n");
-		return skb;
+		return -EINVAL;
 	}
 
 	if (find_prev_fhdr(skb, &prevhdr, &nhoff, &fhoff) < 0)
-		return skb;
+		return -EINVAL;
 
 	if (!pskb_may_pull(skb, fhoff + sizeof(*fhdr)))
-		return skb;
+		return -ENOMEM;
 
 	skb_set_transport_header(skb, fhoff);
 	hdr = ipv6_hdr(skb);
@@ -598,7 +591,7 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
 		     ip6_frag_ecn(hdr));
 	if (fq == NULL)
-		return skb;
+		return -ENOMEM;
 
 	spin_lock_bh(&fq->q.lock);
 
@@ -606,19 +599,22 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
 		spin_unlock_bh(&fq->q.lock);
 		pr_debug("Can't insert skb to queue\n");
 		inet_frag_put(&fq->q, &nf_frags);
-		return skb;
+		return -EINVAL;
 	}
 
+	/* after queue has assumed skb ownership, only 0 or -EINPROGRESS
+	 * must be returned.
+	 */
+	ret = -EINPROGRESS;
 	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
-	    fq->q.meat == fq->q.len) {
-		ret_skb = nf_ct_frag6_reasm(fq, skb, dev);
-		if (ret_skb == NULL)
-			pr_debug("Can't reassemble fragmented packets\n");
-	}
+	    fq->q.meat == fq->q.len &&
+	    nf_ct_frag6_reasm(fq, skb, dev))
+		ret = 0;
+
 	spin_unlock_bh(&fq->q.lock);
 
 	inet_frag_put(&fq->q, &nf_frags);
-	return ret_skb;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index fb96b10..f7aab5a 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -55,7 +55,7 @@ static unsigned int ipv6_defrag(void *priv,
 				struct sk_buff *skb,
 				const struct nf_hook_state *state)
 {
-	struct sk_buff *reasm;
+	int err;
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	/* Previously seen (loopback)?	*/
@@ -63,17 +63,13 @@ static unsigned int ipv6_defrag(void *priv,
 		return NF_ACCEPT;
 #endif
 
-	reasm = nf_ct_frag6_gather(state->net, skb,
-				   nf_ct6_defrag_user(state->hook, skb));
+	err = nf_ct_frag6_gather(state->net, skb,
+				 nf_ct6_defrag_user(state->hook, skb));
 	/* queued */
-	if (reasm == NULL)
+	if (err == -EINPROGRESS)
 		return NF_STOLEN;
 
-	NF_HOOK_THRESH(NFPROTO_IPV6, state->hook, state->net, state->sk, reasm,
-		       state->in, state->out,
-		       state->okfn, NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
-
-	return NF_STOLEN;
+	return NF_ACCEPT;
 }
 
 static struct nf_hook_ops ipv6_defrag_ops[] = {
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 30ece1d..b42630d 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -298,10 +298,10 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 			    u16 zone, struct sk_buff *skb)
 {
 	struct ovs_skb_cb ovs_cb = *OVS_CB(skb);
+	int err;
 
 	if (key->eth.type == htons(ETH_P_IP)) {
 		enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone;
-		int err;
 
 		memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
 		err = ip_defrag(net, skb, user);
@@ -312,14 +312,13 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
-		struct sk_buff *reasm;
 
 		memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
-		reasm = nf_ct_frag6_gather(net, skb, user);
-		if (!reasm)
-			return -EINPROGRESS;
+		err = nf_ct_frag6_gather(net, skb, user);
+		if (err)
+			return err;
 
-		key->ip.proto = ipv6_hdr(reasm)->nexthdr;
+		key->ip.proto = ipv6_hdr(skb)->nexthdr;
 		ovs_cb.mru = IP6CB(skb)->frag_max_size;
 #else
 		return -EPFNOSUPPORT;
-- 
2.0.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
                   ` (3 preceding siblings ...)
  2015-10-17 20:14 ` [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion Florian Westphal
@ 2015-10-20  6:16 ` Joe Stringer
  2015-10-20  8:17   ` Florian Westphal
  2015-10-21 14:34 ` David Miller
  5 siblings, 1 reply; 18+ messages in thread
From: Joe Stringer @ 2015-10-20  6:16 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Linux Netdev List, Andy Zhou

On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
> [ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
>   I can resubmit that one after the next nf-pull request; let me know if
>   you would prefer that ].
>
> Openvswitch seems broken wrt. to defragmentation, it doesn't call
> nf_ct_frag6_consume_orig to free the original fragments.

This will need to be fixed for 'net' as well, do you have a path in
mind for that?

> Moreover, openvswitch design seems to require that it can reuse current
> skb rather than work with a new skb pointer (it uses skb_morph for this).
>
> Instead of OVS-side fix this series tries to alter netfilter ipv6 defrag
> accordingly.
>
> 1. nf_ct_frag6_consume_orig is removed, since
> commit 6aafeef03b9d9ecf ("netfilter: push reasm skb through instead of
> original frag skbs") nothing needs the original fragments so there is no
> reason why we need to clone+store original skb -- just stash original
> skbs in the frag_list.
>
> 2. Use skb_morph to make the last skb processed (not necessarily last
> fragment) the reassembled one.
>
> 3. remove the no-longer needed recursion into nf_iterate, we can now just
> return ACCEPT/STOLEN as needed instead of NF_HOOK_THRESH()+NF_STOLEN.
>
> Tested with flood-ping6+ fault-injection framework 'failslab' type.

Overall, I like it in that it seems to make the IPv6 code more
consistent with IPv4, and it's a net negative LoC, simplifying the
code.

Patch 3 when taken independently from patch 4 hides user-visible error
codes on the OVS side. The OVS conntrack action hides -EINPROGRESS
from userspace, treating it as a successful execution. All other
errors are returned up. With that patch, all errors will be hidden. I
see that it's fixed in Patch 4, so maybe it's not a biggie but those
two patches should be tightly coupled.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion
  2015-10-17 20:14 ` [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion Florian Westphal
@ 2015-10-20  6:25   ` Joe Stringer
  2015-10-20  8:18     ` Florian Westphal
  0 siblings, 1 reply; 18+ messages in thread
From: Joe Stringer @ 2015-10-20  6:25 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Linux Netdev List, Andy Zhou

On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
> @@ -606,19 +599,22 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
>                 spin_unlock_bh(&fq->q.lock);
>                 pr_debug("Can't insert skb to queue\n");
>                 inet_frag_put(&fq->q, &nf_frags);
> -               return skb;
> +               return -EINVAL;
>         }
>
> +       /* after queue has assumed skb ownership, only 0 or -EINPROGRESS
> +        * must be returned.
> +        */
> +       ret = -EINPROGRESS;
>         if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
> -           fq->q.meat == fq->q.len) {
> -               ret_skb = nf_ct_frag6_reasm(fq, skb, dev);
> -               if (ret_skb == NULL)
> -                       pr_debug("Can't reassemble fragmented packets\n");
> -       }
> +           fq->q.meat == fq->q.len &&
> +           nf_ct_frag6_reasm(fq, skb, dev))
> +               ret = 0;
> +
>         spin_unlock_bh(&fq->q.lock);
>
>         inet_frag_put(&fq->q, &nf_frags);
> -       return ret_skb;
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);

Minor nit below, feel free to disregard if it misses some code style
guideline. The lock/unlock reads easier to me if the
unlocking/frag_put is grouped together, something like the following
incremental:

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c
b/net/ipv6/netfilter/nf_conntrack_reasm.c
index b87dd75967d0..edfd290792f5 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -596,10 +596,8 @@ int nf_ct_frag6_gather(struct net *net, struct
sk_buff *skb, u32 user)
        spin_lock_bh(&fq->q.lock);

        if (nf_ct_frag6_queue(fq, skb, fhdr, nhoff) < 0) {
-               spin_unlock_bh(&fq->q.lock);
-               pr_debug("Can't insert skb to queue\n");
-               inet_frag_put(&fq->q, &nf_frags);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto unlock;
        }

        /* after queue has assumed skb ownership, only 0 or -EINPROGRESS
@@ -611,9 +609,11 @@ int nf_ct_frag6_gather(struct net *net, struct
sk_buff *skb, u32 user)
            nf_ct_frag6_reasm(fq, skb, dev))
                ret = 0;

+unlock:
        spin_unlock_bh(&fq->q.lock);
-
        inet_frag_put(&fq->q, &nf_frags);
+       if (unlikely(ret == -EINVAL))
+               pr_debug("Can't insert skb to queue\n");
        return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-20  6:16 ` [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Joe Stringer
@ 2015-10-20  8:17   ` Florian Westphal
  2015-10-20 18:43     ` Joe Stringer
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Westphal @ 2015-10-20  8:17 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Florian Westphal, netfilter-devel, Linux Netdev List, Andy Zhou

Joe Stringer <joestringer@nicira.com> wrote:
> On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
> > [ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
> >   I can resubmit that one after the next nf-pull request; let me know if
> >   you would prefer that ].
> >
> > Openvswitch seems broken wrt. to defragmentation, it doesn't call
> > nf_ct_frag6_consume_orig to free the original fragments.
> 
> This will need to be fixed for 'net' as well, do you have a path in
> mind for that?

Good point.  No, I don't.  Any suggestions?
I can try to just re-target -nf tree (sans patch #2).  Pablo?

ipv4 side seems broken as well (ip_defrag frees skb on errors other than
-EINPROGRESS, so it looks like we will double-free in
do_execute_actions)

> Patch 3 when taken independently from patch 4 hides user-visible error
> codes on the OVS side. The OVS conntrack action hides -EINPROGRESS
> from userspace, treating it as a successful execution. All other
> errors are returned up. With that patch, all errors will be hidden. I
> see that it's fixed in Patch 4, so maybe it's not a biggie but those
> two patches should be tightly coupled.

You're right, we can't signal "skb unchanged".  I guess one could
just test wheter skb is a fragment and -EINVAL if it is, not sure
if its worth doing given that such test would be removed again
by the very next patch?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion
  2015-10-20  6:25   ` Joe Stringer
@ 2015-10-20  8:18     ` Florian Westphal
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-20  8:18 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Florian Westphal, netfilter-devel, Linux Netdev List, Andy Zhou

Joe Stringer <joestringer@nicira.com> wrote:
> On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
> > @@ -606,19 +599,22 @@ struct sk_buff *nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 use
> >                 spin_unlock_bh(&fq->q.lock);
> >                 pr_debug("Can't insert skb to queue\n");
> >                 inet_frag_put(&fq->q, &nf_frags);
> > -               return skb;
> > +               return -EINVAL;
> >         }
> >
> > +       /* after queue has assumed skb ownership, only 0 or -EINPROGRESS
> > +        * must be returned.
> > +        */
> > +       ret = -EINPROGRESS;
> >         if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
> > -           fq->q.meat == fq->q.len) {
> > -               ret_skb = nf_ct_frag6_reasm(fq, skb, dev);
> > -               if (ret_skb == NULL)
> > -                       pr_debug("Can't reassemble fragmented packets\n");
> > -       }
> > +           fq->q.meat == fq->q.len &&
> > +           nf_ct_frag6_reasm(fq, skb, dev))
> > +               ret = 0;
> > +
> >         spin_unlock_bh(&fq->q.lock);
> >
> >         inet_frag_put(&fq->q, &nf_frags);
> > -       return ret_skb;
> > +       return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
> 
> Minor nit below, feel free to disregard if it misses some code style
> guideline. The lock/unlock reads easier to me if the
> unlocking/frag_put is grouped together, something like the following
> incremental:

I like it, will change it for v2.

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb
  2015-10-17 20:14 ` [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb Florian Westphal
@ 2015-10-20 18:39   ` Joe Stringer
  2015-10-20 20:46     ` Florian Westphal
  0 siblings, 1 reply; 18+ messages in thread
From: Joe Stringer @ 2015-10-20 18:39 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Linux Netdev List, Andy Zhou

On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
> @@ -425,6 +425,35 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
>                 add_frag_mem_limit(fq->q.net, clone->truesize);
>         }
>
> +       /* morph head into last received skb: prev.
> +        *
> +        * This allows callers of ipv6 conntrack defrag to continue
> +        * to use the last skb(frag) passed into the reasm engine.
> +        * The last skb frag 'silently' turns into the full reassembled skb.
> +        *
> +        * Since prev is also part of q->fragments we have to clone it first.
> +        */
> +       if (head != prev) {
> +               struct sk_buff *iter;
> +
> +               fp = skb_clone(prev, GFP_ATOMIC);
> +               if (!fp)
> +                       goto out_oom;
> +
> +               fp->next = prev->next;
> +               skb_queue_walk(head, iter) {
> +                       if (iter->next != prev)
> +                               continue;
> +                       iter->next = fp;
> +                       break;
> +               }
> +
> +               skb_morph(prev, head);
> +               prev->next = head->next;
> +               consume_skb(head);
> +               head = prev;
> +       }
> +
>         /* We have to remove fragment header from datagram and to relocate
>          * header in order to calculate ICV correctly. */
>         skb_network_header(head)[fq->nhoffset] = skb_transport_header(head)[0];

This hunk looks very similar to the logic in ip_frag_reasm(). Did you
consider refactoring to share it?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-20  8:17   ` Florian Westphal
@ 2015-10-20 18:43     ` Joe Stringer
  2015-10-20 20:53       ` Florian Westphal
  0 siblings, 1 reply; 18+ messages in thread
From: Joe Stringer @ 2015-10-20 18:43 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Linux Netdev List, Andy Zhou

On 20 October 2015 at 01:17, Florian Westphal <fw@strlen.de> wrote:
> Joe Stringer <joestringer@nicira.com> wrote:
>> On 17 October 2015 at 13:14, Florian Westphal <fw@strlen.de> wrote:
>> > [ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
>> >   I can resubmit that one after the next nf-pull request; let me know if
>> >   you would prefer that ].
>> >
>> > Openvswitch seems broken wrt. to defragmentation, it doesn't call
>> > nf_ct_frag6_consume_orig to free the original fragments.
>>
>> This will need to be fixed for 'net' as well, do you have a path in
>> mind for that?
>
> Good point.  No, I don't.  Any suggestions?
> I can try to just re-target -nf tree (sans patch #2).  Pablo?

The smallest change seems to be adding the nf_ct_frag6_consume_orig()
call to OVS, plus the morph logic from patch 3. Alternatively if Pablo
is fine with having the series re-targeted, then that sounds
reasonable to me too.

> ipv4 side seems broken as well (ip_defrag frees skb on errors other than
> -EINPROGRESS, so it looks like we will double-free in
> do_execute_actions)

Oh dear. Thanks for the report. I propose wrapping the ip_defrag()
with an skb_get()/skb_consume() as this seems to require the least
invasive changes:

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a5ec34f8502f..0d2d24c99fd5 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -303,7 +303,11 @@ static int handle_fragments(struct net *net,
struct sw_flow_key *key,
                int err;

                memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+
+               skb_get(skb);
                err = ip_defrag(skb, user);
+               if (!err || err == -EINPROGRESS)
+                       consume_skb(skb);
                if (err)
                        return err;

I don't have a test environment to verify this at the moment though.
Let me know if you'll take this with your patch series or if you'd
like me to send it out.

>> Patch 3 when taken independently from patch 4 hides user-visible error
>> codes on the OVS side. The OVS conntrack action hides -EINPROGRESS
>> from userspace, treating it as a successful execution. All other
>> errors are returned up. With that patch, all errors will be hidden. I
>> see that it's fixed in Patch 4, so maybe it's not a biggie but those
>> two patches should be tightly coupled.
>
> You're right, we can't signal "skb unchanged".  I guess one could
> just test wheter skb is a fragment and -EINVAL if it is, not sure
> if its worth doing given that such test would be removed again
> by the very next patch?

On second thought, this would only affect 'net', and if we're
considering the solution for that tree too then it really doesn't
matter.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb
  2015-10-20 18:39   ` Joe Stringer
@ 2015-10-20 20:46     ` Florian Westphal
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-20 20:46 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Florian Westphal, netfilter-devel, Linux Netdev List, Andy Zhou

Joe Stringer <joestringer@nicira.com> wrote:
> This hunk looks very similar to the logic in ip_frag_reasm(). Did you
> consider refactoring to share it?

Could be done but I did not plan to do that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-20 18:43     ` Joe Stringer
@ 2015-10-20 20:53       ` Florian Westphal
  2015-10-20 23:59         ` Joe Stringer
  2015-10-21 12:42         ` Pablo Neira Ayuso
  0 siblings, 2 replies; 18+ messages in thread
From: Florian Westphal @ 2015-10-20 20:53 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Florian Westphal, netfilter-devel, Linux Netdev List, Andy Zhou

Joe Stringer <joestringer@nicira.com> wrote:
> > Good point.  No, I don't.  Any suggestions?
> > I can try to just re-target -nf tree (sans patch #2).  Pablo?
> 
> The smallest change seems to be adding the nf_ct_frag6_consume_orig()
> call to OVS, plus the morph logic from patch 3. Alternatively if Pablo
> is fine with having the series re-targeted, then that sounds
> reasonable to me too.

Pablo, your call.

I would suggest to re-target patches #1 and #3 to nf tree, I can do
this, just let me know.

Alternative is to just add the nf_ct_frag6_consume_orig call to
openvswitch and handle that via net tree.

I can then wait for that change to pop up in nf-next and just resend
this series (which will then undo that change).

Let me know, thanks!

> > ipv4 side seems broken as well (ip_defrag frees skb on errors other than
> > -EINPROGRESS, so it looks like we will double-free in
> > do_execute_actions)
> 
> Oh dear. Thanks for the report. I propose wrapping the ip_defrag()
> with an skb_get()/skb_consume() as this seems to require the least
> invasive changes:
> 
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index a5ec34f8502f..0d2d24c99fd5 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -303,7 +303,11 @@ static int handle_fragments(struct net *net,
> struct sw_flow_key *key,
>                 int err;
> 
>                 memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
> +
> +               skb_get(skb);
>                 err = ip_defrag(skb, user);
> +               if (!err || err == -EINPROGRESS)
> +                       consume_skb(skb);
>                 if (err)
>                         return err;

Indeed, that seems like the least invasive change.

Feel free to submit this to -net, there is no dependency on any of the
other changes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-20 20:53       ` Florian Westphal
@ 2015-10-20 23:59         ` Joe Stringer
  2015-10-21 12:42         ` Pablo Neira Ayuso
  1 sibling, 0 replies; 18+ messages in thread
From: Joe Stringer @ 2015-10-20 23:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Linux Netdev List, Andy Zhou

On 20 October 2015 at 13:53, Florian Westphal <fw@strlen.de> wrote:
> Feel free to submit this to -net, there is no dependency on any of the
> other changes.

OK, I'll follow up on that patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-20 20:53       ` Florian Westphal
  2015-10-20 23:59         ` Joe Stringer
@ 2015-10-21 12:42         ` Pablo Neira Ayuso
  2015-10-21 14:50           ` Florian Westphal
  1 sibling, 1 reply; 18+ messages in thread
From: Pablo Neira Ayuso @ 2015-10-21 12:42 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Joe Stringer, netfilter-devel, Linux Netdev List, Andy Zhou

On Tue, Oct 20, 2015 at 10:53:07PM +0200, Florian Westphal wrote:
> Joe Stringer <joestringer@nicira.com> wrote:
> > > Good point.  No, I don't.  Any suggestions?
> > > I can try to just re-target -nf tree (sans patch #2).  Pablo?
> > 
> > The smallest change seems to be adding the nf_ct_frag6_consume_orig()
> > call to OVS, plus the morph logic from patch 3. Alternatively if Pablo
> > is fine with having the series re-targeted, then that sounds
> > reasonable to me too.
> 
> Pablo, your call.
> 
> I would suggest to re-target patches #1 and #3 to nf tree, I can do
> this, just let me know.

It's fairly late, we're on -rc6 so I don't think it's a good idea to
submit a large rework to -nf at this stage.

> Alternative is to just add the nf_ct_frag6_consume_orig call to
> openvswitch and handle that via net tree.
> 
> I can then wait for that change to pop up in nf-next and just resend
> this series (which will then undo that change).

I'd rather get things fixes for the existing code. This would also
allow simple passing back to -stable, then we can move forward discuss
and review your rework with sufficient time.

Let me know, thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
                   ` (4 preceding siblings ...)
  2015-10-20  6:16 ` [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Joe Stringer
@ 2015-10-21 14:34 ` David Miller
  5 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-10-21 14:34 UTC (permalink / raw)
  To: fw; +Cc: netfilter-devel, netdev, azhou, joestringer

From: Florian Westphal <fw@strlen.de>
Date: Sat, 17 Oct 2015 22:14:21 +0200

> [ CC netdev since patch #2 isn't nf-specific.  Dave, if you want
>   I can resubmit that one after the next nf-pull request; let me know if
>   you would prefer that ].

No objections to you merging patch #2 however you wish, it's
perfectly fine.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-21 12:42         ` Pablo Neira Ayuso
@ 2015-10-21 14:50           ` Florian Westphal
  2015-10-21 16:52             ` Joe Stringer
  0 siblings, 1 reply; 18+ messages in thread
From: Florian Westphal @ 2015-10-21 14:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Joe Stringer, netfilter-devel,
	Linux Netdev List, Andy Zhou

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I can then wait for that change to pop up in nf-next and just resend
> > this series (which will then undo that change).
> 
> I'd rather get things fixes for the existing code. This would also
> allow simple passing back to -stable, then we can move forward discuss
> and review your rework with sufficient time.

Joe, could you take care of this and submit a OVS fix to net tree?

(just add that call to nf_ct_frag6_consume_orig and take
 the morph change directly into the OVS callpath)

I will then resubmit all of this at some later point.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag
  2015-10-21 14:50           ` Florian Westphal
@ 2015-10-21 16:52             ` Joe Stringer
  0 siblings, 0 replies; 18+ messages in thread
From: Joe Stringer @ 2015-10-21 16:52 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netfilter-devel, Linux Netdev List, Andy Zhou

On 21 October 2015 at 07:50, Florian Westphal <fw@strlen.de> wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > I can then wait for that change to pop up in nf-next and just resend
>> > this series (which will then undo that change).
>>
>> I'd rather get things fixes for the existing code. This would also
>> allow simple passing back to -stable, then we can move forward discuss
>> and review your rework with sufficient time.
>
> Joe, could you take care of this and submit a OVS fix to net tree?
>
> (just add that call to nf_ct_frag6_consume_orig and take
>  the morph change directly into the OVS callpath)
>
> I will then resubmit all of this at some later point.
>
> Thanks.

Sure thing.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-10-21 16:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-17 20:14 [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Florian Westphal
2015-10-17 20:14 ` [PATCH nf-next 1/4] netfilter: ipv6: remove extra clone/free operations Florian Westphal
2015-10-17 20:14 ` [PATCH nf-next 2/4] inet: kill obsolete skb_free op Florian Westphal
2015-10-17 20:14 ` [PATCH nf-next 3/4] netfilter: ipv6: in-place replacement of last skb Florian Westphal
2015-10-20 18:39   ` Joe Stringer
2015-10-20 20:46     ` Florian Westphal
2015-10-17 20:14 ` [PATCH nf-next 4/4] netfilter: ipv6: avoid nf_iterate recursion Florian Westphal
2015-10-20  6:25   ` Joe Stringer
2015-10-20  8:18     ` Florian Westphal
2015-10-20  6:16 ` [PATCH nf-next 0/4] netfilter: rework netfilter ipv6 defrag Joe Stringer
2015-10-20  8:17   ` Florian Westphal
2015-10-20 18:43     ` Joe Stringer
2015-10-20 20:53       ` Florian Westphal
2015-10-20 23:59         ` Joe Stringer
2015-10-21 12:42         ` Pablo Neira Ayuso
2015-10-21 14:50           ` Florian Westphal
2015-10-21 16:52             ` Joe Stringer
2015-10-21 14:34 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).