* [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS
@ 2012-08-20 13:08 Jesper Dangaard Brouer
  2012-08-20 13:08 ` [PATCH 1/3] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-20 13:08 UTC
  To: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Julian Anastasov, Simon Horman
  Cc: Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

The following patchset implements IPv6 fragment handling for IPVS.

This work is based upon patches from Hans Schillstrom.  I have taken
over the patchset, in close agreement with Hans, because he has not
been allocated time to complete his work.

I have cleaned up the patchset, changed the API a bit, fixed a refcnt
bug, and rebased on top of Julian's recent changes (all with Hans'
knowledge).

 Patch01: Unrelated trivial fixes

 Patch02: Fix faulty IPv6 extension header handling in IPVS

 Patch03: Complete IPv6 fragment handling for IPVS

This patchset is based upon:
 Horms' ipvs-next tree:
  git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git

 On top of commit 3654e61137db891f5312e6dd813b961484b5fdf3:
  ipvs: add pmtu_disc option to disable IP DF for TUN packets

---

Jesper Dangaard Brouer (3):
      ipvs: Complete IPv6 fragment handling for IPVS
      ipvs: Fix faulty IPv6 extension header handling in IPVS
      ipvs: Trivial changes, use compressed IPv6 address in output


 include/net/ip_vs.h                     |  191 +++++++++++----
 net/netfilter/ipvs/Kconfig              |    7 -
 net/netfilter/ipvs/ip_vs_conn.c         |   15 -
 net/netfilter/ipvs/ip_vs_core.c         |  384 +++++++++++++++++--------------
 net/netfilter/ipvs/ip_vs_dh.c           |    2 
 net/netfilter/ipvs/ip_vs_lblc.c         |    2 
 net/netfilter/ipvs/ip_vs_lblcr.c        |    2 
 net/netfilter/ipvs/ip_vs_pe_sip.c       |   27 ++
 net/netfilter/ipvs/ip_vs_proto.c        |    6 
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |    9 -
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |   42 +--
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |   40 +--
 net/netfilter/ipvs/ip_vs_proto_udp.c    |   41 +--
 net/netfilter/ipvs/ip_vs_sched.c        |    2 
 net/netfilter/ipvs/ip_vs_sh.c           |    2 
 net/netfilter/ipvs/ip_vs_xmit.c         |   75 +++---
 net/netfilter/xt_ipvs.c                 |    4 
 17 files changed, 489 insertions(+), 362 deletions(-)


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH 1/3] ipvs: Trivial changes, use compressed IPv6 address in output
  2012-08-20 13:08 [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
@ 2012-08-20 13:08 ` Jesper Dangaard Brouer
  2012-08-20 13:08 ` [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-20 13:08 UTC
  To: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Julian Anastasov, Simon Horman
  Cc: Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

Use the compressed IPv6 address format (%pI6c) in debug and error
output.  The proc file output has not been converted.

Trivial fixes are moved into this first patch.
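
For illustration only, here is a user-space sketch of the difference
between the two printk formats: %pI6 prints all eight 16-bit groups
zero-padded, while %pI6c collapses runs of zero groups into "::".  The
helper format_ipv6() is hypothetical, and the compressed branch simply
delegates to inet_ntop(), which is assumed to match %pI6c for ordinary
addresses such as 2001:db8::1.

```c
#include <arpa/inet.h>   /* inet_ntop() */
#include <netinet/in.h>  /* struct in6_addr, INET6_ADDRSTRLEN */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* strlen() */
#include <sys/socket.h>  /* AF_INET6, socklen_t */

/*
 * Hypothetical helper contrasting the two output styles:
 *   compressed == 0: eight zero-padded 16-bit groups (like %pI6)
 *   compressed != 0: zero runs collapsed to "::" (like %pI6c),
 *                    delegated to inet_ntop() in this sketch.
 * Returns the string length, or -1 if buf is smaller than
 * INET6_ADDRSTRLEN or formatting fails.
 */
static int format_ipv6(const struct in6_addr *a, int compressed,
		       char *buf, size_t len)
{
	size_t off = 0;
	int i;

	if (len < INET6_ADDRSTRLEN)
		return -1;

	if (compressed)
		return inet_ntop(AF_INET6, a, buf, (socklen_t)len) ?
		       (int)strlen(buf) : -1;

	/* Two address bytes per group, "%02x%02x" keeps leading zeros */
	for (i = 0; i < 16; i += 2)
		off += snprintf(buf + off, len - off, "%s%02x%02x",
				i ? ":" : "", a->s6_addr[i], a->s6_addr[i + 1]);
	return (int)off;
}
```

For 2001:db8::1 the full form is 39 characters long, while the
compressed form is only 11, which is why it reads better in log lines.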

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/ip_vs.h              |    2 +-
 net/netfilter/ipvs/ip_vs_proto.c |    6 +++---
 net/netfilter/ipvs/ip_vs_sched.c |    2 +-
 net/netfilter/ipvs/ip_vs_xmit.c  |   10 +++++-----
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index ee75ccd..aba0bb2 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -165,7 +165,7 @@ static inline const char *ip_vs_dbg_addr(int af, char *buf, size_t buf_len,
 	int len;
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
-		len = snprintf(&buf[*idx], buf_len - *idx, "[%pI6]",
+		len = snprintf(&buf[*idx], buf_len - *idx, "[%pI6c]",
 			       &addr->in6) + 1;
 	else
 #endif
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 50d8218..939f7fb 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -280,17 +280,17 @@ ip_vs_tcpudp_debug_packet_v6(struct ip_vs_protocol *pp,
 	if (ih == NULL)
 		sprintf(buf, "TRUNCATED");
 	else if (ih->nexthdr == IPPROTO_FRAGMENT)
-		sprintf(buf, "%pI6->%pI6 frag",	&ih->saddr, &ih->daddr);
+		sprintf(buf, "%pI6c->%pI6c frag", &ih->saddr, &ih->daddr);
 	else {
 		__be16 _ports[2], *pptr;
 
 		pptr = skb_header_pointer(skb, offset + sizeof(struct ipv6hdr),
 					  sizeof(_ports), _ports);
 		if (pptr == NULL)
-			sprintf(buf, "TRUNCATED %pI6->%pI6",
+			sprintf(buf, "TRUNCATED %pI6c->%pI6c",
 				&ih->saddr, &ih->daddr);
 		else
-			sprintf(buf, "%pI6:%u->%pI6:%u",
+			sprintf(buf, "%pI6c:%u->%pI6c:%u",
 				&ih->saddr, ntohs(pptr[0]),
 				&ih->daddr, ntohs(pptr[1]));
 	}
diff --git a/net/netfilter/ipvs/ip_vs_sched.c b/net/netfilter/ipvs/ip_vs_sched.c
index 08dbdd5..d6bf20d 100644
--- a/net/netfilter/ipvs/ip_vs_sched.c
+++ b/net/netfilter/ipvs/ip_vs_sched.c
@@ -159,7 +159,7 @@ void ip_vs_scheduler_err(struct ip_vs_service *svc, const char *msg)
 			     svc->fwmark, msg);
 #ifdef CONFIG_IP_VS_IPV6
 	} else if (svc->af == AF_INET6) {
-		IP_VS_ERR_RL("%s: %s [%pI6]:%d - %s\n",
+		IP_VS_ERR_RL("%s: %s [%pI6c]:%d - %s\n",
 			     svc->scheduler->name,
 			     ip_vs_proto_name(svc->protocol),
 			     &svc->addr.in6, ntohs(svc->port), msg);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 543a554..eef3432 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -319,7 +319,7 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
-		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6\n",
+		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6c\n",
 			     local ? "local":"non-local", daddr);
 		dst_release(&rt->dst);
 		return NULL;
@@ -327,8 +327,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 	if (local && !(rt_mode & IP_VS_RT_MODE_RDR) &&
 	    !((ort = (struct rt6_info *) skb_dst(skb)) &&
 	      __ip_vs_is_local_route6(ort))) {
-		IP_VS_DBG_RL("Redirect from non-local address %pI6 to local "
-			     "requires NAT method, dest: %pI6\n",
+		IP_VS_DBG_RL("Redirect from non-local address %pI6c to local "
+			     "requires NAT method, dest: %pI6c\n",
 			     &ipv6_hdr(skb)->daddr, daddr);
 		dst_release(&rt->dst);
 		return NULL;
@@ -336,8 +336,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 	if (unlikely(!local && (!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
 		     ipv6_addr_type(&ipv6_hdr(skb)->saddr) &
 				    IPV6_ADDR_LOOPBACK)) {
-		IP_VS_DBG_RL("Stopping traffic from loopback address %pI6 "
-			     "to non-local address, dest: %pI6\n",
+		IP_VS_DBG_RL("Stopping traffic from loopback address %pI6c "
+			     "to non-local address, dest: %pI6c\n",
 			     &ipv6_hdr(skb)->saddr, daddr);
 		dst_release(&rt->dst);
 		return NULL;



* [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-20 13:08 [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
  2012-08-20 13:08 ` [PATCH 1/3] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
@ 2012-08-20 13:08 ` Jesper Dangaard Brouer
  2012-08-21 14:14   ` Julian Anastasov
  2012-08-26 21:13   ` Patrick McHardy
  2012-08-20 13:08 ` [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
  2012-08-21  5:24 ` [PATCH 0/3] ipvs: " Simon Horman
  3 siblings, 2 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-20 13:08 UTC
  To: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Julian Anastasov, Simon Horman
  Cc: Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

Based on patch from: Hans Schillstrom

IPv6 headers must be processed in order of appearance; it cannot be
assumed that the upper-layer header comes first.  If anything other
than an L4 header comes first, IPVS will not handle the packet.

IPVS writes SNAT and DNAT modifications at a fixed position, which
corrupts the packet when extension headers are present.  The proper
header position must be found before modifying the packet.
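
As a rough illustration of processing headers in order of appearance,
here is a simplified user-space walk over the Next Header chain,
loosely modelled on what ipv6_find_hdr() does.  The names find_l4()
and struct l4_info are hypothetical; only Hop-by-Hop, Routing,
Destination Options, Fragment and AH headers are recognised, and
error handling is reduced to returning -1.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers of the extension headers handled below */
enum {
	NH_HOPOPTS  = 0,
	NH_ROUTING  = 43,
	NH_FRAGMENT = 44,
	NH_AUTH     = 51,
	NH_DSTOPTS  = 60,
};

#define IPV6_HDR_LEN 40	/* fixed IPv6 header size */

/* Hypothetical result type, loosely modelled on struct ip_vs_iphdr */
struct l4_info {
	size_t   len;		/* offset of the upper-layer header */
	uint16_t fragoffs;	/* fragment offset in bytes, 0 if first/unfragmented */
	int      protocol;	/* upper-layer protocol, -1 on error */
};

/* Simplified sketch, not the kernel implementation. */
static int find_l4(const uint8_t *pkt, size_t pkt_len, struct l4_info *out)
{
	size_t off = IPV6_HDR_LEN;
	int nexthdr;

	out->len = 0;
	out->fragoffs = 0;
	out->protocol = -1;
	if (pkt_len < IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6];	/* Next Header field of the fixed header */

	for (;;) {
		size_t hdrlen;

		switch (nexthdr) {
		case NH_HOPOPTS:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (off + 2 > pkt_len)
				return -1;
			hdrlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case NH_AUTH:
			if (off + 2 > pkt_len)
				return -1;
			hdrlen = ((size_t)pkt[off + 1] + 2) * 4;
			break;
		case NH_FRAGMENT:
			if (off + 8 > pkt_len)
				return -1;
			/* 13-bit offset in 8-octet units; mask the flag bits */
			out->fragoffs = (uint16_t)(((pkt[off + 2] << 8) |
						    pkt[off + 3]) & ~0x7);
			hdrlen = 8;
			if (out->fragoffs) {
				/* Non-first fragment: no L4 header here */
				out->len = off + hdrlen;
				out->protocol = pkt[off];
				return out->protocol;
			}
			break;
		default:
			/* Upper-layer header (TCP, UDP, SCTP, ICMPv6, ...) */
			out->len = off;
			out->protocol = nexthdr;
			return nexthdr;
		}
		nexthdr = pkt[off];	/* each ext header starts with Next Header */
		off += hdrlen;
		if (off > pkt_len)
			return -1;
	}
}
```

With a single 8-byte Hop-by-Hop header in front of TCP, the returned
len is 48 rather than the fixed 40 that a NAT handler would otherwise
assume.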

This patch contains a lot of API changes.  They are made to avoid
repeating the costly scan for the IPv6 headers via ipv6_find_hdr().
The headers are located as early as possible, and the result is
passed on as a "struct ip_vs_iphdr *" pointer to the affected
functions.
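
A minimal sketch of the "parse once, pass a pointer" pattern, assuming
a hypothetical struct pkt_iphdr that stands in for struct ip_vs_iphdr
and whose len has already been filled by a single header walk.  The
helpers get_ports(), set_dport() and redirect_port() are illustrative
only; checksum updates, which real SNAT/DNAT handlers must perform,
are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ip_vs_iphdr, filled once per packet */
struct pkt_iphdr {
	size_t len;		/* offset of the L4 header */
	int    protocol;	/* L4 protocol number */
};

/* Read the big-endian source/destination ports at iph->len */
static int get_ports(const uint8_t *pkt, size_t pkt_len,
		     const struct pkt_iphdr *iph,
		     uint16_t *sport, uint16_t *dport)
{
	const uint8_t *p;

	if (iph->len + 4 > pkt_len)
		return -1;
	p = pkt + iph->len;
	*sport = (uint16_t)((p[0] << 8) | p[1]);
	*dport = (uint16_t)((p[2] << 8) | p[3]);
	return 0;
}

/* DNAT-style port rewrite at the position found by the header walk,
 * not at a fixed sizeof(struct ipv6hdr) offset (checksum omitted). */
static int set_dport(uint8_t *pkt, size_t pkt_len,
		     const struct pkt_iphdr *iph, uint16_t dport)
{
	if (iph->len + 4 > pkt_len)
		return -1;
	pkt[iph->len + 2] = (uint8_t)(dport >> 8);
	pkt[iph->len + 3] = (uint8_t)(dport & 0xff);
	return 0;
}

/* Caller: the same, already-filled iph is handed to every helper,
 * so no helper repeats the extension header scan. */
static int redirect_port(uint8_t *pkt, size_t pkt_len,
			 const struct pkt_iphdr *iph,
			 uint16_t from, uint16_t to)
{
	uint16_t sport, dport;

	if (get_ports(pkt, pkt_len, iph, &sport, &dport) < 0)
		return -1;
	if (dport != from)
		return 0;	/* not for this service */
	return set_dport(pkt, pkt_len, iph, to) < 0 ? -1 : 1;
}
```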

Notice that I have chosen not to change the API of the function
pointer "(*schedule)" (in struct ip_vs_scheduler), as it can be
used by external schedulers via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.
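
Schedulers that only need the addresses can skip the header walk
entirely, because the IPv6 source and destination addresses sit at
fixed offsets (bytes 8 to 23 and 24 to 39) of the 40-byte fixed
header.  The sketch below is a hypothetical user-space analogue of
that idea; fill_addr_only() and struct addr_only are illustrative
names, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address-only subset of a parsed-header struct */
struct addr_only {
	uint8_t saddr[16];
	uint8_t daddr[16];
};

/* Cheap fill: no extension header walk, since both addresses are
 * always at fixed offsets in the fixed IPv6 header. */
static int fill_addr_only(const uint8_t *pkt, size_t pkt_len,
			  struct addr_only *out)
{
	if (pkt_len < 40)
		return -1;
	memcpy(out->saddr, pkt + 8, 16);	/* Source Address */
	memcpy(out->daddr, pkt + 24, 16);	/* Destination Address */
	return 0;
}
```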

This patch depends on commit 84018f55a:
 "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"

This also adds a dependency on ip6_tables.

Hans left some questions in ip_vs_pe_sip.c, which I'm uncertain about.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---

 include/net/ip_vs.h                     |  173 +++++++++++++++------
 net/netfilter/ipvs/ip_vs_conn.c         |   15 +-
 net/netfilter/ipvs/ip_vs_core.c         |  253 +++++++++++++------------------
 net/netfilter/ipvs/ip_vs_dh.c           |    2 
 net/netfilter/ipvs/ip_vs_lblc.c         |    2 
 net/netfilter/ipvs/ip_vs_lblcr.c        |    2 
 net/netfilter/ipvs/ip_vs_pe_sip.c       |   27 ++-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |    9 -
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |   42 ++---
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |   40 ++---
 net/netfilter/ipvs/ip_vs_proto_udp.c    |   41 ++---
 net/netfilter/ipvs/ip_vs_sh.c           |    2 
 net/netfilter/ipvs/ip_vs_xmit.c         |   41 +++--
 net/netfilter/xt_ipvs.c                 |    4 
 14 files changed, 340 insertions(+), 313 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index aba0bb2..8d5920f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -22,6 +22,9 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>			/* for struct ipv6hdr */
 #include <net/ipv6.h>
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 #include <net/netfilter/nf_conntrack.h>
 #endif
@@ -103,30 +106,99 @@ static inline struct net *seq_file_single_net(struct seq_file *seq)
 /* Connections' size value needed by ip_vs_ctl.c */
 extern int ip_vs_conn_tab_size;
 
-
 struct ip_vs_iphdr {
-	int len;
-	__u8 protocol;
+	__u32 len;	/* IPv4 simply where L4 starts
+			   IPv6 where to find next header */
+	__u32 offs;	/* IPv6 frags: header offset in nfct_reasm skb */
+	__u16 fragoffs; /* IPv6 fragment offset, 0 if first frag (or not frag)*/
+	__s16 protocol;
+	__s32 flags;
 	union nf_inet_addr saddr;
 	union nf_inet_addr daddr;
 };
 
+/* Dependency to module: nf_defrag_ipv6 */
+#if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+	return skb->nfct_reasm;
+}
+#else
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+	return NULL;
+}
+#endif
+
+static inline void
+ip_vs_fill_ip4hdr(const void *nh, struct ip_vs_iphdr *iphdr)
+{
+	const struct iphdr *iph = nh;
+
+	iphdr->len	= iph->ihl * 4;
+	iphdr->fragoffs	= 0;
+	iphdr->protocol	= iph->protocol;
+	iphdr->saddr.ip	= iph->saddr;
+	iphdr->daddr.ip	= iph->daddr;
+}
+
+/* This function handles filling *ip_vs_iphdr, both for IPv4 and IPv6.
+ * IPv6 requires some extra work, as finding the proper header position
+ * depends on the IPv6 extension headers.
+ */
 static inline void
-ip_vs_fill_iphdr(int af, const void *nh, struct ip_vs_iphdr *iphdr)
+ip_vs_fill_iph_skb(int af, const struct sk_buff *skb, struct ip_vs_iphdr *iphdr)
 {
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
-		const struct ipv6hdr *iph = nh;
-		iphdr->len = sizeof(struct ipv6hdr);
-		iphdr->protocol = iph->nexthdr;
+		const struct ipv6hdr *iph =
+			(struct ipv6hdr *)skb_network_header(skb);
 		iphdr->saddr.in6 = iph->saddr;
 		iphdr->daddr.in6 = iph->daddr;
+		/* ipv6_find_hdr() updates len, flags, offs */
+		iphdr->len	 = 0;
+		iphdr->flags	 = 0;
+		iphdr->offs	 = 0;
+		iphdr->protocol  = ipv6_find_hdr(skb, &iphdr->len, -1,
+						 &iphdr->fragoffs,
+						 &iphdr->flags);
+		/* get proto from re-assembled packet and its offset */
+		if (skb_nfct_reasm(skb))
+			iphdr->protocol = ipv6_find_hdr(skb_nfct_reasm(skb),
+							&iphdr->offs, -1, NULL,
+							NULL);
 	} else
 #endif
 	{
-		const struct iphdr *iph = nh;
-		iphdr->len = iph->ihl * 4;
-		iphdr->protocol = iph->protocol;
+		const struct iphdr *iph =
+			(struct iphdr *)skb_network_header(skb);
+		iphdr->len	= iph->ihl * 4;
+		iphdr->fragoffs	= 0;
+		iphdr->protocol	= iph->protocol;
+		iphdr->saddr.ip	= iph->saddr;
+		iphdr->daddr.ip	= iph->daddr;
+	}
+}
+
+/* This function is a faster version of ip_vs_fill_iph_skb(),
+ * where we only populate {s,d}addr (and avoid calling ipv6_find_hdr()).
+ * This is used by some of the ip_vs_*_schedule() functions.
+ * (Mostly done to avoid ABI breakage of external schedulers)
+ */
+static inline void
+ip_vs_fill_iph_addr_only(int af, const struct sk_buff *skb,
+			 struct ip_vs_iphdr *iphdr)
+{
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6) {
+		const struct ipv6hdr *iph =
+			(struct ipv6hdr *)skb_network_header(skb);
+		iphdr->saddr.in6 = iph->saddr;
+		iphdr->daddr.in6 = iph->daddr;
+	} else {
+#endif
+		const struct iphdr *iph =
+			(struct iphdr *)skb_network_header(skb);
 		iphdr->saddr.ip = iph->saddr;
 		iphdr->daddr.ip = iph->daddr;
 	}
@@ -398,27 +470,26 @@ struct ip_vs_protocol {
 
 	int (*conn_schedule)(int af, struct sk_buff *skb,
 			     struct ip_vs_proto_data *pd,
-			     int *verdict, struct ip_vs_conn **cpp);
+			     int *verdict, struct ip_vs_conn **cpp,
+			     struct ip_vs_iphdr *iph);
 
 	struct ip_vs_conn *
 	(*conn_in_get)(int af,
 		       const struct sk_buff *skb,
 		       const struct ip_vs_iphdr *iph,
-		       unsigned int proto_off,
 		       int inverse);
 
 	struct ip_vs_conn *
 	(*conn_out_get)(int af,
 			const struct sk_buff *skb,
 			const struct ip_vs_iphdr *iph,
-			unsigned int proto_off,
 			int inverse);
 
-	int (*snat_handler)(struct sk_buff *skb,
-			    struct ip_vs_protocol *pp, struct ip_vs_conn *cp);
+	int (*snat_handler)(struct sk_buff *skb, struct ip_vs_protocol *pp,
+			    struct ip_vs_conn *cp, struct ip_vs_iphdr *iph);
 
-	int (*dnat_handler)(struct sk_buff *skb,
-			    struct ip_vs_protocol *pp, struct ip_vs_conn *cp);
+	int (*dnat_handler)(struct sk_buff *skb, struct ip_vs_protocol *pp,
+			    struct ip_vs_conn *cp, struct ip_vs_iphdr *iph);
 
 	int (*csum_check)(int af, struct sk_buff *skb,
 			  struct ip_vs_protocol *pp);
@@ -518,7 +589,7 @@ struct ip_vs_conn {
 	   NF_ACCEPT can be returned when destination is local.
 	 */
 	int (*packet_xmit)(struct sk_buff *skb, struct ip_vs_conn *cp,
-			   struct ip_vs_protocol *pp);
+			   struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
 
 	/* Note: we can group the following members into a structure,
 	   in order to save more space, and the following members are
@@ -769,13 +840,11 @@ struct ip_vs_app {
 
 	struct ip_vs_conn *
 	(*conn_in_get)(const struct sk_buff *skb, struct ip_vs_app *app,
-		       const struct iphdr *iph, unsigned int proto_off,
-		       int inverse);
+		       const struct iphdr *iph, int inverse);
 
 	struct ip_vs_conn *
 	(*conn_out_get)(const struct sk_buff *skb, struct ip_vs_app *app,
-			const struct iphdr *iph, unsigned int proto_off,
-			int inverse);
+			const struct iphdr *iph, int inverse);
 
 	int (*state_transition)(struct ip_vs_conn *cp, int direction,
 				const struct sk_buff *skb,
@@ -1074,14 +1143,12 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
 
 struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
 					    const struct ip_vs_iphdr *iph,
-					    unsigned int proto_off,
 					    int inverse);
 
 struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
 
 struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
 					     const struct ip_vs_iphdr *iph,
-					     unsigned int proto_off,
 					     int inverse);
 
 /* put back the conn without restarting its timer */
@@ -1254,9 +1321,10 @@ extern struct ip_vs_scheduler *ip_vs_scheduler_get(const char *sched_name);
 extern void ip_vs_scheduler_put(struct ip_vs_scheduler *scheduler);
 extern struct ip_vs_conn *
 ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
-	       struct ip_vs_proto_data *pd, int *ignored);
+	       struct ip_vs_proto_data *pd, int *ignored,
+	       struct ip_vs_iphdr *iph);
 extern int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
-			struct ip_vs_proto_data *pd);
+			struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph);
 
 extern void ip_vs_scheduler_err(struct ip_vs_service *svc, const char *msg);
 
@@ -1315,33 +1383,38 @@ extern void ip_vs_read_estimator(struct ip_vs_stats_user *dst,
 /*
  *	Various IPVS packet transmitters (from ip_vs_xmit.c)
  */
-extern int ip_vs_null_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_bypass_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_nat_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_tunnel_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_dr_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_icmp_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp,
- int offset, unsigned int hooknum);
+extern int ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			   struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			     struct ip_vs_protocol *pp,
+			     struct ip_vs_iphdr *iph);
+extern int ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			  struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			     struct ip_vs_protocol *pp,
+			     struct ip_vs_iphdr *iph);
+extern int ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			 struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+			   struct ip_vs_protocol *pp, int offset,
+			   unsigned int hooknum, struct ip_vs_iphdr *iph);
 extern void ip_vs_dst_reset(struct ip_vs_dest *dest);
 
 #ifdef CONFIG_IP_VS_IPV6
-extern int ip_vs_bypass_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_nat_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_tunnel_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_dr_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_icmp_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp,
- int offset, unsigned int hooknum);
+extern int ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+				struct ip_vs_protocol *pp,
+				struct ip_vs_iphdr *iph);
+extern int ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+			     struct ip_vs_protocol *pp,
+			     struct ip_vs_iphdr *iph);
+extern int ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+				struct ip_vs_protocol *pp,
+				struct ip_vs_iphdr *iph);
+extern int ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+			    struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+			      struct ip_vs_protocol *pp, int offset,
+			      unsigned int hooknum, struct ip_vs_iphdr *iph);
 #endif
 
 #ifdef CONFIG_SYSCTL
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 1548df9..a00db99 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -308,13 +308,12 @@ struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
 static int
 ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
 			    const struct ip_vs_iphdr *iph,
-			    unsigned int proto_off, int inverse,
-			    struct ip_vs_conn_param *p)
+			    int inverse, struct ip_vs_conn_param *p)
 {
 	__be16 _ports[2], *pptr;
 	struct net *net = skb_net(skb);
 
-	pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
+	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
 	if (pptr == NULL)
 		return 1;
 
@@ -329,12 +328,11 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
 
 struct ip_vs_conn *
 ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
-			const struct ip_vs_iphdr *iph,
-			unsigned int proto_off, int inverse)
+			const struct ip_vs_iphdr *iph, int inverse)
 {
 	struct ip_vs_conn_param p;
 
-	if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
+	if (ip_vs_conn_fill_param_proto(af, skb, iph, inverse, &p))
 		return NULL;
 
 	return ip_vs_conn_in_get(&p);
@@ -432,12 +430,11 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
 
 struct ip_vs_conn *
 ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
-			 const struct ip_vs_iphdr *iph,
-			 unsigned int proto_off, int inverse)
+			 const struct ip_vs_iphdr *iph, int inverse)
 {
 	struct ip_vs_conn_param p;
 
-	if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
+	if (ip_vs_conn_fill_param_proto(af, skb, iph, inverse, &p))
 		return NULL;
 
 	return ip_vs_conn_out_get(&p);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 58918e2..32c69ed 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -222,11 +222,10 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
  */
 static struct ip_vs_conn *
 ip_vs_sched_persist(struct ip_vs_service *svc,
-		    struct sk_buff *skb,
-		    __be16 src_port, __be16 dst_port, int *ignored)
+		    struct sk_buff *skb, __be16 src_port, __be16 dst_port,
+		    int *ignored, struct ip_vs_iphdr *iph)
 {
 	struct ip_vs_conn *cp = NULL;
-	struct ip_vs_iphdr iph;
 	struct ip_vs_dest *dest;
 	struct ip_vs_conn *ct;
 	__be16 dport = 0;		/* destination port to forward */
@@ -236,20 +235,18 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 	union nf_inet_addr snet;	/* source network of the client,
 					   after masking */
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
-
 	/* Mask saddr with the netmask to adjust template granularity */
 #ifdef CONFIG_IP_VS_IPV6
 	if (svc->af == AF_INET6)
-		ipv6_addr_prefix(&snet.in6, &iph.saddr.in6, svc->netmask);
+		ipv6_addr_prefix(&snet.in6, &iph->saddr.in6, svc->netmask);
 	else
 #endif
-		snet.ip = iph.saddr.ip & svc->netmask;
+		snet.ip = iph->saddr.ip & svc->netmask;
 
 	IP_VS_DBG_BUF(6, "p-schedule: src %s:%u dest %s:%u "
 		      "mnet %s\n",
-		      IP_VS_DBG_ADDR(svc->af, &iph.saddr), ntohs(src_port),
-		      IP_VS_DBG_ADDR(svc->af, &iph.daddr), ntohs(dst_port),
+		      IP_VS_DBG_ADDR(svc->af, &iph->saddr), ntohs(src_port),
+		      IP_VS_DBG_ADDR(svc->af, &iph->daddr), ntohs(dst_port),
 		      IP_VS_DBG_ADDR(svc->af, &snet));
 
 	/*
@@ -266,8 +263,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 	 * is created for other persistent services.
 	 */
 	{
-		int protocol = iph.protocol;
-		const union nf_inet_addr *vaddr = &iph.daddr;
+		int protocol = iph->protocol;
+		const union nf_inet_addr *vaddr = &iph->daddr;
 		__be16 vport = 0;
 
 		if (dst_port == svc->port) {
@@ -342,14 +339,14 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 		dport = dest->port;
 
 	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
-		 && iph.protocol == IPPROTO_UDP)?
+		 && iph->protocol == IPPROTO_UDP) ?
 		IP_VS_CONN_F_ONE_PACKET : 0;
 
 	/*
 	 *    Create a new connection according to the template
 	 */
-	ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol, &iph.saddr,
-			      src_port, &iph.daddr, dst_port, &param);
+	ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, &iph->saddr,
+			      src_port, &iph->daddr, dst_port, &param);
 
 	cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest, skb->mark);
 	if (cp == NULL) {
@@ -392,18 +389,20 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
  */
 struct ip_vs_conn *
 ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
-	       struct ip_vs_proto_data *pd, int *ignored)
+	       struct ip_vs_proto_data *pd, int *ignored,
+	       struct ip_vs_iphdr *iph)
 {
 	struct ip_vs_protocol *pp = pd->pp;
 	struct ip_vs_conn *cp = NULL;
-	struct ip_vs_iphdr iph;
 	struct ip_vs_dest *dest;
 	__be16 _ports[2], *pptr;
 	unsigned int flags;
 
 	*ignored = 1;
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
-	pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
+	/*
+	 * IPv6 frags, only the first hit here.
+	 */
+	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
 	if (pptr == NULL)
 		return NULL;
 
@@ -423,7 +422,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	 *    Do not schedule replies from local real server.
 	 */
 	if ((!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
-	    (cp = pp->conn_in_get(svc->af, skb, &iph, iph.len, 1))) {
+	    (cp = pp->conn_in_get(svc->af, skb, iph, 1))) {
 		IP_VS_DBG_PKT(12, svc->af, pp, skb, 0,
 			      "Not scheduling reply for existing connection");
 		__ip_vs_conn_put(cp);
@@ -434,7 +433,8 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	 *    Persistent service
 	 */
 	if (svc->flags & IP_VS_SVC_F_PERSISTENT)
-		return ip_vs_sched_persist(svc, skb, pptr[0], pptr[1], ignored);
+		return ip_vs_sched_persist(svc, skb, pptr[0], pptr[1], ignored,
+					   iph);
 
 	*ignored = 0;
 
@@ -456,7 +456,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	}
 
 	flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
-		 && iph.protocol == IPPROTO_UDP)?
+		 && iph->protocol == IPPROTO_UDP) ?
 		IP_VS_CONN_F_ONE_PACKET : 0;
 
 	/*
@@ -465,9 +465,9 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	{
 		struct ip_vs_conn_param p;
 
-		ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
-				      &iph.saddr, pptr[0], &iph.daddr, pptr[1],
-				      &p);
+		ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
+				      &iph->saddr, pptr[0], &iph->daddr,
+				      pptr[1], &p);
 		cp = ip_vs_conn_new(&p, &dest->addr,
 				    dest->port ? dest->port : pptr[1],
 				    flags, dest, skb->mark);
@@ -496,19 +496,16 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
  *  no destination is available for a new connection.
  */
 int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
-		struct ip_vs_proto_data *pd)
+		struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph)
 {
 	__be16 _ports[2], *pptr;
-	struct ip_vs_iphdr iph;
 #ifdef CONFIG_SYSCTL
 	struct net *net;
 	struct netns_ipvs *ipvs;
 	int unicast;
 #endif
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
-
-	pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
+	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
 	if (pptr == NULL) {
 		ip_vs_service_put(svc);
 		return NF_DROP;
@@ -519,10 +516,10 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 
 #ifdef CONFIG_IP_VS_IPV6
 	if (svc->af == AF_INET6)
-		unicast = ipv6_addr_type(&iph.daddr.in6) & IPV6_ADDR_UNICAST;
+		unicast = ipv6_addr_type(&iph->daddr.in6) & IPV6_ADDR_UNICAST;
 	else
 #endif
-		unicast = (inet_addr_type(net, iph.daddr.ip) == RTN_UNICAST);
+		unicast = (inet_addr_type(net, iph->daddr.ip) == RTN_UNICAST);
 
 	/* if it is fwmark-based service, the cache_bypass sysctl is up
 	   and the destination is a non-local unicast, then create
@@ -532,7 +529,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		int ret;
 		struct ip_vs_conn *cp;
 		unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
-				      iph.protocol == IPPROTO_UDP)?
+				      iph->protocol == IPPROTO_UDP) ?
 				      IP_VS_CONN_F_ONE_PACKET : 0;
 		union nf_inet_addr daddr =  { .all = { 0, 0, 0, 0 } };
 
@@ -542,9 +539,9 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
 		{
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
-					      &iph.saddr, pptr[0],
-					      &iph.daddr, pptr[1], &p);
+			ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
+					      &iph->saddr, pptr[0],
+					      &iph->daddr, pptr[1], &p);
 			cp = ip_vs_conn_new(&p, &daddr, 0,
 					    IP_VS_CONN_F_BYPASS | flags,
 					    NULL, skb->mark);
@@ -559,7 +556,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
 
 		/* transmit the first SYN packet */
-		ret = cp->packet_xmit(skb, cp, pd->pp);
+		ret = cp->packet_xmit(skb, cp, pd->pp, iph);
 		/* do not touch skb anymore */
 
 		atomic_inc(&cp->in_pkts);
@@ -898,50 +895,38 @@ static int ip_vs_out_icmp(struct sk_buff *skb, int *related,
 	IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset,
 		      "Checking outgoing ICMP for");
 
-	offset += cih->ihl * 4;
-
-	ip_vs_fill_iphdr(AF_INET, cih, &ciph);
+	ip_vs_fill_ip4hdr(cih, &ciph);
+	ciph.len += offset;
 	/* The embedded headers contain source and dest in reverse order */
-	cp = pp->conn_out_get(AF_INET, skb, &ciph, offset, 1);
+	cp = pp->conn_out_get(AF_INET, skb, &ciph, 1);
 	if (!cp)
 		return NF_ACCEPT;
 
 	snet.ip = iph->saddr;
 	return handle_response_icmp(AF_INET, skb, &snet, cih->protocol, cp,
-				    pp, offset, ihl);
+				    pp, ciph.len, ihl);
 }
 
 #ifdef CONFIG_IP_VS_IPV6
 static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
-			     unsigned int hooknum)
+			     unsigned int hooknum, struct ip_vs_iphdr *ipvsh)
 {
-	struct ipv6hdr *iph;
 	struct icmp6hdr	_icmph, *ic;
-	struct ipv6hdr	_ciph, *cih;	/* The ip header contained
+	struct ipv6hdr _ip6, *ip6;	/* The ip header contained
 					   within the ICMP */
-	struct ip_vs_iphdr ciph;
 	struct ip_vs_conn *cp;
 	struct ip_vs_protocol *pp;
-	unsigned int offset;
 	union nf_inet_addr snet;
 
 	*related = 1;
 
-	/* reassemble IP fragments */
-	if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
-		if (ip_vs_gather_frags_v6(skb, ip_vs_defrag_user(hooknum)))
-			return NF_STOLEN;
-	}
-
-	iph = ipv6_hdr(skb);
-	offset = sizeof(struct ipv6hdr);
-	ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph);
+	ic = skb_header_pointer(skb, ipvsh->len, sizeof(_icmph), &_icmph);
 	if (ic == NULL)
 		return NF_DROP;
 
-	IP_VS_DBG(12, "Outgoing ICMPv6 (%d,%d) %pI6->%pI6\n",
+	IP_VS_DBG(12, "Outgoing ICMPv6 (%d,%d) %pI6c->%pI6c\n",
 		  ic->icmp6_type, ntohs(icmpv6_id(ic)),
-		  &iph->saddr, &iph->daddr);
+		  &ipvsh->saddr, &ipvsh->daddr);
 
 	/*
 	 * Work through seeing if this is for us.
@@ -958,34 +943,26 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
 	}
 
 	/* Now find the contained IP header */
-	offset += sizeof(_icmph);
-	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
-	if (cih == NULL)
-		return NF_ACCEPT; /* The packet looks wrong, ignore */
+	ipvsh->len += sizeof(_icmph);
+	ip6 = skb_header_pointer(skb, ipvsh->len, sizeof(_ip6), &_ip6);
+	ipvsh->protocol = ipv6_find_hdr(skb, &ipvsh->len, -1,
+					&ipvsh->fragoffs, &ipvsh->flags);
 
-	pp = ip_vs_proto_get(cih->nexthdr);
-	if (!pp)
-		return NF_ACCEPT;
-
-	/* Is the embedded protocol header present? */
-	/* TODO: we don't support fragmentation at the moment anyways */
-	if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
+	pp = ip_vs_proto_get(ipvsh->protocol);
+	if (!pp || (ipvsh->protocol < 0))
 		return NF_ACCEPT;
+	/* fill the rest of ipvsh */
+	ipvsh->saddr.in6 = ip6->saddr;
+	ipvsh->daddr.in6 = ip6->daddr;
 
-	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
-		      "Checking outgoing ICMPv6 for");
-
-	offset += sizeof(struct ipv6hdr);
-
-	ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
 	/* The embedded headers contain source and dest in reverse order */
-	cp = pp->conn_out_get(AF_INET6, skb, &ciph, offset, 1);
+	cp = pp->conn_out_get(AF_INET6, skb, ipvsh, 1);
 	if (!cp)
 		return NF_ACCEPT;
 
-	snet.in6 = iph->saddr;
-	return handle_response_icmp(AF_INET6, skb, &snet, cih->nexthdr, cp,
-				    pp, offset, sizeof(struct ipv6hdr));
+	snet.in6 = ipvsh->saddr.in6;
+	return handle_response_icmp(AF_INET6, skb, &snet, ipvsh->protocol, cp,
+				    pp, ipvsh->len, sizeof(struct ipv6hdr));
 }
 #endif
 
@@ -1018,17 +995,17 @@ static inline int is_tcp_reset(const struct sk_buff *skb, int nh_len)
  */
 static unsigned int
 handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
-		struct ip_vs_conn *cp, int ihl)
+		struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	struct ip_vs_protocol *pp = pd->pp;
 
 	IP_VS_DBG_PKT(11, af, pp, skb, 0, "Outgoing packet");
 
-	if (!skb_make_writable(skb, ihl))
+	if (!skb_make_writable(skb, iph->len))
 		goto drop;
 
 	/* mangle the packet */
-	if (pp->snat_handler && !pp->snat_handler(skb, pp, cp))
+	if (pp->snat_handler && !pp->snat_handler(skb, pp, cp, iph))
 		goto drop;
 
 #ifdef CONFIG_IP_VS_IPV6
@@ -1115,17 +1092,17 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (!net_ipvs(net)->enable)
 		return NF_ACCEPT;
 
-	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_skb(af, skb, &iph);
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
 			int verdict = ip_vs_out_icmp_v6(skb, &related,
-							hooknum);
+							hooknum, &iph);
 
 			if (related)
 				return verdict;
-			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+			ip_vs_fill_iph_skb(af, skb, &iph);
 		}
 	} else
 #endif
@@ -1135,7 +1112,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 
 			if (related)
 				return verdict;
-			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+			ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
 		}
 
 	pd = ip_vs_proto_data_get(net, iph.protocol);
@@ -1145,31 +1122,23 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 
 	/* reassemble IP fragments */
 #ifdef CONFIG_IP_VS_IPV6
-	if (af == AF_INET6) {
-		if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
-			if (ip_vs_gather_frags_v6(skb,
-						  ip_vs_defrag_user(hooknum)))
-				return NF_STOLEN;
-		}
-
-		ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
-	} else
+	if (af == AF_INET)
 #endif
 		if (unlikely(ip_is_fragment(ip_hdr(skb)) && !pp->dont_defrag)) {
 			if (ip_vs_gather_frags(skb,
 					       ip_vs_defrag_user(hooknum)))
 				return NF_STOLEN;
 
-			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+			ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
 		}
 
 	/*
 	 * Check if the packet belongs to an existing entry
 	 */
-	cp = pp->conn_out_get(af, skb, &iph, iph.len, 0);
+	cp = pp->conn_out_get(af, skb, &iph, 0);
 
 	if (likely(cp))
-		return handle_response(af, skb, pd, cp, iph.len);
+		return handle_response(af, skb, pd, cp, &iph);
 	if (sysctl_nat_icmp_send(net) &&
 	    (pp->protocol == IPPROTO_TCP ||
 	     pp->protocol == IPPROTO_UDP ||
@@ -1375,13 +1344,13 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 		      "Checking incoming ICMP for");
 
 	offset2 = offset;
-	offset += cih->ihl * 4;
-
-	ip_vs_fill_iphdr(AF_INET, cih, &ciph);
+	ip_vs_fill_ip4hdr(cih, &ciph);
+	ciph.len += offset;
+	offset = ciph.len;
 	/* The embedded headers contain source and dest in reverse order.
 	 * For IPIP this is error for request, not for reply.
 	 */
-	cp = pp->conn_in_get(AF_INET, skb, &ciph, offset, ipip ? 0 : 1);
+	cp = pp->conn_in_get(AF_INET, skb, &ciph, ipip ? 0 : 1);
 	if (!cp)
 		return NF_ACCEPT;
 
@@ -1450,7 +1419,7 @@ ignore_ipip:
 	ip_vs_in_stats(cp, skb);
 	if (IPPROTO_TCP == cih->protocol || IPPROTO_UDP == cih->protocol)
 		offset += 2 * sizeof(__u16);
-	verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum);
+	verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum, &ciph);
 
 out:
 	__ip_vs_conn_put(cp);
@@ -1459,14 +1428,11 @@ out:
 }
 
 #ifdef CONFIG_IP_VS_IPV6
-static int
-ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
+static int ip_vs_in_icmp_v6(struct sk_buff *skb, int *related,
+			    unsigned int hooknum, struct ip_vs_iphdr *iph)
 {
 	struct net *net = NULL;
-	struct ipv6hdr *iph;
 	struct icmp6hdr	_icmph, *ic;
-	struct ipv6hdr	_ciph, *cih;	/* The ip header contained
-					   within the ICMP */
 	struct ip_vs_iphdr ciph;
 	struct ip_vs_conn *cp;
 	struct ip_vs_protocol *pp;
@@ -1475,19 +1441,11 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
 
 	*related = 1;
 
-	/* reassemble IP fragments */
-	if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
-		if (ip_vs_gather_frags_v6(skb, ip_vs_defrag_user(hooknum)))
-			return NF_STOLEN;
-	}
-
-	iph = ipv6_hdr(skb);
-	offset = sizeof(struct ipv6hdr);
-	ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph);
+	ic = skb_header_pointer(skb, iph->len, sizeof(_icmph), &_icmph);
 	if (ic == NULL)
 		return NF_DROP;
 
-	IP_VS_DBG(12, "Incoming ICMPv6 (%d,%d) %pI6->%pI6\n",
+	IP_VS_DBG(12, "Incoming ICMPv6 (%d,%d) %pI6c->%pI6c\n",
 		  ic->icmp6_type, ntohs(icmpv6_id(ic)),
 		  &iph->saddr, &iph->daddr);
 
@@ -1506,39 +1464,43 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
 	}
 
 	/* Now find the contained IP header */
-	offset += sizeof(_icmph);
-	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
-	if (cih == NULL)
-		return NF_ACCEPT; /* The packet looks wrong, ignore */
+	ciph.len = iph->len + sizeof(_icmph);
+	ciph.flags = 0;
+	ciph.fragoffs = 0;
+	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
+				      &ciph.flags);
+	ciph.saddr = iph->saddr;	/* conn_in_get() handles reverse order */
+	ciph.daddr = iph->daddr;
 
 	net = skb_net(skb);
-	pd = ip_vs_proto_data_get(net, cih->nexthdr);
+	pd = ip_vs_proto_data_get(net, ciph.protocol);
 	if (!pd)
 		return NF_ACCEPT;
 	pp = pd->pp;
 
-	/* Is the embedded protocol header present? */
-	/* TODO: we don't support fragmentation at the moment anyways */
-	if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
+	/* Is the embedded protocol header present?
+	 * If it's the second or later fragment we don't know what it is
+	 * i.e. just let it through.
+	 */
+	if (ciph.fragoffs)
 		return NF_ACCEPT;
 
+	offset = ciph.len;
 	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
 		      "Checking incoming ICMPv6 for");
 
-	offset += sizeof(struct ipv6hdr);
-
-	ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
 	/* The embedded headers contain source and dest in reverse order */
-	cp = pp->conn_in_get(AF_INET6, skb, &ciph, offset, 1);
+	cp = pp->conn_in_get(AF_INET6, skb, &ciph, 1);
 	if (!cp)
 		return NF_ACCEPT;
 
 	/* do the statistics and put it back */
 	ip_vs_in_stats(cp, skb);
-	if (IPPROTO_TCP == cih->nexthdr || IPPROTO_UDP == cih->nexthdr ||
-	    IPPROTO_SCTP == cih->nexthdr)
-		offset += 2 * sizeof(__u16);
-	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum);
+	if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
+	    IPPROTO_SCTP == ciph.protocol)
+		offset = ciph.len + (2 * sizeof(__u16));
+
+	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph);
 
 	__ip_vs_conn_put(cp);
 
@@ -1574,7 +1536,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (unlikely((skb->pkt_type != PACKET_HOST &&
 		      hooknum != NF_INET_LOCAL_OUT) ||
 		     !skb_dst(skb))) {
-		ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+		ip_vs_fill_iph_skb(af, skb, &iph);
 		IP_VS_DBG_BUF(12, "packet type=%d proto=%d daddr=%s"
 			      " ignored in hook %u\n",
 			      skb->pkt_type, iph.protocol,
@@ -1586,7 +1548,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (!net_ipvs(net)->enable)
 		return NF_ACCEPT;
 
-	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_skb(af, skb, &iph);
 
 	/* Bad... Do not break raw sockets */
 	if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
@@ -1602,11 +1564,11 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	if (af == AF_INET6) {
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
-			int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum);
+			int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum,
+						       &iph);
 
 			if (related)
 				return verdict;
-			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 		}
 	} else
 #endif
@@ -1616,7 +1578,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 
 			if (related)
 				return verdict;
-			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 		}
 
 	/* Protocol supported? */
@@ -1626,13 +1587,13 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	pp = pd->pp;
 	/*
 	 * Check if the packet belongs to an existing connection entry
+	 * Only sched first IPv6 fragment.
 	 */
-	cp = pp->conn_in_get(af, skb, &iph, iph.len, 0);
-
-	if (unlikely(!cp)) {
+	cp = pp->conn_in_get(af, skb, &iph, 0);
+	if (unlikely(!cp) && !iph.fragoffs) {
 		int v;
 
-		if (!pp->conn_schedule(af, skb, pd, &v, &cp))
+		if (!pp->conn_schedule(af, skb, pd, &v, &cp, &iph))
 			return v;
 	}
 
@@ -1662,7 +1623,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	ip_vs_in_stats(cp, skb);
 	ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
 	if (cp->packet_xmit)
-		ret = cp->packet_xmit(skb, cp, pp);
+		ret = cp->packet_xmit(skb, cp, pp, &iph);
 		/* do not touch skb anymore */
 	else {
 		IP_VS_DBG_RL("warning: packet_xmit is null");
@@ -1793,8 +1754,10 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 {
 	int r;
 	struct net *net;
+	struct ip_vs_iphdr iphdr;
 
-	if (ipv6_hdr(skb)->nexthdr != IPPROTO_ICMPV6)
+	ip_vs_fill_iph_skb(AF_INET6, skb, &iphdr);
+	if (iphdr.protocol != IPPROTO_ICMPV6)
 		return NF_ACCEPT;
 
 	/* ipvs enabled in this netns ? */
@@ -1802,7 +1765,7 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 	if (!net_ipvs(net)->enable)
 		return NF_ACCEPT;
 
-	return ip_vs_in_icmp_v6(skb, &r, hooknum);
+	return ip_vs_in_icmp_v6(skb, &r, hooknum, &iphdr);
 }
 #endif
 
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 8b7dca9..7f3b0cc 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -215,7 +215,7 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 	struct ip_vs_dh_bucket *tbl;
 	struct ip_vs_iphdr iph;
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
 
 	IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
 
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index df646cc..cbd3748 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -479,7 +479,7 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 	struct ip_vs_dest *dest = NULL;
 	struct ip_vs_lblc_entry *en;
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
 
 	IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 570e31e..161b679 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -649,7 +649,7 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 	struct ip_vs_dest *dest = NULL;
 	struct ip_vs_lblcr_entry *en;
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
 
 	IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
 
diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
index 1aa5cac..bb28b4f 100644
--- a/net/netfilter/ipvs/ip_vs_pe_sip.c
+++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
@@ -68,26 +68,37 @@ static int get_callid(const char *dptr, unsigned int dataoff,
 static int
 ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
 {
+	struct sk_buff *reasm = skb_nfct_reasm(skb);
 	struct ip_vs_iphdr iph;
 	unsigned int dataoff, datalen, matchoff, matchlen;
 	const char *dptr;
 	int retc;
 
-	ip_vs_fill_iphdr(p->af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_skb(p->af, skb, &iph);
 
 	/* Only useful with UDP */
 	if (iph.protocol != IPPROTO_UDP)
 		return -EINVAL;
+	/*
+	 * todo: IPv6 fragments:
+	 *       I think this only should be done for the first fragment. /HS
+	 */
+	if (!reasm) {
+		reasm = skb;
+		dataoff = iph.len + sizeof(struct udphdr);
+	} else
+		dataoff = iph.offs + sizeof(struct udphdr);
 
-	/* No Data ? */
-	dataoff = iph.len + sizeof(struct udphdr);
-	if (dataoff >= skb->len)
+	if (dataoff >= reasm->len)
 		return -EINVAL;
-
-	if ((retc=skb_linearize(skb)) < 0)
+	/*
+	 * todo: Check if this will mess-up the reasm skb !!! /HS
+	 */
+	retc = skb_linearize(reasm);
+	if (retc < 0)
 		return retc;
-	dptr = skb->data + dataoff;
-	datalen = skb->len - dataoff;
+	dptr = reasm->data + dataoff;
+	datalen = reasm->len - dataoff;
 
 	if (get_callid(dptr, dataoff, datalen, &matchoff, &matchlen))
 		return -EINVAL;
diff --git a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
index 5b8eb8b..5de3dd3 100644
--- a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
@@ -57,7 +57,7 @@ ah_esp_conn_fill_param_proto(struct net *net, int af,
 
 static struct ip_vs_conn *
 ah_esp_conn_in_get(int af, const struct sk_buff *skb,
-		   const struct ip_vs_iphdr *iph, unsigned int proto_off,
+		   const struct ip_vs_iphdr *iph,
 		   int inverse)
 {
 	struct ip_vs_conn *cp;
@@ -85,9 +85,7 @@ ah_esp_conn_in_get(int af, const struct sk_buff *skb,
 
 static struct ip_vs_conn *
 ah_esp_conn_out_get(int af, const struct sk_buff *skb,
-		    const struct ip_vs_iphdr *iph,
-		    unsigned int proto_off,
-		    int inverse)
+		    const struct ip_vs_iphdr *iph, int inverse)
 {
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
@@ -110,7 +108,8 @@ ah_esp_conn_out_get(int af, const struct sk_buff *skb,
 
 static int
 ah_esp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
-		     int *verdict, struct ip_vs_conn **cpp)
+		     int *verdict, struct ip_vs_conn **cpp,
+		     struct ip_vs_iphdr *iph)
 {
 	/*
 	 * AH/ESP is only related traffic. Pass the packet to IP stack.
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 9f3fb75..746048b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -10,28 +10,26 @@
 
 static int
 sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
-		   int *verdict, struct ip_vs_conn **cpp)
+		   int *verdict, struct ip_vs_conn **cpp,
+		   struct ip_vs_iphdr *iph)
 {
 	struct net *net;
 	struct ip_vs_service *svc;
 	sctp_chunkhdr_t _schunkh, *sch;
 	sctp_sctphdr_t *sh, _sctph;
-	struct ip_vs_iphdr iph;
 
-	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
-
-	sh = skb_header_pointer(skb, iph.len, sizeof(_sctph), &_sctph);
+	sh = skb_header_pointer(skb, iph->len, sizeof(_sctph), &_sctph);
 	if (sh == NULL)
 		return 0;
 
-	sch = skb_header_pointer(skb, iph.len + sizeof(sctp_sctphdr_t),
+	sch = skb_header_pointer(skb, iph->len + sizeof(sctp_sctphdr_t),
 				 sizeof(_schunkh), &_schunkh);
 	if (sch == NULL)
 		return 0;
 	net = skb_net(skb);
 	if ((sch->type == SCTP_CID_INIT) &&
-	    (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
-				     &iph.daddr, sh->dest))) {
+	    (svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+				     &iph->daddr, sh->dest))) {
 		int ignored;
 
 		if (ip_vs_todrop(net_ipvs(net))) {
@@ -47,10 +45,10 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 		 * Let the virtual server select a real server for the
 		 * incoming connection, and create a connection entry.
 		 */
-		*cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+		*cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
 		if (!*cpp && ignored <= 0) {
 			if (!ignored)
-				*verdict = ip_vs_leave(svc, skb, pd);
+				*verdict = ip_vs_leave(svc, skb, pd, iph);
 			else {
 				ip_vs_service_put(svc);
 				*verdict = NF_DROP;
@@ -64,20 +62,18 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 }
 
 static int
-sctp_snat_handler(struct sk_buff *skb,
-		  struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	sctp_sctphdr_t *sctph;
-	unsigned int sctphoff;
+	unsigned int sctphoff = iph->len;
 	struct sk_buff *iter;
 	__be32 crc32;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		sctphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		sctphoff = ip_hdrlen(skb);
 
 	/* csum_check requires unshared skb */
 	if (!skb_make_writable(skb, sctphoff + sizeof(*sctph)))
@@ -108,20 +104,18 @@ sctp_snat_handler(struct sk_buff *skb,
 }
 
 static int
-sctp_dnat_handler(struct sk_buff *skb,
-		  struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	sctp_sctphdr_t *sctph;
-	unsigned int sctphoff;
+	unsigned int sctphoff = iph->len;
 	struct sk_buff *iter;
 	__be32 crc32;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		sctphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		sctphoff = ip_hdrlen(skb);
 
 	/* csum_check requires unshared skb */
 	if (!skb_make_writable(skb, sctphoff + sizeof(*sctph)))
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index cd609cc..9af653a 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -33,16 +33,14 @@
 
 static int
 tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
-		  int *verdict, struct ip_vs_conn **cpp)
+		  int *verdict, struct ip_vs_conn **cpp,
+		  struct ip_vs_iphdr *iph)
 {
 	struct net *net;
 	struct ip_vs_service *svc;
 	struct tcphdr _tcph, *th;
-	struct ip_vs_iphdr iph;
 
-	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
-
-	th = skb_header_pointer(skb, iph.len, sizeof(_tcph), &_tcph);
+	th = skb_header_pointer(skb, iph->len, sizeof(_tcph), &_tcph);
 	if (th == NULL) {
 		*verdict = NF_DROP;
 		return 0;
@@ -50,8 +48,8 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 	net = skb_net(skb);
 	/* No !th->ack check to allow scheduling on SYN+ACK for Active FTP */
 	if (th->syn &&
-	    (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
-				     &iph.daddr, th->dest))) {
+	    (svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+				     &iph->daddr, th->dest))) {
 		int ignored;
 
 		if (ip_vs_todrop(net_ipvs(net))) {
@@ -68,10 +66,10 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 		 * Let the virtual server select a real server for the
 		 * incoming connection, and create a connection entry.
 		 */
-		*cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+		*cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
 		if (!*cpp && ignored <= 0) {
 			if (!ignored)
-				*verdict = ip_vs_leave(svc, skb, pd);
+				*verdict = ip_vs_leave(svc, skb, pd, iph);
 			else {
 				ip_vs_service_put(svc);
 				*verdict = NF_DROP;
@@ -128,20 +126,18 @@ tcp_partial_csum_update(int af, struct tcphdr *tcph,
 
 
 static int
-tcp_snat_handler(struct sk_buff *skb,
-		 struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		 struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	struct tcphdr *tcph;
-	unsigned int tcphoff;
+	unsigned int tcphoff = iph->len;
 	int oldlen;
 	int payload_csum = 0;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		tcphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		tcphoff = ip_hdrlen(skb);
 	oldlen = skb->len - tcphoff;
 
 	/* csum_check requires unshared skb */
@@ -208,20 +204,18 @@ tcp_snat_handler(struct sk_buff *skb,
 
 
 static int
-tcp_dnat_handler(struct sk_buff *skb,
-		 struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+tcp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		 struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	struct tcphdr *tcph;
-	unsigned int tcphoff;
+	unsigned int tcphoff = iph->len;
 	int oldlen;
 	int payload_csum = 0;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		tcphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		tcphoff = ip_hdrlen(skb);
 	oldlen = skb->len - tcphoff;
 
 	/* csum_check requires unshared skb */
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 2fedb2d..503a842 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -30,23 +30,22 @@
 
 static int
 udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
-		  int *verdict, struct ip_vs_conn **cpp)
+		  int *verdict, struct ip_vs_conn **cpp,
+		  struct ip_vs_iphdr *iph)
 {
 	struct net *net;
 	struct ip_vs_service *svc;
 	struct udphdr _udph, *uh;
-	struct ip_vs_iphdr iph;
 
-	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
-
-	uh = skb_header_pointer(skb, iph.len, sizeof(_udph), &_udph);
+	/* IPv6 fragments, only first fragment will hit this */
+	uh = skb_header_pointer(skb, iph->len, sizeof(_udph), &_udph);
 	if (uh == NULL) {
 		*verdict = NF_DROP;
 		return 0;
 	}
 	net = skb_net(skb);
-	svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
-				&iph.daddr, uh->dest);
+	svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+				&iph->daddr, uh->dest);
 	if (svc) {
 		int ignored;
 
@@ -64,10 +63,10 @@ udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
 		 * Let the virtual server select a real server for the
 		 * incoming connection, and create a connection entry.
 		 */
-		*cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+		*cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
 		if (!*cpp && ignored <= 0) {
 			if (!ignored)
-				*verdict = ip_vs_leave(svc, skb, pd);
+				*verdict = ip_vs_leave(svc, skb, pd, iph);
 			else {
 				ip_vs_service_put(svc);
 				*verdict = NF_DROP;
@@ -125,20 +124,18 @@ udp_partial_csum_update(int af, struct udphdr *uhdr,
 
 
 static int
-udp_snat_handler(struct sk_buff *skb,
-		 struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		 struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	struct udphdr *udph;
-	unsigned int udphoff;
+	unsigned int udphoff = iph->len;
 	int oldlen;
 	int payload_csum = 0;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		udphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		udphoff = ip_hdrlen(skb);
 	oldlen = skb->len - udphoff;
 
 	/* csum_check requires unshared skb */
@@ -210,20 +207,18 @@ udp_snat_handler(struct sk_buff *skb,
 
 
 static int
-udp_dnat_handler(struct sk_buff *skb,
-		 struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+udp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+		 struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
 {
 	struct udphdr *udph;
-	unsigned int udphoff;
+	unsigned int udphoff = iph->len;
 	int oldlen;
 	int payload_csum = 0;
 
 #ifdef CONFIG_IP_VS_IPV6
-	if (cp->af == AF_INET6)
-		udphoff = sizeof(struct ipv6hdr);
-	else
+	if (cp->af == AF_INET6 && iph->fragoffs)
+		return 1;
 #endif
-		udphoff = ip_hdrlen(skb);
 	oldlen = skb->len - udphoff;
 
 	/* csum_check requires unshared skb */
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 0512652..e331269 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -228,7 +228,7 @@ ip_vs_sh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 	struct ip_vs_sh_bucket *tbl;
 	struct ip_vs_iphdr iph;
 
-	ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
 
 	IP_VS_DBG(6, "ip_vs_sh_schedule(): Scheduling...\n");
 
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index eef3432..925cca2 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -408,7 +408,7 @@ do {							\
  */
 int
 ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-		struct ip_vs_protocol *pp)
+		struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	/* we do not touch skb and do not need pskb ptr */
 	IP_VS_XMIT(NFPROTO_IPV4, skb, cp, 1);
@@ -422,7 +422,7 @@ ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
  */
 int
 ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-		  struct ip_vs_protocol *pp)
+		  struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	struct rtable *rt;			/* Route to the other host */
 	struct iphdr  *iph = ip_hdr(skb);
@@ -477,16 +477,16 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #ifdef CONFIG_IP_VS_IPV6
 int
 ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
-		     struct ip_vs_protocol *pp)
+		     struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
 {
 	struct rt6_info *rt;			/* Route to the other host */
-	struct ipv6hdr  *iph = ipv6_hdr(skb);
 	int    mtu;
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr, NULL, 0,
-					 IP_VS_RT_MODE_NON_LOCAL)))
+	rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr.in6, NULL, 0,
+				   IP_VS_RT_MODE_NON_LOCAL);
+	if (!rt)
 		goto tx_error_icmp;
 
 	/* MTU checking */
@@ -540,7 +540,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
  */
 int
 ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-	       struct ip_vs_protocol *pp)
+	       struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	struct rtable *rt;		/* Route to the other host */
 	int mtu;
@@ -610,7 +610,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 		goto tx_error_put;
 
 	/* mangle the packet */
-	if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp))
+	if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp, ipvsh))
 		goto tx_error_put;
 	ip_hdr(skb)->daddr = cp->daddr.ip;
 	ip_send_check(ip_hdr(skb));
@@ -658,7 +658,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #ifdef CONFIG_IP_VS_IPV6
 int
 ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
-		  struct ip_vs_protocol *pp)
+		  struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
 {
 	struct rt6_info *rt;		/* Route to the other host */
 	int mtu;
@@ -669,8 +669,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	/* check if it is a connection of no-client-port */
 	if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) {
 		__be16 _pt, *p;
-		p = skb_header_pointer(skb, sizeof(struct ipv6hdr),
-				       sizeof(_pt), &_pt);
+		p = skb_header_pointer(skb, iph->len, sizeof(_pt), &_pt);
 		if (p == NULL)
 			goto tx_error;
 		ip_vs_conn_fill_cport(cp, *p);
@@ -732,7 +731,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 		goto tx_error_put;
 
 	/* mangle the packet */
-	if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp))
+	if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp, iph))
 		goto tx_error;
 	ipv6_hdr(skb)->daddr = cp->daddr.in6;
 
@@ -793,7 +792,7 @@ tx_error_put:
  */
 int
 ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-		  struct ip_vs_protocol *pp)
+		  struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	struct netns_ipvs *ipvs = net_ipvs(skb_net(skb));
 	struct rtable *rt;			/* Route to the other host */
@@ -913,7 +912,7 @@ tx_error_put:
 #ifdef CONFIG_IP_VS_IPV6
 int
 ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
-		     struct ip_vs_protocol *pp)
+		     struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	struct rt6_info *rt;		/* Route to the other host */
 	struct in6_addr saddr;		/* Source for tunnel */
@@ -1034,7 +1033,7 @@ tx_error_put:
  */
 int
 ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-	      struct ip_vs_protocol *pp)
+	      struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
 {
 	struct rtable *rt;			/* Route to the other host */
 	struct iphdr  *iph = ip_hdr(skb);
@@ -1095,7 +1094,7 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #ifdef CONFIG_IP_VS_IPV6
 int
 ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
-		 struct ip_vs_protocol *pp)
+		 struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
 {
 	struct rt6_info *rt;			/* Route to the other host */
 	int    mtu;
@@ -1163,7 +1162,8 @@ tx_error:
  */
 int
 ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
-		struct ip_vs_protocol *pp, int offset, unsigned int hooknum)
+		struct ip_vs_protocol *pp, int offset, unsigned int hooknum,
+		struct ip_vs_iphdr *iph)
 {
 	struct rtable	*rt;	/* Route to the other host */
 	int mtu;
@@ -1178,7 +1178,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	   translate address/port back */
 	if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) {
 		if (cp->packet_xmit)
-			rc = cp->packet_xmit(skb, cp, pp);
+			rc = cp->packet_xmit(skb, cp, pp, iph);
 		else
 			rc = NF_ACCEPT;
 		/* do not touch skb anymore */
@@ -1284,7 +1284,8 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #ifdef CONFIG_IP_VS_IPV6
 int
 ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
-		struct ip_vs_protocol *pp, int offset, unsigned int hooknum)
+		struct ip_vs_protocol *pp, int offset, unsigned int hooknum,
+		struct ip_vs_iphdr *iph)
 {
 	struct rt6_info	*rt;	/* Route to the other host */
 	int mtu;
@@ -1299,7 +1300,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	   translate address/port back */
 	if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) {
 		if (cp->packet_xmit)
-			rc = cp->packet_xmit(skb, cp, pp);
+			rc = cp->packet_xmit(skb, cp, pp, iph);
 		else
 			rc = NF_ACCEPT;
 		/* do not touch skb anymore */
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
index bb10b07..8d47c37 100644
--- a/net/netfilter/xt_ipvs.c
+++ b/net/netfilter/xt_ipvs.c
@@ -67,7 +67,7 @@ ipvs_mt(const struct sk_buff *skb, struct xt_action_param *par)
 		goto out;
 	}
 
-	ip_vs_fill_iphdr(family, skb_network_header(skb), &iph);
+	ip_vs_fill_iph_skb(family, skb, &iph);
 
 	if (data->bitmask & XT_IPVS_PROTO)
 		if ((iph.protocol == data->l4proto) ^
@@ -85,7 +85,7 @@ ipvs_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	/*
 	 * Check if the packet belongs to an existing entry
 	 */
-	cp = pp->conn_out_get(family, skb, &iph, iph.len, 1 /* inverse */);
+	cp = pp->conn_out_get(family, skb, &iph, 1 /* inverse */);
 	if (unlikely(cp == NULL)) {
 		match = false;
 		goto out;


* [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS
  2012-08-20 13:08 [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
  2012-08-20 13:08 ` [PATCH 1/3] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
  2012-08-20 13:08 ` [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
@ 2012-08-20 13:08 ` Jesper Dangaard Brouer
  2012-08-21  5:24 ` [PATCH 0/3] ipvs: " Simon Horman
  3 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-20 13:08 UTC (permalink / raw)
  To: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Julian Anastasov, Simon Horman
  Cc: Jesper Dangaard Brouer, Wensong Zhang, netfilter-devel

IPVS now supports fragmented IPv6 packets, with support from nf_conntrack_reasm.c.

Based on patch from: Hans Schillstrom.

IPVS does what conntrack does, i.e. it uses skb->nfct_reasm.
When all fragments have been collected, nf_ct_frag6_output()
starts a "replay" of all fragments into the interrupted
PREROUTING chain at priority -399 (NF_IP6_PRI_CONNTRACK_DEFRAG+1),
with nfct_reasm pointing to the reassembled packet.

Note that the nf_defrag_ipv6 module must be loaded for this to work.

IPVS adds a new hook into the PREROUTING chain at priority
-99 (NF_IP6_PRI_NAT_DST+1) to catch fragments and copy the fwmark
info from the first fragment, i.e. the one carrying the upper-layer header.
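Below is a minimal user-space sketch of the fwmark hand-off described above. It assumes a simplified replay in which the first fragment passes through IPVS before the later fragments reach the new PRE_ROUTING hook. `struct pkt` and both function names are hypothetical. The struct stands in for the sk_buff fields involved (`mark`, `ipvs_property`, `nfct_reasm`). The two functions stand in for the logic added to ip_vs_in()/ip_vs_out() and to ip_vs_preroute_frag6(). The `net_ipvs(net)->enable` check is omitted.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields this patch touches. */
struct pkt {
	uint32_t mark;		/* fwmark, like skb->mark */
	int ipvs_property;	/* like skb->ipvs_property */
	uint16_t fragoffs;	/* fragment offset in bytes, 0 = first fragment */
	struct pkt *reasm;	/* like skb->nfct_reasm during the replay */
};

/* Mirrors the block added to ip_vs_in()/ip_vs_out(): the first fragment
 * records its fwmark in the reassembled packet. */
static void save_mark_from_first_frag(struct pkt *p)
{
	if (p->fragoffs == 0 && p->reasm != NULL) {
		p->reasm->ipvs_property = 1;
		p->reasm->mark = p->mark;
	}
}

/* Mirrors ip_vs_preroute_frag6(): once a mark has been recorded, each
 * replayed fragment gets it copied in. */
static void copy_mark_to_frag(struct pkt *p)
{
	if (p->reasm == NULL || !p->reasm->ipvs_property)
		return;		/* not a replay, or nothing saved yet */
	p->mark = p->reasm->mark;
}
```

Later fragments carry no ports, so a port-based rule cannot mark them. Copying the mark from the reassembled packet is what lets them follow the first fragment.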

Also, for IPv6, handle all ICMPv6 non-informational (i.e. error)
messages, selected via ICMPV6_INFOMSG_MASK. In practice this only
extends our handling to type ICMPV6_PARAMPROB (Parameter Problem) and
to future error types.
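The mask test can be shown with a small standalone sketch. RFC 4443 assigns ICMPv6 error messages types 0-127 and informational messages types 128-255. Testing bit 0x80 therefore classifies every type, including ones not yet defined. The constants are redefined locally with the values that <linux/icmpv6.h> uses. The two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in RFC 4443 and <linux/icmpv6.h>, redefined for user space. */
#define ICMPV6_DEST_UNREACH	1
#define ICMPV6_PKT_TOOBIG	2
#define ICMPV6_TIME_EXCEED	3
#define ICMPV6_PARAMPROB	4
#define ICMPV6_ECHO_REQUEST	128
#define ICMPV6_INFOMSG_MASK	0x80

/* Old test: only three explicitly listed error types were handled. */
static bool handled_before(uint8_t type)
{
	return type == ICMPV6_DEST_UNREACH || type == ICMPV6_PKT_TOOBIG ||
	       type == ICMPV6_TIME_EXCEED;
}

/* New test: any type with the high bit clear (0..127) is an error. */
static bool handled_now(uint8_t type)
{
	return !(type & ICMPV6_INFOMSG_MASK);
}
```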

- Fixed a refcnt bug since the last version.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---

 include/net/ip_vs.h             |   16 ++++
 net/netfilter/ipvs/Kconfig      |    7 +-
 net/netfilter/ipvs/ip_vs_conn.c |    2 
 net/netfilter/ipvs/ip_vs_core.c |  173 ++++++++++++++++++++++++++++-----------
 net/netfilter/ipvs/ip_vs_xmit.c |   24 ++++-
 5 files changed, 161 insertions(+), 61 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 8d5920f..50f377e 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -123,11 +123,27 @@ static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
 {
 	return skb->nfct_reasm;
 }
+static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
+				      int len, void *buffer,
+				      const struct ip_vs_iphdr *ipvsh)
+{
+	if (unlikely(ipvsh->fragoffs && skb_nfct_reasm(skb)))
+		return skb_header_pointer(skb_nfct_reasm(skb), ipvsh->offs,
+					  len, buffer);
+
+	return skb_header_pointer(skb, offset, len, buffer);
+}
 #else
 static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
 {
 	return NULL;
 }
+static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
+				      int len, void *buffer,
+				      const struct ip_vs_iphdr *ipvsh)
+{
+	return skb_header_pointer(skb, offset, len, buffer);
+}
 #endif
 
 static inline void
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 8b2cffd..0c3b167 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -28,12 +28,11 @@ if IP_VS
 config	IP_VS_IPV6
 	bool "IPv6 support for IPVS"
 	depends on IPV6 = y || IP_VS = IPV6
+	select IP6_NF_IPTABLES
 	---help---
-	  Add IPv6 support to IPVS. This is incomplete and might be dangerous.
+	  Add IPv6 support to IPVS.
 
-	  See http://www.mindbasket.com/ipvs for more information.
-
-	  Say N if unsure.
+	  Say Y if unsure.
 
 config	IP_VS_DEBUG
 	bool "IP virtual server debugging"
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index a00db99..30e764a 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -313,7 +313,7 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
 	__be16 _ports[2], *pptr;
 	struct net *net = skb_net(skb);
 
-	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
+	pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
 	if (pptr == NULL)
 		return 1;
 
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 32c69ed..9f2e167 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -402,7 +402,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	/*
 	 * IPv6 frags, only the first hit here.
 	 */
-	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
+	pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
 	if (pptr == NULL)
 		return NULL;
 
@@ -505,7 +505,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 	int unicast;
 #endif
 
-	pptr = skb_header_pointer(skb, iph->len, sizeof(_ports), _ports);
+	pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
 	if (pptr == NULL) {
 		ip_vs_service_put(svc);
 		return NF_DROP;
@@ -651,14 +651,6 @@ static inline int ip_vs_gather_frags(struct sk_buff *skb, u_int32_t user)
 	return err;
 }
 
-#ifdef CONFIG_IP_VS_IPV6
-static inline int ip_vs_gather_frags_v6(struct sk_buff *skb, u_int32_t user)
-{
-	/* TODO IPv6: Find out what to do here for IPv6 */
-	return 0;
-}
-#endif
-
 static int ip_vs_route_me_harder(int af, struct sk_buff *skb)
 {
 #ifdef CONFIG_IP_VS_IPV6
@@ -729,10 +721,22 @@ void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp,
 		    struct ip_vs_conn *cp, int inout)
 {
 	struct ipv6hdr *iph	 = ipv6_hdr(skb);
-	unsigned int icmp_offset = sizeof(struct ipv6hdr);
-	struct icmp6hdr *icmph	 = (struct icmp6hdr *)(skb_network_header(skb) +
-						      icmp_offset);
-	struct ipv6hdr *ciph	 = (struct ipv6hdr *)(icmph + 1);
+	unsigned int icmp_offset = 0;
+	unsigned int offs	 = 0; /* header offset*/
+	int protocol;
+	struct icmp6hdr *icmph;
+	struct ipv6hdr *ciph;
+	unsigned short fragoffs;
+
+	ipv6_find_hdr(skb, &icmp_offset, IPPROTO_ICMPV6, &fragoffs, NULL);
+	icmph = (struct icmp6hdr *)(skb_network_header(skb) + icmp_offset);
+	offs = icmp_offset + sizeof(struct icmp6hdr);
+	ciph = (struct ipv6hdr *)(skb_network_header(skb) + offs);
+
+	protocol = ipv6_find_hdr(skb, &offs, -1, &fragoffs, NULL);
+
+	if (!skb_make_writable(skb, offs + sizeof(__u32)))
+		return;
 
 	if (inout) {
 		iph->saddr = cp->vaddr.in6;
@@ -743,10 +747,13 @@ void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp,
 	}
 
 	/* the TCP/UDP/SCTP port */
-	if (IPPROTO_TCP == ciph->nexthdr || IPPROTO_UDP == ciph->nexthdr ||
-	    IPPROTO_SCTP == ciph->nexthdr) {
-		__be16 *ports = (void *)ciph + sizeof(struct ipv6hdr);
+	if (!fragoffs && (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol ||
+			  IPPROTO_SCTP == protocol)) {
+		__be16 *ports = (void *)(skb_network_header(skb) + offs);
 
+		IP_VS_DBG(11, "%s() changed port %d to %d\n", __func__,
+			      ntohs(inout ? ports[1] : ports[0]),
+			      ntohs(inout ? cp->vport : cp->dport));
 		if (inout)
 			ports[1] = cp->vport;
 		else
@@ -919,12 +926,12 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
 	union nf_inet_addr snet;
 
 	*related = 1;
-
-	ic = skb_header_pointer(skb, ipvsh->len, sizeof(_icmph), &_icmph);
+	ic = frag_safe_skb_hp(skb, ipvsh->len, sizeof(_icmph), &_icmph, ipvsh);
 	if (ic == NULL)
 		return NF_DROP;
 
-	IP_VS_DBG(12, "Outgoing ICMPv6 (%d,%d) %pI6c->%pI6c\n",
+	IP_VS_DBG(12, "Outgoing ICMPv6 %s(%d,%d) %pI6c->%pI6c\n",
+		  ipvsh->flags & IP6T_FH_F_FRAG ? "Fragment " : "",
 		  ic->icmp6_type, ntohs(icmpv6_id(ic)),
 		  &ipvsh->saddr, &ipvsh->daddr);
 
@@ -935,12 +942,15 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
 	 * this means that some packets will manage to get a long way
 	 * down this stack and then be rejected, but that's life.
 	 */
-	if ((ic->icmp6_type != ICMPV6_DEST_UNREACH) &&
-	    (ic->icmp6_type != ICMPV6_PKT_TOOBIG) &&
-	    (ic->icmp6_type != ICMPV6_TIME_EXCEED)) {
+	if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) {
 		*related = 0;
 		return NF_ACCEPT;
 	}
+	/* Fragment header that is before ICMP header tells us that:
+	 * it's not an error message since they can't be fragmented.
+	 */
+	if (ipvsh->flags & IP6T_FH_F_FRAG)
+		return NF_DROP;
 
 	/* Now find the contained IP header */
 	ipvsh->len += sizeof(_icmph);
@@ -1095,6 +1105,12 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 	ip_vs_fill_iph_skb(af, skb, &iph);
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
+		if (!iph.fragoffs && skb_nfct_reasm(skb)) {
+			struct sk_buff *reasm = skb_nfct_reasm(skb);
+			/* Save fw mark for coming frags */
+			reasm->ipvs_property = 1;
+			reasm->mark = skb->mark;
+		}
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
 			int verdict = ip_vs_out_icmp_v6(skb, &related,
@@ -1102,7 +1118,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 
 			if (related)
 				return verdict;
-			ip_vs_fill_iph_skb(af, skb, &iph);
 		}
 	} else
 #endif
@@ -1112,7 +1127,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 
 			if (related)
 				return verdict;
-			ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
 		}
 
 	pd = ip_vs_proto_data_get(net, iph.protocol);
@@ -1145,8 +1159,8 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 	     pp->protocol == IPPROTO_SCTP)) {
 		__be16 _ports[2], *pptr;
 
-		pptr = skb_header_pointer(skb, iph.len,
-					  sizeof(_ports), _ports);
+		pptr = frag_safe_skb_hp(skb, iph.len,
+					 sizeof(_ports), _ports, &iph);
 		if (pptr == NULL)
 			return NF_ACCEPT;	/* Not for me */
 		if (ip_vs_lookup_real_service(net, af, iph.protocol,
@@ -1432,20 +1446,21 @@ static int ip_vs_in_icmp_v6(struct sk_buff *skb, int *related,
 			    unsigned int hooknum, struct ip_vs_iphdr *iph)
 {
 	struct net *net = NULL;
+	const struct ipv6hdr _ip6h, *ip6h;
 	struct icmp6hdr	_icmph, *ic;
 	struct ip_vs_iphdr ciph;
 	struct ip_vs_conn *cp;
 	struct ip_vs_protocol *pp;
 	struct ip_vs_proto_data *pd;
-	unsigned int offset, verdict;
+	unsigned int offs_ciph, verdict;
 
 	*related = 1;
 
-	ic = skb_header_pointer(skb, iph->len, sizeof(_icmph), &_icmph);
+	ic = frag_safe_skb_hp(skb, iph->len, sizeof(_icmph), &_icmph, iph);
 	if (ic == NULL)
 		return NF_DROP;
 
-	IP_VS_DBG(12, "Incoming ICMPv6 (%d,%d) %pI6c->%pI6c\n",
+	IP_VS_DBG(12, "Incoming ICMPv6 %d(%d,%d) %pI6c->%pI6c\n", hooknum,
 		  ic->icmp6_type, ntohs(icmpv6_id(ic)),
 		  &iph->saddr, &iph->daddr);
 
@@ -1456,51 +1471,64 @@ static int ip_vs_in_icmp_v6(struct sk_buff *skb, int *related,
 	 * this means that some packets will manage to get a long way
 	 * down this stack and then be rejected, but that's life.
 	 */
-	if ((ic->icmp6_type != ICMPV6_DEST_UNREACH) &&
-	    (ic->icmp6_type != ICMPV6_PKT_TOOBIG) &&
-	    (ic->icmp6_type != ICMPV6_TIME_EXCEED)) {
+	if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) {
 		*related = 0;
 		return NF_ACCEPT;
 	}
+	/* Fragment header that is before ICMP header tells us that:
+	 * it's not an error message since they can't be fragmented.
+	 */
+	if (iph->flags & IP6T_FH_F_FRAG)
+		return NF_DROP;
 
 	/* Now find the contained IP header */
 	ciph.len = iph->len + sizeof(_icmph);
 	ciph.flags = 0;
 	ciph.fragoffs = 0;
+	offs_ciph = ciph.len;	/* Save ip header offset */
+	ip6h = skb_header_pointer(skb, ciph.len, sizeof(_ip6h),
+				 (void *)&_ip6h);
 	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
 				      &ciph.flags);
-	ciph.saddr = iph->saddr;	/* con_in_get() handles reverse order */
-	ciph.daddr = iph->daddr;
+	ciph.saddr.in6 = ip6h->saddr;	/* con_in_get() handles reverse order */
+	ciph.daddr.in6 = ip6h->daddr;
 
 	net = skb_net(skb);
 	pd = ip_vs_proto_data_get(net, ciph.protocol);
-	if (!pd)
-		return NF_ACCEPT;
-	pp = pd->pp;
 
-	/* Is the embedded protocol header present?
-	 * If it's the second or later fragment we don't know what it is
+	/* Is not the embedded protocol header present?
+	 * or it's the second or later fragment we don't know what it is
 	 * i.e. just let it through.
 	 */
-	if (ciph.fragoffs)
+	if (!pd || ciph.fragoffs)
 		return NF_ACCEPT;
+	pp = pd->pp;
 
-	offset = ciph.len;
-	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
+	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offs_ciph,
 		      "Checking incoming ICMPv6 for");
 
-	/* The embedded headers contain source and dest in reverse order */
-	cp = pp->conn_in_get(AF_INET6, skb, &ciph, 1);
+	/* The embedded headers contain source and dest in reverse order
+	 * if not from localhost
+	 */
+	cp = pp->conn_in_get(AF_INET6, skb, &ciph,
+			     (hooknum == NF_INET_LOCAL_OUT) ? 0 : 1);
+
 	if (!cp)
 		return NF_ACCEPT;
+	/* VS/TUN, VS/DR and LOCALNODE just let it go */
+	if ((hooknum == NF_INET_LOCAL_OUT) &&
+	    (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)) {
+		__ip_vs_conn_put(cp);
+		return NF_ACCEPT;
+	}
 
 	/* do the statistics and put it back */
 	ip_vs_in_stats(cp, skb);
 	if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
 	    IPPROTO_SCTP == ciph.protocol)
-		offset = ciph.len + (2 * sizeof(__u16));
+		offs_ciph = ciph.len;
 
-	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph);
+	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offs_ciph, hooknum, &ciph);
 
 	__ip_vs_conn_put(cp);
 
@@ -1562,6 +1590,12 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
+		if (!iph.fragoffs && skb_nfct_reasm(skb)) {
+			struct sk_buff *reasm = skb_nfct_reasm(skb);
+			/* Save fw mark for coming frags. */
+			reasm->ipvs_property = 1;
+			reasm->mark = skb->mark;
+		}
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
 			int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum,
@@ -1587,12 +1621,12 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	pp = pd->pp;
 	/*
 	 * Check if the packet belongs to an existing connection entry
-	 * Only sched first IPv6 fragment.
 	 */
 	cp = pp->conn_in_get(af, skb, &iph, 0);
-	if (unlikely(!cp) && !iph.fragoffs) {
+	if (unlikely(!cp)) {
 		int v;
 
+		/* Schedule and create new connection entry into &cp */
 		if (!pp->conn_schedule(af, skb, pd, &v, &cp, &iph))
 			return v;
 	}
@@ -1685,6 +1719,39 @@ ip_vs_local_request4(unsigned int hooknum, struct sk_buff *skb,
 #ifdef CONFIG_IP_VS_IPV6
 
 /*
+ * AF_INET6 fragment handling
+ * Copy info from first fragment, to the rest of them.
+ */
+static unsigned int
+ip_vs_preroute_frag6(unsigned int hooknum, struct sk_buff *skb,
+		     const struct net_device *in,
+		     const struct net_device *out,
+		     int (*okfn)(struct sk_buff *))
+{
+	struct ip_vs_iphdr iphdr  = { .len = 0, .flags = 0, };
+	struct sk_buff *reasm = skb_nfct_reasm(skb);
+	struct net *net;
+
+	/* Skip if not a "replay" from nf_ct_frag6_output or first fragment.
+	 * ipvs_property is set when checking first fragment
+	 * in ip_vs_in() and ip_vs_out().
+	 */
+	if (reasm)
+		IP_VS_DBG(2, "Fragment recv prop:%d\n", reasm->ipvs_property);
+	if (!reasm || !reasm->ipvs_property)
+		return NF_ACCEPT;
+
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
+	/* Copy stored fw mark, saved in ip_vs_{in,out} */
+	skb->mark = reasm->mark;
+
+	return NF_ACCEPT;
+}
+
+/*
  *	AF_INET6 handler in NF_INET_LOCAL_IN chain
  *	Schedule and forward packets from remote clients
  */
@@ -1823,6 +1890,14 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.priority	= 100,
 	},
 #ifdef CONFIG_IP_VS_IPV6
+	/* After mangle & nat fetch 2:nd fragment and following */
+	{
+		.hook		= ip_vs_preroute_frag6,
+		.owner		= THIS_MODULE,
+		.pf		= NFPROTO_IPV6,
+		.hooknum	= NF_INET_PRE_ROUTING,
+		.priority	= NF_IP6_PRI_NAT_DST + 1,
+	},
 	/* After packet filtering, change source only for VS/NAT */
 	{
 		.hook		= ip_vs_reply6,
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 925cca2..422b92f 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -497,7 +497,9 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 			skb->dev = net->loopback_dev;
 		}
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		/* only send ICMP too big on first fragment */
+		if (!iph->fragoffs)
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		dst_release(&rt->dst);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error;
@@ -667,7 +669,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	/* check if it is a connection of no-client-port */
-	if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) {
+	if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT && !iph->fragoffs)) {
 		__be16 _pt, *p;
 		p = skb_header_pointer(skb, iph->len, sizeof(_pt), &_pt);
 		if (p == NULL)
@@ -693,7 +695,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 		if (ct && !nf_ct_is_untracked(ct)) {
 			IP_VS_DBG_RL_PKT(10, AF_INET6, pp, skb, 0,
-					 "ip_vs_nat_xmit_v6(): "
+					 "ip_vs_nat_xmit_v6(): "\
 					 "stopping DNAT to local address");
 			goto tx_error_put;
 		}
@@ -717,7 +719,9 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 			skb->dev = net->loopback_dev;
 		}
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		/* only send ICMP too big on first fragment */
+		if (!iph->fragoffs)
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		IP_VS_DBG_RL_PKT(0, AF_INET6, pp, skb, 0,
 				 "ip_vs_nat_xmit_v6(): frag needed for");
 		goto tx_error_put;
@@ -952,7 +956,9 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 			skb->dev = net->loopback_dev;
 		}
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		/* only send ICMP too big on first fragment */
+		if (!ipvsh->fragoffs)
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error_put;
 	}
@@ -1118,7 +1124,9 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 			skb->dev = net->loopback_dev;
 		}
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		/* only send ICMP too big on first fragment */
+		if (!iph->fragoffs)
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		dst_release(&rt->dst);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error;
@@ -1356,7 +1364,9 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 			skb->dev = net->loopback_dev;
 		}
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+		/* only send ICMP too big on first fragment */
+		if (!iph->fragoffs)
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error_put;
 	}



* Re: [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS
  2012-08-20 13:08 [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2012-08-20 13:08 ` [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
@ 2012-08-21  5:24 ` Simon Horman
  2012-08-21  7:51   ` Jesper Dangaard Brouer
  3 siblings, 1 reply; 15+ messages in thread
From: Simon Horman @ 2012-08-21  5:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Julian Anastasov, Wensong Zhang, netfilter-devel,
	Pablo Neira Ayuso

On Mon, Aug 20, 2012 at 03:08:30PM +0200, Jesper Dangaard Brouer wrote:
> The following patchset implement IPv6 fragment handling for IPVS.
> 
> This work is based upon patches from Hans Schillstrom.  I have taken
> over the patchset, in close agreement with Hans, because he don't have
> (gotten allocated) time to complete his work.
> 
> I have cleaned up the patchset, changed the API a bit, fixed a refcnt
> bug, and rebased on top of Julians recent changes. (All with Hans'es
> knowledge)
> 
>  Patch01: is just unrelated trivial fixes.
> 
>  Patch02: Fix faulty IPv6 extension header handling in IPVS
> 
>  Patch03: Complete IPv6 fragment handling for IPVS
> 
> This patchset is based upon:
>  Homes ipvs-next tree:
>   git://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git
> 
>  On top of commit 3654e61137db891f5312e6dd813b961484b5fdf3:
>   ipvs: add pmtu_disc option to disable IP DF for TUN packets

I have no objection to these changes, but I would be more comfortable
applying them after a review from Hans, Julian or Pablo.

> 
> ---
> 
> Jesper Dangaard Brouer (3):
>       ipvs: Complete IPv6 fragment handling for IPVS
>       ipvs: Fix faulty IPv6 extension header handling in IPVS
>       ipvs: Trivial changes, use compressed IPv6 address in output
> 
> 
>  include/net/ip_vs.h                     |  191 +++++++++++----
>  net/netfilter/ipvs/Kconfig              |    7 -
>  net/netfilter/ipvs/ip_vs_conn.c         |   15 -
>  net/netfilter/ipvs/ip_vs_core.c         |  384 +++++++++++++++++--------------
>  net/netfilter/ipvs/ip_vs_dh.c           |    2 
>  net/netfilter/ipvs/ip_vs_lblc.c         |    2 
>  net/netfilter/ipvs/ip_vs_lblcr.c        |    2 
>  net/netfilter/ipvs/ip_vs_pe_sip.c       |   27 ++
>  net/netfilter/ipvs/ip_vs_proto.c        |    6 
>  net/netfilter/ipvs/ip_vs_proto_ah_esp.c |    9 -
>  net/netfilter/ipvs/ip_vs_proto_sctp.c   |   42 +--
>  net/netfilter/ipvs/ip_vs_proto_tcp.c    |   40 +--
>  net/netfilter/ipvs/ip_vs_proto_udp.c    |   41 +--
>  net/netfilter/ipvs/ip_vs_sched.c        |    2 
>  net/netfilter/ipvs/ip_vs_sh.c           |    2 
>  net/netfilter/ipvs/ip_vs_xmit.c         |   75 +++---
>  net/netfilter/xt_ipvs.c                 |    4 
>  17 files changed, 489 insertions(+), 362 deletions(-)
> 
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
> 


* Re: [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS
  2012-08-21  5:24 ` [PATCH 0/3] ipvs: " Simon Horman
@ 2012-08-21  7:51   ` Jesper Dangaard Brouer
  2012-08-22  6:42     ` Simon Horman
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-21  7:51 UTC (permalink / raw)
  To: Simon Horman, Julian Anastasov
  Cc: netdev, Patrick McHardy, Hans Schillstrom, LVS devel,
	Wensong Zhang, netfilter-devel, Pablo Neira Ayuso

On Tue, 2012-08-21 at 14:24 +0900, Simon Horman wrote:
> I have no objection to these changes, but I would be more comfortable
> applying them after a review from Hans, Julian or Pablo.
> 
I would appreciate a review, especially from Julian.

I'm going to do some more extensive testing on the patchset this
week.  So there is no hurry in applying these; we have time for a good
review process.




* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-20 13:08 ` [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
@ 2012-08-21 14:14   ` Julian Anastasov
  2012-08-23 12:50     ` Jesper Dangaard Brouer
  2012-08-26 21:13   ` Patrick McHardy
  1 sibling, 1 reply; 15+ messages in thread
From: Julian Anastasov @ 2012-08-21 14:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Simon Horman, Wensong Zhang, netfilter-devel


	Hello,

On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:

> Based on patch from: Hans Schillstrom
> 
> IPv6 headers must be processed in order of appearance,
> neither can it be assumed that Upper layer headers is first.
> If anything else than L4 is the first header IPVS will throw it.
> 
> IPVS will write SNAT & DNAT modifications at a fixed pos which
> will corrupt the message. Proper header position must be found
> before writing modifying packet.
> 
> This patch contains a lot of API changes.  This is done, to avoid
> the costly scan of finding the IPv6 headers, via ipv6_find_hdr().
> Finding the IPv6 headers is done as early as possible, and passed
> on as a pointer "struct ip_vs_iphdr *" to the affected functions.
> 
> Notice, I have choosen, not to change the API of function
> pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
> used by external schedulers, via {un,}register_ip_vs_scheduler.
> Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
> they do, they are only interested in iph->{s,d}addr.
> 
> This patch depends on commit 84018f55a:
>  "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
> 
> This also adds a dependency to ip6_tables.
> 
> Hans left some questions in ip_vs_pe_sip.c, which I'm uncertain about.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Hans Schillstrom <hans@schillstrom.com>

	Patch 1 looks OK; the following are some small comments
for patches 2 and 3.

> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +#endif

	There is already #if IS_ENABLED(CONFIG_IPV6), which
can replace #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

	It seems we need IS_ENABLED in many places, e.g. for
CONFIG_NF_CONNTRACK and CONFIG_NF_DEFRAG_IPV6.
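The following is a standalone re-creation of the placeholder trick behind IS_ENABLED() in <linux/kconfig.h>. It shows one test covering both the built-in (=y) and the module (=m) case. This is a sketch of the idea, not a copy of the kernel header:

- the macros are renamed (DEMO_*);
- they take one extra dummy argument so they stay strictly C99-conforming;
- the CONFIG_* settings are invented for the example.

```c
/* Invented Kconfig output: IPV6=y, NF_CONNTRACK=m, NF_DEFRAG_IPV6 unset. */
#define CONFIG_IPV6 1
#define CONFIG_NF_CONNTRACK_MODULE 1

/* If cfg expands to 1, DEMO_PLACEHOLDER_1 injects "0," and shifts the
 * 1 into the "val" slot; otherwise the junk token absorbs the 1. */
#define DEMO_PLACEHOLDER_1 0,
#define demo_enabled(cfg) demo_enabled_1(cfg)
#define demo_enabled_1(value) demo_enabled_2(DEMO_PLACEHOLDER_##value)
#define demo_enabled_2(arg1_or_junk) demo_enabled_3(arg1_or_junk 1, 0, 0)
#define demo_enabled_3(ignored, val, ...) val

/* True for both =y (CONFIG_X) and =m (CONFIG_X_MODULE). */
#define DEMO_IS_ENABLED(option) \
	(demo_enabled(option) || demo_enabled(option##_MODULE))

#if DEMO_IS_ENABLED(CONFIG_IPV6)	/* replaces defined(X) || defined(X_MODULE) */
#define DEMO_HAVE_IPV6 1
#else
#define DEMO_HAVE_IPV6 0
#endif

/* The macro also works as an ordinary integer constant expression. */
static const int conntrack_on = DEMO_IS_ENABLED(CONFIG_NF_CONNTRACK);
static const int defrag_on = DEMO_IS_ENABLED(CONFIG_NF_DEFRAG_IPV6);
```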

> @@ -958,34 +943,26 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
>  	}
>  
>  	/* Now find the contained IP header */
> -	offset += sizeof(_icmph);
> -	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
> -	if (cih == NULL)
> -		return NF_ACCEPT; /* The packet looks wrong, ignore */
> +	ipvsh->len += sizeof(_icmph);
> +	ip6 = skb_header_pointer(skb, ipvsh->len, sizeof(_ip6), &_ip6);

	ip6 is not checked for NULL here; do we rely on the
checks in ipv6_find_hdr?
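Here is a simplified sketch of the guard in question. `flat_header_pointer()` is a hypothetical stand-in for skb_header_pointer() on a flat buffer. Like the real helper, it returns NULL when the requested bytes lie beyond the end of the packet. That is exactly the case of an ICMPv6 error quoting a truncated IPv6 header. The fixed header lengths assume no extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN	40
#define ICMPV6_HDR_LEN	8

/* Hypothetical flat-buffer model of skb_header_pointer(). */
static const void *flat_header_pointer(const uint8_t *pkt, size_t pkt_len,
				       size_t offset, size_t len, void *buffer)
{
	(void)buffer;	/* flat data never needs the copy-out buffer */
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	return pkt + offset;
}

/* Next-header byte of the IPv6 header quoted in an ICMPv6 error,
 * or -1 if the quoted header is truncated. */
static int quoted_ipv6_nexthdr(const uint8_t *pkt, size_t pkt_len,
			       size_t icmp_off)
{
	uint8_t buf[IPV6_HDR_LEN];
	const uint8_t *ip6;

	ip6 = flat_header_pointer(pkt, pkt_len, icmp_off + ICMPV6_HDR_LEN,
				  IPV6_HDR_LEN, buf);
	if (ip6 == NULL)
		return -1;	/* truncated quote: do not read saddr/daddr */
	return ip6[6];		/* nexthdr field of the quoted IPv6 header */
}
```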

> @@ -1506,39 +1464,43 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
>  	}
>  
>  	/* Now find the contained IP header */
> -	offset += sizeof(_icmph);
> -	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
> -	if (cih == NULL)
> -		return NF_ACCEPT; /* The packet looks wrong, ignore */
> +	ciph.len = iph->len + sizeof(_icmph);
> +	ciph.flags = 0;
> +	ciph.fragoffs = 0;
> +	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
> +				      &ciph.flags);
> +	ciph.saddr = iph->saddr;	/* con_in_get() handles reverse order */
> +	ciph.daddr = iph->daddr;

	The ciph initialization looks dangerous if one day
we add a new field to the header struct.

	Can we use ciph = (struct ip_vs_iphdr) { .XXX = val, ... }?
In that case we have to call ipv6_find_hdr outside (after)
this initialization. Of course, we will then write twice to
small fields such as protocol, len, fragoffs and flags.
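Here is a sketch of the suggested compound-literal style. It uses a trimmed, hypothetical `struct demo_iphdr`; the real ip_vs_iphdr uses union nf_inet_addr for the addresses. A fake `demo_find_hdr()` stands in for ipv6_find_hdr(). Members not named in the literal are zeroed, including any added to the struct later (`new_field` here). The header lookup must run after the literal, or the literal would overwrite its results.

```c
#include <stdint.h>
#include <string.h>

struct demo_iphdr {
	int len;		/* offset of the (inner) L4 header */
	uint16_t fragoffs;
	int16_t protocol;
	int flags;
	uint8_t saddr[16];
	uint8_t daddr[16];
	int new_field;		/* imagine this is added by a later patch */
};

/* Fake ipv6_find_hdr(): pretends one inner IPv6 header precedes TCP. */
static int16_t demo_find_hdr(int *offset, uint16_t *fragoffs, int *flags)
{
	*offset += 40;
	*fragoffs = 0;
	*flags = 0;
	return 6;
}

static struct demo_iphdr fill_inner_iphdr(int ciph_start,
					  const uint8_t saddr[16],
					  const uint8_t daddr[16])
{
	struct demo_iphdr ciph;

	/* Every member not named here, new_field included, becomes zero. */
	ciph = (struct demo_iphdr) { .len = ciph_start };
	memcpy(ciph.saddr, saddr, sizeof(ciph.saddr));
	memcpy(ciph.daddr, daddr, sizeof(ciph.daddr));

	/* Must come after the literal; len, protocol, fragoffs and flags
	 * are therefore written twice. */
	ciph.protocol = demo_find_hdr(&ciph.len, &ciph.fragoffs, &ciph.flags);
	return ciph;
}
```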

	Also, ipv6_find_hdr looks a bit noisy for a missing header;
can that be a problem for the inner IPv6 header in ICMP messages?
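To make the "missing header" case concrete, here is a simplified, hypothetical relative of ipv6_find_hdr() for a flat buffer. It walks the extension-header chain starting at a given IPv6 header; for an ICMPv6 error, that would be the quoted inner header. It stops at a non-first fragment. It returns -1 when the chain runs past the end of the data. Unlike the kernel function, it has no target argument, no flags output and no logging.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA next-header values. */
enum { NH_HOP = 0, NH_TCP = 6, NH_ROUTING = 43, NH_FRAGMENT = 44,
       NH_AH = 51, NH_DSTOPTS = 60 };

/* Returns the upper-layer protocol and its offset, with the fragment
 * offset in bytes in *fragoffs, or -1 if the chain is truncated. */
static int walk_ipv6_hdrs(const uint8_t *pkt, size_t len, size_t start,
			  size_t *offset, uint16_t *fragoffs)
{
	size_t off = start + 40;	/* skip the fixed IPv6 header */
	uint8_t nh;

	if (len < start + 40)
		return -1;
	nh = pkt[start + 6];
	*fragoffs = 0;

	for (;;) {
		size_t hlen;

		switch (nh) {
		case NH_HOP:
		case NH_ROUTING:
		case NH_DSTOPTS:
			if (len < off + 2)
				return -1;
			hlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case NH_AH:
			if (len < off + 2)
				return -1;
			hlen = ((size_t)pkt[off + 1] + 2) * 4;
			break;
		case NH_FRAGMENT:
			if (len < off + 8)
				return -1;
			*fragoffs = (uint16_t)(((pkt[off + 2] << 8) |
						pkt[off + 3]) & 0xFFF8);
			if (*fragoffs) {
				/* Later fragment: no upper-layer header here. */
				*offset = off + 8;
				return pkt[off];
			}
			hlen = 8;
			break;
		default:
			*offset = off;	/* upper-layer header reached */
			return nh;
		}
		nh = pkt[off];		/* next-header byte of this ext header */
		off += hlen;
	}
}
```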

	In patch 3, ip_vs_in_icmp_v6 initializes ciph in the
same way. It will be difficult to audit the code later
considering the large number of places where iph is used.

>  	net = skb_net(skb);
> -	pd = ip_vs_proto_data_get(net, cih->nexthdr);
> +	pd = ip_vs_proto_data_get(net, ciph.protocol);
>  	if (!pd)
>  		return NF_ACCEPT;
>  	pp = pd->pp;
>  
> -	/* Is the embedded protocol header present? */
> -	/* TODO: we don't support fragmentation at the moment anyways */
> -	if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
> +	/* Is the embedded protocol header present?
> +	 * If it's the second or later fragment we don't know what it is
> +	 * i.e. just let it through.
> +	 */
> +	if (ciph.fragoffs)
>  		return NF_ACCEPT;
>  
> +	offset = ciph.len;
>  	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
>  		      "Checking incoming ICMPv6 for");
>  
> -	offset += sizeof(struct ipv6hdr);
> -
> -	ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
>  	/* The embedded headers contain source and dest in reverse order */
> -	cp = pp->conn_in_get(AF_INET6, skb, &ciph, offset, 1);
> +	cp = pp->conn_in_get(AF_INET6, skb, &ciph, 1);
>  	if (!cp)
>  		return NF_ACCEPT;
>  
>  	/* do the statistics and put it back */
>  	ip_vs_in_stats(cp, skb);
> -	if (IPPROTO_TCP == cih->nexthdr || IPPROTO_UDP == cih->nexthdr ||
> -	    IPPROTO_SCTP == cih->nexthdr)
> -		offset += 2 * sizeof(__u16);
> -	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum);
> +	if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
> +	    IPPROTO_SCTP == ciph.protocol)
> +		offset = ciph.len + (2 * sizeof(__u16));

	Still in the same function: the above code is correct, but
patch 3 changes it back to the wrong state (offs_ciph = ciph.len).
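Here is the arithmetic behind the correct offset. The example assumes an ICMPv6 error quoting a TCP/UDP/SCTP packet with no extension headers. ciph.len ends up at the start of the embedded L4 header. We believe ip_vs_icmp_xmit_v6() passes the offset to skb_make_writable(). If so, it must also cover the two 16-bit ports it rewrites. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN	40
#define ICMPV6_HDR_LEN	8
#define L4_PORTS_LEN	(2 * sizeof(uint16_t))	/* source + dest port */

/* ciph.len after ipv6_find_hdr(): outer IPv6 + ICMPv6 + inner IPv6. */
static size_t embedded_l4_offset(void)
{
	return IPV6_HDR_LEN + ICMPV6_HDR_LEN + IPV6_HDR_LEN;
}

/* Offset to hand on: the L4 start plus both ports (patch 2's version).
 * Passing just ciph.len, as patch 3 does, stops short of the ports. */
static size_t icmp_xmit_offset(size_t l4_off)
{
	return l4_off + L4_PORTS_LEN;
}
```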

> +
> +	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph);
>  
>  	__ip_vs_conn_put(cp);

> diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
> index 1aa5cac..bb28b4f 100644
> --- a/net/netfilter/ipvs/ip_vs_pe_sip.c
> +++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
> @@ -68,26 +68,37 @@ static int get_callid(const char *dptr, unsigned int dataoff,
>  static int
>  ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
>  {
> +	struct sk_buff *reasm = skb_nfct_reasm(skb);
>  	struct ip_vs_iphdr iph;
>  	unsigned int dataoff, datalen, matchoff, matchlen;
>  	const char *dptr;
>  	int retc;
>  
> -	ip_vs_fill_iphdr(p->af, skb_network_header(skb), &iph);
> +	ip_vs_fill_iph_skb(p->af, skb, &iph);

	Maybe skb_linearize is bad for IPv6? IIRC,
ip_vs_pe_sip.c needs access just to read the Call-ID.
For IPv4 it was simple to use skb_linearize; maybe
the logic should be improved to read the values even
from non-linear data. Maybe there is already some
example code for this. For IPv6 I'm not sure what
kind of problems there are here; maybe it depends on whether
we try to call skb_linearize for the reasm packet?
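One way to read the Call-ID without linearizing is to copy a bounded window of the payload into a small buffer. In the kernel, skb_copy_bits() or skb_header_pointer() would do that copy. The sketch below models a non-linear packet as an array of segments. `struct seg`, `copy_bits()` and `find_callid()` are hypothetical. The search ignores SIP's compact form (`i:`) and case-insensitive header names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-linear packet: payload split across segments. */
struct seg {
	const char *data;
	size_t len;
};

/* Copy len bytes from logical offset off into dst, crossing segment
 * boundaries; returns 0 on success, -1 if the packet is too short. */
static int copy_bits(const struct seg *segs, size_t nsegs, size_t off,
		     char *dst, size_t len)
{
	for (size_t i = 0; i < nsegs && len; i++) {
		if (off >= segs[i].len) {
			off -= segs[i].len;
			continue;
		}
		size_t n = segs[i].len - off;
		if (n > len)
			n = len;
		memcpy(dst, segs[i].data + off, n);
		dst += n;
		len -= n;
		off = 0;
	}
	return len ? -1 : 0;
}

/* Copy up to cap - 1 payload bytes into buf, NUL-terminate, and look
 * for the Call-ID header there. */
static const char *find_callid(const struct seg *segs, size_t nsegs,
			       size_t total, char *buf, size_t cap)
{
	size_t n = total < cap - 1 ? total : cap - 1;

	if (copy_bits(segs, nsegs, 0, buf, n) < 0)
		return NULL;
	buf[n] = '\0';
	return strstr(buf, "Call-ID:");
}
```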

> -
> -	if ((retc=skb_linearize(skb)) < 0)
> +	/*
> +	 * todo: Check if this will mess-up the reasm skb !!! /HS
> +	 */
> +	retc = skb_linearize(reasm);
> +	if (retc < 0)
>  		return retc;
> -	dptr = skb->data + dataoff;
> -	datalen = skb->len - dataoff;
> +	dptr = reasm->data + dataoff;
> +	datalen = reasm->len - dataoff;
>  
>  	if (get_callid(dptr, dataoff, datalen, &matchoff, &matchlen))
>  		return -EINVAL;

	There are recent changes for IPv6 from Patrick McHardy:

http://marc.info/?l=netfilter-devel&m=134543406303402&w=2
http://marc.info/?l=netfilter-devel&m=134543407803412&w=2

	In this context, maybe we will soon modify ip_vs_ftp to
support IPv6. It is possible to work with the present FTP helper
in netfilter. Does it mean that we will see only reassembled
packets, and no original fragments, when conntrack is running?
Are we prepared to work both ways (originals+reasm and
just reasm)?

- For patch 3, do we need 'select NF_DEFRAG_IPV6' in Kconfig?

	Basically, I'm concerned about what will happen when we
start to mangle the protocol payloads (FTP). For now
we are safe because we touch only addresses and ports. Maybe
we have to synchronize these changes with the work from
Patrick.

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS
  2012-08-21  7:51   ` Jesper Dangaard Brouer
@ 2012-08-22  6:42     ` Simon Horman
  0 siblings, 0 replies; 15+ messages in thread
From: Simon Horman @ 2012-08-22  6:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Julian Anastasov, netdev, Patrick McHardy, Hans Schillstrom,
	LVS devel, Wensong Zhang, netfilter-devel, Pablo Neira Ayuso

On Tue, Aug 21, 2012 at 09:51:27AM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 2012-08-21 at 14:24 +0900, Simon Horman wrote:
> > I have no objection to these changes, but I would be more comfortable
> > applying them after a review from Hans, Julian or Pablo.
> > 
> I would appreciate a review, especially from Julian.
> 
> I'm going to do some more extensive testing on the patchset, in this
> week.  So, no hurry on applying these, we have time for a good review
> process.

Thanks Jesper, and Julian and Hans who also responded.
Let me know when you feel the series is ready to merge.
I am also in no hurry.


* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-21 14:14   ` Julian Anastasov
@ 2012-08-23 12:50     ` Jesper Dangaard Brouer
  2012-08-23 16:06       ` Julian Anastasov
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-23 12:50 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Simon Horman, Wensong Zhang, netfilter-devel


On Tue, 2012-08-21 at 17:14 +0300, Julian Anastasov wrote:
> On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:
> 
> > Based on patch from: Hans Schillstrom
> > 
> > IPv6 headers must be processed in order of appearance, and it
> > cannot be assumed that the upper-layer header comes first.
> > If anything other than an L4 header comes first, IPVS will throw it away.
> > 
> > IPVS writes SNAT & DNAT modifications at a fixed position, which
> > corrupts the message. The proper header position must be found
> > before modifying the packet.
> > 
> > This patch contains a lot of API changes.  This is done to avoid
> > the costly scan for the IPv6 headers via ipv6_find_hdr().
> > Finding the IPv6 headers is done as early as possible, and passed
> > on as a pointer "struct ip_vs_iphdr *" to the affected functions.
> > 
> > Notice that I have chosen not to change the API of the function
> > pointer "(*schedule)" (in struct ip_vs_scheduler), as it can be
> > used by external schedulers via {un,}register_ip_vs_scheduler.
> > Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
> > they do, they are only interested in iph->{s,d}addr.
> > 
> > This patch depends on commit 84018f55a:
> >  "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
> > 
> > This also adds a dependency to ip6_tables.
> > 
> > Hans left some questions in ip_vs_pe_sip.c, which I'm uncertain about.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
> 
> 	Patch 1 looks ok, following are some small comments
> for patch 2 and 3.
> 
> > +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> > +#include <linux/netfilter_ipv6/ip6_tables.h>
> > +#endif
> 
> 	There is already #if IS_ENABLED(CONFIG_IPV6) that
> can replace #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Okay.

> 	It seems we need IS_ENABLED for many places:
> CONFIG_NF_CONNTRACK, CONFIG_NF_DEFRAG_IPV6

I wonder if we should move these cleanup changes to a separate patch.


> > @@ -958,34 +943,26 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
> >  	}
> >  
> >  	/* Now find the contained IP header */
> > -	offset += sizeof(_icmph);
> > -	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
> > -	if (cih == NULL)
> > -		return NF_ACCEPT; /* The packet looks wrong, ignore */
> > +	ipvsh->len += sizeof(_icmph);
> > +	ip6 = skb_header_pointer(skb, ipvsh->len, sizeof(_ip6), &_ip6);
> 
> 	ip6 is not checked here for NULL or we rely on
> ipv6_find_hdr checks?

Good catch, I'll re-add the NULL pointer check.



> > @@ -1506,39 +1464,43 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
> >  	}
> >  
> >  	/* Now find the contained IP header */
> > -	offset += sizeof(_icmph);
> > -	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
> > -	if (cih == NULL)
> > -		return NF_ACCEPT; /* The packet looks wrong, ignore */
> > +	ciph.len = iph->len + sizeof(_icmph);
> > +	ciph.flags = 0;
> > +	ciph.fragoffs = 0;
> > +	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
> > +				      &ciph.flags);

(notice that &ciph.len can get updated by ipv6_find_hdr())

> > +	ciph.saddr = iph->saddr;	/* conn_in_get() handles reverse order */
> > +	ciph.daddr = iph->daddr;
> 
> 	The ciph initialization looks dangerous if one day
> we add new field into the header.
> 
> 	Can we use ciph = (struct ip_vs_iphdr) { .XXX = val, ... },
> in which case we have to call ipv6_find_hdr out of (after)
> this initialization? Of course, we will write twice to
> small fields such as protocol, len, fragoffs, flags

I'm not sure I follow/understand.


> 	Also ipv6_find_hdr looks a bit noisy for missing header,
> can it be a problem for the inner IPv6 header in ICMP messages?

I can see that I don't handle the error case of missing headers
from a call to ipv6_find_hdr()... I guess I need to check whether
the returned "protocol" value is negative.  But I'm not sure if it
matters in this case (Hans?)
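For illustration, here is a simplified user-space sketch of what an ipv6_find_hdr()-style walk does when asked for the upper-layer header. It advances the offset past each extension header (which is why ciph.len changes), reports a non-zero fragment offset for non-first fragments, and returns a negative value when a header is truncated or missing. It is not the kernel implementation: find_l4(), the NH_* names and the flat byte buffer are hypothetical stand-ins for the skb-based code, and the real function also handles a flags argument and other details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the IPv6 Next Header values used below. */
enum {
	NH_HOP = 0, NH_ROUTING = 43, NH_FRAGMENT = 44,
	NH_AUTH = 51, NH_NONE = 59, NH_DEST = 60,
};

#define IPV6_HDR_LEN 40	/* fixed IPv6 header */

/*
 * Walk the extension-header chain of the IPv6 packet in buf[0..len).
 * Returns the upper-layer protocol number and sets *thoff to its offset;
 * *fragoff is non-zero for a non-first fragment, in which case no L4
 * header is present at *thoff.  Returns -1 for a truncated header or
 * a chain ending in "No Next Header".
 */
static int find_l4(const uint8_t *buf, size_t len,
		   size_t *thoff, unsigned int *fragoff)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nh;

	assert(buf != NULL && thoff != NULL && fragoff != NULL);
	*fragoff = 0;
	if (len < IPV6_HDR_LEN)
		return -1;
	nh = buf[6];		/* Next Header field of the fixed header */

	for (;;) {
		size_t hlen;

		switch (nh) {
		case NH_HOP:
		case NH_ROUTING:
		case NH_DEST:		/* length in 8-octet units, minus 1 */
			if (off + 2 > len)
				return -1;
			hlen = ((size_t)buf[off + 1] + 1) * 8;
			break;
		case NH_AUTH:		/* length in 4-octet units, minus 2 */
			if (off + 2 > len)
				return -1;
			hlen = ((size_t)buf[off + 1] + 2) * 4;
			break;
		case NH_FRAGMENT:	/* fixed 8 bytes */
			if (off + 8 > len)
				return -1;
			hlen = 8;
			*fragoff = (((unsigned int)buf[off + 2] << 8) |
				    buf[off + 3]) & 0xFFF8;
			if (*fragoff) {	/* non-first fragment: stop here */
				*thoff = off + hlen;
				return buf[off];
			}
			break;
		case NH_NONE:
			return -1;
		default:		/* upper-layer header reached */
			*thoff = off;
			return nh;
		}
		if (off + hlen > len)
			return -1;
		nh = buf[off];		/* Next Header of this extension */
		off += hlen;
	}
}
```

A caller in the style of ip_vs_in_icmp_v6() would treat a negative return as "the packet looks wrong" and return NF_ACCEPT, and would skip any port handling when the fragment offset is non-zero.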


> 	In patch 3 ip_vs_in_icmp_v6 initializes ciph in the
> same way. It will be difficult to audit the code later
> considering the large number of places where iph is used.

I'm not sure what you want me to do?


> >  	net = skb_net(skb);
> > -	pd = ip_vs_proto_data_get(net, cih->nexthdr);
> > +	pd = ip_vs_proto_data_get(net, ciph.protocol);
> >  	if (!pd)
> >  		return NF_ACCEPT;
> >  	pp = pd->pp;
> >  
> > -	/* Is the embedded protocol header present? */
> > -	/* TODO: we don't support fragmentation at the moment anyways */
> > -	if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
> > +	/* Is the embedded protocol header present?
> > +	 * If it's the second or later fragment we don't know what it is
> > +	 * i.e. just let it through.
> > +	 */
> > +	if (ciph.fragoffs)
> >  		return NF_ACCEPT;
> >  
> > +	offset = ciph.len;
> >  	IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
> >  		      "Checking incoming ICMPv6 for");
> >  
> > -	offset += sizeof(struct ipv6hdr);
> > -
> > -	ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
> >  	/* The embedded headers contain source and dest in reverse order */
> > -	cp = pp->conn_in_get(AF_INET6, skb, &ciph, offset, 1);
> > +	cp = pp->conn_in_get(AF_INET6, skb, &ciph, 1);
> >  	if (!cp)
> >  		return NF_ACCEPT;
> >  
> >  	/* do the statistics and put it back */
> >  	ip_vs_in_stats(cp, skb);
> > -	if (IPPROTO_TCP == cih->nexthdr || IPPROTO_UDP == cih->nexthdr ||
> > -	    IPPROTO_SCTP == cih->nexthdr)
> > -		offset += 2 * sizeof(__u16);
> > -	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum);
> > +	if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
> > +	    IPPROTO_SCTP == ciph.protocol)
> > +		offset = ciph.len + (2 * sizeof(__u16));
> 
> 	Still in the same func, above code is correct but
> patch 3 changes it back to wrong state (offs_ciph = ciph.len).

I don't think it's a "bug" in patch 3; it might be a "bug" in this patch,
because &ciph.len gets updated earlier by ipv6_find_hdr()... but I'm
starting to get confused.
Perhaps we should move these changes to ip_vs_in_icmp_v6() into the same
patch?

> > +
> > +	verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum, &ciph);
> >  
> >  	__ip_vs_conn_put(cp);
> 
> > diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
> > index 1aa5cac..bb28b4f 100644
> > --- a/net/netfilter/ipvs/ip_vs_pe_sip.c
> > +++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
> > @@ -68,26 +68,37 @@ static int get_callid(const char *dptr, unsigned int dataoff,
> >  static int
> >  ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
> >  {
> > +	struct sk_buff *reasm = skb_nfct_reasm(skb);
> >  	struct ip_vs_iphdr iph;
> >  	unsigned int dataoff, datalen, matchoff, matchlen;
> >  	const char *dptr;
> >  	int retc;
> >  
> > -	ip_vs_fill_iphdr(p->af, skb_network_header(skb), &iph);
> > +	ip_vs_fill_iph_skb(p->af, skb, &iph);
> 
> 	Maybe skb_linearize is bad for IPv6? IIRC,
> ip_vs_pe_sip.c needs access just to read the Call-ID.
> For IPv4 it was simple to use skb_linearize; maybe
> the logic should be improved to read the values even
> from non-linear data. Maybe there is already some
> example code for this. For IPv6 I'm not sure what
> kind of problems there are here; maybe it depends on whether
> we try to call skb_linearize for the reasm packet?

I'm not able to answer these questions myself...
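Julian's idea of reading the Call-ID from non-linear data, instead of calling skb_linearize(), is what skb_header_pointer() does in the kernel: it returns a pointer into the packet when the requested bytes are contiguous and otherwise copies them into a caller-supplied buffer. Below is a simplified user-space model of that idea over a list of segments. struct seg and span_pointer() are hypothetical, and this makes no claim about how ip_vs_pe_sip.c should actually be changed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a non-linear packet: data split into segments. */
struct seg {
	const char *data;
	size_t len;
};

/*
 * Return a pointer to len bytes starting at offset off.  If the range
 * lies inside a single segment, point straight into it; otherwise copy
 * the bytes into buf (which must hold len bytes).  Returns NULL if the
 * range runs past the end of the packet.
 */
static const char *span_pointer(const struct seg *segs, size_t nsegs,
				size_t off, size_t len, char *buf)
{
	size_t i, copied = 0;

	assert(len > 0);
	/* Find the segment that contains the first requested byte. */
	for (i = 0; i < nsegs && off >= segs[i].len; i++)
		off -= segs[i].len;
	if (i == nsegs)
		return NULL;

	if (off + len <= segs[i].len)
		return segs[i].data + off;	/* contiguous: no copy */

	/* Range spans segments: gather it into the caller's buffer. */
	for (; i < nsegs && copied < len; i++) {
		size_t n = segs[i].len - off;

		if (n > len - copied)
			n = len - copied;
		memcpy(buf + copied, segs[i].data + off, n);
		copied += n;
		off = 0;
	}
	return copied == len ? buf : NULL;
}
```

With this approach a parser only pays for a copy when the bytes it needs actually straddle a segment boundary; the rest of the (possibly reassembled) packet is left untouched.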
 

> > -
> > -	if ((retc=skb_linearize(skb)) < 0)
> > +	/*
> > +	 * todo: Check if this will mess-up the reasm skb !!! /HS
> > +	 */
> > +	retc = skb_linearize(reasm);
> > +	if (retc < 0)
> >  		return retc;
> > -	dptr = skb->data + dataoff;
> > -	datalen = skb->len - dataoff;
> > +	dptr = reasm->data + dataoff;
> > +	datalen = reasm->len - dataoff;
> >  
> >  	if (get_callid(dptr, dataoff, datalen, &matchoff, &matchlen))
> >  		return -EINVAL;
> 
> 	There are recent changes for IPv6 from Patrick McHardy:
> 
> http://marc.info/?l=netfilter-devel&m=134543406303402&w=2
> http://marc.info/?l=netfilter-devel&m=134543407803412&w=2
> 
> 	In this context, maybe soon we will modify ip_vs_ftp to
> support IPv6. It is possible to work with the present FTP helper
> in netfilter. Does it mean that we will see only reassembled
> packets when conntrack is running? No original fragments.

Yes, it seems that we will *only* see the reassembled packet (no
original fragments) after Patrick's patches, BUT *only* when the
nf_conntrack_ipv6 module is loaded.


> Are we prepared to work in both ways (originals+reasm and
> just reasm) ?

As mentioned in another thread, no, but only because the reassembled
packet will be dropped by the MTU check. Once that is fixed, the
IPVS code seems to work (both with and without these patches).


> - For patch 3 in Kconfig do we need 'select NF_DEFRAG_IPV6' ?

Yes, but it is still possible not to load the module nf_defrag_ipv6.

When nf_defrag_ipv6 is not loaded, these patches have no effect, and no
fragments are passed through.


> 	Basically, I'm concerned about what will happen when we
> start to mangle the protocol payloads (FTP). For now
> we are safe by touching only addresses and ports. Maybe
> we have to synchronize these changes with the work from
> Patrick.

Yes, I think we should take Patrick's work into account.

My biggest concern is that, depending on which modules (nf_defrag_ipv6
only, or also nf_conntrack_ipv6) are loaded, different code paths are
used (to support IPv6 fragments for IPVS).
This will be hard to understand from a user's perspective, and also
difficult for us when users report bugs...

Is loading nf_conntrack_ipv6 considered a big performance problem
for IPVS?
(Can we tell people to enable conntrack for fragment support?)


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-23 12:50     ` Jesper Dangaard Brouer
@ 2012-08-23 16:06       ` Julian Anastasov
  0 siblings, 0 replies; 15+ messages in thread
From: Julian Anastasov @ 2012-08-23 16:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Patrick McHardy, Hans Schillstrom, lvs-devel,
	Simon Horman, Wensong Zhang, netfilter-devel


	Hello,

On Thu, 23 Aug 2012, Jesper Dangaard Brouer wrote:

> > > +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> > > +#include <linux/netfilter_ipv6/ip6_tables.h>
> > > +#endif
> > 
> > 	There is already #if IS_ENABLED(CONFIG_IPV6) that
> > can replace #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> 
> Okay.
> 
> > 	It seems we need IS_ENABLED for many places:
> > CONFIG_NF_CONNTRACK, CONFIG_NF_DEFRAG_IPV6
> 
> I wonder if we should move these cleanup changes to a separate patch.

	The way you prefer. We can fix it after this
patchset.

> > >  	/* Now find the contained IP header */
> > > -	offset += sizeof(_icmph);
> > > -	cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
> > > -	if (cih == NULL)
> > > -		return NF_ACCEPT; /* The packet looks wrong, ignore */
> > > +	ciph.len = iph->len + sizeof(_icmph);
> > > +	ciph.flags = 0;
> > > +	ciph.fragoffs = 0;
> > > +	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
> > > +				      &ciph.flags);
> 
> (notice that &ciph.len can get updated by ipv6_find_hdr())
> 
> > > +	ciph.saddr = iph->saddr;	/* conn_in_get() handles reverse order */
> > > +	ciph.daddr = iph->daddr;
> > 
> > 	The ciph initialization looks dangerous if one day
> > we add new field into the header.
> > 
> > 	Can we use ciph = (struct ip_vs_iphdr) { .XXX = val, ... },
> > in which case we have to call ipv6_find_hdr out of (after)
> > this initialization? Of course, we will write twice to
> > small fields such as protocol, len, fragoffs, flags
> 
> I'm not sure I follow/understand.

	For example:

	ciph = (struct ip_vs_iphdr) {
		.len = iph->len + sizeof(_icmph),
		.saddr = iph->saddr,
		.daddr = iph->daddr,
	};
	ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs,
				      &ciph.flags);

	but I'm not sure if the saddr/daddr part compiles.

> > 	Also ipv6_find_hdr looks a bit noisy for missing header,
> > can it be a problem for the inner IPv6 header in ICMP messages?
> 
> I can see that I don't handle the error case of missing headers
> from a call to ipv6_find_hdr()... I guess I need to check whether
> the returned "protocol" value is negative.  But I'm not sure if it
> matters in this case (Hans?)

	ok. But I was referring to this message:

printk(KERN_ERR "IPv6 header not found\n");

> > 	In patch 3 ip_vs_in_icmp_v6 initializes ciph in the
> > same way. It will be difficult to audit the code later
> > considering the large number of places where iph is used.
> 
> I'm not sure what you want me to do?

	Maybe it is only in ip_vs_in_icmp_v6; my example
above is for ip_vs_in_icmp_v6. Let me know if it
compiles.

> > 	Still in the same func, above code is correct but
> > patch 3 changes it back to wrong state (offs_ciph = ciph.len).
> 
> I don't think it's a "bug" in patch 3; it might be a "bug" in this patch,
> because &ciph.len gets updated earlier by ipv6_find_hdr()... but I'm
> starting to get confused.
> Perhaps we should move these changes to ip_vs_in_icmp_v6() into the same
> patch?

	ip_vs_icmp_xmit_v6 expects the length that
will be made writable, not exactly the offset of the header.
Maybe you can rename the ip_vs_icmp_xmit_v6 argument
from offset to wrtlen; it covers the ports because we
are going to mangle them. Patch 3 should not change
anything there.
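To make the offset/wrtlen distinction concrete: ip_vs_icmp_xmit_v6 needs the number of bytes, counted from the start of the packet, that must be writable. For TCP, UDP and SCTP this extends past the start of the embedded transport header to cover the two 16-bit port fields that will be mangled. A minimal sketch follows. icmp_v6_wrtlen() and the P_* names are hypothetical, and inner_l4_off is assumed to be ciph.len after ipv6_find_hdr() has skipped any embedded extension headers.

```c
#include <assert.h>
#include <stdint.h>

/* IANA protocol numbers; the P_* names are hypothetical. */
enum { P_TCP = 6, P_UDP = 17, P_SCTP = 132 };

/*
 * inner_l4_off: offset of the embedded transport header.
 * Returns how many leading bytes must be made writable.
 */
static unsigned int icmp_v6_wrtlen(unsigned int inner_l4_off, int inner_proto)
{
	assert(inner_proto >= 0);	/* negative means the lookup failed */
	if (inner_proto == P_TCP || inner_proto == P_UDP ||
	    inner_proto == P_SCTP)
		return inner_l4_off + 2 * sizeof(uint16_t); /* src + dst port */
	return inner_l4_off;	/* no ports to mangle */
}
```

For an ICMPv6 error carrying an inner IPv6/TCP packet without extension headers, the inner TCP header starts at 40 + 8 + 40 = 88, so wrtlen would be 92.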

> > 	In this context, maybe soon we will modify ip_vs_ftp to
> > support IPv6. It is possible to work with the present FTP helper
> > in netfilter. Does it mean that we will see only reassembled
> > packets when conntrack is running? No original fragments.
> 
> Yes, it seems that we will *only* see the reassembled packet (no
> original fragments) after Patrick's patches, BUT *only* when the
> nf_conntrack_ipv6 module is loaded.

	Yes, ok

> > Are we prepared to work in both ways (originals+reasm and
> > just reasm) ?
> 
> As mentioned in another thread, no, but only because the reassembled
> packet will be dropped by the MTU check. Once that is fixed, the
> IPVS code seems to work (both with and without these patches).

	Very good

> > - For patch 3 in Kconfig do we need 'select NF_DEFRAG_IPV6' ?
> 
> Yes, but it is still possible not to load the module nf_defrag_ipv6.

	nf_defrag_ipv6 becomes mandatory for IPVS-IPv6.
Conntrack should be optional.

> When nf_defrag_ipv6 is not loaded, these patches have no effect, and no
> fragments are passed through.

	What happens with fragments if nf_defrag_ipv6 is
not loaded? Will IPVS see the original fragments without
a reasm pointer? I assume IPVS can see them but cannot do
much except mangle addresses and ports.

> > 	Basically, I'm concerned about what will happen when we
> > start to mangle the protocol payloads (FTP). For now
> > we are safe by touching only addresses and ports. Maybe
> > we have to synchronize these changes with the work from
> > Patrick.
> 
> Yes, I think we should take Patrick's work into account.
> 
> My biggest concern is that, depending on which modules (nf_defrag_ipv6
> only, or also nf_conntrack_ipv6) are loaded, different code paths are
> used (to support IPv6 fragments for IPVS).

	Maybe, just like for IPv4, nf_defrag_ipv6
should be loaded (added as a dependency), but nf_conntrack_ipv6
should be required only for FTP, as we use some code from there.

> This will be hard to understand from a user's perspective, and also
> difficult for us when users report bugs...
> 
> Is loading nf_conntrack_ipv6 considered a big performance problem
> for IPVS?
> (Can we tell people to enable conntrack for fragment support?)

	We should avoid it if possible. There can be small
routers that do not want conntrack. It would be better if
IPVS used some symbol that causes nf_defrag_ipv6 to
load; now we can add it as a dependency, but what if the user
does not load it? nf_defrag_ipv6_hooks.c is very small; I'm not
sure if we have to duplicate it just to get defrag at
prerouting, or even input, to work for us in the case without
conntrack. Without this support we cannot mangle the FTP
payload.

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-20 13:08 ` [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
  2012-08-21 14:14   ` Julian Anastasov
@ 2012-08-26 21:13   ` Patrick McHardy
  2012-09-04 21:25     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2012-08-26 21:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Hans Schillstrom, lvs-devel, Julian Anastasov,
	Simon Horman, Wensong Zhang, netfilter-devel

On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:

> Based on patch from: Hans Schillstrom
>
> IPv6 headers must be processed in order of appearance, and it
> cannot be assumed that the upper-layer header comes first.
> If anything other than an L4 header comes first, IPVS will throw it away.
>
> IPVS writes SNAT & DNAT modifications at a fixed position, which
> corrupts the message. The proper header position must be found
> before modifying the packet.
>
> This patch contains a lot of API changes.  This is done to avoid
> the costly scan for the IPv6 headers via ipv6_find_hdr().
> Finding the IPv6 headers is done as early as possible, and passed
> on as a pointer "struct ip_vs_iphdr *" to the affected functions.

How about we change netfilter to set up the skb's transport header
at an early time so we can avoid all (most of) these header scans
in netfilter?

* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS
  2012-08-26 21:13   ` Patrick McHardy
@ 2012-09-04 21:25     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-04 21:25 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, Hans Schillstrom, lvs-devel, Julian Anastasov,
	Simon Horman, Wensong Zhang, netfilter-devel

On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:

[cut]

> This patch contains a lot of API changes.  This is done to avoid
> the costly scan for the IPv6 headers via ipv6_find_hdr().

(A small correction: ipv6_find_hdr() is not that costly in the common
case of no extension headers.)

> Finding the IPv6 headers is done as early as possible, and passed
> on as a pointer "struct ip_vs_iphdr *" to the affected functions.

Passing the "struct ip_vs_iphdr" around actually makes sense.  It reminds
me of the way netfilter/iptables passes the xt_action_param to each
rule, which contains the same information as ip_vs_iphdr.  (Note that IPVS
registers at the hooks at a lower level and does not get passed the
xt_action_param.)

Thus, perhaps we should keep these API changes, even if we decide to
optimize ipv6_find_hdr() (as proposed in my RFC patch).

Perhaps we should consider adding a "family" to ip_vs_iphdr, as is done
in xt_action_param.  This could help us with collapsing the IPv4 and IPv6
code, but I can see that other structs in IPVS carry this info already,
so I'm not sure it's relevant.
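Here is a rough sketch of the parse-once idea with a family field, loosely modelled on struct ip_vs_iphdr and on the family member of xt_action_param. The headers are parsed once, and handlers only consume the descriptor, which is what would let IPv4 and IPv6 share code paths. struct pkt_desc, addr_len() and has_ports() are hypothetical, not IPVS code.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

/* Hypothetical parse-once descriptor passed down to every handler. */
struct pkt_desc {
	uint8_t  family;	/* AF_INET or AF_INET6 */
	uint8_t  protocol;	/* upper-layer protocol */
	uint16_t fragoff;	/* non-zero: non-first fragment, no L4 header */
	uint32_t thoff;		/* transport header offset */
};

/* Address length depends only on the family, so one code path can
 * serve both IPv4 and IPv6. */
static unsigned int addr_len(const struct pkt_desc *d)
{
	assert(d->family == AF_INET || d->family == AF_INET6);
	return d->family == AF_INET6 ? 16 : 4;
}

/* Ports can be read at d->thoff only for port-based protocols and
 * only when the transport header is actually present. */
static int has_ports(const struct pkt_desc *d)
{
	return d->fragoff == 0 &&
	       (d->protocol == 6 || d->protocol == 17 || d->protocol == 132);
}
```

Whether such a field is worth adding depends on how much IPv4/IPv6 duplication it would actually remove, since, as noted above, other IPVS structs already carry the family.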



* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension  header handling in IPVS
  2012-08-29 12:34   ` Patrick McHardy
@ 2012-08-31 10:22     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-31 10:22 UTC (permalink / raw)
  To: Patrick McHardy, Hideaki YOSHIFUJI
  Cc: Hans Schillstrom, Hans Schillstrom, netdev, lvs-devel,
	Julian Anastasov, Simon Horman, netfilter-devel

On Wed, 2012-08-29 at 14:34 +0200, Patrick McHardy wrote:
> On Wed, 29 Aug 2012, Jesper Dangaard Brouer wrote:
> > On Wed, 2012-08-29 at 11:47 +0200, Hans Schillstrom wrote:
> >>>
> >>> On Mon, 2012-08-27 at 14:02 +0200, Patrick McHardy wrote:
> >>>> On Mon, 27 Aug 2012, Hans Schillstrom wrote:
> >>>>
> >>>>>>>> How about we change netfilter to set up the skb's transport header
> >>>>>>>> at an early time so we can avoid all (most of) these header scans
> >>>>>>>> in netfilter?

[...cut...]

> >>>> I guess inet6_skb_parm will be at least slightly more popular than
> >>>> adding it to the skb itself. The netfilter pointers are all used for
> >>>> optional things, so we can't really add it to any of those.
> >>>

[...cut...]

> >> Should we give it a try to put it in inet6_skb_parm
> >> and minimize what we put there ?
> >> I think it could be worth it.
> >
> > Okay, but then I do need some help and guidance, especially from
> > Patrick, I think.
> >
> > First of all, where in the netfilter code, should we update the new
> > fields in inet6_skb_parm?
> 
> Good question. I think we'd need at least three spots since every one
> of these subsystems can be used independently of each other:
> 
> - conntrack/IPVS: PRE_ROUTING/LOCAL_OUT at lowest priority
> - ip6tables: first time packet hits ip6t_do_table()?

I've been looking at the code for ip6t_do_table(), and it already calls
ipv6_find_hdr(), indirectly: ip6t_do_table() calls ip6_packet_match(),

and ip6_packet_match() calls
  ipv6_find_hdr(skb, protoff, -1, &_frag_off, NULL);
but only if (ip6info->flags & IP6T_F_PROTO).

ip6t_do_table() uses the data found by
ipv6_find_hdr()/ip6_packet_match() and updates 'struct xt_action_param
acpar' (which is passed on to all netfilter modules/functions as 'par'):

 protohdr = ipv6_find_hdr(skb, protoff, -1, &_frag_off, NULL);
 *fragoff = _frag_off;

 par->thoff   = *protoff;  /* thoff = Transport Header Offset */
 par->fragoff = *fragoff;  /* frag indicator and fragment offset */

The returned protocol (protohdr) is only used inside
ip6_packet_match(), thus the info on the protocol is lost.

Side note: saving the protocol could be useful for the following
modules, as they call ipv6_find_hdr() once again to extract it:

 net/netfilter/xt_TPROXY.c: function tproxy_tg6_v1()
 net/netfilter/xt_socket.c: function socket_mt6_v1()

Thus, the netfilter framework already has this information available.
It just uses the 'struct xt_action_param par' to carry this
information to its modules.

Hans's and my patch basically introduces the same thing for IPVS,
except that this information is carried via 'struct ip_vs_iphdr'.

I don't know if it's worth storing this information in
inet6_skb_parm/IP6CB.

I guess it would make sense to store 'thoff' (the transport header
offset), especially for IPv6, given the extension headers.

But how many (code) users are there?
Is it only Netfilter and IPVS that want to look at the port numbers?

There also seem to be a lot of users of ipv6_skip_exthdr() which could
benefit?  But I simply don't know the IPv6 code well enough...


-- 
Best regards,
  Jesper Dangaard Brouer



* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension  header handling in IPVS
  2012-08-29 11:46 ` Jesper Dangaard Brouer
@ 2012-08-29 12:34   ` Patrick McHardy
  2012-08-31 10:22     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2012-08-29 12:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Hans Schillstrom, Hans Schillstrom, netdev, lvs-devel,
	Julian Anastasov, Simon Horman, netfilter-devel

On Wed, 29 Aug 2012, Jesper Dangaard Brouer wrote:

> To Patrick,
>
> On Wed, 2012-08-29 at 11:47 +0200, Hans Schillstrom wrote:
>>>
>>> On Mon, 2012-08-27 at 14:02 +0200, Patrick McHardy wrote:
>>>> On Mon, 27 Aug 2012, Hans Schillstrom wrote:
>>>>
>>>>>>>> How about we change netfilter to set up the skb's transport header
>>>>>>>> at an early time so we can avoid all (most of) these header scans
>>>>>>>> in netfilter?
>>>>>>>
>>>>>>> I think that would be great, maybe it should be global i.e. not only a netfilter issue.
>>>>>>
>>>>>> I think in most other cases the headers are supposed to be processed
>>>>>> sequentially. One problem though - to be useful for netfilter/IPVS
>>>>>> we'd also need to store the transport layer protocol somewhere.
>>>>>
>>>>> I guess that's the problem, adding it to the skb will not be popular ....
>>>>> Right now I don't have a good solution, maybe a more generic netfilter ptr in the skb ...
>>>>
>>>> I guess inet6_skb_parm will be at least slightly more popular than
>>>> adding it to the skb itself. The netfilter pointers are all used for
>>>> optional things, so we can't really add it to any of those.
>>>
>>> Okay, but how do we go from here?
>>>
>>> Hans, should this hold back the patch ("ipvs: Fix faulty IPv6 extension
>>> header handling in IPVS").  Or should we pursue our patch, and circle
>>> back later once e.g. Patrick has found a generic solution for IPv6
>>> transport header handling?
>>
>> Should we give it a try to put it in inet6_skb_parm
>> and minimize what we put there ?
>> I think it could be worth it.
>
> Okay, but then I do need some help and guidance, especially from
> Patrick, I think.
>
> First of all, where in the netfilter code, should we update the new
> fields in inet6_skb_parm?

Good question. I think we'd need at least three spots since every one
of these subsystems can be used independently of each other:

- conntrack/IPVS: PRE_ROUTING/LOCAL_OUT at lowest priority
- ip6tables: first time packet hits ip6t_do_table()?

Actually, looking at ipv6_rcv(), this might not work at all since it
sets skb->transport_header to the first header following the IPv6
header. IPv6 uses this when processing extension headers.



* Re: [PATCH 2/3] ipvs: Fix faulty IPv6 extension  header handling in IPVS
  2012-08-29  9:47 Re[2]: Re[3]: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Hans Schillstrom
@ 2012-08-29 11:46 ` Jesper Dangaard Brouer
  2012-08-29 12:34   ` Patrick McHardy
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2012-08-29 11:46 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Hans Schillstrom, Hans Schillstrom, netdev, lvs-devel,
	Julian Anastasov, Simon Horman, netfilter-devel

To Patrick,

On Wed, 2012-08-29 at 11:47 +0200, Hans Schillstrom wrote:
> >
> >On Mon, 2012-08-27 at 14:02 +0200, Patrick McHardy wrote:
> >> On Mon, 27 Aug 2012, Hans Schillstrom wrote:
> >> 
> >> >>>>
> >> >>>> On Mon, 20 Aug 2012, Jesper Dangaard Brouer wrote:
> >> >>>>
> >> >>>>> Based on patch from: Hans Schillstrom
> >> >>>>>
> >> >>>>> IPv6 headers must be processed in order of appearance, and it
> >> >>>>> cannot be assumed that the upper-layer header comes first.
> >> >>>>> If anything other than an L4 header comes first, IPVS will throw it away.
> >> >>>>>
> >> >>>>> IPVS writes SNAT & DNAT modifications at a fixed position, which
> >> >>>>> corrupts the message. The proper header position must be found
> >> >>>>> before modifying the packet.
> >> >>>>>
> >> >>>>> This patch contains a lot of API changes.  This is done to avoid
> >> >>>>> the costly scan for the IPv6 headers via ipv6_find_hdr().
> >> >>>>> Finding the IPv6 headers is done as early as possible, and passed
> >> >>>>> on as a pointer "struct ip_vs_iphdr *" to the affected functions.
> >> >>>>
> >> >>>> How about we change netfilter to set up the skb's transport header
> >> >>>> at an early time so we can avoid all (most of) these header scans
> >> >>>> in netfilter?
> >> >>>
> >> >>> I think that would be great, maybe it should be global i.e. not only a netfilter issue.
> >> >>
> >> >> I think in most other cases the headers are supposed to be processed
> >> >> sequentially. One problem though - to be useful for netfilter/IPVS
> >> >> we'd also need to store the transport layer protocol somewhere.
> >> >
> >> > I guess that's the problem, adding it to the skb will not be popular ....
> >> > Right now I don't have a good solution, maybe a more generic netfilter ptr in the skb ...
> >> 
> >> I guess inet6_skb_parm will be at least slightly more popular than
> >> adding it to the skb itself. The netfilter pointers are all used for
> >> optional things, so we can't really add it to any of those.
> >
> >Okay, but how do we go from here?
> >
> >Hans, should this hold back the patch ("ipvs: Fix faulty IPv6 extension
> >header handling in IPVS").  Or should we pursue our patch, and circle
> >back later once e.g. Patrick has found a generic solution for IPv6
> >transport header handling?
> 
> Should we give it a try to put it in inet6_skb_parm 
> and minimize what we put there ?
> I think it could be worth it.

Okay, but then I do need some help and guidance, especially from
Patrick, I think.

First of all, where in the netfilter code, should we update the new
fields in inet6_skb_parm?




end of thread (newest message: 2012-09-04 21:25 UTC)

Thread overview: 15+ messages
2012-08-20 13:08 [PATCH 0/3] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-08-20 13:08 ` [PATCH 1/3] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
2012-08-20 13:08 ` [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
2012-08-21 14:14   ` Julian Anastasov
2012-08-23 12:50     ` Jesper Dangaard Brouer
2012-08-23 16:06       ` Julian Anastasov
2012-08-26 21:13   ` Patrick McHardy
2012-09-04 21:25     ` Jesper Dangaard Brouer
2012-08-20 13:08 ` [PATCH 3/3] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-08-21  5:24 ` [PATCH 0/3] ipvs: " Simon Horman
2012-08-21  7:51   ` Jesper Dangaard Brouer
2012-08-22  6:42     ` Simon Horman
2012-08-29  9:47 Re[2]: Re[3]: [PATCH 2/3] ipvs: Fix faulty IPv6 extension header handling in IPVS Hans Schillstrom
2012-08-29 11:46 ` Jesper Dangaard Brouer
2012-08-29 12:34   ` Patrick McHardy
2012-08-31 10:22     ` Jesper Dangaard Brouer
