* [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa
@ 2014-08-15  3:23 Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 01/18] ipvs: Pass destination address family to ip_vs_trash_get_dest Alex Gartrell
                   ` (18 more replies)
  0 siblings, 19 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

At Facebook we use ipip forwarding to deliver packets from our layer 4 ipvs
load balancers to our layer 7 proxies.  Today these layer 7 proxies are all
dual stacked, so we can simply send v4 over v4 and v6 over v6.  In the
future, we're going to have v6-only layer 7 load balancers (no internal v4
address).  To deal with this, we'll forward v4 packets in v6 tunnels.  This
patchset introduces the necessary functionality into ipvs.

The noteworthy limitation of this is that it is not compatible with state
synchronization, so great care is taken to keep these things mutually
exclusive.

This patchset includes changes that add an additional netlink attribute to
destinations and changes that plumb the destination address family through
parts of the code where it was assumed that it was the same as the service.
Finally, there's a change that updates the transmit functions for tunneling
to share common code and support v4 in v6 and vice versa.

Changes for v2:

Introduced crosses_local_route_boundary and update_pmtu functions and
conditionally do the ip_hdr operations.  The latter means that df will be
effectively zero when we forward an ipv6 packet over an ipv4 tunnel.

Additionally, I addressed Julian's other statements.
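
The DF remark above can be pictured with a small userspace sketch.  It is
not the kernel code: outer_df_sketch, its parameters and IP_DF_SKETCH are
hypothetical names.  The only behaviour taken from this series is that the
inner IPv4 header is consulted only when the skb is AF_INET, so an ipv6
packet entering an ipv4 tunnel ends up with df == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>    /* htons() */
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* "Don't Fragment" flag in host byte order (hypothetical local name). */
#define IP_DF_SKETCH 0x4000

/*
 * Choose the DF value for the outer IPv4 tunnel header.
 * inner_frag_off is the inner IPv4 frag_off field in network byte order;
 * it is ignored unless skb_af is AF_INET, because an IPv6 packet has no
 * such field to inherit from.
 */
static uint16_t outer_df_sketch(int skb_af, uint16_t inner_frag_off,
				bool pmtu_disc)
{
	assert(skb_af == AF_INET || skb_af == AF_INET6);

	if (skb_af != AF_INET || !pmtu_disc)
		return 0;
	return inner_frag_off & htons(IP_DF_SKETCH);
}
```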

Alex Gartrell (9):
  ipvs: Pass destination address family to ip_vs_trash_get_dest
  ipvs: Supply destination address family to ip_vs_conn_new
  ipvs: maintain a mixed_address_family_dest count
  ipvs: prevent mixing heterogeneous pools and synchronization
  ipvs: Supply skb_af to out_rt* functions
  ipvs: Pull out crosses_local_route_boundary logic
  ipvs: Pull out update_pmtu code
  ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET
  ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding

Julian Anastasov (9):
  ipvs: address family of LBLC entry depends on svc family
  ipvs: address family of LBLCR entry depends on svc family
  ipvs: use correct address family in DH logs
  ipvs: use correct address family in LC logs
  ipvs: use correct address family in NQ logs
  ipvs: use correct address family in RR logs
  ipvs: use correct address family in SED logs
  ipvs: use correct address family in SH logs
  ipvs: use correct address family in WLC logs

 include/net/ip_vs.h              |   7 +-
 net/netfilter/ipvs/ip_vs_conn.c  |  17 +-
 net/netfilter/ipvs/ip_vs_core.c  |   9 +-
 net/netfilter/ipvs/ip_vs_ctl.c   |  55 +++++--
 net/netfilter/ipvs/ip_vs_dh.c    |   2 +-
 net/netfilter/ipvs/ip_vs_ftp.c   |   6 +-
 net/netfilter/ipvs/ip_vs_lblc.c  |  12 +-
 net/netfilter/ipvs/ip_vs_lblcr.c |  12 +-
 net/netfilter/ipvs/ip_vs_lc.c    |   2 +-
 net/netfilter/ipvs/ip_vs_nq.c    |   3 +-
 net/netfilter/ipvs/ip_vs_rr.c    |   2 +-
 net/netfilter/ipvs/ip_vs_sed.c   |   3 +-
 net/netfilter/ipvs/ip_vs_sh.c    |   8 +-
 net/netfilter/ipvs/ip_vs_sync.c  |   3 +-
 net/netfilter/ipvs/ip_vs_wlc.c   |   3 +-
 net/netfilter/ipvs/ip_vs_xmit.c  | 329 ++++++++++++++++++++++++---------------
 16 files changed, 304 insertions(+), 169 deletions(-)

-- 
1.8.1



* [PATCH ipvs,v2 01/18] ipvs: Pass destination address family to ip_vs_trash_get_dest
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 02/18] ipvs: Supply destination address family to ip_vs_conn_new Alex Gartrell
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

Part of a series of diffs to tease out destination family from virtual
family.  This diff just adds a parameter to ip_vs_trash_get_dest and then
uses it for comparison rather than svc->af.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index eec8dee..01c813e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -657,8 +657,8 @@ static void __ip_vs_dst_cache_reset(struct ip_vs_dest *dest)
  *  scheduling.
  */
 static struct ip_vs_dest *
-ip_vs_trash_get_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
-		     __be16 dport)
+ip_vs_trash_get_dest(struct ip_vs_service *svc, int dest_af,
+		     const union nf_inet_addr *daddr, __be16 dport)
 {
 	struct ip_vs_dest *dest;
 	struct netns_ipvs *ipvs = net_ipvs(svc->net);
@@ -671,11 +671,11 @@ ip_vs_trash_get_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
 		IP_VS_DBG_BUF(3, "Destination %u/%s:%u still in trash, "
 			      "dest->refcnt=%d\n",
 			      dest->vfwmark,
-			      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
 			      ntohs(dest->port),
 			      atomic_read(&dest->refcnt));
-		if (dest->af == svc->af &&
-		    ip_vs_addr_equal(svc->af, &dest->addr, daddr) &&
+		if (dest->af == dest_af &&
+		    ip_vs_addr_equal(dest_af, &dest->addr, daddr) &&
 		    dest->port == dport &&
 		    dest->vfwmark == svc->fwmark &&
 		    dest->protocol == svc->protocol &&
@@ -950,7 +950,7 @@ ip_vs_add_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 	 * Check if the dest already exists in the trash and
 	 * is from the same service
 	 */
-	dest = ip_vs_trash_get_dest(svc, &daddr, dport);
+	dest = ip_vs_trash_get_dest(svc, udest->af, &daddr, dport);
 
 	if (dest != NULL) {
 		IP_VS_DBG_BUF(3, "Get destination %s:%u from trash, "
-- 
1.8.1



* [PATCH ipvs,v2 02/18] ipvs: Supply destination address family to ip_vs_conn_new
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 01/18] ipvs: Pass destination address family to ip_vs_trash_get_dest Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 03/18] ipvs: maintain a mixed_address_family_dest count Alex Gartrell
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or do an illegal read of 16 bytes on a v4 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.
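
To illustrate the first paragraph, here is a minimal userspace sketch of a
family-aware address copy.  union addr_sketch and addr_set_sketch are
hypothetical stand-ins (not the kernel's union nf_inet_addr or
ip_vs_addr_set); the point is only that the number of bytes copied must
follow the destination's own family rather than the service's.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */
#include <netinet/in.h>   /* struct in_addr, struct in6_addr */

/* Hypothetical stand-in for a v4-or-v6 address container. */
union addr_sketch {
	struct in_addr  in;       /* 4 bytes */
	struct in6_addr in6;      /* 16 bytes */
	unsigned char   raw[16];
};

/*
 * Copy exactly as many bytes as the given family defines and zero the
 * rest.  Passing the wrong family either truncates a v6 address to its
 * first 4 bytes or reads 16 bytes from a 4-byte source.
 */
static void addr_set_sketch(int af, union addr_sketch *dst, const void *src)
{
	assert(dst != NULL && src != NULL);

	memset(dst, 0, sizeof(*dst));
	if (af == AF_INET6)
		memcpy(&dst->in6, src, sizeof(dst->in6));
	else
		memcpy(&dst->in, src, sizeof(dst->in));
}
```

Copying a v6 address while claiming AF_INET keeps only its first four
bytes, which is the truncation the patch avoids by passing dest_af.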

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 include/net/ip_vs.h             | 3 ++-
 net/netfilter/ipvs/ip_vs_conn.c | 5 +++--
 net/netfilter/ipvs/ip_vs_core.c | 9 +++++----
 net/netfilter/ipvs/ip_vs_ftp.c  | 6 ++++--
 net/netfilter/ipvs/ip_vs_sync.c | 3 ++-
 5 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 2fa1155..7600dbe 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -535,6 +535,7 @@ struct ip_vs_conn {
 	union nf_inet_addr      daddr;          /* destination address */
 	volatile __u32          flags;          /* status flags */
 	__u16                   protocol;       /* Which protocol (TCP/UDP) */
+	__u16			daf;		/* Address family of the dest */
 #ifdef CONFIG_NET_NS
 	struct net              *net;           /* Name space */
 #endif
@@ -1213,7 +1214,7 @@ static inline void __ip_vs_conn_put(struct ip_vs_conn *cp)
 void ip_vs_conn_put(struct ip_vs_conn *cp);
 void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport);
 
-struct ip_vs_conn *ip_vs_conn_new(const struct ip_vs_conn_param *p,
+struct ip_vs_conn *ip_vs_conn_new(const struct ip_vs_conn_param *p, int dest_af,
 				  const union nf_inet_addr *daddr,
 				  __be16 dport, unsigned int flags,
 				  struct ip_vs_dest *dest, __u32 fwmark);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 8f4c602..fdb4880 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -854,7 +854,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp)
  *	Create a new connection entry and hash it into the ip_vs_conn_tab
  */
 struct ip_vs_conn *
-ip_vs_conn_new(const struct ip_vs_conn_param *p,
+ip_vs_conn_new(const struct ip_vs_conn_param *p, int dest_af,
 	       const union nf_inet_addr *daddr, __be16 dport, unsigned int flags,
 	       struct ip_vs_dest *dest, __u32 fwmark)
 {
@@ -873,6 +873,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
 	ip_vs_conn_net_set(cp, p->net);
 	cp->af		   = p->af;
+	cp->daf		   = dest_af;
 	cp->protocol	   = p->protocol;
 	ip_vs_addr_set(p->af, &cp->caddr, p->caddr);
 	cp->cport	   = p->cport;
@@ -880,7 +881,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	ip_vs_addr_set(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af,
 		       &cp->vaddr, p->vaddr);
 	cp->vport	   = p->vport;
-	ip_vs_addr_set(p->af, &cp->daddr, daddr);
+	ip_vs_addr_set(cp->daf, &cp->daddr, daddr);
 	cp->dport          = dport;
 	cp->flags	   = flags;
 	cp->fwmark         = fwmark;
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index e683675..0cf952a 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -328,7 +328,7 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 		 * This adds param.pe_data to the template,
 		 * and thus param.pe_data will be destroyed
 		 * when the template expires */
-		ct = ip_vs_conn_new(&param, &dest->addr, dport,
+		ct = ip_vs_conn_new(&param, dest->af, &dest->addr, dport,
 				    IP_VS_CONN_F_TEMPLATE, dest, skb->mark);
 		if (ct == NULL) {
 			kfree(param.pe_data);
@@ -357,7 +357,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 	ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, &iph->saddr,
 			      src_port, &iph->daddr, dst_port, &param);
 
-	cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest, skb->mark);
+	cp = ip_vs_conn_new(&param, dest->af, &dest->addr, dport, flags, dest,
+			    skb->mark);
 	if (cp == NULL) {
 		ip_vs_conn_put(ct);
 		*ignored = -1;
@@ -479,7 +480,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 		ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
 				      &iph->saddr, pptr[0], &iph->daddr,
 				      pptr[1], &p);
-		cp = ip_vs_conn_new(&p, &dest->addr,
+		cp = ip_vs_conn_new(&p, dest->af, &dest->addr,
 				    dest->port ? dest->port : pptr[1],
 				    flags, dest, skb->mark);
 		if (!cp) {
@@ -550,7 +551,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 			ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
 					      &iph->saddr, pptr[0],
 					      &iph->daddr, pptr[1], &p);
-			cp = ip_vs_conn_new(&p, &daddr, 0,
+			cp = ip_vs_conn_new(&p, svc->af, &daddr, 0,
 					    IP_VS_CONN_F_BYPASS | flags,
 					    NULL, skb->mark);
 			if (!cp)
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 77c1732..a64fa15 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -233,7 +233,8 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 			ip_vs_conn_fill_param(ip_vs_conn_net(cp),
 					      AF_INET, IPPROTO_TCP, &cp->caddr,
 					      0, &cp->vaddr, port, &p);
-			n_cp = ip_vs_conn_new(&p, &from, port,
+			/* As above, this is ipv4 only */
+			n_cp = ip_vs_conn_new(&p, AF_INET, &from, port,
 					      IP_VS_CONN_F_NO_CPORT |
 					      IP_VS_CONN_F_NFCT,
 					      cp->dest, skb->mark);
@@ -396,7 +397,8 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
 				      htons(ntohs(cp->vport)-1), &p);
 		n_cp = ip_vs_conn_in_get(&p);
 		if (!n_cp) {
-			n_cp = ip_vs_conn_new(&p, &cp->daddr,
+			/* This is ipv4 only */
+			n_cp = ip_vs_conn_new(&p, AF_INET, &cp->daddr,
 					      htons(ntohs(cp->dport)-1),
 					      IP_VS_CONN_F_NFCT, cp->dest,
 					      skb->mark);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 61701ed..da7e0a2 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -889,7 +889,8 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
 				       param->vaddr, param->vport, protocol,
 				       fwmark, flags);
 
-		cp = ip_vs_conn_new(param, daddr, dport, flags, dest, fwmark);
+		cp = ip_vs_conn_new(param, type, daddr, dport, flags, dest,
+				    fwmark);
 		rcu_read_unlock();
 		if (!cp) {
 			if (param->pe_data)
-- 
1.8.1



* [PATCH ipvs,v2 03/18] ipvs: maintain a mixed_address_family_dest count
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 01/18] ipvs: Pass destination address family to ip_vs_trash_get_dest Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 02/18] ipvs: Supply destination address family to ip_vs_conn_new Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 04/18] ipvs: prevent mixing heterogeneous pools and synchronization Alex Gartrell
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

This is necessary to validate that we're not accidentally enabling features
that are incompatible with heterogeneous pools, such as syncing.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 include/net/ip_vs.h            |  4 ++++
 net/netfilter/ipvs/ip_vs_ctl.c | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 7600dbe..576d7f0 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -990,6 +990,10 @@ struct netns_ipvs {
 	char			backup_mcast_ifn[IP_VS_IFNAME_MAXLEN];
 	/* net name space ptr */
 	struct net		*net;            /* Needed by timer routines */
+	/* Number of heterogeneous destinations, needed because
+	 * heterogeneous destinations are not supported when
+	 * synchronization is enabled */
+	unsigned int		mixed_address_family_dests;
 };
 
 #define DEFAULT_SYNC_THRESHOLD	3
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 01c813e..2356f1d 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -778,6 +778,16 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 	struct ip_vs_service *old_svc;
 	struct ip_vs_scheduler *sched;
 	int conn_flags;
+	u16 old_af, new_af;
+
+	new_af = udest->af;
+	old_af = dest->af;
+
+	/* We cannot modify an address and change the address family */
+	BUG_ON(!add && udest->af != dest->af);
+
+	if (add && udest->af != svc->af)
+		ipvs->mixed_address_family_dests++;
 
 	/* set the weight and the flags */
 	atomic_set(&dest->weight, udest->weight);
@@ -1061,6 +1071,9 @@ static void __ip_vs_unlink_dest(struct ip_vs_service *svc,
 	list_del_rcu(&dest->n_list);
 	svc->num_dests--;
 
+	if (dest->af != svc->af)
+		net_ipvs(svc->net)->mixed_address_family_dests--;
+
 	if (svcupd) {
 		struct ip_vs_scheduler *sched;
 
-- 
1.8.1



* [PATCH ipvs,v2 04/18] ipvs: prevent mixing heterogeneous pools and synchronization
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (2 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 03/18] ipvs: maintain a mixed_address_family_dest count Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 05/18] ipvs: Supply skb_af to out_rt* functions Alex Gartrell
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

This also introduces a switch statement that we'll use to turn on
forwarding types on a case by case basis.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 2356f1d..6d07a51 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -858,10 +858,6 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest,
 
 	EnterFunction(2);
 
-	/* Temporary for consistency */
-	if (udest->af != svc->af)
-		return -EINVAL;
-
 #ifdef CONFIG_IP_VS_IPV6
 	if (udest->af == AF_INET6) {
 		atype = ipv6_addr_type(&udest->addr.in6);
@@ -3355,6 +3351,12 @@ static int ip_vs_genl_new_daemon(struct net *net, struct nlattr **attrs)
 	      attrs[IPVS_DAEMON_ATTR_SYNC_ID]))
 		return -EINVAL;
 
+	/* The synchronization protocol is incompatible with mixed family
+	 * services
+	 */
+	if (net_ipvs(net)->mixed_address_family_dests > 0)
+		return -EINVAL;
+
 	return start_sync_thread(net,
 				 nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]),
 				 nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]),
@@ -3487,6 +3489,26 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
 		 */
 		if (udest.af == 0)
 			udest.af = svc->af;
+
+		if (udest.af != svc->af) {
+			/* The synchronization protocol is incompatible
+			 * with mixed family services
+			 */
+			if (net_ipvs(net)->sync_state) {
+				ret = -EINVAL;
+				goto out;
+			}
+
+			/* Which connection types do we support? */
+			switch (udest.conn_flags) {
+			case IP_VS_CONN_F_TUNNEL:
+				/* We are able to forward this */
+				break;
+			default:
+				ret = -EINVAL;
+				goto out;
+			}
+		}
 	}
 
 	switch (cmd) {
-- 
1.8.1



* [PATCH ipvs,v2 05/18] ipvs: Supply skb_af to out_rt* functions
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (3 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 04/18] ipvs: prevent mixing heterogeneous pools and synchronization Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 06/18] ipvs: Pull out crosses_local_route_boundary logic Alex Gartrell
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

The out_rt functions inspect the skb to ensure that we aren't breaking
any rules by crossing local/external boundaries.  Right now, they assume
that the address family of the skb is the same as their own, but that
assumption will no longer be true.

This patch introduces an additional parameter to the out route functions so
that we can make more intelligent decisions based upon the actual skb
address family later.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 56896a4..94c7466 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -159,7 +159,7 @@ retry:
 
 /* Get route to destination or remote server */
 static int
-__ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
+__ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 		   __be32 daddr, int rt_mode, __be32 *ret_saddr)
 {
 	struct net *net = dev_net(skb_dst(skb)->dev);
@@ -339,7 +339,7 @@ out_err:
  * Get route to destination or remote server
  */
 static int
-__ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
+__ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 		      struct in6_addr *daddr, struct in6_addr *ret_saddr,
 		      struct ip_vs_iphdr *ipvsh, int do_xfrm, int rt_mode)
 {
@@ -556,8 +556,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	if (__ip_vs_get_out_rt(skb, NULL, iph->daddr, IP_VS_RT_MODE_NON_LOCAL,
-			       NULL) < 0)
+	if (__ip_vs_get_out_rt(cp->af, skb, NULL, iph->daddr,
+			       IP_VS_RT_MODE_NON_LOCAL, NULL) < 0)
 		goto tx_error;
 
 	ip_send_check(iph);
@@ -586,7 +586,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	if (__ip_vs_get_out_rt_v6(skb, NULL, &ipvsh->daddr.in6, NULL,
+	if (__ip_vs_get_out_rt_v6(cp->af, skb, NULL, &ipvsh->daddr.in6, NULL,
 				  ipvsh, 0, IP_VS_RT_MODE_NON_LOCAL) < 0)
 		goto tx_error;
 
@@ -633,7 +633,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	}
 
 	was_input = rt_is_input_route(skb_rtable(skb));
-	local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
 				   IP_VS_RT_MODE_LOCAL |
 				   IP_VS_RT_MODE_NON_LOCAL |
 				   IP_VS_RT_MODE_RDR, NULL);
@@ -721,8 +721,8 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 		IP_VS_DBG(10, "filled cport=%d\n", ntohs(*p));
 	}
 
-	local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-				      ipvsh, 0,
+	local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
+				      NULL, ipvsh, 0,
 				      IP_VS_RT_MODE_LOCAL |
 				      IP_VS_RT_MODE_NON_LOCAL |
 				      IP_VS_RT_MODE_RDR);
@@ -829,7 +829,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
 				   IP_VS_RT_MODE_LOCAL |
 				   IP_VS_RT_MODE_NON_LOCAL |
 				   IP_VS_RT_MODE_CONNECT |
@@ -928,7 +928,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6,
+	local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
 				      &saddr, ipvsh, 1,
 				      IP_VS_RT_MODE_LOCAL |
 				      IP_VS_RT_MODE_NON_LOCAL |
@@ -1021,7 +1021,7 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
+	local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip,
 				   IP_VS_RT_MODE_LOCAL |
 				   IP_VS_RT_MODE_NON_LOCAL |
 				   IP_VS_RT_MODE_KNOWN_NH, NULL);
@@ -1060,8 +1060,8 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-				      ipvsh, 0,
+	local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
+				      NULL, ipvsh, 0,
 				      IP_VS_RT_MODE_LOCAL |
 				      IP_VS_RT_MODE_NON_LOCAL);
 	if (local < 0)
@@ -1128,7 +1128,8 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 		  IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL |
 		  IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL;
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip, rt_mode, NULL);
+	local = __ip_vs_get_out_rt(cp->af, skb, cp->dest, cp->daddr.ip, rt_mode,
+				   NULL);
 	if (local < 0)
 		goto tx_error;
 	rt = skb_rtable(skb);
@@ -1219,8 +1220,8 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 		  IP_VS_RT_MODE_LOCAL | IP_VS_RT_MODE_NON_LOCAL |
 		  IP_VS_RT_MODE_RDR : IP_VS_RT_MODE_NON_LOCAL;
 	rcu_read_lock();
-	local = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
-				      ipvsh, 0, rt_mode);
+	local = __ip_vs_get_out_rt_v6(cp->af, skb, cp->dest, &cp->daddr.in6,
+				      NULL, ipvsh, 0, rt_mode);
 	if (local < 0)
 		goto tx_error;
 	rt = (struct rt6_info *) skb_dst(skb);
-- 
1.8.1



* [PATCH ipvs,v2 06/18] ipvs: Pull out crosses_local_route_boundary logic
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (4 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 05/18] ipvs: Supply skb_af to out_rt* functions Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 07/18] ipvs: Pull out update_pmtu code Alex Gartrell
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

This logic is repeated in both out_rt functions so it was redundant.
Additionally, we'll need to be able to do checks to route v4 to v6 and vice
versa in order to deal with heterogeneous pools.
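
As a reading aid, the decision that both out_rt functions share can be
written as a pure function once the family-specific facts have been
extracted.  This is a userspace sketch, not the patch itself: the flag
values and crosses_boundary_sketch are hypothetical, and the rules are
reconstructed from the per-family checks that the diff removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for IP_VS_RT_MODE_*. */
enum {
	RT_MODE_LOCAL_SKETCH     = 1 << 0, /* may send to local addresses */
	RT_MODE_NON_LOCAL_SKETCH = 1 << 1, /* may send to non-local addresses */
	RT_MODE_RDR_SKETCH       = 1 << 2, /* may redirect non-local to local */
	RT_MODE_ALL_SKETCH       = (1 << 3) - 1,
};

/*
 * Return true when forwarding would cross a local/non-local boundary
 * that rt_mode does not permit.  The three booleans are what the
 * family-specific code derives from the skb and the new route.
 */
static bool crosses_boundary_sketch(int rt_mode, bool new_rt_is_local,
				    bool old_rt_is_local,
				    bool source_is_loopback)
{
	assert((rt_mode & ~RT_MODE_ALL_SKETCH) == 0);

	if (new_rt_is_local) {
		if (!(rt_mode & RT_MODE_LOCAL_SKETCH))
			return true;
		/* non-local -> local needs the redirect (NAT) mode */
		if (!(rt_mode & RT_MODE_RDR_SKETCH) && !old_rt_is_local)
			return true;
	} else {
		if (!(rt_mode & RT_MODE_NON_LOCAL_SKETCH))
			return true;
		/* never leak loopback-sourced packets off the host */
		if (source_is_loopback)
			return true;
	}
	return false;
}
```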

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 105 +++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 51 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 94c7466..b348347 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -157,6 +157,49 @@ retry:
 	return rt;
 }
 
+#ifdef CONFIG_IP_VS_IPV6
+static inline int __ip_vs_is_local_route6(struct rt6_info *rt)
+{
+	return rt->dst.dev && rt->dst.dev->flags & IFF_LOOPBACK;
+}
+#endif
+
+static inline bool crosses_local_route_boundary(int skb_af, struct sk_buff *skb,
+						int rt_mode, bool new_rt_is_local)
+{
+	bool rt_mode_allow_local = !!(rt_mode & IP_VS_RT_MODE_LOCAL);
+	bool rt_mode_allow_non_local = !!(rt_mode & IP_VS_RT_MODE_NON_LOCAL);
+	bool rt_mode_allow_redirect = !!(rt_mode & IP_VS_RT_MODE_RDR);
+	bool source_is_loopback;
+	bool old_rt_is_local;
+
+#ifdef CONFIG_IP_VS_IPV6
+	if (skb_af == AF_INET6) {
+		source_is_loopback = (!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
+			(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LOOPBACK);
+		old_rt_is_local = __ip_vs_is_local_route6(
+			(struct rt6_info *) skb_dst(skb));
+	} else
+#endif
+	{
+		source_is_loopback = ipv4_is_loopback(ip_hdr(skb)->saddr);
+		old_rt_is_local = skb_rtable(skb)->rt_flags & RTCF_LOCAL;
+	}
+
+	if (unlikely(new_rt_is_local)) {
+		if (!rt_mode_allow_local)
+			return true;
+		if (!rt_mode_allow_redirect && !old_rt_is_local)
+			return true;
+	} else {
+		if (!rt_mode_allow_non_local)
+			return true;
+		if (source_is_loopback)
+			return true;
+	}
+	return false;
+}
+
 /* Get route to destination or remote server */
 static int
 __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
@@ -218,30 +261,14 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	}
 
 	local = (rt->rt_flags & RTCF_LOCAL) ? 1 : 0;
-	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
-	      rt_mode)) {
-		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI4\n",
-			     (rt->rt_flags & RTCF_LOCAL) ?
-			     "local":"non-local", &daddr);
+	if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
+						  local))) {
+		IP_VS_DBG_RL("We are crossing local and non-local addresses\n");
 		goto err_put;
 	}
 	iph = ip_hdr(skb);
-	if (likely(!local)) {
-		if (unlikely(ipv4_is_loopback(iph->saddr))) {
-			IP_VS_DBG_RL("Stopping traffic from loopback address "
-				     "%pI4 to non-local address, dest: %pI4\n",
-				     &iph->saddr, &daddr);
-			goto err_put;
-		}
-	} else {
-		ort = skb_rtable(skb);
-		if (!(rt_mode & IP_VS_RT_MODE_RDR) &&
-		    !(ort->rt_flags & RTCF_LOCAL)) {
-			IP_VS_DBG_RL("Redirect from non-local address %pI4 to "
-				     "local requires NAT method, dest: %pI4\n",
-				     &iph->daddr, &daddr);
-			goto err_put;
-		}
+
+	if (unlikely(local)) {
 		/* skb to local stack, preserve old route */
 		if (!noref)
 			ip_rt_put(rt);
@@ -295,12 +322,6 @@ err_unreach:
 }
 
 #ifdef CONFIG_IP_VS_IPV6
-
-static inline int __ip_vs_is_local_route6(struct rt6_info *rt)
-{
-	return rt->dst.dev && rt->dst.dev->flags & IFF_LOOPBACK;
-}
-
 static struct dst_entry *
 __ip_vs_route_output_v6(struct net *net, struct in6_addr *daddr,
 			struct in6_addr *ret_saddr, int do_xfrm)
@@ -393,32 +414,14 @@ __ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	}
 
 	local = __ip_vs_is_local_route6(rt);
-	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
-	      rt_mode)) {
-		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6c\n",
-			     local ? "local":"non-local", daddr);
+
+	if (unlikely(crosses_local_route_boundary(skb_af, skb, rt_mode,
+						  local))) {
+		IP_VS_DBG_RL("We are crossing local and non-local addresses\n");
 		goto err_put;
 	}
-	if (likely(!local)) {
-		if (unlikely((!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
-			     ipv6_addr_type(&ipv6_hdr(skb)->saddr) &
-					    IPV6_ADDR_LOOPBACK)) {
-			IP_VS_DBG_RL("Stopping traffic from loopback address "
-				     "%pI6c to non-local address, "
-				     "dest: %pI6c\n",
-				     &ipv6_hdr(skb)->saddr, daddr);
-			goto err_put;
-		}
-	} else {
-		ort = (struct rt6_info *) skb_dst(skb);
-		if (!(rt_mode & IP_VS_RT_MODE_RDR) &&
-		    !__ip_vs_is_local_route6(ort)) {
-			IP_VS_DBG_RL("Redirect from non-local address %pI6c "
-				     "to local requires NAT method, "
-				     "dest: %pI6c\n",
-				     &ipv6_hdr(skb)->daddr, daddr);
-			goto err_put;
-		}
+
+	if (unlikely(local)) {
 		/* skb to local stack, preserve old route */
 		if (!noref)
 			dst_release(&rt->dst);
-- 
1.8.1



* [PATCH ipvs,v2 07/18] ipvs: Pull out update_pmtu code
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (5 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 06/18] ipvs: Pull out crosses_local_route_boundary logic Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 08/18] ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET Alex Gartrell
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

Another step toward heterogeneous pools, this removes another piece of
functionality currently specific to each address family type.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index b348347..193ad01 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -200,6 +200,25 @@ static inline bool crosses_local_route_boundary(int skb_af, struct sk_buff *skb,
 	return false;
 }
 
+static inline void maybe_update_pmtu(int skb_af, struct sk_buff *skb, int mtu)
+{
+	struct sock *sk = skb->sk;
+
+#ifdef CONFIG_IP_VS_IPV6
+	if (skb_af == AF_INET6) {
+		struct rt6_info *ort = (struct rt6_info *) skb_dst(skb);
+		if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
+			ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+
+	} else
+#endif
+	{
+		struct rtable *ort = skb_rtable(skb);
+		if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
+			ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+	}
+}
+
 /* Get route to destination or remote server */
 static int
 __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
@@ -209,7 +228,6 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_dest_dst *dest_dst;
 	struct rtable *rt;			/* Route to the other host */
-	struct rtable *ort;			/* Original route */
 	struct iphdr *iph;
 	__be16 df;
 	int mtu;
@@ -279,16 +297,12 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 		mtu = dst_mtu(&rt->dst);
 		df = iph->frag_off & htons(IP_DF);
 	} else {
-		struct sock *sk = skb->sk;
-
 		mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr);
 		if (mtu < 68) {
 			IP_VS_DBG_RL("%s(): mtu less than 68\n", __func__);
 			goto err_put;
 		}
-		ort = skb_rtable(skb);
-		if (!skb->dev && sk && sk->sk_state != TCP_TIME_WAIT)
-			ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+		maybe_update_pmtu(skb_af, skb, mtu);
 		/* MTU check allowed? */
 		df = sysctl_pmtu_disc(ipvs) ? iph->frag_off & htons(IP_DF) : 0;
 	}
@@ -367,7 +381,6 @@ __ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	struct ip_vs_dest_dst *dest_dst;
 	struct rt6_info *rt;			/* Route to the other host */
-	struct rt6_info *ort;			/* Original route */
 	struct dst_entry *dst;
 	int mtu;
 	int local, noref = 1;
@@ -432,17 +445,13 @@ __ip_vs_get_out_rt_v6(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL)))
 		mtu = dst_mtu(&rt->dst);
 	else {
-		struct sock *sk = skb->sk;


* [PATCH ipvs,v2 08/18] ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (6 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 07/18] ipvs: Pull out update_pmtu code Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 09/18] ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding Alex Gartrell
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

We can no longer count on always being able to access a v4 ip header, so
just drop the local variable and use ip_hdr only when skb_af is AF_INET.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
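Note for readers: the rule this patch applies can be stated as "df is
copied from the inner header only when that header is IPv4, and in
tunnel mode only when pmtu_disc is enabled".  Below is a minimal
userspace sketch of that rule working on raw header bytes.  The names
sketch_pick_df and SKETCH_IP_DF are hypothetical, and the tunnel and
pmtu_disc flags stand in for the rt_mode and sysctl checks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons */
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

#define SKETCH_IP_DF 0x4000	/* "don't fragment" flag, host order */

/* Hypothetical helper: returns the DF bit in network byte order, or 0
 * when there is no IPv4 header to read it from or when tunnel-mode
 * PMTU discovery is disabled.
 */
static uint16_t sketch_pick_df(int skb_af, const uint8_t *l3hdr,
			       int tunnel, int pmtu_disc)
{
	uint16_t frag_off;

	assert(l3hdr != NULL);

	if (skb_af != AF_INET)
		return 0;	/* never interpret a v6 header as v4 */
	if (tunnel && !pmtu_disc)
		return 0;

	/* frag_off occupies bytes 6-7 of the IPv4 header */
	memcpy(&frag_off, l3hdr + 6, sizeof(frag_off));
	return frag_off & htons(SKETCH_IP_DF);
}
```

Because df stays 0 for an IPv6 inner packet, the later "frag needed"
branch is only reached when ip_hdr(skb) really is an IPv4 header.
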
 net/netfilter/ipvs/ip_vs_xmit.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 193ad01..7990641 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -228,8 +228,7 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_dest_dst *dest_dst;
 	struct rtable *rt;			/* Route to the other host */
-	struct iphdr *iph;
-	__be16 df;
+	__be16 df = 0;
 	int mtu;
 	int local, noref = 1;
 
@@ -284,7 +283,6 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 		IP_VS_DBG_RL("We are crossing local and non-local addresses\n");
 		goto err_put;
 	}
-	iph = ip_hdr(skb);
 
 	if (unlikely(local)) {
 		/* skb to local stack, preserve old route */
@@ -295,7 +293,8 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 
 	if (likely(!(rt_mode & IP_VS_RT_MODE_TUNNEL))) {
 		mtu = dst_mtu(&rt->dst);
-		df = iph->frag_off & htons(IP_DF);
+		if (skb_af == AF_INET)
+			df = ip_hdr(skb)->frag_off & htons(IP_DF);
 	} else {
 		mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr);
 		if (mtu < 68) {
@@ -304,13 +303,14 @@ __ip_vs_get_out_rt(int skb_af, struct sk_buff *skb, struct ip_vs_dest *dest,
 		}
 		maybe_update_pmtu(skb_af, skb, mtu);
 		/* MTU check allowed? */
-		df = sysctl_pmtu_disc(ipvs) ? iph->frag_off & htons(IP_DF) : 0;
+		if (skb_af == AF_INET && sysctl_pmtu_disc(ipvs))
+			df = ip_hdr(skb)->frag_off & htons(IP_DF);
 	}
 
 	/* MTU checking */
 	if (unlikely(df && skb->len > mtu && !skb_is_gso(skb))) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
-		IP_VS_DBG(1, "frag needed for %pI4\n", &iph->saddr);
+		IP_VS_DBG(1, "frag needed for %pI4\n", &ip_hdr(skb)->saddr);
 		goto err_put;
 	}
 
-- 
1.8.1



* [PATCH ipvs,v2 09/18] ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (7 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 08/18] ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 10/18] ipvs: address family of LBLC entry depends on svc family Alex Gartrell
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dsfield, hop_limit and ttl to ttl, etc.).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
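Note for readers: the generalization described above amounts to reading
the same four or five values out of either inner header so the outer
header can be filled in without caring which family the payload is.
Below is a minimal userspace sketch working on raw bytes.  All names
(sketch_tunnel_fields, sketch_inner_fields,
sketch_outer_v6_payload_len) are hypothetical, and the byte offsets are
the standard IPv4/IPv6 header layouts, not something taken from this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons */
#include <netinet/in.h>	/* IPPROTO_IPIP, IPPROTO_IPV6 */

struct sketch_tunnel_fields {
	uint8_t  next_protocol;	/* what the outer header announces */
	uint32_t payload_len;	/* whole inner packet, host order */
	uint8_t  dsfield;	/* tos (v4) or traffic class (v6) */
	uint8_t  ttl;		/* ttl (v4) or hop_limit (v6) */
	uint16_t df;		/* network order, always 0 for v6 */
};

/* Hypothetical helper: fill *out from the inner packet, -1 if unknown */
static int sketch_inner_fields(const uint8_t *pkt, size_t len,
			       struct sketch_tunnel_fields *out)
{
	assert(pkt != NULL && out != NULL);

	if (len >= 40 && (pkt[0] >> 4) == 6) {
		out->next_protocol = IPPROTO_IPV6;
		/* v6 payload_len excludes the 40-byte fixed header */
		out->payload_len = (((uint32_t)pkt[4] << 8) | pkt[5]) + 40;
		out->dsfield = (uint8_t)(((pkt[0] & 0x0f) << 4) |
					 (pkt[1] >> 4));
		out->ttl = pkt[7];
		out->df = 0;
		return 0;
	}
	if (len >= 20 && (pkt[0] >> 4) == 4) {
		out->next_protocol = IPPROTO_IPIP;
		/* v4 tot_len already covers header plus data */
		out->payload_len = ((uint32_t)pkt[2] << 8) | pkt[3];
		out->dsfield = pkt[1];
		out->ttl = pkt[8];
		out->df = (pkt[6] & 0x40) ? htons(0x4000) : 0;
		return 0;
	}
	return -1;
}

/* The outer IPv6 payload_len goes on the wire in network order */
static uint16_t sketch_outer_v6_payload_len(
	const struct sketch_tunnel_fields *f)
{
	return htons((uint16_t)f->payload_len);
}
```

The length is kept in host order until the last step so that the v4 and
v6 cases can share the same variable; only the write into the outer
header converts it.
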
 net/netfilter/ipvs/ip_vs_conn.c |  12 +++-
 net/netfilter/ipvs/ip_vs_xmit.c | 146 +++++++++++++++++++++++++++++-----------
 2 files changed, 116 insertions(+), 42 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index fdb4880..13e9cee 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -488,7 +488,12 @@ static inline void ip_vs_bind_xmit(struct ip_vs_conn *cp)
 		break;
 
 	case IP_VS_CONN_F_TUNNEL:
-		cp->packet_xmit = ip_vs_tunnel_xmit;
+#ifdef CONFIG_IP_VS_IPV6
+		if (cp->daf == AF_INET6)
+			cp->packet_xmit = ip_vs_tunnel_xmit_v6;
+		else
+#endif
+			cp->packet_xmit = ip_vs_tunnel_xmit;
 		break;
 
 	case IP_VS_CONN_F_DROUTE:
@@ -514,7 +519,10 @@ static inline void ip_vs_bind_xmit_v6(struct ip_vs_conn *cp)
 		break;
 
 	case IP_VS_CONN_F_TUNNEL:
-		cp->packet_xmit = ip_vs_tunnel_xmit_v6;
+		if (cp->daf == AF_INET6)
+			cp->packet_xmit = ip_vs_tunnel_xmit_v6;
+		else
+			cp->packet_xmit = ip_vs_tunnel_xmit;
 		break;
 
 	case IP_VS_CONN_F_DROUTE:
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 7990641..3026fe4 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -803,6 +803,76 @@ tx_error:
 }
 #endif
 
+/* When forwarding a packet, we must ensure that we've got enough headroom
+ * for the encapsulation packet in the skb.  This also gives us an
+ * opportunity to figure out what the payload_len, dsfield, ttl, and df
+ * values should be, so that we won't need to look at the old ip header
+ * again */
+static struct sk_buff *
+ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
+			   unsigned int max_headroom, __u8 *next_protocol,
+			   __u32 *payload_len, __u8 *dsfield, __u8 *ttl,
+			   __be16 *df)
+{
+	struct sk_buff *out_skb = NULL;
+	__u8 version = ip_hdr(skb)->version;
+	struct iphdr *old_iph = NULL;
+#ifdef CONFIG_IP_VS_IPV6
+	struct ipv6hdr *old_ipv6h = NULL;
+#endif
+
+	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
+		out_skb = skb_realloc_headroom(skb, max_headroom);
+		if (!out_skb)
+			goto error;
+		consume_skb(skb);
+	} else
+		out_skb = skb;
+
+#ifdef CONFIG_IP_VS_IPV6
+	if (version == 6) {
+		old_ipv6h = ipv6_hdr(out_skb);
+		*next_protocol = IPPROTO_IPV6;
+		if (payload_len)
+			*payload_len =
+				ntohs(old_ipv6h->payload_len) + sizeof(*old_ipv6h);
+		*dsfield = ipv6_get_dsfield(old_ipv6h);
+		*ttl = old_ipv6h->hop_limit;
+		if (df)
+			*df = 0;
+	} else
+#endif
+	{
+		old_iph = ip_hdr(out_skb);
+		/* Copy DF, reset fragment offset and MF */
+		if (df)
+			*df = (old_iph->frag_off & htons(IP_DF));
+		*next_protocol = IPPROTO_IPIP;
+
+		/* fix old IP header checksum */
+		ip_send_check(old_iph);
+		*dsfield = ipv4_get_dsfield(old_iph);
+		*ttl = old_iph->ttl;
+		if (payload_len)
+			*payload_len = ntohs(old_iph->tot_len);
+	}
+
+	return out_skb;
+error:
+	kfree_skb(skb);
+	return ERR_PTR(-ENOMEM);
+}
+
+static inline int __tun_gso_type_mask(int encaps_af, int orig_af)
+{
+	if (encaps_af == AF_INET && orig_af == AF_INET)
+		return SKB_GSO_IPIP;
+
+	/* GSO: we need to provide proper SKB_GSO_ value for IPv6:
+	 * SKB_GSO_SIT/IPV6
+	 */
+	return 0;
+}
 
 /*
  *   IP Tunneling transmitter
@@ -831,9 +901,11 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	struct rtable *rt;			/* Route to the other host */
 	__be32 saddr;				/* Source for tunnel */
 	struct net_device *tdev;		/* Device to other host */
-	struct iphdr  *old_iph = ip_hdr(skb);
-	u8     tos = old_iph->tos;
-	__be16 df;
+	__u8 next_protocol = 0;
+	__u8 dsfield = 0;
+	__u8 ttl = 0;
+	__be16 df = 0;
+	__be16 *dfp = NULL;
 	struct iphdr  *iph;			/* Our new IP header */
 	unsigned int max_headroom;		/* The extra header space needed */
 	int ret, local;
@@ -856,29 +928,21 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	rt = skb_rtable(skb);
 	tdev = rt->dst.dev;
 
-	/* Copy DF, reset fragment offset and MF */
-	df = sysctl_pmtu_disc(ipvs) ? old_iph->frag_off & htons(IP_DF) : 0;
-
 	/*
 	 * Okay, now see if we can stuff it in the buffer as-is.
 	 */
 	max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct iphdr);
 
-	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
-		struct sk_buff *new_skb =
-			skb_realloc_headroom(skb, max_headroom);
-
-		if (!new_skb)
-			goto tx_error;
-		consume_skb(skb);
-		skb = new_skb;
-		old_iph = ip_hdr(skb);
-	}
-
-	/* fix old IP header checksum */
-	ip_send_check(old_iph);
+	/* We only care about the df field if sysctl_pmtu_disc(ipvs) is set */
+	dfp = sysctl_pmtu_disc(ipvs) ? &df : NULL;
+	skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
+					 &next_protocol, NULL, &dsfield,
+					 &ttl, dfp);
+	if (IS_ERR(skb))
+		goto tx_error;
 
-	skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP);
+	skb = iptunnel_handle_offloads(
+		skb, false, __tun_gso_type_mask(AF_INET, cp->af));
 	if (IS_ERR(skb))
 		goto tx_error;
 
@@ -895,11 +959,11 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	iph->version		=	4;
 	iph->ihl		=	sizeof(struct iphdr)>>2;
 	iph->frag_off		=	df;
-	iph->protocol		=	IPPROTO_IPIP;
-	iph->tos		=	tos;
+	iph->protocol		=	next_protocol;
+	iph->tos		=	dsfield;
 	iph->daddr		=	cp->daddr.ip;
 	iph->saddr		=	saddr;
-	iph->ttl		=	old_iph->ttl;
+	iph->ttl		=	ttl;
 	ip_select_ident(skb, NULL);
 
 	/* Another hack: avoid icmp_send in ip_fragment */
@@ -932,7 +996,10 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	struct rt6_info *rt;		/* Route to the other host */
 	struct in6_addr saddr;		/* Source for tunnel */
 	struct net_device *tdev;	/* Device to other host */
-	struct ipv6hdr  *old_iph = ipv6_hdr(skb);
+	__u8 next_protocol = 0;
+	__u32 payload_len = 0;
+	__u8 dsfield = 0;
+	__u8 ttl = 0;
 	struct ipv6hdr  *iph;		/* Our new IP header */
 	unsigned int max_headroom;	/* The extra header space needed */
 	int ret, local;
@@ -960,22 +1027,22 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	 */
 	max_headroom = LL_RESERVED_SPACE(tdev) + sizeof(struct ipv6hdr);
 
-	if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
-		struct sk_buff *new_skb =
-			skb_realloc_headroom(skb, max_headroom);
-
-		if (!new_skb)
-			goto tx_error;
-		consume_skb(skb);
-		skb = new_skb;
-		old_iph = ipv6_hdr(skb);
-	}
-
 	/* GSO: we need to provide proper SKB_GSO_ value for IPv6 */
 	skb = iptunnel_handle_offloads(skb, false, 0); /* SKB_GSO_SIT/IPV6 */
 	if (IS_ERR(skb))
 		goto tx_error;
 
+	skb = ip_vs_prepare_tunneled_skb(skb, cp->af, max_headroom,
+					 &next_protocol, &payload_len,
+					 &dsfield, &ttl, NULL);
+	if (IS_ERR(skb))
+		goto tx_error;
+
+	skb = iptunnel_handle_offloads(
+		skb, false, __tun_gso_type_mask(AF_INET6, cp->af));
+	if (IS_ERR(skb))
+		goto tx_error;
+
 	skb->transport_header = skb->network_header;
 
 	skb_push(skb, sizeof(struct ipv6hdr));
@@ -987,14 +1054,13 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	 */
 	iph			=	ipv6_hdr(skb);
 	iph->version		=	6;
-	iph->nexthdr		=	IPPROTO_IPV6;
-	iph->payload_len	=	old_iph->payload_len;
-	be16_add_cpu(&iph->payload_len, sizeof(*old_iph));
+	iph->nexthdr		=	next_protocol;
+	iph->payload_len	=	htons(payload_len);
 	memset(&iph->flow_lbl, 0, sizeof(iph->flow_lbl));
-	ipv6_change_dsfield(iph, 0, ipv6_get_dsfield(old_iph));
+	ipv6_change_dsfield(iph, 0, dsfield);
 	iph->daddr = cp->daddr.in6;
 	iph->saddr = saddr;
-	iph->hop_limit		=	old_iph->hop_limit;
+	iph->hop_limit		=	ttl;
 
 	/* Another hack: avoid icmp_send in ip_fragment */
 	skb->ignore_df = 1;
-- 
1.8.1
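
Note for readers on the error path of ip_vs_prepare_tunneled_skb(): the
callers test the result with IS_ERR(), which only recognizes pointers
that encode a negative errno.  The userspace sketch below is modeled on
the kernel's ERR_PTR/IS_ERR helpers but is not the kernel code; the
sketch_* names and SKETCH_MAX_ERRNO are stand-ins introduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* stand-in for the kernel's MAX_ERRNO */

/* Encode an errno value in a pointer */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from such a pointer */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True only for the top SKETCH_MAX_ERRNO addresses, i.e. -errno */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* A positive errno looks like a small, valid-looking pointer */
static void sketch_err_ptr_selfcheck(void)
{
	assert(sketch_is_err(sketch_err_ptr(-ENOMEM)));
	assert(!sketch_is_err(sketch_err_ptr(ENOMEM)));
}
```

A pointer built from a positive value would pass the IS_ERR() test in
the xmit functions and then be dereferenced as an skb.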



* [PATCH ipvs,v2 10/18] ipvs: address family of LBLC entry depends on svc family
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (8 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 09/18] ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 11/18] ipvs: address family of LBLCR " Alex Gartrell
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

The LBLC entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
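Note for readers: the LBLC table is keyed on the packet's destination
address, which is always in the service family.  If the comparison
length were chosen from the real server's family instead, a v6 service
with v4 real servers would compare only the first four bytes of each
key.  Below is a minimal userspace sketch of that effect.  The names
sketch_addr and sketch_addr_equal are hypothetical and only loosely
mirror union nf_inet_addr and the kernel's address comparison.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-in for union nf_inet_addr */
union sketch_addr {
	unsigned char v4[4];
	unsigned char v6[16];
};

/* Compare as many bytes as the given family says the address has */
static int sketch_addr_equal(int af, const union sketch_addr *a,
			     const union sketch_addr *b)
{
	size_t n = (af == AF_INET6) ? sizeof(a->v6) : sizeof(a->v4);

	assert(a != NULL && b != NULL);
	return memcmp(a, b, n) == 0;
}
```

Two v6 keys that share their first 32 bits compare equal under AF_INET
and distinct under AF_INET6, which is why the entry has to carry
svc->af.
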
 net/netfilter/ipvs/ip_vs_lblc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 547ff33..127f140 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -199,11 +199,11 @@ ip_vs_lblc_get(int af, struct ip_vs_lblc_table *tbl,
  */
 static inline struct ip_vs_lblc_entry *
 ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr,
-	       struct ip_vs_dest *dest)
+	       u16 af, struct ip_vs_dest *dest)
 {
 	struct ip_vs_lblc_entry *en;
 
-	en = ip_vs_lblc_get(dest->af, tbl, daddr);
+	en = ip_vs_lblc_get(af, tbl, daddr);
 	if (en) {
 		if (en->dest == dest)
 			return en;
@@ -213,8 +213,8 @@ ip_vs_lblc_new(struct ip_vs_lblc_table *tbl, const union nf_inet_addr *daddr,
 	if (!en)
 		return NULL;
 
-	en->af = dest->af;
-	ip_vs_addr_copy(dest->af, &en->addr, daddr);
+	en->af = af;
+	ip_vs_addr_copy(af, &en->addr, daddr);
 	en->lastuse = jiffies;
 
 	ip_vs_dest_hold(dest);
@@ -521,13 +521,13 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 	/* If we fail to create a cache entry, we'll just use the valid dest */
 	spin_lock_bh(&svc->sched_lock);
 	if (!tbl->dead)
-		ip_vs_lblc_new(tbl, &iph->daddr, dest);
+		ip_vs_lblc_new(tbl, &iph->daddr, svc->af, dest);
 	spin_unlock_bh(&svc->sched_lock);
 
 out:
 	IP_VS_DBG_BUF(6, "LBLC: destination IP address %s --> server %s:%d\n",
 		      IP_VS_DBG_ADDR(svc->af, &iph->daddr),
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
 
 	return dest;
 }
-- 
1.8.1



* [PATCH ipvs,v2 11/18] ipvs: address family of LBLCR entry depends on svc family
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (9 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 10/18] ipvs: address family of LBLC entry depends on svc family Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 12/18] ipvs: use correct address family in DH logs Alex Gartrell
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

The LBLCR entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_lblcr.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 3f21a2f..2229d2d 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -362,18 +362,18 @@ ip_vs_lblcr_get(int af, struct ip_vs_lblcr_table *tbl,
  */
 static inline struct ip_vs_lblcr_entry *
 ip_vs_lblcr_new(struct ip_vs_lblcr_table *tbl, const union nf_inet_addr *daddr,
-		struct ip_vs_dest *dest)
+		u16 af, struct ip_vs_dest *dest)
 {
 	struct ip_vs_lblcr_entry *en;
 
-	en = ip_vs_lblcr_get(dest->af, tbl, daddr);
+	en = ip_vs_lblcr_get(af, tbl, daddr);
 	if (!en) {
 		en = kmalloc(sizeof(*en), GFP_ATOMIC);
 		if (!en)
 			return NULL;
 
-		en->af = dest->af;
-		ip_vs_addr_copy(dest->af, &en->addr, daddr);
+		en->af = af;
+		ip_vs_addr_copy(af, &en->addr, daddr);
 		en->lastuse = jiffies;
 
 		/* initialize its dest set */
@@ -706,13 +706,13 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 	/* If we fail to create a cache entry, we'll just use the valid dest */
 	spin_lock_bh(&svc->sched_lock);
 	if (!tbl->dead)
-		ip_vs_lblcr_new(tbl, &iph->daddr, dest);
+		ip_vs_lblcr_new(tbl, &iph->daddr, svc->af, dest);
 	spin_unlock_bh(&svc->sched_lock);
 
 out:
 	IP_VS_DBG_BUF(6, "LBLCR: destination IP address %s --> server %s:%d\n",
 		      IP_VS_DBG_ADDR(svc->af, &iph->daddr),
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
 
 	return dest;
 }
-- 
1.8.1



* [PATCH ipvs,v2 12/18] ipvs: use correct address family in DH logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (10 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 11/18] ipvs: address family of LBLCR " Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 13/18] ipvs: use correct address family in LC logs Alex Gartrell
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
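Note for readers: this and the following scheduler log fixes all
address the same symptom.  Formatting a real server address with the
service family prints nonsense once the two families differ.  Below is
a minimal userspace sketch, with inet_ntop() standing in for
IP_VS_DBG_ADDR; sketch_addr and sketch_dbg_addr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>	/* inet_ntop, inet_pton */
#include <netinet/in.h>	/* INET6_ADDRSTRLEN */
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-in for union nf_inet_addr */
union sketch_addr {
	unsigned char v4[4];
	unsigned char v6[16];
};

/* Format addr as if it belonged to family af */
static const char *sketch_dbg_addr(int af, const union sketch_addr *addr,
				   char *buf, size_t len)
{
	assert(addr != NULL && buf != NULL);
	return inet_ntop(af, addr, buf, (socklen_t)len);
}
```

For a real server at 2001:db8::1 behind a v4 service, passing the
service family (AF_INET) formats only the first four bytes as a dotted
quad, while passing the destination family (AF_INET6) prints the
address as configured.
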
 net/netfilter/ipvs/ip_vs_dh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index c3b8454..6be5c53 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -234,7 +234,7 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 
 	IP_VS_DBG_BUF(6, "DH: destination IP address %s --> server %s:%d\n",
 		      IP_VS_DBG_ADDR(svc->af, &iph->daddr),
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr),
 		      ntohs(dest->port));
 
 	return dest;
-- 
1.8.1



* [PATCH ipvs,v2 13/18] ipvs: use correct address family in LC logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (11 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 12/18] ipvs: use correct address family in DH logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 14/18] ipvs: use correct address family in NQ logs Alex Gartrell
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_lc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_lc.c b/net/netfilter/ipvs/ip_vs_lc.c
index 2bdcb1c..19a0769 100644
--- a/net/netfilter/ipvs/ip_vs_lc.c
+++ b/net/netfilter/ipvs/ip_vs_lc.c
@@ -59,7 +59,7 @@ ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 	else
 		IP_VS_DBG_BUF(6, "LC: server %s:%u activeconns %d "
 			      "inactconns %d\n",
-			      IP_VS_DBG_ADDR(svc->af, &least->addr),
+			      IP_VS_DBG_ADDR(least->af, &least->addr),
 			      ntohs(least->port),
 			      atomic_read(&least->activeconns),
 			      atomic_read(&least->inactconns));
-- 
1.8.1



* [PATCH ipvs,v2 14/18] ipvs: use correct address family in NQ logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (12 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 13/18] ipvs: use correct address family in LC logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 15/18] ipvs: use correct address family in RR logs Alex Gartrell
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_nq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_nq.c b/net/netfilter/ipvs/ip_vs_nq.c
index 961a6de..a8b6340 100644
--- a/net/netfilter/ipvs/ip_vs_nq.c
+++ b/net/netfilter/ipvs/ip_vs_nq.c
@@ -107,7 +107,8 @@ ip_vs_nq_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
   out:
 	IP_VS_DBG_BUF(6, "NQ: server %s:%u "
 		      "activeconns %d refcnt %d weight %d overhead %d\n",
-		      IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port),
+		      IP_VS_DBG_ADDR(least->af, &least->addr),
+		      ntohs(least->port),
 		      atomic_read(&least->activeconns),
 		      atomic_read(&least->refcnt),
 		      atomic_read(&least->weight), loh);
-- 
1.8.1



* [PATCH ipvs,v2 15/18] ipvs: use correct address family in RR logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (13 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 14/18] ipvs: use correct address family in NQ logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 16/18] ipvs: use correct address family in SED logs Alex Gartrell
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_rr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
index 176b87c..58bacfc 100644
--- a/net/netfilter/ipvs/ip_vs_rr.c
+++ b/net/netfilter/ipvs/ip_vs_rr.c
@@ -95,7 +95,7 @@ stop:
 	spin_unlock_bh(&svc->sched_lock);
 	IP_VS_DBG_BUF(6, "RR: server %s:%u "
 		      "activeconns %d refcnt %d weight %d\n",
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port),
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port),
 		      atomic_read(&dest->activeconns),
 		      atomic_read(&dest->refcnt), atomic_read(&dest->weight));
 
-- 
1.8.1



* [PATCH ipvs,v2 16/18] ipvs: use correct address family in SED logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (14 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 15/18] ipvs: use correct address family in RR logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 17/18] ipvs: use correct address family in SH logs Alex Gartrell
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_sed.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sed.c b/net/netfilter/ipvs/ip_vs_sed.c
index e446b9f..f8e2d00 100644
--- a/net/netfilter/ipvs/ip_vs_sed.c
+++ b/net/netfilter/ipvs/ip_vs_sed.c
@@ -108,7 +108,8 @@ ip_vs_sed_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 
 	IP_VS_DBG_BUF(6, "SED: server %s:%u "
 		      "activeconns %d refcnt %d weight %d overhead %d\n",
-		      IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port),
+		      IP_VS_DBG_ADDR(least->af, &least->addr),
+		      ntohs(least->port),
 		      atomic_read(&least->activeconns),
 		      atomic_read(&least->refcnt),
 		      atomic_read(&least->weight), loh);
-- 
1.8.1



* [PATCH ipvs,v2 17/18] ipvs: use correct address family in SH logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (15 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 16/18] ipvs: use correct address family in SED logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15  3:23 ` [PATCH ipvs,v2 18/18] ipvs: use correct address family in WLC logs Alex Gartrell
  2014-08-15 12:53 ` [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Julian Anastasov
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_sh.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index cc65b2f..98a1343 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -138,7 +138,7 @@ ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 		return dest;
 
 	IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting",
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port));
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr), ntohs(dest->port));
 
 	/* if the original dest is unavailable, loop around the table
 	 * starting from ihash to find a new dest
@@ -153,7 +153,7 @@ ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 			return dest;
 		IP_VS_DBG_BUF(6, "SH: selected unavailable "
 			      "server %s:%d (offset %d), reselecting",
-			      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
 			      ntohs(dest->port), roffset);
 	}
 
@@ -192,7 +192,7 @@ ip_vs_sh_reassign(struct ip_vs_sh_state *s, struct ip_vs_service *svc)
 			RCU_INIT_POINTER(b->dest, dest);
 
 			IP_VS_DBG_BUF(6, "assigned i: %d dest: %s weight: %d\n",
-				      i, IP_VS_DBG_ADDR(svc->af, &dest->addr),
+				      i, IP_VS_DBG_ADDR(dest->af, &dest->addr),
 				      atomic_read(&dest->weight));
 
 			/* Don't move to next dest until filling weight */
@@ -342,7 +342,7 @@ ip_vs_sh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 
 	IP_VS_DBG_BUF(6, "SH: source IP address %s --> server %s:%d\n",
 		      IP_VS_DBG_ADDR(svc->af, &iph->saddr),
-		      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+		      IP_VS_DBG_ADDR(dest->af, &dest->addr),
 		      ntohs(dest->port));
 
 	return dest;
-- 
1.8.1



* [PATCH ipvs,v2 18/18] ipvs: use correct address family in WLC logs
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (16 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 17/18] ipvs: use correct address family in SH logs Alex Gartrell
@ 2014-08-15  3:23 ` Alex Gartrell
  2014-08-15 12:53 ` [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Julian Anastasov
  18 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-15  3:23 UTC (permalink / raw)
  To: horms; +Cc: ja, lvs-devel, kernel-team, Alex Gartrell

From: Julian Anastasov <ja@ssi.bg>

Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_wlc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_wlc.c b/net/netfilter/ipvs/ip_vs_wlc.c
index b5b4650..6b366fd 100644
--- a/net/netfilter/ipvs/ip_vs_wlc.c
+++ b/net/netfilter/ipvs/ip_vs_wlc.c
@@ -80,7 +80,8 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 
 	IP_VS_DBG_BUF(6, "WLC: server %s:%u "
 		      "activeconns %d refcnt %d weight %d overhead %d\n",
-		      IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port),
+		      IP_VS_DBG_ADDR(least->af, &least->addr),
+		      ntohs(least->port),
 		      atomic_read(&least->activeconns),
 		      atomic_read(&least->refcnt),
 		      atomic_read(&least->weight), loh);
-- 
1.8.1



* Re: [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa
  2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
                   ` (17 preceding siblings ...)
  2014-08-15  3:23 ` [PATCH ipvs,v2 18/18] ipvs: use correct address family in WLC logs Alex Gartrell
@ 2014-08-15 12:53 ` Julian Anastasov
  2014-08-27 21:37   ` Alex Gartrell
  18 siblings, 1 reply; 21+ messages in thread
From: Julian Anastasov @ 2014-08-15 12:53 UTC (permalink / raw)
  To: Alex Gartrell; +Cc: horms, lvs-devel, kernel-team


	Hello,

On Thu, 14 Aug 2014, Alex Gartrell wrote:

> At Facebook we use ipip forwarding to deliver packets from our layer 4 ipvs
> load balancers to our layer 7 proxies.  Today these layer 7 proxies are all
> dual stacked, so we can simply send v4 over v4 and v6 over v6.  In the
> future, we're going to have v6-only layer 7 load balancers (no internal v4
> address).  To deal with this, we'll forward v4 packets in v6 tunnels.  This
> patchset introduces the necessary functionality into ipvs.
> 
> The noteworthy limitation of this is that it is not compatible with state
> synchronization, so great care is taken to keep these things mutually
> exclusive.
> 
> This patchset includes changes that add an additional netlink attribute to
> destinations and changes that plumb the destination address family through
> parts of the code where it was assumed that it was the same as the service.
> Finally, there's a change that updates the transmit functions for tunneling
> to share common code and support v4 in v6 and vice versa.
> 
> Changes for v2:
> 
> Introduced crosses_local_route_boundary and update_pmtu functions and
> conditionally do the ip_hdr operations.  The latter means that df will be
> effectively zero when we forward an ipv6 packet over an ipv4 tunnel.
> 
> Additionally, I addressed Julian's other statements.
> 
> Alex Gartrell (9):
>   ipvs: Pass destination address family to ip_vs_trash_get_dest
>   ipvs: Supply destination address family to ip_vs_conn_new
>   ipvs: maintain a mixed_address_family_dest count
>   ipvs: prevent mixing heterogeneous pools and synchronization
>   ipvs: Supply skb_af to out_rt* functions
>   ipvs: Pull out crosses_local_route_boundary logic
>   ipvs: Pull out update_pmtu code
>   ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET
>   ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
> 
> Julian Anastasov (9):
>   ipvs: address family of LBLC entry depends on svc family
>   ipvs: address family of LBLCR entry depends on svc family
>   ipvs: use correct address family in DH logs
>   ipvs: use correct address family in LC logs
>   ipvs: use correct address family in NQ logs
>   ipvs: use correct address family in RR logs
>   ipvs: use correct address family in SED logs
>   ipvs: use correct address family in SH logs
>   ipvs: use correct address family in WLC logs

	Great, here are some comments from me:

- as some patches are missing, I'm not sure whether some
of my notes on cp->af usage are addressed, e.g. in
set_tcp_state.

Patch 1:
	- I guess this patch depends on patch 1 and 2 from v1, they are
	now missing. This patch looks like a fixed patch 3 from v1.

Patch 2:
	- looks ok

Patch 3:
	- old_af and new_af are not needed anymore

Patch 4:
	- it would be logically correct to take out the
	/* Temporary for consistency */ check and the
	IP_VS_CONN_F_TUNNEL check after
	/* Which connection types do we support? */ into another
	patch, not part of this patch. Maybe it can follow
	this patch, or it should even be the last patch in this patchset.
	As a result, the current patch will include only the restriction
	for the sync protocol and the last patch will enable the
	new feature.

Patch 5:
	- it is not very good to just add unused args to funcs,
	maybe we should merge patches 5 and 6?

Patch 6:
	- scripts/checkpatch.pl warnings, maybe renaming
	source_is_loopback to local_src can help. Make sure
	we do not exceed 80 columns.

	- in crosses_local_route_boundary() the check for
	old_rt_is_local should be inverted, it should be:

	if (!rt_mode_allow_redirect && !old_rt_is_local)

	- In the "We are crossing local and non-local addresses" message
	we can safely print the daddr because the family is constant for
	the function. We will not do it for source address because
	it is more complex (depends on skb family).

Patch 7:
	- I think, it should be safe to call
	skb_dst(skb)->ops->update_pmtu(skb_dst(skb), sk, NULL, mtu);
	in maybe_update_pmtu(), without any family checks.

	- scripts/checkpatch.pl warnings

Patch 8:
	- it looks like now icmp_send* calls should depend on skb_af,
	not on dest af. It means both __ip_vs_get_out_rt and
	__ip_vs_get_out_rt_v6 should use some new common function that
	will do "frag needed" checks for AF_INET skbs. It will also
	include the __mtu_check_toobig_v6 call, the IPv6 specific part.
	Then we have to provide 'ipvsh' as arg to __ip_vs_get_out_rt,
	it is already provided to __ip_vs_get_out_rt_v6.

	- "frag needed for %pI4" looks correct because df is set only
	for IPv4.

Patch 9:
	- we can use skb_af instead of version, right?

	- using out_skb in ip_vs_prepare_tunneled_skb leads to
	errors (example: IPv6), better to use new_skb and to work
	only with skb.

	- ERR_PTR(ENOMEM) should be ERR_PTR(-ENOMEM)

	- the old iptunnel_handle_offloads call still remains in
	ip_vs_tunnel_xmit_v6, should be removed

	- __tun_gso_type_mask: only encaps_af should be used/checked?
	Or maybe:
		if (encaps_af == AF_INET) {
			if (orig_af == AF_INET)
				return SKB_GSO_IPIP;
#ifdef CONFIG_IP_VS_IPV6
			if (orig_af == AF_INET6)
				return SKB_GSO_SIT;
#endif
		}
		...
		return 0;

	- iph->payload_len = ntohs(payload_len);
	should be iph->payload_len = htons(payload_len);

	- scripts/checkpatch.pl warnings

Regards

--
Julian Anastasov <ja@ssi.bg>


* Re: [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa
  2014-08-15 12:53 ` [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Julian Anastasov
@ 2014-08-27 21:37   ` Alex Gartrell
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Gartrell @ 2014-08-27 21:37 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: horms, lvs-devel, kernel-team

Hey,

Thanks again for your fantastic review, Julian.  As you mentioned, I 
omitted the first two patches.  I'll include them in v3 and be more 
careful in the future.

Additionally, I've done an interactive rebase wherein I exec make -j && 
./scripts/checkpatch.pl $(git format-patch) between every pick, so 
hopefully this will be less sloppy.  The only warnings were related to 
splitting string literals across lines, which is done elsewhere in the file.

I've implemented the requested changes and will do the functional tests.
If all goes well I will mail v3 this evening.

On 8/15/14 3:42 PM, Alex Gartrell wrote:
 > On Fri, 15 Aug 2014 15:53:49 +0300
 > Julian Anastasov <ja@ssi.bg> wrote:
 >
 >>
 >> 	Hello,
 >>
 >> On Thu, 14 Aug 2014, Alex Gartrell wrote:
 >>
 >>> At Facebook we use ipip forwarding to deliver packets from our
 >>> layer 4 ipvs load balancers to our layer 7 proxies.  Today these
 >>> layer 7 proxies are all dual stacked, so we can simply send v4 over
 >>> v4 and v6 over v6.  In the future, we're going to have v6-only
 >>> layer 7 load balancers (no internal v4 address).  To deal with
 >>> this, we'll forward v4 packets in v6 tunnels.  This patchset
 >>> introduces the necessary functionality into ipvs.
 >>>
 >>> The noteworthy limitation of this is that it is not compatible with
 >>> state synchronization, so great care is taken to keep these things
 >>> mutually exclusive.
 >>>
 >>> This patchset includes changes that add an additional netlink
 >>> attribute to destinations and changes that plumb the destination
 >>> address family through parts of the code where it was assumed that
 >>> it was the same as the service. Finally, there's a change that
 >>> updates the transmit functions for tunneling to share common code
 >>> and support v4 in v6 and vice versa.
 >>>
 >>> Changes for v2:
 >>>
 >>> Introduced crosses_local_route_boundary and update_pmtu functions
 >>> and conditionally do the ip_hdr operations.  The latter means that
 >>> df will be effectively zero when we forward an ipv6 packet over an
 >>> ipv4 tunnel.
 >>>
 >>> Additionally, I addressed Julian's other statements.
 >>>
 >>> Alex Gartrell (9):
 >>>    ipvs: Pass destination address family to ip_vs_trash_get_dest
 >>>    ipvs: Supply destination address family to ip_vs_conn_new
 >>>    ipvs: maintain a mixed_address_family_dest count
 >>>    ipvs: prevent mixing heterogeneous pools and synchronization
 >>>    ipvs: Supply skb_af to out_rt* functions
 >>>    ipvs: Pull out crosses_local_route_boundary logic
 >>>    ipvs: Pull out update_pmtu code
 >>>    ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET
 >>>    ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
 >>>
 >>> Julian Anastasov (9):
 >>>    ipvs: address family of LBLC entry depends on svc family
 >>>    ipvs: address family of LBLCR entry depends on svc family
 >>>    ipvs: use correct address family in DH logs
 >>>    ipvs: use correct address family in LC logs
 >>>    ipvs: use correct address family in NQ logs
 >>>    ipvs: use correct address family in RR logs
 >>>    ipvs: use correct address family in SED logs
 >>>    ipvs: use correct address family in SH logs
 >>>    ipvs: use correct address family in WLC logs
 >>
 >> 	Great, here are some comments from me:
 >>
 >> - as some patches are missing I'm not sure if some
 >> of my notes for cp->af usage are addressed, eg. in
 >> set_tcp_state.

Yes, you are right.  I forgot the first two patches.  Will include them in v3.

 >> Patch 1:
 >> 	- I guess this patch depends on patch 1 and 2 from v1, they
 >> 	are now missing. This patch looks like a fixed patch 3 from v1.
 >>

Yeah that's right

 >> Patch 2:
 >> 	- looks ok
 >>
 >> Patch 3:
 >> 	- old_af and new_af are not needed anymore
 >>

Duh, thanks

 >> Patch 4:
 >> 	- it would be logically correct to take out the
 >> 	/* Temporary for consistency */ check and the
 >> 	IP_VS_CONN_F_TUNNEL check after
 >> 	/* Which connection types do we support? */ into another
 >> 	patch, not part of this patch. Maybe it can follow
 >> 	this patch, or it should even be the last patch in this
 >> 	patchset. As a result, the current patch will include only
 >> 	the restriction for the sync protocol and the last patch
 >> 	will enable the new feature.

I'll split this up and put it at the end.

 >> Patch 5:
 >> 	- it is not very good to just add unused args to funcs,
 >> 	maybe we should merge patches 5 and 6?

Yeah, I was kind of afraid that it'd be confusing for people, but I 
think that the code is separate enough that it won't be.

 >> Patch 6:
 >> 	- scripts/checkpatch.pl warnings, maybe renaming
 >> 	source_is_loopback to local_src can help. Make sure
 >> 	we do not exceed 80 columns.

Most embarrassing screw up :(

But yeah, these are just really unpleasantly long identifiers.  I'll 
either need to break the lines in weird ways or shorten identifiers or both.

 >>
 >> 	- in crosses_local_route_boundary() the check for
 >> 	old_rt_is_local should be inverted, it should be:
 >>
 >> 	if (!rt_mode_allow_redirect && !old_rt_is_local)

That's scary.  Good catch.

 >> 	- In the "We are crossing local and non-local addresses"
 >> 	message we can safely print the daddr because the family is
 >> 	constant for the function. We will not do it for source address
 >> 	because it is more complex (depends on skb family).

I'll add "daddr=" to each

 >>
 >> Patch 7:
 >> 	- I think, it should be safe to call
 >> 	skb_dst(skb)->ops->update_pmtu(skb_dst(skb), sk, NULL, mtu);
 >> 	in maybe_update_pmtu(), without any family checks.

Yeah, both route types have dst in the same place and I imagine that's 
by design.
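
As a rough illustration of the point above (a userspace sketch, not the
kernel code): every fake_* name, sketch_update_pmtu() and the MTU floors
of 68 and 1280 are stand-ins invented for this example.  Because both
route types embed the common dst as their first member and the dst
carries its own ops table, the caller can dispatch through
ops->update_pmtu without knowing the address family.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dst;

/* Hypothetical stand-in for the per-family dst ops table. */
struct fake_dst_ops {
	void (*update_pmtu)(struct fake_dst *dst, unsigned int mtu);
};

/* Hypothetical stand-in for the common dst entry. */
struct fake_dst {
	const struct fake_dst_ops *ops;
	unsigned int pmtu;
};

/* Both route types start with the common dst (assumed layout). */
struct fake_rt4 {
	struct fake_dst dst;
	unsigned int v4_private;
};

struct fake_rt6 {
	struct fake_dst dst;
	unsigned char v6_private[16];
};

static_assert(offsetof(struct fake_rt4, dst) == 0, "dst must come first");
static_assert(offsetof(struct fake_rt6, dst) == 0, "dst must come first");

/* Illustrative floors only; the real handlers apply their own limits. */
static void fake_rt4_update_pmtu(struct fake_dst *dst, unsigned int mtu)
{
	dst->pmtu = mtu < 68 ? 68 : mtu;
}

static void fake_rt6_update_pmtu(struct fake_dst *dst, unsigned int mtu)
{
	dst->pmtu = mtu < 1280 ? 1280 : mtu;
}

static const struct fake_dst_ops fake_rt4_ops = {
	.update_pmtu = fake_rt4_update_pmtu,
};

static const struct fake_dst_ops fake_rt6_ops = {
	.update_pmtu = fake_rt6_update_pmtu,
};

/* Family-agnostic caller: the ops table picks the right handler. */
static void sketch_update_pmtu(struct fake_dst *dst, unsigned int mtu)
{
	if (dst && dst->ops && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu);
}
```

Calling sketch_update_pmtu(&r6.dst, 1000) on a fake_rt6 ends up in the
v6 handler, and the same call on a fake_rt4 ends up in the v4 handler,
with no family test at the call site.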


 >> 	- scripts/checkpatch.pl warnings
 >>
 >> Patch 8:
 >> 	- it looks like now icmp_send* calls should depend on skb_af,
 >> 	not on dest af. It means both __ip_vs_get_out_rt and
 >> 	__ip_vs_get_out_rt_v6 should use some new common function that
 >> 	will do "frag needed" checks for AF_INET skbs. It will also
 >> 	include the __mtu_check_toobig_v6 call, the IPv6 specific part.
 >> 	Then we have to provide 'ipvsh' as arg to __ip_vs_get_out_rt,
 >> 	it is already provided to __ip_vs_get_out_rt_v6.
 >>

Yeah I was able to pull this out.

 >> 	- "frag needed for %pI4" looks correct because df is set only
 >> 	for IPv4.
 >>
 >> Patch 9:
 >> 	- we can use skb_af instead of version, right?

Yeah, I remembered fixing this but must have done this elsewhere.

 >>
 >> 	- using out_skb in ip_vs_prepare_tunneled_skb leads to
 >> 	errors (example: IPv6), better to use new_skb and to work
 >> 	only with skb.

Yeah that's fair.  I was concerned about not attempting to kfree_skb skb 
which is why I made the change, but the fact that it made me miss 
something else convinces me that it's problematic.

 >>
 >> 	- ERR_PTR(ENOMEM) should be ERR_PTR(-ENOMEM)

Ah, that makes sense.  Thanks
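
For anyone following along, a small userspace sketch of why the sign
matters.  The sketch_* helpers below are stand-ins modelled on the
kernel's ERR_PTR/IS_ERR/PTR_ERR, not the kernel definitions themselves,
and they assume a pointer can round-trip through intptr_t.  Only the
top 4095 addresses are treated as errors, so a positive errno produces
a small pointer that the error check does not catch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins modelled on ERR_PTR/IS_ERR/PTR_ERR. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int sketch_is_err(const void *ptr)
{
	/* Only the top SKETCH_MAX_ERRNO addresses count as errors. */
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static void sketch_check_sign(void)
{
	void *wrong = sketch_err_ptr(ENOMEM);	/* positive: looks valid */
	void *right = sketch_err_ptr(-ENOMEM);	/* negative: detected */

	assert(!sketch_is_err(wrong));
	assert(sketch_is_err(right));
	assert(sketch_ptr_err(right) == -ENOMEM);
}
```

With the positive value the caller would go on to dereference a bogus
low address instead of taking the error path.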

 >>
 >> 	- the old iptunnel_handle_offloads call still remains in
 >> 	ip_vs_tunnel_xmit_v6, should be removed
 >>

*facepalm*

 >> 	- __tun_gso_type_mask: only encaps_af should be used/checked?
 >> 	Or maybe:
 >> 		if (encaps_af == AF_INET) {
 >> 			if (orig_af == AF_INET)
 >> 				return SKB_GSO_IPIP;
 >> #ifdef CONFIG_IP_VS_IPV6
 >> 			if (orig_af == AF_INET6)
 >> 				return SKB_GSO_SIT;
 >> #endif

Can this just be an else to spare the #ifdef lines?

 >> 		}
 >> 		...
 >> 		return 0;
 >>
 >> 	- iph->payload_len = ntohs(payload_len);
 >> 	should be iph->payload_len = htons(payload_len);
 >>

+1
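
A minimal userspace sketch of the direction being fixed, assuming a
made-up fake_ip6_hdr with a 16-bit big-endian length field.  As far as
I know htons() and ntohs() perform the same operation on any given host
(a 16-bit swap or a no-op), so the change is about naming the direction
correctly, host to network on store, rather than about different bytes
ending up in the header.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a header whose length field is big-endian. */
struct fake_ip6_hdr {
	uint16_t payload_len;	/* network byte order on the wire */
};

static_assert(sizeof(((struct fake_ip6_hdr *)0)->payload_len) == 2,
	      "payload_len is a 16-bit field");

/* Storing into the header: host order -> network order. */
static void fake_set_payload_len(struct fake_ip6_hdr *hdr, uint16_t host_len)
{
	hdr->payload_len = htons(host_len);
}

/* Reading from the header: network order -> host order. */
static uint16_t fake_get_payload_len(const struct fake_ip6_hdr *hdr)
{
	return ntohs(hdr->payload_len);
}
```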

 >> 	- scripts/checkpatch.pl warnings
 >>
 >> Regards
 >>
 >> --
 >> Julian Anastasov <ja@ssi.bg>
 >




end of thread

Thread overview: 21+ messages
2014-08-15  3:23 [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 01/18] ipvs: Pass destination address family to ip_vs_trash_get_dest Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 02/18] ipvs: Supply destination address family to ip_vs_conn_new Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 03/18] ipvs: maintain a mixed_address_family_dest count Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 04/18] ipvs: prevent mixing heterogeneous pools and synchronization Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 05/18] ipvs: Supply skb_af to out_rt* functions Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 06/18] ipvs: Pull out crosses_local_route_boundary logic Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 07/18] ipvs: Pull out update_pmtu code Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 08/18] ipvs: Only do ip_hdr operations in *out_rt when skb_af is AF_INET Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 09/18] ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 10/18] ipvs: address family of LBLC entry depends on svc family Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 11/18] ipvs: address family of LBLCR " Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 12/18] ipvs: use correct address family in DH logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 13/18] ipvs: use correct address family in LC logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 14/18] ipvs: use correct address family in NQ logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 15/18] ipvs: use correct address family in RR logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 16/18] ipvs: use correct address family in SED logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 17/18] ipvs: use correct address family in SH logs Alex Gartrell
2014-08-15  3:23 ` [PATCH ipvs,v2 18/18] ipvs: use correct address family in WLC logs Alex Gartrell
2014-08-15 12:53 ` [PATCH ipvs,v2 00/18] Support v6 real servers in v4 pools and vice versa Julian Anastasov
2014-08-27 21:37   ` Alex Gartrell
