[PATCH net-next v10 00/11] vxlan: add ipv6 support

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v10 00/11] vxlan: add ipv6 support
@ 2013-08-28  5:22 Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

v10: rebase on the latest net-next
     fix a deadlock, which didn't exist before
     fix another compile error when IPV6=m
     some cleanup

v9: rebase on the latest net-next
    adjust coding style to make DaveM happy
    merge disable_ipv6 patch into the previous one
    add a cleanup patch

v8: fix the bug when bindv6only=1
    fix more compile errors when IPV6=m
    complete the rest missing features for IPv6

v7: respect disable_ipv6 flag
    back to ipv4 only when ipv6 is not supported

v6: use a stub for IPv6 mcast functions
    split a few more long lines
    rebased on the latest net-next

v5: make David happy on the names of the fields
    fix my mistake during rebasing the patches
    drop the scope_id patch, because it is broken
    export in6addr_loopback
    fix a udp checksum bug
    rebased on the latest net-next

v4: rename ->sin to ->va_sin
    rename ->sin6 to ->va_sin6
    rename ->family to ->va_sa
    support ll addr
    fix more ugly #ifdef
    rebased on the latest net-next

v3: fix many coding style issues
    fix some ugly #ifdef
    rename vxlan_ip to vxlan_addr
    rename ->proto to ->family
    rename ->ip4/->ip6 to ->sin/->sin6

v2: fix some compile error when !CONFIG_IPV6
    improve some code based on Stephen's comments
    use sockaddr suggested by David

Cong Wang (11):
  ipv6: make ip6_dst_hoplimit() static inline
  ipv6: move ip6_local_out into core kernel
  ipv6: export a stub for IPv6 symbols used by vxlan
  ipv6: export in6addr_loopback to modules
  ipv6: do not call ndisc_send_rs() with write lock
  vxlan: add ipv6 support
  vxlan: add ipv6 route short circuit support
  ipv6: move in6_dev_finish_destroy() into core kernel
  vxlan: add ipv6 proxy support
  vxlan: respect scope_id for ll addr
  ipv6: Add generic UDP Tunnel segmentation

 drivers/net/vxlan.c           |  876 ++++++++++++++++++++++++++++++++++-------
 include/net/addrconf.h        |   20 +
 include/net/ip6_route.h       |   23 +-
 include/net/ndisc.h           |    5 +
 include/net/vxlan.h           |    2 +-
 include/uapi/linux/if_link.h  |    2 +
 net/ipv6/addrconf.c           |   49 +--
 net/ipv6/addrconf_core.c      |   50 +++
 net/ipv6/af_inet6.c           |   14 +
 net/ipv6/ip6_offload.c        |    4 +-
 net/ipv6/ip6_output.c         |   25 --
 net/ipv6/ndisc.c              |    8 +-
 net/ipv6/output_core.c        |   26 ++
 net/ipv6/route.c              |   19 -
 net/ipv6/udp_offload.c        |  159 +++++---
 net/openvswitch/vport-vxlan.c |    2 +-
 16 files changed, 988 insertions(+), 296 deletions(-)

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 10:39   ` Eric Dumazet
  2013-08-28  5:22 ` [PATCH net-next v10 02/11] ipv6: move ip6_local_out into core kernel Cong Wang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

It will be used by vxlan.

Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ip6_route.h |   23 +++++++++++++++++++++--
 net/ipv6/route.c        |   19 -------------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f667248..2afcd0d 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -21,6 +21,7 @@ struct route_info {
 #include <net/flow.h>
 #include <net/ip6_fib.h>
 #include <net/sock.h>
+#include <net/addrconf.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/route.h>
@@ -112,8 +113,6 @@ extern struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 					   const struct in6_addr *addr,
 					   bool anycast);
 
-extern int			ip6_dst_hoplimit(struct dst_entry *dst);
-
 /*
  *	support functions for ND
  *
@@ -203,4 +202,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
 	return dest;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int ip6_dst_hoplimit(struct dst_entry *dst)
+{
+	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
+	if (hoplimit == 0) {
+		struct net_device *dev = dst->dev;
+		struct inet6_dev *idev;
+
+		rcu_read_lock();
+		idev = __in6_dev_get(dev);
+		if (idev)
+			hoplimit = idev->cnf.hop_limit;
+		else
+			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
+		rcu_read_unlock();
+	}
+	return hoplimit;
+}
+#endif
+
 #endif
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 55236a8..b770085 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1354,25 +1354,6 @@ out:
 	return entries > rt_max_size;
 }
 
-int ip6_dst_hoplimit(struct dst_entry *dst)
-{
-	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
-	if (hoplimit == 0) {
-		struct net_device *dev = dst->dev;
-		struct inet6_dev *idev;
-
-		rcu_read_lock();
-		idev = __in6_dev_get(dev);
-		if (idev)
-			hoplimit = idev->cnf.hop_limit;
-		else
-			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
-		rcu_read_unlock();
-	}
-	return hoplimit;
-}
-EXPORT_SYMBOL(ip6_dst_hoplimit);
-
 /*
  *
  */
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 02/11] ipv6: move ip6_local_out into core kernel
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

It will be used the vxlan kernel module.

Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_output.c  |   25 -------------------------
 net/ipv6/output_core.c |   26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6e3ddf8..dd08cfd 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -56,31 +56,6 @@
 #include <net/checksum.h>
 #include <linux/mroute6.h>
 
-int __ip6_local_out(struct sk_buff *skb)
-{
-	int len;
-
-	len = skb->len - sizeof(struct ipv6hdr);
-	if (len > IPV6_MAXPLEN)
-		len = 0;
-	ipv6_hdr(skb)->payload_len = htons(len);
-
-	return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL,
-		       skb_dst(skb)->dev, dst_output);
-}
-
-int ip6_local_out(struct sk_buff *skb)
-{
-	int err;
-
-	err = __ip6_local_out(skb);
-	if (likely(err == 1))
-		err = dst_output(skb);
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(ip6_local_out);
-
 static int ip6_finish_output2(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index ab92a36..c1c741e 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -75,3 +75,29 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 	return offset;
 }
 EXPORT_SYMBOL(ip6_find_1stfragopt);
+
+int __ip6_local_out(struct sk_buff *skb)
+{
+	int len;
+
+	len = skb->len - sizeof(struct ipv6hdr);
+	if (len > IPV6_MAXPLEN)
+		len = 0;
+	ipv6_hdr(skb)->payload_len = htons(len);
+
+	return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL,
+		       skb_dst(skb)->dev, dst_output);
+}
+EXPORT_SYMBOL_GPL(__ip6_local_out);
+
+int ip6_local_out(struct sk_buff *skb)
+{
+	int err;
+
+	err = __ip6_local_out(skb);
+	if (likely(err == 1))
+		err = dst_output(skb);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(ip6_local_out);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 02/11] ipv6: move ip6_local_out into core kernel Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 16:12   ` Stephen Hemminger
  2013-08-28  5:22 ` [PATCH net-next v10 04/11] ipv6: export some IPv6 special addresses to modules Cong Wang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, Ben Hutchings, Stephen Hemminger

From: Cong Wang <amwang@redhat.com>

In case IPv6 is compiled as a module, introduce a stub
for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used
by vxlan module. Suggested by Ben.

This is an ugly but easy solution for now.

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/addrconf.h   |   15 +++++++++++++++
 net/ipv6/addrconf_core.c |    3 +++
 net/ipv6/af_inet6.c      |   11 +++++++++++
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 43fa31a..5339cab 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -141,6 +141,21 @@ bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
 			 const struct in6_addr *src_addr);
 
 void ipv6_mc_dad_complete(struct inet6_dev *idev);
+
+/* A stub used by vxlan module. This is ugly, ideally these
+ * symbols should be built into the core kernel.
+ */
+struct ipv6_stub {
+	int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
+				 const struct in6_addr *addr);
+	int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
+				 const struct in6_addr *addr);
+	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
+				struct flowi6 *fl6);
+	void (*udpv6_encap_enable)(void);
+};
+extern const struct ipv6_stub *ipv6_stub __read_mostly;
+
 /*
  * identify MLD packets for MLD filter exceptions
  */
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index d2f8742..93b24c6 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -98,3 +98,6 @@ int inet6addr_notifier_call_chain(unsigned long val, void *v)
 	return atomic_notifier_call_chain(&inet6addr_chain, val, v);
 }
 EXPORT_SYMBOL(inet6addr_notifier_call_chain);
+
+const struct ipv6_stub *ipv6_stub __read_mostly;
+EXPORT_SYMBOL_GPL(ipv6_stub);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0d1a9b1..0c9c22f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -810,6 +810,13 @@ static struct pernet_operations inet6_net_ops = {
 	.exit = inet6_net_exit,
 };
 
+static const struct ipv6_stub ipv6_stub_impl = {
+	.ipv6_sock_mc_join = ipv6_sock_mc_join,
+	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
+	.ipv6_dst_lookup = ip6_dst_lookup,
+	.udpv6_encap_enable = udpv6_encap_enable,
+};
+
 static int __init inet6_init(void)
 {
 	struct list_head *r;
@@ -884,6 +891,9 @@ static int __init inet6_init(void)
 	err = igmp6_init();
 	if (err)
 		goto igmp_fail;
+
+	ipv6_stub = &ipv6_stub_impl;
+
 	err = ipv6_netfilter_init();
 	if (err)
 		goto netfilter_fail;
@@ -1040,6 +1050,7 @@ static void __exit inet6_exit(void)
 	raw6_proc_exit();
 #endif
 	ipv6_netfilter_fini();
+	ipv6_stub = NULL;
 	igmp6_cleanup();
 	ndisc_cleanup();
 	ip6_mr_cleanup();
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 04/11] ipv6: export some IPv6 special addresses to modules
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (2 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock Cong Wang
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, Mike Rapoport

From: Cong Wang <amwang@redhat.com>

It is needed by vxlan module. Noticed by Mike.

Cc: Mike Rapoport <mike.rapoport@ravellosystems.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/addrconf.c      |    9 ---------
 net/ipv6/addrconf_core.c |   16 ++++++++++++++++
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2d6d179..8e68466 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -243,15 +243,6 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.accept_dad		= 1,
 };
 
-/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
-const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
-const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
-const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
-const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
-const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
-const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
-const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
-
 /* Check if a valid qdisc is available */
 static inline bool addrconf_qdisc_ok(const struct net_device *dev)
 {
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 93b24c6..a864033 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -101,3 +101,19 @@ EXPORT_SYMBOL(inet6addr_notifier_call_chain);
 
 const struct ipv6_stub *ipv6_stub __read_mostly;
 EXPORT_SYMBOL_GPL(ipv6_stub);
+
+/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
+const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
+EXPORT_SYMBOL(in6addr_loopback);
+const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
+EXPORT_SYMBOL(in6addr_any);
+const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
+EXPORT_SYMBOL(in6addr_linklocal_allnodes);
+const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
+EXPORT_SYMBOL(in6addr_linklocal_allrouters);
+const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
+EXPORT_SYMBOL(in6addr_interfacelocal_allnodes);
+const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
+EXPORT_SYMBOL(in6addr_interfacelocal_allrouters);
+const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
+EXPORT_SYMBOL(in6addr_sitelocal_allrouters);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (3 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 04/11] ipv6: export some IPv6 special addresses to modules Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 15:34   ` Hannes Frederic Sowa
  2013-08-28  5:22 ` [PATCH net-next v10 06/11] vxlan: add ipv6 support Cong Wang
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

Because vxlan module will call ip6_dst_lookup() in TX path,
which will hold write lock. So we have to release this write lock
before calling ndisc_send_rs(), otherwise could deadlock.

Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/addrconf.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8e68466..1bb0861 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3088,6 +3088,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 static void addrconf_rs_timer(unsigned long data)
 {
 	struct inet6_dev *idev = (struct inet6_dev *)data;
+	struct net_device *dev = idev->dev;
 	struct in6_addr lladdr;
 
 	write_lock(&idev->lock);
@@ -3102,12 +3103,14 @@ static void addrconf_rs_timer(unsigned long data)
 		goto out;
 
 	if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
-		if (!__ipv6_get_lladdr(idev, &lladdr, IFA_F_TENTATIVE))
-			ndisc_send_rs(idev->dev, &lladdr,
+		write_unlock(&idev->lock);
+		if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
+			ndisc_send_rs(dev, &lladdr,
 				      &in6addr_linklocal_allrouters);
 		else
-			goto out;
+			goto put;
 
+		write_lock(&idev->lock);
 		/* The wait after the last probe can be shorter */
 		addrconf_mod_rs_timer(idev, (idev->rs_probes ==
 					     idev->cnf.rtr_solicits) ?
@@ -3123,6 +3126,7 @@ static void addrconf_rs_timer(unsigned long data)
 
 out:
 	write_unlock(&idev->lock);
+put:
 	in6_dev_put(idev);
 }
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 06/11] vxlan: add ipv6 support
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (4 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support Cong Wang
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, David Stevens, Stephen Hemminger

From: Cong Wang <amwang@redhat.com>

This patch adds IPv6 support to vxlan device, as the new version
RFC already mentions it:

   http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03

Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c           |  759 +++++++++++++++++++++++++++++++++--------
 include/net/vxlan.h           |    2 +-
 include/uapi/linux/if_link.h  |    2 +
 net/openvswitch/vport-vxlan.c |    2 +-
 4 files changed, 617 insertions(+), 148 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3b21aca..717c6e3 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -6,9 +6,6 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
- *
- * TODO
- *  - IPv6 (not in RFC)
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -43,6 +40,11 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/vxlan.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/addrconf.h>
+#include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
+#endif
 
 #define VXLAN_VERSION	"0.1"
 
@@ -59,6 +61,8 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 /* IP header + UDP + VXLAN + Ethernet header */
 #define VXLAN_HEADROOM (20 + 8 + 8 + 14)
+/* IPv6 header + UDP + VXLAN + Ethernet header */
+#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
 
 #define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
@@ -92,8 +96,14 @@ struct vxlan_net {
 	spinlock_t	  sock_lock;
 };
 
+union vxlan_addr {
+	struct sockaddr_in sin;
+	struct sockaddr_in6 sin6;
+	struct sockaddr sa;
+};
+
 struct vxlan_rdst {
-	__be32			 remote_ip;
+	union vxlan_addr	 remote_ip;
 	__be16			 remote_port;
 	u32			 remote_vni;
 	u32			 remote_ifindex;
@@ -120,7 +130,7 @@ struct vxlan_dev {
 	struct vxlan_sock *vn_sock;	/* listening socket */
 	struct net_device *dev;
 	struct vxlan_rdst default_dst;	/* default destination */
-	__be32		  saddr;	/* source address */
+	union vxlan_addr  saddr;	/* source address */
 	__be16		  dst_port;
 	__u16		  port_min;	/* source port range */
 	__u16		  port_max;
@@ -146,6 +156,7 @@ struct vxlan_dev {
 #define VXLAN_F_RSC	0x04
 #define VXLAN_F_L2MISS	0x08
 #define VXLAN_F_L3MISS	0x10
+#define VXLAN_F_IPV6	0x20 /* internal flag */
 
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
@@ -153,6 +164,96 @@ static struct workqueue_struct *vxlan_wq;
 
 static void vxlan_sock_work(struct work_struct *work);
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline
+bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
+{
+       if (a->sa.sa_family != b->sa.sa_family)
+               return false;
+       if (a->sa.sa_family == AF_INET6)
+               return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
+       else
+               return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
+}
+
+static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
+{
+       if (ipa->sa.sa_family == AF_INET6)
+               return ipv6_addr_any(&ipa->sin6.sin6_addr);
+       else
+               return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
+}
+
+static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
+{
+       if (ipa->sa.sa_family == AF_INET6)
+               return ipv6_addr_is_multicast(&ipa->sin6.sin6_addr);
+       else
+               return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
+}
+
+static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla)
+{
+       if (nla_len(nla) >= sizeof(struct in6_addr)) {
+               nla_memcpy(&ip->sin6.sin6_addr, nla, sizeof(struct in6_addr));
+               ip->sa.sa_family = AF_INET6;
+               return 0;
+       } else if (nla_len(nla) >= sizeof(__be32)) {
+               ip->sin.sin_addr.s_addr = nla_get_be32(nla);
+               ip->sa.sa_family = AF_INET;
+               return 0;
+       } else {
+               return -EAFNOSUPPORT;
+       }
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr,
+                             const union vxlan_addr *ip)
+{
+       if (ip->sa.sa_family == AF_INET6)
+               return nla_put(skb, attr, sizeof(struct in6_addr), &ip->sin6.sin6_addr);
+       else
+               return nla_put_be32(skb, attr, ip->sin.sin_addr.s_addr);
+}
+
+#else /* !CONFIG_IPV6 */
+
+static inline
+bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
+{
+       return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
+}
+
+static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
+{
+       return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
+}
+
+static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
+{
+       return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
+}
+
+static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla)
+{
+       if (nla_len(nla) >= sizeof(struct in6_addr)) {
+               return -EAFNOSUPPORT;
+       } else if (nla_len(nla) >= sizeof(__be32)) {
+               ip->sin.sin_addr.s_addr = nla_get_be32(nla);
+               ip->sa.sa_family = AF_INET;
+               return 0;
+       } else {
+               return -EAFNOSUPPORT;
+       }
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr,
+                             const union vxlan_addr *ip)
+{
+       return nla_put_be32(skb, attr, ip->sin.sin_addr.s_addr);
+}
+#endif
+
 /* Virtual Network hash table head */
 static inline struct hlist_head *vni_head(struct vxlan_sock *vs, u32 id)
 {
@@ -239,7 +340,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (type == RTM_GETNEIGH) {
 		ndm->ndm_family	= AF_INET;
-		send_ip = rdst->remote_ip != htonl(INADDR_ANY);
+		send_ip = !vxlan_addr_any(&rdst->remote_ip);
 		send_eth = !is_zero_ether_addr(fdb->eth_addr);
 	} else
 		ndm->ndm_family	= AF_BRIDGE;
@@ -251,7 +352,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
 
-	if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip))
+	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
 	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
@@ -283,7 +384,7 @@ static inline size_t vxlan_nlmsg_size(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ndmsg))
 		+ nla_total_size(ETH_ALEN) /* NDA_LLADDR */
-		+ nla_total_size(sizeof(__be32)) /* NDA_DST */
+		+ nla_total_size(sizeof(struct in6_addr)) /* NDA_DST */
 		+ nla_total_size(sizeof(__be16)) /* NDA_PORT */
 		+ nla_total_size(sizeof(__be32)) /* NDA_VNI */
 		+ nla_total_size(sizeof(__u32)) /* NDA_IFINDEX */
@@ -317,14 +418,14 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
-static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+static void vxlan_ip_miss(struct net_device *dev, union vxlan_addr *ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb f = {
 		.state = NUD_STALE,
 	};
 	struct vxlan_rdst remote = {
-		.remote_ip = ipa, /* goes to NDA_DST */
+		.remote_ip = *ipa, /* goes to NDA_DST */
 		.remote_vni = VXLAN_N_VID,
 	};
 
@@ -397,13 +498,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 
 /* caller should hold vxlan->hash_lock */
 static struct vxlan_rdst *vxlan_fdb_find_rdst(struct vxlan_fdb *f,
-					      __be32 ip, __be16 port,
+					      union vxlan_addr *ip, __be16 port,
 					      __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd;
 
 	list_for_each_entry(rd, &f->remotes, list) {
-		if (rd->remote_ip == ip &&
+		if (vxlan_addr_equal(&rd->remote_ip, ip) &&
 		    rd->remote_port == port &&
 		    rd->remote_vni == vni &&
 		    rd->remote_ifindex == ifindex)
@@ -415,7 +516,7 @@ static struct vxlan_rdst *vxlan_fdb_find_rdst(struct vxlan_fdb *f,
 
 /* Replace destination of unicast mac */
 static int vxlan_fdb_replace(struct vxlan_fdb *f,
-			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
+			     union vxlan_addr *ip, __be16 port, __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd;
 
@@ -426,7 +527,7 @@ static int vxlan_fdb_replace(struct vxlan_fdb *f,
 	rd = list_first_entry_or_null(&f->remotes, struct vxlan_rdst, list);
 	if (!rd)
 		return 0;
-	rd->remote_ip = ip;
+	rd->remote_ip = *ip;
 	rd->remote_port = port;
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
@@ -435,7 +536,7 @@ static int vxlan_fdb_replace(struct vxlan_fdb *f,
 
 /* Add/update destinations for multicast */
 static int vxlan_fdb_append(struct vxlan_fdb *f,
-			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
+			    union vxlan_addr *ip, __be16 port, __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd;
 
@@ -446,7 +547,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
 		return -ENOBUFS;
-	rd->remote_ip = ip;
+	rd->remote_ip = *ip;
 	rd->remote_port = port;
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
@@ -458,7 +559,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 
 /* Add new entry to forwarding table -- assumes lock held */
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
-			    const u8 *mac, __be32 ip,
+			    const u8 *mac, union vxlan_addr *ip,
 			    __u16 state, __u16 flags,
 			    __be16 port, __u32 vni, __u32 ifindex,
 			    __u8 ndm_flags)
@@ -517,7 +618,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		    (is_multicast_ether_addr(mac) || is_zero_ether_addr(mac)))
 			return -EOPNOTSUPP;
 
-		netdev_dbg(vxlan->dev, "add %pM -> %pI4\n", mac, &ip);
+		netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
 		f = kmalloc(sizeof(*f), GFP_ATOMIC);
 		if (!f)
 			return -ENOMEM;
@@ -565,17 +666,26 @@ static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f)
 }
 
 static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
-			   __be32 *ip, __be16 *port, u32 *vni, u32 *ifindex)
+			   union vxlan_addr *ip, __be16 *port, u32 *vni, u32 *ifindex)
 {
 	struct net *net = dev_net(vxlan->dev);
+	int err;
 
 	if (tb[NDA_DST]) {
-		if (nla_len(tb[NDA_DST]) != sizeof(__be32))
-			return -EAFNOSUPPORT;
-
-		*ip = nla_get_be32(tb[NDA_DST]);
+		err = vxlan_nla_get_addr(ip, tb[NDA_DST]);
+		if (err)
+			return err;
 	} else {
-		*ip = htonl(INADDR_ANY);
+		union vxlan_addr *remote = &vxlan->default_dst.remote_ip;
+		if (remote->sa.sa_family == AF_INET) {
+			ip->sin.sin_addr.s_addr = htonl(INADDR_ANY);
+			ip->sa.sa_family = AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+		} else {
+			ip->sin6.sin6_addr = in6addr_any;
+			ip->sa.sa_family = AF_INET6;
+#endif
+		}
 	}
 
 	if (tb[NDA_PORT]) {
@@ -618,7 +728,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	/* struct net *net = dev_net(vxlan->dev); */
-	__be32 ip;
+	union vxlan_addr ip;
 	__be16 port;
 	u32 vni, ifindex;
 	int err;
@@ -637,7 +747,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 		return err;
 
 	spin_lock_bh(&vxlan->hash_lock);
-	err = vxlan_fdb_create(vxlan, addr, ip, ndm->ndm_state, flags,
+	err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags,
 			       port, vni, ifindex, ndm->ndm_flags);
 	spin_unlock_bh(&vxlan->hash_lock);
 
@@ -652,7 +762,7 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb *f;
 	struct vxlan_rdst *rd = NULL;
-	__be32 ip;
+	union vxlan_addr ip;
 	__be16 port;
 	u32 vni, ifindex;
 	int err;
@@ -668,8 +778,8 @@ static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
 	if (!f)
 		goto out;
 
-	if (ip != htonl(INADDR_ANY)) {
-		rd = vxlan_fdb_find_rdst(f, ip, port, vni, ifindex);
+	if (!vxlan_addr_any(&ip)) {
+		rd = vxlan_fdb_find_rdst(f, &ip, port, vni, ifindex);
 		if (!rd)
 			goto out;
 	}
@@ -732,7 +842,7 @@ out:
  * Return true if packet is bogus and should be droppped.
  */
 static bool vxlan_snoop(struct net_device *dev,
-			__be32 src_ip, const u8 *src_mac)
+			union vxlan_addr *src_ip, const u8 *src_mac)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb *f;
@@ -741,7 +851,7 @@ static bool vxlan_snoop(struct net_device *dev,
 	if (likely(f)) {
 		struct vxlan_rdst *rdst = first_remote_rcu(f);
 
-		if (likely(rdst->remote_ip == src_ip))
+		if (likely(vxlan_addr_equal(&rdst->remote_ip, src_ip)))
 			return false;
 
 		/* Don't migrate static entries, drop packets */
@@ -750,10 +860,10 @@ static bool vxlan_snoop(struct net_device *dev,
 
 		if (net_ratelimit())
 			netdev_info(dev,
-				    "%pM migrated from %pI4 to %pI4\n",
+				    "%pM migrated from %pIS to %pIS\n",
 				    src_mac, &rdst->remote_ip, &src_ip);
 
-		rdst->remote_ip = src_ip;
+		rdst->remote_ip = *src_ip;
 		f->updated = jiffies;
 		vxlan_fdb_notify(vxlan, f, RTM_NEWNEIGH);
 	} else {
@@ -775,7 +885,7 @@ static bool vxlan_snoop(struct net_device *dev,
 }
 
 /* See if multicast group is already in use by other ID */
-static bool vxlan_group_used(struct vxlan_net *vn, __be32 remote_ip)
+static bool vxlan_group_used(struct vxlan_net *vn, union vxlan_addr *remote_ip)
 {
 	struct vxlan_dev *vxlan;
 
@@ -783,7 +893,8 @@ static bool vxlan_group_used(struct vxlan_net *vn, __be32 remote_ip)
 		if (!netif_running(vxlan->dev))
 			continue;
 
-		if (vxlan->default_dst.remote_ip == remote_ip)
+		if (vxlan_addr_equal(&vxlan->default_dst.remote_ip,
+				     remote_ip))
 			return true;
 	}
 
@@ -819,13 +930,23 @@ static void vxlan_igmp_join(struct work_struct *work)
 	struct vxlan_dev *vxlan = container_of(work, struct vxlan_dev, igmp_join);
 	struct vxlan_sock *vs = vxlan->vn_sock;
 	struct sock *sk = vs->sock->sk;
-	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr	= vxlan->default_dst.remote_ip,
-		.imr_ifindex		= vxlan->default_dst.remote_ifindex,
-	};
+	union vxlan_addr *ip = &vxlan->default_dst.remote_ip;
+	int ifindex = vxlan->default_dst.remote_ifindex;
 
 	lock_sock(sk);
-	ip_mc_join_group(sk, &mreq);
+	if (ip->sa.sa_family == AF_INET) {
+		struct ip_mreqn mreq = {
+			.imr_multiaddr.s_addr	= ip->sin.sin_addr.s_addr,
+			.imr_ifindex		= ifindex,
+		};
+
+		ip_mc_join_group(sk, &mreq);
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		ipv6_stub->ipv6_sock_mc_join(sk, ifindex,
+					     &ip->sin6.sin6_addr);
+#endif
+	}
 	release_sock(sk);
 
 	vxlan_sock_release(vs);
@@ -838,13 +959,24 @@ static void vxlan_igmp_leave(struct work_struct *work)
 	struct vxlan_dev *vxlan = container_of(work, struct vxlan_dev, igmp_leave);
 	struct vxlan_sock *vs = vxlan->vn_sock;
 	struct sock *sk = vs->sock->sk;
-	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr	= vxlan->default_dst.remote_ip,
-		.imr_ifindex		= vxlan->default_dst.remote_ifindex,
-	};
+	union vxlan_addr *ip = &vxlan->default_dst.remote_ip;
+	int ifindex = vxlan->default_dst.remote_ifindex;
 
 	lock_sock(sk);
-	ip_mc_leave_group(sk, &mreq);
+	if (ip->sa.sa_family == AF_INET) {
+		struct ip_mreqn mreq = {
+			.imr_multiaddr.s_addr	= ip->sin.sin_addr.s_addr,
+			.imr_ifindex		= ifindex,
+		};
+
+		ip_mc_leave_group(sk, &mreq);
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		ipv6_stub->ipv6_sock_mc_drop(sk, ifindex,
+					     &ip->sin6.sin6_addr);
+#endif
+	}
+
 	release_sock(sk);
 
 	vxlan_sock_release(vs);
@@ -896,11 +1028,14 @@ error:
 static void vxlan_rcv(struct vxlan_sock *vs,
 		      struct sk_buff *skb, __be32 vx_vni)
 {
-	struct iphdr *oip;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
 	struct vxlan_dev *vxlan;
 	struct pcpu_tstats *stats;
+	union vxlan_addr saddr;
 	__u32 vni;
-	int err;
+	int err = 0;
+	union vxlan_addr *remote_ip;
 
 	vni = ntohl(vx_vni) >> 8;
 	/* Is this VNI defined? */
@@ -908,6 +1043,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	if (!vxlan)
 		goto drop;
 
+	remote_ip = &vxlan->default_dst.remote_ip;
 	skb_reset_mac_header(skb);
 	skb->protocol = eth_type_trans(skb, vxlan->dev);
 
@@ -917,9 +1053,20 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	/* Re-examine inner Ethernet packet */
-	oip = ip_hdr(skb);
+	if (remote_ip->sa.sa_family == AF_INET) {
+		oip = ip_hdr(skb);
+		saddr.sin.sin_addr.s_addr = oip->saddr;
+		saddr.sa.sa_family = AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		oip6 = ipv6_hdr(skb);
+		saddr.sin6.sin6_addr = oip6->saddr;
+		saddr.sa.sa_family = AF_INET6;
+#endif
+	}
+
 	if ((vxlan->flags & VXLAN_F_LEARN) &&
-	    vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source))
+	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
 		goto drop;
 
 	skb_reset_network_header(skb);
@@ -935,11 +1082,20 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 
 	skb->encapsulation = 0;
 
-	err = IP_ECN_decapsulate(oip, skb);
+	if (oip6)
+		err = IP6_ECN_decapsulate(oip6, skb);
+	if (oip)
+		err = IP_ECN_decapsulate(oip, skb);
+
 	if (unlikely(err)) {
-		if (log_ecn_error)
-			net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
-					     &oip->saddr, oip->tos);
+		if (log_ecn_error) {
+			if (oip6)
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &oip6->saddr);
+			if (oip)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+		}
 		if (err > 1) {
 			++vxlan->dev->stats.rx_frame_errors;
 			++vxlan->dev->stats.rx_errors;
@@ -968,6 +1124,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 	u8 *arpptr, *sha;
 	__be32 sip, tip;
 	struct neighbour *n;
+	union vxlan_addr ipa;
 
 	if (dev->flags & IFF_NOARP)
 		goto out;
@@ -1009,7 +1166,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		f = vxlan_find_mac(vxlan, n->ha);
-		if (f && first_remote_rcu(f)->remote_ip == htonl(INADDR_ANY)) {
+		if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
 			/* bridge-local neighbor */
 			neigh_release(n);
 			goto out;
@@ -1027,8 +1184,11 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 
 		if (netif_rx_ni(reply) == NET_RX_DROP)
 			dev->stats.rx_dropped++;
-	} else if (vxlan->flags & VXLAN_F_L3MISS)
-		vxlan_ip_miss(dev, tip);
+	} else if (vxlan->flags & VXLAN_F_L3MISS) {
+		ipa.sin.sin_addr.s_addr = tip;
+		ipa.sa.sa_family = AF_INET;
+		vxlan_ip_miss(dev, &ipa);
+	}
 out:
 	consume_skb(skb);
 	return NETDEV_TX_OK;
@@ -1050,6 +1210,14 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 			return false;
 		pip = ip_hdr(skb);
 		n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
+		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+			union vxlan_addr ipa;
+			ipa.sin.sin_addr.s_addr = pip->daddr;
+			ipa.sa.sa_family = AF_INET;
+			vxlan_ip_miss(dev, &ipa);
+			return false;
+		}
+
 		break;
 	default:
 		return false;
@@ -1066,8 +1234,8 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		}
 		neigh_release(n);
 		return diff;
-	} else if (vxlan->flags & VXLAN_F_L3MISS)
-		vxlan_ip_miss(dev, pip->daddr);
+	}
+
 	return false;
 }
 
@@ -1118,6 +1286,102 @@ static int handle_offloads(struct sk_buff *skb)
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int vxlan6_xmit_skb(struct net *net, struct vxlan_sock *vs,
+			   struct dst_entry *dst, struct sk_buff *skb,
+			   struct net_device *dev, struct in6_addr *saddr,
+			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
+			   __be16 src_port, __be16 dst_port, __be32 vni)
+{
+	struct ipv6hdr *ip6h;
+	struct vxlanhdr *vxh;
+	struct udphdr *uh;
+	int min_headroom;
+	int err;
+
+	if (!skb->encapsulation) {
+		skb_reset_inner_headers(skb);
+		skb->encapsulation = 1;
+	}
+
+	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
+			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err))
+		return err;
+
+	if (vlan_tx_tag_present(skb)) {
+		if (WARN_ON(!__vlan_put_tag(skb,
+					    skb->vlan_proto,
+					    vlan_tx_tag_get(skb))))
+			return -ENOMEM;
+
+		skb->vlan_tci = 0;
+	}
+
+	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = vni;
+
+	__skb_push(skb, sizeof(*uh));
+	skb_reset_transport_header(skb);
+	uh = udp_hdr(skb);
+
+	uh->dest = dst_port;
+	uh->source = src_port;
+
+	uh->len = htons(skb->len);
+	uh->check = 0;
+
+	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
+	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
+			      IPSKB_REROUTED);
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+
+	if (!skb_is_gso(skb) && !(dst->dev->features & NETIF_F_IPV6_CSUM)) {
+		__wsum csum = skb_checksum(skb, 0, skb->len, 0);
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		uh->check = csum_ipv6_magic(saddr, daddr, skb->len,
+					    IPPROTO_UDP, csum);
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+	} else {
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		uh->check = ~csum_ipv6_magic(saddr, daddr,
+					     skb->len, IPPROTO_UDP, 0);
+	}
+
+	__skb_push(skb, sizeof(*ip6h));
+	skb_reset_network_header(skb);
+	ip6h		  = ipv6_hdr(skb);
+	ip6h->version	  = 6;
+	ip6h->priority	  = prio;
+	ip6h->flow_lbl[0] = 0;
+	ip6h->flow_lbl[1] = 0;
+	ip6h->flow_lbl[2] = 0;
+	ip6h->payload_len = htons(skb->len);
+	ip6h->nexthdr     = IPPROTO_UDP;
+	ip6h->hop_limit   = ttl;
+	ip6h->daddr	  = *daddr;
+	ip6h->saddr	  = *saddr;
+
+	vxlan_set_owner(vs->sock->sk, skb);
+
+	err = handle_offloads(skb);
+	if (err)
+		return err;
+
+	ip6tunnel_xmit(skb, dev);
+	return 0;
+}
+#endif
+
 int vxlan_xmit_skb(struct net *net, struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
@@ -1182,15 +1446,26 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 {
 	struct pcpu_tstats *tx_stats = this_cpu_ptr(src_vxlan->dev->tstats);
 	struct pcpu_tstats *rx_stats = this_cpu_ptr(dst_vxlan->dev->tstats);
+	union vxlan_addr loopback;
+	union vxlan_addr *remote_ip = &dst_vxlan->default_dst.remote_ip;
 
 	skb->pkt_type = PACKET_HOST;
 	skb->encapsulation = 0;
 	skb->dev = dst_vxlan->dev;
 	__skb_pull(skb, skb_network_offset(skb));
 
+	if (remote_ip->sa.sa_family == AF_INET) {
+		loopback.sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+		loopback.sa.sa_family =  AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		loopback.sin6.sin6_addr = in6addr_loopback;
+		loopback.sa.sa_family =  AF_INET6;
+#endif
+	}
+
 	if (dst_vxlan->flags & VXLAN_F_LEARN)
-		vxlan_snoop(skb->dev, htonl(INADDR_LOOPBACK),
-			    eth_hdr(skb)->h_source);
+		vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source);
 
 	u64_stats_update_begin(&tx_stats->syncp);
 	tx_stats->tx_packets++;
@@ -1211,11 +1486,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			   struct vxlan_rdst *rdst, bool did_rsc)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct rtable *rt;
+	struct rtable *rt = NULL;
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
-	__be32 dst;
-	__be16 src_port, dst_port;
+	union vxlan_addr *dst;
+	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
 	__u8 tos, ttl;
@@ -1223,9 +1498,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 	dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
 	vni = rdst->remote_vni;
-	dst = rdst->remote_ip;
+	dst = &rdst->remote_ip;
 
-	if (!dst) {
+	if (vxlan_addr_any(dst)) {
 		if (did_rsc) {
 			/* short-circuited back to local bridge */
 			vxlan_encap_bypass(skb, vxlan, vxlan);
@@ -1237,7 +1512,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	old_iph = ip_hdr(skb);
 
 	ttl = vxlan->ttl;
-	if (!ttl && IN_MULTICAST(ntohl(dst)))
+	if (!ttl && vxlan_addr_multicast(dst))
 		ttl = 1;
 
 	tos = vxlan->tos;
@@ -1246,48 +1521,101 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 	src_port = vxlan_src_port(vxlan->port_min, vxlan->port_max, skb);
 
-	memset(&fl4, 0, sizeof(fl4));
-	fl4.flowi4_oif = rdst->remote_ifindex;
-	fl4.flowi4_tos = RT_TOS(tos);
-	fl4.daddr = dst;
-	fl4.saddr = vxlan->saddr;
-
-	rt = ip_route_output_key(dev_net(dev), &fl4);
-	if (IS_ERR(rt)) {
-		netdev_dbg(dev, "no route to %pI4\n", &dst);
-		dev->stats.tx_carrier_errors++;
-		goto tx_error;
-	}
+	if (dst->sa.sa_family == AF_INET) {
+		memset(&fl4, 0, sizeof(fl4));
+		fl4.flowi4_oif = rdst->remote_ifindex;
+		fl4.flowi4_tos = RT_TOS(tos);
+		fl4.daddr = dst->sin.sin_addr.s_addr;
+		fl4.saddr = vxlan->saddr.sin.sin_addr.s_addr;
+
+		rt = ip_route_output_key(dev_net(dev), &fl4);
+		if (IS_ERR(rt)) {
+			netdev_dbg(dev, "no route to %pI4\n",
+				   &dst->sin.sin_addr.s_addr);
+			dev->stats.tx_carrier_errors++;
+			goto tx_error;
+		}
 
-	if (rt->dst.dev == dev) {
-		netdev_dbg(dev, "circular route to %pI4\n", &dst);
-		dev->stats.collisions++;
-		goto rt_tx_error;
-	}
+		if (rt->dst.dev == dev) {
+			netdev_dbg(dev, "circular route to %pI4\n",
+				   &dst->sin.sin_addr.s_addr);
+			dev->stats.collisions++;
+			goto tx_error;
+		}
+
+		/* Bypass encapsulation if the destination is local */
+		if (rt->rt_flags & RTCF_LOCAL &&
+		    !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
+			struct vxlan_dev *dst_vxlan;
+
+			ip_rt_put(rt);
+			dst_vxlan = vxlan_find_vni(dev_net(dev), vni, dst_port);
+			if (!dst_vxlan)
+				goto tx_error;
+			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
+			return;
+		}
+
+		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
+		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
 
-	/* Bypass encapsulation if the destination is local */
-	if (rt->rt_flags & RTCF_LOCAL &&
-	    !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
-		struct vxlan_dev *dst_vxlan;
+		err = vxlan_xmit_skb(dev_net(dev), vxlan->vn_sock, rt, skb,
+				     fl4.saddr, dst->sin.sin_addr.s_addr,
+				     tos, ttl, df, src_port, dst_port,
+				     htonl(vni << 8));
 
-		ip_rt_put(rt);
-		dst_vxlan = vxlan_find_vni(dev_net(dev), vni, dst_port);
-		if (!dst_vxlan)
+		if (err < 0)
+			goto rt_tx_error;
+		iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		struct sock *sk = vxlan->vn_sock->sock->sk;
+		struct dst_entry *ndst;
+		struct flowi6 fl6;
+		u32 flags;
+
+		memset(&fl6, 0, sizeof(fl6));
+		fl6.flowi6_oif = rdst->remote_ifindex;
+		fl6.daddr = dst->sin6.sin6_addr;
+		fl6.saddr = vxlan->saddr.sin6.sin6_addr;
+		fl6.flowi6_proto = skb->protocol;
+
+		if (ipv6_stub->ipv6_dst_lookup(sk, &ndst, &fl6)) {
+			netdev_dbg(dev, "no route to %pI6\n",
+				   &dst->sin6.sin6_addr);
+			dev->stats.tx_carrier_errors++;
 			goto tx_error;
-		vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-		return;
-	}
+		}
 
-	tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
-	ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		if (ndst->dev == dev) {
+			netdev_dbg(dev, "circular route to %pI6\n",
+				   &dst->sin6.sin6_addr);
+			dst_release(ndst);
+			dev->stats.collisions++;
+			goto tx_error;
+		}
 
-	err = vxlan_xmit_skb(dev_net(dev), vxlan->vn_sock, rt, skb,
-			     fl4.saddr, dst, tos, ttl, df,
-			     src_port, dst_port, htonl(vni << 8));
+		/* Bypass encapsulation if the destination is local */
+		flags = ((struct rt6_info *)ndst)->rt6i_flags;
+		if (flags & RTF_LOCAL &&
+		    !(flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
+			struct vxlan_dev *dst_vxlan;
+
+			dst_release(ndst);
+			dst_vxlan = vxlan_find_vni(dev_net(dev), vni, dst_port);
+			if (!dst_vxlan)
+				goto tx_error;
+			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
+			return;
+		}
 
-	if (err < 0)
-		goto rt_tx_error;
-	iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
+		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+
+		err = vxlan6_xmit_skb(dev_net(dev), vxlan->vn_sock, ndst, skb,
+				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
+				      src_port, dst_port, htonl(vni << 8));
+#endif
+	}
 
 	return;
 
@@ -1464,8 +1792,8 @@ static int vxlan_open(struct net_device *dev)
 	if (!vs)
 		return -ENOTCONN;
 
-	if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip)) &&
-	    vxlan_group_used(vn, vxlan->default_dst.remote_ip)) {
+	if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip) &&
+	    vxlan_group_used(vn, &vxlan->default_dst.remote_ip)) {
 		vxlan_sock_hold(vs);
 		dev_hold(dev);
 		queue_work(vxlan_wq, &vxlan->igmp_join);
@@ -1503,8 +1831,8 @@ static int vxlan_stop(struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_sock *vs = vxlan->vn_sock;
 
-	if (vs && IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip)) &&
-	    ! vxlan_group_used(vn, vxlan->default_dst.remote_ip)) {
+	if (vs && vxlan_addr_multicast(&vxlan->default_dst.remote_ip) &&
+	    ! vxlan_group_used(vn, &vxlan->default_dst.remote_ip)) {
 		vxlan_sock_hold(vs);
 		dev_hold(dev);
 		queue_work(vxlan_wq, &vxlan->igmp_leave);
@@ -1552,7 +1880,10 @@ static void vxlan_setup(struct net_device *dev)
 
 	eth_hw_addr_random(dev);
 	ether_setup(dev);
-	dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
+	if (vxlan->default_dst.remote_ip.sa.sa_family == AF_INET6)
+		dev->hard_header_len = ETH_HLEN + VXLAN6_HEADROOM;
+	else
+		dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
 
 	dev->netdev_ops = &vxlan_netdev_ops;
 	dev->destructor = free_netdev;
@@ -1597,8 +1928,10 @@ static void vxlan_setup(struct net_device *dev)
 static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_ID]		= { .type = NLA_U32 },
 	[IFLA_VXLAN_GROUP]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
+	[IFLA_VXLAN_GROUP6]	= { .len = sizeof(struct in6_addr) },
 	[IFLA_VXLAN_LINK]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
+	[IFLA_VXLAN_LOCAL6]	= { .len = sizeof(struct in6_addr) },
 	[IFLA_VXLAN_TOS]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_TTL]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_LEARNING]	= { .type = NLA_U8 },
@@ -1669,58 +2002,132 @@ static void vxlan_del_work(struct work_struct *work)
 	kfree_rcu(vs, rcu);
 }
 
-static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
-					      vxlan_rcv_t *rcv, void *data)
+#if IS_ENABLED(CONFIG_IPV6)
+/* Create UDP socket for encapsulation receive. AF_INET6 socket
+ * could be used for both IPv4 and IPv6 communications, but
+ * users may set bindv6only=1.
+ */
+static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
 {
-	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	struct vxlan_sock *vs;
 	struct sock *sk;
+	struct socket *sock;
+	struct sockaddr_in6 vxlan_addr = {
+		.sin6_family = AF_INET6,
+		.sin6_port = port,
+	};
+	int rc, val = 1;
+
+	rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &sock);
+	if (rc < 0) {
+		pr_debug("UDPv6 socket create failed\n");
+		return rc;
+	}
+
+	/* Put in proper namespace */
+	sk = sock->sk;
+	sk_change_net(sk, net);
+
+	kernel_setsockopt(sock, SOL_IPV6, IPV6_V6ONLY,
+			  (char *)&val, sizeof(val));
+	rc = kernel_bind(sock, (struct sockaddr *)&vxlan_addr,
+			 sizeof(struct sockaddr_in6));
+	if (rc < 0) {
+		pr_debug("bind for UDPv6 socket %pI6:%u (%d)\n",
+			 &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc);
+		sk_release_kernel(sk);
+		return rc;
+	}
+	/* At this point, IPv6 module should have been loaded in
+	 * sock_create_kern().
+	 */
+	BUG_ON(!ipv6_stub);
+
+	*psock = sock;
+	/* Disable multicast loopback */
+	inet_sk(sk)->mc_loop = 0;
+	return 0;
+}
+
+#else
+
+static int create_v6_sock(struct net *net, __be16 port, struct socket **psock)
+{
+		return -EPFNOSUPPORT;
+}
+#endif
+
+static int create_v4_sock(struct net *net, __be16 port, struct socket **psock)
+{
+	struct sock *sk;
+	struct socket *sock;
 	struct sockaddr_in vxlan_addr = {
 		.sin_family = AF_INET,
 		.sin_addr.s_addr = htonl(INADDR_ANY),
 		.sin_port = port,
 	};
 	int rc;
-	unsigned int h;
-
-	vs = kmalloc(sizeof(*vs), GFP_KERNEL);
-	if (!vs) {
-		pr_debug("memory alocation failure\n");
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for (h = 0; h < VNI_HASH_SIZE; ++h)
-		INIT_HLIST_HEAD(&vs->vni_list[h]);
-
-	INIT_WORK(&vs->del_work, vxlan_del_work);
 
 	/* Create UDP socket for encapsulation receive. */
-	rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vs->sock);
+	rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (rc < 0) {
 		pr_debug("UDP socket create failed\n");
-		kfree(vs);
-		return ERR_PTR(rc);
+		return rc;
 	}
 
 	/* Put in proper namespace */
-	sk = vs->sock->sk;
+	sk = sock->sk;
 	sk_change_net(sk, net);
 
-	rc = kernel_bind(vs->sock, (struct sockaddr *) &vxlan_addr,
+	rc = kernel_bind(sock, (struct sockaddr *) &vxlan_addr,
 			 sizeof(vxlan_addr));
 	if (rc < 0) {
 		pr_debug("bind for UDP socket %pI4:%u (%d)\n",
 			 &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc);
 		sk_release_kernel(sk);
+		return rc;
+	}
+
+	*psock = sock;
+	/* Disable multicast loopback */
+	inet_sk(sk)->mc_loop = 0;
+	return 0;
+}
+
+/* Create new listen socket if needed */
+static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
+					      vxlan_rcv_t *rcv, void *data, bool ipv6)
+{
+	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+	struct vxlan_sock *vs;
+	struct socket *sock;
+	struct sock *sk;
+	int rc = 0;
+	unsigned int h;
+
+	vs = kmalloc(sizeof(*vs), GFP_KERNEL);
+	if (!vs)
+		return ERR_PTR(-ENOMEM);
+
+	for (h = 0; h < VNI_HASH_SIZE; ++h)
+		INIT_HLIST_HEAD(&vs->vni_list[h]);
+
+	INIT_WORK(&vs->del_work, vxlan_del_work);
+
+	if (ipv6)
+		rc = create_v6_sock(net, port, &sock);
+	else
+		rc = create_v4_sock(net, port, &sock);
+	if (rc < 0) {
 		kfree(vs);
 		return ERR_PTR(rc);
 	}
+
+	vs->sock = sock;
+	sk = sock->sk;
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
 
-	/* Disable multicast loopback */
-	inet_sk(sk)->mc_loop = 0;
 	spin_lock(&vn->sock_lock);
 	hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
 	spin_unlock(&vn->sock_lock);
@@ -1728,18 +2135,24 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	/* Mark socket as an encapsulation socket. */
 	udp_sk(sk)->encap_type = 1;
 	udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv;
-	udp_encap_enable();
+#if IS_ENABLED(CONFIG_IPV6)
+	if (ipv6)
+		ipv6_stub->udpv6_encap_enable();
+	else
+#endif
+		udp_encap_enable();
+
 	return vs;
 }
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share)
+				  bool no_share, bool ipv6)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
 
-	vs = vxlan_socket_create(net, port, rcv, data);
+	vs = vxlan_socket_create(net, port, rcv, data, ipv6);
 	if (!IS_ERR(vs))
 		return vs;
 
@@ -1772,7 +2185,7 @@ static void vxlan_sock_work(struct work_struct *work)
 	__be16 port = vxlan->dst_port;
 	struct vxlan_sock *nvs;
 
-	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false);
+	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags & VXLAN_F_IPV6);
 	spin_lock(&vn->sock_lock);
 	if (!IS_ERR(nvs))
 		vxlan_vs_add_dev(nvs, vxlan);
@@ -1789,6 +2202,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	struct vxlan_rdst *dst = &vxlan->default_dst;
 	__u32 vni;
 	int err;
+	bool use_ipv6 = false;
 
 	if (!data[IFLA_VXLAN_ID])
 		return -EINVAL;
@@ -1796,11 +2210,31 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	vni = nla_get_u32(data[IFLA_VXLAN_ID]);
 	dst->remote_vni = vni;
 
-	if (data[IFLA_VXLAN_GROUP])
-		dst->remote_ip = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+	if (data[IFLA_VXLAN_GROUP]) {
+		dst->remote_ip.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+		dst->remote_ip.sa.sa_family = AF_INET;
+	} else if (data[IFLA_VXLAN_GROUP6]) {
+		if (!IS_ENABLED(CONFIG_IPV6))
+			return -EPFNOSUPPORT;
+
+		nla_memcpy(&dst->remote_ip.sin6.sin6_addr, data[IFLA_VXLAN_GROUP6],
+			   sizeof(struct in6_addr));
+		dst->remote_ip.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
+	}
 
-	if (data[IFLA_VXLAN_LOCAL])
-		vxlan->saddr = nla_get_be32(data[IFLA_VXLAN_LOCAL]);
+	if (data[IFLA_VXLAN_LOCAL]) {
+		vxlan->saddr.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_LOCAL]);
+		vxlan->saddr.sa.sa_family = AF_INET;
+	} else if (data[IFLA_VXLAN_LOCAL6]) {
+		if (!IS_ENABLED(CONFIG_IPV6))
+			return -EPFNOSUPPORT;
+
+		nla_memcpy(&vxlan->saddr.sin6.sin6_addr, data[IFLA_VXLAN_LOCAL6],
+			   sizeof(struct in6_addr));
+		vxlan->saddr.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
+	}
 
 	if (data[IFLA_VXLAN_LINK] &&
 	    (dst->remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]))) {
@@ -1812,12 +2246,23 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 			return -ENODEV;
 		}
 
+#if IS_ENABLED(CONFIG_IPV6)
+		if (use_ipv6) {
+			struct inet6_dev *idev = __in6_dev_get(lowerdev);
+			if (idev && idev->cnf.disable_ipv6) {
+				pr_info("IPv6 is disabled via sysctl\n");
+				return -EPERM;
+			}
+			vxlan->flags |= VXLAN_F_IPV6;
+		}
+#endif
+
 		if (!tb[IFLA_MTU])
-			dev->mtu = lowerdev->mtu - VXLAN_HEADROOM;
+			dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
 
 		/* update header length based on lower device */
 		dev->hard_header_len = lowerdev->hard_header_len +
-				       VXLAN_HEADROOM;
+				       (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
 	}
 
 	if (data[IFLA_VXLAN_TOS])
@@ -1868,7 +2313,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 
 	/* create an fdb entry for default destination */
 	err = vxlan_fdb_create(vxlan, all_zeros_mac,
-			       vxlan->default_dst.remote_ip,
+			       &vxlan->default_dst.remote_ip,
 			       NUD_REACHABLE|NUD_PERMANENT,
 			       NLM_F_EXCL|NLM_F_CREATE,
 			       vxlan->dst_port, vxlan->default_dst.remote_vni,
@@ -1905,9 +2350,9 @@ static size_t vxlan_get_size(const struct net_device *dev)
 {
 
 	return nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_ID */
-		nla_total_size(sizeof(__be32)) +/* IFLA_VXLAN_GROUP */
+		nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_GROUP{6} */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LINK */
-		nla_total_size(sizeof(__be32))+	/* IFLA_VXLAN_LOCAL */
+		nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_LOCAL{6} */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TTL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_LEARNING */
@@ -1934,14 +2379,36 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u32(skb, IFLA_VXLAN_ID, dst->remote_vni))
 		goto nla_put_failure;
 
-	if (dst->remote_ip && nla_put_be32(skb, IFLA_VXLAN_GROUP, dst->remote_ip))
-		goto nla_put_failure;
+	if (!vxlan_addr_any(&dst->remote_ip)) {
+		if (dst->remote_ip.sa.sa_family == AF_INET) {
+			if (nla_put_be32(skb, IFLA_VXLAN_GROUP,
+					 dst->remote_ip.sin.sin_addr.s_addr))
+				goto nla_put_failure;
+#if IS_ENABLED(CONFIG_IPV6)
+		} else {
+			if (nla_put(skb, IFLA_VXLAN_GROUP6, sizeof(struct in6_addr),
+				    &dst->remote_ip.sin6.sin6_addr))
+				goto nla_put_failure;
+#endif
+		}
+	}
 
 	if (dst->remote_ifindex && nla_put_u32(skb, IFLA_VXLAN_LINK, dst->remote_ifindex))
 		goto nla_put_failure;
 
-	if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr))
-		goto nla_put_failure;
+	if (!vxlan_addr_any(&vxlan->saddr)) {
+		if (vxlan->saddr.sa.sa_family == AF_INET) {
+			if (nla_put_be32(skb, IFLA_VXLAN_LOCAL,
+					 vxlan->saddr.sin.sin_addr.s_addr))
+				goto nla_put_failure;
+#if IS_ENABLED(CONFIG_IPV6)
+		} else {
+			if (nla_put(skb, IFLA_VXLAN_LOCAL6, sizeof(struct in6_addr),
+				    &vxlan->saddr.sin6.sin6_addr))
+				goto nla_put_failure;
+#endif
+		}
+	}
 
 	if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
 	    nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index ad342e3..d2b88ca 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -25,7 +25,7 @@ struct vxlan_sock {
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share);
+				  bool no_share, bool ipv6);
 
 void vxlan_sock_release(struct vxlan_sock *vs);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 04c0e7a..80394e8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -314,6 +314,8 @@ enum {
 	IFLA_VXLAN_L2MISS,
 	IFLA_VXLAN_L3MISS,
 	IFLA_VXLAN_PORT,	/* destination port */
+	IFLA_VXLAN_GROUP6,
+	IFLA_VXLAN_LOCAL6,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 36848bd..a006024 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -123,7 +123,7 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	vxlan_port = vxlan_vport(vport);
 	strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
 
-	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true);
+	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, false);
 	if (IS_ERR(vs)) {
 		ovs_vport_free(vport);
 		return (void *)vs;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (5 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 06/11] vxlan: add ipv6 support Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 16:08   ` Stephen Hemminger
  2013-08-28  5:22 ` [PATCH net-next v10 08/11] ipv6: move in6_dev_finish_destroy() into core kernel Cong Wang
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, David Stevens

From: Cong Wang <amwang@redhat.com>

route short circuit only has IPv4 part, this patch adds
the IPv6 part. nd_tbl will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c    |   28 ++++++++++++++++++++++++++--
 include/net/addrconf.h |    1 +
 net/ipv6/af_inet6.c    |    1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 717c6e3..79eb09c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1198,7 +1198,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct neighbour *n;
-	struct iphdr *pip;
 
 	if (is_multicast_ether_addr(eth_hdr(skb)->h_dest))
 		return false;
@@ -1206,6 +1205,9 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	n = NULL;
 	switch (ntohs(eth_hdr(skb)->h_proto)) {
 	case ETH_P_IP:
+	{
+		struct iphdr *pip;
+
 		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 			return false;
 		pip = ip_hdr(skb);
@@ -1219,6 +1221,27 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		break;
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	case ETH_P_IPV6:
+	{
+		struct ipv6hdr *pip6;
+
+		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			return false;
+		pip6 = ipv6_hdr(skb);
+		n = neigh_lookup(ipv6_stub->nd_tbl, &pip6->daddr, dev);
+		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+			union vxlan_addr ipa;
+			ipa.sin6.sin6_addr = pip6->daddr;
+			ipa.sa.sa_family = AF_INET6;
+			vxlan_ip_miss(dev, &ipa);
+			return false;
+		}
+
+		break;
+	}
+#endif
 	default:
 		return false;
 	}
@@ -1655,7 +1678,8 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	did_rsc = false;
 
 	if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
-	    ntohs(eth->h_proto) == ETH_P_IP) {
+	    (ntohs(eth->h_proto) == ETH_P_IP ||
+	     ntohs(eth->h_proto) == ETH_P_IPV6)) {
 		did_rsc = route_shortcircuit(dev, skb);
 		if (did_rsc)
 			f = vxlan_find_mac(vxlan, eth->h_dest);
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 5339cab..bcf9573 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -153,6 +153,7 @@ struct ipv6_stub {
 	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
 				struct flowi6 *fl6);
 	void (*udpv6_encap_enable)(void);
+	struct neigh_table *nd_tbl;
 };
 extern const struct ipv6_stub *ipv6_stub __read_mostly;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0c9c22f..1996a7c 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -815,6 +815,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
 	.ipv6_dst_lookup = ip6_dst_lookup,
 	.udpv6_encap_enable = udpv6_encap_enable,
+	.nd_tbl	= &nd_tbl,
 };
 
 static int __init inet6_init(void)
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 08/11] ipv6: move in6_dev_finish_destroy() into core kernel
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (6 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support Cong Wang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

in6_dev_put() will be needed by vxlan module, so is
in6_dev_finish_destroy().

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/addrconf.c      |   30 ------------------------------
 net/ipv6/addrconf_core.c |   31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1bb0861..ff7967e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -302,36 +302,6 @@ err_ip:
 	return -ENOMEM;
 }
 
-static void snmp6_free_dev(struct inet6_dev *idev)
-{
-	kfree(idev->stats.icmpv6msgdev);
-	kfree(idev->stats.icmpv6dev);
-	snmp_mib_free((void __percpu **)idev->stats.ipv6);
-}
-
-/* Nobody refers to this device, we may destroy it. */
-
-void in6_dev_finish_destroy(struct inet6_dev *idev)
-{
-	struct net_device *dev = idev->dev;
-
-	WARN_ON(!list_empty(&idev->addr_list));
-	WARN_ON(idev->mc_list != NULL);
-	WARN_ON(timer_pending(&idev->rs_timer));
-
-#ifdef NET_REFCNT_DEBUG
-	pr_debug("%s: %s\n", __func__, dev ? dev->name : "NIL");
-#endif
-	dev_put(dev);
-	if (!idev->dead) {
-		pr_warn("Freeing alive inet6 device %p\n", idev);
-		return;
-	}
-	snmp6_free_dev(idev);
-	kfree_rcu(idev, rcu);
-}
-EXPORT_SYMBOL(in6_dev_finish_destroy);
-
 static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
 {
 	struct inet6_dev *ndev;
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index a864033..4c11cbc 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -6,6 +6,7 @@
 #include <linux/export.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#include <net/ip.h>
 
 #define IPV6_ADDR_SCOPE_TYPE(scope)	((scope) << 16)
 
@@ -117,3 +118,33 @@ const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL
 EXPORT_SYMBOL(in6addr_interfacelocal_allrouters);
 const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
 EXPORT_SYMBOL(in6addr_sitelocal_allrouters);
+
+static void snmp6_free_dev(struct inet6_dev *idev)
+{
+	kfree(idev->stats.icmpv6msgdev);
+	kfree(idev->stats.icmpv6dev);
+	snmp_mib_free((void __percpu **)idev->stats.ipv6);
+}
+
+/* Nobody refers to this device, we may destroy it. */
+
+void in6_dev_finish_destroy(struct inet6_dev *idev)
+{
+	struct net_device *dev = idev->dev;
+
+	WARN_ON(!list_empty(&idev->addr_list));
+	WARN_ON(idev->mc_list != NULL);
+	WARN_ON(timer_pending(&idev->rs_timer));
+
+#ifdef NET_REFCNT_DEBUG
+	pr_debug("%s: %s\n", __func__, dev ? dev->name : "NIL");
+#endif
+	dev_put(dev);
+	if (!idev->dead) {
+		pr_warn("Freeing alive inet6 device %p\n", idev);
+		return;
+	}
+	snmp6_free_dev(idev);
+	kfree_rcu(idev, rcu);
+}
+EXPORT_SYMBOL(in6_dev_finish_destroy);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (7 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 08/11] ipv6: move in6_dev_finish_destroy() into core kernel Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 16:13   ` Stephen Hemminger
  2013-08-28  5:22 ` [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr Cong Wang
  2013-08-28  5:22 ` [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, David Stevens

From: Cong Wang <amwang@redhat.com>

This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c    |   83 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/net/addrconf.h |    4 ++
 include/net/ndisc.h    |    5 +++
 net/ipv6/af_inet6.c    |    2 +
 net/ipv6/ndisc.c       |    8 ++--
 5 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 79eb09c..33a2c6e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1194,6 +1194,71 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct neighbour *n;
+	union vxlan_addr ipa;
+	const struct ipv6hdr *iphdr;
+	const struct in6_addr *saddr, *daddr;
+	struct nd_msg *msg;
+	struct inet6_dev *in6_dev = NULL;
+
+	WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	in6_dev = __in6_dev_get(dev);
+	if (!in6_dev)
+		goto out;
+
+	if (!pskb_may_pull(skb, skb->len))
+		goto out;
+
+	iphdr = ipv6_hdr(skb);
+	saddr = &iphdr->saddr;
+	daddr = &iphdr->daddr;
+
+	if (ipv6_addr_loopback(daddr) ||
+	    ipv6_addr_is_multicast(daddr))
+		goto out;
+
+	msg = (struct nd_msg *)skb_transport_header(skb);
+	if (msg->icmph.icmp6_code != 0 ||
+	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
+		goto out;
+
+	n = neigh_lookup(ipv6_stub->nd_tbl, daddr, dev);
+
+	if (n) {
+		struct vxlan_fdb *f;
+
+		if (!(n->nud_state & NUD_CONNECTED)) {
+			neigh_release(n);
+			goto out;
+		}
+
+		f = vxlan_find_mac(vxlan, n->ha);
+		if (f && vxlan_addr_any(&(first_remote_rcu(f)->remote_ip))) {
+			/* bridge-local neighbor */
+			neigh_release(n);
+			goto out;
+		}
+
+		ipv6_stub->ndisc_send_na(dev, n, saddr, &msg->target,
+					 !!in6_dev->cnf.forwarding,
+					 true, false, false);
+		neigh_release(n);
+	} else if (vxlan->flags & VXLAN_F_L3MISS) {
+		ipa.sin6.sin6_addr = *daddr;
+		ipa.sa.sa_family = AF_INET6;
+		vxlan_ip_miss(dev, &ipa);
+	}
+
+out:
+	consume_skb(skb);
+	return NETDEV_TX_OK;
+}
+#endif
+
 static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -1671,8 +1736,22 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 	eth = eth_hdr(skb);
 
-	if ((vxlan->flags & VXLAN_F_PROXY) && ntohs(eth->h_proto) == ETH_P_ARP)
-		return arp_reduce(dev, skb);
+	if ((vxlan->flags & VXLAN_F_PROXY)) {
+		if (ntohs(eth->h_proto) == ETH_P_ARP)
+			return arp_reduce(dev, skb);
+#if IS_ENABLED(CONFIG_IPV6)
+		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+			 skb->len >= sizeof(struct ipv6hdr) + sizeof(struct nd_msg) &&
+			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+				struct nd_msg *msg;
+
+				msg = (struct nd_msg *)skb_transport_header(skb);
+				if (msg->icmph.icmp6_code == 0 &&
+				    msg->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
+					return neigh_reduce(dev, skb);
+		}
+#endif
+	}
 
 	f = vxlan_find_mac(vxlan, eth->h_dest);
 	did_rsc = false;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index bcf9573..fb314de 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -153,6 +153,10 @@ struct ipv6_stub {
 	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
 				struct flowi6 *fl6);
 	void (*udpv6_encap_enable)(void);
+	void (*ndisc_send_na)(struct net_device *dev, struct neighbour *neigh,
+			      const struct in6_addr *daddr,
+			      const struct in6_addr *solicited_addr,
+			      bool router, bool solicited, bool override, bool inc_opt);
 	struct neigh_table *nd_tbl;
 };
 extern const struct ipv6_stub *ipv6_stub __read_mostly;
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 6fea323..3c4211f 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -204,6 +204,11 @@ extern void			ndisc_send_ns(struct net_device *dev,
 extern void			ndisc_send_rs(struct net_device *dev,
 					      const struct in6_addr *saddr,
 					      const struct in6_addr *daddr);
+extern void			ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+					      const struct in6_addr *daddr,
+					      const struct in6_addr *solicited_addr,
+					      bool router, bool solicited, bool override,
+					      bool inc_opt);
 
 extern void			ndisc_send_redirect(struct sk_buff *skb,
 						    const struct in6_addr *target);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 1996a7c..136fe55 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -56,6 +56,7 @@
 #include <net/transp_v6.h>
 #include <net/ip6_route.h>
 #include <net/addrconf.h>
+#include <net/ndisc.h>
 #ifdef CONFIG_IPV6_TUNNEL
 #include <net/ip6_tunnel.h>
 #endif
@@ -815,6 +816,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
 	.ipv6_dst_lookup = ip6_dst_lookup,
 	.udpv6_encap_enable = udpv6_encap_enable,
+	.ndisc_send_na = ndisc_send_na,
 	.nd_tbl	= &nd_tbl,
 };
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 04d31c2..4b6cac3 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -462,10 +462,10 @@ static void ndisc_send_skb(struct sk_buff *skb,
 	rcu_read_unlock();
 }
 
-static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
-			  const struct in6_addr *daddr,
-			  const struct in6_addr *solicited_addr,
-			  bool router, bool solicited, bool override, bool inc_opt)
+void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+		   const struct in6_addr *daddr,
+		   const struct in6_addr *solicited_addr,
+		   bool router, bool solicited, bool override, bool inc_opt)
 {
 	struct sk_buff *skb;
 	struct in6_addr tmpaddr;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (8 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-29 21:20   ` David Stevens
  2013-08-28  5:22 ` [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, David Stevens

From: Cong Wang <amwang@redhat.com>

As pointed out by David, we should take care of scope id for ll
addr, and use it for route lookup.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 33a2c6e..a06355d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1662,6 +1662,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		struct flowi6 fl6;
 		u32 flags;
 
+		if (ipv6_addr_type(&dst->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+			dst->sin6.sin6_scope_id = ipv6_iface_scope_id(&dst->sin6.sin6_addr,
+								rdst->remote_ifindex);
+			rdst->remote_ifindex = dst->sin6.sin6_scope_id;
+		}
+
 		memset(&fl6, 0, sizeof(fl6));
 		fl6.flowi6_oif = rdst->remote_ifindex;
 		fl6.daddr = dst->sin6.sin6_addr;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation
  2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
                   ` (9 preceding siblings ...)
  2013-08-28  5:22 ` [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr Cong Wang
@ 2013-08-28  5:22 ` Cong Wang
  2013-08-28 17:55   ` Pravin Shelar
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-28  5:22 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Cong Wang, Jesse Gross, Pravin B Shelar,
	Stephen Hemminger

From: Cong Wang <amwang@redhat.com>

Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
(tunneling: Add generic Tunnel segmentation)

This patch adds generic tunneling offloading support for
IPv6-UDP based tunnels.

This can be used by tunneling protocols like VXLAN.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_offload.c |    4 +-
 net/ipv6/udp_offload.c |  159 ++++++++++++++++++++++++++++++++---------------
 2 files changed, 111 insertions(+), 52 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a263b99..d82de72 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -91,6 +91,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	unsigned int unfrag_ip6hlen;
 	u8 *prevhdr;
 	int offset = 0;
+	bool tunnel;
 
 	if (unlikely(skb_shinfo(skb)->gso_type &
 		     ~(SKB_GSO_UDP |
@@ -106,6 +107,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
 
+	tunnel = skb->encapsulation;
 	ipv6h = ipv6_hdr(skb);
 	__skb_pull(skb, sizeof(*ipv6h));
 	segs = ERR_PTR(-EPROTONOSUPPORT);
@@ -126,7 +128,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		ipv6h = ipv6_hdr(skb);
 		ipv6h->payload_len = htons(skb->len - skb->mac_len -
 					   sizeof(*ipv6h));
-		if (proto == IPPROTO_UDP) {
+		if (!tunnel && proto == IPPROTO_UDP) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
 			fptr = (struct frag_hdr *)(skb_network_header(skb) +
 				unfrag_ip6hlen);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 5d1b8d7..7e5e5ac 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -21,26 +21,79 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
 	const struct ipv6hdr *ipv6h;
 	struct udphdr *uh;
 
-	/* UDP Tunnel offload on ipv6 is not yet supported. */
-	if (skb->encapsulation)
-		return -EINVAL;
-
 	if (!pskb_may_pull(skb, sizeof(*uh)))
 		return -EINVAL;
 
-	ipv6h = ipv6_hdr(skb);
-	uh = udp_hdr(skb);
+	if (likely(!skb->encapsulation)) {
+		ipv6h = ipv6_hdr(skb);
+		uh = udp_hdr(skb);
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
+					     IPPROTO_UDP, 0);
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		skb->ip_summed = CHECKSUM_PARTIAL;
+	}
 
-	uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
-				     IPPROTO_UDP, 0);
-	skb->csum_start = skb_transport_header(skb) - skb->head;
-	skb->csum_offset = offsetof(struct udphdr, check);
-	skb->ip_summed = CHECKSUM_PARTIAL;
 	return 0;
 }
 
+static struct sk_buff *skb_udp6_tunnel_segment(struct sk_buff *skb,
+					       netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	int mac_len = skb->mac_len;
+	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+	int outer_hlen;
+	netdev_features_t enc_features;
+
+	if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+		goto out;
+
+	skb->encapsulation = 0;
+	__skb_pull(skb, tnl_hlen);
+	skb_reset_mac_header(skb);
+	skb_set_network_header(skb, skb_inner_network_offset(skb));
+	skb->mac_len = skb_inner_network_offset(skb);
+
+	/* segment inner packet. */
+	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+	segs = skb_mac_gso_segment(skb, enc_features);
+	if (!segs || IS_ERR(segs))
+		goto out;
+
+	outer_hlen = skb_tnl_header_len(skb);
+	skb = segs;
+	do {
+		struct udphdr *uh;
+		struct ipv6hdr *ipv6h;
+		int udp_offset = outer_hlen - tnl_hlen;
+		u32 len;
+
+		skb->mac_len = mac_len;
+
+		skb_push(skb, outer_hlen);
+		skb_reset_mac_header(skb);
+		skb_set_network_header(skb, mac_len);
+		skb_set_transport_header(skb, udp_offset);
+		uh = udp_hdr(skb);
+		uh->len = htons(skb->len - udp_offset);
+		ipv6h = ipv6_hdr(skb);
+		len = skb->len - udp_offset;
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+					     len, IPPROTO_UDP, 0);
+		uh->check = csum_fold(skb_checksum(skb, udp_offset, len, 0));
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+		skb->ip_summed = CHECKSUM_NONE;
+	} while ((skb = skb->next));
+out:
+	return segs;
+}
+
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
-	netdev_features_t features)
+					 netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
@@ -75,47 +128,51 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
-	/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
-	 * do checksum of UDP packets sent as multiple IP fragments.
-	 */
-	offset = skb_checksum_start_offset(skb);
-	csum = skb_checksum(skb, offset, skb->len - offset, 0);
-	offset += skb->csum_offset;
-	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
-	skb->ip_summed = CHECKSUM_NONE;
-
-	/* Check if there is enough headroom to insert fragment header. */
-	tnl_hlen = skb_tnl_header_len(skb);
-	if (skb_headroom(skb) < (tnl_hlen + frag_hdr_sz)) {
-		if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz))
-			goto out;
+	if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
+		segs = skb_udp6_tunnel_segment(skb, features);
+	else {
+		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+		 * do checksum of UDP packets sent as multiple IP fragments.
+		 */
+		offset = skb_checksum_start_offset(skb);
+		csum = skb_checksum(skb, offset, skb->len - offset, 0);
+		offset += skb->csum_offset;
+		*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+		skb->ip_summed = CHECKSUM_NONE;
+
+		/* Check if there is enough headroom to insert fragment header. */
+		tnl_hlen = skb_tnl_header_len(skb);
+		if (skb_headroom(skb) < (tnl_hlen + frag_hdr_sz)) {
+			if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz))
+				goto out;
+		}
+
+		/* Find the unfragmentable header and shift it left by frag_hdr_sz
+		 * bytes to insert fragment header.
+		 */
+		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		nexthdr = *prevhdr;
+		*prevhdr = NEXTHDR_FRAGMENT;
+		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +
+			     unfrag_ip6hlen + tnl_hlen;
+		packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset;
+		memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len);
+
+		SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz;
+		skb->mac_header -= frag_hdr_sz;
+		skb->network_header -= frag_hdr_sz;
+
+		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+		fptr->nexthdr = nexthdr;
+		fptr->reserved = 0;
+		ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+
+		/* Fragment the skb. ipv6 header and the remaining fields of the
+		 * fragment header are updated in ipv6_gso_segment()
+		 */
+		segs = skb_segment(skb, features);
 	}
 
-	/* Find the unfragmentable header and shift it left by frag_hdr_sz
-	 * bytes to insert fragment header.
-	 */
-	unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	nexthdr = *prevhdr;
-	*prevhdr = NEXTHDR_FRAGMENT;
-	unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +
-		     unfrag_ip6hlen + tnl_hlen;
-	packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset;
-	memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len);
-
-	SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz;
-	skb->mac_header -= frag_hdr_sz;
-	skb->network_header -= frag_hdr_sz;
-
-	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
-	fptr->nexthdr = nexthdr;
-	fptr->reserved = 0;
-	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
-
-	/* Fragment the skb. ipv6 header and the remaining fields of the
-	 * fragment header are updated in ipv6_gso_segment()
-	 */
-	segs = skb_segment(skb, features);
-
 out:
 	return segs;
 }
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline
  2013-08-28  5:22 ` [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
@ 2013-08-28 10:39   ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-08-28 10:39 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller

On Wed, 2013-08-28 at 13:22 +0800, Cong Wang wrote:
> From: Cong Wang <amwang@redhat.com>
> 
> It will be used by vxlan.
> 
> Signed-off-by: Cong Wang <amwang@redhat.com>
> ---
>  include/net/ip6_route.h |   23 +++++++++++++++++++++--
>  net/ipv6/route.c        |   19 -------------------
>  2 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index f667248..2afcd0d 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -21,6 +21,7 @@ struct route_info {
>  #include <net/flow.h>
>  #include <net/ip6_fib.h>
>  #include <net/sock.h>
> +#include <net/addrconf.h>
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/route.h>
> @@ -112,8 +113,6 @@ extern struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
>  					   const struct in6_addr *addr,
>  					   bool anycast);
>  
> -extern int			ip6_dst_hoplimit(struct dst_entry *dst);
> -
>  /*
>   *	support functions for ND
>   *
> @@ -203,4 +202,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
>  	return dest;
>  }
>  
> +#if IS_ENABLED(CONFIG_IPV6)
> +static inline int ip6_dst_hoplimit(struct dst_entry *dst)
> +{
> +	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
> +	if (hoplimit == 0) {
> +		struct net_device *dev = dst->dev;
> +		struct inet6_dev *idev;
> +
> +		rcu_read_lock();
> +		idev = __in6_dev_get(dev);
> +		if (idev)
> +			hoplimit = idev->cnf.hop_limit;
> +		else
> +			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
> +		rcu_read_unlock();
> +	}
> +	return hoplimit;
> +}
> +#endif
> +

It probably be not inlined (10 call sites), but moved into
net/ipv6/output_core.c

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock
  2013-08-28  5:22 ` [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock Cong Wang
@ 2013-08-28 15:34   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2013-08-28 15:34 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller

On Wed, Aug 28, 2013 at 01:22:53PM +0800, Cong Wang wrote:
> From: Cong Wang <amwang@redhat.com>
> 
> Because vxlan module will call ip6_dst_lookup() in TX path,
> which will hold write lock. So we have to release this write lock
> before calling ndisc_send_rs(), otherwise could deadlock.
> 
> Signed-off-by: Cong Wang <amwang@redhat.com>

Change seems fine:

Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

> ---
>  net/ipv6/addrconf.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8e68466..1bb0861 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3088,6 +3088,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
>  static void addrconf_rs_timer(unsigned long data)
>  {
>  	struct inet6_dev *idev = (struct inet6_dev *)data;
> +	struct net_device *dev = idev->dev;
>  	struct in6_addr lladdr;
>  
>  	write_lock(&idev->lock);
> @@ -3102,12 +3103,14 @@ static void addrconf_rs_timer(unsigned long data)
>  		goto out;
>  
>  	if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
> -		if (!__ipv6_get_lladdr(idev, &lladdr, IFA_F_TENTATIVE))
> -			ndisc_send_rs(idev->dev, &lladdr,
> +		write_unlock(&idev->lock);
> +		if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
> +			ndisc_send_rs(dev, &lladdr,
>  				      &in6addr_linklocal_allrouters);
>  		else
> -			goto out;
> +			goto put;
>  
> +		write_lock(&idev->lock);
>  		/* The wait after the last probe can be shorter */
>  		addrconf_mod_rs_timer(idev, (idev->rs_probes ==
>  					     idev->cnf.rtr_solicits) ?
> @@ -3123,6 +3126,7 @@ static void addrconf_rs_timer(unsigned long data)
>  
>  out:
>  	write_unlock(&idev->lock);
> +put:
>  	in6_dev_put(idev);
>  }

Maybe it would be cleaner to just update a boolean in the inner block
and call ndisc_send_rs after the write_unlock.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support
  2013-08-28  5:22 ` [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support Cong Wang
@ 2013-08-28 16:08   ` Stephen Hemminger
  0 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2013-08-28 16:08 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller, David Stevens

On Wed, 28 Aug 2013 13:22:55 +0800
Cong Wang <amwang@redhat.com> wrote:

> +		n = neigh_lookup(ipv6_stub->nd_tbl, &pip6->daddr, dev);
> +		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
> +			union vxlan_addr ipa;
> +			ipa.sin6.sin6_addr = pip6->daddr;
> +			ipa.sa.sa_family = AF_INET6;

Please always add paren's when doing bitwise mask in comparison, gcc will even warn
about this.

You could use C99 initialization for this which has added benefit
of zeroing unused fields.
			union vxlan_addr ipa = {
				.sa.sa_family = AF_INET6,
				.sin6.sin6_sin6)addr = pip6->daddr,
			};

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan
  2013-08-28  5:22 ` [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
@ 2013-08-28 16:12   ` Stephen Hemminger
  2013-08-30  7:47     ` Cong Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2013-08-28 16:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller, Ben Hutchings

On Wed, 28 Aug 2013 13:22:51 +0800
Cong Wang <amwang@redhat.com> wrote:

> +struct ipv6_stub {
> +	int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
> +				 const struct in6_addr *addr);
> +	int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
> +				 const struct in6_addr *addr);
> +	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
> +				struct flowi6 *fl6);
> +	void (*udpv6_encap_enable)(void);
> +};
> +extern const struct ipv6_stub *ipv6_stub __read_mostly;
> +

Since IPv6 really can't ever be safely unloaded. why not:
 * admit that, and get rid of ipv6_module exit.
 * then this stub can be a static const table.

You see virtual function tables that are writeable are potential malware
targets, although eliminating all of them is impossible, I hate to
see adding new ones.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support
  2013-08-28  5:22 ` [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support Cong Wang
@ 2013-08-28 16:13   ` Stephen Hemminger
  0 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2013-08-28 16:13 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller, David Stevens

On Wed, 28 Aug 2013 13:22:57 +0800
Cong Wang <amwang@redhat.com> wrote:

> +#if IS_ENABLED(CONFIG_IPV6)
> +static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
> +{
> +	struct vxlan_dev *vxlan = netdev_priv(dev);
> +	struct neighbour *n;
> +	union vxlan_addr ipa;
> +	const struct ipv6hdr *iphdr;
> +	const struct in6_addr *saddr, *daddr;
> +	struct nd_msg *msg;
> +	struct inet6_dev *in6_dev = NULL;
> +
> +	WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held());

You may have needed warn during debugging, but you should be able
to audit all the codepaths (this is a static function) to be sure
that this code is correct (ie rcu lock always held).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation
  2013-08-28  5:22 ` [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
@ 2013-08-28 17:55   ` Pravin Shelar
  2013-08-30  8:16     ` Cong Wang
  0 siblings, 1 reply; 23+ messages in thread
From: Pravin Shelar @ 2013-08-28 17:55 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller, Jesse Gross, Stephen Hemminger

On Tue, Aug 27, 2013 at 10:22 PM, Cong Wang <amwang@redhat.com> wrote:
> From: Cong Wang <amwang@redhat.com>
>
> Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
> (tunneling: Add generic Tunnel segmentation)
>
Can we factor out ipv4-udp-tunnel code so that there is no duplicating code?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr
  2013-08-28  5:22 ` [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr Cong Wang
@ 2013-08-29 21:20   ` David Stevens
  2013-08-30  8:10     ` Cong Wang
  0 siblings, 1 reply; 23+ messages in thread
From: David Stevens @ 2013-08-29 21:20 UTC (permalink / raw)
  To: Cong Wang; +Cc: Cong Wang, David S. Miller, netdev, netdev-owner

netdev-owner@vger.kernel.org wrote on 08/28/2013 01:22:58 AM:

> From: Cong Wang <amwang@redhat.com>
 
> As pointed out by David, we should take care of scope id for ll
> addr, and use it for route lookup.

        It is an error to have a zero scope_id for an LL addr, but
this code is not correct, because it only honors scope_id for
LL addrs. Multicast addresses also require scope_id be set.
        This shouldn't be in the transmit path. The netlink code
that sets it should require it be nonzero for LL addrs and multicast
addrs when it is set, and the transmit path should *always* use scope_id,
whether for an lladdr or not, whether scope_id is zero or not.

        I'm away this week and won't be able to review these in
detail until next week, but wanted to get that comment in.

                                                +-DLS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan
  2013-08-28 16:12   ` Stephen Hemminger
@ 2013-08-30  7:47     ` Cong Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-30  7:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David S. Miller, Ben Hutchings

On Wed, 2013-08-28 at 09:12 -0700, Stephen Hemminger wrote:
> On Wed, 28 Aug 2013 13:22:51 +0800
> Cong Wang <amwang@redhat.com> wrote:
> 
> > +struct ipv6_stub {
> > +	int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
> > +				 const struct in6_addr *addr);
> > +	int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
> > +				 const struct in6_addr *addr);
> > +	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
> > +				struct flowi6 *fl6);
> > +	void (*udpv6_encap_enable)(void);
> > +};
> > +extern const struct ipv6_stub *ipv6_stub __read_mostly;
> > +
> 
> Since IPv6 really can't ever be safely unloaded. why not:
>  * admit that, and get rid of ipv6_module exit.
>  * then this stub can be a static const table.
> 
> You see virtual function tables that are writeable are potential malware
> targets, although eliminating all of them is impossible, I hate to
> see adding new ones.

I think getting rid of ipv6 exit deserves another thread for discussion,
not in this thread. Actually I raised same question before, but no one
replied me. If you want, I can send another patch for it after this
patchset gets merged.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr
  2013-08-29 21:20   ` David Stevens
@ 2013-08-30  8:10     ` Cong Wang
  0 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-08-30  8:10 UTC (permalink / raw)
  To: David Stevens; +Cc: David S. Miller, netdev, netdev-owner

On Thu, 2013-08-29 at 17:20 -0400, David Stevens wrote:
> netdev-owner@vger.kernel.org wrote on 08/28/2013 01:22:58 AM:
> 
> > From: Cong Wang <amwang@redhat.com>
>  
> > As pointed out by David, we should take care of scope id for ll
> > addr, and use it for route lookup.
> 
>         It is an error to have a zero scope_id for an LL addr, but
> this code is not correct, because it only honors scope_id for
> LL addrs. Multicast addresses also require scope_id be set.
>         This shouldn't be in the transmit path. The netlink code
> that sets it should require it be nonzero for LL addrs and multicast
> addrs when it is set, and the transmit path should *always* use scope_id,
> whether for an lladdr or not, whether scope_id is zero or not.
> 

Except multicast address, I think this is what I did in some version
before? It is you who told me to respect scope id on tx path for
routing... I am really tired of changing it back again and again...

I will drop this patch and let you fix it by yourself later.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation
  2013-08-28 17:55   ` Pravin Shelar
@ 2013-08-30  8:16     ` Cong Wang
  2013-08-30 21:14       ` Pravin Shelar
  0 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-08-30  8:16 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev, David S. Miller, Jesse Gross, Stephen Hemminger

On Wed, 2013-08-28 at 10:55 -0700, Pravin Shelar wrote:
> On Tue, Aug 27, 2013 at 10:22 PM, Cong Wang <amwang@redhat.com> wrote:
> > From: Cong Wang <amwang@redhat.com>
> >
> > Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
> > (tunneling: Add generic Tunnel segmentation)
> >
> Can we factor out ipv4-udp-tunnel code so that there is no duplicating code?

Probably we can, but it will make code review harder? Or you mean doing
it in a separated patch? If you insist, I can do it in a separated
patch.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation
  2013-08-30  8:16     ` Cong Wang
@ 2013-08-30 21:14       ` Pravin Shelar
  0 siblings, 0 replies; 23+ messages in thread
From: Pravin Shelar @ 2013-08-30 21:14 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, David S. Miller, Jesse Gross, Stephen Hemminger

On Fri, Aug 30, 2013 at 1:16 AM, Cong Wang <amwang@redhat.com> wrote:
> On Wed, 2013-08-28 at 10:55 -0700, Pravin Shelar wrote:
>> On Tue, Aug 27, 2013 at 10:22 PM, Cong Wang <amwang@redhat.com> wrote:
>> > From: Cong Wang <amwang@redhat.com>
>> >
>> > Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
>> > (tunneling: Add generic Tunnel segmentation)
>> >
>> Can we factor out ipv4-udp-tunnel code so that there is no duplicating code?
>
> Probably we can, but it will make code review harder? Or you mean doing
> it in a separated patch? If you insist, I can do it in a separated
> patch.
>

I doubt it makes review any harder. But separate patch is also fine with me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-08-30 21:14 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-28  5:22 [PATCH net-next v10 00/11] vxlan: add ipv6 support Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 01/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
2013-08-28 10:39   ` Eric Dumazet
2013-08-28  5:22 ` [PATCH net-next v10 02/11] ipv6: move ip6_local_out into core kernel Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 03/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
2013-08-28 16:12   ` Stephen Hemminger
2013-08-30  7:47     ` Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 04/11] ipv6: export some IPv6 special addresses to modules Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 05/11] ipv6: do not call ndisc_send_rs() with write lock Cong Wang
2013-08-28 15:34   ` Hannes Frederic Sowa
2013-08-28  5:22 ` [PATCH net-next v10 06/11] vxlan: add ipv6 support Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 07/11] vxlan: add ipv6 route short circuit support Cong Wang
2013-08-28 16:08   ` Stephen Hemminger
2013-08-28  5:22 ` [PATCH net-next v10 08/11] ipv6: move in6_dev_finish_destroy() into core kernel Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 09/11] vxlan: add ipv6 proxy support Cong Wang
2013-08-28 16:13   ` Stephen Hemminger
2013-08-28  5:22 ` [PATCH net-next v10 10/11] vxlan: respect scope_id for ll addr Cong Wang
2013-08-29 21:20   ` David Stevens
2013-08-30  8:10     ` Cong Wang
2013-08-28  5:22 ` [PATCH net-next v10 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
2013-08-28 17:55   ` Pravin Shelar
2013-08-30  8:16     ` Cong Wang
2013-08-30 21:14       ` Pravin Shelar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).