* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
@ 2015-11-23 20:57   ` kbuild test robot
  2015-11-23 20:58   ` kbuild test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 94+ messages in thread
From: kbuild test robot @ 2015-11-23 20:57 UTC (permalink / raw)
  To: Anjali Singhai Jain
  Cc: kbuild-all, netdev, jesse, Anjali Singhai Jain, Kiran Patil

[-- Attachment #1: Type: text/plain, Size: 1581 bytes --]

Hi Anjali,

[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc2 next-20151123]

url:    https://github.com/0day-ci/linux/commits/Anjali-Singhai-Jain/Generalize-udp-based-tunnels-and-add-geneve-offload/20151124-044959
config: x86_64-randconfig-x001-11230704 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_add_vxlan_port':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:5436:14: error: 'UDP_TUNNEL_VXLAN' undeclared (first use in this function)
     if (type != UDP_TUNNEL_VXLAN)
                 ^
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:5436:14: note: each undeclared identifier is reported only once for each function it appears in
   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_del_vxlan_port':
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:5461:14: error: 'UDP_TUNNEL_VXLAN' undeclared (first use in this function)
     if (type != UDP_TUNNEL_VXLAN)
                 ^

vim +/UDP_TUNNEL_VXLAN +5436 drivers/net/ethernet/broadcom/bnxt/bnxt.c

  5430		if (!netif_running(dev))
  5431			return;
  5432	
  5433		if (sa_family != AF_INET6 && sa_family != AF_INET)
  5434			return;
  5435	
> 5436		if (type != UDP_TUNNEL_VXLAN)
  5437			return;
  5438	
  5439		if (bp->vxlan_port_cnt && bp->vxlan_port != port)
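
Editorial note: the error above is consistent with the bnxt.c hunk of the
patch. That hunk adds the UDP_TUNNEL_VXLAN check but, unlike the other
driver hunks, does not add #include <net/udp_tunnel.h>, which is where the
patch defines enum udp_tunnel_type. Adding that include to bnxt.c is a likely
fix, though it is untested here. The userspace sketch below only illustrates
the check the handler is meant to perform. struct fake_bnxt and fake_add_port
are hypothetical stand-ins rather than driver code, the port accounting is
simplified, and ports are in host byte order instead of __be16.

```c
#include <stdint.h>

/* Mirrors the enum the patch adds to include/net/udp_tunnel.h. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

/* Hypothetical stand-in for the driver's private state. */
struct fake_bnxt {
	unsigned int vxlan_port_cnt;
	uint16_t vxlan_port;
};

/* Simplified add handler: ignore every tunnel type except VXLAN and
 * track a single port with a reference count.
 */
static void fake_add_port(struct fake_bnxt *bp, uint16_t port, uint32_t type)
{
	if (type != UDP_TUNNEL_VXLAN)
		return;

	/* Only one VXLAN port is tracked; a different port is ignored. */
	if (bp->vxlan_port_cnt && bp->vxlan_port != port)
		return;

	bp->vxlan_port = port;
	bp->vxlan_port_cnt++;
}
```

With this shape, a Geneve notification leaves the state untouched, and only
the first VXLAN port is recorded.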

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 30328 bytes --]

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
  2015-11-23 20:57   ` kbuild test robot
@ 2015-11-23 20:58   ` kbuild test robot
  2015-11-23 21:53   ` Tom Herbert
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 94+ messages in thread
From: kbuild test robot @ 2015-11-23 20:58 UTC (permalink / raw)
  To: Anjali Singhai Jain
  Cc: kbuild-all, netdev, jesse, Anjali Singhai Jain, Kiran Patil

[-- Attachment #1: Type: text/plain, Size: 2142 bytes --]

Hi Anjali,

[auto build test ERROR on net/master]
[also build test ERROR on v4.4-rc2 next-20151123]

url:    https://github.com/0day-ci/linux/commits/Anjali-Singhai-Jain/Generalize-udp-based-tunnels-and-add-geneve-offload/20151124-044959
config: i386-randconfig-x006-11230317 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:53:0:
   include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
>> include/net/udp_tunnel.h:112:9: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
     return iptunnel_handle_offloads(skb, udp_csum, type);
            ^
   include/net/udp_tunnel.h:112:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
   cc1: some warnings being treated as errors

vim +/iptunnel_handle_offloads +112 include/net/udp_tunnel.h

c29a70d2 Pravin B Shelar 2015-08-26  106  
6a93cc90 Andy Zhou       2014-09-16  107  static inline struct sk_buff *udp_tunnel_handle_offloads(struct sk_buff *skb,
6a93cc90 Andy Zhou       2014-09-16  108  							 bool udp_csum)
6a93cc90 Andy Zhou       2014-09-16  109  {
6a93cc90 Andy Zhou       2014-09-16  110  	int type = udp_csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
6a93cc90 Andy Zhou       2014-09-16  111  
6a93cc90 Andy Zhou       2014-09-16 @112  	return iptunnel_handle_offloads(skb, udp_csum, type);
6a93cc90 Andy Zhou       2014-09-16  113  }
6a93cc90 Andy Zhou       2014-09-16  114  
cfdf1e1b Jesse Gross     2014-11-10  115  static inline void udp_tunnel_gro_complete(struct sk_buff *skb, int nhoff)

:::::: The code at line 112 was first introduced by commit
:::::: 6a93cc9052748c6355ec9d5b6c38b77f85f1cb0d udp-tunnel: Add a few more UDP tunnel APIs

:::::: TO: Andy Zhou <azhou@nicira.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21526 bytes --]

* [PATCH 0/6] Generalize udp based tunnels and add geneve offload
@ 2015-11-23 21:02 Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
                   ` (5 more replies)
  0 siblings, 6 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse

This patch series generalizes the offload hooks for UDP-based tunnels,
so that much of the framework can remain common for any driver adding
support for Geneve or any other UDP-based tunnel in the future.

This patch series also makes the driver calls to/from the stack
independent of the tunnel modules, which should make it easier to
remove the tunnel-support module config parameters from the respective
drivers. One such example is in the i40e driver.

This patch series also adds support for Geneve offload in the i40e driver.

Many thanks to Jesse Gross for his valuable feedback in getting this
done the right way.

Anjali Singhai Jain (6):
net: Generalize udp based tunnel offload
net: Add a generic udp_offload_get_port function
i40e: Generalize the flow for udp based tunnels
i40e: Remove CONFIG_I40E_VXLAN
net: Refactor udp_offload and add Geneve port offload
i40e:Add geneve tunnel offload support
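
Editorial note: as a rough illustration of the idea described above, the
userspace sketch below shows one driver callback serving several tunnel types,
with the tunnel module passing the type along when it notifies devices. All
fake_* names are hypothetical and are not kernel APIs. Only the enum values
are taken from patch 1/6, and ports are in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as the enum added by patch 1/6. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

struct fake_netdev;

/* Hypothetical, cut-down analogue of struct net_device_ops. */
struct fake_netdev_ops {
	void (*add_udp_tunnel_port)(struct fake_netdev *dev,
				    uint16_t port, uint32_t type);
};

struct fake_netdev {
	const struct fake_netdev_ops *ops;
	unsigned int vxlan_ports;
	unsigned int geneve_ports;
};

/* One driver callback handles every tunnel type and switches on it. */
static void fake_drv_add_port(struct fake_netdev *dev,
			      uint16_t port, uint32_t type)
{
	(void)port;	/* a real driver would program the port into hardware */

	switch (type) {
	case UDP_TUNNEL_VXLAN:
		dev->vxlan_ports++;
		break;
	case UDP_TUNNEL_GENEVE:
		dev->geneve_ports++;
		break;
	default:
		break;	/* unknown tunnel types are ignored */
	}
}

static const struct fake_netdev_ops fake_drv_ops = {
	.add_udp_tunnel_port = fake_drv_add_port,
};

/* What a tunnel module would do: notify each device that implements the op. */
static void fake_notify_add_port(struct fake_netdev *devs, size_t n,
				 uint16_t port, uint32_t type)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].ops && devs[i].ops->add_udp_tunnel_port)
			devs[i].ops->add_udp_tunnel_port(&devs[i], port, type);
	}
}
```

Because the type travels with the notification, a driver does not need to
depend on any particular tunnel module to tell the tunnels apart.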

* [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  2015-11-23 20:57   ` kbuild test robot
                     ` (4 more replies)
  2015-11-23 21:02 ` [PATCH v1 2/6] net: Add a generic udp_offload_get_port function Anjali Singhai Jain
                   ` (4 subsequent siblings)
  5 siblings, 5 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

Replace the add/del ndo ops for vxlan_port with ndo_add_udp_tunnel_port
and ndo_del_udp_tunnel_port so that all UDP-based tunnels can use the
same ndo ops. Add a parameter to pass the tunnel type to the ndo ops.

Change all drivers to use the generalized UDP tunnel offload.

The patch was compile tested with x86_64_defconfig.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        | 13 +++++---
 drivers/net/ethernet/emulex/benet/be_main.c      | 14 +++++---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 ++++++++++++----
 drivers/net/ethernet/intel/i40e/i40e_main.c      | 41 +++++++++++++++++-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 17 +++++++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 ++++++++----
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++++++---
 drivers/net/vxlan.c                              | 23 +++++++------
 include/linux/netdevice.h                        | 34 ++++++++++----------
 include/net/udp_tunnel.h                         |  6 ++++
 11 files changed, 157 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2273576..ad2782f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -47,6 +47,7 @@
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/tcp.h>
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
@@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, u16 port)
 }
 
 static void bnx2x_add_vxlan_port(struct net_device *netdev,
-				 sa_family_t sa_family, __be16 port)
+				 sa_family_t sa_family, __be16 port,
+				 u32 type)
 {
 	struct bnx2x *bp = netdev_priv(netdev);
 	u16 t_port = ntohs(port);
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
 	__bnx2x_add_vxlan_port(bp, t_port);
 }
 
@@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
 }
 
 static void bnx2x_del_vxlan_port(struct net_device *netdev,
-				 sa_family_t sa_family, __be16 port)
+				 sa_family_t sa_family, __be16 port,
+				 u32 type)
 {
 	struct bnx2x *bp = netdev_priv(netdev);
 	u16 t_port = ntohs(port);
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
 	__bnx2x_del_vxlan_port(bp, t_port);
 }
 #endif
@@ -13008,8 +13015,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 	.ndo_set_vf_link_state	= bnx2x_set_vf_link_state,
 	.ndo_features_check	= bnx2x_features_check,
 #ifdef CONFIG_BNX2X_VXLAN
-	.ndo_add_vxlan_port	= bnx2x_add_vxlan_port,
-	.ndo_del_vxlan_port	= bnx2x_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= bnx2x_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= bnx2x_del_vxlan_port,
 #endif
 };
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f2d0dc9..5b96ddf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5421,7 +5421,7 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
 #endif /* CONFIG_RFS_ACCEL */
 
 static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-				__be16 port)
+				__be16 port, u32 type)
 {
 	struct bnxt *bp = netdev_priv(dev);
 
@@ -5431,6 +5431,9 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 	if (sa_family != AF_INET6 && sa_family != AF_INET)
 		return;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (bp->vxlan_port_cnt && bp->vxlan_port != port)
 		return;
 
@@ -5443,7 +5446,7 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 }
 
 static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-				__be16 port)
+				__be16 port, u32 type)
 {
 	struct bnxt *bp = netdev_priv(dev);
 
@@ -5453,6 +5456,8 @@ static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 	if (sa_family != AF_INET6 && sa_family != AF_INET)
 		return;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
 	if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
 		bp->vxlan_port_cnt--;
 
@@ -5491,8 +5496,8 @@ static const struct net_device_ops bnxt_netdev_ops = {
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= bnxt_rx_flow_steer,
 #endif
-	.ndo_add_vxlan_port	= bnxt_add_vxlan_port,
-	.ndo_del_vxlan_port	= bnxt_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= bnxt_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= bnxt_del_vxlan_port,
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	.ndo_busy_poll		= bnxt_busy_poll,
 #endif
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 4cab887..e699deca 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -23,6 +23,7 @@
 #include <linux/aer.h>
 #include <linux/if_bridge.h>
 #include <net/busy_poll.h>
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 
 MODULE_VERSION(DRV_VER);
@@ -5175,12 +5176,15 @@ static int be_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
  * until after all the tunnels are removed.
  */
 static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
-			      __be16 port)
+			      __be16 port, u32 type)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 	struct device *dev = &adapter->pdev->dev;
 	int status;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
@@ -5229,10 +5233,12 @@ err:
 }
 
 static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
-			      __be16 port)
+			      __be16 port, u32 type)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
 	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
@@ -5342,8 +5348,8 @@ static const struct net_device_ops be_netdev_ops = {
 	.ndo_busy_poll		= be_busy_poll,
 #endif
 #ifdef CONFIG_BE2NET_VXLAN
-	.ndo_add_vxlan_port	= be_add_vxlan_port,
-	.ndo_del_vxlan_port	= be_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= be_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= be_del_vxlan_port,
 	.ndo_features_check	= be_features_check,
 #endif
 	.ndo_get_phys_port_id   = be_get_phys_port_id,
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 639263d..447d5e6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -21,6 +21,7 @@
 #include "fm10k.h"
 #include <linux/vmalloc.h>
 #if IS_ENABLED(CONFIG_FM10K_VXLAN)
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 #endif /* CONFIG_FM10K_VXLAN */
 
@@ -439,18 +440,24 @@ static void fm10k_restore_vxlan_port(struct fm10k_intfc *interface)
  * @netdev: network interface device structure
  * @sa_family: Address family of new port
  * @port: port number used for VXLAN
+ * @type: Tunnel type
  *
- * This funciton is called when a new VXLAN interface has added a new port
+ * This function is called when a new VXLAN interface has added a new port
  * number to the range that is currently in use for VXLAN.  The new port
  * number is always added to the tail so that the port number list should
  * match the order in which the ports were allocated.  The head of the list
  * is always used as the VXLAN port number for offloads.
  **/
 static void fm10k_add_vxlan_port(struct net_device *dev,
-				 sa_family_t sa_family, __be16 port) {
+				 sa_family_t sa_family, __be16 port,
+				 u32 type) {
+#if IS_ENABLED(CONFIG_FM10K_VXLAN)
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	struct fm10k_vxlan_port *vxlan_port;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	/* only the PF supports configuring tunnels */
 	if (interface->hw.mac.type != fm10k_mac_pf)
 		return;
@@ -476,6 +483,7 @@ insert_tail:
 	list_add_tail(&vxlan_port->list, &interface->vxlan_port);
 
 	fm10k_restore_vxlan_port(interface);
+#endif
 }
 
 /**
@@ -483,17 +491,23 @@ insert_tail:
  * @netdev: network interface device structure
  * @sa_family: Address family of freed port
  * @port: port number used for VXLAN
+ * @type: Tunnel type
  *
- * This funciton is called when a new VXLAN interface has freed a port
+ * This function is called when a new VXLAN interface has freed a port
  * number from the range that is currently in use for VXLAN.  The freed
  * port is removed from the list and the new head is used to determine
  * the port number for offloads.
  **/
 static void fm10k_del_vxlan_port(struct net_device *dev,
-				 sa_family_t sa_family, __be16 port) {
+				 sa_family_t sa_family, __be16 port,
+				 u32 type) {
+#if IS_ENABLED(CONFIG_FM10K_VXLAN)
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	struct fm10k_vxlan_port *vxlan_port;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (interface->hw.mac.type != fm10k_mac_pf)
 		return;
 
@@ -508,6 +522,7 @@ static void fm10k_del_vxlan_port(struct net_device *dev,
 	}
 
 	fm10k_restore_vxlan_port(interface);
+#endif
 }
 
 /**
@@ -1373,8 +1388,8 @@ static const struct net_device_ops fm10k_netdev_ops = {
 	.ndo_set_vf_vlan	= fm10k_ndo_set_vf_vlan,
 	.ndo_set_vf_rate	= fm10k_ndo_set_vf_bw,
 	.ndo_get_vf_config	= fm10k_ndo_get_vf_config,
-	.ndo_add_vxlan_port	= fm10k_add_vxlan_port,
-	.ndo_del_vxlan_port	= fm10k_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= fm10k_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= fm10k_del_vxlan_port,
 	.ndo_do_ioctl		= fm10k_ioctl,
 	.ndo_dfwd_add_station	= fm10k_dfwd_add_station,
 	.ndo_dfwd_del_station	= fm10k_dfwd_del_station,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b825f97..520e34e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -30,6 +30,7 @@
 #ifdef CONFIG_I40E_VXLAN
 #include <net/vxlan.h>
 #endif
+#include <net/udp_tunnel.h>
 
 const char i40e_driver_name[] = "i40e";
 static const char i40e_driver_string[] =
@@ -8296,13 +8297,18 @@ static u8 i40e_get_vxlan_port_idx(struct i40e_pf *pf, __be16 port)
 }
 
 /**
- * i40e_add_vxlan_port - Get notifications about VXLAN ports that come up
+ * i40e_add_tunnel_port - Get notifications about UDP tunnel ports that come up
  * @netdev: This physical port's netdev
- * @sa_family: Socket Family that VXLAN is notifying us about
- * @port: New UDP port number that VXLAN started listening to
+ * @sa_family: Socket Family that tunnel netdev is associated with
+ * @port: New UDP port number that tunnel started listening to
+ * @type: Tunnel Type
+ *
+ * This function modifies a common data structure for all udp_tunnels
+ * hence it is expected that it is called under a common lock.
  **/
-static void i40e_add_vxlan_port(struct net_device *netdev,
-				sa_family_t sa_family, __be16 port)
+static void i40e_add_tunnel_port(struct net_device *netdev,
+				 sa_family_t sa_family, __be16 port,
+				 u32 type)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
@@ -8310,6 +8316,9 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
 	u8 next_idx;
 	u8 idx;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (sa_family == AF_INET6)
 		return;
 
@@ -8338,19 +8347,27 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
 }
 
 /**
- * i40e_del_vxlan_port - Get notifications about VXLAN ports that go away
+ * i40e_del_tunnel_port - Get notifications about UDP tunnel ports that go away
  * @netdev: This physical port's netdev
- * @sa_family: Socket Family that VXLAN is notifying us about
- * @port: UDP port number that VXLAN stopped listening to
+ * @sa_family: Socket Family that tunnel netdev is associated with
+ * @port: UDP port number that tunnel stopped listening to
+ * @type: Tunnel Type
+ *
+ * This function modifies a common data structure for all udp_tunnels
+ * hence it is expected that it is called under a common lock.
  **/
-static void i40e_del_vxlan_port(struct net_device *netdev,
-				sa_family_t sa_family, __be16 port)
+static void i40e_del_tunnel_port(struct net_device *netdev,
+				 sa_family_t sa_family, __be16 port,
+				 u32 type)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
 	u8 idx;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (sa_family == AF_INET6)
 		return;
 
@@ -8596,8 +8613,8 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_set_vf_link_state	= i40e_ndo_set_vf_link_state,
 	.ndo_set_vf_spoofchk	= i40e_ndo_set_vf_spoofchk,
 #ifdef CONFIG_I40E_VXLAN
-	.ndo_add_vxlan_port	= i40e_add_vxlan_port,
-	.ndo_del_vxlan_port	= i40e_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= i40e_add_tunnel_port,
+	.ndo_del_udp_tunnel_port	= i40e_del_tunnel_port,
 #endif
 	.ndo_get_phys_port_id	= i40e_get_phys_port_id,
 	.ndo_fdb_add		= i40e_ndo_fdb_add,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4089d77..76ccc77 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -50,6 +50,7 @@
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 
 #ifdef CONFIG_OF
@@ -8088,14 +8089,18 @@ static int ixgbe_set_features(struct net_device *netdev,
  * @dev: The port's netdev
  * @sa_family: Socket Family that VXLAN is notifiying us about
  * @port: New UDP port number that VXLAN started listening to
+ * @type: Tunnel type
  **/
 static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-				 __be16 port)
+				 __be16 port, u32 type)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u16 new_port = ntohs(port);
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
 		return;
 
@@ -8121,13 +8126,17 @@ static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
  * @dev: The port's netdev
  * @sa_family: Socket Family that VXLAN is notifying us about
  * @port: UDP port number that VXLAN stopped listening to
+ * @type: Tunnel type
  **/
 static void ixgbe_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
-				 __be16 port)
+				 __be16 port, u32 type)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	u16 new_port = ntohs(port);
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
 		return;
 
@@ -8436,8 +8445,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_dfwd_add_station	= ixgbe_fwd_add,
 	.ndo_dfwd_del_station	= ixgbe_fwd_del,
 #ifdef CONFIG_IXGBE_VXLAN
-	.ndo_add_vxlan_port	= ixgbe_add_vxlan_port,
-	.ndo_del_vxlan_port	= ixgbe_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= ixgbe_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= ixgbe_del_vxlan_port,
 #endif /* CONFIG_IXGBE_VXLAN */
 	.ndo_features_check	= ixgbe_features_check,
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 659209f..2cb19c7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -39,6 +39,7 @@
 #include <linux/hash.h>
 #include <net/ip.h>
 #include <net/busy_poll.h>
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 
 #include <linux/mlx4/driver.h>
@@ -2365,11 +2366,15 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
 }
 
 static void mlx4_en_add_vxlan_port(struct  net_device *dev,
-				   sa_family_t sa_family, __be16 port)
+				   sa_family_t sa_family, __be16 port,
+				   u32 type)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	__be16 current_port;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
 		return;
 
@@ -2388,11 +2393,15 @@ static void mlx4_en_add_vxlan_port(struct  net_device *dev,
 }
 
 static void mlx4_en_del_vxlan_port(struct  net_device *dev,
-				   sa_family_t sa_family, __be16 port)
+				   sa_family_t sa_family, __be16 port,
+				   u32 type)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	__be16 current_port;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
 		return;
 
@@ -2469,8 +2478,8 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #endif
 	.ndo_get_phys_port_id	= mlx4_en_get_phys_port_id,
 #ifdef CONFIG_MLX4_EN_VXLAN
-	.ndo_add_vxlan_port	= mlx4_en_add_vxlan_port,
-	.ndo_del_vxlan_port	= mlx4_en_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= mlx4_en_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= mlx4_en_del_vxlan_port,
 	.ndo_features_check	= mlx4_en_features_check,
 #endif
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
@@ -2507,8 +2516,8 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 #endif
 	.ndo_get_phys_port_id	= mlx4_en_get_phys_port_id,
 #ifdef CONFIG_MLX4_EN_VXLAN
-	.ndo_add_vxlan_port	= mlx4_en_add_vxlan_port,
-	.ndo_del_vxlan_port	= mlx4_en_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= mlx4_en_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= mlx4_en_del_vxlan_port,
 	.ndo_features_check	= mlx4_en_features_check,
 #endif
 	.ndo_set_tx_maxrate	= mlx4_en_set_tx_maxrate,
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 1205f6f..aa38dbb 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -17,6 +17,7 @@
 #include <linux/log2.h>
 #include <linux/pci.h>
 #ifdef CONFIG_QLCNIC_VXLAN
+#include <net/udp_tunnel.h>
 #include <net/vxlan.h>
 #endif
 
@@ -476,11 +477,15 @@ static int qlcnic_get_phys_port_id(struct net_device *netdev,
 
 #ifdef CONFIG_QLCNIC_VXLAN
 static void qlcnic_add_vxlan_port(struct net_device *netdev,
-				  sa_family_t sa_family, __be16 port)
+				  sa_family_t sa_family, __be16 port,
+				  u32 type)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	/* Adapter supports only one VXLAN port. Use very first port
 	 * for enabling offload
 	 */
@@ -498,11 +503,15 @@ static void qlcnic_add_vxlan_port(struct net_device *netdev,
 }
 
 static void qlcnic_del_vxlan_port(struct net_device *netdev,
-				  sa_family_t sa_family, __be16 port)
+				  sa_family_t sa_family, __be16 port,
+				  u32 type)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
 
+	if (type != UDP_TUNNEL_VXLAN)
+		return;
+
 	if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port_count ||
 	    (ahw->vxlan_port != ntohs(port)))
 		return;
@@ -540,8 +549,8 @@ static const struct net_device_ops qlcnic_netdev_ops = {
 	.ndo_fdb_dump		= qlcnic_fdb_dump,
 	.ndo_get_phys_port_id	= qlcnic_get_phys_port_id,
 #ifdef CONFIG_QLCNIC_VXLAN
-	.ndo_add_vxlan_port	= qlcnic_add_vxlan_port,
-	.ndo_del_vxlan_port	= qlcnic_del_vxlan_port,
+	.ndo_add_udp_tunnel_port	= qlcnic_add_vxlan_port,
+	.ndo_del_udp_tunnel_port	= qlcnic_del_vxlan_port,
 	.ndo_features_check	= qlcnic_features_check,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6369a57..5490629 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -628,9 +628,10 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
 
 	rcu_read_lock();
 	for_each_netdev_rcu(net, dev) {
-		if (dev->netdev_ops->ndo_add_vxlan_port)
-			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
-							    port);
+		if (dev->netdev_ops->ndo_add_udp_tunnel_port)
+			dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
+							      port,
+							      UDP_TUNNEL_VXLAN);
 	}
 	rcu_read_unlock();
 }
@@ -646,9 +647,10 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
 
 	rcu_read_lock();
 	for_each_netdev_rcu(net, dev) {
-		if (dev->netdev_ops->ndo_del_vxlan_port)
-			dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
-							    port);
+		if (dev->netdev_ops->ndo_del_udp_tunnel_port)
+			dev->netdev_ops->ndo_del_udp_tunnel_port(dev, sa_family,
+							      port,
+							      UDP_TUNNEL_VXLAN);
 	}
 	rcu_read_unlock();
 
@@ -2422,9 +2424,9 @@ static struct device_type vxlan_type = {
 	.name = "vxlan",
 };
 
-/* Calls the ndo_add_vxlan_port of the caller in order to
+/* Calls the ndo_add_udp_tunnel_port of the caller in order to
  * supply the listening VXLAN udp ports. Callers are expected
- * to implement the ndo_add_vxlan_port.
+ * to implement the ndo_add_udp_tunnel_port.
  */
 void vxlan_get_rx_port(struct net_device *dev)
 {
@@ -2440,8 +2442,9 @@ void vxlan_get_rx_port(struct net_device *dev)
 		hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
 			port = inet_sk(vs->sock->sk)->inet_sport;
 			sa_family = vxlan_get_sk_family(vs);
-			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
-							    port);
+			dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
+							      port,
+							      UDP_TUNNEL_VXLAN);
 		}
 	}
 	spin_unlock(&vn->sock_lock);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7d2d1d7..eaecc42 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1004,18 +1004,19 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	not implement this, it is assumed that the hw is not able to have
  *	multiple net devices on single physical port.
  *
- * void (*ndo_add_vxlan_port)(struct  net_device *dev,
- *			      sa_family_t sa_family, __be16 port);
- *	Called by vxlan to notiy a driver about the UDP port and socket
- *	address family that vxlan is listnening to. It is called only when
- *	a new port starts listening. The operation is protected by the
- *	vxlan_net->sock_lock.
- *
- * void (*ndo_del_vxlan_port)(struct  net_device *dev,
- *			      sa_family_t sa_family, __be16 port);
- *	Called by vxlan to notify the driver about a UDP port and socket
- *	address family that vxlan is not listening to anymore. The operation
- *	is protected by the vxlan_net->sock_lock.
+ * void (*ndo_add_udp_tunnel_port)(struct  net_device *dev,
+ *			      sa_family_t sa_family, __be16 port, u32 type);
+ *	Called by UDP based tunnel modules to notify a driver about a UDP
+ *	port and socket address family that the tunnel is listening to. It is
+ *	called only when a new port starts listening. The operation is
+ *	protected by udp_offload_lock across all udp based tunnels.
+ *
+ * void (*ndo_del_udp_tunnel_port)(struct  net_device *dev,
+ *			      sa_family_t sa_family, __be16 port, u32 type);
+ *	Called by UDP based tunnel modules to notify the driver about a UDP port
+ *	and socket address family that the tunnel is not listening to anymore.
+ *	The operation is protected by udp_offload_lock across all udp based
+ *	tunnels.
  *
  * void* (*ndo_dfwd_add_station)(struct net_device *pdev,
  *				 struct net_device *dev)
@@ -1209,13 +1210,12 @@ struct net_device_ops {
 							struct netdev_phys_item_id *ppid);
 	int			(*ndo_get_phys_port_name)(struct net_device *dev,
 							  char *name, size_t len);
-	void			(*ndo_add_vxlan_port)(struct  net_device *dev,
+	void			(*ndo_add_udp_tunnel_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
-						      __be16 port);
-	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
+						      __be16 port, u32 type);
+	void			(*ndo_del_udp_tunnel_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
-						      __be16 port);
-
+						      __be16 port, u32 type);
 	void*			(*ndo_dfwd_add_station)(struct net_device *pdev,
 							struct net_device *dev);
 	void			(*ndo_dfwd_del_station)(struct net_device *pdev,
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index cb2f89f..72415aa 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -9,6 +9,12 @@
 #include <net/addrconf.h>
 #endif
 
+enum udp_tunnel_type {
+	UDP_TUNNEL_UNSPEC,
+	UDP_TUNNEL_VXLAN,
+	UDP_TUNNEL_GENEVE,
+};
+
 struct udp_port_cfg {
 	u8			family;
 
-- 
1.8.1.4

* [PATCH v1 2/6] net: Add a generic udp_offload_get_port function
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  2015-11-24  6:08   ` Alexander Duyck
  2015-11-24  6:37   ` Alexander Duyck
  2015-11-23 21:02 ` [PATCH v1 3/6] i40e: Generalize the flow for udp based tunnels Anjali Singhai Jain
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

The new function udp_offload_get_port() replaces vxlan_get_rx_port().
It is a generic function that replays all registered UDP tunnel ports
to a device, irrespective of tunnel type, so it need not change when
new UDP tunnel types are added.

Note: Drivers other than i40e are compile tested with this change.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  5 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |  4 +++-
 drivers/net/ethernet/emulex/benet/be_main.c      |  4 +++-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  3 ++-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |  5 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  5 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  3 ++-
 drivers/net/vxlan.c                              | 29 ++----------------------
 include/linux/netdevice.h                        |  2 ++
 include/net/protocol.h                           |  2 ++
 include/net/vxlan.h                              |  8 -------
 net/ipv4/udp_offload.c                           | 27 ++++++++++++++++++++++
 13 files changed, 53 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index ad2782f..56777c8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -60,6 +60,7 @@
 #include <linux/semaphore.h>
 #include <linux/stringify.h>
 #include <linux/vmalloc.h>
+#include <net/protocol.h>
 
 #include "bnx2x.h"
 #include "bnx2x_init.h"
@@ -10293,7 +10294,7 @@ sp_rtnl_not_reset:
 			netdev_info(bp->dev,
 				    "Deleted vxlan dest port %d", port);
 			bp->vxlan_dst_port = 0;
-			vxlan_get_rx_port(bp->dev);
+			udp_offload_get_port(bp->dev);
 		}
 	}
 #endif
@@ -12499,7 +12500,7 @@ static int bnx2x_open(struct net_device *dev)
 
 #ifdef CONFIG_BNX2X_VXLAN
 	if (IS_PF(bp))
-		vxlan_get_rx_port(dev);
+		udp_offload_get_port(dev);
 #endif
 
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5b96ddf..f49ca38 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -39,6 +39,7 @@
 #include <net/ip6_checksum.h>
 #if defined(CONFIG_VXLAN) || defined(CONFIG_VXLAN_MODULE)
 #include <net/vxlan.h>
+#include <net/protocol.h>
 #endif
 #ifdef CONFIG_NET_RX_BUSY_POLL
 #include <net/busy_poll.h>
@@ -4589,7 +4590,7 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
 
 	if (irq_re_init) {
 #if defined(CONFIG_VXLAN) || defined(CONFIG_VXLAN_MODULE)
-		vxlan_get_rx_port(bp->dev);
+		udp_offload_get_port(bp->dev);
 #endif
 		if (!bnxt_hwrm_tunnel_dst_port_alloc(
 				bp, htons(0x17c1),
@@ -5458,6 +5459,7 @@ static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 
 	if (type != UDP_TUNNEL_VXLAN)
 		return;
+
 	if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
 		bp->vxlan_port_cnt--;
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index e699deca..a4da753 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -25,6 +25,7 @@
 #include <net/busy_poll.h>
 #include <net/udp_tunnel.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 
 MODULE_VERSION(DRV_VER);
 MODULE_DESCRIPTION(DRV_DESC " " DRV_VER);
@@ -3604,7 +3605,7 @@ static int be_open(struct net_device *netdev)
 
 #ifdef CONFIG_BE2NET_VXLAN
 	if (skyhawk_chip(adapter))
-		vxlan_get_rx_port(netdev);
+		udp_offload_get_port(netdev);
 #endif
 
 	return 0;
@@ -5239,6 +5240,7 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 
 	if (type != UDP_TUNNEL_VXLAN)
 		return;
+
 	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 447d5e6..1564a13 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -23,6 +23,7 @@
 #if IS_ENABLED(CONFIG_FM10K_VXLAN)
 #include <net/udp_tunnel.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 #endif /* CONFIG_FM10K_VXLAN */
 
 /**
@@ -573,7 +574,7 @@ int fm10k_open(struct net_device *netdev)
 
 #if IS_ENABLED(CONFIG_FM10K_VXLAN)
 	/* update VXLAN port configuration */
-	vxlan_get_rx_port(netdev);
+	udp_offload_get_port(netdev);
 
 #endif
 	fm10k_up(interface);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 520e34e..4be0a26 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -31,6 +31,7 @@
 #include <net/vxlan.h>
 #endif
 #include <net/udp_tunnel.h>
+#include <net/protocol.h>
 
 const char i40e_driver_name[] = "i40e";
 static const char i40e_driver_string[] =
@@ -5303,9 +5304,7 @@ int i40e_open(struct net_device *netdev)
 						       TCP_FLAG_CWR) >> 16);
 	wr32(&pf->hw, I40E_GLLAN_TSOMSK_L, be32_to_cpu(TCP_FLAG_CWR) >> 16);
 
-#ifdef CONFIG_I40E_VXLAN
-	vxlan_get_rx_port(netdev);
-#endif
+	udp_offload_get_port(netdev);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 76ccc77..ba92c7a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include <scsi/fc/fc_fcoe.h>
 #include <net/udp_tunnel.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 
 #ifdef CONFIG_OF
 #include <linux/of_net.h>
@@ -5823,7 +5824,7 @@ static int ixgbe_open(struct net_device *netdev)
 
 	ixgbe_clear_vxlan_port(adapter);
 #ifdef CONFIG_IXGBE_VXLAN
-	vxlan_get_rx_port(netdev);
+	udp_offload_get_port(netdev);
 #endif
 
 	return 0;
@@ -6913,7 +6914,7 @@ static void ixgbe_service_task(struct work_struct *work)
 #ifdef CONFIG_IXGBE_VXLAN
 	if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
 		adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
-		vxlan_get_rx_port(adapter->netdev);
+		udp_offload_get_port(adapter->netdev);
 	}
 #endif /* CONFIG_IXGBE_VXLAN */
 	ixgbe_reset_subtask(adapter);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 2cb19c7..b91b8f1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -41,6 +41,7 @@
 #include <net/busy_poll.h>
 #include <net/udp_tunnel.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/device.h>
@@ -1684,7 +1685,7 @@ int mlx4_en_start_port(struct net_device *dev)
 
 #ifdef CONFIG_MLX4_EN_VXLAN
 	if (priv->mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
-		vxlan_get_rx_port(dev);
+		udp_offload_get_port(dev);
 #endif
 	priv->port_up = true;
 	netif_tx_start_all_queues(dev);
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index aa38dbb..a640872 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -19,6 +19,7 @@
 #ifdef CONFIG_QLCNIC_VXLAN
 #include <net/udp_tunnel.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 #endif
 
 #include "qlcnic.h"
@@ -2026,7 +2027,7 @@ qlcnic_attach(struct qlcnic_adapter *adapter)
 
 #ifdef CONFIG_QLCNIC_VXLAN
 	if (qlcnic_encap_rx_offload(adapter))
-		vxlan_get_rx_port(netdev);
+		udp_offload_get_port(netdev);
 #endif
 
 	adapter->is_up = QLCNIC_ADAPTER_UP_MAGIC;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 5490629..702f9be 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2424,33 +2424,6 @@ static struct device_type vxlan_type = {
 	.name = "vxlan",
 };
 
-/* Calls the ndo_add_udp_tunnel_port of the caller in order to
- * supply the listening VXLAN udp ports. Callers are expected
- * to implement the ndo_add_tunnel_port.
- */
-void vxlan_get_rx_port(struct net_device *dev)
-{
-	struct vxlan_sock *vs;
-	struct net *net = dev_net(dev);
-	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	sa_family_t sa_family;
-	__be16 port;
-	unsigned int i;
-
-	spin_lock(&vn->sock_lock);
-	for (i = 0; i < PORT_HASH_SIZE; ++i) {
-		hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
-			port = inet_sk(vs->sock->sk)->inet_sport;
-			sa_family = vxlan_get_sk_family(vs);
-			dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
-							      port,
-							      UDP_TUNNEL_VXLAN);
-		}
-	}
-	spin_unlock(&vn->sock_lock);
-}
-EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
-
 /* Initialize the device structure. */
 static void vxlan_setup(struct net_device *dev)
 {
@@ -2639,6 +2612,8 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
 
 	/* Initialize the vxlan udp offloads structure */
 	vs->udp_offloads.port = port;
+	vs->udp_offloads.tunnel_type = UDP_TUNNEL_VXLAN;
+	vs->udp_offloads.family = ipv6 ? AF_INET6 : AF_INET;
 	vs->udp_offloads.callbacks.gro_receive  = vxlan_gro_receive;
 	vs->udp_offloads.callbacks.gro_complete = vxlan_gro_complete;
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eaecc42..0073009 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2075,6 +2075,8 @@ struct udp_offload_callbacks {
 
 struct udp_offload {
 	__be16			 port;
+	u8			 tunnel_type;
+	u8			 family;
 	u8			 ipproto;
 	struct udp_offload_callbacks callbacks;
 };
diff --git a/include/net/protocol.h b/include/net/protocol.h
index d6fcc1f..738bfc6 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -110,6 +110,8 @@ void inet_unregister_protosw(struct inet_protosw *p);
 int  udp_add_offload(struct udp_offload *prot);
 void udp_del_offload(struct udp_offload *prot);
 
+void udp_offload_get_port(struct net_device *dev);
+
 #if IS_ENABLED(CONFIG_IPV6)
 int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
 int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index c1c899c..926455e 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -242,14 +242,6 @@ static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 /* IPv6 header + UDP + VXLAN + Ethernet header */
 #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 
-#if IS_ENABLED(CONFIG_VXLAN)
-void vxlan_get_rx_port(struct net_device *netdev);
-#else
-static inline void vxlan_get_rx_port(struct net_device *netdev)
-{
-}
-#endif
-
 static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 {
 	return vs->sock->sk->sk_family;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f938616..8597020 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -290,6 +290,33 @@ unlock:
 }
 EXPORT_SYMBOL(udp_del_offload);
 
+void udp_offload_get_port(struct net_device *dev)
+{
+	struct udp_offload_priv __rcu **head;
+	struct udp_offload_priv *uo_priv;
+	struct udp_offload *uo;
+
+	if (udp_offload_base)
+		head = &udp_offload_base;
+	else
+		return;
+
+	spin_lock(&udp_offload_lock);
+	uo_priv = udp_deref_protected(*head);
+	for (; uo_priv != NULL; uo_priv = udp_deref_protected(*head)) {
+		/* call the right add port */
+		uo = uo_priv->offload;
+		if (uo && dev->netdev_ops->ndo_add_udp_tunnel_port)
+			dev->netdev_ops->ndo_add_udp_tunnel_port(dev,
+							uo->family,
+							uo->port,
+							uo->tunnel_type);
+		head = &uo_priv->next;
+	}
+	spin_unlock(&udp_offload_lock);
+}
+EXPORT_SYMBOL(udp_offload_get_port);
+
 struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 				 struct udphdr *uh)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread
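
The replay idea behind udp_offload_get_port() in patch 2/6 can be sketched in user space as a walk over a singly linked list of registered offloads. This is a rough illustration under stated assumptions: all `demo_*` names and the `DEMO_*` constants are hypothetical, ports are plain host-order integers, and locking is left out entirely (the kernel function holds udp_offload_lock while walking the list).

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, not kernel definitions (assumption). */
enum { DEMO_TUNNEL_UNSPEC, DEMO_TUNNEL_VXLAN, DEMO_TUNNEL_GENEVE };
enum { DEMO_AF_INET = 2 };

/* Simplified stand-ins for struct udp_offload / udp_offload_priv. */
struct demo_offload {
	uint16_t port;
	uint8_t tunnel_type;
	uint8_t family;
};

struct demo_offload_priv {
	struct demo_offload *offload;
	struct demo_offload_priv *next;
};

/* Shape of the per-device "add port" callback in this sketch. */
typedef void (*demo_add_port_fn)(void *dev, uint8_t family,
				 uint16_t port, uint8_t type);

/* Replay every registered port to one device; returns ports reported. */
static unsigned int demo_offload_replay(struct demo_offload_priv *head,
					void *dev, demo_add_port_fn add_port)
{
	struct demo_offload_priv *p;
	unsigned int reported = 0;

	if (!add_port)
		return 0;

	for (p = head; p; p = p->next) {
		if (!p->offload)
			continue;
		add_port(dev, p->offload->family, p->offload->port,
			 p->offload->tunnel_type);
		reported++;
	}
	return reported;
}

/* Hypothetical device that just counts what it was told about. */
struct demo_dev {
	unsigned int vxlan;
	unsigned int geneve;
};

static void demo_dev_add_port(void *dev, uint8_t family, uint16_t port,
			      uint8_t type)
{
	struct demo_dev *d = dev;

	(void)family;
	(void)port;
	if (type == DEMO_TUNNEL_VXLAN)
		d->vxlan++;
	else if (type == DEMO_TUNNEL_GENEVE)
		d->geneve++;
}
```

Because the walk only reads the family, port and tunnel type stored with each entry, adding a new tunnel type should not require touching the replay loop, which is the point the commit message makes.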

* [PATCH v1 3/6] i40e: Generalize the flow for udp based tunnels
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 2/6] net: Add a generic udp_offload_get_port function Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 4/6] i40e: Remove CONFIG_I40E_VXLAN Anjali Singhai Jain
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

This patch generalizes the driver flow to make room for more than just
VXLAN tunnels by defining a common data structure that includes the
tunnel type.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 15 ++++----
 drivers/net/ethernet/intel/i40e/i40e_main.c | 59 +++++++++++++++--------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |  6 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  2 +-
 4 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 4dd3e26..c701bbb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -244,6 +244,11 @@ struct i40e_tc_configuration {
 	struct i40e_tc_info tc_info[I40E_MAX_TRAFFIC_CLASS];
 };
 
+struct i40e_udp_port_config {
+	__be16 index;
+	u8 type;
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -280,11 +285,9 @@ struct i40e_pf {
 	u32 fd_atr_cnt;
 	u32 fd_tcp_rule;
 
-#ifdef CONFIG_I40E_VXLAN
-	__be16  vxlan_ports[I40E_MAX_PF_UDP_OFFLOAD_PORTS];
-	u16 pending_vxlan_bitmap;
+	struct i40e_udp_port_config udp_ports[I40E_MAX_PF_UDP_OFFLOAD_PORTS];
+	u16 pending_udp_bitmap;
 
-#endif
 	enum i40e_interrupt_policy int_policy;
 	u16 rx_itr_default;
 	u16 tx_itr_default;
@@ -321,9 +324,7 @@ struct i40e_pf {
 #define I40E_FLAG_FD_ATR_ENABLED		BIT_ULL(22)
 #define I40E_FLAG_PTP				BIT_ULL(25)
 #define I40E_FLAG_MFP_ENABLED			BIT_ULL(26)
-#ifdef CONFIG_I40E_VXLAN
-#define I40E_FLAG_VXLAN_FILTER_SYNC		BIT_ULL(27)
-#endif
+#define I40E_FLAG_UDP_FILTER_SYNC		BIT_ULL(27)
 #define I40E_FLAG_PORT_ID_VALID			BIT_ULL(28)
 #define I40E_FLAG_DCB_CAPABLE			BIT_ULL(29)
 #define I40E_FLAG_RSS_AQ_CAPABLE		BIT_ULL(31)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4be0a26..d65c10b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6984,30 +6984,30 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
 	i40e_flush(hw);
 }
 
-#ifdef CONFIG_I40E_VXLAN
 /**
- * i40e_sync_vxlan_filters_subtask - Sync the VSI filter list with HW
+ * i40e_sync_udp_filters_subtask - Sync the VSI filter list with HW
  * @pf: board private structure
  **/
-static void i40e_sync_vxlan_filters_subtask(struct i40e_pf *pf)
+static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
 {
+#ifdef CONFIG_I40E_VXLAN
 	struct i40e_hw *hw = &pf->hw;
 	i40e_status ret;
 	__be16 port;
 	int i;
 
-	if (!(pf->flags & I40E_FLAG_VXLAN_FILTER_SYNC))
+	if (!(pf->flags & I40E_FLAG_UDP_FILTER_SYNC))
 		return;
 
-	pf->flags &= ~I40E_FLAG_VXLAN_FILTER_SYNC;
+	pf->flags &= ~I40E_FLAG_UDP_FILTER_SYNC;
 
 	for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
-		if (pf->pending_vxlan_bitmap & BIT_ULL(i)) {
-			pf->pending_vxlan_bitmap &= ~BIT_ULL(i);
-			port = pf->vxlan_ports[i];
+		if (pf->pending_udp_bitmap & BIT_ULL(i)) {
+			pf->pending_udp_bitmap &= ~BIT_ULL(i);
+			port = pf->udp_ports[i].index;
 			if (port)
 				ret = i40e_aq_add_udp_tunnel(hw, ntohs(port),
-						     I40E_AQC_TUNNEL_TYPE_VXLAN,
+						     pf->udp_ports[i].type,
 						     NULL, NULL);
 			else
 				ret = i40e_aq_del_udp_tunnel(hw, i, NULL);
@@ -7020,13 +7020,13 @@ static void i40e_sync_vxlan_filters_subtask(struct i40e_pf *pf)
 					 i40e_stat_str(&pf->hw, ret),
 					 i40e_aq_str(&pf->hw,
 						    pf->hw.aq.asq_last_status));
-				pf->vxlan_ports[i] = 0;
+				pf->udp_ports[i].index = 0;
 			}
 		}
 	}
+#endif
 }
 
-#endif
 /**
  * i40e_service_task - Run the driver's async subtasks
  * @work: pointer to work_struct containing our data
@@ -7051,9 +7051,7 @@ static void i40e_service_task(struct work_struct *work)
 	i40e_watchdog_subtask(pf);
 	i40e_fdir_reinit_subtask(pf);
 	i40e_sync_filters_subtask(pf);
-#ifdef CONFIG_I40E_VXLAN
-	i40e_sync_vxlan_filters_subtask(pf);
-#endif
+	i40e_sync_udp_filters_subtask(pf);
 	i40e_clean_adminq_subtask(pf);
 
 	i40e_service_event_complete(pf);
@@ -8277,18 +8275,18 @@ static int i40e_set_features(struct net_device *netdev,
 
 #ifdef CONFIG_I40E_VXLAN
 /**
- * i40e_get_vxlan_port_idx - Lookup a possibly offloaded for Rx UDP port
+ * i40e_get_udp_port_idx - Lookup a possibly offloaded for Rx UDP port
  * @pf: board private structure
  * @port: The UDP port to look up
  *
  * Returns the index number or I40E_MAX_PF_UDP_OFFLOAD_PORTS if port not found
  **/
-static u8 i40e_get_vxlan_port_idx(struct i40e_pf *pf, __be16 port)
+static u8 i40e_get_udp_port_idx(struct i40e_pf *pf, __be16 port)
 {
 	u8 i;
 
 	for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
-		if (pf->vxlan_ports[i] == port)
+		if (pf->udp_ports[i].index == port)
 			return i;
 	}
 
@@ -8321,28 +8319,31 @@ static void i40e_add_tunnel_port(struct net_device *netdev,
 	if (sa_family == AF_INET6)
 		return;
 
-	idx = i40e_get_vxlan_port_idx(pf, port);
+	idx = i40e_get_udp_port_idx(pf, port);
 
 	/* Check if port already exists */
 	if (idx < I40E_MAX_PF_UDP_OFFLOAD_PORTS) {
-		netdev_info(netdev, "vxlan port %d already offloaded\n",
+		netdev_info(netdev, "UDP port %d already offloaded\n",
 			    ntohs(port));
 		return;
 	}
 
 	/* Now check if there is space to add the new port */
-	next_idx = i40e_get_vxlan_port_idx(pf, 0);
+	next_idx = i40e_get_udp_port_idx(pf, 0);
 
 	if (next_idx == I40E_MAX_PF_UDP_OFFLOAD_PORTS) {
-		netdev_info(netdev, "maximum number of vxlan UDP ports reached, not adding port %d\n",
+		netdev_info(netdev, "maximum number of UDP ports reached, not adding port %d\n",
 			    ntohs(port));
 		return;
 	}
 
 	/* New port: add it and mark its index in the bitmap */
-	pf->vxlan_ports[next_idx] = port;
-	pf->pending_vxlan_bitmap |= BIT_ULL(next_idx);
-	pf->flags |= I40E_FLAG_VXLAN_FILTER_SYNC;
+	pf->udp_ports[next_idx].index = port;
+	if (type == UDP_TUNNEL_VXLAN)
+		pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_VXLAN;
+
+	pf->pending_udp_bitmap |= BIT_ULL(next_idx);
+	pf->flags |= I40E_FLAG_UDP_FILTER_SYNC;
 }
 
 /**
@@ -8370,18 +8371,18 @@ static void i40e_del_tunnel_port(struct net_device *netdev,
 	if (sa_family == AF_INET6)
 		return;
 
-	idx = i40e_get_vxlan_port_idx(pf, port);
+	idx = i40e_get_udp_port_idx(pf, port);
 
 	/* Check if port already exists */
 	if (idx < I40E_MAX_PF_UDP_OFFLOAD_PORTS) {
 		/* if port exists, set it to 0 (mark for deletion)
 		 * and make it pending
 		 */
-		pf->vxlan_ports[idx] = 0;
-		pf->pending_vxlan_bitmap |= BIT_ULL(idx);
-		pf->flags |= I40E_FLAG_VXLAN_FILTER_SYNC;
+		pf->udp_ports[idx].index = 0;
+		pf->pending_udp_bitmap |= BIT_ULL(idx);
+		pf->flags |= I40E_FLAG_UDP_FILTER_SYNC;
 	} else {
-		netdev_warn(netdev, "vxlan port %d was not found, not deleting\n",
+		netdev_warn(netdev, "udp tunnel port %d was not found, not deleting\n",
 			    ntohs(port));
 	}
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6649ce4..46556b6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1992,7 +1992,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	if (!(tx_flags & (I40E_TX_FLAGS_IPV4 | I40E_TX_FLAGS_IPV6)))
 		return;
 
-	if (!(tx_flags & I40E_TX_FLAGS_VXLAN_TUNNEL)) {
+	if (!(tx_flags & I40E_TX_FLAGS_UDP_TUNNEL)) {
 		/* snag network header to get L4 type and address */
 		hdr.network = skb_network_header(skb);
 
@@ -2077,7 +2077,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 		     I40E_TXD_FLTR_QW1_FD_STATUS_SHIFT;
 
 	dtype_cmd |= I40E_TXD_FLTR_QW1_CNT_ENA_MASK;
-	if (!(tx_flags & I40E_TX_FLAGS_VXLAN_TUNNEL))
+	if (!(tx_flags & I40E_TX_FLAGS_UDP_TUNNEL))
 		dtype_cmd |=
 			((u32)I40E_FD_ATR_STAT_IDX(pf->hw.pf_id) <<
 			I40E_TXD_FLTR_QW1_CNTINDEX_SHIFT) &
@@ -2312,7 +2312,7 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 *tx_flags,
 			oudph = udp_hdr(skb);
 			oiph = ip_hdr(skb);
 			l4_tunnel = I40E_TXD_CTX_UDP_TUNNELING;
-			*tx_flags |= I40E_TX_FLAGS_VXLAN_TUNNEL;
+			*tx_flags |= I40E_TX_FLAGS_UDP_TUNNEL;
 			break;
 		case IPPROTO_GRE:
 			l4_tunnel = I40E_TXD_CTX_GRE_TUNNELING;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 6779fb7..b12691f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -163,7 +163,7 @@ enum i40e_dyn_idx_t {
 #define I40E_TX_FLAGS_FSO		BIT(7)
 #define I40E_TX_FLAGS_TSYN		BIT(8)
 #define I40E_TX_FLAGS_FD_SB		BIT(9)
-#define I40E_TX_FLAGS_VXLAN_TUNNEL	BIT(10)
+#define I40E_TX_FLAGS_UDP_TUNNEL	BIT(10)
 #define I40E_TX_FLAGS_VLAN_MASK		0xffff0000
 #define I40E_TX_FLAGS_VLAN_PRIO_MASK	0xe0000000
 #define I40E_TX_FLAGS_VLAN_PRIO_SHIFT	29
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread
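
The port table and pending bitmap that patch 3/6 generalizes can be modelled in user space roughly as follows. This is a sketch, not the driver: the `demo_*` names are hypothetical, the table size of 16 is an assumption chosen to match the 16-bit bitmap, ports are host-order integers, and the functions return a slot index or -1 so the behaviour is observable (the driver callbacks return void and log instead).

```c
#include <stdint.h>

#define DEMO_MAX_PORTS 16	/* assumed table size, fits a u16 bitmap */

/* Simplified stand-in for struct i40e_udp_port_config. */
struct demo_udp_port_config {
	uint16_t index;		/* UDP port, 0 means "slot free" */
	uint8_t type;		/* tunnel type programmed to hardware */
};

struct demo_pf {
	struct demo_udp_port_config udp_ports[DEMO_MAX_PORTS];
	uint16_t pending_udp_bitmap;	/* slots awaiting a hardware sync */
};

/* Return the slot holding 'port', or DEMO_MAX_PORTS if not found. */
static uint8_t demo_get_udp_port_idx(const struct demo_pf *pf, uint16_t port)
{
	uint8_t i;

	for (i = 0; i < DEMO_MAX_PORTS; i++) {
		if (pf->udp_ports[i].index == port)
			return i;
	}
	return i;
}

/* Record a new port and mark its slot pending; -1 if rejected. */
static int demo_add_port(struct demo_pf *pf, uint16_t port, uint8_t type)
{
	uint8_t idx;

	if (port == 0)
		return -1;	/* 0 is reserved as the free marker */
	if (demo_get_udp_port_idx(pf, port) < DEMO_MAX_PORTS)
		return -1;	/* already offloaded */

	idx = demo_get_udp_port_idx(pf, 0);	/* first free slot */
	if (idx == DEMO_MAX_PORTS)
		return -1;	/* table full */

	pf->udp_ports[idx].index = port;
	pf->udp_ports[idx].type = type;
	pf->pending_udp_bitmap |= (uint16_t)(1u << idx);
	return idx;
}

/* Clear a port and mark its slot pending for deletion; -1 if absent. */
static int demo_del_port(struct demo_pf *pf, uint16_t port)
{
	uint8_t idx;

	if (port == 0)
		return -1;
	idx = demo_get_udp_port_idx(pf, port);
	if (idx == DEMO_MAX_PORTS)
		return -1;

	pf->udp_ports[idx].index = 0;
	pf->pending_udp_bitmap |= (uint16_t)(1u << idx);
	return idx;
}
```

In the driver, a service task later walks the pending bits and issues the add or delete command for each slot; storing the type next to the port is what lets that task handle tunnel types other than VXLAN.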

* [PATCH v1 4/6] i40e: Remove CONFIG_I40E_VXLAN
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
                   ` (2 preceding siblings ...)
  2015-11-23 21:02 ` [PATCH v1 3/6] i40e: Generalize the flow for udp based tunnels Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 5/6] net: Refactor udp_offload and add Geneve port offload support Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 6/6] i40e:Add geneve tunnel " Anjali Singhai Jain
  5 siblings, 0 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

Remove the driver-specific CONFIG_I40E_VXLAN option. VXLAN offload in
the driver is now enabled whenever CONFIG_VXLAN is built in or built as
a module, i.e. when IS_ENABLED(CONFIG_VXLAN) is true.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
---
 drivers/net/ethernet/intel/Kconfig          | 11 -----------
 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 ++++++++------
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 4163b16..061e4e0 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -269,17 +269,6 @@ config I40E
 	  To compile this driver as a module, choose M here. The module
 	  will be called i40e.
 
-config I40E_VXLAN
-	bool "Virtual eXtensible Local Area Network Support"
-	default n
-	depends on I40E && VXLAN && !(I40E=y && VXLAN=m)
-	---help---
-	  This allows one to create VXLAN virtual interfaces that provide
-	  Layer 2 Networks over Layer 3 Networks. VXLAN is often used
-	  to tunnel virtual network infrastructure in virtualized environments.
-	  Say Y here if you want to use Virtual eXtensible Local Area Network
-	  (VXLAN) in the driver.
-
 config I40E_DCB
 	bool "Data Center Bridging (DCB) Support"
 	default n
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d65c10b..f6447c7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -27,7 +27,7 @@
 /* Local includes */
 #include "i40e.h"
 #include "i40e_diag.h"
-#ifdef CONFIG_I40E_VXLAN
+#if IS_ENABLED(CONFIG_VXLAN)
 #include <net/vxlan.h>
 #endif
 #include <net/udp_tunnel.h>
@@ -6990,7 +6990,7 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
  **/
 static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
 {
-#ifdef CONFIG_I40E_VXLAN
+#if IS_ENABLED(CONFIG_VXLAN)
 	struct i40e_hw *hw = &pf->hw;
 	i40e_status ret;
 	__be16 port;
@@ -8273,7 +8273,7 @@ static int i40e_set_features(struct net_device *netdev,
 	return 0;
 }
 
-#ifdef CONFIG_I40E_VXLAN
+#if IS_ENABLED(CONFIG_VXLAN)
 /**
  * i40e_get_udp_port_idx - Lookup a possibly offloaded for Rx UDP port
  * @pf: board private structure
@@ -8293,6 +8293,7 @@ static u8 i40e_get_udp_port_idx(struct i40e_pf *pf, __be16 port)
 	return i;
 }
 
+#endif
 /**
  * i40e_add_tunnel_port - Get notifications about UDP tunnel ports that come up
  * @netdev: This physical port's netdev
@@ -8307,6 +8308,7 @@ static void i40e_add_tunnel_port(struct net_device *netdev,
 				 sa_family_t sa_family, __be16 port,
 				 u32 type)
 {
+#if IS_ENABLED(CONFIG_VXLAN)
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
@@ -8344,6 +8346,7 @@ static void i40e_add_tunnel_port(struct net_device *netdev,
 
 	pf->pending_udp_bitmap |= BIT_ULL(next_idx);
 	pf->flags |= I40E_FLAG_UDP_FILTER_SYNC;
+#endif
 }
 
 /**
@@ -8360,6 +8363,7 @@ static void i40e_del_tunnel_port(struct net_device *netdev,
 				 sa_family_t sa_family, __be16 port,
 				 u32 type)
 {
+#if IS_ENABLED(CONFIG_VXLAN)
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
@@ -8385,9 +8389,9 @@ static void i40e_del_tunnel_port(struct net_device *netdev,
 		netdev_warn(netdev, "udp tunnel port %d was not found, not deleting\n",
 			    ntohs(port));
 	}
+#endif
 }
 
-#endif
 static int i40e_get_phys_port_id(struct net_device *netdev,
 				 struct netdev_phys_item_id *ppid)
 {
@@ -8612,10 +8616,8 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_get_vf_config	= i40e_ndo_get_vf_config,
 	.ndo_set_vf_link_state	= i40e_ndo_set_vf_link_state,
 	.ndo_set_vf_spoofchk	= i40e_ndo_set_vf_spoofchk,
-#ifdef CONFIG_I40E_VXLAN
 	.ndo_add_udp_tunnel_port	= i40e_add_tunnel_port,
 	.ndo_del_udp_tunnel_port	= i40e_del_tunnel_port,
-#endif
 	.ndo_get_phys_port_id	= i40e_get_phys_port_id,
 	.ndo_fdb_add		= i40e_ndo_fdb_add,
 	.ndo_features_check	= i40e_features_check,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 94+ messages in thread
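
Patch 4/6 swaps `#ifdef CONFIG_I40E_VXLAN` for `#if IS_ENABLED(CONFIG_VXLAN)`, which is true for both the built-in (=y) and the module (=m) case. The user-space sketch below imitates that behaviour with `DEMO_`-prefixed macros; it is modelled on the preprocessor trick the kernel uses, but the names here are hypothetical and the code is only an approximation for illustration. `CONFIG_VXLAN_MODULE` is defined by hand to pretend VXLAN was configured as a module.

```c
/* Pretend Kconfig produced VXLAN=m (assumption for this demo). */
#define CONFIG_VXLAN_MODULE 1

/*
 * If 'x' expands to 1, DEMO_ARG_PLACEHOLDER_1 turns into "0," and
 * shifts a literal 1 into the second argument position; otherwise
 * the pasted token is junk and the second argument stays 0.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) \
	DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

/* True when the option is built in (=y) or built as a module (=m). */
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_DEFINED(option) || DEMO_IS_DEFINED(option##_MODULE))

/* Mirrors how the driver now guards its VXLAN-dependent bodies. */
static int demo_vxlan_offload_compiled_in(void)
{
#if DEMO_IS_ENABLED(CONFIG_VXLAN)
	return 1;
#else
	return 0;
#endif
}
```

With only `CONFIG_VXLAN_MODULE` defined, a plain `#ifdef CONFIG_VXLAN` would be false, while the combined check is true; that difference is why the commit message mentions both symbols.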

* [PATCH v1 5/6] net: Refactor udp_offload and add Geneve port offload support
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
                   ` (3 preceding siblings ...)
  2015-11-23 21:02 ` [PATCH v1 4/6] i40e: Remove CONFIG_I40E_VXLAN Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  2015-11-23 21:02 ` [PATCH v1 6/6] i40e:Add geneve tunnel " Anjali Singhai Jain
  5 siblings, 0 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

This patch moves the calls to the ndo_add/del ops for VXLAN tunnels
into the udp_add/del_offload calls. This way the ndo_add/del ops are
called not only for VXLAN tunnels but for any UDP based tunnel.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
---
 drivers/net/geneve.c   | 18 +++++++--------
 drivers/net/vxlan.c    | 35 ++++------------------------
 include/net/protocol.h |  4 ++--
 net/ipv4/fou.c         | 15 ++++++------
 net/ipv4/udp_offload.c | 63 +++++++++++++++++++++++++++++++++-----------------
 5 files changed, 64 insertions(+), 71 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index de5c30c..9dc513a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -372,15 +372,12 @@ static struct socket *geneve_create_sock(struct net *net, bool ipv6,
 static void geneve_notify_add_rx_port(struct geneve_sock *gs)
 {
 	struct sock *sk = gs->sock->sk;
-	sa_family_t sa_family = sk->sk_family;
+	struct net *net = sock_net(sk);
 	int err;
 
-	if (sa_family == AF_INET) {
-		err = udp_add_offload(&gs->udp_offloads);
-		if (err)
-			pr_warn("geneve: udp_add_offload failed with status %d\n",
-				err);
-	}
+	err = udp_add_offload(&gs->udp_offloads, net);
+	if (err)
+		pr_warn("geneve: udp_add_offload failed with status %d\n", err);
 }
 
 static int geneve_hlen(struct genevehdr *gh)
@@ -505,6 +502,8 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 
 	/* Initialize the geneve udp offloads structure */
 	gs->udp_offloads.port = port;
+	gs->udp_offloads.tunnel_type = UDP_TUNNEL_GENEVE;
+	gs->udp_offloads.family = ipv6 ? AF_INET6 : AF_INET;
 	gs->udp_offloads.callbacks.gro_receive  = geneve_gro_receive;
 	gs->udp_offloads.callbacks.gro_complete = geneve_gro_complete;
 	geneve_notify_add_rx_port(gs);
@@ -522,10 +521,9 @@ static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
 static void geneve_notify_del_rx_port(struct geneve_sock *gs)
 {
 	struct sock *sk = gs->sock->sk;
-	sa_family_t sa_family = sk->sk_family;
+	struct net *net = sock_net(sk);
 
-	if (sa_family == AF_INET)
-		udp_del_offload(&gs->udp_offloads);
+	udp_del_offload(&gs->udp_offloads, net);
 }
 
 static void __geneve_sock_release(struct geneve_sock *gs)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 702f9be..231c17e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -613,49 +613,22 @@ static int vxlan_gro_complete(struct sk_buff *skb, int nhoff,
 /* Notify netdevs that UDP port started listening */
 static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
 {
-	struct net_device *dev;
 	struct sock *sk = vs->sock->sk;
 	struct net *net = sock_net(sk);
-	sa_family_t sa_family = vxlan_get_sk_family(vs);
-	__be16 port = inet_sk(sk)->inet_sport;
 	int err;
 
-	if (sa_family == AF_INET) {
-		err = udp_add_offload(&vs->udp_offloads);
-		if (err)
-			pr_warn("vxlan: udp_add_offload failed with status %d\n", err);
-	}
-
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		if (dev->netdev_ops->ndo_add_udp_tunnel_port)
-			dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
-							      port,
-							      UDP_TUNNEL_VXLAN);
-	}
-	rcu_read_unlock();
+	err = udp_add_offload(&vs->udp_offloads, net);
+	if (err)
+		pr_warn("vxlan: udp_add_offload failed with status %d\n", err);
 }
 
 /* Notify netdevs that UDP port is no more listening */
 static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
 {
-	struct net_device *dev;
 	struct sock *sk = vs->sock->sk;
 	struct net *net = sock_net(sk);
-	sa_family_t sa_family = vxlan_get_sk_family(vs);
-	__be16 port = inet_sk(sk)->inet_sport;
-
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		if (dev->netdev_ops->ndo_del_udp_tunnel_port)
-			dev->netdev_ops->ndo_del_udp_tunnel_port(dev, sa_family,
-							      port,
-							      UDP_TUNNEL_VXLAN);
-	}
-	rcu_read_unlock();
 
-	if (sa_family == AF_INET)
-		udp_del_offload(&vs->udp_offloads);
+	udp_del_offload(&vs->udp_offloads, net);
 }
 
 /* Add new entry to forwarding table -- assumes lock held */
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 738bfc6..16ee9b5 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -107,8 +107,8 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
 
-int  udp_add_offload(struct udp_offload *prot);
-void udp_del_offload(struct udp_offload *prot);
+int  udp_add_offload(struct udp_offload *prot, struct net *net);
+void udp_del_offload(struct udp_offload *prot, struct net *net);
 
 void udp_offload_get_port(struct net_device *dev);
 
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index e0fcbbb..4705590 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -411,9 +411,9 @@ static void fou_release(struct fou *fou)
 {
 	struct socket *sock = fou->sock;
 	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
 
-	if (sk->sk_family == AF_INET)
-		udp_del_offload(&fou->udp_offloads);
+	udp_del_offload(&fou->udp_offloads, net);
 	list_del(&fou->list);
 	udp_tunnel_sock_release(sock);
 
@@ -484,6 +484,9 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 		goto error;
 	}
 
+	fou->udp_offloads.tunnel_type = UDP_TUNNEL_UNSPEC;
+	fou->udp_offloads.family = cfg->udp_config.family;
+
 	fou->type = cfg->type;
 
 	udp_sk(sk)->encap_type = 1;
@@ -496,11 +499,9 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 
 	sk->sk_allocation = GFP_ATOMIC;
 
-	if (cfg->udp_config.family == AF_INET) {
-		err = udp_add_offload(&fou->udp_offloads);
-		if (err)
-			goto error;
-	}
+	err = udp_add_offload(&fou->udp_offloads, net);
+	if (err)
+		goto error;
 
 	err = fou_add_to_port_list(net, fou);
 	if (err)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 8597020..cf71bd0 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -241,18 +241,28 @@ out:
 	return segs;
 }
 
-int udp_add_offload(struct udp_offload *uo)
+int udp_add_offload(struct udp_offload *uo, struct net *net)
 {
-	struct udp_offload_priv *new_offload = kzalloc(sizeof(*new_offload), GFP_ATOMIC);
-
-	if (!new_offload)
-		return -ENOMEM;
-
-	new_offload->offload = uo;
+	struct udp_offload_priv *new_offload = NULL;
+	struct net_device *dev;
+
+	if (uo->family == AF_INET) {
+		new_offload = kzalloc(sizeof(*new_offload), GFP_ATOMIC);
+		if (!new_offload)
+			return -ENOMEM;
+		new_offload->offload = uo;
+	}
 
 	spin_lock(&udp_offload_lock);
-	new_offload->next = udp_offload_base;
-	rcu_assign_pointer(udp_offload_base, new_offload);
+	if (new_offload) {
+		new_offload->next = udp_offload_base;
+		rcu_assign_pointer(udp_offload_base, new_offload);
+	}
+	for_each_netdev_rcu(net, dev) {
+		if (dev->netdev_ops->ndo_add_udp_tunnel_port)
+			dev->netdev_ops->ndo_add_udp_tunnel_port(dev,
+					 uo->family, uo->port, uo->tunnel_type);
+	}
 	spin_unlock(&udp_offload_lock);
 
 	return 0;
@@ -265,24 +275,35 @@ static void udp_offload_free_routine(struct rcu_head *head)
 	kfree(ou_priv);
 }
 
-void udp_del_offload(struct udp_offload *uo)
+void udp_del_offload(struct udp_offload *uo, struct net *net)
 {
-	struct udp_offload_priv __rcu **head = &udp_offload_base;
-	struct udp_offload_priv *uo_priv;
+	struct udp_offload_priv __rcu **head;
+	struct udp_offload_priv *uo_priv = NULL;
+	struct net_device *dev;
 
 	spin_lock(&udp_offload_lock);
 
-	uo_priv = udp_deref_protected(*head);
-	for (; uo_priv != NULL;
-	     uo_priv = udp_deref_protected(*head)) {
-		if (uo_priv->offload == uo) {
-			rcu_assign_pointer(*head,
-					   udp_deref_protected(uo_priv->next));
-			goto unlock;
+	for_each_netdev_rcu(net, dev) {
+		if (dev->netdev_ops->ndo_del_udp_tunnel_port)
+			dev->netdev_ops->ndo_del_udp_tunnel_port(dev,
+					 uo->family, uo->port, uo->tunnel_type);
+	}
+
+	if (uo->family == AF_INET) {
+		head = &udp_offload_base;
+		uo_priv = udp_deref_protected(*head);
+		for (; uo_priv != NULL;
+		     uo_priv = udp_deref_protected(*head)) {
+			if (uo_priv->offload == uo) {
+				rcu_assign_pointer(*head,
+					    udp_deref_protected(uo_priv->next));
+				goto unlock;
+			}
+			head = &uo_priv->next;
 		}
-		head = &uo_priv->next;
+		pr_warn("udp_del_offload: didn't find offload for port %d\n",
+			ntohs(uo->port));
 	}
-	pr_warn("udp_del_offload: didn't find offload for port %d\n", ntohs(uo->port));
 unlock:
 	spin_unlock(&udp_offload_lock);
 	if (uo_priv)
-- 
1.8.1.4


* [PATCH v1 6/6] i40e:Add geneve tunnel offload support
  2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
                   ` (4 preceding siblings ...)
  2015-11-23 21:02 ` [PATCH v1 5/6] net: Refactor udp_offload and add Geneve port offload support Anjali Singhai Jain
@ 2015-11-23 21:02 ` Anjali Singhai Jain
  5 siblings, 0 replies; 94+ messages in thread
From: Anjali Singhai Jain @ 2015-11-23 21:02 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Anjali Singhai Jain, Kiran Patil

This patch adds driver hooks to implement ndo_ops to add/del udp
port in the HW to identify GENEVE tunnels.

Also cleans up the features_check path now that we support
multiple udp based tunnels.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 35 +++++++++++++++++++++--------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |  2 +-
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c701bbb..69c2c88 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -336,6 +336,7 @@ struct i40e_pf {
 #define I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE	BIT_ULL(38)
 #define I40E_FLAG_LINK_POLLING_ENABLED		BIT_ULL(39)
 #define I40E_FLAG_VEB_MODE_ENABLED		BIT_ULL(40)
+#define I40E_FLAG_GENEVE_OFFLOAD_CAPABLE	BIT_ULL(41)
 #define I40E_FLAG_NO_PCI_LINK_CHECK		BIT_ULL(42)
 
 	/* tracks features that get auto disabled by errors */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f6447c7..e0e40c3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -30,6 +30,9 @@
 #if IS_ENABLED(CONFIG_VXLAN)
 #include <net/vxlan.h>
 #endif
+#if IS_ENABLED(CONFIG_GENEVE)
+#include <net/geneve.h>
+#endif
 #include <net/udp_tunnel.h>
 #include <net/protocol.h>
 
@@ -6990,7 +6993,7 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
  **/
 static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
 {
-#if IS_ENABLED(CONFIG_VXLAN)
+#if IS_ENABLED(CONFIG_VXLAN) || IS_ENABLED(CONFIG_GENEVE)
 	struct i40e_hw *hw = &pf->hw;
 	i40e_status ret;
 	__be16 port;
@@ -8174,7 +8177,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
 			     I40E_FLAG_HW_ATR_EVICT_CAPABLE |
 			     I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
 			     I40E_FLAG_WB_ON_ITR_CAPABLE |
-			     I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE;
+			     I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE |
+			     I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
 	}
 	pf->eeprom_version = 0xDEAD;
 	pf->lan_veb = I40E_NO_VEB;
@@ -8273,7 +8277,7 @@ static int i40e_set_features(struct net_device *netdev,
 	return 0;
 }
 
-#if IS_ENABLED(CONFIG_VXLAN)
+#if IS_ENABLED(CONFIG_VXLAN) || IS_ENABLED(CONFIG_GENEVE)
 /**
  * i40e_get_udp_port_idx - Lookup a possibly offloaded for Rx UDP port
  * @pf: board private structure
@@ -8308,14 +8312,18 @@ static void i40e_add_tunnel_port(struct net_device *netdev,
 				 sa_family_t sa_family, __be16 port,
 				 u32 type)
 {
-#if IS_ENABLED(CONFIG_VXLAN)
+#if IS_ENABLED(CONFIG_VXLAN) || IS_ENABLED(CONFIG_GENEVE)
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
 	u8 next_idx;
 	u8 idx;
 
-	if (type != UDP_TUNNEL_VXLAN)
+	if (!(type == UDP_TUNNEL_VXLAN || type == UDP_TUNNEL_GENEVE))
+		return;
+
+	if ((type == UDP_TUNNEL_GENEVE) &&
+	    (!(pf->flags & I40E_FLAG_GENEVE_OFFLOAD_CAPABLE)))
 		return;
 
 	if (sa_family == AF_INET6)
@@ -8343,6 +8351,8 @@ static void i40e_add_tunnel_port(struct net_device *netdev,
 	pf->udp_ports[next_idx].index = port;
 	if (type == UDP_TUNNEL_VXLAN)
 		pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_VXLAN;
+	else if (type == UDP_TUNNEL_GENEVE)
+		pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_NGE;
 
 	pf->pending_udp_bitmap |= BIT_ULL(next_idx);
 	pf->flags |= I40E_FLAG_UDP_FILTER_SYNC;
@@ -8363,13 +8373,13 @@ static void i40e_del_tunnel_port(struct net_device *netdev,
 				 sa_family_t sa_family, __be16 port,
 				 u32 type)
 {
-#if IS_ENABLED(CONFIG_VXLAN)
+#if IS_ENABLED(CONFIG_VXLAN) || IS_ENABLED(CONFIG_GENEVE)
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
 	u8 idx;
 
-	if (type != UDP_TUNNEL_VXLAN)
+	if (!(type == UDP_TUNNEL_VXLAN || type == UDP_TUNNEL_GENEVE))
 		return;
 
 	if (sa_family == AF_INET6)
@@ -8569,7 +8579,10 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 				       nlflags, 0, 0, filter_mask, NULL);
 }
 
-#define I40E_MAX_TUNNEL_HDR_LEN 80
+/* Hardware supports L4 tunnel length of 128B (=2^7) which includes
+ * inner mac plus all inner ethertypes.
+ */
+#define I40E_MAX_TUNNEL_HDR_LEN 128
 /**
  * i40e_features_check - Validate encapsulated packet conforms to limits
  * @skb: skb buff
@@ -8581,7 +8594,7 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 					     netdev_features_t features)
 {
 	if (skb->encapsulation &&
-	    (skb_inner_mac_header(skb) - skb_transport_header(skb) >
+	    ((skb_inner_network_header(skb) - skb_transport_header(skb)) >
 	     I40E_MAX_TUNNEL_HDR_LEN))
 		return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
 
@@ -8651,6 +8664,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	np->vsi = vsi;
 
 	netdev->hw_enc_features |= NETIF_F_IP_CSUM	 |
+				  NETIF_F_RXCSUM	 |
 				  NETIF_F_GSO_UDP_TUNNEL |
 				  NETIF_F_GSO_GRE	 |
 				  NETIF_F_TSO;
@@ -10165,6 +10179,9 @@ static void i40e_print_features(struct i40e_pf *pf)
 #if IS_ENABLED(CONFIG_VXLAN)
 	buf += sprintf(buf, "VxLAN ");
 #endif
+#if IS_ENABLED(CONFIG_GENEVE)
+	buf += sprintf(buf, "Geneve ");
+#endif
 	if (pf->flags & I40E_FLAG_PTP)
 		buf += sprintf(buf, "PTP ");
 #ifdef I40E_FCOE
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 46556b6..a66d590 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1374,7 +1374,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 	if (rx_error & BIT(I40E_RX_DESC_ERROR_PPRS_SHIFT))
 		return;
 
-	/* If VXLAN traffic has an outer UDPv4 checksum we need to check
+	/* If VXLAN/GENEVE traffic has an outer UDPv4 checksum we need to check
 	 * it in the driver, hardware does not do it for us.
 	 * Since L3L4P bit was set we assume a valid IHL value (>=5)
 	 * so the total length of IPv4 header is IHL*4 bytes
-- 
1.8.1.4


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
  2015-11-23 20:57   ` kbuild test robot
  2015-11-23 20:58   ` kbuild test robot
@ 2015-11-23 21:53   ` Tom Herbert
  2015-11-23 22:49     ` Jesse Gross
  2015-11-30  3:21     ` David Miller
  2015-11-24  5:41   ` Alexander Duyck
  2015-11-30 16:35   ` Tom Herbert
  4 siblings, 2 replies; 94+ messages in thread
From: Tom Herbert @ 2015-11-23 21:53 UTC (permalink / raw)
  To: Anjali Singhai Jain; +Cc: Linux Kernel Network Developers, jesse, Kiran Patil

> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index cb2f89f..72415aa 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -9,6 +9,12 @@
>  #include <net/addrconf.h>
>  #endif
>
> +enum udp_tunnel_type {
> +       UDP_TUNNEL_UNSPEC,
> +       UDP_TUNNEL_VXLAN,
> +       UDP_TUNNEL_GENEVE,
> +};
> +

Sorry, I still don't like this. Granted, at least it gets rid of the
VXLAN-specific ops, but the problem is that there is no such thing as a
common set of encapsulations in the kernel (e.g. foo-over-udp adds a
bunch of encapsulations not represented here), no defined common set of
device functionality that needs this, and this precludes making the RX
accelerations available to a userspace implementation.

The bad effect of this model is that it encourages HW vendors to
continue implementing protocol-specific HW support for encapsulations;
we get much more benefit if they implement protocol-generic
mechanisms. For instance, it is much better that they return
CHECKSUM_COMPLETE rather than giving us a checksum-unnecessary
indication for a TCP checksum within VXLAN.

If a device needs to be configured for some protocol-specific
action, then ntuple filters on the port seem like the right
interface. Really the only common NIC offload that might need this at
all is LRO. RSS, the checksum offloads, and LSO (for UDP tunnels) can
all be implemented generically without regard to the specific
encapsulation being used.
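
A minimal userspace sketch of why a CHECKSUM_COMPLETE-style result is
protocol generic, assuming the device reports the folded ones-complement
sum over the whole packet. The helper names (csum_fold_sum, csum_sub,
inner_sum_from_complete) are hypothetical and are not kernel APIs. The
stack can strip any outer header of even length from the reported sum
without knowing which encapsulation it is:

```c
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit ones-complement sum of big-endian words in buf. */
static uint16_t csum_fold_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement subtraction a - b (0x0000 and 0xffff both mean zero). */
static uint16_t csum_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Derive the sum over the inner packet from a whole-packet sum by
 * removing the first hdr_len bytes. hdr_len is assumed to be even so
 * that the 16-bit word boundaries of outer and inner data line up.
 */
static uint16_t inner_sum_from_complete(uint16_t complete,
					const uint8_t *pkt, size_t hdr_len)
{
	return csum_sub(complete, csum_fold_sum(pkt, hdr_len));
}
```

Nothing in the helpers depends on what the outer bytes are (VXLAN,
Geneve, GUE, ...), which is the property a checksum-unnecessary
indication for one specific tunnel type does not have.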

Tom


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:53   ` Tom Herbert
@ 2015-11-23 22:49     ` Jesse Gross
  2015-11-24  0:32       ` Singhai, Anjali
  2015-11-30  3:21     ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Jesse Gross @ 2015-11-23 22:49 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Anjali Singhai Jain, Linux Kernel Network Developers, Kiran Patil

On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert <tom@herbertland.com> wrote:
>> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> index cb2f89f..72415aa 100644
>> --- a/include/net/udp_tunnel.h
>> +++ b/include/net/udp_tunnel.h
>> @@ -9,6 +9,12 @@
>>  #include <net/addrconf.h>
>>  #endif
>>
>> +enum udp_tunnel_type {
>> +       UDP_TUNNEL_UNSPEC,
>> +       UDP_TUNNEL_VXLAN,
>> +       UDP_TUNNEL_GENEVE,
>> +};
>> +
>
> Sorry, I still don't like this. Granted, at least it gets rid of the
> VXLAN-specific ops, but the problem is that there is no such thing as a
> common set of encapsulations in the kernel (e.g. foo-over-udp adds a
> bunch of encapsulations not represented here), no defined common set of
> device functionality that needs this, and this precludes making the RX
> accelerations available to a userspace implementation.

Regardless, I think this is at least a good cleanup of what is already
there compared to having VXLAN-specific NDOs. We can always add
additional things in the future.


* RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 22:49     ` Jesse Gross
@ 2015-11-24  0:32       ` Singhai, Anjali
  2015-11-24  0:38         ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Singhai, Anjali @ 2015-11-24  0:32 UTC (permalink / raw)
  To: Jesse Gross, Tom Herbert
  Cc: Linux Kernel Network Developers, Patil, Kiran, Brandeburg, Jesse



> -----Original Message-----
> From: Jesse Gross [mailto:jesse@kernel.org]
> Sent: Monday, November 23, 2015 2:50 PM
> To: Tom Herbert
> Cc: Singhai, Anjali; Linux Kernel Network Developers; Patil, Kiran
> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
> 
> On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert <tom@herbertland.com>
> wrote:
> >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> >> index cb2f89f..72415aa 100644
> >> --- a/include/net/udp_tunnel.h
> >> +++ b/include/net/udp_tunnel.h
> >> @@ -9,6 +9,12 @@
> >>  #include <net/addrconf.h>
> >>  #endif
> >>
> >> +enum udp_tunnel_type {
> >> +       UDP_TUNNEL_UNSPEC,
> >> +       UDP_TUNNEL_VXLAN,
> >> +       UDP_TUNNEL_GENEVE,
> >> +};
> >> +
> >
> > Sorry, I still don't like this. Granted, at least it gets rid of the
> > VXLAN-specific ops, but the problem is that there is no such thing as a
> > common set of encapsulations in the kernel (e.g. foo-over-udp adds a
> > bunch of encapsulations not represented here), no defined common set of
> > device functionality that needs this, and this precludes making the RX
> > accelerations available to a userspace implementation.
> 
> Regardless, I think this is at least a good cleanup of what is already
> there compared to having VXLAN-specific NDOs. We can always add
> additional things in the future.

I agree with Jesse that this will help, not hurt, when we are ready to cross the bridge of removing RX-side protocol ossification.



* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24  0:32       ` Singhai, Anjali
@ 2015-11-24  0:38         ` Tom Herbert
  2015-11-24  1:11           ` Jesse Brandeburg
  0 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-11-24  0:38 UTC (permalink / raw)
  To: Singhai, Anjali
  Cc: Jesse Gross, Linux Kernel Network Developers, Patil, Kiran,
	Brandeburg, Jesse

On Mon, Nov 23, 2015 at 4:32 PM, Singhai, Anjali
<anjali.singhai@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jesse Gross [mailto:jesse@kernel.org]
>> Sent: Monday, November 23, 2015 2:50 PM
>> To: Tom Herbert
>> Cc: Singhai, Anjali; Linux Kernel Network Developers; Patil, Kiran
>> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
>>
>> On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert <tom@herbertland.com>
>> wrote:
>> >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
>> >> index cb2f89f..72415aa 100644
>> >> --- a/include/net/udp_tunnel.h
>> >> +++ b/include/net/udp_tunnel.h
>> >> @@ -9,6 +9,12 @@
>> >>  #include <net/addrconf.h>
>> >>  #endif
>> >>
>> >> +enum udp_tunnel_type {
>> >> +       UDP_TUNNEL_UNSPEC,
>> >> +       UDP_TUNNEL_VXLAN,
>> >> +       UDP_TUNNEL_GENEVE,
>> >> +};
>> >> +
>> >
>> > Sorry, I still don't like this. Granted, at least it gets rid of the
>> > VXLAN-specific ops, but the problem is that there is no such thing as a
>> > common set of encapsulations in the kernel (e.g. foo-over-udp adds a
>> > bunch of encapsulations not represented here), no defined common set of
>> > device functionality that needs this, and this precludes making the RX
>> > accelerations available to a userspace implementation.
>>
>> Regardless, I think this is at least a good cleanup of what is already
>> there compared to having VXLAN-specific NDOs. We can always add
>> additional things in the future.
>
> I agree with Jesse that this will help, not hurt, when we are ready to cross the bridge of removing RX-side protocol ossification.
>
The time is now to address the protocol ossification problem. HW
vendors leaking out support for random protocols one at a time really
isn't helpful at all. Unfortunately, it's pretty clear that many
vendors aren't going to fix this of their own volition. Fixing the
interfaces to "encourage" change seems to be a way we can help things
along from the kernel perspective.

Tom


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24  0:38         ` Tom Herbert
@ 2015-11-24  1:11           ` Jesse Brandeburg
  2015-11-24 17:32             ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Jesse Brandeburg @ 2015-11-24  1:11 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Singhai, Anjali, Jesse Gross, Linux Kernel Network Developers,
	Patil, Kiran, jesse.brandeburg

On Mon, 23 Nov 2015 16:38:59 -0800
Tom Herbert <tom@herbertland.com> wrote:
> >> >
> >> > Sorry, I still don't like this. Granted, at least it gets rid of the
> >> > VXLAN-specific ops, but the problem is that there is no such thing as
> >> > a common set of encapsulations in the kernel (e.g. foo-over-udp adds a
> >> > bunch of encapsulations not represented here), no defined common set
> >> > of device

Tom, thanks for your feedback.

Is anyone using foo-over-udp besides you?
I know a lot of people using VxLAN and many who want Geneve offloads.
The performance gain of using hardware offload in this area is
non-trivial (like 300% or more)

> >> > functionality that needs this, and this precludes making the RX
> >> > accelerations available to a userspace implementation.
> >>
> >> Regardless, I think this is at least a good cleanup of what is already
> >> there compared to having VXLAN-specific NDOs. We can always add
> >> additional things in the future.
> >
> > I agree with Jesse that this will help, not hurt, when we are ready to
> > cross the bridge of removing RX-side protocol ossification.
> >
> The time is now to address the protocol ossification problem. HW
> vendors leaking out support for random protocols one at a time really
> isn't helpful at all. Unfortunately, it's pretty clear that many
> vendors aren't going to fix this of their own volition. Fixing the
> interfaces to "encourage" change seems to be a way we can help things
> along from the kernel perspective.

So we (as a kernel community) have users *NOW* who want this
feature, and hardware that is available *now* that has this feature.
Do you think we should wait for a unicorn to arrive that has a fully
programmable de-ossified checksum engine?  How long?

Agree that we can start to address the Protocol Ossification problem by
working with hardware vendors, but that is a multi-year process to get
to new silicon with these changes.  Those with fully programmable
firmware engines might be able to get a change done sooner, but that
requires a non-trivial effort by the vendor that isn't reusable in
other operating systems, or maybe isn't possible at all due to hardware
limits.

FWIW, I've brought the issue to the attention of the architects here,
and we will likely be able to make changes in this space.  Intel
hardware (as demonstrated by your patches) already is able to deal with
this de-ossification on transmit.  Receive is a whole different beast.

I think that trying to force an agenda with no forewarning, and
punishing the users in order to get hardware vendors to change, is the
wrong way to go about this.  All you end up with is people just asking
you why their hardware doesn't work in the kernel.

You have a proposal, let's codify it and enable it for the future, and
especially be *really* clear what you want hardware vendors to
implement so that they get it right.  MS does this by publishing
specifications and being clear what MUST be implemented and what COULD
be implemented.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
                     ` (2 preceding siblings ...)
  2015-11-23 21:53   ` Tom Herbert
@ 2015-11-24  5:41   ` Alexander Duyck
  2015-11-30 16:35   ` Tom Herbert
  4 siblings, 0 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-11-24  5:41 UTC (permalink / raw)
  To: Anjali Singhai Jain, netdev; +Cc: jesse, Kiran Patil

On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote:
> Replace add/del ndo ops for vxlan_port with tunnel_port so that all UDP
> based tunnels can use the same ndo op. Add a parameter to pass tunnel
> type to the ndo_op.
>
> Change all drivers to use the generalized udp tunnel offload
>
> Patch was compile tested with x86_64_defconfig.
>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> ---

[...]

> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index cb2f89f..72415aa 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -9,6 +9,12 @@
>   #include <net/addrconf.h>
>   #endif
>
> +enum udp_tunnel_type {
> +	UDP_TUNNEL_UNSPEC,
> +	UDP_TUNNEL_VXLAN,
> +	UDP_TUNNEL_GENEVE,
> +};
> +
>   struct udp_port_cfg {
>   	u8			family;
>
>

I'm not a fan of UDP_TUNNEL_UNSPEC.  If you are going to implement a 
"tunnel type" field it should specify tunnel type 1:1, not just 
generically refer to UNSPEC for everything that isn't VXLAN or GENEVE. 
This way we can avoid any issues with anyone implementing an offload 
that later relies on their tunnel type value being equal to 0.
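
A minimal sketch of what a 1:1 mapping could look like, assuming extra
entries that are not in the patch (UDP_TUNNEL_FOU and UDP_TUNNEL_GUE are
illustrative names only) and two hypothetical helpers,
udp_tunnel_type_name() and udp_tunnel_type_valid(). VXLAN and GENEVE keep
the values they have in the patch, and 0 is simply not a valid type:

```c
#include <stddef.h>

/* One value per encapsulation; no catch-all entry and nothing maps to 0. */
enum udp_tunnel_type {
	UDP_TUNNEL_VXLAN = 1,	/* same value as in the patch */
	UDP_TUNNEL_GENEVE,	/* same value as in the patch */
	UDP_TUNNEL_FOU,		/* hypothetical, for illustration */
	UDP_TUNNEL_GUE,		/* hypothetical, for illustration */
	UDP_TUNNEL_TYPE_MAX,	/* sentinel, not a tunnel type */
};

/* Return a printable name, or NULL for any value that is not a known type. */
static const char *udp_tunnel_type_name(unsigned int type)
{
	switch (type) {
	case UDP_TUNNEL_VXLAN:
		return "vxlan";
	case UDP_TUNNEL_GENEVE:
		return "geneve";
	case UDP_TUNNEL_FOU:
		return "fou";
	case UDP_TUNNEL_GUE:
		return "gue";
	default:
		return NULL;
	}
}

/* A driver-side check can then reject unknown types explicitly. */
static int udp_tunnel_type_valid(unsigned int type)
{
	return udp_tunnel_type_name(type) != NULL;
}
```

With something like this, a caller that forgets to set the type gets
rejected instead of silently matching whatever happens to be 0.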

- Alex


* Re: [PATCH v1 2/6] net: Add a generic udp_offload_get_port function
  2015-11-23 21:02 ` [PATCH v1 2/6] net: Add a generic udp_offload_get_port function Anjali Singhai Jain
@ 2015-11-24  6:08   ` Alexander Duyck
  2015-11-24  6:37   ` Alexander Duyck
  1 sibling, 0 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-11-24  6:08 UTC (permalink / raw)
  To: Anjali Singhai Jain, netdev; +Cc: jesse, Kiran Patil

On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote:
> The new function udp_offload_get_port replaces vxlan_get_rx_port().
> This is a generic function that will help replay all udp tunnel ports
> irrespective of tunnel type.
> This way when new udp tunnels get added this function need not change.
>
> Note: Drivers besides i40e are compile tested with this change.
>
> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> ---
>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  5 ++--
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c        |  4 +++-
>   drivers/net/ethernet/emulex/benet/be_main.c      |  4 +++-
>   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  3 ++-
>   drivers/net/ethernet/intel/i40e/i40e_main.c      |  5 ++--
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  5 ++--
>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  3 ++-
>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  3 ++-
>   drivers/net/vxlan.c                              | 29 ++----------------------
>   include/linux/netdevice.h                        |  2 ++
>   include/net/protocol.h                           |  2 ++
>   include/net/vxlan.h                              |  8 -------
>   net/ipv4/udp_offload.c                           | 27 ++++++++++++++++++++++
>   13 files changed, 53 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index ad2782f..56777c8 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -60,6 +60,7 @@
>   #include <linux/semaphore.h>
>   #include <linux/stringify.h>
>   #include <linux/vmalloc.h>
> +#include <net/protocol.h>
>
>   #include "bnx2x.h"
>   #include "bnx2x_init.h"
> @@ -10293,7 +10294,7 @@ sp_rtnl_not_reset:
>   			netdev_info(bp->dev,
>   				    "Deleted vxlan dest port %d", port);
>   			bp->vxlan_dst_port = 0;
> -			vxlan_get_rx_port(bp->dev);
> +			udp_offload_get_port(bp->dev);
>   		}
>   	}
>   #endif
> @@ -12499,7 +12500,7 @@ static int bnx2x_open(struct net_device *dev)
>
>   #ifdef CONFIG_BNX2X_VXLAN
>   	if (IS_PF(bp))
> -		vxlan_get_rx_port(dev);
> +		udp_offload_get_port(dev);
>   #endif
>
>   	return 0;
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 5b96ddf..f49ca38 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -39,6 +39,7 @@
>   #include <net/ip6_checksum.h>
>   #if defined(CONFIG_VXLAN) || defined(CONFIG_VXLAN_MODULE)
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>   #endif
>   #ifdef CONFIG_NET_RX_BUSY_POLL
>   #include <net/busy_poll.h>
> @@ -4589,7 +4590,7 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
>
>   	if (irq_re_init) {
>   #if defined(CONFIG_VXLAN) || defined(CONFIG_VXLAN_MODULE)
> -		vxlan_get_rx_port(bp->dev);
> +		udp_offload_get_port(bp->dev);
>   #endif
>   		if (!bnxt_hwrm_tunnel_dst_port_alloc(
>   				bp, htons(0x17c1),
> @@ -5458,6 +5459,7 @@ static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>
>   	if (type != UDP_TUNNEL_VXLAN)
>   		return;
> +
>   	if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
>   		bp->vxlan_port_cnt--;
>

This looks like a bit of white-space bleed-through from your first 
patch.  You probably need to place it there.

> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
> index e699deca..a4da753 100644
> --- a/drivers/net/ethernet/emulex/benet/be_main.c
> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> @@ -25,6 +25,7 @@
>   #include <net/busy_poll.h>
>   #include <net/udp_tunnel.h>
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>
>   MODULE_VERSION(DRV_VER);
>   MODULE_DESCRIPTION(DRV_DESC " " DRV_VER);
> @@ -3604,7 +3605,7 @@ static int be_open(struct net_device *netdev)
>
>   #ifdef CONFIG_BE2NET_VXLAN
>   	if (skyhawk_chip(adapter))
> -		vxlan_get_rx_port(netdev);
> +		udp_offload_get_port(netdev);
>   #endif
>
>   	return 0;
> @@ -5239,6 +5240,7 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
>
>   	if (type != UDP_TUNNEL_VXLAN)
>   		return;
> +
>   	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
>   		return;
>

Same here.  We don't need white space changes in these patches as they 
just clutter things up.

> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 447d5e6..1564a13 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -23,6 +23,7 @@
>   #if IS_ENABLED(CONFIG_FM10K_VXLAN)
>   #include <net/udp_tunnel.h>
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>   #endif /* CONFIG_FM10K_VXLAN */
>
>   /**
> @@ -573,7 +574,7 @@ int fm10k_open(struct net_device *netdev)
>
>   #if IS_ENABLED(CONFIG_FM10K_VXLAN)
>   	/* update VXLAN port configuration */
> -	vxlan_get_rx_port(netdev);
> +	udp_offload_get_port(netdev);
>
>   #endif
>   	fm10k_up(interface);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 520e34e..4be0a26 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -31,6 +31,7 @@
>   #include <net/vxlan.h>
>   #endif
>   #include <net/udp_tunnel.h>
> +#include <net/protocol.h>
>
>   const char i40e_driver_name[] = "i40e";
>   static const char i40e_driver_string[] =
> @@ -5303,9 +5304,7 @@ int i40e_open(struct net_device *netdev)
>   						       TCP_FLAG_CWR) >> 16);
>   	wr32(&pf->hw, I40E_GLLAN_TSOMSK_L, be32_to_cpu(TCP_FLAG_CWR) >> 16);
>
> -#ifdef CONFIG_I40E_VXLAN
> -	vxlan_get_rx_port(netdev);
> -#endif
> +	udp_offload_get_port(netdev);
>
>   	return 0;
>   }
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 76ccc77..ba92c7a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -52,6 +52,7 @@
>   #include <scsi/fc/fc_fcoe.h>
>   #include <net/udp_tunnel.h>
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>
>   #ifdef CONFIG_OF
>   #include <linux/of_net.h>
> @@ -5823,7 +5824,7 @@ static int ixgbe_open(struct net_device *netdev)
>
>   	ixgbe_clear_vxlan_port(adapter);
>   #ifdef CONFIG_IXGBE_VXLAN
> -	vxlan_get_rx_port(netdev);
> +	udp_offload_get_port(netdev);
>   #endif
>
>   	return 0;
> @@ -6913,7 +6914,7 @@ static void ixgbe_service_task(struct work_struct *work)
>   #ifdef CONFIG_IXGBE_VXLAN
>   	if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
>   		adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
> -		vxlan_get_rx_port(adapter->netdev);
> +		udp_offload_get_port(adapter->netdev);
>   	}
>   #endif /* CONFIG_IXGBE_VXLAN */
>   	ixgbe_reset_subtask(adapter);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 2cb19c7..b91b8f1 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -41,6 +41,7 @@
>   #include <net/busy_poll.h>
>   #include <net/udp_tunnel.h>
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>
>   #include <linux/mlx4/driver.h>
>   #include <linux/mlx4/device.h>
> @@ -1684,7 +1685,7 @@ int mlx4_en_start_port(struct net_device *dev)
>
>   #ifdef CONFIG_MLX4_EN_VXLAN
>   	if (priv->mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
> -		vxlan_get_rx_port(dev);
> +		udp_offload_get_port(dev);
>   #endif
>   	priv->port_up = true;
>   	netif_tx_start_all_queues(dev);
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index aa38dbb..a640872 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -19,6 +19,7 @@
>   #ifdef CONFIG_QLCNIC_VXLAN
>   #include <net/udp_tunnel.h>
>   #include <net/vxlan.h>
> +#include <net/protocol.h>
>   #endif
>
>   #include "qlcnic.h"
> @@ -2026,7 +2027,7 @@ qlcnic_attach(struct qlcnic_adapter *adapter)
>
>   #ifdef CONFIG_QLCNIC_VXLAN
>   	if (qlcnic_encap_rx_offload(adapter))
> -		vxlan_get_rx_port(netdev);
> +		udp_offload_get_port(netdev);
>   #endif
>
>   	adapter->is_up = QLCNIC_ADAPTER_UP_MAGIC;
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 5490629..702f9be 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2424,33 +2424,6 @@ static struct device_type vxlan_type = {
>   	.name = "vxlan",
>   };
>
> -/* Calls the ndo_add_udp_tunnel_port of the caller in order to
> - * supply the listening VXLAN udp ports. Callers are expected
> - * to implement the ndo_add_tunnel_port.
> - */
> -void vxlan_get_rx_port(struct net_device *dev)
> -{
> -	struct vxlan_sock *vs;
> -	struct net *net = dev_net(dev);
> -	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
> -	sa_family_t sa_family;
> -	__be16 port;
> -	unsigned int i;
> -
> -	spin_lock(&vn->sock_lock);
> -	for (i = 0; i < PORT_HASH_SIZE; ++i) {
> -		hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
> -			port = inet_sk(vs->sock->sk)->inet_sport;
> -			sa_family = vxlan_get_sk_family(vs);
> -			dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
> -							      port,
> -							      UDP_TUNNEL_VXLAN);
> -		}
> -	}
> -	spin_unlock(&vn->sock_lock);
> -}
> -EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
> -
>   /* Initialize the device structure. */
>   static void vxlan_setup(struct net_device *dev)
>   {
> @@ -2639,6 +2612,8 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
>
>   	/* Initialize the vxlan udp offloads structure */
>   	vs->udp_offloads.port = port;
> +	vs->udp_offloads.tunnel_type = UDP_TUNNEL_VXLAN;
> +	vs->udp_offloads.family = ipv6 ? AF_INET6 : AF_INET;
>   	vs->udp_offloads.callbacks.gro_receive  = vxlan_gro_receive;
>   	vs->udp_offloads.callbacks.gro_complete = vxlan_gro_complete;
>

It seems like you are losing functionality here.  The function 
vxlan_get_sk_family was what was being used to obtain the socket family 
before and it retrieved it from the socket.

If you are going to drop use of the function vxlan_get_sk_family like 
this you should probably just go through the code and drop all of the 
references to it since you have duplicated the functionality with the 
tunnel_type field.  It might make sense to make that a separate patch 
and to take care of it before you make this change.

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index eaecc42..0073009 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2075,6 +2075,8 @@ struct udp_offload_callbacks {
>
>   struct udp_offload {
>   	__be16			 port;
> +	u8			 tunnel_type;
> +	u8			 family;
>   	u8			 ipproto;
>   	struct udp_offload_callbacks callbacks;
>   };

You cast family as a u8, but skc_family is an unsigned short.  You 
should probably use either that or sa_family_t.
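
A minimal userspace sketch of what I mean, under stated assumptions: 
struct udp_offload_sketch is a hypothetical stand-in for struct 
udp_offload (callbacks omitted, uint16_t in place of __be16), and 
offload_set_family is an illustrative helper, not an existing function.  
The only point is that family has the same type the socket layer uses.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical userspace stand-in for struct udp_offload; callbacks omitted. */
struct udp_offload_sketch {
	uint16_t	port;		/* __be16 in the kernel */
	sa_family_t	family;		/* same type the socket layer uses */
	uint8_t		tunnel_type;
	uint8_t		ipproto;
};

/* Record the family exactly as read from the socket, without narrowing. */
static void offload_set_family(struct udp_offload_sketch *uo,
			       sa_family_t family)
{
	assert(family == AF_INET || family == AF_INET6);
	uo->family = family;
}
```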

> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index d6fcc1f..738bfc6 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -110,6 +110,8 @@ void inet_unregister_protosw(struct inet_protosw *p);
>   int  udp_add_offload(struct udp_offload *prot);
>   void udp_del_offload(struct udp_offload *prot);
>
> +void udp_offload_get_port(struct net_device *dev);
> +
>   #if IS_ENABLED(CONFIG_IPV6)
>   int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
>   int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index c1c899c..926455e 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -242,14 +242,6 @@ static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
>   /* IPv6 header + UDP + VXLAN + Ethernet header */
>   #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
>
> -#if IS_ENABLED(CONFIG_VXLAN)
> -void vxlan_get_rx_port(struct net_device *netdev);
> -#else
> -static inline void vxlan_get_rx_port(struct net_device *netdev)
> -{
> -}
> -#endif
> -
>   static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
>   {
>   	return vs->sock->sk->sk_family;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index f938616..8597020 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -290,6 +290,33 @@ unlock:
>   }
>   EXPORT_SYMBOL(udp_del_offload);
>
> +void udp_offload_get_port(struct net_device *dev)
> +{
> +	struct udp_offload_priv __rcu **head;
> +	struct udp_offload_priv *uo_priv;
> +	struct udp_offload *uo;
> +
> +	if (udp_offload_base)
> +		head = &udp_offload_base;
> +	else
> +		return;
> +
> +	spin_lock(&udp_offload_lock);
> +	uo_priv = udp_deref_protected(*head);
> +	for (; uo_priv != NULL; uo_priv = udp_deref_protected(*head)) {

You can really simplify all of this by using a while loop here instead 
of a for loop.  Just check for "while (uo_priv)".

> +		/* call the right add port */
> +		uo = uo_priv->offload;
> +		if (uo && dev->netdev_ops->ndo_add_udp_tunnel_port)
> +			dev->netdev_ops->ndo_add_udp_tunnel_port(dev,
> +							uo->family,
> +							uo->port,
> +							uo->tunnel_type);
> +		head = &uo_priv->next;

No need to carry head, it is just dead weight.  At this point you could do:
		uo_priv = udp_deref_protected(uo_priv->next);

> +	}
> +	spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_offload_get_port);
> +
>   struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>   				 struct udphdr *uh)
>   {
>
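
To make the two loop comments above concrete, here is a userspace sketch 
of the walk with a plain while loop and no head pointer.  The *_sketch 
types, replay_stats and count_port are hypothetical stand-ins for 
illustration.  The real function would still take udp_offload_lock and 
read each pointer through udp_deref_protected(); both are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct udp_offload_sketch {
	uint16_t port;
	uint16_t family;
	uint8_t  tunnel_type;
};

struct udp_offload_priv_sketch {
	struct udp_offload_sketch      *offload;
	struct udp_offload_priv_sketch *next;
};

typedef void (*add_port_fn)(void *dev, uint16_t family, uint16_t port,
			    uint8_t tunnel_type);

/* Replay every registered port to the driver callback. */
static void offload_replay_ports(struct udp_offload_priv_sketch *base,
				 void *dev, add_port_fn add_port)
{
	struct udp_offload_priv_sketch *uo_priv = base;

	if (!add_port)
		return;

	while (uo_priv) {
		struct udp_offload_sketch *uo = uo_priv->offload;

		if (uo)
			add_port(dev, uo->family, uo->port, uo->tunnel_type);
		uo_priv = uo_priv->next;	/* no separate head pointer */
	}
}

/* Example callback: counts calls and remembers the last port seen. */
struct replay_stats {
	int	 calls;
	uint16_t last_port;
};

static void count_port(void *dev, uint16_t family, uint16_t port,
		       uint8_t tunnel_type)
{
	struct replay_stats *stats = dev;

	assert(stats != NULL);
	(void)family;
	(void)tunnel_type;
	stats->calls++;
	stats->last_port = port;
}
```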


* Re: [PATCH v1 2/6] net: Add a generic udp_offload_get_port function
  2015-11-23 21:02 ` [PATCH v1 2/6] net: Add a generic udp_offload_get_port function Anjali Singhai Jain
  2015-11-24  6:08   ` Alexander Duyck
@ 2015-11-24  6:37   ` Alexander Duyck
  2015-11-24 19:35     ` Singhai, Anjali
  1 sibling, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-11-24  6:37 UTC (permalink / raw)
  To: Anjali Singhai Jain, netdev; +Cc: jesse, Kiran Patil

On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote:
> The new function udp_offload_get_port replaces vxlan_get_rx_port().
> This is a generic function that will help replay all udp tunnel ports
> irrespective of tunnel type.
> This way when new udp tunnels get added this function need not change.
>
> Note: Drivers besides i40e are compile tested with this change.
>
> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> ---

[...]

> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index f938616..8597020 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -290,6 +290,33 @@ unlock:
>   }
>   EXPORT_SYMBOL(udp_del_offload);
>
> +void udp_offload_get_port(struct net_device *dev)
> +{
> +	struct udp_offload_priv __rcu **head;
> +	struct udp_offload_priv *uo_priv;
> +	struct udp_offload *uo;
> +
> +	if (udp_offload_base)
> +		head = &udp_offload_base;
> +	else
> +		return;
> +
> +	spin_lock(&udp_offload_lock);
> +	uo_priv = udp_deref_protected(*head);
> +	for (; uo_priv != NULL; uo_priv = udp_deref_protected(*head)) {
> +		/* call the right add port */
> +		uo = uo_priv->offload;
> +		if (uo && dev->netdev_ops->ndo_add_udp_tunnel_port)
> +			dev->netdev_ops->ndo_add_udp_tunnel_port(dev,
> +							uo->family,
> +							uo->port,
> +							uo->tunnel_type);
> +		head = &uo_priv->next;
> +	}
> +	spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_offload_get_port);
> +
>   struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>   				 struct udphdr *uh)
>   {
>

So when I got to patch 5 I realized this approach is horribly broken for 
IPv6 tunnels.  The udp_offload_base is only populated if the family is 
AF_INET.  What do you guys plan to do to get support for AF_INET6?

You probably ought to look at something like what ended up being done 
for the IOAT stuff.  What you end up needing is to support the drivers 
querying for what ports are active, and receiving notifications of 
tunnel updates, and the tunnel side that will register some 
functionality allowing the active ports for a given tunnel type to be 
queried.
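
Roughly, and only as a userspace sketch under my own assumptions, the 
query side could look like the code below.  Every name in it 
(tunnel_register_provider, tunnel_query_ports, the *_sketch 
identifiers) is hypothetical, and the notification path is left out.  
Because each tunnel type enumerates its own sockets, IPv6 ports are 
reported without depending on udp_offload_base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical tunnel type identifiers; not the kernel's values. */
enum tunnel_type_sketch {
	TUNNEL_VXLAN_SKETCH,
	TUNNEL_GENEVE_SKETCH,
	TUNNEL_TYPE_MAX_SKETCH
};

/* Driver callback, invoked once per active port. */
typedef void (*port_cb)(void *ctx, sa_family_t family, uint16_t port,
			enum tunnel_type_sketch type);

/* Tunnel-side hook: enumerate this tunnel type's active ports. */
typedef void (*enum_ports_fn)(port_cb cb, void *ctx);

static enum_ports_fn providers[TUNNEL_TYPE_MAX_SKETCH];

/* Called by a tunnel implementation when it loads. */
static int tunnel_register_provider(enum tunnel_type_sketch type,
				    enum_ports_fn fn)
{
	if (type >= TUNNEL_TYPE_MAX_SKETCH || !fn || providers[type])
		return -1;
	providers[type] = fn;
	return 0;
}

/* Called by a driver, e.g. from its open path, to replay all ports. */
static void tunnel_query_ports(port_cb cb, void *ctx)
{
	int i;

	assert(cb != NULL);
	for (i = 0; i < TUNNEL_TYPE_MAX_SKETCH; i++)
		if (providers[i])
			providers[i](cb, ctx);
}

/* Example provider: one IPv4 and one IPv6 socket on the IANA VXLAN port. */
static void vxlan_ports_sketch(port_cb cb, void *ctx)
{
	cb(ctx, AF_INET, 4789, TUNNEL_VXLAN_SKETCH);
	cb(ctx, AF_INET6, 4789, TUNNEL_VXLAN_SKETCH);
}

/* Example driver callback: counts the IPv6 ports it is told about. */
static void count_ipv6_ports(void *ctx, sa_family_t family, uint16_t port,
			     enum tunnel_type_sketch type)
{
	(void)port;
	(void)type;
	if (family == AF_INET6)
		(*(int *)ctx)++;
}
```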

- Alex


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24  1:11           ` Jesse Brandeburg
@ 2015-11-24 17:32             ` Tom Herbert
  2015-11-24 17:43               ` Hannes Frederic Sowa
  2015-11-30  3:22               ` David Miller
  0 siblings, 2 replies; 94+ messages in thread
From: Tom Herbert @ 2015-11-24 17:32 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Singhai, Anjali, Jesse Gross, Linux Kernel Network Developers,
	Patil, Kiran

>
> FWIW, I've brought the issue to the attention of the architects here,
> and we will likely be able to make changes in this space.  Intel
> hardware (as demonstrated by your patches) already is able to deal with
> this de-ossification on transmit.  Receive is a whole different beast.
>
Please provide the specifics on why "Receive is a whole different
beast". Generic receive checksum is already a subset of the
functionality that you must have implemented to support the protocol
specific offloads. All the hardware needs to do is calculate the 1's
complement checksum of the packet and return the value to the
host with that packet. That's it. No parsing of headers, no worrying
about the pseudo header, no dealing with any encapsulation. Just do
the calculation, return the result to the host and the driver converts
this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
any harder than specific support for the next protocol du jour.
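
To illustrate how little is being asked, here is a userspace sketch of
the calculation. The function names are mine, not kernel APIs, and the
data is the worked example from RFC 1071. The device only needs the
equivalent of csum_buf over the packet; the host can then derive the sum
over any part after an even-length prefix without the device having
parsed anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold an accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of a buffer, read as big-endian 16-bit words. */
static uint16_t csum_buf(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* zero-pad odd byte */
	return csum_fold16(sum);
}

/* Ones' complement subtraction: add the complement of part. */
static uint16_t csum_sub(uint16_t total, uint16_t part)
{
	return csum_fold16((uint64_t)total + (uint16_t)~part);
}

/*
 * Given the sum over the whole packet (what the device reports), return
 * the sum over everything after an even-length prefix, e.g. outer
 * headers.  Equality holds up to the two representations of zero.
 */
static uint16_t csum_after_prefix(uint16_t total, const uint8_t *data,
				  size_t prefix_len)
{
	assert(prefix_len % 2 == 0);
	return csum_sub(total, csum_buf(data, prefix_len));
}
```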

> I think that trying to force an agenda with no fore-warning and also
> punishing the users in order to get hardware vendors to change is the
> wrong way to go about this.  All you end up with is people just asking
> you why their hardware doesn't work in the kernel.
>
As you said, this is only feedback and nobody is forcing anyone to do
anything. But encouraging HW vendors to provide generic mechanisms so
that your users can use whatever protocol they want is the exact
_opposite_ of punishing users, this is very much a pro-user direction.

> You have a proposal, let's codify it and enable it for the future, and
> especially be *really* clear what you want hardware vendors to
> implement so that they get it right.  MS does this by publishing
> specifications and being clear what MUST be implemented and what COULD
> be implemented.
>

Linux does not mandate HW implementation like MS; what we do is
define driver interfaces which allow for a variety of different HW
implementations. The stack-driver checksum interface is described at
the top of skbuff.h. If this interface description is not clear enough
please let me know and we can fix that. If it is helpful we can
publish our requirements of new NICs at Facebook for reference.

Tom


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 17:32             ` Tom Herbert
@ 2015-11-24 17:43               ` Hannes Frederic Sowa
  2015-11-24 17:52                 ` Tom Herbert
  2015-11-24 18:37                 ` David Miller
  2015-11-30  3:22               ` David Miller
  1 sibling, 2 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-24 17:43 UTC (permalink / raw)
  To: Tom Herbert, Jesse Brandeburg
  Cc: Singhai, Anjali, Jesse Gross, Linux Kernel Network Developers,
	Patil, Kiran

On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> As you said this in only feedback and nobody is forcing anyone to do
> anything. But encouraging HW vendors to provide generic mechanisms so
> that your users can use whatever protocol they want is the exact
> _opposite_ of punishing users, this is very much a pro-user direction.

Some users will suffer worse performance if we don't correctly set
ip_summed for a specific protocol before we do the copy operations from
user space into skbs, and the checksums are instead always done in the
driver.

Bye,
Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 17:43               ` Hannes Frederic Sowa
@ 2015-11-24 17:52                 ` Tom Herbert
  2015-11-24 18:16                   ` Hannes Frederic Sowa
  2015-11-24 18:37                 ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-11-24 17:52 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jesse Brandeburg, Singhai, Anjali, Jesse Gross,
	Linux Kernel Network Developers, Patil, Kiran

On Tue, Nov 24, 2015 at 9:43 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>> As you said this in only feedback and nobody is forcing anyone to do
>> anything. But encouraging HW vendors to provide generic mechanisms so
>> that your users can use whatever protocol they want is the exact
>> _opposite_ of punishing users, this is very much a pro-user direction.
>
> Some users will suffer worse performance in case we don't correctly set
> ip_summed for a specific protocol before we do the copy operations from
> user space into skbs but if they are always done in the driver.
>
Please be specific. Who are the users, what is the exact performance
regression, and what are the specific protocols in question?

> Bye,
> Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 17:52                 ` Tom Herbert
@ 2015-11-24 18:16                   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-24 18:16 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jesse Brandeburg, Singhai, Anjali, Jesse Gross,
	Linux Kernel Network Developers, Patil, Kiran



On Tue, Nov 24, 2015, at 18:52, Tom Herbert wrote:
> On Tue, Nov 24, 2015 at 9:43 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> >> As you said this in only feedback and nobody is forcing anyone to do
> >> anything. But encouraging HW vendors to provide generic mechanisms so
> >> that your users can use whatever protocol they want is the exact
> >> _opposite_ of punishing users, this is very much a pro-user direction.
> >
> > Some users will suffer worse performance in case we don't correctly set
> > ip_summed for a specific protocol before we do the copy operations from
> > user space into skbs but if they are always done in the driver.
> >
> Please be specific. Who are the users, what is exact performance
> regression, what are specific protocols in question?

Depending on ip_summed, after a connect on a socket and with the
correct dst_entry, we either copy data from user space and calculate the
checksum concurrently (csum_partial_from_user) or let the hardware do
that.

If someone owns a NIC that can offload IPv4 checksums but not IPv6
checksums, they will end up in the slow path in the driver for either
IPv4 or IPv6, because for one of the two we could not set the correct
ip_summed mode before the copy into the kernel.

Data during transmit seems to be cold already, so a proper checksum
while we touch the data anyway would be favorable.
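
A userspace sketch of the trade-off, with assumptions stated: the
function copy_maybe_csum and its hw_csum flag are hypothetical and only
model the decision that ip_summed drives in the kernel. When the sum is
folded into the copy, each byte is touched once; if the mode is not
known at copy time, the driver has to walk the data a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst.  When hw_csum is false, also accumulate
 * the 16-bit ones' complement sum of the data (big-endian words) in the
 * same pass and return it; when true, return 0 and leave it to the NIC.
 */
static uint16_t copy_maybe_csum(uint8_t *dst, const uint8_t *src, size_t len,
				bool hw_csum)
{
	uint64_t sum = 0;
	size_t i;

	assert(len == 0 || (dst != NULL && src != NULL));

	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (!hw_csum)	/* even offsets are the high byte of a word */
			sum += (i & 1) ? src[i] : (uint64_t)src[i] << 8;
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```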

Bye,
Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 17:43               ` Hannes Frederic Sowa
  2015-11-24 17:52                 ` Tom Herbert
@ 2015-11-24 18:37                 ` David Miller
  2015-11-24 18:42                   ` Hannes Frederic Sowa
  2015-11-24 18:43                   ` Tom Herbert
  1 sibling, 2 replies; 94+ messages in thread
From: David Miller @ 2015-11-24 18:37 UTC (permalink / raw)
  To: hannes; +Cc: tom, jesse.brandeburg, anjali.singhai, jesse, netdev, kiran.patil

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 24 Nov 2015 18:43:35 +0100

> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>> As you said this in only feedback and nobody is forcing anyone to do
>> anything. But encouraging HW vendors to provide generic mechanisms so
>> that your users can use whatever protocol they want is the exact
>> _opposite_ of punishing users, this is very much a pro-user direction.
> 
> Some users will suffer worse performance in case we don't correctly set
> ip_summed for a specific protocol before we do the copy operations from
> user space into skbs but if they are always done in the driver.

Your concern presumes that looking backwards is as important as looking
forward.

We want to simplify things _and_ move away from protocol specific
csums, and if some old crufty hardware based systems pay some performance
cost for this I say so be it.

So this is not a valid argument against Tom's changes in my mind.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 18:37                 ` David Miller
@ 2015-11-24 18:42                   ` Hannes Frederic Sowa
  2015-11-24 18:43                   ` Tom Herbert
  1 sibling, 0 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-24 18:42 UTC (permalink / raw)
  To: David Miller
  Cc: tom, jesse.brandeburg, anjali.singhai, jesse, netdev, kiran.patil

On Tue, Nov 24, 2015, at 19:37, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Tue, 24 Nov 2015 18:43:35 +0100
> 
> > On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
> >> As you said this in only feedback and nobody is forcing anyone to do
> >> anything. But encouraging HW vendors to provide generic mechanisms so
> >> that your users can use whatever protocol they want is the exact
> >> _opposite_ of punishing users, this is very much a pro-user direction.
> > 
> > Some users will suffer worse performance in case we don't correctly set
> > ip_summed for a specific protocol before we do the copy operations from
> > user space into skbs but if they are always done in the driver.
> 
> Your concern presumes that looking backwards is as important as looking
> forward.
> 
> We want to simplify things _and_ move away from protocol specific
> csums, and if some old crufty hardware based systems pay some performance
> cost for this I say so be it.
> 
> So this is not a valid argument against Tom's changes in my mind.

I agree with you and we should move forward with this. It is just
something to keep in mind IMHO. Otherwise maintenance of those
additional bits did not hurt a lot IMHO.

Bye,
Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 18:37                 ` David Miller
  2015-11-24 18:42                   ` Hannes Frederic Sowa
@ 2015-11-24 18:43                   ` Tom Herbert
  1 sibling, 0 replies; 94+ messages in thread
From: Tom Herbert @ 2015-11-24 18:43 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Jesse Brandeburg, Anjali Singhai Jain,
	Jesse Gross, Linux Kernel Network Developers, Kiran Patil

On Tue, Nov 24, 2015 at 10:37 AM, David Miller <davem@davemloft.net> wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Tue, 24 Nov 2015 18:43:35 +0100
>
>> On Tue, Nov 24, 2015, at 18:32, Tom Herbert wrote:
>>> As you said this in only feedback and nobody is forcing anyone to do
>>> anything. But encouraging HW vendors to provide generic mechanisms so
>>> that your users can use whatever protocol they want is the exact
>>> _opposite_ of punishing users, this is very much a pro-user direction.
>>
>> Some users will suffer worse performance in case we don't correctly set
>> ip_summed for a specific protocol before we do the copy operations from
>> user space into skbs but if they are always done in the driver.
>
> Your concern presumes that looking backwards is as important as looking
> forward.
>
> We want to simplify things _and_ move away from protocol specific
> csums, and if some old crufty hardware based systems pay some performance
> cost for this I say so be it.
>
> So this is not a valid argument against Tom's changes in my mind.
>
And for that matter these arguments have nothing to do with these UDP
encapsulation patches at all, they seem to be directed to the patches
to eliminate NETIF_F_IP{V6}_CSUM so please post on that thread.


* RE: [PATCH v1 2/6] net: Add a generic udp_offload_get_port function
  2015-11-24  6:37   ` Alexander Duyck
@ 2015-11-24 19:35     ` Singhai, Anjali
  0 siblings, 0 replies; 94+ messages in thread
From: Singhai, Anjali @ 2015-11-24 19:35 UTC (permalink / raw)
  To: Alexander Duyck, netdev; +Cc: jesse, Patil, Kiran



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Monday, November 23, 2015 10:38 PM
> To: Singhai, Anjali; netdev@vger.kernel.org
> Cc: jesse@kernel.org; Patil, Kiran
> Subject: Re: [PATCH v1 2/6] net: Add a generic udp_offload_get_port
> function
> 
> On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote:
> > The new function udp_offload_get_port replaces vxlan_get_rx_port().
> > This is a generic function that will help replay all udp tunnel ports
> > irrespective of tunnel type.
> > This way when new udp tunnels get added this function need not change.
> >
> > Note: Drivers besides i40e are compile tested with this change.
> >
> > Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> > Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> > ---
> 
> [...]
> 
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index
> > f938616..8597020 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -290,6 +290,33 @@ unlock:
> >   }
> >   EXPORT_SYMBOL(udp_del_offload);
> >
> > +void udp_offload_get_port(struct net_device *dev) {
> > +	struct udp_offload_priv __rcu **head;
> > +	struct udp_offload_priv *uo_priv;
> > +	struct udp_offload *uo;
> > +
> > +	if (udp_offload_base)
> > +		head = &udp_offload_base;
> > +	else
> > +		return;
> > +
> > +	spin_lock(&udp_offload_lock);
> > +	uo_priv = udp_deref_protected(*head);
> > +	for (; uo_priv != NULL; uo_priv = udp_deref_protected(*head)) {
> > +		/* call the right add port */
> > +		uo = uo_priv->offload;
> > +		if (uo && dev->netdev_ops->ndo_add_udp_tunnel_port)
> > +			dev->netdev_ops->ndo_add_udp_tunnel_port(dev,
> > +							uo->family,
> > +							uo->port,
> > +							uo->tunnel_type);
> > +		head = &uo_priv->next;
> > +	}
> > +	spin_unlock(&udp_offload_lock);
> > +}
> > +EXPORT_SYMBOL(udp_offload_get_port);
> > +
> >   struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff
> *skb,
> >   				 struct udphdr *uh)
> >   {
> >
> 
> So when I got to patch 5 I realized this approach is horribly broken for
> IPv6 tunnels.  The udp_offload_base is only populated if the family is
> AF_INET.  What do you guys plan to do to get support for AF_INET6?
> 
> You probably ought to look at something like what ended up being done for
> the IOAT stuff.  What you end up needing is to support the drivers querying
> for what ports are active, and receiving notifications of tunnel updates, and
> the tunnel side that will register some functionality allowing the active ports
> for a given tunnel type to be queried.
> 
> - Alex

Alex, you are right about the IPv6 handling and I will update the patch series to fix that.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:53   ` Tom Herbert
  2015-11-23 22:49     ` Jesse Gross
@ 2015-11-30  3:21     ` David Miller
  2015-11-30 21:33       ` Singhai, Anjali
  2015-12-01  0:25       ` Jesse Gross
  1 sibling, 2 replies; 94+ messages in thread
From: David Miller @ 2015-11-30  3:21 UTC (permalink / raw)
  To: tom; +Cc: anjali.singhai, netdev, jesse, kiran.patil

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 23 Nov 2015 13:53:44 -0800

> The bad effect of this model is that it is encourages HW vendors to
> continue implement HW protocol specific support for encapsulations, we
> get so much more benefit if they implement protocol generic
> mechanisms.

+1


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-24 17:32             ` Tom Herbert
  2015-11-24 17:43               ` Hannes Frederic Sowa
@ 2015-11-30  3:22               ` David Miller
  2015-11-30 21:42                 ` Singhai, Anjali
  1 sibling, 1 reply; 94+ messages in thread
From: David Miller @ 2015-11-30  3:22 UTC (permalink / raw)
  To: tom; +Cc: jesse.brandeburg, anjali.singhai, jesse, netdev, kiran.patil

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 24 Nov 2015 09:32:11 -0800

>>
>> FWIW, I've brought the issue to the attention of the architects here,
>> and we will likely be able to make changes in this space.  Intel
>> hardware (as demonstrated by your patches) already is able to deal with
>> this de-ossification on transmit.  Receive is a whole different beast.
>>
> Please provide the specifics on why "Receive is a whole different
> beast.". Generic receive checksum is already a subset of the
> functionality that you must have implement to support the protocol
> specific offloads. All the hardware needs to do is calculate the 1's
> complement checksum of the packet and return the value on the to the
> host with that packet. That's it. No parsing of headers, no worrying
> about the pseudo header, no dealing with any encapsulation. Just do
> the calculation, return the result to the host and the driver converts
> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
> any harder than specific support the next protocol du jour.

+1


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
                     ` (3 preceding siblings ...)
  2015-11-24  5:41   ` Alexander Duyck
@ 2015-11-30 16:35   ` Tom Herbert
  2015-11-30 21:53     ` Singhai, Anjali
  4 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-11-30 16:35 UTC (permalink / raw)
  To: Anjali Singhai Jain
  Cc: Linux Kernel Network Developers, Jesse Gross, Kiran Patil

On Mon, Nov 23, 2015 at 1:02 PM, Anjali Singhai Jain
<anjali.singhai@intel.com> wrote:
> Replace add/del ndo ops for vxlan_port with tunnel_port so that all UDP
> based tunnels can use the same ndo op. Add a parameter to pass tunnel
> type to the ndo_op.
>
Please consider using RX ntuple filters for this instead of a new ndo
op. The vxlan ndo op essentially implements a limited filter with a
rule to match a destination UDP port and the action of processing
the packet as vxlan. ntuple filters generalize that so that the
filtering becomes arbitrary. We'll need the ability to filter on
4-tuple when we implement tunnels to go through firewalls or for
offloading other UDP protocols such as SPUD or QUIC.
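
As a rough userspace sketch of the idea, under my own assumptions: the
structures and names below are hypothetical, not the ethtool ntuple
interface, and handle IPv4 in host byte order only. A port-only rule is
the special case the vxlan ndo op covers today; the same rule format
also expresses a full 4-tuple match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions a matching rule can request. */
enum rx_action_sketch {
	RX_ACTION_NONE,
	RX_ACTION_VXLAN,
	RX_ACTION_CUSTOM
};

/* Hypothetical 4-tuple rule; a zero mask bit means "don't care". */
struct rx_rule_sketch {
	uint32_t src_ip, src_ip_mask;
	uint32_t dst_ip, dst_ip_mask;
	uint16_t src_port, src_port_mask;
	uint16_t dst_port, dst_port_mask;
	enum rx_action_sketch action;
};

struct flow_sketch {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* A field matches when all bits selected by the mask are equal. */
static bool rule_matches(const struct rx_rule_sketch *r,
			 const struct flow_sketch *f)
{
	return ((f->src_ip ^ r->src_ip) & r->src_ip_mask) == 0 &&
	       ((f->dst_ip ^ r->dst_ip) & r->dst_ip_mask) == 0 &&
	       ((f->src_port ^ r->src_port) & r->src_port_mask) == 0 &&
	       ((f->dst_port ^ r->dst_port) & r->dst_port_mask) == 0;
}

/* Return the action of the first matching rule, or RX_ACTION_NONE. */
static enum rx_action_sketch classify(const struct rx_rule_sketch *rules,
				      size_t n, const struct flow_sketch *f)
{
	size_t i;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (rule_matches(&rules[i], f))
			return rules[i].action;
	return RX_ACTION_NONE;
}
```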

Tom

> Change all drivers to use the generalized udp tunnel offload
>
> Patch was compile tested with x86_64_defconfig.
>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++++++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c        | 13 +++++---
>  drivers/net/ethernet/emulex/benet/be_main.c      | 14 +++++---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 ++++++++++++----
>  drivers/net/ethernet/intel/i40e/i40e_main.c      | 41 +++++++++++++++++-------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 17 +++++++---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 ++++++++----
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++++++---
>  drivers/net/vxlan.c                              | 23 +++++++------
>  include/linux/netdevice.h                        | 34 ++++++++++----------
>  include/net/udp_tunnel.h                         |  6 ++++
>  11 files changed, 157 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 2273576..ad2782f 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -47,6 +47,7 @@
>  #include <net/ip.h>
>  #include <net/ipv6.h>
>  #include <net/tcp.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #include <net/checksum.h>
>  #include <net/ip6_checksum.h>
> @@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, u16 port)
>  }
>
>  static void bnx2x_add_vxlan_port(struct net_device *netdev,
> -                                sa_family_t sa_family, __be16 port)
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct bnx2x *bp = netdev_priv(netdev);
>         u16 t_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         __bnx2x_add_vxlan_port(bp, t_port);
>  }
>
> @@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
>  }
>
>  static void bnx2x_del_vxlan_port(struct net_device *netdev,
> -                                sa_family_t sa_family, __be16 port)
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct bnx2x *bp = netdev_priv(netdev);
>         u16 t_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         __bnx2x_del_vxlan_port(bp, t_port);
>  }
>  #endif
> @@ -13008,8 +13015,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
>         .ndo_set_vf_link_state  = bnx2x_set_vf_link_state,
>         .ndo_features_check     = bnx2x_features_check,
>  #ifdef CONFIG_BNX2X_VXLAN
> -       .ndo_add_vxlan_port     = bnx2x_add_vxlan_port,
> -       .ndo_del_vxlan_port     = bnx2x_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = bnx2x_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = bnx2x_del_vxlan_port,
>  #endif
>  };
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index f2d0dc9..5b96ddf 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -5421,7 +5421,7 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
>  #endif /* CONFIG_RFS_ACCEL */
>
>  static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                               __be16 port)
> +                               __be16 port, u32 type)
>  {
>         struct bnxt *bp = netdev_priv(dev);
>
> @@ -5431,6 +5431,9 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>         if (sa_family != AF_INET6 && sa_family != AF_INET)
>                 return;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (bp->vxlan_port_cnt && bp->vxlan_port != port)
>                 return;
>
> @@ -5443,7 +5446,7 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>  }
>
>  static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                               __be16 port)
> +                               __be16 port, u32 type)
>  {
>         struct bnxt *bp = netdev_priv(dev);
>
> @@ -5453,6 +5456,8 @@ static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>         if (sa_family != AF_INET6 && sa_family != AF_INET)
>                 return;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
>                 bp->vxlan_port_cnt--;
>
> @@ -5491,8 +5496,8 @@ static const struct net_device_ops bnxt_netdev_ops = {
>  #ifdef CONFIG_RFS_ACCEL
>         .ndo_rx_flow_steer      = bnxt_rx_flow_steer,
>  #endif
> -       .ndo_add_vxlan_port     = bnxt_add_vxlan_port,
> -       .ndo_del_vxlan_port     = bnxt_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = bnxt_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = bnxt_del_vxlan_port,
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>         .ndo_busy_poll          = bnxt_busy_poll,
>  #endif
> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
> index 4cab887..e699deca 100644
> --- a/drivers/net/ethernet/emulex/benet/be_main.c
> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> @@ -23,6 +23,7 @@
>  #include <linux/aer.h>
>  #include <linux/if_bridge.h>
>  #include <net/busy_poll.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  MODULE_VERSION(DRV_VER);
> @@ -5175,12 +5176,15 @@ static int be_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
>   * until after all the tunnels are removed.
>   */
>  static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
> -                             __be16 port)
> +                             __be16 port, u32 type)
>  {
>         struct be_adapter *adapter = netdev_priv(netdev);
>         struct device *dev = &adapter->pdev->dev;
>         int status;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
>                 return;
>
> @@ -5229,10 +5233,12 @@ err:
>  }
>
>  static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
> -                             __be16 port)
> +                             __be16 port, u32 type)
>  {
>         struct be_adapter *adapter = netdev_priv(netdev);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
>                 return;
>
> @@ -5342,8 +5348,8 @@ static const struct net_device_ops be_netdev_ops = {
>         .ndo_busy_poll          = be_busy_poll,
>  #endif
>  #ifdef CONFIG_BE2NET_VXLAN
> -       .ndo_add_vxlan_port     = be_add_vxlan_port,
> -       .ndo_del_vxlan_port     = be_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = be_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = be_del_vxlan_port,
>         .ndo_features_check     = be_features_check,
>  #endif
>         .ndo_get_phys_port_id   = be_get_phys_port_id,
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 639263d..447d5e6 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -21,6 +21,7 @@
>  #include "fm10k.h"
>  #include <linux/vmalloc.h>
>  #if IS_ENABLED(CONFIG_FM10K_VXLAN)
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #endif /* CONFIG_FM10K_VXLAN */
>
> @@ -439,18 +440,24 @@ static void fm10k_restore_vxlan_port(struct fm10k_intfc *interface)
>   * @netdev: network interface device structure
>   * @sa_family: Address family of new port
>   * @port: port number used for VXLAN
> + * @type: Tunnel type
>   *
> - * This funciton is called when a new VXLAN interface has added a new port
> + * This function is called when a new VXLAN interface has added a new port
>   * number to the range that is currently in use for VXLAN.  The new port
>   * number is always added to the tail so that the port number list should
>   * match the order in which the ports were allocated.  The head of the list
>   * is always used as the VXLAN port number for offloads.
>   **/
>  static void fm10k_add_vxlan_port(struct net_device *dev,
> -                                sa_family_t sa_family, __be16 port) {
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type) {
> +#if IS_ENABLED(CONFIG_FM10K_VXLAN)
>         struct fm10k_intfc *interface = netdev_priv(dev);
>         struct fm10k_vxlan_port *vxlan_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         /* only the PF supports configuring tunnels */
>         if (interface->hw.mac.type != fm10k_mac_pf)
>                 return;
> @@ -476,6 +483,7 @@ insert_tail:
>         list_add_tail(&vxlan_port->list, &interface->vxlan_port);
>
>         fm10k_restore_vxlan_port(interface);
> +#endif
>  }
>
>  /**
> @@ -483,17 +491,23 @@ insert_tail:
>   * @netdev: network interface device structure
>   * @sa_family: Address family of freed port
>   * @port: port number used for VXLAN
> + * @type: Tunnel type
>   *
> - * This funciton is called when a new VXLAN interface has freed a port
> + * This function is called when a new VXLAN interface has freed a port
>   * number from the range that is currently in use for VXLAN.  The freed
>   * port is removed from the list and the new head is used to determine
>   * the port number for offloads.
>   **/
>  static void fm10k_del_vxlan_port(struct net_device *dev,
> -                                sa_family_t sa_family, __be16 port) {
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type) {
> +#if IS_ENABLED(CONFIG_FM10K_VXLAN)
>         struct fm10k_intfc *interface = netdev_priv(dev);
>         struct fm10k_vxlan_port *vxlan_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (interface->hw.mac.type != fm10k_mac_pf)
>                 return;
>
> @@ -508,6 +522,7 @@ static void fm10k_del_vxlan_port(struct net_device *dev,
>         }
>
>         fm10k_restore_vxlan_port(interface);
> +#endif
>  }
>
>  /**
> @@ -1373,8 +1388,8 @@ static const struct net_device_ops fm10k_netdev_ops = {
>         .ndo_set_vf_vlan        = fm10k_ndo_set_vf_vlan,
>         .ndo_set_vf_rate        = fm10k_ndo_set_vf_bw,
>         .ndo_get_vf_config      = fm10k_ndo_get_vf_config,
> -       .ndo_add_vxlan_port     = fm10k_add_vxlan_port,
> -       .ndo_del_vxlan_port     = fm10k_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = fm10k_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = fm10k_del_vxlan_port,
>         .ndo_do_ioctl           = fm10k_ioctl,
>         .ndo_dfwd_add_station   = fm10k_dfwd_add_station,
>         .ndo_dfwd_del_station   = fm10k_dfwd_del_station,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index b825f97..520e34e 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -30,6 +30,7 @@
>  #ifdef CONFIG_I40E_VXLAN
>  #include <net/vxlan.h>
>  #endif
> +#include <net/udp_tunnel.h>
>
>  const char i40e_driver_name[] = "i40e";
>  static const char i40e_driver_string[] =
> @@ -8296,13 +8297,18 @@ static u8 i40e_get_vxlan_port_idx(struct i40e_pf *pf, __be16 port)
>  }
>
>  /**
> - * i40e_add_vxlan_port - Get notifications about VXLAN ports that come up
> + * i40e_add_tunnel_port - Get notifications about UDP tunnel ports that come up
>   * @netdev: This physical port's netdev
> - * @sa_family: Socket Family that VXLAN is notifying us about
> - * @port: New UDP port number that VXLAN started listening to
> + * @sa_family: Socket Family that tunnel netdev is  associated with
> + * @port: New UDP port number that tunnel started listening to
> + * @type: Tunnel Type
> + *
> + * This function modifies a common data structure for all udp_tunnels
> + * hence it is expected that it is called under a common lock.
>   **/
> -static void i40e_add_vxlan_port(struct net_device *netdev,
> -                               sa_family_t sa_family, __be16 port)
> +static void i40e_add_tunnel_port(struct net_device *netdev,
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(netdev);
>         struct i40e_vsi *vsi = np->vsi;
> @@ -8310,6 +8316,9 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
>         u8 next_idx;
>         u8 idx;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (sa_family == AF_INET6)
>                 return;
>
> @@ -8338,19 +8347,27 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
>  }
>
>  /**
> - * i40e_del_vxlan_port - Get notifications about VXLAN ports that go away
> + * i40e_del_tunnel_port - Get notifications about UDP tunnel ports that go away
>   * @netdev: This physical port's netdev
> - * @sa_family: Socket Family that VXLAN is notifying us about
> - * @port: UDP port number that VXLAN stopped listening to
> + * @sa_family: Socket Family that tunnel netdev is associated with
> + * @port: UDP port number that tunnel stopped listening to
> + * @type: Tunnel Type
> + *
> + * This function modifies a common data structure for all udp_tunnels
> + * hence it is expected that it is called under common lock.
>   **/
> -static void i40e_del_vxlan_port(struct net_device *netdev,
> -                               sa_family_t sa_family, __be16 port)
> +static void i40e_del_tunnel_port(struct net_device *netdev,
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(netdev);
>         struct i40e_vsi *vsi = np->vsi;
>         struct i40e_pf *pf = vsi->back;
>         u8 idx;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (sa_family == AF_INET6)
>                 return;
>
> @@ -8596,8 +8613,8 @@ static const struct net_device_ops i40e_netdev_ops = {
>         .ndo_set_vf_link_state  = i40e_ndo_set_vf_link_state,
>         .ndo_set_vf_spoofchk    = i40e_ndo_set_vf_spoofchk,
>  #ifdef CONFIG_I40E_VXLAN
> -       .ndo_add_vxlan_port     = i40e_add_vxlan_port,
> -       .ndo_del_vxlan_port     = i40e_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = i40e_add_tunnel_port,
> +       .ndo_del_udp_tunnel_port        = i40e_del_tunnel_port,
>  #endif
>         .ndo_get_phys_port_id   = i40e_get_phys_port_id,
>         .ndo_fdb_add            = i40e_ndo_fdb_add,
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 4089d77..76ccc77 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -50,6 +50,7 @@
>  #include <linux/if_bridge.h>
>  #include <linux/prefetch.h>
>  #include <scsi/fc/fc_fcoe.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  #ifdef CONFIG_OF
> @@ -8088,14 +8089,18 @@ static int ixgbe_set_features(struct net_device *netdev,
>   * @dev: The port's netdev
>   * @sa_family: Socket Family that VXLAN is notifiying us about
>   * @port: New UDP port number that VXLAN started listening to
> + * @type: Tunnel type
>   **/
>  static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                                __be16 port)
> +                                __be16 port, u32 type)
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>         struct ixgbe_hw *hw = &adapter->hw;
>         u16 new_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
>                 return;
>
> @@ -8121,13 +8126,17 @@ static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>   * @dev: The port's netdev
>   * @sa_family: Socket Family that VXLAN is notifying us about
>   * @port: UDP port number that VXLAN stopped listening to
> + * @type: Tunnel type
>   **/
>  static void ixgbe_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                                __be16 port)
> +                                __be16 port, u32 type)
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>         u16 new_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
>                 return;
>
> @@ -8436,8 +8445,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
>         .ndo_dfwd_add_station   = ixgbe_fwd_add,
>         .ndo_dfwd_del_station   = ixgbe_fwd_del,
>  #ifdef CONFIG_IXGBE_VXLAN
> -       .ndo_add_vxlan_port     = ixgbe_add_vxlan_port,
> -       .ndo_del_vxlan_port     = ixgbe_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = ixgbe_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = ixgbe_del_vxlan_port,
>  #endif /* CONFIG_IXGBE_VXLAN */
>         .ndo_features_check     = ixgbe_features_check,
>  };
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 659209f..2cb19c7 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -39,6 +39,7 @@
>  #include <linux/hash.h>
>  #include <net/ip.h>
>  #include <net/busy_poll.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  #include <linux/mlx4/driver.h>
> @@ -2365,11 +2366,15 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
>  }
>
>  static void mlx4_en_add_vxlan_port(struct  net_device *dev,
> -                                  sa_family_t sa_family, __be16 port)
> +                                  sa_family_t sa_family, __be16 port,
> +                                  u32 type)
>  {
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>         __be16 current_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>                 return;
>
> @@ -2388,11 +2393,15 @@ static void mlx4_en_add_vxlan_port(struct  net_device *dev,
>  }
>
>  static void mlx4_en_del_vxlan_port(struct  net_device *dev,
> -                                  sa_family_t sa_family, __be16 port)
> +                                  sa_family_t sa_family, __be16 port,
> +                                  u32 type)
>  {
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>         __be16 current_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>                 return;
>
> @@ -2469,8 +2478,8 @@ static const struct net_device_ops mlx4_netdev_ops = {
>  #endif
>         .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
>  #ifdef CONFIG_MLX4_EN_VXLAN
> -       .ndo_add_vxlan_port     = mlx4_en_add_vxlan_port,
> -       .ndo_del_vxlan_port     = mlx4_en_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = mlx4_en_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = mlx4_en_del_vxlan_port,
>         .ndo_features_check     = mlx4_en_features_check,
>  #endif
>         .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> @@ -2507,8 +2516,8 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>  #endif
>         .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
>  #ifdef CONFIG_MLX4_EN_VXLAN
> -       .ndo_add_vxlan_port     = mlx4_en_add_vxlan_port,
> -       .ndo_del_vxlan_port     = mlx4_en_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = mlx4_en_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = mlx4_en_del_vxlan_port,
>         .ndo_features_check     = mlx4_en_features_check,
>  #endif
>         .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 1205f6f..aa38dbb 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -17,6 +17,7 @@
>  #include <linux/log2.h>
>  #include <linux/pci.h>
>  #ifdef CONFIG_QLCNIC_VXLAN
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #endif
>
> @@ -476,11 +477,15 @@ static int qlcnic_get_phys_port_id(struct net_device *netdev,
>
>  #ifdef CONFIG_QLCNIC_VXLAN
>  static void qlcnic_add_vxlan_port(struct net_device *netdev,
> -                                 sa_family_t sa_family, __be16 port)
> +                                 sa_family_t sa_family, __be16 port,
> +                                 u32 type)
>  {
>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
>         struct qlcnic_hardware_context *ahw = adapter->ahw;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         /* Adapter supports only one VXLAN port. Use very first port
>          * for enabling offload
>          */
> @@ -498,11 +503,15 @@ static void qlcnic_add_vxlan_port(struct net_device *netdev,
>  }
>
>  static void qlcnic_del_vxlan_port(struct net_device *netdev,
> -                                 sa_family_t sa_family, __be16 port)
> +                                 sa_family_t sa_family, __be16 port,
> +                                 u32 type)
>  {
>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
>         struct qlcnic_hardware_context *ahw = adapter->ahw;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port_count ||
>             (ahw->vxlan_port != ntohs(port)))
>                 return;
> @@ -540,8 +549,8 @@ static const struct net_device_ops qlcnic_netdev_ops = {
>         .ndo_fdb_dump           = qlcnic_fdb_dump,
>         .ndo_get_phys_port_id   = qlcnic_get_phys_port_id,
>  #ifdef CONFIG_QLCNIC_VXLAN
> -       .ndo_add_vxlan_port     = qlcnic_add_vxlan_port,
> -       .ndo_del_vxlan_port     = qlcnic_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = qlcnic_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = qlcnic_del_vxlan_port,
>         .ndo_features_check     = qlcnic_features_check,
>  #endif
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 6369a57..5490629 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -628,9 +628,10 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>
>         rcu_read_lock();
>         for_each_netdev_rcu(net, dev) {
> -               if (dev->netdev_ops->ndo_add_vxlan_port)
> -                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
> -                                                           port);
> +               if (dev->netdev_ops->ndo_add_udp_tunnel_port)
> +                       dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>         }
>         rcu_read_unlock();
>  }
> @@ -646,9 +647,10 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>
>         rcu_read_lock();
>         for_each_netdev_rcu(net, dev) {
> -               if (dev->netdev_ops->ndo_del_vxlan_port)
> -                       dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
> -                                                           port);
> +               if (dev->netdev_ops->ndo_del_udp_tunnel_port)
> +                       dev->netdev_ops->ndo_del_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>         }
>         rcu_read_unlock();
>
> @@ -2422,9 +2424,9 @@ static struct device_type vxlan_type = {
>         .name = "vxlan",
>  };
>
> -/* Calls the ndo_add_vxlan_port of the caller in order to
> +/* Calls the ndo_add_udp_tunnel_port of the caller in order to
>   * supply the listening VXLAN udp ports. Callers are expected
> - * to implement the ndo_add_vxlan_port.
> + * to implement the ndo_add_udp_tunnel_port.
>   */
>  void vxlan_get_rx_port(struct net_device *dev)
>  {
> @@ -2440,8 +2442,9 @@ void vxlan_get_rx_port(struct net_device *dev)
>                 hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
>                         port = inet_sk(vs->sock->sk)->inet_sport;
>                         sa_family = vxlan_get_sk_family(vs);
> -                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
> -                                                           port);
> +                       dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>                 }
>         }
>         spin_unlock(&vn->sock_lock);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7d2d1d7..eaecc42 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1004,18 +1004,19 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   *     not implement this, it is assumed that the hw is not able to have
>   *     multiple net devices on single physical port.
>   *
> - * void (*ndo_add_vxlan_port)(struct  net_device *dev,
> - *                           sa_family_t sa_family, __be16 port);
> - *     Called by vxlan to notiy a driver about the UDP port and socket
> - *     address family that vxlan is listnening to. It is called only when
> - *     a new port starts listening. The operation is protected by the
> - *     vxlan_net->sock_lock.
> - *
> - * void (*ndo_del_vxlan_port)(struct  net_device *dev,
> - *                           sa_family_t sa_family, __be16 port);
> - *     Called by vxlan to notify the driver about a UDP port and socket
> - *     address family that vxlan is not listening to anymore. The operation
> - *     is protected by the vxlan_net->sock_lock.
> + * void (*ndo_add_udp_tunnel_port)(struct  net_device *dev,
> + *                           sa_family_t sa_family, __be16 port, u32 type);
> + *     Called by UDP based tunnel modules to notify a driver about a UDP
> + *     port and socket address family that the tunnel is listening to. It is
> + *     called only when a new port starts listening. The operation is
> + *     protected by udp_offload_lock across all udp based tunnels.
> + *
> + * void (*ndo_del_udp_tunnel_port)(struct  net_device *dev,
> + *                           sa_family_t sa_family, __be16 port, u32 type);
> + *     Called by UDP based tunnel modules to notify the driver about a UDP port
> + *     and socket address family that tunnel is not listening to anymore.
> + *     The operation is protected by udp_offload_lock across all udp based
> + *     tunnels.
>   *
>   * void* (*ndo_dfwd_add_station)(struct net_device *pdev,
>   *                              struct net_device *dev)
> @@ -1209,13 +1210,12 @@ struct net_device_ops {
>                                                         struct netdev_phys_item_id *ppid);
>         int                     (*ndo_get_phys_port_name)(struct net_device *dev,
>                                                           char *name, size_t len);
> -       void                    (*ndo_add_vxlan_port)(struct  net_device *dev,
> +       void                    (*ndo_add_udp_tunnel_port)(struct  net_device *dev,
>                                                       sa_family_t sa_family,
> -                                                     __be16 port);
> -       void                    (*ndo_del_vxlan_port)(struct  net_device *dev,
> +                                                     __be16 port, u32 type);
> +       void                    (*ndo_del_udp_tunnel_port)(struct  net_device *dev,
>                                                       sa_family_t sa_family,
> -                                                     __be16 port);
> -
> +                                                     __be16 port, u32 type);
>         void*                   (*ndo_dfwd_add_station)(struct net_device *pdev,
>                                                         struct net_device *dev);
>         void                    (*ndo_dfwd_del_station)(struct net_device *pdev,
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index cb2f89f..72415aa 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -9,6 +9,12 @@
>  #include <net/addrconf.h>
>  #endif
>
> +enum udp_tunnel_type {
> +       UDP_TUNNEL_UNSPEC,
> +       UDP_TUNNEL_VXLAN,
> +       UDP_TUNNEL_GENEVE,
> +};
> +
>  struct udp_port_cfg {
>         u8                      family;
>
> --
> 1.8.1.4
>

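For readers following the interface change in the quoted patch, here is a
minimal user-space sketch of how a driver-side callback might dispatch on the
new type argument. It is an illustration under stated assumptions, not code
from the patch or from any driver: struct demo_dev, its fields and
demo_add_udp_tunnel_port() are hypothetical, the port is taken in host byte
order for simplicity, and the function returns bool only so the behaviour can
be checked (the real ndo returns void). Only the enum values come from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the enum the patch adds to include/net/udp_tunnel.h. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

/* Hypothetical device state: at most one offloaded port per tunnel type. */
struct demo_dev {
	uint16_t vxlan_port;	/* 0 means no port is offloaded */
	uint16_t geneve_port;	/* 0 means no port is offloaded */
	bool geneve_capable;	/* assumed hardware capability flag */
};

/*
 * Hypothetical stand-in for an ndo_add_udp_tunnel_port implementation.
 * Returns true when the port was accepted for offload, false when the
 * notification was ignored.
 */
static bool demo_add_udp_tunnel_port(struct demo_dev *dev, uint16_t port,
				     uint32_t type)
{
	switch (type) {
	case UDP_TUNNEL_VXLAN:
		/* Single-port hardware: ignore a second, different port. */
		if (dev->vxlan_port && dev->vxlan_port != port)
			return false;
		dev->vxlan_port = port;
		return true;
	case UDP_TUNNEL_GENEVE:
		if (!dev->geneve_capable)
			return false;
		if (dev->geneve_port && dev->geneve_port != port)
			return false;
		dev->geneve_port = port;
		return true;
	default:
		/* Unknown tunnel types are ignored, as in the patch. */
		return false;
	}
}
```

A driver that only handles VXLAN reduces to the early return seen throughout
the patch (if (type != UDP_TUNNEL_VXLAN) return;); the switch form only pays
off once a second tunnel type is supported.
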

* RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30  3:21     ` David Miller
@ 2015-11-30 21:33       ` Singhai, Anjali
  2015-12-01  0:25       ` Jesse Gross
  1 sibling, 0 replies; 94+ messages in thread
From: Singhai, Anjali @ 2015-11-30 21:33 UTC (permalink / raw)
  To: David Miller, tom; +Cc: netdev, jesse, Patil, Kiran



-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Sunday, November 29, 2015 7:22 PM
To: tom@herbertland.com
Cc: Singhai, Anjali <anjali.singhai@intel.com>; netdev@vger.kernel.org; jesse@kernel.org; Patil, Kiran <kiran.patil@intel.com>
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 23 Nov 2015 13:53:44 -0800

> The bad effect of this model is that it encourages HW vendors to
> continue implementing HW protocol specific support for encapsulations;
> we get so much more benefit if they implement protocol generic
> mechanisms.
Dave, at least Intel parts have a protocol generic model for tunneled packet
offloads, which is why we are able to extend our support to newer tunnel
types. We do not have protocol specific support in the HW, but since UDP based
tunnels do not have a packet type for the tunnel header, the HW needs to know
which UDP port maps to which encapsulation. Other encapsulation types, like
NVGRE, we can identify through the packet type and program the HW to account
for the header.

The newer patches do reduce protocol ossification, since they fold all the
different tunnels into one interface: supporting a newer UDP tunnel type
requires just a type definition and, if the driver/HW can support it, minor
driver changes to set the right bits for HW. No interface change is needed.
I think that is definitely a step in the right direction.
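
As a rough illustration of the port-to-encapsulation mapping described above,
here is a small sketch of a table keyed by UDP port. It is an assumption-laden
model, not how any Intel part or driver actually stores the mapping: the
demo_port_* names, the table size and the "first registration wins" policy are
all hypothetical. Only the enum values come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the enum the patch adds to include/net/udp_tunnel.h. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

#define DEMO_MAX_PORTS 16	/* hypothetical hardware table size */

struct demo_port_entry {
	uint16_t port;		/* UDP destination port, host byte order */
	uint32_t type;		/* UDP_TUNNEL_UNSPEC marks a free slot */
};

struct demo_port_table {
	struct demo_port_entry e[DEMO_MAX_PORTS];
};

/* Map a port to a tunnel type. Returns 0 on success, -1 on conflict/full. */
static int demo_port_add(struct demo_port_table *t, uint16_t port,
			 uint32_t type)
{
	size_t i, free_idx = DEMO_MAX_PORTS;

	if (type == UDP_TUNNEL_UNSPEC)
		return -1;
	for (i = 0; i < DEMO_MAX_PORTS; i++) {
		if (t->e[i].type == UDP_TUNNEL_UNSPEC) {
			if (free_idx == DEMO_MAX_PORTS)
				free_idx = i;	/* remember first free slot */
		} else if (t->e[i].port == port) {
			/* A port can only mean one encapsulation at a time. */
			return t->e[i].type == type ? 0 : -1;
		}
	}
	if (free_idx == DEMO_MAX_PORTS)
		return -1;
	t->e[free_idx].port = port;
	t->e[free_idx].type = type;
	return 0;
}

/* Remove a mapping; a port that is not present is silently ignored. */
static void demo_port_del(struct demo_port_table *t, uint16_t port)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_PORTS; i++)
		if (t->e[i].type != UDP_TUNNEL_UNSPEC && t->e[i].port == port)
			t->e[i].type = UDP_TUNNEL_UNSPEC;
}

/* What the RX parser would ask: which encapsulation does this port carry? */
static uint32_t demo_port_lookup(const struct demo_port_table *t,
				 uint16_t port)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_PORTS; i++)
		if (t->e[i].type != UDP_TUNNEL_UNSPEC && t->e[i].port == port)
			return t->e[i].type;
	return UDP_TUNNEL_UNSPEC;
}
```

Adding a tunnel type to a model like this means adding an enum value; the
add, delete and lookup signatures stay the same.
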

+1


* RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30  3:22               ` David Miller
@ 2015-11-30 21:42                 ` Singhai, Anjali
  2015-11-30 21:48                   ` Tom Herbert
  2015-12-01  3:48                   ` David Miller
  0 siblings, 2 replies; 94+ messages in thread
From: Singhai, Anjali @ 2015-11-30 21:42 UTC (permalink / raw)
  To: David Miller, tom; +Cc: Brandeburg, Jesse, jesse, netdev, Patil, Kiran



-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Sunday, November 29, 2015 7:23 PM
To: tom@herbertland.com
Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Singhai, Anjali <anjali.singhai@intel.com>; jesse@kernel.org; netdev@vger.kernel.org; Patil, Kiran <kiran.patil@intel.com>
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 24 Nov 2015 09:32:11 -0800

>>
>> FWIW, I've brought the issue to the attention of the architects here, 
>> and we will likely be able to make changes in this space.  Intel 
>> hardware (as demonstrated by your patches) already is able to deal 
>> with this de-ossification on transmit.  Receive is a whole different beast.
>>
> Please provide the specifics on why "Receive is a whole different
> beast.". Generic receive checksum is already a subset of the
> functionality that you must have implemented to support the protocol
> specific offloads. All the hardware needs to do is calculate the 1's
> complement checksum of the packet and return the value to the
> host with that packet. That's it. No parsing of headers, no worrying
> about the pseudo header, no dealing with any encapsulation. Just do
> the calculation, return the result to the host and the driver converts
> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
> any harder than specific support for the next protocol du jour.

Receive differs from transmit because on the TX side the driver can give the
HW per-packet metadata: where the checksum field is and what length needs to
be checksummed. On RX the HW parser has to parse the packet to identify the
tunnel type and, from that, work out the checksum locations and lengths in
the packet. So the HW definitely has to parse the packet, and it can only
parse based on next-header type information or, for UDP tunnels, on a UDP
port mapped to a particular protocol. I am not sure why you say it doesn't
need to parse the packet; maybe I am misunderstanding something. It is not
difficult to reduce protocol ossification on the RX side, but it is certainly
different, and for UDP tunnels in particular it needs the port to protocol
mapping.
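
To make the RX parsing point concrete, here is a hedged sketch of the step
that depends on the tunnel type: finding where the inner frame starts once
the UDP port has been resolved to an encapsulation. The function and macro
names are hypothetical and this is not a description of any real HW parser.
The header sizes are the standard ones: an 8-byte UDP header, an 8-byte VXLAN
header, and an 8-byte Geneve base header followed by options whose length is
given in 4-byte units by the low 6 bits of the first Geneve byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the enum the patch adds to include/net/udp_tunnel.h. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

#define DEMO_UDP_HLEN		8
#define DEMO_VXLAN_HLEN		8
#define DEMO_GENEVE_BASE_HLEN	8

/*
 * udp points at the first byte of the outer UDP header and len is the
 * number of bytes available from there. Returns the offset of the inner
 * frame relative to udp, or -1 if the type is unknown or the buffer is
 * too short.
 */
static long demo_inner_offset(const uint8_t *udp, size_t len, uint32_t type)
{
	size_t off = DEMO_UDP_HLEN;

	switch (type) {
	case UDP_TUNNEL_VXLAN:
		off += DEMO_VXLAN_HLEN;
		break;
	case UDP_TUNNEL_GENEVE:
		if (len < off + DEMO_GENEVE_BASE_HLEN)
			return -1;
		/* Option length is counted in 4-byte words. */
		off += DEMO_GENEVE_BASE_HLEN + (size_t)(udp[off] & 0x3f) * 4;
		break;
	default:
		/* Without a port-to-type mapping the parser cannot go on. */
		return -1;
	}
	return len >= off ? (long)off : -1;
}
```

The default branch is the case at issue here: a UDP payload with no registered
port mapping gives the parser nothing to tell it which layout to apply.
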

+1


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30 21:42                 ` Singhai, Anjali
@ 2015-11-30 21:48                   ` Tom Herbert
  2015-12-01  3:51                     ` David Miller
  2015-12-01  3:48                   ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-11-30 21:48 UTC (permalink / raw)
  To: Singhai, Anjali
  Cc: David Miller, Brandeburg, Jesse, jesse, netdev, Patil, Kiran

On Mon, Nov 30, 2015 at 1:42 PM, Singhai, Anjali
<anjali.singhai@intel.com> wrote:
>
>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Sunday, November 29, 2015 7:23 PM
> To: tom@herbertland.com
> Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Singhai, Anjali <anjali.singhai@intel.com>; jesse@kernel.org; netdev@vger.kernel.org; Patil, Kiran <kiran.patil@intel.com>
> Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
>
> From: Tom Herbert <tom@herbertland.com>
> Date: Tue, 24 Nov 2015 09:32:11 -0800
>
>>>
>>> FWIW, I've brought the issue to the attention of the architects here,
>>> and we will likely be able to make changes in this space.  Intel
>>> hardware (as demonstrated by your patches) already is able to deal
>>> with this de-ossification on transmit.  Receive is a whole different beast.
>>>
>> Please provide the specifics on why "Receive is a whole different
>> beast.". Generic receive checksum is already a subset of the
>> functionality that you must have implemented to support the protocol
>> specific offloads. All the hardware needs to do is calculate the 1's
>> complement checksum of the packet and return the value to the
>> host with that packet. That's it. No parsing of headers, no worrying
>> about the pseudo header, no dealing with any encapsulation. Just do
>> the calculation, return the result to the host and the driver converts
>> this to CHECKSUM_COMPLETE. I find it very hard to believe that this is
>> any harder than specific support for the next protocol du jour.
>
> The reason receive is different from transmit is that on the TX side the driver can give the HW, on a per-packet basis, the metadata for where the checksum field is and what length needs to be checksummed. On RX the HW parser has to parse the packet to identify the tunnel type and, based on that, figure out the checksum locations and lengths in the packet. So the HW definitely has to parse the packet, and it can parse only based on next-header type information or, in the case of UDP tunnels, based on a UDP port mapping to a particular protocol. I am not sure why you say it doesn't need to parse the packet; maybe I am misunderstanding something. It's not difficult to reduce protocol ossification on the RX side, but it is certainly different, and in the case of UDP tunnels in particular it needs the port-to-protocol mapping.
>
Please look at how the CHECKSUM_COMPLETE interface works. A description is
in include/linux/skbuff.h or
http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf.
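
For readers following along, the calculation being asked of the hardware is the plain Internet (ones' complement) sum over the packet bytes, with no knowledge of headers. A minimal userspace sketch, assuming big-endian 16-bit words as in RFC 1071; `ones_complement_sum` is a hypothetical helper name and not a kernel function:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum 'len' bytes as big-endian 16-bit words with end-around carry.
 * The 32-bit accumulator is ample for packet-sized buffers; much larger
 * inputs would need folding inside the loop.
 */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)                     /* odd trailing byte, zero padded */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)                /* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

With CHECKSUM_COMPLETE the device reports a sum of this kind over the whole packet and the stack validates whichever checksums it cares about from it, so the device does not need to locate any checksum field itself.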

Thanks,
Tom

^ permalink raw reply	[flat|nested] 94+ messages in thread

* RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30 16:35   ` Tom Herbert
@ 2015-11-30 21:53     ` Singhai, Anjali
  2015-12-01  3:52       ` David Miller
  0 siblings, 1 reply; 94+ messages in thread
From: Singhai, Anjali @ 2015-11-30 21:53 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, Jesse Gross, Patil, Kiran



-----Original Message-----
From: Tom Herbert [mailto:tom@herbertland.com] 
Sent: Monday, November 30, 2015 8:36 AM
To: Singhai, Anjali <anjali.singhai@intel.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Jesse Gross <jesse@kernel.org>; Patil, Kiran <kiran.patil@intel.com>
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

On Mon, Nov 23, 2015 at 1:02 PM, Anjali Singhai Jain <anjali.singhai@intel.com> wrote:
> Replace add/del ndo ops for vxlan_port with tunnel_port so that all 
> UDP based tunnels can use the same ndo op. Add a parameter to pass 
> tunnel type to the ndo_op.
>
Please consider using RX ntuple filters for this instead of a new ndo op. The vxlan ndo op essentially implements a limited filter with a rule to match a destination UDP port and the action of processing the packet as vxlan. ntuple filters generalize that so that the filtering becomes arbitrary. We'll need the ability to filter on the 4-tuple when we implement tunnels that go through firewalls or when offloading other UDP protocols such as SPUD or QUIC.

Tom
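
To make the contrast concrete: a destination-port match is the special case of a 4-tuple rule in which every other field is a wildcard. The sketch below illustrates that idea only; `udp_rule`, `udp_flow` and `rule_matches` are hypothetical names, zero is assumed to mean "wildcard", and this is not the ethtool ntuple API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-tuple rule; a zero field acts as a wildcard. */
struct udp_rule {
	uint32_t saddr, daddr;   /* IPv4 source/destination address */
	uint16_t sport, dport;   /* UDP source/destination port */
};

/* The fields of a received packet that the rule is compared against. */
struct udp_flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* A packet matches when every non-wildcard rule field equals the packet's. */
static bool rule_matches(const struct udp_rule *r, const struct udp_flow *f)
{
	return (!r->saddr || r->saddr == f->saddr) &&
	       (!r->daddr || r->daddr == f->daddr) &&
	       (!r->sport || r->sport == f->sport) &&
	       (!r->dport || r->dport == f->dport);
}
```

A rule of `{ 0, 0, 0, 4789 }` then behaves like the existing port notification, while a fully specified rule matches a single flow.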

- Tom, I am not sure I agree with this suggestion. The easiest way to let the hardware know about the port-to-protocol mapping for UDP-based tunnels is to do it when we add UDP offloads (GRO etc.) for the ports in the stack. This way the user gets the benefit of tunnel offloads from HW that supports them without having to set up any extra filters through ethtool. It is just like IP/TCP/UDP checksum and TSO support: the user does not have to turn these on specifically if they plan to use those protocols (of course they can turn them off). Besides, these are not true filters in that sense: they are not used to steer packets to any particular destination, but rather to identify packets for checksum and TSO purposes.
And I agree with your patch series that reduces protocol ossification of the stack and driver interface. My point is that this set of patches helps with that goal and does not really hurt it, because any new tunnel support would mean no change in the interface, just a new type in the enum; the drivers can then decide to do the magic setup in the HW based on this new type without ever having to touch the interface. So please explain to me why this causes protocol ossification, because I don't believe it does. And I think the ntuple interface should remain for filters that are used to route packets or drop them, not for packet identification and checksum offload support.
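
The "new type in the enum" argument can be sketched as follows. The enum mirrors the one this patch adds to include/net/udp_tunnel.h; everything else (`demo_nic`, `demo_add_udp_tunnel_port`, the `supports_geneve` flag and the boolean return value) is hypothetical and simplified to plain userspace C rather than a real ndo callback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the enum added to include/net/udp_tunnel.h by this patch. */
enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

/* Hypothetical per-device state: one offloaded port per tunnel type. */
struct demo_nic {
	uint16_t vxlan_port;
	uint16_t geneve_port;
	bool supports_geneve;
};

/* Returns true if the port was programmed, false if the type is ignored. */
static bool demo_add_udp_tunnel_port(struct demo_nic *nic, uint16_t port,
				     enum udp_tunnel_type type)
{
	switch (type) {
	case UDP_TUNNEL_VXLAN:
		nic->vxlan_port = port;
		return true;
	case UDP_TUNNEL_GENEVE:
		if (!nic->supports_geneve)
			return false;
		nic->geneve_port = port;
		return true;
	default:
		return false;    /* unknown type: silently skip the offload */
	}
}
```

Under this shape, supporting another tunnel means one more enum value and one more case in the drivers that care, while the callback signature stays the same; drivers that do not know the type fall through to the default and ignore it, as the `type != UDP_TUNNEL_VXLAN` checks in the patch do.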

> Change all drivers to use the generalized udp tunnel offload
>
> Patch was compile tested with x86_64_defconfig.
>
> Signed-off-by: Kiran Patil <kiran.patil@intel.com>
> Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++++++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c        | 13 +++++---
>  drivers/net/ethernet/emulex/benet/be_main.c      | 14 +++++---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  | 27 ++++++++++++----
>  drivers/net/ethernet/intel/i40e/i40e_main.c      | 41 +++++++++++++++++-------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 17 +++++++---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 21 ++++++++----
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++++++---
>  drivers/net/vxlan.c                              | 23 +++++++------
>  include/linux/netdevice.h                        | 34 ++++++++++----------
>  include/net/udp_tunnel.h                         |  6 ++++
>  11 files changed, 157 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 2273576..ad2782f 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -47,6 +47,7 @@
>  #include <net/ip.h>
>  #include <net/ipv6.h>
>  #include <net/tcp.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #include <net/checksum.h>
>  #include <net/ip6_checksum.h>
> @@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, u16 port)
>  }
>
>  static void bnx2x_add_vxlan_port(struct net_device *netdev,
> -                                sa_family_t sa_family, __be16 port)
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct bnx2x *bp = netdev_priv(netdev);
>         u16 t_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         __bnx2x_add_vxlan_port(bp, t_port);
>  }
>
> @@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
>  }
>
>  static void bnx2x_del_vxlan_port(struct net_device *netdev,
> -                                sa_family_t sa_family, __be16 port)
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct bnx2x *bp = netdev_priv(netdev);
>         u16 t_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         __bnx2x_del_vxlan_port(bp, t_port);
>  }
>  #endif
> @@ -13008,8 +13015,8 @@ static const struct net_device_ops bnx2x_netdev_ops = {
>         .ndo_set_vf_link_state  = bnx2x_set_vf_link_state,
>         .ndo_features_check     = bnx2x_features_check,
>  #ifdef CONFIG_BNX2X_VXLAN
> -       .ndo_add_vxlan_port     = bnx2x_add_vxlan_port,
> -       .ndo_del_vxlan_port     = bnx2x_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = bnx2x_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = bnx2x_del_vxlan_port,
>  #endif
>  };
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index f2d0dc9..5b96ddf 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -5421,7 +5421,7 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)
>  #endif /* CONFIG_RFS_ACCEL */
>
>  static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                               __be16 port)
> +                               __be16 port, u32 type)
>  {
>         struct bnxt *bp = netdev_priv(dev);
>
> @@ -5431,6 +5431,9 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>         if (sa_family != AF_INET6 && sa_family != AF_INET)
>                 return;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (bp->vxlan_port_cnt && bp->vxlan_port != port)
>                 return;
>
> @@ -5443,7 +5446,7 @@ static void bnxt_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>  }
>
>  static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                               __be16 port)
> +                               __be16 port, u32 type)
>  {
>         struct bnxt *bp = netdev_priv(dev);
>
> @@ -5453,6 +5456,8 @@ static void bnxt_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>         if (sa_family != AF_INET6 && sa_family != AF_INET)
>                 return;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         if (bp->vxlan_port_cnt && bp->vxlan_port == port) {
>                 bp->vxlan_port_cnt--;
>
> @@ -5491,8 +5496,8 @@ static const struct net_device_ops bnxt_netdev_ops = {
>  #ifdef CONFIG_RFS_ACCEL
>         .ndo_rx_flow_steer      = bnxt_rx_flow_steer,
>  #endif
> -       .ndo_add_vxlan_port     = bnxt_add_vxlan_port,
> -       .ndo_del_vxlan_port     = bnxt_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = bnxt_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = bnxt_del_vxlan_port,
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>         .ndo_busy_poll          = bnxt_busy_poll,
>  #endif
> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
> index 4cab887..e699deca 100644
> --- a/drivers/net/ethernet/emulex/benet/be_main.c
> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> @@ -23,6 +23,7 @@
>  #include <linux/aer.h>
>  #include <linux/if_bridge.h>
>  #include <net/busy_poll.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  MODULE_VERSION(DRV_VER);
> @@ -5175,12 +5176,15 @@ static int be_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
>   * until after all the tunnels are removed.
>   */
>  static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
> -                             __be16 port)
> +                             __be16 port, u32 type)
>  {
>         struct be_adapter *adapter = netdev_priv(netdev);
>         struct device *dev = &adapter->pdev->dev;
>         int status;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
>                 return;
>
> @@ -5229,10 +5233,12 @@ err:
>  }
>
>  static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
> -                             __be16 port)
> +                             __be16 port, u32 type)
>  {
>         struct be_adapter *adapter = netdev_priv(netdev);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
>         if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
>                 return;
>
> @@ -5342,8 +5348,8 @@ static const struct net_device_ops be_netdev_ops = {
>         .ndo_busy_poll          = be_busy_poll,
>  #endif
>  #ifdef CONFIG_BE2NET_VXLAN
> -       .ndo_add_vxlan_port     = be_add_vxlan_port,
> -       .ndo_del_vxlan_port     = be_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = be_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = be_del_vxlan_port,
>         .ndo_features_check     = be_features_check,
>  #endif
>         .ndo_get_phys_port_id   = be_get_phys_port_id,
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> index 639263d..447d5e6 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> @@ -21,6 +21,7 @@
>  #include "fm10k.h"
>  #include <linux/vmalloc.h>
>  #if IS_ENABLED(CONFIG_FM10K_VXLAN)
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #endif /* CONFIG_FM10K_VXLAN */
>
> @@ -439,18 +440,24 @@ static void fm10k_restore_vxlan_port(struct fm10k_intfc *interface)
>   * @netdev: network interface device structure
>   * @sa_family: Address family of new port
>   * @port: port number used for VXLAN
> + * @type: Tunnel type
>   *
> - * This funciton is called when a new VXLAN interface has added a new port
> + * This function is called when a new VXLAN interface has added a new port
>   * number to the range that is currently in use for VXLAN.  The new port
>   * number is always added to the tail so that the port number list should
>   * match the order in which the ports were allocated.  The head of the list
>   * is always used as the VXLAN port number for offloads.
>   **/
>  static void fm10k_add_vxlan_port(struct net_device *dev,
> -                                sa_family_t sa_family, __be16 port) {
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type) {
> +#if IS_ENABLED(CONFIG_FM10K_VXLAN)
>         struct fm10k_intfc *interface = netdev_priv(dev);
>         struct fm10k_vxlan_port *vxlan_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         /* only the PF supports configuring tunnels */
>         if (interface->hw.mac.type != fm10k_mac_pf)
>                 return;
> @@ -476,6 +483,7 @@ insert_tail:
>         list_add_tail(&vxlan_port->list, &interface->vxlan_port);
>
>         fm10k_restore_vxlan_port(interface);
> +#endif
>  }
>
>  /**
> @@ -483,17 +491,23 @@ insert_tail:
>   * @netdev: network interface device structure
>   * @sa_family: Address family of freed port
>   * @port: port number used for VXLAN
> + * @type: Tunnel type
>   *
> - * This funciton is called when a new VXLAN interface has freed a port
> + * This function is called when a new VXLAN interface has freed a port
>   * number from the range that is currently in use for VXLAN.  The freed
>   * port is removed from the list and the new head is used to determine
>   * the port number for offloads.
>   **/
>  static void fm10k_del_vxlan_port(struct net_device *dev,
> -                                sa_family_t sa_family, __be16 port) {
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type) {
> +#if IS_ENABLED(CONFIG_FM10K_VXLAN)
>         struct fm10k_intfc *interface = netdev_priv(dev);
>         struct fm10k_vxlan_port *vxlan_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (interface->hw.mac.type != fm10k_mac_pf)
>                 return;
>
> @@ -508,6 +522,7 @@ static void fm10k_del_vxlan_port(struct net_device *dev,
>         }
>
>         fm10k_restore_vxlan_port(interface);
> +#endif
>  }
>
>  /**
> @@ -1373,8 +1388,8 @@ static const struct net_device_ops fm10k_netdev_ops = {
>         .ndo_set_vf_vlan        = fm10k_ndo_set_vf_vlan,
>         .ndo_set_vf_rate        = fm10k_ndo_set_vf_bw,
>         .ndo_get_vf_config      = fm10k_ndo_get_vf_config,
> -       .ndo_add_vxlan_port     = fm10k_add_vxlan_port,
> -       .ndo_del_vxlan_port     = fm10k_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = fm10k_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = fm10k_del_vxlan_port,
>         .ndo_do_ioctl           = fm10k_ioctl,
>         .ndo_dfwd_add_station   = fm10k_dfwd_add_station,
>         .ndo_dfwd_del_station   = fm10k_dfwd_del_station,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index b825f97..520e34e 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -30,6 +30,7 @@
>  #ifdef CONFIG_I40E_VXLAN
>  #include <net/vxlan.h>
>  #endif
> +#include <net/udp_tunnel.h>
>
>  const char i40e_driver_name[] = "i40e";
>  static const char i40e_driver_string[] =
> @@ -8296,13 +8297,18 @@ static u8 i40e_get_vxlan_port_idx(struct i40e_pf *pf, __be16 port)
>  }
>
>  /**
> - * i40e_add_vxlan_port - Get notifications about VXLAN ports that come up
> + * i40e_add_tunnel_port - Get notifications about UDP tunnel ports that come up
>   * @netdev: This physical port's netdev
> - * @sa_family: Socket Family that VXLAN is notifying us about
> - * @port: New UDP port number that VXLAN started listening to
> + * @sa_family: Socket Family that tunnel netdev is  associated with
> + * @port: New UDP port number that tunnel started listening to
> + * @type: Tunnel Type
> + *
> + * This function modifies a common data structure for all udp_tunnels
> + * hence it is expected that it is called under a common lock.
>   **/
> -static void i40e_add_vxlan_port(struct net_device *netdev,
> -                               sa_family_t sa_family, __be16 port)
> +static void i40e_add_tunnel_port(struct net_device *netdev,
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(netdev);
>         struct i40e_vsi *vsi = np->vsi;
> @@ -8310,6 +8316,9 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
>         u8 next_idx;
>         u8 idx;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (sa_family == AF_INET6)
>                 return;
>
> @@ -8338,19 +8347,27 @@ static void i40e_add_vxlan_port(struct net_device *netdev,
>  }
>
>  /**
> - * i40e_del_vxlan_port - Get notifications about VXLAN ports that go away
> + * i40e_del_tunnel_port - Get notifications about UDP tunnel ports that go away
>   * @netdev: This physical port's netdev
> - * @sa_family: Socket Family that VXLAN is notifying us about
> - * @port: UDP port number that VXLAN stopped listening to
> + * @sa_family: Socket Family that tunnel netdev is associated with
> + * @port: UDP port number that tunnel stopped listening to
> + * @type: Tunnel Type
> + *
> + * This function modifies a common data structure for all udp_tunnels
> + * hence it is expected that it is called under common lock.
>   **/
> -static void i40e_del_vxlan_port(struct net_device *netdev,
> -                               sa_family_t sa_family, __be16 port)
> +static void i40e_del_tunnel_port(struct net_device *netdev,
> +                                sa_family_t sa_family, __be16 port,
> +                                u32 type)
>  {
>         struct i40e_netdev_priv *np = netdev_priv(netdev);
>         struct i40e_vsi *vsi = np->vsi;
>         struct i40e_pf *pf = vsi->back;
>         u8 idx;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (sa_family == AF_INET6)
>                 return;
>
> @@ -8596,8 +8613,8 @@ static const struct net_device_ops i40e_netdev_ops = {
>         .ndo_set_vf_link_state  = i40e_ndo_set_vf_link_state,
>         .ndo_set_vf_spoofchk    = i40e_ndo_set_vf_spoofchk,
>  #ifdef CONFIG_I40E_VXLAN
> -       .ndo_add_vxlan_port     = i40e_add_vxlan_port,
> -       .ndo_del_vxlan_port     = i40e_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = i40e_add_tunnel_port,
> +       .ndo_del_udp_tunnel_port        = i40e_del_tunnel_port,
>  #endif
>         .ndo_get_phys_port_id   = i40e_get_phys_port_id,
>         .ndo_fdb_add            = i40e_ndo_fdb_add,
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 4089d77..76ccc77 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -50,6 +50,7 @@
>  #include <linux/if_bridge.h>
>  #include <linux/prefetch.h>
>  #include <scsi/fc/fc_fcoe.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  #ifdef CONFIG_OF
> @@ -8088,14 +8089,18 @@ static int ixgbe_set_features(struct net_device *netdev,
>   * @dev: The port's netdev
>   * @sa_family: Socket Family that VXLAN is notifiying us about
>   * @port: New UDP port number that VXLAN started listening to
> + * @type: Tunnel type
>   **/
>  static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                                __be16 port)
> +                                __be16 port, u32 type)
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>         struct ixgbe_hw *hw = &adapter->hw;
>         u16 new_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
>                 return;
>
> @@ -8121,13 +8126,17 @@ static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
>   * @dev: The port's netdev
>   * @sa_family: Socket Family that VXLAN is notifying us about
>   * @port: UDP port number that VXLAN stopped listening to
> + * @type: Tunnel type
>   **/
>  static void ixgbe_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
> -                                __be16 port)
> +                                __be16 port, u32 type)
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>         u16 new_port = ntohs(port);
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
>                 return;
>
> @@ -8436,8 +8445,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
>         .ndo_dfwd_add_station   = ixgbe_fwd_add,
>         .ndo_dfwd_del_station   = ixgbe_fwd_del,
>  #ifdef CONFIG_IXGBE_VXLAN
> -       .ndo_add_vxlan_port     = ixgbe_add_vxlan_port,
> -       .ndo_del_vxlan_port     = ixgbe_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = ixgbe_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = ixgbe_del_vxlan_port,
>  #endif /* CONFIG_IXGBE_VXLAN */
>         .ndo_features_check     = ixgbe_features_check,
>  };
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 659209f..2cb19c7 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -39,6 +39,7 @@
>  #include <linux/hash.h>
>  #include <net/ip.h>
>  #include <net/busy_poll.h>
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>
>  #include <linux/mlx4/driver.h>
> @@ -2365,11 +2366,15 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
>  }
>
>  static void mlx4_en_add_vxlan_port(struct  net_device *dev,
> -                                  sa_family_t sa_family, __be16 port)
> +                                  sa_family_t sa_family, __be16 port,
> +                                  u32 type)
>  {
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>         __be16 current_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>                 return;
>
> @@ -2388,11 +2393,15 @@ static void mlx4_en_add_vxlan_port(struct  net_device *dev,
>  }
>
>  static void mlx4_en_del_vxlan_port(struct  net_device *dev,
> -                                  sa_family_t sa_family, __be16 port)
> +                                  sa_family_t sa_family, __be16 port,
> +                                  u32 type)
>  {
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>         __be16 current_port;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (priv->mdev->dev->caps.tunnel_offload_mode != MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>                 return;
>
> @@ -2469,8 +2478,8 @@ static const struct net_device_ops mlx4_netdev_ops = {
>  #endif
>         .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
>  #ifdef CONFIG_MLX4_EN_VXLAN
> -       .ndo_add_vxlan_port     = mlx4_en_add_vxlan_port,
> -       .ndo_del_vxlan_port     = mlx4_en_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = mlx4_en_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = mlx4_en_del_vxlan_port,
>         .ndo_features_check     = mlx4_en_features_check,
>  #endif
>         .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> @@ -2507,8 +2516,8 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
>  #endif
>         .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
>  #ifdef CONFIG_MLX4_EN_VXLAN
> -       .ndo_add_vxlan_port     = mlx4_en_add_vxlan_port,
> -       .ndo_del_vxlan_port     = mlx4_en_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = mlx4_en_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = mlx4_en_del_vxlan_port,
>         .ndo_features_check     = mlx4_en_features_check,
>  #endif
>         .ndo_set_tx_maxrate     = mlx4_en_set_tx_maxrate,
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 1205f6f..aa38dbb 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -17,6 +17,7 @@
>  #include <linux/log2.h>
>  #include <linux/pci.h>
>  #ifdef CONFIG_QLCNIC_VXLAN
> +#include <net/udp_tunnel.h>
>  #include <net/vxlan.h>
>  #endif
>
> @@ -476,11 +477,15 @@ static int qlcnic_get_phys_port_id(struct net_device *netdev,
>
>  #ifdef CONFIG_QLCNIC_VXLAN
>  static void qlcnic_add_vxlan_port(struct net_device *netdev,
> -                                 sa_family_t sa_family, __be16 port)
> +                                 sa_family_t sa_family, __be16 port,
> +                                 u32 type)
>  {
>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
>         struct qlcnic_hardware_context *ahw = adapter->ahw;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         /* Adapter supports only one VXLAN port. Use very first port
>          * for enabling offload
>          */
> @@ -498,11 +503,15 @@ static void qlcnic_add_vxlan_port(struct net_device *netdev,
>  }
>
>  static void qlcnic_del_vxlan_port(struct net_device *netdev,
> -                                 sa_family_t sa_family, __be16 port)
> +                                 sa_family_t sa_family, __be16 port,
> +                                 u32 type)
>  {
>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
>         struct qlcnic_hardware_context *ahw = adapter->ahw;
>
> +       if (type != UDP_TUNNEL_VXLAN)
> +               return;
> +
>         if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port_count ||
>             (ahw->vxlan_port != ntohs(port)))
>                 return;
> @@ -540,8 +549,8 @@ static const struct net_device_ops qlcnic_netdev_ops = {
>         .ndo_fdb_dump           = qlcnic_fdb_dump,
>         .ndo_get_phys_port_id   = qlcnic_get_phys_port_id,
>  #ifdef CONFIG_QLCNIC_VXLAN
> -       .ndo_add_vxlan_port     = qlcnic_add_vxlan_port,
> -       .ndo_del_vxlan_port     = qlcnic_del_vxlan_port,
> +       .ndo_add_udp_tunnel_port        = qlcnic_add_vxlan_port,
> +       .ndo_del_udp_tunnel_port        = qlcnic_del_vxlan_port,
>         .ndo_features_check     = qlcnic_features_check,
>  #endif
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 6369a57..5490629 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -628,9 +628,10 @@ static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>
>         rcu_read_lock();
>         for_each_netdev_rcu(net, dev) {
> -               if (dev->netdev_ops->ndo_add_vxlan_port)
> -                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
> -                                                           port);
> +               if (dev->netdev_ops->ndo_add_udp_tunnel_port)
> +                       dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>         }
>         rcu_read_unlock();
>  }
> @@ -646,9 +647,10 @@ static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>
>         rcu_read_lock();
>         for_each_netdev_rcu(net, dev) {
> -               if (dev->netdev_ops->ndo_del_vxlan_port)
> -                       dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
> -                                                           port);
> +               if (dev->netdev_ops->ndo_del_udp_tunnel_port)
> +                       dev->netdev_ops->ndo_del_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>         }
>         rcu_read_unlock();
>
> @@ -2422,9 +2424,9 @@ static struct device_type vxlan_type = {
>         .name = "vxlan",
>  };
>
> -/* Calls the ndo_add_vxlan_port of the caller in order to
> +/* Calls the ndo_add_udp_tunnel_port of the caller in order to
>   * supply the listening VXLAN udp ports. Callers are expected
> - * to implement the ndo_add_vxlan_port.
> + * to implement the ndo_add_tunnel_port.
>   */
>  void vxlan_get_rx_port(struct net_device *dev)
>  {
> @@ -2440,8 +2442,9 @@ void vxlan_get_rx_port(struct net_device *dev)
>                 hlist_for_each_entry_rcu(vs, &vn->sock_list[i], hlist) {
>                         port = inet_sk(vs->sock->sk)->inet_sport;
>                         sa_family = vxlan_get_sk_family(vs);
> -                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
> -                                                           port);
> +                       dev->netdev_ops->ndo_add_udp_tunnel_port(dev, sa_family,
> +                                                             port,
> +                                                             UDP_TUNNEL_VXLAN);
>                 }
>         }
>         spin_unlock(&vn->sock_lock);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7d2d1d7..eaecc42 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1004,18 +1004,19 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   *     not implement this, it is assumed that the hw is not able to have
>   *     multiple net devices on single physical port.
>   *
> - * void (*ndo_add_vxlan_port)(struct  net_device *dev,
> - *                           sa_family_t sa_family, __be16 port);
> - *     Called by vxlan to notiy a driver about the UDP port and socket
> - *     address family that vxlan is listnening to. It is called only when
> - *     a new port starts listening. The operation is protected by the
> - *     vxlan_net->sock_lock.
> - *
> - * void (*ndo_del_vxlan_port)(struct  net_device *dev,
> - *                           sa_family_t sa_family, __be16 port);
> - *     Called by vxlan to notify the driver about a UDP port and socket
> - *     address family that vxlan is not listening to anymore. The operation
> - *     is protected by the vxlan_net->sock_lock.
> + * void (*ndo_add_udp_tunnel_port)(struct  net_device *dev,
> + *                           sa_family_t sa_family, __be16 port, u32 type);
> + *     Called by UDP based tunnel modules to notify a driver about a UDP
> + *     port and socket address family that the tunnel is listening to. It is
> + *     called only when a new port starts listening. The operation is
> + *     protected by udp_offload_lock across all udp based tunnels.
> + *
> + * void (*ndo_del_udp_tunnel_port)(struct  net_device *dev,
> + *                           sa_family_t sa_family, __be16 port, u32 type);
> + *     Called by UDP based tunnel modules to notify the driver about a UDP port
> + *     and socket address family that tunnel is not listening to anymore.
> + *     The operation is protected by udp_offload_lock across all udp based
> + *     tunnels.
>   *
>   * void* (*ndo_dfwd_add_station)(struct net_device *pdev,
>   *                              struct net_device *dev)
> @@ -1209,13 +1210,12 @@ struct net_device_ops {
>                                                         struct netdev_phys_item_id *ppid);
>         int                     (*ndo_get_phys_port_name)(struct net_device *dev,
>                                                           char *name, size_t len);
> -       void                    (*ndo_add_vxlan_port)(struct  net_device *dev,
> +       void                    (*ndo_add_udp_tunnel_port)(struct  net_device *dev,
>                                                       sa_family_t sa_family,
> -                                                     __be16 port);
> -       void                    (*ndo_del_vxlan_port)(struct  net_device *dev,
> +                                                     __be16 port, u32 type);
> +       void                    (*ndo_del_udp_tunnel_port)(struct  net_device *dev,
>                                                       sa_family_t sa_family,
> -                                                     __be16 port);
> -
> +                                                     __be16 port, u32 type);
>         void*                   (*ndo_dfwd_add_station)(struct net_device *pdev,
>                                                         struct net_device *dev);
>         void                    (*ndo_dfwd_del_station)(struct net_device *pdev,
> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
> index cb2f89f..72415aa 100644
> --- a/include/net/udp_tunnel.h
> +++ b/include/net/udp_tunnel.h
> @@ -9,6 +9,12 @@
>  #include <net/addrconf.h>
>  #endif
>
> +enum udp_tunnel_type {
> +       UDP_TUNNEL_UNSPEC,
> +       UDP_TUNNEL_VXLAN,
> +       UDP_TUNNEL_GENEVE,
> +};
> +
>  struct udp_port_cfg {
>         u8                      family;
>
> --
> 1.8.1.4
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30  3:21     ` David Miller
  2015-11-30 21:33       ` Singhai, Anjali
@ 2015-12-01  0:25       ` Jesse Gross
  2015-12-01  1:02         ` Tom Herbert
  1 sibling, 1 reply; 94+ messages in thread
From: Jesse Gross @ 2015-12-01  0:25 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Sun, Nov 29, 2015 at 7:21 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Mon, 23 Nov 2015 13:53:44 -0800
>
>> The bad effect of this model is that it is encourages HW vendors to
>> continue implement HW protocol specific support for encapsulations, we
>> get so much more benefit if they implement protocol generic
>> mechanisms.
>
> +1

Regardless of what happens in the future, I think the main question is
how this relates to the code that is currently present in the tree. We
already have NDOs for VXLAN offloading, which is about as protocol
specific as you can get. In my mind, this series is strictly an
improvement to what is already there - it pulls all hardware
offloading code out of the various protocol implementations and VXLAN
out of the driver interface. That seems like a pretty nice cleanup to
me.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01  0:25       ` Jesse Gross
@ 2015-12-01  1:02         ` Tom Herbert
  2015-12-01  1:28           ` Jesse Gross
  0 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-01  1:02 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Sun, Nov 29, 2015 at 7:21 PM, David Miller <davem@davemloft.net> wrote:
>> From: Tom Herbert <tom@herbertland.com>
>> Date: Mon, 23 Nov 2015 13:53:44 -0800
>>
>>> The bad effect of this model is that it is encourages HW vendors to
>>> continue implement HW protocol specific support for encapsulations, we
>>> get so much more benefit if they implement protocol generic
>>> mechanisms.
>>
>> +1
>
> Regardless of what happens in the future, I think the main question is
> how this relates to the code that is currently present in the tree. We
> already have NDOs for VXLAN offloading, which is about as protocol
> specific as you can get. In my mind, this series is strictly an
> improvement to what is already there - it pulls all hardware
> offloading code out of the various protocol implementations and VXLAN
> out of the driver interface. That seems like a pretty nice cleanup to
> me.

Jesse,

I don't think VXLAN is a good role model here. Consider that Cisco now
is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
is not compatible with VXLAN, so in order to get the same HW offloads
talking VXLAN-GPE users will probably need to swap out their HW. If I
am misreading this situation let me know, but to me this doesn't sound
like a model the stack should endorse.

Tom

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01  1:02         ` Tom Herbert
@ 2015-12-01  1:28           ` Jesse Gross
  2015-12-01  5:26             ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Jesse Gross @ 2015-12-01  1:28 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Mon, Nov 30, 2015 at 5:02 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross <jesse@kernel.org> wrote:
>> On Sun, Nov 29, 2015 at 7:21 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Tom Herbert <tom@herbertland.com>
>>> Date: Mon, 23 Nov 2015 13:53:44 -0800
>>>
>>>> The bad effect of this model is that it is encourages HW vendors to
>>>> continue implement HW protocol specific support for encapsulations, we
>>>> get so much more benefit if they implement protocol generic
>>>> mechanisms.
>>>
>>> +1
>>
>> Regardless of what happens in the future, I think the main question is
>> how this relates to the code that is currently present in the tree. We
>> already have NDOs for VXLAN offloading, which is about as protocol
>> specific as you can get. In my mind, this series is strictly an
>> improvement to what is already there - it pulls all hardware
>> offloading code out of the various protocol implementations and VXLAN
>> out of the driver interface. That seems like a pretty nice cleanup to
>> me.
>
> Jesse,
>
> I don't think VXLAN is a good role model here. Consider that Cisco now
> is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
> is not compatible with VXLAN, so in order to get the same HW offloads
> talking VXLAN-GPE users will probably need to swap out their HW. If I
> am misreading this situation let me know, but to me this doesn't sound
> like a model the stack should endorse.

The point that I was trying to make is that we already have VXLAN
offloading in the stack and it exists there today. The way that it is
implemented is about as protocol specific as you can get - protocols
need to know about offloads and drivers need to know about protocols.
I would like to find a way to at least make the code cleaner while we
wait for hardware to evolve.

Based on what we can do today, I see only two real choices: do some
refactoring to clean up the stack a bit or remove the existing VXLAN
offloading altogether. I think this series is trying to do the former
and the result is that the stack is cleaner after than before. That
seems like a good thing.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30 21:42                 ` Singhai, Anjali
  2015-11-30 21:48                   ` Tom Herbert
@ 2015-12-01  3:48                   ` David Miller
  2015-12-01  6:33                     ` Alexander Duyck
  1 sibling, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-01  3:48 UTC (permalink / raw)
  To: anjali.singhai; +Cc: tom, jesse.brandeburg, jesse, netdev, kiran.patil

From: "Singhai, Anjali" <anjali.singhai@intel.com>
Date: Mon, 30 Nov 2015 21:42:37 +0000

> The reason for receive being different than transmit is, on TX side
> driver can provide the meta data for where the checksum field is and
> what is the length that needs to be check summed to the HW on a per
> packet basis. On Rx the HW parser has to parse the packet to
> identify the tunnel type and based on that figure out the checksum
> locations and length in the packet, so definitely HW has to parse
> the packet and it can parse only based on next header type
> information or in case of udp tunnels based on udp port mapping to a
> particular protocol. I am not sure why you say it doesn't need to
> parse the packet, maybe I am misunderstanding something.
> Although it's not difficult to reduce protocol ossification on the
> RX side but it is certainly different and particularly in case of
> udp-tunnels it needs the port to protocol mapping.

You're just proving more and more why doing anything other than 2's
complement checksum provision in the RX descriptor is stupid.

Let me know when you guys enter this century.

I'll tell you right now that your arguments are akin to trying to
climb up a wall which is vertical.  I can assure you that you will not
reach your destination, so save yourself some scratching and clawing
and accept reality.

Doing anything other than providing 2's complement checksums in the RX
descriptor doesn't work.  We know this.

So we will not add to our core architecture and frameworks anything
that directly facilitates designs which we know are suboptimal.  And
protocol specific support for tunnel offloading is suboptimal and not
the way forward.
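
For readers unfamiliar with the approach argued for here, a minimal
user-space sketch follows. It assumes the device reports a 16-bit
ones'-complement sum (the Internet checksum arithmetic of RFC 1071) over
the received bytes. The helper names csum_bytes, csum_add, csum_sub and
csum_inner are hypothetical and are not the kernel's helpers. The point it
illustrates is that the stack can derive the sum over any inner region by
subtracting the sum of the bytes in front of it, without the device
knowing which encapsulation is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones'-complement sum of buf, read as big-endian 16-bit words.
 * The 32-bit accumulator is sufficient for packet-sized buffers only.
 */
static uint16_t csum_bytes(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with zero */
	return csum_fold(sum);
}

static uint16_t csum_add(uint16_t a, uint16_t b)
{
	return csum_fold((uint32_t)a + b);
}

/* In ones'-complement arithmetic, subtracting is adding the complement. */
static uint16_t csum_sub(uint16_t a, uint16_t b)
{
	return csum_add(a, (uint16_t)~b);
}

/*
 * Given the device-reported sum over the whole packet, return the sum
 * over pkt[hdr_len..end). hdr_len must be even in this sketch so that
 * 16-bit word alignment is preserved.
 */
static uint16_t csum_inner(uint16_t full, const uint8_t *pkt, size_t hdr_len)
{
	assert(hdr_len % 2 == 0);
	return csum_sub(full, csum_bytes(pkt, hdr_len));
}
```

Note that the device side of this is a plain sum over bytes; all
knowledge of where the outer headers end stays in software.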

I completely agree with Tom, his goals, his vision, and his priorities
when it comes to handling this stuff.  Don't fight it.

You also need to learn how to properly reply to list postings, it
looks terrible, as if you're replying to Tom then adding in my comment
which is what you are actually replying to.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30 21:48                   ` Tom Herbert
@ 2015-12-01  3:51                     ` David Miller
  0 siblings, 0 replies; 94+ messages in thread
From: David Miller @ 2015-12-01  3:51 UTC (permalink / raw)
  To: tom; +Cc: anjali.singhai, jesse.brandeburg, jesse, netdev, kiran.patil

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 30 Nov 2015 13:48:36 -0800

> Please look at how CHECKSUM_COMPLETE interface works. Description is
> in sk_buff.h or
> http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf.

+1

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-11-30 21:53     ` Singhai, Anjali
@ 2015-12-01  3:52       ` David Miller
  0 siblings, 0 replies; 94+ messages in thread
From: David Miller @ 2015-12-01  3:52 UTC (permalink / raw)
  To: anjali.singhai; +Cc: tom, netdev, jesse, kiran.patil


Please learn how to properly quote people and respond to list postings.

The material from Tom you are quoting looks like it is something you
are writing.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01  1:28           ` Jesse Gross
@ 2015-12-01  5:26             ` Tom Herbert
  2015-12-01 15:44               ` John W. Linville
  0 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-01  5:26 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Mon, Nov 30, 2015 at 5:02 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Nov 30, 2015 at 4:25 PM, Jesse Gross <jesse@kernel.org> wrote:
>>> On Sun, Nov 29, 2015 at 7:21 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Tom Herbert <tom@herbertland.com>
>>>> Date: Mon, 23 Nov 2015 13:53:44 -0800
>>>>
>>>>> The bad effect of this model is that it is encourages HW vendors to
>>>>> continue implement HW protocol specific support for encapsulations, we
>>>>> get so much more benefit if they implement protocol generic
>>>>> mechanisms.
>>>>
>>>> +1
>>>
>>> Regardless of what happens in the future, I think the main question is
>>> how this relates to the code that is currently present in the tree. We
>>> already have NDOs for VXLAN offloading, which is about as protocol
>>> specific as you can get. In my mind, this series is strictly an
>>> improvement to what is already there - it pulls all hardware
>>> offloading code out of the various protocol implementations and VXLAN
>>> out of the driver interface. That seems like a pretty nice cleanup to
>>> me.
>>
>> Jesse,
>>
>> I don't think VXLAN is a good role model here. Consider that Cisco now
>> is basically trying to obsolete VXLAN in favor of VXLAN-GPE. VXLAN-GPE
>> is not compatible with VXLAN, so in order to get the same HW offloads
>> talking VXLAN-GPE users will probably need to swap out their HW. If I
>> am misreading this situation let me know, but to me this doesn't sound
>> like a model the stack should endorse.
>
> The point that I was trying to make is that we already have VXLAN
> offloading in the stack and it exists there today. The way that it is
> implemented is about as protocol specific as you can get - protocols
> need to know about offloads and drivers need to know about protocols.
> I would like to find a way to at least make the code cleaner while we
> wait for hardware to evolve.
>
> Based on what we can do today, I see only two real choices: do some
> refactoring to clean up the stack a bit or remove the existing VXLAN
> offloading altogether. I think this series is trying to do the former
> and the result is that the stack is cleaner after than before. That
> seems like a good thing.

There is a third choice which is to do nothing. Creating an
infrastructure that claims to "Generalize udp based tunnel offload"
but actually doesn't generalize the mechanism is nothing more than
window dressing; this does nothing to help with the VXLAN to
VXLAN-GPE transition, for instance. If geneve specific offload is
really needed now then that can be done with another ndo function,
or alternatively an ntuple filter with a device specific action would at
least get the stack out of needing to be concerned with that.
Regardless, we will work to optimize the rest of the stack for devices
that implement protocol agnostic mechanisms.

Tom

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01  3:48                   ` David Miller
@ 2015-12-01  6:33                     ` Alexander Duyck
  0 siblings, 0 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-12-01  6:33 UTC (permalink / raw)
  To: David Miller
  Cc: Anjali Singhai Jain, Tom Herbert, Brandeburg, Jesse, jesse,
	Netdev, Kiran Patil

On Mon, Nov 30, 2015 at 7:48 PM, David Miller <davem@davemloft.net> wrote:
> From: "Singhai, Anjali" <anjali.singhai@intel.com>
> Date: Mon, 30 Nov 2015 21:42:37 +0000
>
>> The reason for receive being different than transmit is, on TX side
>> driver can provide the meta data for where the checksum field is and
>> what is the length that needs to be check summed to the HW on a per
>> packet basis. On Rx the HW parser has to parse the packet to
>> identify the tunnel type and based on that figure out the checksum
>> locations and length in the packet, so definitely HW has to parse
>> the packet and it can parse only based on next header type
>> information or in case of udp tunnels based on udp port mapping to a
>> particular protocol. I am not sure why you say it doesn't need to
>> parse the packet, maybe I am misunderstanding something.
>> Although it's not difficult to reduce protocol ossification on the
>> RX side but it is certainly different and particularly in case of
>> udp-tunnels it needs the port to protocol mapping.
>
> You're just proving more and more why doing anything other than 2's
> complement checksum provision in the RX descriptor is stupid.
>
> Let me know when you guys enter this century.
>
> I'll tell you right now that your arguments are akin to trying to
> climb up a wall which is vertical.  I can assure you that you will not
> reach your destination, so save your self some scratching and clawing
> and accept reality.
>
> Doing anything other than providing 2's complement checksums in the RX
> descriptor doesn't work.  We know this.

While I fully agree that the 2's complement is the way to go, we still
have to deal with all of the legacy hardware out there.  In addition,
while the 2's complement bit works for the checksums, what are vendors
expected to do about other offloads that will need to parse inner
headers such as RSS, LRO, or network flow classification?  The
problem is that at some point the hardware does need to know how to
parse the tunnel headers for the purposes of doing offloads besides
checksum.

> So we will not add to our core architecture and frameworks anything
> that directly facilitates designs which we know are suboptimal.  And
> protocol specific support for tunnel offloading is suboptimal and not
> the way forward.
>
> I completly agree with Tom, his goals, his vision, and his priorities
> when it comes to handling this stuff.  Don't fight it.

I have to disagree here.  I really feel that, going beyond
checksumming, what Tom has proposed might be a step backwards.

We end up needing some sort of mechanism for identifying what the
tunnel frames will look like when doing any sort of Rx parsing.  Just
applying a generic offload for everything only really works on things
that are truly generic such as the 2's complement checksum.  Tom had
mentioned possibly using something like the ntuple/nfc interface.  I
would argue that is kind of the direction this is going in, but with a
few flaws.  For example, the notifier probably should pass a pointer
to the udp_port_cfg structure instead of just the family and port.
This way, if a given UDP tunnel has IP endpoints, it would be possible
to set up an ntuple/nfc style filter rule to offload only that tunnel
instead of generically basing things off of the port and family.

In addition it might be worthwhile to add a type field similar to what
is already in the fou_cfg block to the udp_port_cfg.  It could
probably just be moved from fou_cfg into udp_port_cfg for use by the
other tunnels, you might even move the protocol field while you are at
it.  Then they could be treated as a type of "action" indication to
the drivers that the given attributes will be associated with this
type of tunnel.  I would want to make sure that we give each unique
tunnel type its own value for the "action" so that if for example in
the future someone decided to offload fou, gue, l2tp, or whatever in
hardware they would have a way of identifying the incoming frames as
such and parse the inner headers.
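
To make the two suggestions above concrete, here is a rough sketch; it
is not code from the series. Every name in it (tunnel_type,
tunnel_port_cfg, tunnel_rule, rule_from_cfg, rule_matches) is made up for
illustration, and the real udp_port_cfg and fou_cfg layouts differ. It
shows a notifier payload that carries a tunnel type and optional IPv4
endpoints, from which a driver could build a narrower ntuple-style match
than port plus family alone. Addresses and ports are in host byte order
to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One value per tunnel type, used as the "action" for matching frames. */
enum tunnel_type {
	TUNNEL_UNSPEC,
	TUNNEL_VXLAN,
	TUNNEL_GENEVE,
	TUNNEL_FOU,
	TUNNEL_GUE,
};

/* Hypothetical notifier payload: a port config that also carries a type. */
struct tunnel_port_cfg {
	uint32_t local_ip;	/* 0 means "any" */
	uint32_t peer_ip;	/* 0 means "any" */
	uint16_t local_port;
	enum tunnel_type type;
};

/* Hypothetical ntuple-style rule a driver might program from the payload. */
struct tunnel_rule {
	uint32_t dst_ip, dst_mask;
	uint32_t src_ip, src_mask;
	uint16_t dst_port;
	enum tunnel_type action;
};

static struct tunnel_rule rule_from_cfg(const struct tunnel_port_cfg *cfg)
{
	struct tunnel_rule r;

	assert(cfg != NULL);
	r.dst_ip = cfg->local_ip;
	r.dst_mask = cfg->local_ip ? 0xffffffffu : 0;	/* wildcard if unset */
	r.src_ip = cfg->peer_ip;
	r.src_mask = cfg->peer_ip ? 0xffffffffu : 0;
	r.dst_port = cfg->local_port;
	r.action = cfg->type;
	return r;
}

/* True if a received outer header would be parsed as r->action. */
static bool rule_matches(const struct tunnel_rule *r,
			 uint32_t src, uint32_t dst, uint16_t dport)
{
	return dport == r->dst_port &&
	       (dst & r->dst_mask) == r->dst_ip &&
	       (src & r->src_mask) == r->src_ip;
}
```

With only family and port available, the driver could express the port
comparison but not the address masks, which is the limitation described
above.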

Anyway that is my $0.02 on this.

- Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01  5:26             ` Tom Herbert
@ 2015-12-01 15:44               ` John W. Linville
  2015-12-01 15:49                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 94+ messages in thread
From: John W. Linville @ 2015-12-01 15:44 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:

> > Based on what we can do today, I see only two real choices: do some
> > refactoring to clean up the stack a bit or remove the existing VXLAN
> > offloading altogether. I think this series is trying to do the former
> > and the result is that the stack is cleaner after than before. That
> > seems like a good thing.
> 
> There is a third choice which is to do nothing. Creating an
> infrastructure that claims to "Generalize udp based tunnel offload"
> but actually doesn't generalize the mechanism is nothing more than
> window dressing-- this does nothing to help with the VXLAN to
> VXLAN-GPE transition for instance. If geneve specific offload is
> really needed now then that can be should with another ndo function,
> or alternatively ntuple filter with a device specific action would at
> least get the stack out of needing to be concerned with that.
> Regardless, we will work optimize the rest of the stack for devices
> that implement protocol agnostic mechanisms.

Is there no concern about NDO proliferation? Does the size of the
netdev_ops structure matter? Beyond that, I can see how a single
entry point with an enum specifying the offload type isn't really any
different in the grand scheme of things than having multiple NDOs,
one per offload.

Given the need to live with existing hardware offloads, I would lean
toward a consolidated NDO. But if a different NDO per tunnel type is
preferred, I can be satisfied with that.
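
For concreteness, the consolidated form looks roughly like the following
from a driver's point of view. The enum mirrors the one this patch adds
to include/net/udp_tunnel.h; the fake_dev structure and the
fake_add_udp_tunnel_port function are hypothetical stand-ins, not any
real driver, and the return value exists only so the sketch can be
checked (the NDO in the patch returns void).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum udp_tunnel_type {
	UDP_TUNNEL_UNSPEC,
	UDP_TUNNEL_VXLAN,
	UDP_TUNNEL_GENEVE,
};

/* Hypothetical device state: hardware that can parse one VXLAN port. */
struct fake_dev {
	uint16_t vxlan_port;
	int vxlan_port_cnt;
};

/*
 * Single entry point for all UDP tunnel types. The driver filters on
 * the type argument; returns 1 if the port was programmed, 0 if ignored.
 */
static int fake_add_udp_tunnel_port(struct fake_dev *dev, uint16_t port,
				    uint32_t type)
{
	assert(dev != NULL);

	switch (type) {
	case UDP_TUNNEL_VXLAN:
		/* Only one port fits; refuse a second, different one. */
		if (dev->vxlan_port_cnt && dev->vxlan_port != port)
			return 0;
		dev->vxlan_port = port;
		dev->vxlan_port_cnt++;
		return 1;
	default:
		/* A tunnel type this hardware cannot parse. */
		return 0;
	}
}
```

The switch here does the same job that separate function pointers would
do with one NDO per tunnel type.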

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01 15:44               ` John W. Linville
@ 2015-12-01 15:49                 ` Hannes Frederic Sowa
  2015-12-01 16:08                   ` John W. Linville
  2015-12-02  3:50                   ` Tom Herbert
  0 siblings, 2 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-01 15:49 UTC (permalink / raw)
  To: John W. Linville, Tom Herbert
  Cc: Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
> 
> > > Based on what we can do today, I see only two real choices: do some
> > > refactoring to clean up the stack a bit or remove the existing VXLAN
> > > offloading altogether. I think this series is trying to do the former
> > > and the result is that the stack is cleaner after than before. That
> > > seems like a good thing.
> > 
> > There is a third choice which is to do nothing. Creating an
> > infrastructure that claims to "Generalize udp based tunnel offload"
> > but actually doesn't generalize the mechanism is nothing more than
> > window dressing-- this does nothing to help with the VXLAN to
> > VXLAN-GPE transition for instance. If geneve specific offload is
> > really needed now then that can be should with another ndo function,
> > or alternatively ntuple filter with a device specific action would at
> > least get the stack out of needing to be concerned with that.
> > Regardless, we will work optimize the rest of the stack for devices
> > that implement protocol agnostic mechanisms.
> 
> Is there no concern about NDO proliferation? Does the size of the
> netdev_ops structure matter? Beyond that, I can see how a single
> entry point with an enum specifying the offload type isn't really any
> different in the grand scheme of things than having multiple NDOs,
> one per offload.
> 
> Given the need to live with existing hardware offloads, I would lean
> toward a consolidated NDO. But if a different NDO per tunnel type is
> preferred, I can be satisified with that.

Having per-offload NDOs helps the stack gather further information about
what kind of offloads the driver has, maybe even without calling down
into the layer (just by comparing to NULL). Checking this inside the
driver offload function clearly does not have this feature. So we can
finally have an "ip tunnel please-recommend-type" feature. :)
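
A small sketch of the NULL-comparison idea, with hypothetical names
throughout (sketch_ops, sketch_offload_caps, the CAP_* bits); this is
not the kernel's net_device_ops. It only shows that with one function
pointer per tunnel type, the caller can tell what a driver supports
without invoking anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table with one hook per tunnel type. */
struct sketch_ops {
	void (*add_vxlan_port)(uint16_t port);
	void (*add_geneve_port)(uint16_t port);
};

enum {
	CAP_VXLAN  = 1 << 0,
	CAP_GENEVE = 1 << 1,
};

/* Derive a capability mask purely from which hooks are non-NULL. */
static unsigned int sketch_offload_caps(const struct sketch_ops *ops)
{
	unsigned int caps = 0;

	assert(ops != NULL);
	if (ops->add_vxlan_port)
		caps |= CAP_VXLAN;
	if (ops->add_geneve_port)
		caps |= CAP_GENEVE;
	return caps;
}

/* Example driver that only implements the VXLAN hook. */
static void sketch_vxlan_only_add(uint16_t port)
{
	(void)port;	/* a real driver would program hardware here */
}

static const struct sketch_ops vxlan_only_ops = {
	.add_vxlan_port = sketch_vxlan_only_add,
};
```

With a single consolidated hook, the same question can only be answered
by calling into the driver and seeing whether it acts on a given type.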

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01 15:49                 ` Hannes Frederic Sowa
@ 2015-12-01 16:08                   ` John W. Linville
  2015-12-02  0:40                     ` Singhai, Anjali
  2015-12-02  3:50                   ` Tom Herbert
  1 sibling, 1 reply; 94+ messages in thread
From: John W. Linville @ 2015-12-01 16:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote:
> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> > On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> > > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
> > 
> > > > Based on what we can do today, I see only two real choices: do some
> > > > refactoring to clean up the stack a bit or remove the existing VXLAN
> > > > offloading altogether. I think this series is trying to do the former
> > > > and the result is that the stack is cleaner after than before. That
> > > > seems like a good thing.
> > > 
> > > There is a third choice which is to do nothing. Creating an
> > > infrastructure that claims to "Generalize udp based tunnel offload"
> > > but actually doesn't generalize the mechanism is nothing more than
> > > window dressing-- this does nothing to help with the VXLAN to
> > > VXLAN-GPE transition for instance. If geneve specific offload is
> > > really needed now then that can be should with another ndo function,
> > > or alternatively ntuple filter with a device specific action would at
> > > least get the stack out of needing to be concerned with that.
> > > Regardless, we will work optimize the rest of the stack for devices
> > > that implement protocol agnostic mechanisms.
> > 
> > Is there no concern about NDO proliferation? Does the size of the
> > netdev_ops structure matter? Beyond that, I can see how a single
> > entry point with an enum specifying the offload type isn't really any
> > different in the grand scheme of things than having multiple NDOs,
> > one per offload.
> > 
> > Given the need to live with existing hardware offloads, I would lean
> > toward a consolidated NDO. But if a different NDO per tunnel type is
> > preferred, I can be satisified with that.
> 
> Having per-offloading NDOs helps the stack to gather further information
> what kind of offloads the driver has even maybe without trying to call
> down into the layer (just by comparing to NULL). Checking this inside
> the driver offload function clearly does not have this feature. So we
> finally can have "ip tunnel please-recommend-type" feature. :)

That is a valuable insight! Maybe the per-offload NDO isn't such a
bad idea after all... :-)

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01 16:08                   ` John W. Linville
@ 2015-12-02  0:40                     ` Singhai, Anjali
  0 siblings, 0 replies; 94+ messages in thread
From: Singhai, Anjali @ 2015-12-02  0:40 UTC (permalink / raw)
  To: John W. Linville, Hannes Frederic Sowa
  Cc: Tom Herbert, Jesse Gross, David Miller,
	Linux Kernel Network Developers, Kiran Patil



On 12/1/2015 8:08 AM, John W. Linville wrote:
> On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote:
>> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>>> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>>>> On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>> Based on what we can do today, I see only two real choices: do some
>>>>> refactoring to clean up the stack a bit or remove the existing VXLAN
>>>>> offloading altogether. I think this series is trying to do the former
>>>>> and the result is that the stack is cleaner after than before. That
>>>>> seems like a good thing.
>>>> There is a third choice which is to do nothing. Creating an
>>>> infrastructure that claims to "Generalize udp based tunnel offload"
>>>> but actually doesn't generalize the mechanism is nothing more than
>>>> window dressing-- this does nothing to help with the VXLAN to
>>>> VXLAN-GPE transition for instance. If geneve specific offload is
>>>> really needed now then that can be should with another ndo function,
>>>> or alternatively ntuple filter with a device specific action would at
>>>> least get the stack out of needing to be concerned with that.
>>>> Regardless, we will work optimize the rest of the stack for devices
>>>> that implement protocol agnostic mechanisms.
>>> Is there no concern about NDO proliferation? Does the size of the
>>> netdev_ops structure matter? Beyond that, I can see how a single
>>> entry point with an enum specifying the offload type isn't really any
>>> different in the grand scheme of things than having multiple NDOs,
>>> one per offload.
>>>
>>> Given the need to live with existing hardware offloads, I would lean
>>> toward a consolidated NDO. But if a different NDO per tunnel type is
>>> preferred, I can be satisified with that.
>> Having per-offloading NDOs helps the stack to gather further information
>> what kind of offloads the driver has even maybe without trying to call
>> down into the layer (just by comparing to NULL). Checking this inside
>> the driver offload function clearly does not have this feature. So we
>> finally can have "ip tunnel please-recommend-type" feature. :)
> That is a valuable insight! Maybe the per-offload NDO isn't such a
> bad idea afterall... :-)
>
> John

This helps me understand why having a separate ndo op might still be ok.
Thanks for the feedback. I will go back to that model. Also, I think I
did finally understand the discussion on using a single 2's complement
checksum method for future silicon.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-01 15:49                 ` Hannes Frederic Sowa
  2015-12-01 16:08                   ` John W. Linville
@ 2015-12-02  3:50                   ` Tom Herbert
  2015-12-02 16:35                     ` Hannes Frederic Sowa
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-02  3:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
>>
>> > > Based on what we can do today, I see only two real choices: do some
>> > > refactoring to clean up the stack a bit or remove the existing VXLAN
>> > > offloading altogether. I think this series is trying to do the former
>> > > and the result is that the stack is cleaner after than before. That
>> > > seems like a good thing.
>> >
>> > There is a third choice which is to do nothing. Creating an
>> > infrastructure that claims to "Generalize udp based tunnel offload"
>> > but actually doesn't generalize the mechanism is nothing more than
>> > window dressing-- this does nothing to help with the VXLAN to
>> > VXLAN-GPE transition for instance. If geneve specific offload is
>> > really needed now then that can be should with another ndo function,
>> > or alternatively ntuple filter with a device specific action would at
>> > least get the stack out of needing to be concerned with that.
>> > Regardless, we will work optimize the rest of the stack for devices
>> > that implement protocol agnostic mechanisms.
>>
>> Is there no concern about NDO proliferation? Does the size of the
>> netdev_ops structure matter? Beyond that, I can see how a single
>> entry point with an enum specifying the offload type isn't really any
>> different in the grand scheme of things than having multiple NDOs,
>> one per offload.
>>
>> Given the need to live with existing hardware offloads, I would lean
>> toward a consolidated NDO. But if a different NDO per tunnel type is
>> preferred, I can be satisified with that.
>
> Having per-offloading NDOs helps the stack to gather further information
> what kind of offloads the driver has even maybe without trying to call
> down into the layer (just by comparing to NULL). Checking this inside
> the driver offload function clearly does not have this feature. So we
> finally can have "ip tunnel please-recommend-type" feature. :)
>
That completely misses the whole point of the rest of this thread.
Protocol specific offloads are what we are trying to discourage, not
encourage. Adding any more ndo functions for this purpose should be an
exception, not the norm. The bar should be naturally high considering
the cost of exposing this to ndo.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02  3:50                   ` Tom Herbert
@ 2015-12-02 16:35                     ` Hannes Frederic Sowa
  2015-12-02 19:15                       ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-02 16:35 UTC (permalink / raw)
  To: Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
> On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
> >> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
> >> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
> >>
> >> > > Based on what we can do today, I see only two real choices: do some
> >> > > refactoring to clean up the stack a bit or remove the existing VXLAN
> >> > > offloading altogether. I think this series is trying to do the former
> >> > > and the result is that the stack is cleaner after than before. That
> >> > > seems like a good thing.
> >> >
> >> > There is a third choice which is to do nothing. Creating an
> >> > infrastructure that claims to "Generalize udp based tunnel offload"
> >> > but actually doesn't generalize the mechanism is nothing more than
> >> > window dressing-- this does nothing to help with the VXLAN to
> >> > VXLAN-GPE transition for instance. If geneve specific offload is
> >> > really needed now then that can be should with another ndo function,
> >> > or alternatively ntuple filter with a device specific action would at
> >> > least get the stack out of needing to be concerned with that.
> >> > Regardless, we will work optimize the rest of the stack for devices
> >> > that implement protocol agnostic mechanisms.
> >>
> >> Is there no concern about NDO proliferation? Does the size of the
> >> netdev_ops structure matter? Beyond that, I can see how a single
> >> entry point with an enum specifying the offload type isn't really any
> >> different in the grand scheme of things than having multiple NDOs,
> >> one per offload.
> >>
> >> Given the need to live with existing hardware offloads, I would lean
> >> toward a consolidated NDO. But if a different NDO per tunnel type is
> >> preferred, I can be satisified with that.
> >
> > Having per-offloading NDOs helps the stack to gather further information
> > what kind of offloads the driver has even maybe without trying to call
> > down into the layer (just by comparing to NULL). Checking this inside
> > the driver offload function clearly does not have this feature. So we
> > finally can have "ip tunnel please-recommend-type" feature. :)
> >
> That completely misses the whole point of the rest of this thread.
> Protocol specific offloads are what we are trying to discourage not
> encourage. Adding any more ndo functions for this purpose should be an
> exception, not the norm. The bar should be naturally high considering
> the cost of exposing this to ndo.

Why?

I wonder why we need protocol generic offloads? I know there are
currently a lot of overlay encapsulation protocols. Are there many more
coming?

Besides, this offload is about TSO and RSS and they do need to parse the
packet to get the information where the inner header starts. It is not
only about checksum offloading.

If those protocols always carry an option length in the header we
probably could make it a little bit more generic, so the protocol
implementation could sound like:

  "Generic Tunnel Offloading besides protocols with chained options"

Unfortunately IPv6 extension headers are a very good example of
where this generic offloading would fail horribly, as hardware has to
parse the header chain to reach the final (inner) protocol.
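
To make the point about header chains concrete, here is a minimal
sketch in C of the walk a parser has to do before it knows where the
upper-layer header starts. The name ipv6_find_upper() and its interface
are made up for illustration. Only the hop-by-hop, routing, destination
options and fragment headers are handled, and the fragment offset is
ignored, so this is not a complete parser.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers of the extension headers handled in this sketch. */
#define NEXTHDR_HOP      0   /* hop-by-hop options */
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_DEST     60  /* destination options */

/*
 * Hypothetical helper. 'p' points just past the fixed 40-byte IPv6
 * header, 'nexthdr' is the Next Header value from that fixed header.
 * Returns the offset of the upper-layer header relative to 'p' and
 * stores its protocol number in *proto, or returns -1 if the buffer
 * is too short to finish the walk.
 */
static int ipv6_find_upper(const uint8_t *p, size_t len, uint8_t nexthdr,
                           uint8_t *proto)
{
    size_t off = 0;

    for (;;) {
        size_t hdrlen;

        switch (nexthdr) {
        case NEXTHDR_HOP:
        case NEXTHDR_ROUTING:
        case NEXTHDR_DEST:
            if (len - off < 2)
                return -1;
            /* Hdr Ext Len counts 8-byte units beyond the first 8 bytes. */
            hdrlen = ((size_t)p[off + 1] + 1) * 8;
            break;
        case NEXTHDR_FRAGMENT:
            hdrlen = 8;     /* fixed size, no length field */
            break;
        default:
            /* Not an extension header we know: treat as upper layer. */
            *proto = nexthdr;
            return (int)off;
        }
        if (len - off < hdrlen)
            return -1;
        nexthdr = p[off];   /* first byte of each header: Next Header */
        off += hdrlen;
    }
}
```

The loop has no fixed bound on the number of headers, which is exactly
what makes this awkward for a device that only knows fixed offsets.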

How do we deal with the next protocol field in vxlan-gpe in a protocol
agnostic way (whoever came up with this)? It has a special numbering
based on the IETF draft, and I don't see any way to tell a network card
"please interpret this field as specified by rfcxxxx" that is still
protocol agnostic. I don't see how implementing this in a protocol
agnostic way makes any technical sense.
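
For illustration, a small C sketch of what interpreting that field
involves. The header layout (P flag 0x04 in the first byte, Next
Protocol in the fourth byte) and the numbering 1..4 are taken from the
VXLAN-GPE draft as I understand it and should be checked against the
current revision; the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

#define VXLAN_GPE_HLEN    8
#define VXLAN_GPE_FLAG_P  0x04  /* set when the Next Protocol field is valid */

/* Next Protocol values (assumed from the draft, not a generic registry). */
#define GPE_NP_IPV4     0x01
#define GPE_NP_IPV6     0x02
#define GPE_NP_ETHERNET 0x03
#define GPE_NP_NSH      0x04

/*
 * Hypothetical helper: map the draft-specific Next Protocol value to
 * the EtherType of the payload. Returns 0 if the header is too short,
 * the P flag is clear, or the value is unknown.
 */
static uint16_t vxlan_gpe_inner_ethertype(const uint8_t *hdr, size_t len)
{
    if (len < VXLAN_GPE_HLEN || !(hdr[0] & VXLAN_GPE_FLAG_P))
        return 0;

    switch (hdr[3]) {
    case GPE_NP_IPV4:     return 0x0800;
    case GPE_NP_IPV6:     return 0x86DD;
    case GPE_NP_ETHERNET: return 0x6558;  /* transparent Ethernet bridging */
    case GPE_NP_NSH:      return 0x894F;
    default:              return 0;
    }
}
```

The switch statement is the protocol-specific part: nothing in the
packet itself says how to interpret the value at offset 3.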

Checksums maybe can be; the rest really does not make sense. Especially
for NSH I currently don't see how this can be done in general.

Please provide a sketch of a protocol generic API that can tell
hardware where an inner protocol header starts, that supports vxlan,
vxlan-gpe, geneve and ipv6 extension headers, and that knows which
protocol is starting at that point.

Bye,
Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02 16:35                     ` Hannes Frederic Sowa
@ 2015-12-02 19:15                       ` Tom Herbert
  2015-12-02 23:35                         ` John Fastabend
  2015-12-03 15:59                         ` Hannes Frederic Sowa
  0 siblings, 2 replies; 94+ messages in thread
From: Tom Herbert @ 2015-12-02 19:15 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
>> On Tue, Dec 1, 2015 at 7:49 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>> >> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>> >> > On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross <jesse@kernel.org> wrote:
>> >>
>> >> > > Based on what we can do today, I see only two real choices: do some
>> >> > > refactoring to clean up the stack a bit or remove the existing VXLAN
>> >> > > offloading altogether. I think this series is trying to do the former
>> >> > > and the result is that the stack is cleaner after than before. That
>> >> > > seems like a good thing.
>> >> >
>> >> > There is a third choice which is to do nothing. Creating an
>> >> > infrastructure that claims to "Generalize udp based tunnel offload"
>> >> > but actually doesn't generalize the mechanism is nothing more than
>> >> > window dressing-- this does nothing to help with the VXLAN to
>> >> > VXLAN-GPE transition for instance. If geneve specific offload is
>> >> > really needed now then that can be should with another ndo function,
>> >> > or alternatively ntuple filter with a device specific action would at
>> >> > least get the stack out of needing to be concerned with that.
>> >> > Regardless, we will work optimize the rest of the stack for devices
>> >> > that implement protocol agnostic mechanisms.
>> >>
>> >> Is there no concern about NDO proliferation? Does the size of the
>> >> netdev_ops structure matter? Beyond that, I can see how a single
>> >> entry point with an enum specifying the offload type isn't really any
>> >> different in the grand scheme of things than having multiple NDOs,
>> >> one per offload.
>> >>
>> >> Given the need to live with existing hardware offloads, I would lean
>> >> toward a consolidated NDO. But if a different NDO per tunnel type is
>> >> preferred, I can be satisified with that.
>> >
>> > Having per-offloading NDOs helps the stack to gather further information
>> > what kind of offloads the driver has even maybe without trying to call
>> > down into the layer (just by comparing to NULL). Checking this inside
>> > the driver offload function clearly does not have this feature. So we
>> > finally can have "ip tunnel please-recommend-type" feature. :)
>> >
>> That completely misses the whole point of the rest of this thread.
>> Protocol specific offloads are what we are trying to discourage not
>> encourage. Adding any more ndo functions for this purpose should be an
>> exception, not the norm. The bar should be naturally high considering
>> the cost of exposing this to ndo.
>
> Why?
>
> I wonder why we need protocol generic offloads? I know there are
> currently a lot of overlay encapsulation protocols. Are there many more
> coming?
>
Yes, and assume that there are more coming with no upper bound
(for instance I just noticed today that there is a netdev1.1 talk on
supporting GTP in the kernel). Besides, this problem space is not just
limited to offload of encapsulation protocols, but includes how to
generalize offload of any transport, IPv[46], application protocols,
protocols implemented in user space, security protocols, etc.

> Besides, this offload is about TSO and RSS and they do need to parse the
> packet to get the information where the inner header starts. It is not
> only about checksum offloading.
>
RSS does not require the device to parse the inner header. All the UDP
encapsulation protocols being defined set the source port to a flow
entropy value, and most devices already support RSS+UDP (it just needs
to be enabled), so this works just fine with dumb NICs. In fact, this is
one of the main motivations for encapsulating in UDP in the first place:
to leverage existing RSS and ECMP mechanisms. The more general solution
is to use the IPv6 flow label (RFC6438). We need HW support to include
the flow label in the hash for ECMP and RSS, but once we have that, much
of the motivation for using UDP goes away and we can get back to just
doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminating the overhead and
complexity of UDP encap).
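
As a rough sketch of the source port trick, assuming the encapsulator
already has a 32-bit hash of the inner flow: the hash is scaled into a
port range and used as the outer UDP source port, so a NIC that only
hashes the outer 4-tuple still spreads flows. The function name and the
port range in the usage note are illustrative; the kernel has its own
helper for this.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a 32-bit inner-flow hash into the port
 * range [min, max), with max > min. Multiplying and shifting avoids a
 * modulo and keeps the distribution even. The result is in host byte
 * order; a real sender would convert it to network byte order.
 */
static uint16_t flow_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
    uint32_t range = (uint32_t)(max - min);

    return (uint16_t)((((uint64_t)hash * range) >> 32) + min);
}
```

With min = 49152 and max = 65535 every flow lands on one of 16383
ports, so the outer header carries at most about 14 bits of entropy.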

> Please provide a sketch up for a protocol generic api that can tell
> hardware where a inner protocol header starts that supports vxlan,
> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> starting at that point.
>
BPF. Implementing protocol generic offloads is not just a HW concern
either; adding kernel GRO code for every possible protocol that comes
along doesn't scale well. This becomes especially obvious when we
consider how to provide offloads for application protocols. If the
kernel provides a programmable framework for the offloads then
application protocols, such as QUIC, could use that without
needing to hack the kernel to support the specific protocol (which no
one wants!). Application protocol parsing in KCM and some other use
cases of BPF have already foreshadowed this, and we are working on a
prototype for a BPF programmable engine in the kernel. Presumably,
this same model could eventually be applied as the HW API for
programmable offload.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02 19:15                       ` Tom Herbert
@ 2015-12-02 23:35                         ` John Fastabend
  2015-12-03  0:15                           ` Tom Herbert
  2015-12-03  2:08                           ` Alexei Starovoitov
  2015-12-03 15:59                         ` Hannes Frederic Sowa
  1 sibling, 2 replies; 94+ messages in thread
From: John Fastabend @ 2015-12-02 23:35 UTC (permalink / raw)
  To: Tom Herbert, Hannes Frederic Sowa
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

[...]

>>
>> I wonder why we need protocol generic offloads? I know there are
>> currently a lot of overlay encapsulation protocols. Are there many more
>> coming?
>>
> Yes, and assume that there are more coming with an unbounded limit
> (for instance I just noticed today that there is a netdev1.1 talk on
> supporting GTP in the kernel). Besides, this problem space not just
> limited to offload of encapsulation protocols, but how to generalize
> offload of any transport, IPv[46], application protocols, protocol
> implemented in user space, security protocols, etc.
> 
>> Besides, this offload is about TSO and RSS and they do need to parse the
>> packet to get the information where the inner header starts. It is not
>> only about checksum offloading.
>>
> RSS does not require the device to parse the inner header. All the UDP
> encapsulations protocols being defined set the source port to entropy
> flow value and most devices already support RSS+UDP (just needs to be
> enabled) so this works just fine with dumb NICs. In fact, this is one
> of the main motivations of encapsulating UDP in the first place, to
> leverage existing RSS and ECMP mechanisms. The more general solution
> is to use IPv6 flow label (RFC6438). We need HW support to include the
> flow label into the hash for ECMP and RSS, but once we have that much
> of the motivation for using UDP goes away and we can get back to just
> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
> complexity of UDP encap).
> 
>> Please provide a sketch up for a protocol generic api that can tell
>> hardware where a inner protocol header starts that supports vxlan,
>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>> starting at that point.
>>
> BPF. Implementing protocol generic offloads are not just a HW concern
> either, adding kernel GRO code for every possible protocol that comes
> along doesn't scale well. This becomes especially obvious when we
> consider how to provide offloads for applications protocols. If the
> kernel provides a programmable framework for the offloads then
> application protocols, such as QUIC, could use use that without
> needing to hack the kernel to support the specific protocol (which no
> one wants!). Application protocol parsing in KCM and some other use
> cases of BPF have already foreshadowed this, and we are working on a
> prototype for a BPF programmable engine in the kernel. Presumably,
> this same model could eventually be applied as the HW API to
> programmable offload.

Just keying off the last statement there...

I think BPF programs are going to be hard to translate into hardware
for most devices. The problem is that BPF programs in general lack
structure. A parse graph would be much more friendly for hardware, or
at minimum the BPF program would need to be some sort of
well-structured program so a driver could turn it into a parse graph.
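
To illustrate what is meant by a parse graph, here is a minimal sketch
in C: each node has a header length, the location of a selector field,
and a list of (value, next node) edges. All names are made up, and the
header lengths are fixed for brevity (IPv4 is assumed to have no
options), which a real graph would not get away with.

```c
#include <stddef.h>
#include <stdint.h>

struct parse_edge {
    uint16_t value;     /* selector value to match */
    int next;           /* index of the next node */
};

struct parse_node {
    size_t hdr_len;     /* header length in bytes (fixed in this sketch) */
    size_t key_off;     /* offset of the selector field in the header */
    size_t key_len;     /* 0 (leaf), 1 or 2 bytes, big-endian */
    const struct parse_edge *edges;
    size_t n_edges;
};

enum { N_ETH, N_IPV4, N_UDP, N_VXLAN };

static const struct parse_edge eth_edges[]  = { { 0x0800, N_IPV4 } };
static const struct parse_edge ipv4_edges[] = { { 17, N_UDP } };
static const struct parse_edge udp_edges[]  = { { 4789, N_VXLAN } };

static const struct parse_node graph[] = {
    [N_ETH]   = { 14, 12, 2, eth_edges,  1 },  /* EtherType */
    [N_IPV4]  = { 20,  9, 1, ipv4_edges, 1 },  /* protocol */
    [N_UDP]   = {  8,  2, 2, udp_edges,  1 },  /* destination port */
    [N_VXLAN] = {  8,  0, 0, NULL,       0 },  /* leaf */
};

/*
 * Walk the graph starting at node 'start'. Returns the offset just
 * past the last header that was recognised and stores that node's
 * index in *last, or returns -1 if the packet is truncated.
 */
static int parse_walk(const uint8_t *pkt, size_t len, int start, int *last)
{
    size_t off = 0;
    int node = start;

    for (;;) {
        const struct parse_node *n = &graph[node];
        uint16_t key = 0;
        int next = -1;
        size_t i;

        if (len - off < n->hdr_len)
            return -1;
        if (n->key_len == 2)
            key = (uint16_t)((pkt[off + n->key_off] << 8) |
                             pkt[off + n->key_off + 1]);
        else if (n->key_len == 1)
            key = pkt[off + n->key_off];

        for (i = 0; i < n->n_edges; i++) {
            if (n->edges[i].value == key) {
                next = n->edges[i].next;
                break;
            }
        }
        off += n->hdr_len;
        *last = node;
        if (next < 0)
            return (int)off;
        node = next;
    }
}
```

The point is that the table, not the walker, carries the protocol
knowledge, so a driver could in principle load the table into a device.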

.John


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02 23:35                         ` John Fastabend
@ 2015-12-03  0:15                           ` Tom Herbert
  2015-12-08  7:33                             ` John Fastabend
  2015-12-03  2:08                           ` Alexei Starovoitov
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-03  0:15 UTC (permalink / raw)
  To: John Fastabend
  Cc: Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend <john.fastabend@gmail.com> wrote:
> [...]
>
>>>
>>> I wonder why we need protocol generic offloads? I know there are
>>> currently a lot of overlay encapsulation protocols. Are there many more
>>> coming?
>>>
>> Yes, and assume that there are more coming with an unbounded limit
>> (for instance I just noticed today that there is a netdev1.1 talk on
>> supporting GTP in the kernel). Besides, this problem space not just
>> limited to offload of encapsulation protocols, but how to generalize
>> offload of any transport, IPv[46], application protocols, protocol
>> implemented in user space, security protocols, etc.
>>
>>> Besides, this offload is about TSO and RSS and they do need to parse the
>>> packet to get the information where the inner header starts. It is not
>>> only about checksum offloading.
>>>
>> RSS does not require the device to parse the inner header. All the UDP
>> encapsulations protocols being defined set the source port to entropy
>> flow value and most devices already support RSS+UDP (just needs to be
>> enabled) so this works just fine with dumb NICs. In fact, this is one
>> of the main motivations of encapsulating UDP in the first place, to
>> leverage existing RSS and ECMP mechanisms. The more general solution
>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>> flow label into the hash for ECMP and RSS, but once we have that much
>> of the motivation for using UDP goes away and we can get back to just
>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>> complexity of UDP encap).
>>
>>> Please provide a sketch up for a protocol generic api that can tell
>>> hardware where a inner protocol header starts that supports vxlan,
>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>> starting at that point.
>>>
>> BPF. Implementing protocol generic offloads are not just a HW concern
>> either, adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for applications protocols. If the
>> kernel provides a programmable framework for the offloads then
>> application protocols, such as QUIC, could use use that without
>> needing to hack the kernel to support the specific protocol (which no
>> one wants!). Application protocol parsing in KCM and some other use
>> cases of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API to
>> programmable offload.
>
> Just keying off the last statement there...
>
> I think BPF programs are going to be hard to translate into hardware
> for most devices. The problem is the BPF programs in general lack
> structure. A parse graph would be much more friendly for hardware or
> at minimum the BPF program would need to be a some sort of
> well-structured program so a driver could turn that into a parse graph.
>
This might be relevant:
http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf

> .John


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02 23:35                         ` John Fastabend
  2015-12-03  0:15                           ` Tom Herbert
@ 2015-12-03  2:08                           ` Alexei Starovoitov
  1 sibling, 0 replies; 94+ messages in thread
From: Alexei Starovoitov @ 2015-12-03  2:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: Tom Herbert, Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 02, 2015 at 03:35:53PM -0800, John Fastabend wrote:
> [...]
> > BPF. Implementing protocol generic offloads are not just a HW concern
> > either, adding kernel GRO code for every possible protocol that comes
> > along doesn't scale well. This becomes especially obvious when we
> > consider how to provide offloads for applications protocols. If the
> > kernel provides a programmable framework for the offloads then
> > application protocols, such as QUIC, could use use that without
> > needing to hack the kernel to support the specific protocol (which no
> > one wants!). Application protocol parsing in KCM and some other use
> > cases of BPF have already foreshadowed this, and we are working on a
> > prototype for a BPF programmable engine in the kernel. Presumably,
> > this same model could eventually be applied as the HW API to
> > programmable offload.
> 
> Just keying off the last statement there...
> 
> I think BPF programs are going to be hard to translate into hardware
> for most devices. The problem is the BPF programs in general lack
> structure. A parse graph would be much more friendly for hardware or
> at minimum the BPF program would need to be a some sort of
> well-structured program so a driver could turn that into a parse graph.

I'm looking at bpf as a way to describe the intent of what HW or SW has
to do. In the case of SW it's easy to JIT and execute, but a nic/switch
doesn't have to 'execute' bpf instructions. If it's FPGA based it can
compile the bpf program into parallel gates. Less flexible HW would not
be able to offload all programs. That's fine. Long term, flexible SW
will push HW to be flexible.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-02 19:15                       ` Tom Herbert
  2015-12-02 23:35                         ` John Fastabend
@ 2015-12-03 15:59                         ` Hannes Frederic Sowa
  2015-12-03 16:35                           ` Andreas Schultz
  2015-12-04 18:28                           ` Tom Herbert
  1 sibling, 2 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-03 15:59 UTC (permalink / raw)
  To: Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

Hi Tom,

On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
> >> That completely misses the whole point of the rest of this thread.
> >> Protocol specific offloads are what we are trying to discourage not
> >> encourage. Adding any more ndo functions for this purpose should be an
> >> exception, not the norm. The bar should be naturally high considering
> >> the cost of exposing this to ndo.
> >
> > Why?
> >
> > I wonder why we need protocol generic offloads? I know there are
> > currently a lot of overlay encapsulation protocols. Are there many more
> > coming?
> >
> Yes, and assume that there are more coming with an unbounded limit
> (for instance I just noticed today that there is a netdev1.1 talk on
> supporting GTP in the kernel). Besides, this problem space not just
> limited to offload of encapsulation protocols, but how to generalize
> offload of any transport, IPv[46], application protocols, protocol
> implemented in user space, security protocols, etc.

GTP seems to be a tunneling protocol also based on TCP; I hope the same
standards apply to it as to STT at that time (depending on the
implementation, of course). There are some other protocols on their
way, I see, but they can just be realized as kernel modules and that's it.

I am also not sure I can follow: some time ago the use of TOE (TCP
Offload Engine) was pretty much banished from entering the Linux
kernel. Has this really changed? It would be needed to do hardware
offloading of all other protocols inside TCP, no?

There are really a lot of tunneling protocols nowadays.

> > Besides, this offload is about TSO and RSS and they do need to parse the
> > packet to get the information where the inner header starts. It is not
> > only about checksum offloading.
> >
> RSS does not require the device to parse the inner header. All the UDP
> encapsulations protocols being defined set the source port to entropy
> flow value and most devices already support RSS+UDP (just needs to be
> enabled) so this works just fine with dumb NICs. In fact, this is one
> of the main motivations of encapsulating UDP in the first place, to
> leverage existing RSS and ECMP mechanisms. The more general solution
> is to use IPv6 flow label (RFC6438). We need HW support to include the
> flow label into the hash for ECMP and RSS, but once we have that much
> of the motivation for using UDP goes away and we can get back to just
> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
> complexity of UDP encap).

I do know that, but the fact is, the current drivers do it. I am
concerned about the amount of entropy in one single 16 bit field used
to distinguish flows. Flow labels are fine and good, but if current
hardware does not support them, that does not help. Imagine containers
with lots of applications; 16 bits doesn't seem to be enough here.

> > Please provide a sketch up for a protocol generic api that can tell
> > hardware where a inner protocol header starts that supports vxlan,
> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> > starting at that point.
> >
> BPF. Implementing protocol generic offloads are not just a HW concern
> either, adding kernel GRO code for every possible protocol that comes
> along doesn't scale well. This becomes especially obvious when we
> consider how to provide offloads for applications protocols. If the
> kernel provides a programmable framework for the offloads then
> application protocols, such as QUIC, could use use that without
> needing to hack the kernel to support the specific protocol (which no
> one wants!). Application protocol parsing in KCM and some other use
> cases of BPF have already foreshadowed this, and we are working on a
> prototype for a BPF programmable engine in the kernel. Presumably,
> this same model could eventually be applied as the HW API to
> programmable offload.

So your proposal is like this:

dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?

What do network cards do when they don't support bpf in hardware, as is
currently the case for all cards? Should they do program equivalence
testing on the bpf program to check if it conforms to some of their
offload capabilities and activate those for the port they parsed out of
the bpf program? I don't really care about more function pointers in
struct net_device_ops because it really doesn't matter, but what really
concerns me is the huge size of the drivers in the kernel. Just tell
the driver specifically what is wanted and let it do that. Don't force
drivers to do program inspection or anything.
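
A minimal sketch of the alternative, in plain C with made-up names
(toy_dev, toy_dev_ops and the helpers are not kernel API): one callback
per offload type, where a NULL pointer means "not supported", so the
caller can find that out without calling into the driver at all.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dev;

/* Hypothetical ops table: one optional callback per tunnel type. */
struct toy_dev_ops {
    int (*add_vxlan_port)(struct toy_dev *dev, uint16_t port);
    int (*add_geneve_port)(struct toy_dev *dev, uint16_t port);
};

struct toy_dev {
    const struct toy_dev_ops *ops;
    uint16_t vxlan_port;    /* last port programmed, for the example */
};

/* Capability check without calling into the driver. */
static int toy_supports_geneve(const struct toy_dev *dev)
{
    return dev->ops->add_geneve_port != NULL;
}

static int toy_add_vxlan_port(struct toy_dev *dev, uint16_t port)
{
    if (!dev->ops->add_vxlan_port)
        return -EOPNOTSUPP;
    return dev->ops->add_vxlan_port(dev, port);
}

static int toy_add_geneve_port(struct toy_dev *dev, uint16_t port)
{
    if (!dev->ops->add_geneve_port)
        return -EOPNOTSUPP;
    return dev->ops->add_geneve_port(dev, port);
}

/* Example driver that only knows VXLAN. */
static int vxlan_only_add(struct toy_dev *dev, uint16_t port)
{
    dev->vxlan_port = port;
    return 0;
}

static const struct toy_dev_ops vxlan_only_ops = {
    .add_vxlan_port = vxlan_only_add,
    /* .add_geneve_port left NULL: geneve offload not supported */
};
```

The driver is told exactly which tunnel type and port is wanted and
never has to inspect a program to guess the intent.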

About your argument regarding GRO for every possible protocol:

Adding GRO for QUIC or SPUD transparently does not work as it breaks the
semantics of UDP. UDP is a framed protocol, not a streamed one, so it
does not make sense to add that. You can implement GRO for fragmented
UDP, though. The length of the packet is end-to-end information. If you
add a new protocol with a new socket type, sure, you can add a GRO
engine transparently for that, but not by simply peeking at data inside
UDP if you don't know how the local application uses this data. In the
case of forwarding you can never do that; it will actually break the
internet. If you are the end host, the GRO engine can ask the socket
what type it is or what framing inside UDP is used. Thus this cannot
work in hardware either.

I am not very happy with the use cases of BPF outside of tracing and
cls_bpf and packet steering.

Please don't propose that we should use BPF as the API for HW
programmable offloading currently. It does not make sense.

Bye,
Hannes


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-03 15:59                         ` Hannes Frederic Sowa
@ 2015-12-03 16:35                           ` Andreas Schultz
  2015-12-03 16:43                             ` Hannes Frederic Sowa
  2015-12-04 18:28                           ` Tom Herbert
  1 sibling, 1 reply; 94+ messages in thread
From: Andreas Schultz @ 2015-12-03 16:35 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil



On 12/03/2015 04:59 PM, Hannes Frederic Sowa wrote:
> Hi Tom,
>
> On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:
>> On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
>>>> That completely misses the whole point of the rest of this thread.
>>>> Protocol specific offloads are what we are trying to discourage not
>>>> encourage. Adding any more ndo functions for this purpose should be an
>>>> exception, not the norm. The bar should be naturally high considering
>>>> the cost of exposing this to ndo.
>>>
>>> Why?
>>>
>>> I wonder why we need protocol generic offloads? I know there are
>>> currently a lot of overlay encapsulation protocols. Are there many more
>>> coming?
>>>
>> Yes, and assume that there are more coming with an unbounded limit
>> (for instance I just noticed today that there is a netdev1.1 talk on
>> supporting GTP in the kernel). Besides, this problem space not just
>> limited to offload of encapsulation protocols, but how to generalize
>> offload of any transport, IPv[46], application protocols, protocol
>> implemented in user space, security protocols, etc.
>
> GTP seems to be a tunneling protocol also based on TCP, I hope the same
> standards apply to it as STT at that time (depending on the
> implementation, of course). There are some other protocols on its way, I
> see but they can just be realized as kernel modules and that's it.

GTP is UDP based. The standard permits a variable length header (one can
add extensions after a fixed header), but that is seldom (or even never)
used. Tunnels are identified by a 32-bit tunnel endpoint id for GTPv1 and
a 64-bit flow id for GTPv0. UDP destination ports differ for v1 and v0,
so it's easy to distinguish.
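
As a rough sketch in C: the version can be picked from the UDP
destination port, and for GTPv1 the tunnel endpoint id sits in bytes
4..7 of the fixed 8-byte header. The port numbers (2152 for GTPv1-U,
3386 for GTPv0) are the registered ones as far as I know; the function
names are made up and the optional fields and extension headers are
not handled.

```c
#include <stddef.h>
#include <stdint.h>

#define GTP0_PORT   3386    /* GTPv0 */
#define GTP1U_PORT  2152    /* GTPv1 user plane */

/* Hypothetical helper: GTP version implied by the UDP dst port, or -1. */
static int gtp_version_from_port(uint16_t dport)
{
    switch (dport) {
    case GTP0_PORT:  return 0;
    case GTP1U_PORT: return 1;
    default:         return -1;
    }
}

/*
 * Hypothetical helper: read the 32-bit TEID from a GTPv1 header.
 * Layout of the fixed part: flags (version in the top 3 bits),
 * message type, 16-bit length, 32-bit TEID.
 * Returns 0 on success, -1 if the header is short or not version 1.
 */
static int gtp1_teid(const uint8_t *hdr, size_t len, uint32_t *teid)
{
    if (len < 8 || (hdr[0] >> 5) != 1)
        return -1;

    *teid = ((uint32_t)hdr[4] << 24) | ((uint32_t)hdr[5] << 16) |
            ((uint32_t)hdr[6] << 8)  |  (uint32_t)hdr[7];
    return 0;
}
```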

The biggest pain when implementing GTP are the path maintenance procedures.
But this really has nothing to do with offloads.

Andreas

> I am also not sure I can follow, some time ago the use of TOE (TCP
> Offload Engine) were pretty much banished from entering the linux
> kernel, has this really changed? It would be needed to do hardware
> offloading of all other protocols inside TCP, no?
>
> There are really a lot of tunneling protocols nowadays.
>
>>> Besides, this offload is about TSO and RSS and they do need to parse the
>>> packet to get the information where the inner header starts. It is not
>>> only about checksum offloading.
>>>
>> RSS does not require the device to parse the inner header. All the UDP
>> encapsulations protocols being defined set the source port to entropy
>> flow value and most devices already support RSS+UDP (just needs to be
>> enabled) so this works just fine with dumb NICs. In fact, this is one
>> of the main motivations of encapsulating UDP in the first place, to
>> leverage existing RSS and ECMP mechanisms. The more general solution
>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>> flow label into the hash for ECMP and RSS, but once we have that much
>> of the motivation for using UDP goes away and we can get back to just
>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>> complexity of UDP encap).
>
> I do know that, but fact is, the current drivers do it. I am concerned
> about the amount of entropy in one single 16 bit field used to
> distinguish flows. Flow labels fine and good, but if current hardware
> does not support it, it does not help. Imagine containers with lots of
> applications, 16 bit doesn't seem to fit here.
>
>>> Please provide a sketch up for a protocol generic api that can tell
>>> hardware where a inner protocol header starts that supports vxlan,
>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>> starting at that point.
>>>
>> BPF. Implementing protocol generic offloads are not just a HW concern
>> either, adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for applications protocols. If the
>> kernel provides a programmable framework for the offloads then
>> application protocols, such as QUIC, could use use that without
>> needing to hack the kernel to support the specific protocol (which no
>> one wants!). Application protocol parsing in KCM and some other use
>> cases of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API to
>> programmable offload.
>
> So your proposal is like this:
>
> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>
> What do network cards do when they don't support bpf in hardware as
> currently all cards. Should they do program equivalence testing on the
> bpf program to check if it conforms some of its offload capabilities and
> activate those for the port they parsed out of the bpf program? I don't
> really care about more function pointers in struct net_device_ops
> because it really doesn't matter but what really concerns me is the huge
> size of the drivers in the kernel. Just tell the driver specifically
> what is wanted and let them do that. Don't force them to do program
> inspection or anything.
>
> About your argument regarding GRO for every possible protocol:
>
> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
> semantics of UDP. UDP is a framed protocol not a streamed one so it does
> not make sense to add that. You can implement GRO for fragmented UDP,
> though. The length of the packet is end-to-end information. If you add a
> new protocol with a new socket type, sure you can add GRO engine
> transparently for that but not simply peeking data inside UDP if you
> don't know how the local application uses this data. In case of
> forwarding you can never do that, it will break the internet actually.
> In case you are the end host GRO engine can ask the socket what type it
> is or what framing inside UDP is used. Thus this cannot work on hardware
> either.
>
> I am not very happy with the use cases of BPF outside of tracing and
> cls_bpf and packet steering.
>
> Please don't propose that we should use BPF as the API for HW
> programmable offloading currently. It does not make sense.
>
> Bye,
> Hannes
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-03 16:35                           ` Andreas Schultz
@ 2015-12-03 16:43                             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-03 16:43 UTC (permalink / raw)
  To: Andreas Schultz, Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Thu, Dec 3, 2015, at 17:35, Andreas Schultz wrote:
> On 12/03/2015 04:59 PM, Hannes Frederic Sowa wrote:
> > Hi Tom,
> >
> > On Wed, Dec 2, 2015, at 20:15, Tom Herbert wrote:
> >> On Wed, Dec 2, 2015 at 8:35 AM, Hannes Frederic Sowa
> >> <hannes@stressinduktion.org> wrote:
> >>> On Wed, Dec 2, 2015, at 04:50, Tom Herbert wrote:
> >>>> That completely misses the whole point of the rest of this thread.
> >>>> Protocol specific offloads are what we are trying to discourage not
> >>>> encourage. Adding any more ndo functions for this purpose should be an
> >>>> exception, not the norm. The bar should be naturally high considering
> >>>> the cost of exposing this to ndo.
> >>>
> >>> Why?
> >>>
> >>> I wonder why we need protocol generic offloads? I know there are
> >>> currently a lot of overlay encapsulation protocols. Are there many more
> >>> coming?
> >>>
> >> Yes, and assume that there are more coming with an unbounded limit
> >> (for instance I just noticed today that there is a netdev1.1 talk on
> >> supporting GTP in the kernel). Besides, this problem space not just
> >> limited to offload of encapsulation protocols, but how to generalize
> >> offload of any transport, IPv[46], application protocols, protocol
> >> implemented in user space, security protocols, etc.
> >
> > GTP seems to be a tunneling protocol also based on TCP, I hope the same
> > standards apply to it as STT at that time (depending on the
> > implementation, of course). There are some other protocols on its way, I
> > see but they can just be realized as kernel modules and that's it.
> 
> GTP is UDP based. The standard permits a variable length header (one can
> add extensions after a fixed header), but that is seldom (or even never)
> used. Tunnel are identified by a 32bit tunnel endpoint id for GTPv1 and
> a 64bit flow id for GTPv0. UDP destination ports differ for v1 and v0,
> so it's easy to distinguish.

Ok, thanks for letting me know. The Wikipedia article I skimmed first
mentioned both TCP and UDP, but I see that v1 only uses UDP.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-03 15:59                         ` Hannes Frederic Sowa
  2015-12-03 16:35                           ` Andreas Schultz
@ 2015-12-04 18:28                           ` Tom Herbert
  2015-12-04 19:54                             ` John Fastabend
  2015-12-04 19:59                             ` Hannes Frederic Sowa
  1 sibling, 2 replies; 94+ messages in thread
From: Tom Herbert @ 2015-12-04 18:28 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

> I do know that, but fact is, the current drivers do it. I am concerned
> about the amount of entropy in one single 16 bit field used to
> distinguish flows. Flow labels fine and good, but if current hardware
> does not support it, it does not help. Imagine containers with lots of
> applications, 16 bit doesn't seem to fit here.
>
Based on what? RSS indirection table is only seven bits so even 16
bits would be overkill for that. Please provide a concrete example,
data where 16 bits wouldn't be sufficient.
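
To make the seven-bit point concrete, here is a small sketch of an
indirection table lookup. It assumes a 128-entry table indexed by the
low seven bits of the hash, which is one common arrangement and not a
statement about any particular NIC; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RSS_INDIR_BITS	7u
#define RSS_INDIR_SIZE	(1u << RSS_INDIR_BITS)	/* 128 entries */

/* Spread n_queues receive queues over the table round-robin. */
static void rss_indir_fill(uint8_t table[RSS_INDIR_SIZE],
			   unsigned int n_queues)
{
	unsigned int i;

	assert(n_queues > 0 && n_queues <= 256);
	for (i = 0; i < RSS_INDIR_SIZE; i++)
		table[i] = (uint8_t)(i % n_queues);
}

/* Only the low RSS_INDIR_BITS bits of the hash choose the queue. */
static unsigned int rss_pick_queue(const uint8_t table[RSS_INDIR_SIZE],
				   uint32_t hash)
{
	return table[hash & (RSS_INDIR_SIZE - 1u)];
}
```

Whatever entropy goes into the hash, two flows whose hashes agree in
those seven bits land on the same queue.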

>> > Please provide a sketch up for a protocol generic api that can tell
>> > hardware where a inner protocol header starts that supports vxlan,
>> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>> > starting at that point.
>> >
>> BPF. Implementing protocol generic offloads are not just a HW concern
>> either, adding kernel GRO code for every possible protocol that comes
>> along doesn't scale well. This becomes especially obvious when we
>> consider how to provide offloads for applications protocols. If the
>> kernel provides a programmable framework for the offloads then
>> application protocols, such as QUIC, could use use that without
>> needing to hack the kernel to support the specific protocol (which no
>> one wants!). Application protocol parsing in KCM and some other use
>> cases of BPF have already foreshadowed this, and we are working on a
>> prototype for a BPF programmable engine in the kernel. Presumably,
>> this same model could eventually be applied as the HW API to
>> programmable offload.
>
> So your proposal is like this:
>
> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>
> What do network cards do when they don't support bpf in hardware as
> currently all cards. Should they do program equivalence testing on the
> bpf program to check if it conforms some of its offload capabilities and
> activate those for the port they parsed out of the bpf program? I don't
> really care about more function pointers in struct net_device_ops
> because it really doesn't matter but what really concerns me is the huge
> size of the drivers in the kernel. Just tell the driver specifically
> what is wanted and let them do that. Don't force them to do program
> inspection or anything.
>
Nobody is forcing anyone to do anything. If someone implements generic
offload like this it's treated just like any other optional feature of
a NIC.

> About your argument regarding GRO for every possible protocol:
>
> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
> semantics of UDP. UDP is a framed protocol not a streamed one so it does
> not make sense to add that. You can implement GRO for fragmented UDP,
> though. The length of the packet is end-to-end information. If you add a
> new protocol with a new socket type, sure you can add GRO engine
> transparently for that but not simply peeking data inside UDP if you
> don't know how the local application uses this data. In case of
> forwarding you can never do that, it will break the internet actually.
> In case you are the end host GRO engine can ask the socket what type it
> is or what framing inside UDP is used. Thus this cannot work on hardware
> either.
>
This is not correct. We already have many instances of GRO being used
over UDP in several UDP encapsulations; there is no issue with
breaking UDP semantics. QUIC is a stream-based transport like TCP, so
it will fit into the model (granted, the fact that this is coming from
userspace and the per-packet security will make it a little more
challenging to implement offload). I don't know if this is needed, but
I can only assume that server performance in QUIC must be miserable if
all the I/O is 1350 bytes.

> I am not very happy with the use cases of BPF outside of tracing and
> cls_bpf and packet steering.
>
> Please don't propose that we should use BPF as the API for HW
> programmable offloading currently. It does not make sense.
>
If you have an alternative, please propose it now.

> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 18:28                           ` Tom Herbert
@ 2015-12-04 19:54                             ` John Fastabend
  2015-12-04 19:59                             ` Hannes Frederic Sowa
  1 sibling, 0 replies; 94+ messages in thread
From: John Fastabend @ 2015-12-04 19:54 UTC (permalink / raw)
  To: Tom Herbert, Hannes Frederic Sowa
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

[...]

>>>> Please provide a sketch up for a protocol generic api that can tell
>>>> hardware where a inner protocol header starts that supports vxlan,
>>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>>> starting at that point.
>>>>
>>> BPF. Implementing protocol generic offloads are not just a HW concern
>>> either, adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for applications protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> So your proposal is like this:
>>
>> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>>
>> What do network cards do when they don't support bpf in hardware as
>> currently all cards. Should they do program equivalence testing on the
>> bpf program to check if it conforms some of its offload capabilities and
>> activate those for the port they parsed out of the bpf program? I don't
>> really care about more function pointers in struct net_device_ops
>> because it really doesn't matter but what really concerns me is the huge
>> size of the drivers in the kernel. Just tell the driver specifically
>> what is wanted and let them do that. Don't force them to do program
>> inspection or anything.
>>
> Nobody is forcing anyone to do anything. If someone implements generic
> offload like this it's treated just like any other optional feature of
> a NIC.
> 

My concern with this approach is that it seems to imply either you have
a BPF engine in hardware (via FPGA or NPU) or you do a program
transformation of a BPF program into registers, possibly by building
the control flow graph and mapping that onto a parse graph. Maybe
this could be done in some library code for drivers to use, but it
seems a bit unnecessary to me when we could make an API map to this
class of hardware.

Note I think an ndo_add_offload is really useful and needed for
one class of devices but misses the mark slightly for a large class of
devices we have today/tomorrow.

>> About your argument regarding GRO for every possible protocol:
>>
>> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
>> semantics of UDP. UDP is a framed protocol not a streamed one so it does
>> not make sense to add that. You can implement GRO for fragmented UDP,
>> though. The length of the packet is end-to-end information. If you add a
>> new protocol with a new socket type, sure you can add GRO engine
>> transparently for that but not simply peeking data inside UDP if you
>> don't know how the local application uses this data. In case of
>> forwarding you can never do that, it will break the internet actually.
>> In case you are the end host GRO engine can ask the socket what type it
>> is or what framing inside UDP is used. Thus this cannot work on hardware
>> either.
>>
> This is not correct, We already have many instances of GRO being used
> over UDP in several UDP encapsulations, there is no issue with
> breaking UDP semantics. QUIC is a stream based transport like TCP so
> it will fit into the model (granted the fact that this incoming from
> userspace and the per packet security will make it little more
> challenging to implement offload). I don't know if this is needed, but
> I can only assume that server performance in QUIC must be miserable if
> all the I/O is 1350 bytes.
> 
>> I am not very happy with the use cases of BPF outside of tracing and
>> cls_bpf and packet steering.
>>
>> Please don't propose that we should use BPF as the API for HW
>> programmable offloading currently. It does not make sense.
>>
> If you have an alternative, please propose it now.

My proposal is still the Flow API I proposed back in Feb. It maps
well to at least the segment of hardware that exists today and/or will
exist in the very near future. It also requires less mangling by the
driver, kernel, etc.

As a reminder here are the operations I proposed,

On the read-only side for parse graphs,

 get_hdrs : returns a list of header types supported
 get_hdr_graph : returns a parse graph of the header types
 get_actions : returns a list of actions the device supports on nodes
	       in the above graph.

Then I also proposed some operations for reading out table formats
but I think you could ignore that for the time being if your main
concern is parsing headers for queue mappings, RSS, etc. These were
more about building pipelines of operations. For completeness the
operations were get_tbls and get_tbl_graph.

Further, although I didn't propose them in the talk, (a) because the
hardware wasn't ready and (b) because rocker, which was my prototype
vehicle at the time, could not support them, there could be write ops
as well, such as,

  set_hdrs : push a list of header types to support
  set_hdr_graph : push a parse graph to support

If you wanted, the hdrs and hdr_graph ops could be merged into a single
operation, but I found it easier to deal with two separate operations.

This would (I think at least) easily support NICs that don't have a
more general purpose engine like BPF or some other instruction set
but do support generic parsers. I think this is the trend that we will
see. It's a big jump to go from fixed logic to an instruction set; it's
much more manageable on the hardware side to go from fixed logic to
a generic parser engine.
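
To illustrate what a parse graph of header types could look like, here
is a rough sketch. It is not the Flow API's actual data layout: the
struct names, the enum, and the walker are all hypothetical, IPv4
options are ignored, and the only tunnel modelled is VXLAN on UDP port
4789. Each node carries a header length and a selector field, and each
edge says which header follows for a given selector value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header ids; HDR_NONE terminates a walk. */
enum hdr_id { HDR_ETH, HDR_IPV4, HDR_UDP, HDR_VXLAN, HDR_INNER_ETH, HDR_NONE };

struct hdr_edge {
	uint32_t value;		/* selector value that takes this edge */
	enum hdr_id next;	/* header type that follows */
};

struct hdr_node {
	size_t len;		/* header length in bytes (fixed here) */
	size_t sel_off;		/* offset of the selector field in the header */
	size_t sel_len;		/* selector width in bytes; 0 = unconditional */
	const struct hdr_edge *edges;
	size_t n_edges;
};

static const struct hdr_edge eth_edges[]   = { { 0x0800, HDR_IPV4 } };
static const struct hdr_edge ipv4_edges[]  = { { 17, HDR_UDP } };
static const struct hdr_edge udp_edges[]   = { { 4789, HDR_VXLAN } };
static const struct hdr_edge vxlan_edges[] = { { 0, HDR_INNER_ETH } };

/* Indexed by enum hdr_id. */
static const struct hdr_node vxlan_graph[HDR_NONE] = {
	[HDR_ETH]       = { 14, 12, 2, eth_edges, 1 },
	[HDR_IPV4]      = { 20,  9, 1, ipv4_edges, 1 },
	[HDR_UDP]       = {  8,  2, 2, udp_edges, 1 },
	[HDR_VXLAN]     = {  8,  0, 0, vxlan_edges, 1 },
	[HDR_INNER_ETH] = { 14,  0, 0, NULL, 0 },
};

/* Walk the graph from HDR_ETH; return the byte offset of target or -1. */
static long hdr_graph_find(const struct hdr_node *graph, enum hdr_id target,
			   const uint8_t *pkt, size_t pkt_len)
{
	enum hdr_id cur = HDR_ETH;
	size_t off = 0;

	assert(graph != NULL && pkt != NULL);
	while (cur != HDR_NONE) {
		const struct hdr_node *n = &graph[cur];
		enum hdr_id next = HDR_NONE;
		uint32_t sel = 0;
		size_t i;

		if (cur == target)
			return (long)off;
		if (off + n->len > pkt_len)
			return -1;
		/* Load the selector field big-endian. */
		for (i = 0; i < n->sel_len; i++)
			sel = (sel << 8) | pkt[off + n->sel_off + i];
		for (i = 0; i < n->n_edges; i++) {
			if (n->sel_len == 0 || n->edges[i].value == sel) {
				next = n->edges[i].next;
				break;
			}
		}
		off += n->len;
		cur = next;
	}
	return -1;
}
```

A table like vxlan_graph is the kind of thing a set_hdr_graph style op
could hand to a generic parser; supporting another encapsulation would
then mean adding nodes and edges instead of adding code.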

If we insist on BPF programs I don't see how to avoid doing the BPF
to CFG conversion and mapping that onto a parse graph to support the
class of devices I am looking at supporting. Perhaps the argument is
that this isn't horrible to do, but I would ask that, if we go that
route, we make that mapping in the core kernel code and expose the
above ndo ops to the driver.

Maybe when I was talking at netconf/netdev0.1 I (others?) got hung
up on how the above ops related to switchdev or to offloading some
feature of the kernel. Specifically, the set_flow piece, which let
users push flow rules into the hardware similar to the ntuple ethtool
case, was controversial. But you don't necessarily need to include the
set_flow part to make it useful for loading parse graphs into the
hardware.

Thanks
.John



> 
>> Bye,
>> Hannes
> 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 18:28                           ` Tom Herbert
  2015-12-04 19:54                             ` John Fastabend
@ 2015-12-04 19:59                             ` Hannes Frederic Sowa
  2015-12-04 20:02                               ` Hannes Frederic Sowa
  2015-12-04 20:06                               ` David Miller
  1 sibling, 2 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-04 19:59 UTC (permalink / raw)
  To: Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

Hi Tom,

On Fri, Dec 4, 2015, at 19:28, Tom Herbert wrote:
> > I do know that, but fact is, the current drivers do it. I am concerned
> > about the amount of entropy in one single 16 bit field used to
> > distinguish flows. Flow labels fine and good, but if current hardware
> > does not support it, it does not help. Imagine containers with lots of
> > applications, 16 bit doesn't seem to fit here.
> >
> Based on what? RSS indirection table is only seven bits so even 16
> bits would be overkill for that. Please provide a concrete example,
> data where 16 bits wouldn't be sufficient.

I don't have concrete evidence: I just noticed that drivers already
implement RSS based on the data we push them over the vxlan offloading
ndos. This patchset achieves the same for geneve. Also, if people would
like to implement ntuple filtering within encapsulated packets on the
NIC, this is a requirement. I agree this is a bit far-fetched, but
losing this capability right now doesn't seem worthwhile, and neither
does stopping new protocols from being deployed in this manner with
specific offloads.

Also I am not sure that hardware only provides a 7 bit indirection
table; that would limit it to 128 receive queues/cores to steer packets
to, but I absolutely don't know, it is just a guess.
Given that FM10K has 256 max queues, I could imagine they also use a
larger indirection table, no? But yeah, obviously this would still be
enough. This also very much depends on the hash function used by the
hardware, probably the Toeplitz hash, and the distribution thereof. I
would need to do more research on this and check out biases.
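
For reference, a bit-serial sketch of the Toeplitz hash mentioned
above. The 40-byte key is the example key commonly quoted in RSS
documentation, reproduced here on a best-effort basis; any key of at
least len + 4 bytes would do. The function name is made up, and real
hardware may differ in which header fields it feeds in and in what
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example key; treat it as arbitrary test data. */
static const uint8_t rss_example_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * For every set bit in the input, XOR in the 32-bit window of the key
 * that starts at the same bit position. Key bits past the end of the
 * key are treated as zero.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t next_key_bit = 32;
	size_t i;
	int bit;

	assert(key != NULL && key_len >= 4);
	assert(data != NULL || len == 0);

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window <<= 1;
			if (next_key_bit < key_len * 8 &&
			    (key[next_key_bit / 8] &
			     (0x80u >> (next_key_bit % 8))))
				window |= 1u;
			next_key_bit++;
		}
	}
	return hash;
}
```

The hash is linear over XOR, so checking for biases comes down to how
the input bits of real traffic vary, which is the research mentioned
above.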

> >> > Please provide a sketch up for a protocol generic api that can tell
> >> > hardware where a inner protocol header starts that supports vxlan,
> >> > vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
> >> > starting at that point.
> >> >
> >> BPF. Implementing protocol generic offloads are not just a HW concern
> >> either, adding kernel GRO code for every possible protocol that comes
> >> along doesn't scale well. This becomes especially obvious when we
> >> consider how to provide offloads for applications protocols. If the
> >> kernel provides a programmable framework for the offloads then
> >> application protocols, such as QUIC, could use use that without
> >> needing to hack the kernel to support the specific protocol (which no
> >> one wants!). Application protocol parsing in KCM and some other use
> >> cases of BPF have already foreshadowed this, and we are working on a
> >> prototype for a BPF programmable engine in the kernel. Presumably,
> >> this same model could eventually be applied as the HW API to
> >> programmable offload.
> >
> > So your proposal is like this:
> >
> > dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
> >
> > What do network cards do when they don't support bpf in hardware as
> > currently all cards. Should they do program equivalence testing on the
> > bpf program to check if it conforms some of its offload capabilities and
> > activate those for the port they parsed out of the bpf program? I don't
> > really care about more function pointers in struct net_device_ops
> > because it really doesn't matter but what really concerns me is the huge
> > size of the drivers in the kernel. Just tell the driver specifically
> > what is wanted and let them do that. Don't force them to do program
> > inspection or anything.
> >
> Nobody is forcing anyone to do anything. If someone implements generic
> offload like this it's treated just like any other optional feature of
> a NIC.

Yes, I agree, I am totally with you here. If generic offloading can be
realized by NICs I am totally with you that this should be the way to
go. I don't see that coming in the next (small number of) years, so I
don't see a reason to stop this patchset. (Or the more specific one
posted recently.)

All protocols can try to push down their offloading needs to the NIC via
a special generic ndo op hopefully in the future. But hardware currently
doesn't support that, so I can understand why this patchset implements
more specific offloads for specific IETF drafts.

I favor the new ndo op slightly more which is implemented in the new
patch set.

> > About your argument regarding GRO for every possible protocol:
> >
> > Adding GRO for QUIC or SPUD transparently does not work as it breaks the
> > semantics of UDP. UDP is a framed protocol not a streamed one so it does
> > not make sense to add that. You can implement GRO for fragmented UDP,
> > though. The length of the packet is end-to-end information. If you add a
> > new protocol with a new socket type, sure you can add GRO engine
> > transparently for that but not simply peeking data inside UDP if you
> > don't know how the local application uses this data. In case of
> > forwarding you can never do that, it will break the internet actually.
> > In case you are the end host GRO engine can ask the socket what type it
> > is or what framing inside UDP is used. Thus this cannot work on hardware
> > either.
> >
> This is not correct, We already have many instances of GRO being used
> over UDP in several UDP encapsulations, there is no issue with
> breaking UDP semantics. QUIC is a stream based transport like TCP so
> it will fit into the model (granted the fact that this incoming from
> userspace and the per packet security will make it little more
> challenging to implement offload). I don't know if this is needed, but
> I can only assume that server performance in QUIC must be miserable if
> all the I/O is 1350 bytes.

In the case of fou offloading the kernel specifically lets the GRO
engine know which port it should look out for and on which it is
allowed to aggregate frames. As I said, if you have this information it
is totally possible to do that. This means that user space also has to
push this information into the kernel on every forwarding host. For
QUIC as a non-transport but an end-to-end protocol this seems much more
difficult to me, as you don't know which port numbers are used by the
end applications or by the users. So generic offloading like we do for
TCP does not work. If you know the context of the applications and
their specific port numbers, this can work, but it must be synchronized
with the user application's sockets, maybe even over the network.

> > I am not very happy with the use cases of BPF outside of tracing and
> > cls_bpf and packet steering.
> >
> > Please don't propose that we should use BPF as the API for HW
> > programmable offloading currently. It does not make sense.
> >
> If you have an alternative, please propose it now.

I don't, that is the problem.

I tried to come up with a way to describe offloads like:

<<pseudocode>>

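struct field {
    unsigned int offset;    /* from offsetof() on the header struct */
    unsigned int length;    /* from sizeof() on the header member */
    unsigned int mask;      /* bitmask applied to the field */
};

struct offload_config {
    struct field proto_id[whatever];
    struct field length;
    struct field next_protocol;
    struct field port;
};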

(For vxlan-gpe or nsh a custom mapping of protocol ids would need to be
specified (to know the header therein).)

And then filling out those fields using the offsetof and sizeof of the
headers, but this seemed to be very difficult a) because they use
bitmasks (which of course could be converted) or in case of IPv6 a
schema would have to be specified how to walk down the IPv6 extensions.
This seems also to be true for NSH. Maybe gcc could help with
compile-time introspection with bitfields in the future but I doubt that
for now. Duplicating and maintaining two header structs for one
tunneling protoco

But looking at vxlan, vxlan-gpe, fou, geneve and ipv6 extensions this
seemed not to be possible without extra code. This was also the
conclusion from trying to add a way for user space to access NIC
descriptors in a generic way. Without code this didn't seem feasible.
So for the time being I can understand why specific offloads are
proposed and should be accepted. As soon as NICs allow uploading
parsing trees or bpf(-like) code I am all in for that!
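
For what it's worth, here is roughly how filling one such field
descriptor with offsetof and sizeof could look for the simple case of a
VXLAN VNI. The header struct is a local stand-in written for this
example (not the kernel's definition), and field_read is a hypothetical
helper. It also shows where the approach stops: a fixed offset plus
mask works here, but says nothing about walking IPv6 extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the pseudocode struct above. */
struct field {
	unsigned int offset;	/* byte offset into the header */
	unsigned int length;	/* field width in bytes (1..4) */
	unsigned int mask;	/* bits kept after the big-endian load */
};

/* Local stand-in: flags + reserved word, then VNI + reserved word. */
struct vxlan_hdr_example {
	uint8_t flags_and_reserved[4];
	uint8_t vni_and_reserved[4];
};

/* The 24-bit VNI sits in the upper three bytes of the second word. */
static const struct field vxlan_vni_field = {
	.offset = offsetof(struct vxlan_hdr_example, vni_and_reserved),
	.length = sizeof(((struct vxlan_hdr_example *)0)->vni_and_reserved),
	.mask   = 0xffffff00u,
};

/* Load a field big-endian from hdr and apply its mask. */
static uint32_t field_read(const struct field *f, const uint8_t *hdr,
			   size_t hdr_len)
{
	uint32_t v = 0;
	unsigned int i;

	assert(f != NULL && hdr != NULL);
	assert(f->length >= 1 && f->length <= 4);
	assert((size_t)f->offset + f->length <= hdr_len);

	for (i = 0; i < f->length; i++)
		v = (v << 8) | hdr[f->offset + i];
	return v & f->mask;
}
```

Shifting the result right by eight bits yields the VNI itself.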

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 19:59                             ` Hannes Frederic Sowa
@ 2015-12-04 20:02                               ` Hannes Frederic Sowa
  2015-12-04 20:06                               ` David Miller
  1 sibling, 0 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-04 20:02 UTC (permalink / raw)
  To: Tom Herbert
  Cc: John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil



On Fri, Dec 4, 2015, at 20:59, Hannes Frederic Sowa wrote:
> And then filling out those fields using the offsetof and sizeof of the
> headers, but this seemed to be very difficult a) because they use
> bitmasks (which of course could be converted) or in case of IPv6 a
> schema would have to be specified how to walk down the IPv6 extensions.
> This seems also to be true for NSH. Maybe gcc could help with
> compile-time introspection with bitfields in the future but I doubt that
> for now. Duplicating and maintaining two header structs for one
> tunneling protoco

Seems like I accidentally removed something here:

I wanted to write that maintaining multiple descriptions of tunneling
headers seems not worthwhile for now.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 19:59                             ` Hannes Frederic Sowa
  2015-12-04 20:02                               ` Hannes Frederic Sowa
@ 2015-12-04 20:06                               ` David Miller
  2015-12-04 20:13                                 ` Tom Herbert
                                                   ` (2 more replies)
  1 sibling, 3 replies; 94+ messages in thread
From: David Miller @ 2015-12-04 20:06 UTC (permalink / raw)
  To: hannes; +Cc: tom, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri, 04 Dec 2015 20:59:05 +0100

> Yes, I agree, I am totally with you here. If generic offloading can be
> realized by NICs I am totally with you that this should be the way to
> go. I don't see that coming in the next (small number of) years, so I
> don't see a reason to stop this patchset.

If I just apply this and say "yeah ok", the message is completely lost
and your prediction about "small number of years" indeed will occur.

However if I push back hard on this, as I will, then the message has
some chance of seeping back to the people designing these chips.

So that's what I'm going to do, like it or not.

Or can someone convince me that someone who understands this stuff
is telling the hardware guys to universally put 2's complement
checksums into the descriptors?

Who is doing that at each and every prominent ethernet hardware
vendor?

Who?

If I get silence, or some vague non-specific response, then I'm going
to hold my ground and keep pushing back on this stuff.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:06                               ` David Miller
@ 2015-12-04 20:13                                 ` Tom Herbert
  2015-12-04 21:37                                   ` David Miller
  2015-12-04 20:26                                 ` Hannes Frederic Sowa
  2015-12-04 22:44                                 ` Alexander Duyck
  2 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-04 20:13 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Linux Kernel Network Developers,
	Kiran Patil

On Fri, Dec 4, 2015 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri, 04 Dec 2015 20:59:05 +0100
>
>> Yes, I agree, I am totally with you here. If generic offloading can be
>> realized by NICs I am totally with you that this should be the way to
>> go. I don't see that coming in the next (small number of) years, so I
>> don't see a reason to stop this patchset.
>
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.
>
> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
>
> So that's what I'm going to do, like it or not.
>
> Or can someone convince me that someone who understand this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
>
We're talking about 1's complement checksum (RFC1071). Just to be clear :-)
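
A small sketch of why a plain 1's complement sum over the packet is
protocol agnostic, following the RFC 1071 arithmetic. The helper names
are invented for this example and are not the kernel's csum helpers.
Given the sum over the whole buffer, the sum over any even-aligned
prefix (the outer headers, say) can be subtracted to get the sum over
what follows, without the device knowing what the headers are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in until the value fits in 16 bits. */
static uint16_t oc_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* 1's complement sum of big-endian 16-bit words (not inverted). */
static uint16_t oc_sum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte */
	return oc_fold(sum);
}

/*
 * 1's complement subtraction: a - b is a + ~b. Note that zero has two
 * encodings (0x0000 and 0xffff), so compare results with that in mind.
 */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
	return oc_fold((uint64_t)a + (uint16_t)~b);
}
```

With the eight example bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7),
oc_sum gives 0xddf2, and subtracting the sum of the first four bytes
gives the sum of the last four.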

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:06                               ` David Miller
  2015-12-04 20:13                                 ` Tom Herbert
@ 2015-12-04 20:26                                 ` Hannes Frederic Sowa
  2015-12-04 20:43                                   ` Tom Herbert
  2015-12-04 20:44                                   ` Jesse Gross
  2015-12-04 22:44                                 ` Alexander Duyck
  2 siblings, 2 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-04 20:26 UTC (permalink / raw)
  To: David Miller; +Cc: tom, linville, jesse, anjali.singhai, netdev, kiran.patil

Hi Dave,

On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri, 04 Dec 2015 20:59:05 +0100
> 
> > Yes, I agree, I am totally with you here. If generic offloading can be
> > realized by NICs I am totally with you that this should be the way to
> > go. I don't see that coming in the next (small number of) years, so I
> > don't see a reason to stop this patchset.
> 
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.
> 
> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
> 
> So that's what I'm going to do, like it or not.
> 
> Or can someone convince me that someone who understands this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
> 
> Who is doing that at each and every prominent ethernet hardware
> vendor?
> 
> Who?
> 
> If I get silence, or some vague non-specific response, then I'm going
> to hold my ground and keep pushing back on this stuff.

This is not only about 1's complement checksumming but also about TSO (and
to some smaller degree about RSS, as I tried to explain): if we attach a
geneve header in front of a skb we expect the hardware to recognize it and
duplicate it while doing the hardware segmentation. The hardware can
only do so if it knows the specific port (in this case the UDP port used
for geneve) which is in use for this particular tunneling transport
protocol. We currently cannot describe this in a generic way, thus this
patchset. (Please correct me if I am wrong!)

The other way to do it would probably be to enlarge the skb and push the
structure of the packet into it, so hardware has more semantic knowledge
about the frame's structure. I guess(!!!) DPDK does it like that?

If it would only be about checksumming I probably would agree.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:26                                 ` Hannes Frederic Sowa
@ 2015-12-04 20:43                                   ` Tom Herbert
  2015-12-04 21:11                                     ` Hannes Frederic Sowa
  2015-12-04 20:44                                   ` Jesse Gross
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-04 20:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, John Linville, Jesse Gross, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Dave,
>
> On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
>> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>> > Yes, I agree, I am totally with you here. If generic offloading can be
>> > realized by NICs I am totally with you that this should be the way to
>> > go. I don't see that coming in the next (small number of) years, so I
>> > don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understands this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
>> Who is doing that at each and every prominent ethernet hardware
>> vendor?
>>
>> Who?
>>
>> If I get silence, or some vague non-specific response, then I'm going
>> to hold my ground and keep pushing back on this stuff.
>
> This is not only about 1's checksumming but also about TSO (and to some
> smaller degree about RSS, as I tried to explain): if we attach a geneve
> header in front of a skb we expect the hardware to recognize it and
> duplicate it while doing the hardware segmentation. The hardware can
> only do so if it is in knowledge of the specific port (in this case UDP
> port used for geneve) which is in use for this particular tunneling
> transport protocol. We currently cannot describe this in a generic way,
> thus this patchset. (Please correct me if I am wrong!)
>
Yes, you are wrong. Port numbers are not used in transmit path to
signal offload. To perform TSO on UDP encapsulated packets the skb is
marked with SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM and
SKB_GSO_TCP, etc. The driver can use this information along with the
offsets of the inner and outer headers in the packet to set up the
operation in the device.  Some devices only support TSO for VXLAN, but
regardless SKB_GSO_UDP_TUNNEL is generic for all known UDP
encapsulations. Protocol specific offload is not needed. Please start
looking at http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
and the kernel code to see how things _actually_ work.
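
For readers following along, here is a minimal userspace sketch of the
transmit-side idea described above: the driver looks only at the GSO type
bits and the header offsets, and no UDP port number is involved. This is
not kernel code. Every name below (the demo_* enum, structs, function and
sample packet) is hypothetical, and the flag values are made up; in the
kernel the corresponding bits are SKB_GSO_UDP_TUNNEL,
SKB_GSO_UDP_TUNNEL_CSUM and the TCP GSO types in skb_shinfo(skb)->gso_type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's SKB_GSO_* bits (values made up). */
enum demo_gso_type {
	DEMO_GSO_TCPV4           = 1 << 0,
	DEMO_GSO_TCPV6           = 1 << 1,
	DEMO_GSO_UDP_TUNNEL      = 1 << 2,
	DEMO_GSO_UDP_TUNNEL_CSUM = 1 << 3,
};

/* Hypothetical, flattened view of what the stack hands a driver on transmit. */
struct demo_tx_pkt {
	unsigned int gso_type;  /* OR of demo_gso_type bits */
	uint16_t gso_size;      /* payload bytes per segment */
	uint16_t outer_l4_off;  /* offset of the outer UDP header */
	uint16_t inner_l3_off;  /* offset of the inner IP header */
	uint16_t inner_l4_off;  /* offset of the inner TCP header */
};

/* Hypothetical TSO descriptor fields a device might want. */
struct demo_tso_desc {
	bool valid;             /* false: not a TCP segmentation request */
	bool tunnel;            /* packet is UDP encapsulated */
	bool outer_csum;        /* outer UDP checksum needed per segment */
	uint16_t encap_len;     /* outer UDP header up to the inner IP header */
	uint16_t inner_l4_off;
	uint16_t mss;
};

/* Build the descriptor from flags and offsets only; no port lookup. */
static struct demo_tso_desc demo_setup_tso(const struct demo_tx_pkt *p)
{
	struct demo_tso_desc d = { 0 };
	const unsigned int tnl = DEMO_GSO_UDP_TUNNEL | DEMO_GSO_UDP_TUNNEL_CSUM;

	if (!(p->gso_type & (DEMO_GSO_TCPV4 | DEMO_GSO_TCPV6)))
		return d;

	d.valid = true;
	d.mss = p->gso_size;
	d.inner_l4_off = p->inner_l4_off;

	if (p->gso_type & tnl) {
		d.tunnel = true;
		d.outer_csum = !!(p->gso_type & DEMO_GSO_UDP_TUNNEL_CSUM);
		d.encap_len = (uint16_t)(p->inner_l3_off - p->outer_l4_off);
	}
	return d;
}

/* Sample: VXLAN over IPv4 carrying TCP/IPv4 in 1400-byte segments. */
static const struct demo_tx_pkt demo_vxlan_sample = {
	.gso_type = DEMO_GSO_TCPV4 | DEMO_GSO_UDP_TUNNEL,
	.gso_size = 1400,
	.outer_l4_off = 34,  /* 14 Ethernet + 20 IPv4 */
	.inner_l3_off = 64,  /* + 8 UDP + 8 VXLAN + 14 inner Ethernet */
	.inner_l4_off = 84,  /* + 20 inner IPv4 */
};
```

Calling demo_setup_tso(&demo_vxlan_sample) yields a descriptor marked as
tunneled with an encapsulation length of 30 bytes. The same code path would
serve any UDP encapsulation, since only the offsets change.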

> The other way to do it would probably be to enlarge the skb and push the
> structure of the packet into it, so hardware has more semantic knowledge
> about the frame's structure. I guess(!!!) DPDK does it like that?
>
> If it would only be about checksumming I probably would agree.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:26                                 ` Hannes Frederic Sowa
  2015-12-04 20:43                                   ` Tom Herbert
@ 2015-12-04 20:44                                   ` Jesse Gross
  1 sibling, 0 replies; 94+ messages in thread
From: Jesse Gross @ 2015-12-04 20:44 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, Tom Herbert, linville, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Dave,
>
> On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
>> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>> > Yes, I agree, I am totally with you here. If generic offloading can be
>> > realized by NICs I am totally with you that this should be the way to
>> > go. I don't see that coming in the next (small number of) years, so I
>> > don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understands this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
>> Who is doing that at each and every prominent ethernet hardware
>> vendor?
>>
>> Who?
>>
>> If I get silence, or some vague non-specific response, then I'm going
>> to hold my ground and keep pushing back on this stuff.
>
> This is not only about 1's checksumming but also about TSO (and to some
> smaller degree about RSS, as I tried to explain): if we attach a geneve
> header in front of a skb we expect the hardware to recognize it and
> duplicate it while doing the hardware segmentation. The hardware can
> only do so if it is in knowledge of the specific port (in this case UDP
> port used for geneve) which is in use for this particular tunneling
> transport protocol. We currently cannot describe this in a generic way,
> thus this patchset. (Please correct me if I am wrong!)

This isn't really about TSO so much as receive side offloads. However,
the general point still stands.

Checksum is only one component and really the only one that has this
type of generalizable mathematical property. n-tuple offloads, LRO,
etc. are things that are currently supported by the stack and need
this type of support. And encryption, which people are already pushing
for, has the same issue. The fact is that there is no real plan to be
able to support these types of things in a way other than what is
being done in this patchset.

I do believe that there is genuine interest in working to find
solutions to these types of problems, as John et al. have already
been doing. However, a real, fully general solution is not something
that exists at this point in time, as you can see from all of the
discussion in this thread.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:43                                   ` Tom Herbert
@ 2015-12-04 21:11                                     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 94+ messages in thread
From: Hannes Frederic Sowa @ 2015-12-04 21:11 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, John Linville, Jesse Gross, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Fri, Dec 4, 2015, at 21:43, Tom Herbert wrote:
> On Fri, Dec 4, 2015 at 12:26 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi Dave,
> >
> > On Fri, Dec 4, 2015, at 21:06, David Miller wrote:
> >> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >> Date: Fri, 04 Dec 2015 20:59:05 +0100
> >>
> >> > Yes, I agree, I am totally with you here. If generic offloading can be
> >> > realized by NICs I am totally with you that this should be the way to
> >> > go. I don't see that coming in the next (small number of) years, so I
> >> > don't see a reason to stop this patchset.
> >>
> >> If I just apply this and say "yeah ok", the message is completely lost
> >> and your prediction about "small number of years" indeed will occur.
> >>
> >> However if I push back hard on this, as I will, then the message has
> >> some chance of seeping back to the people designing these chips.
> >>
> >> So that's what I'm going to do, like it or not.
> >>
> >> Or can someone convince me that someone who understands this stuff
> >> is telling the hardware guys to universally put 2's complement
> >> checksums into the descriptors?
> >>
> >> Who is doing that at each and every prominent ethernet hardware
> >> vendor?
> >>
> >> Who?
> >>
> >> If I get silence, or some vague non-specific response, then I'm going
> >> to hold my ground and keep pushing back on this stuff.
> >
> > This is not only about 1's checksumming but also about TSO (and to some
> > smaller degree about RSS, as I tried to explain): if we attach a geneve
> > header in front of a skb we expect the hardware to recognize it and
> > duplicate it while doing the hardware segmentation. The hardware can
> > only do so if it is in knowledge of the specific port (in this case UDP
> > port used for geneve) which is in use for this particular tunneling
> > transport protocol. We currently cannot describe this in a generic way,
> > thus this patchset. (Please correct me if I am wrong!)
> >
> Yes, you are wrong. Port numbers are not used in transmit path to
> signal offload. To perform TSO on UDP encapsulated packets the skb is
> marked with SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM and
> SKB_GSO_TCP, etc. The driver can use this information along with the
> offsets of the inner and outer headers in the packet to set up the
> operation in the device.  Some devices only support TSO for VXLAN, but
> regardless SKB_GSO_UDP_TUNNEL is generic for all known UDP
> encapsulations. Protocol specific offload is not needed. Please start
> looking at
> http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> and the kernel code to see how things _actually_ work.

I am sorry. Of course we have _inner_ header pointers and the
corresponding gso_types in skb_shinfo to signal that already. Probably I
got confused by some driver sources and commit descriptions I looked
into lately. Thanks for the correction! :)

Bye,
Hannes

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:13                                 ` Tom Herbert
@ 2015-12-04 21:37                                   ` David Miller
  0 siblings, 0 replies; 94+ messages in thread
From: David Miller @ 2015-12-04 21:37 UTC (permalink / raw)
  To: tom; +Cc: hannes, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Tom Herbert <tom@herbertland.com>
Date: Fri, 4 Dec 2015 12:13:53 -0800

> On Fri, Dec 4, 2015 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Date: Fri, 04 Dec 2015 20:59:05 +0100
>>
>>> Yes, I agree, I am totally with you here. If generic offloading can be
>>> realized by NICs I am totally with you that this should be the way to
>>> go. I don't see that coming in the next (small number of) years, so I
>>> don't see a reason to stop this patchset.
>>
>> If I just apply this and say "yeah ok", the message is completely lost
>> and your prediction about "small number of years" indeed will occur.
>>
>> However if I push back hard on this, as I will, then the message has
>> some chance of seeping back to the people designing these chips.
>>
>> So that's what I'm going to do, like it or not.
>>
>> Or can someone convince me that someone who understands this stuff
>> is telling the hardware guys to universally put 2's complement
>> checksums into the descriptors?
>>
> We're talking about 1's complement checksum (RFC 1071). Just to be clear :-)

Right :)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 20:06                               ` David Miller
  2015-12-04 20:13                                 ` Tom Herbert
  2015-12-04 20:26                                 ` Hannes Frederic Sowa
@ 2015-12-04 22:44                                 ` Alexander Duyck
  2015-12-05  0:53                                   ` Tom Herbert
  2015-12-05  4:50                                   ` David Miller
  2 siblings, 2 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-12-04 22:44 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Tom Herbert, John Linville, jesse,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Fri, Dec 4, 2015 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri, 04 Dec 2015 20:59:05 +0100
>
>> Yes, I agree, I am totally with you here. If generic offloading can be
>> realized by NICs I am totally with you that this should be the way to
>> go. I don't see that coming in the next (small number of) years, so I
>> don't see a reason to stop this patchset.
>
> If I just apply this and say "yeah ok", the message is completely lost
> and your prediction about "small number of years" indeed will occur.

It is going to take several years regardless.  It isn't as if any of
these manufacturers can spin a design overnight.  It would likely take
a few years even if they suddenly all decided it was an important
feature to have tomorrow.  I suspect we will probably see more cards
with similar offloads long before any updated cards could come out as
there are already likely a number in the pipeline.

> However if I push back hard on this, as I will, then the message has
> some chance of seeping back to the people designing these chips.
>
> So that's what I'm going to do, like it or not.

The problem is the Linux kernel itself doesn't hold much sway over
hardware manufacturers.  A push back on something like this means they
will just bypass the upstream kernel entirely and only support this
type of offload out-of-tree on Linux or in DPDK.

If you are actually wanting to see the manufacturers change their
habits then the consumers of said cards really need to push back on
this kind of stuff.  As the saying goes money talks, B.S. walks.

> Or can someone convince me that someone who understands this stuff
> is telling the hardware guys to universally put 2's complement
> checksums into the descriptors?
>
> Who is doing that at each and every prominent ethernet hardware
> vendor?
>
> Who?

I actually tried to push the generic checksum idea for fm10k back
during hardware development but ended up losing that battle.  The
problem is you have to have some customer willing to spend the cash in
order to get a feature, and the fact is nobody other than Tom has been
pushing for this.  If it was one of Tom's employers, either Google or
Facebook, that had been telling manufacturers that they wouldn't buy
their product unless it had the feature then you can bet they would
have changed their tune.

If you want to push the manufacturers to change you basically need to
have someone put out some sort of marketable data on how a 1's
complement checksum approach is superior to the current solution that
just indicates if the checksum is valid.  The problem is I haven't
seen anything like that so either this is due to nobody providing a
part that actually takes this approach, or because the approach is not
superior in terms of performance.  The test that should demonstrate
the superiority of using the 1's complement checksum would be
something like having a number of VXLAN tunnel ports that exceeds the
capabilities of the port filters for a given netdev.  With that you
would have the 1's complement checksum competing essentially against
no offload at all.
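
To make the proposed test concrete, here is a rough userspace sketch of a
fixed-size port filter table. It is only an illustration under stated
assumptions: the table size of 4, the demo_* names and the sample port list
are all hypothetical and do not describe any particular NIC or driver. It
shows that once more tunnel ports are in use than the table can hold, the
extra ports get no port-based offload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_TUNNEL_PORTS 4  /* hypothetical hardware limit */

/* Hypothetical model of a device's UDP tunnel port filter table. */
struct demo_port_table {
	uint16_t ports[DEMO_MAX_TUNNEL_PORTS];
	size_t used;
};

/* Receive side: is this destination port known to the device? */
static bool demo_port_lookup(const struct demo_port_table *t, uint16_t port)
{
	for (size_t i = 0; i < t->used; i++)
		if (t->ports[i] == port)
			return true;
	return false;
}

/* Configuration side: returns false when the table is already full. */
static bool demo_port_add(struct demo_port_table *t, uint16_t port)
{
	if (demo_port_lookup(t, port))
		return true;
	if (t->used == DEMO_MAX_TUNNEL_PORTS)
		return false;
	t->ports[t->used++] = port;
	return true;
}

/* Register n tunnel ports, then count how many would get offload. */
static size_t demo_count_offloaded(const uint16_t *ports, size_t n)
{
	struct demo_port_table t = { .used = 0 };
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		demo_port_add(&t, ports[i]);
	for (size_t i = 0; i < n; i++)
		if (demo_port_lookup(&t, ports[i]))
			hits++;
	return hits;
}

/* Six distinct sample tunnel ports, two more than the table can hold. */
static const uint16_t demo_ports[6] = { 4789, 8472, 6081, 4790, 5000, 5001 };
```

With the six sample ports, demo_count_offloaded(demo_ports, 6) reports that
only four of them are covered; traffic on the remaining two would be handled
without the port-based offload, which is the case the test above is meant to
exercise.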

> If I get silence, or some vague non-specific response, then I'm going
> to hold my ground and keep pushing back on this stuff.

Trying to get driver developers to change this is far too late in the
process.  In many cases they hold little sway on the hardware design
which was likely locked down a year or more ago.  It is just preaching
to the choir as I am sure they have plenty of other parts of the
hardware implementation they are not happy with as well.

If anything I would say we need to be able to support the existing
hardware that has some number of filters that will identify these
tunnels via some form of ntuple filter.  The fact is there are already
5 different drivers that do this for VXLAN using vxlan_get_rx_port, I
suspect we will probably see others popping up soon to support GENEVE
and VXLAN-GPE.  By providing support for the existing hardware we can
at least let people make use of their hardware features without having
to circumvent the kernel.

- Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 22:44                                 ` Alexander Duyck
@ 2015-12-05  0:53                                   ` Tom Herbert
  2015-12-05  5:45                                     ` Alexander Duyck
  2015-12-05  4:50                                   ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-05  0:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

> I actually tried to push the generic checksum idea for fm10k back
> during hardware development but ended up losing that battle.  The
> problem is you have to have some customer willing to spend the cash in
> order to get a feature, and the fact is nobody other than Tom has been
> pushing for this.

Very well, it is true that I only represent one user of networking
protocols, granted, a very large one. I will shut up now. If other
USERS want to chime in on what is best I'll certainly listen.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-04 22:44                                 ` Alexander Duyck
  2015-12-05  0:53                                   ` Tom Herbert
@ 2015-12-05  4:50                                   ` David Miller
  2015-12-05  6:50                                     ` Alexander Duyck
  1 sibling, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-05  4:50 UTC (permalink / raw)
  To: alexander.duyck
  Cc: hannes, tom, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 4 Dec 2015 14:44:00 -0800

> I actually tried to push the generic checksum idea for fm10k back
> during hardware development but ended up losing that battle.

These chips already have a circuit calculating the 1's complement sum
over the data as it is passed through the FIFO in the RTL; it's merely a
matter of putting the result in the descriptor.

Relatively speaking, the feature would be almost free.
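
As an aside for readers, the property that makes a raw 1's complement sum
in the descriptor useful can be sketched in a few lines of userspace C. This
is a minimal illustration in the style of RFC 1071, not the kernel's
implementation: the demo_* helpers and the sample bytes are hypothetical,
and the sketch assumes the header being removed starts on a 16-bit boundary
and has even length. Given the sum over the whole packet, software can
subtract the sum of any leading headers and obtain the sum over the
remainder, whatever protocol those headers belong to.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into a 16-bit 1's complement sum. */
static uint16_t demo_csum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* 1's complement sum over len bytes, taken as big-endian 16-bit words. */
static uint16_t demo_csum_partial(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;  /* pad odd byte with zero */
	return demo_csum_fold(sum);
}

/* 1's complement subtraction: total - part == total + ~part. */
static uint16_t demo_csum_sub(uint16_t total, uint16_t part)
{
	return demo_csum_fold((uint64_t)total + (uint16_t)~part);
}

/* Made-up sample: a 4-byte "header" followed by 8 bytes of "payload". */
enum { DEMO_HDR_LEN = 4 };
static const uint8_t demo_pkt[12] = {
	0x45, 0x00, 0x00, 0x1c,
	0xab, 0xcd, 0x12, 0x34, 0xff, 0xff, 0x00, 0x01,
};
```

For the sample bytes, the sum over all 12 bytes is 0x031f, the sum over the
4-byte header is 0x451c, and demo_csum_sub(0x031f, 0x451c) gives 0xbe02,
which matches summing the 8 payload bytes directly. One caveat: 1's
complement arithmetic has two encodings of zero (0x0000 and 0xffff), so a
comparison of two sums should treat those as equal.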

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  0:53                                   ` Tom Herbert
@ 2015-12-05  5:45                                     ` Alexander Duyck
  2015-12-05  6:49                                       ` David Miller
  0 siblings, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-12-05  5:45 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On 12/04/2015 04:53 PM, Tom Herbert wrote:
>> I actually tried to push the generic checksum idea for fm10k back
>> during hardware development but ended up losing that battle.  The
>> problem is you have to have some customer willing to spend the cash in
>> order to get a feature, and the fact is nobody other than Tom has been
>> pushing for this.
>
> Very well, it is true that I only represent one user of networking
> protocols, granted, a very large one. I will shut up now. If other
> USERS want to chime in on what is best I'll certainly listen.

Tom,

I'm sorry, but I have a hard time believing you are actually 
representing a large user here.  By large user I assume you are implying 
Facebook?  I agree that your point is very valid on the merits of the 
1's complement checksum likely being a useful feature, but I just think
you are going about this the wrong way as obstructing things like this 
does little to impact hardware design decisions.

If we want to win over the manufacturers we would have to speak with 
money as they aren't going to make something unless they are convinced 
they can sell it.  Simply insisting we want some feature doesn't do much 
without the customer demand to back it up.  So, unless you are telling 
me Facebook is going to let this feature influence a purchasing decision 
in the future, the argument is B.S.

Not having this feature has to in some way impact sales.  You need to
make the 1's complement checksum a check box type item that the customer
won't buy the part without.  If you were to come up with some
sort of data demonstrating the need for the feature and were to 
associate it with something such as Open Compute then you would start to 
go a long way towards winning over consumers that they need the feature 
and as a result convincing the manufacturers that they have to provide it.

- Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  5:45                                     ` Alexander Duyck
@ 2015-12-05  6:49                                       ` David Miller
  2015-12-05  8:24                                         ` Alexander Duyck
  0 siblings, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-05  6:49 UTC (permalink / raw)
  To: alexander.duyck
  Cc: tom, hannes, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 4 Dec 2015 21:45:09 -0800

> Not having this feature has to in some way impact sales.

I'm glad money trumps clean design and performance these days.

Would they ship a literal turd until some customer asked for
something better?  You have to be kidding me.

If it's true, then what a sad world we live in.

And part of this is bogus, the circuit is already there and
implemented already.  The missing part is putting the value computed
by that circuit into the receive descriptor.

And furthermore, nobody is going to drop to BSD or DPDK for VXLAN
tunnels just because I push back hard on this facility.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  4:50                                   ` David Miller
@ 2015-12-05  6:50                                     ` Alexander Duyck
  0 siblings, 0 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-12-05  6:50 UTC (permalink / raw)
  To: David Miller
  Cc: Hannes Frederic Sowa, Tom Herbert, John Linville, jesse,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Fri, Dec 4, 2015 at 8:50 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Fri, 4 Dec 2015 14:44:00 -0800
>
>> I actually tried to push the generic checksum idea for fm10k back
>> during hardware development but ended up losing that battle.
>
> These chips already have a circuit calculating the 1's complement sum
> over the data as it is passed through the FIFO in the RTL; it's merely a
> matter of putting the result in the descriptor.

Actually it is a bit trickier than that.  The problem is the L4
checksum doesn't include all of the L3 header in the pseudo header.
So in order to add this feature and maintain the current feature you
have to essentially compute two checksums.  One with the pseudo header
and one with the entire L3 header.  In addition there are the fiddly
little details like what to do about VLAN headers since I think they
were included in the checksum if you leave them in the header, but you
have to exclude them if they aren't.  There is also the matter of if
we include the L2 header or not.  Basically there are a number of odd
corner cases and such where this really starts to become a pain.

> Relatively speaking, the feature would be almost free.

Right.  I made similar arguments.  The problem is the RTL they have
works, and they don't want to change it unless they have to.  Having a
couple engineers write the RTL, then have some validation engineers
test it, and some driver developer code it up costs money.  In
addition there is risk involved if some flaw slips through one of the
validation efforts resulting in a silicon spin, or even worse if some
flaw ends up being released.

What it comes down to is that from the engineering side we likely
won't be able to influence any change on the hardware design.  We have
to convince the sales and marketing folks for one of the vendors that
implementing this feature would be beneficial to their bottom line if
we want to see any actual change.  Without that impetus there is no
motivation on the vendor's side to risk any of their capital trying to
implement a feature that a few of us kernel engineers think would be a
really good idea.

- Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  6:49                                       ` David Miller
@ 2015-12-05  8:24                                         ` Alexander Duyck
  2015-12-05 17:53                                           ` Tom Herbert
  2015-12-05 18:03                                           ` David Miller
  0 siblings, 2 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-12-05  8:24 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Fri, Dec 4, 2015 at 10:49 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Fri, 4 Dec 2015 21:45:09 -0800
>
>> Not having this feature has to in some way impact sales.
>
> I'm glad money trumps clean design and performance these days.
>
> Would they ship a literal turd until some customer asked for
> something better?  You have to be kidding me.

You think they wouldn't?  It all comes down to the bottom line.

Also, do you really think not having support for CHECKSUM_COMPLETE
makes the part a complete turd?  That hasn't stopped anyone from
buying many of the NICs out there that are using the port based
approach for VXLAN up to now.  Really what we are arguing about here
is a "nice to have feature" not something that will make or break a
sale for most people.  It was implemented just well enough to be able
to show gains on marketing data but there are enough corner cases
where the feature won't do much of anything since there is always some
upper limit on the number of ports supported.

> If it's true, then what a sad world we live in.

That is the nature of business.  If there isn't any significant
impact on the bottom line most companies won't be pushed to take
action.

> And part of this is bogus, the circuit is already there and
> implemented already.  The missing part is putting the value computed
> by that circuit into the receive descriptor.

Yes, but that part doesn't complete the whole piece.  As I stated in
my other email it is still a bit of effort to complete something like
this.

> And furthermore, nobody is going to drop to BSD or DPDK for VXLAN
> tunnels just because I push back hard on this facility.

I agree, that was a bit of hyperbole on my part.  Still, hard blocking
this isn't necessarily going to push the vendors to change their ways.
It just ends up punishing the customers who already own the devices.
You may think the port based approach for the UDP tunnel offloads is a
"literal turd" but the fact is no amount of software changes is going
to do anything to fundamentally change how the hardware was designed,
and like I said there is likely more of that to come.

Keep in mind I don't represent one of the hardware vendors here
anymore.  I am approaching this from the customer point of view.  I
would like to have the performance I can get out of the parts I have.
In the future Mirantis may not buy nor recommend the devices, but I
have an OpenStack environment filled with NICs from various vendors
that support this UDP port number based offload.  I have to come up
with a means of polishing these "turds" in such a way that we can get
the maximum benefit out of them.  The question I would have is if you
see a constructive way of me doing this and working it out through the
kernel network stack, or do I need to suggest a bypass solution such
as ovs-dpdk and just give up on the hope of using kernel networking
with these parts?

- Alex

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  8:24                                         ` Alexander Duyck
@ 2015-12-05 17:53                                           ` Tom Herbert
  2015-12-05 19:34                                             ` Alexander Duyck
  2015-12-05 18:03                                           ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-05 17:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

> Keep in mind I don't represent one of the hardware vendors here
> anymore.  I am approaching this from the customer point of view.  I
> would like to have the performance I can get out of the parts I have.

Try enabling UDP checksum, GRO/GSO, and Remote Checksum Offload in
VXLAN. Assuming you have a NIC that at least provides UDP checksum
offload and RSS for UDP (may need to be enabled) you can get good
VXLAN performance across a variety of legacy "dumb" NICs.
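
A rough sketch of why plain UDP RSS can be enough here, assuming the
encapsulating sender places a hash of the inner flow in the outer UDP
source port. The function names are hypothetical, and CRC32 stands in
both for the sender's flow hash and for the NIC's RSS hash (real NICs
typically use a Toeplitz hash):

```python
import zlib

VXLAN_PORT = 4789  # IANA-assigned VXLAN destination port


def outer_source_port(inner_flow, low=49152, high=65535):
    """Derive the outer UDP source port from a hash of the inner flow.

    CRC32 is only a stand-in for whatever flow hash the sender really uses.
    """
    digest = zlib.crc32(repr(inner_flow).encode())
    return low + digest % (high - low + 1)


def rss_queue(src_ip, dst_ip, src_port, dst_port, num_queues):
    """Pick an Rx queue from the outer 4-tuple only, with no inner parsing.

    Hypothetical helper: CRC32 replaces the NIC's real RSS hash.
    """
    key = "{}|{}|{}|{}".format(src_ip, dst_ip, src_port, dst_port).encode()
    return zlib.crc32(key) % num_queues


if __name__ == "__main__":
    # Two inner TCP flows carried between the same pair of tunnel endpoints.
    flows = [
        ("10.0.0.1", "10.0.0.2", 6, 40000, 80),
        ("10.0.0.1", "10.0.0.2", 6, 40001, 80),
    ]
    for flow in flows:
        sport = outer_source_port(flow)
        queue = rss_queue("192.0.2.1", "192.0.2.2", sport, VXLAN_PORT, 8)
        print(flow, "-> outer sport", sport, "-> queue", queue)
```

The receiver never looks past the outer UDP header, yet packets of one
inner flow always land on the same queue because the sender keeps the
outer source port stable per flow.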

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05  8:24                                         ` Alexander Duyck
  2015-12-05 17:53                                           ` Tom Herbert
@ 2015-12-05 18:03                                           ` David Miller
  2015-12-05 19:34                                             ` Alexander Duyck
  1 sibling, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-05 18:03 UTC (permalink / raw)
  To: alexander.duyck
  Cc: tom, hannes, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sat, 5 Dec 2015 00:24:55 -0800

> Still, hard blocking this isn't necessarily going to push the
> vendors to change their ways.

Pushing back is different from blocking entirely.

That means I'm going to be very difficult and make a lot of noise
until I see the message has seeped in.

It doesn't mean that I won't allow a means to use existing hardware
offloads.  You'll just have to bear with me, be patient, and survive
my tantrum on this matter.

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05 17:53                                           ` Tom Herbert
@ 2015-12-05 19:34                                             ` Alexander Duyck
  0 siblings, 0 replies; 94+ messages in thread
From: Alexander Duyck @ 2015-12-05 19:34 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sat, Dec 5, 2015 at 9:53 AM, Tom Herbert <tom@herbertland.com> wrote:
>> Keep in mind I don't represent one of the hardware vendors here
>> anymore.  I am approaching this from the customer point of view.  I
>> would like to have the performance I can get out of the parts I have.
>
> Try enabling UDP checksum, GRO/GSO, and Remote Checksum Offload in
> VXLAN. Assuming you have a NIC that at least provides UDP checksum
> offload and RSS for UDP (may need to be enabled) you can get good
> VXLAN performance across a variety of legacy "dumb" NICs.

VXLAN offload support is already there so I can make use of that in
hardware.  I assume we aren't talking about introducing a performance
regression by removing the VXLAN Rx port notification code.

Setting up any given environment I have to work with the hand I am
dealt.  If I can make use of the work you did to do offloading with
standard NICs then I will, but having more options for tunnel
performance is always better.

- Alex

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05 18:03                                           ` David Miller
@ 2015-12-05 19:34                                             ` Alexander Duyck
  2015-12-05 22:27                                               ` David Miller
  0 siblings, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-12-05 19:34 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sat, Dec 5, 2015 at 10:03 AM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Sat, 5 Dec 2015 00:24:55 -0800
>
>> Still, hard blocking this isn't necessarily going to push the
>> vendors to change their ways.
>
> Pushing back is different from blocking entirely.

Sorry.  I had the mistaken impression that you were planning to block
this entirely based on earlier comments.

> That means I'm going to be very difficult and make a lot of noise
> until I see the message has seeped in.
>
> It doesn't mean that I won't allow a means to use existing hardware
> offloads.  You'll just have to bear with me, be patient, and survive
> my tantrum on this matter.

I'm only really interested in what options the customers have in order
to get this all configured.  As long as there eventually ends up being
some path forward I'll be good with whatever ends up happening, though
my preference would be to see some option available in the kernel.

- Alex

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05 19:34                                             ` Alexander Duyck
@ 2015-12-05 22:27                                               ` David Miller
  2015-12-06  2:13                                                 ` Alexander Duyck
  0 siblings, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-05 22:27 UTC (permalink / raw)
  To: alexander.duyck
  Cc: tom, hannes, linville, jesse, anjali.singhai, netdev, kiran.patil

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sat, 5 Dec 2015 11:34:47 -0800

> I'm only really interested in what options the customers have in order
> to get this all configured.  As long as there eventually ends up being
> some path forward I'll be good with whatever ends up happening, though
> my preference would be to see some option available in the kernel.

Fair enough.

BTW, I don't entirely buy your hardware complexity argument.

When we're looking at checksumming offload via 1's complement in the
RX descriptor, that is heaps simpler than having a separate RTL path
for N different encapsulation technologies and/or protocols.

I'd rather maintain a single 1's complement circuit than N header
parser engines that trigger the sum at the right range.

There is definitely long term maintainability and stability value in
this.
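
To make the 1's complement argument concrete, here is a minimal sketch
of the arithmetic, assuming the device reports one 16-bit ones'
complement sum over the whole packet (in the spirit of what the kernel
calls CHECKSUM_COMPLETE). The helper names are hypothetical, and the
outer headers are assumed to have an even length so that 16-bit words
stay aligned:

```python
def ones_sum(data):
    """16-bit ones' complement sum (odd length is padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sub(a, b):
    """Ones' complement subtraction: add the complement, fold the carry."""
    total = a + (~b & 0xFFFF)
    return (total & 0xFFFF) + (total >> 16)


def same_sum(a, b):
    """0x0000 and 0xFFFF both encode zero, so compare modulo 0xFFFF."""
    return a % 0xFFFF == b % 0xFFFF


def inner_sum_from_complete(device_sum, outer_headers):
    """Recover the sum over the inner bytes from the whole-packet sum.

    Assumes len(outer_headers) is even; no protocol parsing is involved.
    """
    return ones_sub(device_sum, ones_sum(outer_headers))


if __name__ == "__main__":
    outer = bytes(range(50))          # stand-in for outer eth/ip/udp/tunnel
    inner = b"inner frame payload!"   # stand-in for the encapsulated frame
    device_sum = ones_sum(outer + inner)
    recovered = inner_sum_from_complete(device_sum, outer)
    print(same_sum(recovered, ones_sum(inner)))
```

The device side is a single adder with no knowledge of VXLAN, Geneve,
or anything else; the stack subtracts whichever headers it has already
pulled and checks the remainder against any inner checksum.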

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-05 22:27                                               ` David Miller
@ 2015-12-06  2:13                                                 ` Alexander Duyck
  2015-12-06 16:31                                                   ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-12-06  2:13 UTC (permalink / raw)
  To: David Miller
  Cc: Tom Herbert, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sat, Dec 5, 2015 at 2:27 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Sat, 5 Dec 2015 11:34:47 -0800
>
>> I'm only really interested in what options the customers have in order
>> to get this all configured.  As long as there eventually ends up being
>> some path forward I'll be good with whatever ends up happening, though
>> my preference would be to see some option available in the kernel.
>
> Fair enough.
>
> BTW, I don't entirely buy your hardware complexity argument.
>
> When we're looking at checksumming offload via 1's complement in the
> RX descriptor, that is heaps simpler than having a separate RTL path
> for N different encapsulation technologies and/or protocols.

I fully agree with you.  What I mentioned is part of the explanation
I was given by the hardware engineers when they pushed back on my
request.

> I'd rather maintain a single 1's complement circuit than N header
> parser engines that trigger the sum at the right range.

Yes, but the thing is the port number and parsers are also needed for
other things like RSS.  You also have to take into account that there are
requirements placed on the vendors by other organizations such as
Microsoft that end up impacting the final design.  As such there are
some parts of the design we cannot convince the hardware vendors to
give up.

> There is definitely long term maintainability and stability value in
> this.

Agreed.  That is why I fully support the request to add the feature,
it is just a matter of convincing the vendors to do so.

The only spot I think you and I disagreed on was the approach.  I
don't know if the hard push back does anything but punish the users by
delaying the time needed to find a reasonable solution.  I really
think if we are going to get the hardware vendors to change their
behavior we have to create a market demand for it.  Having a bit of
marketable data showing the folly of this approach versus the 1's
complement checksum would probably do more to encourage and/or shame
them into it than simply pushing for this based on engineering
opinion.

- Alex

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-06  2:13                                                 ` Alexander Duyck
@ 2015-12-06 16:31                                                   ` Tom Herbert
  2015-12-06 18:44                                                     ` Alexander Duyck
  0 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-06 16:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

> The only spot I think you and I disagreed on was the approach.  I
> don't know if the hard push back does anything but punish the users by
> delaying the time needed to find a reasonable solution.  I really
> think if we are going to get the hardware vendors to change their
> behavior we have to create a market demand for it.  Having a bit of
> marketable data showing the folly of this approach versus the 1's
> complement checksum would probably do more to encourage and/or shame
> them into it than simply pushing for this based on engineering
> opinion.
>
I don't know what "marketable data" means. But I do know that we're
like 70 postings into this thread, into the third patch set regarding
this, yet nobody has bothered to contribute any data on what these
patches do and what the quantifiable benefits are with HW offload of
these protocols. I would test this stuff myself, but I don't have
access to any NICs with necessary support. If someone else can start
testing and providing meaningful data it would be most helpful...

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-06 16:31                                                   ` Tom Herbert
@ 2015-12-06 18:44                                                     ` Alexander Duyck
  2015-12-06 21:30                                                       ` Tom Herbert
  0 siblings, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-12-06 18:44 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>> The only spot I think you and I disagreed on was the approach.  I
>> don't know if the hard push back does anything but punish the users by
>> delaying the time needed to find a reasonable solution.  I really
>> think if we are going to get the hardware vendors to change their
>> behavior we have to create a market demand for it.  Having a bit of
>> marketable data showing the folly of this approach versus the 1's
>> complement checksum would probably do more to encourage and/or shame
>> them into it than simply pushing for this based on engineering
>> opinion.
>>
> I don't know what "marketable data" means. But I do know that we're
> like 70 postings into this thread, into the third patch set regarding
> this, yet nobody has bothered to contribute any data on what these
> patches do and what the quantifiable benefits are with HW offload of
> these protocols. I would test this stuff myself, but I don't have
> access to any NICs with necessary support. If someone else can start
> testing and providing meaningful data it would be most helpful...

Here is an example of something kind of like what I am talking about:
http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf

I have seen evidence of the gains first hand.  The biggest gain ends
up being the result of GRO, and you cannot make use of GRO without
some form of Rx checksum offload.

- Alex

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-06 18:44                                                     ` Alexander Duyck
@ 2015-12-06 21:30                                                       ` Tom Herbert
  2015-12-07  1:20                                                         ` Alexander Duyck
  0 siblings, 1 reply; 94+ messages in thread
From: Tom Herbert @ 2015-12-06 21:30 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sun, Dec 6, 2015 at 10:44 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> The only spot I think you and I disagreed on was the approach.  I
>>> don't know if the hard push back does anything but punish the users by
>>> delaying the time needed to find a reasonable solution.  I really
>>> think if we are going to get the hardware vendors to change their
>>> behavior we have to create a market demand for it.  Having a bit of
>>> marketable data showing the folly of this approach versus the 1's
>>> complement checksum would probably do more to encourage and/or shame
>>> them into it than simply pushing for this based on engineering
>>> opinion.
>>>
>> I don't know what "marketable data" means. But I do know that we're
>> like 70 postings into this thread, into the third patch set regarding
>> this, yet nobody has bothered to contribute any data on what these
>> patches do and what the quantifiable benefits are with HW offload of
>> these protocols. I would test this stuff myself, but I don't have
>> access to any NICs with necessary support. If someone else can start
>> testing and providing meaningful data it would be most helpful...
>
> Here is an example of something kind of like what I am talking about:
> http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf
>
> I have seen evidence of the gains first hand.  The biggest gain ends
> up being the result of GRO, and you cannot make use of GRO without
> some form of Rx checksum offload.
>
Right, but we recoup the gains of GRO simply by enabling the UDP
checksum. This works for all the UDP encapsulations, and probably
about all NICs in deployment. You don't need protocol specific
offloads for this. I have posted performance data many times on this,
it is a clear win.

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-06 21:30                                                       ` Tom Herbert
@ 2015-12-07  1:20                                                         ` Alexander Duyck
  2015-12-07  3:02                                                           ` David Ahern
  0 siblings, 1 reply; 94+ messages in thread
From: Alexander Duyck @ 2015-12-07  1:20 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On Sun, Dec 6, 2015 at 1:30 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Sun, Dec 6, 2015 at 10:44 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Sun, Dec 6, 2015 at 8:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> The only spot I think you and I disagreed on was the approach.  I
>>>> don't know if the hard push back does anything but punish the users by
>>>> delaying the time needed to find a reasonable solution.  I really
>>>> think if we are going to get the hardware vendors to change their
>>>> behavior we have to create a market demand for it.  Having a bit of
>>>> marketable data showing the folly of this approach versus the 1's
>>>> complement checksum would probably do more to encourage and/or shame
>>>> them into it than simply pushing for this based on engineering
>>>> opinion.
>>>>
>>> I don't know what "marketable data" means. But I do know that we're
>>> like 70 postings into this thread, into the third patch set regarding
>>> this, yet nobody has bothered to contribute any data on what these
>>> patches do and what the quantifiable benefits are with HW offload of
>>> these protocols. I would test this stuff myself, but I don't have
>>> access to any NICs with necessary support. If someone else can start
>>> testing and providing meaningful data it would be most helpful...
>>
>> Here is an example of something kind of like what I am talking about:
>> http://www.mellanox.com/related-docs/whitepapers/CB_Intel_XL710.pdf
>>
>> I have seen evidence of the gains first hand.  The biggest gain ends
>> up being the result of GRO, and you cannot make use of GRO without
>> some form of Rx checksum offload.
>>
> Right, but we recoup the gains of GRO simply by enabling the UDP
> checksum. This works for all the UDP encapsulations, and probably
> about all NICs in deployment. You don't need protocol specific
> offloads for this. I have posted performance data many times on this,
> it is a clear win.

That works for Linux to Linux, but what about the cases where you have
a non-Linux endpoint on the other end such as something like a Cisco
switch?  That is where having the protocol specific offload is useful
as long as the hardware has sufficient capabilities to support it.

As far as trying to get the vendors to give up their protocol parsing,
it will probably never happen.  It wouldn't surprise me if it is due
to product requirements from Microsoft in order to support things like
RSS, RSC, and filtering on inner header fields.  If Linux doesn't care
about that we can drop support for it, but it still doesn't mean they
can drop those bits from the hardware design so they would likely
interpret this as a request to add a new feature instead of fixing or
replacing the existing checksum approach.

- Alex

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-07  1:20                                                         ` Alexander Duyck
@ 2015-12-07  3:02                                                           ` David Ahern
  2015-12-07 16:20                                                             ` Jesse Gross
  0 siblings, 1 reply; 94+ messages in thread
From: David Ahern @ 2015-12-07  3:02 UTC (permalink / raw)
  To: Alexander Duyck, Tom Herbert
  Cc: David Miller, Hannes Frederic Sowa, John Linville, Jesse Gross,
	Anjali Singhai Jain, Netdev, Kiran Patil

On 12/6/15 6:20 PM, Alexander Duyck wrote:
> That works for Linux to Linux, but what about the cases where you have
> a non-Linux endpoint on the other end such as something like a Cisco
> switch?

Why does it matter what kind of switch the NIC is connected to?

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-07  3:02                                                           ` David Ahern
@ 2015-12-07 16:20                                                             ` Jesse Gross
  0 siblings, 0 replies; 94+ messages in thread
From: Jesse Gross @ 2015-12-07 16:20 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Duyck, Tom Herbert, David Miller, Hannes Frederic Sowa,
	John Linville, Anjali Singhai Jain, Netdev, Kiran Patil

On Sun, Dec 6, 2015 at 7:02 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 12/6/15 6:20 PM, Alexander Duyck wrote:
>>
>> That works for Linux to Linux, but what about the cases where you have
>> a non-Linux endpoint on the other end such as something like a Cisco
>> switch?
>
>
> Why does it matter what kind of switch the NIC is connected to?

I think Cisco was just an example, not anything particular about their
switches. But there are two general problems:

 * Some protocols, like VXLAN, recommend that the UDP checksum be zero
so this is what pretty much everyone implements. As a result,
independent of the merits of using the checksum, most non-Linux
endpoints won't support it.

 * The reason why this recommendation exists in the first place is that
most ASIC based switches can't compute/verify UDP checksums. They
slice off the headers and only run that through the chip's core
memory, so the rest of the packet isn't available to compute a
checksum over.
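
A small sketch of what a zero outer checksum means for the receiver,
assuming IPv4, where a UDP checksum field of 0 signals "no checksum
computed". The helper names are hypothetical:

```python
import socket
import struct


def ones_sum(data):
    """16-bit ones' complement sum (odd length is padded with a zero byte)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_len):
    """IPv4 pseudo-header: source, destination, zero, protocol 17, length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def build_udp(src_ip, dst_ip, sport, dport, payload, with_checksum):
    """Build a UDP segment, optionally leaving the checksum field at zero."""
    length = 8 + len(payload)
    segment = struct.pack("!HHHH", sport, dport, length, 0) + payload
    if not with_checksum:
        return segment
    csum = ~ones_sum(pseudo_header(src_ip, dst_ip, length) + segment) & 0xFFFF
    csum = csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def classify(src_ip, dst_ip, segment):
    """Return 'unverified', 'valid' or 'corrupt' for a received segment."""
    if struct.unpack("!H", segment[6:8])[0] == 0:
        return "unverified"  # sender opted out; outer checksum says nothing
    total = ones_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return "valid" if total == 0xFFFF else "corrupt"


if __name__ == "__main__":
    args = ("192.0.2.1", "192.0.2.2", 49152, 4789, b"encapsulated frame")
    for with_checksum in (True, False):
        seg = build_udp(*args, with_checksum=with_checksum)
        damaged = seg[:-1] + bytes([seg[-1] ^ 0x01])
        print(with_checksum, classify(args[0], args[1], seg),
              classify(args[0], args[1], damaged))
```

With the field set to zero a damaged inner frame is indistinguishable
from a good one at the outer UDP layer, so validating it requires
something that can locate and check the inner headers.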

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-03  0:15                           ` Tom Herbert
@ 2015-12-08  7:33                             ` John Fastabend
  2015-12-08 14:23                               ` Jamal Hadi Salim
  0 siblings, 1 reply; 94+ messages in thread
From: John Fastabend @ 2015-12-08  7:33 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 15-12-02 04:15 PM, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>> [...]
>>
>>>>
>>>> I wonder why we need protocol generic offloads? I know there are
>>>> currently a lot of overlay encapsulation protocols. Are there many more
>>>> coming?
>>>>
>>> Yes, and assume that there are more coming with an unbounded limit
>>> (for instance I just noticed today that there is a netdev1.1 talk on
>>> supporting GTP in the kernel). Besides, this problem space is not just
>>> limited to offload of encapsulation protocols, but how to generalize
>>> offload of any transport, IPv[46], application protocols, protocol
>>> implemented in user space, security protocols, etc.
>>>
>>>> Besides, this offload is about TSO and RSS and they do need to parse the
>>>> packet to get the information where the inner header starts. It is not
>>>> only about checksum offloading.
>>>>
>>> RSS does not require the device to parse the inner header. All the UDP
>>> encapsulations protocols being defined set the source port to entropy
>>> flow value and most devices already support RSS+UDP (just needs to be
>>> enabled) so this works just fine with dumb NICs. In fact, this is one
>>> of the main motivations of encapsulating UDP in the first place, to
>>> leverage existing RSS and ECMP mechanisms. The more general solution
>>> is to use IPv6 flow label (RFC6438). We need HW support to include the
>>> flow label into the hash for ECMP and RSS, but once we have that much
>>> of the motivation for using UDP goes away and we can get back to just
>>> doing GRE/IP, IPIP, MPLS/IP, etc. (hence eliminate overhead and
>>> complexity of UDP encap).
>>>
>>>> Please provide a sketch up for a protocol generic api that can tell
>>>> hardware where a inner protocol header starts that supports vxlan,
>>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>>> starting at that point.
>>>>
>>> BPF. Implementing protocol generic offloads are not just a HW concern
>>> either, adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for applications protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> Just keying off the last statement there...
>>
>> I think BPF programs are going to be hard to translate into hardware
>> for most devices. The problem is the BPF programs in general lack
>> structure. A parse graph would be much more friendly for hardware or
>> at minimum the BPF program would need to be some sort of
>> well-structured program so a driver could turn that into a parse graph.
>>
> This might be relevant:
> http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf
> 

Thanks Tom, interesting read, but they seem to argue for a BPF engine in
hardware, which I'm still not convinced is necessary, and the numbers
provided are for a 1Gbps link where 10Gbps/100Gbps+ would be more
valuable.

I am still leaning towards a fully programmable parse graph and a set
of basic actions push/pop/set/fwd/etc. This would be useful for other
features not just checksum offloads. I guess it doesn't necessarily
exclude also having 1s complement logic though.
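
A minimal sketch of what a table-driven parse graph could look like,
to illustrate the idea. The node layout and names are hypothetical,
headers are assumed to have fixed lengths (IPv4 without options), and
this is not any vendor's actual format:

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    length: int                      # fixed header length in bytes
    key: Optional[Tuple[int, int]]   # (offset, size) of the next-header field
    edges: Dict[int, str]            # field value -> next node name
    default: Optional[str] = None    # next node when there is no key field


GRAPH = {
    "eth": Node(14, (12, 2), {0x0800: "ipv4"}),
    "ipv4": Node(20, (9, 1), {17: "udp"}),
    "udp": Node(8, (2, 2), {4789: "vxlan"}),   # keyed on UDP destination port
    "vxlan": Node(8, None, {}, "inner_eth"),
    "inner_eth": Node(14, None, {}),
}


def parse(packet: bytes, graph=GRAPH, start="eth") -> List[Tuple[str, int]]:
    """Walk the graph and return (header name, byte offset) pairs."""
    path = []
    name, offset = start, 0
    while name is not None and offset + graph[name].length <= len(packet):
        node = graph[name]
        path.append((name, offset))
        if node.key is None:
            nxt = node.default
        else:
            koff, ksize = node.key
            field = packet[offset + koff:offset + koff + ksize]
            nxt = node.edges.get(int.from_bytes(field, "big"))
        offset += node.length
        name = nxt
    return path


def build_frame(dst_port: int) -> bytes:
    """Build a zero-filled 64-byte frame with only the key fields set."""
    eth = bytes(12) + (0x0800).to_bytes(2, "big")
    ip = bytearray(20)
    ip[0], ip[9] = 0x45, 17
    udp = (49152).to_bytes(2, "big") + dst_port.to_bytes(2, "big") + bytes(4)
    return eth + bytes(ip) + udp + bytes(8) + bytes(14)


if __name__ == "__main__":
    print(parse(build_frame(4789)))
    print(parse(build_frame(53)))
```

Supporting another tunnel then becomes a matter of adding nodes and
edges to the table rather than adding a fixed-function parser, and the
offsets the walk produces are what push/pop/set style actions or a
checksum engine would need.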

.John

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-08  7:33                             ` John Fastabend
@ 2015-12-08 14:23                               ` Jamal Hadi Salim
  2015-12-08 15:10                                 ` Jamal Hadi Salim
  0 siblings, 1 reply; 94+ messages in thread
From: Jamal Hadi Salim @ 2015-12-08 14:23 UTC (permalink / raw)
  To: John Fastabend, Tom Herbert
  Cc: Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 15-12-08 02:33 AM, John Fastabend wrote:
> On 15-12-02 04:15 PM, Tom Herbert wrote:

>>>
>>> Just keying off the last statement there...
>>>
>>> I think BPF programs are going to be hard to translate into hardware
>>> for most devices. The problem is the BPF programs in general lack
>>> structure. A parse graph would be much more friendly for hardware or
>>> at minimum the BPF program would need to be some sort of
>>> well-structured program so a driver could turn that into a parse graph.
>>>
>> This might be relevant:
>> http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf
>>
>
> Thanks Tom interesting read but they seem to argue for a BPF engine in
> hardware which I'm still not convinced is necessary and the numbers
> provided are for a 1Gbps link where 10Gbps/100Gbps+ would be more
> valuable.
>
> I am still leaning towards a fully programmable parse graph and a set
> of basic actions push/pop/set/fwd/etc. This would be useful for other
> features not just checksum offloads. I guess it doesn't necessarily
> exclude also having 1s complement logic though.


;-> I feel a little vindicated with this discussion.

Of course you can implement hardware using BPF! I think there is an
opportunity for someone to build such hardware, if it is not already
being built.
A BPF hardware implementation is just a very different approach;
instead of being a series of hardware TCAM tables (and/or other
types of implementation which use DRAM etc), it becomes
CPU instructions. Surely one can cast the eBPF bytecode into an ASIC.
My disagreement with Tom is laying a stake that this is how hardware
features are to be exposed.
My disagreement with you is laying a stake in the ground that hardware
oughta be implemented using a series of Tubes^WTCAMS.
When I build a graphics card the API is not how the internal
implementation works. Everybody conforms to the same driver APIs.
Likewise, we have Linux APIs - and switchdev
is the right direction. Write your driver to switchdev interfaces.
Let a thousand flowers bloom.

BTW: It is bordering on the abstraction-ridiculous when I see the
P4 claims to use ebpf and then somehow translate to use
classifier/actions.


cheers,
jamal

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-08 14:23                               ` Jamal Hadi Salim
@ 2015-12-08 15:10                                 ` Jamal Hadi Salim
  2015-12-09  1:40                                   ` Thomas Graf
  0 siblings, 1 reply; 94+ messages in thread
From: Jamal Hadi Salim @ 2015-12-08 15:10 UTC (permalink / raw)
  To: John Fastabend, Tom Herbert
  Cc: Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 15-12-08 09:23 AM, Jamal Hadi Salim wrote:
> On 15-12-08 02:33 AM, John Fastabend wrote:

> ;-> I feel a little vindicated with this discussion.
>
> Of course you can implement hardware using BPF!

BTW - Just to be clear; I am not arguing for what that paper
preaches. What the paper preaches is an academic exercise
(square hole, round peg - must fit into OF description).
What I am saying is you can take the ebpf instruction set and
create a cpu that executes those instructions.

cheers,
jamal

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-08 15:10                                 ` Jamal Hadi Salim
@ 2015-12-09  1:40                                   ` Thomas Graf
  2015-12-09  5:45                                     ` Alexei Starovoitov
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Graf @ 2015-12-09  1:40 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: John Fastabend, Tom Herbert, Hannes Frederic Sowa,
	John W. Linville, Jesse Gross, David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 12/08/15 at 10:10am, Jamal Hadi Salim wrote:
> On 15-12-08 09:23 AM, Jamal Hadi Salim wrote:
> >On 15-12-08 02:33 AM, John Fastabend wrote:
> 
> >;-> I feel a little vindicated with this discussion.
> >
> >Of course you can implement hardware using BPF!
> 
> BTW - Just to be clear; I am not arguing for what that paper
> preaches. What the paper preaches is an academic exercise
> (square hole, round peg - must fit into OF description).
> What I am saying is you can take the ebpf instruction set and
> create a cpu that executes those instructions.

I'm still having difficulty understanding what exactly
the intended proposal around this is. You may have just answered
my question but just to make sure: When people refer to
implementing or interpreting BPF in hardware, do they mean:

 1) A limited BPF instruction set used as descriptive language
    to define match/action logic?
 2) A specific (versioned) BPF instruction set which hardware
    can support?
 3) The full BPF instruction set of the current kernel + all
    defined helper functions and tail call support?

Would programs of 2) and 3) nature be simply rejected or would
the driver convert them somehow?

* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09  1:40                                   ` Thomas Graf
@ 2015-12-09  5:45                                     ` Alexei Starovoitov
  2015-12-09 12:58                                       ` Thomas Graf
  0 siblings, 1 reply; 94+ messages in thread
From: Alexei Starovoitov @ 2015-12-09  5:45 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, John Fastabend, Tom Herbert,
	Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 09, 2015 at 02:40:38AM +0100, Thomas Graf wrote:
> 
> I'm still having difficulty understanding what exactly
> the intended proposal around this is. You may have just answered
> my question but just to make sure: When people refer to
> implementing or interpreting BPF in hardware, do they mean:
> 
>  1) A limited BPF instruction set used as descriptive language
>     to define match/action logic?
>  2) A specific (versioned) BPF instruction set which hardware
>     can support?
>  3) The full BPF instruction set of the current kernel + all
>     defined helper functions and tail call support?

definitely not 1, not 2 and hardly 3.
bpf verifier in 2k lines does full code analysis with all branches,
memory accesses and so on, so it's not hard to understand _intent_
of the program by any HW backend.
I agree with John that it's not trivial to convert bpf program into
parse graph that intel asic understands, but it's not hard either.
fpga based nic/switch can convert a program into parallel gates.
Netronome nic can JIT it into their instruction set.
Programmable switch asics can equally understand intent of the
program and convert it into their firmware.
The easiest would be arm-based nics.
In all cases HW will not be able to convert all possible programs,
but it's not a limitation of instruction set. That's why 1 and 2
above don't really apply.
Different explanation of the above:
think of bpf as intermediate representation. When C or some other
language is used to describe what the dataplane is supposed to do
the compiler generates bpf==IR which is later compiled by hw specific
backend into target. That target can be fpga, asic, npu, etc.
Some backends will be simple and small enough to stay completely
within kernel. Some backends (like fpga) would need to
call_usermodehelper() or similar, since netlist compilation is
a tedious and slow process.
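
A rough sketch of that model, purely for illustration: the opcodes,
struct names and both backends below are hypothetical and much simpler
than real BPF. Each HW backend walks the program and either accepts it
for conversion or rejects it, in which case the program would simply
keep running in software.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IR opcodes; real BPF encoding is far richer. */
enum toy_op {
	TOY_LOAD_FIELD,
	TOY_CMP_EQ,
	TOY_JUMP,
	TOY_CALL_HELPER,
	TOY_RETURN,
};

struct toy_insn {
	enum toy_op op;
	int arg;
};

/* Hypothetical backend descriptor, one per hardware target. */
struct toy_backend {
	const char *name;
	bool (*supports)(const struct toy_insn *insn);
};

/* Assumed match/action style ASIC: cannot express helper calls. */
static bool asic_supports(const struct toy_insn *insn)
{
	return insn->op != TOY_CALL_HELPER;
}

/* Assumed general purpose core (e.g. arm-based nic): runs everything. */
static bool cpu_supports(const struct toy_insn *insn)
{
	(void)insn;
	return true;
}

static const struct toy_backend toy_asic = { "asic", asic_supports };
static const struct toy_backend toy_cpu = { "cpu", cpu_supports };

/*
 * Return true only if the backend can convert every instruction.
 * A false result is a limitation of that backend, not of the IR.
 */
static bool toy_can_offload(const struct toy_backend *be,
			    const struct toy_insn *prog, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (!be->supports(&prog[i]))
			return false;
	}
	return true;
}
```

With these assumptions, a program made only of field loads and compares
passes toy_can_offload() for both backends, while one that calls a
helper is accepted by toy_cpu and rejected by toy_asic.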


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09  5:45                                     ` Alexei Starovoitov
@ 2015-12-09 12:58                                       ` Thomas Graf
  2015-12-09 17:38                                         ` Alexei Starovoitov
  0 siblings, 1 reply; 94+ messages in thread
From: Thomas Graf @ 2015-12-09 12:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jamal Hadi Salim, John Fastabend, Tom Herbert,
	Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 12/08/15 at 09:45pm, Alexei Starovoitov wrote:
> definitely not 1, not 2 and hardly 3.
> bpf verifier in 2k lines does full code analysis with all branches,
> memory accesses and so on, so it's not hard to understand _intent_
> of the program by any HW backend.
> I agree with John that it's not trivial to convert bpf program into
> parse graph that intel asic understands, but it's not hard either.
> fpga based nic/switch can convert a program into parallel gates.
> Netronome nic can JIT it into their instruction set.
> Programmable switch asics can equally understand intent of the
> program and convert it into their firmware.
> The easiest would be arm-based nics.
> In all cases HW will not be able to convert all possible programs,
> but it's not a limitation of instruction set. That's why 1 and 2
> above don't really apply.
> Different explanation of the above:
> think of bpf as intermediate representation. When C or some other
> language is used to describe what the dataplane is supposed to do
> the compiler generates bpf==IR which is later compiled by hw specific
> backend into target. That target can be fpga, asic, npu, etc.
> Some backends will be simple and small enough to stay completely
> within kernel. Some backends (like fpga) would need to
> call_usermodehelper() or similar, since netlist compilation is
> a tedious and slow process.

Trying to summarize that, the definition of a BPF program in the
context of this discussion is: a BPF program of which the driver
or firmware/NIC can understand the original intent. Unless the NIC
can JIT, this implies reverse engineering the control flow into
a declarative model.

So if the goal is to make the intent available to the hardware in
a format which both the kernel and the hardware can draw the same
conclusions from, wouldn't something like P4 + BPF derived from P4
be a possibly better fit? There is discussion on stateful P4
processing now.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09 12:58                                       ` Thomas Graf
@ 2015-12-09 17:38                                         ` Alexei Starovoitov
  2015-12-09 20:03                                           ` David Miller
  2015-12-09 22:03                                           ` Thomas Graf
  0 siblings, 2 replies; 94+ messages in thread
From: Alexei Starovoitov @ 2015-12-09 17:38 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, John Fastabend, Tom Herbert,
	Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
> 
> So if the goal is to make the intent available to the hardware in
> a format which both the kernel and the hardware can draw the same
> conclusions from, wouldn't something like P4 + BPF derived from P4
> be a possibly better fit? There is discussion on stateful P4
> processing now.

p4 is a high level language and absolutely not suitable for such purpose.
bpf as intermediate representation can be generated from p4 or C or other
language. There is room to innovate in the language definition on top
and in HW design at the bottom. That's the most flexible model.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09 17:38                                         ` Alexei Starovoitov
@ 2015-12-09 20:03                                           ` David Miller
  2015-12-09 22:03                                           ` Thomas Graf
  1 sibling, 0 replies; 94+ messages in thread
From: David Miller @ 2015-12-09 20:03 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: tgraf, jhs, john.fastabend, tom, hannes, linville, jesse,
	anjali.singhai, netdev, kiran.patil

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 9 Dec 2015 09:38:44 -0800

> p4 is a high level language and absolutely not suitable for such purpose.
> bpf as intermediate representation can be generated from p4 or C or other
> language. There is room to innovate in the language definition on top
> and in HW design at the bottom. That's the most flexible model.

+1


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09 17:38                                         ` Alexei Starovoitov
  2015-12-09 20:03                                           ` David Miller
@ 2015-12-09 22:03                                           ` Thomas Graf
  2015-12-09 22:21                                             ` David Miller
  1 sibling, 1 reply; 94+ messages in thread
From: Thomas Graf @ 2015-12-09 22:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jamal Hadi Salim, John Fastabend, Tom Herbert,
	Hannes Frederic Sowa, John W. Linville, Jesse Gross,
	David Miller, Anjali Singhai Jain,
	Linux Kernel Network Developers, Kiran Patil

On 12/09/15 at 09:38am, Alexei Starovoitov wrote:
> On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
> > 
> > So if the goal is to make the intent available to the hardware in
> > a format which both the kernel and the hardware can draw the same
> > conclusions from, wouldn't something like P4 + BPF derived from P4
> > be a possibly better fit? There is discussion on stateful P4
> > processing now.
> 
> p4 is a high level language and absolutely not suitable for such purpose.
> bpf as intermediate representation can be generated from p4 or C or other
> language. There is room to innovate in the language definition on top
> and in HW design at the bottom. That's the most flexible model.

If you don't want to discuss it, no problem. But stating that P4
is a high level language (not sure what this means exactly since
we exactly _want_ an abstraction away from hardware) and that it's
not suitable for this purpose is just wrong. P4 has been created
exactly for the purpose of expressing how a packet should be
processed by a forwarding element independent of specific hardware.

There is a lot of interesting open source work coming out of that
space and I think we owe it to ourselves to at least consider P4. The
goal is very much in line with what we want to achieve as the Linux
community as well.

I'll wait for your proposal as you stated you are working on
something specific.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09 22:03                                           ` Thomas Graf
@ 2015-12-09 22:21                                             ` David Miller
  2015-12-09 22:25                                               ` Thomas Graf
  0 siblings, 1 reply; 94+ messages in thread
From: David Miller @ 2015-12-09 22:21 UTC (permalink / raw)
  To: tgraf
  Cc: alexei.starovoitov, jhs, john.fastabend, tom, hannes, linville,
	jesse, anjali.singhai, netdev, kiran.patil

From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 9 Dec 2015 23:03:39 +0100

> On 12/09/15 at 09:38am, Alexei Starovoitov wrote:
>> On Wed, Dec 09, 2015 at 01:58:57PM +0100, Thomas Graf wrote:
>> > 
>> > So if the goal is to make the intent available to the hardware in
>> > a format which both the kernel and the hardware can draw the same
>> > conclusions from, wouldn't something like P4 + BPF derived from P4
>> > be a possibly better fit? There is discussion on stateful P4
>> > processing now.
>> 
>> p4 is a high level language and absolutely not suitable for such purpose.
>> bpf as intermediate representation can be generated from p4 or C or other
>> language. There is room to innovate in the language definition on top
>> and in HW design at the bottom. That's the most flexible model.
> 
> If you don't want to discuss it, no problem. But stating that P4
> is a high level language (not sure what this means exactly since
> we exactly _want_ an abstraction away from hardware) and that it's
> not suitable for this purpose is just wrong. P4 has been created
> exactly for the purpose of expressing how a packet should be
> processed by a forwarding element independent of specific hardware.

Just because it was supposedly designed for this purpose, doesn't
mean it's the most appropriate intermediate language between what
the kernel wants hardware to do and what actually has to happen for
the hardware to do that.

BPF is so much more universal and can cover everything we'd want
hardware to perform, and then some.

Plus it's everywhere in the kernel already, has a full validation and
test suite, full LLVM backend, plus JITs for several prominent
architectures with more on the way.

It is clearly the most appropriate middle layer representation.

The fact that BPF could be generated from any P4 program, yet the
reverse is not true, tells me everything I need to know.

I'm sorry if you have either a mental or a time investment in P4, but
I really don't see it as suitable for this.


* Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
  2015-12-09 22:21                                             ` David Miller
@ 2015-12-09 22:25                                               ` Thomas Graf
  0 siblings, 0 replies; 94+ messages in thread
From: Thomas Graf @ 2015-12-09 22:25 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, jhs, john.fastabend, tom, hannes, linville,
	jesse, anjali.singhai, netdev, kiran.patil

On 12/09/15 at 05:21pm, David Miller wrote:
> It is clearly the most appropriate middle layer representation.
> 
> The fact that BPF could be generated from any P4 program, yet the
> reverse is not true, tells me everything I need to know.
> 
> I'm sorry if you have either a mental or a time investment in P4, but
> I really don't see it as suitable for this.

I don't. I like the approach and the effect it has on a currently
very vendor secrets oriented environment.

I won't drag this further. I'm perfectly fine if BPF is suitable for
a wide range of hardware models.


end of thread, other threads:[~2015-12-09 22:25 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
2015-11-23 21:02 [PATCH 0/6] Generalize udp based tunnels and add geneve offload Anjali Singhai Jain
2015-11-23 21:02 ` [PATCH v1 1/6] net: Generalize udp based tunnel offload Anjali Singhai Jain
2015-11-23 20:57   ` kbuild test robot
2015-11-23 20:58   ` kbuild test robot
2015-11-23 21:53   ` Tom Herbert
2015-11-23 22:49     ` Jesse Gross
2015-11-24  0:32       ` Singhai, Anjali
2015-11-24  0:38         ` Tom Herbert
2015-11-24  1:11           ` Jesse Brandeburg
2015-11-24 17:32             ` Tom Herbert
2015-11-24 17:43               ` Hannes Frederic Sowa
2015-11-24 17:52                 ` Tom Herbert
2015-11-24 18:16                   ` Hannes Frederic Sowa
2015-11-24 18:37                 ` David Miller
2015-11-24 18:42                   ` Hannes Frederic Sowa
2015-11-24 18:43                   ` Tom Herbert
2015-11-30  3:22               ` David Miller
2015-11-30 21:42                 ` Singhai, Anjali
2015-11-30 21:48                   ` Tom Herbert
2015-12-01  3:51                     ` David Miller
2015-12-01  3:48                   ` David Miller
2015-12-01  6:33                     ` Alexander Duyck
2015-11-30  3:21     ` David Miller
2015-11-30 21:33       ` Singhai, Anjali
2015-12-01  0:25       ` Jesse Gross
2015-12-01  1:02         ` Tom Herbert
2015-12-01  1:28           ` Jesse Gross
2015-12-01  5:26             ` Tom Herbert
2015-12-01 15:44               ` John W. Linville
2015-12-01 15:49                 ` Hannes Frederic Sowa
2015-12-01 16:08                   ` John W. Linville
2015-12-02  0:40                     ` Singhai, Anjali
2015-12-02  3:50                   ` Tom Herbert
2015-12-02 16:35                     ` Hannes Frederic Sowa
2015-12-02 19:15                       ` Tom Herbert
2015-12-02 23:35                         ` John Fastabend
2015-12-03  0:15                           ` Tom Herbert
2015-12-08  7:33                             ` John Fastabend
2015-12-08 14:23                               ` Jamal Hadi Salim
2015-12-08 15:10                                 ` Jamal Hadi Salim
2015-12-09  1:40                                   ` Thomas Graf
2015-12-09  5:45                                     ` Alexei Starovoitov
2015-12-09 12:58                                       ` Thomas Graf
2015-12-09 17:38                                         ` Alexei Starovoitov
2015-12-09 20:03                                           ` David Miller
2015-12-09 22:03                                           ` Thomas Graf
2015-12-09 22:21                                             ` David Miller
2015-12-09 22:25                                               ` Thomas Graf
2015-12-03  2:08                           ` Alexei Starovoitov
2015-12-03 15:59                         ` Hannes Frederic Sowa
2015-12-03 16:35                           ` Andreas Schultz
2015-12-03 16:43                             ` Hannes Frederic Sowa
2015-12-04 18:28                           ` Tom Herbert
2015-12-04 19:54                             ` John Fastabend
2015-12-04 19:59                             ` Hannes Frederic Sowa
2015-12-04 20:02                               ` Hannes Frederic Sowa
2015-12-04 20:06                               ` David Miller
2015-12-04 20:13                                 ` Tom Herbert
2015-12-04 21:37                                   ` David Miller
2015-12-04 20:26                                 ` Hannes Frederic Sowa
2015-12-04 20:43                                   ` Tom Herbert
2015-12-04 21:11                                     ` Hannes Frederic Sowa
2015-12-04 20:44                                   ` Jesse Gross
2015-12-04 22:44                                 ` Alexander Duyck
2015-12-05  0:53                                   ` Tom Herbert
2015-12-05  5:45                                     ` Alexander Duyck
2015-12-05  6:49                                       ` David Miller
2015-12-05  8:24                                         ` Alexander Duyck
2015-12-05 17:53                                           ` Tom Herbert
2015-12-05 19:34                                             ` Alexander Duyck
2015-12-05 18:03                                           ` David Miller
2015-12-05 19:34                                             ` Alexander Duyck
2015-12-05 22:27                                               ` David Miller
2015-12-06  2:13                                                 ` Alexander Duyck
2015-12-06 16:31                                                   ` Tom Herbert
2015-12-06 18:44                                                     ` Alexander Duyck
2015-12-06 21:30                                                       ` Tom Herbert
2015-12-07  1:20                                                         ` Alexander Duyck
2015-12-07  3:02                                                           ` David Ahern
2015-12-07 16:20                                                             ` Jesse Gross
2015-12-05  4:50                                   ` David Miller
2015-12-05  6:50                                     ` Alexander Duyck
2015-11-24  5:41   ` Alexander Duyck
2015-11-30 16:35   ` Tom Herbert
2015-11-30 21:53     ` Singhai, Anjali
2015-12-01  3:52       ` David Miller
2015-11-23 21:02 ` [PATCH v1 2/6] net: Add a generic udp_offload_get_port function Anjali Singhai Jain
2015-11-24  6:08   ` Alexander Duyck
2015-11-24  6:37   ` Alexander Duyck
2015-11-24 19:35     ` Singhai, Anjali
2015-11-23 21:02 ` [PATCH v1 3/6] i40e: Generalize the flow for udp based tunnels Anjali Singhai Jain
2015-11-23 21:02 ` [PATCH v1 4/6] i40e: Remove CONFIG_I40E_VXLAN Anjali Singhai Jain
2015-11-23 21:02 ` [PATCH v1 5/6] net: Refactor udp_offload and add Geneve port offload support Anjali Singhai Jain
2015-11-23 21:02 ` [PATCH v1 6/6] i40e:Add geneve tunnel " Anjali Singhai Jain
