* [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
@ 2014-09-03  9:24 Jiri Pirko
  2014-09-03  9:24 ` [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones Jiri Pirko
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

This patchset can be divided into 3 main sections:
- introduce switchdev api for implementing switch drivers
- add hardware acceleration bits into openvswitch datapath; this uses
  the previously mentioned switchdev api
- introduce rocker switch driver which implements switchdev api

More details are in the individual patches.

With this it is now possible, out of the box, to create an ovs bridge
over rocker switch ports and have the flows offloaded into hardware.

RFC->v1 changes:
- moved include/linux/*.h -> include/net/
- moved net/core/switchdev.c -> net/switchdev/
- moved drivers/net/rocker.* -> drivers/net/ethernet/rocker/
- fixed a couple of small bugs and typos
- in dsa, the switch id is now generated randomly
- fixed a schedule-in-atomic-context bug in rocker_port_set_rx_mode
- added switchdev Netlink API

Jiri Pirko (13):
  openvswitch: split flow structures into ovs specific and generic ones
  net: rename netdev_phys_port_id to more generic name
  net: introduce generic switch devices support
  rtnl: expose physical switch id for particular device
  net-sysfs: expose physical switch id for particular device
  net: introduce dummy switch
  dsa: implement ndo_swdev_get_id
  net: introduce netdev_phys_item_ids_match helper
  openvswitch: introduce vport_op get_netdev
  openvswitch: add support for datapath hardware offload
  sw_flow: add misc section to key with in_port_ifindex field
  rocker: introduce rocker switch driver
  switchdev: introduce Netlink API

 Documentation/networking/switchdev.txt           |   53 +
 MAINTAINERS                                      |   14 +
 drivers/net/Kconfig                              |    7 +
 drivers/net/Makefile                             |    1 +
 drivers/net/dummyswitch.c                        |  130 +
 drivers/net/ethernet/Kconfig                     |    1 +
 drivers/net/ethernet/Makefile                    |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
 drivers/net/ethernet/rocker/Kconfig              |   29 +
 drivers/net/ethernet/rocker/Makefile             |    5 +
 drivers/net/ethernet/rocker/rocker.c             | 3553 ++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h             |  465 +++
 include/linux/netdevice.h                        |   54 +-
 include/net/dsa.h                                |    1 +
 include/net/sw_flow.h                            |  116 +
 include/net/switchdev.h                          |   44 +
 include/uapi/linux/if_link.h                     |   10 +
 include/uapi/linux/switchdev.h                   |  119 +
 net/Kconfig                                      |    1 +
 net/Makefile                                     |    3 +
 net/core/dev.c                                   |    2 +-
 net/core/net-sysfs.c                             |   26 +-
 net/core/rtnetlink.c                             |   30 +-
 net/dsa/Kconfig                                  |    2 +-
 net/dsa/dsa.c                                    |    3 +
 net/dsa/slave.c                                  |   10 +
 net/openvswitch/Makefile                         |    3 +-
 net/openvswitch/actions.c                        |    3 +-
 net/openvswitch/datapath.c                       |  109 +-
 net/openvswitch/datapath.h                       |    7 +-
 net/openvswitch/dp_notify.c                      |    7 +-
 net/openvswitch/flow.c                           |    6 +-
 net/openvswitch/flow.h                           |  102 +-
 net/openvswitch/flow_netlink.c                   |   53 +-
 net/openvswitch/flow_netlink.h                   |   10 +-
 net/openvswitch/flow_table.c                     |  119 +-
 net/openvswitch/flow_table.h                     |   30 +-
 net/openvswitch/hw_offload.c                     |  267 ++
 net/openvswitch/hw_offload.h                     |   22 +
 net/openvswitch/vport-gre.c                      |    4 +-
 net/openvswitch/vport-internal_dev.c             |   56 +-
 net/openvswitch/vport-netdev.c                   |   19 +
 net/openvswitch/vport-netdev.h                   |   12 -
 net/openvswitch/vport-vxlan.c                    |    2 +-
 net/openvswitch/vport.c                          |    2 +-
 net/openvswitch/vport.h                          |    6 +-
 net/switchdev/Kconfig                            |   20 +
 net/switchdev/Makefile                           |    6 +
 net/switchdev/switchdev.c                        |  174 ++
 net/switchdev/switchdev_netlink.c                |  493 +++
 53 files changed, 5931 insertions(+), 289 deletions(-)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 drivers/net/dummyswitch.c
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h
 create mode 100644 include/net/sw_flow.h
 create mode 100644 include/net/switchdev.h
 create mode 100644 include/uapi/linux/switchdev.h
 create mode 100644 net/openvswitch/hw_offload.c
 create mode 100644 net/openvswitch/hw_offload.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c
 create mode 100644 net/switchdev/switchdev_netlink.c

-- 
1.9.3


* [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
@ 2014-09-03  9:24 ` Jiri Pirko
       [not found]   ` <1409736300-12303-2-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03 18:41   ` Pravin Shelar
  2014-09-03  9:24 ` [patch net-next 02/13] net: rename netdev_phys_port_id to more generic name Jiri Pirko
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

After this change, the flow-related structures can be used by other code as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/net/sw_flow.h          |  99 ++++++++++++++++++++++++++++++++++
 net/openvswitch/actions.c      |   3 +-
 net/openvswitch/datapath.c     |  74 +++++++++++++-------------
 net/openvswitch/datapath.h     |   4 +-
 net/openvswitch/flow.c         |   6 +--
 net/openvswitch/flow.h         | 102 +++++++----------------------------
 net/openvswitch/flow_netlink.c |  53 +++++++++---------
 net/openvswitch/flow_netlink.h |  10 ++--
 net/openvswitch/flow_table.c   | 118 ++++++++++++++++++++++-------------------
 net/openvswitch/flow_table.h   |  30 +++++------
 net/openvswitch/vport-gre.c    |   4 +-
 net/openvswitch/vport-vxlan.c  |   2 +-
 net/openvswitch/vport.c        |   2 +-
 net/openvswitch/vport.h        |   2 +-
 14 files changed, 276 insertions(+), 233 deletions(-)
 create mode 100644 include/net/sw_flow.h

diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
new file mode 100644
index 0000000..21724f1
--- /dev/null
+++ b/include/net/sw_flow.h
@@ -0,0 +1,99 @@
+/*
+ * include/net/sw_flow.h - Generic switch flow structures
+ * Copyright (c) 2007-2012 Nicira, Inc.
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _NET_SW_FLOW_H_
+#define _NET_SW_FLOW_H_
+
+struct sw_flow_key_ipv4_tunnel {
+	__be64 tun_id;
+	__be32 ipv4_src;
+	__be32 ipv4_dst;
+	__be16 tun_flags;
+	u8   ipv4_tos;
+	u8   ipv4_ttl;
+};
+
+struct sw_flow_key {
+	struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
+	struct {
+		u32	priority;	/* Packet QoS priority. */
+		u32	skb_mark;	/* SKB mark. */
+		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
+	} __packed phy; /* Safe when right after 'tun_key'. */
+	struct {
+		u8     src[ETH_ALEN];	/* Ethernet source address. */
+		u8     dst[ETH_ALEN];	/* Ethernet destination address. */
+		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
+		__be16 type;		/* Ethernet frame type. */
+	} eth;
+	struct {
+		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
+		u8     tos;		/* IP ToS. */
+		u8     ttl;		/* IP TTL/hop limit. */
+		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
+	} ip;
+	struct {
+		__be16 src;		/* TCP/UDP/SCTP source port. */
+		__be16 dst;		/* TCP/UDP/SCTP destination port. */
+		__be16 flags;		/* TCP flags. */
+	} tp;
+	union {
+		struct {
+			struct {
+				__be32 src;	/* IP source address. */
+				__be32 dst;	/* IP destination address. */
+			} addr;
+			struct {
+				u8 sha[ETH_ALEN];	/* ARP source hardware address. */
+				u8 tha[ETH_ALEN];	/* ARP target hardware address. */
+			} arp;
+		} ipv4;
+		struct {
+			struct {
+				struct in6_addr src;	/* IPv6 source address. */
+				struct in6_addr dst;	/* IPv6 destination address. */
+			} addr;
+			__be32 label;			/* IPv6 flow label. */
+			struct {
+				struct in6_addr target;	/* ND target address. */
+				u8 sll[ETH_ALEN];	/* ND source link layer address. */
+				u8 tll[ETH_ALEN];	/* ND target link layer address. */
+			} nd;
+		} ipv6;
+	};
+} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
+
+struct sw_flow_key_range {
+	unsigned short int start;
+	unsigned short int end;
+};
+
+struct sw_flow_mask {
+	struct sw_flow_key_range range;
+	struct sw_flow_key key;
+};
+
+struct sw_flow_action {
+};
+
+struct sw_flow_actions {
+	unsigned count;
+	struct sw_flow_action actions[0];
+};
+
+struct sw_flow {
+	struct sw_flow_key key;
+	struct sw_flow_key unmasked_key;
+	struct sw_flow_mask *mask;
+	struct sw_flow_actions *actions;
+};
+
+#endif /* _NET_SW_FLOW_H_ */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 5231652..a044491 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -610,8 +610,9 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 /* Execute a list of actions against 'skb'. */
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
 {
-	struct sw_flow_actions *acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
+	struct ovs_flow_actions *acts;
 
+	acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
 	OVS_CB(skb)->tun_key = NULL;
 	return do_execute_actions(dp, skb, acts->actions, acts->actions_len);
 }
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 7228ec3..683d6cd 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -240,7 +240,7 @@ void ovs_dp_detach_port(struct vport *p)
 void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
 {
 	struct datapath *dp = p->dp;
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct dp_stats_percpu *stats;
 	struct sw_flow_key key;
 	u64 *stats_counter;
@@ -505,9 +505,9 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 {
 	struct ovs_header *ovs_header = info->userhdr;
 	struct nlattr **a = info->attrs;
-	struct sw_flow_actions *acts;
+	struct ovs_flow_actions *acts;
 	struct sk_buff *packet;
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct datapath *dp;
 	struct ethhdr *eth;
 	int len;
@@ -544,11 +544,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(flow))
 		goto err_kfree_skb;
 
-	err = ovs_flow_extract(packet, -1, &flow->key);
+	err = ovs_flow_extract(packet, -1, &flow->flow.key);
 	if (err)
 		goto err_flow_free;
 
-	err = ovs_nla_get_flow_metadata(flow, a[OVS_PACKET_ATTR_KEY]);
+	err = ovs_nla_get_flow_metadata(&flow->flow, a[OVS_PACKET_ATTR_KEY]);
 	if (err)
 		goto err_flow_free;
 	acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
@@ -557,15 +557,15 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, 0, &acts);
+				   &flow->flow.key, 0, &acts);
 	rcu_assign_pointer(flow->sf_acts, acts);
 	if (err)
 		goto err_flow_free;
 
 	OVS_CB(packet)->flow = flow;
-	OVS_CB(packet)->pkt_key = &flow->key;
-	packet->priority = flow->key.phy.priority;
-	packet->mark = flow->key.phy.skb_mark;
+	OVS_CB(packet)->pkt_key = &flow->flow.key;
+	packet->priority = flow->flow.key.phy.priority;
+	packet->mark = flow->flow.key.phy.skb_mark;
 
 	rcu_read_lock();
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
@@ -648,7 +648,7 @@ static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats,
 	}
 }
 
-static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
+static size_t ovs_flow_cmd_msg_size(const struct ovs_flow_actions *acts)
 {
 	return NLMSG_ALIGN(sizeof(struct ovs_header))
 		+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_KEY */
@@ -660,7 +660,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
 }
 
 /* Called with ovs_mutex or RCU read lock. */
-static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
+static int ovs_flow_cmd_fill_info(const struct ovs_flow *flow, int dp_ifindex,
 				  struct sk_buff *skb, u32 portid,
 				  u32 seq, u32 flags, u8 cmd)
 {
@@ -684,7 +684,8 @@ static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
 	if (!nla)
 		goto nla_put_failure;
 
-	err = ovs_nla_put_flow(&flow->unmasked_key, &flow->unmasked_key, skb);
+	err = ovs_nla_put_flow(&flow->flow.unmasked_key,
+			       &flow->flow.unmasked_key, skb);
 	if (err)
 		goto error;
 	nla_nest_end(skb, nla);
@@ -693,7 +694,7 @@ static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
 	if (!nla)
 		goto nla_put_failure;
 
-	err = ovs_nla_put_flow(&flow->key, &flow->mask->key, skb);
+	err = ovs_nla_put_flow(&flow->flow.key, &flow->flow.mask->key, skb);
 	if (err)
 		goto error;
 
@@ -725,7 +726,7 @@ static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
 	 */
 	start = nla_nest_start(skb, OVS_FLOW_ATTR_ACTIONS);
 	if (start) {
-		const struct sw_flow_actions *sf_acts;
+		const struct ovs_flow_actions *sf_acts;
 
 		sf_acts = rcu_dereference_ovsl(flow->sf_acts);
 		err = ovs_nla_put_actions(sf_acts->actions,
@@ -752,9 +753,9 @@ error:
 }
 
 /* May not be called with RCU read lock. */
-static struct sk_buff *ovs_flow_cmd_alloc_info(const struct sw_flow_actions *acts,
-					       struct genl_info *info,
-					       bool always)
+static struct sk_buff *
+ovs_flow_cmd_alloc_info(const struct ovs_flow_actions *acts,
+			struct genl_info *info, bool always)
 {
 	struct sk_buff *skb;
 
@@ -769,7 +770,7 @@ static struct sk_buff *ovs_flow_cmd_alloc_info(const struct sw_flow_actions *act
 }
 
 /* Called with ovs_mutex. */
-static struct sk_buff *ovs_flow_cmd_build_info(const struct sw_flow *flow,
+static struct sk_buff *ovs_flow_cmd_build_info(const struct ovs_flow *flow,
 					       int dp_ifindex,
 					       struct genl_info *info, u8 cmd,
 					       bool always)
@@ -793,12 +794,12 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
 	struct ovs_header *ovs_header = info->userhdr;
-	struct sw_flow *flow, *new_flow;
+	struct ovs_flow *flow, *new_flow;
 	struct sw_flow_mask mask;
 	struct sk_buff *reply;
 	struct datapath *dp;
-	struct sw_flow_actions *acts;
-	struct sw_flow_match match;
+	struct ovs_flow_actions *acts;
+	struct ovs_flow_match match;
 	int error;
 
 	/* Must have key and actions. */
@@ -818,13 +819,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	/* Extract key. */
-	ovs_match_init(&match, &new_flow->unmasked_key, &mask);
+	ovs_match_init(&match, &new_flow->flow.unmasked_key, &mask);
 	error = ovs_nla_get_match(&match,
 				  a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
 	if (error)
 		goto err_kfree_flow;
 
-	ovs_flow_mask_key(&new_flow->key, &new_flow->unmasked_key, &mask);
+	ovs_flow_mask_key(&new_flow->flow.key,
+			  &new_flow->flow.unmasked_key, &mask);
 
 	/* Validate actions. */
 	acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_FLOW_ATTR_ACTIONS]));
@@ -832,8 +834,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(acts))
 		goto err_kfree_flow;
 
-	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
-				     0, &acts);
+	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
+				     &new_flow->flow.key, 0, &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 		goto err_kfree_acts;
@@ -852,7 +854,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock_ovs;
 	}
 	/* Check if this is a duplicate flow */
-	flow = ovs_flow_tbl_lookup(&dp->table, &new_flow->unmasked_key);
+	flow = ovs_flow_tbl_lookup(&dp->table, &new_flow->flow.unmasked_key);
 	if (likely(!flow)) {
 		rcu_assign_pointer(new_flow->sf_acts, acts);
 
@@ -873,7 +875,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		}
 		ovs_unlock();
 	} else {
-		struct sw_flow_actions *old_acts;
+		struct ovs_flow_actions *old_acts;
 
 		/* Bail out if we're not allowed to modify an existing flow.
 		 * We accept NLM_F_CREATE in place of the intended NLM_F_EXCL
@@ -932,12 +934,12 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct nlattr **a = info->attrs;
 	struct ovs_header *ovs_header = info->userhdr;
 	struct sw_flow_key key, masked_key;
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct sw_flow_mask mask;
 	struct sk_buff *reply = NULL;
 	struct datapath *dp;
-	struct sw_flow_actions *old_acts = NULL, *acts = NULL;
-	struct sw_flow_match match;
+	struct ovs_flow_actions *old_acts = NULL, *acts = NULL;
+	struct ovs_flow_match match;
 	int error;
 
 	/* Extract key. */
@@ -1039,9 +1041,9 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	struct ovs_header *ovs_header = info->userhdr;
 	struct sw_flow_key key;
 	struct sk_buff *reply;
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct datapath *dp;
-	struct sw_flow_match match;
+	struct ovs_flow_match match;
 	int err;
 
 	if (!a[OVS_FLOW_ATTR_KEY]) {
@@ -1087,9 +1089,9 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	struct ovs_header *ovs_header = info->userhdr;
 	struct sw_flow_key key;
 	struct sk_buff *reply;
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct datapath *dp;
-	struct sw_flow_match match;
+	struct ovs_flow_match match;
 	int err;
 
 	if (likely(a[OVS_FLOW_ATTR_KEY])) {
@@ -1120,7 +1122,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	ovs_flow_tbl_remove(&dp->table, flow);
 	ovs_unlock();
 
-	reply = ovs_flow_cmd_alloc_info((const struct sw_flow_actions __force *) flow->sf_acts,
+	reply = ovs_flow_cmd_alloc_info((const struct ovs_flow_actions __force *) flow->sf_acts,
 					info, false);
 	if (likely(reply)) {
 		if (likely(!IS_ERR(reply))) {
@@ -1160,7 +1162,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 	ti = rcu_dereference(dp->table.ti);
 	for (;;) {
-		struct sw_flow *flow;
+		struct ovs_flow *flow;
 		u32 bucket, obj;
 
 		bucket = cb->args[0];
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 701b573..291f5a0 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -100,9 +100,9 @@ struct datapath {
  * packet is not being tunneled.
  */
 struct ovs_skb_cb {
-	struct sw_flow		*flow;
+	struct ovs_flow		*flow;
 	struct sw_flow_key	*pkt_key;
-	struct ovs_key_ipv4_tunnel  *tun_key;
+	struct sw_flow_key_ipv4_tunnel  *tun_key;
 };
 #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
 
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 7064da9..4e2d4c8 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -61,7 +61,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies)
 
 #define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF))
 
-void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
+void ovs_flow_stats_update(struct ovs_flow *flow, __be16 tcp_flags,
 			   struct sk_buff *skb)
 {
 	struct flow_stats *stats;
@@ -123,7 +123,7 @@ unlock:
 }
 
 /* Must be called with rcu_read_lock or ovs_mutex. */
-void ovs_flow_stats_get(const struct sw_flow *flow,
+void ovs_flow_stats_get(const struct ovs_flow *flow,
 			struct ovs_flow_stats *ovs_stats,
 			unsigned long *used, __be16 *tcp_flags)
 {
@@ -152,7 +152,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 }
 
 /* Called with ovs_mutex. */
-void ovs_flow_stats_clear(struct sw_flow *flow)
+void ovs_flow_stats_clear(struct ovs_flow *flow)
 {
 	int node;
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 5e5aaed..712314e 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -32,26 +32,18 @@
 #include <linux/time.h>
 #include <linux/flex_array.h>
 #include <net/inet_ecn.h>
+#include <net/sw_flow.h>
 
 struct sk_buff;
 
-/* Used to memset ovs_key_ipv4_tunnel padding. */
+/* Used to memset sw_flow_key_ipv4_tunnel padding. */
 #define OVS_TUNNEL_KEY_SIZE					\
-	(offsetof(struct ovs_key_ipv4_tunnel, ipv4_ttl) +	\
-	FIELD_SIZEOF(struct ovs_key_ipv4_tunnel, ipv4_ttl))
-
-struct ovs_key_ipv4_tunnel {
-	__be64 tun_id;
-	__be32 ipv4_src;
-	__be32 ipv4_dst;
-	__be16 tun_flags;
-	u8   ipv4_tos;
-	u8   ipv4_ttl;
-} __packed __aligned(4); /* Minimize padding. */
-
-static inline void ovs_flow_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
-					 const struct iphdr *iph, __be64 tun_id,
-					 __be16 tun_flags)
+	(offsetof(struct sw_flow_key_ipv4_tunnel, ipv4_ttl) +	\
+	FIELD_SIZEOF(struct sw_flow_key_ipv4_tunnel, ipv4_ttl))
+
+static inline void
+ovs_flow_tun_key_init(struct sw_flow_key_ipv4_tunnel *tun_key,
+		      const struct iphdr *iph, __be64 tun_id, __be16 tun_flags)
 {
 	tun_key->tun_id = tun_id;
 	tun_key->ipv4_src = iph->saddr;
@@ -65,76 +57,20 @@ static inline void ovs_flow_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
 	       sizeof(*tun_key) - OVS_TUNNEL_KEY_SIZE);
 }
 
-struct sw_flow_key {
-	struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
-	struct {
-		u32	priority;	/* Packet QoS priority. */
-		u32	skb_mark;	/* SKB mark. */
-		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
-	} __packed phy; /* Safe when right after 'tun_key'. */
-	struct {
-		u8     src[ETH_ALEN];	/* Ethernet source address. */
-		u8     dst[ETH_ALEN];	/* Ethernet destination address. */
-		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
-		__be16 type;		/* Ethernet frame type. */
-	} eth;
-	struct {
-		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
-		u8     tos;		/* IP ToS. */
-		u8     ttl;		/* IP TTL/hop limit. */
-		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
-	} ip;
-	struct {
-		__be16 src;		/* TCP/UDP/SCTP source port. */
-		__be16 dst;		/* TCP/UDP/SCTP destination port. */
-		__be16 flags;		/* TCP flags. */
-	} tp;
-	union {
-		struct {
-			struct {
-				__be32 src;	/* IP source address. */
-				__be32 dst;	/* IP destination address. */
-			} addr;
-			struct {
-				u8 sha[ETH_ALEN];	/* ARP source hardware address. */
-				u8 tha[ETH_ALEN];	/* ARP target hardware address. */
-			} arp;
-		} ipv4;
-		struct {
-			struct {
-				struct in6_addr src;	/* IPv6 source address. */
-				struct in6_addr dst;	/* IPv6 destination address. */
-			} addr;
-			__be32 label;			/* IPv6 flow label. */
-			struct {
-				struct in6_addr target;	/* ND target address. */
-				u8 sll[ETH_ALEN];	/* ND source link layer address. */
-				u8 tll[ETH_ALEN];	/* ND target link layer address. */
-			} nd;
-		} ipv6;
-	};
-} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
-
-struct sw_flow_key_range {
-	unsigned short int start;
-	unsigned short int end;
-};
-
-struct sw_flow_mask {
+struct ovs_flow_mask {
 	int ref_count;
 	struct rcu_head rcu;
 	struct list_head list;
-	struct sw_flow_key_range range;
-	struct sw_flow_key key;
+	struct sw_flow_mask mask;
 };
 
-struct sw_flow_match {
+struct ovs_flow_match {
 	struct sw_flow_key *key;
 	struct sw_flow_key_range range;
 	struct sw_flow_mask *mask;
 };
 
-struct sw_flow_actions {
+struct ovs_flow_actions {
 	struct rcu_head rcu;
 	u32 actions_len;
 	struct nlattr actions[];
@@ -148,17 +84,15 @@ struct flow_stats {
 	__be16 tcp_flags;		/* Union of seen TCP flags. */
 };
 
-struct sw_flow {
+struct ovs_flow {
 	struct rcu_head rcu;
 	struct hlist_node hash_node[2];
 	u32 hash;
 	int stats_last_writer;		/* NUMA-node id of the last writer on
 					 * 'stats[0]'.
 					 */
-	struct sw_flow_key key;
-	struct sw_flow_key unmasked_key;
-	struct sw_flow_mask *mask;
-	struct sw_flow_actions __rcu *sf_acts;
+	struct sw_flow flow;
+	struct ovs_flow_actions __rcu *sf_acts;
 	struct flow_stats __rcu *stats[]; /* One for each NUMA node.  First one
 					   * is allocated at flow creation time,
 					   * the rest are allocated on demand
@@ -180,11 +114,11 @@ struct arp_eth_header {
 	unsigned char       ar_tip[4];		/* target IP address        */
 } __packed;
 
-void ovs_flow_stats_update(struct sw_flow *, __be16 tcp_flags,
+void ovs_flow_stats_update(struct ovs_flow *, __be16 tcp_flags,
 			   struct sk_buff *);
-void ovs_flow_stats_get(const struct sw_flow *, struct ovs_flow_stats *,
+void ovs_flow_stats_get(const struct ovs_flow *, struct ovs_flow_stats *,
 			unsigned long *used, __be16 *tcp_flags);
-void ovs_flow_stats_clear(struct sw_flow *);
+void ovs_flow_stats_clear(struct ovs_flow *);
 u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *);
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d757848..1eb5054 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -48,7 +48,7 @@
 
 #include "flow_netlink.h"
 
-static void update_range__(struct sw_flow_match *match,
+static void update_range__(struct ovs_flow_match *match,
 			   size_t offset, size_t size, bool is_mask)
 {
 	struct sw_flow_key_range *range = NULL;
@@ -105,7 +105,7 @@ static u16 range_n_bytes(const struct sw_flow_key_range *range)
 	return range->end - range->start;
 }
 
-static bool match_validate(const struct sw_flow_match *match,
+static bool match_validate(const struct ovs_flow_match *match,
 			   u64 key_attrs, u64 mask_attrs)
 {
 	u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
@@ -327,7 +327,7 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
 }
 
 static int ipv4_tun_from_nlattr(const struct nlattr *attr,
-				struct sw_flow_match *match, bool is_mask)
+				struct ovs_flow_match *match, bool is_mask)
 {
 	struct nlattr *a;
 	int rem;
@@ -416,8 +416,8 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 }
 
 static int ipv4_tun_to_nlattr(struct sk_buff *skb,
-			      const struct ovs_key_ipv4_tunnel *tun_key,
-			      const struct ovs_key_ipv4_tunnel *output)
+			      const struct sw_flow_key_ipv4_tunnel *tun_key,
+			      const struct sw_flow_key_ipv4_tunnel *output)
 {
 	struct nlattr *nla;
 
@@ -451,7 +451,7 @@ static int ipv4_tun_to_nlattr(struct sk_buff *skb,
 }
 
 
-static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
+static int metadata_from_nlattrs(struct ovs_flow_match *match,  u64 *attrs,
 				 const struct nlattr **a, bool is_mask)
 {
 	if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) {
@@ -489,7 +489,7 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 	return 0;
 }
 
-static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
+static int ovs_key_from_nlattrs(struct ovs_flow_match *match, u64 attrs,
 				const struct nlattr **a, bool is_mask)
 {
 	int err;
@@ -730,7 +730,7 @@ static void sw_flow_mask_set(struct sw_flow_mask *mask,
  * @mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink
  * attribute specifies the mask field of the wildcarded flow.
  */
-int ovs_nla_get_match(struct sw_flow_match *match,
+int ovs_nla_get_match(struct ovs_flow_match *match,
 		      const struct nlattr *key,
 		      const struct nlattr *mask)
 {
@@ -849,11 +849,11 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 int ovs_nla_get_flow_metadata(struct sw_flow *flow,
 			      const struct nlattr *attr)
 {
-	struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key;
+	struct sw_flow_key_ipv4_tunnel *tun_key = &flow->key.tun_key;
 	const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
 	u64 attrs = 0;
 	int err;
-	struct sw_flow_match match;
+	struct ovs_flow_match match;
 
 	flow->key.phy.in_port = DP_MAX_PORTS;
 	flow->key.phy.priority = 0;
@@ -1070,9 +1070,9 @@ nla_put_failure:
 
 #define MAX_ACTIONS_BUFSIZE	(32 * 1024)
 
-struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size)
+struct ovs_flow_actions *ovs_nla_alloc_flow_actions(int size)
 {
-	struct sw_flow_actions *sfa;
+	struct ovs_flow_actions *sfa;
 
 	if (size > MAX_ACTIONS_BUFSIZE)
 		return ERR_PTR(-EINVAL);
@@ -1087,19 +1087,19 @@ struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size)
 
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions(struct ovs_flow_actions *sf_acts)
 {
 	kfree_rcu(sf_acts, rcu);
 }
 
-static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
+static struct nlattr *reserve_sfa_size(struct ovs_flow_actions **sfa,
 				       int attr_len)
 {
 
-	struct sw_flow_actions *acts;
+	struct ovs_flow_actions *acts;
 	int new_acts_size;
 	int req_size = NLA_ALIGN(attr_len);
-	int next_offset = offsetof(struct sw_flow_actions, actions) +
+	int next_offset = offsetof(struct ovs_flow_actions, actions) +
 					(*sfa)->actions_len;
 
 	if (req_size <= (ksize(*sfa) - next_offset))
@@ -1127,7 +1127,8 @@ out:
 	return  (struct nlattr *) ((unsigned char *)(*sfa) + next_offset);
 }
 
-static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len)
+static int add_action(struct ovs_flow_actions **sfa, int attrtype,
+		      void *data, int len)
 {
 	struct nlattr *a;
 
@@ -1145,7 +1146,7 @@ static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, in
 	return 0;
 }
 
-static inline int add_nested_action_start(struct sw_flow_actions **sfa,
+static inline int add_nested_action_start(struct ovs_flow_actions **sfa,
 					  int attrtype)
 {
 	int used = (*sfa)->actions_len;
@@ -1158,7 +1159,7 @@ static inline int add_nested_action_start(struct sw_flow_actions **sfa,
 	return used;
 }
 
-static inline void add_nested_action_end(struct sw_flow_actions *sfa,
+static inline void add_nested_action_end(struct ovs_flow_actions *sfa,
 					 int st_offset)
 {
 	struct nlattr *a = (struct nlattr *) ((unsigned char *)sfa->actions +
@@ -1169,7 +1170,7 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
-				    struct sw_flow_actions **sfa)
+				    struct ovs_flow_actions **sfa)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
@@ -1226,7 +1227,7 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
 	return -EINVAL;
 }
 
-void ovs_match_init(struct sw_flow_match *match,
+void ovs_match_init(struct ovs_flow_match *match,
 		    struct sw_flow_key *key,
 		    struct sw_flow_mask *mask)
 {
@@ -1243,9 +1244,9 @@ void ovs_match_init(struct sw_flow_match *match,
 }
 
 static int validate_and_copy_set_tun(const struct nlattr *attr,
-				     struct sw_flow_actions **sfa)
+				     struct ovs_flow_actions **sfa)
 {
-	struct sw_flow_match match;
+	struct ovs_flow_match match;
 	struct sw_flow_key key;
 	int err, start;
 
@@ -1267,7 +1268,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
-			struct sw_flow_actions **sfa,
+			struct ovs_flow_actions **sfa,
 			bool *set_tun)
 {
 	const struct nlattr *ovs_key = nla_data(a);
@@ -1381,7 +1382,7 @@ static int validate_userspace(const struct nlattr *attr)
 }
 
 static int copy_action(const struct nlattr *from,
-		       struct sw_flow_actions **sfa)
+		       struct ovs_flow_actions **sfa)
 {
 	int totlen = NLA_ALIGN(from->nla_len);
 	struct nlattr *to;
@@ -1397,7 +1398,7 @@ static int copy_action(const struct nlattr *from,
 int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key,
 			 int depth,
-			 struct sw_flow_actions **sfa)
+			 struct ovs_flow_actions **sfa)
 {
 	const struct nlattr *a;
 	int rem, err;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 4401510..296b126 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -37,24 +37,24 @@
 
 #include "flow.h"
 
-void ovs_match_init(struct sw_flow_match *match,
+void ovs_match_init(struct ovs_flow_match *match,
 		    struct sw_flow_key *key, struct sw_flow_mask *mask);
 
 int ovs_nla_put_flow(const struct sw_flow_key *,
 		     const struct sw_flow_key *, struct sk_buff *);
 int ovs_nla_get_flow_metadata(struct sw_flow *flow,
 			      const struct nlattr *attr);
-int ovs_nla_get_match(struct sw_flow_match *match,
+int ovs_nla_get_match(struct ovs_flow_match *match,
 		      const struct nlattr *,
 		      const struct nlattr *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key, int depth,
-			 struct sw_flow_actions **sfa);
+			 struct ovs_flow_actions **sfa);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
 
-struct sw_flow_actions *ovs_nla_alloc_flow_actions(int actions_len);
-void ovs_nla_free_flow_actions(struct sw_flow_actions *);
+struct ovs_flow_actions *ovs_nla_alloc_flow_actions(int actions_len);
+void ovs_nla_free_flow_actions(struct ovs_flow_actions *);
 
 #endif /* flow_netlink.h */
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index cf2d853..e7d9a41 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -73,9 +73,9 @@ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 		*d++ = *s++ & *m++;
 }
 
-struct sw_flow *ovs_flow_alloc(void)
+struct ovs_flow *ovs_flow_alloc(void)
 {
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct flow_stats *stats;
 	int node;
 
@@ -84,7 +84,7 @@ struct sw_flow *ovs_flow_alloc(void)
 		return ERR_PTR(-ENOMEM);
 
 	flow->sf_acts = NULL;
-	flow->mask = NULL;
+	flow->flow.mask = NULL;
 	flow->stats_last_writer = NUMA_NO_NODE;
 
 	/* Initialize the default stat node. */
@@ -135,11 +135,11 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
 	return buckets;
 }
 
-static void flow_free(struct sw_flow *flow)
+static void flow_free(struct ovs_flow *flow)
 {
 	int node;
 
-	kfree((struct sw_flow_actions __force *)flow->sf_acts);
+	kfree((struct ovs_flow_actions __force *)flow->sf_acts);
 	for_each_node(node)
 		if (flow->stats[node])
 			kmem_cache_free(flow_stats_cache,
@@ -149,12 +149,12 @@ static void flow_free(struct sw_flow *flow)
 
 static void rcu_free_flow_callback(struct rcu_head *rcu)
 {
-	struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);
+	struct ovs_flow *flow = container_of(rcu, struct ovs_flow, rcu);
 
 	flow_free(flow);
 }
 
-void ovs_flow_free(struct sw_flow *flow, bool deferred)
+void ovs_flow_free(struct ovs_flow *flow, bool deferred)
 {
 	if (!flow)
 		return;
@@ -232,7 +232,7 @@ static void table_instance_destroy(struct table_instance *ti, bool deferred)
 		goto skip_flows;
 
 	for (i = 0; i < ti->n_buckets; i++) {
-		struct sw_flow *flow;
+		struct ovs_flow *flow;
 		struct hlist_head *head = flex_array_get(ti->buckets, i);
 		struct hlist_node *n;
 		int ver = ti->node_ver;
@@ -257,10 +257,10 @@ void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred)
 	table_instance_destroy(ti, deferred);
 }
 
-struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
-				       u32 *bucket, u32 *last)
+struct ovs_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
+					u32 *bucket, u32 *last)
 {
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct hlist_head *head;
 	int ver;
 	int i;
@@ -291,7 +291,8 @@ static struct hlist_head *find_bucket(struct table_instance *ti, u32 hash)
 				(hash & (ti->n_buckets - 1)));
 }
 
-static void table_instance_insert(struct table_instance *ti, struct sw_flow *flow)
+static void table_instance_insert(struct table_instance *ti,
+				  struct ovs_flow *flow)
 {
 	struct hlist_head *head;
 
@@ -310,7 +311,7 @@ static void flow_table_copy_flows(struct table_instance *old,
 
 	/* Insert in new table. */
 	for (i = 0; i < old->n_buckets; i++) {
-		struct sw_flow *flow;
+		struct ovs_flow *flow;
 		struct hlist_head *head;
 
 		head = flex_array_get(old->buckets, i);
@@ -397,21 +398,21 @@ static bool flow_cmp_masked_key(const struct sw_flow *flow,
 	return cmp_key(&flow->key, key, key_start, key_end);
 }
 
-bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
-			       struct sw_flow_match *match)
+bool ovs_flow_cmp_unmasked_key(const struct ovs_flow *flow,
+			       struct ovs_flow_match *match)
 {
 	struct sw_flow_key *key = match->key;
 	int key_start = flow_key_start(key);
 	int key_end = match->range.end;
 
-	return cmp_key(&flow->unmasked_key, key, key_start, key_end);
+	return cmp_key(&flow->flow.unmasked_key, key, key_start, key_end);
 }
 
-static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
-					  const struct sw_flow_key *unmasked,
-					  struct sw_flow_mask *mask)
+static struct ovs_flow *masked_flow_lookup(struct table_instance *ti,
+					   const struct sw_flow_key *unmasked,
+					   struct sw_flow_mask *mask)
 {
-	struct sw_flow *flow;
+	struct ovs_flow *flow;
 	struct hlist_head *head;
 	int key_start = mask->range.start;
 	int key_end = mask->range.end;
@@ -422,50 +423,50 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	hash = flow_hash(&masked_key, key_start, key_end);
 	head = find_bucket(ti, hash);
 	hlist_for_each_entry_rcu(flow, head, hash_node[ti->node_ver]) {
-		if (flow->mask == mask && flow->hash == hash &&
-		    flow_cmp_masked_key(flow, &masked_key,
-					  key_start, key_end))
+		if (flow->flow.mask == mask && flow->hash == hash &&
+		    flow_cmp_masked_key(&flow->flow, &masked_key,
+					key_start, key_end))
 			return flow;
 	}
 	return NULL;
 }
 
-struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
-				    const struct sw_flow_key *key,
-				    u32 *n_mask_hit)
+struct ovs_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
+					   const struct sw_flow_key *key,
+					   u32 *n_mask_hit)
 {
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
-	struct sw_flow_mask *mask;
-	struct sw_flow *flow;
+	struct ovs_flow_mask *mask;
+	struct ovs_flow *flow;
 
 	*n_mask_hit = 0;
 	list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
 		(*n_mask_hit)++;
-		flow = masked_flow_lookup(ti, key, mask);
+		flow = masked_flow_lookup(ti, key, &mask->mask);
 		if (flow)  /* Found */
 			return flow;
 	}
 	return NULL;
 }
 
-struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
-				    const struct sw_flow_key *key)
+struct ovs_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
+				     const struct sw_flow_key *key)
 {
 	u32 __always_unused n_mask_hit;
 
 	return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit);
 }
 
-struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
-					  struct sw_flow_match *match)
+struct ovs_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
+					   struct ovs_flow_match *match)
 {
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
-	struct sw_flow_mask *mask;
-	struct sw_flow *flow;
+	struct ovs_flow_mask *mask;
+	struct ovs_flow *flow;
 
 	/* Always called under ovs-mutex. */
 	list_for_each_entry(mask, &tbl->mask_list, list) {
-		flow = masked_flow_lookup(ti, match->key, mask);
+		flow = masked_flow_lookup(ti, match->key, &mask->mask);
 		if (flow && ovs_flow_cmp_unmasked_key(flow, match))  /* Found */
 			return flow;
 	}
@@ -474,7 +475,7 @@ struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
 
 int ovs_flow_tbl_num_masks(const struct flow_table *table)
 {
-	struct sw_flow_mask *mask;
+	struct ovs_flow_mask *mask;
 	int num = 0;
 
 	list_for_each_entry(mask, &table->mask_list, list)
@@ -489,7 +490,7 @@ static struct table_instance *table_instance_expand(struct table_instance *ti)
 }
 
 /* Remove 'mask' from the mask list, if it is not needed any more. */
-static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
+static void flow_mask_remove(struct flow_table *tbl, struct ovs_flow_mask *mask)
 {
 	if (mask) {
 		/* ovs-lock is required to protect mask-refcount and
@@ -507,9 +508,12 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 }
 
 /* Must be called with OVS mutex held. */
-void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
+void ovs_flow_tbl_remove(struct flow_table *table, struct ovs_flow *flow)
 {
 	struct table_instance *ti = ovsl_dereference(table->ti);
+	struct ovs_flow_mask *mask = container_of(flow->flow.mask,
+						  struct ovs_flow_mask,
+						  mask);
 
 	BUG_ON(table->count == 0);
 	hlist_del_rcu(&flow->hash_node[ti->node_ver]);
@@ -518,12 +522,12 @@ void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
 	/* RCU delete the mask. 'flow->mask' is not NULLed, as it should be
 	 * accessible as long as the RCU read lock is held.
 	 */
-	flow_mask_remove(table, flow->mask);
+	flow_mask_remove(table, mask);
 }
 
-static struct sw_flow_mask *mask_alloc(void)
+static struct ovs_flow_mask *mask_alloc(void)
 {
-	struct sw_flow_mask *mask;
+	struct ovs_flow_mask *mask;
 
 	mask = kmalloc(sizeof(*mask), GFP_KERNEL);
 	if (mask)
@@ -543,15 +547,16 @@ static bool mask_equal(const struct sw_flow_mask *a,
 		&& (memcmp(a_, b_, range_n_bytes(&a->range)) == 0);
 }
 
-static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
-					   const struct sw_flow_mask *mask)
+static struct ovs_flow_mask *flow_mask_find(const struct flow_table *tbl,
+					    const struct sw_flow_mask *mask)
 {
 	struct list_head *ml;
 
 	list_for_each(ml, &tbl->mask_list) {
-		struct sw_flow_mask *m;
-		m = container_of(ml, struct sw_flow_mask, list);
-		if (mask_equal(mask, m))
+		struct ovs_flow_mask *m;
+
+		m = container_of(ml, struct ovs_flow_mask, list);
+		if (mask_equal(mask, &m->mask))
 			return m;
 	}
 
@@ -559,30 +564,31 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
 }
 
 /* Add 'mask' into the mask list, if it is not already there. */
-static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
+static int flow_mask_insert(struct flow_table *tbl, struct ovs_flow *flow,
 			    struct sw_flow_mask *new)
 {
-	struct sw_flow_mask *mask;
+	struct ovs_flow_mask *mask;
+
 	mask = flow_mask_find(tbl, new);
 	if (!mask) {
 		/* Allocate a new mask if none exsits. */
 		mask = mask_alloc();
 		if (!mask)
 			return -ENOMEM;
-		mask->key = new->key;
-		mask->range = new->range;
+		mask->mask.key = new->key;
+		mask->mask.range = new->range;
 		list_add_rcu(&mask->list, &tbl->mask_list);
 	} else {
 		BUG_ON(!mask->ref_count);
 		mask->ref_count++;
 	}
 
-	flow->mask = mask;
+	flow->flow.mask = &mask->mask;
 	return 0;
 }
 
 /* Must be called with OVS mutex held. */
-int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
+int ovs_flow_tbl_insert(struct flow_table *table, struct ovs_flow *flow,
 			struct sw_flow_mask *mask)
 {
 	struct table_instance *new_ti = NULL;
@@ -593,8 +599,8 @@ int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
 	if (err)
 		return err;
 
-	flow->hash = flow_hash(&flow->key, flow->mask->range.start,
-			flow->mask->range.end);
+	flow->hash = flow_hash(&flow->flow.key, flow->flow.mask->range.start,
+			       flow->flow.mask->range.end);
 	ti = ovsl_dereference(table->ti);
 	table_instance_insert(ti, flow);
 	table->count++;
@@ -620,7 +626,7 @@ int ovs_flow_init(void)
 	BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long));
 	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
-	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
+	flow_cache = kmem_cache_create("ovs_flow", sizeof(struct ovs_flow)
 				       + (num_possible_nodes()
 					  * sizeof(struct flow_stats *)),
 				       0, 0, NULL);
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 5918bff..d57d6b5 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -57,29 +57,29 @@ extern struct kmem_cache *flow_stats_cache;
 int ovs_flow_init(void);
 void ovs_flow_exit(void);
 
-struct sw_flow *ovs_flow_alloc(void);
-void ovs_flow_free(struct sw_flow *, bool deferred);
+struct ovs_flow *ovs_flow_alloc(void);
+void ovs_flow_free(struct ovs_flow *, bool deferred);
 
 int ovs_flow_tbl_init(struct flow_table *);
 int ovs_flow_tbl_count(struct flow_table *table);
 void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred);
 int ovs_flow_tbl_flush(struct flow_table *flow_table);
 
-int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
+int ovs_flow_tbl_insert(struct flow_table *table, struct ovs_flow *flow,
 			struct sw_flow_mask *mask);
-void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
+void ovs_flow_tbl_remove(struct flow_table *table, struct ovs_flow *flow);
 int  ovs_flow_tbl_num_masks(const struct flow_table *table);
-struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
-				       u32 *bucket, u32 *idx);
-struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
-				    const struct sw_flow_key *,
-				    u32 *n_mask_hit);
-struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
-				    const struct sw_flow_key *);
-struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
-					  struct sw_flow_match *match);
-bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
-			       struct sw_flow_match *match);
+struct ovs_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
+					u32 *bucket, u32 *idx);
+struct ovs_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
+					   const struct sw_flow_key *,
+					   u32 *n_mask_hit);
+struct ovs_flow *ovs_flow_tbl_lookup(struct flow_table *,
+				     const struct sw_flow_key *);
+struct ovs_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
+					   struct ovs_flow_match *match);
+bool ovs_flow_cmp_unmasked_key(const struct ovs_flow *flow,
+			       struct ovs_flow_match *match);
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 		       const struct sw_flow_mask *mask);
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index f49148a..fda79eb 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -63,7 +63,7 @@ static __be16 filter_tnl_flags(__be16 flags)
 static struct sk_buff *__build_header(struct sk_buff *skb,
 				      int tunnel_hlen)
 {
-	const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
+	const struct sw_flow_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
 	struct tnl_ptk_info tpi;
 
 	skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
@@ -92,7 +92,7 @@ static __be64 key_to_tunnel_id(__be32 key, __be32 seq)
 static int gre_rcv(struct sk_buff *skb,
 		   const struct tnl_ptk_info *tpi)
 {
-	struct ovs_key_ipv4_tunnel tun_key;
+	struct sw_flow_key_ipv4_tunnel tun_key;
 	struct ovs_net *ovs_net;
 	struct vport *vport;
 	__be64 key;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d8b7e24..b7edf47 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -58,7 +58,7 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 /* Called with rcu_read_lock and BH disabled. */
 static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 {
-	struct ovs_key_ipv4_tunnel tun_key;
+	struct sw_flow_key_ipv4_tunnel tun_key;
 	struct vport *vport = vs->data;
 	struct iphdr *iph;
 	__be64 key;
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 6d8f2ec..7df5234 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -438,7 +438,7 @@ u32 ovs_vport_find_upcall_portid(const struct vport *p, struct sk_buff *skb)
  * skb->data should point to the Ethernet header.
  */
 void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
-		       struct ovs_key_ipv4_tunnel *tun_key)
+		       struct sw_flow_key_ipv4_tunnel *tun_key)
 {
 	struct pcpu_sw_netstats *stats;
 
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 35f89d8..8409e06 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -210,7 +210,7 @@ static inline struct vport *vport_from_priv(void *priv)
 }
 
 void ovs_vport_receive(struct vport *, struct sk_buff *,
-		       struct ovs_key_ipv4_tunnel *);
+		       struct sw_flow_key_ipv4_tunnel *);
 
 /* List of statically compiled vport implementations.  Don't forget to also
  * add yours to the list at the top of vport.c. */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 02/13] net: rename netdev_phys_port_id to more generic name
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
  2014-09-03  9:24 ` [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones Jiri Pirko
@ 2014-09-03  9:24 ` Jiri Pirko
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

This way, the structure can be reused for identification of other "items" as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  2 +-
 include/linux/netdevice.h                        | 16 ++++++++--------
 net/core/dev.c                                   |  2 +-
 net/core/net-sysfs.c                             |  2 +-
 net/core/rtnetlink.c                             |  6 +++---
 8 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 93132d8f..deeaa7f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12410,7 +12410,7 @@ static int bnx2x_validate_addr(struct net_device *dev)
 }
 
 static int bnx2x_get_phys_port_id(struct net_device *netdev,
-				  struct netdev_phys_port_id *ppid)
+				  struct netdev_phys_item_id *ppid)
 {
 	struct bnx2x *bp = netdev_priv(netdev);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index bd192b8..cb7208e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7336,7 +7336,7 @@ static void i40e_del_vxlan_port(struct net_device *netdev,
 
 #endif
 static int i40e_get_phys_port_id(struct net_device *netdev,
-				 struct netdev_phys_port_id *ppid)
+				 struct netdev_phys_item_id *ppid)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_pf *pf = np->vsi->back;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index bb536aa..edf3040 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2276,7 +2276,7 @@ static int mlx4_en_set_vf_link_state(struct net_device *dev, int vf, int link_st
 
 #define PORT_ID_BYTE_LEN 8
 static int mlx4_en_get_phys_port_id(struct net_device *dev,
-				    struct netdev_phys_port_id *ppid)
+				    struct netdev_phys_item_id *ppid)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_dev *mdev = priv->mdev->dev;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index f5e29f7..6e514d2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -460,7 +460,7 @@ static void qlcnic_82xx_cancel_idc_work(struct qlcnic_adapter *adapter)
 }
 
 static int qlcnic_get_phys_port_id(struct net_device *netdev,
-				   struct netdev_phys_port_id *ppid)
+				   struct netdev_phys_item_id *ppid)
 {
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5be20a7..9faeea6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -739,13 +739,13 @@ struct netdev_fcoe_hbainfo {
 };
 #endif
 
-#define MAX_PHYS_PORT_ID_LEN 32
+#define MAX_PHYS_ITEM_ID_LEN 32
 
-/* This structure holds a unique identifier to identify the
- * physical port used by a netdevice.
+/* This structure holds a unique identifier to identify some
+ * physical item (port for example) used by a netdevice.
  */
-struct netdev_phys_port_id {
-	unsigned char id[MAX_PHYS_PORT_ID_LEN];
+struct netdev_phys_item_id {
+	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
 	unsigned char id_len;
 };
 
@@ -961,7 +961,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  *
  * int (*ndo_get_phys_port_id)(struct net_device *dev,
- *			       struct netdev_phys_port_id *ppid);
+ *			       struct netdev_phys_item_id *ppid);
  *	Called to get ID of physical port of this device. If driver does
  *	not implement this, it is assumed that the hw is not able to have
  *	multiple net devices on single physical port.
@@ -1129,7 +1129,7 @@ struct net_device_ops {
 	int			(*ndo_change_carrier)(struct net_device *dev,
 						      bool new_carrier);
 	int			(*ndo_get_phys_port_id)(struct net_device *dev,
-							struct netdev_phys_port_id *ppid);
+							struct netdev_phys_item_id *ppid);
 	void			(*ndo_add_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __be16 port);
@@ -2848,7 +2848,7 @@ void dev_set_group(struct net_device *, int);
 int dev_set_mac_address(struct net_device *, struct sockaddr *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid);
+			 struct netdev_phys_item_id *ppid);
 struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3774afc..50a9004 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5650,7 +5650,7 @@ EXPORT_SYMBOL(dev_change_carrier);
  *	Get device physical port ID
  */
 int dev_get_phys_port_id(struct net_device *dev,
-			 struct netdev_phys_port_id *ppid)
+			 struct netdev_phys_item_id *ppid)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9dd0669..55dc4da 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
 		return restart_syscall();
 
 	if (dev_isalive(netdev)) {
-		struct netdev_phys_port_id ppid;
+		struct netdev_phys_item_id ppid;
 
 		ret = dev_get_phys_port_id(netdev, &ppid);
 		if (!ret)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a688268..1087c6d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
 static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 {
 	int err;
-	struct netdev_phys_port_id ppid;
+	struct netdev_phys_item_id ppid;
 
 	err = dev_get_phys_port_id(dev, &ppid);
 	if (err) {
@@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
 	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
-	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
+	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
 };
 
-- 
1.9.3


* [patch net-next 03/13] net: introduce generic switch devices support
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03  9:24   ` Jiri Pirko
       [not found]     ` <1409736300-12303-4-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03  9:24   ` [patch net-next 04/13] rtnl: expose physical switch id for particular device Jiri Pirko
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

The goal of this patchset is to make it possible to support various switch
chips. Drivers should implement the relevant ndos to do so. For now, a couple
of ndos are defined:
- one for getting the physical switch id.
- others for working with flows.

Note that the user can use an arbitrary port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 Documentation/networking/switchdev.txt |  53 ++++++++++
 MAINTAINERS                            |   7 ++
 include/linux/netdevice.h              |  28 ++++++
 include/net/sw_flow.h                  |  14 +++
 include/net/switchdev.h                |  44 +++++++++
 net/Kconfig                            |   1 +
 net/Makefile                           |   3 +
 net/switchdev/Kconfig                  |   9 ++
 net/switchdev/Makefile                 |   5 +
 net/switchdev/switchdev.c              | 172 +++++++++++++++++++++++++++++++++
 10 files changed, 336 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..435746a
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,53 @@
+Switch device drivers HOWTO
+===========================
+
+First, let's describe the topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and the CPU. The NIC0 and NIC1 drivers are not aware of the switch's presence;
+they are separate from the switch driver. The SOME switch chip is managed by a
+driver via the PCI-E device MNGMNT. Note that the MNGMNT device, NIC0 and NIC1
+may be connected to some other type of bus.
+
+Now, for the previous example, here is the representation in the kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Let's call the example switch driver for the SOME switch chip "SOMEswitch".
+This driver takes care of the PCI-E device MNGMNT. A netdevice instance sw0pX
+is created for each port of the switch. These netdevices are instances
+of the "SOMEswitch" driver. The sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference between a switch-port netdevice and an ordinary netdevice
+is that it implements a couple more NDOs:
+
+	ndo_swdev_get_id - This returns the same ID for two port netdevices of
+			   the same physical switch chip. It is mandatory for
+			   all switch drivers to implement and allows the
+			   caller to recognize a port netdevice.
+	ndo_swdev_* - Functions that manipulate the switch chip itself. They
+		      are not port-specific; the caller may use an arbitrary
+		      port netdevice of the same switch and it will make
+		      no difference.
+	ndo_swportdev_* - Functions for port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index c9b4b55..4baaf44 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8808,6 +8808,13 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+L:	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9faeea6..6a009d1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -46,9 +46,11 @@
 #include <net/dcbnl.h>
 #endif
 #include <net/netprio_cgroup.h>
+#include <net/sw_flow.h>
 
 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
+
 #include <uapi/linux/netdevice.h>
 
 struct netpoll_info;
@@ -997,6 +999,24 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	Callback to use for xmit over the accelerated station. This
  *	is used in place of ndo_start_xmit on accelerated net
  *	devices.
+ *
+ * int (*ndo_swdev_get_id)(struct net_device *dev,
+ *			   struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
+ *
+ * int (*ndo_swdev_flow_insert)(struct net_device *dev,
+ *				const struct sw_flow *flow);
+ *	Called to insert a flow into switch device. If driver does
+ *	not implement this, it is assumed that the hw does not have
+ *	a capability to work with flows.
+ *
+ * int (*ndo_swdev_flow_remove)(struct net_device *dev,
+ *				const struct sw_flow *flow);
+ *	Called to remove a flow from switch device. If driver does
+ *	not implement this, it is assumed that the hw does not have
+ *	a capability to work with flows.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1146,6 +1166,14 @@ struct net_device_ops {
 							struct net_device *dev,
 							void *priv);
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_swdev_get_id)(struct net_device *dev,
+						    struct netdev_phys_item_id *psid);
+	int			(*ndo_swdev_flow_insert)(struct net_device *dev,
+							 const struct sw_flow *flow);
+	int			(*ndo_swdev_flow_remove)(struct net_device *dev,
+							 const struct sw_flow *flow);
+#endif
 };
 
 /**
diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
index 21724f1..3af7758 100644
--- a/include/net/sw_flow.h
+++ b/include/net/sw_flow.h
@@ -81,7 +81,21 @@ struct sw_flow_mask {
 	struct sw_flow_key key;
 };
 
+enum sw_flow_action_type {
+	SW_FLOW_ACTION_TYPE_OUTPUT,
+	SW_FLOW_ACTION_TYPE_VLAN_PUSH,
+	SW_FLOW_ACTION_TYPE_VLAN_POP,
+};
+
 struct sw_flow_action {
+	enum sw_flow_action_type type;
+	union {
+		u32 out_port_ifindex;
+		struct {
+			__be16 vlan_proto;
+			u16 vlan_tci;
+		} vlan;
+	};
 };
 
 struct sw_flow_actions {
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..098784b
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,44 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+#include <net/sw_flow.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int swdev_get_id(struct net_device *dev, struct netdev_phys_item_id *psid);
+int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow);
+int swdev_flow_remove(struct net_device *dev, const struct sw_flow *flow);
+
+#else
+
+static inline int swdev_get_id(struct net_device *dev,
+			       struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int swdev_flow_insert(struct net_device *dev,
+				    const struct sw_flow *flow)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int swdev_flow_remove(struct net_device *dev,
+				    const struct sw_flow *flow)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 4051fdf..89a7fec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -226,6 +226,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..20e8ed2
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,9 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides support for hardware switch chips.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..e079707
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,172 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+#include <net/sw_flow.h>
+
+/**
+ *	swdev_get_id - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int swdev_get_id(struct net_device *dev, struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_swdev_get_id)
+		return -EOPNOTSUPP;
+	return ops->ndo_swdev_get_id(dev, psid);
+}
+EXPORT_SYMBOL(swdev_get_id);
+
+static void print_flow_key_tun(const char *prefix,
+			       const struct sw_flow_key *key)
+{
+	pr_debug("%s tun  { id %08llx, s %pI4, d %pI4, f %02x, tos %x, ttl %x }\n",
+		 prefix,
+		 be64_to_cpu(key->tun_key.tun_id), &key->tun_key.ipv4_src,
+		 &key->tun_key.ipv4_dst, ntohs(key->tun_key.tun_flags),
+		 key->tun_key.ipv4_tos, key->tun_key.ipv4_ttl);
+}
+
+static void print_flow_key_phy(const char *prefix,
+			       const struct sw_flow_key *key)
+{
+	pr_debug("%s phy  { prio %08x, mark %04x, in_port %02x }\n",
+		 prefix,
+		 key->phy.priority, key->phy.skb_mark, key->phy.in_port);
+}
+
+static void print_flow_key_eth(const char *prefix,
+			       const struct sw_flow_key *key)
+{
+	pr_debug("%s eth  { sm %pM, dm %pM, tci %04x, type %04x }\n",
+		 prefix,
+		 key->eth.src, key->eth.dst, ntohs(key->eth.tci),
+		 ntohs(key->eth.type));
+}
+
+static void print_flow_key_ip(const char *prefix,
+			      const struct sw_flow_key *key)
+{
+	pr_debug("%s ip   { proto %02x, tos %02x, ttl %02x }\n",
+		 prefix,
+		 key->ip.proto, key->ip.tos, key->ip.ttl);
+}
+
+static void print_flow_key_ipv4(const char *prefix,
+				const struct sw_flow_key *key)
+{
+	pr_debug("%s ipv4 { si %pI4, di %pI4, sm %pM, dm %pM }\n",
+		 prefix,
+		 &key->ipv4.addr.src, &key->ipv4.addr.dst,
+		 key->ipv4.arp.sha, key->ipv4.arp.tha);
+}
+
+static void print_flow_key_misc(const char *prefix,
+				const struct sw_flow_key *key)
+{
+	pr_debug("%s misc { in_port_ifindex %08x }\n",
+		 prefix,
+		 key->misc.in_port_ifindex);
+}
+
+static void print_flow_actions(struct sw_flow_actions *actions)
+{
+	int i;
+
+	pr_debug("  actions:\n");
+	if (!actions)
+		return;
+	for (i = 0; i < actions->count; i++) {
+		struct sw_flow_action *action = &actions->actions[i];
+
+		switch (action->type) {
+		case SW_FLOW_ACTION_TYPE_OUTPUT:
+			pr_debug("    output    { ifindex %u }\n",
+				 action->out_port_ifindex);
+			break;
+		case SW_FLOW_ACTION_TYPE_VLAN_PUSH:
+			pr_debug("    vlan push { proto %04x, tci %04x }\n",
+				 ntohs(action->vlan.vlan_proto),
+				 ntohs(action->vlan.vlan_tci));
+			break;
+		case SW_FLOW_ACTION_TYPE_VLAN_POP:
+			pr_debug("    vlan pop\n");
+			break;
+		}
+	}
+}
+
+#define PREFIX_NONE "      "
+#define PREFIX_MASK "  mask"
+
+static void print_flow(const struct sw_flow *flow, struct net_device *dev,
+		       const char *comment)
+{
+	pr_debug("%s flow %s (%x-%x):\n", dev->name, comment,
+		 flow->mask->range.start, flow->mask->range.end);
+	print_flow_key_tun(PREFIX_NONE, &flow->key);
+	print_flow_key_tun(PREFIX_MASK, &flow->mask->key);
+	print_flow_key_phy(PREFIX_NONE, &flow->key);
+	print_flow_key_phy(PREFIX_MASK, &flow->mask->key);
+	print_flow_key_eth(PREFIX_NONE, &flow->key);
+	print_flow_key_eth(PREFIX_MASK, &flow->mask->key);
+	print_flow_key_ip(PREFIX_NONE, &flow->key);
+	print_flow_key_ip(PREFIX_MASK, &flow->mask->key);
+	print_flow_key_ipv4(PREFIX_NONE, &flow->key);
+	print_flow_key_ipv4(PREFIX_MASK, &flow->mask->key);
+	print_flow_actions(flow->actions);
+}
+
+/**
+ *	swdev_flow_insert - Insert a flow into switch
+ *	@dev: port device
+ *	@flow: flow descriptor
+ *
+ *	Insert a flow into switch this port is part of.
+ */
+int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	print_flow(flow, dev, "insert");
+	if (!ops->ndo_swdev_flow_insert)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_swdev_get_id);
+	BUG_ON(!flow->actions);
+	return ops->ndo_swdev_flow_insert(dev, flow);
+}
+EXPORT_SYMBOL(swdev_flow_insert);
+
+/**
+ *	swdev_flow_remove - Remove a flow from switch
+ *	@dev: port device
+ *	@flow: flow descriptor
+ *
+ *	Remove a flow from switch this port is part of.
+ */
+int swdev_flow_remove(struct net_device *dev, const struct sw_flow *flow)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	print_flow(flow, dev, "remove");
+	if (!ops->ndo_swdev_flow_remove)
+		return -EOPNOTSUPP;
+	WARN_ON(!ops->ndo_swdev_get_id);
+	return ops->ndo_swdev_flow_remove(dev, flow);
+}
+EXPORT_SYMBOL(swdev_flow_remove);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 04/13] rtnl: expose physical switch id for particular device
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03  9:24   ` [patch net-next 03/13] net: introduce generic switch devices support Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 05/13] net-sysfs: " Jiri Pirko
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

The netdevice that represents a port in a switch will expose the
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
belong to one physical switch.

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ff95760..fe6c4c5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_PHYS_SWITCH_ID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1087c6d..ef1450f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -43,6 +43,7 @@
 
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/arp.h>
@@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
 	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	int err;
+	struct netdev_phys_item_id psid;
+
+	err = swdev_get_id(dev, &psid);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags, u32 ext_filter_mask)
@@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (rtnl_phys_port_id_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_phys_switch_id_fill(skb, dev))
+		goto nla_put_failure;
+
 	attr = nla_reserve(skb, IFLA_STATS,
 			sizeof(struct rtnl_link_stats));
 	if (attr == NULL)
@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
-- 
1.9.3


* [patch net-next 05/13] net-sysfs: expose physical switch id for particular device
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03  9:24   ` [patch net-next 03/13] net: introduce generic switch devices support Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 04/13] rtnl: expose physical switch id for particular device Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 06/13] net: introduce dummy switch Jiri Pirko
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 net/core/net-sysfs.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 55dc4da..51cd5ab 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -12,6 +12,7 @@
 #include <linux/capability.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
+#include <net/switchdev.h>
 #include <linux/if_arp.h>
 #include <linux/slab.h>
 #include <linux/nsproxy.h>
@@ -399,6 +400,28 @@ static ssize_t phys_port_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_port_id);
 
+static ssize_t phys_switch_id_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	ssize_t ret = -EINVAL;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (dev_isalive(netdev)) {
+		struct netdev_phys_item_id ppid;
+
+		ret = swdev_get_id(netdev, &ppid);
+		if (!ret)
+			ret = sprintf(buf, "%*phN\n", ppid.id_len, ppid.id);
+	}
+	rtnl_unlock();
+
+	return ret;
+}
+static DEVICE_ATTR_RO(phys_switch_id);
+
 static struct attribute *net_class_attrs[] = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -423,6 +446,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_flags.attr,
 	&dev_attr_tx_queue_len.attr,
 	&dev_attr_phys_port_id.attr,
+	&dev_attr_phys_switch_id.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
-- 
1.9.3


* [patch net-next 06/13] net: introduce dummy switch
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-09-03  9:24   ` [patch net-next 05/13] net-sysfs: " Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 07/13] dsa: implement ndo_swdev_get_id Jiri Pirko
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Dummy switch implementation using the switchdev interface.

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 drivers/net/Kconfig          |   7 +++
 drivers/net/Makefile         |   1 +
 drivers/net/dummyswitch.c    | 130 +++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/if_link.h |   9 +++
 4 files changed, 147 insertions(+)
 create mode 100644 drivers/net/dummyswitch.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c6f6f69..7822c74 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -71,6 +71,13 @@ config DUMMY
 	  To compile this driver as a module, choose M here: the module
 	  will be called dummy.
 
+config NET_DUMMY_SWITCH
+	tristate "Dummy switch net driver support"
+	depends on NET_SWITCHDEV
+	---help---
+	  To compile this driver as a module, choose M here: the module
+	  will be called dummyswitch.
+
 config EQUALIZER
 	tristate "EQL (serial line load balancing) support"
 	---help---
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 61aefdd..3c835ba 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -7,6 +7,7 @@
 #
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_DUMMY) += dummy.o
+obj-$(CONFIG_NET_DUMMY_SWITCH) += dummyswitch.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
 obj-$(CONFIG_MACVLAN) += macvlan.o
diff --git a/drivers/net/dummyswitch.c b/drivers/net/dummyswitch.c
new file mode 100644
index 0000000..7e1a54c
--- /dev/null
+++ b/drivers/net/dummyswitch.c
@@ -0,0 +1,130 @@
+/*
+ * drivers/net/dummyswitch.c - Dummy switch device
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <net/rtnetlink.h>
+
+struct dummyswport_priv {
+	struct netdev_phys_item_id psid;
+};
+
+static netdev_tx_t dummyswport_start_xmit(struct sk_buff *skb,
+					  struct net_device *dev)
+{
+	dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static int dummyswport_swdev_get_id(struct net_device *dev,
+				    struct netdev_phys_item_id *psid)
+{
+	struct dummyswport_priv *dsp = netdev_priv(dev);
+
+	memcpy(psid, &dsp->psid, sizeof(*psid));
+	return 0;
+}
+
+static int dummyswport_change_carrier(struct net_device *dev, bool new_carrier)
+{
+	if (new_carrier)
+		netif_carrier_on(dev);
+	else
+		netif_carrier_off(dev);
+	return 0;
+}
+
+static const struct net_device_ops dummyswport_netdev_ops = {
+	.ndo_start_xmit		= dummyswport_start_xmit,
+	.ndo_swdev_get_id	= dummyswport_swdev_get_id,
+	.ndo_change_carrier	= dummyswport_change_carrier,
+};
+
+static void dummyswport_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	/* Initialize the device structure. */
+	dev->netdev_ops = &dummyswport_netdev_ops;
+	dev->destructor = free_netdev;
+
+	/* Fill in device structure with ethernet-generic values. */
+	dev->tx_queue_len = 0;
+	dev->flags |= IFF_NOARP;
+	dev->flags &= ~IFF_MULTICAST;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	dev->features	|= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
+	dev->features	|= NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
+	eth_hw_addr_random(dev);
+}
+
+static int dummyswport_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (tb[IFLA_ADDRESS])
+		return -EINVAL;
+	if (!data || !data[IFLA_DUMMYSWPORT_PHYS_SWITCH_ID])
+		return -EINVAL;
+	return 0;
+}
+
+static int dummyswport_newlink(struct net *src_net, struct net_device *dev,
+			       struct nlattr *tb[], struct nlattr *data[])
+{
+	struct dummyswport_priv *dsp = netdev_priv(dev);
+	int err;
+
+	dsp->psid.id_len = nla_len(data[IFLA_DUMMYSWPORT_PHYS_SWITCH_ID]);
+	memcpy(dsp->psid.id, nla_data(data[IFLA_DUMMYSWPORT_PHYS_SWITCH_ID]),
+	       dsp->psid.id_len);
+
+	err = register_netdevice(dev);
+	if (err)
+		return err;
+
+	netif_carrier_on(dev);
+
+	return 0;
+}
+
+static const struct nla_policy dummyswport_policy[IFLA_DUMMYSWPORT_MAX + 1] = {
+	[IFLA_DUMMYSWPORT_PHYS_SWITCH_ID] = { .type = NLA_BINARY,
+					      .len = MAX_PHYS_ITEM_ID_LEN },
+};
+
+static struct rtnl_link_ops dummyswport_link_ops __read_mostly = {
+	.kind		= "dummyswport",
+	.priv_size	= sizeof(struct dummyswport_priv),
+	.setup		= dummyswport_setup,
+	.validate	= dummyswport_validate,
+	.newlink	= dummyswport_newlink,
+	.policy		= dummyswport_policy,
+	.maxtype	= IFLA_DUMMYSWPORT_MAX,
+};
+
+static int __init dummysw_module_init(void)
+{
+	return rtnl_link_register(&dummyswport_link_ops);
+}
+
+static void __exit dummysw_module_exit(void)
+{
+	rtnl_link_unregister(&dummyswport_link_ops);
+}
+
+module_init(dummysw_module_init);
+module_exit(dummysw_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>");
+MODULE_DESCRIPTION("Dummy switch device");
+MODULE_ALIAS_RTNL_LINK("dummyswport");
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index fe6c4c5..33353e3 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -562,4 +562,13 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* DUMMYSWPORT section */
+enum {
+	IFLA_DUMMYSWPORT_UNSPEC,
+	IFLA_DUMMYSWPORT_PHYS_SWITCH_ID,
+	__IFLA_DUMMYSWPORT_MAX,
+};
+
+#define IFLA_DUMMYSWPORT_MAX (__IFLA_DUMMYSWPORT_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
-- 
1.9.3


* [patch net-next 07/13] dsa: implement ndo_swdev_get_id
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-09-03  9:24   ` [patch net-next 06/13] net: introduce dummy switch Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
       [not found]     ` <1409736300-12303-8-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03  9:24   ` [patch net-next 10/13] openvswitch: add support for datapath hardware offload Jiri Pirko
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 include/linux/netdevice.h |  3 ++-
 include/net/dsa.h         |  1 +
 net/dsa/Kconfig           |  2 +-
 net/dsa/dsa.c             |  3 +++
 net/dsa/slave.c           | 10 ++++++++++
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6a009d1..7ee070f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -41,7 +41,6 @@
 
 #include <linux/ethtool.h>
 #include <net/net_namespace.h>
-#include <net/dsa.h>
 #ifdef CONFIG_DCB
 #include <net/dcbnl.h>
 #endif
@@ -1259,6 +1258,8 @@ enum netdev_priv_flags {
 #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
 #define IFF_MACVLAN			IFF_MACVLAN
 
+#include <net/dsa.h>
+
 /**
  *	struct net_device - The DEVICE structure.
  *		Actually, this whole structure is a big mistake.  It mixes I/O
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 9771292..d60cd42 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -140,6 +140,7 @@ struct dsa_switch {
 	u32			phys_mii_mask;
 	struct mii_bus		*slave_mii_bus;
 	struct net_device	*ports[DSA_MAX_PORTS];
+	struct netdev_phys_item_id psid;
 };
 
 static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index a585fd6..4e144a2 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -1,6 +1,6 @@
 config HAVE_NET_DSA
 	def_bool y
-	depends on NETDEVICES && !S390
+	depends on NETDEVICES && NET_SWITCHDEV && !S390
 
 # Drivers must select NET_DSA and the appropriate tagging format
 
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 61f145c..374912d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -202,6 +202,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
 		ds->ports[i] = slave_dev;
 	}
 
+	ds->psid.id_len = MAX_PHYS_ITEM_ID_LEN;
+	get_random_bytes(ds->psid.id, ds->psid.id_len);
+
 	return ds;
 
 out_free:
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 7333a4a..d79a6c7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -192,6 +192,15 @@ static netdev_tx_t dsa_slave_notag_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
+static int dsa_slave_swdev_get_id(struct net_device *dev,
+				  struct netdev_phys_item_id *psid)
+{
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+
+	memcpy(psid, &ds->psid, sizeof(*psid));
+	return 0;
+}
 
 /* ethtool operations *******************************************************/
 static int
@@ -323,6 +332,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_set_rx_mode	= dsa_slave_set_rx_mode,
 	.ndo_set_mac_address	= dsa_slave_set_mac_address,
 	.ndo_do_ioctl		= dsa_slave_ioctl,
+	.ndo_swdev_get_id	= dsa_slave_swdev_get_id,
 };
 
 static const struct dsa_device_ops notag_netdev_ops = {
-- 
1.9.3


* [patch net-next 08/13] net: introduce netdev_phys_item_ids_match helper
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
                   ` (2 preceding siblings ...)
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03  9:24 ` Jiri Pirko
  2014-09-03  9:24 ` [patch net-next 09/13] openvswitch: introduce vport_op get_netdev Jiri Pirko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@cumulusnetworks.com>
---
 include/linux/netdevice.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7ee070f..b2c3ff0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -750,6 +750,13 @@ struct netdev_phys_item_id {
 	unsigned char id_len;
 };
 
+static inline bool netdev_phys_item_ids_match(struct netdev_phys_item_id *id1,
+					      struct netdev_phys_item_id *id2)
+{
+	return id1->id_len == id2->id_len &&
+	       !memcmp(id1->id, id2->id, id1->id_len);
+}
+
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
-- 
1.9.3


* [patch net-next 09/13] openvswitch: introduce vport_op get_netdev
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
                   ` (3 preceding siblings ...)
  2014-09-03  9:24 ` [patch net-next 08/13] net: introduce netdev_phys_item_ids_match helper Jiri Pirko
@ 2014-09-03  9:24 ` Jiri Pirko
  2014-09-03  9:25 ` [patch net-next 13/13] switchdev: introduce Netlink API Jiri Pirko
  2014-09-08 13:54 ` [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Thomas Graf
  6 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

This allows easy querying of whether a vport has a netdev. It also
allows unexposing netdev_vport_priv and struct netdev_vport.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/openvswitch/datapath.c           |  2 +-
 net/openvswitch/dp_notify.c          |  7 ++---
 net/openvswitch/vport-internal_dev.c | 56 ++++++++++++++++++++++++------------
 net/openvswitch/vport-netdev.c       | 16 +++++++++++
 net/openvswitch/vport-netdev.h       | 12 --------
 net/openvswitch/vport.h              |  2 ++
 6 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 683d6cd..75bb07f 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -171,7 +171,7 @@ static int get_dpifindex(struct datapath *dp)
 
 	local = ovs_vport_rcu(dp, OVSP_LOCAL);
 	if (local)
-		ifindex = netdev_vport_priv(local)->dev->ifindex;
+		ifindex = local->ops->get_netdev(local)->ifindex;
 	else
 		ifindex = 0;
 
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..d2cc24b 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,12 @@ void ovs_dp_notify_wq(struct work_struct *work)
 			struct hlist_node *n;
 
 			hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-				struct netdev_vport *netdev_vport;
+				struct net_device *dev;
 
 				if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
 					continue;
-
-				netdev_vport = netdev_vport_priv(vport);
-				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+				dev = vport->ops->get_netdev(vport);
+				if (!(dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 8451612..6be7928 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -32,6 +32,17 @@
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
+struct internal_dev_vport {
+	struct rcu_head rcu;
+	struct net_device *dev;
+};
+
+static struct internal_dev_vport *
+internal_dev_vport_priv(const struct vport *vport)
+{
+	return vport_priv(vport);
+}
+
 struct internal_dev {
 	struct vport *vport;
 };
@@ -154,49 +165,50 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
 	struct vport *vport;
-	struct netdev_vport *netdev_vport;
+	struct internal_dev_vport *int_vport;
 	struct internal_dev *internal_dev;
+	struct net_device *dev;
 	int err;
 
-	vport = ovs_vport_alloc(sizeof(struct netdev_vport),
+	vport = ovs_vport_alloc(sizeof(struct internal_dev_vport),
 				&ovs_internal_vport_ops, parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
 		goto error;
 	}
 
-	netdev_vport = netdev_vport_priv(vport);
+	int_vport = internal_dev_vport_priv(vport);
 
-	netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-					 parms->name, NET_NAME_UNKNOWN,
-					 do_setup);
-	if (!netdev_vport->dev) {
+	dev = alloc_netdev(sizeof(struct internal_dev), parms->name,
+			   NET_NAME_UNKNOWN, do_setup);
+	if (!dev) {
 		err = -ENOMEM;
 		goto error_free_vport;
 	}
+	int_vport->dev = dev;
 
-	dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-	internal_dev = internal_dev_priv(netdev_vport->dev);
+	dev_net_set(dev, ovs_dp_get_net(vport->dp));
+	internal_dev = internal_dev_priv(dev);
 	internal_dev->vport = vport;
 
 	/* Restrict bridge port to current netns. */
 	if (vport->port_no == OVSP_LOCAL)
-		netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+		dev->features |= NETIF_F_NETNS_LOCAL;
 
 	rtnl_lock();
-	err = register_netdevice(netdev_vport->dev);
+	err = register_netdevice(dev);
 	if (err)
 		goto error_free_netdev;
 
-	dev_set_promiscuity(netdev_vport->dev, 1);
+	dev_set_promiscuity(dev, 1);
 	rtnl_unlock();
-	netif_start_queue(netdev_vport->dev);
+	netif_start_queue(dev);
 
 	return vport;
 
 error_free_netdev:
 	rtnl_unlock();
-	free_netdev(netdev_vport->dev);
+	free_netdev(dev);
 error_free_vport:
 	ovs_vport_free(vport);
 error:
@@ -205,21 +217,21 @@ error:
 
 static void internal_dev_destroy(struct vport *vport)
 {
-	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+	struct internal_dev_vport *int_vport = internal_dev_vport_priv(vport);
 
-	netif_stop_queue(netdev_vport->dev);
+	netif_stop_queue(int_vport->dev);
 	rtnl_lock();
-	dev_set_promiscuity(netdev_vport->dev, -1);
+	dev_set_promiscuity(int_vport->dev, -1);
 
 	/* unregister_netdevice() waits for an RCU grace period. */
-	unregister_netdevice(netdev_vport->dev);
+	unregister_netdevice(int_vport->dev);
 
 	rtnl_unlock();
 }
 
 static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
 {
-	struct net_device *netdev = netdev_vport_priv(vport)->dev;
+	struct net_device *netdev = internal_dev_vport_priv(vport)->dev;
 	int len;
 
 	len = skb->len;
@@ -238,12 +250,18 @@ static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
 	return len;
 }
 
+static struct net_device *internal_dev_get_netdev(struct vport *vport)
+{
+	return internal_dev_vport_priv(vport)->dev;
+}
+
 const struct vport_ops ovs_internal_vport_ops = {
 	.type		= OVS_VPORT_TYPE_INTERNAL,
 	.create		= internal_dev_create,
 	.destroy	= internal_dev_destroy,
 	.get_name	= ovs_netdev_get_name,
 	.send		= internal_dev_recv,
+	.get_netdev	= internal_dev_get_netdev,
 };
 
 int ovs_is_internal_dev(const struct net_device *netdev)
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index d21f77d..aaf3d14 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -33,6 +33,16 @@
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
+struct netdev_vport {
+	struct rcu_head rcu;
+	struct net_device *dev;
+};
+
+static struct netdev_vport *netdev_vport_priv(const struct vport *vport)
+{
+	return vport_priv(vport);
+}
+
 /* Must be called with rcu_read_lock. */
 static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
 {
@@ -224,10 +234,16 @@ struct vport *ovs_netdev_get_vport(struct net_device *dev)
 		return NULL;
 }
 
+static struct net_device *netdev_get_netdev(struct vport *vport)
+{
+	return netdev_vport_priv(vport)->dev;
+}
+
 const struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
 	.get_name	= ovs_netdev_get_name,
 	.send		= netdev_send,
+	.get_netdev	= netdev_get_netdev,
 };
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 8df01c11..f03d41d 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,18 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-struct netdev_vport {
-	struct rcu_head rcu;
-
-	struct net_device *dev;
-};
-
-static inline struct netdev_vport *
-netdev_vport_priv(const struct vport *vport)
-{
-	return vport_priv(vport);
-}
-
 const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 8409e06..f434271 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -164,6 +164,8 @@ struct vport_ops {
 	const char *(*get_name)(const struct vport *);
 
 	int (*send)(struct vport *, struct sk_buff *);
+
+	struct net_device *(*get_netdev)(struct vport *);
 };
 
 enum vport_err_type {
-- 
1.9.3


* [patch net-next 10/13] openvswitch: add support for datapath hardware offload
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2014-09-03  9:24   ` [patch net-next 07/13] dsa: implement ndo_swdev_get_id Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
       [not found]     ` <1409736300-12303-11-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
  2014-09-03  9:24   ` [patch net-next 11/13] sw_flow: add misc section to key with in_port_ifindex field Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 12/13] rocker: introduce rocker switch driver Jiri Pirko
  7 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Take advantage of the ability to work with flows in switch devices, using
the swdev API to offload the flow datapath.

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 net/openvswitch/Makefile       |   3 +-
 net/openvswitch/datapath.c     |  33 ++++++
 net/openvswitch/datapath.h     |   3 +
 net/openvswitch/flow_table.c   |   1 +
 net/openvswitch/hw_offload.c   | 245 +++++++++++++++++++++++++++++++++++++++++
 net/openvswitch/hw_offload.h   |  22 ++++
 net/openvswitch/vport-netdev.c |   3 +
 net/openvswitch/vport.h        |   2 +
 8 files changed, 311 insertions(+), 1 deletion(-)
 create mode 100644 net/openvswitch/hw_offload.c
 create mode 100644 net/openvswitch/hw_offload.h

diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 3591cb5..5152437 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -13,7 +13,8 @@ openvswitch-y := \
 	flow_table.o \
 	vport.o \
 	vport-internal_dev.o \
-	vport-netdev.o
+	vport-netdev.o \
+	hw_offload.o
 
 ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
 openvswitch-y += vport-vxlan.o
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 75bb07f..3e43e1d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -57,6 +57,7 @@
 #include "flow_netlink.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
+#include "hw_offload.h"
 
 int ovs_net_id __read_mostly;
 
@@ -864,6 +865,9 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 			acts = NULL;
 			goto err_unlock_ovs;
 		}
+		error = ovs_hw_flow_insert(dp, new_flow);
+		if (error)
+			pr_warn("failed to insert flow into hw\n");
 
 		if (unlikely(reply)) {
 			error = ovs_flow_cmd_fill_info(new_flow,
@@ -896,10 +900,18 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 				goto err_unlock_ovs;
 			}
 		}
+		error = ovs_hw_flow_remove(dp, flow);
+		if (error)
+			pr_warn("failed to remove flow from hw\n");
+
 		/* Update actions. */
 		old_acts = ovsl_dereference(flow->sf_acts);
 		rcu_assign_pointer(flow->sf_acts, acts);
 
+		error = ovs_hw_flow_insert(dp, flow);
+		if (error)
+			pr_warn("failed to insert flow into hw\n");
+
 		if (unlikely(reply)) {
 			error = ovs_flow_cmd_fill_info(flow,
 						       ovs_header->dp_ifindex,
@@ -993,9 +1005,17 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 
 	/* Update actions, if present. */
 	if (likely(acts)) {
+		error = ovs_hw_flow_remove(dp, flow);
+		if (error)
+			pr_warn("failed to remove flow from hw\n");
+
 		old_acts = ovsl_dereference(flow->sf_acts);
 		rcu_assign_pointer(flow->sf_acts, acts);
 
+		error = ovs_hw_flow_insert(dp, flow);
+		if (error)
+			pr_warn("failed to insert flow into hw\n");
+
 		if (unlikely(reply)) {
 			error = ovs_flow_cmd_fill_info(flow,
 						       ovs_header->dp_ifindex,
@@ -1109,6 +1129,9 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	if (unlikely(!a[OVS_FLOW_ATTR_KEY])) {
+		err = ovs_hw_flow_flush(dp);
+		if (err)
+			pr_warn("failed to flush flows from hw\n");
 		err = ovs_flow_tbl_flush(&dp->table);
 		goto unlock;
 	}
@@ -1120,6 +1143,9 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	ovs_flow_tbl_remove(&dp->table, flow);
+	err = ovs_hw_flow_remove(dp, flow);
+	if (err)
+		pr_warn("failed to remove flow from hw\n");
 	ovs_unlock();
 
 	reply = ovs_flow_cmd_alloc_info((const struct ovs_flow_actions __force *) flow->sf_acts,
@@ -1368,6 +1394,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
 		INIT_HLIST_HEAD(&dp->ports[i]);
 
+	INIT_LIST_HEAD(&dp->swdev_rep_list);
+
 	/* Set up our datapath device. */
 	parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
 	parms.type = OVS_VPORT_TYPE_INTERNAL;
@@ -1431,6 +1459,7 @@ err:
 static void __dp_destroy(struct datapath *dp)
 {
 	int i;
+	int err;
 
 	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++) {
 		struct vport *vport;
@@ -1448,6 +1477,10 @@ static void __dp_destroy(struct datapath *dp)
 	 */
 	ovs_dp_detach_port(ovs_vport_ovsl(dp, OVSP_LOCAL));
 
+	err = ovs_hw_flow_flush(dp);
+	if (err)
+		pr_warn("failed to flush flows from hw\n");
+
 	/* RCU destroy the flow table */
 	ovs_flow_tbl_destroy(&dp->table, true);
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 291f5a0..9dc11a6 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -90,6 +90,9 @@ struct datapath {
 #endif
 
 	u32 user_features;
+
+	/* List of switchdev representative ports */
+	struct list_head swdev_rep_list;
 };
 
 /**
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index e7d9a41..c01e4cb 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -85,6 +85,7 @@ struct ovs_flow *ovs_flow_alloc(void)
 
 	flow->sf_acts = NULL;
 	flow->flow.mask = NULL;
+	flow->flow.actions = NULL;
 	flow->stats_last_writer = NUMA_NO_NODE;
 
 	/* Initialize the default stat node. */
diff --git a/net/openvswitch/hw_offload.c b/net/openvswitch/hw_offload.c
new file mode 100644
index 0000000..45a0c5f
--- /dev/null
+++ b/net/openvswitch/hw_offload.c
@@ -0,0 +1,245 @@
+/*
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <net/sw_flow.h>
+#include <net/switchdev.h>
+
+#include "datapath.h"
+#include "vport-netdev.h"
+
+static int sw_flow_action_create(struct datapath *dp,
+				 struct sw_flow_actions **p_actions,
+				 struct ovs_flow_actions *acts)
+{
+	const struct nlattr *attr = acts->actions;
+	int len = acts->actions_len;
+	const struct nlattr *a;
+	int rem;
+	struct sw_flow_actions *actions;
+	struct sw_flow_action *cur;
+	size_t count = 0;
+	int err;
+
+	for (a = attr, rem = len; rem > 0; a = nla_next(a, &rem))
+		count++;
+
+	actions = kzalloc(sizeof(struct sw_flow_actions) +
+			  sizeof(struct sw_flow_action) * count,
+			  GFP_KERNEL);
+	if (!actions)
+		return -ENOMEM;
+	actions->count = count;
+
+	cur = actions->actions;
+	for (a = attr, rem = len; rem > 0; a = nla_next(a, &rem)) {
+		switch (nla_type(a)) {
+		case OVS_ACTION_ATTR_OUTPUT:
+			{
+				struct vport *vport;
+
+				vport = ovs_vport_ovsl_rcu(dp, nla_get_u32(a));
+				cur->type = SW_FLOW_ACTION_TYPE_OUTPUT;
+				cur->out_port_ifindex =
+					vport->ops->get_netdev(vport)->ifindex;
+			}
+			break;
+
+		case OVS_ACTION_ATTR_PUSH_VLAN:
+			{
+				const struct ovs_action_push_vlan *vlan;
+
+				vlan = nla_data(a);
+				cur->type = SW_FLOW_ACTION_TYPE_VLAN_PUSH;
+				cur->vlan.vlan_proto = vlan->vlan_tpid;
+				cur->vlan.vlan_tci = vlan->vlan_tci;
+			}
+			break;
+
+		case OVS_ACTION_ATTR_POP_VLAN:
+			cur->type = SW_FLOW_ACTION_TYPE_VLAN_POP;
+			break;
+
+		default:
+			err = -EOPNOTSUPP;
+			goto errout;
+		}
+		cur++;
+	}
+	*p_actions = actions;
+	return 0;
+
+errout:
+	kfree(actions);
+	return err;
+}
+
+int ovs_hw_flow_insert(struct datapath *dp, struct ovs_flow *flow)
+{
+	struct sw_flow_actions *actions;
+	struct vport *vport;
+	struct net_device *dev;
+	int err;
+
+	ASSERT_OVSL();
+	BUG_ON(flow->flow.actions);
+
+	err = sw_flow_action_create(dp, &actions, flow->sf_acts);
+	if (err)
+		return err;
+	flow->flow.actions = actions;
+
+	list_for_each_entry(vport, &dp->swdev_rep_list, swdev_rep_list) {
+		dev = vport->ops->get_netdev(vport);
+		BUG_ON(!dev);
+		err = swdev_flow_insert(dev, &flow->flow);
+		if (err == -ENODEV) /* out device is not in this switch */
+			continue;
+		if (err)
+			break;
+	}
+
+	if (err) {
+		kfree(actions);
+		flow->flow.actions = NULL;
+	}
+	return err;
+}
+
+int ovs_hw_flow_remove(struct datapath *dp, struct ovs_flow *flow)
+{
+	struct sw_flow_actions *actions;
+	struct vport *vport;
+	struct net_device *dev;
+	int err = 0;
+
+	ASSERT_OVSL();
+
+	if (!flow->flow.actions) {
+		err = sw_flow_action_create(dp, &actions, flow->sf_acts);
+		if (err)
+			return err;
+		flow->flow.actions = actions;
+	}
+
+	list_for_each_entry(vport, &dp->swdev_rep_list, swdev_rep_list) {
+		dev = vport->ops->get_netdev(vport);
+		BUG_ON(!dev);
+		err = swdev_flow_remove(dev, &flow->flow);
+		if (err == -ENODEV) /* out device is not in this switch */
+			continue;
+		if (err)
+			break;
+	}
+	kfree(flow->flow.actions);
+	flow->flow.actions = NULL;
+	return err;
+}
+
+int ovs_hw_flow_flush(struct datapath *dp)
+{
+	struct table_instance *ti;
+	int i;
+	int ver;
+	int err;
+
+	ti = ovsl_dereference(dp->table.ti);
+	ver = ti->node_ver;
+
+	for (i = 0; i < ti->n_buckets; i++) {
+		struct ovs_flow *flow;
+		struct hlist_head *head = flex_array_get(ti->buckets, i);
+
+		hlist_for_each_entry(flow, head, hash_node[ver]) {
+			err = ovs_hw_flow_remove(dp, flow);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+static bool __is_vport_in_swdev_rep_list(struct datapath *dp,
+					 struct vport *vport)
+{
+	struct vport *cur_vport;
+
+	list_for_each_entry(cur_vport, &dp->swdev_rep_list, swdev_rep_list) {
+		if (cur_vport == vport)
+			return true;
+	}
+	return false;
+}
+
+static struct vport *__find_vport_by_swdev_id(struct datapath *dp,
+					      struct vport *vport)
+{
+	struct net_device *dev;
+	struct vport *cur_vport;
+	struct netdev_phys_item_id id;
+	struct netdev_phys_item_id cur_id;
+	int i;
+	int err;
+
+	err = swdev_get_id(vport->ops->get_netdev(vport), &id);
+	if (err)
+		return ERR_PTR(err);
+
+	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++) {
+		hlist_for_each_entry(cur_vport, &dp->ports[i], dp_hash_node) {
+			if (cur_vport->ops->type != OVS_VPORT_TYPE_NETDEV)
+				continue;
+			if (cur_vport == vport)
+				continue;
+			dev = cur_vport->ops->get_netdev(cur_vport);
+			if (!dev)
+				continue;
+			err = swdev_get_id(dev, &cur_id);
+			if (err)
+				continue;
+			if (netdev_phys_item_ids_match(&id, &cur_id))
+				return cur_vport;
+		}
+	}
+	return ERR_PTR(-ENOENT);
+}
+
+void ovs_hw_port_add(struct datapath *dp, struct vport *vport)
+{
+	struct vport *found_vport;
+
+	ASSERT_OVSL();
+	/* The representative list always contains one port per switch dev id */
+	found_vport = __find_vport_by_swdev_id(dp, vport);
+	if (IS_ERR(found_vport) && PTR_ERR(found_vport) == -ENOENT) {
+		list_add(&vport->swdev_rep_list, &dp->swdev_rep_list);
+		pr_debug("%s added to rep_list\n", vport->ops->get_name(vport));
+	}
+}
+
+void ovs_hw_port_del(struct datapath *dp, struct vport *vport)
+{
+	struct vport *found_vport;
+
+	ASSERT_OVSL();
+	if (!__is_vport_in_swdev_rep_list(dp, vport))
+		return;
+
+	list_del(&vport->swdev_rep_list);
+	pr_debug("%s deleted from rep_list\n", vport->ops->get_name(vport));
+	found_vport = __find_vport_by_swdev_id(dp, vport);
+	if (!IS_ERR(found_vport)) {
+		list_add(&found_vport->swdev_rep_list, &dp->swdev_rep_list);
+		pr_debug("%s added to rep_list instead\n",
+			 found_vport->ops->get_name(found_vport));
+	}
+}
diff --git a/net/openvswitch/hw_offload.h b/net/openvswitch/hw_offload.h
new file mode 100644
index 0000000..83972d7
--- /dev/null
+++ b/net/openvswitch/hw_offload.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef HW_OFFLOAD_H
+#define HW_OFFLOAD_H 1
+
+#include "datapath.h"
+#include "flow.h"
+
+int ovs_hw_flow_insert(struct datapath *dp, struct ovs_flow *flow);
+int ovs_hw_flow_remove(struct datapath *dp, struct ovs_flow *flow);
+int ovs_hw_flow_flush(struct datapath *dp);
+void ovs_hw_port_add(struct datapath *dp, struct vport *vport);
+void ovs_hw_port_del(struct datapath *dp, struct vport *vport);
+
+#endif
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index aaf3d14..c5953de 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -32,6 +32,7 @@
 #include "datapath.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
+#include "hw_offload.h"
 
 struct netdev_vport {
 	struct rcu_head rcu;
@@ -136,6 +137,7 @@ static struct vport *netdev_create(const struct vport_parms *parms)
 	dev_set_promiscuity(netdev_vport->dev, 1);
 	netdev_vport->dev->priv_flags |= IFF_OVS_DATAPATH;
 	rtnl_unlock();
+	ovs_hw_port_add(vport->dp, vport);
 
 	return vport;
 
@@ -176,6 +178,7 @@ static void netdev_destroy(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
 
+	ovs_hw_port_del(vport->dp, vport);
 	rtnl_lock();
 	if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
 		ovs_netdev_detach_dev(vport);
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index f434271..c28604a 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -110,6 +110,8 @@ struct vport {
 
 	spinlock_t stats_lock;
 	struct vport_err_stats err_stats;
+
+	struct list_head swdev_rep_list;
 };
 
 /**
-- 
1.9.3


* [patch net-next 11/13] sw_flow: add misc section to key with in_port_ifindex field
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2014-09-03  9:24   ` [patch net-next 10/13] openvswitch: add support for datapath hardware offload Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
  2014-09-03  9:24   ` [patch net-next 12/13] rocker: introduce rocker switch driver Jiri Pirko
  7 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 include/net/sw_flow.h        |  3 +++
 net/openvswitch/hw_offload.c | 22 ++++++++++++++++++++++
 net/switchdev/switchdev.c    |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
index 3af7758..a144d8e 100644
--- a/include/net/sw_flow.h
+++ b/include/net/sw_flow.h
@@ -69,6 +69,9 @@ struct sw_flow_key {
 			} nd;
 		} ipv6;
 	};
+	struct {
+		u32	in_port_ifindex; /* Input switch port ifindex (or 0). */
+	} misc;
 } __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
 
 struct sw_flow_key_range {
diff --git a/net/openvswitch/hw_offload.c b/net/openvswitch/hw_offload.c
index 45a0c5f..5c3edd0 100644
--- a/net/openvswitch/hw_offload.c
+++ b/net/openvswitch/hw_offload.c
@@ -83,6 +83,24 @@ errout:
 	return err;
 }
 
+void ovs_hw_flow_adjust(struct datapath *dp, struct ovs_flow *flow)
+{
+	struct vport *vport;
+
+	flow->flow.key.misc.in_port_ifindex = 0;
+	flow->flow.mask->key.misc.in_port_ifindex = 0;
+	vport = ovs_vport_ovsl(dp, flow->flow.key.phy.in_port);
+	if (vport && vport->ops->type == OVS_VPORT_TYPE_NETDEV) {
+		struct net_device *dev;
+
+		dev = vport->ops->get_netdev(vport);
+		if (dev) {
+			flow->flow.key.misc.in_port_ifindex = dev->ifindex;
+			flow->flow.mask->key.misc.in_port_ifindex = 0xFFFFFFFF;
+		}
+	}
+}
+
 int ovs_hw_flow_insert(struct datapath *dp, struct ovs_flow *flow)
 {
 	struct sw_flow_actions *actions;
@@ -93,6 +111,8 @@ int ovs_hw_flow_insert(struct datapath *dp, struct ovs_flow *flow)
 	ASSERT_OVSL();
 	BUG_ON(flow->flow.actions);
 
+	ovs_hw_flow_adjust(dp, flow);
+
 	err = sw_flow_action_create(dp, &actions, flow->sf_acts);
 	if (err)
 		return err;
@@ -124,6 +144,8 @@ int ovs_hw_flow_remove(struct datapath *dp, struct ovs_flow *flow)
 
 	ASSERT_OVSL();
 
+	ovs_hw_flow_adjust(dp, flow);
+
 	if (!flow->flow.actions) {
 		err = sw_flow_action_create(dp, &actions, flow->sf_acts);
 		if (err)
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index e079707..05acb0b 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -129,6 +129,8 @@ static void print_flow(const struct sw_flow *flow, struct net_device *dev,
 	print_flow_key_ip(PREFIX_MASK, &flow->mask->key);
 	print_flow_key_ipv4(PREFIX_NONE, &flow->key);
 	print_flow_key_ipv4(PREFIX_MASK, &flow->mask->key);
+	print_flow_key_misc(PREFIX_NONE, &flow->key);
+	print_flow_key_misc(PREFIX_MASK, &flow->mask->key);
 	print_flow_actions(flow->actions);
 }
 
-- 
1.9.3


* [patch net-next 12/13] rocker: introduce rocker switch driver
       [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2014-09-03  9:24   ` [patch net-next 11/13] sw_flow: add misc section to key with in_port_ifindex field Jiri Pirko
@ 2014-09-03  9:24   ` Jiri Pirko
  7 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:24 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

This patch introduces the first driver to benefit from the switchdev
infrastructure and to implement the newly introduced switch ndos. It is a
driver for the emulated switch chip implemented in QEMU:
https://github.com/sfeldma/qemu-rocker/

This patch is a result of joint work with Scott Feldman.

Signed-off-by: Scott Feldman <sfeldma-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR@public.gmane.org>
Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
---
 MAINTAINERS                          |    6 +
 drivers/net/ethernet/Kconfig         |    1 +
 drivers/net/ethernet/Makefile        |    1 +
 drivers/net/ethernet/rocker/Kconfig  |   29 +
 drivers/net/ethernet/rocker/Makefile |    5 +
 drivers/net/ethernet/rocker/rocker.c | 3553 ++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker.h |  465 +++++
 7 files changed, 4060 insertions(+)
 create mode 100644 drivers/net/ethernet/rocker/Kconfig
 create mode 100644 drivers/net/ethernet/rocker/Makefile
 create mode 100644 drivers/net/ethernet/rocker/rocker.c
 create mode 100644 drivers/net/ethernet/rocker/rocker.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 4baaf44..9797bda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7638,6 +7638,12 @@ F:	drivers/hid/hid-roccat*
 F:	include/linux/hid-roccat*
 F:	Documentation/ABI/*/sysfs-driver-hid-roccat*
 
+ROCKER DRIVER
+M:	Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+L:	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+S:	Supported
+F:	drivers/net/ethernet/rocker/
+
 ROCKETPORT DRIVER
 P:	Comtrol Corp.
 W:	http://www.comtrol.com
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index dc7406c..61c9cc4 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -153,6 +153,7 @@ source "drivers/net/ethernet/qlogic/Kconfig"
 source "drivers/net/ethernet/realtek/Kconfig"
 source "drivers/net/ethernet/renesas/Kconfig"
 source "drivers/net/ethernet/rdc/Kconfig"
+source "drivers/net/ethernet/rocker/Kconfig"
 
 config S6GMAC
 	tristate "S6105 GMAC ethernet support"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 224a018..51ff723 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_NET_VENDOR_QLOGIC) += qlogic/
 obj-$(CONFIG_NET_VENDOR_REALTEK) += realtek/
 obj-$(CONFIG_SH_ETH) += renesas/
 obj-$(CONFIG_NET_VENDOR_RDC) += rdc/
+obj-$(CONFIG_NET_VENDOR_ROCKER) += rocker/
 obj-$(CONFIG_S6GMAC) += s6gmac.o
 obj-$(CONFIG_NET_VENDOR_SAMSUNG) += samsung/
 obj-$(CONFIG_NET_VENDOR_SEEQ) += seeq/
diff --git a/drivers/net/ethernet/rocker/Kconfig b/drivers/net/ethernet/rocker/Kconfig
new file mode 100644
index 0000000..0441932
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Kconfig
@@ -0,0 +1,29 @@
+#
+# Rocker device configuration
+#
+
+config NET_VENDOR_ROCKER
+	bool "Rocker devices"
+	default y
+	---help---
+	  If you have a network (Ethernet) card belonging to this class, say Y
+	  and read the Ethernet-HOWTO, available from
+	  <http://www.tldp.org/docs.html#howto>.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Rocker devices. If you say Y, you will be asked for
+	  your specific card in the following questions.
+
+if NET_VENDOR_ROCKER
+
+config ROCKER
+	tristate "Rocker switch driver (EXPERIMENTAL)"
+	depends on PCI && NET_SWITCHDEV
+	---help---
+	  This driver supports the Rocker switch device.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called rocker.
+
+endif # NET_VENDOR_ROCKER
diff --git a/drivers/net/ethernet/rocker/Makefile b/drivers/net/ethernet/rocker/Makefile
new file mode 100644
index 0000000..f85fb12
--- /dev/null
+++ b/drivers/net/ethernet/rocker/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Rocker network device drivers.
+#
+
+obj-$(CONFIG_ROCKER) += rocker.o
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
new file mode 100644
index 0000000..0e8b1ef
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -0,0 +1,3553 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.c - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ * Copyright (c) 2014 Scott Feldman <sfeldma-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/spinlock.h>
+#include <linux/hashtable.h>
+#include <linux/crc32.h>
+#include <linux/sort.h>
+#include <linux/random.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <net/sw_flow.h>
+#include <net/rtnetlink.h>
+#include <asm-generic/io-64-nonatomic-lo-hi.h>
+#include <generated/utsrelease.h>
+
+#include "rocker.h"
+
+static const char rocker_driver_name[] = "rocker";
+
+static const struct pci_device_id rocker_pci_id_table[] = {
+	{PCI_VDEVICE(REDHAT, PCI_DEVICE_ID_REDHAT_ROCKER), 0},
+	{0, }
+};
+
+struct rocker_flow_tbl_key {
+	u32 priority;
+	enum rocker_of_dpa_table_id tbl_id;
+	union {
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+		} ig_port;
+		struct {
+			u32 in_lport;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			enum rocker_of_dpa_table_id goto_tbl;
+			bool untagged;
+			__be16 new_vlan_id;
+		} vlan;
+		struct {
+			/* TODO */
+		} term_mac;
+		struct {
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			int has_eth_dst;
+			int has_eth_dst_mask;
+			__be16 vlan_id;
+			u32 tunnel_id;
+			enum rocker_of_dpa_table_id goto_tbl;
+			u32 group_id;
+		} bridge;
+		struct {
+			u32 in_lport;
+			u32 in_lport_mask;
+			u8 eth_src[ETH_ALEN];
+			u8 eth_src_mask[ETH_ALEN];
+			u8 eth_dst[ETH_ALEN];
+			u8 eth_dst_mask[ETH_ALEN];
+			__be16 eth_type;
+			__be16 vlan_id;
+			__be16 vlan_id_mask;
+			u8 ip_proto;
+			u8 ip_proto_mask;
+			u8 ip_tos;
+			u8 ip_tos_mask;
+			u32 group_id;
+		} acl;
+	};
+};
+
+struct rocker_flow_tbl_entry {
+	struct hlist_node entry;
+	u32 ref_count;
+	u64 cookie;
+	struct rocker_flow_tbl_key key;
+	u32 key_crc32;
+};
+
+struct rocker_group_tbl_entry {
+	struct hlist_node entry;
+	u32 ref_count;
+	u32 group_id;
+	u16 group_count;
+	u32 *group_ids;
+	union {
+		struct {
+			u8 pop_vlan;
+		} l2_interface;
+	};
+};
+
+struct rocker_desc_info {
+	char *data; /* mapped */
+	size_t data_size;
+	size_t tlv_size;
+	struct rocker_desc *desc;
+	DEFINE_DMA_UNMAP_ADDR(mapaddr);
+};
+
+struct rocker_dma_ring_info {
+	size_t size;
+	u32 head;
+	u32 tail;
+	struct rocker_desc *desc; /* mapped */
+	dma_addr_t mapaddr;
+	struct rocker_desc_info *desc_info;
+	unsigned int type;
+};
+
+struct rocker;
+
+struct rocker_port {
+	struct net_device *dev;
+	unsigned int prev_flags;
+	struct rocker *rocker;
+	unsigned port_number;
+	struct napi_struct napi_tx;
+	struct napi_struct napi_rx;
+	struct rocker_dma_ring_info tx_ring;
+	struct rocker_dma_ring_info rx_ring;
+};
+
+struct rocker {
+	struct pci_dev *pdev;
+	u8 __iomem *hw_addr;
+	struct msix_entry *msix_entries;
+	unsigned port_count;
+	struct rocker_port **ports;
+	struct {
+		u64 id;
+	} hw;
+	spinlock_t cmd_ring_lock;
+	struct rocker_dma_ring_info cmd_ring;
+	struct rocker_dma_ring_info event_ring;
+	DECLARE_HASHTABLE(flow_tbl, 16);
+	spinlock_t flow_tbl_lock;
+	u64 flow_tbl_next_cookie;
+	DECLARE_HASHTABLE(group_tbl, 16);
+	spinlock_t group_tbl_lock;
+	u16 group_index_next;
+};
+
+struct rocker_wait {
+	wait_queue_head_t wait;
+	bool done;
+	bool nowait;
+};
+
+static const u8 zero_mac[ETH_ALEN] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static const u8 ff_mac[ETH_ALEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+static const u8 lldp_mac[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e };
+
+/* Rocker priority levels for flow table entries.  Higher
+ * priority match takes precedence over lower priority match.
+ */
+
+enum {
+	ROCKER_PRIORITY_UNKNOWN = 0,
+	ROCKER_PRIORITY_IG_PORT = 1,
+	ROCKER_PRIORITY_VLAN = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_VLAN = 3,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT = 1,
+	ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD = 2,
+	ROCKER_PRIORITY_BRIDGING_TENANT = 3,
+	ROCKER_PRIORITY_ACL_PORT_PROMISC = 1,
+	ROCKER_PRIORITY_ACL = 2,
+};
+
+static u32 rocker_port_to_lport(struct rocker_port *rocker_port)
+{
+	return rocker_port->port_number + 1;
+}
+
+static void rocker_wait_reset(struct rocker_wait *wait)
+{
+	wait->done = false;
+	wait->nowait = false;
+}
+
+static void rocker_wait_init(struct rocker_wait *wait)
+{
+	init_waitqueue_head(&wait->wait);
+	rocker_wait_reset(wait);
+}
+
+static struct rocker_wait *rocker_wait_create(gfp_t gfp)
+{
+	struct rocker_wait *wait;
+
+	wait = kmalloc(sizeof(*wait), gfp);
+	if (!wait)
+		return NULL;
+	rocker_wait_init(wait);
+	return wait;
+}
+
+static void rocker_wait_destroy(struct rocker_wait *wait)
+{
+	kfree(wait);
+}
+
+static bool rocker_wait_event_timeout(struct rocker_wait *wait,
+				      unsigned long timeout)
+{
+	wait_event_timeout(wait->wait, wait->done, timeout);
+	if (!wait->done)
+		return false;
+	return true;
+}
+
+static void rocker_wait_wake_up(struct rocker_wait *wait)
+{
+	wait->done = true;
+	wake_up(&wait->wait);
+}
+
+static u32 rocker_msix_vector(struct rocker *rocker, unsigned vector)
+{
+	return rocker->msix_entries[vector].vector;
+}
+
+static u32 rocker_msix_tx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_TX(rocker_port->port_number));
+}
+
+static u32 rocker_msix_rx_vector(struct rocker_port *rocker_port)
+{
+	return rocker_msix_vector(rocker_port->rocker,
+				  ROCKER_MSIX_VEC_RX(rocker_port->port_number));
+}
+
+#define rocker_write32(rocker, reg, val)	\
+	writel((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read32(rocker, reg)	\
+	readl((rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_write64(rocker, reg, val)	\
+	writeq((val), (rocker)->hw_addr + (ROCKER_ ## reg))
+#define rocker_read64(rocker, reg)	\
+	readq((rocker)->hw_addr + (ROCKER_ ## reg))
+
+/*****************************
+ * HW basic testing functions
+ *****************************/
+
+static int rocker_reg_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	u64 test_reg;
+	u64 rnd;
+
+	rnd = prandom_u32();
+	rnd >>= 1;
+	rocker_write32(rocker, TEST_REG, rnd);
+	test_reg = rocker_read32(rocker, TEST_REG);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 32bit register value %08llx, expected %08llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	rnd = prandom_u32();
+	rnd <<= 31;
+	rnd |= prandom_u32();
+	rocker_write64(rocker, TEST_REG64, rnd);
+	test_reg = rocker_read64(rocker, TEST_REG64);
+	if (test_reg != rnd * 2) {
+		dev_err(&pdev->dev, "unexpected 64bit register value %16llx, expected %16llx\n",
+			test_reg, rnd * 2);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int rocker_dma_test_one(struct rocker *rocker, struct rocker_wait *wait,
+			       u32 test_type, dma_addr_t dma_handle,
+			       unsigned char *buf, unsigned char *expect,
+			       size_t size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	rocker_wait_reset(wait);
+	rocker_write32(rocker, TEST_DMA_CTRL, test_type);
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within the timeout\n");
+		return -EIO;
+	}
+
+	for (i = 0; i < size; i++) {
+		if (buf[i] != expect[i]) {
+			dev_err(&pdev->dev, "unexpected memory content %02x at byte %x, %02x expected\n",
+				buf[i], i, expect[i]);
+			return -EIO;
+		}
+	}
+	return 0;
+}
+
+#define ROCKER_TEST_DMA_BUF_SIZE (PAGE_SIZE * 4)
+#define ROCKER_TEST_DMA_FILL_PATTERN 0x96
+
+static int rocker_dma_test_offset(struct rocker *rocker,
+				  struct rocker_wait *wait, int offset)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	unsigned char *alloc;
+	unsigned char *buf;
+	unsigned char *expect;
+	dma_addr_t dma_handle;
+	int i;
+	int err;
+
+	alloc = kzalloc(ROCKER_TEST_DMA_BUF_SIZE * 2 + offset,
+			GFP_KERNEL | GFP_DMA);
+	if (!alloc)
+		return -ENOMEM;
+	buf = alloc + offset;
+	expect = buf + ROCKER_TEST_DMA_BUF_SIZE;
+
+	dma_handle = pci_map_single(pdev, buf, ROCKER_TEST_DMA_BUF_SIZE,
+				    PCI_DMA_BIDIRECTIONAL);
+	if (pci_dma_mapping_error(pdev, dma_handle)) {
+		err = -EIO;
+		goto free_alloc;
+	}
+
+	rocker_write64(rocker, TEST_DMA_ADDR, dma_handle);
+	rocker_write32(rocker, TEST_DMA_SIZE, ROCKER_TEST_DMA_BUF_SIZE);
+
+	memset(expect, ROCKER_TEST_DMA_FILL_PATTERN, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_FILL,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	memset(expect, 0, ROCKER_TEST_DMA_BUF_SIZE);
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_CLEAR,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+	prandom_bytes(buf, ROCKER_TEST_DMA_BUF_SIZE);
+	for (i = 0; i < ROCKER_TEST_DMA_BUF_SIZE; i++)
+		expect[i] = ~buf[i];
+	err = rocker_dma_test_one(rocker, wait, ROCKER_TEST_DMA_CTRL_INVERT,
+				  dma_handle, buf, expect,
+				  ROCKER_TEST_DMA_BUF_SIZE);
+	if (err)
+		goto unmap;
+
+unmap:
+	pci_unmap_single(pdev, dma_handle, ROCKER_TEST_DMA_BUF_SIZE,
+			 PCI_DMA_BIDIRECTIONAL);
+free_alloc:
+	kfree(alloc);
+
+	return err;
+}
+
+static int rocker_dma_test(struct rocker *rocker, struct rocker_wait *wait)
+{
+	int i;
+	int err;
+
+	for (i = 0; i < 8; i++) {
+		err = rocker_dma_test_offset(rocker, wait, i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static irqreturn_t rocker_test_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_wait *wait = dev_id;
+
+	rocker_wait_wake_up(wait);
+
+	return IRQ_HANDLED;
+}
+
+static int rocker_basic_hw_test(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_wait wait;
+	int err;
+
+	err = rocker_reg_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "reg test failed\n");
+		return err;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST),
+			  rocker_test_irq_handler, 0,
+			  rocker_driver_name, &wait);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign test irq\n");
+		return err;
+	}
+
+	rocker_wait_init(&wait);
+	rocker_write32(rocker, TEST_IRQ, ROCKER_MSIX_VEC_TEST);
+
+	if (!rocker_wait_event_timeout(&wait, HZ / 10)) {
+		dev_err(&pdev->dev, "no interrupt received within the timeout\n");
+		err = -EIO;
+		goto free_irq;
+	}
+
+	err = rocker_dma_test(rocker, &wait);
+	if (err)
+		dev_err(&pdev->dev, "dma test failed\n");
+
+free_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_TEST), &wait);
+	return err;
+}
+
+/******
+ * TLV
+ ******/
+
+#define ROCKER_TLV_ALIGNTO 8U
+#define ROCKER_TLV_ALIGN(len) \
+	(((len) + ROCKER_TLV_ALIGNTO - 1) & ~(ROCKER_TLV_ALIGNTO - 1))
+#define ROCKER_TLV_HDRLEN ROCKER_TLV_ALIGN(sizeof(struct rocker_tlv))
+
+/*  <------- ROCKER_TLV_HDRLEN -------> <--- ROCKER_TLV_ALIGN(payload) --->
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ * |             Header          | Pad |           Payload           | Pad |
+ * |      (struct rocker_tlv)    | ing |                             | ing |
+ * +-----------------------------+- - -+- - - - - - - - - - - - - - -+- - -+
+ *  <--------------------------- tlv->len -------------------------->
+ */
+
+static struct rocker_tlv *rocker_tlv_next(const struct rocker_tlv *tlv,
+					  int *remaining)
+{
+	int totlen = ROCKER_TLV_ALIGN(tlv->len);
+
+	*remaining -= totlen;
+	return (struct rocker_tlv *) ((char *) tlv + totlen);
+}
+
+static int rocker_tlv_ok(const struct rocker_tlv *tlv, int remaining)
+{
+	return remaining >= (int) ROCKER_TLV_HDRLEN &&
+	       tlv->len >= ROCKER_TLV_HDRLEN &&
+	       tlv->len <= remaining;
+}
+
+#define rocker_tlv_for_each(pos, head, len, rem)	\
+	for (pos = head, rem = len;			\
+	     rocker_tlv_ok(pos, rem);			\
+	     pos = rocker_tlv_next(pos, &(rem)))
+
+#define rocker_tlv_for_each_nested(pos, tlv, rem)	\
+	rocker_tlv_for_each(pos, rocker_tlv_data(tlv),	\
+			    rocker_tlv_len(tlv), rem)
+
+static int rocker_tlv_attr_size(int payload)
+{
+	return ROCKER_TLV_HDRLEN + payload;
+}
+
+static int rocker_tlv_total_size(int payload)
+{
+	return ROCKER_TLV_ALIGN(rocker_tlv_attr_size(payload));
+}
+
+static int rocker_tlv_padlen(int payload)
+{
+	return rocker_tlv_total_size(payload) - rocker_tlv_attr_size(payload);
+}
+
+static int rocker_tlv_type(const struct rocker_tlv *tlv)
+{
+	return tlv->type;
+}
+
+static void *rocker_tlv_data(const struct rocker_tlv *tlv)
+{
+	return (char *) tlv + ROCKER_TLV_HDRLEN;
+}
+
+static int rocker_tlv_len(const struct rocker_tlv *tlv)
+{
+	return tlv->len - ROCKER_TLV_HDRLEN;
+}
+
+static u8 rocker_tlv_get_u8(const struct rocker_tlv *tlv)
+{
+	return *(u8 *) rocker_tlv_data(tlv);
+}
+
+static u16 rocker_tlv_get_u16(const struct rocker_tlv *tlv)
+{
+	return *(u16 *) rocker_tlv_data(tlv);
+}
+
+static u32 rocker_tlv_get_u32(const struct rocker_tlv *tlv)
+{
+	return *(u32 *) rocker_tlv_data(tlv);
+}
+
+static u64 rocker_tlv_get_u64(const struct rocker_tlv *tlv)
+{
+	return *(u64 *) rocker_tlv_data(tlv);
+}
+
+static void rocker_tlv_parse(struct rocker_tlv **tb, int maxtype,
+			     const char *buf, int buf_len)
+{
+	const struct rocker_tlv *tlv;
+	const struct rocker_tlv *head = (const struct rocker_tlv *) buf;
+	int rem;
+
+	memset(tb, 0, sizeof(struct rocker_tlv *) * (maxtype + 1));
+
+	rocker_tlv_for_each(tlv, head, buf_len, rem) {
+		u32 type = rocker_tlv_type(tlv);
+
+		if (type > 0 && type <= maxtype)
+			tb[type] = (struct rocker_tlv *) tlv;
+	}
+}
+
+static void rocker_tlv_parse_nested(struct rocker_tlv **tb, int maxtype,
+				    const struct rocker_tlv *tlv)
+{
+	rocker_tlv_parse(tb, maxtype, rocker_tlv_data(tlv),
+			 rocker_tlv_len(tlv));
+}
+
+static void rocker_tlv_parse_desc(struct rocker_tlv **tb, int maxtype,
+				  struct rocker_desc_info *desc_info)
+{
+	rocker_tlv_parse(tb, maxtype, desc_info->data,
+			 desc_info->desc->tlv_size);
+}
+
+static struct rocker_tlv *rocker_tlv_start(struct rocker_desc_info *desc_info)
+{
+	return (struct rocker_tlv *) ((char *) desc_info->data +
+					       desc_info->tlv_size);
+}
+
+static int rocker_tlv_put(struct rocker_desc_info *desc_info,
+			  int attrtype, int attrlen, const void *data)
+{
+	int tail_room = desc_info->data_size - desc_info->tlv_size;
+	int total_size = rocker_tlv_total_size(attrlen);
+	struct rocker_tlv *tlv;
+
+	if (unlikely(tail_room < total_size))
+		return -EMSGSIZE;
+
+	tlv = rocker_tlv_start(desc_info);
+	desc_info->tlv_size += total_size;
+	tlv->type = attrtype;
+	tlv->len = rocker_tlv_attr_size(attrlen);
+	memcpy(rocker_tlv_data(tlv), data, attrlen);
+	memset((char *) tlv + tlv->len, 0, rocker_tlv_padlen(attrlen));
+	return 0;
+}
+
+static int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
+			     int attrtype, u8 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
+}
+
+static int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
+			      int attrtype, u16 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
+}
+
+static int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
+			      int attrtype, u32 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
+}
+
+static int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
+			      int attrtype, u64 value)
+{
+	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
+}
+
+static struct rocker_tlv *
+rocker_tlv_nest_start(struct rocker_desc_info *desc_info, int attrtype)
+{
+	struct rocker_tlv *start = rocker_tlv_start(desc_info);
+
+	if (rocker_tlv_put(desc_info, attrtype, 0, NULL) < 0)
+		return NULL;
+
+	return start;
+}
+
+static void rocker_tlv_nest_end(struct rocker_desc_info *desc_info,
+				struct rocker_tlv *start)
+{
+	start->len = (char *) rocker_tlv_start(desc_info) - (char *) start;
+}
+
+static void rocker_tlv_nest_cancel(struct rocker_desc_info *desc_info,
+				   struct rocker_tlv *start)
+{
+	desc_info->tlv_size = (char *) start - desc_info->data;
+}
+
+/******************************************
+ * DMA rings and descriptors manipulations
+ ******************************************/
+
+static u32 __pos_inc(u32 pos, size_t limit)
+{
+	return ++pos == limit ? 0 : pos;
+}
+
+static int rocker_desc_err(struct rocker_desc_info *desc_info)
+{
+	return -(desc_info->desc->comp_err & ~ROCKER_DMA_DESC_COMP_ERR_GEN);
+}
+
+static void rocker_desc_gen_clear(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->comp_err &= ~ROCKER_DMA_DESC_COMP_ERR_GEN;
+}
+
+static bool rocker_desc_gen(struct rocker_desc_info *desc_info)
+{
+	u32 comp_err = desc_info->desc->comp_err;
+
+	return comp_err & ROCKER_DMA_DESC_COMP_ERR_GEN ? true : false;
+}
+
+static void *rocker_desc_cookie_ptr_get(struct rocker_desc_info *desc_info)
+{
+	return (void *) desc_info->desc->cookie;
+}
+
+static void rocker_desc_cookie_ptr_set(struct rocker_desc_info *desc_info,
+				       void *ptr)
+{
+	desc_info->desc->cookie = (long) ptr;
+}
+
+static struct rocker_desc_info *
+rocker_desc_head_get(struct rocker_dma_ring_info *info)
+{
+	struct rocker_desc_info *desc_info;
+	u32 head = __pos_inc(info->head, info->size);
+
+	desc_info = &info->desc_info[info->head];
+	if (head == info->tail)
+		return NULL; /* ring full */
+	desc_info->tlv_size = 0;
+	return desc_info;
+}
+
+static void rocker_desc_commit(struct rocker_desc_info *desc_info)
+{
+	desc_info->desc->buf_size = desc_info->data_size;
+	desc_info->desc->tlv_size = desc_info->tlv_size;
+}
+
+static void rocker_desc_head_set(struct rocker *rocker,
+				 struct rocker_dma_ring_info *info,
+				 struct rocker_desc_info *desc_info)
+{
+	u32 head = __pos_inc(info->head, info->size);
+
+	BUG_ON(head == info->tail);
+	rocker_desc_commit(desc_info);
+	info->head = head;
+	rocker_write32(rocker, DMA_DESC_HEAD(info->type), head);
+}
+
+static struct rocker_desc_info *
+rocker_desc_tail_get(struct rocker_dma_ring_info *info)
+{
+	struct rocker_desc_info *desc_info;
+
+	if (info->tail == info->head)
+		return NULL; /* nothing to be done between head and tail */
+	desc_info = &info->desc_info[info->tail];
+	if (!rocker_desc_gen(desc_info))
+		return NULL; /* gen bit not set, desc is not ready yet */
+	info->tail = __pos_inc(info->tail, info->size);
+	desc_info->tlv_size = desc_info->desc->tlv_size;
+	return desc_info;
+}
+
+static void rocker_dma_ring_credits_set(struct rocker *rocker,
+					struct rocker_dma_ring_info *info,
+					u32 credits)
+{
+	if (credits)
+		rocker_write32(rocker, DMA_DESC_CREDITS(info->type), credits);
+}
+
+static unsigned long rocker_dma_ring_size_fix(size_t size)
+{
+	return max(ROCKER_DMA_SIZE_MIN,
+		   min(roundup_pow_of_two(size), ROCKER_DMA_SIZE_MAX));
+}
+
+static int rocker_dma_ring_create(struct rocker *rocker,
+				  unsigned int type,
+				  size_t size,
+				  struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(size != rocker_dma_ring_size_fix(size));
+	info->size = size;
+	info->type = type;
+	info->head = 0;
+	info->tail = 0;
+	info->desc_info = kcalloc(info->size, sizeof(*info->desc_info),
+				  GFP_KERNEL);
+	if (!info->desc_info)
+		return -ENOMEM;
+
+	info->desc = pci_alloc_consistent(rocker->pdev,
+					  info->size * sizeof(*info->desc),
+					  &info->mapaddr);
+	if (!info->desc) {
+		kfree(info->desc_info);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < info->size; i++)
+		info->desc_info[i].desc = &info->desc[i];
+
+	rocker_write32(rocker, DMA_DESC_CTRL(info->type),
+		       ROCKER_DMA_DESC_CTRL_RESET);
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), info->mapaddr);
+	rocker_write32(rocker, DMA_DESC_SIZE(info->type), info->size);
+
+	return 0;
+}
+
+static void rocker_dma_ring_destroy(struct rocker *rocker,
+				    struct rocker_dma_ring_info *info)
+{
+	rocker_write64(rocker, DMA_DESC_ADDR(info->type), 0);
+
+	pci_free_consistent(rocker->pdev,
+			    info->size * sizeof(struct rocker_desc),
+			    info->desc, info->mapaddr);
+	kfree(info->desc_info);
+}
+
+static void rocker_dma_ring_pass_to_producer(struct rocker *rocker,
+					     struct rocker_dma_ring_info *info)
+{
+	int i;
+
+	BUG_ON(info->head || info->tail);
+
+	/* When the ring is a consumer ring, we need to advance the head for
+	 * each desc. That tells the hw that the desc is ready to be used.
+	 */
+	for (i = 0; i < info->size - 1; i++)
+		rocker_desc_head_set(rocker, info, &info->desc_info[i]);
+	rocker_desc_commit(&info->desc_info[i]);
+}
+
+static int rocker_dma_ring_bufs_alloc(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction, size_t buf_size)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+	int err;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+		dma_addr_t dma_handle;
+		char *buf;
+
+		buf = kzalloc(buf_size, GFP_KERNEL | GFP_DMA);
+		if (!buf) {
+			err = -ENOMEM;
+			goto rollback;
+		}
+
+		dma_handle = pci_map_single(pdev, buf, buf_size, direction);
+		if (pci_dma_mapping_error(pdev, dma_handle)) {
+			kfree(buf);
+			err = -EIO;
+			goto rollback;
+		}
+
+		desc_info->data = buf;
+		desc_info->data_size = buf_size;
+		dma_unmap_addr_set(desc_info, mapaddr, dma_handle);
+
+		desc->buf_addr = dma_handle;
+		desc->buf_size = buf_size;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+	return err;
+}
+
+static void rocker_dma_ring_bufs_free(struct rocker *rocker,
+				      struct rocker_dma_ring_info *info,
+				      int direction)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int i;
+
+	for (i = 0; i < info->size; i++) {
+		struct rocker_desc_info *desc_info = &info->desc_info[i];
+		struct rocker_desc *desc = &info->desc[i];
+
+		desc->buf_addr = 0;
+		desc->buf_size = 0;
+		pci_unmap_single(pdev, dma_unmap_addr(desc_info, mapaddr),
+				 desc_info->data_size, direction);
+		kfree(desc_info->data);
+	}
+}
+
+static int rocker_dma_rings_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_CMD,
+				     ROCKER_DMA_CMD_DEFAULT_SIZE,
+				     &rocker->cmd_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create command dma ring\n");
+		return err;
+	}
+
+	spin_lock_init(&rocker->cmd_ring_lock);
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->cmd_ring,
+					 PCI_DMA_BIDIRECTIONAL, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc command dma ring buffers\n");
+		goto err_dma_cmd_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker, ROCKER_DMA_EVENT,
+				     ROCKER_DMA_EVENT_DEFAULT_SIZE,
+				     &rocker->event_ring);
+	if (err) {
+		dev_err(&pdev->dev, "failed to create event dma ring\n");
+		goto err_dma_event_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker->event_ring,
+					 PCI_DMA_FROMDEVICE, PAGE_SIZE);
+	if (err) {
+		dev_err(&pdev->dev, "failed to alloc event dma ring buffers\n");
+		goto err_dma_event_ring_bufs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker->event_ring);
+	return 0;
+
+err_dma_event_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+err_dma_event_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_cmd_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+	return err;
+}
+
+static void rocker_dma_rings_fini(struct rocker *rocker)
+{
+	rocker_dma_ring_bufs_free(rocker, &rocker->event_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->event_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker->cmd_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker->cmd_ring);
+}
+
+static int rocker_dma_rx_ring_skb_map(struct rocker *rocker,
+				      struct rocker_port *rocker_port,
+				      struct rocker_desc_info *desc_info,
+				      struct sk_buff *skb, size_t buf_len)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+
+	dma_handle = pci_map_single(pdev, skb->data, buf_len,
+				    PCI_DMA_FROMDEVICE);
+	if (pci_dma_mapping_error(pdev, dma_handle))
+		return -EIO;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_RX_FRAG_ADDR, dma_handle))
+		goto tlv_put_failure;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_RX_FRAG_MAX_LEN, buf_len))
+		goto tlv_put_failure;
+	return 0;
+
+tlv_put_failure:
+	pci_unmap_single(pdev, dma_handle, buf_len, PCI_DMA_FROMDEVICE);
+	desc_info->tlv_size = 0;
+	return -EMSGSIZE;
+}
+
+static size_t rocker_port_rx_buf_len(struct rocker_port *rocker_port)
+{
+	return rocker_port->dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+}
+
+static int rocker_dma_rx_ring_skb_alloc(struct rocker *rocker,
+					struct rocker_port *rocker_port,
+					struct rocker_desc_info *desc_info)
+{
+	struct net_device *dev = rocker_port->dev;
+	struct sk_buff *skb;
+	size_t buf_len = rocker_port_rx_buf_len(rocker_port);
+	int err;
+
+	/* Ensure that hw will see tlv_size zero in case of an error.
+	 * That tells hw to use another descriptor.
+	 */
+	rocker_desc_cookie_ptr_set(desc_info, NULL);
+	desc_info->tlv_size = 0;
+
+	skb = netdev_alloc_skb_ip_align(dev, buf_len);
+	if (!skb)
+		return -ENOMEM;
+	err = rocker_dma_rx_ring_skb_map(rocker, rocker_port, desc_info,
+					 skb, buf_len);
+	if (err) {
+		dev_kfree_skb_any(skb);
+		return err;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+	return 0;
+}
+
+static void rocker_dma_rx_ring_skb_unmap(struct rocker *rocker,
+					 struct rocker_tlv **attrs)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	size_t len;
+
+	if (!attrs[ROCKER_TLV_RX_FRAG_ADDR] ||
+	    !attrs[ROCKER_TLV_RX_FRAG_MAX_LEN])
+		return;
+	dma_handle = rocker_tlv_get_u64(attrs[ROCKER_TLV_RX_FRAG_ADDR]);
+	len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_MAX_LEN]);
+	pci_unmap_single(pdev, dma_handle, len, PCI_DMA_FROMDEVICE);
+}
+
+static void rocker_dma_rx_ring_skb_free(struct rocker *rocker,
+					struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+
+	if (!skb)
+		return;
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+	dev_kfree_skb_any(skb);
+}
+
+static int rocker_dma_rx_ring_skbs_alloc(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+	int err;
+
+	for (i = 0; i < rx_ring->size; i++) {
+		err = rocker_dma_rx_ring_skb_alloc(rocker, rocker_port,
+						   &rx_ring->desc_info[i]);
+		if (err)
+			goto rollback;
+	}
+	return 0;
+
+rollback:
+	for (i--; i >= 0; i--)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+	return err;
+}
+
+static void rocker_dma_rx_ring_skbs_free(struct rocker *rocker,
+					 struct rocker_port *rocker_port)
+{
+	struct rocker_dma_ring_info *rx_ring = &rocker_port->rx_ring;
+	int i;
+
+	for (i = 0; i < rx_ring->size; i++)
+		rocker_dma_rx_ring_skb_free(rocker, &rx_ring->desc_info[i]);
+}
+
+static int rocker_port_dma_rings_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	int err;
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_TX(rocker_port->port_number),
+				     ROCKER_DMA_TX_DEFAULT_SIZE,
+				     &rocker_port->tx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create tx dma ring\n");
+		return err;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->tx_ring,
+					 PCI_DMA_TODEVICE,
+					 ROCKER_DMA_TX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc tx dma ring buffers\n");
+		goto err_dma_tx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_ring_create(rocker,
+				     ROCKER_DMA_RX(rocker_port->port_number),
+				     ROCKER_DMA_RX_DEFAULT_SIZE,
+				     &rocker_port->rx_ring);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to create rx dma ring\n");
+		goto err_dma_rx_ring_create;
+	}
+
+	err = rocker_dma_ring_bufs_alloc(rocker, &rocker_port->rx_ring,
+					 PCI_DMA_BIDIRECTIONAL,
+					 ROCKER_DMA_RX_DESC_SIZE);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring buffers\n");
+		goto err_dma_rx_ring_bufs_alloc;
+	}
+
+	err = rocker_dma_rx_ring_skbs_alloc(rocker, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "failed to alloc rx dma ring skbs\n");
+		goto err_dma_rx_ring_skbs_alloc;
+	}
+	rocker_dma_ring_pass_to_producer(rocker, &rocker_port->rx_ring);
+
+	return 0;
+
+err_dma_rx_ring_skbs_alloc:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+err_dma_rx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+err_dma_rx_ring_create:
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+err_dma_tx_ring_bufs_alloc:
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+	return err;
+}
+
+static void rocker_port_dma_rings_fini(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+
+	rocker_dma_rx_ring_skbs_free(rocker, rocker_port);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->rx_ring,
+				  PCI_DMA_BIDIRECTIONAL);
+	rocker_dma_ring_destroy(rocker, &rocker_port->rx_ring);
+	rocker_dma_ring_bufs_free(rocker, &rocker_port->tx_ring,
+				  PCI_DMA_TODEVICE);
+	rocker_dma_ring_destroy(rocker, &rocker_port->tx_ring);
+}
+
+static void rocker_port_set_enable(struct rocker_port *rocker_port, bool enable)
+{
+	u64 val = rocker_read64(rocker_port->rocker, PORT_PHYS_ENABLE);
+
+	if (enable)
+		val |= 1ULL << rocker_port_to_lport(rocker_port);
+	else
+		val &= ~(1ULL << rocker_port_to_lport(rocker_port));
+	rocker_write64(rocker_port->rocker, PORT_PHYS_ENABLE, val);
+}
+
+/********************************
+ * Interrupt handler and helpers
+ ********************************/
+
+static irqreturn_t rocker_cmd_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	u32 credits = 0;
+
+	spin_lock(&rocker->cmd_ring_lock);
+	while ((desc_info = rocker_desc_tail_get(&rocker->cmd_ring))) {
+		wait = rocker_desc_cookie_ptr_get(desc_info);
+		if (wait->nowait) {
+			rocker_desc_gen_clear(desc_info);
+			rocker_wait_destroy(wait);
+		} else {
+			rocker_wait_wake_up(wait);
+		}
+		credits++;
+	}
+	spin_unlock(&rocker->cmd_ring_lock);
+	rocker_dma_ring_credits_set(rocker, &rocker->cmd_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static void rocker_port_link_up(struct rocker_port *rocker_port)
+{
+	netif_carrier_on(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is up\n");
+}
+
+static void rocker_port_link_down(struct rocker_port *rocker_port)
+{
+	netif_carrier_off(rocker_port->dev);
+	netdev_info(rocker_port->dev, "Link is down\n");
+}
+
+static int rocker_event_process(struct rocker *rocker,
+				struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_EVENT_LINK_CHANGED_MAX + 1];
+	u16 type;
+	unsigned int port_number;
+	bool link_up;
+	struct rocker_port *rocker_port;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_EVENT_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_EVENT_TYPE] ||
+	    !attrs[ROCKER_TLV_EVENT_INFO])
+		return -EIO;
+
+	type = rocker_tlv_get_u16(attrs[ROCKER_TLV_EVENT_TYPE]);
+	if (type != ROCKER_TLV_EVENT_TYPE_LINK_CHANGED)
+		return -EOPNOTSUPP;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_EVENT_LINK_CHANGED_MAX,
+				attrs[ROCKER_TLV_EVENT_INFO]);
+	if (!info_attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT] ||
+	    !info_attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP])
+		return -EIO;
+	port_number = rocker_tlv_get_u32(info_attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LPORT]) - 1;
+	link_up = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP]);
+
+	if (port_number >= rocker->port_count)
+		return -EINVAL;
+
+	rocker_port = rocker->ports[port_number];
+	if (netif_carrier_ok(rocker_port->dev) != link_up) {
+		if (link_up)
+			rocker_port_link_up(rocker_port);
+		else
+			rocker_port_link_down(rocker_port);
+	}
+	return 0;
+}
+
+static irqreturn_t rocker_event_irq_handler(int irq, void *dev_id)
+{
+	struct rocker *rocker = dev_id;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	while ((desc_info = rocker_desc_tail_get(&rocker->event_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			dev_err(&pdev->dev, "event desc received with err %d\n",
+				err);
+		} else {
+			err = rocker_event_process(rocker, desc_info);
+			if (err)
+				dev_err(&pdev->dev, "event processing failed with err %d\n",
+					err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker->event_ring, desc_info);
+		credits++;
+	}
+	rocker_dma_ring_credits_set(rocker, &rocker->event_ring, credits);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_tx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_tx);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t rocker_rx_irq_handler(int irq, void *dev_id)
+{
+	struct rocker_port *rocker_port = dev_id;
+
+	napi_schedule(&rocker_port->napi_rx);
+	return IRQ_HANDLED;
+}
+
+/********************
+ * Command interface
+ ********************/
+
+typedef int (*rocker_cmd_cb_t)(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info,
+			       void *priv);
+
+static int rocker_cmd_exec(struct rocker *rocker,
+			   struct rocker_port *rocker_port,
+			   rocker_cmd_cb_t prepare, void *prepare_priv,
+			   rocker_cmd_cb_t process, void *process_priv,
+			   bool nowait)
+{
+	struct rocker_desc_info *desc_info;
+	struct rocker_wait *wait;
+	unsigned long flags;
+	int err;
+
+	wait = rocker_wait_create(nowait ? GFP_ATOMIC : GFP_KERNEL);
+	if (!wait)
+		return -ENOMEM;
+	wait->nowait = nowait;
+
+	spin_lock_irqsave(&rocker->cmd_ring_lock, flags);
+	desc_info = rocker_desc_head_get(&rocker->cmd_ring);
+	if (!desc_info) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		err = -EAGAIN;
+		goto out;
+	}
+	err = prepare(rocker, rocker_port, desc_info, prepare_priv);
+	if (err) {
+		spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+		goto out;
+	}
+	rocker_desc_cookie_ptr_set(desc_info, wait);
+	rocker_desc_head_set(rocker, &rocker->cmd_ring, desc_info);
+	spin_unlock_irqrestore(&rocker->cmd_ring_lock, flags);
+
+	if (nowait)
+		return 0;
+
+	if (!rocker_wait_event_timeout(wait, HZ / 10))
+		return -EIO;
+
+	err = rocker_desc_err(desc_info);
+	if (err)
+		return err;
+
+	if (process)
+		err = process(rocker, rocker_port, desc_info, process_priv);
+
+	rocker_desc_gen_clear(desc_info);
+out:
+	rocker_wait_destroy(wait);
+	return err;
+}
+
+static int
+rocker_cmd_get_port_settings_prep(struct rocker *rocker,
+				  struct rocker_port *rocker_port,
+				  struct rocker_desc_info *desc_info,
+				  void *priv)
+{
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port_to_lport(rocker_port)))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_ethtool_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	u32 speed;
+	u8 duplex;
+	u8 autoneg;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	if (!info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX] ||
+	    !info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG])
+		return -EIO;
+
+	speed = rocker_tlv_get_u32(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_SPEED]);
+	duplex = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX]);
+	autoneg = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG]);
+
+	ecmd->transceiver = XCVR_INTERNAL;
+	ecmd->supported = SUPPORTED_TP;
+	ecmd->phy_address = 0xff;
+	ecmd->port = PORT_TP;
+	ethtool_cmd_speed_set(ecmd, speed);
+	ecmd->duplex = duplex ? DUPLEX_FULL : DUPLEX_HALF;
+	ecmd->autoneg = autoneg ? AUTONEG_ENABLE : AUTONEG_DISABLE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_macaddr_proc(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+	struct rocker_tlv *attr;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	attr = info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR];
+	if (!attr)
+		return -EIO;
+
+	if (rocker_tlv_len(attr) != ETH_ALEN)
+		return -EINVAL;
+
+	ether_addr_copy(macaddr, rocker_tlv_data(attr));
+	return 0;
+}
+
+static int
+rocker_cmd_get_port_settings_mode_proc(struct rocker *rocker,
+				       struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info,
+				       void *priv)
+{
+	enum rocker_port_mode *mode = priv;
+	struct rocker_tlv *attrs[ROCKER_TLV_CMD_MAX + 1];
+	struct rocker_tlv *info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MAX + 1];
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_CMD_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_CMD_INFO])
+		return -EIO;
+
+	rocker_tlv_parse_nested(info_attrs, ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+				attrs[ROCKER_TLV_CMD_INFO]);
+	if (!info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MODE])
+		return -EIO;
+
+	*mode = rocker_tlv_get_u8(info_attrs[ROCKER_TLV_CMD_PORT_SETTINGS_MODE]);
+
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_ethtool_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	struct ethtool_cmd *ecmd = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port_to_lport(rocker_port)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,
+			       ethtool_cmd_speed(ecmd)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,
+			      ecmd->duplex))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,
+			      ecmd->autoneg))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_macaddr_prep(struct rocker *rocker,
+					  struct rocker_port *rocker_port,
+					  struct rocker_desc_info *desc_info,
+					  void *priv)
+{
+	unsigned char *macaddr = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port_to_lport(rocker_port)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,
+			   ETH_ALEN, macaddr))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int
+rocker_cmd_set_port_settings_mode_prep(struct rocker *rocker,
+				       struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info,
+				       void *priv)
+{
+	enum rocker_port_mode *mode = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,
+			       rocker_port_to_lport(rocker_port)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_CMD_PORT_SETTINGS_MODE,
+			      *mode))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+	return 0;
+}
+
+static int rocker_cmd_get_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_ethtool_proc,
+			       ecmd, false);
+}
+
+static int rocker_cmd_get_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_macaddr_proc,
+			       macaddr, false);
+}
+
+static int rocker_cmd_get_port_settings_mode(struct rocker_port *rocker_port,
+					     enum rocker_port_mode *mode)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_get_port_settings_prep, NULL,
+			       rocker_cmd_get_port_settings_mode_proc,
+			       mode, false);
+}
+
+static int rocker_cmd_set_port_settings_ethtool(struct rocker_port *rocker_port,
+						struct ethtool_cmd *ecmd)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_ethtool_prep,
+			       ecmd, NULL, NULL, false);
+}
+
+static int rocker_cmd_set_port_settings_macaddr(struct rocker_port *rocker_port,
+						unsigned char *macaddr)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_macaddr_prep,
+			       macaddr, NULL, NULL, false);
+}
+
+static int rocker_cmd_set_port_settings_mode(struct rocker_port *rocker_port,
+					     enum rocker_port_mode mode)
+{
+	return rocker_cmd_exec(rocker_port->rocker, rocker_port,
+			       rocker_cmd_set_port_settings_mode_prep,
+			       &mode, NULL, NULL, false);
+}
+
+static int rocker_cmd_flow_tbl_add_ig_port(struct rocker_desc_info *desc_info,
+					   struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.ig_port.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.ig_port.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.ig_port.goto_tbl))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_vlan(struct rocker_desc_info *desc_info,
+					struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.vlan.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.vlan.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.vlan.vlan_id_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.vlan.goto_tbl))
+		return -EMSGSIZE;
+	if (entry->key.vlan.untagged &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_NEW_VLAN_ID,
+			       entry->key.vlan.new_vlan_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_bridge(struct rocker_desc_info *desc_info,
+					  struct rocker_flow_tbl_entry *entry)
+{
+	if (entry->key.bridge.has_eth_dst &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.bridge.eth_dst))
+		return -EMSGSIZE;
+	if (entry->key.bridge.has_eth_dst_mask &&
+	    rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.bridge.eth_dst_mask))
+		return -EMSGSIZE;
+	if (entry->key.bridge.vlan_id &&
+	    rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.bridge.vlan_id))
+		return -EMSGSIZE;
+	if (entry->key.bridge.tunnel_id &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_TUNNEL_ID,
+			       entry->key.bridge.tunnel_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,
+			       entry->key.bridge.goto_tbl))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.bridge.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add_acl(struct rocker_desc_info *desc_info,
+				       struct rocker_flow_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT,
+			       entry->key.acl.in_lport))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_IN_LPORT_MASK,
+			       entry->key.acl.in_lport_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC,
+			   ETH_ALEN, entry->key.acl.eth_src))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_SRC_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_src_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC,
+			   ETH_ALEN, entry->key.acl.eth_dst))
+		return -EMSGSIZE;
+	if (rocker_tlv_put(desc_info, ROCKER_TLV_OF_DPA_DST_MAC_MASK,
+			   ETH_ALEN, entry->key.acl.eth_dst_mask))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_ETHERTYPE,
+			       entry->key.acl.eth_type))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID,
+			       entry->key.acl.vlan_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_VLAN_ID_MASK,
+			       entry->key.acl.vlan_id_mask))
+		return -EMSGSIZE;
+
+	switch (ntohs(entry->key.acl.eth_type)) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_PROTO,
+				      entry->key.acl.ip_proto))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_PROTO_MASK,
+				      entry->key.acl.ip_proto_mask))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_DSCP,
+				      entry->key.acl.ip_tos & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_DSCP_MASK,
+				      entry->key.acl.ip_tos_mask & 0x3f))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_IP_ECN,
+				      (entry->key.acl.ip_tos & 0xc0) >> 6))
+			return -EMSGSIZE;
+		if (rocker_tlv_put_u8(desc_info,
+				      ROCKER_TLV_OF_DPA_IP_ECN_MASK,
+				      (entry->key.acl.ip_tos_mask & 0xc0) >> 6))
+			return -EMSGSIZE;
+		break;
+	}
+
+	if (entry->key.acl.group_id != ROCKER_GROUP_NONE &&
+	    rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->key.acl.group_id))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_add(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_TABLE_ID,
+			       entry->key.tbl_id))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_PRIORITY,
+			       entry->key.priority))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_HARDTIME, 0))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+
+	switch (entry->key.tbl_id) {
+	case ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT:
+		err = rocker_cmd_flow_tbl_add_ig_port(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_VLAN:
+		err = rocker_cmd_flow_tbl_add_vlan(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_BRIDGING:
+		err = rocker_cmd_flow_tbl_add_bridge(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_TABLE_ID_ACL_POLICY:
+		err = rocker_cmd_flow_tbl_add_acl(desc_info, entry);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_flow_tbl_del(struct rocker *rocker,
+				   struct rocker_port *rocker_port,
+				   struct rocker_desc_info *desc_info,
+				   void *priv)
+{
+	const struct rocker_flow_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_OF_DPA_COOKIE,
+			       entry->cookie))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_l2_interface(struct rocker_desc_info *desc_info,
+				      struct rocker_group_tbl_entry *entry)
+{
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_OUT_LPORT,
+			       ROCKER_GROUP_PORT_GET(entry->group_id)))
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u8(desc_info, ROCKER_TLV_OF_DPA_POP_VLAN,
+			      entry->l2_interface.pop_vlan))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+rocker_cmd_group_tbl_add_group_ids(struct rocker_desc_info *desc_info,
+				   struct rocker_group_tbl_entry *entry)
+{
+	int i;
+	struct rocker_tlv *group_ids;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_OF_DPA_GROUP_COUNT,
+			       entry->group_count))
+		return -EMSGSIZE;
+
+	group_ids = rocker_tlv_nest_start(desc_info,
+					  ROCKER_TLV_OF_DPA_GROUP_IDS);
+	if (!group_ids)
+		return -EMSGSIZE;
+
+	for (i = 0; i < entry->group_count; i++) {
+		/* Note TLV array is 1-based */
+		if (rocker_tlv_put_u32(desc_info, i + 1, entry->group_ids[i]))
+			return -EMSGSIZE;
+	}
+
+	rocker_tlv_nest_end(desc_info, group_ids);
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_add(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+	int err = 0;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE:
+		err = rocker_cmd_group_tbl_add_l2_interface(desc_info, entry);
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		err = rocker_cmd_group_tbl_add_group_ids(desc_info, entry);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	if (err)
+		return err;
+
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+static int rocker_cmd_group_tbl_del(struct rocker *rocker,
+				    struct rocker_port *rocker_port,
+				    struct rocker_desc_info *desc_info,
+				    void *priv)
+{
+	const struct rocker_group_tbl_entry *entry = priv;
+	struct rocker_tlv *cmd_info;
+
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE,
+			       ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL))
+		return -EMSGSIZE;
+	cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO);
+	if (!cmd_info)
+		return -EMSGSIZE;
+	if (rocker_tlv_put_u32(desc_info, ROCKER_TLV_OF_DPA_GROUP_ID,
+			       entry->group_id))
+		return -EMSGSIZE;
+	rocker_tlv_nest_end(desc_info, cmd_info);
+
+	return 0;
+}
+
+/************************
+ * Flow and group tables
+ ************************/
+
+static struct rocker_flow_tbl_entry *rocker_flow_tbl_find(
+	struct rocker *rocker, struct rocker_flow_tbl_entry *match)
+{
+	struct rocker_flow_tbl_entry *found;
+
+	hash_for_each_possible(rocker->flow_tbl, found, entry, match->key_crc32)
+		if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0)
+			return found;
+
+	return NULL;
+}
+
+static int rocker_flow_tbl_add(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool add_to_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		kfree(match);
+	} else {
+		found = match;
+		found->cookie = rocker->flow_tbl_next_cookie++;
+		hash_add(rocker->flow_tbl, &found->entry, found->key_crc32);
+		add_to_hw = true;
+	}
+
+	found->ref_count++;
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	if (add_to_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_add,
+				      found, NULL, NULL, nowait);
+		if (err) {
+			spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+			hash_del(&found->entry);
+			spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+			kfree(found);
+		}
+	}
+
+	return err;
+}
+
+static int rocker_flow_tbl_del(struct rocker_port *rocker_port,
+			       struct rocker_flow_tbl_entry *match,
+			       bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_flow_tbl_entry *found;
+	unsigned long flags;
+	bool del_from_hw = false;
+	int err = 0;
+
+	match->key_crc32 = crc32(~0, &match->key, sizeof(match->key));
+
+	spin_lock_irqsave(&rocker->flow_tbl_lock, flags);
+
+	found = rocker_flow_tbl_find(rocker, match);
+
+	if (found) {
+		found->ref_count--;
+		if (found->ref_count == 0) {
+			hash_del(&found->entry);
+			del_from_hw = true;
+		}
+	}
+
+	spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags);
+
+	kfree(match);
+
+	if (del_from_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_flow_tbl_del,
+				      found, NULL, NULL, nowait);
+		kfree(found);
+	}
+
+	return err;
+}
+
+#define ROCKER_OP_FLAG_REMOVE		(1 << 0)
+#define ROCKER_OP_FLAG_NOWAIT		(1 << 1)
+
+static gfp_t rocker_op_flags_gfp(int flags)
+{
+	return flags & ROCKER_OP_FLAG_NOWAIT ? GFP_ATOMIC : GFP_KERNEL;
+}
+
+static int rocker_flow_tbl_do(struct rocker_port *rocker_port,
+			      int flags, struct rocker_flow_tbl_entry *entry)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_flow_tbl_del(rocker_port, entry, nowait);
+	else
+		return rocker_flow_tbl_add(rocker_port, entry, nowait);
+}
+
+static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
+				   int flags, u32 in_lport, u32 in_lport_mask,
+				   enum rocker_of_dpa_table_id goto_tbl)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_IG_PORT;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
+	entry->key.ig_port.in_lport = in_lport;
+	entry->key.ig_port.in_lport_mask = in_lport_mask;
+	entry->key.ig_port.goto_tbl = goto_tbl;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
+				int flags, u32 in_lport,
+				__be16 vlan_id, __be16 vlan_id_mask,
+				enum rocker_of_dpa_table_id goto_tbl,
+				bool untagged, __be16 new_vlan_id)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = ROCKER_PRIORITY_VLAN;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_VLAN;
+	entry->key.vlan.in_lport = in_lport;
+	entry->key.vlan.vlan_id = vlan_id;
+	entry->key.vlan.vlan_id_mask = vlan_id_mask;
+	entry->key.vlan.goto_tbl = goto_tbl;
+
+	entry->key.vlan.untagged = untagged;
+	entry->key.vlan.new_vlan_id = new_vlan_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
+				  int flags,
+				  const u8 *eth_dst, const u8 *eth_dst_mask,
+				  __be16 vlan_id, u32 tunnel_id,
+				  enum rocker_of_dpa_table_id goto_tbl,
+				  u32 group_id)
+{
+	struct rocker_flow_tbl_entry *entry;
+	u32 priority;
+	bool vlan_bridging = !!vlan_id;
+	bool dflt = !eth_dst || eth_dst_mask;
+	bool wild = false;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_BRIDGING;
+
+	if (eth_dst) {
+		entry->key.bridge.has_eth_dst = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst, eth_dst);
+	}
+	if (eth_dst_mask) {
+		entry->key.bridge.has_eth_dst_mask = 1;
+		ether_addr_copy(entry->key.bridge.eth_dst_mask, eth_dst_mask);
+		if (memcmp(eth_dst_mask, zero_mac, ETH_ALEN))
+			wild = true;
+	}
+
+	priority = ROCKER_PRIORITY_UNKNOWN;
+	if (vlan_bridging && dflt && wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD;
+	else if (vlan_bridging && dflt && !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT;
+	else if (vlan_bridging && !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_VLAN;
+	else if (!vlan_bridging && dflt && wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_WILD;
+	else if (!vlan_bridging && dflt && !wild)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT_DFLT_EXACT;
+	else if (!vlan_bridging && !dflt)
+		priority = ROCKER_PRIORITY_BRIDGING_TENANT;
+
+	entry->key.priority = priority;
+	entry->key.bridge.vlan_id = vlan_id;
+	entry->key.bridge.tunnel_id = tunnel_id;
+	entry->key.bridge.goto_tbl = goto_tbl;
+	entry->key.bridge.group_id = group_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
+			       int flags, u32 priority, u32 in_lport,
+			       u32 in_lport_mask,
+			       const u8 *eth_src, const u8 *eth_src_mask,
+			       const u8 *eth_dst, const u8 *eth_dst_mask,
+			       __be16 eth_type,
+			       __be16 vlan_id, __be16 vlan_id_mask,
+			       u8 ip_proto, u8 ip_proto_mask,
+			       u8 ip_tos, u8 ip_tos_mask,
+			       u32 group_id)
+{
+	struct rocker_flow_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->key.priority = priority;
+	entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	entry->key.acl.in_lport = in_lport;
+	entry->key.acl.in_lport_mask = in_lport_mask;
+
+	if (eth_src)
+		ether_addr_copy(entry->key.acl.eth_src, eth_src);
+	if (eth_src_mask)
+		ether_addr_copy(entry->key.acl.eth_src_mask, eth_src_mask);
+	if (eth_dst)
+		ether_addr_copy(entry->key.acl.eth_dst, eth_dst);
+	if (eth_dst_mask)
+		ether_addr_copy(entry->key.acl.eth_dst_mask, eth_dst_mask);
+
+	entry->key.acl.eth_type = eth_type;
+	entry->key.acl.vlan_id = vlan_id;
+	entry->key.acl.vlan_id_mask = vlan_id_mask;
+	entry->key.acl.ip_proto = ip_proto;
+	entry->key.acl.ip_proto_mask = ip_proto_mask;
+	entry->key.acl.ip_tos = ip_tos;
+	entry->key.acl.ip_tos_mask = ip_tos_mask;
+	entry->key.acl.group_id = group_id;
+
+	return rocker_flow_tbl_do(rocker_port, flags, entry);
+}
+
+static struct rocker_group_tbl_entry *rocker_group_tbl_find(
+	struct rocker *rocker, struct rocker_group_tbl_entry *match)
+{
+	struct rocker_group_tbl_entry *found;
+	u8 type = ROCKER_GROUP_TYPE_GET(match->group_id);
+	u16 index;
+	int bkt;
+
+	switch (type) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE:
+		/* search for match by group_id */
+		hash_for_each_possible(rocker->group_tbl, found,
+				       entry, match->group_id)
+			if (found->group_id == match->group_id)
+				return found;
+		break;
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		/* search for match by group_ids */
+		hash_for_each(rocker->group_tbl, bkt, found, entry) {
+			if (type != ROCKER_GROUP_TYPE_GET(found->group_id))
+				continue;
+			if (found->group_count != match->group_count)
+				continue;
+			if (memcmp(found->group_ids, match->group_ids,
+				   found->group_count * sizeof(u32)) == 0)
+				return found;
+		}
+		/* no match: create new unique group_id */
+		index = rocker->group_index_next++;
+		match->group_id &= ~ROCKER_GROUP_INDEX_MASK;
+		match->group_id |= ROCKER_GROUP_INDEX_SET(index);
+		break;
+	default:
+		break;
+	}
+
+	return NULL;
+}
+
+static void rocker_group_tbl_entry_free(struct rocker_group_tbl_entry *entry)
+{
+	switch (ROCKER_GROUP_TYPE_GET(entry->group_id)) {
+	case ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST:
+		kfree(entry->group_ids);
+		break;
+	default:
+		break;
+	}
+	kfree(entry);
+}
+
+static int rocker_group_tbl_add(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				u32 *group_id, bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	bool add_to_hw = false;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		rocker_group_tbl_entry_free(match);
+	} else {
+		found = match;
+		hash_add(rocker->group_tbl, &found->entry, found->group_id);
+		add_to_hw = true;
+	}
+
+	found->ref_count++;
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	*group_id = found->group_id;
+
+	if (add_to_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_add,
+				      found, NULL, NULL, nowait);
+		if (err) {
+			spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+			hash_del(&found->entry);
+			spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+			rocker_group_tbl_entry_free(found);
+		}
+	}
+
+	return err;
+}
+
+static int rocker_group_tbl_del(struct rocker_port *rocker_port,
+				struct rocker_group_tbl_entry *match,
+				u32 *group_id, bool nowait)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_group_tbl_entry *found;
+	unsigned long flags;
+	bool del_from_hw = false;
+	int err = 0;
+
+	spin_lock_irqsave(&rocker->group_tbl_lock, flags);
+
+	found = rocker_group_tbl_find(rocker, match);
+
+	if (found) {
+		*group_id = found->group_id;
+		found->ref_count--;
+		if (found->ref_count == 0) {
+			hash_del(&found->entry);
+			del_from_hw = true;
+		}
+	}
+
+	spin_unlock_irqrestore(&rocker->group_tbl_lock, flags);
+
+	rocker_group_tbl_entry_free(match);
+
+	if (del_from_hw) {
+		err = rocker_cmd_exec(rocker, rocker_port,
+				      rocker_cmd_group_tbl_del,
+				      found, NULL, NULL, nowait);
+		rocker_group_tbl_entry_free(found);
+	}
+
+	return err;
+}
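The add/del pair above implements a get-or-create scheme: the first reference inserts the entry into the table and programs the hardware; the last dereference removes both. A minimal userspace sketch of that refcount discipline (illustrative names only, not driver code; `in_hw` stands in for the HW add/del side effect):

```c
struct hw_entry {
	unsigned int group_id;
	int ref_count;
	int in_hw;	/* stand-in for "programmed into HW" */
};

/* Take a reference; the first taker is the one that must add to HW. */
int hw_entry_get(struct hw_entry *e)
{
	if (e->ref_count++ == 0) {
		e->in_hw = 1;
		return 1;	/* caller issues the HW add */
	}
	return 0;
}

/* Drop a reference; the last dropper is the one that must delete from HW. */
int hw_entry_put(struct hw_entry *e)
{
	if (--e->ref_count == 0) {
		e->in_hw = 0;
		return 1;	/* caller issues the HW delete */
	}
	return 0;
}
```

In the driver the refcount is manipulated under `group_tbl_lock` while the HW command is issued outside it, which is why the add path must be prepared to roll back the insert on command failure.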
+
+static int rocker_group_tbl_do(struct rocker_port *rocker_port,
+			       int flags, struct rocker_group_tbl_entry *entry,
+			       u32 *group_id)
+{
+	bool nowait = flags & ROCKER_OP_FLAG_NOWAIT;
+
+	if (flags & ROCKER_OP_FLAG_REMOVE)
+		return rocker_group_tbl_del(rocker_port, entry,
+					    group_id, nowait);
+	else
+		return rocker_group_tbl_add(rocker_port, entry,
+					    group_id, nowait);
+}
+
+static int rocker_group_l2_interface(struct rocker_port *rocker_port,
+				     int flags, u32 group_id,
+				     bool pop_vlan)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = group_id;
+	entry->l2_interface.pop_vlan = pop_vlan;
+
+	return rocker_group_tbl_do(rocker_port, flags, entry, &group_id);
+}
+
+static int rocker_group_l2_mcast(struct rocker_port *rocker_port,
+				 int flags, __be16 vlan_id,
+				 u16 group_count, u32 *group_ids,
+				 u32 *group_id)
+{
+	struct rocker_group_tbl_entry *entry;
+
+	*group_id = 0;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->group_id = ROCKER_GROUP_L2_MCAST(vlan_id, 0);
+	entry->group_count = group_count;
+	entry->group_ids = group_ids;
+
+	return rocker_group_tbl_do(rocker_port, flags, entry, group_id);
+}
+
+static int rocker_group_id_compare(const void *a, const void *b)
+{
+	return memcmp(a, b, sizeof(u32));
+}
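The memcmp()-based comparator gives a byte-lexicographic, endianness-dependent order rather than a numeric one, which is all the L2 mcast lookup needs: sorting merely canonicalizes the group_ids array so two entries with the same member set compare equal with a single memcmp() in rocker_group_tbl_find(). A userspace sketch of that dedup idea (qsort() standing in for the kernel's sort()):

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

int id_compare(const void *a, const void *b)
{
	return memcmp(a, b, sizeof(uint32_t));
}

/* Canonicalize both arrays, then compare them as flat byte strings. */
int id_sets_equal(uint32_t *a, uint32_t *b, size_t n)
{
	qsort(a, n, sizeof(uint32_t), id_compare);
	qsort(b, n, sizeof(uint32_t), id_compare);
	return memcmp(a, b, n * sizeof(uint32_t)) == 0;
}
```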
+
+static struct rocker_port *rocker_port_get_by_ifindex(struct rocker *rocker,
+						      int ifindex)
+{
+	int i;
+
+	for (i = 0; i < rocker->port_count; i++)
+		if (rocker->ports[i]->dev->ifindex == ifindex)
+			return rocker->ports[i];
+	return NULL;
+}
+
+static u32 *rocker_flow_get_group_ids(struct rocker_port *rocker_port,
+				      const struct sw_flow *flow, int flags,
+				      __be16 vlan_id, u16 *count)
+{
+	struct rocker_port *out_port;
+	u32 *group_ids = NULL;
+	u32 out_lport;
+	bool send_up = false;
+	int i;
+
+	*count = 0;
+
+	for (i = 0; i < flow->actions->count; i++) {
+		int ifindex = flow->actions->actions[i].out_port_ifindex;
+		u32 *grown;
+
+		out_port = rocker_port_get_by_ifindex(rocker_port->rocker,
+						      ifindex);
+		if (out_port) {
+			out_lport = rocker_port_to_lport(out_port);
+		} else if (!send_up) {
+			send_up = true;
+			out_lport = 0; /* send it up */
+		} else {
+			continue;
+		}
+
+		/* Grow via a temporary so the old array is not leaked
+		 * when krealloc() fails, and index by *count rather
+		 * than by i, since not every action adds a group id.
+		 */
+		grown = krealloc(group_ids, (*count + 1) * sizeof(u32),
+				 rocker_op_flags_gfp(flags));
+		if (!grown)
+			goto err_out;
+		group_ids = grown;
+		group_ids[(*count)++] = ROCKER_GROUP_L2_INTERFACE(vlan_id,
+								  out_lport);
+	}
+
+	sort(group_ids, *count, sizeof(u32), rocker_group_id_compare, NULL);
+
+	return group_ids;
+
+err_out:
+	kfree(group_ids);
+	*count = 0;
+	return NULL;
+}
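The grow-by-one pattern in rocker_flow_get_group_ids() is safest written with the temporary-pointer idiom: assigning (k)realloc()'s result straight back to the array pointer loses the only reference to the old buffer when the allocation fails. A minimal userspace sketch, with libc realloc() standing in for krealloc():

```c
#include <stdlib.h>
#include <stdint.h>

/* Append id to *ids (current length *count); 0 on success, -1 on OOM. */
int id_array_append(uint32_t **ids, size_t *count, uint32_t id)
{
	uint32_t *grown = realloc(*ids, (*count + 1) * sizeof(uint32_t));

	if (!grown)
		return -1;	/* *ids is still valid and owned by caller */
	grown[*count] = id;
	*ids = grown;
	(*count)++;
	return 0;
}
```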
+
+static int rocker_bridging_vlan_ucast(struct rocker_port *rocker_port,
+				      const struct sw_flow *flow,
+				      int flags, __be16 vlan_id, bool pop_vlan)
+{
+	struct rocker_port *out_port;
+	u32 out_lport;
+	u32 tunnel_id = 0;
+	u32 group_l2_interface;
+	int err;
+
+	/* L2 interface group for output */
+
+	if (flow->actions->count == 0) {
+		out_lport = 0; /* send it up */
+	} else if (flow->actions->count == 1) {
+		int ifindex = flow->actions->actions[0].out_port_ifindex;
+
+		out_port = rocker_port_get_by_ifindex(rocker_port->rocker,
+						      ifindex);
+		if (out_port)
+			out_lport = rocker_port_to_lport(out_port);
+		else
+			out_lport = 0; /* send it up */
+	} else {
+		netdev_err(rocker_port->dev, "Trying to install unicast bridge vlan flow with more than one output device\n");
+		return -EINVAL;
+	}
+
+	group_l2_interface = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
+	err = rocker_group_l2_interface(rocker_port, flags,
+					group_l2_interface, pop_vlan);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) L2 interface group\n",
+			   err);
+		return err;
+	}
+
+	/* VLAN unicast bridge table entry */
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags,
+				     flow->key.eth.dst, NULL,
+				     vlan_id, tunnel_id,
+				     ROCKER_OF_DPA_TABLE_ID_ACL_POLICY,
+				     group_l2_interface);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) VLAN unicast bridging table entry\n",
+			   err);
+
+	return err;
+}
+
+static int rocker_bridging_vlan_mcast(struct rocker_port *rocker_port,
+				      const struct sw_flow *flow,
+				      int flags, __be16 vlan_id, bool pop_vlan)
+{
+	u32 tunnel_id = 0;
+	u32 group_l2_mcast;
+	u16 group_count;
+	u32 *group_ids;
+	int err;
+	int i;
+
+	/* Get sorted list of output L2 interface group ids;
+	 * if there are none, there is nothing to forward in HW,
+	 * so we're done.
+	 */
+
+	group_ids = rocker_flow_get_group_ids(rocker_port, flow, flags, vlan_id,
+					      &group_count);
+	if (!group_ids)
+		return 0;
+
+	/* L2 interface groups for each out_lport */
+
+	for (i = 0; i < group_count; i++) {
+		err = rocker_group_l2_interface(rocker_port, flags,
+						group_ids[i], pop_vlan);
+		if (err) {
+			netdev_err(rocker_port->dev, "Error (%d) L2 interface group\n",
+				   err);
+			goto err_free_group_ids;
+		}
+	}
+
+	/* L2 multicast group entry */
+
+	err = rocker_group_l2_mcast(rocker_port, flags,
+				    vlan_id, group_count,
+				    group_ids, &group_l2_mcast);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) L2 mcast group\n",
+			   err);
+		goto err_free_group_ids;
+	}
+
+	/* VLAN multicast bridge table entry */
+
+	err = rocker_flow_tbl_bridge(rocker_port, flags,
+				     flow->key.eth.dst, NULL,
+				     vlan_id, tunnel_id,
+				     ROCKER_OF_DPA_TABLE_ID_ACL_POLICY,
+				     group_l2_mcast);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) VLAN mcast bridging\n",
+			   err);
+
+	return err;
+
+err_free_group_ids:
+	kfree(group_ids);
+	return err;
+}
+
+static int rocker_flow_parse(struct rocker_port *rocker_port,
+			     const struct sw_flow *flow,
+			     int flags)
+{
+	struct rocker_port *in_port;
+	u32 in_lport;
+	u32 in_lport_mask;
+	__be16 vlan_id;
+	__be16 vlan_id_mask;
+	__be16 new_vlan_id;
+	__be16 outer_vlan_id;
+	u16 bridge_id;
+	u32 tunnel_id;
+	bool untagged;
+	bool unicast;
+	bool eth_dst_exact;
+	int err;
+
+	enum {
+		BRIDGING_MODE_UNKNOWN,
+		BRIDGING_MODE_VLAN_UCAST,
+		BRIDGING_MODE_VLAN_MCAST,
+		BRIDGING_MODE_VLAN_DFLT,
+		BRIDGING_MODE_TUNNEL_UCAST,
+		BRIDGING_MODE_TUNNEL_MCAST,
+		BRIDGING_MODE_TUNNEL_DFLT,
+	} bridging_mode = BRIDGING_MODE_UNKNOWN;
+
+	tunnel_id = 0; /* XXX for now */
+
+	/* A note about value masks: sw_flow uses mask bit value of
+	 * 0 for "don't care", whereas OF-DPA HW uses mask bit value
+	 * of 1 for "don't care", so sw_flow mask value must be
+	 * inverted before being passed to OF-DPA HW.  To summarize:
+	 *
+	 *      mask bit   sw_flow         OF-DPA
+	 *      -------------------------------------
+	 *      0          don't care      care
+	 *      1          care            don't care
+	 */
+
+	/* Get lport for in_port.  Skip sw_flows if in_port is not a
+	 * rocker port in our network namespace.
+	 */
+
+	in_port = rocker_port_get_by_ifindex(rocker_port->rocker,
+					     flow->key.misc.in_port_ifindex);
+	if (!in_port)
+		return 0;
+
+	in_lport = rocker_port_to_lport(in_port);
+	in_lport_mask = 0;
+
+	/* Determine outer VLAN ID.  If untagged, use bridge VLAN ID,
+	 * otherwise use tagged VLAN ID for outer VLAN ID.
+	 */
+
+	if (flow->key.eth.tci == htons(0) &&
+	    flow->mask->key.eth.tci == htons(0xffff)) {
+		vlan_id = flow->key.eth.tci;
+		vlan_id_mask = htons(0x0fff);
+		untagged = true;
+	} else {
+		/* XXX For now, fail any vlan except untagged vlan 0 */
+		netdev_warn(rocker_port->dev,
+			    "Can't parse vlan info, vlan 0x%04x mask 0x%04x\n",
+			    ntohs(flow->key.eth.tci),
+			    ntohs(flow->mask->key.eth.tci));
+		return 0;
+	}
+
+	bridge_id = 0; /* XXX for now, need unique ID for each bridge */
+	new_vlan_id = htons(bridge_id << 8 | in_lport);
+	outer_vlan_id = untagged ? new_vlan_id : vlan_id;
+
+	/* Ingress port table entry */
+
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      in_lport, in_lport_mask,
+				      ROCKER_OF_DPA_TABLE_ID_VLAN);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) ingress port table entry\n",
+			   err);
+		return err;
+	}
+
+	/* VLAN table entry */
+
+	err = rocker_flow_tbl_vlan(rocker_port, flags,
+				   in_lport, vlan_id, vlan_id_mask,
+				   ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC,
+				   untagged, new_vlan_id);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) VLAN table entry\n",
+			   err);
+		return err;
+	}
+
+	/* XXX Determine if sw_flow wants L2 bridging or L3 routing.
+	 * XXX If wanting L3 routing, need to add termination mac
+	 * XXX table entry to catch L3 routing prefixes.
+	 * XXX For now, just doing L2 bridging, so skip term mac tbl
+	 * XXX (miss on term mac tbl goes to bridge tbl).
+	 */
+
+	unicast = (flow->key.eth.dst[0] & 0x01) == 0x00;
+	eth_dst_exact = memcmp(flow->mask->key.eth.dst, ff_mac, ETH_ALEN) == 0;
+
+	if (outer_vlan_id && unicast && eth_dst_exact)
+		bridging_mode = BRIDGING_MODE_VLAN_UCAST;
+	else if (outer_vlan_id && !unicast && eth_dst_exact)
+		bridging_mode = BRIDGING_MODE_VLAN_MCAST;
+
+	switch (bridging_mode) {
+	case BRIDGING_MODE_VLAN_UCAST:
+		err = rocker_bridging_vlan_ucast(rocker_port, flow, flags,
+						 outer_vlan_id, untagged);
+		break;
+	case BRIDGING_MODE_VLAN_MCAST:
+		err = rocker_bridging_vlan_mcast(rocker_port, flow, flags,
+						 outer_vlan_id, untagged);
+		break;
+	default:
+		netdev_err(rocker_port->dev, "Unknown bridging mode\n");
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) bridging table entry\n",
+			   err);
+		return err;
+	}
+
+	/* ACL table entry */
+
+	err = rocker_flow_tbl_acl(rocker_port, flags,
+				  ROCKER_PRIORITY_ACL,
+				  in_lport, in_lport_mask,
+				  flow->key.eth.src, zero_mac,
+				  flow->key.eth.dst, zero_mac,
+				  flow->key.eth.type,
+				  outer_vlan_id, vlan_id_mask,
+				  flow->key.ip.proto, ~flow->mask->key.ip.proto,
+				  flow->key.ip.tos, ~flow->mask->key.ip.tos,
+				  ROCKER_GROUP_NONE);
+
+	if (err)
+		netdev_err(rocker_port->dev, "Error (%d) ACL table entry\n",
+			   err);
+
+	return err;
+}
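The mask-polarity note in rocker_flow_parse() above is why the ACL entry passes `~flow->mask->key.ip.proto` and `~flow->mask->key.ip.tos` to the hardware: sw_flow masks use 1 for "care" while OF-DPA masks use 1 for "don't care", so the conversion is a plain bitwise NOT. A tiny sketch of that conversion (hypothetical helper name):

```c
#include <stdint.h>

/* sw_flow: mask bit 1 = care; OF-DPA: mask bit 1 = don't care. */
uint8_t sw_flow_mask_to_ofdpa(uint8_t sw_mask)
{
	return (uint8_t)~sw_mask;	/* flip the care/don't-care polarity */
}
```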
+
+static int rocker_flow_add(struct rocker_port *rocker_port,
+			   const struct sw_flow *flow)
+{
+	return rocker_flow_parse(rocker_port, flow, 0);
+}
+
+static int rocker_flow_del(struct rocker_port *rocker_port,
+			   const struct sw_flow *flow)
+{
+	return rocker_flow_parse(rocker_port, flow, ROCKER_OP_FLAG_REMOVE);
+}
+
+/*****************
+ * Net device ops
+ *****************/
+
+static int rocker_port_open(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	err = rocker_port_dma_rings_init(rocker_port);
+	if (err)
+		return err;
+
+	err = request_irq(rocker_msix_tx_vector(rocker_port),
+			  rocker_tx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign tx irq\n");
+		goto err_request_tx_irq;
+	}
+
+	err = request_irq(rocker_msix_rx_vector(rocker_port),
+			  rocker_rx_irq_handler, 0,
+			  rocker_driver_name, rocker_port);
+	if (err) {
+		netdev_err(rocker_port->dev, "cannot assign rx irq\n");
+		goto err_request_rx_irq;
+	}
+
+	napi_enable(&rocker_port->napi_tx);
+	napi_enable(&rocker_port->napi_rx);
+	rocker_port_set_enable(rocker_port, true);
+	netif_start_queue(dev);
+	return 0;
+
+err_request_rx_irq:
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+err_request_tx_irq:
+	rocker_port_dma_rings_fini(rocker_port);
+	return err;
+}
+
+static int rocker_port_stop(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	netif_stop_queue(dev);
+	rocker_port_set_enable(rocker_port, false);
+	napi_disable(&rocker_port->napi_rx);
+	napi_disable(&rocker_port->napi_tx);
+	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
+	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
+	rocker_port_dma_rings_fini(rocker_port);
+
+	return 0;
+}
+
+static void rocker_tx_desc_frags_unmap(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_tlv *attrs[ROCKER_TLV_TX_MAX + 1];
+	struct rocker_tlv *attr;
+	int rem;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_TX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_TX_FRAGS])
+		return;
+	rocker_tlv_for_each_nested(attr, attrs[ROCKER_TLV_TX_FRAGS], rem) {
+		struct rocker_tlv *frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_MAX + 1];
+		dma_addr_t dma_handle;
+		size_t len;
+
+		if (rocker_tlv_type(attr) != ROCKER_TLV_TX_FRAG)
+			continue;
+		rocker_tlv_parse_nested(frag_attrs, ROCKER_TLV_TX_FRAG_ATTR_MAX,
+					attr);
+		if (!frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR] ||
+		    !frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN])
+			continue;
+		dma_handle = rocker_tlv_get_u64(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_ADDR]);
+		len = rocker_tlv_get_u16(frag_attrs[ROCKER_TLV_TX_FRAG_ATTR_LEN]);
+		pci_unmap_single(pdev, dma_handle, len, DMA_TO_DEVICE);
+	}
+}
+
+static int rocker_tx_desc_frag_map_put(struct rocker_port *rocker_port,
+				       struct rocker_desc_info *desc_info,
+				       char *buf, size_t buf_len)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	struct pci_dev *pdev = rocker->pdev;
+	dma_addr_t dma_handle;
+	struct rocker_tlv *frag;
+
+	dma_handle = pci_map_single(pdev, buf, buf_len, DMA_TO_DEVICE);
+	if (unlikely(pci_dma_mapping_error(pdev, dma_handle))) {
+		if (net_ratelimit())
+			netdev_err(rocker_port->dev, "failed to dma map tx frag\n");
+		return -EIO;
+	}
+	frag = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAG);
+	if (!frag)
+		goto unmap_frag;
+	if (rocker_tlv_put_u64(desc_info, ROCKER_TLV_TX_FRAG_ATTR_ADDR,
+			       dma_handle))
+		goto nest_cancel;
+	if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_TX_FRAG_ATTR_LEN,
+			       buf_len))
+		goto nest_cancel;
+	rocker_tlv_nest_end(desc_info, frag);
+	return 0;
+
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frag);
+unmap_frag:
+	pci_unmap_single(pdev, dma_handle, buf_len, DMA_TO_DEVICE);
+	return -EMSGSIZE;
+}
+
+static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	struct rocker_tlv *frags;
+	int i;
+	int err;
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (unlikely(!desc_info)) {
+		if (net_ratelimit())
+			netdev_err(dev, "tx ring full when queue awake\n");
+		return NETDEV_TX_BUSY;
+	}
+
+	rocker_desc_cookie_ptr_set(desc_info, skb);
+
+	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
+	if (!frags)
+		goto out;
+	/* Bail out before mapping the head fragment so nothing is
+	 * left DMA-mapped when the frag count exceeds the descriptor.
+	 */
+	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
+		goto nest_cancel;
+	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+					  skb->data, skb_headlen(skb));
+	if (err)
+		goto nest_cancel;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
+						  skb_frag_address(frag),
+						  skb_frag_size(frag));
+		if (err)
+			goto unmap_frags;
+	}
+	rocker_tlv_nest_end(desc_info, frags);
+
+	rocker_desc_gen_clear(desc_info);
+	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
+
+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
+	if (!desc_info)
+		netif_stop_queue(dev);
+
+	return NETDEV_TX_OK;
+
+unmap_frags:
+	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+nest_cancel:
+	rocker_tlv_nest_cancel(desc_info, frags);
+out:
+	dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static struct rocker_promisc_acl {
+	__be16 eth_type;
+	const u8 *eth_src;
+	const u8 *eth_src_mask;
+	const u8 *eth_dst;
+	const u8 *eth_dst_mask;
+	u8 ip_proto;
+	u8 ip_proto_mask;
+	u8 ip_tos;
+	u8 ip_tos_mask;
+} rocker_promisc_acls[] = {
+	{
+		/* allow any ARP pkts */
+		.eth_type = htons(ETH_P_ARP),
+		.eth_src = zero_mac,
+		.eth_src_mask = ff_mac,
+		.eth_dst = zero_mac,
+		.eth_dst_mask = ff_mac,
+	},
+	{
+		/* allow any IP pkts */
+		.eth_type = htons(ETH_P_IP),
+		.eth_src = zero_mac,
+		.eth_src_mask = ff_mac,
+		.eth_dst = zero_mac,
+		.eth_dst_mask = ff_mac,
+		.ip_proto = 0,
+		.ip_proto_mask = 0xff,
+		.ip_tos = 0,
+		.ip_tos_mask = 0xff,
+	},
+	{
+		/* allow LLDP pkts */
+		.eth_type = htons(0x88cc),
+		.eth_src = zero_mac,
+		.eth_src_mask = ff_mac,
+		.eth_dst = lldp_mac,
+		.eth_dst_mask = zero_mac,
+	},
+	{
+		/* allow any IPv6 pkts */
+		.eth_type = htons(ETH_P_IPV6),
+		.eth_src = zero_mac,
+		.eth_src_mask = ff_mac,
+		.eth_dst = zero_mac,
+		.eth_dst_mask = ff_mac,
+		.ip_proto = 0,
+		.ip_proto_mask = 0xff,
+		.ip_tos = 0,
+		.ip_tos_mask = 0xff,
+	},
+	{
+		/* mark end of list */
+		.eth_type = 0,
+	},
+};
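rocker_promisc_acls[] is terminated by a sentinel entry whose `eth_type` is 0, so the walk in rocker_port_set_promisc() needs no separate length constant. A userspace sketch of the same sentinel-terminated table walk (struct and names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

struct acl { uint16_t eth_type; };

/* Count entries up to (not including) the eth_type == 0 sentinel. */
size_t acl_table_len(const struct acl *table)
{
	const struct acl *a;
	size_t n = 0;

	for (a = table; a->eth_type; a++)
		n++;
	return n;
}
```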
+
+static int rocker_port_set_promisc(struct rocker_port *rocker_port,
+				   int flags)
+{
+	u32 in_lport = rocker_port_to_lport(rocker_port);
+	u32 in_lport_mask = 0;
+	u32 out_lport;
+	u16 bridge_id;
+	__be16 vlan_id;
+	__be16 vlan_id_mask;
+	__be16 new_vlan_id;
+	struct rocker_promisc_acl *acl;
+	u32 group_l2_interface;
+	bool untagged;
+	bool pop_vlan;
+	int err;
+
+	/* ingress port table entry */
+
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      in_lport, in_lport_mask,
+				      ROCKER_OF_DPA_TABLE_ID_VLAN);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) ingress port table entry\n",
+			   err);
+		return err;
+	}
+
+	/* VLAN table entry for untagged traffic */
+
+	vlan_id = 0;
+	vlan_id_mask = htons(0x0fff);
+	untagged = true;
+	bridge_id = 0; /* XXX for now, need a unique ID for each bridge */
+	new_vlan_id = htons(bridge_id << 8 | in_lport);
+
+	err = rocker_flow_tbl_vlan(rocker_port, flags,
+				   in_lport, vlan_id, vlan_id_mask,
+				   ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC,
+				   untagged, new_vlan_id);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) VLAN table entry\n",
+			   err);
+		return err;
+	}
+
+	/* L2 interface group entry for bridge (port 0) */
+
+	out_lport = 0;
+	pop_vlan = untagged;
+
+	group_l2_interface = ROCKER_GROUP_L2_INTERFACE(new_vlan_id, out_lport);
+	err = rocker_group_l2_interface(rocker_port, flags, group_l2_interface,
+					pop_vlan);
+	if (err) {
+		netdev_err(rocker_port->dev, "Error (%d) L2 interface group\n",
+			   err);
+		return err;
+	}
+
+	/* ACL table entries for acceptable pkts */
+
+	for (acl = rocker_promisc_acls; acl->eth_type; acl++) {
+		err = rocker_flow_tbl_acl(rocker_port, flags,
+					  ROCKER_PRIORITY_ACL_PORT_PROMISC,
+					  in_lport, in_lport_mask,
+					  acl->eth_src, acl->eth_src_mask,
+					  acl->eth_dst, acl->eth_dst_mask,
+					  acl->eth_type,
+					  new_vlan_id, vlan_id_mask,
+					  acl->ip_proto, acl->ip_proto_mask,
+					  acl->ip_tos, acl->ip_tos_mask,
+					  group_l2_interface);
+		if (err) {
+			netdev_err(rocker_port->dev, "Error (%d) ACL table entry\n",
+				   err);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static void rocker_port_set_rx_mode(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int prev_promisc = (rocker_port->prev_flags & IFF_PROMISC) ? 1 : 0;
+	int promisc = (dev->flags & IFF_PROMISC) ? 1 : 0;
+	int op_flags = ROCKER_OP_FLAG_NOWAIT;
+
+	if (!promisc)
+		op_flags |= ROCKER_OP_FLAG_REMOVE;
+
+	if (promisc != prev_promisc)
+		rocker_port_set_promisc(rocker_port, op_flags);
+
+	rocker_port->prev_flags = dev->flags;
+}
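rocker_port_set_rx_mode() only touches the HW flows when IFF_PROMISC actually transitions, setting the REMOVE op flag when promiscuous mode is being left. A sketch of that edge-detection logic (the flag value here is a stand-in for IFF_PROMISC, not the kernel constant):

```c
#include <stdbool.h>

#define PROMISC_BIT 0x100	/* stand-in for IFF_PROMISC */

/* Returns +1 when entering promisc mode, -1 when leaving, 0 if unchanged. */
int promisc_transition(unsigned int prev_flags, unsigned int flags)
{
	bool was = prev_flags & PROMISC_BIT;
	bool now = flags & PROMISC_BIT;

	if (was == now)
		return 0;
	return now ? 1 : -1;
}
```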
+
+static int rocker_port_set_mac_address(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int err;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
+	if (err)
+		return err;
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	return 0;
+}
+
+static int rocker_port_swdev_get_id(struct net_device *dev,
+				    struct netdev_phys_item_id *psid)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker *rocker = rocker_port->rocker;
+
+	psid->id_len = sizeof(rocker->hw.id);
+	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
+	return 0;
+}
+
+static int rocker_port_swdev_flow_insert(struct net_device *dev,
+					 const struct sw_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_flow_add(rocker_port, flow);
+}
+
+static int rocker_port_swdev_flow_remove(struct net_device *dev,
+					 const struct sw_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_flow_del(rocker_port, flow);
+}
+
+static const struct net_device_ops rocker_port_netdev_ops = {
+	.ndo_open		= rocker_port_open,
+	.ndo_stop		= rocker_port_stop,
+	.ndo_start_xmit		= rocker_port_xmit,
+	.ndo_set_rx_mode	= rocker_port_set_rx_mode,
+	.ndo_set_mac_address	= rocker_port_set_mac_address,
+	.ndo_swdev_get_id	= rocker_port_swdev_get_id,
+	.ndo_swdev_flow_insert	= rocker_port_swdev_flow_insert,
+	.ndo_swdev_flow_remove	= rocker_port_swdev_flow_remove,
+};
+
+static bool rocker_port_dev_check(struct net_device *dev)
+{
+	return dev->netdev_ops == &rocker_port_netdev_ops;
+}
+
+/********************
+ * ethtool interface
+ ********************/
+
+static int rocker_port_get_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static int rocker_port_set_settings(struct net_device *dev,
+				    struct ethtool_cmd *ecmd)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+
+	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
+}
+
+static void rocker_port_get_drvinfo(struct net_device *dev,
+				    struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
+}
+
+static const struct ethtool_ops rocker_port_ethtool_ops = {
+	.get_settings		= rocker_port_get_settings,
+	.set_settings		= rocker_port_set_settings,
+	.get_drvinfo		= rocker_port_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
+
+/*****************
+ * NAPI interface
+ *****************/
+
+static struct rocker_port *rocker_port_napi_tx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_tx);
+}
+
+static int rocker_port_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_tx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Cleanup tx descriptors */
+	while ((desc_info = rocker_desc_tail_get(&rocker_port->tx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err && net_ratelimit())
+			netdev_err(rocker_port->dev, "tx desc received with err %d\n",
+				   err);
+		rocker_tx_desc_frags_unmap(rocker_port, desc_info);
+		dev_kfree_skb_any(rocker_desc_cookie_ptr_get(desc_info));
+		credits++;
+	}
+
+	if (credits && netif_queue_stopped(rocker_port->dev))
+		netif_wake_queue(rocker_port->dev);
+
+	napi_complete(napi);
+	rocker_dma_ring_credits_set(rocker, &rocker_port->tx_ring, credits);
+
+	return 0;
+}
+
+static int rocker_port_rx_proc(struct rocker *rocker,
+			       struct rocker_port *rocker_port,
+			       struct rocker_desc_info *desc_info)
+{
+	struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1];
+	struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info);
+	size_t rx_len;
+
+	if (!skb)
+		return -ENOENT;
+
+	rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info);
+	if (!attrs[ROCKER_TLV_RX_FRAG_LEN])
+		return -EINVAL;
+
+	rocker_dma_rx_ring_skb_unmap(rocker, attrs);
+
+	rx_len = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FRAG_LEN]);
+	skb_put(skb, rx_len);
+	skb->protocol = eth_type_trans(skb, rocker_port->dev);
+	netif_receive_skb(skb);
+
+	return rocker_dma_rx_ring_skb_alloc(rocker, rocker_port, desc_info);
+}
+
+static struct rocker_port *rocker_port_napi_rx_get(struct napi_struct *napi)
+{
+	return container_of(napi, struct rocker_port, napi_rx);
+}
+
+static int rocker_port_poll_rx(struct napi_struct *napi, int budget)
+{
+	struct rocker_port *rocker_port = rocker_port_napi_rx_get(napi);
+	struct rocker *rocker = rocker_port->rocker;
+	struct rocker_desc_info *desc_info;
+	u32 credits = 0;
+	int err;
+
+	/* Process rx descriptors */
+	while (credits < budget &&
+	       (desc_info = rocker_desc_tail_get(&rocker_port->rx_ring))) {
+		err = rocker_desc_err(desc_info);
+		if (err) {
+			if (net_ratelimit())
+				netdev_err(rocker_port->dev, "rx desc received with err %d\n",
+					   err);
+		} else {
+			err = rocker_port_rx_proc(rocker, rocker_port,
+						  desc_info);
+			if (err && net_ratelimit())
+				netdev_err(rocker_port->dev, "rx processing failed with err %d\n",
+					   err);
+		}
+		rocker_desc_gen_clear(desc_info);
+		rocker_desc_head_set(rocker, &rocker_port->rx_ring, desc_info);
+		credits++;
+	}
+
+	if (credits < budget)
+		napi_complete(napi);
+
+	rocker_dma_ring_credits_set(rocker, &rocker_port->rx_ring, credits);
+
+	return credits;
+}
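The rx poll follows the standard NAPI budget contract: process at most `budget` descriptors, and call napi_complete() only when the ring ran dry before the budget did (otherwise NAPI reschedules the poll). A condensed sketch of that accounting (illustrative only; `avail` models how many descriptors the ring holds):

```c
/* Returns the credits consumed; *completed is 1 iff NAPI may complete. */
int poll_budget_sketch(int avail, int budget, int *completed)
{
	int credits = avail < budget ? avail : budget;

	*completed = credits < budget;	/* ring emptied before budget ran out */
	return credits;
}
```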
+
+/*****************
+ * PCI driver ops
+ *****************/
+
+static void rocker_carrier_init(struct rocker_port *rocker_port)
+{
+	struct rocker *rocker = rocker_port->rocker;
+	u64 link_status = rocker_read64(rocker, PORT_PHYS_LINK_STATUS);
+	bool link_up;
+
+	link_up = link_status & (1 << rocker_port_to_lport(rocker_port));
+	if (link_up)
+		netif_carrier_on(rocker_port->dev);
+	else
+		netif_carrier_off(rocker_port->dev);
+}
+
+static void rocker_remove_ports(struct rocker *rocker)
+{
+	int i;
+
+	for (i = 0; i < rocker->port_count; i++)
+		unregister_netdev(rocker->ports[i]->dev);
+	kfree(rocker->ports);
+}
+
+static void rocker_port_dev_addr_init(struct rocker *rocker,
+				      struct rocker_port *rocker_port)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int err;
+
+	err = rocker_cmd_get_port_settings_macaddr(rocker_port,
+						   rocker_port->dev->dev_addr);
+	if (err) {
+		dev_warn(&pdev->dev, "failed to get mac address, using random\n");
+		eth_hw_addr_random(rocker_port->dev);
+	}
+}
+
+static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	struct rocker_port *rocker_port;
+	struct net_device *dev;
+	int err;
+
+	dev = alloc_etherdev(sizeof(struct rocker_port));
+	if (!dev)
+		return -ENOMEM;
+	rocker_port = netdev_priv(dev);
+	rocker_port->dev = dev;
+	rocker_port->rocker = rocker;
+	rocker_port->port_number = port_number;
+
+	rocker_port_dev_addr_init(rocker, rocker_port);
+	dev->netdev_ops = &rocker_port_netdev_ops;
+	dev->ethtool_ops = &rocker_port_ethtool_ops;
+	netif_napi_add(dev, &rocker_port->napi_tx, rocker_port_poll_tx,
+		       NAPI_POLL_WEIGHT);
+	netif_napi_add(dev, &rocker_port->napi_rx, rocker_port_poll_rx,
+		       NAPI_POLL_WEIGHT);
+	rocker_carrier_init(rocker_port);
+
+	err = register_netdev(dev);
+	if (err) {
+		dev_err(&pdev->dev, "register_netdev failed\n");
+		goto free_netdev;
+	}
+	rocker->ports[port_number] = rocker_port;
+	return 0;
+
+free_netdev:
+	free_netdev(dev);
+	return err;
+}
+
+static int rocker_probe_ports(struct rocker *rocker)
+{
+	int i;
+	size_t alloc_size;
+	int err;
+
+	alloc_size = sizeof(struct rocker_port *) * rocker->port_count;
+	rocker->ports = kzalloc(alloc_size, GFP_KERNEL);
+	if (!rocker->ports)
+		return -ENOMEM;
+	for (i = 0; i < rocker->port_count; i++) {
+		err = rocker_probe_port(rocker, i);
+		if (err)
+			goto remove_ports;
+	}
+	return 0;
+
+remove_ports:
+	rocker_remove_ports(rocker);
+	return err;
+}
+
+static int rocker_msix_init(struct rocker *rocker)
+{
+	struct pci_dev *pdev = rocker->pdev;
+	int msix_entries;
+	int i;
+	int err;
+
+	msix_entries = pci_msix_vec_count(pdev);
+	if (msix_entries < 0)
+		return msix_entries;
+
+	if (msix_entries != ROCKER_MSIX_VEC_COUNT(rocker->port_count))
+		return -EINVAL;
+
+	rocker->msix_entries = kmalloc_array(msix_entries,
+					     sizeof(struct msix_entry),
+					     GFP_KERNEL);
+	if (!rocker->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < msix_entries; i++)
+		rocker->msix_entries[i].entry = i;
+
+	err = pci_enable_msix_exact(pdev, rocker->msix_entries, msix_entries);
+	if (err < 0)
+		goto err_enable_msix;
+
+	return 0;
+
+err_enable_msix:
+	kfree(rocker->msix_entries);
+	return err;
+}
+
+static void rocker_msix_fini(struct rocker *rocker)
+{
+	pci_disable_msix(rocker->pdev);
+	kfree(rocker->msix_entries);
+}
+
+static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct rocker *rocker;
+	int err;
+
+	rocker = kzalloc(sizeof(*rocker), GFP_KERNEL);
+	if (!rocker)
+		return -ENOMEM;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "pci_enable_device failed\n");
+		goto err_pci_enable_device;
+	}
+
+	err = pci_request_regions(pdev, rocker_driver_name);
+	if (err) {
+		dev_err(&pdev->dev, "pci_request_regions failed\n");
+		goto err_pci_request_regions;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (!err) {
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_consistent_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	} else {
+		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (err) {
+			dev_err(&pdev->dev, "pci_set_dma_mask failed\n");
+			goto err_pci_set_dma_mask;
+		}
+	}
+
+	if (pci_resource_len(pdev, 0) < ROCKER_PCI_BAR0_SIZE) {
+		dev_err(&pdev->dev, "invalid PCI region size\n");
+		err = -EINVAL;
+		goto err_pci_resource_len_check;
+	}
+
+	rocker->hw_addr = ioremap(pci_resource_start(pdev, 0),
+				  pci_resource_len(pdev, 0));
+	if (!rocker->hw_addr) {
+		dev_err(&pdev->dev, "ioremap failed\n");
+		err = -EIO;
+		goto err_ioremap;
+	}
+	pci_set_master(pdev);
+
+	rocker->pdev = pdev;
+	pci_set_drvdata(pdev, rocker);
+
+	rocker->port_count = rocker_read32(rocker, PORT_PHYS_COUNT);
+
+	err = rocker_msix_init(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "MSI-X init failed\n");
+		goto err_msix_init;
+	}
+
+	err = rocker_basic_hw_test(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "basic hw test failed\n");
+		goto err_basic_hw_test;
+	}
+
+	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
+
+	err = rocker_dma_rings_init(rocker);
+	if (err)
+		goto err_dma_rings_init;
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD),
+			  rocker_cmd_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign cmd irq\n");
+		goto err_request_cmd_irq;
+	}
+
+	err = request_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT),
+			  rocker_event_irq_handler, 0,
+			  rocker_driver_name, rocker);
+	if (err) {
+		dev_err(&pdev->dev, "cannot assign event irq\n");
+		goto err_request_event_irq;
+	}
+
+	rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
+
+	err = rocker_probe_ports(rocker);
+	if (err) {
+		dev_err(&pdev->dev, "failed to probe ports\n");
+		goto err_probe_ports;
+	}
+
+	hash_init(rocker->flow_tbl);
+	spin_lock_init(&rocker->flow_tbl_lock);
+
+	hash_init(rocker->group_tbl);
+	spin_lock_init(&rocker->group_tbl_lock);
+
+	dev_info(&pdev->dev, "Rocker switch with id %016llx\n", rocker->hw.id);
+
+	return 0;
+
+err_probe_ports:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+err_request_event_irq:
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+err_request_cmd_irq:
+	rocker_dma_rings_fini(rocker);
+err_dma_rings_init:
+err_basic_hw_test:
+	rocker_msix_fini(rocker);
+err_msix_init:
+	iounmap(rocker->hw_addr);
+err_ioremap:
+err_pci_resource_len_check:
+err_pci_set_dma_mask:
+	pci_release_regions(pdev);
+err_pci_request_regions:
+	pci_disable_device(pdev);
+err_pci_enable_device:
+	kfree(rocker);
+	return err;
+}
+
+static void rocker_remove(struct pci_dev *pdev)
+{
+	struct rocker *rocker = pci_get_drvdata(pdev);
+
+	rocker_remove_ports(rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
+	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_CMD), rocker);
+	rocker_dma_rings_fini(rocker);
+	rocker_msix_fini(rocker);
+	iounmap(rocker->hw_addr);
+	pci_release_regions(rocker->pdev);
+	pci_disable_device(rocker->pdev);
+	kfree(rocker);
+}
+
+static struct pci_driver rocker_pci_driver = {
+	.name		= rocker_driver_name,
+	.id_table	= rocker_pci_id_table,
+	.probe		= rocker_probe,
+	.remove		= rocker_remove,
+};
+
+/************************************
+ * Net device notifier event handler
+ ************************************/
+
+static int rocker_port_master_changed(struct net_device *dev)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	enum rocker_port_mode newmode = ROCKER_PORT_MODE_L2L3;
+	enum rocker_port_mode oldmode;
+	struct net_device *master = netdev_master_upper_dev_get(dev);
+	int err;
+
+	if (master && master->rtnl_link_ops &&
+	    !strcmp(master->rtnl_link_ops->kind, "openvswitch"))
+		newmode = ROCKER_PORT_MODE_OF_DPA;
+	err = rocker_cmd_get_port_settings_mode(rocker_port, &oldmode);
+	if (err)
+		return err;
+	if (newmode == oldmode)
+		return 0;
+	err = rocker_cmd_set_port_settings_mode(rocker_port, newmode);
+	if (err)
+		return err;
+	netdev_info(dev, "port mode changed from %d to %d\n", oldmode, newmode);
+	return err;
+}
+
+static int rocker_device_event(struct notifier_block *unused,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	int err;
+
+	if (!rocker_port_dev_check(dev))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		err = rocker_port_master_changed(dev);
+		if (err)
+			netdev_warn(dev, "failed to reflect master change (err %d)\n",
+				    err);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rocker_notifier_block __read_mostly = {
+	.notifier_call = rocker_device_event,
+};
+
+/***********************
+ * Module init and exit
+ ***********************/
+
+static int __init rocker_module_init(void)
+{
+	int err;
+
+	err = register_netdevice_notifier(&rocker_notifier_block);
+	if (err)
+		return err;
+
+	err = pci_register_driver(&rocker_pci_driver);
+	if (err)
+		goto err_pci_register_driver;
+	return 0;
+
+err_pci_register_driver:
+	unregister_netdevice_notifier(&rocker_notifier_block);
+	return err;
+}
+
+static void __exit rocker_module_exit(void)
+{
+	unregister_netdevice_notifier(&rocker_notifier_block);
+	pci_unregister_driver(&rocker_pci_driver);
+}
+
+module_init(rocker_module_init);
+module_exit(rocker_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>");
+MODULE_AUTHOR("Scott Feldman <sfeldma-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR@public.gmane.org>");
+MODULE_DESCRIPTION("Rocker switch device driver");
+MODULE_DEVICE_TABLE(pci, rocker_pci_id_table);
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
new file mode 100644
index 0000000..fc08592
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -0,0 +1,465 @@
+/*
+ * drivers/net/ethernet/rocker/rocker.h - Rocker switch device driver
+ * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
+ * Copyright (c) 2014 Scott Feldman <sfeldma-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _ROCKER_H
+#define _ROCKER_H
+
+#include <linux/types.h>
+
+#define PCI_VENDOR_ID_REDHAT		0x1b36
+#define PCI_DEVICE_ID_REDHAT_ROCKER	0x0006
+
+#define ROCKER_PCI_BAR0_SIZE		0x2000
+
+/* MSI-X vectors */
+enum {
+	ROCKER_MSIX_VEC_CMD,
+	ROCKER_MSIX_VEC_EVENT,
+	ROCKER_MSIX_VEC_TEST,
+	ROCKER_MSIX_VEC_RESERVED0,
+	__ROCKER_MSIX_VEC_TX,
+	__ROCKER_MSIX_VEC_RX,
+#define ROCKER_MSIX_VEC_TX(port) \
+	(__ROCKER_MSIX_VEC_TX + ((port) * 2))
+#define ROCKER_MSIX_VEC_RX(port) \
+	(__ROCKER_MSIX_VEC_RX + ((port) * 2))
+#define ROCKER_MSIX_VEC_COUNT(portcnt) \
+	(ROCKER_MSIX_VEC_RX((portcnt - 1)) + 1)
+};
+
+/* Rocker bogus registers */
+#define ROCKER_BOGUS_REG0		0x0000
+#define ROCKER_BOGUS_REG1		0x0004
+#define ROCKER_BOGUS_REG2		0x0008
+#define ROCKER_BOGUS_REG3		0x000c
+
+/* Rocker test registers */
+#define ROCKER_TEST_REG			0x0010
+#define ROCKER_TEST_REG64		0x0018  /* 8-byte */
+#define ROCKER_TEST_IRQ			0x0020
+#define ROCKER_TEST_DMA_ADDR		0x0028  /* 8-byte */
+#define ROCKER_TEST_DMA_SIZE		0x0030
+#define ROCKER_TEST_DMA_CTRL		0x0034
+
+/* Rocker test register ctrl */
+#define ROCKER_TEST_DMA_CTRL_CLEAR	(1 << 0)
+#define ROCKER_TEST_DMA_CTRL_FILL	(1 << 1)
+#define ROCKER_TEST_DMA_CTRL_INVERT	(1 << 2)
+
+/* Rocker DMA ring register offsets */
+#define ROCKER_DMA_DESC_ADDR(x)		(0x1000 + (x) * 32)  /* 8-byte */
+#define ROCKER_DMA_DESC_SIZE(x)		(0x1008 + (x) * 32)
+#define ROCKER_DMA_DESC_HEAD(x)		(0x100c + (x) * 32)
+#define ROCKER_DMA_DESC_TAIL(x)		(0x1010 + (x) * 32)
+#define ROCKER_DMA_DESC_CTRL(x)		(0x1014 + (x) * 32)
+#define ROCKER_DMA_DESC_CREDITS(x)	(0x1018 + (x) * 32)
+#define ROCKER_DMA_DESC_RES1(x)		(0x101c + (x) * 32)
+
+/* Rocker dma ctrl register bits */
+#define ROCKER_DMA_DESC_CTRL_RESET	(1 << 0)
+
+/* Rocker DMA ring types */
+enum rocker_dma_type {
+	ROCKER_DMA_CMD,
+	ROCKER_DMA_EVENT,
+	__ROCKER_DMA_TX,
+	__ROCKER_DMA_RX,
+#define ROCKER_DMA_TX(port) (__ROCKER_DMA_TX + (port) * 2)
+#define ROCKER_DMA_RX(port) (__ROCKER_DMA_RX + (port) * 2)
+};
+
+/* Rocker DMA ring size limits and default sizes */
+#define ROCKER_DMA_SIZE_MIN		2ul
+#define ROCKER_DMA_SIZE_MAX		65536ul
+#define ROCKER_DMA_CMD_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_EVENT_DEFAULT_SIZE	32ul
+#define ROCKER_DMA_TX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_TX_DESC_SIZE		256
+#define ROCKER_DMA_RX_DEFAULT_SIZE	64ul
+#define ROCKER_DMA_RX_DESC_SIZE		256
+
+/* Rocker DMA descriptor struct */
+struct rocker_desc {
+	u64 buf_addr;
+	u64 cookie;
+	u16 buf_size;
+	u16 tlv_size;
+	u16 resv[5];
+	u16 comp_err;
+} __packed __aligned(8);
+
+#define ROCKER_DMA_DESC_COMP_ERR_GEN	(1 << 15)
+
+/* Rocker DMA TLV struct */
+struct rocker_tlv {
+	u32 type;
+	u16 len;
+} __packed __aligned(8);
+
+/* TLVs */
+enum {
+	ROCKER_TLV_CMD_UNSPEC,
+	ROCKER_TLV_CMD_TYPE,	/* u16 */
+	ROCKER_TLV_CMD_INFO,	/* nest */
+
+	__ROCKER_TLV_CMD_MAX,
+	ROCKER_TLV_CMD_MAX = __ROCKER_TLV_CMD_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_TYPE_UNSPEC,
+	ROCKER_TLV_CMD_TYPE_GET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_SET_PORT_SETTINGS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_GET_STATS,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_ADD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_MOD,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_DEL,
+	ROCKER_TLV_CMD_TYPE_OF_DPA_GROUP_GET_STATS,
+	ROCKER_TLV_CMD_TYPE_TRUNK,
+	ROCKER_TLV_CMD_TYPE_BRIDGE,
+
+	__ROCKER_TLV_CMD_TYPE_MAX,
+	ROCKER_TLV_CMD_TYPE_MAX = __ROCKER_TLV_CMD_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_CMD_PORT_SETTINGS_UNSPEC,
+	ROCKER_TLV_CMD_PORT_SETTINGS_LPORT,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_SPEED,		/* u32 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_DUPLEX,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_AUTONEG,		/* u8 */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MACADDR,		/* binary */
+	ROCKER_TLV_CMD_PORT_SETTINGS_MODE,		/* u8 */
+
+	__ROCKER_TLV_CMD_PORT_SETTINGS_MAX,
+	ROCKER_TLV_CMD_PORT_SETTINGS_MAX =
+			__ROCKER_TLV_CMD_PORT_SETTINGS_MAX - 1,
+};
+
+enum rocker_port_mode {
+	ROCKER_PORT_MODE_OF_DPA,
+	ROCKER_PORT_MODE_L2L3,
+};
+
+enum {
+	ROCKER_TLV_EVENT_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE,	/* u16 */
+	ROCKER_TLV_EVENT_INFO,	/* nest */
+
+	__ROCKER_TLV_EVENT_MAX,
+	ROCKER_TLV_EVENT_MAX = __ROCKER_TLV_EVENT_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_TYPE_UNSPEC,
+	ROCKER_TLV_EVENT_TYPE_LINK_CHANGED,
+
+	__ROCKER_TLV_EVENT_TYPE_MAX,
+	ROCKER_TLV_EVENT_TYPE_MAX = __ROCKER_TLV_EVENT_TYPE_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_EVENT_LINK_CHANGED_UNSPEC,
+	ROCKER_TLV_EVENT_LINK_CHANGED_LPORT,	/* u32 */
+	ROCKER_TLV_EVENT_LINK_CHANGED_LINKUP,	/* u8 */
+
+	__ROCKER_TLV_EVENT_LINK_CHANGED_MAX,
+	ROCKER_TLV_EVENT_LINK_CHANGED_MAX =
+			__ROCKER_TLV_EVENT_LINK_CHANGED_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_RX_UNSPEC,
+	ROCKER_TLV_RX_FLAGS,		/* u16, see ROCKER_RX_FLAGS_ */
+	ROCKER_TLV_RX_CSUM,		/* u16 */
+	ROCKER_TLV_RX_FRAG_ADDR,	/* u64 */
+	ROCKER_TLV_RX_FRAG_MAX_LEN,	/* u16 */
+	ROCKER_TLV_RX_FRAG_LEN,		/* u16 */
+
+	__ROCKER_TLV_RX_MAX,
+	ROCKER_TLV_RX_MAX = __ROCKER_TLV_RX_MAX - 1,
+};
+
+#define ROCKER_RX_FLAGS_IPV4			(1 << 0)
+#define ROCKER_RX_FLAGS_IPV6			(1 << 1)
+#define ROCKER_RX_FLAGS_CSUM_CALC		(1 << 2)
+#define ROCKER_RX_FLAGS_IPV4_CSUM_GOOD		(1 << 3)
+#define ROCKER_RX_FLAGS_IP_FRAG			(1 << 4)
+#define ROCKER_RX_FLAGS_TCP			(1 << 5)
+#define ROCKER_RX_FLAGS_UDP			(1 << 6)
+#define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD	(1 << 7)
+
+enum {
+	ROCKER_TLV_TX_UNSPEC,
+	ROCKER_TLV_TX_OFFLOAD,		/* u8, see ROCKER_TX_OFFLOAD_ */
+	ROCKER_TLV_TX_L3_CSUM_OFF,	/* u16 */
+	ROCKER_TLV_TX_TSO_MSS,		/* u16 */
+	ROCKER_TLV_TX_TSO_HDR_LEN,	/* u16 */
+	ROCKER_TLV_TX_FRAGS,		/* array */
+
+	__ROCKER_TLV_TX_MAX,
+	ROCKER_TLV_TX_MAX = __ROCKER_TLV_TX_MAX - 1,
+};
+
+#define ROCKER_TX_OFFLOAD_NONE		0
+#define ROCKER_TX_OFFLOAD_IP_CSUM	1
+#define ROCKER_TX_OFFLOAD_TCP_UDP_CSUM	2
+#define ROCKER_TX_OFFLOAD_L3_CSUM	3
+#define ROCKER_TX_OFFLOAD_TSO		4
+
+#define ROCKER_TX_FRAGS_MAX		16
+
+enum {
+	ROCKER_TLV_TX_FRAG_UNSPEC,
+	ROCKER_TLV_TX_FRAG,		/* nest */
+
+	__ROCKER_TLV_TX_FRAG_MAX,
+	ROCKER_TLV_TX_FRAG_MAX = __ROCKER_TLV_TX_FRAG_MAX - 1,
+};
+
+enum {
+	ROCKER_TLV_TX_FRAG_ATTR_UNSPEC,
+	ROCKER_TLV_TX_FRAG_ATTR_ADDR,	/* u64 */
+	ROCKER_TLV_TX_FRAG_ATTR_LEN,	/* u16 */
+
+	__ROCKER_TLV_TX_FRAG_ATTR_MAX,
+	ROCKER_TLV_TX_FRAG_ATTR_MAX = __ROCKER_TLV_TX_FRAG_ATTR_MAX - 1,
+};
+
+/* cmd info nested for OF-DPA msgs */
+enum {
+	ROCKER_TLV_OF_DPA_UNSPEC,
+	ROCKER_TLV_OF_DPA_TABLE_ID,		/* u16 */
+	ROCKER_TLV_OF_DPA_PRIORITY,		/* u32 */
+	ROCKER_TLV_OF_DPA_HARDTIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_IDLETIME,		/* u32 */
+	ROCKER_TLV_OF_DPA_COOKIE,		/* u64 */
+	ROCKER_TLV_OF_DPA_IN_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_IN_LPORT_MASK,	/* u32 */
+	ROCKER_TLV_OF_DPA_OUT_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_GOTO_TABLE_ID,	/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_GROUP_COUNT,		/* u16 */
+	ROCKER_TLV_OF_DPA_GROUP_IDS,		/* u32 array */
+	ROCKER_TLV_OF_DPA_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_ID_MASK,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP,		/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_VLAN_PCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_ID,		/* __be16 */
+	ROCKER_TLV_OF_DPA_NEW_VLAN_PCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_TUNNEL_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_TUN_LOG_LPORT,	/* u32 */
+	ROCKER_TLV_OF_DPA_ETHERTYPE,		/* __be16 */
+	ROCKER_TLV_OF_DPA_DST_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_IP_PROTO,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_PROTO_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_DSCP_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_IP_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN,		/* u8 */
+	ROCKER_TLV_OF_DPA_IP_ECN_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_DST_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_IP_MASK,		/* __be32 */
+	ROCKER_TLV_OF_DPA_DST_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_DST_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_SRC_IPV6_MASK,	/* binary */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP,		/* __be32 */
+	ROCKER_TLV_OF_DPA_SRC_ARP_IP_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_DST_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT,		/* __be16 */
+	ROCKER_TLV_OF_DPA_L4_SRC_PORT_MASK,	/* __be16 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_TYPE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE,		/* u8 */
+	ROCKER_TLV_OF_DPA_ICMP_CODE_MASK,	/* u8 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL,		/* __be32 */
+	ROCKER_TLV_OF_DPA_IPV6_LABEL_MASK,	/* __be32 */
+	ROCKER_TLV_OF_DPA_QUEUE_ID_ACTION,	/* u8 */
+	ROCKER_TLV_OF_DPA_NEW_QUEUE_ID,		/* u8 */
+	ROCKER_TLV_OF_DPA_CLEAR_ACTIONS,	/* u32 */
+	ROCKER_TLV_OF_DPA_POP_VLAN,		/* u8 */
+
+	__ROCKER_TLV_OF_DPA_MAX,
+	ROCKER_TLV_OF_DPA_MAX = __ROCKER_TLV_OF_DPA_MAX - 1,
+};
+
+/* OF-DPA table IDs */
+
+enum rocker_of_dpa_table_id {
+	ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT = 0,
+	ROCKER_OF_DPA_TABLE_ID_VLAN = 10,
+	ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC = 20,
+	ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING = 30,
+	ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING = 40,
+	ROCKER_OF_DPA_TABLE_ID_BRIDGING = 50,
+	ROCKER_OF_DPA_TABLE_ID_ACL_POLICY = 60,
+};
+
+/* OF_DPA_xxx nest */
+enum {
+	ROCKER_TLV_OF_DPA_INFO_UNSPEC,
+	ROCKER_TLV_OF_DPA_INFO_IN_LPORT,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_IN_LPORT_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_OUT_LPORT,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_GOTO_TABLE_ID,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_GROUP_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_VLAN_ID,			/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_VLAN_ID_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_VLAN_PCP,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_VLAN_PCP_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_VLAN_PCP_ACTION,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_NEW_VLAN_ID,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_NEW_VLAN_PCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_TUNNEL_ID,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_TUN_LOG_LPORT,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_ETHERTYPE,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_DST_MAC,			/* binary */
+	ROCKER_TLV_OF_DPA_INFO_DST_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_MAC,			/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_MAC_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_IP_PROTO,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_IP_PROTO_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_DSCP,			/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_DSCP_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_DSCP_ACTION,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_NEW_DSCP,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_ECN,			/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_ECN_MASK,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_DST_IP,			/* binary */
+	ROCKER_TLV_OF_DPA_INFO_DST_IP_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_IP,			/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_IP_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_DST_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_DST_IPV6_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_IPV6,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_IPV6_MASK,		/* binary */
+	ROCKER_TLV_OF_DPA_INFO_SRC_ARP_IP,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_SRC_ARP_IP_MASK,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_L4_DST_PORT,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_L4_DST_PORT_MASK,	/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_L4_SRC_PORT,		/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_L4_SRC_PORT_MASK,	/* u16 */
+	ROCKER_TLV_OF_DPA_INFO_ICMP_TYPE,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_ICMP_TYPE_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_ICMP_CODE,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_ICMP_CODE_MASK,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_IPV6_LABEL,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_IPV6_LABEL_MASK,		/* u32 */
+	ROCKER_TLV_OF_DPA_INFO_QUEUE_ID_ACTION,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_NEW_QUEUE_ID,		/* u8 */
+	ROCKER_TLV_OF_DPA_INFO_CLEAR_ACTIONS,		/* u32 */
+
+	__ROCKER_TLV_OF_DPA_INFO_MAX,
+	ROCKER_TLV_OF_DPA_INFO_MAX = __ROCKER_TLV_OF_DPA_INFO_MAX - 1,
+};
+
+/* OF-DPA flow stats */
+enum {
+	ROCKER_TLV_OF_DPA_FLOW_STAT_UNSPEC,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_DURATION,	/* u32 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_RX_PKTS,	/* u64 */
+	ROCKER_TLV_OF_DPA_FLOW_STAT_TX_PKTS,	/* u64 */
+
+	__ROCKER_TLV_OF_DPA_FLOW_STAT_MAX,
+	ROCKER_TLV_OF_DPA_FLOW_STAT_MAX = __ROCKER_TLV_OF_DPA_FLOW_STAT_MAX - 1,
+};
+
+/* OF-DPA group types */
+enum rocker_of_dpa_group_type {
+	ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE = 0,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_FLOOD,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_INTERFACE,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_MCAST,
+	ROCKER_OF_DPA_GROUP_TYPE_L3_ECMP,
+	ROCKER_OF_DPA_GROUP_TYPE_L2_OVERLAY,
+};
+
+/* OF-DPA group L2 overlay types */
+enum rocker_of_dpa_overlay_type {
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_UCAST = 0,
+	ROCKER_OF_DPA_OVERLAY_TYPE_FLOOD_MCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_UCAST,
+	ROCKER_OF_DPA_OVERLAY_TYPE_MCAST_MCAST,
+};
+
+/* OF-DPA group ID encoding */
+#define ROCKER_GROUP_TYPE_SHIFT 28
+#define ROCKER_GROUP_TYPE_MASK 0xf0000000
+#define ROCKER_GROUP_VLAN_SHIFT 16
+#define ROCKER_GROUP_VLAN_MASK 0x0fff0000
+#define ROCKER_GROUP_PORT_SHIFT 0
+#define ROCKER_GROUP_PORT_MASK 0x0000ffff
+#define ROCKER_GROUP_TUNNEL_ID_SHIFT 12
+#define ROCKER_GROUP_TUNNEL_ID_MASK 0x0ffff000
+#define ROCKER_GROUP_SUBTYPE_SHIFT 10
+#define ROCKER_GROUP_SUBTYPE_MASK 0x00000c00
+#define ROCKER_GROUP_INDEX_SHIFT 0
+#define ROCKER_GROUP_INDEX_MASK 0x0000ffff
+#define ROCKER_GROUP_INDEX_LONG_SHIFT 0
+#define ROCKER_GROUP_INDEX_LONG_MASK 0x0fffffff
+
+#define ROCKER_GROUP_TYPE_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_TYPE_MASK) >> ROCKER_GROUP_TYPE_SHIFT)
+#define ROCKER_GROUP_TYPE_SET(type) \
+	(((type) << ROCKER_GROUP_TYPE_SHIFT) & ROCKER_GROUP_TYPE_MASK)
+#define ROCKER_GROUP_VLAN_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_VLAN_MASK) >> ROCKER_GROUP_VLAN_SHIFT)
+#define ROCKER_GROUP_VLAN_SET(vlan_id) \
+	(((vlan_id) << ROCKER_GROUP_VLAN_SHIFT) & ROCKER_GROUP_VLAN_MASK)
+#define ROCKER_GROUP_PORT_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_PORT_MASK) >> ROCKER_GROUP_PORT_SHIFT)
+#define ROCKER_GROUP_PORT_SET(port) \
+	(((port) << ROCKER_GROUP_PORT_SHIFT) & ROCKER_GROUP_PORT_MASK)
+#define ROCKER_GROUP_INDEX_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_MASK) >> ROCKER_GROUP_INDEX_SHIFT)
+#define ROCKER_GROUP_INDEX_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_SHIFT) & ROCKER_GROUP_INDEX_MASK)
+#define ROCKER_GROUP_INDEX_LONG_GET(group_id) \
+	(((group_id) & ROCKER_GROUP_INDEX_LONG_MASK) >> \
+	 ROCKER_GROUP_INDEX_LONG_SHIFT)
+#define ROCKER_GROUP_INDEX_LONG_SET(index) \
+	(((index) << ROCKER_GROUP_INDEX_LONG_SHIFT) & \
+	 ROCKER_GROUP_INDEX_LONG_MASK)
+
+#define ROCKER_GROUP_NONE 0
+#define ROCKER_GROUP_L2_INTERFACE(vlan_id, port) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE) |\
+	 ROCKER_GROUP_VLAN_SET(vlan_id) | ROCKER_GROUP_PORT_SET(port))
+#define ROCKER_GROUP_L2_MCAST(vlan_id, index) \
+	(ROCKER_GROUP_TYPE_SET(ROCKER_OF_DPA_GROUP_TYPE_L2_MCAST) |\
+	 ROCKER_GROUP_VLAN_SET(vlan_id) | ROCKER_GROUP_INDEX_SET(index))
+
+/* Rocker general purpose registers */
+#define ROCKER_CONTROL			0x0300
+#define ROCKER_PORT_PHYS_COUNT		0x0304
+#define ROCKER_PORT_PHYS_LINK_STATUS	0x0310 /* 8-byte */
+#define ROCKER_PORT_PHYS_ENABLE		0x0318 /* 8-byte */
+#define ROCKER_SWITCH_ID		0x0320 /* 8-byte */
+
+/* Rocker control bits */
+#define ROCKER_CONTROL_RESET		(1 << 0)
+
+#endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 13/13] switchdev: introduce Netlink API
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
                   ` (4 preceding siblings ...)
  2014-09-03  9:24 ` [patch net-next 09/13] openvswitch: introduce vport_op get_netdev Jiri Pirko
@ 2014-09-03  9:25 ` Jiri Pirko
  2014-09-08 13:54 ` [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Thomas Graf
  6 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-03  9:25 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

This patch exposes the switchdev API via generic Netlink.
An example userspace utility is available here:
https://github.com/jpirko/switchdev

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 MAINTAINERS                       |   1 +
 include/uapi/linux/switchdev.h    | 119 +++++++++
 net/switchdev/Kconfig             |  11 +
 net/switchdev/Makefile            |   1 +
 net/switchdev/switchdev_netlink.c | 493 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 625 insertions(+)
 create mode 100644 include/uapi/linux/switchdev.h
 create mode 100644 net/switchdev/switchdev_netlink.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9797bda..83c4f43 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8820,6 +8820,7 @@ L:	netdev@vger.kernel.org
 S:	Supported
 F:	net/switchdev/
 F:	include/net/switchdev.h
+F:	include/uapi/linux/switchdev.h
 
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
diff --git a/include/uapi/linux/switchdev.h b/include/uapi/linux/switchdev.h
new file mode 100644
index 0000000..83692e2
--- /dev/null
+++ b/include/uapi/linux/switchdev.h
@@ -0,0 +1,119 @@
+/*
+ * include/uapi/linux/switchdev.h - Netlink interface to Switch device
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_SWITCHDEV_H_
+#define _UAPI_LINUX_SWITCHDEV_H_
+
+enum {
+	SWDEV_CMD_NOOP,
+	SWDEV_CMD_FLOW_INSERT,
+	SWDEV_CMD_FLOW_REMOVE,
+};
+
+enum {
+	SWDEV_ATTR_UNSPEC,
+	SWDEV_ATTR_IFINDEX,			/* u32 */
+	SWDEV_ATTR_FLOW,			/* nest */
+
+	__SWDEV_ATTR_MAX,
+	SWDEV_ATTR_MAX = (__SWDEV_ATTR_MAX - 1),
+};
+
+enum {
+	SWDEV_ATTR_FLOW_KEY_UNSPEC,
+	SWDEV_ATTR_FLOW_KEY_TUN_ID,		/* be64 */
+	SWDEV_ATTR_FLOW_KEY_TUN_IPV4_SRC,	/* be32 */
+	SWDEV_ATTR_FLOW_KEY_TUN_IPV4_DST,	/* be32 */
+	SWDEV_ATTR_FLOW_KEY_TUN_FLAGS,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TOS,	/* u8 */
+	SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TTL,	/* u8 */
+	SWDEV_ATTR_FLOW_KEY_PHY_PRIORITY,	/* u32 */
+	SWDEV_ATTR_FLOW_KEY_PHY_IN_PORT,	/* u32 (ifindex) */
+	SWDEV_ATTR_FLOW_KEY_ETH_SRC,		/* ETH_ALEN */
+	SWDEV_ATTR_FLOW_KEY_ETH_DST,		/* ETH_ALEN */
+	SWDEV_ATTR_FLOW_KEY_ETH_TCI,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_ETH_TYPE,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_IP_PROTO,		/* u8 */
+	SWDEV_ATTR_FLOW_KEY_IP_TOS,		/* u8 */
+	SWDEV_ATTR_FLOW_KEY_IP_TTL,		/* u8 */
+	SWDEV_ATTR_FLOW_KEY_IP_FRAG,		/* u8 */
+	SWDEV_ATTR_FLOW_KEY_TP_SRC,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_TP_DST,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_TP_FLAGS,		/* be16 */
+	SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_SRC,	/* be32 */
+	SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_DST,	/* be32 */
+	SWDEV_ATTR_FLOW_KEY_IPV4_ARP_SHA,	/* ETH_ALEN */
+	SWDEV_ATTR_FLOW_KEY_IPV4_ARP_THA,	/* ETH_ALEN */
+	SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_SRC,	/* struct in6_addr */
+	SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_DST,	/* struct in6_addr */
+	SWDEV_ATTR_FLOW_KEY_IPV6_LABEL,		/* be32 */
+	SWDEV_ATTR_FLOW_KEY_IPV6_ND_TARGET,	/* struct in6_addr */
+	SWDEV_ATTR_FLOW_KEY_IPV6_ND_SLL,	/* ETH_ALEN */
+	SWDEV_ATTR_FLOW_KEY_IPV6_ND_TLL,	/* ETH_ALEN */
+
+	__SWDEV_ATTR_FLOW_KEY_MAX,
+	SWDEV_ATTR_FLOW_KEY_MAX = (__SWDEV_ATTR_FLOW_KEY_MAX - 1),
+};
+
+enum {
+	SWDEV_FLOW_ACTION_TYPE_OUTPUT,
+	SWDEV_FLOW_ACTION_TYPE_VLAN_PUSH,
+	SWDEV_FLOW_ACTION_TYPE_VLAN_POP,
+};
+
+enum {
+	SWDEV_ATTR_FLOW_ACTION_UNSPEC,
+	SWDEV_ATTR_FLOW_ACTION_TYPE,		/* u32 */
+	SWDEV_ATTR_FLOW_ACTION_OUT_PORT,	/* u32 (ifindex) */
+	SWDEV_ATTR_FLOW_ACTION_VLAN_PROTO,	/* be16 */
+	SWDEV_ATTR_FLOW_ACTION_VLAN_TCI,	/* u16 */
+
+	__SWDEV_ATTR_FLOW_ACTION_MAX,
+	SWDEV_ATTR_FLOW_ACTION_MAX = (__SWDEV_ATTR_FLOW_ACTION_MAX - 1),
+};
+
+enum {
+	SWDEV_ATTR_FLOW_ITEM_UNSPEC,
+	SWDEV_ATTR_FLOW_ITEM_ACTION,		/* nest */
+
+	__SWDEV_ATTR_FLOW_ITEM_MAX,
+	SWDEV_ATTR_FLOW_ITEM_MAX = (__SWDEV_ATTR_FLOW_ITEM_MAX - 1),
+};
+
+enum {
+	SWDEV_ATTR_FLOW_UNSPEC,
+	SWDEV_ATTR_FLOW_KEY,			/* nest */
+	SWDEV_ATTR_FLOW_MASK,			/* nest */
+	SWDEV_ATTR_FLOW_LIST_ACTION,		/* nest */
+
+	__SWDEV_ATTR_FLOW_MAX,
+	SWDEV_ATTR_FLOW_MAX = (__SWDEV_ATTR_FLOW_MAX - 1),
+};
+
+/* Nested layout of flow add/remove command message:
+ *
+ *	[SWDEV_ATTR_IFINDEX]
+ *	[SWDEV_ATTR_FLOW]
+ *		[SWDEV_ATTR_FLOW_KEY]
+ *			[SWDEV_ATTR_FLOW_KEY_*], ...
+ *		[SWDEV_ATTR_FLOW_MASK]
+ *			[SWDEV_ATTR_FLOW_KEY_*], ...
+ *		[SWDEV_ATTR_FLOW_LIST_ACTION]
+ *			[SWDEV_ATTR_FLOW_ITEM_ACTION]
+ *				[SWDEV_ATTR_FLOW_ACTION_*], ...
+ *			[SWDEV_ATTR_FLOW_ITEM_ACTION]
+ *				[SWDEV_ATTR_FLOW_ACTION_*], ...
+ *			...
+ */
+
+#define SWITCHDEV_GENL_NAME "switchdev"
+#define SWITCHDEV_GENL_VERSION 0x1
+
+#endif /* _UAPI_LINUX_SWITCHDEV_H_ */
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
index 20e8ed2..4470d6e 100644
--- a/net/switchdev/Kconfig
+++ b/net/switchdev/Kconfig
@@ -7,3 +7,14 @@ config NET_SWITCHDEV
 	depends on INET
 	---help---
 	  This module provides support for hardware switch chips.
+
+config NET_SWITCHDEV_NETLINK
+	tristate "Netlink interface to Switch device"
+	depends on NET_SWITCHDEV
+	default m
+	---help---
+	  This module provides a Generic Netlink interface to hardware switch
+	  chips.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called switchdev_netlink.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
index 5ed63ed..0695b53 100644
--- a/net/switchdev/Makefile
+++ b/net/switchdev/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
+obj-$(CONFIG_NET_SWITCHDEV_NETLINK) += switchdev_netlink.o
diff --git a/net/switchdev/switchdev_netlink.c b/net/switchdev/switchdev_netlink.c
new file mode 100644
index 0000000..14a3dd1
--- /dev/null
+++ b/net/switchdev/switchdev_netlink.c
@@ -0,0 +1,493 @@
+/*
+ * net/switchdev/switchdev_netlink.c - Netlink interface to Switch device
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <net/sw_flow.h>
+#include <net/switchdev.h>
+#include <net/netlink.h>
+#include <net/genetlink.h>
+#include <uapi/linux/switchdev.h>
+
+static struct genl_family swdev_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= SWITCHDEV_GENL_NAME,
+	.version	= SWITCHDEV_GENL_VERSION,
+	.maxattr	= SWDEV_ATTR_MAX,
+	.netnsok	= true,
+};
+
+static const struct nla_policy swdev_nl_flow_policy[SWDEV_ATTR_FLOW_MAX + 1] = {
+	[SWDEV_ATTR_FLOW_UNSPEC]		= { .type = NLA_UNSPEC, },
+	[SWDEV_ATTR_FLOW_KEY]			= { .type = NLA_NESTED },
+	[SWDEV_ATTR_FLOW_MASK]			= { .type = NLA_NESTED },
+	[SWDEV_ATTR_FLOW_LIST_ACTION]		= { .type = NLA_NESTED },
+};
+
+#define __IN6_ALEN sizeof(struct in6_addr)
+
+static const struct nla_policy
+swdev_nl_flow_key_policy[SWDEV_ATTR_FLOW_KEY_MAX + 1] = {
+	[SWDEV_ATTR_FLOW_KEY_UNSPEC]		= { .type = NLA_UNSPEC, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_ID]		= { .type = NLA_U64, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_SRC]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_DST]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_FLAGS]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TOS]	= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TTL]	= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_PHY_PRIORITY]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_PHY_IN_PORT]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_ETH_SRC]		= { .len  = ETH_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_ETH_DST]		= { .len  = ETH_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_ETH_TCI]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_ETH_TYPE]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_IP_PROTO]		= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_IP_TOS]		= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_IP_TTL]		= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_IP_FRAG]		= { .type = NLA_U8, },
+	[SWDEV_ATTR_FLOW_KEY_TP_SRC]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_TP_DST]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_TP_FLAGS]		= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_SRC]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_DST]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_SHA]	= { .len  = ETH_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_THA]	= { .len  = ETH_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_SRC]	= { .len  = __IN6_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_DST]	= { .len  = __IN6_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_LABEL]	= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TARGET]	= { .len  = __IN6_ALEN, },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_ND_SLL]	= { .len  = ETH_ALEN },
+	[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TLL]	= { .len  = ETH_ALEN },
+};
+
+static const struct nla_policy
+swdev_nl_flow_action_policy[SWDEV_ATTR_FLOW_ACTION_MAX + 1] = {
+	[SWDEV_ATTR_FLOW_ACTION_UNSPEC]		= { .type = NLA_UNSPEC, },
+	[SWDEV_ATTR_FLOW_ACTION_TYPE]		= { .type = NLA_U32, },
+	[SWDEV_ATTR_FLOW_ACTION_VLAN_PROTO]	= { .type = NLA_U16, },
+	[SWDEV_ATTR_FLOW_ACTION_VLAN_TCI]	= { .type = NLA_U16, },
+};
+
+static int swdev_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	void *hdr;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+			  &swdev_nl_family, 0, SWDEV_CMD_NOOP);
+	if (!hdr) {
+		err = -EMSGSIZE;
+		goto err_msg_put;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+
+err_msg_put:
+	nlmsg_free(msg);
+
+	return err;
+}
+
+static int swdev_nl_parse_flow_key(struct nlattr *key_attr,
+				   struct sw_flow_key *flow_key)
+{
+	struct nlattr *attrs[SWDEV_ATTR_FLOW_KEY_MAX + 1];
+	int err;
+
+	err = nla_parse_nested(attrs, SWDEV_ATTR_FLOW_KEY_MAX,
+			       key_attr, swdev_nl_flow_key_policy);
+	if (err)
+		return err;
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_ID])
+		flow_key->tun_key.tun_id =
+			nla_get_be64(attrs[SWDEV_ATTR_FLOW_KEY_TUN_ID]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_SRC])
+		flow_key->tun_key.ipv4_src =
+			nla_get_be32(attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_SRC]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_DST])
+		flow_key->tun_key.ipv4_dst =
+			nla_get_be32(attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_DST]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_FLAGS])
+		flow_key->tun_key.tun_flags =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_TUN_FLAGS]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TOS])
+		flow_key->tun_key.ipv4_tos =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TOS]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TTL])
+		flow_key->tun_key.ipv4_ttl =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_TUN_IPV4_TTL]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_PHY_PRIORITY])
+		flow_key->phy.priority =
+			nla_get_u32(attrs[SWDEV_ATTR_FLOW_KEY_PHY_PRIORITY]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_PHY_IN_PORT])
+		flow_key->misc.in_port_ifindex =
+			nla_get_u32(attrs[SWDEV_ATTR_FLOW_KEY_PHY_IN_PORT]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_ETH_SRC])
+		ether_addr_copy(flow_key->eth.src,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_ETH_SRC]));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_ETH_DST])
+		ether_addr_copy(flow_key->eth.dst,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_ETH_DST]));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_ETH_TCI])
+		flow_key->eth.tci =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_ETH_TCI]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_ETH_TYPE])
+		flow_key->eth.type =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_ETH_TYPE]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IP_PROTO])
+		flow_key->ip.proto =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_IP_PROTO]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IP_TOS])
+		flow_key->ip.tos =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_IP_TOS]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IP_TTL])
+		flow_key->ip.ttl =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_IP_TTL]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IP_FRAG])
+		flow_key->ip.frag =
+			nla_get_u8(attrs[SWDEV_ATTR_FLOW_KEY_IP_FRAG]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TP_SRC])
+		flow_key->tp.src =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_TP_SRC]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TP_DST])
+		flow_key->tp.dst =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_TP_DST]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_TP_FLAGS])
+		flow_key->tp.flags =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_KEY_TP_FLAGS]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_SRC])
+		flow_key->ipv4.addr.src =
+			nla_get_be32(attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_SRC]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_DST])
+		flow_key->ipv4.addr.dst =
+			nla_get_be32(attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ADDR_DST]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_SHA])
+		ether_addr_copy(flow_key->ipv4.arp.sha,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_SHA]));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_THA])
+		ether_addr_copy(flow_key->ipv4.arp.tha,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV4_ARP_THA]));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_SRC])
+		memcpy(&flow_key->ipv6.addr.src,
+		       nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_SRC]),
+		       sizeof(flow_key->ipv6.addr.src));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_DST])
+		memcpy(&flow_key->ipv6.addr.dst,
+		       nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ADDR_DST]),
+		       sizeof(flow_key->ipv6.addr.dst));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_LABEL])
+		flow_key->ipv6.label =
+			nla_get_be32(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_LABEL]);
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TARGET])
+		memcpy(&flow_key->ipv6.nd.target,
+		       nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TARGET]),
+		       sizeof(flow_key->ipv6.nd.target));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_SLL])
+		ether_addr_copy(flow_key->ipv6.nd.sll,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_SLL]));
+
+	if (attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TLL])
+		ether_addr_copy(flow_key->ipv6.nd.tll,
+				nla_data(attrs[SWDEV_ATTR_FLOW_KEY_IPV6_ND_TLL]));
+
+	return 0;
+}
+
+static int swdev_nl_parse_flow_mask(struct nlattr *mask_attr,
+				    struct sw_flow_mask **p_flow_mask)
+{
+	struct sw_flow_mask *flow_mask;
+	int err;
+
+	flow_mask = kzalloc(sizeof(*flow_mask), GFP_KERNEL);
+	if (!flow_mask)
+		return -ENOMEM;
+
+	err = swdev_nl_parse_flow_key(mask_attr, &flow_mask->key);
+	if (err)
+		goto out;
+	flow_mask->range.start = 0;
+	flow_mask->range.end = sizeof(flow_mask->key);
+
+	*p_flow_mask = flow_mask;
+	return 0;
+out:
+	kfree(flow_mask);
+	return err;
+}
+
+static int swdev_nl_parse_flow_action(struct nlattr *action_attr,
+				      struct sw_flow_action *flow_action)
+{
+	struct nlattr *attrs[SWDEV_ATTR_FLOW_ACTION_MAX + 1];
+	int err;
+
+	err = nla_parse_nested(attrs, SWDEV_ATTR_FLOW_ACTION_MAX,
+			       action_attr, swdev_nl_flow_action_policy);
+	if (err)
+		return err;
+
+	if (!attrs[SWDEV_ATTR_FLOW_ACTION_TYPE])
+		return -EINVAL;
+
+	switch (nla_get_u32(attrs[SWDEV_ATTR_FLOW_ACTION_TYPE])) {
+	case SWDEV_FLOW_ACTION_TYPE_OUTPUT:
+		if (!attrs[SWDEV_ATTR_FLOW_ACTION_OUT_PORT])
+			return -EINVAL;
+		flow_action->out_port_ifindex =
+			nla_get_u32(attrs[SWDEV_ATTR_FLOW_ACTION_OUT_PORT]);
+		flow_action->type = SW_FLOW_ACTION_TYPE_OUTPUT;
+		break;
+	case SWDEV_FLOW_ACTION_TYPE_VLAN_PUSH:
+		if (!attrs[SWDEV_ATTR_FLOW_ACTION_VLAN_PROTO] ||
+		    !attrs[SWDEV_ATTR_FLOW_ACTION_VLAN_TCI])
+			return -EINVAL;
+		flow_action->vlan.vlan_proto =
+			nla_get_be16(attrs[SWDEV_ATTR_FLOW_ACTION_VLAN_PROTO]);
+		flow_action->vlan.vlan_tci =
+			nla_get_u16(attrs[SWDEV_ATTR_FLOW_ACTION_VLAN_TCI]);
+		flow_action->type = SW_FLOW_ACTION_TYPE_VLAN_PUSH;
+		break;
+	case SWDEV_FLOW_ACTION_TYPE_VLAN_POP:
+		flow_action->type = SW_FLOW_ACTION_TYPE_VLAN_POP;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int swdev_nl_parse_flow_actions(struct nlattr *actions_attr,
+				       struct sw_flow_actions **p_flow_actions)
+{
+	struct sw_flow_actions *flow_actions;
+	struct sw_flow_action *cur;
+	struct nlattr *action_attr;
+	int rem;
+	int count = 0;
+	int err;
+
+	nla_for_each_nested(action_attr, actions_attr, rem) {
+		if (nla_type(action_attr) != SWDEV_ATTR_FLOW_ITEM_ACTION)
+			return -EINVAL;
+		count++;
+	}
+
+	flow_actions = kzalloc(sizeof(struct sw_flow_actions) +
+			       sizeof(struct sw_flow_action) * count,
+			       GFP_KERNEL);
+	if (!flow_actions)
+		return -ENOMEM;
+
+	cur = flow_actions->actions;
+	nla_for_each_nested(action_attr, actions_attr, rem) {
+		err = swdev_nl_parse_flow_action(action_attr, cur);
+		if (err)
+			goto out;
+		cur++;
+	}
+
+	flow_actions->count = count;
+	*p_flow_actions = flow_actions;
+	return 0;
+out:
+	kfree(flow_actions);
+	return err;
+}
+
+static void swdev_nl_free_flow(struct sw_flow *flow)
+{
+	kfree(flow->actions);
+	kfree(flow->mask);
+	kfree(flow);
+}
+
+static int swdev_nl_parse_flow(struct nlattr *flow_attr, struct sw_flow **p_flow)
+{
+	struct sw_flow *flow;
+	struct nlattr *attrs[SWDEV_ATTR_FLOW_MAX + 1];
+	int err;
+
+	err = nla_parse_nested(attrs, SWDEV_ATTR_FLOW_MAX,
+			       flow_attr, swdev_nl_flow_policy);
+	if (err)
+		return err;
+
+	if (!attrs[SWDEV_ATTR_FLOW_KEY] || !attrs[SWDEV_ATTR_FLOW_MASK] ||
+	    !attrs[SWDEV_ATTR_FLOW_LIST_ACTION])
+		return -EINVAL;
+
+	flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+	if (!flow)
+		return -ENOMEM;
+
+	err = swdev_nl_parse_flow_key(attrs[SWDEV_ATTR_FLOW_KEY], &flow->key);
+	if (err)
+		goto out;
+
+	err = swdev_nl_parse_flow_mask(attrs[SWDEV_ATTR_FLOW_MASK], &flow->mask);
+	if (err)
+		goto out;
+
+	err = swdev_nl_parse_flow_actions(attrs[SWDEV_ATTR_FLOW_LIST_ACTION],
+					  &flow->actions);
+	if (err)
+		goto out;
+
+	*p_flow = flow;
+	return 0;
+
+out:
+	kfree(flow->mask);
+	kfree(flow);
+	return err;
+}
+
+static struct net_device *swdev_nl_dev_get(struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	int ifindex;
+
+	if (!info->attrs[SWDEV_ATTR_IFINDEX])
+		return NULL;
+
+	ifindex = nla_get_u32(info->attrs[SWDEV_ATTR_IFINDEX]);
+	return dev_get_by_index(net, ifindex);
+}
+
+static void swdev_nl_dev_put(struct net_device *dev)
+{
+	dev_put(dev);
+}
+
+static int swdev_nl_cmd_flow_insert(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net_device *dev;
+	struct sw_flow *flow;
+	int err;
+
+	if (!info->attrs[SWDEV_ATTR_FLOW])
+		return -EINVAL;
+
+	dev = swdev_nl_dev_get(info);
+	if (!dev)
+		return -EINVAL;
+
+	err = swdev_nl_parse_flow(info->attrs[SWDEV_ATTR_FLOW], &flow);
+	if (err)
+		goto dev_put;
+
+	err = swdev_flow_insert(dev, flow);
+	swdev_nl_free_flow(flow);
+dev_put:
+	swdev_nl_dev_put(dev);
+	return err;
+}
+
+static int swdev_nl_cmd_flow_remove(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net_device *dev;
+	struct sw_flow *flow;
+	int err;
+
+	if (!info->attrs[SWDEV_ATTR_FLOW])
+		return -EINVAL;
+
+	dev = swdev_nl_dev_get(info);
+	if (!dev)
+		return -EINVAL;
+
+	err = swdev_nl_parse_flow(info->attrs[SWDEV_ATTR_FLOW], &flow);
+	if (err)
+		goto dev_put;
+
+	err = swdev_flow_remove(dev, flow);
+	swdev_nl_free_flow(flow);
+dev_put:
+	swdev_nl_dev_put(dev);
+	return err;
+}
+
+static const struct genl_ops swdev_nl_ops[] = {
+	{
+		.cmd = SWDEV_CMD_NOOP,
+		.doit = swdev_nl_cmd_noop,
+	},
+	{
+		.cmd = SWDEV_CMD_FLOW_INSERT,
+		.doit = swdev_nl_cmd_flow_insert,
+		.policy = swdev_nl_flow_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = SWDEV_CMD_FLOW_REMOVE,
+		.doit = swdev_nl_cmd_flow_remove,
+		.policy = swdev_nl_flow_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static int __init swdev_nl_module_init(void)
+{
+	return genl_register_family_with_ops(&swdev_nl_family, swdev_nl_ops);
+}
+
+static void swdev_nl_module_fini(void)
+{
+	genl_unregister_family(&swdev_nl_family);
+}
+
+module_init(swdev_nl_module_init);
+module_exit(swdev_nl_module_fini);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri@resnulli.us>");
+MODULE_DESCRIPTION("Netlink interface to Switch device");
+MODULE_ALIAS_GENL_FAMILY(SWITCHDEV_GENL_NAME);
-- 
1.9.3
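To make the genetlink interface above more concrete: userspace exercises it by packing nested netlink attributes (a `SWDEV_ATTR_FLOW` nest containing a `SWDEV_ATTR_FLOW_KEY` nest, etc.). The following is only an illustrative sketch of the wire layout, not the real switchdev userspace API; the attribute numbers are hypothetical, and a real client would use libnl rather than hand-packing bytes.

```python
import struct

NLA_HDRLEN = 4  # struct nlattr: __u16 nla_len, __u16 nla_type

def nla_align(length):
    # Netlink attribute payloads are padded to 4-byte boundaries.
    return (length + 3) & ~3

def pack_attr(nla_type, payload):
    # nla_len covers the 4-byte header plus the unpadded payload.
    hdr = struct.pack("=HH", NLA_HDRLEN + len(payload), nla_type)
    pad = b"\x00" * (nla_align(len(payload)) - len(payload))
    return hdr + payload + pad

def pack_nested(nla_type, children):
    # A nested attribute's payload is simply its children concatenated.
    return pack_attr(nla_type, b"".join(children))

# Hypothetical attribute numbers, for illustration only.
SWDEV_ATTR_FLOW = 1
SWDEV_ATTR_FLOW_KEY = 1
SWDEV_ATTR_FLOW_KEY_ETH_TYPE = 2  # a 16-bit value, big-endian in this sketch

key = pack_nested(SWDEV_ATTR_FLOW_KEY, [
    pack_attr(SWDEV_ATTR_FLOW_KEY_ETH_TYPE, struct.pack("!H", 0x0800)),
])
flow = pack_nested(SWDEV_ATTR_FLOW, [key])
```

The resulting buffer is what `nla_parse_nested()` in the patch would walk on the kernel side: an outer attribute whose payload is itself a sequence of attributes.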

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]   ` <1409736300-12303-2-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03 15:20     ` John Fastabend
       [not found]       ` <540731B9.4010603-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-09-03 21:11       ` Jamal Hadi Salim
  0 siblings, 2 replies; 42+ messages in thread
From: John Fastabend @ 2014-09-03 15:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 09/03/2014 02:24 AM, Jiri Pirko wrote:
> After this, flow related structures can be used in other code.
>
> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
> ---

Hi Jiri,

As I indicated before, I'm looking into integrating this with some
hardware here. Progress is a bit slow, but I'm starting to look at it.
The i40e/ixgbe drivers are one open source example, with very limited
support for tables, flow matches, etc., and then there is a closed
source driver with much more flexibility. What I don't have is a
middle-of-the-road switch to work with: something better than a host
NIC but not as flexible as a TOR.

A couple of questions. My assumption here is that I can extend the
flow_key as needed to support additional match criteria my hardware
has. I scanned the ./net/openvswitch source and didn't catch any place
that would break, but I might need to take a closer look. Similarly,
the action set will need to be extended. For example, if I want to use
this with i40e, an OVS_ACTION_ATTR_QUEUE could be used to steer
packets to a queue. With this in mind we will want a follow-up patch
to rename OVS_ACTION_ATTR_* to FLOW_ACTION_ATTR_*.
Also I have some filters that can match on offset/length/mask
tuples. As far as I can tell this is going to have to be yet
another interface? Or would it be worth the effort to define
the flow key more generically. My initial guess is I'll just
write a separate interface. I think this is what Jamal referred
to as another "classifier".

Thanks,
John

[...]

> +
> +struct sw_flow_key_ipv4_tunnel {
> +	__be64 tun_id;
> +	__be32 ipv4_src;
> +	__be32 ipv4_dst;
> +	__be16 tun_flags;
> +	u8   ipv4_tos;
> +	u8   ipv4_ttl;
> +};
> +
> +struct sw_flow_key {
> +	struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> +	struct {
> +		u32	priority;	/* Packet QoS priority. */
> +		u32	skb_mark;	/* SKB mark. */
> +		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
> +	} __packed phy; /* Safe when right after 'tun_key'. */
> +	struct {
> +		u8     src[ETH_ALEN];	/* Ethernet source address. */
> +		u8     dst[ETH_ALEN];	/* Ethernet destination address. */
> +		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
> +		__be16 type;		/* Ethernet frame type. */
> +	} eth;
> +	struct {
> +		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
> +		u8     tos;		/* IP ToS. */
> +		u8     ttl;		/* IP TTL/hop limit. */
> +		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
> +	} ip;
> +	struct {
> +		__be16 src;		/* TCP/UDP/SCTP source port. */
> +		__be16 dst;		/* TCP/UDP/SCTP destination port. */
> +		__be16 flags;		/* TCP flags. */
> +	} tp;
> +	union {
> +		struct {
> +			struct {
> +				__be32 src;	/* IP source address. */
> +				__be32 dst;	/* IP destination address. */
> +			} addr;
> +			struct {
> +				u8 sha[ETH_ALEN];	/* ARP source hardware address. */
> +				u8 tha[ETH_ALEN];	/* ARP target hardware address. */
> +			} arp;
> +		} ipv4;
> +		struct {
> +			struct {
> +				struct in6_addr src;	/* IPv6 source address. */
> +				struct in6_addr dst;	/* IPv6 destination address. */
> +			} addr;
> +			__be32 label;			/* IPv6 flow label. */
> +			struct {
> +				struct in6_addr target;	/* ND target address. */
> +				u8 sll[ETH_ALEN];	/* ND source link layer address. */
> +				u8 tll[ETH_ALEN];	/* ND target link layer address. */
> +			} nd;
> +		} ipv6;
> +	};
> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
> +
> +struct sw_flow_key_range {
> +	unsigned short int start;
> +	unsigned short int end;
> +};
> +
> +struct sw_flow_mask {
> +	struct sw_flow_key_range range;
> +	struct sw_flow_key key;
> +};
> +
> +struct sw_flow_action {
> +};
> +
> +struct sw_flow_actions {
> +	unsigned count;
> +	struct sw_flow_action actions[0];
> +};
> +
> +struct sw_flow {
> +	struct sw_flow_key key;
> +	struct sw_flow_key unmasked_key;
> +	struct sw_flow_mask *mask;
> +	struct sw_flow_actions *actions;
> +};
> +
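As an aside on the structures quoted above: a `sw_flow_mask` paired with a `sw_flow_key_range` expresses masked matching over a byte window of the key. A rough, hypothetical sketch of those semantics (not the kernel implementation, and the toy key layout below is invented for illustration):

```python
def masked_key_equal(candidate, flow_key, mask, start, end):
    # Compare only the bytes in [start, end), each ANDed with the mask,
    # mirroring how sw_flow_key_range limits the comparison window and
    # sw_flow_mask wildcards individual bits.
    return all((candidate[i] & mask[i]) == (flow_key[i] & mask[i])
               for i in range(start, end))

# Toy 6-byte "keys": pretend bytes 0-1 are the VLAN TCI and
# bytes 2-5 an IPv4 address, with the last octet wildcarded.
flow_key = bytes([0x00, 0x64, 10, 0, 0, 1])
mask     = bytes([0xff, 0xff, 0xff, 0xff, 0xff, 0x00])
pkt      = bytes([0x00, 0x64, 10, 0, 0, 99])

matched = masked_key_equal(pkt, flow_key, mask, 0, len(mask))
```

Here `pkt` differs from `flow_key` only in the wildcarded last octet, so it matches; narrowing `start`/`end` skips whole fields from the comparison entirely, which is what keeps lookups cheap for mostly-wildcarded flows.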


-- 
John Fastabend         Intel Corporation


* Re: [patch net-next 03/13] net: introduce generic switch devices support
       [not found]     ` <1409736300-12303-4-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03 15:46       ` John Fastabend
       [not found]         ` <540737CF.4000402-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: John Fastabend @ 2014-09-03 15:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 09/03/2014 02:24 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement the relevant ndos to do so. Now there
> are a couple of ndos defined:
> - for getting physical switch id is in place.
> - for work with flows.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
> ---


[...]

>   struct netpoll_info;
> @@ -997,6 +999,24 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>    *	Callback to use for xmit over the accelerated station. This
>    *	is used in place of ndo_start_xmit on accelerated net
>    *	devices.
> + *
> + * int (*ndo_swdev_get_id)(struct net_device *dev,
> + *			   struct netdev_phys_item_id *psid);
> + *	Called to get an ID of the switch chip this port is part of.
> + *	If driver implements this, it indicates that it represents a port
> + *	of a switch chip.
> + *
> + * int (*ndo_swdev_flow_insert)(struct net_device *dev,
> + *				const struct sw_flow *flow);
> + *	Called to insert a flow into switch device. If driver does
> + *	not implement this, it is assumed that the hw does not have
> + *	a capability to work with flows.
> + *
> + * int (*ndo_swdev_flow_remove)(struct net_device *dev,
> + *				const struct sw_flow *flow);
> + *	Called to remove a flow from switch device. If driver does
> + *	not implement this, it is assumed that the hw does not have
> + *	a capability to work with flows.
>    */
>   struct net_device_ops {
>   	int			(*ndo_init)(struct net_device *dev);
> @@ -1146,6 +1166,14 @@ struct net_device_ops {
>   							struct net_device *dev,
>   							void *priv);
>   	int			(*ndo_get_lock_subclass)(struct net_device *dev);
> +#ifdef CONFIG_NET_SWITCHDEV
> +	int			(*ndo_swdev_get_id)(struct net_device *dev,
> +						    struct netdev_phys_item_id *psid);
> +	int			(*ndo_swdev_flow_insert)(struct net_device *dev,
> +							 const struct sw_flow *flow);
> +	int			(*ndo_swdev_flow_remove)(struct net_device *dev,
> +							 const struct sw_flow *flow);

Not really a critique of your patch, but I'll need to extend this
with an ndo_swdev_flow_dump() to get the fields. Without this, if
your user space side ever restarts or gets out of sync, there is no
way to get back in sync.

Also, with hardware that has multiple flow tables we need to indicate
the table to insert the flow into. One concrete reason to do this is
to allow atomic updates of multiple ACLs: the idea is to create a new
ACL table, build it up, and then link it in. This can be added when
it's needed; my open source drivers don't support this yet either, but
maybe adding multiple tables to the rocker switch will help flush this
out.

Finally, we need some way to expose capabilities from the swdev.
Even the rocker switch needs this, to indicate that it doesn't support
matching on all the sw_flow fields. Without this it's not clear to me
how to manage the device from user space. I tried writing a user space
daemon for the simpler flow director interface, and the try-and-see
model breaks down quickly.

> +#endif
>   };
>
>   /**
> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
> index 21724f1..3af7758 100644
> --- a/include/net/sw_flow.h
> +++ b/include/net/sw_flow.h
> @@ -81,7 +81,21 @@ struct sw_flow_mask {
>   	struct sw_flow_key key;
>   };
>
> +enum sw_flow_action_type {
> +	SW_FLOW_ACTION_TYPE_OUTPUT,
> +	SW_FLOW_ACTION_TYPE_VLAN_PUSH,
> +	SW_FLOW_ACTION_TYPE_VLAN_POP,
> +};
> +

OK, my previous comment about having another patch to create
generic actions seems to be resolved here. I'm not sure how
important it is, but if we abstract the flow types away from
OVS, is there any reason not to reuse and relabel the action
types as well? I guess we can't break the userspace API, but
maybe a 1:1 mapping would be better?

>   struct sw_flow_action {
> +	enum sw_flow_action_type type;
> +	union {
> +		u32 out_port_ifindex;
> +		struct {
> +			__be16 vlan_proto;
> +			u16 vlan_tci;
> +		} vlan;
> +	};
>   };

[...]

I think my comments could be addressed with additional patches
if you want. I could help, but it will be another week or so
before I have some time. The biggest issue IMO is the lack of
capability queries.

Thanks,
John


-- 
John Fastabend         Intel Corporation


* Re: [patch net-next 10/13] openvswitch: add support for datapath hardware offload
       [not found]     ` <1409736300-12303-11-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03 16:37       ` John Fastabend
       [not found]         ` <540743B4.9080500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: John Fastabend @ 2014-09-03 16:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 09/03/2014 02:24 AM, Jiri Pirko wrote:
> Benefit from the possibility to work with flows in switch devices and
> use the swdev api to offload flow datapath.
>
> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
> ---
>   net/openvswitch/Makefile       |   3 +-
>   net/openvswitch/datapath.c     |  33 ++++++
>   net/openvswitch/datapath.h     |   3 +
>   net/openvswitch/flow_table.c   |   1 +
>   net/openvswitch/hw_offload.c   | 245 +++++++++++++++++++++++++++++++++++++++++
>   net/openvswitch/hw_offload.h   |  22 ++++
>   net/openvswitch/vport-netdev.c |   3 +
>   net/openvswitch/vport.h        |   2 +
>   8 files changed, 311 insertions(+), 1 deletion(-)
>   create mode 100644 net/openvswitch/hw_offload.c
>   create mode 100644 net/openvswitch/hw_offload.h
>
> diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
> index 3591cb5..5152437 100644
> --- a/net/openvswitch/Makefile
> +++ b/net/openvswitch/Makefile
> @@ -13,7 +13,8 @@ openvswitch-y := \
>   	flow_table.o \
>   	vport.o \
>   	vport-internal_dev.o \
> -	vport-netdev.o
> +	vport-netdev.o \
> +	hw_offload.o
>
>   ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
>   openvswitch-y += vport-vxlan.o
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 75bb07f..3e43e1d 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -57,6 +57,7 @@
>   #include "flow_netlink.h"
>   #include "vport-internal_dev.h"
>   #include "vport-netdev.h"
> +#include "hw_offload.h"
>
>   int ovs_net_id __read_mostly;
>
> @@ -864,6 +865,9 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
>   			acts = NULL;
>   			goto err_unlock_ovs;
>   		}
> +		error = ovs_hw_flow_insert(dp, new_flow);
> +		if (error)
> +			pr_warn("failed to insert flow into hw\n");

This is really close to silently failing. I think we need to
hard fail here somehow and push the error back to userspace as part
of the reply and ovs_notify.

Otherwise I don't know how to manage the hardware correctly. Consider
the case where the hardware table is full: user space will continue to
add rules, and they will be silently discarded. Similarly, if user
space adds a flow/action that cannot be supported by the hardware, it
will be silently ignored.

Even if we do careful accounting of resources in user space, we could
still get an ENOMEM error from sw_flow_action_create.

The same comment applies to the other hw commands, flush/remove.
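The hard-fail behaviour being asked for can be modelled as follows. This is a hypothetical sketch of the suggested semantics, not the actual datapath code: insert into the software table first, then the hardware table, and unwind the software insert when the hardware rejects the flow so the error reaches userspace instead of being logged and dropped.

```python
class HwTableFull(Exception):
    """Raised when the hardware flow table has no free entries."""

class HwFlowTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.flows = set()

    def insert(self, flow):
        if len(self.flows) >= self.capacity:
            raise HwTableFull(flow)
        self.flows.add(flow)

def flow_cmd_new(sw_flows, hw_table, flow):
    # Mirror of the suggested semantics: a hardware failure must not be
    # swallowed; unwind the software insert and propagate the error.
    sw_flows.add(flow)
    try:
        hw_table.insert(flow)
    except HwTableFull:
        sw_flows.discard(flow)
        raise
```

With this shape, a full hardware table surfaces as a failed flow-add in userspace, so the daemon can fall back to the software path (or stop adding rules) rather than silently losing offload.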

>   		if (unlikely(reply)) {
>   			error = ovs_flow_cmd_fill_info(new_flow,
> @@ -896,10 +900,18 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
>   				goto err_unlock_ovs;
>   			}
>   		}


[...]


Thanks,
John

-- 
John Fastabend         Intel Corporation


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-03  9:24 ` [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones Jiri Pirko
       [not found]   ` <1409736300-12303-2-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03 18:41   ` Pravin Shelar
  2014-09-03 21:22     ` Jamal Hadi Salim
       [not found]     ` <CALnjE+pscRmfhaWgkWCunJfjvG04RiNUAj6nefSFHrknQTC+xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 2 replies; 42+ messages in thread
From: Pravin Shelar @ 2014-09-03 18:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, David Miller, nhorman, andy, Thomas Graf,
	Daniel Borkmann, Or Gerlitz, Jesse Gross, Andy Zhou,
	Ben Hutchings, Stephen Hemminger, jeffrey.t.kirsher, vyasevic,
	Cong Wang, john.r.fastabend, Eric Dumazet, jhs, sfeldma,
	f.fainelli, roopa, John Linville, dev, jasowang, ebiederm,
	Nicolas Dichtel, ryazanov.s.a, buytenh, aviadr, nbd, alexei.s

On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> After this, flow related structures can be used in other code.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/net/sw_flow.h          |  99 ++++++++++++++++++++++++++++++++++
>  net/openvswitch/actions.c      |   3 +-
>  net/openvswitch/datapath.c     |  74 +++++++++++++-------------
>  net/openvswitch/datapath.h     |   4 +-
>  net/openvswitch/flow.c         |   6 +--
>  net/openvswitch/flow.h         | 102 +++++++----------------------------
>  net/openvswitch/flow_netlink.c |  53 +++++++++---------
>  net/openvswitch/flow_netlink.h |  10 ++--
>  net/openvswitch/flow_table.c   | 118 ++++++++++++++++++++++-------------------
>  net/openvswitch/flow_table.h   |  30 +++++------
>  net/openvswitch/vport-gre.c    |   4 +-
>  net/openvswitch/vport-vxlan.c  |   2 +-
>  net/openvswitch/vport.c        |   2 +-
>  net/openvswitch/vport.h        |   2 +-
>  14 files changed, 276 insertions(+), 233 deletions(-)
>  create mode 100644 include/net/sw_flow.h
>
> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
> new file mode 100644
> index 0000000..21724f1
> --- /dev/null
> +++ b/include/net/sw_flow.h
> @@ -0,0 +1,99 @@
> +/*
> + * include/net/sw_flow.h - Generic switch flow structures
> + * Copyright (c) 2007-2012 Nicira, Inc.
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#ifndef _NET_SW_FLOW_H_
> +#define _NET_SW_FLOW_H_
> +
> +struct sw_flow_key_ipv4_tunnel {
> +       __be64 tun_id;
> +       __be32 ipv4_src;
> +       __be32 ipv4_dst;
> +       __be16 tun_flags;
> +       u8   ipv4_tos;
> +       u8   ipv4_ttl;
> +};
> +
> +struct sw_flow_key {
> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> +       struct {
> +               u32     priority;       /* Packet QoS priority. */
> +               u32     skb_mark;       /* SKB mark. */
> +               u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
> +       } __packed phy; /* Safe when right after 'tun_key'. */
> +       struct {
> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
> +               u8     dst[ETH_ALEN];   /* Ethernet destination address. */
> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
> +               __be16 type;            /* Ethernet frame type. */
> +       } eth;
> +       struct {
> +               u8     proto;           /* IP protocol or lower 8 bits of ARP opcode. */
> +               u8     tos;             /* IP ToS. */
> +               u8     ttl;             /* IP TTL/hop limit. */
> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
> +       } ip;
> +       struct {
> +               __be16 src;             /* TCP/UDP/SCTP source port. */
> +               __be16 dst;             /* TCP/UDP/SCTP destination port. */
> +               __be16 flags;           /* TCP flags. */
> +       } tp;
> +       union {
> +               struct {
> +                       struct {
> +                               __be32 src;     /* IP source address. */
> +                               __be32 dst;     /* IP destination address. */
> +                       } addr;
> +                       struct {
> +                               u8 sha[ETH_ALEN];       /* ARP source hardware address. */
> +                               u8 tha[ETH_ALEN];       /* ARP target hardware address. */
> +                       } arp;
> +               } ipv4;
> +               struct {
> +                       struct {
> +                               struct in6_addr src;    /* IPv6 source address. */
> +                               struct in6_addr dst;    /* IPv6 destination address. */
> +                       } addr;
> +                       __be32 label;                   /* IPv6 flow label. */
> +                       struct {
> +                               struct in6_addr target; /* ND target address. */
> +                               u8 sll[ETH_ALEN];       /* ND source link layer address. */
> +                               u8 tll[ETH_ALEN];       /* ND target link layer address. */
> +                       } nd;
> +               } ipv6;
> +       };
> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
> +

The HW offload API should be separate from the OVS module. This has
the following advantages:
1. It can be managed by the OVS userspace vswitchd process, which has
much better context for setting up the hardware flow table. Once we
add capabilities for swdev, it is much easier for the vswitchd process
to choose the correct (hw or sw) flow table for a given flow.
2. Other applications that want to use HW offload do not have a
dependency on the OVS kernel module.
3. The hardware and software datapaths remain separate; the two
components have no dependency on each other and can be developed
independently.
Thanks.


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]       ` <540731B9.4010603-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-03 18:42         ` Pravin Shelar
       [not found]           ` <CALnjE+rk26Om1O5_Q=8tn7eAyh4Ywen-1+UD_nCVj_geZY1HuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-09-04 12:09         ` Jiri Pirko
  1 sibling, 1 reply; 42+ messages in thread
From: Pravin Shelar @ 2014-09-03 18:42 UTC (permalink / raw)
  To: John Fastabend
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ, Eric Dumazet,
	andy-QlMahl40kYEqcZcGjlUOXw, dev-yBygre7rU0TnMu66kgdUjQ,
	nbd-p3rKhJxN3npAfugRpC6u6w, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	Rony Efraim, jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	Or Gerlitz, Ben Hutchings, buytenh-OLH4Qvv75CYX/NnBR394Jw,
	Jiri Pirko, roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	Nicolas Dichtel, vyasevic-H+wXaHxf7aLQT0dZR+AlfA,
	nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev, Stephen Hemminger,
	Daniel Borkmann, ebiederm-aS9lmoZGLiVWk0Htik3J/w, David Miller

On Wed, Sep 3, 2014 at 8:20 AM, John Fastabend <john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>
>> After this, flow related structures can be used in other code.
>>
>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>> ---
>
>
> Hi Jiri,
>
> As I indicated before I'm looking into integrating this with some
> hardware here. Progress is a bit slow but starting to look at it.The
> i40e/ixgbe driver being one open source example with very limited
> support for tables, flow matches, etc. And then a closed source driver
> with much more flexibility. What I don't have is a middle of the road
> switch to work with something better then a host nic but not as
> flexible as a TOR.
>
> Couple questions my assumption here is I can extend the flow_key
> as needed to support additional match criteria my hardware has.
> I scanned the ./net/openvswitch source and I didn't catch any
> place that would break but might need to take a closer look.
> Similarly the actions set will need to be extended. For example
> if I want to use this with i40e a OVS_ACTION_ATTR_QUEUE could
> be used to steer packets to the queue. With this in mind we
> will want a follow up patch to rename OVS_ACTION_ATTR_* to
> FLOW_ACTION_ATTR_*
>

struct sw_flow_key is an internal structure of OVS, designed for good
flow-table performance. Adding hw-specific fields to sw_flow_key
increases the flow-key size, and that has a negative impact on OVS
software switching performance. Therefore it is better not to share
this internal structure with the driver interface.

Thanks.
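The performance concern is concrete: the struct is `__aligned(BITS_PER_LONG/8)` precisely so lookups can mask and compare it one `long` at a time, meaning every field added to the key widens every masked comparison in the fast path. A rough userspace sketch of such a long-wise masked compare (modeled loosely on what a flow table does, not copied from the OVS code):

```c
#include <stddef.h>
#include <stdbool.h>

/* Stand-in key; the real sw_flow_key is aligned so that it can be
 * walked as an array of longs. */
struct demo_key {
	unsigned long words[4];
};

/* Masked compare over the [start, end) byte range, one long at a
 * time: keys match if (a ^ b) & mask == 0 for every word. */
static bool masked_key_equal(const struct demo_key *a,
			     const struct demo_key *b,
			     const struct demo_key *mask,
			     size_t start, size_t end)
{
	const unsigned long *pa = a->words;
	const unsigned long *pb = b->words;
	const unsigned long *pm = mask->words;
	size_t i;

	for (i = start / sizeof(long); i < end / sizeof(long); i++)
		if ((pa[i] ^ pb[i]) & pm[i])
			return false;
	return true;
}
```

Every extra field grows `words[]`, and with it the cost of every lookup, which is the argument above for keeping hardware-only match fields out of this structure.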

> Also I have some filters that can match on offset/length/mask
> tuples. As far as I can tell this is going to have to be yet
> another interface? Or would it be worth the effort to define
> the flow key more generically. My initial guess is I'll just
> write a separate interface. I think this is what Jamal referred
> to as another "classifier".
>
> Thanks,
> John
>
> [...]
>
>
>> +
>> +struct sw_flow_key_ipv4_tunnel {
>> +       __be64 tun_id;
>> +       __be32 ipv4_src;
>> +       __be32 ipv4_dst;
>> +       __be16 tun_flags;
>> +       u8   ipv4_tos;
>> +       u8   ipv4_ttl;
>> +};
>> +
>> +struct sw_flow_key {
>> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel
>> key. */
>> +       struct {
>> +               u32     priority;       /* Packet QoS priority. */
>> +               u32     skb_mark;       /* SKB mark. */
>> +               u16     in_port;        /* Input switch port (or
>> DP_MAX_PORTS). */
>> +       } __packed phy; /* Safe when right after 'tun_key'. */
>> +       struct {
>> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
>> +               u8     dst[ETH_ALEN];   /* Ethernet destination address.
>> */
>> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT
>> set otherwise. */
>> +               __be16 type;            /* Ethernet frame type. */
>> +       } eth;
>> +       struct {
>> +               u8     proto;           /* IP protocol or lower 8 bits of
>> ARP opcode. */
>> +               u8     tos;             /* IP ToS. */
>> +               u8     ttl;             /* IP TTL/hop limit. */
>> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
>> +       } ip;
>> +       struct {
>> +               __be16 src;             /* TCP/UDP/SCTP source port. */
>> +               __be16 dst;             /* TCP/UDP/SCTP destination port.
>> */
>> +               __be16 flags;           /* TCP flags. */
>> +       } tp;
>> +       union {
>> +               struct {
>> +                       struct {
>> +                               __be32 src;     /* IP source address. */
>> +                               __be32 dst;     /* IP destination address.
>> */
>> +                       } addr;
>> +                       struct {
>> +                               u8 sha[ETH_ALEN];       /* ARP source
>> hardware address. */
>> +                               u8 tha[ETH_ALEN];       /* ARP target
>> hardware address. */
>> +                       } arp;
>> +               } ipv4;
>> +               struct {
>> +                       struct {
>> +                               struct in6_addr src;    /* IPv6 source
>> address. */
>> +                               struct in6_addr dst;    /* IPv6
>> destination address. */
>> +                       } addr;
>> +                       __be32 label;                   /* IPv6 flow
>> label. */
>> +                       struct {
>> +                               struct in6_addr target; /* ND target
>> address. */
>> +                               u8 sll[ETH_ALEN];       /* ND source link
>> layer address. */
>> +                               u8 tll[ETH_ALEN];       /* ND target link
>> layer address. */
>> +                       } nd;
>> +               } ipv6;
>> +       };
>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as
>> longs. */
>> +
>> +struct sw_flow_key_range {
>> +       unsigned short int start;
>> +       unsigned short int end;
>> +};
>> +
>> +struct sw_flow_mask {
>> +       struct sw_flow_key_range range;
>> +       struct sw_flow_key key;
>> +};
>> +
>> +struct sw_flow_action {
>> +};
>> +
>> +struct sw_flow_actions {
>> +       unsigned count;
>> +       struct sw_flow_action actions[0];
>> +};
>> +
>> +struct sw_flow {
>> +       struct sw_flow_key key;
>> +       struct sw_flow_key unmasked_key;
>> +       struct sw_flow_mask *mask;
>> +       struct sw_flow_actions *actions;
>> +};
>> +
>
>
>
> --
> John Fastabend         Intel Corporation


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-03 15:20     ` John Fastabend
       [not found]       ` <540731B9.4010603-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-03 21:11       ` Jamal Hadi Salim
  1 sibling, 0 replies; 42+ messages in thread
From: Jamal Hadi Salim @ 2014-09-03 21:11 UTC (permalink / raw)
  To: John Fastabend, Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, dev, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye

On 09/03/14 11:20, John Fastabend wrote:

> Also I have some filters that can match on offset/length/mask
> tuples. As far as I can tell this is going to have to be yet
> another interface? Or would it be worth the effort to define
> the flow key more generically. My initial guess is I'll just
> write a separate interface. I think this is what Jamal referred
> to as another "classifier".
>

Exactly. I have more complex classifiers as stated earlier.
I am afraid these patches again are not satisfying that need.

In any case - we are taking a different tact than these patches
do and hopefully at some point we can merge thoughts.

cheers,
jamal


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-03 18:41   ` Pravin Shelar
@ 2014-09-03 21:22     ` Jamal Hadi Salim
       [not found]       ` <54078694.5040104-jkUAjuhPggJWk0Htik3J/w@public.gmane.org>
       [not found]     ` <CALnjE+pscRmfhaWgkWCunJfjvG04RiNUAj6nefSFHrknQTC+xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Jamal Hadi Salim @ 2014-09-03 21:22 UTC (permalink / raw)
  To: Pravin Shelar, Jiri Pirko
  Cc: netdev, David Miller, nhorman, andy, Thomas Graf,
	Daniel Borkmann, Or Gerlitz, Jesse Gross, Andy Zhou,
	Ben Hutchings, Stephen Hemminger, jeffrey.t.kirsher, vyasevic,
	Cong Wang, john.r.fastabend, Eric Dumazet, sfeldma, f.fainelli,
	roopa, John Linville, dev, jasowang, ebiederm, Nicolas Dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov

On 09/03/14 14:41, Pravin Shelar wrote:
> On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri@resnulli.us> wrote:

> HW offload API should be separate from OVS module.

The above part I agree with. In fact it is very odd that it seems
hard to get this point across ;->

> This has following
> advantages.
> 1. It can be managed by OVS userspace vswitchd process which has much
> better context to setup hardware flow table. Once we add capabilities
> for swdev, it is much more easier for vswitchd process to choose
> correct (hw or sw) flow table for given flow.

This I disagree with.
The desire is to have existing user tools work with offloads.
When necessary, we then create new tools.
Existing tools may need to be taught to selectively do
hardware vs software offload. We have a precedent with the
bridging code, which selectively offloads to hardware using iproute2.

> 2. Other application that wants to use HW offload does not have
> dependency on OVS kernel module.

Or on OF for that matter.

> 3. Hardware and software datapath remains separate, these two
> components has no dependency on each other, both can be developed
> independent of each other.
>

The basic definition of "offload" implies dependency ;-> So
I strongly disagree. You may need to go back and look at the
views expressed on this (other than emails - there's slideware).

cheers,
jamal


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]       ` <54078694.5040104-jkUAjuhPggJWk0Htik3J/w@public.gmane.org>
@ 2014-09-03 21:59         ` Pravin Shelar
       [not found]           ` <CALnjE+qUqSK7kHSi5BZuA0hzFjMcZ8TCTd9JRG1PPmMfDmAQOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Pravin Shelar @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: ryazanov.s.a, jasowang, john.r.fastabend, neil.jerram,
	Eric Dumazet, andy, dev-yBygre7rU0TnMu66kgdUjQ, nbd, f.fainelli,
	Rony Efraim, jeffrey.t.kirsher, Or Gerlitz, Ben Hutchings,
	buytenh, Jiri Pirko, roopa, aviadr, Nicolas Dichtel, vyasevic,
	nhorman, netdev, Stephen Hemminger, Daniel Borkmann, ebiederm

On Wed, Sep 3, 2014 at 2:22 PM, Jamal Hadi Salim <jhs-jkUAjuhPggJWk0Htik3J/w@public.gmane.org> wrote:
> On 09/03/14 14:41, Pravin Shelar wrote:
>>
>> On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org> wrote:
>
>
>> HW offload API should be separate from OVS module.
>
>
> The above part i agree with. Infact it is very odd that it seems
> hard to get this point across ;->
>
>
>> This has following
>> advantages.
>> 1. It can be managed by OVS userspace vswitchd process which has much
>> better context to setup hardware flow table. Once we add capabilities
>> for swdev, it is much more easier for vswitchd process to choose
>> correct (hw or sw) flow table for given flow.
>
>
> This i disagree with.
> The desire is to have existing user tools to work with offloads.
> When necessary, we then create new tools.
> Existing tools may need to be taught to do selectively do
> hardware vs software offload. We have a precedence with
> bridging code which selectively offloads to hardware using iproute2.
>
Both of us are saying the same thing.
What I meant was that for the OVS use-case, where OVS wants to use
offload for switching flows, the vswitchd userspace process can program
HW offload using the kernel HW offload APIs directly from userspace,
rather than going through the OVS kernel module. If a user wants to use
some other tool, that tool can use the same kernel HW offload APIs.

>
>> 2. Other application that wants to use HW offload does not have
>> dependency on OVS kernel module.
>
>
> Or on OF for that matter.
>
>
>> 3. Hardware and software datapath remains separate, these two
>> components has no dependency on each other, both can be developed
>> independent of each other.
>>
>
> The basic definition of "offload" implies dependency;-> So,
> I strongly disagree. You may need to go backwards and look at
> views expressed on this (other than emails - theres slideware).
>

I was referring to code dependency in the kernel, for example the OVS
flow-key structure. Sharing it complicates the OVS internal structure,
and OVS might also need to extend its interface for configuring HW
matches or actions that do not exist in the OVS software datapath.

I agree that these two components are related, and that dependency can
be handled from userspace.


* Re: [patch net-next 07/13] dsa: implement ndo_swdev_get_id
       [not found]     ` <1409736300-12303-8-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
@ 2014-09-03 23:20       ` Florian Fainelli
       [not found]         ` <5407A25A.8050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Florian Fainelli @ 2014-09-03 23:20 UTC (permalink / raw)
  To: Jiri Pirko, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 09/03/2014 02:24 AM, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
> ---
>  include/linux/netdevice.h |  3 ++-
>  include/net/dsa.h         |  1 +
>  net/dsa/Kconfig           |  2 +-
>  net/dsa/dsa.c             |  3 +++
>  net/dsa/slave.c           | 10 ++++++++++
>  5 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6a009d1..7ee070f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -41,7 +41,6 @@
>  
>  #include <linux/ethtool.h>
>  #include <net/net_namespace.h>
> -#include <net/dsa.h>
>  #ifdef CONFIG_DCB
>  #include <net/dcbnl.h>
>  #endif
> @@ -1259,6 +1258,8 @@ enum netdev_priv_flags {
>  #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
>  #define IFF_MACVLAN			IFF_MACVLAN
>  
> +#include <net/dsa.h>
> +
>  /**
>   *	struct net_device - The DEVICE structure.
>   *		Actually, this whole structure is a big mistake.  It mixes I/O
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 9771292..d60cd42 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -140,6 +140,7 @@ struct dsa_switch {
>  	u32			phys_mii_mask;
>  	struct mii_bus		*slave_mii_bus;
>  	struct net_device	*ports[DSA_MAX_PORTS];
> +	struct netdev_phys_item_id psid;
>  };
>  
>  static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
> index a585fd6..4e144a2 100644
> --- a/net/dsa/Kconfig
> +++ b/net/dsa/Kconfig
> @@ -1,6 +1,6 @@
>  config HAVE_NET_DSA
>  	def_bool y
> -	depends on NETDEVICES && !S390
> +	depends on NETDEVICES && NET_SWITCHDEV && !S390

It does not look like this is necessary; we are only using definitions
from net/dsa.h and include/linux/netdevice.h. And if it were necessary,
a 'select' would be more appropriate here, I think.

TBH, I think we should rather drop this patch for now; I do not see any
benefit in providing a random id over no id at all.
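For illustration only, the 'select' form suggested above would look
roughly like this (whether HAVE_NET_DSA or NET_DSA itself should do the
selecting is a separate judgment call):

```kconfig
config HAVE_NET_DSA
	def_bool y
	depends on NETDEVICES && !S390
	select NET_SWITCHDEV
```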

>  
>  # Drivers must select NET_DSA and the appropriate tagging format
>  
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 61f145c..374912d 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -202,6 +202,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
>  		ds->ports[i] = slave_dev;
>  	}
>  
> +	ds->psid.id_len = MAX_PHYS_ITEM_ID_LEN;
> +	get_random_bytes(ds->psid.id, ds->psid.id_len);
> +
>  	return ds;
>  
>  out_free:
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 7333a4a..d79a6c7 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -192,6 +192,15 @@ static netdev_tx_t dsa_slave_notag_xmit(struct sk_buff *skb,
>  	return NETDEV_TX_OK;
>  }
>  
> +static int dsa_slave_swdev_get_id(struct net_device *dev,
> +				  struct netdev_phys_item_id *psid)
> +{
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->parent;
> +
> +	memcpy(psid, &ds->psid, sizeof(*psid));
> +	return 0;
> +}
>  
>  /* ethtool operations *******************************************************/
>  static int
> @@ -323,6 +332,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
>  	.ndo_set_rx_mode	= dsa_slave_set_rx_mode,
>  	.ndo_set_mac_address	= dsa_slave_set_mac_address,
>  	.ndo_do_ioctl		= dsa_slave_ioctl,
> +	.ndo_swdev_get_id	= dsa_slave_swdev_get_id,
>  };
>  
>  static const struct dsa_device_ops notag_netdev_ops = {
> 


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]           ` <CALnjE+qUqSK7kHSi5BZuA0hzFjMcZ8TCTd9JRG1PPmMfDmAQOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-09-04  1:54             ` Jamal Hadi Salim
  0 siblings, 0 replies; 42+ messages in thread
From: Jamal Hadi Salim @ 2014-09-04  1:54 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: ryazanov.s.a, jasowang, john.r.fastabend, neil.jerram,
	Eric Dumazet, andy, dev-yBygre7rU0TnMu66kgdUjQ, nbd, f.fainelli,
	Rony Efraim, jeffrey.t.kirsher, Or Gerlitz, Ben Hutchings,
	buytenh, Jiri Pirko, roopa, aviadr, Nicolas Dichtel, vyasevic,
	nhorman, netdev, Stephen Hemminger, Daniel Borkmann, ebiederm

On 09/03/14 17:59, Pravin Shelar wrote:

> Both of us are saying same thing.
> What I meant was for OVS use-case, where OVS wants to use offload for
> switching flows, vswitchd userspace process can program HW offload
> using kernel HW offload APIs directly from userspace, rather than
> going through OVS kernel module. If user wants to use some other tool,
> then the tool can use same kernel HW offload APIs.

Ok, sorry, you are right - we are saying the same thing.

cheers,
jamal


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]       ` <540731B9.4010603-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-09-03 18:42         ` Pravin Shelar
@ 2014-09-04 12:09         ` Jiri Pirko
  1 sibling, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Wed, Sep 03, 2014 at 05:20:25PM CEST, john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>After this, flow related structures can be used in other code.
>>
>>Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>---
>
>Hi Jiri,
>
>As I indicated before I'm looking into integrating this with some
>hardware here. Progress is a bit slow but starting to look at it.The
>i40e/ixgbe driver being one open source example with very limited
>support for tables, flow matches, etc. And then a closed source driver
>with much more flexibility. What I don't have is a middle of the road
>switch to work with something better then a host nic but not as
>flexible as a TOR.
>
>Couple questions my assumption here is I can extend the flow_key
>as needed to support additional match criteria my hardware has.
>I scanned the ./net/openvswitch source and I didn't catch any
>place that would break but might need to take a closer look.
>Similarly the actions set will need to be extended. For example
>if I want to use this with i40e a OVS_ACTION_ATTR_QUEUE could
>be used to steer packets to the queue. With this in mind we
>will want a follow up patch to rename OVS_ACTION_ATTR_* to
>FLOW_ACTION_ATTR_*
>
>Also I have some filters that can match on offset/length/mask
>tuples. As far as I can tell this is going to have to be yet
>another interface? Or would it be worth the effort to define
>the flow key more generically. My initial guess is I'll just
>write a separate interface. I think this is what Jamal referred
>to as another "classifier".

I'm thinking about using a more generic match key. It would
incorporate the ovs key and possibly other classifiers (such as your
off/len/mask) as well. Drivers will be free to implement whatever
the hw supports.

I will do it for the next version of the patchset (most probably after
I return from holiday, Sep 15).
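An offset/length/mask classifier of the kind John describes could be expressed generically along these lines. This is a sketch only; the struct and field names are invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* One generic match component: compare `len` bytes of packet data at
 * byte `offset` against `value`, under `mask` (up to 8 bytes in this
 * toy version). */
struct raw_match {
	size_t  offset;
	size_t  len;
	uint8_t value[8];
	uint8_t mask[8];
};

static bool raw_match_packet(const struct raw_match *m,
			     const uint8_t *pkt, size_t pkt_len)
{
	size_t i;

	if (m->offset + m->len > pkt_len)
		return false;
	for (i = 0; i < m->len; i++)
		if ((pkt[m->offset + i] ^ m->value[i]) & m->mask[i])
			return false;
	return true;
}
```

For example, matching the EtherType field means offset 12, len 2 on an untagged Ethernet frame; a driver would translate such descriptors into whatever its hardware TCAM or filter block supports.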


>
>Thanks,
>John
>
>[...]
>
>>+
>>+struct sw_flow_key_ipv4_tunnel {
>>+	__be64 tun_id;
>>+	__be32 ipv4_src;
>>+	__be32 ipv4_dst;
>>+	__be16 tun_flags;
>>+	u8   ipv4_tos;
>>+	u8   ipv4_ttl;
>>+};
>>+
>>+struct sw_flow_key {
>>+	struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>>+	struct {
>>+		u32	priority;	/* Packet QoS priority. */
>>+		u32	skb_mark;	/* SKB mark. */
>>+		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
>>+	} __packed phy; /* Safe when right after 'tun_key'. */
>>+	struct {
>>+		u8     src[ETH_ALEN];	/* Ethernet source address. */
>>+		u8     dst[ETH_ALEN];	/* Ethernet destination address. */
>>+		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
>>+		__be16 type;		/* Ethernet frame type. */
>>+	} eth;
>>+	struct {
>>+		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
>>+		u8     tos;		/* IP ToS. */
>>+		u8     ttl;		/* IP TTL/hop limit. */
>>+		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
>>+	} ip;
>>+	struct {
>>+		__be16 src;		/* TCP/UDP/SCTP source port. */
>>+		__be16 dst;		/* TCP/UDP/SCTP destination port. */
>>+		__be16 flags;		/* TCP flags. */
>>+	} tp;
>>+	union {
>>+		struct {
>>+			struct {
>>+				__be32 src;	/* IP source address. */
>>+				__be32 dst;	/* IP destination address. */
>>+			} addr;
>>+			struct {
>>+				u8 sha[ETH_ALEN];	/* ARP source hardware address. */
>>+				u8 tha[ETH_ALEN];	/* ARP target hardware address. */
>>+			} arp;
>>+		} ipv4;
>>+		struct {
>>+			struct {
>>+				struct in6_addr src;	/* IPv6 source address. */
>>+				struct in6_addr dst;	/* IPv6 destination address. */
>>+			} addr;
>>+			__be32 label;			/* IPv6 flow label. */
>>+			struct {
>>+				struct in6_addr target;	/* ND target address. */
>>+				u8 sll[ETH_ALEN];	/* ND source link layer address. */
>>+				u8 tll[ETH_ALEN];	/* ND target link layer address. */
>>+			} nd;
>>+		} ipv6;
>>+	};
>>+} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>>+
>>+struct sw_flow_key_range {
>>+	unsigned short int start;
>>+	unsigned short int end;
>>+};
>>+
>>+struct sw_flow_mask {
>>+	struct sw_flow_key_range range;
>>+	struct sw_flow_key key;
>>+};
>>+
>>+struct sw_flow_action {
>>+};
>>+
>>+struct sw_flow_actions {
>>+	unsigned count;
>>+	struct sw_flow_action actions[0];
>>+};
>>+
>>+struct sw_flow {
>>+	struct sw_flow_key key;
>>+	struct sw_flow_key unmasked_key;
>>+	struct sw_flow_mask *mask;
>>+	struct sw_flow_actions *actions;
>>+};
>>+
>
>
>-- 
>John Fastabend         Intel Corporation


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]           ` <CALnjE+rk26Om1O5_Q=8tn7eAyh4Ywen-1+UD_nCVj_geZY1HuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-09-04 12:25             ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:25 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w, Rony Efraim,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ, Eric Dumazet,
	andy-QlMahl40kYEqcZcGjlUOXw, dev-yBygre7rU0TnMu66kgdUjQ,
	nbd-p3rKhJxN3npAfugRpC6u6w, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	John Fastabend, jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	Or Gerlitz, Ben Hutchings, buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	Nicolas Dichtel, vyasevic-H+wXaHxf7aLQT0dZR+AlfA,
	nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev, Stephen Hemminger,
	Daniel Borkmann, ebiederm-aS9lmoZGLiVWk0Htik3J/w, David Miller

Wed, Sep 03, 2014 at 08:42:18PM CEST, pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org wrote:
>On Wed, Sep 3, 2014 at 8:20 AM, John Fastabend <john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>>
>>> After this, flow related structures can be used in other code.
>>>
>>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>> ---
>>
>>
>> Hi Jiri,
>>
>> As I indicated before I'm looking into integrating this with some
>> hardware here. Progress is a bit slow but starting to look at it.The
>> i40e/ixgbe driver being one open source example with very limited
>> support for tables, flow matches, etc. And then a closed source driver
>> with much more flexibility. What I don't have is a middle of the road
>> switch to work with something better then a host nic but not as
>> flexible as a TOR.
>>
>> Couple questions my assumption here is I can extend the flow_key
>> as needed to support additional match criteria my hardware has.
>> I scanned the ./net/openvswitch source and I didn't catch any
>> place that would break but might need to take a closer look.
>> Similarly the actions set will need to be extended. For example
>> if I want to use this with i40e a OVS_ACTION_ATTR_QUEUE could
>> be used to steer packets to the queue. With this in mind we
>> will want a follow up patch to rename OVS_ACTION_ATTR_* to
>> FLOW_ACTION_ATTR_*
>>
>
>struct sw_flow_key is internal structure of OVS, it is designed to
>have better flow-table performance. By adding hw specific fields in
>sw_flow_key, it increase flow-key size and that has negative impact on
>OVS software switching performance. Therefore it is better not to
>share this internal structure with driver interface.

OK. I will split this, leaving sw_flow_key in OVS and introducing a
new generic one. Thanks.

>
>Thanks.
>
>> Also I have some filters that can match on offset/length/mask
>> tuples. As far as I can tell this is going to have to be yet
>> another interface? Or would it be worth the effort to define
>> the flow key more generically. My initial guess is I'll just
>> write a separate interface. I think this is what Jamal referred
>> to as another "classifier".
>>
>> Thanks,
>> John
>>
>> [...]
>>
>>
>>> +
>>> +struct sw_flow_key_ipv4_tunnel {
>>> +       __be64 tun_id;
>>> +       __be32 ipv4_src;
>>> +       __be32 ipv4_dst;
>>> +       __be16 tun_flags;
>>> +       u8   ipv4_tos;
>>> +       u8   ipv4_ttl;
>>> +};
>>> +
>>> +struct sw_flow_key {
>>> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel
>>> key. */
>>> +       struct {
>>> +               u32     priority;       /* Packet QoS priority. */
>>> +               u32     skb_mark;       /* SKB mark. */
>>> +               u16     in_port;        /* Input switch port (or
>>> DP_MAX_PORTS). */
>>> +       } __packed phy; /* Safe when right after 'tun_key'. */
>>> +       struct {
>>> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
>>> +               u8     dst[ETH_ALEN];   /* Ethernet destination address.
>>> */
>>> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT
>>> set otherwise. */
>>> +               __be16 type;            /* Ethernet frame type. */
>>> +       } eth;
>>> +       struct {
>>> +               u8     proto;           /* IP protocol or lower 8 bits of
>>> ARP opcode. */
>>> +               u8     tos;             /* IP ToS. */
>>> +               u8     ttl;             /* IP TTL/hop limit. */
>>> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
>>> +       } ip;
>>> +       struct {
>>> +               __be16 src;             /* TCP/UDP/SCTP source port. */
>>> +               __be16 dst;             /* TCP/UDP/SCTP destination port.
>>> */
>>> +               __be16 flags;           /* TCP flags. */
>>> +       } tp;
>>> +       union {
>>> +               struct {
>>> +                       struct {
>>> +                               __be32 src;     /* IP source address. */
>>> +                               __be32 dst;     /* IP destination address.
>>> */
>>> +                       } addr;
>>> +                       struct {
>>> +                               u8 sha[ETH_ALEN];       /* ARP source
>>> hardware address. */
>>> +                               u8 tha[ETH_ALEN];       /* ARP target
>>> hardware address. */
>>> +                       } arp;
>>> +               } ipv4;
>>> +               struct {
>>> +                       struct {
>>> +                               struct in6_addr src;    /* IPv6 source
>>> address. */
>>> +                               struct in6_addr dst;    /* IPv6
>>> destination address. */
>>> +                       } addr;
>>> +                       __be32 label;                   /* IPv6 flow
>>> label. */
>>> +                       struct {
>>> +                               struct in6_addr target; /* ND target
>>> address. */
>>> +                               u8 sll[ETH_ALEN];       /* ND source link
>>> layer address. */
>>> +                               u8 tll[ETH_ALEN];       /* ND target link
>>> layer address. */
>>> +                       } nd;
>>> +               } ipv6;
>>> +       };
>>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as
>>> longs. */
>>> +
>>> +struct sw_flow_key_range {
>>> +       unsigned short int start;
>>> +       unsigned short int end;
>>> +};
>>> +
>>> +struct sw_flow_mask {
>>> +       struct sw_flow_key_range range;
>>> +       struct sw_flow_key key;
>>> +};
>>> +
>>> +struct sw_flow_action {
>>> +};
>>> +
>>> +struct sw_flow_actions {
>>> +       unsigned count;
>>> +       struct sw_flow_action actions[0];
>>> +};
>>> +
>>> +struct sw_flow {
>>> +       struct sw_flow_key key;
>>> +       struct sw_flow_key unmasked_key;
>>> +       struct sw_flow_mask *mask;
>>> +       struct sw_flow_actions *actions;
>>> +};
>>> +
>>
>>
>>
>> --
>> John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]     ` <CALnjE+pscRmfhaWgkWCunJfjvG04RiNUAj6nefSFHrknQTC+xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-09-04 12:33       ` Jiri Pirko
       [not found]         ` <20140904123323.GF1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:33 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ, Eric Dumazet,
	andy-QlMahl40kYEqcZcGjlUOXw, dev-yBygre7rU0TnMu66kgdUjQ,
	nbd-p3rKhJxN3npAfugRpC6u6w, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	Rony Efraim, jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	Or Gerlitz, Ben Hutchings, buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	Nicolas Dichtel, vyasevic-H+wXaHxf7aLQT0dZR+AlfA,
	nhorman-2XuSBdqkA4R54TAoqtyWWQ, netdev, Stephen Hemminger,
	Daniel Borkmann, ebiederm-aS9lmoZGLiVWk0Htik3J/w, David Miller

Wed, Sep 03, 2014 at 08:41:39PM CEST, pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org wrote:
>On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org> wrote:
>> After this, flow related structures can be used in other code.
>>
>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>> ---
>>  include/net/sw_flow.h          |  99 ++++++++++++++++++++++++++++++++++
>>  net/openvswitch/actions.c      |   3 +-
>>  net/openvswitch/datapath.c     |  74 +++++++++++++-------------
>>  net/openvswitch/datapath.h     |   4 +-
>>  net/openvswitch/flow.c         |   6 +--
>>  net/openvswitch/flow.h         | 102 +++++++----------------------------
>>  net/openvswitch/flow_netlink.c |  53 +++++++++---------
>>  net/openvswitch/flow_netlink.h |  10 ++--
>>  net/openvswitch/flow_table.c   | 118 ++++++++++++++++++++++-------------------
>>  net/openvswitch/flow_table.h   |  30 +++++------
>>  net/openvswitch/vport-gre.c    |   4 +-
>>  net/openvswitch/vport-vxlan.c  |   2 +-
>>  net/openvswitch/vport.c        |   2 +-
>>  net/openvswitch/vport.h        |   2 +-
>>  14 files changed, 276 insertions(+), 233 deletions(-)
>>  create mode 100644 include/net/sw_flow.h
>>
>> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
>> new file mode 100644
>> index 0000000..21724f1
>> --- /dev/null
>> +++ b/include/net/sw_flow.h
>> @@ -0,0 +1,99 @@
>> +/*
>> + * include/net/sw_flow.h - Generic switch flow structures
>> + * Copyright (c) 2007-2012 Nicira, Inc.
>> + * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +#ifndef _NET_SW_FLOW_H_
>> +#define _NET_SW_FLOW_H_
>> +
>> +struct sw_flow_key_ipv4_tunnel {
>> +       __be64 tun_id;
>> +       __be32 ipv4_src;
>> +       __be32 ipv4_dst;
>> +       __be16 tun_flags;
>> +       u8   ipv4_tos;
>> +       u8   ipv4_ttl;
>> +};
>> +
>> +struct sw_flow_key {
>> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>> +       struct {
>> +               u32     priority;       /* Packet QoS priority. */
>> +               u32     skb_mark;       /* SKB mark. */
>> +               u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>> +       } __packed phy; /* Safe when right after 'tun_key'. */
>> +       struct {
>> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
>> +               u8     dst[ETH_ALEN];   /* Ethernet destination address. */
>> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
>> +               __be16 type;            /* Ethernet frame type. */
>> +       } eth;
>> +       struct {
>> +               u8     proto;           /* IP protocol or lower 8 bits of ARP opcode. */
>> +               u8     tos;             /* IP ToS. */
>> +               u8     ttl;             /* IP TTL/hop limit. */
>> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
>> +       } ip;
>> +       struct {
>> +               __be16 src;             /* TCP/UDP/SCTP source port. */
>> +               __be16 dst;             /* TCP/UDP/SCTP destination port. */
>> +               __be16 flags;           /* TCP flags. */
>> +       } tp;
>> +       union {
>> +               struct {
>> +                       struct {
>> +                               __be32 src;     /* IP source address. */
>> +                               __be32 dst;     /* IP destination address. */
>> +                       } addr;
>> +                       struct {
>> +                               u8 sha[ETH_ALEN];       /* ARP source hardware address. */
>> +                               u8 tha[ETH_ALEN];       /* ARP target hardware address. */
>> +                       } arp;
>> +               } ipv4;
>> +               struct {
>> +                       struct {
>> +                               struct in6_addr src;    /* IPv6 source address. */
>> +                               struct in6_addr dst;    /* IPv6 destination address. */
>> +                       } addr;
>> +                       __be32 label;                   /* IPv6 flow label. */
>> +                       struct {
>> +                               struct in6_addr target; /* ND target address. */
>> +                               u8 sll[ETH_ALEN];       /* ND source link layer address. */
>> +                               u8 tll[ETH_ALEN];       /* ND target link layer address. */
>> +                       } nd;
>> +               } ipv6;
>> +       };
>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>> +
>
>HW offload API should be separate from OVS module. This has following
>advantages.
>1. It can be managed by OVS userspace vswitchd process which has much
>better context to setup hardware flow table. Once we add capabilities
>for swdev, it is much more easier for vswitchd process to choose
>correct (hw or sw) flow table for given flow.

The idea is to add a nl attr in the ovs genl iface so that vswitchd can
specify whether a flow should be in sw only, in hw only, or in both.
I believe it is more convenient to let vswitchd communicate flows
via a single iface.

>2. Other application that wants to use HW offload does not have
>dependency on OVS kernel module.

That is not the case for this patchset. Userspace can insert/remove
flows using the switchdev generic netlink api - see:
[patch net-next 13/13] switchdev: introduce Netlink API

>3. Hardware and software datapath remains separate, these two
>components has no dependency on each other, both can be developed
>independent of each other.


The general idea is to have the offloads handled in-kernel. That is why I
hooked onto the ovs kernel dp code.


* Re: [patch net-next 03/13] net: introduce generic switch devices support
       [not found]         ` <540737CF.4000402-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-04 12:46           ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:46 UTC (permalink / raw)
  To: John Fastabend
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Wed, Sep 03, 2014 at 05:46:23PM CEST, john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to suport various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is a
>>couple of ndos defines:
>>- for getting physical switch id is in place.
>>- for work with flows.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>---
>
>
>[...]
>
>>  struct netpoll_info;
>>@@ -997,6 +999,24 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>   *	Callback to use for xmit over the accelerated station. This
>>   *	is used in place of ndo_start_xmit on accelerated net
>>   *	devices.
>>+ *
>>+ * int (*ndo_swdev_get_id)(struct net_device *dev,
>>+ *			   struct netdev_phys_item_id *psid);
>>+ *	Called to get an ID of the switch chip this port is part of.
>>+ *	If driver implements this, it indicates that it represents a port
>>+ *	of a switch chip.
>>+ *
>>+ * int (*ndo_swdev_flow_insert)(struct net_device *dev,
>>+ *				const struct sw_flow *flow);
>>+ *	Called to insert a flow into switch device. If driver does
>>+ *	not implement this, it is assumed that the hw does not have
>>+ *	a capability to work with flows.
>>+ *
>>+ * int (*ndo_swdev_flow_remove)(struct net_device *dev,
>>+ *				const struct sw_flow *flow);
>>+ *	Called to remove a flow from switch device. If driver does
>>+ *	not implement this, it is assumed that the hw does not have
>>+ *	a capability to work with flows.
>>   */
>>  struct net_device_ops {
>>  	int			(*ndo_init)(struct net_device *dev);
>>@@ -1146,6 +1166,14 @@ struct net_device_ops {
>>  							struct net_device *dev,
>>  							void *priv);
>>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>>+#ifdef CONFIG_NET_SWITCHDEV
>>+	int			(*ndo_swdev_get_id)(struct net_device *dev,
>>+						    struct netdev_phys_item_id *psid);
>>+	int			(*ndo_swdev_flow_insert)(struct net_device *dev,
>>+							 const struct sw_flow *flow);
>>+	int			(*ndo_swdev_flow_remove)(struct net_device *dev,
>>+							 const struct sw_flow *flow);
>
>Not really a critique of your patch but I'll need to extend this
>with a ndo_swdev_flow_dump() to get the fields. Without this if
>your user space side ever restarts, gets out of sync there is no
>way to get back in sync.

Sure. I do not say that the api is complete (if anything ever is...).
Feel free to add a dump ndo. In fact, we can take care of it and implement
it in the rocker driver.


>
>Also with hardware that has multiple flow tables we need to indicate
>the table to insert the flow into. One concrete reason to do this
>is to create atomic updates of multiple ACLs. The idea is to create
>a new ACL table, build the table up, and then link it in. This can be
>added when it's needed; my open-source drivers don't support this yet
>either, but maybe adding multiple tables to the rocker switch will help
>flush this out.

Ok. Let's leave this for future follow-ups.

>
>Finally we need some way to drive capabilities out of the swdev.
>Even rocker switch needs this to indicate it doesn't support matching
>on all the sw_flow fields. Without this it's not clear to me how to
>manage the device from user space. I tried writing a user space daemon
>for the simpler flow director interface, and the try-and-see model
>breaks quickly.

Hmm. I was under the impression that the simple fact that flow insertion
fails with an error is enough. But thinking about it more, I believe that
a set of features makes sense. I will think about it and add it in the
next patchset version.

>
>>+#endif
>>  };
>>
>>  /**
>>diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
>>index 21724f1..3af7758 100644
>>--- a/include/net/sw_flow.h
>>+++ b/include/net/sw_flow.h
>>@@ -81,7 +81,21 @@ struct sw_flow_mask {
>>  	struct sw_flow_key key;
>>  };
>>
>>+enum sw_flow_action_type {
>>+	SW_FLOW_ACTION_TYPE_OUTPUT,
>>+	SW_FLOW_ACTION_TYPE_VLAN_PUSH,
>>+	SW_FLOW_ACTION_TYPE_VLAN_POP,
>>+};
>>+
>
>OK my previous comment about having another patch to create
>generic actions seems to be resolved here. I'm not sure how
>important it is but if we abstract the flow types away from
>OVS is there any reason not to reuse and relabel the action
>types as well? I guess we can't break userspace API but maybe
>a 1:1 mapping would be better?
>
>>  struct sw_flow_action {
>>+	enum sw_flow_action_type type;
>>+	union {
>>+		u32 out_port_ifindex;
>>+		struct {
>>+			__be16 vlan_proto;
>>+			u16 vlan_tci;
>>+		} vlan;
>>+	};
>>  };
>
>[...]
>
>I think my comments could be addressed with additional patches
>if you want. I could help but it will be another week or so
>before I have some time. The biggest issue IMO is the lack of
>capabilities queries.

Np. I will handle these (probably not before I return from vacation (Sep
15)).

Thanks!


* Re: [patch net-next 07/13] dsa: implement ndo_swdev_get_id
       [not found]         ` <5407A25A.8050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-04 12:47           ` Jiri Pirko
       [not found]             ` <20140904124701.GH1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:47 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Thu, Sep 04, 2014 at 01:20:58AM CEST, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>> ---
>>  include/linux/netdevice.h |  3 ++-
>>  include/net/dsa.h         |  1 +
>>  net/dsa/Kconfig           |  2 +-
>>  net/dsa/dsa.c             |  3 +++
>>  net/dsa/slave.c           | 10 ++++++++++
>>  5 files changed, 17 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 6a009d1..7ee070f 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -41,7 +41,6 @@
>>  
>>  #include <linux/ethtool.h>
>>  #include <net/net_namespace.h>
>> -#include <net/dsa.h>
>>  #ifdef CONFIG_DCB
>>  #include <net/dcbnl.h>
>>  #endif
>> @@ -1259,6 +1258,8 @@ enum netdev_priv_flags {
>>  #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
>>  #define IFF_MACVLAN			IFF_MACVLAN
>>  
>> +#include <net/dsa.h>
>> +
>>  /**
>>   *	struct net_device - The DEVICE structure.
>>   *		Actually, this whole structure is a big mistake.  It mixes I/O
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index 9771292..d60cd42 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -140,6 +140,7 @@ struct dsa_switch {
>>  	u32			phys_mii_mask;
>>  	struct mii_bus		*slave_mii_bus;
>>  	struct net_device	*ports[DSA_MAX_PORTS];
>> +	struct netdev_phys_item_id psid;
>>  };
>>  
>>  static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
>> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
>> index a585fd6..4e144a2 100644
>> --- a/net/dsa/Kconfig
>> +++ b/net/dsa/Kconfig
>> @@ -1,6 +1,6 @@
>>  config HAVE_NET_DSA
>>  	def_bool y
>> -	depends on NETDEVICES && !S390
>> +	depends on NETDEVICES && NET_SWITCHDEV && !S390
>
>It does not look like this is necessary, we are only using definitions
>from net/dsa.h and include/linux/netdevice.h, and if it was, a 'select'
>would be more appropriate here I think.
>
>TBH, I think we should rather drop this patch for now; I do not see any
>benefit in providing a random id over no id at all.

Well, the benefit is that you are still able to see which ports belong
to the same switch.


* Re: [patch net-next 10/13] openvswitch: add support for datapath hardware offload
       [not found]         ` <540743B4.9080500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-09-04 12:48           ` Jiri Pirko
       [not found]             ` <20140904124837.GI1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-04 12:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Wed, Sep 03, 2014 at 06:37:08PM CEST, john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>Benefit from the possibility to work with flows in switch devices and
>>use the swdev api to offload flow datapath.
>>
>>Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>---
>>  net/openvswitch/Makefile       |   3 +-
>>  net/openvswitch/datapath.c     |  33 ++++++
>>  net/openvswitch/datapath.h     |   3 +
>>  net/openvswitch/flow_table.c   |   1 +
>>  net/openvswitch/hw_offload.c   | 245 +++++++++++++++++++++++++++++++++++++++++
>>  net/openvswitch/hw_offload.h   |  22 ++++
>>  net/openvswitch/vport-netdev.c |   3 +
>>  net/openvswitch/vport.h        |   2 +
>>  8 files changed, 311 insertions(+), 1 deletion(-)
>>  create mode 100644 net/openvswitch/hw_offload.c
>>  create mode 100644 net/openvswitch/hw_offload.h
>>
>>diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
>>index 3591cb5..5152437 100644
>>--- a/net/openvswitch/Makefile
>>+++ b/net/openvswitch/Makefile
>>@@ -13,7 +13,8 @@ openvswitch-y := \
>>  	flow_table.o \
>>  	vport.o \
>>  	vport-internal_dev.o \
>>-	vport-netdev.o
>>+	vport-netdev.o \
>>+	hw_offload.o
>>
>>  ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
>>  openvswitch-y += vport-vxlan.o
>>diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>>index 75bb07f..3e43e1d 100644
>>--- a/net/openvswitch/datapath.c
>>+++ b/net/openvswitch/datapath.c
>>@@ -57,6 +57,7 @@
>>  #include "flow_netlink.h"
>>  #include "vport-internal_dev.h"
>>  #include "vport-netdev.h"
>>+#include "hw_offload.h"
>>
>>  int ovs_net_id __read_mostly;
>>
>>@@ -864,6 +865,9 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>  			acts = NULL;
>>  			goto err_unlock_ovs;
>>  		}
>>+		error = ovs_hw_flow_insert(dp, new_flow);
>>+		if (error)
>>+			pr_warn("failed to insert flow into hw\n");
>
>This is really close to silently failing. I think we need to
>hard fail here somehow and push it back to userspace as part of
>the reply and ovs_notify.

Yes, I agree. My plan was to handle this in the ovs hw/sw/both netlink
attr implementation.


>
>Otherwise I don't know how to manage the hardware correctly. Consider
>the hardware table is full. In this case user space will continue to
>add rules and they will be silently discarded. Similarly if user space
>adds a flow/action that can not be supported by the hardware it will
>be silently ignored.
>
>Even if we do careful accounting on resources in user space we could
>still get an ENOMEM error from sw_flow_action_create.
>
>Same comment for the other hw commands flush/remove.
>
>>  		if (unlikely(reply)) {
>>  			error = ovs_flow_cmd_fill_info(new_flow,
>>@@ -896,10 +900,18 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
>>  				goto err_unlock_ovs;
>>  			}
>>  		}
>
>
>[...]
>
>
>Thanks,
>John
>
>-- 
>John Fastabend         Intel Corporation


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
       [not found]         ` <20140904123323.GF1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
@ 2014-09-04 20:46           ` Pravin Shelar
  2014-09-17  8:34             ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Pravin Shelar @ 2014-09-04 20:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a, jasowang, john.r.fastabend, Neil Jerram,
	Eric Dumazet, Andy Gospodarek, dev-yBygre7rU0TnMu66kgdUjQ, nbd,
	Florian Fainelli, Rony Efraim, jeffrey.t.kirsher, Or Gerlitz,
	Ben Hutchings, buytenh, roopa, Jamal Hadi Salim, aviadr,
	Nicolas Dichtel, vyasevic, nhorman, netdev, Stephen Hemminger,
	Daniel Borkmann, ebiederm

On Thu, Sep 4, 2014 at 5:33 AM, Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org> wrote:
> Wed, Sep 03, 2014 at 08:41:39PM CEST, pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org wrote:
>>On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org> wrote:
>>> After this, flow related structures can be used in other code.
>>>
>>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>> ---
>>>  include/net/sw_flow.h          |  99 ++++++++++++++++++++++++++++++++++
>>>  net/openvswitch/actions.c      |   3 +-
>>>  net/openvswitch/datapath.c     |  74 +++++++++++++-------------
>>>  net/openvswitch/datapath.h     |   4 +-
>>>  net/openvswitch/flow.c         |   6 +--
>>>  net/openvswitch/flow.h         | 102 +++++++----------------------------
>>>  net/openvswitch/flow_netlink.c |  53 +++++++++---------
>>>  net/openvswitch/flow_netlink.h |  10 ++--
>>>  net/openvswitch/flow_table.c   | 118 ++++++++++++++++++++++-------------------
>>>  net/openvswitch/flow_table.h   |  30 +++++------
>>>  net/openvswitch/vport-gre.c    |   4 +-
>>>  net/openvswitch/vport-vxlan.c  |   2 +-
>>>  net/openvswitch/vport.c        |   2 +-
>>>  net/openvswitch/vport.h        |   2 +-
>>>  14 files changed, 276 insertions(+), 233 deletions(-)
>>>  create mode 100644 include/net/sw_flow.h
>>>
>>> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
>>> new file mode 100644
>>> index 0000000..21724f1
>>> --- /dev/null
>>> +++ b/include/net/sw_flow.h
>>> @@ -0,0 +1,99 @@
>>> +/*
>>> + * include/net/sw_flow.h - Generic switch flow structures
>>> + * Copyright (c) 2007-2012 Nicira, Inc.
>>> + * Copyright (c) 2014 Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#ifndef _NET_SW_FLOW_H_
>>> +#define _NET_SW_FLOW_H_
>>> +
>>> +struct sw_flow_key_ipv4_tunnel {
>>> +       __be64 tun_id;
>>> +       __be32 ipv4_src;
>>> +       __be32 ipv4_dst;
>>> +       __be16 tun_flags;
>>> +       u8   ipv4_tos;
>>> +       u8   ipv4_ttl;
>>> +};
>>> +
>>> +struct sw_flow_key {
>>> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>>> +       struct {
>>> +               u32     priority;       /* Packet QoS priority. */
>>> +               u32     skb_mark;       /* SKB mark. */
>>> +               u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>>> +       } __packed phy; /* Safe when right after 'tun_key'. */
>>> +       struct {
>>> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
>>> +               u8     dst[ETH_ALEN];   /* Ethernet destination address. */
>>> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
>>> +               __be16 type;            /* Ethernet frame type. */
>>> +       } eth;
>>> +       struct {
>>> +               u8     proto;           /* IP protocol or lower 8 bits of ARP opcode. */
>>> +               u8     tos;             /* IP ToS. */
>>> +               u8     ttl;             /* IP TTL/hop limit. */
>>> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
>>> +       } ip;
>>> +       struct {
>>> +               __be16 src;             /* TCP/UDP/SCTP source port. */
>>> +               __be16 dst;             /* TCP/UDP/SCTP destination port. */
>>> +               __be16 flags;           /* TCP flags. */
>>> +       } tp;
>>> +       union {
>>> +               struct {
>>> +                       struct {
>>> +                               __be32 src;     /* IP source address. */
>>> +                               __be32 dst;     /* IP destination address. */
>>> +                       } addr;
>>> +                       struct {
>>> +                               u8 sha[ETH_ALEN];       /* ARP source hardware address. */
>>> +                               u8 tha[ETH_ALEN];       /* ARP target hardware address. */
>>> +                       } arp;
>>> +               } ipv4;
>>> +               struct {
>>> +                       struct {
>>> +                               struct in6_addr src;    /* IPv6 source address. */
>>> +                               struct in6_addr dst;    /* IPv6 destination address. */
>>> +                       } addr;
>>> +                       __be32 label;                   /* IPv6 flow label. */
>>> +                       struct {
>>> +                               struct in6_addr target; /* ND target address. */
>>> +                               u8 sll[ETH_ALEN];       /* ND source link layer address. */
>>> +                               u8 tll[ETH_ALEN];       /* ND target link layer address. */
>>> +                       } nd;
>>> +               } ipv6;
>>> +       };
>>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>>> +
>>
>>HW offload API should be separate from OVS module. This has following
>>advantages.
>>1. It can be managed by OVS userspace vswitchd process which has much
>>better context to setup hardware flow table. Once we add capabilities
>>for swdev, it is much more easier for vswitchd process to choose
>>correct (hw or sw) flow table for given flow.
>
> The idea is to add a nl attr in the ovs genl iface so that vswitchd can
> specify whether a flow should be in sw only, in hw only, or in both.
> I believe it is more convenient to let vswitchd communicate flows
> via a single iface.
>
How is it convenient? This patch complicates the OVS kernel module. It adds
OVS interfaces for HW offload, and you need similar interfaces for the
switchdev device, so it duplicates code.
On the other hand, if vswitchd uses the common interface (switchdev) there
is no need to extend the ovs kernel interface, for example for specifying
extra metadata like (sw only, hw only, both).

>>2. Other application that wants to use HW offload does not have
>>dependency on OVS kernel module.
>
> That is not the case for this patchset. Userspace can insert/remove
> flows using the switchdev generic netlink api - see:
> [patch net-next 13/13] switchdev: introduce Netlink API
>
>>3. Hardware and software datapath remains separate, these two
>>components has no dependency on each other, both can be developed
>>independent of each other.
>
>
> The general idea is to have the offloads handled in-kernel. That is why I
> hooked onto the ovs kernel dp code.
>
>
>


* Re: [patch net-next 10/13] openvswitch: add support for datapath hardware offload
       [not found]             ` <20140904124837.GI1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
@ 2014-09-05  3:59               ` Simon Horman
  0 siblings, 0 replies; 42+ messages in thread
From: Simon Horman @ 2014-09-05  3:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w, John Fastabend,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On Thu, Sep 04, 2014 at 02:48:37PM +0200, Jiri Pirko wrote:
> Wed, Sep 03, 2014 at 06:37:08PM CEST, john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> >On 09/03/2014 02:24 AM, Jiri Pirko wrote:
> >>Benefit from the possibility to work with flows in switch devices and
> >>use the swdev api to offload flow datapath.
> >>
> >>Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
> >>---
> >>  net/openvswitch/Makefile       |   3 +-
> >>  net/openvswitch/datapath.c     |  33 ++++++
> >>  net/openvswitch/datapath.h     |   3 +
> >>  net/openvswitch/flow_table.c   |   1 +
> >>  net/openvswitch/hw_offload.c   | 245 +++++++++++++++++++++++++++++++++++++++++
> >>  net/openvswitch/hw_offload.h   |  22 ++++
> >>  net/openvswitch/vport-netdev.c |   3 +
> >>  net/openvswitch/vport.h        |   2 +
> >>  8 files changed, 311 insertions(+), 1 deletion(-)
> >>  create mode 100644 net/openvswitch/hw_offload.c
> >>  create mode 100644 net/openvswitch/hw_offload.h
> >>
> >>diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
> >>index 3591cb5..5152437 100644
> >>--- a/net/openvswitch/Makefile
> >>+++ b/net/openvswitch/Makefile
> >>@@ -13,7 +13,8 @@ openvswitch-y := \
> >>  	flow_table.o \
> >>  	vport.o \
> >>  	vport-internal_dev.o \
> >>-	vport-netdev.o
> >>+	vport-netdev.o \
> >>+	hw_offload.o
> >>
> >>  ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
> >>  openvswitch-y += vport-vxlan.o
> >>diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> >>index 75bb07f..3e43e1d 100644
> >>--- a/net/openvswitch/datapath.c
> >>+++ b/net/openvswitch/datapath.c
> >>@@ -57,6 +57,7 @@
> >>  #include "flow_netlink.h"
> >>  #include "vport-internal_dev.h"
> >>  #include "vport-netdev.h"
> >>+#include "hw_offload.h"
> >>
> >>  int ovs_net_id __read_mostly;
> >>
> >>@@ -864,6 +865,9 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
> >>  			acts = NULL;
> >>  			goto err_unlock_ovs;
> >>  		}
> >>+		error = ovs_hw_flow_insert(dp, new_flow);
> >>+		if (error)
> >>+			pr_warn("failed to insert flow into hw\n");
> >
> >This is really close to silently failing. I think we need to
> >hard fail here somehow and push it back to userspace as part of
> >the reply and ovs_notify.
> 
> Yes, I agree. My plan was to handle this in ovs hw/sw/both netlink attr
> implementation.

FWIW I agree that handling it in that way makes sense.
In particular I think that "both", where the datapath is allowed
to fall back to software, is a useful mode to have (it's the current
implementation, right?). But it is also good to allow user-space
more control.

> >Otherwise I don't know how to manage the hardware correctly. Consider
> >the hardware table is full. In this case user space will continue to
> >add rules and they will be silently discarded. Similarly if user space
> >adds a flow/action that can not be supported by the hardware it will
> >be silently ignored.
> >
> >Even if we do careful accounting on resources in user space we could
> >still get an ENOMEM error from sw_flow_action_create.
> >
> >Same comment for the other hw commands flush/remove.
> >
> >>  		if (unlikely(reply)) {
> >>  			error = ovs_flow_cmd_fill_info(new_flow,
> >>@@ -896,10 +900,18 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
> >>  				goto err_unlock_ovs;
> >>  			}
> >>  		}
> >
> >
> >[...]
> >
> >
> >Thanks,
> >John
> >
> >-- 
> >John Fastabend         Intel Corporation
> _______________________________________________
> dev mailing list
> dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org
> http://openvswitch.org/mailman/listinfo/dev
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 07/13] dsa: implement ndo_swdev_get_id
       [not found]             ` <20140904124701.GH1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
@ 2014-09-05  4:43               ` Felix Fietkau
  2014-09-05  5:52                 ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Felix Fietkau @ 2014-09-05  4:43 UTC (permalink / raw)
  To: Jiri Pirko, Florian Fainelli
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

On 2014-09-04 14:47, Jiri Pirko wrote:
> Thu, Sep 04, 2014 at 01:20:58AM CEST, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>> Signed-off-by: Jiri Pirko <jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
>>> ---
>>>  include/linux/netdevice.h |  3 ++-
>>>  include/net/dsa.h         |  1 +
>>>  net/dsa/Kconfig           |  2 +-
>>>  net/dsa/dsa.c             |  3 +++
>>>  net/dsa/slave.c           | 10 ++++++++++
>>>  5 files changed, 17 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 6a009d1..7ee070f 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -41,7 +41,6 @@
>>>  
>>>  #include <linux/ethtool.h>
>>>  #include <net/net_namespace.h>
>>> -#include <net/dsa.h>
>>>  #ifdef CONFIG_DCB
>>>  #include <net/dcbnl.h>
>>>  #endif
>>> @@ -1259,6 +1258,8 @@ enum netdev_priv_flags {
>>>  #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
>>>  #define IFF_MACVLAN			IFF_MACVLAN
>>>  
>>> +#include <net/dsa.h>
>>> +
>>>  /**
>>>   *	struct net_device - The DEVICE structure.
>>>   *		Actually, this whole structure is a big mistake.  It mixes I/O
>>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>>> index 9771292..d60cd42 100644
>>> --- a/include/net/dsa.h
>>> +++ b/include/net/dsa.h
>>> @@ -140,6 +140,7 @@ struct dsa_switch {
>>>  	u32			phys_mii_mask;
>>>  	struct mii_bus		*slave_mii_bus;
>>>  	struct net_device	*ports[DSA_MAX_PORTS];
>>> +	struct netdev_phys_item_id psid;
>>>  };
>>>  
>>>  static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
>>> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
>>> index a585fd6..4e144a2 100644
>>> --- a/net/dsa/Kconfig
>>> +++ b/net/dsa/Kconfig
>>> @@ -1,6 +1,6 @@
>>>  config HAVE_NET_DSA
>>>  	def_bool y
>>> -	depends on NETDEVICES && !S390
>>> +	depends on NETDEVICES && NET_SWITCHDEV && !S390
>>
>>It does not look like this is necessary, we are only using definitions
>>from net/dsa.h and include/linux/netdevice.h, and if it was, a 'select'
>>would be more appropriate here I think.
>>
>>TBH, I think we should rather drop this patch for now, I do not see any
>>benefit in providing a random id over no-id at all.
> 
> Well, the benefit is that you are still able to see which ports belong
> to the same switch.
I think it's a bad idea to force switchdev bloat onto DSA users just for
that random id thing.

- Felix


* Re: [patch net-next 07/13] dsa: implement ndo_swdev_get_id
  2014-09-05  4:43               ` Felix Fietkau
@ 2014-09-05  5:52                 ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2014-09-05  5:52 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Florian Fainelli, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, roopa, linville, dev, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr,
	alexei.starovoitov, Neil.Jerram, ronye

Fri, Sep 05, 2014 at 06:43:23AM CEST, nbd@openwrt.org wrote:
>On 2014-09-04 14:47, Jiri Pirko wrote:
>> Thu, Sep 04, 2014 at 01:20:58AM CEST, f.fainelli@gmail.com wrote:
>>>On 09/03/2014 02:24 AM, Jiri Pirko wrote:
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>  include/linux/netdevice.h |  3 ++-
>>>>  include/net/dsa.h         |  1 +
>>>>  net/dsa/Kconfig           |  2 +-
>>>>  net/dsa/dsa.c             |  3 +++
>>>>  net/dsa/slave.c           | 10 ++++++++++
>>>>  5 files changed, 17 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 6a009d1..7ee070f 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -41,7 +41,6 @@
>>>>  
>>>>  #include <linux/ethtool.h>
>>>>  #include <net/net_namespace.h>
>>>> -#include <net/dsa.h>
>>>>  #ifdef CONFIG_DCB
>>>>  #include <net/dcbnl.h>
>>>>  #endif
>>>> @@ -1259,6 +1258,8 @@ enum netdev_priv_flags {
>>>>  #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
>>>>  #define IFF_MACVLAN			IFF_MACVLAN
>>>>  
>>>> +#include <net/dsa.h>
>>>> +
>>>>  /**
>>>>   *	struct net_device - The DEVICE structure.
>>>>   *		Actually, this whole structure is a big mistake.  It mixes I/O
>>>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>>>> index 9771292..d60cd42 100644
>>>> --- a/include/net/dsa.h
>>>> +++ b/include/net/dsa.h
>>>> @@ -140,6 +140,7 @@ struct dsa_switch {
>>>>  	u32			phys_mii_mask;
>>>>  	struct mii_bus		*slave_mii_bus;
>>>>  	struct net_device	*ports[DSA_MAX_PORTS];
>>>> +	struct netdev_phys_item_id psid;
>>>>  };
>>>>  
>>>>  static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
>>>> diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
>>>> index a585fd6..4e144a2 100644
>>>> --- a/net/dsa/Kconfig
>>>> +++ b/net/dsa/Kconfig
>>>> @@ -1,6 +1,6 @@
>>>>  config HAVE_NET_DSA
>>>>  	def_bool y
>>>> -	depends on NETDEVICES && !S390
>>>> +	depends on NETDEVICES && NET_SWITCHDEV && !S390
>>>
>>>It does not look like this is necessary, we are only using definitions
>>>from net/dsa.h and include/linux/netdevice.h, and if it was, a 'select'
>>>would be more appropriate here I think.
>>>
>>>TBH, I think we should rather drop this patch for now, I do not see any
>>>benefit in providing a random id over no-id at all.
>> 
>> Well, the benefit is that you are still able to see which ports belong
>> to the same switch.
>I think it's a bad idea to force switchdev bloat onto DSA users just for
>that random id thing.

Np. I will drop this.

>
>- Felix


* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
  2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
                   ` (5 preceding siblings ...)
  2014-09-03  9:25 ` [patch net-next 13/13] switchdev: introduce Netlink API Jiri Pirko
@ 2014-09-08 13:54 ` Thomas Graf
  2014-09-09 21:09   ` Alexei Starovoitov
  2014-09-16 15:58   ` Jiri Pirko
  6 siblings, 2 replies; 42+ messages in thread
From: Thomas Graf @ 2014-09-08 13:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

On 09/03/14 at 11:24am, Jiri Pirko wrote:
> This patchset can be divided into 3 main sections:
> - introduce switchdev api for implementing switch drivers
> - add hardware acceleration bits into openvswitch datapath, This uses
>   previously mentioned switchdev api
> - introduce rocker switch driver which implements switchdev api

Jiri, Scott,

Enclosed is the GOOG doc which outlines some details on my particular
interests [0]. It includes several diagrams which might help to
understand the overall arch. It is highly related to John's work as
well. Please let me know if something does not align with the model
you have in mind.

Summary:
The full virtual tunnel endpoint flow offload attempts to offload full
flows to the hardware and utilize the embedded switch on the host NIC
to empower the eSwitch with the required flexibility of the software
driven network. In this model, the guest (VM or LXC) attaches through a
SR-IOV VF which serves as the primary path. A slow path / software path
is provided via the CPU which can route packets back into the VF by
tagging packets with forwarding metadata and sending the frame back to
the NIC.

[0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
(Publicly accessible and open for comments)


* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
  2014-09-08 13:54 ` [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Thomas Graf
@ 2014-09-09 21:09   ` Alexei Starovoitov
  2014-09-15 12:43     ` Thomas Graf
  2014-09-16 15:58   ` Jiri Pirko
  1 sibling, 1 reply; 42+ messages in thread
From: Alexei Starovoitov @ 2014-09-09 21:09 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, dev, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd, Neil.Jerram,
	ronye

On Mon, Sep 08, 2014 at 02:54:13PM +0100, Thomas Graf wrote:
> On 09/03/14 at 11:24am, Jiri Pirko wrote:
> > This patchset can be divided into 3 main sections:
> > - introduce switchdev api for implementing switch drivers
> > - add hardware acceleration bits into openvswitch datapath, This uses
> >   previously mentioned switchdev api
> > - introduce rocker switch driver which implements switchdev api
> 
> Jiri, Scott,
> 
> Enclosed is the GOOG doc which outlines some details on my particular
> interests [0]. It includes several diagrams which might help to
> understand the overall arch. It is highly related to John's work as
> well. Please let me know if something does not align with the model
> you have in mind.
> 
> Summary:
> The full virtual tunnel endpoint flow offload attempts to offload full
> flows to the hardware and utilize the embedded switch on the host NIC
> to empower the eSwitch with the required flexibility of the software
> driven network. In this model, the guest (VM or LXC) attaches through a
> SR-IOV VF which serves as the primary path. A slow path / software path
> is provided via the CPU which can route packets back into the VF by
> tagging packets with forwarding metadata and sending the frame back to
> the NIC.
> 
> [0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
> (Publicly accessible and open for comments)

Great doc. Very clear. I wish I could write docs like this :)

Few questions:
> - on the 1st slide dpdk is used to accept vm and lxc packets. How does that work?
  I know of 3 dpdk mechanisms to receive vm traffic, but all of them are kinda
  deficient, since offloads need to be disabled inside VM, so VM to VM
  performance over dpdk is not impressive. What is there for lxc?
  Is there a special pmd that can take packets from veth?

- full offload vs partial.
  The doc doesn't say, but I suspect we want transition from full to partial
  to be transparent? Especially for lxc. criu should be able to snapshot
  container on one box with full offload and restore it seamlessly on the
  other machine with partial offload, right?

- full offload with two nics.
>   how are bonding and redundancy supposed to work in such a case?
>   If the wire attached to eth0 is no longer passing packets, how will traffic
>   from VM1 reach eth1 on a different nic? Via the sw datapath (flow table)?
  I suspect we want to reuse current bonding/team abstraction here.
  I'm not quite getting the whole point of two separate physical nics.
  Is it for completeness and generality of the picture ?
  I think typical hypervisor will likely have only one multi-port nic, then
  bonding can be off-loaded within single nic via bonding driver.
  Partial offload scenario doesn't have this issue, since 'flow table'
  is fed by standard netdev which can be bond-dev and everything else, right?

- number of VFs
  I believe it's still very limited even in the newest nics, but
  number of containers will be large.
  So some lxcs will be using VFs and some will use standard veth?
  We cannot swap them dynamically based on load, so I'm not sure
  how VF approach is generically applicable here. For some use cases
  with demanding lxcs, it probably helps, but is it worth the gains?

Thanks!


* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
  2014-09-09 21:09   ` Alexei Starovoitov
@ 2014-09-15 12:43     ` Thomas Graf
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Graf @ 2014-09-15 12:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, dev, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd, Neil.Jerram,
	ronye

On 09/09/14 at 02:09pm, Alexei Starovoitov wrote:
> On Mon, Sep 08, 2014 at 02:54:13PM +0100, Thomas Graf wrote:
> > [0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
> > (Publicly accessible and open for comments)
> 
> Great doc. Very clear. I wish I could write docs like this :)
> 
> Few questions:
> - on the 1st slide dpdk is used accept vm and lxc packet. How is that working?
>   I know of 3 dpdk mechanisms to receive vm traffic, but all of them are kinda
>   deficient, since offloads need to be disabled inside VM, so VM to VM
>   performance over dpdk is not impressive. What is there for lxc?
>   Is there a special pmd that can take packets from veth?

Glad to see you are paying attention ;-) I'm assuming somebody will
write a veth PMD at some point. It does not exist yet.

> - full offload vs partial.
>   The doc doesn't say, but I suspect we want transition from full to partial
>   to be transparent? Especially for lxc. criu should be able to snapshot
>   container on one box with full offload and restore it seamlessly on the
>   other machine with partial offload, right?

Correct. In a full offload environment, the CPU path could still use
partial offload functionality. I'll update the doc.

> - full offload with two nics.
> >   how are bonding and redundancy supposed to work in such a case?
> >   If the wire attached to eth0 is no longer passing packets, how will traffic
> >   from VM1 reach eth1 on a different nic? Via the sw datapath (flow table)?

Yes.

>   I suspect we want to reuse current bonding/team abstraction here.

Yes, both kernel bond/team and OVS group table based abstraction (see
Simon's recent efforts).

>   I'm not quite getting the whole point of two separate physical nics.
>   Is it for completeness and generality of the picture ?

Correct. It is entirely to outline the more difficult case of multiple
physical NICs.

>   I think typical hypervisor will likely have only one multi-port nic, then
>   bonding can be off-loaded within single nic via bonding driver.

Agreed. I would expect that to be the reference architecture.

>   Partial offload scenario doesn't have this issue, since 'flow table'
>   is fed by standard netdev which can be bond-dev and everything else, right?

It is unclear how a virtual LAG device would forward flow table hints.
A particular NIC might take 5 tuples which point to a particular fwd
entry. The state of this offload might be different on individual NICs
forming a bond. In either case, it should be abstracted.

> - number of VFs
>   I believe it's still very limited even in the newest nics, but
>   number of containers will be large.

Agreed.

>   So some lxcs will be using VFs and some will use standard veth?

Likely yes. I could foresee this being driven by an elephant detection
algorithm or by policy.

>   We cannot swap them dynamically based on load, so I'm not sure
>   how VF approach is generically applicable here. For some use cases
>   with demanding lxcs, it probably helps, but is it worth the gains?

TBH, I don't know. I'm trying to figure that out ;-) It is obvious that
dedicating individual cores to PMDs is not ideal either for lxc type
workloads. The same question also applies to live migration which is
complicated by this and DPDK type setups. However, I believe that a
proper HW offload abstraction API is superior in terms of providing
virtualization abstraction but I'm afraid we won't know for sure until
we've actually tried it ;-)


* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
  2014-09-08 13:54 ` [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Thomas Graf
  2014-09-09 21:09   ` Alexei Starovoitov
@ 2014-09-16 15:58   ` Jiri Pirko
       [not found]     ` <20140916155832.GA1869-6KJVSR23iU488b5SBfVpbw@public.gmane.org>
  1 sibling, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-16 15:58 UTC (permalink / raw)
  To: Thomas Graf
  Cc: netdev, davem, nhorman, andy, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, dev, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye

Mon, Sep 08, 2014 at 03:54:13PM CEST, tgraf@suug.ch wrote:
>On 09/03/14 at 11:24am, Jiri Pirko wrote:
>> This patchset can be divided into 3 main sections:
>> - introduce switchdev api for implementing switch drivers
>> - add hardware acceleration bits into openvswitch datapath, This uses
>>   previously mentioned switchdev api
>> - introduce rocker switch driver which implements switchdev api
>
>Jiri, Scott,
>
>Enclosed is the GOOG doc which outlines some details on my particular
>interests [0]. It includes several diagrams which might help to
>understand the overall arch. It is highly related to John's work as
>well. Please let me know if something does not align with the model
>you have in mind.


Hi Thomas.

Sorry for the late answer; I returned from vacation yesterday.
I went over your document and did not find anything which would not
align with our approach. Looks good to me.

>
>Summary:
>The full virtual tunnel endpoint flow offload attempts to offload full
>flows to the hardware and utilize the embedded switch on the host NIC
>to empower the eSwitch with the required flexibility of the software
>driven network. In this model, the guest (VM or LXC) attaches through a
>SR-IOV VF which serves as the primary path. A slow path / software path
>is provided via the CPU which can route packets back into the VF by
>tagging packets with forwarding metadata and sending the frame back to
>the NIC.
>
>[0] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
>(Publicly accessible and open for comments)


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-04 20:46           ` Pravin Shelar
@ 2014-09-17  8:34             ` Jiri Pirko
  2014-09-17 22:07               ` Jesse Gross
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2014-09-17  8:34 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: netdev, David Miller, nhorman, Andy Gospodarek, Thomas Graf,
	Daniel Borkmann, Or Gerlitz, Jesse Gross, Andy Zhou,
	Ben Hutchings, Stephen Hemminger, jeffrey.t.kirsher, vyasevic,
	Cong Wang, john.r.fastabend, Eric Dumazet, Jamal Hadi Salim,
	sfeldma, Florian Fainelli, roopa, John Linville, dev, jasowang,
	ebiederm

Thu, Sep 04, 2014 at 10:46:28PM CEST, pshelar@nicira.com wrote:
>On Thu, Sep 4, 2014 at 5:33 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, Sep 03, 2014 at 08:41:39PM CEST, pshelar@nicira.com wrote:
>>>On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> After this, flow related structures can be used in other code.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>  include/net/sw_flow.h          |  99 ++++++++++++++++++++++++++++++++++
>>>>  net/openvswitch/actions.c      |   3 +-
>>>>  net/openvswitch/datapath.c     |  74 +++++++++++++-------------
>>>>  net/openvswitch/datapath.h     |   4 +-
>>>>  net/openvswitch/flow.c         |   6 +--
>>>>  net/openvswitch/flow.h         | 102 +++++++----------------------------
>>>>  net/openvswitch/flow_netlink.c |  53 +++++++++---------
>>>>  net/openvswitch/flow_netlink.h |  10 ++--
>>>>  net/openvswitch/flow_table.c   | 118 ++++++++++++++++++++++-------------------
>>>>  net/openvswitch/flow_table.h   |  30 +++++------
>>>>  net/openvswitch/vport-gre.c    |   4 +-
>>>>  net/openvswitch/vport-vxlan.c  |   2 +-
>>>>  net/openvswitch/vport.c        |   2 +-
>>>>  net/openvswitch/vport.h        |   2 +-
>>>>  14 files changed, 276 insertions(+), 233 deletions(-)
>>>>  create mode 100644 include/net/sw_flow.h
>>>>
>>>> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
>>>> new file mode 100644
>>>> index 0000000..21724f1
>>>> --- /dev/null
>>>> +++ b/include/net/sw_flow.h
>>>> @@ -0,0 +1,99 @@
>>>> +/*
>>>> + * include/net/sw_flow.h - Generic switch flow structures
>>>> + * Copyright (c) 2007-2012 Nicira, Inc.
>>>> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>> + * (at your option) any later version.
>>>> + */
>>>> +
>>>> +#ifndef _NET_SW_FLOW_H_
>>>> +#define _NET_SW_FLOW_H_
>>>> +
>>>> +struct sw_flow_key_ipv4_tunnel {
>>>> +       __be64 tun_id;
>>>> +       __be32 ipv4_src;
>>>> +       __be32 ipv4_dst;
>>>> +       __be16 tun_flags;
>>>> +       u8   ipv4_tos;
>>>> +       u8   ipv4_ttl;
>>>> +};
>>>> +
>>>> +struct sw_flow_key {
>>>> +       struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>>>> +       struct {
>>>> +               u32     priority;       /* Packet QoS priority. */
>>>> +               u32     skb_mark;       /* SKB mark. */
>>>> +               u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>>>> +       } __packed phy; /* Safe when right after 'tun_key'. */
>>>> +       struct {
>>>> +               u8     src[ETH_ALEN];   /* Ethernet source address. */
>>>> +               u8     dst[ETH_ALEN];   /* Ethernet destination address. */
>>>> +               __be16 tci;             /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
>>>> +               __be16 type;            /* Ethernet frame type. */
>>>> +       } eth;
>>>> +       struct {
>>>> +               u8     proto;           /* IP protocol or lower 8 bits of ARP opcode. */
>>>> +               u8     tos;             /* IP ToS. */
>>>> +               u8     ttl;             /* IP TTL/hop limit. */
>>>> +               u8     frag;            /* One of OVS_FRAG_TYPE_*. */
>>>> +       } ip;
>>>> +       struct {
>>>> +               __be16 src;             /* TCP/UDP/SCTP source port. */
>>>> +               __be16 dst;             /* TCP/UDP/SCTP destination port. */
>>>> +               __be16 flags;           /* TCP flags. */
>>>> +       } tp;
>>>> +       union {
>>>> +               struct {
>>>> +                       struct {
>>>> +                               __be32 src;     /* IP source address. */
>>>> +                               __be32 dst;     /* IP destination address. */
>>>> +                       } addr;
>>>> +                       struct {
>>>> +                               u8 sha[ETH_ALEN];       /* ARP source hardware address. */
>>>> +                               u8 tha[ETH_ALEN];       /* ARP target hardware address. */
>>>> +                       } arp;
>>>> +               } ipv4;
>>>> +               struct {
>>>> +                       struct {
>>>> +                               struct in6_addr src;    /* IPv6 source address. */
>>>> +                               struct in6_addr dst;    /* IPv6 destination address. */
>>>> +                       } addr;
>>>> +                       __be32 label;                   /* IPv6 flow label. */
>>>> +                       struct {
>>>> +                               struct in6_addr target; /* ND target address. */
>>>> +                               u8 sll[ETH_ALEN];       /* ND source link layer address. */
>>>> +                               u8 tll[ETH_ALEN];       /* ND target link layer address. */
>>>> +                       } nd;
>>>> +               } ipv6;
>>>> +       };
>>>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>>>> +
>>>
>>>HW offload API should be separate from OVS module. This has following
>>>advantages.
>>>1. It can be managed by OVS userspace vswitchd process which has much
>>>better context to setup hardware flow table. Once we add capabilities
>>>for swdev, it is much more easier for vswitchd process to choose
>>>correct (hw or sw) flow table for given flow.
>>
>> The idea is to add a nl attr to the ovs genl iface so the vswitchd can
>> specify the flow to be in sw only, hw only, or both.
>> I believe that it is more convenient to let vswitchd communicate flows
>> via a single iface.
>>
>How is it convenient? This patch complicates the OVS kernel module. It adds
>OVS interfaces for HW offload. And you need similar interfaces for the
>switchdev device. So it duplicates code.

There is almost no code duplication there. And in the next patchset
iteration I plan to have even less.


>On the other hand, if vswitchd uses a common interface (switchdev) there
>is no need to extend the ovs kernel interface, for example for specifying
>extra metadata like (sw only, hw only, both).

I understand your point of view. However, from the offloading perspective
it makes much more sense to push the flows through a single interface
(ovs genl) and only offload selected flows to hw (pushing further).
Having vswitchd handle 2 different ifaces for the same/similar thing
does not seem like a clean solution to me. And it really breaks the
offloading view.

Plus the amount of code needed to be pushed into ovs kernel dp code in
order to enable this is small.


>
>>>2. Other application that wants to use HW offload does not have
>>>dependency on OVS kernel module.
>>
>> That is not the case for this patchset. Userspace can insert/remove
>> flows using the switchdev generic netlink api - see:
>> [patch net-next 13/13] switchdev: introduce Netlink API
>>
>>>3. Hardware and software datapath remains separate, these two
>>>components has no dependency on each other, both can be developed
>>>independent of each other.
>>
>>
>> The general idea is to have the offloads handled in-kernel. Therefore I
>> hooked on to ovs kernel dp code.
>>
>>
>>


* Re: [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones
  2014-09-17  8:34             ` Jiri Pirko
@ 2014-09-17 22:07               ` Jesse Gross
  0 siblings, 0 replies; 42+ messages in thread
From: Jesse Gross @ 2014-09-17 22:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Pravin Shelar, netdev, David Miller, nhorman, Andy Gospodarek,
	Thomas Graf, Daniel Borkmann, Or Gerlitz, Andy Zhou,
	Ben Hutchings, Stephen Hemminger, jeffrey.t.kirsher, vyasevic,
	Cong Wang, john.r.fastabend, Eric Dumazet, Jamal Hadi Salim,
	sfeldma, Florian Fainelli, roopa, John Linville, dev, jasowang,
	ebie

On Wed, Sep 17, 2014 at 1:34 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Sep 04, 2014 at 10:46:28PM CEST, pshelar@nicira.com wrote:
>>On the other hand, if vswitchd uses a common interface (switchdev) there
>>is no need to extend the ovs kernel interface, for example for specifying
>>extra metadata like (sw only, hw only, both).
>
> I understand your point of view. However, from the offloading perspective
> it makes much more sense to push the flows through a single interface
> (ovs genl) and only offload selected flows to hw (pushing further).
> Having vswitchd handle 2 different ifaces for the same/similar thing
> does not seem like a clean solution to me. And it really breaks the
> offloading view.
>
> Plus the amount of code needed to be pushed into ovs kernel dp code in
> order to enable this is small.

This is missing the point: software forwarding and a hardware driver
interface are doing totally different things. Over time, these will
diverge and you will essentially end up with two separate paths packed
together which doesn't help anything. This is not a theoretical
concern as different directions either already exist or have been
proposed. On the software side, there is the BPF proposal which is not
likely to map to hardware any time soon. On the other hand, hardware
is usually composed of multiple tables/functions with varying
capabilities. Sooner or later you will want to take advantage of these
and doing so isn't really possible with the software optimized flows
that the kernel handles. At that point, you will likely introduce a
new interface to userspace to expose this and get flows processed in a
different way.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
       [not found]     ` <20140916155832.GA1869-6KJVSR23iU488b5SBfVpbw@public.gmane.org>
@ 2015-06-29  5:44       ` Neelakantam Gaddam
       [not found]         ` <CAOv37=BNU1-+kgTR6RUqxw7snJL6=5g-rLYhuPc1F-V0B1k7tA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 42+ messages in thread
From: Neelakantam Gaddam @ 2015-06-29  5:44 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, Neil Horman, Linux Netdev List,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Hi All,

Should we expect these changes to be available in the upcoming 2.4 release,
or are they still in development?

On Tue, Sep 16, 2014 at 9:28 PM, Jiri Pirko <jiri@resnulli.us> wrote:

> Mon, Sep 08, 2014 at 03:54:13PM CEST, tgraf@suug.ch wrote:
> >On 09/03/14 at 11:24am, Jiri Pirko wrote:
> >> This patchset can be divided into 3 main sections:
> >> - introduce switchdev api for implementing switch drivers
> >> - add hardware acceleration bits into openvswitch datapath, This uses
> >>   previously mentioned switchdev api
> >> - introduce rocker switch driver which implements switchdev api
> >
> >Jiri, Scott,
> >
> >Enclosed is the GOOG doc which outlines some details on my particular
> >interests [0]. It includes several diagrams which might help to
> >understand the overall arch. It is highly related to John's work as
> >well. Please let me know if something does not align with the model
> >you have in mind.
>
>
> Hi Thomas.
>
> Sorry for the late answer; I returned from vacation yesterday.
> I went over your document and did not find anything that would not align
> with our approach. Looks good to me.
>
> >
> >Summary:
> >The full virtual tunnel endpoint flow offload attempts to offload full
> >flows to the hardware and utilize the embedded switch on the host NIC
> >to empower the eSwitch with the required flexibility of the software
> >driven network. In this model, the guest (VM or LXC) attaches through a
> >SR-IOV VF which serves as the primary path. A slow path / software path
> >is provided via the CPU which can route packets back into the VF by
> >tagging packets with forwarding metadata and sending the frame back to
> >the NIC.
> >
> >[0]
> https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
> >(Publicly accessible and open for comments)
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>



-- 
Thanks & Regards
Neelakantam Gaddam


* Re: [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath
       [not found]         ` <CAOv37=BNU1-+kgTR6RUqxw7snJL6=5g-rLYhuPc1F-V0B1k7tA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-29  5:46           ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2015-06-29  5:46 UTC (permalink / raw)
  To: Neelakantam Gaddam
  Cc: ryazanov.s.a-Re5JQEeQqe8AvxtiuMwx3w,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA,
	Neil.Jerram-QnUH15yq9NYqDJ6do+/SaQ,
	edumazet-hpIqsD4AKlfQT0dZR+AlfA, andy-QlMahl40kYEqcZcGjlUOXw,
	dev-yBygre7rU0TnMu66kgdUjQ, nbd-p3rKhJxN3npAfugRpC6u6w,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, ronye-VPRAkNaXOzVWk0Htik3J/w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, ben-/+tVBieCtBitmTQ+vhA3Yw,
	buytenh-OLH4Qvv75CYX/NnBR394Jw,
	roopa-qUQiAmfTcIp+XZJcv9eMoEEOCMrvLtNR,
	jhs-jkUAjuhPggJWk0Htik3J/w, aviadr-VPRAkNaXOzVWk0Htik3J/w,
	nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	vyasevic-H+wXaHxf7aLQT0dZR+AlfA, Neil Horman, Linux Netdev List,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	dborkman-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Mon, Jun 29, 2015 at 07:44:38AM CEST, neelugaddam@gmail.com wrote:
>Hi All,
>
>Should we expect these changes to be available in the upcoming 2.4 release,
>or are they still in development?

Still in devel.

>
>
>
>
>
>
>On Tue, Sep 16, 2014 at 9:28 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Mon, Sep 08, 2014 at 03:54:13PM CEST, tgraf@suug.ch wrote:
>> >On 09/03/14 at 11:24am, Jiri Pirko wrote:
>> >> This patchset can be divided into 3 main sections:
>> >> - introduce switchdev api for implementing switch drivers
>> >> - add hardware acceleration bits into openvswitch datapath, This uses
>> >>   previously mentioned switchdev api
>> >> - introduce rocker switch driver which implements switchdev api
>> >
>> >Jiri, Scott,
>> >
>> >Enclosed is the GOOG doc which outlines some details on my particular
>> >interests [0]. It includes several diagrams which might help to
>> >understand the overall arch. It is highly related to John's work as
>> >well. Please let me know if something does not align with the model
>> >you have in mind.
>>
>>
>> Hi Thomas.
>>
>> Sorry for the late answer; I returned from vacation yesterday.
>> I went over your document and did not find anything that would not align
>> with our approach. Looks good to me.
>>
>> >
>> >Summary:
>> >The full virtual tunnel endpoint flow offload attempts to offload full
>> >flows to the hardware and utilize the embedded switch on the host NIC
>> >to empower the eSwitch with the required flexibility of the software
>> >driven network. In this model, the guest (VM or LXC) attaches through a
>> >SR-IOV VF which serves as the primary path. A slow path / software path
>> >is provided via the CPU which can route packets back into the VF by
>> >tagging packets with forwarding metadata and sending the frame back to
>> >the NIC.
>> >
>> >[0]
>> https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit?usp=sharing
>> >(Publicly accessible and open for comments)
>>
>
>
>
>-- 
>Thanks & Regards
>Neelakantam Gaddam


end of thread, other threads:[~2015-06-29  5:46 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-03  9:24 [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Jiri Pirko
2014-09-03  9:24 ` [patch net-next 01/13] openvswitch: split flow structures into ovs specific and generic ones Jiri Pirko
     [not found]   ` <1409736300-12303-2-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
2014-09-03 15:20     ` John Fastabend
     [not found]       ` <540731B9.4010603-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-09-03 18:42         ` Pravin Shelar
     [not found]           ` <CALnjE+rk26Om1O5_Q=8tn7eAyh4Ywen-1+UD_nCVj_geZY1HuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-04 12:25             ` Jiri Pirko
2014-09-04 12:09         ` Jiri Pirko
2014-09-03 21:11       ` Jamal Hadi Salim
2014-09-03 18:41   ` Pravin Shelar
2014-09-03 21:22     ` Jamal Hadi Salim
     [not found]       ` <54078694.5040104-jkUAjuhPggJWk0Htik3J/w@public.gmane.org>
2014-09-03 21:59         ` Pravin Shelar
     [not found]           ` <CALnjE+qUqSK7kHSi5BZuA0hzFjMcZ8TCTd9JRG1PPmMfDmAQOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-04  1:54             ` Jamal Hadi Salim
     [not found]     ` <CALnjE+pscRmfhaWgkWCunJfjvG04RiNUAj6nefSFHrknQTC+xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-04 12:33       ` Jiri Pirko
     [not found]         ` <20140904123323.GF1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
2014-09-04 20:46           ` Pravin Shelar
2014-09-17  8:34             ` Jiri Pirko
2014-09-17 22:07               ` Jesse Gross
2014-09-03  9:24 ` [patch net-next 02/13] net: rename netdev_phys_port_id to more generic name Jiri Pirko
     [not found] ` <1409736300-12303-1-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
2014-09-03  9:24   ` [patch net-next 03/13] net: introduce generic switch devices support Jiri Pirko
     [not found]     ` <1409736300-12303-4-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
2014-09-03 15:46       ` John Fastabend
     [not found]         ` <540737CF.4000402-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-09-04 12:46           ` Jiri Pirko
2014-09-03  9:24   ` [patch net-next 04/13] rtnl: expose physical switch id for particular device Jiri Pirko
2014-09-03  9:24   ` [patch net-next 05/13] net-sysfs: " Jiri Pirko
2014-09-03  9:24   ` [patch net-next 06/13] net: introduce dummy switch Jiri Pirko
2014-09-03  9:24   ` [patch net-next 07/13] dsa: implement ndo_swdev_get_id Jiri Pirko
     [not found]     ` <1409736300-12303-8-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
2014-09-03 23:20       ` Florian Fainelli
     [not found]         ` <5407A25A.8050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-09-04 12:47           ` Jiri Pirko
     [not found]             ` <20140904124701.GH1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
2014-09-05  4:43               ` Felix Fietkau
2014-09-05  5:52                 ` Jiri Pirko
2014-09-03  9:24   ` [patch net-next 10/13] openvswitch: add support for datapath hardware offload Jiri Pirko
     [not found]     ` <1409736300-12303-11-git-send-email-jiri-rHqAuBHg3fBzbRFIqnYvSA@public.gmane.org>
2014-09-03 16:37       ` John Fastabend
     [not found]         ` <540743B4.9080500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-09-04 12:48           ` Jiri Pirko
     [not found]             ` <20140904124837.GI1867-6KJVSR23iU5sFDB2n11ItA@public.gmane.org>
2014-09-05  3:59               ` Simon Horman
2014-09-03  9:24   ` [patch net-next 11/13] sw_flow: add misc section to key with in_port_ifindex field Jiri Pirko
2014-09-03  9:24   ` [patch net-next 12/13] rocker: introduce rocker switch driver Jiri Pirko
2014-09-03  9:24 ` [patch net-next 08/13] net: introduce netdev_phys_item_ids_match helper Jiri Pirko
2014-09-03  9:24 ` [patch net-next 09/13] openvswitch: introduce vport_op get_netdev Jiri Pirko
2014-09-03  9:25 ` [patch net-next 13/13] switchdev: introduce Netlink API Jiri Pirko
2014-09-08 13:54 ` [patch net-next 00/13] introduce rocker switch driver with openvswitch hardware accelerated datapath Thomas Graf
2014-09-09 21:09   ` Alexei Starovoitov
2014-09-15 12:43     ` Thomas Graf
2014-09-16 15:58   ` Jiri Pirko
     [not found]     ` <20140916155832.GA1869-6KJVSR23iU488b5SBfVpbw@public.gmane.org>
2015-06-29  5:44       ` Neelakantam Gaddam
     [not found]         ` <CAOv37=BNU1-+kgTR6RUqxw7snJL6=5g-rLYhuPc1F-V0B1k7tA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-29  5:46           ` Jiri Pirko
