* [net-next PATCH v3 00/12] Flow API
@ 2015-01-20 20:26 John Fastabend
  2015-01-20 20:26 ` [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables John Fastabend
                   ` (12 more replies)
  0 siblings, 13 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:26 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

I believe I addressed all the comments so far except for the integration
with 'tc'. I plan to work on the integration pieces next.

v3:
 - fixes from Simon Horman integrated; see the netdev mailing list
 - converted synchronize_rcu() to call_rcu()
 - updated git commit messages to match code
 - updated flow-api.html document to match latest updates
 - also updated user space flow tool with a handful of fixes
v2:
 - Use a software rhashtable to store add/del flows so we can skip
   having to interrogate drivers for get_flow requests.

 - Removed structures from the UAPI; this should make it easier to
   evolve as needed.

 - Added net_flow_lock around set/del rule ops.

 - Alexei Starovoitov suggested renaming NET_FLOW -> NFL for
   brevity/clarity. Seems reasonable to me so I went ahead and changed
   the UAPI enums. Also renamed flow types and calls to *_rule. Core
   flow_table still uses the net_flow_* prefix.

 - various fixes/suggestions from Simon Horman, Jiri Pirko, Scott
   Feldman, Thomas Graf, et al.
	* SimonH: sent patch series of fixes to netdev
	* JiriP: some naming issues, some helper funcs added, etc.
 	* ScottF: use ARRAY_SIZE, let compiler define array sizes, use
	          ETH_P_* macros. Various fixes.
	* ThomasG: various suggestions

 - added checks to catch invalid messages from user space and fixed
   dev_put errors.

---

This set creates a new netlink family and set of messages to configure
flow tables in hardware. I tried to make the commit messages
reasonably verbose, at least in the flow_table patches; possibly too
verbose.

What we get at the end of this series is a working API to get device
capabilities and program flows using the rocker switch.

I created a user space tool 'flow' that I use to configure and query
the devices. It is posted here:

	https://github.com/jrfastab/iprotue2-flow-tool

For now it is a stand-alone tool but once the kernel bits get sorted
out I would like to port it into the iproute2 package. This way we
can keep all of our tooling in one package.

As far as testing, I've tested various combinations of tables and
rules on the rocker switch and it seems to work.

For some examples and maybe a bit more illustrative description I
posted a set of notes on github io pages. Here we can show the
description along with images produced by the flow tool showing
the pipeline.

http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html

After this base work is complete the next task is to integrate with
existing subsystems, 'tc' and OVS for example, and to provide more
example setups in the notes.

Thanks! Any comments/feedback always welcome.

And also thanks to everyone who helped with this flow API so
far. All the folks at Dusseldorf LPC, OVS summit Santa Clara, P4
authors for some inspiration, the collection of IETF FoRCES
documents I mulled over, Netfilter workshop where I started
to realize fixing ethtool was most likely not going to work,
etc.

---

John Fastabend (12):
      net: flow_table: create interface for hw match/action tables
      net: flow_table: add rule, delete rule
      net: flow: implement flow cache for get routines
      net: flow_table: create a set of common headers and actions
      net: flow_table: add validation functions for rules
      net: rocker: add pipeline model for rocker switch
      net: rocker: add set rule ops
      net: rocker: add group_id slices and drop explicit goto
      net: rocker: add multicast path to bridging
      net: rocker: add cookie to group acls and use flow_id to set cookie
      net: rocker: have flow api calls set cookie value
      net: rocker: implement delete flow routine


 drivers/net/ethernet/rocker/rocker.c          |  754 ++++++++++
 drivers/net/ethernet/rocker/rocker_pipeline.h |  595 ++++++++
 include/linux/if_flow.h                       |  231 +++
 include/linux/if_flow_common.h                |  257 +++
 include/linux/netdevice.h                     |   48 +
 include/uapi/linux/if_flow.h                  |  440 ++++++
 net/Kconfig                                   |    7 
 net/core/Makefile                             |    1 
 net/core/flow_table.c                         | 1915 +++++++++++++++++++++++++
 9 files changed, 4231 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
 create mode 100644 include/linux/if_flow.h
 create mode 100644 include/linux/if_flow_common.h
 create mode 100644 include/uapi/linux/if_flow.h
 create mode 100644 net/core/flow_table.c

-- 
Signature


* [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
@ 2015-01-20 20:26 ` John Fastabend
  2015-01-22  4:37   ` Simon Horman
  2015-01-20 20:27 ` [net-next PATCH v3 02/12] net: flow_table: add rule, delete rule John Fastabend
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:26 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Currently, we do not have an interface to query hardware and learn
the capabilities of the device. This makes it very difficult to use
hardware flow tables.

At the moment the only interface we have to work with hardware flow
tables is ethtool. This has many deficiencies: first, it is ioctl
based, making it difficult to use in systems that need to monitor
interfaces, because there is no support for multicast, notifiers, etc.

The next big gap is it doesn't support querying devices for
capabilities. The only way to learn hardware entries is by doing a
"try and see" operation. An error perhaps indicates the device cannot
support your request, but it could also be for other reasons; maybe a
table is full, for example. The existing flow interface only
supports a single ingress table, which is sufficient for some of the
existing NIC host interfaces but limiting for more advanced NIC
interfaces and switch devices.

Also, it is not extensible without recompiling both drivers and core
interfaces. It may be possible to reprogram a device with additional
header types, new protocols, whatever, and it would be great if the
flow table infrastructure could handle this.

So this patch scraps the ethtool flow classifier interface and
creates a new flow table interface. It is expected that devices that
support the existing ethtool interface today can support both
interfaces without too much difficulty. I did a proof point on the
ixgbe driver, choosing ixgbe only because I have an 82599 10Gbps
device in my development system. A more thorough implementation
was done for the rocker switch showing how to use the interface.

In this patch we create interfaces to get the headers a device
supports, the actions it supports, a header graph showing the
relationship between headers the device supports, the tables
supported by the device and how they are connected.

This patch _only_ provides the get routines in an attempt to
make the patch sequence manageable.

get_hdrs :

   report a set of headers/fields the device supports. These
   are specified as lengths/offsets so we can support standard
   protocols or vendor specific headers. This is more flexible
   than bitmasks of pre-defined packet types. In 'tc' for example
   I may use u32 to match on proprietary or vendor specific fields.
   A bitmask approach does not allow for this, but defining the
   fields as a set of offsets and lengths does.

   A device that supports OpenFlow version 1.x for example could
   provide the set of field/offsets that are equivalent to the
   specification.

   One property of this type of interface is I don't have to
   rebuild my kernel/driver header interfaces, etc to support the
   latest and greatest trendy protocol foo.

   For some types of metadata the device understands we also
   use header fields to represent them. One example of this is
   we may have an ingress_port metadata field to report the
   port a packet was received on. At the moment we expect the
   metadata fields to be defined outside the interface. We can
   standardize on common ones such as "ingress_port" across devices.

   Some examples of outside definitions specifying metadata
   might be OVS, internal definitions like skb->mark, or some
   FoRCES definitions.

get_hdr_graph :

   Simply providing a header/field offset I support is not sufficient
   to learn how many nested 802.1Q tags I can support and other
   similar cases where the ordering of headers matters.

   So we use this operation to query the device for a header
   graph showing how the headers need to be related.
   With this operation and the 'get_headers' operation you can
   interrogate the driver with questions like "do you support
   Q'in'Q?", "how many VLAN tags can I nest before the parser
   breaks?", "Do you support MPLS?", "How about Foo Header in
   a VXLAN tunnel?".

get_actions :

   Report a list of actions supported by the device along with the
   arguments they take. So the "drop_packet" action takes no arguments
   and the "set_field" action takes two arguments, a field and a value.

   This suffers again from being slightly opaque, meaning if a device
   reports back action "foo_bar" with three arguments how do I as a
   consumer of this "know" what that action is? The easy thing to do
   is punt on it and say it should be described outside the driver
   somewhere. OVS for example defines a set of actions. If my FoRCES
   quick read is correct they define actions using text in the
   messaging interface. A follow up patch series could use a
   description language to describe actions, possibly using something
   from eBPF or nftables for example. This patch will not try to
   solve the issue now and expects actions to be defined outside the
   API or to be well known.

get_tbls :

   Hardware may support one or more tables. Each table supports a set
   of matches and a set of actions. The match fields supported are
   defined above by the 'get_headers' operations. Similarly the actions
   supported are defined by the 'get_actions' operation.

   This allows the hardware to report several tables all with distinct
   capabilities. Tables also have table attributes used to describe
   features of the table. Because netlink messages are TLV based we
   can easily add new table attributes as needed.

   Currently a table has two attributes, size and source. The size
   indicates how many "slots" are in the table for flow entries. One
   caveat here is a rule in the flow table may consume multiple slots
   in the table. We deal with this in a subsequent patch.

   The source field is used to indicate table boundaries where actions
   are applied. A table will not "see" actions from other tables with
   the same source value. An example where this is
   relevant would be an action to re-write the destination
   IP address of a packet. If you have a match rule in a table with
   the same source that matches on the new IP address it will not be
   hit. However if it is in a table with a different source value
   _and_ in another table that gets applied, the rule will be hit. See
   the next operation for querying table ordering.

   Some basic hardware may only support a single table which simplifies
   some things. But even the simple 10/40Gbps NICs support multiple
   tables and different tables depending on ingress/egress.

get_tbl_graph :

   When a device supports multiple tables we need to identify how the
   tables are connected and in what order each table is executed.

   To do this we provide a table graph which gives the pipeline of the
   device. The graph gives nodes representing each table and the edges
   indicate the criteria to progress to the next flow table. There are
   examples of this type of thing in both FoRCES and OVS. OVS
   prescribes a set of tables reachable with goto actions and FoRCES a
   slightly more flexible arrangement. In software, tc's u32 classifier
   allows "linking" hash tables together. The OVS dataplane with the
   support of the 'goto' action is completely connected; without the
   'goto' action the tables are progressed linearly.

   By querying the graph from hardware we can "learn" what table flows
   are supported and map them into software.

   We also provide a bit to indicate if the node is a root node of the
   ingress pipeline or egress pipeline. This is used on devices that
   have different pipelines for ingress and egress, which appears to
   be fairly common. The Realtek chip presented at LPC in
   Dusseldorf for example appeared to have separate ingress/egress
   pipelines.

With these five operations software can learn what types of fields
the hardware flow table supports and how they are arranged. Subsequent
patches will address programming the flow tables.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h      |  188 ++++++++
 include/linux/netdevice.h    |   38 ++
 include/uapi/linux/if_flow.h |  389 +++++++++++++++++
 net/Kconfig                  |    7 
 net/core/Makefile            |    1 
 net/core/flow_table.c        |  942 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1565 insertions(+)
 create mode 100644 include/linux/if_flow.h
 create mode 100644 include/uapi/linux/if_flow.h
 create mode 100644 net/core/flow_table.c

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
new file mode 100644
index 0000000..7ce1e1d
--- /dev/null
+++ b/include/linux/if_flow.h
@@ -0,0 +1,188 @@
+/*
+ * include/linux/net/if_flow.h - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+#ifndef _IF_FLOW_H
+#define _IF_FLOW_H
+
+#include <uapi/linux/if_flow.h>
+
+/**
+ * @struct net_flow_field
+ * @brief defines a field in a header
+ *
+ * @name string identifier for pretty printing
+ * @uid  unique identifier for field
+ * @bitwidth length of field in bits
+ */
+struct net_flow_field {
+	char *name;
+	__u32 uid;
+	__u32 bitwidth;
+};
+
+/**
+ * @struct net_flow_hdr
+ * @brief defines a match (header/field) an endpoint can use
+ *
+ * @name string identifier for pretty printing
+ * @uid unique identifier for header
+ * @field_sz number of fields in the set
+ * @fields the set of fields in the net_flow_hdr
+ */
+struct net_flow_hdr {
+	char *name;
+	__u32 uid;
+	__u32 field_sz;
+	struct net_flow_field *fields;
+};
+
+/**
+ * @struct net_flow_action_arg
+ * @brief encodes action arguments in structures one per argument
+ *
+ * @name    string identifier for pretty printing
+ * @type    type of argument either u8, u16, u32, u64
+ * @value_# indicate value/mask value type, one of u8, u16, u32, or u64
+ */
+struct net_flow_action_arg {
+	char *name;
+	enum net_flow_action_arg_type type;
+	union {
+		__u8  value_u8;
+		__u16 value_u16;
+		__u32 value_u32;
+		__u64 value_u64;
+	};
+};
+
+/**
+ * @struct net_flow_action
+ * @brief a description of an endpoint defined action
+ *
+ * @name printable name
+ * @uid unique action identifier
+ * @args null terminated list of action arguments
+ */
+struct net_flow_action {
+	char *name;
+	__u32 uid;
+	struct net_flow_action_arg *args;
+};
+
+/**
+ * @struct net_flow_field_ref
+ * @brief uniquely identify field as instance:header:field tuple
+ *
+ * @instance identify unique instance of field reference
+ * @header   identify unique header reference
+ * @field    identify unique field in above header reference
+ * @mask_type indicate mask type
+ * @type     indicate value/mask value type, one of u8, u16, u32, or u64
+ * @value_u# value of field reference
+ * @mask_u#  mask value of field reference
+ */
+struct net_flow_field_ref {
+	__u32 instance;
+	__u32 header;
+	__u32 field;
+	__u32 mask_type;
+	__u32 type;
+	union {
+		struct {
+			__u8 value_u8;
+			__u8 mask_u8;
+		};
+		struct {
+			__u16 value_u16;
+			__u16 mask_u16;
+		};
+		struct {
+			__u32 value_u32;
+			__u32 mask_u32;
+		};
+		struct {
+			__u64 value_u64;
+			__u64 mask_u64;
+		};
+	};
+};
+
+/**
+ * @struct net_flow_tbl
+ * @brief define flow table with supported match/actions
+ *
+ * @name string identifier for pretty printing
+ * @uid unique identifier for table
+ * @source uid of parent table
+ * @apply_action actions in the same apply group are applied in one step
+ * @size max number of entries for table or -1 for unbounded
+ * @matches null terminated set of supported match types given by match uid
+ * @actions null terminated set of supported action types given by action uid
+ */
+struct net_flow_tbl {
+	char *name;
+	__u32 uid;
+	__u32 source;
+	__u32 apply_action;
+	__u32 size;
+	struct net_flow_field_ref *matches;
+	__u32 *actions;
+};
+
+/**
+ * @struct net_flow_jump_table
+ * @brief encodes an edge of the table graph or header graph
+ *
+ * @field   field reference must be true to follow edge
+ * @node    node identifier to connect edge to
+ */
+
+struct net_flow_jump_table {
+	struct net_flow_field_ref field;
+	__u32 node; /* <0 is a parser error */
+};
+
+/* @struct net_flow_hdr_node
+ * @brief node in a header graph of header fields.
+ *
+ * @name string identifier for pretty printing
+ * @uid  unique id of the graph node
+ * @hdrs null terminated list of hdrs identified by this node
+ * @jump encoding of graph structure as a case jump statement
+ */
+struct net_flow_hdr_node {
+	char *name;
+	__u32 uid;
+	__u32 *hdrs;
+	struct net_flow_jump_table *jump;
+};
+
+/* @struct net_flow_tbl_node
+ * @brief node in the table graph
+ *
+ * @uid	  unique id of the table node
+ * @flags bitmask of table attributes
+ * @jump  encoding of graph structure as a case jump statement
+ */
+struct net_flow_tbl_node {
+	__u32 uid;
+	__u32 flags;
+	struct net_flow_jump_table *jump;
+};
+#endif
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 679e6e9..74481b9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,10 @@
 #include <linux/neighbour.h>
 #include <uapi/linux/netdevice.h>
 
+#ifdef CONFIG_NET_FLOW_TABLES
+#include <linux/if_flow.h>
+#endif
+
 struct netpoll_info;
 struct device;
 struct phy_device;
@@ -1030,6 +1034,33 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
  *	Called to notify switch device port of bridge port STP
  *	state change.
+ *
+ * struct net_flow_action **(*ndo_flow_get_actions)(struct net_device *dev)
+ *	Report a null terminated list of actions supported by the device along
+ *	with the arguments they take.
+ *
+ * struct net_flow_tbl **(*ndo_flow_get_tbls)(struct net_device *dev)
+ *	Report a null terminated list of tables supported by the device.
+ *	Including the match fields and actions supported. The match fields
+ *	are defined by the 'ndo_flow_get_hdrs' op and the actions are defined
+ *	by 'ndo_flow_get_actions' op.
+ *
+ * struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev)
+ *	Report a null terminated list of nodes defining the table graph. When
+ *	a device supports multiple tables we need to identify how the tables
+ *	are connected and in what order the tables are traversed. The table
+ *	nodes returned here provide the graph required to learn this.
+ *
+ * struct net_flow_hdr	 **(*ndo_flow_get_hdrs)(struct net_device *dev)
+ *	Report a null terminated list of headers+fields supported by the
+ *	device. See the net_flow_hdr struct for details on header/field layout;
+ *	the basic logic is that by giving the byte/length/offset of each field
+ *	the device can define the protocols it supports.
+ *
+ * struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev)
+ *	Report a null terminated list of nodes defining the header graph. This
+ *	provides the necessary graph to learn the ordering of headers supported
+ *	by the device.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1190,6 +1221,13 @@ struct net_device_ops {
 	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
 							      u8 state);
 #endif
+#ifdef CONFIG_NET_FLOW_TABLES
+	struct net_flow_action	 **(*ndo_flow_get_actions)(struct net_device *dev);
+	struct net_flow_tbl	 **(*ndo_flow_get_tbls)(struct net_device *dev);
+	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
+	struct net_flow_hdr	 **(*ndo_flow_get_hdrs)(struct net_device *dev);
+	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
+#endif
 };
 
 /**
diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
new file mode 100644
index 0000000..3314aa2
--- /dev/null
+++ b/include/uapi/linux/if_flow.h
@@ -0,0 +1,389 @@
+/*
+ * include/uapi/linux/if_flow.h - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+/* Netlink description:
+ *
+ * Table definition used to describe running tables. The following
+ * describes the netlink format used by the flow API.
+ *
+ * Flow table definitions used to define tables.
+ *
+ * [NFL_TABLE_IDENTIFIER_TYPE]
+ * [NFL_TABLE_IDENTIFIER]
+ * [NFL_TABLE_TABLES]
+ *     [NFL_TABLE]
+ *	 [NFL_TABLE_ATTR_NAME]
+ *	 [NFL_TABLE_ATTR_UID]
+ *	 [NFL_TABLE_ATTR_SOURCE]
+ *	 [NFL_TABLE_ATTR_APPLY]
+ *	 [NFL_TABLE_ATTR_SIZE]
+ *	 [NFL_TABLE_ATTR_MATCHES]
+ *	   [NFL_FIELD_REF]
+ *	     [NFL_FIELD_REF_INSTANCE]
+ *	     [NFL_FIELD_REF_HEADER]
+ *	     [NFL_FIELD_REF_FIELD]
+ *	     [NFL_FIELD_REF_MASK]
+ *	     [NFL_FIELD_REF_TYPE]
+ *	   [...]
+ *	 [NFL_TABLE_ATTR_ACTIONS]
+ *	     [NFL_ACTION_ATTR_UID]
+ *	     [...]
+ *     [NFL_TABLE]
+ *       [...]
+ *
+ * Header definitions used to define headers with user friendly
+ * names.
+ *
+ * [NFL_TABLE_HEADERS]
+ *   [NFL_HEADER]
+ *	[NFL_HEADER_ATTR_NAME]
+ *	[NFL_HEADER_ATTR_UID]
+ *	[NFL_HEADER_ATTR_FIELDS]
+ *	  [NFL_HEADER_ATTR_FIELD]
+ *	    [NFL_FIELD_ATTR_NAME]
+ *	    [NFL_FIELD_ATTR_UID]
+ *	    [NFL_FIELD_ATTR_BITWIDTH]
+ *	  [NFL_HEADER_ATTR_FIELD]
+ *	    [...]
+ *	  [...]
+ *   [NFL_HEADER]
+ *      [...]
+ *   [...]
+ *
+ * Action definitions supported by tables
+ *
+ * [NFL_TABLE_ACTIONS]
+ *   [NFL_TABLE_ATTR_ACTIONS]
+ *	[NFL_ACTION]
+ *	  [NFL_ACTION_ATTR_NAME]
+ *	  [NFL_ACTION_ATTR_UID]
+ *	  [NFL_ACTION_ATTR_SIGNATURE]
+ *		 [NFL_ACTION_ARG]
+ *			[NFL_ACTION_ARG_NAME]
+ *			[NFL_ACTION_ARG_TYPE]
+ *               [...]
+ *	[NFL_ACTION]
+ *	     [...]
+ *
+ * Then two get definitions for the headers graph and the table graph
+ * The header graph gives an encoded graph to describe how the device
+ * parses the headers. Use this to learn if a specific protocol is
+ * supported in the current device configuration. The table graph
+ * reports how tables are traversed by packets.
+ *
+ * Get Headers Graph <Request> only requires msg preamble.
+ *
+ * Get Headers Graph <Reply> description
+ *
+ * [NFL_HEADER_GRAPH]
+ *   [NFL_HEADER_GRAPH_NODE]
+ *	[NFL_HEADER_NODE_NAME]
+ *	[NFL_HEADER_NODE_HDRS]
+ *	    [NFL_HEADER_NODE_HDRS_VALUE]
+ *	    [...]
+ *	[NFL_HEADER_NODE_JUMP]]
+ *	  [NFL_JUMP_ENTRY]
+ *	    [NFL_FIELD_REF_NEXT_NODE]
+ *	    [NFL_FIELD_REF_INSTANCE]
+ *	    [NFL_FIELD_REF_HEADER]
+ *	    [NFL_FIELD_REF_FIELD]
+ *	    [NFL_FIELD_REF_MASK]
+ *	    [NFL_FIELD_REF_TYPE]
+ *	    [NFL_FIELD_REF_VALUE]
+ *	    [NFL_FIELD_REF_MASK]
+ *	  [...]
+ *   [NFL_HEADER_GRAPH_NODE]
+ *	[...]
+ *
+ * Get Table Graph <Request> only requires msg preamble.
+ *
+ * Get Table Graph <Reply> description
+ *
+ * [NFL_TABLE_GRAPH]
+ *   [NFL_TABLE_GRAPH_NODE]
+ *	[NFL_TABLE_GRAPH_NODE_UID]
+ *	[NFL_TABLE_GRAPH_NODE_JUMP]
+ *	  [NFL_JUMP_ENTRY]
+ *	    [NFL_FIELD_REF_NEXT_NODE]
+ *	    [NFL_FIELD_REF_INSTANCE]
+ *	    [NFL_FIELD_REF_HEADER]
+ *	    [NFL_FIELD_REF_FIELD]
+ *	    [NFL_FIELD_REF_MASK]
+ *	    [NFL_FIELD_REF_TYPE]
+ *	    [NFL_FIELD_REF_VALUE]
+ *	    [NFL_FIELD_REF_MASK]
+ *	  [...]
+ *   [NFL_TABLE_GRAPH_NODE]
+ *	[...]
+ */
+
+#ifndef _UAPI_LINUX_IF_FLOW
+#define _UAPI_LINUX_IF_FLOW
+
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/if.h>
+
+enum {
+	NFL_FIELD_UNSPEC,
+	NFL_FIELD,
+	__NFL_FIELD_MAX,
+};
+
+#define NFL_FIELD_MAX (__NFL_FIELD_MAX - 1)
+
+enum {
+	NFL_FIELD_ATTR_UNSPEC,
+	NFL_FIELD_ATTR_NAME,
+	NFL_FIELD_ATTR_UID,
+	NFL_FIELD_ATTR_BITWIDTH,
+	__NFL_FIELD_ATTR_MAX,
+};
+
+#define NFL_FIELD_ATTR_MAX (__NFL_FIELD_ATTR_MAX - 1)
+
+enum {
+	NFL_HEADER_UNSPEC,
+	NFL_HEADER,
+	__NFL_HEADER_MAX,
+};
+
+#define NFL_HEADER_MAX (__NFL_HEADER_MAX - 1)
+
+enum {
+	NFL_HEADER_ATTR_UNSPEC,
+	NFL_HEADER_ATTR_NAME,
+	NFL_HEADER_ATTR_UID,
+	NFL_HEADER_ATTR_FIELDS,
+	__NFL_HEADER_ATTR_MAX,
+};
+
+#define NFL_HEADER_ATTR_MAX (__NFL_HEADER_ATTR_MAX - 1)
+
+enum {
+	NFL_MASK_TYPE_UNSPEC,
+	NFL_MASK_TYPE_EXACT,
+	NFL_MASK_TYPE_LPM,
+	NFL_MASK_TYPE_MASK,
+};
+
+enum {
+	NFL_FIELD_REF_UNSPEC,
+	NFL_FIELD_REF_NEXT_NODE,
+	NFL_FIELD_REF_INSTANCE,
+	NFL_FIELD_REF_HEADER,
+	NFL_FIELD_REF_FIELD,
+	NFL_FIELD_REF_MASK_TYPE,
+	NFL_FIELD_REF_TYPE,
+	NFL_FIELD_REF_VALUE,
+	NFL_FIELD_REF_MASK,
+	__NFL_FIELD_REF_MAX,
+};
+
+#define NFL_FIELD_REF_MAX (__NFL_FIELD_REF_MAX - 1)
+
+enum {
+	NFL_FIELD_REFS_UNSPEC,
+	NFL_FIELD_REF,
+	__NFL_FIELD_REFS_MAX,
+};
+
+#define NFL_FIELD_REFS_MAX (__NFL_FIELD_REFS_MAX - 1)
+
+enum {
+	NFL_FIELD_REF_ATTR_TYPE_UNSPEC,
+	NFL_FIELD_REF_ATTR_TYPE_U8,
+	NFL_FIELD_REF_ATTR_TYPE_U16,
+	NFL_FIELD_REF_ATTR_TYPE_U32,
+	NFL_FIELD_REF_ATTR_TYPE_U64,
+};
+
+enum net_flow_action_arg_type {
+	NFL_ACTION_ARG_TYPE_NULL,
+	NFL_ACTION_ARG_TYPE_U8,
+	NFL_ACTION_ARG_TYPE_U16,
+	NFL_ACTION_ARG_TYPE_U32,
+	NFL_ACTION_ARG_TYPE_U64,
+	__NFL_ACTION_ARG_TYPE_VAL_MAX,
+};
+
+enum {
+	NFL_ACTION_ARG_UNSPEC,
+	NFL_ACTION_ARG_NAME,
+	NFL_ACTION_ARG_TYPE,
+	NFL_ACTION_ARG_VALUE,
+	__NFL_ACTION_ARG_MAX,
+};
+
+#define NFL_ACTION_ARG_MAX (__NFL_ACTION_ARG_MAX - 1)
+
+enum {
+	NFL_ACTION_ARGS_UNSPEC,
+	NFL_ACTION_ARG,
+	__NFL_ACTION_ARGS_MAX,
+};
+
+#define NFL_ACTION_ARGS_MAX (__NFL_ACTION_ARGS_MAX - 1)
+
+enum {
+	NFL_ACTION_UNSPEC,
+	NFL_ACTION,
+	__NFL_ACTION_MAX,
+};
+
+#define NFL_ACTION_MAX (__NFL_ACTION_MAX - 1)
+
+enum {
+	NFL_ACTION_ATTR_UNSPEC,
+	NFL_ACTION_ATTR_NAME,
+	NFL_ACTION_ATTR_UID,
+	NFL_ACTION_ATTR_SIGNATURE,
+	__NFL_ACTION_ATTR_MAX,
+};
+
+#define NFL_ACTION_ATTR_MAX (__NFL_ACTION_ATTR_MAX - 1)
+
+enum {
+	NFL_ACTION_SET_UNSPEC,
+	NFL_ACTION_SET_ACTIONS,
+	__NFL_ACTION_SET_MAX,
+};
+
+#define NFL_ACTION_SET_MAX (__NFL_ACTION_SET_MAX - 1)
+
+enum {
+	NFL_TABLE_UNSPEC,
+	NFL_TABLE,
+	__NFL_TABLE_MAX,
+};
+
+#define NFL_TABLE_MAX (__NFL_TABLE_MAX - 1)
+
+enum {
+	NFL_TABLE_ATTR_UNSPEC,
+	NFL_TABLE_ATTR_NAME,
+	NFL_TABLE_ATTR_UID,
+	NFL_TABLE_ATTR_SOURCE,
+	NFL_TABLE_ATTR_APPLY,
+	NFL_TABLE_ATTR_SIZE,
+	NFL_TABLE_ATTR_MATCHES,
+	NFL_TABLE_ATTR_ACTIONS,
+	__NFL_TABLE_ATTR_MAX,
+};
+
+#define NFL_TABLE_ATTR_MAX (__NFL_TABLE_ATTR_MAX - 1)
+
+#define NFL_JUMP_TABLE_DONE 0
+enum {
+	NFL_JUMP_ENTRY_UNSPEC,
+	NFL_JUMP_ENTRY,
+	__NFL_JUMP_ENTRY_MAX,
+};
+
+enum {
+	NFL_HEADER_NODE_HDRS_UNSPEC,
+	NFL_HEADER_NODE_HDRS_VALUE,
+	__NFL_HEADER_NODE_HDRS_MAX,
+};
+
+#define NFL_HEADER_NODE_HDRS_MAX (__NFL_HEADER_NODE_HDRS_MAX - 1)
+
+enum {
+	NFL_HEADER_NODE_UNSPEC,
+	NFL_HEADER_NODE_NAME,
+	NFL_HEADER_NODE_UID,
+	NFL_HEADER_NODE_HDRS,
+	NFL_HEADER_NODE_JUMP,
+	__NFL_HEADER_NODE_MAX,
+};
+
+#define NFL_HEADER_NODE_MAX (__NFL_HEADER_NODE_MAX - 1)
+
+enum {
+	NFL_HEADER_GRAPH_UNSPEC,
+	NFL_HEADER_GRAPH_NODE,
+	__NFL_HEADER_GRAPH_MAX,
+};
+
+#define NFL_HEADER_GRAPH_MAX (__NFL_HEADER_GRAPH_MAX - 1)
+
+#define	NFL_TABLE_EGRESS_ROOT 1
+#define	NFL_TABLE_INGRESS_ROOT 2
+
+enum {
+	NFL_TABLE_GRAPH_NODE_UNSPEC,
+	NFL_TABLE_GRAPH_NODE_UID,
+	NFL_TABLE_GRAPH_NODE_FLAGS,
+	NFL_TABLE_GRAPH_NODE_JUMP,
+	__NFL_TABLE_GRAPH_NODE_MAX,
+};
+
+#define NFL_TABLE_GRAPH_NODE_MAX (__NFL_TABLE_GRAPH_NODE_MAX - 1)
+
+enum {
+	NFL_TABLE_GRAPH_UNSPEC,
+	NFL_TABLE_GRAPH_NODE,
+	__NFL_TABLE_GRAPH_MAX,
+};
+
+#define NFL_TABLE_GRAPH_MAX (__NFL_TABLE_GRAPH_MAX - 1)
+
+enum {
+	NFL_NFL_UNSPEC,
+	NFL_FLOW,
+	__NFL_NFL_MAX,
+};
+
+#define NFL_NFL_MAX (__NFL_NFL_MAX - 1)
+
+enum {
+	NFL_IDENTIFIER_UNSPEC,
+	NFL_IDENTIFIER_IFINDEX, /* net_device ifindex */
+};
+
+enum {
+	NFL_UNSPEC,
+	NFL_IDENTIFIER_TYPE,
+	NFL_IDENTIFIER,
+
+	NFL_TABLES,
+	NFL_HEADERS,
+	NFL_ACTIONS,
+	NFL_HEADER_GRAPH,
+	NFL_TABLE_GRAPH,
+
+	__NFL_MAX,
+	NFL_MAX = (__NFL_MAX - 1),
+};
+
+enum {
+	NFL_TABLE_CMD_GET_TABLES,
+	NFL_TABLE_CMD_GET_HEADERS,
+	NFL_TABLE_CMD_GET_ACTIONS,
+	NFL_TABLE_CMD_GET_HDR_GRAPH,
+	NFL_TABLE_CMD_GET_TABLE_GRAPH,
+
+	__NFL_CMD_MAX,
+	NFL_CMD_MAX = (__NFL_CMD_MAX - 1),
+};
+
+#define NFL_GENL_NAME "net_flow_nl"
+#define NFL_GENL_VERSION 0x1
+#endif /* _UAPI_LINUX_IF_FLOW */
diff --git a/net/Kconfig b/net/Kconfig
index ff9ffc1..8380bfe 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -293,6 +293,13 @@ config NET_FLOW_LIMIT
 	  with many clients some protection against DoS by a single (spoofed)
 	  flow that greatly exceeds average workload.
 
+config NET_FLOW_TABLES
+	boolean "Support network flow tables"
+	---help---
+	This feature provides an interface for device drivers to report
+	flow tables and supported matches and actions. If you do not
+	want to support hardware offloads for flow tables, say N here.
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/Makefile b/net/core/Makefile
index 235e6c5..1eea785 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
+obj-$(CONFIG_NET_FLOW_TABLES) += flow_table.o
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
new file mode 100644
index 0000000..f994acb
--- /dev/null
+++ b/net/core/flow_table.c
@@ -0,0 +1,942 @@
+/*
+ * net/core/flow_table.c - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+#include <uapi/linux/if_flow.h>
+#include <linux/if_flow.h>
+#include <linux/if_bridge.h>
+#include <linux/types.h>
+#include <net/netlink.h>
+#include <net/genetlink.h>
+#include <net/rtnetlink.h>
+#include <linux/module.h>
+
+static struct genl_family net_flow_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NFL_GENL_NAME,
+	.version	= NFL_GENL_VERSION,
+	.maxattr	= NFL_MAX,
+	.netnsok	= true,
+};
+
+static struct net_device *net_flow_get_dev(struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	int type, ifindex;
+
+	if (!info->attrs[NFL_IDENTIFIER_TYPE] ||
+	    !info->attrs[NFL_IDENTIFIER])
+		return NULL;
+
+	type = nla_get_u32(info->attrs[NFL_IDENTIFIER_TYPE]);
+	switch (type) {
+	case NFL_IDENTIFIER_IFINDEX:
+		ifindex = nla_get_u32(info->attrs[NFL_IDENTIFIER]);
+		break;
+	default:
+		return NULL;
+	}
+
+	return dev_get_by_index(net, ifindex);
+}
+
+static int net_flow_put_act_types(struct sk_buff *skb,
+				  struct net_flow_action_arg *args)
+{
+	struct nlattr *arg;
+	int i, err;
+
+	for (i = 0; args[i].type; i++) {
+		arg = nla_nest_start(skb, NFL_ACTION_ARG);
+		if (!arg)
+			return -EMSGSIZE;
+
+		if (args[i].name) {
+			err = nla_put_string(skb, NFL_ACTION_ARG_NAME,
+					     args[i].name);
+			if (err)
+				goto out;
+		}
+
+		err = nla_put_u32(skb, NFL_ACTION_ARG_TYPE, args[i].type);
+		if (err)
+			goto out;
+
+		nla_nest_end(skb, arg);
+	}
+	return 0;
+out:
+	nla_nest_cancel(skb, arg);
+	return err;
+}
+
+static const
+struct nla_policy net_flow_action_policy[NFL_ACTION_ATTR_MAX + 1] = {
+	[NFL_ACTION_ATTR_NAME]	    = {.type = NLA_STRING },
+	[NFL_ACTION_ATTR_UID]	    = {.type = NLA_U32 },
+	[NFL_ACTION_ATTR_SIGNATURE] = {.type = NLA_NESTED },
+};
+
+static int net_flow_put_action(struct sk_buff *skb, struct net_flow_action *a)
+{
+	struct nlattr *nest;
+	int err;
+
+	if (a->name && nla_put_string(skb, NFL_ACTION_ATTR_NAME, a->name))
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NFL_ACTION_ATTR_UID, a->uid))
+		return -EMSGSIZE;
+
+	if (a->args) {
+		nest = nla_nest_start(skb, NFL_ACTION_ATTR_SIGNATURE);
+		if (!nest)
+			return -EMSGSIZE;
+
+		err = net_flow_put_act_types(skb, a->args);
+		if (err) {
+			nla_nest_cancel(skb, nest);
+			return err;
+		}
+		nla_nest_end(skb, nest);
+	}
+
+	return 0;
+}
+
+static int net_flow_put_actions(struct sk_buff *skb,
+				struct net_flow_action **acts)
+{
+	struct nlattr *actions;
+	int i, err;
+
+	actions = nla_nest_start(skb, NFL_ACTIONS);
+	if (!actions)
+		return -EMSGSIZE;
+
+	for (i = 0; acts[i]; i++) {
+		struct nlattr *action = nla_nest_start(skb, NFL_ACTION);
+
+		if (!action)
+			goto action_put_failure;
+
+		err = net_flow_put_action(skb, acts[i]);
+		if (err)
+			goto action_put_failure;
+		nla_nest_end(skb, action);
+	}
+	nla_nest_end(skb, actions);
+
+	return 0;
+action_put_failure:
+	nla_nest_cancel(skb, actions);
+	return -EMSGSIZE;
+}
+
+static struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
+						  struct net_device *dev,
+						  u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_actions(skb, a);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_actions(struct sk_buff *skb,
+				    struct genl_info *info)
+{
+	struct net_flow_action **a;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_actions) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	a = dev->netdev_ops->ndo_flow_get_actions(dev);
+	if (!a) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_actions_msg(a, dev,
+					 info->snd_portid,
+					 info->snd_seq,
+					 NFL_TABLE_CMD_GET_ACTIONS);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_field_ref(struct sk_buff *skb,
+				  struct net_flow_field_ref *ref)
+{
+	if (nla_put_u32(skb, NFL_FIELD_REF_INSTANCE, ref->instance) ||
+	    nla_put_u32(skb, NFL_FIELD_REF_HEADER, ref->header) ||
+	    nla_put_u32(skb, NFL_FIELD_REF_FIELD, ref->field) ||
+	    nla_put_u32(skb, NFL_FIELD_REF_MASK_TYPE, ref->mask_type) ||
+	    nla_put_u32(skb, NFL_FIELD_REF_TYPE, ref->type))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int net_flow_put_field_value(struct sk_buff *skb,
+				    struct net_flow_field_ref *r)
+{
+	int err = -EINVAL;
+
+	switch (r->type) {
+	case NFL_FIELD_REF_ATTR_TYPE_UNSPEC:
+		err = 0;
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U8:
+		err = nla_put_u8(skb, NFL_FIELD_REF_VALUE, r->value_u8);
+		if (err)
+			break;
+		err = nla_put_u8(skb, NFL_FIELD_REF_MASK, r->mask_u8);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U16:
+		err = nla_put_u16(skb, NFL_FIELD_REF_VALUE, r->value_u16);
+		if (err)
+			break;
+		err = nla_put_u16(skb, NFL_FIELD_REF_MASK, r->mask_u16);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U32:
+		err = nla_put_u32(skb, NFL_FIELD_REF_VALUE, r->value_u32);
+		if (err)
+			break;
+		err = nla_put_u32(skb, NFL_FIELD_REF_MASK, r->mask_u32);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U64:
+		err = nla_put_u64(skb, NFL_FIELD_REF_VALUE, r->value_u64);
+		if (err)
+			break;
+		err = nla_put_u64(skb, NFL_FIELD_REF_MASK, r->mask_u64);
+		break;
+	default:
+		break;
+	}
+	return err;
+}
+
+static int net_flow_put_table(struct net_device *dev,
+			      struct sk_buff *skb,
+			      struct net_flow_tbl *t)
+{
+	struct nlattr *matches, *actions, *field;
+	int i, err;
+
+	if (nla_put_string(skb, NFL_TABLE_ATTR_NAME, t->name) ||
+	    nla_put_u32(skb, NFL_TABLE_ATTR_UID, t->uid) ||
+	    nla_put_u32(skb, NFL_TABLE_ATTR_SOURCE, t->source) ||
+	    nla_put_u32(skb, NFL_TABLE_ATTR_APPLY, t->apply_action) ||
+	    nla_put_u32(skb, NFL_TABLE_ATTR_SIZE, t->size))
+		return -EMSGSIZE;
+
+	matches = nla_nest_start(skb, NFL_TABLE_ATTR_MATCHES);
+	if (!matches)
+		return -EMSGSIZE;
+
+	for (i = 0; t->matches[i].instance; i++) {
+		field = nla_nest_start(skb, NFL_FIELD_REF);
+
+		err = net_flow_put_field_ref(skb, &t->matches[i]);
+		if (err) {
+			nla_nest_cancel(skb, matches);
+			return -EMSGSIZE;
+		}
+
+		nla_nest_end(skb, field);
+	}
+	nla_nest_end(skb, matches);
+
+	actions = nla_nest_start(skb, NFL_TABLE_ATTR_ACTIONS);
+	if (!actions)
+		return -EMSGSIZE;
+
+	for (i = 0; t->actions[i]; i++) {
+		if (nla_put_u32(skb,
+				NFL_ACTION_ATTR_UID,
+				t->actions[i])) {
+			nla_nest_cancel(skb, actions);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, actions);
+
+	return 0;
+}
+
+static int net_flow_put_tables(struct net_device *dev,
+			       struct sk_buff *skb,
+			       struct net_flow_tbl **tables)
+{
+	struct nlattr *nest, *t;
+	int i, err = 0;
+
+	nest = nla_nest_start(skb, NFL_TABLES);
+	if (!nest)
+		return -EMSGSIZE;
+
+	for (i = 0; tables[i]; i++) {
+		t = nla_nest_start(skb, NFL_TABLE);
+		if (!t) {
+			err = -EMSGSIZE;
+			goto errout;
+		}
+
+		err = net_flow_put_table(dev, skb, tables[i]);
+		if (err) {
+			nla_nest_cancel(skb, t);
+			goto errout;
+		}
+		nla_nest_end(skb, t);
+	}
+	nla_nest_end(skb, nest);
+	return 0;
+errout:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static struct sk_buff *net_flow_build_tables_msg(struct net_flow_tbl **t,
+						 struct net_device *dev,
+						 u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_tables(dev, skb, t);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_tables(struct sk_buff *skb,
+				   struct genl_info *info)
+{
+	struct net_flow_tbl **tables;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_tbls) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	tables = dev->netdev_ops->ndo_flow_get_tbls(dev);
+	if (!tables) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_tables_msg(tables, dev,
+					info->snd_portid,
+					info->snd_seq,
+					NFL_TABLE_CMD_GET_TABLES);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static
+int net_flow_put_fields(struct sk_buff *skb, const struct net_flow_hdr *h)
+{
+	struct net_flow_field *f;
+	int count = h->field_sz;
+	struct nlattr *field;
+
+	for (f = h->fields; count; count--, f++) {
+		field = nla_nest_start(skb, NFL_FIELD);
+		if (!field)
+			goto field_put_failure;
+
+		if (nla_put_string(skb, NFL_FIELD_ATTR_NAME, f->name) ||
+		    nla_put_u32(skb, NFL_FIELD_ATTR_UID, f->uid) ||
+		    nla_put_u32(skb, NFL_FIELD_ATTR_BITWIDTH, f->bitwidth))
+			goto out;
+
+		nla_nest_end(skb, field);
+	}
+
+	return 0;
+out:
+	nla_nest_cancel(skb, field);
+field_put_failure:
+	return -EMSGSIZE;
+}
+
+static int net_flow_put_headers(struct sk_buff *skb,
+				struct net_flow_hdr **headers)
+{
+	struct nlattr *nest, *hdr, *fields;
+	struct net_flow_hdr *h;
+	int i, err;
+
+	nest = nla_nest_start(skb, NFL_HEADERS);
+	if (!nest)
+		return -EMSGSIZE;
+
+	for (i = 0; headers[i]; i++) {
+		err = -EMSGSIZE;
+		h = headers[i];
+
+		hdr = nla_nest_start(skb, NFL_HEADER);
+		if (!hdr)
+			goto put_failure;
+
+		if (nla_put_string(skb, NFL_HEADER_ATTR_NAME, h->name) ||
+		    nla_put_u32(skb, NFL_HEADER_ATTR_UID, h->uid))
+			goto put_failure;
+
+		fields = nla_nest_start(skb, NFL_HEADER_ATTR_FIELDS);
+		if (!fields)
+			goto put_failure;
+
+		err = net_flow_put_fields(skb, h);
+		if (err)
+			goto put_failure;
+
+		nla_nest_end(skb, fields);
+
+		nla_nest_end(skb, hdr);
+	}
+	nla_nest_end(skb, nest);
+
+	return 0;
+put_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static struct sk_buff *net_flow_build_headers_msg(struct net_flow_hdr **h,
+						  struct net_device *dev,
+						  u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_headers(skb, h);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_headers(struct sk_buff *skb,
+				    struct genl_info *info)
+{
+	struct net_flow_hdr **h;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_hdrs) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	h = dev->netdev_ops->ndo_flow_get_hdrs(dev);
+	if (!h) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_headers_msg(h, dev,
+					 info->snd_portid,
+					 info->snd_seq,
+					 NFL_TABLE_CMD_GET_HEADERS);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_header_node(struct sk_buff *skb,
+				    struct net_flow_hdr_node *node)
+{
+	struct nlattr *hdrs, *jumps;
+	int i, err;
+
+	if (nla_put_string(skb, NFL_HEADER_NODE_NAME, node->name) ||
+	    nla_put_u32(skb, NFL_HEADER_NODE_UID, node->uid))
+		return -EMSGSIZE;
+
+	/* Insert the set of headers that get extracted at this node */
+	hdrs = nla_nest_start(skb, NFL_HEADER_NODE_HDRS);
+	if (!hdrs)
+		return -EMSGSIZE;
+	for (i = 0; node->hdrs[i]; i++) {
+		if (nla_put_u32(skb, NFL_HEADER_NODE_HDRS_VALUE,
+				node->hdrs[i])) {
+			nla_nest_cancel(skb, hdrs);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, hdrs);
+
+	/* Then give the jump table to find next header node in graph */
+	jumps = nla_nest_start(skb, NFL_HEADER_NODE_JUMP);
+	if (!jumps)
+		return -EMSGSIZE;
+
+	for (i = 0; node->jump[i].node; i++) {
+		struct nlattr *entry;
+
+		entry = nla_nest_start(skb, NFL_JUMP_ENTRY);
+		if (!entry) {
+			nla_nest_cancel(skb, jumps);
+			return -EMSGSIZE;
+		}
+
+		err = nla_put_u32(skb, NFL_FIELD_REF_NEXT_NODE,
+				  node->jump[i].node);
+		if (err) {
+			nla_nest_cancel(skb, jumps);
+			return err;
+		}
+
+		err = net_flow_put_field_ref(skb, &node->jump[i].field);
+		if (err) {
+			nla_nest_cancel(skb, jumps);
+			return err;
+		}
+
+		err = net_flow_put_field_value(skb, &node->jump[i].field);
+		if (err) {
+			nla_nest_cancel(skb, jumps);
+			return err;
+		}
+		nla_nest_end(skb, entry);
+	}
+	nla_nest_end(skb, jumps);
+
+	return 0;
+}
+
+static int net_flow_put_header_graph(struct sk_buff *skb,
+				     struct net_flow_hdr_node **g)
+{
+	struct nlattr *nodes, *node;
+	int i, err;
+
+	nodes = nla_nest_start(skb, NFL_HEADER_GRAPH);
+	if (!nodes)
+		return -EMSGSIZE;
+
+	for (i = 0; g[i]; i++) {
+		node = nla_nest_start(skb, NFL_HEADER_GRAPH_NODE);
+		if (!node) {
+			err = -EMSGSIZE;
+			goto nodes_put_error;
+		}
+
+		err = net_flow_put_header_node(skb, g[i]);
+		if (err)
+			goto nodes_put_error;
+
+		nla_nest_end(skb, node);
+	}
+
+	nla_nest_end(skb, nodes);
+	return 0;
+nodes_put_error:
+	nla_nest_cancel(skb, nodes);
+	return err;
+}
+
+static
+struct sk_buff *net_flow_build_header_graph_msg(struct net_flow_hdr_node **g,
+						struct net_device *dev,
+						u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_header_graph(skb, g);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_header_graph(struct sk_buff *skb,
+					 struct genl_info *info)
+{
+	struct net_flow_hdr_node **h;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_hdr_graph) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	h = dev->netdev_ops->ndo_flow_get_hdr_graph(dev);
+	if (!h) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_header_graph_msg(h, dev,
+					      info->snd_portid,
+					      info->snd_seq,
+					      NFL_TABLE_CMD_GET_HDR_GRAPH);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_table_node(struct sk_buff *skb,
+				   struct net_flow_tbl_node *node)
+{
+	struct nlattr *nest, *jump;
+	int i, err = -EMSGSIZE;
+
+	nest = nla_nest_start(skb, NFL_TABLE_GRAPH_NODE);
+	if (!nest)
+		return err;
+
+	if (nla_put_u32(skb, NFL_TABLE_GRAPH_NODE_UID, node->uid) ||
+	    nla_put_u32(skb, NFL_TABLE_GRAPH_NODE_FLAGS, node->flags))
+		goto node_put_failure;
+
+	jump = nla_nest_start(skb, NFL_TABLE_GRAPH_NODE_JUMP);
+	if (!jump)
+		goto node_put_failure;
+
+	for (i = 0; node->jump[i].node; i++) {
+		struct nlattr *entry;
+
+		entry = nla_nest_start(skb, NFL_JUMP_ENTRY);
+		if (!entry)
+			goto node_put_failure;
+
+		err = nla_put_u32(skb, NFL_FIELD_REF_NEXT_NODE,
+				  node->jump[i].node);
+		if (err) {
+			nla_nest_cancel(skb, jump);
+			return err;
+		}
+
+		err = net_flow_put_field_ref(skb, &node->jump[i].field);
+		if (err)
+			goto node_put_failure;
+
+		err = net_flow_put_field_value(skb, &node->jump[i].field);
+		if (err)
+			goto node_put_failure;
+
+		nla_nest_end(skb, entry);
+	}
+
+	nla_nest_end(skb, jump);
+	nla_nest_end(skb, nest);
+	return 0;
+node_put_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static int net_flow_put_table_graph(struct sk_buff *skb,
+				    struct net_flow_tbl_node **nodes)
+{
+	struct nlattr *graph;
+	int i, err;
+
+	graph = nla_nest_start(skb, NFL_TABLE_GRAPH);
+	if (!graph)
+		return -EMSGSIZE;
+
+	for (i = 0; nodes[i]; i++) {
+		err = net_flow_put_table_node(skb, nodes[i]);
+		if (err) {
+			nla_nest_cancel(skb, graph);
+			return -EMSGSIZE;
+		}
+	}
+
+	nla_nest_end(skb, graph);
+	return 0;
+}
+
+static
+struct sk_buff *net_flow_build_graph_msg(struct net_flow_tbl_node **g,
+					 struct net_device *dev,
+					 u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_table_graph(skb, g);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct net_flow_tbl_node **g;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_tbl_graph) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	g = dev->netdev_ops->ndo_flow_get_tbl_graph(dev);
+	if (!g) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_graph_msg(g, dev,
+				       info->snd_portid,
+				       info->snd_seq,
+				       NFL_TABLE_CMD_GET_TABLE_GRAPH);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static const struct nla_policy net_flow_cmd_policy[NFL_MAX + 1] = {
+	[NFL_IDENTIFIER_TYPE]	= {.type = NLA_U32, },
+	[NFL_IDENTIFIER]	= {.type = NLA_U32, },
+	[NFL_TABLES]		= {.type = NLA_NESTED, },
+	[NFL_HEADERS]		= {.type = NLA_NESTED, },
+	[NFL_ACTIONS]		= {.type = NLA_NESTED, },
+	[NFL_HEADER_GRAPH]	= {.type = NLA_NESTED, },
+	[NFL_TABLE_GRAPH]	= {.type = NLA_NESTED, },
+};
+
+static const struct genl_ops net_flow_table_nl_ops[] = {
+	{
+		.cmd = NFL_TABLE_CMD_GET_TABLES,
+		.doit = net_flow_cmd_get_tables,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_GET_HEADERS,
+		.doit = net_flow_cmd_get_headers,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_GET_ACTIONS,
+		.doit = net_flow_cmd_get_actions,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_GET_HDR_GRAPH,
+		.doit = net_flow_cmd_get_header_graph,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_GET_TABLE_GRAPH,
+		.doit = net_flow_cmd_get_table_graph,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static int __init net_flow_nl_module_init(void)
+{
+	return genl_register_family_with_ops(&net_flow_nl_family,
+					     net_flow_table_nl_ops);
+}
+
+static void net_flow_nl_module_fini(void)
+{
+	genl_unregister_family(&net_flow_nl_family);
+}
+
+module_init(net_flow_nl_module_init);
+module_exit(net_flow_nl_module_fini);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("John Fastabend <john.r.fastabend@intel.com>");
+MODULE_DESCRIPTION("Netlink interface to Flow Tables (Net Flow Netlink)");
+MODULE_ALIAS_GENL_FAMILY(NFL_GENL_NAME);


* [net-next PATCH v3 02/12] net: flow_table: add rule, delete rule
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
  2015-01-20 20:26 ` [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables John Fastabend
@ 2015-01-20 20:27 ` John Fastabend
  2015-01-20 20:27 ` [net-next PATCH v3 03/12] net: flow: implement flow cache for get routines John Fastabend
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:27 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Now that the device capabilities are exposed we can add support to
add and delete rules from the tables. Rules are a set of matches
to run on packets and a set of actions to perform when the match
occurs.

The two operations are

set_rule :

  The set rule operation programs a set of rules into a hardware
  device table. The request arrives as a netlink-encoded message,
  which is decoded into a null-terminated array of rule entry
  structures. A rule entry structure is defined as

     struct net_flow_rule {
			  __u32 table_id;
			  __u32 uid;
			  __u32 priority;
			  struct net_flow_field_ref *matches;
			  struct net_flow_action *actions;
     }

  The table id is the _uid_ returned by the 'get_tables' operation.
  Matches is a set of match criteria; a logical AND is applied
  across the set, so a packet must satisfy every criterion to match.
  Actions provide the set of actions to perform when the rule is
  hit. Both matches and actions are null-terminated arrays.

  The rules are configured in hardware using an ndo op. We do not
  provide a commit operation at the moment and expect the hardware
  to commit rules one at a time. Future work may add a commit
  operation to tell the hardware we are done loading rules; on some
  hardware this will help with bulk updates.

  It is possible for hardware to return an error from a set rule
  operation. This can occur for many reasons, both transient
  conditions and resource constraints. Several error handling
  strategies are built in, listed here:
    *_ERROR_ABORT      abort on first error with errmsg

    *_ERROR_CONTINUE   continue programming rules, no errmsg

    *_ERROR_ABORT_LOG  abort on first error and return rule that
 		       failed to user space in reply msg

    *_ERROR_CONT_LOG   continue programming rules and return a list
		       of rules that failed to user space in a reply
		       msg.

  Notably missing is a rollback strategy. I don't have a use case
  for it in software yet, but one could be added later as
  *_ERROR_ROLLBACK, for example.

del_rule :

  The delete rule operation uses the same structures and error
  handling strategies as the set_rule operation, although delete
  messages omit the matches/actions arrays because they are not
  needed to look up the flow.

Also thanks to Simon Horman for fixes and other help.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h      |   21 +
 include/linux/netdevice.h    |   10 +
 include/uapi/linux/if_flow.h |   51 +++
 net/core/flow_table.c        |  760 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 841 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
index 7ce1e1d..712b54f 100644
--- a/include/linux/if_flow.h
+++ b/include/linux/if_flow.h
@@ -185,4 +185,23 @@ struct net_flow_tbl_node {
 	__u32 flags;
 	struct net_flow_jump_table *jump;
 };
-#endif
+
+/**
+ * @struct net_flow_rule
+ * @brief describes the match/action entry
+ * @table_id identifies the table the rule is installed in
+ * @uid unique identifier for flow
+ * @priority priority to execute flow match/action in table
+ * @matches null terminated set of match criteria
+ * @actions null terminated set of actions to apply on match
+ *
+ * Flows must match all entries in match set.
+ */
+struct net_flow_rule {
+	__u32 table_id;
+	__u32 uid;
+	__u32 priority;
+	struct net_flow_field_ref *matches;
+	struct net_flow_action *actions;
+};
+#endif /* _IF_FLOW_H_ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74481b9..9d57f8b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1061,6 +1061,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	Report a null terminated list of nodes defining the header graph. This
  *	provides the necessary graph to learn the ordering of headers supported
  *	by the device.
+ *
+ * int (*ndo_flow_set_rule)(struct net_device *dev, struct net_flow_rule *f)
+ *	This is used to program a rule into a device table.
+ *
+ * int (*ndo_flow_del_rule)(struct net_device *dev, struct net_flow_rule *f)
+ *	This is used to remove a rule from a device table.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1227,6 +1233,10 @@ struct net_device_ops {
 	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
 	struct net_flow_hdr	 **(*ndo_flow_get_hdrs)(struct net_device *dev);
 	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
+	int		        (*ndo_flow_set_rule)(struct net_device *dev,
+						     struct net_flow_rule *f);
+	int		        (*ndo_flow_del_rule)(struct net_device *dev,
+						     struct net_flow_rule *f);
 #endif
 };
 
diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
index 3314aa2..989d019 100644
--- a/include/uapi/linux/if_flow.h
+++ b/include/uapi/linux/if_flow.h
@@ -149,6 +149,12 @@ enum {
 
 #define NFL_FIELD_MAX (__NFL_FIELD_MAX - 1)
 
+/* Max length supported for kernel name strings; only the first n
+ * characters will be used by the kernel API. This is to prevent
+ * arbitrarily long strings being passed from user space.
+ */
+#define NFL_MAX_NAME 80
+
 enum {
 	NFL_FIELD_ATTR_UNSPEC,
 	NFL_FIELD_ATTR_NAME,
@@ -354,6 +360,44 @@ enum {
 #define NFL_NFL_MAX (__NFL_NFL_MAX - 1)
 
 enum {
+	NFL_TABLE_FLOWS_UNSPEC,
+	NFL_TABLE_FLOWS_TABLE,
+	NFL_TABLE_FLOWS_MINPRIO,
+	NFL_TABLE_FLOWS_MAXPRIO,
+	NFL_TABLE_FLOWS_FLOWS,
+	__NFL_TABLE_FLOWS_MAX,
+};
+
+#define NFL_TABLE_FLOWS_MAX (__NFL_TABLE_FLOWS_MAX - 1)
+
+enum {
+	/* Abort with normal errmsg */
+	NFL_FLOWS_ERROR_ABORT,
+	/* Ignore errors and continue without logging */
+	NFL_FLOWS_ERROR_CONTINUE,
+	/* Abort and reply with invalid flow fields */
+	NFL_FLOWS_ERROR_ABORT_LOG,
+	/* Continue and reply with list of invalid flows */
+	NFL_FLOWS_ERROR_CONT_LOG,
+	__NFLS_FLOWS_ERROR_MAX,
+};
+
+#define NFLS_FLOWS_ERROR_MAX (__NFLS_FLOWS_ERROR_MAX - 1)
+
+enum {
+	NFL_ATTR_UNSPEC,
+	NFL_ATTR_ERROR,
+	NFL_ATTR_TABLE,
+	NFL_ATTR_UID,
+	NFL_ATTR_PRIORITY,
+	NFL_ATTR_MATCHES,
+	NFL_ATTR_ACTIONS,
+	__NFL_ATTR_MAX,
+};
+
+#define NFL_ATTR_MAX (__NFL_ATTR_MAX - 1)
+
+enum {
 	NFL_IDENTIFIER_UNSPEC,
 	NFL_IDENTIFIER_IFINDEX, /* net_device ifindex */
 };
@@ -369,6 +413,9 @@ enum {
 	NFL_HEADER_GRAPH,
 	NFL_TABLE_GRAPH,
 
+	NFL_FLOWS,
+	NFL_FLOWS_ERROR,
+
 	__NFL_MAX,
 	NFL_MAX = (__NFL_MAX - 1),
 };
@@ -380,6 +427,10 @@ enum {
 	NFL_TABLE_CMD_GET_HDR_GRAPH,
 	NFL_TABLE_CMD_GET_TABLE_GRAPH,
 
+	NFL_TABLE_CMD_GET_FLOWS,
+	NFL_TABLE_CMD_SET_FLOWS,
+	NFL_TABLE_CMD_DEL_FLOWS,
+
 	__NFL_CMD_MAX,
 	NFL_CMD_MAX = (__NFL_CMD_MAX - 1),
 };
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index f994acb..7b85e53 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -27,6 +27,18 @@
 #include <net/rtnetlink.h>
 #include <linux/module.h>
 
+static DEFINE_MUTEX(net_flow_mutex);
+
+void net_flow_lock(void)
+{
+	mutex_lock(&net_flow_mutex);
+}
+
+void net_flow_unlock(void)
+{
+	mutex_unlock(&net_flow_mutex);
+}
+
 static struct genl_family net_flow_nl_family = {
 	.id		= GENL_ID_GENERATE,
 	.name		= NFL_GENL_NAME,
@@ -78,6 +90,34 @@ static int net_flow_put_act_types(struct sk_buff *skb,
 		if (err)
 			goto out;
 
+		switch (args[i].type) {
+		case NFL_ACTION_ARG_TYPE_NULL:
+			err = 0;
+			break;
+		case NFL_ACTION_ARG_TYPE_U8:
+			err = nla_put_u8(skb, NFL_ACTION_ARG_VALUE,
+					 args[i].value_u8);
+			break;
+		case NFL_ACTION_ARG_TYPE_U16:
+			err = nla_put_u16(skb, NFL_ACTION_ARG_VALUE,
+					  args[i].value_u16);
+			break;
+		case NFL_ACTION_ARG_TYPE_U32:
+			err = nla_put_u32(skb, NFL_ACTION_ARG_VALUE,
+					  args[i].value_u32);
+			break;
+		case NFL_ACTION_ARG_TYPE_U64:
+			err = nla_put_u64(skb, NFL_ACTION_ARG_VALUE,
+					  args[i].value_u64);
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+
+		if (err)
+			goto out;
+
 		nla_nest_end(skb, arg);
 	}
 	return 0;
@@ -879,6 +919,708 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
 	return genlmsg_reply(msg, info);
 }
 
+static int net_flow_put_flow_action(struct sk_buff *skb,
+				    struct net_flow_action *a)
+{
+	struct nlattr *action, *sigs;
+	int err = 0;
+
+	action = nla_nest_start(skb, NFL_ACTION);
+	if (!action)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NFL_ACTION_ATTR_UID, a->uid))
+		return -EMSGSIZE;
+
+	if (!a->args)
+		goto done;
+
+	sigs = nla_nest_start(skb, NFL_ACTION_ATTR_SIGNATURE);
+	if (!sigs) {
+		nla_nest_cancel(skb, action);
+		return -EMSGSIZE;
+	}
+
+	err = net_flow_put_act_types(skb, a->args);
+	if (err) {
+		nla_nest_cancel(skb, action);
+		return err;
+	}
+	nla_nest_end(skb, sigs);
+done:
+	nla_nest_end(skb, action);
+	return 0;
+}
+
+static int net_flow_put_rule(struct sk_buff *skb, struct net_flow_rule *rule)
+{
+	struct nlattr *flows, *actions, *matches;
+	int j, i = 0;
+	int err = -EMSGSIZE;
+
+	flows = nla_nest_start(skb, NFL_FLOW);
+	if (!flows)
+		goto put_failure;
+
+	if (nla_put_u32(skb, NFL_ATTR_TABLE, rule->table_id) ||
+	    nla_put_u32(skb, NFL_ATTR_UID, rule->uid) ||
+	    nla_put_u32(skb, NFL_ATTR_PRIORITY, rule->priority))
+		goto flows_put_failure;
+
+	if (rule->matches) {
+		matches = nla_nest_start(skb, NFL_ATTR_MATCHES);
+		if (!matches)
+			goto flows_put_failure;
+
+		for (j = 0; rule->matches && rule->matches[j].header; j++) {
+			struct net_flow_field_ref *f = &rule->matches[j];
+			struct nlattr *field;
+
+			field = nla_nest_start(skb, NFL_FIELD_REF);
+			if (!field) {
+				err = -EMSGSIZE;
+				goto flows_put_failure;
+			}
+
+			err = net_flow_put_field_ref(skb, f);
+			if (err)
+				goto flows_put_failure;
+
+			err = net_flow_put_field_value(skb, f);
+			if (err)
+				goto flows_put_failure;
+
+			nla_nest_end(skb, field);
+		}
+		nla_nest_end(skb, matches);
+	}
+
+	if (rule->actions) {
+		actions = nla_nest_start(skb, NFL_ATTR_ACTIONS);
+		if (!actions)
+			goto flows_put_failure;
+
+		for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+			err = net_flow_put_flow_action(skb, &rule->actions[i]);
+			if (err)
+				goto flows_put_failure;
+		}
+		nla_nest_end(skb, actions);
+	}
+
+	nla_nest_end(skb, flows);
+	return 0;
+
+flows_put_failure:
+	nla_nest_cancel(skb, flows);
+put_failure:
+	return err;
+}
+
+static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
+						u32 portid, int seq, u8 cmd,
+						int min, int max, int table)
+{
+	struct genlmsghdr *hdr;
+	struct nlattr *flows;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	flows = nla_nest_start(skb, NFL_FLOWS);
+	if (!flows) {
+		err = -EMSGSIZE;
+		goto out;
+	}
+
+	err = -EOPNOTSUPP;
+	if (err < 0)
+		goto out_cancel;
+
+	nla_nest_end(skb, flows);
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out_cancel:
+	nla_nest_cancel(skb, flows);
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static const
+struct nla_policy net_flow_table_flows_policy[NFL_TABLE_FLOWS_MAX + 1] = {
+	[NFL_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
+	[NFL_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
+	[NFL_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
+	[NFL_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
+};
+
+static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct nlattr *tb[NFL_TABLE_FLOWS_MAX+1];
+	int table, min = -1, max = -1;
+	struct net_device *dev;
+	struct sk_buff *msg;
+	int err = -EINVAL;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!info->attrs[NFL_IDENTIFIER_TYPE] ||
+	    !info->attrs[NFL_IDENTIFIER] ||
+	    !info->attrs[NFL_FLOWS])
+		goto out;
+
+	err = nla_parse_nested(tb, NFL_TABLE_FLOWS_MAX,
+			       info->attrs[NFL_FLOWS],
+			       net_flow_table_flows_policy);
+	if (err)
+		goto out;
+
+	if (!tb[NFL_TABLE_FLOWS_TABLE])
+		goto out;
+
+	table = nla_get_u32(tb[NFL_TABLE_FLOWS_TABLE]);
+
+	if (tb[NFL_TABLE_FLOWS_MINPRIO])
+		min = nla_get_u32(tb[NFL_TABLE_FLOWS_MINPRIO]);
+	if (tb[NFL_TABLE_FLOWS_MAXPRIO])
+		max = nla_get_u32(tb[NFL_TABLE_FLOWS_MAXPRIO]);
+
+	msg = net_flow_build_flows_msg(dev,
+				       info->snd_portid,
+				       info->snd_seq,
+				       NFL_TABLE_CMD_GET_FLOWS,
+				       min, max, table);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+out:
+	dev_put(dev);
+	return err;
+}
+
+static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
+					     struct genlmsghdr **hdr,
+					     u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *h;
+	struct sk_buff *skb;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-EMSGSIZE);
+
+	h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!h) {
+		nlmsg_free(skb);
+		return ERR_PTR(-EMSGSIZE);
+	}
+
+	if (nla_put_u32(skb,
+			NFL_IDENTIFIER_TYPE,
+			NFL_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NFL_IDENTIFIER, dev->ifindex)) {
+		nlmsg_free(skb);
+		return ERR_PTR(-EMSGSIZE);
+	}
+
+	*hdr = h;
+	return skb;
+}
+
+static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
+						struct genlmsghdr *hdr)
+{
+	int err;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0) {
+		nlmsg_free(skb);
+		return ERR_PTR(err);
+	}
+
+	return skb;
+}
+
+const struct nla_policy net_flow_field_policy[NFL_FIELD_REF_MAX + 1] = {
+	[NFL_FIELD_REF_NEXT_NODE] = { .type = NLA_U32,},
+	[NFL_FIELD_REF_INSTANCE]  = { .type = NLA_U32,},
+	[NFL_FIELD_REF_HEADER]	  = { .type = NLA_U32,},
+	[NFL_FIELD_REF_FIELD]	  = { .type = NLA_U32,},
+	[NFL_FIELD_REF_MASK_TYPE] = { .type = NLA_U32,},
+	[NFL_FIELD_REF_TYPE]	  = { .type = NLA_U32,},
+	[NFL_FIELD_REF_VALUE]	  = { .type = NLA_BINARY,
+				      .len = sizeof(u64)},
+	[NFL_FIELD_REF_MASK]	  = { .type = NLA_BINARY,
+				      .len = sizeof(u64)},
+};
+
+static int net_flow_get_field(struct net_flow_field_ref *field,
+			      struct nlattr *nla)
+{
+	struct nlattr *ref[NFL_FIELD_REF_MAX+1];
+	int err;
+
+	err = nla_parse_nested(ref, NFL_FIELD_REF_MAX,
+			       nla, net_flow_field_policy);
+	if (err)
+		return err;
+
+	if (!ref[NFL_FIELD_REF_INSTANCE] ||
+	    !ref[NFL_FIELD_REF_HEADER] ||
+	    !ref[NFL_FIELD_REF_FIELD] ||
+	    !ref[NFL_FIELD_REF_MASK_TYPE] ||
+	    !ref[NFL_FIELD_REF_TYPE])
+		return -EINVAL;
+
+	field->instance = nla_get_u32(ref[NFL_FIELD_REF_INSTANCE]);
+	field->header = nla_get_u32(ref[NFL_FIELD_REF_HEADER]);
+	field->field = nla_get_u32(ref[NFL_FIELD_REF_FIELD]);
+	field->mask_type = nla_get_u32(ref[NFL_FIELD_REF_MASK_TYPE]);
+	field->type = nla_get_u32(ref[NFL_FIELD_REF_TYPE]);
+
+	if (!ref[NFL_FIELD_REF_VALUE])
+		return 0;
+
+	switch (field->type) {
+	case NFL_FIELD_REF_ATTR_TYPE_U8:
+		if (nla_len(ref[NFL_FIELD_REF_VALUE]) < sizeof(u8)) {
+			err = -EINVAL;
+			break;
+		}
+		field->value_u8 = nla_get_u8(ref[NFL_FIELD_REF_VALUE]);
+
+		if (!ref[NFL_FIELD_REF_MASK])
+			break;
+
+		if (nla_len(ref[NFL_FIELD_REF_MASK]) < sizeof(u8)) {
+			err = -EINVAL;
+			break;
+		}
+		field->mask_u8 = nla_get_u8(ref[NFL_FIELD_REF_MASK]);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U16:
+		if (nla_len(ref[NFL_FIELD_REF_VALUE]) < sizeof(u16)) {
+			err = -EINVAL;
+			break;
+		}
+		field->value_u16 = nla_get_u16(ref[NFL_FIELD_REF_VALUE]);
+
+		if (!ref[NFL_FIELD_REF_MASK])
+			break;
+
+		if (nla_len(ref[NFL_FIELD_REF_MASK]) < sizeof(u16)) {
+			err = -EINVAL;
+			break;
+		}
+		field->mask_u16 = nla_get_u16(ref[NFL_FIELD_REF_MASK]);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U32:
+		if (nla_len(ref[NFL_FIELD_REF_VALUE]) < sizeof(u32)) {
+			err = -EINVAL;
+			break;
+		}
+		field->value_u32 = nla_get_u32(ref[NFL_FIELD_REF_VALUE]);
+
+		if (!ref[NFL_FIELD_REF_MASK])
+			break;
+
+		if (nla_len(ref[NFL_FIELD_REF_MASK]) < sizeof(u32)) {
+			err = -EINVAL;
+			break;
+		}
+		field->mask_u32 = nla_get_u32(ref[NFL_FIELD_REF_MASK]);
+		break;
+	case NFL_FIELD_REF_ATTR_TYPE_U64:
+		if (nla_len(ref[NFL_FIELD_REF_VALUE]) < sizeof(u64)) {
+			err = -EINVAL;
+			break;
+		}
+		field->value_u64 = nla_get_u64(ref[NFL_FIELD_REF_VALUE]);
+
+		if (!ref[NFL_FIELD_REF_MASK])
+			break;
+
+		if (nla_len(ref[NFL_FIELD_REF_MASK]) < sizeof(u64)) {
+			err = -EINVAL;
+			break;
+		}
+		field->mask_u64 = nla_get_u64(ref[NFL_FIELD_REF_MASK]);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+	return err;
+}
+
+static void net_flow_free_actions(struct net_flow_action *actions)
+{
+	int i;
+
+	if (!actions)
+		return;
+
+	for (i = 0; actions[i].args; i++) {
+		kfree(actions[i].args->name);
+		kfree(actions[i].args);
+	}
+	kfree(actions);
+}
+
+static void net_flow_rule_free(struct net_flow_rule *rule)
+{
+	if (!rule)
+		return;
+
+	kfree(rule->matches);
+	net_flow_free_actions(rule->actions);
+	kfree(rule);
+}
+
+static const
+struct nla_policy net_flow_actarg_policy[NFL_ACTION_ARG_MAX + 1] = {
+	[NFL_ACTION_ARG_NAME]  = { .type = NLA_STRING },
+	[NFL_ACTION_ARG_TYPE]  = { .type = NLA_U32 },
+	[NFL_ACTION_ARG_VALUE] = { .type = NLA_BINARY, .len = sizeof(u64)},
+};
+
+static int net_flow_get_actarg(struct net_flow_action_arg *arg,
+			       struct nlattr *attr)
+{
+	struct nlattr *r[NFL_ACTION_ARG_MAX+1];
+	int err;
+
+	err = nla_parse_nested(r, NFL_ACTION_ARG_MAX,
+			       attr, net_flow_actarg_policy);
+	if (err)
+		return err;
+
+	if (!r[NFL_ACTION_ARG_TYPE] ||
+	    !r[NFL_ACTION_ARG_VALUE])
+		return -EINVAL;
+
+	arg->type = nla_get_u32(r[NFL_ACTION_ARG_TYPE]);
+	switch (arg->type) {
+	case NFL_ACTION_ARG_TYPE_U8:
+		if (nla_len(r[NFL_ACTION_ARG_VALUE]) < sizeof(u8))
+			return -EINVAL;
+		arg->value_u8 = nla_get_u8(r[NFL_ACTION_ARG_VALUE]);
+		break;
+	case NFL_ACTION_ARG_TYPE_U16:
+		if (nla_len(r[NFL_ACTION_ARG_VALUE]) < sizeof(u16))
+			return -EINVAL;
+		arg->value_u16 = nla_get_u16(r[NFL_ACTION_ARG_VALUE]);
+		break;
+	case NFL_ACTION_ARG_TYPE_U32:
+		if (nla_len(r[NFL_ACTION_ARG_VALUE]) < sizeof(u32))
+			return -EINVAL;
+		arg->value_u32 = nla_get_u32(r[NFL_ACTION_ARG_VALUE]);
+		break;
+	case NFL_ACTION_ARG_TYPE_U64:
+		if (nla_len(r[NFL_ACTION_ARG_VALUE]) < sizeof(u64))
+			return -EINVAL;
+		arg->value_u64 = nla_get_u64(r[NFL_ACTION_ARG_VALUE]);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (r[NFL_ACTION_ARG_NAME]) {
+		int max = nla_len(r[NFL_ACTION_ARG_NAME]);
+
+		if (max > NFL_MAX_NAME)
+			max = NFL_MAX_NAME;
+
+		arg->name = kzalloc(max, GFP_KERNEL);
+		if (!arg->name)
+			return -ENOMEM;
+		nla_strlcpy(arg->name, r[NFL_ACTION_ARG_NAME], max);
+	}
+
+	return 0;
+}
+
+static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
+{
+	struct nlattr *act[NFL_ACTION_ATTR_MAX+1];
+	struct nlattr *args;
+	int rem;
+	int err, count = 0;
+
+	if (nla_type(attr) != NFL_ACTION) {
+		pr_warn("%s: expected NFL_ACTION\n", __func__);
+		return 0;
+	}
+
+	err = nla_parse_nested(act, NFL_ACTION_ATTR_MAX,
+			       attr, net_flow_action_policy);
+	if (err < 0)
+		return err;
+
+	if (!act[NFL_ACTION_ATTR_UID])
+		return -EINVAL;
+
+	a->uid = nla_get_u32(act[NFL_ACTION_ATTR_UID]);
+
+	/* Only need to parse the signature if it is provided; otherwise
+	 * assume the action does not need any arguments.
+	 */
+	if (!act[NFL_ACTION_ATTR_SIGNATURE])
+		return 0;
+
+	nla_for_each_nested(args, act[NFL_ACTION_ATTR_SIGNATURE], rem)
+		count++;
+
+	a->args = kcalloc(count + 1,
+			  sizeof(struct net_flow_action_arg),
+			  GFP_KERNEL);
+	if (!a->args)
+		return -ENOMEM;
+
+	count = 0;
+
+	nla_for_each_nested(args, act[NFL_ACTION_ATTR_SIGNATURE], rem) {
+		if (nla_type(args) != NFL_ACTION_ARG)
+			continue;
+
+		err = net_flow_get_actarg(&a->args[count], args);
+		if (err) {
+			kfree(a->args);
+			a->args = NULL;
+			return err;
+		}
+		count++;
+	}
+	return 0;
+}
+
+static const
+struct nla_policy net_flow_rule_policy[NFL_ATTR_MAX + 1] = {
+	[NFL_ATTR_TABLE]	= { .type = NLA_U32 },
+	[NFL_ATTR_UID]		= { .type = NLA_U32 },
+	[NFL_ATTR_PRIORITY]	= { .type = NLA_U32 },
+	[NFL_ATTR_MATCHES]	= { .type = NLA_NESTED },
+	[NFL_ATTR_ACTIONS]	= { .type = NLA_NESTED },
+};
+
+static int net_flow_get_rule(struct net_flow_rule *rule, struct nlattr *attr)
+{
+	struct nlattr *f[NFL_ATTR_MAX+1];
+	struct nlattr *match, *act;
+	int rem, err;
+	int count = 0;
+
+	err = nla_parse_nested(f, NFL_ATTR_MAX,
+			       attr, net_flow_rule_policy);
+	if (err < 0)
+		return -EINVAL;
+
+	if (!f[NFL_ATTR_TABLE] || !f[NFL_ATTR_UID] ||
+	    !f[NFL_ATTR_PRIORITY])
+		return -EINVAL;
+
+	rule->table_id = nla_get_u32(f[NFL_ATTR_TABLE]);
+	rule->uid = nla_get_u32(f[NFL_ATTR_UID]);
+	rule->priority = nla_get_u32(f[NFL_ATTR_PRIORITY]);
+
+	rule->matches = NULL;
+	rule->actions = NULL;
+
+	if (f[NFL_ATTR_MATCHES]) {
+		nla_for_each_nested(match, f[NFL_ATTR_MATCHES], rem) {
+			if (nla_type(match) == NFL_FIELD_REF)
+				count++;
+		}
+
+		/* Null terminated list of matches */
+		rule->matches = kcalloc(count + 1,
+					sizeof(struct net_flow_field_ref),
+					GFP_KERNEL);
+		if (!rule->matches)
+			return -ENOMEM;
+
+		count = 0;
+		nla_for_each_nested(match, f[NFL_ATTR_MATCHES], rem) {
+			err = net_flow_get_field(&rule->matches[count], match);
+			if (err) {
+				kfree(rule->matches);
+				rule->matches = NULL;
+				return err;
+			}
+			count++;
+		}
+	}
+
+	if (f[NFL_ATTR_ACTIONS]) {
+		count = 0;
+		nla_for_each_nested(act, f[NFL_ATTR_ACTIONS], rem) {
+			if (nla_type(act) == NFL_ACTION)
+				count++;
+		}
+
+		/* Null terminated list of actions */
+		rule->actions = kcalloc(count + 1,
+					sizeof(struct net_flow_action),
+					GFP_KERNEL);
+		if (!rule->actions) {
+			kfree(rule->matches);
+			rule->matches = NULL;
+			return -ENOMEM;
+		}
+
+		count = 0;
+		nla_for_each_nested(act, f[NFL_ATTR_ACTIONS], rem) {
+			err = net_flow_get_action(&rule->actions[count], act);
+			if (err) {
+				kfree(rule->matches);
+				rule->matches = NULL;
+				net_flow_free_actions(rule->actions);
+				rule->actions = NULL;
+				return err;
+			}
+			count++;
+		}
+	}
+
+	return 0;
+}
+
+static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
+				    struct genl_info *info)
+{
+	int rem, err_handle = NFL_FLOWS_ERROR_ABORT;
+	struct net_flow_rule *this = NULL;
+	struct sk_buff *skb = NULL;
+	struct genlmsghdr *hdr;
+	struct net_device *dev;
+	struct nlattr *flow, *flows;
+	int cmd = info->genlhdr->cmd;
+	int err = -EOPNOTSUPP;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	switch (cmd) {
+	case NFL_TABLE_CMD_SET_FLOWS:
+		if (!dev->netdev_ops->ndo_flow_set_rule)
+			goto out;
+		break;
+	case NFL_TABLE_CMD_DEL_FLOWS:
+		if (!dev->netdev_ops->ndo_flow_del_rule)
+			goto out;
+		break;
+	default:
+		goto out;
+	}
+
+	if (!info->attrs[NFL_IDENTIFIER_TYPE] ||
+	    !info->attrs[NFL_IDENTIFIER] ||
+	    !info->attrs[NFL_FLOWS]) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (info->attrs[NFL_FLOWS_ERROR])
+		err_handle = nla_get_u32(info->attrs[NFL_FLOWS_ERROR]);
+
+	net_flow_lock();
+	nla_for_each_nested(flow, info->attrs[NFL_FLOWS], rem) {
+		if (nla_type(flow) != NFL_FLOW)
+			continue;
+
+		this = kzalloc(sizeof(*this), GFP_KERNEL);
+		if (!this) {
+			err = -ENOMEM;
+			goto skip;
+		}
+
+		/* If userspace passes messages so invalid that we cannot
+		 * even build correct flow structures, abort with an error
+		 * and do not try to proceed regardless of the requested
+		 * error-handling policy.
+		 */
+		err = net_flow_get_rule(this, flow);
+		if (err)
+			goto out_locked;
+
+		switch (cmd) {
+		case NFL_TABLE_CMD_SET_FLOWS:
+			err = dev->netdev_ops->ndo_flow_set_rule(dev, this);
+			break;
+		case NFL_TABLE_CMD_DEL_FLOWS:
+			err = dev->netdev_ops->ndo_flow_del_rule(dev, this);
+			break;
+		default:
+			err = -EOPNOTSUPP;
+			break;
+		}
+
+skip:
+		if (err && err_handle != NFL_FLOWS_ERROR_CONTINUE) {
+			if (!skb) {
+				skb = net_flow_start_errmsg(dev, &hdr,
+							    info->snd_portid,
+							    info->snd_seq,
+							    cmd);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					skb = NULL;
+					goto out_locked;
+				}
+
+				flows = nla_nest_start(skb, NFL_FLOWS);
+				if (!flows) {
+					err = -EMSGSIZE;
+					goto out_locked;
+				}
+			}
+
+			net_flow_put_rule(skb, this);
+		}
+
+		net_flow_rule_free(this);
+
+		if (err && err_handle == NFL_FLOWS_ERROR_ABORT)
+			goto out_locked;
+	}
+	net_flow_unlock();
+	dev_put(dev);
+
+	if (skb) {
+		nla_nest_end(skb, flows);
+		net_flow_end_flow_errmsg(skb, hdr);
+		return genlmsg_reply(skb, info);
+	}
+	return 0;
+
+out_locked:
+	net_flow_unlock();
+out:
+	net_flow_rule_free(this);
+	nlmsg_free(skb);
+	dev_put(dev);
+	return err;
+}
+
 static const struct nla_policy net_flow_cmd_policy[NFL_MAX + 1] = {
 	[NFL_IDENTIFIER_TYPE]	= {.type = NLA_U32, },
 	[NFL_IDENTIFIER]	= {.type = NLA_U32, },
@@ -920,6 +1662,24 @@ static const struct genl_ops net_flow_table_nl_ops[] = {
 		.policy = net_flow_cmd_policy,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = NFL_TABLE_CMD_GET_FLOWS,
+		.doit = net_flow_table_cmd_get_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_SET_FLOWS,
+		.doit = net_flow_table_cmd_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NFL_TABLE_CMD_DEL_FLOWS,
+		.doit = net_flow_table_cmd_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static int __init net_flow_nl_module_init(void)

^ permalink raw reply related	[flat|nested] 66+ messages in thread
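The add/delete handlers above build their match and action lists with a
two-pass pattern: count the nested netlink attributes, allocate count + 1
zeroed entries so the list is null terminated, then fill it in a second
pass. A minimal userspace sketch of the same pattern (plain C; the `item`
type and helper names are hypothetical stand-ins for
`net_flow_field_ref` and friends, not part of the patch):

```c
#include <stdlib.h>

struct item {
	unsigned int uid;	/* uid == 0 terminates the list, as in the patch */
};

/* Two-pass build: count the entries we accept, allocate count + 1 zeroed
 * slots so the trailing zeroed slot acts as the terminator, then fill.
 */
static struct item *build_list(const unsigned int *uids, size_t n)
{
	struct item *list;
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (uids[i])		/* skip entries we would not parse */
			count++;

	list = calloc(count + 1, sizeof(*list));
	if (!list)
		return NULL;

	count = 0;
	for (i = 0; i < n; i++)
		if (uids[i])
			list[count++].uid = uids[i];

	return list;
}

/* Walk a null-terminated list, as net_flow_put_rule() does */
static size_t list_len(const struct item *list)
{
	size_t n = 0;

	while (list[n].uid)
		n++;
	return n;
}
```

The terminator slot is what lets consumers iterate without carrying an
explicit length, at the cost of reserving uid 0 as "end of list".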

* [net-next PATCH v3 03/12] net: flow: implement flow cache for get routines
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
  2015-01-20 20:26 ` [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables John Fastabend
  2015-01-20 20:27 ` [net-next PATCH v3 02/12] net: flow_table: add rule, delete rule John Fastabend
@ 2015-01-20 20:27 ` John Fastabend
  2015-01-20 20:27 ` [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions John Fastabend
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:27 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

I used rhashtable to implement a flow cache so software can track the
currently programmed rules without requiring every driver to
implement its own cache logic or fetch information from hardware.

I chose rhashtable to get the dynamic resizing.
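The cache semantics the patch relies on, insert a rule keyed by its u32
uid, look it up and remove it by uid with NULL on a miss, can be modeled
in userspace without rhashtable. A sketch with a fixed-size chained hash
table (all names hypothetical; no resizing, locking, or RCU deferral):

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 16

struct rule {
	uint32_t uid;		/* hash key, as in net_flow_rule */
	struct rule *next;	/* chain link standing in for rhash_head */
};

struct cache {
	struct rule *buckets[NBUCKETS];
};

static unsigned int hash_uid(uint32_t uid)
{
	return uid % NBUCKETS;	/* crude stand-in for jhash() */
}

static void cache_insert(struct cache *c, struct rule *r)
{
	unsigned int b = hash_uid(r->uid);

	r->next = c->buckets[b];
	c->buckets[b] = r;
}

static struct rule *cache_lookup(struct cache *c, uint32_t uid)
{
	struct rule *r;

	for (r = c->buckets[hash_uid(uid)]; r; r = r->next)
		if (r->uid == uid)
			return r;
	return NULL;
}

/* Unlink and return the rule; the caller frees it. The kernel code
 * instead defers the free with call_rcu() so readers can finish.
 */
static struct rule *cache_remove(struct cache *c, uint32_t uid)
{
	struct rule **pp = &c->buckets[hash_uid(uid)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->uid == uid) {
			struct rule *r = *pp;

			*pp = r->next;
			return r;
		}
	}
	return NULL;
}
```

What rhashtable adds over this sketch is the dynamic resizing and
RCU-safe traversal that the patch gets for free.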

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h |   24 +++++++
 net/core/flow_table.c   |  152 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 169 insertions(+), 7 deletions(-)

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
index 712b54f..07d7bca 100644
--- a/include/linux/if_flow.h
+++ b/include/linux/if_flow.h
@@ -21,6 +21,7 @@
 #define _IF_FLOW_H
 
 #include <uapi/linux/if_flow.h>
+#include <linux/rhashtable.h>
 
 /**
  * @struct net_flow_fields
@@ -134,6 +135,7 @@ struct net_flow_field_ref {
  * @size max number of entries for table or -1 for unbounded
  * @matches null terminated set of supported match types given by match uid
  * @actions null terminated set of supported action types given by action uid
+ * @cache software cache of hardware flows
  */
 struct net_flow_tbl {
 	char *name;
@@ -143,6 +145,7 @@ struct net_flow_tbl {
 	__u32 size;
 	struct net_flow_field_ref *matches;
 	__u32 *actions;
+	struct rhashtable cache;
 };
 
 /**
@@ -190,6 +193,8 @@ struct net_flow_tbl_node {
  * @struct net_flow_rule
  * @brief describes the match/action entry
  *
+ * @node node for resizable hash table used for software cache of rules
+ * @rcu used to support delayed freeing via call_rcu in software cache
  * @uid unique identifier for flow
  * @priority priority to execute flow match/action in table
  * @match null terminated set of match uids match criteria
@@ -198,10 +203,29 @@ struct net_flow_tbl_node {
  * Flows must match all entries in match set.
  */
 struct net_flow_rule {
+	struct rhash_head node;
+	struct rcu_head rcu;
 	__u32 table_id;
 	__u32 uid;
 	__u32 priority;
 	struct net_flow_field_ref *matches;
 	struct net_flow_action *actions;
 };
+
+#ifdef CONFIG_NET_FLOW_TABLES
+int net_flow_init_cache(struct net_flow_tbl *table);
+void net_flow_destroy_cache(struct net_flow_tbl *table);
+#else
+static inline int
+net_flow_init_cache(struct net_flow_tbl *table)
+{
+	return 0;
+}
+
+static inline void
+net_flow_destroy_cache(struct net_flow_tbl *table)
+{
+	return;
+}
+#endif /* CONFIG_NET_FLOW_TABLES */
 #endif /* _IF_FLOW_H_ */
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index 7b85e53..c1a9716 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -26,6 +26,8 @@
 #include <net/genetlink.h>
 #include <net/rtnetlink.h>
 #include <linux/module.h>
+#include <linux/rhashtable.h>
+#include <linux/jhash.h>
 
 static DEFINE_MUTEX(net_flow_mutex);
 
@@ -919,6 +921,27 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
 	return genlmsg_reply(msg, info);
 }
 
+static struct net_flow_tbl *net_flow_get_table(struct net_device *dev,
+					       int table_id)
+{
+	struct net_flow_tbl **tables;
+	int i;
+
+	if (!dev->netdev_ops->ndo_flow_get_tbls)
+		return NULL;
+
+	tables = dev->netdev_ops->ndo_flow_get_tbls(dev);
+	if (!tables)
+		return NULL;
+
+	for (i = 0; tables[i]; i++) {
+		if (tables[i]->uid == table_id)
+			return tables[i];
+	}
+
+	return NULL;
+}
+
 static int net_flow_put_flow_action(struct sk_buff *skb,
 				    struct net_flow_action *a)
 {
@@ -1017,11 +1040,39 @@ put_failure:
 	return err;
 }
 
+static int net_flow_get_rule_cache(struct sk_buff *skb,
+				   struct net_flow_tbl *table,
+				   int min, int max)
+{
+	const struct bucket_table *tbl;
+	struct net_flow_rule *he;
+	int i, err = 0;
+
+	rcu_read_lock();
+	tbl = rht_dereference_rcu(table->cache.tbl, &table->cache);
+
+	for (i = 0; i < tbl->size; i++) {
+		struct rhash_head *pos;
+
+		rht_for_each_entry_rcu(he, pos, tbl, i, node) {
+			if (he->uid < min || (max > 0 && he->uid > max))
+				continue;
+			err = net_flow_put_rule(skb, he);
+			if (err)
+				goto out;
+		}
+	}
+out:
+	rcu_read_unlock();
+	return err;
+}
+
 static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
 						u32 portid, int seq, u8 cmd,
 						int min, int max, int table)
 {
 	struct genlmsghdr *hdr;
+	struct net_flow_tbl *t;
 	struct nlattr *flows;
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
@@ -1042,15 +1093,23 @@ static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
 		goto out;
 	}
 
+	t = net_flow_get_table(dev, table);
+	if (!t) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	flows = nla_nest_start(skb, NFL_FLOWS);
 	if (!flows) {
 		err = -EMSGSIZE;
 		goto out;
 	}
 
-	err = -EOPNOTSUPP;
-	if (err < 0)
-		goto out_cancel;
+	err = net_flow_get_rule_cache(skb, t, min, max);
+	if (err < 0) {
+		nla_nest_cancel(skb, flows);
+		goto out;
+	}
 
 	nla_nest_end(skb, flows);
 
@@ -1059,8 +1118,6 @@ static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
 		goto out;
 
 	return skb;
-out_cancel:
-	nla_nest_cancel(skb, flows);
 out:
 	nlmsg_free(skb);
 	return ERR_PTR(err);
@@ -1300,6 +1357,13 @@ static void net_flow_rule_free(struct net_flow_rule *rule)
 	kfree(rule);
 }
 
+static void net_flow_rule_free_rcu(struct rcu_head *head)
+{
+	struct net_flow_rule *r = container_of(head, struct net_flow_rule, rcu);
+
+	net_flow_rule_free(r);
+}
+
 static const
 struct nla_policy net_flow_actarg_policy[NFL_ACTION_ARG_MAX + 1] = {
 	[NFL_ACTION_ARG_NAME]  = { .type = NLA_STRING },
@@ -1505,6 +1569,70 @@ static int net_flow_get_rule(struct net_flow_rule *rule, struct nlattr *attr)
 	return 0;
 }
 
+#define NFL_TABLE_ELEM_HINT 10
+int net_flow_init_cache(struct net_flow_tbl *table)
+{
+	struct rhashtable_params params = {
+		.nelem_hint = NFL_TABLE_ELEM_HINT,
+		.head_offset = offsetof(struct net_flow_rule, node),
+		.key_offset = offsetof(struct net_flow_rule, uid),
+		.key_len = sizeof(__u32),
+		.hashfn = jhash,
+		.grow_decision = rht_grow_above_75,
+		.shrink_decision = rht_shrink_below_30
+	};
+
+	return rhashtable_init(&table->cache, &params);
+}
+EXPORT_SYMBOL(net_flow_init_cache);
+
+void net_flow_destroy_cache(struct net_flow_tbl *table)
+{
+	struct rhashtable *cache = &table->cache;
+	const struct bucket_table *tbl;
+	struct net_flow_rule *he;
+	struct rhash_head *pos, *next;
+	unsigned int i;
+
+	/* Stop an eventual async resizing */
+	cache->being_destroyed = true;
+	mutex_lock(&cache->mutex);
+
+	tbl = rht_dereference(cache->tbl, cache);
+	for (i = 0; i < tbl->size; i++) {
+		rht_for_each_entry_safe(he, pos, next, tbl, i, node) {
+			rhashtable_remove(&table->cache, &he->node);
+			call_rcu(&he->rcu, net_flow_rule_free_rcu);
+		}
+	}
+
+	mutex_unlock(&cache->mutex);
+	rhashtable_destroy(cache);
+}
+EXPORT_SYMBOL(net_flow_destroy_cache);
+
+static void net_flow_add_rule_cache(struct net_flow_tbl *table,
+				    struct net_flow_rule *this)
+{
+	rhashtable_insert(&table->cache, &this->node);
+}
+
+static int net_flow_del_rule_cache(struct net_flow_tbl *table,
+				   struct net_flow_rule *this)
+{
+	struct net_flow_rule *he;
+
+	he = rhashtable_lookup(&table->cache, &this->uid);
+	if (he) {
+		rhashtable_remove(&table->cache, &he->node);
+		synchronize_rcu();
+		net_flow_rule_free(he);
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
 static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
 				    struct genl_info *info)
 {
@@ -1546,6 +1674,8 @@ static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
 
 	net_flow_lock();
 	nla_for_each_nested(flow, info->attrs[NFL_FLOWS], rem) {
+		struct net_flow_tbl *table;
+
 		if (nla_type(flow) != NFL_FLOW)
 			continue;
 
@@ -1563,12 +1693,22 @@ static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
 		if (err)
 			goto out_locked;
 
+		table = net_flow_get_table(dev, this->table_id);
+		if (!table) {
+			err = -EINVAL;
+			goto skip;
+		}
+
 		switch (cmd) {
 		case NFL_TABLE_CMD_SET_FLOWS:
 			err = dev->netdev_ops->ndo_flow_set_rule(dev, this);
+			if (!err)
+				net_flow_add_rule_cache(table, this);
 			break;
 		case NFL_TABLE_CMD_DEL_FLOWS:
 			err = dev->netdev_ops->ndo_flow_del_rule(dev, this);
+			if (!err)
+				err = net_flow_del_rule_cache(table, this);
 			break;
 		default:
 			err = -EOPNOTSUPP;
@@ -1597,8 +1737,6 @@ skip:
 			net_flow_put_rule(skb, this);
 		}
 
-		net_flow_rule_free(this);
-
 		if (err && err_handle == NFL_FLOWS_ERROR_ABORT)
 			goto out_locked;
 	}


* [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (2 preceding siblings ...)
  2015-01-20 20:27 ` [net-next PATCH v3 03/12] net: flow: implement flow cache for get routines John Fastabend
@ 2015-01-20 20:27 ` John Fastabend
  2015-01-20 20:59   ` John W. Linville
  2015-01-20 20:28 ` [net-next PATCH v3 05/12] net: flow_table: add validation functions for rules John Fastabend
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:27 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

This adds common headers and actions that drivers can use.

I have not yet moved the header graphs into the common header
because I'm not entirely convinced they are re-usable. The devices
I have been looking at have header graphs different enough that
they wouldn't be re-usable. However, many 40Gbps NICs, for
example, could possibly share a common header graph. When we get
multiple implementations we can move this into the common file
if it makes sense.

And table structures seem to be unique enough that there is
little value in putting each device's table layout into the
common file, so it is left to device-specific implementations.
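To illustrate the UID split this patch sets up, here is a userspace
mirror of the header/field structures (simplified; the
`vendor_metadata` header and the `field_bitwidth()` helper are
hypothetical illustrations, not part of the patch) showing a common
header next to a driver-private one numbered above HEADER_MAX_UID:

```c
#include <stddef.h>

#define HEADER_MAX_UID 100	/* common header IDs stay at or below this */

struct field {
	const char *name;
	unsigned int uid;
	unsigned int bitwidth;
};

struct hdr {
	const char *name;
	unsigned int uid;
	size_t field_sz;
	const struct field *fields;
};

/* Mirrors net_flow_ethernet from if_flow_common.h */
static const struct field ethernet_fields[] = {
	{ "src_mac", 1, 48 },
	{ "dst_mac", 2, 48 },
	{ "ethertype", 3, 16 },
};

static const struct hdr ethernet = {
	.name = "ethernet",
	.uid = 1,	/* HEADER_ETHERNET */
	.field_sz = sizeof(ethernet_fields) / sizeof(ethernet_fields[0]),
	.fields = ethernet_fields,
};

/* Hypothetical driver-private header: its uid must exceed
 * HEADER_MAX_UID so it can never collide with a common header ID.
 */
static const struct hdr vendor_md = {
	.name = "vendor_metadata",
	.uid = HEADER_MAX_UID + 1,
	.field_sz = 0,
	.fields = NULL,
};

/* Look up a field's bitwidth by uid; -1 if the header lacks it */
static int field_bitwidth(const struct hdr *h, unsigned int field_uid)
{
	size_t i;

	for (i = 0; i < h->field_sz; i++)
		if (h->fields[i].uid == field_uid)
			return (int)h->fields[i].bitwidth;
	return -1;
}
```

The reserved ID range is the whole contract: common code and drivers can
mix headers freely as long as private uids stay above HEADER_MAX_UID.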

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow_common.h |  257 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 include/linux/if_flow_common.h

diff --git a/include/linux/if_flow_common.h b/include/linux/if_flow_common.h
new file mode 100644
index 0000000..ef2d66f
--- /dev/null
+++ b/include/linux/if_flow_common.h
@@ -0,0 +1,257 @@
+#ifndef _IF_FLOW_COMMON_H_
+#define _IF_FLOW_COMMON_H_
+
+#include <linux/if_flow.h>
+
+/* Common header definitions. This section provides a set of common or
+ * standard headers that device driver writers may use to simplify
+ * driver creation. We do not want vendor- or driver-specific headers
+ * here though; driver authors can keep those contained to their driver.
+ *
+ * Driver authors may use unique IDs greater than HEADER_MAX_UID; it is
+ * guaranteed to be larger than any unique ID used here.
+ */
+#define HEADER_MAX_UID 100
+
+enum net_flow_headers {
+	HEADER_UNSPEC,
+	HEADER_ETHERNET,
+	HEADER_VLAN,
+	HEADER_IPV4,
+};
+
+enum net_flow_ethernet_fields_ids {
+	HEADER_ETHERNET_UNSPEC,
+	HEADER_ETHERNET_SRC_MAC,
+	HEADER_ETHERNET_DST_MAC,
+	HEADER_ETHERNET_ETHERTYPE,
+};
+
+struct net_flow_field net_flow_ethernet_fields[] = {
+	{ .name = "src_mac", .uid = HEADER_ETHERNET_SRC_MAC, .bitwidth = 48},
+	{ .name = "dst_mac", .uid = HEADER_ETHERNET_DST_MAC, .bitwidth = 48},
+	{ .name = "ethertype",
+	  .uid = HEADER_ETHERNET_ETHERTYPE,
+	  .bitwidth = 16},
+};
+
+struct net_flow_hdr net_flow_ethernet = {
+	.name = "ethernet",
+	.uid = HEADER_ETHERNET,
+	.field_sz = ARRAY_SIZE(net_flow_ethernet_fields),
+	.fields = net_flow_ethernet_fields,
+};
+
+enum net_flow_vlan_fields_ids {
+	HEADER_VLAN_UNSPEC,
+	HEADER_VLAN_PCP,
+	HEADER_VLAN_CFI,
+	HEADER_VLAN_VID,
+	HEADER_VLAN_ETHERTYPE,
+};
+
+struct net_flow_field net_flow_vlan_fields[] = {
+	{ .name = "pcp", .uid = HEADER_VLAN_PCP, .bitwidth = 3,},
+	{ .name = "cfi", .uid = HEADER_VLAN_CFI, .bitwidth = 1,},
+	{ .name = "vid", .uid = HEADER_VLAN_VID, .bitwidth = 12,},
+	{ .name = "ethertype", .uid = HEADER_VLAN_ETHERTYPE, .bitwidth = 16,},
+};
+
+struct net_flow_hdr net_flow_vlan = {
+	.name = "vlan",
+	.uid = HEADER_VLAN,
+	.field_sz = ARRAY_SIZE(net_flow_vlan_fields),
+	.fields = net_flow_vlan_fields,
+};
+
+enum net_flow_ipv4_fields_ids {
+	HEADER_IPV4_UNSPEC,
+	HEADER_IPV4_VERSION,
+	HEADER_IPV4_IHL,
+	HEADER_IPV4_DSCP,
+	HEADER_IPV4_ECN,
+	HEADER_IPV4_LENGTH,
+	HEADER_IPV4_IDENTIFICATION,
+	HEADER_IPV4_FLAGS,
+	HEADER_IPV4_FRAGMENT_OFFSET,
+	HEADER_IPV4_TTL,
+	HEADER_IPV4_PROTOCOL,
+	HEADER_IPV4_CSUM,
+	HEADER_IPV4_SRC_IP,
+	HEADER_IPV4_DST_IP,
+	HEADER_IPV4_OPTIONS,
+};
+
+struct net_flow_field net_flow_ipv4_fields[] = {
+	{ .name = "version",
+	  .uid = HEADER_IPV4_VERSION,
+	  .bitwidth = 4,},
+	{ .name = "ihl",
+	  .uid = HEADER_IPV4_IHL,
+	  .bitwidth = 4,},
+	{ .name = "dscp",
+	  .uid = HEADER_IPV4_DSCP,
+	  .bitwidth = 6,},
+	{ .name = "ecn",
+	  .uid = HEADER_IPV4_ECN,
+	  .bitwidth = 2,},
+	{ .name = "length",
+	  .uid = HEADER_IPV4_LENGTH,
+	  .bitwidth = 16,},
+	{ .name = "identification",
+	  .uid = HEADER_IPV4_IDENTIFICATION,
+	  .bitwidth = 16,},
+	{ .name = "flags",
+	  .uid = HEADER_IPV4_FLAGS,
+	  .bitwidth = 3,},
+	{ .name = "fragment_offset",
+	  .uid = HEADER_IPV4_FRAGMENT_OFFSET,
+	  .bitwidth = 13,},
+	{ .name = "ttl",
+	  .uid = HEADER_IPV4_TTL,
+	  .bitwidth = 8,},
+	{ .name = "protocol",
+	  .uid = HEADER_IPV4_PROTOCOL,
+	  .bitwidth = 8,},
+	{ .name = "csum",
+	  .uid = HEADER_IPV4_CSUM,
+	  .bitwidth = 16,},
+	{ .name = "src_ip",
+	  .uid = HEADER_IPV4_SRC_IP,
+	  .bitwidth = 32,},
+	{ .name = "dst_ip",
+	  .uid = HEADER_IPV4_DST_IP,
+	  .bitwidth = 32,},
+	{ .name = "options",
+	  .uid = HEADER_IPV4_OPTIONS,
+	  .bitwidth = 0,},
+};
+
+struct net_flow_hdr net_flow_ipv4 = {
+	.name = "ipv4",
+	.uid = HEADER_IPV4,
+	.field_sz = ARRAY_SIZE(net_flow_ipv4_fields),
+	.fields = net_flow_ipv4_fields,
+};
+
+/* Common set of actions. Below is the list of actions that are, or that
+ * we expect to be, common enough to be supported by multiple devices.
+ *
+ * Driver authors may use unique IDs greater than ACTION_MAX_UID; it is
+ * guaranteed to be larger than any unique ID used here.
+ */
+#define ACTION_MAX_UID 100
+
+enum net_flow_action_ids {
+	ACTION_SET_UNSPEC,
+	ACTION_SET_VLAN_ID,
+	ACTION_COPY_TO_CPU,
+	ACTION_POP_VLAN,
+	ACTION_SET_ETH_SRC,
+	ACTION_SET_ETH_DST,
+	ACTION_SET_OUT_PORT,
+	ACTION_CHECK_TTL_DROP,
+};
+
+struct net_flow_action_arg net_flow_null_args[] = {
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action net_flow_null_action = {
+	.name = "", .uid = 0, .args = NULL,
+};
+
+struct net_flow_action_arg net_flow_set_vlan_id_args[] = {
+	{
+		.name = "vlan_id",
+		.type = NFL_ACTION_ARG_TYPE_U16,
+		.value_u16 = 0,
+	},
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action net_flow_set_vlan_id = {
+	.name = "set_vlan_id",
+	.uid = ACTION_SET_VLAN_ID,
+	.args = net_flow_set_vlan_id_args,
+};
+
+struct net_flow_action net_flow_copy_to_cpu = {
+	.name = "copy_to_cpu",
+	.uid = ACTION_COPY_TO_CPU,
+	.args = net_flow_null_args,
+};
+
+struct net_flow_action net_flow_pop_vlan = {
+	.name = "pop_vlan",
+	.uid = ACTION_POP_VLAN,
+	.args = net_flow_null_args,
+};
+
+struct net_flow_action_arg net_flow_set_eth_src_args[] = {
+	{
+		.name = "eth_src",
+		.type = NFL_ACTION_ARG_TYPE_U64,
+		.value_u64 = 0,
+	},
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action net_flow_set_eth_src = {
+	.name = "set_eth_src",
+	.uid = ACTION_SET_ETH_SRC,
+	.args = net_flow_set_eth_src_args,
+};
+
+struct net_flow_action_arg net_flow_set_eth_dst_args[] = {
+	{
+		.name = "eth_dst",
+		.type = NFL_ACTION_ARG_TYPE_U64,
+		.value_u64 = 0,
+	},
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action net_flow_set_eth_dst = {
+	.name = "set_eth_dst",
+	.uid = ACTION_SET_ETH_DST,
+	.args = net_flow_set_eth_dst_args,
+};
+
+struct net_flow_action_arg net_flow_set_out_port_args[] = {
+	{
+		.name = "set_out_port",
+		.type = NFL_ACTION_ARG_TYPE_U32,
+		.value_u32 = 0,
+	},
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action net_flow_set_out_port = {
+	.name = "set_out_port",
+	.uid = ACTION_SET_OUT_PORT,
+	.args = net_flow_set_out_port_args,
+};
+
+struct net_flow_action net_flow_check_ttl_drop = {
+	.name = "check_ttl_drop",
+	.uid = ACTION_CHECK_TTL_DROP,
+	.args = net_flow_null_args,
+};
+
+#endif


* [net-next PATCH v3 05/12] net: flow_table: add validation functions for rules
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (3 preceding siblings ...)
  2015-01-20 20:27 ` [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions John Fastabend
@ 2015-01-20 20:28 ` John Fastabend
  2015-01-20 20:28 ` [net-next PATCH v3 06/12] net: rocker: add pipeline model for rocker switch John Fastabend
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:28 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

This adds common validation functions that are used before
adding rules to verify they match the table spec returned
by the driver.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/core/flow_table.c |   75 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index c1a9716..4b4da2e 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -1633,6 +1633,78 @@ static int net_flow_del_rule_cache(struct net_flow_tbl *table,
 	return -EEXIST;
 }
 
+static int net_flow_is_valid_action_arg(struct net_flow_action *a, int id)
+{
+	struct net_flow_action_arg *args = a->args;
+	int i;
+
+	/* It is valid for an action to have no arguments at all */
+	if (!a->args)
+		return 0;
+
+	for (i = 0; args[i].type != NFL_ACTION_ARG_TYPE_NULL; i++) {
+		if (a->args[i].type == NFL_ACTION_ARG_TYPE_NULL ||
+		    args[i].type != a->args[i].type)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int net_flow_is_valid_action(struct net_flow_action *a, int *actions)
+{
+	int i;
+
+	for (i = 0; actions[i]; i++) {
+		if (actions[i] == a->uid)
+			return net_flow_is_valid_action_arg(a, a->uid);
+	}
+	return -EINVAL;
+}
+
+static int net_flow_is_valid_match(struct net_flow_field_ref *f,
+				   struct net_flow_field_ref *fields)
+{
+	int i;
+
+	for (i = 0; fields[i].header; i++) {
+		if (f->header == fields[i].header &&
+		    f->field == fields[i].field)
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int net_flow_is_valid_rule(struct net_flow_tbl *table,
+				  struct net_flow_rule *flow)
+{
+	struct net_flow_field_ref *fields = table->matches;
+	int *actions = table->actions;
+	int i, err;
+
+	/* Only accept rules with matches AND actions. It does not seem
+	 * correct to allow a match without actions, or action chains
+	 * that will never be hit.
+	 */
+	if (!flow->actions || !flow->matches)
+		return -EINVAL;
+
+	for (i = 0; flow->actions[i].uid; i++) {
+		err = net_flow_is_valid_action(&flow->actions[i], actions);
+		if (err)
+			return -EINVAL;
+	}
+
+	for (i = 0; flow->matches[i].header; i++) {
+		err = net_flow_is_valid_match(&flow->matches[i], fields);
+		if (err)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
 				    struct genl_info *info)
 {
@@ -1701,6 +1773,9 @@ static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
 
 		switch (cmd) {
 		case NFL_TABLE_CMD_SET_FLOWS:
+			err = net_flow_is_valid_rule(table, this);
+			if (err)
+				break;
 			err = dev->netdev_ops->ndo_flow_set_rule(dev, this);
 			if (!err)
 				net_flow_add_rule_cache(table, this);


* [net-next PATCH v3 06/12] net: rocker: add pipeline model for rocker switch
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (4 preceding siblings ...)
  2015-01-20 20:28 ` [net-next PATCH v3 05/12] net: flow_table: add validation functions for rules John Fastabend
@ 2015-01-20 20:28 ` John Fastabend
  2015-01-20 20:29 ` [net-next PATCH v3 07/12] net: rocker: add set rule ops John Fastabend
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:28 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

This adds rocker support for the net_flow_get_* operations. With
this we can interrogate the rocker pipeline.

Here, for static configurations, enabling the get operations is
simply a matter of defining a pipeline model and returning the
structures for the core infrastructure to encapsulate in netlink
messages.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c          |   65 ++++
 drivers/net/ethernet/rocker/rocker_pipeline.h |  451 +++++++++++++++++++++++++
 2 files changed, 516 insertions(+)
 create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 2f398fa..d2ea451 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -36,6 +36,7 @@
 #include <generated/utsrelease.h>
 
 #include "rocker.h"
+#include "rocker_pipeline.h"
 
 static const char rocker_driver_name[] = "rocker";
 
@@ -3781,6 +3782,56 @@ static int rocker_port_switch_port_stp_update(struct net_device *dev, u8 state)
 	return rocker_port_stp_update(rocker_port, state);
 }
 
+static void rocker_destroy_flow_tables(struct rocker_port *rocker_port)
+{
+	int i;
+
+	for (i = 0; rocker_table_list[i]; i++)
+		net_flow_destroy_cache(rocker_table_list[i]);
+}
+
+static int rocker_init_flow_tables(struct rocker_port *rocker_port)
+{
+	int i, err;
+
+	for (i = 0; rocker_table_list[i]; i++) {
+		err = net_flow_init_cache(rocker_table_list[i]);
+		if (err) {
+			rocker_destroy_flow_tables(rocker_port);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+#ifdef CONFIG_NET_FLOW_TABLES
+static struct net_flow_tbl **rocker_get_tables(struct net_device *d)
+{
+	return rocker_table_list;
+}
+
+static struct net_flow_hdr **rocker_get_headers(struct net_device *d)
+{
+	return rocker_header_list;
+}
+
+static struct net_flow_action **rocker_get_actions(struct net_device *d)
+{
+	return rocker_action_list;
+}
+
+static struct net_flow_tbl_node **rocker_get_tgraph(struct net_device *d)
+{
+	return rocker_table_nodes;
+}
+
+static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
+{
+	return rocker_header_nodes;
+}
+#endif
+
 static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_open			= rocker_port_open,
 	.ndo_stop			= rocker_port_stop,
@@ -3795,6 +3846,13 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_bridge_getlink		= rocker_port_bridge_getlink,
 	.ndo_switch_parent_id_get	= rocker_port_switch_parent_id_get,
 	.ndo_switch_port_stp_update	= rocker_port_switch_port_stp_update,
+#ifdef CONFIG_NET_FLOW_TABLES
+	.ndo_flow_get_tbls		= rocker_get_tables,
+	.ndo_flow_get_hdrs		= rocker_get_headers,
+	.ndo_flow_get_actions		= rocker_get_actions,
+	.ndo_flow_get_tbl_graph		= rocker_get_tgraph,
+	.ndo_flow_get_hdr_graph		= rocker_get_hgraph,
+#endif
 };
 
 /********************
@@ -3960,6 +4018,7 @@ static void rocker_remove_ports(struct rocker *rocker)
 		rocker_port = rocker->ports[i];
 		rocker_port_ig_tbl(rocker_port, ROCKER_OP_FLAG_REMOVE);
 		unregister_netdev(rocker_port->dev);
+		rocker_destroy_flow_tables(rocker_port);
 	}
 	kfree(rocker->ports);
 }
@@ -4023,6 +4082,12 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 		goto err_port_ig_tbl;
 	}
 
+	err = rocker_init_flow_tables(rocker_port);
+	if (err) {
+		dev_err(&pdev->dev, "install flow table failed\n");
+		goto err_port_ig_tbl;
+	}
+
 	return 0;
 
 err_port_ig_tbl:
diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
new file mode 100644
index 0000000..7136380
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -0,0 +1,451 @@
+/*
+ * drivers/net/ethernet/rocker/rocker_pipeline.h - Rocker switch device driver
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _ROCKER_PIPELINE_H_
+#define _ROCKER_PIPELINE_H_
+
+#include <linux/if_flow.h>
+#include <linux/if_flow_common.h>
+
+enum rocker_header_ids {
+	ROCKER_HEADER_UNSPEC = HEADER_MAX_UID,
+	ROCKER_HEADER_METADATA,
+};
+
+enum rocker_header_metadata_fields {
+	ROCKER_HEADER_METADATA_UNSPEC,
+	ROCKER_HEADER_METADATA_IN_LPORT,
+	ROCKER_HEADER_METADATA_GOTO_TBL,
+	ROCKER_HEADER_METADATA_GROUP_ID,
+};
+
+struct net_flow_field rocker_metadata_fields[] = {
+	{ .name = "in_lport",
+	  .uid = ROCKER_HEADER_METADATA_IN_LPORT,
+	  .bitwidth = 32,},
+	{ .name = "goto_tbl",
+	  .uid = ROCKER_HEADER_METADATA_GOTO_TBL,
+	  .bitwidth = 16,},
+	{ .name = "group_id",
+	  .uid = ROCKER_HEADER_METADATA_GROUP_ID,
+	  .bitwidth = 32,},
+};
+
+struct net_flow_hdr rocker_metadata_t = {
+	.name = "metadata_t",
+	.uid = ROCKER_HEADER_METADATA,
+	.field_sz = ARRAY_SIZE(rocker_metadata_fields),
+	.fields = rocker_metadata_fields,
+};
+
+struct net_flow_hdr *rocker_header_list[] = {
+	&net_flow_ethernet,
+	&net_flow_vlan,
+	&net_flow_ipv4,
+	&rocker_metadata_t,
+	NULL,
+};
+
+/* rocker specific action definitions */
+struct net_flow_action_arg rocker_set_group_id_args[] = {
+	{
+		.name = "group_id",
+		.type = NFL_ACTION_ARG_TYPE_U32,
+		.value_u32 = 0,
+	},
+	{
+		.name = "",
+		.type = NFL_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+enum rocker_action_ids {
+	ROCKER_ACTION_UNSPEC = ACTION_MAX_UID,
+	ROCKER_ACTION_SET_GROUP_ID,
+};
+
+struct net_flow_action rocker_set_group_id = {
+	.name = "set_group_id",
+	.uid = ROCKER_ACTION_SET_GROUP_ID,
+	.args = rocker_set_group_id_args,
+};
+
+struct net_flow_action *rocker_action_list[] = {
+	&net_flow_set_vlan_id,
+	&net_flow_copy_to_cpu,
+	&rocker_set_group_id,
+	&net_flow_pop_vlan,
+	&net_flow_set_eth_src,
+	&net_flow_set_eth_dst,
+	NULL,
+};
+
+/* headers graph */
+enum rocker_header_instance_ids {
+	ROCKER_HEADER_INSTANCE_UNSPEC,
+	ROCKER_HEADER_INSTANCE_ETHERNET,
+	ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	ROCKER_HEADER_INSTANCE_IPV4,
+	ROCKER_HEADER_INSTANCE_IN_LPORT,
+	ROCKER_HEADER_INSTANCE_GOTO_TABLE,
+	ROCKER_HEADER_INSTANCE_GROUP_ID,
+};
+
+struct net_flow_jump_table rocker_parse_ethernet[] = {
+	{
+		.field = {
+		   .header = HEADER_ETHERNET,
+		   .field = HEADER_ETHERNET_ETHERTYPE,
+		   .type = NFL_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = ETH_P_IP,
+		},
+		.node = ROCKER_HEADER_INSTANCE_IPV4,
+	},
+	{
+		.field = {
+		   .header = HEADER_ETHERNET,
+		   .field = HEADER_ETHERNET_ETHERTYPE,
+		   .type = NFL_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = ETH_P_8021Q,
+		},
+		.node = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int rocker_ethernet_headers[] = {HEADER_ETHERNET, 0};
+
+struct net_flow_hdr_node rocker_ethernet_header_node = {
+	.name = "ethernet",
+	.uid = ROCKER_HEADER_INSTANCE_ETHERNET,
+	.hdrs = rocker_ethernet_headers,
+	.jump = rocker_parse_ethernet,
+};
+
+struct net_flow_jump_table rocker_parse_vlan[] = {
+	{
+		.field = {
+		   .header = HEADER_VLAN,
+		   .field = HEADER_VLAN_ETHERTYPE,
+		   .type = NFL_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = ETH_P_IP,
+		},
+		.node = ROCKER_HEADER_INSTANCE_IPV4,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int rocker_vlan_headers[] = {HEADER_VLAN, 0};
+struct net_flow_hdr_node rocker_vlan_header_node = {
+	.name = "vlan",
+	.uid = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	.hdrs = rocker_vlan_headers,
+	.jump = rocker_parse_vlan,
+};
+
+struct net_flow_jump_table rocker_terminal_headers[] = {
+	{
+		.field = {0},
+		.node = NFL_JUMP_TABLE_DONE,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int rocker_ipv4_headers[] = {HEADER_IPV4, 0};
+struct net_flow_hdr_node rocker_ipv4_header_node = {
+	.name = "ipv4",
+	.uid = ROCKER_HEADER_INSTANCE_IPV4,
+	.hdrs = rocker_ipv4_headers,
+	.jump = rocker_terminal_headers,
+};
+
+int rocker_metadata_headers[] = {ROCKER_HEADER_METADATA, 0};
+struct net_flow_hdr_node rocker_in_lport_header_node = {
+	.name = "in_lport",
+	.uid = ROCKER_HEADER_INSTANCE_IN_LPORT,
+	.hdrs = rocker_metadata_headers,
+	.jump = rocker_terminal_headers,
+};
+
+struct net_flow_hdr_node rocker_group_id_header_node = {
+	.name = "group_id",
+	.uid = ROCKER_HEADER_INSTANCE_GROUP_ID,
+	.hdrs = rocker_metadata_headers,
+	.jump = rocker_terminal_headers,
+};
+
+struct net_flow_hdr_node *rocker_header_nodes[] = {
+	&rocker_ethernet_header_node,
+	&rocker_vlan_header_node,
+	&rocker_ipv4_header_node,
+	&rocker_in_lport_header_node,
+	&rocker_group_id_header_node,
+	NULL,
+};
+
+/* table definition */
+struct net_flow_field_ref rocker_matches_ig_port[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_IN_LPORT,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_IN_LPORT,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_vlan[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_IN_LPORT,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_IN_LPORT,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_term_mac[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_IN_LPORT,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_IN_LPORT,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_ucast_routing[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = ROCKER_HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_DST_IP,
+	  .mask_type = NFL_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_bridge[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_acl[] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_IN_LPORT,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_IN_LPORT,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_SRC_MAC,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = ROCKER_HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_PROTOCOL,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = ROCKER_HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_DSCP,
+	  .mask_type = NFL_MASK_TYPE_MASK},
+	{ .instance = 0, .field = 0},
+};
+
+int rocker_actions_ig_port[] = {0};
+int rocker_actions_vlan[] = {ACTION_SET_VLAN_ID, 0};
+int rocker_actions_term_mac[] = {ACTION_COPY_TO_CPU, 0};
+int rocker_actions_ucast_routing[] = {ROCKER_ACTION_SET_GROUP_ID, 0};
+int rocker_actions_bridge[] = {ROCKER_ACTION_SET_GROUP_ID,
+			       ACTION_COPY_TO_CPU, 0};
+int rocker_actions_acl[] = {ROCKER_ACTION_SET_GROUP_ID, 0};
+
+enum rocker_flow_table_id_space {
+	ROCKER_FLOW_TABLE_NULL,
+	ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
+	ROCKER_FLOW_TABLE_ID_VLAN,
+	ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
+	ROCKER_FLOW_TABLE_ID_BRIDGING,
+	ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+};
+
+struct net_flow_tbl rocker_ingress_port_table = {
+	.name = "ingress_port",
+	.uid = ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_ig_port,
+	.actions = rocker_actions_ig_port,
+	.cache = {0},
+};
+
+struct net_flow_tbl rocker_vlan_table = {
+	.name = "vlan",
+	.uid = ROCKER_FLOW_TABLE_ID_VLAN,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_vlan,
+	.actions = rocker_actions_vlan,
+	.cache = {0},
+};
+
+struct net_flow_tbl rocker_term_mac_table = {
+	.name = "term_mac",
+	.uid = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_term_mac,
+	.actions = rocker_actions_term_mac,
+	.cache = {0},
+};
+
+struct net_flow_tbl rocker_ucast_routing_table = {
+	.name = "ucast_routing",
+	.uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_ucast_routing,
+	.actions = rocker_actions_ucast_routing,
+	.cache = {0},
+};
+
+struct net_flow_tbl rocker_bridge_table = {
+	.name = "bridge",
+	.uid = ROCKER_FLOW_TABLE_ID_BRIDGING,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_bridge,
+	.actions = rocker_actions_bridge,
+	.cache = {0},
+};
+
+struct net_flow_tbl rocker_acl_table = {
+	.name = "acl",
+	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_acl,
+	.actions = rocker_actions_acl,
+	.cache = {0},
+};
+
+struct net_flow_tbl *rocker_table_list[] = {
+	&rocker_ingress_port_table,
+	&rocker_vlan_table,
+	&rocker_term_mac_table,
+	&rocker_ucast_routing_table,
+	&rocker_bridge_table,
+	&rocker_acl_table,
+	NULL,
+};
+
+/* Define the table graph layout */
+struct net_flow_jump_table rocker_table_node_ig_port_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_VLAN},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_ingress_port = {
+	.uid = ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
+	.jump = rocker_table_node_ig_port_next};
+
+struct net_flow_jump_table rocker_table_node_vlan_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_vlan = {
+	.uid = ROCKER_FLOW_TABLE_ID_VLAN,
+	.jump = rocker_table_node_vlan_next};
+
+struct net_flow_jump_table rocker_table_node_term_mac_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_term_mac = {
+	.uid = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	.jump = rocker_table_node_term_mac_next};
+
+struct net_flow_jump_table rocker_table_node_bridge_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_bridge = {
+	.uid = ROCKER_FLOW_TABLE_ID_BRIDGING,
+	.jump = rocker_table_node_bridge_next};
+
+struct net_flow_jump_table rocker_table_node_ucast_routing_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_ucast_routing = {
+	.uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	.jump = rocker_table_node_ucast_routing_next};
+
+struct net_flow_jump_table rocker_table_node_acl_next[] = {
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_acl = {
+	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	.jump = rocker_table_node_acl_next};
+
+struct net_flow_tbl_node *rocker_table_nodes[] = {
+	&rocker_table_node_ingress_port,
+	&rocker_table_node_vlan,
+	&rocker_table_node_term_mac,
+	&rocker_table_node_ucast_routing,
+	&rocker_table_node_bridge,
+	&rocker_table_node_acl,
+	NULL,
+};
+#endif /*_ROCKER_PIPELINE_H_*/


* [net-next PATCH v3 07/12] net: rocker: add set rule ops
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (5 preceding siblings ...)
  2015-01-20 20:28 ` [net-next PATCH v3 06/12] net: rocker: add pipeline model for rocker switch John Fastabend
@ 2015-01-20 20:29 ` John Fastabend
  2015-01-20 20:29 ` [net-next PATCH v3 08/12] net: rocker: add group_id slices and drop explicit goto John Fastabend
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:29 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Implement set rule operations for existing rocker tables.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |  421 ++++++++++++++++++++++++++++++++++
 1 file changed, 420 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index d2ea451..51290882 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3830,6 +3830,422 @@ static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
 {
 	return rocker_header_nodes;
 }
+
+static u32 rocker_goto_value(u32 id)
+{
+	switch (id) {
+	case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
+		return ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
+	case ROCKER_FLOW_TABLE_ID_VLAN:
+		return ROCKER_OF_DPA_TABLE_ID_VLAN;
+	case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
+		return ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
+		return ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
+	case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
+		return ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
+	case ROCKER_FLOW_TABLE_ID_BRIDGING:
+		return ROCKER_OF_DPA_TABLE_ID_BRIDGING;
+	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
+		return ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	default:
+		return 0;
+	}
+}
+
+static int rocker_flow_set_ig_port(struct net_device *dev,
+				   struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	enum rocker_of_dpa_table_id goto_tbl;
+	u32 in_lport_mask, in_lport;
+	int flags = 0;
+
+	/* The ingress port table only supports one field/mask/action,
+	 * which simplifies key construction, and we can assume the
+	 * values have the correct types/mask/action due to the validity
+	 * check above. The user could pass the same field multiple
+	 * times in one message; currently the validity test does not
+	 * catch this and we just use the first one specified.
+	 */
+	in_lport = rule->matches[0].value_u32;
+	in_lport_mask = rule->matches[0].mask_u32;
+	goto_tbl = rocker_goto_value(rule->actions[0].args[0].value_u16);
+
+	return rocker_flow_tbl_ig_port(rocker_port, flags,
+				       in_lport, in_lport_mask,
+				       goto_tbl);
+}
+
+static int rocker_flow_set_vlan(struct net_device *dev,
+				struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id, vlan_id_mask, new_vlan_id;
+	bool untagged, have_in_lport = false;
+	enum rocker_of_dpa_table_id goto_tbl;
+	int i, flags = 0;
+	u32 in_lport;
+
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = htons(1);
+	vlan_id_mask = 0;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		switch (rule->matches[i].instance) {
+		case ROCKER_HEADER_INSTANCE_IN_LPORT:
+			in_lport = rule->matches[i].value_u32;
+			have_in_lport = true;
+			break;
+		case ROCKER_HEADER_INSTANCE_VLAN_OUTER:
+			if (rule->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(rule->matches[i].value_u16);
+			vlan_id_mask = htons(rule->matches[i].mask_u16);
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	if (!have_in_lport)
+		return -EINVAL;
+
+	/* If user does not specify a new vlan id use default vlan id */
+	new_vlan_id = rocker_port_vid_to_vlan(rocker_port, vlan_id, &untagged);
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &rule->actions[i].args[0];
+
+		switch (rule->actions[i].uid) {
+		case ACTION_SET_VLAN_ID:
+			new_vlan_id = htons(arg->value_u16);
+			if (new_vlan_id)
+				untagged = false;
+			break;
+		}
+	}
+
+	return rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
+				    vlan_id, vlan_id_mask, goto_tbl,
+				    untagged, new_vlan_id);
+}
+
+static int rocker_flow_set_term_mac(struct net_device *dev,
+				    struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id, vlan_id_mask, ethtype = 0;
+	const u8 *eth_dst, *eth_dst_mask;
+	u32 in_lport, in_lport_mask;
+	int i, flags = 0;
+	bool copy_to_cpu;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_dst_mask = zero_mac;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		switch (rule->matches[i].instance) {
+		case ROCKER_HEADER_INSTANCE_IN_LPORT:
+			in_lport = rule->matches[i].value_u32;
+			in_lport_mask = rule->matches[i].mask_u32;
+			break;
+		case ROCKER_HEADER_INSTANCE_VLAN_OUTER:
+			if (rule->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(rule->matches[i].value_u16);
+			vlan_id_mask = htons(rule->matches[i].mask_u16);
+			break;
+		case ROCKER_HEADER_INSTANCE_ETHERNET:
+			switch (rule->matches[i].field) {
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&rule->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&rule->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_ETHERTYPE:
+				ethtype = htons(rule->matches[i].value_u16);
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	if (!ethtype)
+		return -EINVAL;
+
+	/* By default do not copy to cpu */
+	copy_to_cpu = false;
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		switch (rule->actions[i].uid) {
+		case ACTION_COPY_TO_CPU:
+			copy_to_cpu = true;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return rocker_flow_tbl_term_mac(rocker_port, in_lport, in_lport_mask,
+					ethtype, eth_dst, eth_dst_mask,
+					vlan_id, vlan_id_mask,
+					copy_to_cpu, flags);
+}
+
+static int rocker_flow_set_ucast_routing(struct net_device *dev,
+					 struct net_flow_rule *rule)
+{
+	return -EOPNOTSUPP;
+}
+
+static int rocker_flow_set_mcast_routing(struct net_device *dev,
+					 struct net_flow_rule *rule)
+{
+	return -EOPNOTSUPP;
+}
+
+static int rocker_flow_set_bridge(struct net_device *dev,
+				  struct net_flow_rule *rule)
+{
+	enum rocker_of_dpa_table_id goto_tbl;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	u32 in_lport, in_lport_mask, group_id, tunnel_id;
+	__be16 vlan_id, vlan_id_mask;
+	const u8 *eth_dst, *eth_dst_mask;
+	int i, flags = 0;
+	bool copy_to_cpu;
+
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_dst_mask = NULL;
+
+	/* tunnel_id is not supported yet. */
+	tunnel_id = 0;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		switch (rule->matches[i].instance) {
+		case ROCKER_HEADER_INSTANCE_IN_LPORT:
+			in_lport = rule->matches[i].value_u32;
+			in_lport_mask = rule->matches[i].mask_u32;
+			break;
+		case ROCKER_HEADER_INSTANCE_VLAN_OUTER:
+			if (rule->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(rule->matches[i].value_u16);
+			vlan_id_mask = htons(rule->matches[i].mask_u16);
+			break;
+		case ROCKER_HEADER_INSTANCE_ETHERNET:
+			switch (rule->matches[i].field) {
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&rule->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&rule->matches[i].mask_u64;
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* By default do not copy to cpu and skip group assignment */
+	copy_to_cpu = false;
+	group_id = ROCKER_GROUP_NONE;
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &rule->actions[i].args[0];
+
+		switch (rule->actions[i].uid) {
+		case ACTION_COPY_TO_CPU:
+			copy_to_cpu = true;
+			break;
+		case ROCKER_ACTION_SET_GROUP_ID:
+			group_id = arg->value_u32;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* Ignore eth_dst_mask; it seems to cause an EINVAL return code */
+	return rocker_flow_tbl_bridge(rocker_port, flags,
+				      eth_dst, eth_dst_mask,
+				      vlan_id, tunnel_id,
+				      goto_tbl, group_id, copy_to_cpu);
+}
+
+static int rocker_flow_set_acl(struct net_device *dev,
+			       struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	u32 in_lport, in_lport_mask, group_id, tunnel_id;
+	__be16 vlan_id, vlan_id_mask, ethtype = 0;
+	const u8 *eth_dst, *eth_src, *eth_dst_mask, *eth_src_mask;
+	u8 protocol, protocol_mask, dscp, dscp_mask;
+	int i, flags = 0;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_src = zero_mac;
+	eth_dst_mask = NULL;
+	eth_src_mask = NULL;
+
+	/* If user does not set protocol/dscp mask them out */
+	protocol = 0;
+	dscp = 0;
+	protocol_mask = 0;
+	dscp_mask = 0;
+
+	/* tunnel_id is not supported yet. */
+	tunnel_id = 0;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		switch (rule->matches[i].instance) {
+		case ROCKER_HEADER_INSTANCE_IN_LPORT:
+			in_lport = rule->matches[i].value_u32;
+			in_lport_mask = rule->matches[i].mask_u32;
+			break;
+		case ROCKER_HEADER_INSTANCE_VLAN_OUTER:
+			if (rule->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(rule->matches[i].value_u16);
+			vlan_id_mask = htons(rule->matches[i].mask_u16);
+			break;
+		case ROCKER_HEADER_INSTANCE_ETHERNET:
+			switch (rule->matches[i].field) {
+			case HEADER_ETHERNET_SRC_MAC:
+				eth_src = (u8 *)&rule->matches[i].value_u64;
+				eth_src_mask = (u8 *)&rule->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&rule->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&rule->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_ETHERTYPE:
+				ethtype = htons(rule->matches[i].value_u16);
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		case ROCKER_HEADER_INSTANCE_IPV4:
+			switch (rule->matches[i].field) {
+			case HEADER_IPV4_PROTOCOL:
+				protocol = rule->matches[i].value_u8;
+				protocol_mask = rule->matches[i].mask_u8;
+				break;
+			case HEADER_IPV4_DSCP:
+				dscp = rule->matches[i].value_u8;
+				dscp_mask = rule->matches[i].mask_u8;
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* By default skip group assignment */
+	group_id = ROCKER_GROUP_NONE;
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		switch (rule->actions[i].uid) {
+		case ROCKER_ACTION_SET_GROUP_ID:
+			group_id = rule->actions[i].args[0].value_u32;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return rocker_flow_tbl_acl(rocker_port, flags,
+				   in_lport, in_lport_mask,
+				   eth_src, eth_src_mask,
+				   eth_dst, eth_dst_mask, ethtype,
+				   vlan_id, vlan_id_mask,
+				   protocol, protocol_mask,
+				   dscp, dscp_mask,
+				   group_id);
+}
+
+static int rocker_set_rules(struct net_device *dev,
+			    struct net_flow_rule *rule)
+{
+	int err = -EINVAL;
+
+	switch (rule->table_id) {
+	case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
+		err = rocker_flow_set_ig_port(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_VLAN:
+		err = rocker_flow_set_vlan(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
+		err = rocker_flow_set_term_mac(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
+		err = rocker_flow_set_ucast_routing(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
+		err = rocker_flow_set_mcast_routing(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_BRIDGING:
+		err = rocker_flow_set_bridge(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
+		err = rocker_flow_set_acl(dev, rule);
+		break;
+	default:
+		break;
+	}
+
+	return err;
+}
+
+static int rocker_del_rules(struct net_device *dev,
+			    struct net_flow_rule *rule)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 static const struct net_device_ops rocker_port_netdev_ops = {
@@ -3852,6 +4268,9 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_flow_get_actions		= rocker_get_actions,
 	.ndo_flow_get_tbl_graph		= rocker_get_tgraph,
 	.ndo_flow_get_hdr_graph		= rocker_get_hgraph,
+
+	.ndo_flow_set_rule		= rocker_set_rules,
+	.ndo_flow_del_rule		= rocker_del_rules,
 #endif
 };
 
@@ -4084,7 +4503,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 
 	err = rocker_init_flow_tables(rocker_port);
 	if (err) {
-		dev_err(&pdev->dev, "install flow table failed\n");
+		dev_err(&pdev->dev, "install rule table failed\n");
 		goto err_port_ig_tbl;
 	}
 

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [net-next PATCH v3 08/12] net: rocker: add group_id slices and drop explicit goto
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (6 preceding siblings ...)
  2015-01-20 20:29 ` [net-next PATCH v3 07/12] net: rocker: add set rule ops John Fastabend
@ 2015-01-20 20:29 ` John Fastabend
  2015-01-20 20:30 ` [net-next PATCH v3 09/12] net: rocker: add multicast path to bridging John Fastabend
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:29 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

This adds the group tables for l3_unicast, l2_rewrite and l2. In
addition to adding the tables, we extend the metadata fields to
support three different group id lookups, one for each table, and
drop the more generic one previously being used.

Finally, we can also drop the goto action as it is no longer used.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c          |  174 ++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker_pipeline.h |  180 ++++++++++++++++++++++---
 2 files changed, 328 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 51290882..2be8f61 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4088,8 +4088,8 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 		case ACTION_COPY_TO_CPU:
 			copy_to_cpu = true;
 			break;
-		case ROCKER_ACTION_SET_GROUP_ID:
-			group_id = arg->value_u32;
+		case ROCKER_ACTION_SET_L3_UNICAST_GID:
+			group_id = ROCKER_GROUP_L3_UNICAST(arg->value_u32);
 			break;
 		default:
 			return -EINVAL;
@@ -4188,9 +4188,11 @@ static int rocker_flow_set_acl(struct net_device *dev,
 	group_id = ROCKER_GROUP_NONE;
 
 	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &rule->actions[i].args[0];
+
 		switch (rule->actions[i].uid) {
-		case ROCKER_ACTION_SET_GROUP_ID:
-			group_id = rule->actions[i].args[0].value_u32;
+		case ROCKER_ACTION_SET_L3_UNICAST_GID:
+			group_id = ROCKER_GROUP_L3_UNICAST(arg->value_u32);
 			break;
 		default:
 			return -EINVAL;
@@ -4207,6 +4209,161 @@ static int rocker_flow_set_acl(struct net_device *dev,
 				   group_id);
 }
 
+static int rocker_flow_set_group_slice_l3_unicast(struct net_device *dev,
+						  struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &rule->matches[i];
+
+		switch (r->instance) {
+		case ROCKER_HEADER_INSTANCE_L3_UNICAST_GID:
+			entry->group_id = ROCKER_GROUP_L3_UNICAST(r->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &rule->actions[i].args[0];
+
+		switch (rule->actions[i].uid) {
+		case ACTION_SET_ETH_SRC:
+			ether_addr_copy(entry->l3_unicast.eth_src,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_ETH_DST:
+			ether_addr_copy(entry->l3_unicast.eth_dst,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_VLAN_ID:
+			entry->l3_unicast.vlan_id = htons(arg->value_u16);
+			break;
+		case ACTION_CHECK_TTL_DROP:
+			entry->l3_unicast.ttl_check = true;
+			break;
+		case ROCKER_ACTION_SET_L2_REWRITE_GID:
+			entry->l3_unicast.group_id =
+				ROCKER_GROUP_L2_REWRITE(arg->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_set_group_slice_l2_rewrite(struct net_device *dev,
+						  struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &rule->matches[i];
+
+		switch (r->instance) {
+		case ROCKER_HEADER_INSTANCE_L2_REWRITE_GID:
+			entry->group_id = ROCKER_GROUP_L2_REWRITE(r->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &rule->actions[i].args[0];
+
+		switch (rule->actions[i].uid) {
+		case ACTION_SET_ETH_SRC:
+			ether_addr_copy(entry->l2_rewrite.eth_src,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_ETH_DST:
+			ether_addr_copy(entry->l2_rewrite.eth_dst,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_VLAN_ID:
+			entry->l2_rewrite.vlan_id = htons(arg->value_u16);
+			break;
+		case ROCKER_ACTION_SET_L2_GID:
+			entry->l2_rewrite.group_id =
+				ROCKER_GROUP_L2_INTERFACE(arg->value_u32,
+							  rocker_port->lport);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_set_group_slice_l2(struct net_device *dev,
+					  struct net_flow_rule *rule)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0;
+	u32 lport;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	lport = rocker_port->lport;
+
+	/* Use the dev lport if the user did not specify an lport
+	 * instance. We need to walk the match list once beforehand
+	 * to extract any lport attribute.
+	 */
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		switch (rule->matches[i].instance) {
+		case ROCKER_HEADER_METADATA_IN_LPORT:
+			lport = rule->matches[i].value_u32;
+		}
+	}
+
+	for (i = 0; rule->matches && rule->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &rule->matches[i];
+
+		switch (r->instance) {
+		case ROCKER_HEADER_INSTANCE_L2_GID:
+			entry->group_id =
+				ROCKER_GROUP_L2_INTERFACE(r->value_u32, lport);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; rule->actions && rule->actions[i].uid; i++) {
+		switch (rule->actions[i].uid) {
+		case ACTION_POP_VLAN:
+			entry->l2_interface.pop_vlan = true;
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
 static int rocker_set_rules(struct net_device *dev,
 			    struct net_flow_rule *rule)
 {
@@ -4234,6 +4391,15 @@ static int rocker_set_rules(struct net_device *dev,
 	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
 		err = rocker_flow_set_acl(dev, rule);
 		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST:
+		err = rocker_flow_set_group_slice_l3_unicast(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE:
+		err = rocker_flow_set_group_slice_l2_rewrite(dev, rule);
+		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2:
+		err = rocker_flow_set_group_slice_l2(dev, rule);
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
index 7136380..6d1e2ee 100644
--- a/drivers/net/ethernet/rocker/rocker_pipeline.h
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -22,19 +22,23 @@ enum rocker_header_ids {
 enum rocker_header_metadata_fields {
 	ROCKER_HEADER_METADATA_UNSPEC,
 	ROCKER_HEADER_METADATA_IN_LPORT,
-	ROCKER_HEADER_METADATA_GOTO_TBL,
-	ROCKER_HEADER_METADATA_GROUP_ID,
+	ROCKER_HEADER_METADATA_L3_UNICAST_GID,
+	ROCKER_HEADER_METADATA_L2_REWRITE_GID,
+	ROCKER_HEADER_METADATA_L2_GID,
 };
 
 struct net_flow_field rocker_metadata_fields[] = {
 	{ .name = "in_lport",
 	  .uid = ROCKER_HEADER_METADATA_IN_LPORT,
 	  .bitwidth = 32,},
-	{ .name = "goto_tbl",
-	  .uid = ROCKER_HEADER_METADATA_GOTO_TBL,
-	  .bitwidth = 16,},
-	{ .name = "group_id",
-	  .uid = ROCKER_HEADER_METADATA_GROUP_ID,
+	{ .name = "l3_unicast_group_id",
+	  .uid = ROCKER_HEADER_METADATA_L3_UNICAST_GID,
+	  .bitwidth = 32,},
+	{ .name = "l2_rewrite_group_id",
+	  .uid = ROCKER_HEADER_METADATA_L2_REWRITE_GID,
+	  .bitwidth = 32,},
+	{ .name = "l2_group_id",
+	  .uid = ROCKER_HEADER_METADATA_L2_GID,
 	  .bitwidth = 32,},
 };
 
@@ -68,22 +72,39 @@ struct net_flow_action_arg rocker_set_group_id_args[] = {
 
 enum rocker_action_ids {
 	ROCKER_ACTION_UNSPEC = ACTION_MAX_UID,
-	ROCKER_ACTION_SET_GROUP_ID,
+	ROCKER_ACTION_SET_L3_UNICAST_GID,
+	ROCKER_ACTION_SET_L2_REWRITE_GID,
+	ROCKER_ACTION_SET_L2_GID,
+};
+
+struct net_flow_action rocker_set_l3_unicast_group_id = {
+	.name = "set_l3_unicast_group_id",
+	.uid = ROCKER_ACTION_SET_L3_UNICAST_GID,
+	.args = rocker_set_group_id_args,
+};
+
+struct net_flow_action rocker_set_l2_rewrite_group_id = {
+	.name = "set_l2_rewrite_group_id",
+	.uid = ROCKER_ACTION_SET_L2_REWRITE_GID,
+	.args = rocker_set_group_id_args,
 };
 
-struct net_flow_action rocker_set_group_id = {
-	.name = "set_group_id",
-	.uid = ROCKER_ACTION_SET_GROUP_ID,
+struct net_flow_action rocker_set_l2_group_id = {
+	.name = "set_l2_group_id",
+	.uid = ROCKER_ACTION_SET_L2_GID,
 	.args = rocker_set_group_id_args,
 };
 
 struct net_flow_action *rocker_action_list[] = {
 	&net_flow_set_vlan_id,
 	&net_flow_copy_to_cpu,
-	&rocker_set_group_id,
+	&rocker_set_l3_unicast_group_id,
+	&rocker_set_l2_rewrite_group_id,
+	&rocker_set_l2_group_id,
 	&net_flow_pop_vlan,
 	&net_flow_set_eth_src,
 	&net_flow_set_eth_dst,
+	&net_flow_check_ttl_drop,
 	NULL,
 };
 
@@ -94,8 +115,9 @@ enum rocker_header_instance_ids {
 	ROCKER_HEADER_INSTANCE_VLAN_OUTER,
 	ROCKER_HEADER_INSTANCE_IPV4,
 	ROCKER_HEADER_INSTANCE_IN_LPORT,
-	ROCKER_HEADER_INSTANCE_GOTO_TABLE,
-	ROCKER_HEADER_INSTANCE_GROUP_ID,
+	ROCKER_HEADER_INSTANCE_L3_UNICAST_GID,
+	ROCKER_HEADER_INSTANCE_L2_REWRITE_GID,
+	ROCKER_HEADER_INSTANCE_L2_GID,
 };
 
 struct net_flow_jump_table rocker_parse_ethernet[] = {
@@ -183,9 +205,23 @@ struct net_flow_hdr_node rocker_in_lport_header_node = {
 	.jump = rocker_terminal_headers,
 };
 
-struct net_flow_hdr_node rocker_group_id_header_node = {
-	.name = "group_id",
-	.uid = ROCKER_HEADER_INSTANCE_GROUP_ID,
+struct net_flow_hdr_node rocker_l2_group_id_header_node = {
+	.name = "l2_group_id",
+	.uid = ROCKER_HEADER_INSTANCE_L2_GID,
+	.hdrs = rocker_metadata_headers,
+	.jump = rocker_terminal_headers,
+};
+
+struct net_flow_hdr_node rocker_l2_rewrite_group_id_header_node = {
+	.name = "l2_rewrite_group_id",
+	.uid = ROCKER_HEADER_INSTANCE_L2_REWRITE_GID,
+	.hdrs = rocker_metadata_headers,
+	.jump = rocker_terminal_headers,
+};
+
+struct net_flow_hdr_node rocker_l3_unicast_group_id_header_node = {
+	.name = "l3_unicast_group_id",
+	.uid = ROCKER_HEADER_INSTANCE_L3_UNICAST_GID,
 	.hdrs = rocker_metadata_headers,
 	.jump = rocker_terminal_headers,
 };
@@ -195,7 +231,9 @@ struct net_flow_hdr_node *rocker_header_nodes[] = {
 	&rocker_vlan_header_node,
 	&rocker_ipv4_header_node,
 	&rocker_in_lport_header_node,
-	&rocker_group_id_header_node,
+	&rocker_l3_unicast_group_id_header_node,
+	&rocker_l2_rewrite_group_id_header_node,
+	&rocker_l2_group_id_header_node,
 	NULL,
 };
 
@@ -296,13 +334,48 @@ struct net_flow_field_ref rocker_matches_acl[] = {
 	{ .instance = 0, .field = 0},
 };
 
+struct net_flow_field_ref rocker_matches_l3_unicast_group_slice[2] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_L3_UNICAST_GID,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_L3_UNICAST_GID,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_l2_rewrite_group_slice[2] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_L2_REWRITE_GID,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_L2_REWRITE_GID,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref rocker_matches_l2_group_slice[2] = {
+	{ .instance = ROCKER_HEADER_INSTANCE_L2_GID,
+	  .header = ROCKER_HEADER_METADATA,
+	  .field = ROCKER_HEADER_METADATA_L2_GID,
+	  .mask_type = NFL_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
 int rocker_actions_ig_port[] = {0};
 int rocker_actions_vlan[] = {ACTION_SET_VLAN_ID, 0};
 int rocker_actions_term_mac[] = {ACTION_COPY_TO_CPU, 0};
-int rocker_actions_ucast_routing[] = {ROCKER_ACTION_SET_GROUP_ID, 0};
-int rocker_actions_bridge[] = {ROCKER_ACTION_SET_GROUP_ID,
+int rocker_actions_ucast_routing[] = {ROCKER_ACTION_SET_L3_UNICAST_GID, 0};
+int rocker_actions_bridge[] = {ROCKER_ACTION_SET_L3_UNICAST_GID,
 			       ACTION_COPY_TO_CPU, 0};
-int rocker_actions_acl[] = {ROCKER_ACTION_SET_GROUP_ID, 0};
+int rocker_actions_acl[] = {ROCKER_ACTION_SET_L3_UNICAST_GID, 0};
+int rocker_actions_group_slice_l3_unicast[] = {ACTION_SET_ETH_SRC,
+					       ACTION_SET_ETH_DST,
+					       ACTION_SET_VLAN_ID,
+					       ROCKER_ACTION_SET_L2_REWRITE_GID,
+					       ACTION_CHECK_TTL_DROP, 0};
+int rocker_actions_group_slice_l2_rewrite[] = {ACTION_SET_ETH_SRC,
+					       ACTION_SET_ETH_DST,
+					       ACTION_SET_VLAN_ID,
+					       ROCKER_ACTION_SET_L2_GID,
+					       0};
+int rocker_actions_group_slice_l2[] = {ACTION_POP_VLAN, 0};
 
 enum rocker_flow_table_id_space {
 	ROCKER_FLOW_TABLE_NULL,
@@ -313,6 +386,9 @@ enum rocker_flow_table_id_space {
 	ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
 	ROCKER_FLOW_TABLE_ID_BRIDGING,
 	ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
 };
 
 struct net_flow_tbl rocker_ingress_port_table = {
@@ -375,6 +451,33 @@ struct net_flow_tbl rocker_acl_table = {
 	.cache = {0},
 };
 
+struct net_flow_tbl rocker_group_slice_l3_unicast_table = {
+	.name = "group_slice_l3_unicast",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_l3_unicast_group_slice,
+	.actions = rocker_actions_group_slice_l3_unicast,
+};
+
+struct net_flow_tbl rocker_group_slice_l2_rewrite_table = {
+	.name = "group_slice_l2_rewrite",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_l2_rewrite_group_slice,
+	.actions = rocker_actions_group_slice_l2_rewrite,
+};
+
+struct net_flow_tbl rocker_group_slice_l2_table = {
+	.name = "group_slice_l2",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
+	.source = 1,
+	.size = -1,
+	.matches = rocker_matches_l2_group_slice,
+	.actions = rocker_actions_group_slice_l2,
+};
+
 struct net_flow_tbl *rocker_table_list[] = {
 	&rocker_ingress_port_table,
 	&rocker_vlan_table,
@@ -382,6 +485,9 @@ struct net_flow_tbl *rocker_table_list[] = {
 	&rocker_ucast_routing_table,
 	&rocker_bridge_table,
 	&rocker_acl_table,
+	&rocker_group_slice_l3_unicast_table,
+	&rocker_group_slice_l2_rewrite_table,
+	&rocker_group_slice_l2_table,
 	NULL,
 };
 
@@ -432,6 +538,7 @@ struct net_flow_tbl_node rocker_table_node_ucast_routing = {
 	.jump = rocker_table_node_ucast_routing_next};
 
 struct net_flow_jump_table rocker_table_node_acl_next[] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST},
 	{ .field = {0}, .node = 0},
 };
 
@@ -439,6 +546,32 @@ struct net_flow_tbl_node rocker_table_node_acl = {
 	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
 	.jump = rocker_table_node_acl_next};
 
+struct net_flow_jump_table rocker_table_node_group_l3_unicast_next[1] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE},
+};
+
+struct net_flow_tbl_node rocker_table_node_group_l3_unicast = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	.jump = rocker_table_node_group_l3_unicast_next};
+
+struct net_flow_jump_table rocker_table_node_group_l2_rewrite_next[1] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2},
+};
+
+struct net_flow_tbl_node rocker_table_node_group_l2_rewrite = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	.jump = rocker_table_node_group_l2_rewrite_next};
+
+struct net_flow_jump_table rocker_table_node_group_l2_next[1] = {
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node rocker_table_node_group_l2 = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
+	.jump = rocker_table_node_group_l2_next};
+
+struct net_flow_tbl_node rocker_table_node_nil = {.uid = 0, .jump = NULL};
+
 struct net_flow_tbl_node *rocker_table_nodes[] = {
 	&rocker_table_node_ingress_port,
 	&rocker_table_node_vlan,
@@ -446,6 +579,9 @@ struct net_flow_tbl_node *rocker_table_nodes[] = {
 	&rocker_table_node_ucast_routing,
 	&rocker_table_node_bridge,
 	&rocker_table_node_acl,
-	NULL,
+	&rocker_table_node_group_l3_unicast,
+	&rocker_table_node_group_l2_rewrite,
+	&rocker_table_node_group_l2,
+	NULL
 };
 #endif /*_ROCKER_PIPELINE_H_*/


* [net-next PATCH v3 09/12] net: rocker: add multicast path to bridging
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (7 preceding siblings ...)
  2015-01-20 20:29 ` [net-next PATCH v3 08/12] net: rocker: add group_id slices and drop explicit goto John Fastabend
@ 2015-01-20 20:30 ` John Fastabend
  2015-01-20 20:30 ` [net-next PATCH v3 10/12] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:30 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Add a path in the table graph to send multicast packets to the
bridging table.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker_pipeline.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
index 6d1e2ee..1a3dcdb 100644
--- a/drivers/net/ethernet/rocker/rocker_pipeline.h
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -511,6 +511,14 @@ struct net_flow_tbl_node rocker_table_node_vlan = {
 	.jump = rocker_table_node_vlan_next};
 
 struct net_flow_jump_table rocker_table_node_term_mac_next[] = {
+	{ .field = {.instance = ROCKER_HEADER_INSTANCE_ETHERNET,
+		    .header = HEADER_ETHERNET,
+		    .field = HEADER_ETHERNET_DST_MAC,
+		    .mask_type = NFL_MASK_TYPE_LPM,
+		    .type = NFL_FIELD_REF_ATTR_TYPE_U64,
+		    .value_u64 = (__u64)0x1,
+		    .mask_u64 = (__u64)0x1,
+	}, .node = ROCKER_FLOW_TABLE_ID_BRIDGING},
 	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING},
 	{ .field = {0}, .node = 0},
 };


* [net-next PATCH v3 10/12] net: rocker: add cookie to group acls and use flow_id to set cookie
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (8 preceding siblings ...)
  2015-01-20 20:30 ` [net-next PATCH v3 09/12] net: rocker: add multicast path to bridging John Fastabend
@ 2015-01-20 20:30 ` John Fastabend
  2015-01-20 20:31 ` [net-next PATCH v3 11/12] net: rocker: have flow api calls set cookie value John Fastabend
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:30 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Rocker uses a cookie value to identify flows; however, the Flow API
already has a unique id for each flow. To help with the translation,
add support for setting the cookie value through the internal rocker
flow API and use the unique id in the cases where it is available.

This patch extends the internal code paths to support the new
cookie value.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   64 ++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 2be8f61..5ba46f6 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -120,6 +120,7 @@ struct rocker_flow_tbl_entry {
 
 struct rocker_group_tbl_entry {
 	struct hlist_node entry;
+	u64 cookie;
 	u32 cmd;
 	u32 group_id; /* key */
 	u16 group_count;
@@ -2233,7 +2234,8 @@ static int rocker_flow_tbl_add(struct rocker_port *rocker_port,
 		kfree(match);
 	} else {
 		found = match;
-		found->cookie = rocker->flow_tbl_next_cookie++;
+		if (!found->cookie)
+			found->cookie = rocker->flow_tbl_next_cookie++;
 		hash_add(rocker->flow_tbl, &found->entry, found->key_crc32);
 		add_to_hw = true;
 	}
@@ -2311,7 +2313,7 @@ static int rocker_flow_tbl_do(struct rocker_port *rocker_port,
 		return rocker_flow_tbl_add(rocker_port, entry, nowait);
 }
 
-static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
+static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port, u64 flow_id,
 				   int flags, u32 in_lport, u32 in_lport_mask,
 				   enum rocker_of_dpa_table_id goto_tbl)
 {
@@ -2327,11 +2329,14 @@ static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
 	entry->key.ig_port.in_lport_mask = in_lport_mask;
 	entry->key.ig_port.goto_tbl = goto_tbl;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
-				int flags, u32 in_lport,
+				int flags, u64 flow_id, u32 in_lport,
 				__be16 vlan_id, __be16 vlan_id_mask,
 				enum rocker_of_dpa_table_id goto_tbl,
 				bool untagged, __be16 new_vlan_id)
@@ -2352,10 +2357,14 @@ static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
 	entry->key.vlan.untagged = untagged;
 	entry->key.vlan.new_vlan_id = new_vlan_id;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
+				    u64 flow_id,
 				    u32 in_lport, u32 in_lport_mask,
 				    __be16 eth_type, const u8 *eth_dst,
 				    const u8 *eth_dst_mask, __be16 vlan_id,
@@ -2388,11 +2397,14 @@ static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
 	entry->key.term_mac.vlan_id_mask = vlan_id_mask;
 	entry->key.term_mac.copy_to_cpu = copy_to_cpu;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
-				  int flags,
+				  int flags, u64 flow_id,
 				  const u8 *eth_dst, const u8 *eth_dst_mask,
 				  __be16 vlan_id, u32 tunnel_id,
 				  enum rocker_of_dpa_table_id goto_tbl,
@@ -2442,11 +2454,14 @@ static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
 	entry->key.bridge.group_id = group_id;
 	entry->key.bridge.copy_to_cpu = copy_to_cpu;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
-			       int flags, u32 in_lport,
+			       int flags, u64 flow_id, u32 in_lport,
 			       u32 in_lport_mask,
 			       const u8 *eth_src, const u8 *eth_src_mask,
 			       const u8 *eth_dst, const u8 *eth_dst_mask,
@@ -2494,6 +2509,9 @@ static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
 	entry->key.acl.ip_tos_mask = ip_tos_mask;
 	entry->key.acl.group_id = group_id;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
@@ -2604,7 +2622,7 @@ static int rocker_group_tbl_do(struct rocker_port *rocker_port,
 }
 
 static int rocker_group_l2_interface(struct rocker_port *rocker_port,
-				     int flags, __be16 vlan_id,
+				     int flags, int flow_id, __be16 vlan_id,
 				     u32 out_lport, int pop_vlan)
 {
 	struct rocker_group_tbl_entry *entry;
@@ -2615,6 +2633,7 @@ static int rocker_group_l2_interface(struct rocker_port *rocker_port,
 
 	entry->group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 	entry->l2_interface.pop_vlan = pop_vlan;
+	entry->cookie = flow_id;
 
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
@@ -2713,7 +2732,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 	if (rocker_port->stp_state == BR_STATE_LEARNING ||
 	    rocker_port->stp_state == BR_STATE_FORWARDING) {
 		out_lport = rocker_port->lport;
-		err = rocker_group_l2_interface(rocker_port, flags,
+		err = rocker_group_l2_interface(rocker_port, flags, 0,
 						vlan_id, out_lport,
 						pop_vlan);
 		if (err) {
@@ -2739,7 +2758,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 		return 0;
 
 	out_lport = 0;
-	err = rocker_group_l2_interface(rocker_port, flags,
+	err = rocker_group_l2_interface(rocker_port, flags, 0,
 					vlan_id, out_lport,
 					pop_vlan);
 	if (err) {
@@ -2813,7 +2832,7 @@ static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
 	u32 group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 	int err;
 
-	err = rocker_flow_tbl_acl(rocker_port, flags,
+	err = rocker_flow_tbl_acl(rocker_port, flags, 0,
 				  in_lport, in_lport_mask,
 				  eth_src, eth_src_mask,
 				  ctrl->eth_dst, ctrl->eth_dst_mask,
@@ -2842,7 +2861,7 @@ static int rocker_port_ctrl_vlan_bridge(struct rocker_port *rocker_port,
 	if (!rocker_port_is_bridged(rocker_port))
 		return 0;
 
-	err = rocker_flow_tbl_bridge(rocker_port, flags,
+	err = rocker_flow_tbl_bridge(rocker_port, flags, 0,
 				     ctrl->eth_dst, ctrl->eth_dst_mask,
 				     vlan_id, tunnel_id,
 				     goto_tbl, group_id, ctrl->copy_to_cpu);
@@ -2864,7 +2883,7 @@ static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
 	if (ntohs(vlan_id) == 0)
 		vlan_id = rocker_port->internal_vlan_id;
 
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       ctrl->eth_type, ctrl->eth_dst,
 				       ctrl->eth_dst_mask, vlan_id,
@@ -2978,7 +2997,7 @@ static int rocker_port_vlan(struct rocker_port *rocker_port, int flags,
 		return err;
 	}
 
-	err = rocker_flow_tbl_vlan(rocker_port, flags,
+	err = rocker_flow_tbl_vlan(rocker_port, flags, 0,
 				   in_lport, vlan_id, vlan_id_mask,
 				   goto_tbl, untagged, internal_vlan_id);
 	if (err)
@@ -3003,7 +3022,7 @@ static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 	in_lport_mask = 0xffff0000;
 	goto_tbl = ROCKER_OF_DPA_TABLE_ID_VLAN;
 
-	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+	err = rocker_flow_tbl_ig_port(rocker_port, flags, 0,
 				      in_lport, in_lport_mask,
 				      goto_tbl);
 	if (err)
@@ -3053,7 +3072,7 @@ static int rocker_port_fdb_learn(struct rocker_port *rocker_port,
 		group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 
 	if (!(flags & ROCKER_OP_FLAG_REFRESH)) {
-		err = rocker_flow_tbl_bridge(rocker_port, flags, addr, NULL,
+		err = rocker_flow_tbl_bridge(rocker_port, flags, 0, addr, NULL,
 					     vlan_id, tunnel_id, goto_tbl,
 					     group_id, copy_to_cpu);
 		if (err)
@@ -3188,7 +3207,7 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 		vlan_id = rocker_port->internal_vlan_id;
 
 	eth_type = htons(ETH_P_IP);
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       eth_type, rocker_port->dev->dev_addr,
 				       dst_mac_mask, vlan_id, vlan_id_mask,
@@ -3197,7 +3216,7 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 		return err;
 
 	eth_type = htons(ETH_P_IPV6);
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       eth_type, rocker_port->dev->dev_addr,
 				       dst_mac_mask, vlan_id, vlan_id_mask,
@@ -3232,7 +3251,7 @@ static int rocker_port_fwding(struct rocker_port *rocker_port)
 			continue;
 		vlan_id = htons(vid);
 		pop_vlan = rocker_vlan_id_is_internal(vlan_id);
-		err = rocker_group_l2_interface(rocker_port, flags,
+		err = rocker_group_l2_interface(rocker_port, flags, 0,
 						vlan_id, out_lport,
 						pop_vlan);
 		if (err) {
@@ -3872,7 +3891,7 @@ static int rocker_flow_set_ig_port(struct net_device *dev,
 	in_lport_mask = rule->matches[0].mask_u32;
 	goto_tbl = rocker_goto_value(rule->actions[0].args[0].value_u16);
 
-	return rocker_flow_tbl_ig_port(rocker_port, flags,
+	return rocker_flow_tbl_ig_port(rocker_port, flags, 0,
 				       in_lport, in_lport_mask,
 				       goto_tbl);
 }
@@ -3929,7 +3948,7 @@ static int rocker_flow_set_vlan(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
+	return rocker_flow_tbl_vlan(rocker_port, flags, 0, in_lport,
 				    vlan_id, vlan_id_mask, goto_tbl,
 				    untagged, new_vlan_id);
 }
@@ -4003,7 +4022,8 @@ static int rocker_flow_set_term_mac(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_term_mac(rocker_port, in_lport, in_lport_mask,
+	return rocker_flow_tbl_term_mac(rocker_port, 0,
+					in_lport, in_lport_mask,
 					ethtype, eth_dst, eth_dst_mask,
 					vlan_id, vlan_id_mask,
 					copy_to_cpu, flags);
@@ -4097,7 +4117,7 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 	}
 
 	/* Ignoring eth_dst_mask it seems to cause a EINVAL return code */
-	return rocker_flow_tbl_bridge(rocker_port, flags,
+	return rocker_flow_tbl_bridge(rocker_port, flags, 0,
 				      eth_dst, eth_dst_mask,
 				      vlan_id, tunnel_id,
 				      goto_tbl, group_id, copy_to_cpu);
@@ -4199,7 +4219,7 @@ static int rocker_flow_set_acl(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_acl(rocker_port, flags,
+	return rocker_flow_tbl_acl(rocker_port, flags, 0,
 				   in_lport, in_lport_mask,
 				   eth_src, eth_src_mask,
 				   eth_dst, eth_dst_mask, ethtype,

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [net-next PATCH v3 11/12] net: rocker: have flow api calls set cookie value
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (9 preceding siblings ...)
  2015-01-20 20:30 ` [net-next PATCH v3 10/12] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
@ 2015-01-20 20:31 ` John Fastabend
  2015-01-20 20:31 ` [net-next PATCH v3 12/12] net: rocker: implement delete flow routine John Fastabend
  2015-01-22 12:52 ` [net-next PATCH v3 00/12] Flow API Pablo Neira Ayuso
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:31 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 5ba46f6..3ceb313 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3891,7 +3891,7 @@ static int rocker_flow_set_ig_port(struct net_device *dev,
 	in_lport_mask = rule->matches[0].mask_u32;
 	goto_tbl = rocker_goto_value(rule->actions[0].args[0].value_u16);
 
-	return rocker_flow_tbl_ig_port(rocker_port, flags, 0,
+	return rocker_flow_tbl_ig_port(rocker_port, flags, rule->uid,
 				       in_lport, in_lport_mask,
 				       goto_tbl);
 }
@@ -3948,7 +3948,7 @@ static int rocker_flow_set_vlan(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_vlan(rocker_port, flags, 0, in_lport,
+	return rocker_flow_tbl_vlan(rocker_port, flags, rule->uid, in_lport,
 				    vlan_id, vlan_id_mask, goto_tbl,
 				    untagged, new_vlan_id);
 }
@@ -4022,7 +4022,7 @@ static int rocker_flow_set_term_mac(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_term_mac(rocker_port, 0,
+	return rocker_flow_tbl_term_mac(rocker_port, rule->uid,
 					in_lport, in_lport_mask,
 					ethtype, eth_dst, eth_dst_mask,
 					vlan_id, vlan_id_mask,
@@ -4117,7 +4117,7 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 	}
 
 	/* Ignoring eth_dst_mask it seems to cause a EINVAL return code */
-	return rocker_flow_tbl_bridge(rocker_port, flags, 0,
+	return rocker_flow_tbl_bridge(rocker_port, flags, rule->uid,
 				      eth_dst, eth_dst_mask,
 				      vlan_id, tunnel_id,
 				      goto_tbl, group_id, copy_to_cpu);
@@ -4219,7 +4219,7 @@ static int rocker_flow_set_acl(struct net_device *dev,
 		}
 	}
 
-	return rocker_flow_tbl_acl(rocker_port, flags, 0,
+	return rocker_flow_tbl_acl(rocker_port, flags, rule->uid,
 				   in_lport, in_lport_mask,
 				   eth_src, eth_src_mask,
 				   eth_dst, eth_dst_mask, ethtype,
@@ -4279,6 +4279,8 @@ static int rocker_flow_set_group_slice_l3_unicast(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = rule->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 
@@ -4330,6 +4332,8 @@ static int rocker_flow_set_group_slice_l2_rewrite(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = rule->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 
@@ -4381,6 +4385,8 @@ static int rocker_flow_set_group_slice_l2(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = rule->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 


* [net-next PATCH v3 12/12] net: rocker: implement delete flow routine
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (10 preceding siblings ...)
  2015-01-20 20:31 ` [net-next PATCH v3 11/12] net: rocker: have flow api calls set cookie value John Fastabend
@ 2015-01-20 20:31 ` John Fastabend
  2015-01-22 12:52 ` [net-next PATCH v3 00/12] Flow API Pablo Neira Ayuso
  12 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 20:31 UTC (permalink / raw)
  To: tgraf, simon.horman, sfeldma; +Cc: netdev, jhs, davem, gerlitz.or, andy, ast

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   46 +++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 3ceb313..ba48e88 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4436,7 +4436,51 @@ static int rocker_set_rules(struct net_device *dev,
 static int rocker_del_rules(struct net_device *dev,
 			    struct net_flow_rule *rule)
 {
-	return -EOPNOTSUPP;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_flow_tbl_entry *entry;
+	struct rocker_group_tbl_entry *group;
+	struct hlist_node *tmp;
+	int bkt, err = -EEXIST;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rocker_port->rocker->flow_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->flow_tbl,
+			   bkt, tmp, entry, entry) {
+		if (rocker_goto_value(rule->table_id) != entry->key.tbl_id ||
+		    rule->uid != entry->cookie)
+			continue;
+
+		hash_del(&entry->entry);
+		err = 0;
+		break;
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->flow_tbl_lock, flags);
+
+	if (!err)
+		goto done;
+
+	spin_lock_irqsave(&rocker_port->rocker->group_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->group_tbl,
+			   bkt, tmp, group, entry) {
+		if (rocker_goto_value(rule->table_id) !=
+			ROCKER_GROUP_TYPE_GET(group->group_id) ||
+		    rule->uid != group->cookie)
+			continue;
+
+		hash_del(&group->entry);
+		err = 0;
+		break;
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->group_tbl_lock, flags);
+
+done:
+	if (!err) {
+		err = rocker_cmd_exec(rocker_port->rocker, rocker_port,
+				      rocker_cmd_flow_tbl_del,
+				      entry, NULL, NULL, true);
+		kfree(entry);
+	}
+	return err;
 }
 #endif
 


* Re: [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions
  2015-01-20 20:27 ` [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions John Fastabend
@ 2015-01-20 20:59   ` John W. Linville
  2015-01-20 22:10     ` John Fastabend
  0 siblings, 1 reply; 66+ messages in thread
From: John W. Linville @ 2015-01-20 20:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: tgraf, simon.horman, sfeldma, netdev, jhs, davem, gerlitz.or, andy, ast

On Tue, Jan 20, 2015 at 12:27:53PM -0800, John Fastabend wrote:
> This adds common headers and actions that drivers can use.
> 
> I have not yet moved the header graphs into the common header
> because I'm not entirely convinced its re-usable. The devices
> I have been looking at have different enough header graphs that
> they wouldn't be re-usable. However possibly many 40Gbp NICs
> for example could share a common header graph. When we get
> multiple implementations we can move this into the common file
> if it makes sense.
> 
> And table structures seem to be unique enough that there is
> little value in putting each devices table layout into the
> common file so its left for device specific implementation.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  include/linux/if_flow_common.h |  257 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 257 insertions(+)
>  create mode 100644 include/linux/if_flow_common.h
> 
> diff --git a/include/linux/if_flow_common.h b/include/linux/if_flow_common.h
> new file mode 100644
> index 0000000..ef2d66f
> --- /dev/null
> +++ b/include/linux/if_flow_common.h

<snip>

> +struct net_flow_action net_flow_pop_vlan = {
> +	.name = "pop_vlan",
> +	.uid = ACTION_POP_VLAN,
> +	.args = net_flow_null_args,
> +};

Random thought, should there be a "push_vlan" (for double VLAN tagging)?


-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions
  2015-01-20 20:59   ` John W. Linville
@ 2015-01-20 22:10     ` John Fastabend
  0 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-20 22:10 UTC (permalink / raw)
  To: John W. Linville
  Cc: tgraf, simon.horman, sfeldma, netdev, jhs, davem, gerlitz.or, andy, ast

On 01/20/2015 12:59 PM, John W. Linville wrote:
> On Tue, Jan 20, 2015 at 12:27:53PM -0800, John Fastabend wrote:
>> This adds common headers and actions that drivers can use.
>>
>> I have not yet moved the header graphs into the common header
>> because I'm not entirely convinced its re-usable. The devices
>> I have been looking at have different enough header graphs that
>> they wouldn't be re-usable. However possibly many 40Gbp NICs
>> for example could share a common header graph. When we get
>> multiple implementations we can move this into the common file
>> if it makes sense.
>>
>> And table structures seem to be unique enough that there is
>> little value in putting each devices table layout into the
>> common file so its left for device specific implementation.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   include/linux/if_flow_common.h |  257 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 257 insertions(+)
>>   create mode 100644 include/linux/if_flow_common.h
>>
>> diff --git a/include/linux/if_flow_common.h b/include/linux/if_flow_common.h
>> new file mode 100644
>> index 0000000..ef2d66f
>> --- /dev/null
>> +++ b/include/linux/if_flow_common.h
>
> <snip>
>
>> +struct net_flow_action net_flow_pop_vlan = {
>> +	.name = "pop_vlan",
>> +	.uid = ACTION_POP_VLAN,
>> +	.args = net_flow_null_args,
>> +};
>
> Random thought, should there be a "push_vlan" (for double VLAN tagging)?
>
>

Yep, I should add that one. There are also some other actions
on my todo list, but some of them require updates to the driver.

Assuming it's not a big issue for anyone, I would like to
get this series in with the match/actions it has and then extend
the action and match lists.

.John

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables
  2015-01-20 20:26 ` [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables John Fastabend
@ 2015-01-22  4:37   ` Simon Horman
  0 siblings, 0 replies; 66+ messages in thread
From: Simon Horman @ 2015-01-22  4:37 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, netdev, jhs, davem, gerlitz.or, andy, ast

On Tue, Jan 20, 2015 at 12:26:37PM -0800, John Fastabend wrote:

[snip]

> diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
> new file mode 100644
> index 0000000..7ce1e1d
> --- /dev/null
> +++ b/include/linux/if_flow.h
> @@ -0,0 +1,188 @@

[snip]

> +#define NFL_JUMP_TABLE_DONE 0
> +enum {
> +	NFL_JUMP_ENTRY_UNSPEC,
> +	NFL_JUMP_ENTRY,
> +	__NFL_JUMP_ENTRY_MAX,
> +};

For consistency it seems that the following could go here:

#define NFL_JUMP_ENTRY_MAX (__NFL_JUMP_ENTRY_MAX - 1)

> +enum {
> +	NFL_HEADER_NODE_HDRS_UNSPEC,
> +	NFL_HEADER_NODE_HDRS_VALUE,
> +	__NFL_HEADER_NODE_HDRS_MAX,
> +};
> +
> +#define NFL_HEADER_NODE_HDRS_MAX (__NFL_HEADER_NODE_HDRS_MAX - 1)
> +
> +enum {
> +	NFL_HEADER_NODE_UNSPEC,
> +	NFL_HEADER_NODE_NAME,
> +	NFL_HEADER_NODE_UID,
> +	NFL_HEADER_NODE_HDRS,
> +	NFL_HEADER_NODE_JUMP,
> +	__NFL_HEADER_NODE_MAX,
> +};
> +
> +#define NFL_HEADER_NODE_MAX (__NFL_HEADER_NODE_MAX - 1)
> +
> +enum {
> +	NFL_HEADER_GRAPH_UNSPEC,
> +	NFL_HEADER_GRAPH_NODE,
> +	__NFL_HEADER_GRAPH_MAX,
> +};
> +
> +#define NFL_HEADER_GRAPH_MAX (__NFL_HEADER_GRAPH_MAX - 1)

[snip]


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
                   ` (11 preceding siblings ...)
  2015-01-20 20:31 ` [net-next PATCH v3 12/12] net: rocker: implement delete flow routine John Fastabend
@ 2015-01-22 12:52 ` Pablo Neira Ayuso
  2015-01-22 13:37   ` Thomas Graf
  12 siblings, 1 reply; 66+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-22 12:52 UTC (permalink / raw)
  To: John Fastabend
  Cc: tgraf, simon.horman, sfeldma, netdev, jhs, davem, gerlitz.or, andy, ast

Hi John,

On Tue, Jan 20, 2015 at 12:26:13PM -0800, John Fastabend wrote:
> I believe I addressed all the comments so far except for the integrate
> with 'tc'. I plan to work on the integration pieces next.

I think that postponing the integration with 'tc' means that we're
giving up on providing an abstraction to represent the actions that
the device provides. After this patch we'll have a standard API that
exposes the vendor-specific semantics, *so user configurations will
not be portable anymore*.

At least, we should come up with some abstraction / mapping as an
interface that vendors can use to represent their operations. That
interface implies a trade-off: if a vendor offers an operation that
doesn't map to our abstraction, then, sorry, that operation has to
remain behind the curtain.

The netdev conference is just two weeks away, and this is an important
change IMO. I'd rather have the chance to meet you and other fellows
there and discuss whether we can come up with some glue abstraction.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 12:52 ` [net-next PATCH v3 00/12] Flow API Pablo Neira Ayuso
@ 2015-01-22 13:37   ` Thomas Graf
  2015-01-22 14:00     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-22 13:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: John Fastabend, simon.horman, sfeldma, netdev, jhs, davem,
	gerlitz.or, andy, ast

On 01/22/15 at 01:52pm, Pablo Neira Ayuso wrote:
> Hi John,
> 
> On Tue, Jan 20, 2015 at 12:26:13PM -0800, John Fastabend wrote:
> > I believe I addressed all the comments so far except for the integrate
> > with 'tc'. I plan to work on the integration pieces next.
> 
> I think that postponing the integration with 'tc' means that we're
> renouncing to provide some abstraction to represent the actions that
> the device provides. After this patch we'll have a standard API that
> exposes the vendor specific semantics, *so user configurations will
> not be portable anymore*.
> 
> At least, we should come up with some abstraction / mapping as
> interface, so the vendors can use them to represent their operations.
> That interface will provide a trade-off: If the vendor offers an
> operation that doesn't map to our abstraction, then sorry that
> operation has to remain behind the curtain.

I thought this *is* the abstraction ;-) Can you elaborate on which
parts you consider vendor specific?


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 13:37   ` Thomas Graf
@ 2015-01-22 14:00     ` Pablo Neira Ayuso
  2015-01-22 15:00       ` Jamal Hadi Salim
  0 siblings, 1 reply; 66+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-22 14:00 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, simon.horman, sfeldma, netdev, jhs, davem,
	gerlitz.or, andy, ast

On Thu, Jan 22, 2015 at 01:37:13PM +0000, Thomas Graf wrote:
> On 01/22/15 at 01:52pm, Pablo Neira Ayuso wrote:
> > Hi John,
> > 
> > On Tue, Jan 20, 2015 at 12:26:13PM -0800, John Fastabend wrote:
> > > I believe I addressed all the comments so far except for the integrate
> > > with 'tc'. I plan to work on the integration pieces next.
> > 
> > I think that postponing the integration with 'tc' means that we're
> > renouncing to provide some abstraction to represent the actions that
> > the device provides. After this patch we'll have a standard API that
> > exposes the vendor specific semantics, *so user configurations will
> > not be portable anymore*.
> > 
> > At least, we should come up with some abstraction / mapping as
> > interface, so the vendors can use them to represent their operations.
> > That interface will provide a trade-off: If the vendor offers an
> > operation that doesn't map to our abstraction, then sorry that
> > operation has to remain behind the curtain.
> 
> I thought this *is* the abstraction ;-) Can you elaborate on which
> parts you consider vendor specific?

+/* rocker specific action definitions */
+struct net_flow_action_arg rocker_set_group_id_args[] = {
+       {
+               .name = "group_id",
+               .type = NFL_ACTION_ARG_TYPE_U32,
+               .value_u32 = 0,
+       },

that is retrieved via ndo_flow_get_actions and fully exposed to
userspace.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 14:00     ` Pablo Neira Ayuso
@ 2015-01-22 15:00       ` Jamal Hadi Salim
  2015-01-22 15:13         ` Thomas Graf
  0 siblings, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-22 15:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Thomas Graf
  Cc: John Fastabend, simon.horman, sfeldma, netdev, davem, gerlitz.or,
	andy, ast, Jiri Pirko

On 01/22/15 09:00, Pablo Neira Ayuso wrote:

>
> +/* rocker specific action definitions */
> +struct net_flow_action_arg rocker_set_group_id_args[] = {
> +       {
> +               .name = "group_id",
> +               .type = NFL_ACTION_ARG_TYPE_U32,
> +               .value_u32 = 0,
> +       },
>
> that is retrieved via ndo_flow_get_actions and fully exposed to
> userspace.
>

My main concern is along similar lines (I did express it earlier and
I think Jiri chimed in as well).
The API exposes direct access to hardware. I am sure this was a result
of trying to replace the ethtool interface (which was primitive).
By providing vendors direct access to the hardware, they do not need
to use any traditional Linux tooling/APIs. I see this as a gaping hole
for vendor SDKs with their own definitions of their own hardware that
don't work with anyone else, i.e. it seems to standardize proprietary
interfaces. Maybe that's what Pablo is alluding to.
Interfacing with tc or nftables (or pick your favorite Linux tool
here) would be preferable.

cheers,
jamal


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:00       ` Jamal Hadi Salim
@ 2015-01-22 15:13         ` Thomas Graf
  2015-01-22 15:28           ` Jamal Hadi Salim
  2015-01-22 16:58           ` John Fastabend
  0 siblings, 2 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-22 15:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 at 10:00am, Jamal Hadi Salim wrote:
> On 01/22/15 09:00, Pablo Neira Ayuso wrote:
> 
> >
> >+/* rocker specific action definitions */
> >+struct net_flow_action_arg rocker_set_group_id_args[] = {
> >+       {
> >+               .name = "group_id",
> >+               .type = NFL_ACTION_ARG_TYPE_U32,
> >+               .value_u32 = 0,
> >+       },
> >
> >that is retrieved via ndo_flow_get_actions and fully exposed to
> >userspace.
> >
> 
> My main concern is along similar lines (I did express it earlier and
> I think Jiri chimed in as well).
> The API exposes direct access to hardware. I am sure this was a result
> of trying to replace the ethtool interface (which was primitive).
> By providing vendors direct access to the hardware - they do not need
> to use any traditional Linux tooling/APIs.

I don't follow this. John's proposal allows to decide on a case by
case basis what we want to export. Just like with ethtool or
RTNETLINK. There is no direct access to hardware. A user can only
configure what is being exposed by the kernel.

Pablo raises an interesting point though. How do we handle unique
features like Rocker groups.

Maybe Jiri and Scott can chime in and describe if we can map this to
something more generic and avoid exporting anything Rocker specific.

What would a rocker group map to in the tc world?

> I see this as a gaping hole
> for vendor SDKs with their own definitions of their own hardware that
> doesnt work with anyone else. i.e it seems to standardize proprietary
> interfaces. Maybe thats what Pablo is alluding to.

I will be the first to root for rejection if such patches appear.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:13         ` Thomas Graf
@ 2015-01-22 15:28           ` Jamal Hadi Salim
  2015-01-22 15:37             ` Thomas Graf
  2015-01-22 16:58           ` John Fastabend
  1 sibling, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-22 15:28 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 10:13, Thomas Graf wrote:

> I don't follow this. John's proposal allows to decide on a case by
> case basis what we want to export. Just like with ethtool or
> RTNETLINK. There is no direct access to hardware. A user can only
> configure what is being exposed by the kernel.
>

So if I am a vendor with my own driver, I can expose whatever I want.
Only my SDK needs to deal with what I expose. Nothing is required in
the kernel beyond the driver exposing it.

cheers,
jamal


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:28           ` Jamal Hadi Salim
@ 2015-01-22 15:37             ` Thomas Graf
  2015-01-22 15:44               ` Jamal Hadi Salim
                                 ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-22 15:37 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
> On 01/22/15 10:13, Thomas Graf wrote:
> 
> >I don't follow this. John's proposal allows to decide on a case by
> >case basis what we want to export. Just like with ethtool or
> >RTNETLINK. There is no direct access to hardware. A user can only
> >configure what is being exposed by the kernel.
> >
> 
> So if i am a vendor with my own driver, I can expose whatever i want.

No. We will reject any driver change attempting to do so on this
list.

This is the whole point of this: Coming up with a model that allows
to describe capabilities and offer flow programming capabilities
in a Vendor neutral way. A "push_vlan" or "pop_vlan" action will work
with any driver that supports it.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:37             ` Thomas Graf
@ 2015-01-22 15:44               ` Jamal Hadi Salim
  2015-01-23 10:10                 ` Thomas Graf
  2015-01-22 15:48               ` Jiri Pirko
  2015-01-22 16:49               ` Pablo Neira Ayuso
  2 siblings, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-22 15:44 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 10:37, Thomas Graf wrote:
> On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:

>> So if i am a vendor with my own driver, I can expose whatever i want.
>
> No. We will reject any driver change attempting to do so on this
> list.
>

A vendor provides a driver that exposes a discoverable interface
(the capability exposure is facilitated by the API).
They don't need it to be part of the mainstream kernel.
And they don't need any of your definitions.

cheers,
jamal


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:37             ` Thomas Graf
  2015-01-22 15:44               ` Jamal Hadi Salim
@ 2015-01-22 15:48               ` Jiri Pirko
  2015-01-22 17:58                 ` Thomas Graf
  2015-01-22 16:49               ` Pablo Neira Ayuso
  2 siblings, 1 reply; 66+ messages in thread
From: Jiri Pirko @ 2015-01-22 15:48 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Thu, Jan 22, 2015 at 04:37:27PM CET, tgraf@suug.ch wrote:
>On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
>> On 01/22/15 10:13, Thomas Graf wrote:
>> 
>> >I don't follow this. John's proposal allows to decide on a case by
>> >case basis what we want to export. Just like with ethtool or
>> >RTNETLINK. There is no direct access to hardware. A user can only
>> >configure what is being exposed by the kernel.
>> >
>> 
>> So if i am a vendor with my own driver, I can expose whatever i want.
>
>No. We will reject any driver change attempting to do so on this
>list.

That is not 100% guaranteed, on the contrary. If the infrastructure
were made to explicitly disallow that kind of behaviour, it would be
much safer.


>
>This is the whole point of this: Coming up with a model that allows
>to describe capabilities and offer flow programming capabilities
>in a Vendor neutral way. A "push_vlan" or "pop_vlan" action will work
>with any driver that supports it.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:37             ` Thomas Graf
  2015-01-22 15:44               ` Jamal Hadi Salim
  2015-01-22 15:48               ` Jiri Pirko
@ 2015-01-22 16:49               ` Pablo Neira Ayuso
  2015-01-22 17:10                 ` John Fastabend
                                   ` (2 more replies)
  2 siblings, 3 replies; 66+ messages in thread
From: Pablo Neira Ayuso @ 2015-01-22 16:49 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On Thu, Jan 22, 2015 at 03:37:27PM +0000, Thomas Graf wrote:
> On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
> > On 01/22/15 10:13, Thomas Graf wrote:
> > 
> > >I don't follow this. John's proposal allows to decide on a case by
> > >case basis what we want to export. Just like with ethtool or
> > >RTNETLINK. There is no direct access to hardware. A user can only
> > >configure what is being exposed by the kernel.
> > >
> > 
> > So if i am a vendor with my own driver, I can expose whatever i want.
> 
> No. We will reject any driver change attempting to do so on this
> list.

I think those vendors do not want to push those driver changes
mainstream. They will likely use these new ndo's to fully expose their
vendor-specific capabilities distributed in proprietary blobs.

I remember having seen one ugly patch for netfilter that added
several hook functions (not netfilter hooks) at different positions in
the NAT code; the goal was to offload NAT to hardware. I was told
the code that was using those ad-hoc hooks was distributed in a binary
blob.

> This is the whole point of this: Coming up with a model that allows
> to describe capabilities and offer flow programming capabilities
> in a Vendor neutral way. A "push_vlan" or "pop_vlan" action will work
> with any driver that supports it.

Right, we need an abstraction for actions too, and the infrastructure
should not provide any means to circumvent it and expose
vendor-specific details.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:13         ` Thomas Graf
  2015-01-22 15:28           ` Jamal Hadi Salim
@ 2015-01-22 16:58           ` John Fastabend
  2015-01-23 10:49             ` Thomas Graf
  2015-01-24 12:29             ` Jamal Hadi Salim
  1 sibling, 2 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-22 16:58 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman, sfeldma,
	netdev, davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/2015 07:13 AM, Thomas Graf wrote:
> On 01/22/15 at 10:00am, Jamal Hadi Salim wrote:
>> On 01/22/15 09:00, Pablo Neira Ayuso wrote:
>>
>>>

I'll try to unify the threads here

>>> +/* rocker specific action definitions */
>>> +struct net_flow_action_arg rocker_set_group_id_args[] = {
>>> +       {
>>> +               .name = "group_id",
>>> +               .type = NFL_ACTION_ARG_TYPE_U32,
>>> +               .value_u32 = 0,
>>> +       },
>>>

In response to Pablo's observation,

Correct, this is fully exposed to user space, but it is also
self-contained inside the API, meaning I can learn when to use it and
what it does by looking at the other operations: the tables, the table
graph and the supported headers. The assumption I am making, which is
not yet explicit in the API, is that actions named "set_field_name"
perform the set operation on that field. We can, and plan to, extend
the API to make this assumption explicit.

In this case I can "learn" that I can match on group_id in some tables
and then use the above action to set the group_id in others.

>>> that is retrieved via ndo_flow_get_actions and fully exposed to
>>> userspace.
>>>
>>
>> My main concern is along similar lines (I did express it earlier and
>> I think Jiri chimed in as well).
>> The API exposes direct access to hardware. I am sure this was a result
>> of trying to replace the ethtool interface (which was primitive).
>> By providing vendors direct access to the hardware - they do not need
>> to use any traditional Linux tooling/APIs.
>
> I don't follow this. John's proposal allows to decide on a case by
> case basis what we want to export. Just like with ethtool or
> RTNETLINK. There is no direct access to hardware. A user can only
> configure what is being exposed by the kernel.
>
> Pablo raises an interesting point though. How do we handle unique
> features like Rocker groups.
>
> Maybe Jiri and Scott can chime in and describe if we can map this to
> something more generic and avoid exporting anything Rocker specific.
>

Even though it's a detail of the rocker world, it's easy enough for a
program on top of the API to learn how it works.

So in the rocker switch case, if I want to rewrite an eth_dst address
I have a couple of choices. I can set the group_id in one of the
tables that support setting the group_id and then do the rewrite in
one of the tables that supports matching on group_id and setting the
eth_dst mac. The "choice" I make is policy, IMO, and I don't want to
hard-code logic in the kernel that picks tables and decides things
like: if table x is full but table y could also be used, should I
overflow into table y? Or is table y reserved for some other network
function? etc.

There are some actions and metadata, though, that _need_ to be
standardized: the metadata that is used outside the API. For example,
ingress_port is metadata that is set outside the tables. Similarly,
set_egress_port and set_egress_queue provide the forwarding and
queueing fields. No matter how hard you look at the model from the
API, you cannot learn how these are used.

> What would a rocker group map to in the tc world?

In the 'tc' world, I would guess the easiest thing to do is simply
bind a 'tc' qdisc to the ACL table. That seems a good first
approximation of how to make this work. But the rocker world doesn't
yet have any QoS, which makes it difficult to "offload" anything but
the fifo qdiscs.

>
>> I see this as a gaping hole
>> for vendor SDKs with their own definitions of their own hardware that
>> doesnt work with anyone else. i.e it seems to standardize proprietary
>> interfaces. Maybe thats what Pablo is alluding to.
>
> I will be the first to root for rejection if such patches appear.
>

Is it problematic if users define some unique header here and then
provide set/pop/push/get actions to operate on it?

For me this seems perfectly reasonable. We can pull it out of hardware,
or perhaps a database in libvirt, and then feed it back into Linux nft,
ebpf, or tc u32_filter to create a unified view. Haven't ebpf, nft, and
u32 always been about supporting vendor-specific protocols anyway?

I must be missing the point about proprietary interfaces. The Flow API
would be the interface, and if we create interesting tools/systems
around it, with integration into other Linux sub-systems, then by
choosing to use a proprietary SDK instead you lose all that goodness.

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 16:49               ` Pablo Neira Ayuso
@ 2015-01-22 17:10                 ` John Fastabend
  2015-01-22 17:44                 ` Thomas Graf
  2015-01-23  9:00                 ` David Miller
  2 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-22 17:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Thomas Graf, Jamal Hadi Salim, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/2015 08:49 AM, Pablo Neira Ayuso wrote:
> On Thu, Jan 22, 2015 at 03:37:27PM +0000, Thomas Graf wrote:
>> On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
>>> On 01/22/15 10:13, Thomas Graf wrote:
>>>
>>>> I don't follow this. John's proposal allows to decide on a case by
>>>> case basis what we want to export. Just like with ethtool or
>>>> RTNETLINK. There is no direct access to hardware. A user can only
>>>> configure what is being exposed by the kernel.
>>>>
>>>
>>> So if i am a vendor with my own driver, I can expose whatever i want.
>>
>> No. We will reject any driver change attempting to do so on this
>> list.
>
> I think those vendors do not want to push those driver changes
> mainstream. They will likely use these new ndo's to fully expose their
> vendor-specific capabilities distributed in proprietary blobs.
>
> I remember to have seen one ugly patch for netfilter that added
> several hook functions (not netfilter hooks) at different positions of
> the NAT code, the goal was to offload NAT through hardware. I was told
> the code that was using those ad-hoc hooks was distributed in a binary
> blob.
>
>> This is the whole point of this: Coming up with a model that allows
>> to describe capabilities and offer flow programming capabilities
>> in a Vendor neutral way. A "push_vlan" or "pop_vlan" action will work
>> with any driver that supports it.
>
> Right, we need an abstraction for actions too, and the infrastructure
> should not provide any means to circunvent and expose vendor specific
> details.
>

I'm not sure what a vendor-specific detail would be in the API as it
stands.

The API provides a mechanism to define the headers you support in a
vendor-neutral way. Vendor X may be the only hardware to support a
given packet type, but the packets themselves are not really vendor
specific: if we have the packet layout, we can generate code to create
the packets in software if we care to and find it useful.

Maybe the issue is that the actions look like UIDs without any
specification of what they do? The concern being that as a vendor I
could create an action, call it my_vendor_specific_action, and if I
never tell anyone how to use it then we are stuck.

I can add a set of specifier attributes to the get_action call that
describe actions using basic primitives to resolve this specific issue.
Actions should either 'set' fields, 'get' fields, 'dec' fields, 'inc'
fields, etc. Then you specify an action by providing the list of
operations it performs. So set_group_id is simply

	'set_field uid=group_id_metadata'

More complicated actions exist, like route() and such, which will be a
list of operations that may set multiple fields in a single atomic step.
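The proposal above could be sketched roughly as follows. The primitive names ('set', 'dec', etc.) follow the text, but the rendering format and the field UIDs (ip_ttl, eth_src, eth_dst) are illustrative assumptions, not the actual netlink attribute layout.

```python
# Hypothetical model of action specifiers: an action is a list of
# (primitive, field-uid) operations applied as one atomic step.

def describe(action):
    """Render an action's primitive-operation list as text."""
    return "; ".join(f"{op}_field uid={uid}" for op, uid in action)

# A simple action is a single primitive...
set_group_id = [("set", "group_id_metadata")]

# ...while a compound action like route() is several operations
# that all take effect in one atomic step.
route = [("dec", "ip_ttl"), ("set", "eth_src"), ("set", "eth_dst")]

print(describe(set_group_id))  # set_field uid=group_id_metadata
print(describe(route))
```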

The table API also seems vendor neutral to me. I'm not sure what would
be vendor specific there either.

.John



-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 16:49               ` Pablo Neira Ayuso
  2015-01-22 17:10                 ` John Fastabend
@ 2015-01-22 17:44                 ` Thomas Graf
  2015-01-24 12:34                   ` Jamal Hadi Salim
  2015-01-23  9:00                 ` David Miller
  2 siblings, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-22 17:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jamal Hadi Salim, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 at 05:49pm, Pablo Neira Ayuso wrote:
> On Thu, Jan 22, 2015 at 03:37:27PM +0000, Thomas Graf wrote:
> > On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
> > > On 01/22/15 10:13, Thomas Graf wrote:
> > > 
> > > >I don't follow this. John's proposal allows to decide on a case by
> > > >case basis what we want to export. Just like with ethtool or
> > > >RTNETLINK. There is no direct access to hardware. A user can only
> > > >configure what is being exposed by the kernel.
> > > >
> > > 
> > > So if i am a vendor with my own driver, I can expose whatever i want.
> > 
> > No. We will reject any driver change attempting to do so on this
> > list.
> 
> I think those vendors do not want to push those driver changes
> mainstream. They will likely use these new ndo's to fully expose their
> vendor-specific capabilities distributed in proprietary blobs.

You can achieve the exact same thing with an out-of-tree tc action,
classifier, or even a new link type. Nothing prevents an out-of-tree
driver from registering a new rtnetlink link type and doing
vendor-specific crap.

Out of tree code can abuse any kernel API in any way it wants. Not
sure how much we can do about that.

That said, as we know, vendor-specific SDKs for most of the chips in
question here already exist. I'm not sure why a vendor would want to
use this infrastructure (which is subject to constant internal API
changes) to implement vendor-specific APIs if that vendor already has
an independent out-of-tree SDK.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:48               ` Jiri Pirko
@ 2015-01-22 17:58                 ` Thomas Graf
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-22 17:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/22/15 at 04:48pm, Jiri Pirko wrote:
> Thu, Jan 22, 2015 at 04:37:27PM CET, tgraf@suug.ch wrote:
> >On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
> >> On 01/22/15 10:13, Thomas Graf wrote:
> >> 
> >> >I don't follow this. John's proposal allows to decide on a case by
> >> >case basis what we want to export. Just like with ethtool or
> >> >RTNETLINK. There is no direct access to hardware. A user can only
> >> >configure what is being exposed by the kernel.
> >> >
> >> 
> >> So if i am a vendor with my own driver, I can expose whatever i want.
> >
> >No. We will reject any driver change attempting to do so on this
> >list.
> 
> That is not 100%, on contrary. If the infrastructure would be made to
> explicitly disallow that kind of behaviour, it would be much safer.

I'm very much in favour of that. Ideas?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 16:49               ` Pablo Neira Ayuso
  2015-01-22 17:10                 ` John Fastabend
  2015-01-22 17:44                 ` Thomas Graf
@ 2015-01-23  9:00                 ` David Miller
  2 siblings, 0 replies; 66+ messages in thread
From: David Miller @ 2015-01-23  9:00 UTC (permalink / raw)
  To: pablo
  Cc: tgraf, jhs, john.fastabend, simon.horman, sfeldma, netdev,
	gerlitz.or, andy, ast, jiri

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 22 Jan 2015 17:49:51 +0100

> I think those vendors do not want to push those driver changes
> mainstream. They will likely use these new ndo's to fully expose their
> vendor-specific capabilities distributed in proprietary blobs.

+1

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 15:44               ` Jamal Hadi Salim
@ 2015-01-23 10:10                 ` Thomas Graf
  2015-01-23 10:24                   ` Jiri Pirko
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 10:10 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 at 10:44am, Jamal Hadi Salim wrote:
> On 01/22/15 10:37, Thomas Graf wrote:
> >On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
> 
> >>So if i am a vendor with my own driver, I can expose whatever i want.
> >
> >No. We will reject any driver change attempting to do so on this
> >list.
> >
> 
> Vendor provides a driver that exposes a discoverable interface
> (capabilities exposure that is facilitated).
> They dont need it to be part of the mainstream kernel.
> And they dont need any of your definitions.

An out-of-tree driver has always had the possibility of registering its
own Generic Netlink protocol and doing exactly what you describe. The
same driver could also register as an { xt, cls, act } module and export
direct hardware access to userspace through tc and iptables. The driver
could even register its own netfilter hook. You can abuse pretty much
any interface that has some form of registration mechanism from an
out-of-tree driver. We can't really control that.

I think we agree that the value of this model is that tools like nft,
OVS, SnabbSwitch, tc, [you name it] can use it to program the hardware
in a very generic manner from user space without having to move all
of that complexity into the kernel. This is the very same way the team
device exports most of its complexity to user space, and the same
reason routing protocol implementations were kept out of the kernel.

If a vendor exposes capabilities in a form that is not understood by
the well known tools it has zero value because the capabilities can't
be used. Any tool that would depend on such vendor specific bits that
are exported by out of tree drivers might as well use the existing
vendor SDKs which will always provide some additional functionality
because it doesn't have to compromise.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 10:10                 ` Thomas Graf
@ 2015-01-23 10:24                   ` Jiri Pirko
  2015-01-23 11:08                     ` Thomas Graf
  0 siblings, 1 reply; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 10:24 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 11:10:19AM CET, tgraf@suug.ch wrote:
>On 01/22/15 at 10:44am, Jamal Hadi Salim wrote:
>> On 01/22/15 10:37, Thomas Graf wrote:
>> >On 01/22/15 at 10:28am, Jamal Hadi Salim wrote:
>> 
>> >>So if i am a vendor with my own driver, I can expose whatever i want.
>> >
>> >No. We will reject any driver change attempting to do so on this
>> >list.
>> >
>> 
>> Vendor provides a driver that exposes a discoverable interface
>> (capabilities exposure that is facilitated).
>> They dont need it to be part of the mainstream kernel.
>> And they dont need any of your definitions.
>
>An out of tree driver always had the possibility to register its own
>Generic Netlink protocol and do exactly what you describe. The same
>driver could also register as { xt, cls, act } module and export direct
>hardware access to userspace through tc and iptables. The driver could
>even register its own netfilter hook. You can abuse pretty much any
>interface that has some form of registration mechansim from an out of
>tree driver. We can't really control that.
>
>I think we agree that the value of this model is that tools like nft,
>OVS, SnabbSwitch, tc, [you name it] can use it to program the hardware
>in a very generic manner from user space without requiring to move all
>of that complexity to the kernel. In the very same way as the team
>device exports most of the complexity to user space. Or for the same
>reason routing protocol implementations were kept out of the kernel.


I think that comparing this to team or routing userspace is not
correct. The reason is that team and routing have a single API to the
kernel. However, in this case userspace has to use multiple APIs.

For example OVS: it would have to use the existing OVS genetlink iface
plus this new flow netlink iface for flow offloads. For all the others,
this is the same. Multiple APIs for the same thing (it does not matter
if it is implemented in hw or sw) does not seem right to me.


>
>If a vendor exposes capabilities in a form that is not understood by
>the well known tools it has zero value because the capabilities can't
>be used. Any tool that would depend on such vendor specific bits that
>are exported by out of tree drivers might as well use the existing
>vendor SDKs which will always provide some additional functionality
>because it doesn't have to compromise.
>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 16:58           ` John Fastabend
@ 2015-01-23 10:49             ` Thomas Graf
  2015-01-23 16:42               ` John Fastabend
  2015-01-24 12:29             ` Jamal Hadi Salim
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 10:49 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman, sfeldma,
	netdev, davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/22/15 at 08:58am, John Fastabend wrote:
> In response to Pablo's observation,
> 
> Correct this is fully exposed to user space, but it is also self
> contained inside the API meaning I can learn when to use it and what it
> does by looking at the other operations tables the table graph and
> supported headers. The assumption I am making that is not in the API
> explicitly yet. Is that actions named "set_field_name" perform the
> set operation on that field. We can and plan to extend the API to make
> this assumption explicit in the API.

OK. I think it's this assumption, not yet explicit in the API, that
causes the confusion. Making it explicit would definitely help. Do
we even need the driver to declare get/set operations at all? Could we
just have the driver expose the field and let the API take care of
providing get/set actions?
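The question above could be sketched like this: the core derives the get/set actions mechanically from the fields a driver declares. The field names and the `get_`/`set_` naming rule are assumptions for illustration, not anything in the posted patches.

```python
# Hypothetical: synthesize get_<field>/set_<field> actions from the
# fields a driver exposes, so drivers never declare them explicitly.

def derive_actions(fields):
    """Map every declared field to an auto-generated get and set action."""
    actions = {}
    for f in fields:
        actions[f"get_{f}"] = ("get", f)
        actions[f"set_{f}"] = ("set", f)
    return actions

acts = derive_actions(["eth_dst", "group_id"])
print(sorted(acts))  # ['get_eth_dst', 'get_group_id', 'set_eth_dst', 'set_group_id']
```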

> Even though its a detail of the rocker world its easy enough for a
> program on top of the API to learn how it works.
> 
> So in the rocker switch case if I want to rewrite an eth_dst adress I
> have a  couple choices. I can set the group_id in one of the tables
> that support setting the group_id and then do the rewrite in one of the
> tables that supports matching on group_id and setting the eth_dst mac.
> The "choice" I make is a policy IMO and I don't want to hard code logic
> in the kernel that picks tables and decides things like what should I
> do if table x is full but table y could also be used should I overflow
> into table y? Or  is table y reserved for some other network function?
> etc.

Agreed. It might make sense to declare such fields as general purpose
metadata or have some kind of field class which describes the nature
of the field: { register, protocol-field, configuration, ... }

> There are some actions and metadata though that _need_ to be
> standardized. These are the metadata that is used outside the API. For
> example ingress_port is metadata that is set outside the tables.
> Similarly set_egress_port and set_egress_queue provide the forwarding
> and queueing fields. No matter how hard you look at the model from the
> API you can not learn how these are used.

Agreed. I assume we would implement a tun_dst the same standardized way.

> >What would a rocker group map to in the tc world?
> 
> In the 'tc' world I would guess the easiest thing to do is simply bind
> a 'tc' qdisc to the ACL table. It seems a good first approximation of
> how to make this work. But the rocker world doesn't yet have any QOS so
> it makes it difficult to "offload" anything but the fifo qdiscs.

Right. I was asking because tc will have the same difficulty if it
wishes to classify based on rocker groups or other general-purpose
hardware metadata fields. We can either support them by describing them
and allowing such fields to be learned, or ignore them.

> >>I see this as a gaping hole
> >>for vendor SDKs with their own definitions of their own hardware that
> >>doesnt work with anyone else. i.e it seems to standardize proprietary
> >>interfaces. Maybe thats what Pablo is alluding to.
> >
> >I will be the first to root for rejection if such patches appear.
> >
> 
> Is it problematic if users define some unique header here and then
> provide actions to set/pop/push/get operations on it?

I have no problem with unique headers but we have to ensure that a
field with identical purpose or same logical meaning is represented in
the same way by all drivers. If a driver introduces a new field it
must consider that other drivers will need/want to use it as well.
I guess/hope this is obvious though ;-)

I agree that if chip A has 8 general purpose registers and chip B has
32 of them then it doesn't matter how they are called. What matters is
that they are declared as such to API users.

Actions must obviously be standardized as your proposal already does
by exposing push_vlan, pop_vlan, etc.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 10:24                   ` Jiri Pirko
@ 2015-01-23 11:08                     ` Thomas Graf
  2015-01-23 11:39                       ` Jiri Pirko
  0 siblings, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 11:08 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/15 at 11:24am, Jiri Pirko wrote:
> I think that comparing this to team or routing userspace is not
> correct. The reason is that team and routing has single api to kernel.
> However in this case userspace has to use multiple APIs.

The point I was trying to make is that there are legitimate reasons
to keep complexity out of the kernel, and team is a good example of
that.

As for multiple APIs: team does in fact export its own Generic Netlink
interface while it also hooks into rtnetlink to support ip link. Not
sure whether that qualifies as multiple APIs or not, but I think it's
an excellent architecture decision. Same as for the nl80211 tools.

> For example OVS. It would have to use existing OVS gennetlink iface + this
> new flow netlink iface for flow offloads. For all others, this is the same.
> Multiple apis for the same thing (does not matter if it is implemented
> in hw or sw) does not seem right to me.

Fair enough. I have no objections to merging the flow API into RTNETLINK
although I don't really see a need to put more under the rtnl umbrella
unless absolutely required.

I think John also mentioned that he proposes to have this as a separate
Generic Netlink interface for now but this could really live wherever it
seems appropriate.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 11:08                     ` Thomas Graf
@ 2015-01-23 11:39                       ` Jiri Pirko
  2015-01-23 12:28                         ` Thomas Graf
  2015-01-24 12:36                         ` Jamal Hadi Salim
  0 siblings, 2 replies; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 11:39 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 12:08:21PM CET, tgraf@suug.ch wrote:
>On 01/23/15 at 11:24am, Jiri Pirko wrote:
>> I think that comparing this to team or routing userspace is not
>> correct. The reason is that team and routing has single api to kernel.
>> However in this case userspace has to use multiple APIs.
>
>The point I was trying to make is that there are legitimate reasons
>to keep complexity out of the kernel and team is a good example for
>that.
>
>As for multiple APIs. Team does in fact export its own Generic Netlink
>interface while it also hooks into rtnetlink to support ip link. Not
>sure whether that qualifies for multiple APIs or not but I think it's
>an excellent architecture decision. Same as for nl80211 tools.

Team uses multiple APIs for sure, but for different things.

>
>> For example OVS. It would have to use existing OVS gennetlink iface + this
>> new flow netlink iface for flow offloads. For all others, this is the same.
>> Multiple apis for the same thing (does not matter if it is implemented
>> in hw or sw) does not seem right to me.
>
>Fair enough. I have no objections to merging the flow API into RTNETLINK
>although I don't really see a need to put more under the rtnl umbrella
>unless absolutely required.
>
>I think John also mentioned that he proposes to have this as a separate
>Generic Netlink interface for now but this could really live wherever it
>seems appropriate.

Maybe I did not express myself correctly. I do not care if this is
exposed by rtnl or a separate genetlink. The issue still stands, and the
issue is that the user has to use "way A" to set up the sw datapath and
"way B" to set up the hw datapath. Preferable would be a single
"way X" which can be used to set up both sw and hw.

And I believe that could be achieved. Consider something like this:

- have a cls_xflows tc classifier and an act_xflows tc action as a
  wrapper (or API) for John's work, with the possibility of multiple
  backends. The backend iface would look very similar to what John has
  now.
- other tc classifiers and actions will implement an xflows backend
- the openvswitch datapath will implement an xflows backend
- the rocker switch will implement an xflows backend
- other drivers will implement an xflows backend

Now if the user wants to manipulate any flow setting, he can just use
cls_xflows and act_xflows to do that.

This is very rough, but I just wanted to draw the picture. This would
provide a single entry point to flow-world manipulation in the kernel,
no matter if sw or hw.
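The backend idea above can be sketched as a registry that sw and hw datapaths plug into, with user space picking the backend per command. The backend names ("ovsdp", "rockerdp") follow the discussion; the registry shape and the set_rule callback are assumptions.

```python
# Hypothetical xflows-style backend registry: one API, many backends.

backends = {}

def register_backend(name, set_rule_cb):
    """A datapath (sw or hw) registers its rule-insertion callback."""
    backends[name] = set_rule_cb

def set_rule(backend, rule):
    """Same call shape for every backend; which one to target is a
    user-space policy decision, not kernel logic."""
    return backends[backend](rule)

# A software and a hardware datapath register independently...
register_backend("ovsdp", lambda rule: f"sw: {rule}")
register_backend("rockerdp", lambda rule: f"hw: {rule}")

# ...and the same API programs either one.
print(set_rule("rockerdp", "match eth_dst -> set_group_id"))
```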

Thoughts?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 11:39                       ` Jiri Pirko
@ 2015-01-23 12:28                         ` Thomas Graf
  2015-01-23 13:43                           ` Jiri Pirko
  2015-01-24 12:36                         ` Jamal Hadi Salim
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 12:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/15 at 12:39pm, Jiri Pirko wrote:
> Maybe I did not express myself correctly. I do not care if this is
> exposed by rtnl or a separate genetlink. The issue still stands. And the
> issue is that the user have to use "the way A" to setup sw datapath and
> "the way B" to setup hw datapath. The preferable would be to have
> "the way X" which can be used to setup both sw and hw.
> 
> And I believe that could be achieved. Consider something like this:
> 
> - have cls_xflows tc classifier and act_xflows tc action as a wrapper
>   (or api) for John's work. With possibility for multiple backends. The
>   backend iface would looke very similar to what John has now.
> - other tc clses and acts will implement xflows backend
> - openvswitch datapath will implement xflows backend
> - rocker switch will implement xflows backend
> - other drivers will implement xflows backend
> 
> Now if user wants to manipulate with any flow setting, he can just use
> cls_xflows and act_xflows to to that.
> 
> This is very rough, but I just wanted to draw the picture. This would
> provide single entry to flow world manipulation in kernel, no matter if
> sw or hw.

If I understand this correctly then you propose to do the decision on
whether to implement a flow in software or offload it to hardware in the
xflows classifier and action. I had exactly the same architecture in mind
initially when I first approached this and wanted to offload OVS
datapath flows transparently to hardware.

If you look at this from the existing tc world then that makes a lot
of sense, in particular if you only support a single flat table with
wildcard flows and no priorities.

If you want to support priorities it already gets complicated. If flows
A, B, C are offloaded to hardware and the user then inserts a new flow
D with higher priority that can't be offloaded, you need to figure out
whether you have to remove any of A, B, C from the hardware tables
again, on the basis of whether D overlaps with A, B, or C. If you have
to remove any of them, you then have to verify whether that removal
requires removing other already offloaded flows as well. It's certainly
doable but already adds considerable complexity to the kernel.
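The overlap check described above can be sketched in a deliberately simplified model: a flow match is a dict of exact-match fields, an absent field means wildcard, and two flows overlap when no shared field conflicts. Real hardware with masks and priorities is considerably harder; all names here are illustrative.

```python
# Simplified model of the eviction decision when a higher-priority,
# non-offloadable flow D arrives: every offloaded flow D overlaps
# must come back to software so D is matched first.

def overlaps(m1, m2):
    """Flows overlap unless some field both match on has conflicting
    values (absent field == wildcard in this toy model)."""
    shared = set(m1) & set(m2)
    return all(m1[k] == m2[k] for k in shared)

def must_evict(offloaded, new_flow):
    """Offloaded flows that overlap the new software-only flow."""
    return [f for f in offloaded if overlaps(f["match"], new_flow["match"])]

A = {"name": "A", "match": {"ip_dst": "10.0.0.1"}}
B = {"name": "B", "match": {"ip_dst": "10.0.0.2"}}
D = {"name": "D", "match": {"tcp_dport": 80}}  # wildcards ip_dst

# D shares no exact field with A or B, so it overlaps both: both
# would have to leave the hardware table.
print([f["name"] for f in must_evict([A, B], D)])  # ['A', 'B']
```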

If you want to support multiple tables it gets even more complicated,
because a flow in table 2 which can be offloaded might depend on a
flow in table 1 which can't be offloaded. You somehow need to track
that dependency and ensure that table 1 sends that flow to the CPU so
that the flow in table 2 sees it. The answer to this might be to only
support offload to a single table, but that decreases the value
of the offload dramatically because the capabilities of each table are
very different.

If you bring the full programmability of OVS into the picture you might
have a pipeline consisting of multiple tables like this:

 +-------+   +------+   +-----+   +-------+
 | Decap |-->| L2   |-->| L3  |-->| Encap |
 +-------+   +------+   +-----+   +-------+

Each table contains flows, and metadata registers plus header matches
are used to talk among the tables. The pipeline builds a chain of
actions which may be executed at any point in the pipeline or at the
end. If you want to map such a software pipeline to a set of hardware
tables you need full visibility into this table structure at
the point where you make the offload decision. This means that all of
this complexity would have to move into xflows.

Another aspect is that you might want to split a flow X into a hardware
and software part, e.g. consider the following flow:

in_port=vxlan0,vni=10,ip_dst=10.1.1.1,actions=decap(),nfqueue(10),output(tap0)

The hardware might be capable of matching on the VXLAN VNI and IP dst,
and it might also be capable of decap. It obviously doesn't know about
netfilter queues. Ideally what you want is to split this into the
following flows:

Hardware table (offloaded):
in_port=vxlan0,vni=10,ip_dst=10.1.1.1,actions=decap(),metadata=1

Software table:
metadata=1,actions=nfqueue(10),output(tap0)
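The split above can be sketched as cutting the action list at the first action the hardware can't execute and handing off through a metadata tag. The action strings follow the example flow; the hardware capability set and the metadata encoding are assumptions.

```python
# Toy version of splitting one logical flow into an offloaded part
# and a software remainder, joined by a metadata tag.

HW_ACTIONS = {"decap"}  # assumed hardware capability set

def split_flow(match, actions, tag=1):
    """Cut at the first action the hw can't do; return (hw, sw)."""
    for i, act in enumerate(actions):
        if act not in HW_ACTIONS:
            hw = {"match": match,
                  "actions": actions[:i] + [f"metadata={tag}"]}
            sw = {"match": {"metadata": tag}, "actions": actions[i:]}
            return hw, sw
    return {"match": match, "actions": actions}, None  # fully offloadable

hw, sw = split_flow({"vni": 10, "ip_dst": "10.1.1.1"},
                    ["decap", "nfqueue(10)", "output(tap0)"])
print(hw["actions"])  # ['decap', 'metadata=1']
print(sw["actions"])  # ['nfqueue(10)', 'output(tap0)']
```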

If the hardware capabilities are not exported to OVS then xflows would
need to encode such logic and xflows would need to be made aware of the
full software pipeline with all tables as you need to see all flows in
order to decide what to offload where.

I would love to see a tc interface to John's flow API, and I see
tremendous value there, but I don't think it's appropriate for
offloading OVS.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 12:28                         ` Thomas Graf
@ 2015-01-23 13:43                           ` Jiri Pirko
  2015-01-23 14:07                             ` Thomas Graf
  0 siblings, 1 reply; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 13:43 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>On 01/23/15 at 12:39pm, Jiri Pirko wrote:
>> Maybe I did not express myself correctly. I do not care if this is
>> exposed by rtnl or a separate genetlink. The issue still stands. And the
>> issue is that the user have to use "the way A" to setup sw datapath and
>> "the way B" to setup hw datapath. The preferable would be to have
>> "the way X" which can be used to setup both sw and hw.
>> 
>> And I believe that could be achieved. Consider something like this:
>> 
>> - have cls_xflows tc classifier and act_xflows tc action as a wrapper
>>   (or api) for John's work. With possibility for multiple backends. The
>>   backend iface would looke very similar to what John has now.
>> - other tc clses and acts will implement xflows backend
>> - openvswitch datapath will implement xflows backend
>> - rocker switch will implement xflows backend
>> - other drivers will implement xflows backend
>> 
>> Now if user wants to manipulate with any flow setting, he can just use
>> cls_xflows and act_xflows to to that.
>> 
>> This is very rough, but I just wanted to draw the picture. This would
>> provide single entry to flow world manipulation in kernel, no matter if
>> sw or hw.
>
>If I understand this correctly then you propose to do the decision on
>whether to implement a flow in software or offload it to hardware in the
>xflows classifier and action. I had exactly the same architecture in mind
>initially when I first approached this and wanted to offload OVS
>datapath flows transparently to hardware.

Think about xflows as an iface to multiple backends, some sw and some
hw. The user will be able to specify which backend he wants to use for
particular "commands".

So for example, the ovs kernel datapath module will implement an xflows
backend and register it as "ovsdp". Rocker will implement another
xflows backend and register it as "rockerdp". Then ovs userspace can
set up both backends independently, using the same xflows API.

It is still up to userspace to decide what should be put where (what
backend to use).

>
>If you look at this from the existing tc world then that makes a lot
>of sense, in particular if you only support a single flat table with
>wildcard flows and no priorities.
>
>If you want to support priorities it already gets complicated. If flow
>A, B, C are offloaded to hardware and the user then inserts a new flow
>D with higher priority that can't be offloaded you need to figure out
>whether you have to remove any of A, B, C from the hardware tables again
>on the basis whether D overlaps with A, B, or C. If you have to remove
>any of them you then have to verify whether that removal needs to
>remove other already offloaded flows as well. It's certainly doable but
>already adds considerable complexity to the kernel.
>
>If you want to support multiple tables it gets even more complicated
>because a flow in table 2 which can be offloaded might depend on a
>flow in table 1 which can't be offloaded. You somehow need to track
>that dependency and ensure that table 1 sends that flow to the CPU so
>that the flow in table 2 sees it. The answer to this might be to maybe
>only support  offload to a single table but that decreases the value
>of the offload dramatically because the capabilities of each table are
>very different.
>
>If you bring the full programmability of OVS into the picture you might
>have a pipeline consisting of multiple tables like this:
>
> +-------+   +------+   +-----+   +-------+
> | Decap |-->| L2   |-->| L3  |-->| Encap |
> +-------+   +------+   +-----+   +-------+
>
>Each table contains flows and metadata registers plus header matches
>are used to talk among the tables. The pipeline builds a chain of
>actions which may be executed at any point in the pipeline or at the
>end. If you want to map such a software pipeline to a set of hardware
>tables you need to have full visibility into this table structure at
>the point where you make the offload decision. This means that all of
>this complexity would have to move into xflows.
>
>Another aspect is that you might want to split a flow X into a hardware
>and software part, e.g. consider the following flow:
>
>in_port=vxlan0,vni=10,ip_dst=10.1.1.1,actions=decap(),nfqueue(10),output(tap0)
>
>The hardware might be capable of matching on the VXLAN VNI and IP dst, and
>it might also be capable of decap. It obviously doesn't know about
>netfilter queues. Ideally what you want is to split this into the
>following flows:
>
>Hardware table (offloaded):
>in_port=vxlan0,vni=10,ip_dst=10.1.1.1,actions=decap(),metadata=1
>
>Software table:
>metadata=1,actions=nfqueue(10),output(tap0)
>
>If the hardware capabilities are not exported to OVS then xflows would
>need to encode such logic and xflows would need to be made aware of the
>full software pipeline with all tables as you need to see all flows in
>order to decide what to offload where.
>
>I would love to see a tc interface to John's flow API and I see
>tremendous value but I don't think it's appropriate to offload OVS.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 13:43                           ` Jiri Pirko
@ 2015-01-23 14:07                             ` Thomas Graf
  2015-01-23 15:25                               ` Jiri Pirko
  2015-01-23 15:34                               ` John Fastabend
  0 siblings, 2 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 14:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/15 at 02:43pm, Jiri Pirko wrote:
> Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
> >If I understand this correctly then you propose to do the decision on
> >whether to implement a flow in software or offload it to hardware in the
> >xflows classifier and action. I had exactly the same architecture in mind
> >initially when I first approached this and wanted to offload OVS
> >datapath flows transparently to hardware.
> 
> Think about xflows as an iface to multiple backends, some sw and some hw.
> User will be able to specify which backend he wants to use for particular
> "commands".
> 
> So for example, ovs kernel datapath module will implement an xflows
> backend and register it as "ovsdp". Rocker will implement another xflows
> backend and register it as "rockerdp". Then, ovs userspace will use xflows
> api to setup both backends independently, but using the same xflows api.
> 
> It is still up to userspace to decide what should be put where (what
> backend to use).

OK, sounds good so far. Although we can't completely ditch the existing
genl based OVS flow API for obvious backwards compatibility reasons ;-)

How does John's API fit into this? How would you expose capabilities
through xflows? How would it differ from what John proposes?

Since this would be a regular tc classifier I assume it could be
attached to any tc class and interface and then combined with other
classifiers which OVS would not be aware of. How do you intend to
resolve such conflicts?

Example:
 eth0:
   ingress qdisc:
     cls prio 20 u32 match [...]
     cls prio 10 xflows [...]

If xflows offloads to hardware, the u32 classifier with higher
priority is hidden unintentionally.
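A toy model of this hazard (an editor-style sketch; it uses tc's convention that a numerically lower prio is matched first, with numbers chosen so u32 is the software winner):

```python
# Sketch: classifiers on an ingress qdisc are evaluated in priority order
# in software, but a hardware-offloaded classifier sees the packet first
# regardless of its configured priority, shadowing software classifiers.

def first_match(classifiers, pkt):
    # hardware entries win unconditionally; among the rest, prio decides
    order = sorted(classifiers, key=lambda c: (not c["hw"], c["prio"]))
    for c in order:
        if c["match"](pkt):
            return c["name"]
    return None

u32    = {"name": "u32",    "prio": 1, "hw": False, "match": lambda p: True}
xflows = {"name": "xflows", "prio": 2, "hw": False, "match": lambda p: True}

sw_only = first_match([u32, xflows], {})           # priority order honored
with_hw = first_match([u32, dict(xflows, hw=True)], {})  # u32 shadowed
```

In software, u32 wins on priority; once xflows is offloaded, it matches in hardware before u32 ever runs.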


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 14:07                             ` Thomas Graf
@ 2015-01-23 15:25                               ` Jiri Pirko
  2015-01-23 15:43                                 ` John Fastabend
  2015-01-23 15:49                                 ` Thomas Graf
  2015-01-23 15:34                               ` John Fastabend
  1 sibling, 2 replies; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 15:25 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 03:07:24PM CET, tgraf@suug.ch wrote:
>On 01/23/15 at 02:43pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>> >If I understand this correctly then you propose to do the decision on
>> >whether to implement a flow in software or offload it to hardware in the
>> >xflows classifier and action. I had exactly the same architecture in mind
>> >initially when I first approached this and wanted to offload OVS
>> >datapath flows transparently to hardware.
>> 
>> Think about xflows as an iface to multiple backends, some sw and some hw.
>> User will be able to specify which backend he wants to use for particular
>> "commands".
>> 
>> So for example, ovs kernel datapath module will implement an xflows
>> backend and register it as "ovsdp". Rocker will implement another xflows
>> backend and register it as "rockerdp". Then, ovs userspace will use xflows
>> api to setup both backends independently, but using the same xflows api.
>> 
>> It is still up to userspace to decide what should be put where (what
>> backend to use).
>
>OK, sounds good so far. Although we can't completely ditch the existing
>genl based OVS flow API for obvious backwards compatibility reasons ;-)

Sure.

>
>How does John's API fit into this? How would you expose capabilities
>through xflows? How would it differ from what John proposes?

This certainly needs more thinking. The capabilities could be exposed
either by a separate genl api (like in this version) or directly via the TC
netlink iface (RTM_GETTFILTERCAP, RTM_GETACTIONCAP). The insides of the
message can stay the same. I like the second way better.

Flow manipulation would happen as standard TC filter/action manipulation.
Here, the Netlink messages could also be very similar to what John has now.


>
>Since this would be a regular tc classifier I assume it could be
>attached to any tc class and interface and then combined with other
>classifiers which OVS would not be aware of. How do you intend to
>resolve such conflicts?
>
>Example:
> eth0:
>   ingress qdisc:
>     cls prio 20 u32 match [...]
>     cls prio 10 xflows [...]
>
>If xflows offloads to hardware, the u32 classifier with higher
>priority is hidden unintentionally.


Right. We have to either introduce some limitations for xflows to
disallow this or let the user take care of this. But it's a similar
problem as if you use tc with John's API or ovs with John's API.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 14:07                             ` Thomas Graf
  2015-01-23 15:25                               ` Jiri Pirko
@ 2015-01-23 15:34                               ` John Fastabend
  2015-01-23 15:53                                 ` Jiri Pirko
                                                   ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-23 15:34 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/2015 06:07 AM, Thomas Graf wrote:
> On 01/23/15 at 02:43pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>>> If I understand this correctly then you propose to do the decision on
>>> whether to implement a flow in software or offload it to hardware in the
>>> xflows classifier and action. I had exactly the same architecture in mind
>>> initially when I first approached this and wanted to offload OVS
>>> datapath flows transparently to hardware.
>>
>> Think about xflows as an iface to multiple backends, some sw and some hw.
>> User will be able to specify which backend he wants to use for particular
>> "commands".
>>
>> So for example, ovs kernel datapath module will implement an xflows
>> backend and register it as "ovsdp". Rocker will implement another xflows
>> backend and register it as "rockerdp". Then, ovs userspace will use xflows
>> api to setup both backends independently, but using the same xflows api.
>>
>> It is still up to userspace to decide what should be put where (what
>> backend to use).
>
> OK, sounds good so far. Although we can't completely ditch the existing
> genl based OVS flow API for obvious backwards compatibility reasons ;-)
>
> How does John's API fit into this? How would you expose capabilities
> through xflows? How would it differ from what John proposes?
>
> Since this would be a regular tc classifier I assume it could be
> attached to any tc class and interface and then combined with other
> classifiers which OVS would not be aware of. How do you intend to
> resolve such conflicts?
>
> Example:
>   eth0:
>     ingress qdisc:
>       cls prio 20 u32 match [...]
>       cls prio 10 xflows [...]
>
> If xflows offloads to hardware, the u32 classifier with higher
> priority is hidden unintentionally.
>

I thought about this at length. And I'm not opposed to pulling my API
into a 'tc classifier', but it's not 100% clear to me how it helps.

First, the 'tc' infrastructure doesn't have any classifier that would map
well to this today, so you are talking about a new classifier, which it
looks like Jiri is calling xflows. This is fine.

Now 'xflows' needs to implement the same get operations that exist in
this flow API; otherwise, as Thomas points out, writing meaningful
policies is crude at best. So this tc classifier supports 'get headers',
'get actions', and 'get tables' and their associated graphs. All
good so far. This is just an embedding of the existing API in the 'tc'
netlink family. I've never had any issues with this. Finally you build
up the 'get_flow' and 'set_flow' operations; I still see no issue with
this, and it's just an embedding of the existing API into a 'tc
classifier'. My flow tool becomes one of the classifier tools.
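As a rough illustration of the kind of information those get operations expose (the device layout and names here are invented for the sketch, not the actual netlink message format):

```python
# Sketch: a capability model a 'get headers' / 'get actions' /
# 'get tables' query might return, and the question a policy writer
# needs answered before attempting a set_flow.

device = {
    "headers": {"ethernet": ["src", "dst", "type"],
                "vlan":     ["pcp", "vid"],
                "ipv4":     ["src", "dst", "proto"]},
    "actions": {"set_field", "drop", "forward", "count"},
    "tables":  [{"name": "tcam", "size": 2048,
                 "matches": ["ethernet.dst", "ipv4.dst"],
                 "actions": ["forward", "drop", "count"]}],
}

def table_supports(dev, table, matches, actions):
    """Can this table hold a rule with the given matches and actions?"""
    t = next(x for x in dev["tables"] if x["name"] == table)
    return (set(matches) <= set(t["matches"])
            and set(actions) <= set(t["actions"]))
```

Without this kind of introspection, user space has no way to know whether a rule it is about to install can actually land in a given hardware table.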

Now what should I attach my filter to? Typically we attach it to qdiscs
today. But what does that mean for a switch device? I guess I need an
_offloaded qdisc_? I don't want to run the same qdisc in the dataplane
of the switch as I run on the ports going into/out of the sw dataplane.
Similarly, I don't want to run the same set of filters. So at this point
I have a set of qdiscs per port to represent the switch dataplane and
a set of qdiscs attached to the software dataplane. If people think this
is worth doing, let's do it. It may get you a nice way to manage QoS
while you're at it.

At this point we have the above xflows filter that works on hardware and
some qdisc abstraction to represent the hardware. Great.

What I don't have a lot of use for at the moment is an xflows that runs
in software. Conceptually it sounds fine, but why would I want to mirror
hardware limitations into software? And if I make it completely generic
it becomes u32, more or less. I could create an optimized version of the
hardware dataplane in userspace which sits somewhere between u32 and the
other classifiers on flexibility and maybe gains some performance, but I'm
at a loss as to why this is useful. I would rather spend my time getting
better performance out of u32 and dropping qdisc_lock completely than
writing some partially useful filter for software.

My original conclusion was not to worry about embedding it inside 'tc',
and I didn't mind having another netlink family, but I'm not opposed to
doing the embedding if it helps someone, even if it just resolves some
cognitive dissonance.

.John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:25                               ` Jiri Pirko
@ 2015-01-23 15:43                                 ` John Fastabend
  2015-01-23 15:56                                   ` Jiri Pirko
  2015-01-23 15:49                                 ` Thomas Graf
  1 sibling, 1 reply; 66+ messages in thread
From: John Fastabend @ 2015-01-23 15:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Thomas Graf, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/2015 07:25 AM, Jiri Pirko wrote:
> Fri, Jan 23, 2015 at 03:07:24PM CET, tgraf@suug.ch wrote:
>> On 01/23/15 at 02:43pm, Jiri Pirko wrote:
>>> Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>>>> If I understand this correctly then you propose to do the decision on
>>>> whether to implement a flow in software or offload it to hardware in the
>>>> xflows classifier and action. I had exactly the same architecture in mind
>>>> initially when I first approached this and wanted to offload OVS
>>>> datapath flows transparently to hardware.
>>>
>>> Think about xflows as an iface to multiple backends, some sw and some hw.
>>> User will be able to specify which backend he wants to use for particular
>>> "commands".
>>>
>>> So for example, ovs kernel datapath module will implement an xflows
>>> backend and register it as "ovsdp". Rocker will implement another xflows
>>> backend and register it as "rockerdp". Then, ovs userspace will use xflows
>>> api to setup both backends independently, but using the same xflows api.
>>>
>>> It is still up to userspace to decide what should be put where (what
>>> backend to use).
>>
>> OK, sounds good so far. Although we can't completely ditch the existing
>> genl based OVS flow API for obvious backwards compatibility reasons ;-)
>
> Sure.

Replied to the other thread before seeing this, but some comments.

>
>>
>> How does John's API fit into this? How would you expose capabilities
>> through xflows? How would it differ from what John proposes?
>
> This certainly needs more thinking. The capabilities could be exposed
> either by a separate genl api (like in this version) or directly via the TC
> netlink iface (RTM_GETTFILTERCAP, RTM_GETACTIONCAP). The insides of the
> message can stay the same. I like the second way better.
>

For what it's worth, I started down this route when I did the flow API
before ditching it and going to its own netlink family.

> Flow manipulation would happen as standard TC filter/action manipulation.
> Here, the Netlink messages could also be very similar to what John has now.
>

In fact I think they are almost the same ;) I don't mind doing the
embedding as long as there is some sort of plan for how to attach
filters to hardware. This is where I got stuck. I think you need
a new attach point in the hardware. See my other reply.

>
>>
>> Since this would be a regular tc classifier I assume it could be
>> attached to any tc class and interface and then combined with other
>> classifiers which OVS would not be aware of. How do you intend to
>> resolve such conflicts?
>>
>> Example:
>> eth0:
>>    ingress qdisc:
>>      cls prio 20 u32 match [...]
>>      cls prio 10 xflows [...]
>>
>> If xflows offloads to hardware, the u32 classifier with higher
>> priority is hidden unintentionally.
>
>
> Right. We have to either introduce some limitations for xflows to
> disallow this or let the user take care of this. But it's a similar
> problem as if you use tc with John's API or ovs with John's API.
>

But with the current API it's clear that the rules managed by the
Flow API are in front of 'tc' and 'ovs' on ingress. Just the same
as it is clear 'tc' ingress rules are walked before 'ovs' ingress
rules. On egress it is similarly clear that 'ovs' does a forward
rule to a netdev, then 'tc' filters+qdisc is run, and finally the
hardware flow api is hit.

I also think it is clear that when a packet never enters the software
dataplane _only_ the hardware dataplane rules are used namely the
entries in the Flow API.

In the cases I've been experimenting with using the Flow API, the
priorities and the rules being used are clear from looking at counters
and "knowing" the above pipeline model.
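The fixed ordering described above can be written down as data (a sketch, using informal stage names rather than anything from the patch set):

```python
# Sketch: which rule sets a packet traverses, in order, depending on
# direction and on whether it enters the software dataplane at all.

INGRESS = ["hw flow api", "tc ingress", "ovs"]
EGRESS  = ["ovs", "tc filters+qdisc", "hw flow api"]

def rule_sets(direction, enters_software):
    # a packet forwarded entirely in hardware only ever sees the
    # hardware Flow API entries
    if not enters_software:
        return ["hw flow api"]
    return INGRESS if direction == "ingress" else EGRESS
```

This is essentially the "picture" alluded to below: the Flow API entries bracket the software pipeline on both ends.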

Although as I type this, I think a picture and some documentation
would help.

.John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:25                               ` Jiri Pirko
  2015-01-23 15:43                                 ` John Fastabend
@ 2015-01-23 15:49                                 ` Thomas Graf
  2015-01-23 16:00                                   ` Jiri Pirko
  1 sibling, 1 reply; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 15:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/15 at 04:25pm, Jiri Pirko wrote:
> Fri, Jan 23, 2015 at 03:07:24PM CET, tgraf@suug.ch wrote:
> >How does John's API fit into this? How would you expose capabilities
> >through xflows? How would it differ from what John proposes?
> 
> This certainly needs more thinking. The capabilities could be exposed
> either by a separate genl api (like in this version) or directly via the TC
> netlink iface (RTM_GETTFILTERCAP, RTM_GETACTIONCAP). The insides of the
> message can stay the same. I like the second way better.

OK. Any particular reason why you like the tc integration better?

> Flow manipulation would happen as standard TC filter/action manipulation.
> Here, the Netlink messages could also be very similar to what John has now.

I have one concern here: This would mean we put flow modifications
under the rtnl lock which will have severe impact on the rate of
flow modifications we can support. We need flow table modifications
to continue being super fast.

Parallel genetlink operations were introduced just for this.

> Right. We have to either introduce some limitations for xflows to
> disallow this or let the user take care of this. But it's a similar
> problem as if you use tc with John's API or ovs with John's API.

Agreed. It's a general problem with having multiple independent tools.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:34                               ` John Fastabend
@ 2015-01-23 15:53                                 ` Jiri Pirko
  2015-01-23 16:00                                   ` Thomas Graf
  2015-01-23 17:46                                 ` Thomas Graf
  2015-01-24 13:01                                 ` Jamal Hadi Salim
  2 siblings, 1 reply; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 15:53 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 04:34:55PM CET, john.fastabend@gmail.com wrote:
>On 01/23/2015 06:07 AM, Thomas Graf wrote:
>>On 01/23/15 at 02:43pm, Jiri Pirko wrote:
>>>Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>>>>If I understand this correctly then you propose to do the decision on
>>>>whether to implement a flow in software or offload it to hardware in the
>>>>xflows classifier and action. I had exactly the same architecture in mind
>>>>initially when I first approached this and wanted to offload OVS
>>>>datapath flows transparently to hardware.
>>>
>>>Think about xflows as an iface to multiple backends, some sw and some hw.
>>>User will be able to specify which backend he wants to use for particular
>>>"commands".
>>>
>>>So for example, ovs kernel datapath module will implement an xflows
>>>backend and register it as "ovsdp". Rocker will implement another xflows
>>>backend and register it as "rockerdp". Then, ovs userspace will use xflows
>>>api to setup both backends independently, but using the same xflows api.
>>>
>>>It is still up to userspace to decide what should be put where (what
>>>backend to use).
>>
>>OK, sounds good so far. Although we can't completely ditch the existing
>>genl based OVS flow API for obvious backwards compatibility reasons ;-)
>>
>>How does John's API fit into this? How would you expose capabilities
>>through xflows? How would it differ from what John proposes?
>>
>>Since this would be a regular tc classifier I assume it could be
>>attached to any tc class and interface and then combined with other
>>classifiers which OVS would not be aware of. How do you intend to
>>resolve such conflicts?
>>
>>Example:
>>  eth0:
>>    ingress qdisc:
>>      cls prio 20 u32 match [...]
>>      cls prio 10 xflows [...]
>>
>>If xflows offloads to hardware, the u32 classifier with higher
>>priority is hidden unintentionally.
>>
>
>I thought about this at length. And I'm not opposed to pulling my API
>into a 'tc classifier', but it's not 100% clear to me how it helps.
>
>First, the 'tc' infrastructure doesn't have any classifier that would map
>well to this today, so you are talking about a new classifier, which it
>looks like Jiri is calling xflows. This is fine.
>
>Now 'xflows' needs to implement the same get operations that exist in
>this flow API; otherwise, as Thomas points out, writing meaningful
>policies is crude at best. So this tc classifier supports 'get headers',
>'get actions', and 'get tables' and their associated graphs. All
>good so far. This is just an embedding of the existing API in the 'tc'
>netlink family. I've never had any issues with this. Finally you build
>up the 'get_flow' and 'set_flow' operations; I still see no issue with
>this, and it's just an embedding of the existing API into a 'tc
>classifier'. My flow tool becomes one of the classifier tools.
>
>Now what should I attach my filter to? Typically we attach it to qdiscs
>today. But what does that mean for a switch device? I guess I need an
>_offloaded qdisc_? I don't want to run the same qdisc in the dataplane
>of the switch as I run on the ports going into/out of the sw dataplane.
>Similarly, I don't want to run the same set of filters. So at this point
>I have a set of qdiscs per port to represent the switch dataplane and
>a set of qdiscs attached to the software dataplane. If people think this
>is worth doing, let's do it. It may get you a nice way to manage QoS
>while you're at it.

Yes!

>
>At this point we have the above xflows filter that works on hardware and
>some qdisc abstraction to represent the hardware. Great.
>
>What I don't have a lot of use for at the moment is an xflows that runs
>in software. Conceptually it sounds fine, but why would I want to mirror
>hardware limitations into software? And if I make it completely generic
>it becomes u32, more or less. I could create an optimized version of the
>hardware dataplane in userspace which sits somewhere between u32 and the
>other classifiers on flexibility and maybe gains some performance, but I'm
>at a loss as to why this is useful. I would rather spend my time getting
>better performance out of u32 and dropping qdisc_lock completely than
>writing some partially useful filter for software.

Well, even a software implementation has limitations. Take the ovs kernel
datapath as an example. You can use your graphs to describe exactly what
ovs can handle. And after that you could use the xflows api to set it up as
well as your rocker offload. That to me seems like a very nice feature to
have.

>
>My original conclusion was not to worry about embedding it inside 'tc',
>and I didn't mind having another netlink family, but I'm not opposed to
>doing the embedding if it helps someone, even if it just resolves some
>cognitive dissonance.
>
>.John
>
>
>-- 
>John Fastabend         Intel Corporation


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:43                                 ` John Fastabend
@ 2015-01-23 15:56                                   ` Jiri Pirko
  0 siblings, 0 replies; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 15:56 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 04:43:48PM CET, john.fastabend@gmail.com wrote:
>On 01/23/2015 07:25 AM, Jiri Pirko wrote:
>>Fri, Jan 23, 2015 at 03:07:24PM CET, tgraf@suug.ch wrote:
>>>On 01/23/15 at 02:43pm, Jiri Pirko wrote:
>>>>Fri, Jan 23, 2015 at 01:28:38PM CET, tgraf@suug.ch wrote:
>>>>>If I understand this correctly then you propose to do the decision on
>>>>>whether to implement a flow in software or offload it to hardware in the
>>>>>xflows classifier and action. I had exactly the same architecture in mind
>>>>>initially when I first approached this and wanted to offload OVS
>>>>>datapath flows transparently to hardware.
>>>>
>>>>Think about xflows as an iface to multiple backends, some sw and some hw.
>>>>User will be able to specify which backend he wants to use for particular
>>>>"commands".
>>>>
>>>>So for example, ovs kernel datapath module will implement an xflows
>>>>backend and register it as "ovsdp". Rocker will implement another xflows
>>>>backend and register it as "rockerdp". Then, ovs userspace will use xflows
>>>>api to setup both backends independently, but using the same xflows api.
>>>>
>>>>It is still up to userspace to decide what should be put where (what
>>>>backend to use).
>>>
>>>OK, sounds good so far. Although we can't completely ditch the existing
>>>genl based OVS flow API for obvious backwards compatibility reasons ;-)
>>
>>Sure.
>
>Replied to the other thread before seeing this, but some comments.
>
>>
>>>
>>>How does John's API fit into this? How would you expose capabilities
>>>through xflows? How would it differ from what John proposes?
>>
>>This certainly needs more thinking. The capabilities could be exposed
>>either by a separate genl api (like in this version) or directly via the TC
>>netlink iface (RTM_GETTFILTERCAP, RTM_GETACTIONCAP). The insides of the
>>message can stay the same. I like the second way better.
>>
>
>For what it's worth, I started down this route when I did the flow API
>before ditching it and going to its own netlink family.
>
>>Flow manipulation would happen as standard TC filter/action manipulation.
>>Here, the Netlink messages could also be very similar to what John has now.
>>
>
>In fact I think they are almost the same ;) I don't mind doing the
>embedding as long as there is some sort of plan for how to attach
>filters to hardware. This is where I got stuck. I think you need
>a new attach point in the hardware. See my other reply.

A special "offload" qdisc of some sort sounds like a good way to me.


>
>>
>>>
>>>Since this would be a regular tc classifier I assume it could be
>>>attached to any tc class and interface and then combined with other
>>>classifiers which OVS would not be aware of. How do you intend to
>>>resolve such conflicts?
>>>
>>>Example:
>>>eth0:
>>>   ingress qdisc:
>>>     cls prio 20 u32 match [...]
>>>     cls prio 10 xflows [...]
>>>
>>>If xflows offloads to hardware, the u32 classifier with higher
>>>priority is hidden unintentionally.
>>
>>
>>Right. We have to either introduce some limitations for xflows to
>>disallow this or let the user take care of this. But it's a similar
>>problem as if you use tc with John's API or ovs with John's API.
>>
>
>But with the current API it's clear that the rules managed by the
>Flow API are in front of 'tc' and 'ovs' on ingress. Just the same
>as it is clear 'tc' ingress rules are walked before 'ovs' ingress
>rules. On egress it is similarly clear that 'ovs' does a forward
>rule to a netdev, then 'tc' filters+qdisc is run, and finally the
>hardware flow api is hit.


Seems like this would be resolved by the separate "offload" qdisc.

>
>I also think it is clear that when a packet never enters the software
>dataplane _only_ the hardware dataplane rules are used namely the
>entries in the Flow API.
>
>In the cases I've been experimenting with using the Flow API, the
>priorities and the rules being used are clear from looking at counters
>and "knowing" the above pipeline model.
>
>Although as I type this, I think a picture and some documentation
>would help.
>
>.John
>
>
>-- 
>John Fastabend         Intel Corporation


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:49                                 ` Thomas Graf
@ 2015-01-23 16:00                                   ` Jiri Pirko
  0 siblings, 0 replies; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 16:00 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, John Fastabend,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 04:49:49PM CET, tgraf@suug.ch wrote:
>On 01/23/15 at 04:25pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 03:07:24PM CET, tgraf@suug.ch wrote:
>> >How does John's API fit into this? How would you expose capabilities
>> >through xflows? How would it differ from what John proposes?
>> 
>> This certainly needs more thinking. The capabilities could be exposed
>> either by a separate genl api (like in this version) or directly via the TC
>> netlink iface (RTM_GETTFILTERCAP, RTM_GETACTIONCAP). The insides of the
>> message can stay the same. I like the second way better.
>
>OK. Any particular reason why you like the tc integration better?

As I wrote earlier, that would provide us with a single interface for flow
manipulation in both sw and hw. That is why I prefer tc here.


>
>> Flow manipulation would happen as standard TC filter/action manipulation.
>> Here, the Netlink messages could also be very similar to what John has now.
>
>I have one concern here: This would mean we put flow modifications
>under the rtnl lock which will have severe impact on the rate of
>flow modifications we can support. We need flow table modifications
>to continue being super fast.

I agree that is a problem. But I believe it can be resolved (I have to
think about this some more).

>
>Parallel genetlink operations were introduced just for this.
>
>> Right. We have to either introduce some limitations for xflows to
>> disallow this or let the user take care of this. But it's a similar
>> problem as if you use tc with John's API or ovs with John's API.
>
>Agreed. It's a general problem with having multiple independent tools.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:53                                 ` Jiri Pirko
@ 2015-01-23 16:00                                   ` Thomas Graf
  2015-01-23 16:08                                     ` John Fastabend
  2015-01-23 16:16                                     ` Jiri Pirko
  0 siblings, 2 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 16:00 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Jamal Hadi Salim, Pablo Neira Ayuso,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/15 at 04:53pm, Jiri Pirko wrote:
> Fri, Jan 23, 2015 at 04:34:55PM CET, john.fastabend@gmail.com wrote:
> >What I don't have a lot of use for at the moment is an xflows that runs
> >in software. Conceptually it sounds fine, but why would I want to mirror
> >hardware limitations into software? And if I make it completely generic
> >it becomes u32, more or less. I could create an optimized version of the
> >hardware dataplane in userspace which sits somewhere between u32 and the
> >other classifiers on flexibility and maybe gains some performance, but I'm
> >at a loss as to why this is useful. I would rather spend my time getting
> >better performance out of u32 and dropping qdisc_lock completely than
> >writing some partially useful filter for software.
> 
> Well, even a software implementation has limitations. Take the ovs kernel
> datapath as an example. You can use your graphs to describe exactly what
> ovs can handle. And after that you could use the xflows api to set it up as
> well as your rocker offload. That to me seems like a very nice feature to
> have.

What is the value of this? The OVS kernel datapath is already built to
fall back to user space if the kernel datapath does not support a
specific feature.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 16:00                                   ` Thomas Graf
@ 2015-01-23 16:08                                     ` John Fastabend
  2015-01-23 16:16                                     ` Jiri Pirko
  1 sibling, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-23 16:08 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/2015 08:00 AM, Thomas Graf wrote:
> On 01/23/15 at 04:53pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 04:34:55PM CET, john.fastabend@gmail.com wrote:
>>> What I don't have a lot of use for at the moment is an xflows that runs
>>> in software? Conceptually it sounds fine but why would I want to mirror
>>> hardware limitations into software? And if I make it completely generic
>>> it becomes u32 more or less. I could create an optimized version of the
>>> hardware dataplane in userspace which sits somewhere between u32 and the
>>> other classifiers on flexibility and maybe gains some performance but I'm
>>> at a loss as to why this is useful. I would rather spend my time getting
>>> better performance out of u32 and dropping qdisc_lock completely than
>>> writing some partially useful filter for software.
>>
>> Well, even software implementation has limitations. Take ovs kernel
>> datapath as example. You can use your graphs to describe exactly what
>> ovs can handle. And after that you could use xflows api to set it up as
>> well as your rocker offload. That to me seems like a very nice feature to
>> have.
>
> What is the value of this? The OVS kernel datapath is already built to
> fall back to user space if the kernel datapath does not support a
> specific feature.
>

I might be reaching... but one advantage of something like this API is
the headers are not pre-defined nor are the actions. Coupled with eBPF
or a generic parser (think optimized u32) you would provide the ability
to configure the OVS fields in use and the actions being supported. Also,
I haven't thought about it as much, but if you had programmable hardware
and/or software you could create the set operations for headers, tables,
actions. I've done some work on the set tables because it's relatively
common on existing hardware. It's on GitHub in the flow tool and the
user space tester flowd.

I think the OVS folks have been thinking along these lines. Of course
you're still bound by OF1.x at the moment.

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 16:00                                   ` Thomas Graf
  2015-01-23 16:08                                     ` John Fastabend
@ 2015-01-23 16:16                                     ` Jiri Pirko
  2015-01-24 13:04                                       ` Jamal Hadi Salim
  1 sibling, 1 reply; 66+ messages in thread
From: Jiri Pirko @ 2015-01-23 16:16 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, Jamal Hadi Salim, Pablo Neira Ayuso,
	simon.horman, sfeldma, netdev, davem, gerlitz.or, andy, ast

Fri, Jan 23, 2015 at 05:00:58PM CET, tgraf@suug.ch wrote:
>On 01/23/15 at 04:53pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 04:34:55PM CET, john.fastabend@gmail.com wrote:
>> >What I don't have a lot of use for at the moment is an xflows that runs
>> >in software? Conceptually it sounds fine but why would I want to mirror
>> >hardware limitations into software? And if I make it completely generic
>> >it becomes u32 more or less. I could create an optimized version of the
>> >hardware dataplane in userspace which sits somewhere between u32 and the
>> >other classifiers on flexibility and maybe gains some performance but I'm
>> >at a loss as to why this is useful. I would rather spend my time getting
>> >better performance out of u32 and dropping qdisc_lock completely than
>> >writing some partially useful filter for software.
>> 
>> Well, even software implementation has limitations. Take ovs kernel
>> datapath as example. You can use your graphs to describe exactly what
>> ovs can handle. And after that you could use xflows api to set it up as
>> well as your rocker offload. That to me seems like a very nice feature to
>> have.
>
>What is the value of this? The OVS kernel datapath is already built to
>fall back to user space if the kernel datapath does not support a
>specific feature.


As I wrote earlier, the value is that userspace can easily use a single
xflows API to take care of all the ways to handle flows (ovs kernel dp,
rocker, other devices, u32 tc filter + actions, you name it).


    my flow managing app 
          |
uspc      |
  --------|----------------------------------------------------
krnl      |
       tc xflows api
          |  |  |
          |  |  ---------------------------------------------------
          |  |                                                    |
          |  ------------------                                other xflows backend
          |                   |
     ovs xflows backend     rocker driver xflows backend
          |                   |
         ovs dp               |
krnl	                      |
  ----------------------------|--------------------------------
hw                            |
                           rocker switch

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 10:49             ` Thomas Graf
@ 2015-01-23 16:42               ` John Fastabend
  0 siblings, 0 replies; 66+ messages in thread
From: John Fastabend @ 2015-01-23 16:42 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman, sfeldma,
	netdev, davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/23/2015 02:49 AM, Thomas Graf wrote:
> On 01/22/15 at 08:58am, John Fastabend wrote:
>> In response to Pablo's observation,
>>
>> Correct this is fully exposed to user space, but it is also self
>> contained inside the API meaning I can learn when to use it and what it
>> does by looking at the other operations tables the table graph and
>> supported headers. The assumption I am making, which is not in the API
>> explicitly yet, is that actions named "set_field_name" perform the
>> set operation on that field. We can and plan to extend the API to make
>> this assumption explicit in the API.
>
> OK. I think it's this assumption that is not explicitly in the API yet
> that causes confusion. Making it explicit would definitely help. Do
> we even need the driver to declare get/set operations at all? Can we
> just have the driver expose the field and the API takes care of
> providing get/set actions?
>
>> Even though its a detail of the rocker world its easy enough for a
>> program on top of the API to learn how it works.
>>
>> So in the rocker switch case if I want to rewrite an eth_dst address I
>> have a couple of choices. I can set the group_id in one of the tables
>> that support setting the group_id and then do the rewrite in one of the
>> tables that supports matching on group_id and setting the eth_dst mac.
>> The "choice" I make is a policy IMO and I don't want to hard code logic
>> in the kernel that picks tables and decides things like: what should I
>> do if table x is full but table y could also be used? Should I overflow
>> into table y? Or is table y reserved for some other network function?
>> etc.
>
> Agreed. It might make sense to declare such fields as general purpose
> metadata or have some kind of field class which describes the nature
> of the field: { register, protocol-field, configuration, ... }
>
>> There are some actions and metadata though that _need_ to be
>> standardized. These are the metadata that is used outside the API. For
>> example ingress_port is metadata that is set outside the tables.
>> Similarly set_egress_port and set_egress_queue provide the forwarding
>> and queueing fields. No matter how hard you look at the model from the
>> API you can not learn how these are used.
>
> Agreed. I assume we would implement a tun_dst the same standardized way.
>
>>> What would a rocker group map to in the tc world?
>>
>> In the 'tc' world I would guess the easiest thing to do is simply bind
>> a 'tc' qdisc to the ACL table. It seems a good first approximation of
>> how to make this work. But the rocker world doesn't yet have any QOS so
>> it makes it difficult to "offload" anything but the fifo qdiscs.
>
> Right. I was asking as tc will have the same difficulty if it wishes
> to classify based on rocker groups or other general purpose hardware
> metadata fields. We can either support them by describing them and
> allow learning of such fields or ignore them.
>
>>>> I see this as a gaping hole
>>>> for vendor SDKs with their own definitions of their own hardware that
>>>> doesnt work with anyone else. i.e it seems to standardize proprietary
>>>> interfaces. Maybe thats what Pablo is alluding to.
>>>
>>> I will be the first to root for rejection if such patches appear.
>>>
>>
>> Is it problematic if users define some unique header here and then
>> provide actions to set/pop/push/get operations on it?

A couple of additional comments I wanted to add here,

>
> I have no problem with unique headers but we have to ensure that a
> field with identical purpose or same logical meaning is represented in
> the same way by all drivers. If a driver introduces a new field it
> must consider that other drivers will need/want to use it as well.
> I guess/hope this is obvious though ;-)
>

Yep, so my thought here is that as if_flow_common.h builds up a list of
headers, a driver writer can go into their pipeline and "click" on
the headers they support to define the device's parse graph. Because
pkt headers are described using length/offset/mask types it's always
clear what an Ethernet header is and what an IP header is. There really
is no way to define/represent an IP header differently. If a driver
writer puts in their own definition and doesn't use our if_flow_common
definition it will be duplicate code and we should squash it. But for a
consumer of the API it will be the _same_ header, assuming there are no
bugs in the definition.

> I agree that if chip A has 8 general purpose registers and chip B has
> 32 of them then it doesn't matter how they are called. What matters is
> that they are declared as such to API users.

You could add a flag to the field type to indicate explicitly it is
metadata or a register but I've not found any need for it. I leave
it out of the pkt header graph and then let consumers note these
are metadata fields that can be used. If we need this to be explicit
it's easy enough to add a flags field.

>
> Actions must obviously be standardized as your proposal already does
> by exposing push_vlan, pop_vlan, etc.
>

At minimum you need a standardized primitive set that you can use to
define other actions. push/pop/set/get can be standardized. It may be
that hardware has more complicated actions that act as a single atomic
action performing an entire list of operations, set some fields + pop
headers for example. Consumers can sort out how these actions work by
looking at the list of primitives. It's then an optimization problem for
users of the API to "know" they have applied a set of actions that the
hardware could actually do in a single action.

Note this set of patches 00/12 did not define the primitives or use them
in the action definitions. I think that is follow-on work. Besides,
rocker doesn't have any actions of this type yet. A 'route' action might
be described using primitives, for example.

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:34                               ` John Fastabend
  2015-01-23 15:53                                 ` Jiri Pirko
@ 2015-01-23 17:46                                 ` Thomas Graf
  2015-01-23 19:59                                   ` John Fastabend
  2015-01-24 13:22                                   ` Jamal Hadi Salim
  2015-01-24 13:01                                 ` Jamal Hadi Salim
  2 siblings, 2 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 17:46 UTC (permalink / raw)
  To: John Fastabend, Jiri Pirko
  Cc: Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman, sfeldma,
	netdev, davem, gerlitz.or, andy, ast

I'm pulling in both branches of the thread here:

On 01/23/15 at 04:56pm, Jiri Pirko wrote:
> Fri, Jan 23, 2015 at 04:43:48PM CET, john.fastabend@gmail.com wrote:
> >But with the current API it's clear that the rules managed by the
> >Flow API are in front of 'tc' and 'ovs' on ingress. Just the same
> >as it is clear 'tc' ingress rules are walked before 'ovs' ingress
> >rules. On egress it is similarly clear that 'ovs' does a forward
> >rule to a netdev, then 'tc' filters+qdisc is run, and finally the
> >hardware flow api is hit.
> 
> 
> Seems like this would be resolved by the separate "offload" qdisc.

I'm not sure I understand the offload qdisc yet. My interpretation
so far is that it would contain children which *must* be offloaded.

How would one transparently offload tc in this model? e.g. let's
assume we have a simple prio qdisc with u32 cls:

eth0
  prio
      class
      class
      ...
    u32 ...
    u32 ...

Would you need to attach the prio to an "offload qdisc" to offload
it or would that happen automatically? How would this look to user
space?

eth0
  offload
    prio
      u32
      u32
  prio
   u32
   u32

Like this?

> >The cases I've been experimenting with using Flow API it is clear
> >on the priority and what rules are being used by looking at counters
> >and "knowing" the above pipeline mode.
> >
> >Although as I type this I think a picture would help and some
> >documentation.

+1

We need one of those awesome graphs as the netfilter guys had it with
where the hooks are attached to ;-)

On 01/23/15 at 07:34am, John Fastabend wrote:
> Now 'xflows' needs to implement the same get operations that exist in
> this flow API otherwise writing meaningful policies as Thomas points out
> is crude at best. So this tc classifier supports 'get headers',
> 'get actions', and 'get tables' and then their associated graphs. All
> good so far. This is just an embedding of the existing API in the 'tc'
> netlink family. I've never had any issues with this. Finally you build
> up the 'get_flow' and 'set_flow' operations; I still see no issue with
> this and it's just an embedding of the existing API into a 'tc
> classifier'. My flow tool becomes one of the classifier tools.

.... if we can get rid of the rtnl lock in the flow mod path ;-)

> Now what should I attach my filter to? Typically we attach it to qdiscs
> today. But what does that mean for a switch device? I guess I need an
> _offloaded qdisc_? I don't want to run the same qdisc in my dataplane
> of the switch as I run on the ports going into/out of the sw dataplane.
> Similarly I don't want to run the same set of filters. So at this point
> I have a set of qdiscs per port to represent the switch dataplane and
> a set of qdiscs attached to the software dataplane. If people think this
> is worth doing let's do it. It may get you a nice way to manage QOS while
> you're at it.

If I interpret this correctly then this would imply that each switch
port is represented with a net_device as this is what the tc API
understands.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 17:46                                 ` Thomas Graf
@ 2015-01-23 19:59                                   ` John Fastabend
  2015-01-23 23:16                                     ` Thomas Graf
  2015-01-24 13:22                                   ` Jamal Hadi Salim
  1 sibling, 1 reply; 66+ messages in thread
From: John Fastabend @ 2015-01-23 19:59 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/23/2015 09:46 AM, Thomas Graf wrote:
> I'm pulling in both branches of the thread here:
>
> On 01/23/15 at 04:56pm, Jiri Pirko wrote:
>> Fri, Jan 23, 2015 at 04:43:48PM CET, john.fastabend@gmail.com wrote:
>>> But with the current API it's clear that the rules managed by the
>>> Flow API are in front of 'tc' and 'ovs' on ingress. Just the same
>>> as it is clear 'tc' ingress rules are walked before 'ovs' ingress
>>> rules. On egress it is similarly clear that 'ovs' does a forward
>>> rule to a netdev, then 'tc' filters+qdisc is run, and finally the
>>> hardware flow api is hit.
>>
>>
>> Seems like this would be resolved by the separate "offload" qdisc.
>
> I'm not sure I understand the offload qdisc yet. My interpretation
> so far is that it would contain children which *must* be offloaded.

Correct, that is my suggestion.

_If_ we want to pursue an embedding inside tc/qdisc for the Flow API
then we need some structure to attach filters and qdisc's that _must_
be offloaded. I have cases where qdisc's on the software dataplane
will be entirely different from the qdisc/filter layout on the hardware
dataplane. If you don't do this you end up with a rather strange
array of filters that I don't see any way to unravel, especially
with filters like u32 that have many tables and hardware that has
many tables.

In these cases IMO it's going to be easiest to reason about the state
and how to configure it if you have two qdisc/filter attach points. One
for software and one for hardware.

>
> How would one transparently offload tc in this model? e.g. let's
> assume we have a simple prio qdisc with u32 cls:
>
> eth0
>    prio
>        class
>        class
>        ...
>      u32 ...
>      u32 ...
>
> Would you need to attach the prio to an "offload qdisc" to offload
> it or would that happen automatically? How would this look to user
> space?

My take is it doesn't happen transparently in general. The user space
has to add the qdisc then subsequently attach flows and actions
explicitly to the hardware qdisc. But I'm confused about what 'tc' has
to say about global pipelines; see below.

>
> eth0
>    offload
>      prio
>        u32
>        u32
>    prio
>     u32
>     u32
>
> Like this?
>

So if I try to do a mock 'tc' session first creating some software
QOS and filters,

   # tc qdisc add dev eth0 handle 8001: root mq <- add my mq sw qdisc
   # tc qdisc add dev eth0 parent 8001:1 fq_codel <- add my fq_codel qdiscs
   # tc qdisc add dev eth0 parent 8001:2 prio <- one per queue
	...
   # tc filter add dev eth0 parent 8001:2 \
        protocol ip prio 20 \
        u32 match ip protocol 1 0xff \
        action skbedit priority 1      <- arbitrary filter

    [...]

   Everything above is part of my software dataplane; next up, add some
   hw qdisc's and filters.

   # tc qdisc add dev eth0 handle hw_dpif: root mq <- add my mq hw qdisc
   # tc qdisc show
	[...] <- normal output
	qdisc mq (hw_dpif) 0: dev eth0 ...

So that seems OK to me: I have a multiqueue QOS object on top of a
netdev that represents the switch _port_.

But it starts to break when I want to add a filter to the flow table
pipeline, _not_ a qdisc on a port. The pipeline is shared between all
ports; it isn't a per-port queueing discipline, which is how the current
'tc' model works.

And here is where I stopped in my initial attempt and decided we needed
a new object, the Flow API. But let me try to push it further perhaps? So
I need something to represent the actual pipeline, not the per-port
qdisc. A new 'tc' object called 'tables' perhaps?

   # tc tables dev eth0 show
	[...]
       table: vlan:2
	src 1 apply 2 size -1
	matches:
	 in_lport [in_lport (lpm)]
	 vlan [vid (lpm)]
	actions:
	 set_vlan_id ( u16 vlan_id 0  )
	[...]

So the above is just selected output from the 'flow' tool giving a
table description. Then I can use the same syntax as my 'flow' tool but
embedded in 'tc':

  # tc tables dev eth0 set_rule prio 1 handle 4 table 2  \
     match in_lport.in_lport 1 0xffffffff		\
     action set_vlan_id 10

This could work but it's a very simple embedding of what I have now.

Also I can imagine another qdisc option to offload port filters/QOS
automagically from inside 'tc'. This could/should be done regardless
of whether the Flow API is embedded in 'tc', IMO. So we can have a bit,

# tc qdisc set dev eth0 handle 8001: offload

Then we can do some tests and offload flows and rules from 'tc'. But
I hope(?) it's clear it's not the same operation as the above 'tables'
command that I made up to represent the pipeline. The 'tables' command
above lets me work on the pipeline.


>>> The cases I've been experimenting with using Flow API it is clear
>>> on the priority and what rules are being used by looking at counters
>>> and "knowing" the above pipeline mode.
>>>
>>> Although as I type this I think a picture would help and some
>>> documentation.
>
> +1
>
> We need one of those awesome graphs as the netfilter guys had it with
> where the hooks are attached to ;-)

Yes, I'll try to draft something next week. I'm worried my above
example is a bit convoluted without it.

>
> On 01/23/15 at 07:34am, John Fastabend wrote:
>> Now 'xflows' needs to implement the same get operations that exist in
>> this flow API otherwise writing meaningful policies as Thomas points out
>> is crude at best. So this tc classifier supports 'get headers',
>> 'get actions', and 'get tables' and then their associated graphs. All
>> good so far. This is just an embedding of the existing API in the 'tc'
>> netlink family. I've never had any issues with this. Finally you build
>> up the 'get_flow' and 'set_flow' operations; I still see no issue with
>> this and it's just an embedding of the existing API into a 'tc
>> classifier'. My flow tool becomes one of the classifier tools.
>
> .... if we can get rid of the rtnl lock in the flow mod path ;-)

Well, isn't it the qdisc lock here? And it's not needed anymore for
filters/actions; only qdiscs use it, because they are not lock-safe
yet. It's been on my backlog to start replacing the skb lists with
lock-free rings but I haven't got anywhere on this yet.

Hardware doesn't really need a queueing discipline since queueing is
done in the hardware itself, so you could drop the qdisc lock in this case.

>
>> Now what should I attach my filter to? Typically we attach it to qdiscs
>> today. But what does that mean for a switch device? I guess I need an
>> _offloaded qdisc_? I don't want to run the same qdisc in my dataplane
>> of the switch as I run on the ports going into/out of the sw dataplane.
>> Similarly I don't want to run the same set of filters. So at this point
>> I have a set of qdiscs per port to represent the switch dataplane and
>> a set of qdiscs attached to the software dataplane. If people think this
>> is worth doing let's do it. It may get you a nice way to manage QOS while
>> you're at it.
>
> If I interpret this correctly then this would imply that each switch
> port is represented with a net_device as this is what the tc API
> understands.
>

I think this would work for QOS, but as I tried to illustrate above I'm
still confused about how the global pipeline fits into the 'tc' model
where everything is a port with queues.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 19:59                                   ` John Fastabend
@ 2015-01-23 23:16                                     ` Thomas Graf
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-23 23:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, Jamal Hadi Salim, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

[Skipping the tc model part for now. Need some time to digest
 all of that]

On 01/23/15 at 11:59am, John Fastabend wrote:
> On 01/23/2015 09:46 AM, Thomas Graf wrote:
> >.... if we can get rid of the rtnl lock in the flow mod path ;-)
> 
> Well, isn't it the qdisc lock here? And it's not needed anymore for
> filters/actions; only qdiscs use it, because they are not lock-safe
> yet. It's been on my backlog to start replacing the skb lists with
> lock-free rings but I haven't got anywhere on this yet.
> 
> Hardware doesn't really need a queueing discipline since queueing is
> done in the hardware itself, so you could drop the qdisc lock in this case.

I'm not even in the data path yet with that comment. I'm worried
with the locking in the control path as talking rtnetlink implies
taking rtnl for each flow modification.

Agreed that we wouldn't depend on the qdisc lock for offloaded flows.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 16:58           ` John Fastabend
  2015-01-23 10:49             ` Thomas Graf
@ 2015-01-24 12:29             ` Jamal Hadi Salim
  1 sibling, 0 replies; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 12:29 UTC (permalink / raw)
  To: John Fastabend, Thomas Graf
  Cc: Pablo Neira Ayuso, simon.horman, sfeldma, netdev, davem,
	gerlitz.or, andy, ast, Jiri Pirko

Sorry, I have been running around like a lunatic chicken so haven't
had time to join the fun discussion. I hope we can make progress at
the meeting.
I am going to skim and jump through the emails and comment.

On 01/22/15 11:58, John Fastabend wrote:
> On 01/22/2015 07:13 AM, Thomas Graf wrote:
>> On 01/22/15 at 10:00am, Jamal Hadi Salim wrote:

>
> Correct this is fully exposed to user space, but it is also self
> contained inside the API meaning I can learn when to use it and what it
> does by looking at the other operations tables the table graph and
> supported headers. The assumption I am making, which is not in the API
> explicitly yet, is that actions named "set_field_name" perform the
> set operation on that field. We can and plan to extend the API to make
> this assumption explicit in the API.
>

 From what you describe, you are running into a danger of going too low
level, such that the interface will end up weighing too much toward
flexibility/performance and less toward usability. If there is one lesson
I learnt from netfilter it is that usability counts for something. You don't
want another u32 API (otherwise Jiri wouldn't have to write that new
classifier; there is nothing he is doing that can't be done with
u32).

> In this case I can "learn" that I can match on group_id in some tables
> and then use the above action to set the group_id in others.
>

And this discoverability was part of my concern, especially when there is
no "stickiness" to the kernel, or Linux tooling for that matter, when
going direct to hardware. It is a tactical issue more than anything
else. With this approach, given a little bit of clue of course, you
really don't even care about compiling the kernel. Essentially the
barrier to entry for SDKs is immensely lowered.
SDK joy. I know you were intending to replace ethtool, but you are
replacing it with a turbo engine and we need to look at a much bigger
scope.

cheers,
jamal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-22 17:44                 ` Thomas Graf
@ 2015-01-24 12:34                   ` Jamal Hadi Salim
  2015-01-24 13:48                     ` Thomas Graf
  0 siblings, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 12:34 UTC (permalink / raw)
  To: Thomas Graf, Pablo Neira Ayuso
  Cc: John Fastabend, simon.horman, sfeldma, netdev, davem, gerlitz.or,
	andy, ast, Jiri Pirko

On 01/22/15 12:44, Thomas Graf wrote:
> On 01/22/15 at 05:49pm, Pablo Neira Ayuso wrote:

>
> You can achieve the exact same thing with an out of tree tc action,
> classifier or even a new link type. Nothing prevents an out of tree
> driver to register a new rtnetlink link type and do vendor specific
> crap.
>

They are not the same. The API lowers the barrier immensely; refer to
my response to John. And we are actually handing it to them.

 From an equivalence perspective:
You'd have to convince Dave to allow all those TOE vendors to get in
with their direct hardware APIs (if those guys are still around).

cheers,
jamal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 11:39                       ` Jiri Pirko
  2015-01-23 12:28                         ` Thomas Graf
@ 2015-01-24 12:36                         ` Jamal Hadi Salim
  1 sibling, 0 replies; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 12:36 UTC (permalink / raw)
  To: Jiri Pirko, Thomas Graf
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast

On 01/23/15 06:39, Jiri Pirko wrote:

> Maybe I did not express myself correctly. I do not care if this is
> exposed by rtnl or a separate genetlink. The issue still stands. And the
> issue is that the user has to use "the way A" to setup sw datapath and
> "the way B" to setup hw datapath. The preferable would be to have
> "the way X" which can be used to setup both sw and hw.
>
> And I believe that could be achieved. Consider something like this:
>
> - have cls_xflows tc classifier and act_xflows tc action as a wrapper
>    (or api) for John's work. With possibility for multiple backends. The
>    backend iface would look very similar to what John has now.
> - other tc clses and acts will implement xflows backend
> - openvswitch datapath will implement xflows backend
> - rocker switch will implement xflows backend
> - other drivers will implement xflows backend
>
> Now if a user wants to manipulate any flow setting, he can just use
> cls_xflows and act_xflows to do that.
>
> This is very rough, but I just wanted to draw the picture. This would
> provide a single entry point to flow manipulation in the kernel, no matter
> if sw or hw.
>
> Thoughts?
>


Exactly my thinking as well.
I guess skipping a few emails helps.

cheers,
jamal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 15:34                               ` John Fastabend
  2015-01-23 15:53                                 ` Jiri Pirko
  2015-01-23 17:46                                 ` Thomas Graf
@ 2015-01-24 13:01                                 ` Jamal Hadi Salim
  2015-01-26  8:26                                   ` Simon Horman
  2 siblings, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 13:01 UTC (permalink / raw)
  To: John Fastabend, Thomas Graf
  Cc: Jiri Pirko, Pablo Neira Ayuso, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast

On 01/23/15 10:34, John Fastabend wrote:

> First 'tc' infrastructure doesn't have any classifier that would map
> well to this today so you are talking about a new classifier; looks like
> Jiri is calling it xflows. This is fine.

I know you know this (and apologies for the little Australian Bike
Shed tangent):
You can do _any_ classifier you want. xflows just happens to make the
OF people happy. Someone else who wants to classify on pcre-like
strings can go ahead and write another one.
I.e., there is no monopoly on what a classifier should be.

> Now 'xflows' needs to implement the same get operations that exist in
> this flow API otherwise writing meaningful policies as Thomas points out
> is crude at best.

It is crude only if you assume the kernel is doing your policies
and fixing any conflicts. Let the kernel do mechanisms and have user
space do the brainy part. No need to give total autonomy to the kernel.


> So this tc classifier supports 'get headers',
> 'get actions', and 'get tables' and then their associated graphs. All
> good so far. This is just an embedding of the existing API in the 'tc'
> netlink family. I've never had any issues with this. Finally you build
> up the 'get_flow' and 'set_flow' operations; I still see no issue with
> this and it's just an embedding of the existing API into a 'tc
> classifier'. My flow tool becomes one of the classifier tools.
>

You have very few generic verbs really within tc and I don't see
much more needed:
GET/SET (mods for create/append/replace)/DEL with the object
being a noun. Add a handful of capability-exercising verbs and
you should be on your way.
BTW: I did have capabilities in actions for years but Cong sent a
patch about a year or so ago to kill them because they were not being
exercised from user space tc - I protested but Dave overruled me.
There are still remnants - look at struct tcf_common field
tcfc_capab - the original intent was to have that look like the netdev
features bitmask. In any case I never got to a proper implementation,
and I have gained a lot of experience since those early days
and my thinking has changed.


> Now what should I attach my filter to? Typically we attach it to qdiscs
> today. But what does that mean for a switch device? I guess I need an
> _offloaded qdisc_? I don't want to run the same qdisc in my dataplane
> of the switch as I run on the ports going into/out of the sw dataplane.

I don't know if you necessarily need a qdisc that sits in hardware,
but you need to anchor your policy somewhere. The ingress qdisc is
really a dummy for this purpose. It is the beginning of the pipeline.
Most of the hardware I have looked at has some anchor point for the
hardware ACLs, typically around a queue or a port. Sometimes I find it
hard to use this model because of vendor SDKs and the APIs they offer.

> Similarly I don't want to run the same set of filters. So at this point
> I have a set of qdiscs per port to represent the switch dataplane and
> a set of qdiscs attached to the software dataplane. If people think this
> is worth doing let's do it. It may get you a nice way to manage QoS while
> you're at it.
>

Let's discuss at the meeting. I am just skimming these emails (the
conference is chewing a lot of my time so I will mostly be absent).
Sorry if I am not responding to some things.

cheers,
jamal



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 16:16                                     ` Jiri Pirko
@ 2015-01-24 13:04                                       ` Jamal Hadi Salim
  0 siblings, 0 replies; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 13:04 UTC (permalink / raw)
  To: Jiri Pirko, Thomas Graf
  Cc: John Fastabend, Pablo Neira Ayuso, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast

I am not big on high fives but here's my +1
Excellent diagram below.

cheers,
jamal
On 01/23/15 11:16, Jiri Pirko wrote:
>
> As I wrote earlier, the value is that userspace can easily use single
> xflows api to take care of all ways to handle flows (ovs kernel dp,
> rocker, other device, u32 tc filter + actions, you name it)
>
>
>      my flow managing app
>            |
> uspc      |
>    --------|----------------------------------------------------
> krnl      |
>         tc xflows api
>            |  |  |
>            |  |  ---------------------------------------------------
>            |  |                                                    |
>            |  ------------------                                other xflows backend
>            |                   |
>       ovs xflows backend     rocker driver xflows backend
>            |                   |
>           ovs dp               |
> krnl	                      |
>    ----------------------------|--------------------------------
> hw                            |
>                             rocker switch
>


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-23 17:46                                 ` Thomas Graf
  2015-01-23 19:59                                   ` John Fastabend
@ 2015-01-24 13:22                                   ` Jamal Hadi Salim
  2015-01-24 13:34                                     ` Thomas Graf
  1 sibling, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-24 13:22 UTC (permalink / raw)
  To: Thomas Graf, John Fastabend, Jiri Pirko
  Cc: Pablo Neira Ayuso, simon.horman, sfeldma, netdev, davem,
	gerlitz.or, andy, ast

On 01/23/15 12:46, Thomas Graf wrote:

> I'm not sure I understand the offload qdisc yet. My interpretation
> so far is that it would contain childs which *must* be offloaded.
>
> How would one transparently offload tc in this model? e.g. let's
> assume we have a simple prio qdisc with u32 cls:
>
> eth0
>    prio
>        class
>        class
>        ...
>      u32 ...
>      u32 ...
>

My view is:
It is up to user space to decide on what the policy should do.
The kernel is not paid to think. You tell it what to do and it does it
efficiently. So if you are going to tell it to have a mix and match
of some things to execute in hardware and some in software then
it may shoot someone's big toe.
IOW, user space should decide how a packet is going to flow.
Agreed that we would need a good way to provide this knowledge
to user space.
BTW: Thomas, reading your other email quickly:
the idea that metadata would be carried around on the OF pipeline and
some script at the end executes the actions is imo a hardware
pipeline hack limitation. Why do I want to defer dropping a packet
when some action is telling me to drop it? ;->
For some reason, brcm hardware in particular requires that I
complete the pipeline first.
I don't know why we need such a limitation in s/ware (and tc will kill
the pipeline when needed).

Sorry, trying to post while doing other things so not paying close
attention to possibly other important details.

cheers,
jamal


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-24 13:22                                   ` Jamal Hadi Salim
@ 2015-01-24 13:34                                     ` Thomas Graf
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-24 13:34 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: John Fastabend, Jiri Pirko, Pablo Neira Ayuso, simon.horman,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/24/15 at 08:22am, Jamal Hadi Salim wrote:
> It is up to user space to decide on what the policy should do.
> The kernel is not paid to think. You tell it what to do and it does it
> efficiently. So if you are going to tell it to have a mix and match
> of some things to execute in hardware and some in software then
> it may shoot someone's big toe.

OK. We seem to agree on this part. In order to do so, user space needs
to know about hardware capabilities. If that should happen through
tc, so be it. John raised some open questions around this, and the
rtnl lock is currently a blocker on this architecture as well.

> IOW, user space should decide how a packet is going to flow.
> Agreed that we would need a good way to provide this knowledge
> to user space.
> BTW: Thomas, reading your other email quickly:
> the idea that metadata would be carried around on OF pipeline and
> some script at the end executes the actions is imo  a hardware
> pipeline hack limitation. Why do i want to defer dropping a packet
> when some action is telling me to drop it? ;->

There is obviously no reason to defer a drop.

An example of deferred actions would be if only certain tables allow
certain actions but the matching to choose the action is done in a
previous table. Or if you have multiple tables matching on the
original packet header and you need to defer the L2/L3 rewrite until
all matching and action construction is done.

> For some reason, brcm hardware in particulat requires that i
> complete the pipeline first.
> I dont know why we need such a limitation in s/ware (and tc will kill
> the pipeline when needed).

Not sure what "killing the pipeline" means ;-)


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-24 12:34                   ` Jamal Hadi Salim
@ 2015-01-24 13:48                     ` Thomas Graf
  0 siblings, 0 replies; 66+ messages in thread
From: Thomas Graf @ 2015-01-24 13:48 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, John Fastabend, simon.horman, sfeldma, netdev,
	davem, gerlitz.or, andy, ast, Jiri Pirko

On 01/24/15 at 07:34am, Jamal Hadi Salim wrote:
> They are not the same. The API lowers the barrier immensely; refer to
> my response to John. And we are actually handing it to them.
> 
> From an equivalence perspective:
> You'd have to convince Dave to allow all those TOE vendors to get in
> with their direct hardware APIs (if those guys are still around).

I'm not advocating TOE; I'm not sure why that pops up. I made the point
that various ways already exist for an out-of-tree driver to expose
a Netlink or other interface which provides direct access to the
hardware, and that we can't prevent that.

Anyway, it looks like you agree with the general direction the thread
has taken with Jiri's input.


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-24 13:01                                 ` Jamal Hadi Salim
@ 2015-01-26  8:26                                   ` Simon Horman
  2015-01-26 12:26                                     ` Jamal Hadi Salim
  0 siblings, 1 reply; 66+ messages in thread
From: Simon Horman @ 2015-01-26  8:26 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: John Fastabend, Thomas Graf, Jiri Pirko, Pablo Neira Ayuso,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On Sat, Jan 24, 2015 at 08:01:52AM -0500, Jamal Hadi Salim wrote:

[snip]

> Lets discuss at the meeting. I am just skimming these emails (the
> conference is chewing a lot of my time so i will mostly be absent).
> Sorry if i am not responding to some things.

Is "the meeting" the Hardware Offloading BoF at Netdev 1.0?
For the benefit of others: https://www.netdev01.org/sessions/10


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-26  8:26                                   ` Simon Horman
@ 2015-01-26 12:26                                     ` Jamal Hadi Salim
  2015-01-27  4:28                                       ` David Ahern
  0 siblings, 1 reply; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-26 12:26 UTC (permalink / raw)
  To: Simon Horman
  Cc: John Fastabend, Thomas Graf, Jiri Pirko, Pablo Neira Ayuso,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/26/15 03:26, Simon Horman wrote:
> On Sat, Jan 24, 2015 at 08:01:52AM -0500, Jamal Hadi Salim wrote:
>
> [snip]
>
>> Lets discuss at the meeting. I am just skimming these emails (the
>> conference is chewing a lot of my time so i will mostly be absent).
>> Sorry if i am not responding to some things.
>
> Is "the meeting" the Hardware Offloading BoF at Netdev 1.0?
> For the benefit of others: https://www.netdev01.org/sessions/10
>

Yes, that is the plan. Sorry - the statement was addressed to the usual
suspects who probably understood the context.

cheers,
jamal


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-26 12:26                                     ` Jamal Hadi Salim
@ 2015-01-27  4:28                                       ` David Ahern
  2015-01-27  4:58                                         ` Andy Gospodarek
  0 siblings, 1 reply; 66+ messages in thread
From: David Ahern @ 2015-01-27  4:28 UTC (permalink / raw)
  To: Jamal Hadi Salim, Simon Horman
  Cc: John Fastabend, Thomas Graf, Jiri Pirko, Pablo Neira Ayuso,
	sfeldma, netdev, davem, gerlitz.or, andy, ast

On 1/26/15 5:26 AM, Jamal Hadi Salim wrote:
> On 01/26/15 03:26, Simon Horman wrote:
>> On Sat, Jan 24, 2015 at 08:01:52AM -0500, Jamal Hadi Salim wrote:
>>> Lets discuss at the meeting. I am just skimming these emails (the
>>> conference is chewing a lot of my time so i will mostly be absent).
>>> Sorry if i am not responding to some things.
>>
>> Is "the meeting" the Hardware Offloading BoF at Netdev 1.0?
>> For the benefit of others: https://www.netdev01.org/sessions/10
>>
>
> Yes, that is the plan. Sorry - the statement was addressed at the usual
> suspects who probably understood the context.

Will someone be taking copious notes or recording the sessions and then 
making those available for those not in attendance?

David


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-27  4:28                                       ` David Ahern
@ 2015-01-27  4:58                                         ` Andy Gospodarek
  2015-01-27 15:54                                           ` Jamal Hadi Salim
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Gospodarek @ 2015-01-27  4:58 UTC (permalink / raw)
  To: David Ahern
  Cc: Jamal Hadi Salim, Simon Horman, John Fastabend, Thomas Graf,
	Jiri Pirko, Pablo Neira Ayuso, sfeldma, netdev, davem,
	gerlitz.or, andy, ast

On Mon, Jan 26, 2015 at 09:28:57PM -0700, David Ahern wrote:
> On 1/26/15 5:26 AM, Jamal Hadi Salim wrote:
> >On 01/26/15 03:26, Simon Horman wrote:
> >>On Sat, Jan 24, 2015 at 08:01:52AM -0500, Jamal Hadi Salim wrote:
> >>>Lets discuss at the meeting. I am just skimming these emails (the
> >>>conference is chewing a lot of my time so i will mostly be absent).
> >>>Sorry if i am not responding to some things.
> >>
> >>Is "the meeting" the Hardware Offloading BoF at Netdev 1.0?
> >>For the benefit of others: https://www.netdev01.org/sessions/10
> >>
> >
> >Yes, that is the plan. Sorry - the statement was addressed at the usual
> >suspects who probably understood the context.
> 
> Will someone be taking copious notes or recording the sessions and then
> making those available for those not in attendance?
> 

I'm not sure if the sessions will be recorded, but notes will be taken
for those wise enough not to come to Ottawa this time of year.  :)


* Re: [net-next PATCH v3 00/12] Flow API
  2015-01-27  4:58                                         ` Andy Gospodarek
@ 2015-01-27 15:54                                           ` Jamal Hadi Salim
  0 siblings, 0 replies; 66+ messages in thread
From: Jamal Hadi Salim @ 2015-01-27 15:54 UTC (permalink / raw)
  To: Andy Gospodarek, David Ahern
  Cc: Simon Horman, John Fastabend, Thomas Graf, Jiri Pirko,
	Pablo Neira Ayuso, sfeldma, netdev, davem, gerlitz.or, andy, ast

On 01/26/15 23:58, Andy Gospodarek wrote:
> On Mon, Jan 26, 2015 at 09:28:57PM -0700, David Ahern wrote:

>> Will someone be taking copious notes or recording the sessions and then
>> making those available for those not in attendance?
>>
>
> I'm not sure if the sessions will be recorded, but notes will be taken
> for those wise enough not to come to Ottawa this time of year.  :)
>

We hope the content is hot enough that it will melt all the ice around
the building, but not down the street.
Tourists are welcome as well ;->

We are soliciting volunteers to shoot videos that we can post.
Having volunteers take notes certainly is an excellent idea.

cheers,
jamal


end of thread, other threads:[~2015-01-27 15:54 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
2015-01-20 20:26 [net-next PATCH v3 00/12] Flow API John Fastabend
2015-01-20 20:26 ` [net-next PATCH v3 01/12] net: flow_table: create interface for hw match/action tables John Fastabend
2015-01-22  4:37   ` Simon Horman
2015-01-20 20:27 ` [net-next PATCH v3 02/12] net: flow_table: add rule, delete rule John Fastabend
2015-01-20 20:27 ` [net-next PATCH v3 03/12] net: flow: implement flow cache for get routines John Fastabend
2015-01-20 20:27 ` [net-next PATCH v3 04/12] net: flow_table: create a set of common headers and actions John Fastabend
2015-01-20 20:59   ` John W. Linville
2015-01-20 22:10     ` John Fastabend
2015-01-20 20:28 ` [net-next PATCH v3 05/12] net: flow_table: add validation functions for rules John Fastabend
2015-01-20 20:28 ` [net-next PATCH v3 06/12] net: rocker: add pipeline model for rocker switch John Fastabend
2015-01-20 20:29 ` [net-next PATCH v3 07/12] net: rocker: add set rule ops John Fastabend
2015-01-20 20:29 ` [net-next PATCH v3 08/12] net: rocker: add group_id slices and drop explicit goto John Fastabend
2015-01-20 20:30 ` [net-next PATCH v3 09/12] net: rocker: add multicast path to bridging John Fastabend
2015-01-20 20:30 ` [net-next PATCH v3 10/12] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
2015-01-20 20:31 ` [net-next PATCH v3 11/12] net: rocker: have flow api calls set cookie value John Fastabend
2015-01-20 20:31 ` [net-next PATCH v3 12/12] net: rocker: implement delete flow routine John Fastabend
2015-01-22 12:52 ` [net-next PATCH v3 00/12] Flow API Pablo Neira Ayuso
2015-01-22 13:37   ` Thomas Graf
2015-01-22 14:00     ` Pablo Neira Ayuso
2015-01-22 15:00       ` Jamal Hadi Salim
2015-01-22 15:13         ` Thomas Graf
2015-01-22 15:28           ` Jamal Hadi Salim
2015-01-22 15:37             ` Thomas Graf
2015-01-22 15:44               ` Jamal Hadi Salim
2015-01-23 10:10                 ` Thomas Graf
2015-01-23 10:24                   ` Jiri Pirko
2015-01-23 11:08                     ` Thomas Graf
2015-01-23 11:39                       ` Jiri Pirko
2015-01-23 12:28                         ` Thomas Graf
2015-01-23 13:43                           ` Jiri Pirko
2015-01-23 14:07                             ` Thomas Graf
2015-01-23 15:25                               ` Jiri Pirko
2015-01-23 15:43                                 ` John Fastabend
2015-01-23 15:56                                   ` Jiri Pirko
2015-01-23 15:49                                 ` Thomas Graf
2015-01-23 16:00                                   ` Jiri Pirko
2015-01-23 15:34                               ` John Fastabend
2015-01-23 15:53                                 ` Jiri Pirko
2015-01-23 16:00                                   ` Thomas Graf
2015-01-23 16:08                                     ` John Fastabend
2015-01-23 16:16                                     ` Jiri Pirko
2015-01-24 13:04                                       ` Jamal Hadi Salim
2015-01-23 17:46                                 ` Thomas Graf
2015-01-23 19:59                                   ` John Fastabend
2015-01-23 23:16                                     ` Thomas Graf
2015-01-24 13:22                                   ` Jamal Hadi Salim
2015-01-24 13:34                                     ` Thomas Graf
2015-01-24 13:01                                 ` Jamal Hadi Salim
2015-01-26  8:26                                   ` Simon Horman
2015-01-26 12:26                                     ` Jamal Hadi Salim
2015-01-27  4:28                                       ` David Ahern
2015-01-27  4:58                                         ` Andy Gospodarek
2015-01-27 15:54                                           ` Jamal Hadi Salim
2015-01-24 12:36                         ` Jamal Hadi Salim
2015-01-22 15:48               ` Jiri Pirko
2015-01-22 17:58                 ` Thomas Graf
2015-01-22 16:49               ` Pablo Neira Ayuso
2015-01-22 17:10                 ` John Fastabend
2015-01-22 17:44                 ` Thomas Graf
2015-01-24 12:34                   ` Jamal Hadi Salim
2015-01-24 13:48                     ` Thomas Graf
2015-01-23  9:00                 ` David Miller
2015-01-22 16:58           ` John Fastabend
2015-01-23 10:49             ` Thomas Graf
2015-01-23 16:42               ` John Fastabend
2015-01-24 12:29             ` Jamal Hadi Salim
