* [net-next PATCH v1 00/11] A flow API
@ 2014-12-31 19:45 John Fastabend
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
                   ` (15 more replies)
  0 siblings, 16 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:45 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

So... I could continue to mull over this and tweak bits and pieces
here and there, but I decided it's best to get a wider group of folks
looking at it and, with any luck, using it. So here it is.

This set creates a new netlink family and set of messages to configure
flow tables in hardware. I tried to make the commit messages
reasonably verbose at least in the flow_table patches.

What we get at the end of this series is a working API to get device
capabilities and program flows using the rocker switch.

I created a user space tool 'flow' that I use to configure and query
the devices it is posted here,

	https://github.com/jrfastab/iprotue2-flow-tool

For now it is a stand-alone tool but once the kernel bits get sorted
out (I'm guessing there will need to be a few versions of this series
to get it right) I would like to port it into the iproute2 package.
This way we can keep all of our tooling in one package; see
'bridge' for example.

As far as testing, I've tested various combinations of tables and
rules on the rocker switch and it seems to work. I have not tested
100% of the rocker code paths though. It would be great to get some
sort of automated framework around the API to do this, but I don't
think that should gate the inclusion of the API.

I could use some help reviewing,

  (a) error paths and netlink validation code paths

  (b) The breakdown of structures vs netlink attributes. I
      am trying to balance the flexibility given by
      netlink TLV attributes against conciseness, so some
      things are passed as structures.

  (c) Are there any devices with pipelines that we
      can't represent with this API? It would be good to
      know about these so we can design for them, probably
      in a future series.

For some examples and a bit more illustrative description, I
posted a quickly typed up set of notes on github io pages. There we
can show the description along with images produced by the flow tool
showing the pipeline. Once we settle a bit more on the API we should
probably clean up this and the other ongoing threads and commit
something to the Documentation directory.

 http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html

Finally I have more patches to add support for creating and destroying
tables. This allows users to define the pipeline at runtime rather
than statically as rocker does now. After this set gets some traction
I'll look at pushing them in a next round. However it likely requires
adding another "world" to rocker. Another piece that I want to add is
a description of the actions and metadata. This way user space can
"learn" what an action is and how metadata interacts with the system.
This work is under development.

Thanks! Any comments/feedback always welcome.

And also thanks to everyone who helped with this flow API so far. All
the folks at Dusseldorf LPC, OVS summit Santa Clara, P4 authors for
some inspiration, the collection of IETF FoRCES documents I mulled
over, Netfilter workshop where I started to realize fixing ethtool
was most likely not going to work, etc.

---

John Fastabend (11):
      net: flow_table: create interface for hw match/action tables
      net: flow_table: add flow, delete flow
      net: flow_table: add apply action argument to tables
      rocker: add pipeline model for rocker switch
      net: rocker: add set flow rules
      net: rocker: add group_id slices and drop explicit goto
      net: rocker: add multicast path to bridging
      net: rocker: add get flow API operation
      net: rocker: add cookie to group acls and use flow_id to set cookie
      net: rocker: have flow api calls set cookie value
      net: rocker: implement delete flow routine


 drivers/net/ethernet/rocker/rocker.c          | 1641 +++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker_pipeline.h |  793 ++++++++++++
 include/linux/if_flow.h                       |  115 ++
 include/linux/netdevice.h                     |   20 
 include/uapi/linux/if_flow.h                  |  413 ++++++
 net/Kconfig                                   |    7 
 net/core/Makefile                             |    1 
 net/core/flow_table.c                         | 1339 ++++++++++++++++++++
 8 files changed, 4312 insertions(+), 17 deletions(-)
 create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
 create mode 100644 include/linux/if_flow.h
 create mode 100644 include/uapi/linux/if_flow.h
 create mode 100644 net/core/flow_table.c

-- 
Signature


* [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
@ 2014-12-31 19:45 ` John Fastabend
  2014-12-31 20:10   ` John Fastabend
                     ` (2 more replies)
  2014-12-31 19:46 ` [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow John Fastabend
                   ` (14 subsequent siblings)
  15 siblings, 3 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:45 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Currently, we do not have an interface to query hardware and learn
the capabilities of the device. This makes it very difficult to use
hardware flow tables.

At the moment the only interface we have to work with hardware flow
tables is ethtool. This has many deficiencies: first, it is ioctl
based, making it difficult to use in systems that need to monitor
interfaces because there is no support for multicast, notifiers, etc.

The next big gap is that it doesn't support querying devices for
capabilities. The only way to learn what the hardware supports is a
"try and see" operation. An error perhaps indicates the device cannot
support your request, but it could also be for other reasons, a full
table for example. The existing flow interface only supports a single
ingress table, which is sufficient for some of the existing NIC host
interfaces but limiting for more advanced NIC interfaces and switch
devices.

Also, it is not extensible without recompiling both drivers and core
interfaces. It may be possible to reprogram a device with additional
header types, new protocols, whatever, and it would be great if the
flow table infrastructure could handle this.

So this patch scraps the ethtool flow classifier interface and
creates a new flow table interface. It is expected that devices that
support the existing ethtool interface today can support both
interfaces without too much difficulty. I did a proof point on the
ixgbe driver, choosing ixgbe only because I have an 82599 10Gbps
device in my development system. A more thorough implementation
was done for the rocker switch showing how to use the interface.

In this patch we create interfaces to get the headers a device
supports, the actions it supports, a header graph showing the
relationship between headers the device supports, the tables
supported by the device and how they are connected.

This patch _only_ provides the get routines in an attempt to
make the patch sequence manageable.

get_headers :

   report a set of headers/fields the device supports. These
   are specified as lengths/offsets so we can support standard
   protocols or vendor specific headers. This is more flexible
   than bitmasks of pre-defined packet types. In 'tc' for example
   I may use u32 to match on proprietary or vendor specific fields.
   A bitmask approach does not allow for this, but defining the
   fields as a set of offsets and lengths does.

   A device that supports OpenFlow version 1.x for example could
   provide the set of fields/offsets that are equivalent to the
   specification.

   One property of this type of interface is I don't have to
   rebuild my kernel/driver header interfaces, etc to support the
   latest and greatest trendy protocol foo.

   For some types of metadata the device understands, we also
   use header fields to represent them. One example of this is
   an ingress_port metadata field to report the
   port a packet was received on. At the moment we expect the
   metadata fields to be defined outside the interface. We can
   standardize on common ones such as "ingress_port" across devices.

   Some examples of outside definitions specifying metadata
   might be OVS, internal definitions like skb->mark, or some
   FoRCES definitions.

get_header_graph :

   Simply providing the header/field offsets I support is not
   sufficient to learn how many nested 802.1Q tags I can support,
   or other similar cases where the ordering of headers matters.

   So we use this operation to query the device for a header
   graph showing how the headers need to be related.
   With this operation and the 'get_headers' operation you can
   interrogate the driver with questions like "do you support
   Q'in'Q?", "how many VLAN tags can I nest before the parser
   breaks?", "Do you support MPLS?", "How about Foo Header in
   a VXLAN tunnel?".

get_actions :

   Report a list of actions supported by the device along with the
   arguments they take. So the "drop_packet" action takes no arguments
   and the "set_field" action takes two arguments, a field and a value.

   This suffers again from being slightly opaque. Meaning if a device
   reports back action "foo_bar" with three arguments, how do I as a
   consumer of this "know" what that action is? The easy thing to do
   is punt on it and say it should be described outside the driver
   somewhere. OVS for example defines a set of actions. If my FoRCES
   quick read is correct, they define actions using text in the
   messaging interface. A follow up patch series could use a
   description language to describe actions, possibly using something
   from eBPF or nftables for example. This patch does not try to
   solve the issue now and expects actions to be defined outside the
   API or to be well known.

get_tables :

   Hardware may support one or more tables. Each table supports a set
   of matches and a set of actions. The match fields supported are
   defined above by the 'get_headers' operations. Similarly the actions
   supported are defined by the 'get_actions' operation.

   This allows the hardware to report several tables all with distinct
   capabilities. Tables also have table attributes used to describe
   features of the table. Because netlink messages are TLV based we
   can easily add new table attributes as needed.

   Currently a table has two attributes, size and source. The size
   indicates how many "slots" are in the table for flow entries. One
   caveat here is that a rule in the flow table may consume multiple
   slots in the table. We deal with this in a subsequent patch.

   The source field is used to indicate table boundaries where actions
   are applied. A table will not "see" the actions of tables with the
   same source value. An example where this is relevant would be an
   action to rewrite the destination IP address of a packet. If you
   have a match rule in a table with the same source that matches on
   the new IP address, it will not be hit. However, if the rule is in
   a table with a different source value _and_ that table gets
   applied, the rule will be hit. See the next operation for querying
   table ordering.

   Some basic hardware may only support a single table which simplifies
   some things. But even the simple 10/40Gbps NICs support multiple
   tables and different tables depending on ingress/egress.

get_table_graph :

   When a device supports multiple tables we need to identify how the
   tables are connected when each table is executed.

   To do this we provide a table graph which gives the pipeline of the
   device. The graph gives nodes representing each table and the edges
   indicate the criteria to progress to the next flow table. There are
   examples of this type of thing in both FoRCES and OVS. OVS
   prescribes a set of tables reachable with goto actions and FoRCES a
   slightly more flexible arrangement. In software tc's u32 classifier
   allows "linking" hash tables together. The OVS dataplane with the
   support of 'goto' action is completely connected. Without the
   'goto' action the tables are progressed linearly.

   By querying the graph from hardware we can "learn" what table flows
   are supported and map them into software.

   We also provide a bit to indicate if the node is a root node of the
   ingress pipeline or egress pipeline. This is used on devices that
   have different pipelines for ingress and egress, which appears to
   be fairly common. The realtek chip presented at LPC in Dusseldorf
   for example appeared to have a separate ingress/egress pipeline.

With these five operations software can learn what types of fields
the hardware flow table supports and how they are arranged. Subsequent
patches will address programming the flow tables.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h      |   93 +++++
 include/linux/netdevice.h    |   12 +
 include/uapi/linux/if_flow.h |  363 ++++++++++++++++++
 net/Kconfig                  |    7 
 net/core/Makefile            |    1 
 net/core/flow_table.c        |  837 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1313 insertions(+)
 create mode 100644 include/linux/if_flow.h
 create mode 100644 include/uapi/linux/if_flow.h
 create mode 100644 net/core/flow_table.c

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
new file mode 100644
index 0000000..1b6c1ea
--- /dev/null
+++ b/include/linux/if_flow.h
@@ -0,0 +1,93 @@
+/*
+ * include/linux/if_flow.h - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+#ifndef _IF_FLOW_H
+#define _IF_FLOW_H
+
+#include <uapi/linux/if_flow.h>
+
+/**
+ * @struct net_flow_header
+ * @brief defines a match (header/field) an endpoint can use
+ *
+ * @uid unique identifier for header
+ * @field_sz number of fields in the set
+ * @fields the set of fields in the net_flow_header
+ */
+struct net_flow_header {
+	char name[NET_FLOW_NAMSIZ];
+	int uid;
+	int field_sz;
+	struct net_flow_field *fields;
+};
+
+/**
+ * @struct net_flow_action
+ * @brief a description of an endpoint defined action
+ *
+ * @name printable name
+ * @uid unique action identifier
+ * @args NET_FLOW_ACTION_ARG_TYPE_NULL terminated list of action arguments
+ */
+struct net_flow_action {
+	char name[NET_FLOW_NAMSIZ];
+	int uid;
+	struct net_flow_action_arg *args;
+};
+
+/**
+ * @struct net_flow_table
+ * @brief define flow table with supported match/actions
+ *
+ * @uid unique identifier for table
+ * @source uid of parent table
+ * @size max number of entries for table or -1 for unbounded
+ * @matches null terminated set of supported match types given by match uid
+ * @actions null terminated set of supported action types given by action uid
+ * @flows set of flows
+ */
+struct net_flow_table {
+	char name[NET_FLOW_NAMSIZ];
+	int uid;
+	int source;
+	int size;
+	struct net_flow_field_ref *matches;
+	int *actions;
+};
+
+/* net_flow_hdr_node: node in a header graph of header fields.
+ *
+ * @uid : unique id of the graph node
+ * @hdrs : identify the headers that can be handled by this node
+ * @jump : net_flow_jump_table giving a case jump statement
+ */
+struct net_flow_hdr_node {
+	char name[NET_FLOW_NAMSIZ];
+	int uid;
+	int *hdrs;
+	struct net_flow_jump_table *jump;
+};
+
+struct net_flow_tbl_node {
+	int uid;
+	__u32 flags;
+	struct net_flow_jump_table *jump;
+};
+#endif
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 29c92ee..3c3c856 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,11 @@
 #include <linux/neighbour.h>
 #include <uapi/linux/netdevice.h>
 
+#ifdef CONFIG_NET_FLOW_TABLES
+#include <linux/if_flow.h>
+#include <uapi/linux/if_flow.h>
+#endif
+
 struct netpoll_info;
 struct device;
 struct phy_device;
@@ -1186,6 +1191,13 @@ struct net_device_ops {
 	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
 							      u8 state);
 #endif
+#ifdef CONFIG_NET_FLOW_TABLES
+	struct net_flow_action  **(*ndo_flow_get_actions)(struct net_device *dev);
+	struct net_flow_table	**(*ndo_flow_get_tables)(struct net_device *dev);
+	struct net_flow_header	**(*ndo_flow_get_headers)(struct net_device *dev);
+	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
+	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
+#endif
 };
 
 /**
diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
new file mode 100644
index 0000000..2acdb38
--- /dev/null
+++ b/include/uapi/linux/if_flow.h
@@ -0,0 +1,363 @@
+/*
+ * include/uapi/linux/if_flow.h - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+/* Netlink description:
+ *
+ * Table definition used to describe running tables. The following
+ * describes the netlink message returned from a flow API messages.
+ *
+ * Flow table definitions used to define tables.
+ *
+ * [NET_FLOW_TABLE_IDENTIFIER_TYPE]
+ * [NET_FLOW_TABLE_IDENTIFIER]
+ * [NET_FLOW_TABLE_TABLES]
+ *     [NET_FLOW_TABLE]
+ *       [NET_FLOW_TABLE_ATTR_NAME]
+ *       [NET_FLOW_TABLE_ATTR_UID]
+ *       [NET_FLOW_TABLE_ATTR_SOURCE]
+ *       [NET_FLOW_TABLE_ATTR_SIZE]
+ *	 [NET_FLOW_TABLE_ATTR_MATCHES]
+ *	   [NET_FLOW_FIELD_REF]
+ *	   [NET_FLOW_FIELD_REF]
+ *	     [...]
+ *	   [...]
+ *	 [NET_FLOW_TABLE_ATTR_ACTIONS]
+ *	   [NET_FLOW_ACTION]
+ *	     [NET_FLOW_ACTION_ATTR_NAME]
+ *	     [NET_FLOW_ACTION_ATTR_UID]
+ *	     [NET_FLOW_ACTION_ATTR_SIGNATURE]
+ *		 [NET_FLOW_ACTION_ARG]
+ *	         [NET_FLOW_ACTION_ARG]
+ *	         [...]
+ *	   [NET_FLOW_ACTION]
+ *	     [...]
+ *	   [...]
+ *     [NET_FLOW_TABLE]
+ *       [...]
+ *
+ * Header definitions used to define headers with user friendly
+ * names.
+ *
+ * [NET_FLOW_TABLE_HEADERS]
+ *   [NET_FLOW_HEADER]
+ *	[NET_FLOW_HEADER_ATTR_NAME]
+ *	[NET_FLOW_HEADER_ATTR_UID]
+ *	[NET_FLOW_HEADER_ATTR_FIELDS]
+ *	  [NET_FLOW_HEADER_ATTR_FIELD]
+ *	    [NET_FLOW_FIELD_ATTR_NAME]
+ *	    [NET_FLOW_FIELD_ATTR_UID]
+ *	    [NET_FLOW_FIELD_ATTR_BITWIDTH]
+ *	  [NET_FLOW_HEADER_ATTR_FIELD]
+ *	    [...]
+ *	  [...]
+ *   [NET_FLOW_HEADER]
+ *      [...]
+ *   [...]
+ *
+ * Action definitions supported by tables
+ *
+ * [NET_FLOW_TABLE_ACTIONS]
+ *   [NET_FLOW_TABLE_ATTR_ACTIONS]
+ *	[NET_FLOW_ACTION]
+ *	  [NET_FLOW_ACTION_ATTR_NAME]
+ *	  [NET_FLOW_ACTION_ATTR_UID]
+ *	  [NET_FLOW_ACTION_ATTR_SIGNATURE]
+ *		 [NET_FLOW_ACTION_ARG]
+ *	         [NET_FLOW_ACTION_ARG]
+ *               [...]
+ *	[NET_FLOW_ACTION]
+ *	     [...]
+ *
+ * Parser definition used to unambiguously define match headers.
+ *
+ * [NET_FLOW_TABLE_PARSE_GRAPH]
+ *
+ * Primitive Type descriptions
+ *
+ * Get Table Graph <Request> only requires msg preamble.
+ *
+ * Get Table Graph <Reply> description
+ *
+ * [NET_FLOW_TABLE_TABLE_GRAPH]
+ *   [TABLE_GRAPH_NODE]
+ *	[TABLE_GRAPH_NODE_UID]
+ *	[TABLE_GRAPH_NODE_JUMP]
+ *	  [NET_FLOW_JUMP_TABLE_ENTRY]
+ *	  [NET_FLOW_JUMP_TABLE_ENTRY]
+ *	    [...]
+ *   [TABLE_GRAPH_NODE]
+ *	[..]
+ */
+
+#ifndef _UAPI_LINUX_IF_FLOW
+#define _UAPI_LINUX_IF_FLOW
+
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/if.h>
+
+#define NET_FLOW_NAMSIZ 80
+
+/**
+ * @struct net_flow_field
+ * @brief defines a field in a header
+ */
+struct net_flow_field {
+	char name[NET_FLOW_NAMSIZ];
+	int uid;
+	int bitwidth;
+};
+
+enum {
+	NET_FLOW_FIELD_UNSPEC,
+	NET_FLOW_FIELD,
+	__NET_FLOW_FIELD_MAX,
+};
+#define NET_FLOW_FIELD_MAX (__NET_FLOW_FIELD_MAX - 1)
+
+enum {
+	NET_FLOW_FIELD_ATTR_UNSPEC,
+	NET_FLOW_FIELD_ATTR_NAME,
+	NET_FLOW_FIELD_ATTR_UID,
+	NET_FLOW_FIELD_ATTR_BITWIDTH,
+	__NET_FLOW_FIELD_ATTR_MAX,
+};
+#define NET_FLOW_FIELD_ATTR_MAX (__NET_FLOW_FIELD_ATTR_MAX - 1)
+
+enum {
+	NET_FLOW_HEADER_UNSPEC,
+	NET_FLOW_HEADER,
+	__NET_FLOW_HEADER_MAX,
+};
+#define NET_FLOW_HEADER_MAX (__NET_FLOW_HEADER_MAX - 1)
+
+enum {
+	NET_FLOW_HEADER_ATTR_UNSPEC,
+	NET_FLOW_HEADER_ATTR_NAME,
+	NET_FLOW_HEADER_ATTR_UID,
+	NET_FLOW_HEADER_ATTR_FIELDS,
+	__NET_FLOW_HEADER_ATTR_MAX,
+};
+#define NET_FLOW_HEADER_ATTR_MAX (__NET_FLOW_HEADER_ATTR_MAX - 1)
+
+enum {
+	NET_FLOW_MASK_TYPE_UNSPEC,
+	NET_FLOW_MASK_TYPE_EXACT,
+	NET_FLOW_MASK_TYPE_LPM,
+};
+
+/**
+ * @struct net_flow_field_ref
+ * @brief uniquely identify field as header:field tuple
+ */
+struct net_flow_field_ref {
+	int instance;
+	int header;
+	int field;
+	int mask_type;
+	int type;
+	union {	/* Are these all the required data types */
+		__u8 value_u8;
+		__u16 value_u16;
+		__u32 value_u32;
+		__u64 value_u64;
+	};
+	union {	/* Are these all the required data types */
+		__u8 mask_u8;
+		__u16 mask_u16;
+		__u32 mask_u32;
+		__u64 mask_u64;
+	};
+};
+
+enum {
+	NET_FLOW_FIELD_REF_UNSPEC,
+	NET_FLOW_FIELD_REF,
+	__NET_FLOW_FIELD_REF_MAX,
+};
+#define NET_FLOW_FIELD_REF_MAX (__NET_FLOW_FIELD_REF_MAX - 1)
+
+enum {
+	NET_FLOW_FIELD_REF_ATTR_TYPE_UNSPEC,
+	NET_FLOW_FIELD_REF_ATTR_TYPE_U8,
+	NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
+	NET_FLOW_FIELD_REF_ATTR_TYPE_U32,
+	NET_FLOW_FIELD_REF_ATTR_TYPE_U64,
+	/* Need more types for ether.addrs, ip.addrs, ... */
+};
+
+enum net_flow_action_arg_type {
+	NET_FLOW_ACTION_ARG_TYPE_NULL,
+	NET_FLOW_ACTION_ARG_TYPE_U8,
+	NET_FLOW_ACTION_ARG_TYPE_U16,
+	NET_FLOW_ACTION_ARG_TYPE_U32,
+	NET_FLOW_ACTION_ARG_TYPE_U64,
+	__NET_FLOW_ACTION_ARG_TYPE_VAL_MAX,
+};
+
+struct net_flow_action_arg {
+	char name[NET_FLOW_NAMSIZ];
+	enum net_flow_action_arg_type type;
+	union {
+		__u8  value_u8;
+		__u16 value_u16;
+		__u32 value_u32;
+		__u64 value_u64;
+	};
+};
+
+enum {
+	NET_FLOW_ACTION_ARG_UNSPEC,
+	NET_FLOW_ACTION_ARG,
+	__NET_FLOW_ACTION_ARG_MAX,
+};
+#define NET_FLOW_ACTION_ARG_MAX (__NET_FLOW_ACTION_ARG_MAX - 1)
+
+enum {
+	NET_FLOW_ACTION_UNSPEC,
+	NET_FLOW_ACTION,
+	__NET_FLOW_ACTION_MAX,
+};
+#define NET_FLOW_ACTION_MAX (__NET_FLOW_ACTION_MAX - 1)
+
+enum {
+	NET_FLOW_ACTION_ATTR_UNSPEC,
+	NET_FLOW_ACTION_ATTR_NAME,
+	NET_FLOW_ACTION_ATTR_UID,
+	NET_FLOW_ACTION_ATTR_SIGNATURE,
+	__NET_FLOW_ACTION_ATTR_MAX,
+};
+#define NET_FLOW_ACTION_ATTR_MAX (__NET_FLOW_ACTION_ATTR_MAX - 1)
+
+enum {
+	NET_FLOW_ACTION_SET_UNSPEC,
+	NET_FLOW_ACTION_SET_ACTIONS,
+	__NET_FLOW_ACTION_SET_MAX,
+};
+#define NET_FLOW_ACTION_SET_MAX (__NET_FLOW_ACTION_SET_MAX - 1)
+
+enum {
+	NET_FLOW_TABLE_UNSPEC,
+	NET_FLOW_TABLE,
+	__NET_FLOW_TABLE_MAX,
+};
+#define NET_FLOW_TABLE_MAX (__NET_FLOW_TABLE_MAX - 1)
+
+enum {
+	NET_FLOW_TABLE_ATTR_UNSPEC,
+	NET_FLOW_TABLE_ATTR_NAME,
+	NET_FLOW_TABLE_ATTR_UID,
+	NET_FLOW_TABLE_ATTR_SOURCE,
+	NET_FLOW_TABLE_ATTR_SIZE,
+	NET_FLOW_TABLE_ATTR_MATCHES,
+	NET_FLOW_TABLE_ATTR_ACTIONS,
+	__NET_FLOW_TABLE_ATTR_MAX,
+};
+#define NET_FLOW_TABLE_ATTR_MAX (__NET_FLOW_TABLE_ATTR_MAX - 1)
+
+struct net_flow_jump_table {
+	struct net_flow_field_ref field;
+	int node; /* <0 is a parser error */
+};
+
+#define NET_FLOW_JUMP_TABLE_DONE	-1
+
+enum {
+	NET_FLOW_JUMP_TABLE_ENTRY_UNSPEC,
+	NET_FLOW_JUMP_TABLE_ENTRY,
+	__NET_FLOW_JUMP_TABLE_ENTRY_MAX,
+};
+
+enum {
+	NET_FLOW_HEADER_NODE_HDRS_UNSPEC,
+	NET_FLOW_HEADER_NODE_HDRS_VALUE,
+	__NET_FLOW_HEADER_NODE_HDRS_MAX,
+};
+#define NET_FLOW_HEADER_NODE_HDRS_MAX (__NET_FLOW_HEADER_NODE_HDRS_MAX - 1)
+
+enum {
+	NET_FLOW_HEADER_NODE_UNSPEC,
+	NET_FLOW_HEADER_NODE_NAME,
+	NET_FLOW_HEADER_NODE_UID,
+	NET_FLOW_HEADER_NODE_HDRS,
+	NET_FLOW_HEADER_NODE_JUMP,
+	__NET_FLOW_HEADER_NODE_MAX,
+};
+#define NET_FLOW_HEADER_NODE_MAX (__NET_FLOW_HEADER_NODE_MAX - 1)
+
+enum {
+	NET_FLOW_HEADER_GRAPH_UNSPEC,
+	NET_FLOW_HEADER_GRAPH_NODE,
+	__NET_FLOW_HEADER_GRAPH_MAX,
+};
+#define NET_FLOW_HEADER_GRAPH_MAX (__NET_FLOW_HEADER_GRAPH_MAX - 1)
+
+#define NET_FLOW_TABLE_EGRESS_ROOT 1
+#define	NET_FLOW_TABLE_INGRESS_ROOT 2
+
+enum {
+	NET_FLOW_TABLE_GRAPH_NODE_UNSPEC,
+	NET_FLOW_TABLE_GRAPH_NODE_UID,
+	NET_FLOW_TABLE_GRAPH_NODE_FLAGS,
+	NET_FLOW_TABLE_GRAPH_NODE_JUMP,
+	__NET_FLOW_TABLE_GRAPH_NODE_MAX,
+};
+#define NET_FLOW_TABLE_GRAPH_NODE_MAX (__NET_FLOW_TABLE_GRAPH_NODE_MAX - 1)
+
+enum {
+	NET_FLOW_TABLE_GRAPH_UNSPEC,
+	NET_FLOW_TABLE_GRAPH_NODE,
+	__NET_FLOW_TABLE_GRAPH_MAX,
+};
+#define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
+
+enum {
+	NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
+};
+
+enum {
+	NET_FLOW_UNSPEC,
+	NET_FLOW_IDENTIFIER_TYPE,
+	NET_FLOW_IDENTIFIER,
+
+	NET_FLOW_TABLES,
+	NET_FLOW_HEADERS,
+	NET_FLOW_ACTIONS,
+	NET_FLOW_HEADER_GRAPH,
+	NET_FLOW_TABLE_GRAPH,
+
+	__NET_FLOW_MAX,
+	NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
+};
+
+enum {
+	NET_FLOW_TABLE_CMD_GET_TABLES,
+	NET_FLOW_TABLE_CMD_GET_HEADERS,
+	NET_FLOW_TABLE_CMD_GET_ACTIONS,
+	NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
+	NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
+
+	__NET_FLOW_CMD_MAX,
+	NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
+};
+
+#define NET_FLOW_GENL_NAME "net_flow_table"
+#define NET_FLOW_GENL_VERSION 0x1
+#endif /* _UAPI_LINUX_IF_FLOW */
diff --git a/net/Kconfig b/net/Kconfig
index ff9ffc1..8380bfe 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -293,6 +293,13 @@ config NET_FLOW_LIMIT
 	  with many clients some protection against DoS by a single (spoofed)
 	  flow that greatly exceeds average workload.
 
+config NET_FLOW_TABLES
+	boolean "Support network flow tables"
+	---help---
+	This feature provides an interface for device drivers to report
+	flow tables and supported matches and actions. If you do not
+	want to support hardware offloads for flow tables, say N here.
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/Makefile b/net/core/Makefile
index 235e6c5..1eea785 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
+obj-$(CONFIG_NET_FLOW_TABLES) += flow_table.o
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
new file mode 100644
index 0000000..ec3f06d
--- /dev/null
+++ b/net/core/flow_table.c
@@ -0,0 +1,837 @@
+/*
+ * net/core/flow_table.c - Flow table interface for Switch devices
+ * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Author: John Fastabend <john.r.fastabend@intel.com>
+ */
+
+#include <uapi/linux/if_flow.h>
+#include <linux/if_flow.h>
+#include <linux/if_bridge.h>
+#include <linux/types.h>
+#include <net/netlink.h>
+#include <net/genetlink.h>
+#include <net/rtnetlink.h>
+#include <linux/module.h>
+
+static struct genl_family net_flow_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NET_FLOW_GENL_NAME,
+	.version	= NET_FLOW_GENL_VERSION,
+	.maxattr	= NET_FLOW_MAX,
+	.netnsok	= true,
+};
+
+static struct net_device *net_flow_get_dev(struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	int type, ifindex;
+
+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
+	    !info->attrs[NET_FLOW_IDENTIFIER])
+		return NULL;
+
+	type = nla_get_u32(info->attrs[NET_FLOW_IDENTIFIER_TYPE]);
+	switch (type) {
+	case NET_FLOW_IDENTIFIER_IFINDEX:
+		ifindex = nla_get_u32(info->attrs[NET_FLOW_IDENTIFIER]);
+		break;
+	default:
+		return NULL;
+	}
+
+	return dev_get_by_index(net, ifindex);
+}
+
+static int net_flow_put_act_types(struct sk_buff *skb,
+				  struct net_flow_action_arg *args)
+{
+	int i, err;
+
+	for (i = 0; args[i].type; i++) {
+		err = nla_put(skb, NET_FLOW_ACTION_ARG,
+			      sizeof(struct net_flow_action_arg), &args[i]);
+		if (err)
+			return -EMSGSIZE;
+	}
+	return 0;
+}
+
+static const
+struct nla_policy net_flow_action_policy[NET_FLOW_ACTION_ATTR_MAX + 1] = {
+	[NET_FLOW_ACTION_ATTR_NAME]	 = {.type = NLA_STRING,
+					    .len = NET_FLOW_NAMSIZ-1 },
+	[NET_FLOW_ACTION_ATTR_UID]	 = {.type = NLA_U32 },
+	[NET_FLOW_ACTION_ATTR_SIGNATURE] = {.type = NLA_NESTED },
+};
+
+static int net_flow_put_action(struct sk_buff *skb, struct net_flow_action *a)
+{
+	struct net_flow_action_arg *this;
+	struct nlattr *nest;
+	int err, args = 0;
+
+	if (a->name && nla_put_string(skb, NET_FLOW_ACTION_ATTR_NAME, a->name))
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
+		return -EMSGSIZE;
+
+	if (!a->args)
+		return 0;
+
+	for (this = &a->args[0]; strlen(this->name) > 0; this++)
+		args++;
+
+	if (args) {
+		nest = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
+		if (!nest)
+			goto nest_put_failure;
+
+		err = net_flow_put_act_types(skb, a->args);
+		if (err) {
+			nla_nest_cancel(skb, nest);
+			return err;
+		}
+		nla_nest_end(skb, nest);
+	}
+
+	return 0;
+nest_put_failure:
+	return -EMSGSIZE;
+}
+
+static int net_flow_put_actions(struct sk_buff *skb,
+				struct net_flow_action **acts)
+{
+	struct nlattr *actions;
+	int err, i;
+
+	actions = nla_nest_start(skb, NET_FLOW_ACTIONS);
+	if (!actions)
+		return -EMSGSIZE;
+
+	for (i = 0; acts[i]->uid; i++) {
+		struct nlattr *action = nla_nest_start(skb, NET_FLOW_ACTION);
+
+		if (!action)
+			goto action_put_failure;
+
+		err = net_flow_put_action(skb, acts[i]);
+		if (err)
+			goto action_put_failure;
+		nla_nest_end(skb, action);
+	}
+	nla_nest_end(skb, actions);
+
+	return 0;
+action_put_failure:
+	nla_nest_cancel(skb, actions);
+	return -EMSGSIZE;
+}
+
+struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
+					   struct net_device *dev,
+					   u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_actions(skb, a);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_actions(struct sk_buff *skb,
+				    struct genl_info *info)
+{
+	struct net_flow_action **a;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_actions) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	a = dev->netdev_ops->ndo_flow_get_actions(dev);
+	if (!a) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_actions_msg(a, dev,
+					 info->snd_portid,
+					 info->snd_seq,
+					 NET_FLOW_TABLE_CMD_GET_ACTIONS);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_table(struct net_device *dev,
+			      struct sk_buff *skb,
+			      struct net_flow_table *t)
+{
+	struct nlattr *matches, *actions;
+	int i;
+
+	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
+		return -EMSGSIZE;
+
+	matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
+	if (!matches)
+		return -EMSGSIZE;
+
+	for (i = 0; t->matches[i].instance; i++) {
+		if (nla_put(skb, NET_FLOW_FIELD_REF,
+			    sizeof(struct net_flow_field_ref),
+			    &t->matches[i])) {
+			nla_nest_cancel(skb, matches);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, matches);
+
+	actions = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_ACTIONS);
+	if (!actions)
+		return -EMSGSIZE;
+
+	for (i = 0; t->actions[i]; i++) {
+		if (nla_put_u32(skb,
+				NET_FLOW_ACTION_ATTR_UID,
+				t->actions[i])) {
+			nla_nest_cancel(skb, actions);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, actions);
+
+	return 0;
+}
+
+static int net_flow_put_tables(struct net_device *dev,
+			       struct sk_buff *skb,
+			       struct net_flow_table **tables)
+{
+	struct nlattr *nest, *t;
+	int i, err = 0;
+
+	nest = nla_nest_start(skb, NET_FLOW_TABLES);
+	if (!nest)
+		return -EMSGSIZE;
+
+	for (i = 0; tables[i]->uid; i++) {
+		t = nla_nest_start(skb, NET_FLOW_TABLE);
+		if (!t) {
+			err = -EMSGSIZE;
+			goto errout;
+		}
+
+		err = net_flow_put_table(dev, skb, tables[i]);
+		if (err) {
+			nla_nest_cancel(skb, t);
+			goto errout;
+		}
+		nla_nest_end(skb, t);
+	}
+	nla_nest_end(skb, nest);
+	return 0;
+errout:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static struct sk_buff *net_flow_build_tables_msg(struct net_flow_table **t,
+						 struct net_device *dev,
+						 u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_tables(dev, skb, t);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_tables(struct sk_buff *skb,
+				   struct genl_info *info)
+{
+	struct net_flow_table **tables;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_tables) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	tables = dev->netdev_ops->ndo_flow_get_tables(dev);
+	if (!tables) { /* transient failure; a device should always have tables */
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_tables_msg(tables, dev,
+					info->snd_portid,
+					info->snd_seq,
+					NET_FLOW_TABLE_CMD_GET_TABLES);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static
+int net_flow_put_fields(struct sk_buff *skb, const struct net_flow_header *h)
+{
+	struct net_flow_field *f;
+	int count = h->field_sz;
+	struct nlattr *field;
+
+	for (f = h->fields; count; count--, f++) {
+		field = nla_nest_start(skb, NET_FLOW_FIELD);
+		if (!field)
+			goto field_put_failure;
+
+		if (nla_put_string(skb, NET_FLOW_FIELD_ATTR_NAME, f->name) ||
+		    nla_put_u32(skb, NET_FLOW_FIELD_ATTR_UID, f->uid) ||
+		    nla_put_u32(skb, NET_FLOW_FIELD_ATTR_BITWIDTH, f->bitwidth))
+			goto out;
+
+		nla_nest_end(skb, field);
+	}
+
+	return 0;
+out:
+	nla_nest_cancel(skb, field);
+field_put_failure:
+	return -EMSGSIZE;
+}
+
+static int net_flow_put_headers(struct sk_buff *skb,
+				struct net_flow_header **headers)
+{
+	struct nlattr *nest, *hdr, *fields;
+	struct net_flow_header *h;
+	int i, err;
+
+	nest = nla_nest_start(skb, NET_FLOW_HEADERS);
+	if (!nest)
+		return -EMSGSIZE;
+
+	for (i = 0; headers[i]->uid; i++) {
+		err = -EMSGSIZE;
+		h = headers[i];
+
+		hdr = nla_nest_start(skb, NET_FLOW_HEADER);
+		if (!hdr)
+			goto hdr_put_failure;
+
+		if (nla_put_string(skb, NET_FLOW_HEADER_ATTR_NAME, h->name) ||
+		    nla_put_u32(skb, NET_FLOW_HEADER_ATTR_UID, h->uid))
+			goto attr_put_failure;
+
+		fields = nla_nest_start(skb, NET_FLOW_HEADER_ATTR_FIELDS);
+		if (!fields)
+			goto attr_put_failure;
+
+		err = net_flow_put_fields(skb, h);
+		if (err)
+			goto fields_put_failure;
+
+		nla_nest_end(skb, fields);
+
+		nla_nest_end(skb, hdr);
+	}
+	nla_nest_end(skb, nest);
+
+	return 0;
+fields_put_failure:
+	nla_nest_cancel(skb, fields);
+attr_put_failure:
+	nla_nest_cancel(skb, hdr);
+hdr_put_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static struct sk_buff *net_flow_build_headers_msg(struct net_flow_header **h,
+						  struct net_device *dev,
+						  u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_headers(skb, h);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_headers(struct sk_buff *skb,
+				    struct genl_info *info)
+{
+	struct net_flow_header **h;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_headers) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	h = dev->netdev_ops->ndo_flow_get_headers(dev);
+	if (!h) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_headers_msg(h, dev,
+					 info->snd_portid,
+					 info->snd_seq,
+					 NET_FLOW_TABLE_CMD_GET_HEADERS);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_header_node(struct sk_buff *skb,
+				    struct net_flow_hdr_node *node)
+{
+	struct nlattr *hdrs, *jumps;
+	int i, err;
+
+	if (nla_put_string(skb, NET_FLOW_HEADER_NODE_NAME, node->name) ||
+	    nla_put_u32(skb, NET_FLOW_HEADER_NODE_UID, node->uid))
+		return -EMSGSIZE;
+
+	/* Insert the set of headers that get extracted at this node */
+	hdrs = nla_nest_start(skb, NET_FLOW_HEADER_NODE_HDRS);
+	if (!hdrs)
+		return -EMSGSIZE;
+	for (i = 0; node->hdrs[i]; i++) {
+		if (nla_put_u32(skb, NET_FLOW_HEADER_NODE_HDRS_VALUE,
+				node->hdrs[i])) {
+			nla_nest_cancel(skb, hdrs);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, hdrs);
+
+	/* Then give the jump table to find next header node in graph */
+	jumps = nla_nest_start(skb, NET_FLOW_HEADER_NODE_JUMP);
+	if (!jumps)
+		return -EMSGSIZE;
+
+	for (i = 0; node->jump[i].node; i++) {
+		err = nla_put(skb, NET_FLOW_JUMP_TABLE_ENTRY,
+			      sizeof(struct net_flow_jump_table),
+			      &node->jump[i]);
+		if (err) {
+			nla_nest_cancel(skb, jumps);
+			return -EMSGSIZE;
+		}
+	}
+	nla_nest_end(skb, jumps);
+
+	return 0;
+}
+
+static int net_flow_put_header_graph(struct sk_buff *skb,
+				     struct net_flow_hdr_node **g)
+{
+	struct nlattr *nodes, *node;
+	int err, i;
+
+	nodes = nla_nest_start(skb, NET_FLOW_HEADER_GRAPH);
+	if (!nodes)
+		return -EMSGSIZE;
+
+	for (i = 0; g[i]->uid; i++) {
+		node = nla_nest_start(skb, NET_FLOW_HEADER_GRAPH_NODE);
+		if (!node) {
+			err = -EMSGSIZE;
+			goto nodes_put_error;
+		}
+
+		err = net_flow_put_header_node(skb, g[i]);
+		if (err)
+			goto node_put_error;
+
+		nla_nest_end(skb, node);
+	}
+
+	nla_nest_end(skb, nodes);
+	return 0;
+node_put_error:
+	nla_nest_cancel(skb, node);
+nodes_put_error:
+	nla_nest_cancel(skb, nodes);
+	return err;
+}
+
+static
+struct sk_buff *net_flow_build_header_graph_msg(struct net_flow_hdr_node **g,
+						struct net_device *dev,
+						u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_header_graph(skb, g);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_header_graph(struct sk_buff *skb,
+					 struct genl_info *info)
+{
+	struct net_flow_hdr_node **h;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_hdr_graph) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	h = dev->netdev_ops->ndo_flow_get_hdr_graph(dev);
+	if (!h) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_header_graph_msg(h, dev,
+					      info->snd_portid,
+					      info->snd_seq,
+					      NET_FLOW_TABLE_CMD_GET_HDR_GRAPH);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static int net_flow_put_table_node(struct sk_buff *skb,
+				   struct net_flow_tbl_node *node)
+{
+	struct nlattr *nest, *jump;
+	int i, err = -EMSGSIZE;
+
+	nest = nla_nest_start(skb, NET_FLOW_TABLE_GRAPH_NODE);
+	if (!nest)
+		return err;
+
+	if (nla_put_u32(skb, NET_FLOW_TABLE_GRAPH_NODE_UID, node->uid) ||
+	    nla_put_u32(skb, NET_FLOW_TABLE_GRAPH_NODE_FLAGS, node->flags))
+		goto node_put_failure;
+
+	jump = nla_nest_start(skb, NET_FLOW_TABLE_GRAPH_NODE_JUMP);
+	if (!jump)
+		goto node_put_failure;
+
+	for (i = 0; node->jump[i].node; i++) {
+		err = nla_put(skb, NET_FLOW_JUMP_TABLE_ENTRY,
+			      sizeof(struct net_flow_jump_table),
+			      &node->jump[i]);
+		if (err)
+			goto jump_put_failure;
+	}
+
+	nla_nest_end(skb, jump);
+	nla_nest_end(skb, nest);
+	return 0;
+jump_put_failure:
+	nla_nest_cancel(skb, jump);
+node_put_failure:
+	nla_nest_cancel(skb, nest);
+	return err;
+}
+
+static int net_flow_put_table_graph(struct sk_buff *skb,
+				    struct net_flow_tbl_node **nodes)
+{
+	struct nlattr *graph;
+	int err, i = 0;
+
+	graph = nla_nest_start(skb, NET_FLOW_TABLE_GRAPH);
+	if (!graph)
+		return -EMSGSIZE;
+
+	for (i = 0; nodes[i]->uid; i++) {
+		err = net_flow_put_table_node(skb, nodes[i]);
+		if (err) {
+			nla_nest_cancel(skb, graph);
+			return -EMSGSIZE;
+		}
+	}
+
+	nla_nest_end(skb, graph);
+	return 0;
+}
+
+static
+struct sk_buff *net_flow_build_graph_msg(struct net_flow_tbl_node **g,
+					 struct net_device *dev,
+					 u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *hdr;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	err = net_flow_put_table_graph(skb, g);
+	if (err < 0)
+		goto out;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct net_flow_tbl_node **g;
+	struct net_device *dev;
+	struct sk_buff *msg;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_tbl_graph) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	g = dev->netdev_ops->ndo_flow_get_tbl_graph(dev);
+	if (!g) {
+		dev_put(dev);
+		return -EBUSY;
+	}
+
+	msg = net_flow_build_graph_msg(g, dev,
+				       info->snd_portid,
+				       info->snd_seq,
+				       NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+}
+
+static const struct nla_policy net_flow_cmd_policy[NET_FLOW_MAX + 1] = {
+	[NET_FLOW_IDENTIFIER_TYPE] = {.type = NLA_U32, },
+	[NET_FLOW_IDENTIFIER]	   = {.type = NLA_U32, },
+	[NET_FLOW_TABLES]	   = {.type = NLA_NESTED, },
+	[NET_FLOW_HEADERS]	   = {.type = NLA_NESTED, },
+	[NET_FLOW_ACTIONS]	   = {.type = NLA_NESTED, },
+	[NET_FLOW_HEADER_GRAPH]	   = {.type = NLA_NESTED, },
+	[NET_FLOW_TABLE_GRAPH]	   = {.type = NLA_NESTED, },
+};
+
+static const struct genl_ops net_flow_table_nl_ops[] = {
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_TABLES,
+		.doit = net_flow_cmd_get_tables,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_HEADERS,
+		.doit = net_flow_cmd_get_headers,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_ACTIONS,
+		.doit = net_flow_cmd_get_actions,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
+		.doit = net_flow_cmd_get_header_graph,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
+		.doit = net_flow_cmd_get_table_graph,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static int __init net_flow_nl_module_init(void)
+{
+	return genl_register_family_with_ops(&net_flow_nl_family,
+					     net_flow_table_nl_ops);
+}
+
+static void net_flow_nl_module_fini(void)
+{
+	genl_unregister_family(&net_flow_nl_family);
+}
+
+module_init(net_flow_nl_module_init);
+module_exit(net_flow_nl_module_fini);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("John Fastabend <john.r.fastabend@intel.com>");
+MODULE_DESCRIPTION("Netlink interface to Flow Tables");
+MODULE_ALIAS_GENL_FAMILY(NET_FLOW_GENL_NAME);


* [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
@ 2014-12-31 19:46 ` John Fastabend
  2015-01-06  6:19   ` Scott Feldman
  2015-01-08 17:39   ` Jiri Pirko
  2014-12-31 19:46 ` [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables John Fastabend
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:46 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Now that the device capabilities are exposed we can add support to
add and delete flows from the tables.

The two operations are

table_set_flows:

  The set flow operation is used to program a set of flows into a
  hardware device table. The netlink-encoded message is decoded into
  a null terminated array of flow entry structures. A flow entry
  structure is defined as

     struct net_flow_flow {
			  int table_id;
			  int uid;
			  int priority;
			  struct net_flow_field_ref *matches;
			  struct net_flow_action *actions;
     }

  The table id is the _uid_ returned from the 'get_tables' operation.
  Matches is a set of match criteria; a logical AND is done on the
  set, so a packet must satisfy every entry to match the flow.
  Actions provide a set of actions to perform when the flow rule is
  hit. Both matches and actions are null terminated arrays.

  The flows are configured in hardware using an ndo op. We do not
  provide a commit operation at the moment and expect the hardware
  to commit flows one at a time. Future work may add a commit
  operation to tell the hardware we are done loading flow rules; on
  some hardware this will help bulk updates.

  It is possible for hardware to return an error from a flow set
  operation. This can occur for many reasons, both transient and
  due to resource constraints. Several error handling strategies
  are built in, listed here:

    *_ERROR_ABORT      abort on first error with errmsg

    *_ERROR_CONTINUE   continue programming flows no errmsg

    *_ERROR_ABORT_LOG  abort on first error and return flow that
 		       failed to user space in reply msg

    *_ERROR_CONT_LOG   continue programming flows and return a list
		       of flows that failed to user space in a reply
		       msg.

  Notably missing is a rollback error strategy. I don't have a
  use for this in software yet, but the strategy can be added
  later, e.g. as *_ERROR_ROLLBACK.

table_del_flows:

  The delete flow operation uses the same structures and error
  handling strategies as the table_set_flows operation, although
  delete messages omit the matches/actions arrays because they are
  not needed to look up the flow.

Also thanks to Simon Horman for fixes and other help.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h      |   21 ++
 include/linux/netdevice.h    |    8 +
 include/uapi/linux/if_flow.h |   49 ++++
 net/core/flow_table.c        |  501 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 579 insertions(+)

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
index 1b6c1ea..20fa752 100644
--- a/include/linux/if_flow.h
+++ b/include/linux/if_flow.h
@@ -90,4 +90,25 @@ struct net_flow_tbl_node {
 	__u32 flags;
 	struct net_flow_jump_table *jump;
 };
+
+/**
+ * @struct net_flow_flow
+ * @brief describes a match/action entry
+ *
+ * @table_id uid of the table the flow resides in
+ * @uid unique identifier for flow
+ * @priority priority to execute flow match/action in table
+ * @matches null terminated set of match uids giving the match criteria
+ * @actions null terminated set of action uids to apply to match
+ *
+ * Flows must match all entries in the match set.
+ */
+struct net_flow_flow {
+	int table_id;
+	int uid;
+	int priority;
+	struct net_flow_field_ref *matches;
+	struct net_flow_action *actions;
+};
+
+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow);
 #endif
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c3c856..be8d4e4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1197,6 +1197,14 @@ struct net_device_ops {
 	struct net_flow_header	**(*ndo_flow_get_headers)(struct net_device *dev);
 	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
 	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
+	int		        (*ndo_flow_get_flows)(struct sk_buff *skb,
+						      struct net_device *dev,
+						      int table,
+						      int min, int max);
+	int		        (*ndo_flow_set_flows)(struct net_device *dev,
+						      struct net_flow_flow *f);
+	int		        (*ndo_flow_del_flows)(struct net_device *dev,
+						      struct net_flow_flow *f);
 #endif
 };
 
diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
index 2acdb38..125cdc6 100644
--- a/include/uapi/linux/if_flow.h
+++ b/include/uapi/linux/if_flow.h
@@ -329,6 +329,48 @@ enum {
 #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
 
 enum {
+	NET_FLOW_NET_FLOW_UNSPEC,
+	NET_FLOW_FLOW,
+	__NET_FLOW_NET_FLOW_MAX,
+};
+#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1)
+
+enum {
+	NET_FLOW_TABLE_FLOWS_UNSPEC,
+	NET_FLOW_TABLE_FLOWS_TABLE,
+	NET_FLOW_TABLE_FLOWS_MINPRIO,
+	NET_FLOW_TABLE_FLOWS_MAXPRIO,
+	NET_FLOW_TABLE_FLOWS_FLOWS,
+	__NET_FLOW_TABLE_FLOWS_MAX,
+};
+#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1)
+
+enum {
+	/* Abort with normal errmsg */
+	NET_FLOW_FLOWS_ERROR_ABORT,
+	/* Ignore errors and continue without logging */
+	NET_FLOW_FLOWS_ERROR_CONTINUE,
+	/* Abort and reply with invalid flow fields */
+	NET_FLOW_FLOWS_ERROR_ABORT_LOG,
+	/* Continue and reply with list of invalid flows */
+	NET_FLOW_FLOWS_ERROR_CONT_LOG,
+	__NET_FLOWS_FLOWS_ERROR_MAX,
+};
+#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1)
+
+enum {
+	NET_FLOW_ATTR_UNSPEC,
+	NET_FLOW_ATTR_ERROR,
+	NET_FLOW_ATTR_TABLE,
+	NET_FLOW_ATTR_UID,
+	NET_FLOW_ATTR_PRIORITY,
+	NET_FLOW_ATTR_MATCHES,
+	NET_FLOW_ATTR_ACTIONS,
+	__NET_FLOW_ATTR_MAX,
+};
+#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1)
+
+enum {
 	NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
 };
 
@@ -343,6 +385,9 @@ enum {
 	NET_FLOW_HEADER_GRAPH,
 	NET_FLOW_TABLE_GRAPH,
 
+	NET_FLOW_FLOWS,
+	NET_FLOW_FLOWS_ERROR,
+
 	__NET_FLOW_MAX,
 	NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
 };
@@ -354,6 +399,10 @@ enum {
 	NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
 	NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
 
+	NET_FLOW_TABLE_CMD_GET_FLOWS,
+	NET_FLOW_TABLE_CMD_SET_FLOWS,
+	NET_FLOW_TABLE_CMD_DEL_FLOWS,
+
 	__NET_FLOW_CMD_MAX,
 	NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
 };
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index ec3f06d..f4cf293 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
 	return genlmsg_reply(msg, info);
 }
 
+static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
+						u32 portid, int seq, u8 cmd,
+						int min, int max, int table)
+{
+	struct genlmsghdr *hdr;
+	struct nlattr *flows;
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOBUFS);
+
+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!hdr)
+		goto out;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
+		err = -ENOBUFS;
+		goto out;
+	}
+
+	flows = nla_nest_start(skb, NET_FLOW_FLOWS);
+	if (!flows) {
+		err = -EMSGSIZE;
+		goto out;
+	}
+
+	err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max);
+	if (err < 0)
+		goto out_cancel;
+
+	nla_nest_end(skb, flows);
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0)
+		goto out;
+
+	return skb;
+out_cancel:
+	nla_nest_cancel(skb, flows);
+out:
+	nlmsg_free(skb);
+	return ERR_PTR(err);
+}
+
+static const
+struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = {
+	[NET_FLOW_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
+	[NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
+	[NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
+	[NET_FLOW_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
+};
+
+static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1];
+	int table, min = -1, max = -1;
+	struct net_device *dev;
+	struct sk_buff *msg;
+	int err = -EINVAL;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_get_flows) {
+		dev_put(dev);
+		return -EOPNOTSUPP;
+	}
+
+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
+	    !info->attrs[NET_FLOW_FLOWS])
+		goto out;
+
+	err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX,
+			       info->attrs[NET_FLOW_FLOWS],
+			       net_flow_table_flows_policy);
+	if (err)
+		goto out;
+
+	if (!tb[NET_FLOW_TABLE_FLOWS_TABLE])
+		goto out;
+
+	table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]);
+
+	if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO])
+		min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]);
+	if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO])
+		max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]);
+
+	msg = net_flow_build_flows_msg(dev,
+				       info->snd_portid,
+				       info->snd_seq,
+				       NET_FLOW_TABLE_CMD_GET_FLOWS,
+				       min, max, table);
+	dev_put(dev);
+
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	return genlmsg_reply(msg, info);
+out:
+	dev_put(dev);
+	return err;
+}
+
+static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
+					     struct genlmsghdr **hdr,
+					     u32 portid, int seq, u8 cmd)
+{
+	struct genlmsghdr *h;
+	struct sk_buff *skb;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-EMSGSIZE);
+
+	h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
+	if (!h)
+		goto err_free;
+
+	if (nla_put_u32(skb,
+			NET_FLOW_IDENTIFIER_TYPE,
+			NET_FLOW_IDENTIFIER_IFINDEX) ||
+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex))
+		goto err_free;
+
+	*hdr = h;
+	return skb;
+err_free:
+	nlmsg_free(skb);
+	return ERR_PTR(-EMSGSIZE);
+}
+
+static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
+						struct genlmsghdr *hdr)
+{
+	int err;
+
+	err = genlmsg_end(skb, hdr);
+	if (err < 0) {
+		nlmsg_free(skb);
+		return ERR_PTR(err);
+	}
+
+	return skb;
+}
+
+static int net_flow_put_flow_action(struct sk_buff *skb,
+				    struct net_flow_action *a)
+{
+	struct nlattr *action, *sigs;
+	int err;
+
+	action = nla_nest_start(skb, NET_FLOW_ACTION);
+	if (!action)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid)) {
+		nla_nest_cancel(skb, action);
+		return -EMSGSIZE;
+	}
+
+	if (!a->args)
+		goto done;
+
+	sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
+	if (!sigs) {
+		nla_nest_cancel(skb, action);
+		return -EMSGSIZE;
+	}
+
+	err = net_flow_put_act_types(skb, a->args);
+	if (err) {
+		nla_nest_cancel(skb, action);
+		return err;
+	}
+	nla_nest_end(skb, sigs);
+
+done:
+	nla_nest_end(skb, action);
+	return 0;
+}
+
+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow)
+{
+	struct nlattr *flows, *matches;
+	struct nlattr *actions = NULL; /* must be null to unwind */
+	int err, j, i = 0;
+
+	flows = nla_nest_start(skb, NET_FLOW_FLOW);
+	if (!flows)
+		goto put_failure;
+
+	if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) ||
+	    nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) ||
+	    nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority))
+		goto flows_put_failure;
+
+	if (flow->matches) {
+		matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES);
+		if (!matches)
+			goto flows_put_failure;
+
+		for (j = 0; flow->matches[j].header; j++) {
+			struct net_flow_field_ref *f = &flow->matches[j];
+
+			if (nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f))
+				goto flows_put_failure;
+		}
+		nla_nest_end(skb, matches);
+	}
+
+	if (flow->actions) {
+		actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS);
+		if (!actions)
+			goto flows_put_failure;
+
+		for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+			err = net_flow_put_flow_action(skb, &flow->actions[i]);
+			if (err) {
+				nla_nest_cancel(skb, actions);
+				goto flows_put_failure;
+			}
+		}
+		nla_nest_end(skb, actions);
+	}
+
+	nla_nest_end(skb, flows);
+	return 0;
+
+flows_put_failure:
+	nla_nest_cancel(skb, flows);
+put_failure:
+	return -EMSGSIZE;
+}
+EXPORT_SYMBOL(net_flow_put_flow);
+
+static int net_flow_get_field(struct net_flow_field_ref *field,
+			      struct nlattr *nla)
+{
+	if (nla_type(nla) != NET_FLOW_FIELD_REF)
+		return -EINVAL;
+
+	if (nla_len(nla) < sizeof(*field))
+		return -EINVAL;
+
+	*field = *(struct net_flow_field_ref *)nla_data(nla);
+	return 0;
+}
+
+static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
+{
+	struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1];
+	struct nlattr *args;
+	int rem;
+	int err, count = 0;
+
+	if (nla_type(attr) != NET_FLOW_ACTION) {
+		pr_warn("%s: expected NET_FLOW_ACTION\n", __func__);
+		return 0;
+	}
+
+	err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX,
+			       attr, net_flow_action_policy);
+	if (err < 0)
+		return err;
+
+	if (!act[NET_FLOW_ACTION_ATTR_UID] ||
+	    !act[NET_FLOW_ACTION_ATTR_SIGNATURE])
+		return -EINVAL;
+
+	a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]);
+
+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem)
+		count++; /* unoptimized max possible */
+
+	a->args = kcalloc(count + 1,
+			  sizeof(struct net_flow_action_arg),
+			  GFP_KERNEL);
+	if (!a->args)
+		return -ENOMEM;
+
+	count = 0;
+
+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) {
+		if (nla_type(args) != NET_FLOW_ACTION_ARG)
+			continue;
+
+		if (nla_len(args) < sizeof(struct net_flow_action_arg)) {
+			kfree(a->args);
+			return -EINVAL;
+		}
+
+		a->args[count] = *(struct net_flow_action_arg *)nla_data(args);
+		count++;
+	}
+	return 0;
+}
+
+static const
+struct nla_policy net_flow_flow_policy[NET_FLOW_ATTR_MAX + 1] = {
+	[NET_FLOW_ATTR_TABLE]		= { .type = NLA_U32 },
+	[NET_FLOW_ATTR_UID]		= { .type = NLA_U32 },
+	[NET_FLOW_ATTR_PRIORITY]	= { .type = NLA_U32 },
+	[NET_FLOW_ATTR_MATCHES]		= { .type = NLA_NESTED },
+	[NET_FLOW_ATTR_ACTIONS]		= { .type = NLA_NESTED },
+};
+
+static int net_flow_get_flow(struct net_flow_flow *flow, struct nlattr *attr)
+{
+	struct nlattr *f[NET_FLOW_ATTR_MAX+1];
+	struct nlattr *attr2;
+	int rem, err;
+	int count = 0;
+
+	err = nla_parse_nested(f, NET_FLOW_ATTR_MAX,
+			       attr, net_flow_flow_policy);
+	if (err < 0)
+		return -EINVAL;
+
+	if (!f[NET_FLOW_ATTR_TABLE] || !f[NET_FLOW_ATTR_UID] ||
+	    !f[NET_FLOW_ATTR_PRIORITY])
+		return -EINVAL;
+
+	flow->table_id = nla_get_u32(f[NET_FLOW_ATTR_TABLE]);
+	flow->uid = nla_get_u32(f[NET_FLOW_ATTR_UID]);
+	flow->priority = nla_get_u32(f[NET_FLOW_ATTR_PRIORITY]);
+
+	flow->matches = NULL;
+	flow->actions = NULL;
+
+	if (f[NET_FLOW_ATTR_MATCHES]) {
+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem)
+			count++;
+
+		/* Null terminated list of matches */
+		flow->matches = kcalloc(count + 1,
+					sizeof(struct net_flow_field_ref),
+					GFP_KERNEL);
+		if (!flow->matches)
+			return -ENOMEM;
+
+		count = 0;
+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem) {
+			err = net_flow_get_field(&flow->matches[count], attr2);
+			if (err) {
+				kfree(flow->matches);
+				return err;
+			}
+			count++;
+		}
+	}
+
+	if (f[NET_FLOW_ATTR_ACTIONS]) {
+		count = 0;
+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem)
+			count++;
+
+		/* Null terminated list of actions */
+		flow->actions = kcalloc(count + 1,
+					sizeof(struct net_flow_action),
+					GFP_KERNEL);
+		if (!flow->actions) {
+			kfree(flow->matches);
+			return -ENOMEM;
+		}
+
+		count = 0;
+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem) {
+			err = net_flow_get_action(&flow->actions[count], attr2);
+			if (err) {
+				kfree(flow->matches);
+				kfree(flow->actions);
+				return err;
+			}
+			count++;
+		}
+	}
+
+	return 0;
+}
+
+static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
+				    struct genl_info *info)
+{
+	int rem, err_handle = NET_FLOW_FLOWS_ERROR_ABORT;
+	struct sk_buff *skb = NULL;
+	struct net_flow_flow this;
+	struct genlmsghdr *hdr;
+	struct net_device *dev;
+	struct nlattr *flow, *flows;
+	int cmd = info->genlhdr->cmd;
+	int err = -EOPNOTSUPP;
+
+	dev = net_flow_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	if (!dev->netdev_ops->ndo_flow_set_flows ||
+	    !dev->netdev_ops->ndo_flow_del_flows)
+		goto out;
+
+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
+	    !info->attrs[NET_FLOW_FLOWS]) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (info->attrs[NET_FLOW_FLOWS_ERROR])
+		err_handle = nla_get_u32(info->attrs[NET_FLOW_FLOWS_ERROR]);
+
+	nla_for_each_nested(flow, info->attrs[NET_FLOW_FLOWS], rem) {
+		if (nla_type(flow) != NET_FLOW_FLOW)
+			continue;
+
+		err = net_flow_get_flow(&this, flow);
+		if (err)
+			goto out;
+
+		switch (cmd) {
+		case NET_FLOW_TABLE_CMD_SET_FLOWS:
+			err = dev->netdev_ops->ndo_flow_set_flows(dev, &this);
+			break;
+		case NET_FLOW_TABLE_CMD_DEL_FLOWS:
+			err = dev->netdev_ops->ndo_flow_del_flows(dev, &this);
+			break;
+		default:
+			err = -EOPNOTSUPP;
+			break;
+		}
+
+		if (err && err_handle != NET_FLOW_FLOWS_ERROR_CONTINUE) {
+			if (!skb) {
+				skb = net_flow_start_errmsg(dev, &hdr,
+							    info->snd_portid,
+							    info->snd_seq,
+							    cmd);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					goto out_plus_free;
+				}
+
+				flows = nla_nest_start(skb, NET_FLOW_FLOWS);
+				if (!flows) {
+					err = -EMSGSIZE;
+					goto out_plus_free;
+				}
+			}
+
+			net_flow_put_flow(skb, &this);
+		}
+
+		/* Cleanup flow */
+		kfree(this.matches);
+		kfree(this.actions);
+
+		if (err && err_handle == NET_FLOW_FLOWS_ERROR_ABORT)
+			goto out;
+	}
+
+	dev_put(dev);
+
+	if (skb) {
+		nla_nest_end(skb, flows);
+		net_flow_end_flow_errmsg(skb, hdr);
+		return genlmsg_reply(skb, info);
+	}
+	return 0;
+
+out_plus_free:
+	kfree(this.matches);
+	kfree(this.actions);
+out:
+	if (skb)
+		nlmsg_free(skb);
+	dev_put(dev);
+	return err;
+}
+
 static const struct nla_policy net_flow_cmd_policy[NET_FLOW_MAX + 1] = {
 	[NET_FLOW_IDENTIFIER_TYPE] = {.type = NLA_U32, },
 	[NET_FLOW_IDENTIFIER]	   = {.type = NLA_U32, },
@@ -815,6 +1298,24 @@ static const struct genl_ops net_flow_table_nl_ops[] = {
 		.policy = net_flow_cmd_policy,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_GET_FLOWS,
+		.doit = net_flow_table_cmd_get_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_SET_FLOWS,
+		.doit = net_flow_table_cmd_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NET_FLOW_TABLE_CMD_DEL_FLOWS,
+		.doit = net_flow_table_cmd_flows,
+		.policy = net_flow_cmd_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static int __init net_flow_nl_module_init(void)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
  2014-12-31 19:46 ` [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow John Fastabend
@ 2014-12-31 19:46 ` John Fastabend
  2015-01-08 17:41   ` Jiri Pirko
  2014-12-31 19:47 ` [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch John Fastabend
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:46 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Actions may not always be applied after exiting a table. For example
some pipelines may accumulate actions and then apply them at the end
of a pipeline.

To model this we add an APPLY attribute to tables. Tables that share
an apply identifier have their actions applied in one step.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/if_flow.h      |    1 +
 include/uapi/linux/if_flow.h |    1 +
 net/core/flow_table.c        |    1 +
 3 files changed, 3 insertions(+)

diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
index 20fa752..a042a3d 100644
--- a/include/linux/if_flow.h
+++ b/include/linux/if_flow.h
@@ -67,6 +67,7 @@ struct net_flow_table {
 	char name[NET_FLOW_NAMSIZ];
 	int uid;
 	int source;
+	int apply_action;
 	int size;
 	struct net_flow_field_ref *matches;
 	int *actions;
diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
index 125cdc6..3c1a860 100644
--- a/include/uapi/linux/if_flow.h
+++ b/include/uapi/linux/if_flow.h
@@ -265,6 +265,7 @@ enum {
 	NET_FLOW_TABLE_ATTR_NAME,
 	NET_FLOW_TABLE_ATTR_UID,
 	NET_FLOW_TABLE_ATTR_SOURCE,
+	NET_FLOW_TABLE_ATTR_APPLY,
 	NET_FLOW_TABLE_ATTR_SIZE,
 	NET_FLOW_TABLE_ATTR_MATCHES,
 	NET_FLOW_TABLE_ATTR_ACTIONS,
diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index f4cf293..97cdf92 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -223,6 +223,7 @@ static int net_flow_put_table(struct net_device *dev,
 	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_APPLY, t->apply_action) ||
 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
 		return -EMSGSIZE;
 

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (2 preceding siblings ...)
  2014-12-31 19:46 ` [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables John Fastabend
@ 2014-12-31 19:47 ` John Fastabend
  2015-01-04  8:43   ` Or Gerlitz
  2015-01-06  7:01   ` Scott Feldman
  2014-12-31 19:47 ` [net-next PATCH v1 05/11] net: rocker: add set flow rules John Fastabend
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:47 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

This adds rocker support for the net_flow_get_* operations. With
these we can interrogate the rocker switch.

For static configurations, enabling the get operations is simply a
matter of defining a pipeline model and returning the structures;
the core infrastructure encapsulates them into netlink messages.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c          |   35 +
 drivers/net/ethernet/rocker/rocker_pipeline.h |  673 +++++++++++++++++++++++++
 2 files changed, 708 insertions(+)
 create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index fded127..4c6787a 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -36,6 +36,7 @@
 #include <generated/utsrelease.h>
 
 #include "rocker.h"
+#include "rocker_pipeline.h"
 
 static const char rocker_driver_name[] = "rocker";
 
@@ -3780,6 +3781,33 @@ static int rocker_port_switch_port_stp_update(struct net_device *dev, u8 state)
 	return rocker_port_stp_update(rocker_port, state);
 }
 
+#ifdef CONFIG_NET_FLOW_TABLES
+static struct net_flow_table **rocker_get_tables(struct net_device *d)
+{
+	return rocker_table_list;
+}
+
+static struct net_flow_header **rocker_get_headers(struct net_device *d)
+{
+	return rocker_header_list;
+}
+
+static struct net_flow_action **rocker_get_actions(struct net_device *d)
+{
+	return rocker_action_list;
+}
+
+static struct net_flow_tbl_node **rocker_get_tgraph(struct net_device *d)
+{
+	return rocker_table_nodes;
+}
+
+static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
+{
+	return rocker_header_nodes;
+}
+#endif
+
 static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_open			= rocker_port_open,
 	.ndo_stop			= rocker_port_stop,
@@ -3794,6 +3822,13 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_bridge_getlink		= rocker_port_bridge_getlink,
 	.ndo_switch_parent_id_get	= rocker_port_switch_parent_id_get,
 	.ndo_switch_port_stp_update	= rocker_port_switch_port_stp_update,
+#ifdef CONFIG_NET_FLOW_TABLES
+	.ndo_flow_get_tables		= rocker_get_tables,
+	.ndo_flow_get_headers		= rocker_get_headers,
+	.ndo_flow_get_actions		= rocker_get_actions,
+	.ndo_flow_get_tbl_graph		= rocker_get_tgraph,
+	.ndo_flow_get_hdr_graph		= rocker_get_hgraph,
+#endif
 };
 
 /********************
diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
new file mode 100644
index 0000000..9544339
--- /dev/null
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -0,0 +1,673 @@
+#ifndef _MY_PIPELINE_H_
+#define _MY_PIPELINE_H_
+
+#include <linux/if_flow.h>
+
+/* header definition */
+#define HEADER_ETHERNET_SRC_MAC 1
+#define HEADER_ETHERNET_DST_MAC 2
+#define HEADER_ETHERNET_ETHERTYPE 3
+struct net_flow_field ethernet_fields[3] = {
+	{ .name = "src_mac", .uid = HEADER_ETHERNET_SRC_MAC, .bitwidth = 48},
+	{ .name = "dst_mac", .uid = HEADER_ETHERNET_DST_MAC, .bitwidth = 48},
+	{ .name = "ethertype",
+	  .uid = HEADER_ETHERNET_ETHERTYPE,
+	  .bitwidth = 16},
+};
+
+#define HEADER_ETHERNET 1
+struct net_flow_header ethernet = {
+	.name = "ethernet",
+	.uid = HEADER_ETHERNET,
+	.field_sz = 3,
+	.fields = ethernet_fields,
+};
+
+#define HEADER_VLAN_PCP 1
+#define HEADER_VLAN_CFI 2
+#define HEADER_VLAN_VID 3
+#define HEADER_VLAN_ETHERTYPE 4
+struct net_flow_field vlan_fields[4] = {
+	{ .name = "pcp", .uid = HEADER_VLAN_PCP, .bitwidth = 3,},
+	{ .name = "cfi", .uid = HEADER_VLAN_CFI, .bitwidth = 1,},
+	{ .name = "vid", .uid = HEADER_VLAN_VID, .bitwidth = 12,},
+	{ .name = "ethertype", .uid = HEADER_VLAN_ETHERTYPE, .bitwidth = 16,},
+};
+
+#define HEADER_VLAN 2
+struct net_flow_header vlan = {
+	.name = "vlan",
+	.uid = HEADER_VLAN,
+	.field_sz = 4,
+	.fields = vlan_fields,
+};
+
+#define HEADER_IPV4_VERSION 1
+#define HEADER_IPV4_IHL 2
+#define HEADER_IPV4_DSCP 3
+#define HEADER_IPV4_ECN 4
+#define HEADER_IPV4_LENGTH 5
+#define HEADER_IPV4_IDENTIFICATION 6
+#define HEADER_IPV4_FLAGS 7
+#define HEADER_IPV4_FRAGMENT_OFFSET 8
+#define HEADER_IPV4_TTL 9
+#define HEADER_IPV4_PROTOCOL 10
+#define HEADER_IPV4_CSUM 11
+#define HEADER_IPV4_SRC_IP 12
+#define HEADER_IPV4_DST_IP 13
+#define HEADER_IPV4_OPTIONS 14
+struct net_flow_field ipv4_fields[14] = {
+	{ .name = "version",
+	  .uid = HEADER_IPV4_VERSION,
+	  .bitwidth = 4,},
+	{ .name = "ihl",
+	  .uid = HEADER_IPV4_IHL,
+	  .bitwidth = 4,},
+	{ .name = "dscp",
+	  .uid = HEADER_IPV4_DSCP,
+	  .bitwidth = 6,},
+	{ .name = "ecn",
+	  .uid = HEADER_IPV4_ECN,
+	  .bitwidth = 2,},
+	{ .name = "length",
+	  .uid = HEADER_IPV4_LENGTH,
+	  .bitwidth = 16,},
+	{ .name = "identification",
+	  .uid = HEADER_IPV4_IDENTIFICATION,
+	  .bitwidth = 16,},
+	{ .name = "flags",
+	  .uid = HEADER_IPV4_FLAGS,
+	  .bitwidth = 3,},
+	{ .name = "fragment_offset",
+	  .uid = HEADER_IPV4_FRAGMENT_OFFSET,
+	  .bitwidth = 13,},
+	{ .name = "ttl",
+	  .uid = HEADER_IPV4_TTL,
+	  .bitwidth = 8,},
+	{ .name = "protocol",
+	  .uid = HEADER_IPV4_PROTOCOL,
+	  .bitwidth = 8,},
+	{ .name = "csum",
+	  .uid = HEADER_IPV4_CSUM,
+	  .bitwidth = 16,},
+	{ .name = "src_ip",
+	  .uid = HEADER_IPV4_SRC_IP,
+	  .bitwidth = 32,},
+	{ .name = "dst_ip",
+	  .uid = HEADER_IPV4_DST_IP,
+	  .bitwidth = 32,},
+	{ .name = "options",
+	  .uid = HEADER_IPV4_OPTIONS,
+	  .bitwidth = -1,},
+};
+
+#define HEADER_IPV4 3
+struct net_flow_header ipv4 = {
+	.name = "ipv4",
+	.uid = HEADER_IPV4,
+	.field_sz = 14,
+	.fields = ipv4_fields,
+};
+
+#define HEADER_METADATA_IN_LPORT 1
+#define HEADER_METADATA_GOTO_TBL 2
+#define HEADER_METADATA_GROUP_ID 3
+struct net_flow_field metadata_fields[3] = {
+	{ .name = "in_lport",
+	  .uid = HEADER_METADATA_IN_LPORT,
+	  .bitwidth = 32,},
+	{ .name = "goto_tbl",
+	  .uid = HEADER_METADATA_GOTO_TBL,
+	  .bitwidth = 16,},
+	{ .name = "group_id",
+	  .uid = HEADER_METADATA_GROUP_ID,
+	  .bitwidth = 32,},
+};
+
+#define HEADER_METADATA 4
+struct net_flow_header metadata_t = {
+	.name = "metadata_t",
+	.uid = HEADER_METADATA,
+	.field_sz = 3,
+	.fields = metadata_fields,
+};
+
+struct net_flow_header null_hdr = {.name = "",
+				   .uid = 0,
+				   .field_sz = 0,
+				   .fields = NULL};
+
+struct net_flow_header *rocker_header_list[8] = {
+	&ethernet,
+	&vlan,
+	&ipv4,
+	&metadata_t,
+	&null_hdr,
+};
+
+/* action definitions */
+struct net_flow_action_arg null_args[1] = {
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+struct net_flow_action null_action = {
+	.name = "", .uid = 0, .args = NULL,
+};
+
+struct net_flow_action_arg set_goto_table_args[2] = {
+	{
+		.name = "table",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U16,
+		.value_u16 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_GOTO_TABLE 1
+struct net_flow_action set_goto_table = {
+	.name = "set_goto_table",
+	.uid = ACTION_SET_GOTO_TABLE,
+	.args = set_goto_table_args,
+};
+
+struct net_flow_action_arg set_vlan_id_args[2] = {
+	{
+		.name = "vlan_id",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U16,
+		.value_u16 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_VLAN_ID 2
+struct net_flow_action set_vlan_id = {
+	.name = "set_vlan_id",
+	.uid = ACTION_SET_VLAN_ID,
+	.args = set_vlan_id_args,
+};
+
+/* TBD: what is the untagged bool about in vlan table */
+#define ACTION_COPY_TO_CPU 3
+struct net_flow_action copy_to_cpu = {
+	.name = "copy_to_cpu",
+	.uid = ACTION_COPY_TO_CPU,
+	.args = null_args,
+};
+
+struct net_flow_action_arg set_group_id_args[2] = {
+	{
+		.name = "group_id",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U32,
+		.value_u32 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_GROUP_ID 4
+struct net_flow_action set_group_id = {
+	.name = "set_group_id",
+	.uid = ACTION_SET_GROUP_ID,
+	.args = set_group_id_args,
+};
+
+#define ACTION_POP_VLAN 5
+struct net_flow_action pop_vlan = {
+	.name = "pop_vlan",
+	.uid = ACTION_POP_VLAN,
+	.args = null_args,
+};
+
+struct net_flow_action_arg set_eth_src_args[2] = {
+	{
+		.name = "eth_src",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U64,
+		.value_u64 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_ETH_SRC 6
+struct net_flow_action set_eth_src = {
+	.name = "set_eth_src",
+	.uid = ACTION_SET_ETH_SRC,
+	.args = set_eth_src_args,
+};
+
+struct net_flow_action_arg set_eth_dst_args[2] = {
+	{
+		.name = "eth_dst",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U64,
+		.value_u64 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_ETH_DST 7
+struct net_flow_action set_eth_dst = {
+	.name = "set_eth_dst",
+	.uid = ACTION_SET_ETH_DST,
+	.args = set_eth_dst_args,
+};
+
+struct net_flow_action_arg set_out_port_args[2] = {
+	{
+		.name = "set_out_port",
+		.type = NET_FLOW_ACTION_ARG_TYPE_U32,
+		.value_u32 = 0,
+	},
+	{
+		.name = "",
+		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
+	},
+};
+
+#define ACTION_SET_OUT_PORT 8
+struct net_flow_action set_out_port = {
+	.name = "set_out_port",
+	.uid = ACTION_SET_OUT_PORT,
+	.args = set_out_port_args,
+};
+
+struct net_flow_action *rocker_action_list[9] = {
+	&set_goto_table,
+	&set_vlan_id,
+	&copy_to_cpu,
+	&set_group_id,
+	&pop_vlan,
+	&set_eth_src,
+	&set_eth_dst,
+	&set_out_port,
+	&null_action,
+};
+
+/* headers graph */
+#define HEADER_INSTANCE_ETHERNET 1
+#define HEADER_INSTANCE_VLAN_OUTER 2
+#define HEADER_INSTANCE_IPV4 3
+#define HEADER_INSTANCE_IN_LPORT 4
+#define HEADER_INSTANCE_GOTO_TABLE 5
+#define HEADER_INSTANCE_GROUP_ID 6
+
+struct net_flow_jump_table parse_ethernet[3] = {
+	{
+		.field = {
+		   .header = HEADER_ETHERNET,
+		   .field = HEADER_ETHERNET_ETHERTYPE,
+		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = 0x0800,
+		},
+		.node = HEADER_INSTANCE_IPV4,
+	},
+	{
+		.field = {
+		   .header = HEADER_ETHERNET,
+		   .field = HEADER_ETHERNET_ETHERTYPE,
+		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = 0x8100,
+		},
+		.node = HEADER_INSTANCE_VLAN_OUTER,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int ethernet_headers[2] = {HEADER_ETHERNET, 0};
+
+struct net_flow_hdr_node ethernet_header_node = {
+	.name = "ethernet",
+	.uid = HEADER_INSTANCE_ETHERNET,
+	.hdrs = ethernet_headers,
+	.jump = parse_ethernet,
+};
+
+struct net_flow_jump_table parse_vlan[2] = {
+	{
+		.field = {
+		   .header = HEADER_VLAN,
+		   .field = HEADER_VLAN_ETHERTYPE,
+		   .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
+		   .value_u16 = 0x0800,
+		},
+		.node = HEADER_INSTANCE_IPV4,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int vlan_headers[2] = {HEADER_VLAN, 0};
+struct net_flow_hdr_node vlan_header_node = {
+	.name = "vlan",
+	.uid = HEADER_INSTANCE_VLAN_OUTER,
+	.hdrs = vlan_headers,
+	.jump = parse_vlan,
+};
+
+struct net_flow_jump_table terminal_headers[2] = {
+	{
+		.field = {0},
+		.node = NET_FLOW_JUMP_TABLE_DONE,
+	},
+	{
+		.field = {0},
+		.node = 0,
+	},
+};
+
+int ipv4_headers[2] = {HEADER_IPV4, 0};
+struct net_flow_hdr_node ipv4_header_node = {
+	.name = "ipv4",
+	.uid = HEADER_INSTANCE_IPV4,
+	.hdrs = ipv4_headers,
+	.jump = terminal_headers,
+};
+
+int metadata_headers[2] = {HEADER_METADATA, 0};
+struct net_flow_hdr_node in_lport_header_node = {
+	.name = "in_lport",
+	.uid = HEADER_INSTANCE_IN_LPORT,
+	.hdrs = metadata_headers,
+	.jump = terminal_headers,
+};
+
+struct net_flow_hdr_node goto_table_header_node = {
+	.name = "goto_table",
+	.uid = HEADER_INSTANCE_GOTO_TABLE,
+	.hdrs = metadata_headers,
+	.jump = terminal_headers,
+};
+
+struct net_flow_hdr_node group_id_header_node = {
+	.name = "group_id",
+	.uid = HEADER_INSTANCE_GROUP_ID,
+	.hdrs = metadata_headers,
+	.jump = terminal_headers,
+};
+
+struct net_flow_hdr_node null_header = {.name = "", .uid = 0,};
+
+struct net_flow_hdr_node *rocker_header_nodes[7] = {
+	&ethernet_header_node,
+	&vlan_header_node,
+	&ipv4_header_node,
+	&in_lport_header_node,
+	&goto_table_header_node,
+	&group_id_header_node,
+	&null_header,
+};
+
+/* table definition */
+struct net_flow_field_ref matches_ig_port[2] = {
+	{ .instance = HEADER_INSTANCE_IN_LPORT,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_IN_LPORT,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_vlan[3] = {
+	{ .instance = HEADER_INSTANCE_IN_LPORT,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_IN_LPORT,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_term_mac[5] = {
+	{ .instance = HEADER_INSTANCE_IN_LPORT,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_IN_LPORT,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_ucast_routing[3] = {
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_DST_IP,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_bridge[3] = {
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_acl[8] = {
+	{ .instance = HEADER_INSTANCE_IN_LPORT,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_IN_LPORT,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_SRC_MAC,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_DST_MAC,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_ETHERNET,
+	  .header = HEADER_ETHERNET,
+	  .field = HEADER_ETHERNET_ETHERTYPE,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = HEADER_INSTANCE_VLAN_OUTER,
+	  .header = HEADER_VLAN,
+	  .field = HEADER_VLAN_VID,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_PROTOCOL,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = HEADER_INSTANCE_IPV4,
+	  .header = HEADER_IPV4,
+	  .field = HEADER_IPV4_DSCP,
+	  .mask_type = NET_FLOW_MASK_TYPE_LPM},
+	{ .instance = 0, .field = 0},
+};
+
+int actions_ig_port[2] = {ACTION_SET_GOTO_TABLE, 0};
+int actions_vlan[3] = {ACTION_SET_GOTO_TABLE, ACTION_SET_VLAN_ID, 0};
+int actions_term_mac[3] = {ACTION_SET_GOTO_TABLE, ACTION_COPY_TO_CPU, 0};
+int actions_ucast_routing[3] = {ACTION_SET_GOTO_TABLE, ACTION_SET_GROUP_ID, 0};
+int actions_bridge[4] = {ACTION_SET_GOTO_TABLE,
+			 ACTION_SET_GROUP_ID,
+			 ACTION_COPY_TO_CPU, 0};
+int actions_acl[2] = {ACTION_SET_GROUP_ID, 0};
+
+enum rocker_flow_table_id_space {
+	ROCKER_FLOW_TABLE_ID_INGRESS_PORT = 1,
+	ROCKER_FLOW_TABLE_ID_VLAN,
+	ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	ROCKER_FLOW_TABLE_ID_BRIDGING,
+	ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	ROCKER_FLOW_TABLE_NULL = 0,
+};
+
+struct net_flow_table ingress_port_table = {
+	.name = "ingress_port",
+	.uid = ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
+	.source = 1,
+	.size = -1,
+	.matches = matches_ig_port,
+	.actions = actions_ig_port,
+};
+
+struct net_flow_table vlan_table = {
+	.name = "vlan",
+	.uid = ROCKER_FLOW_TABLE_ID_VLAN,
+	.source = 1,
+	.size = -1,
+	.matches = matches_vlan,
+	.actions = actions_vlan,
+};
+
+struct net_flow_table term_mac_table = {
+	.name = "term_mac",
+	.uid = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	.source = 1,
+	.size = -1,
+	.matches = matches_term_mac,
+	.actions = actions_term_mac,
+};
+
+struct net_flow_table ucast_routing_table = {
+	.name = "ucast_routing",
+	.uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	.source = 1,
+	.size = -1,
+	.matches = matches_ucast_routing,
+	.actions = actions_ucast_routing,
+};
+
+struct net_flow_table bridge_table = {
+	.name = "bridge",
+	.uid = ROCKER_FLOW_TABLE_ID_BRIDGING,
+	.source = 1,
+	.size = -1,
+	.matches = matches_bridge,
+	.actions = actions_bridge,
+};
+
+struct net_flow_table acl_table = {
+	.name = "acl",
+	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	.source = 1,
+	.size = -1,
+	.matches = matches_acl,
+	.actions = actions_acl,
+};
+
+struct net_flow_table null_table = {
+	.name = "",
+	.uid = 0,
+	.source = 0,
+	.size = 0,
+	.matches = NULL,
+	.actions = NULL,
+};
+
+struct net_flow_table *rocker_table_list[7] = {
+	&ingress_port_table,
+	&vlan_table,
+	&term_mac_table,
+	&ucast_routing_table,
+	&bridge_table,
+	&acl_table,
+	&null_table,
+};
+
+/* Define the table graph layout */
+struct net_flow_jump_table table_node_ig_port_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_VLAN},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_ingress_port = {
+	.uid = ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
+	.jump = table_node_ig_port_next};
+
+struct net_flow_jump_table table_node_vlan_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_vlan = {
+	.uid = ROCKER_FLOW_TABLE_ID_VLAN,
+	.jump = table_node_vlan_next};
+
+struct net_flow_jump_table table_node_term_mac_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_term_mac = {
+	.uid = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
+	.jump = table_node_term_mac_next};
+
+struct net_flow_jump_table table_node_bridge_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_bridge = {
+	.uid = ROCKER_FLOW_TABLE_ID_BRIDGING,
+	.jump = table_node_bridge_next};
+
+struct net_flow_jump_table table_node_ucast_routing_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_ucast_routing = {
+	.uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	.jump = table_node_ucast_routing_next};
+
+struct net_flow_jump_table table_node_acl_next[1] = {
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_acl = {
+	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	.jump = table_node_acl_next};
+
+struct net_flow_tbl_node table_node_nil = {.uid = 0, .jump = NULL};
+
+struct net_flow_tbl_node *rocker_table_nodes[7] = {
+	&table_node_ingress_port,
+	&table_node_vlan,
+	&table_node_term_mac,
+	&table_node_ucast_routing,
+	&table_node_bridge,
+	&table_node_acl,
+	&table_node_nil,
+};
+#endif /* _MY_PIPELINE_H_ */

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [net-next PATCH v1 05/11] net: rocker: add set flow rules
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (3 preceding siblings ...)
  2014-12-31 19:47 ` [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch John Fastabend
@ 2014-12-31 19:47 ` John Fastabend
  2015-01-06  7:23   ` Scott Feldman
  2014-12-31 19:48 ` [net-next PATCH v1 06/11] net: rocker: add group_id slices and drop explicit goto John Fastabend
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:47 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Implement set flow operations for existing rocker tables.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c          |  517 +++++++++++++++++++++++++
 drivers/net/ethernet/rocker/rocker_pipeline.h |    3 
 2 files changed, 519 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4c6787a..c40c58d 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3806,6 +3806,520 @@ static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
 {
 	return rocker_header_nodes;
 }
+
+static int is_valid_net_flow_action_arg(struct net_flow_action *a, int id)
+{
+	struct net_flow_action_arg *args = NULL;
+	int i;
+
+	/* Look up the expected argument list for this action uid in the
+	 * pipeline model and check the flow's arguments against it.
+	 */
+	for (i = 0; rocker_action_list[i]->uid; i++) {
+		if (rocker_action_list[i]->uid == id) {
+			args = rocker_action_list[i]->args;
+			break;
+		}
+	}
+
+	if (!args)
+		return -EINVAL;
+
+	for (i = 0; args[i].type != NET_FLOW_ACTION_ARG_TYPE_NULL; i++) {
+		if (!a->args || a->args[i].type != args[i].type)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int is_valid_net_flow_action(struct net_flow_action *a, int *actions)
+{
+	int i;
+
+	for (i = 0; actions[i]; i++) {
+		if (actions[i] == a->uid)
+			return is_valid_net_flow_action_arg(a, a->uid);
+	}
+	return -EINVAL;
+}
+
+static int is_valid_net_flow_match(struct net_flow_field_ref *f,
+				   struct net_flow_field_ref *fields)
+{
+	int i;
+
+	for (i = 0; fields[i].header; i++) {
+		if (f->header == fields[i].header &&
+		    f->field == fields[i].field)
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int is_valid_net_flow(struct net_flow_table *table,
+			     struct net_flow_flow *flow)
+{
+	struct net_flow_field_ref *fields = table->matches;
+	int *actions = table->actions;
+	int i, err;
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		err = is_valid_net_flow_action(&flow->actions[i], actions);
+		if (err)
+			return -EINVAL;
+	}
+
+	for (i = 0; flow->matches && flow->matches[i].header; i++) {
+		err = is_valid_net_flow_match(&flow->matches[i], fields);
+		if (err)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static u32 rocker_goto_value(u32 id)
+{
+	switch (id) {
+	case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
+		return ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
+	case ROCKER_FLOW_TABLE_ID_VLAN:
+		return ROCKER_OF_DPA_TABLE_ID_VLAN;
+	case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
+		return ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+	case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
+		return ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
+	case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
+		return ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
+	case ROCKER_FLOW_TABLE_ID_BRIDGING:
+		return ROCKER_OF_DPA_TABLE_ID_BRIDGING;
+	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
+		return ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	default:
+		return 0;
+	}
+}
+
+static int rocker_flow_set_ig_port(struct net_device *dev,
+				   struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	enum rocker_of_dpa_table_id goto_tbl;
+	u32 in_lport_mask;
+	u32 in_lport;
+	int err, flags = 0;
+
+	err = is_valid_net_flow(&ingress_port_table, flow);
+	if (err)
+		return err;
+
+	/* The ingress port table supports only one field/mask/action,
+	 * which simplifies key construction, and the validity check
+	 * above ensures the values have the correct types. The user
+	 * could still pass the same field multiple times in one
+	 * message; the validity test does not catch this yet, so only
+	 * the first instance is used.
+	 */
+	in_lport = flow->matches[0].value_u32;
+	in_lport_mask = flow->matches[0].mask_u32;
+	goto_tbl = rocker_goto_value(flow->actions[0].args[0].value_u16);
+
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      in_lport, in_lport_mask,
+				      goto_tbl);
+	return err;
+}
+
+static int rocker_flow_set_vlan(struct net_device *dev,
+				struct net_flow_flow *flow)
+{
+	enum rocker_of_dpa_table_id goto_tbl;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int i, err = 0, flags = 0;
+	u32 in_lport;
+	__be16 vlan_id, vlan_id_mask, new_vlan_id;
+	bool untagged, have_in_lport = false;
+
+	err = is_valid_net_flow(&vlan_table, flow);
+	if (err)
+		return err;
+
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = 1;
+	vlan_id_mask = 0;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		switch (flow->matches[i].instance) {
+		case HEADER_INSTANCE_IN_LPORT:
+			in_lport = flow->matches[i].value_u32;
+			have_in_lport = true;
+			break;
+		case HEADER_INSTANCE_VLAN_OUTER:
+			if (flow->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(flow->matches[i].value_u16);
+			vlan_id_mask = htons(flow->matches[i].mask_u16);
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* If user does not specify a new vlan id use default vlan id */
+	new_vlan_id = rocker_port_vid_to_vlan(rocker_port, vlan_id, &untagged);
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
+
+		switch (flow->actions[i].uid) {
+		case ACTION_SET_GOTO_TABLE:
+			goto_tbl = rocker_goto_value(arg->value_u16);
+			break;
+		case ACTION_SET_VLAN_ID:
+			new_vlan_id = htons(arg->value_u16);
+			if (new_vlan_id)
+				untagged = false;
+			break;
+		}
+	}
+
+	if (!have_in_lport)
+		return -EINVAL;
+
+	err = rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
+				   vlan_id, vlan_id_mask, goto_tbl,
+				   untagged, new_vlan_id);
+	return err;
+}
+
+static int rocker_flow_set_term_mac(struct net_device *dev,
+				    struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	__be16 vlan_id, vlan_id_mask, ethtype = 0;
+	const u8 *eth_dst, *eth_dst_mask;
+	u32 in_lport, in_lport_mask;
+	int i, err = 0, flags = 0;
+	bool copy_to_cpu;
+
+	eth_dst = NULL;
+	eth_dst_mask = NULL;
+
+	err = is_valid_net_flow(&term_mac_table, flow);
+	if (err)
+		return err;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_dst_mask = zero_mac;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		switch (flow->matches[i].instance) {
+		case HEADER_INSTANCE_IN_LPORT:
+			in_lport = flow->matches[i].value_u32;
+			in_lport_mask = flow->matches[i].mask_u32;
+			break;
+		case HEADER_INSTANCE_VLAN_OUTER:
+			if (flow->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(flow->matches[i].value_u16);
+			vlan_id_mask = htons(flow->matches[i].mask_u16);
+			break;
+		case HEADER_INSTANCE_ETHERNET:
+			switch (flow->matches[i].field) {
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&flow->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_ETHERTYPE:
+				ethtype = htons(flow->matches[i].value_u16);
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	if (!ethtype)
+		return -EINVAL;
+
+	/* By default do not copy to cpu */
+	copy_to_cpu = false;
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		switch (flow->actions[i].uid) {
+		case ACTION_COPY_TO_CPU:
+			copy_to_cpu = true;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	err = rocker_flow_tbl_term_mac(rocker_port, in_lport, in_lport_mask,
+				       ethtype, eth_dst, eth_dst_mask,
+				       vlan_id, vlan_id_mask,
+				       copy_to_cpu, flags);
+	return err;
+}
+
+static int rocker_flow_set_ucast_routing(struct net_device *dev,
+					 struct net_flow_flow *flow)
+{
+	return -EOPNOTSUPP;
+}
+
+static int rocker_flow_set_mcast_routing(struct net_device *dev,
+					 struct net_flow_flow *flow)
+{
+	return -EOPNOTSUPP;
+}
+
+static int rocker_flow_set_bridge(struct net_device *dev,
+				  struct net_flow_flow *flow)
+{
+	enum rocker_of_dpa_table_id goto_tbl;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	u32 in_lport, in_lport_mask, group_id, tunnel_id;
+	__be16 vlan_id, vlan_id_mask;
+	const u8 *eth_dst, *eth_dst_mask;
+	int i, err = 0, flags = 0;
+	bool copy_to_cpu;
+
+	err = is_valid_net_flow(&bridge_table, flow);
+	if (err)
+		return err;
+
+	goto_tbl = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_dst_mask = NULL;
+
+	/* No support for tunnel_id yet. */
+	tunnel_id = 0;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		switch (flow->matches[i].instance) {
+		case HEADER_INSTANCE_IN_LPORT:
+			in_lport = flow->matches[i].value_u32;
+			in_lport_mask = flow->matches[i].mask_u32;
+			break;
+		case HEADER_INSTANCE_VLAN_OUTER:
+			if (flow->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(flow->matches[i].value_u16);
+			vlan_id_mask = htons(flow->matches[i].mask_u16);
+			break;
+		case HEADER_INSTANCE_ETHERNET:
+			switch (flow->matches[i].field) {
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&flow->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* By default do not copy to cpu and skip group assignment */
+	copy_to_cpu = false;
+	group_id = ROCKER_GROUP_NONE;
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
+
+		switch (flow->actions[i].uid) {
+		case ACTION_SET_GOTO_TABLE:
+			goto_tbl = rocker_goto_value(arg->value_u16);
+			break;
+		case ACTION_COPY_TO_CPU:
+			copy_to_cpu = true;
+			break;
+		case ACTION_SET_GROUP_ID:
+			group_id = arg->value_u32;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* Ignoring eth_dst_mask; it seems to cause an -EINVAL return code */
+	err = rocker_flow_tbl_bridge(rocker_port, flags,
+				     eth_dst, eth_dst_mask,
+				     vlan_id, tunnel_id,
+				     goto_tbl, group_id, copy_to_cpu);
+	return err;
+}
+
+static int rocker_flow_set_acl(struct net_device *dev,
+			       struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	u32 in_lport, in_lport_mask, group_id, tunnel_id;
+	__be16 vlan_id, vlan_id_mask, ethtype = 0;
+	const u8 *eth_dst, *eth_src, *eth_dst_mask, *eth_src_mask;
+	u8 protocol, protocol_mask, dscp, dscp_mask;
+	int i, err = 0, flags = 0;
+
+	err = is_valid_net_flow(&acl_table, flow);
+	if (err)
+		return err;
+
+	/* If user does not specify vid match default to any */
+	vlan_id = rocker_port->internal_vlan_id;
+	vlan_id_mask = 0;
+
+	/* If user does not specify in_lport match default to any */
+	in_lport = rocker_port->lport;
+	in_lport_mask = 0;
+
+	/* If user does not specify a mac address match any */
+	eth_dst = rocker_port->dev->dev_addr;
+	eth_src = zero_mac;
+	eth_dst_mask = NULL;
+	eth_src_mask = NULL;
+
+	/* If user does not set protocol/dscp mask them out */
+	protocol = 0;
+	dscp = 0;
+	protocol_mask = 0;
+	dscp_mask = 0;
+
+	/* No support for tunnel_id yet. */
+	tunnel_id = 0;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		switch (flow->matches[i].instance) {
+		case HEADER_INSTANCE_IN_LPORT:
+			in_lport = flow->matches[i].value_u32;
+			in_lport_mask = flow->matches[i].mask_u32;
+			break;
+		case HEADER_INSTANCE_VLAN_OUTER:
+			if (flow->matches[i].field != HEADER_VLAN_VID)
+				break;
+
+			vlan_id = htons(flow->matches[i].value_u16);
+			vlan_id_mask = htons(flow->matches[i].mask_u16);
+			break;
+		case HEADER_INSTANCE_ETHERNET:
+			switch (flow->matches[i].field) {
+			case HEADER_ETHERNET_SRC_MAC:
+				eth_src = (u8 *)&flow->matches[i].value_u64;
+				eth_src_mask = (u8 *)&flow->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_DST_MAC:
+				eth_dst = (u8 *)&flow->matches[i].value_u64;
+				eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
+				break;
+			case HEADER_ETHERNET_ETHERTYPE:
+				ethtype = htons(flow->matches[i].value_u16);
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		case HEADER_INSTANCE_IPV4:
+			switch (flow->matches[i].field) {
+			case HEADER_IPV4_PROTOCOL:
+				protocol = flow->matches[i].value_u8;
+				protocol_mask = flow->matches[i].mask_u8;
+				break;
+			case HEADER_IPV4_DSCP:
+				dscp = flow->matches[i].value_u8;
+				dscp_mask = flow->matches[i].mask_u8;
+				break;
+			default:
+				return -EINVAL;
+			}
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	/* By default do not copy to cpu and skip group assignment */
+	group_id = ROCKER_GROUP_NONE;
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		switch (flow->actions[i].uid) {
+		case ACTION_SET_GROUP_ID:
+			group_id = flow->actions[i].args[0].value_u32;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	err = rocker_flow_tbl_acl(rocker_port, flags,
+				  in_lport, in_lport_mask,
+				  eth_src, eth_src_mask,
+				  eth_dst, eth_dst_mask, ethtype,
+				  vlan_id, vlan_id_mask,
+				  protocol, protocol_mask,
+				  dscp, dscp_mask,
+				  group_id);
+	return err;
+}
+
+static int rocker_set_flows(struct net_device *dev,
+			    struct net_flow_flow *flow)
+{
+	int err = -EINVAL;
+
+	if (!flow->matches || !flow->actions)
+		return -EINVAL;
+
+	switch (flow->table_id) {
+	case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
+		err = rocker_flow_set_ig_port(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_VLAN:
+		err = rocker_flow_set_vlan(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
+		err = rocker_flow_set_term_mac(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
+		err = rocker_flow_set_ucast_routing(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
+		err = rocker_flow_set_mcast_routing(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_BRIDGING:
+		err = rocker_flow_set_bridge(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
+		err = rocker_flow_set_acl(dev, flow);
+		break;
+	default:
+		break;
+	}
+
+	return err;
+}
+
+static int rocker_del_flows(struct net_device *dev,
+			    struct net_flow_flow *flow)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 static const struct net_device_ops rocker_port_netdev_ops = {
@@ -3828,6 +4342,9 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_flow_get_actions		= rocker_get_actions,
 	.ndo_flow_get_tbl_graph		= rocker_get_tgraph,
 	.ndo_flow_get_hdr_graph		= rocker_get_hgraph,
+
+	.ndo_flow_set_flows		= rocker_set_flows,
+	.ndo_flow_del_flows		= rocker_del_flows,
 #endif
 };
 
diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
index 9544339..701e139 100644
--- a/drivers/net/ethernet/rocker/rocker_pipeline.h
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -527,6 +527,7 @@ enum rocker_flow_table_id_space {
 	ROCKER_FLOW_TABLE_ID_VLAN,
 	ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
 	ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
+	ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
 	ROCKER_FLOW_TABLE_ID_BRIDGING,
 	ROCKER_FLOW_TABLE_ID_ACL_POLICY,
 	ROCKER_FLOW_TABLE_NULL = 0,
@@ -588,7 +589,7 @@ struct net_flow_table acl_table = {
 
 struct net_flow_table null_table = {
 	.name = "",
-	.uid = 0,
+	.uid = ROCKER_FLOW_TABLE_NULL,
 	.source = 0,
 	.size = 0,
 	.matches = NULL,

* [net-next PATCH v1 06/11] net: rocker: add group_id slices and drop explicit goto
@ 2014-12-31 19:48 ` John Fastabend
From: John Fastabend @ 2014-12-31 19:48 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

This adds the group tables for l3_unicast, l2_rewrite and l2. In
addition to adding the tables, we extend the metadata fields to
support three different group id lookups, one for each table, and
drop the more generic group_id field previously being used.

Finally, we can also drop the goto action as it is no longer used.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c          |  192 +++++++++++++++++++-
 drivers/net/ethernet/rocker/rocker_pipeline.h |  235 ++++++++++++++++++-------
 2 files changed, 355 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index c40c58d..8ce9933 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3964,9 +3964,6 @@ static int rocker_flow_set_vlan(struct net_device *dev,
 		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
 
 		switch (flow->actions[i].uid) {
-		case ACTION_SET_GOTO_TABLE:
-			goto_tbl = rocker_goto_value(arg->value_u16);
-			break;
 		case ACTION_SET_VLAN_ID:
 			new_vlan_id = htons(arg->value_u16);
 			if (new_vlan_id)
@@ -4147,14 +4144,11 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
 
 		switch (flow->actions[i].uid) {
-		case ACTION_SET_GOTO_TABLE:
-			goto_tbl = rocker_goto_value(arg->value_u16);
-			break;
 		case ACTION_COPY_TO_CPU:
 			copy_to_cpu = true;
 			break;
-		case ACTION_SET_GROUP_ID:
-			group_id = arg->value_u32;
+		case ACTION_SET_L3_UNICAST_GROUP_ID:
+			group_id = ROCKER_GROUP_L3_UNICAST(arg->value_u32);
 			break;
 		default:
 			return -EINVAL;
@@ -4258,9 +4252,11 @@ static int rocker_flow_set_acl(struct net_device *dev,
 	group_id = ROCKER_GROUP_NONE;
 
 	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
+
 		switch (flow->actions[i].uid) {
-		case ACTION_SET_GROUP_ID:
-			group_id = flow->actions[i].args[0].value_u32;
+		case ACTION_SET_L3_UNICAST_GROUP_ID:
+			group_id = ROCKER_GROUP_L3_UNICAST(arg->value_u32);
 			break;
 		default:
 			return -EINVAL;
@@ -4278,6 +4274,173 @@ static int rocker_flow_set_acl(struct net_device *dev,
 	return err;
 }
 
+static int rocker_flow_set_group_slice_l3_unicast(struct net_device *dev,
+						  struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0, err = 0;
+
+	err = is_valid_net_flow(&group_slice_l3_unicast_table, flow);
+	if (err)
+		return err;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &flow->matches[i];
+
+		switch (r->instance) {
+		case HEADER_INSTANCE_L3_UNICAST_GROUP_ID:
+			entry->group_id = ROCKER_GROUP_L3_UNICAST(r->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
+
+		switch (flow->actions[i].uid) {
+		case ACTION_SET_ETH_SRC:
+			ether_addr_copy(entry->l3_unicast.eth_src,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_ETH_DST:
+			ether_addr_copy(entry->l3_unicast.eth_dst,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_VLAN_ID:
+			entry->l3_unicast.vlan_id = htons(arg->value_u16);
+			break;
+		case ACTION_CHECK_TTL_DROP:
+			entry->l3_unicast.ttl_check = true;
+			break;
+		case ACTION_SET_L2_REWRITE_GROUP_ID:
+			entry->l3_unicast.group_id =
+				ROCKER_GROUP_L2_REWRITE(arg->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_set_group_slice_l2_rewrite(struct net_device *dev,
+						  struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0, err = 0;
+
+	err = is_valid_net_flow(&group_slice_l2_rewrite_table, flow);
+	if (err)
+		return err;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &flow->matches[i];
+
+		switch (r->instance) {
+		case HEADER_INSTANCE_L2_REWRITE_GROUP_ID:
+			entry->group_id = ROCKER_GROUP_L2_REWRITE(r->value_u32);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		struct net_flow_action_arg *arg = &flow->actions[i].args[0];
+
+		switch (flow->actions[i].uid) {
+		case ACTION_SET_ETH_SRC:
+			ether_addr_copy(entry->l2_rewrite.eth_src,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_ETH_DST:
+			ether_addr_copy(entry->l2_rewrite.eth_dst,
+					(u8 *)&arg->value_u64);
+			break;
+		case ACTION_SET_VLAN_ID:
+			entry->l2_rewrite.vlan_id = htons(arg->value_u16);
+			break;
+		case ACTION_SET_L2_GROUP_ID:
+			entry->l2_rewrite.group_id =
+				ROCKER_GROUP_L2_INTERFACE(arg->value_u32,
+							  rocker_port->lport);
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
+static int rocker_flow_set_group_slice_l2(struct net_device *dev,
+					  struct net_flow_flow *flow)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_group_tbl_entry *entry;
+	int i, flags = 0, err = 0;
+	u32 lport;
+
+	err = is_valid_net_flow(&group_slice_l2_table, flow);
+	if (err)
+		return err;
+
+	entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags));
+	if (!entry)
+		return -ENOMEM;
+
+	lport = rocker_port->lport;
+
+	/* Use the dev lport if the user does not specify an lport instance.
+	 * We need to walk the match list once beforehand to extract any
+	 * lport attribute.
+	 */
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		switch (flow->matches[i].instance) {
+		case HEADER_INSTANCE_IN_LPORT:
+			lport = flow->matches[i].value_u32;
+			break;
+		}
+	}
+
+	for (i = 0; flow->matches && flow->matches[i].instance; i++) {
+		struct net_flow_field_ref *r = &flow->matches[i];
+
+		switch (r->instance) {
+		case HEADER_INSTANCE_L2_GROUP_ID:
+			entry->group_id =
+				ROCKER_GROUP_L2_INTERFACE(r->value_u32, lport);
+			break;
+		case HEADER_INSTANCE_IN_LPORT:
+			/* lport was extracted by the pre-walk above */
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; flow->actions && flow->actions[i].uid; i++) {
+		switch (flow->actions[i].uid) {
+		case ACTION_POP_VLAN:
+			entry->l2_interface.pop_vlan = true;
+			break;
+		default:
+			kfree(entry);
+			return -EINVAL;
+		}
+	}
+
+	return rocker_group_tbl_do(rocker_port, flags, entry);
+}
+
 static int rocker_set_flows(struct net_device *dev,
 			    struct net_flow_flow *flow)
 {
@@ -4308,6 +4471,15 @@ static int rocker_set_flows(struct net_device *dev,
 	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
 		err = rocker_flow_set_acl(dev, flow);
 		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST:
+		err = rocker_flow_set_group_slice_l3_unicast(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE:
+		err = rocker_flow_set_group_slice_l2_rewrite(dev, flow);
+		break;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2:
+		err = rocker_flow_set_group_slice_l2(dev, flow);
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
index 701e139..7e689c0 100644
--- a/drivers/net/ethernet/rocker/rocker_pipeline.h
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -111,16 +111,21 @@ struct net_flow_header ipv4 = {
 
 #define HEADER_METADATA_IN_LPORT 1
 #define HEADER_METADATA_GOTO_TBL 2
-#define HEADER_METADATA_GROUP_ID 3
-struct net_flow_field metadata_fields[3] = {
+#define HEADER_METADATA_L3_UNICAST_GROUP_ID	3
+#define HEADER_METADATA_L2_REWRITE_GROUP_ID	4
+#define HEADER_METADATA_L2_GROUP_ID		5
+struct net_flow_field metadata_fields[4] = {
 	{ .name = "in_lport",
 	  .uid = HEADER_METADATA_IN_LPORT,
 	  .bitwidth = 32,},
-	{ .name = "goto_tbl",
-	  .uid = HEADER_METADATA_GOTO_TBL,
-	  .bitwidth = 16,},
-	{ .name = "group_id",
-	  .uid = HEADER_METADATA_GROUP_ID,
+	{ .name = "l3_unicast_group_id",
+	  .uid = HEADER_METADATA_L3_UNICAST_GROUP_ID,
+	  .bitwidth = 32,},
+	{ .name = "l2_rewrite_group_id",
+	  .uid = HEADER_METADATA_L2_REWRITE_GROUP_ID,
+	  .bitwidth = 32,},
+	{ .name = "l2_group_id",
+	  .uid = HEADER_METADATA_L2_GROUP_ID,
 	  .bitwidth = 32,},
 };
 
@@ -128,7 +133,7 @@ struct net_flow_field metadata_fields[3] = {
 struct net_flow_header metadata_t = {
 	.name = "metadata_t",
 	.uid = HEADER_METADATA,
-	.field_sz = 3,
+	.field_sz = 4,
 	.fields = metadata_fields,
 };
 
@@ -157,25 +162,6 @@ struct net_flow_action null_action = {
 	.name = "", .uid = 0, .args = NULL,
 };
 
-struct net_flow_action_arg set_goto_table_args[2] = {
-	{
-		.name = "table",
-		.type = NET_FLOW_ACTION_ARG_TYPE_U16,
-		.value_u16 = 0,
-	},
-	{
-		.name = "",
-		.type = NET_FLOW_ACTION_ARG_TYPE_NULL,
-	},
-};
-
-#define ACTION_SET_GOTO_TABLE 1
-struct net_flow_action set_goto_table = {
-	.name = "set_goto_table",
-	.uid = ACTION_SET_GOTO_TABLE,
-	.args = set_goto_table_args,
-};
-
 struct net_flow_action_arg set_vlan_id_args[2] = {
 	{
 		.name = "vlan_id",
@@ -188,7 +174,7 @@ struct net_flow_action_arg set_vlan_id_args[2] = {
 	},
 };
 
-#define ACTION_SET_VLAN_ID 2
+#define ACTION_SET_VLAN_ID 1
 struct net_flow_action set_vlan_id = {
 	.name = "set_vlan_id",
 	.uid = ACTION_SET_VLAN_ID,
@@ -196,7 +182,7 @@ struct net_flow_action set_vlan_id = {
 };
 
 /* TBD: what is the untagged bool about in vlan table */
-#define ACTION_COPY_TO_CPU 3
+#define ACTION_COPY_TO_CPU 2
 struct net_flow_action copy_to_cpu = {
 	.name = "copy_to_cpu",
 	.uid = ACTION_COPY_TO_CPU,
@@ -215,14 +201,28 @@ struct net_flow_action_arg set_group_id_args[2] = {
 	},
 };
 
-#define ACTION_SET_GROUP_ID 4
-struct net_flow_action set_group_id = {
-	.name = "set_group_id",
-	.uid = ACTION_SET_GROUP_ID,
+#define ACTION_SET_L3_UNICAST_GROUP_ID 3
+struct net_flow_action set_l3_unicast_group_id = {
+	.name = "set_l3_unicast_group_id",
+	.uid = ACTION_SET_L3_UNICAST_GROUP_ID,
 	.args = set_group_id_args,
 };
 
-#define ACTION_POP_VLAN 5
+#define ACTION_SET_L2_REWRITE_GROUP_ID 4
+struct net_flow_action set_l2_rewrite_group_id = {
+	.name = "set_l2_rewrite_group_id",
+	.uid = ACTION_SET_L2_REWRITE_GROUP_ID,
+	.args = set_group_id_args,
+};
+
+#define ACTION_SET_L2_GROUP_ID 5
+struct net_flow_action set_l2_group_id = {
+	.name = "set_l2_group_id",
+	.uid = ACTION_SET_L2_GROUP_ID,
+	.args = set_group_id_args,
+};
+
+#define ACTION_POP_VLAN 6
 struct net_flow_action pop_vlan = {
 	.name = "pop_vlan",
 	.uid = ACTION_POP_VLAN,
@@ -241,7 +241,7 @@ struct net_flow_action_arg set_eth_src_args[2] = {
 	},
 };
 
-#define ACTION_SET_ETH_SRC 6
+#define ACTION_SET_ETH_SRC 7
 struct net_flow_action set_eth_src = {
 	.name = "set_eth_src",
 	.uid = ACTION_SET_ETH_SRC,
@@ -260,7 +260,7 @@ struct net_flow_action_arg set_eth_dst_args[2] = {
 	},
 };
 
-#define ACTION_SET_ETH_DST 7
+#define ACTION_SET_ETH_DST 8
 struct net_flow_action set_eth_dst = {
 	.name = "set_eth_dst",
 	.uid = ACTION_SET_ETH_DST,
@@ -279,21 +279,30 @@ struct net_flow_action_arg set_out_port_args[2] = {
 	},
 };
 
-#define ACTION_SET_OUT_PORT 8
+#define ACTION_SET_OUT_PORT 9
 struct net_flow_action set_out_port = {
 	.name = "set_out_port",
 	.uid = ACTION_SET_OUT_PORT,
 	.args = set_out_port_args,
 };
 
-struct net_flow_action *rocker_action_list[8] = {
-	&set_goto_table,
+#define ACTION_CHECK_TTL_DROP 10
+struct net_flow_action check_ttl_drop = {
+	.name = "check_ttl_drop",
+	.uid = ACTION_CHECK_TTL_DROP,
+	.args = null_args,
+};
+
+struct net_flow_action *rocker_action_list[10] = {
 	&set_vlan_id,
 	&copy_to_cpu,
-	&set_group_id,
+	&set_l3_unicast_group_id,
+	&set_l2_rewrite_group_id,
+	&set_l2_group_id,
 	&pop_vlan,
 	&set_eth_src,
 	&set_eth_dst,
+	&check_ttl_drop,
 	&null_action,
 };
 
@@ -302,8 +311,9 @@ struct net_flow_action *rocker_action_list[8] = {
 #define HEADER_INSTANCE_VLAN_OUTER 2
 #define HEADER_INSTANCE_IPV4 3
 #define HEADER_INSTANCE_IN_LPORT 4
-#define HEADER_INSTANCE_GOTO_TABLE 5
-#define HEADER_INSTANCE_GROUP_ID 6
+#define HEADER_INSTANCE_L3_UNICAST_GROUP_ID 5
+#define HEADER_INSTANCE_L2_REWRITE_GROUP_ID 6
+#define HEADER_INSTANCE_L2_GROUP_ID 7
 
 struct net_flow_jump_table parse_ethernet[3] = {
 	{
@@ -390,29 +400,37 @@ struct net_flow_hdr_node in_lport_header_node = {
 	.jump = terminal_headers,
 };
 
-struct net_flow_hdr_node goto_table_header_node = {
-	.name = "goto_table",
-	.uid = HEADER_INSTANCE_GOTO_TABLE,
+struct net_flow_hdr_node l2_group_id_header_node = {
+	.name = "l2_group_id",
+	.uid = HEADER_INSTANCE_L2_GROUP_ID,
+	.hdrs = metadata_headers,
+	.jump = terminal_headers,
+};
+
+struct net_flow_hdr_node l2_rewrite_group_id_header_node = {
+	.name = "l2_rewrite_group_id",
+	.uid = HEADER_INSTANCE_L2_REWRITE_GROUP_ID,
 	.hdrs = metadata_headers,
 	.jump = terminal_headers,
 };
 
-struct net_flow_hdr_node group_id_header_node = {
-	.name = "group_id",
-	.uid = HEADER_INSTANCE_GROUP_ID,
+struct net_flow_hdr_node l3_unicast_group_id_header_node = {
+	.name = "l3_unicast_group_id",
+	.uid = HEADER_INSTANCE_L3_UNICAST_GROUP_ID,
 	.hdrs = metadata_headers,
 	.jump = terminal_headers,
 };
 
 struct net_flow_hdr_node null_header = {.name = "", .uid = 0,};
 
-struct net_flow_hdr_node *rocker_header_nodes[7] = {
+struct net_flow_hdr_node *rocker_header_nodes[] = {
 	&ethernet_header_node,
 	&vlan_header_node,
 	&ipv4_header_node,
 	&in_lport_header_node,
-	&goto_table_header_node,
-	&group_id_header_node,
+	&l3_unicast_group_id_header_node,
+	&l2_rewrite_group_id_header_node,
+	&l2_group_id_header_node,
 	&null_header,
 };
 
@@ -513,14 +531,46 @@ struct net_flow_field_ref matches_acl[8] = {
 	{ .instance = 0, .field = 0},
 };
 
-int actions_ig_port[2] = {ACTION_SET_GOTO_TABLE, 0};
-int actions_vlan[3] = {ACTION_SET_GOTO_TABLE, ACTION_SET_VLAN_ID, 0};
-int actions_term_mac[3] = {ACTION_SET_GOTO_TABLE, ACTION_COPY_TO_CPU, 0};
-int actions_ucast_routing[3] = {ACTION_SET_GOTO_TABLE, ACTION_SET_GROUP_ID, 0};
-int actions_bridge[4] = {ACTION_SET_GOTO_TABLE,
-			 ACTION_SET_GROUP_ID,
-			 ACTION_COPY_TO_CPU, 0};
-int actions_acl[2] = {ACTION_SET_GROUP_ID, 0};
+struct net_flow_field_ref matches_l3_unicast_group_slice[2] = {
+	{ .instance = HEADER_INSTANCE_L3_UNICAST_GROUP_ID,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_L3_UNICAST_GROUP_ID,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_l2_rewrite_group_slice[2] = {
+	{ .instance = HEADER_INSTANCE_L2_REWRITE_GROUP_ID,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_L2_REWRITE_GROUP_ID,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
+struct net_flow_field_ref matches_l2_group_slice[2] = {
+	{ .instance = HEADER_INSTANCE_L2_GROUP_ID,
+	  .header = HEADER_METADATA,
+	  .field = HEADER_METADATA_L2_GROUP_ID,
+	  .mask_type = NET_FLOW_MASK_TYPE_EXACT},
+	{ .instance = 0, .field = 0},
+};
+
+int actions_ig_port[] = {0};
+int actions_vlan[] = {ACTION_SET_VLAN_ID, 0};
+int actions_term_mac[] = {ACTION_COPY_TO_CPU, 0};
+int actions_ucast_routing[] = {ACTION_SET_L3_UNICAST_GROUP_ID, 0};
+int actions_bridge[] = {ACTION_SET_L2_GROUP_ID, ACTION_COPY_TO_CPU, 0};
+int actions_acl[] = {ACTION_SET_L3_UNICAST_GROUP_ID, 0};
+int actions_group_slice_l3_unicast[] = {ACTION_SET_ETH_SRC,
+					ACTION_SET_ETH_DST,
+					ACTION_SET_VLAN_ID,
+					ACTION_SET_L2_REWRITE_GROUP_ID,
+					ACTION_CHECK_TTL_DROP, 0};
+int actions_group_slice_l2_rewrite[] = {ACTION_SET_ETH_SRC,
+					ACTION_SET_ETH_DST,
+					ACTION_SET_VLAN_ID,
+					ACTION_SET_L2_GROUP_ID, 0};
+int actions_group_slice_l2[] = {ACTION_POP_VLAN, 0};
 
 enum rocker_flow_table_id_space {
 	ROCKER_FLOW_TABLE_ID_INGRESS_PORT = 1,
@@ -530,6 +580,9 @@ enum rocker_flow_table_id_space {
 	ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
 	ROCKER_FLOW_TABLE_ID_BRIDGING,
 	ROCKER_FLOW_TABLE_ID_ACL_POLICY,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
 	ROCKER_FLOW_TABLE_NULL = 0,
 };
 
@@ -587,6 +640,33 @@ struct net_flow_table acl_table = {
 	.actions = actions_acl,
 };
 
+struct net_flow_table group_slice_l3_unicast_table = {
+	.name = "group_slice_l3_unicast",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	.source = 1,
+	.size = -1,
+	.matches = matches_l3_unicast_group_slice,
+	.actions = actions_group_slice_l3_unicast,
+};
+
+struct net_flow_table group_slice_l2_rewrite_table = {
+	.name = "group_slice_l2_rewrite",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	.source = 1,
+	.size = -1,
+	.matches = matches_l2_rewrite_group_slice,
+	.actions = actions_group_slice_l2_rewrite,
+};
+
+struct net_flow_table group_slice_l2_table = {
+	.name = "group_slice_l2",
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
+	.source = 1,
+	.size = -1,
+	.matches = matches_l2_group_slice,
+	.actions = actions_group_slice_l2,
+};
+
 struct net_flow_table null_table = {
 	.name = "",
 	.uid = ROCKER_FLOW_TABLE_NULL,
@@ -596,13 +676,16 @@ struct net_flow_table null_table = {
 	.actions = NULL,
 };
 
-struct net_flow_table *rocker_table_list[7] = {
+struct net_flow_table *rocker_table_list[10] = {
 	&ingress_port_table,
 	&vlan_table,
 	&term_mac_table,
 	&ucast_routing_table,
 	&bridge_table,
 	&acl_table,
+	&group_slice_l3_unicast_table,
+	&group_slice_l2_rewrite_table,
+	&group_slice_l2_table,
 	&null_table,
 };
 
@@ -652,7 +735,8 @@ struct net_flow_tbl_node table_node_ucast_routing = {
 	.uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
 	.jump = table_node_ucast_routing_next};
 
-struct net_flow_jump_table table_node_acl_next[1] = {
+struct net_flow_jump_table table_node_acl_next[2] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST},
 	{ .field = {0}, .node = 0},
 };
 
@@ -660,15 +744,42 @@ struct net_flow_tbl_node table_node_acl = {
 	.uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
 	.jump = table_node_acl_next};
 
+struct net_flow_jump_table table_node_group_l3_unicast_next[1] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE},
+};
+
+struct net_flow_tbl_node table_node_group_l3_unicast = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST,
+	.jump = table_node_group_l3_unicast_next};
+
+struct net_flow_jump_table table_node_group_l2_rewrite_next[1] = {
+	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2},
+};
+
+struct net_flow_tbl_node table_node_group_l2_rewrite = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE,
+	.jump = table_node_group_l2_rewrite_next};
+
+struct net_flow_jump_table table_node_group_l2_next[1] = {
+	{ .field = {0}, .node = 0},
+};
+
+struct net_flow_tbl_node table_node_group_l2 = {
+	.uid = ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2,
+	.jump = table_node_group_l2_next};
+
 struct net_flow_tbl_node table_node_nil = {.uid = 0, .jump = NULL};
 
-struct net_flow_tbl_node *rocker_table_nodes[7] = {
+struct net_flow_tbl_node *rocker_table_nodes[10] = {
 	&table_node_ingress_port,
 	&table_node_vlan,
 	&table_node_term_mac,
 	&table_node_ucast_routing,
 	&table_node_bridge,
 	&table_node_acl,
+	&table_node_group_l3_unicast,
+	&table_node_group_l2_rewrite,
+	&table_node_group_l2,
 	&table_node_nil,
 };
 #endif /*_MY_PIPELINE_H*/

* [net-next PATCH v1 07/11] net: rocker: add multicast path to bridging
@ 2014-12-31 19:48 ` John Fastabend
From: John Fastabend @ 2014-12-31 19:48 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Add path in table graph to send packets to the bridge table.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker_pipeline.h |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
index 7e689c0..0835bcc 100644
--- a/drivers/net/ethernet/rocker/rocker_pipeline.h
+++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
@@ -708,7 +708,15 @@ struct net_flow_tbl_node table_node_vlan = {
 	.uid = ROCKER_FLOW_TABLE_ID_VLAN,
 	.jump = table_node_vlan_next};
 
-struct net_flow_jump_table table_node_term_mac_next[2] = {
+struct net_flow_jump_table table_node_term_mac_next[3] = {
+	{ .field = {.instance = HEADER_INSTANCE_ETHERNET,
+		    .header = HEADER_ETHERNET,
+		    .field = HEADER_ETHERNET_DST_MAC,
+		    .mask_type = NET_FLOW_MASK_TYPE_LPM,
+		    .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U64,
+		    .value_u64 = (__u64)0x1,
+		    .mask_u64 = (__u64)0x1,
+	}, .node = ROCKER_FLOW_TABLE_ID_BRIDGING},
 	{ .field = {0}, .node = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING},
 	{ .field = {0}, .node = 0},
 };


* [net-next PATCH v1 08/11] net: rocker: add get flow API operation
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (6 preceding siblings ...)
  2014-12-31 19:48 ` [net-next PATCH v1 07/11] net: rocker: add multicast path to bridging John Fastabend
@ 2014-12-31 19:48 ` John Fastabend
       [not found]   ` <CAKoUArm4z_i6Su9Q4ODB1QYR_Z098MjT2yN=WR7LbN387AvPsg@mail.gmail.com>
  2015-01-06  7:40   ` Scott Feldman
  2014-12-31 19:49 ` [net-next PATCH v1 09/11] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:48 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Add operations to get flows. I wouldn't mind cleaning this code
up a bit; my first attempt used macros, which shortened the code,
but when I was done I decided it just made the code unreadable
and unmaintainable.

I might think about it a bit more, but this implementation, albeit
a bit long and repetitive, is easier to understand IMO.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |  819 ++++++++++++++++++++++++++++++++++
 1 file changed, 819 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 8ce9933..997beb9 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3884,6 +3884,12 @@ static u32 rocker_goto_value(u32 id)
 		return ROCKER_OF_DPA_TABLE_ID_BRIDGING;
 	case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
 		return ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST:
+		return ROCKER_OF_DPA_GROUP_TYPE_L3_UCAST;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE:
+		return ROCKER_OF_DPA_GROUP_TYPE_L2_REWRITE;
+	case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2:
+		return ROCKER_OF_DPA_GROUP_TYPE_L2_INTERFACE;
 	default:
 		return 0;
 	}
@@ -4492,6 +4498,818 @@ static int rocker_del_flows(struct net_device *dev,
 {
 	return -EOPNOTSUPP;
 }
+
+static int rocker_ig_port_to_flow(struct rocker_flow_tbl_key *key,
+				  struct net_flow_flow *flow)
+{
+	flow->matches = kcalloc(2, sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	flow->matches[0].instance = HEADER_INSTANCE_IN_LPORT;
+	flow->matches[0].header = HEADER_METADATA;
+	flow->matches[0].field = HEADER_METADATA_IN_LPORT;
+	flow->matches[0].mask_type = NET_FLOW_MASK_TYPE_LPM;
+	flow->matches[0].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+	flow->matches[0].value_u32 = key->ig_port.in_lport;
+	flow->matches[0].mask_u32 = key->ig_port.in_lport_mask;
+	memset(&flow->matches[1], 0, sizeof(flow->matches[1]));
+	return 0;
+}
+
+static int rocker_vlan_to_flow(struct rocker_flow_tbl_key *key,
+			       struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	if (key->vlan.in_lport)
+		cnt++;
+	if (key->vlan.vlan_id)
+		cnt++;
+
+	flow->matches = kcalloc((cnt + 1),
+				sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	cnt = 0;
+	if (key->vlan.in_lport) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IN_LPORT;
+		flow->matches[cnt].header = HEADER_METADATA;
+		flow->matches[cnt].field = HEADER_METADATA_IN_LPORT;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+		flow->matches[cnt].value_u32 = key->vlan.in_lport;
+		cnt++;
+	}
+
+	if (key->vlan.vlan_id) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_VLAN_OUTER;
+		flow->matches[cnt].header = HEADER_VLAN;
+		flow->matches[cnt].field = HEADER_VLAN_VID;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->vlan.vlan_id);
+		flow->matches[cnt].mask_u16 = ntohs(key->vlan.vlan_id_mask);
+		cnt++;
+	}
+	memset(&flow->matches[cnt], 0, sizeof(flow->matches[cnt]));
+
+	flow->actions = kcalloc(2,
+				sizeof(struct net_flow_action),
+				GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].args = kcalloc(2, sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+	if (!flow->actions[0].args) {
+		kfree(flow->matches);
+		kfree(flow->actions);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].uid = ACTION_SET_VLAN_ID;
+	flow->actions[0].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U16;
+	flow->actions[0].args[0].value_u16 = ntohs(key->vlan.new_vlan_id);
+
+	memset(&flow->actions[1], 0, sizeof(flow->actions[1]));
+	memset(&flow->actions[0].args[1], 0,
+	       sizeof(struct net_flow_action_arg));
+
+	return 0;
+}
+
+static int rocker_term_to_flow(struct rocker_flow_tbl_key *key,
+			       struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	if (key->term_mac.in_lport)
+		cnt++;
+	if (key->term_mac.eth_type)
+		cnt++;
+	if (key->term_mac.eth_dst)
+		cnt++;
+	if (key->term_mac.vlan_id)
+		cnt++;
+
+	flow->matches = kcalloc((cnt + 1), sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	cnt = 0;
+	if (key->term_mac.in_lport) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IN_LPORT;
+		flow->matches[cnt].header = HEADER_METADATA;
+		flow->matches[cnt].field = HEADER_METADATA_IN_LPORT;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+		flow->matches[cnt].value_u32 = key->term_mac.in_lport;
+		flow->matches[cnt].mask_u32 = key->term_mac.in_lport_mask;
+		cnt++;
+	}
+
+	if (key->term_mac.eth_type) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_ETHERTYPE;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->term_mac.eth_type);
+		cnt++;
+	}
+
+	if (key->term_mac.eth_dst) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_DST_MAC;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U64;
+		memcpy(&flow->matches[cnt].value_u64,
+		       key->term_mac.eth_dst, ETH_ALEN);
+		memcpy(&flow->matches[cnt].mask_u64,
+		       key->term_mac.eth_dst_mask, ETH_ALEN);
+		cnt++;
+	}
+
+	if (key->term_mac.vlan_id) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_VLAN_OUTER;
+		flow->matches[cnt].header = HEADER_VLAN;
+		flow->matches[cnt].field = HEADER_VLAN_VID;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->term_mac.vlan_id);
+		flow->matches[cnt].mask_u16 = ntohs(key->term_mac.vlan_id_mask);
+		cnt++;
+	}
+
+	memset(&flow->matches[cnt], 0, sizeof(flow->matches[cnt]));
+
+	flow->actions = kmalloc(2 * sizeof(struct net_flow_action), GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].args = NULL;
+	flow->actions[0].uid = ACTION_COPY_TO_CPU;
+	memset(&flow->actions[1], 0, sizeof(flow->actions[1]));
+
+	return 0;
+}
+
+static int rocker_ucast_to_flow(struct rocker_flow_tbl_key *key,
+				struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	if (key->ucast_routing.eth_type)
+		cnt++;
+	if (key->ucast_routing.dst4)
+		cnt++;
+
+	flow->matches = kcalloc((cnt + 1), sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	cnt = 0;
+
+	if (key->ucast_routing.eth_type) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_ETHERTYPE;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 =
+				ntohs(key->ucast_routing.eth_type);
+		cnt++;
+	}
+
+	if (key->ucast_routing.dst4) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IPV4;
+		flow->matches[cnt].header = HEADER_IPV4;
+		flow->matches[cnt].field = HEADER_IPV4_DST_IP;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+		flow->matches[cnt].value_u32 = key->ucast_routing.dst4;
+		flow->matches[cnt].mask_u32 = key->ucast_routing.dst4_mask;
+		cnt++;
+	}
+
+	memset(&flow->matches[cnt], 0, sizeof(flow->matches[cnt]));
+
+	flow->actions = kmalloc(2 * sizeof(struct net_flow_action), GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].args = kcalloc(2, sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+	if (!flow->actions[0].args) {
+		kfree(flow->matches);
+		kfree(flow->actions);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].uid = ACTION_SET_L3_UNICAST_GROUP_ID;
+	flow->actions[0].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U32;
+	flow->actions[0].args[0].value_u32 = key->ucast_routing.group_id;
+
+	memset(&flow->actions[1], 0, sizeof(flow->actions[1]));
+	memset(&flow->actions[0].args[1], 0,
+	       sizeof(struct net_flow_action_arg));
+
+	return 0;
+}
+
+static int rocker_bridge_to_flow(struct rocker_flow_tbl_key *key,
+				 struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	if (key->bridge.eth_dst)
+		cnt++;
+	if (key->bridge.vlan_id)
+		cnt++;
+
+	flow->matches = kcalloc((cnt + 1), sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	cnt = 0;
+
+	if (key->bridge.eth_dst) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_DST_MAC;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U64;
+		memcpy(&flow->matches[cnt].value_u64,
+		       key->bridge.eth_dst, ETH_ALEN);
+		memcpy(&flow->matches[cnt].mask_u64,
+		       key->bridge.eth_dst_mask, ETH_ALEN);
+		cnt++;
+	}
+
+	if (key->bridge.vlan_id) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_VLAN_OUTER;
+		flow->matches[cnt].header = HEADER_VLAN;
+		flow->matches[cnt].field = HEADER_VLAN_VID;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->bridge.vlan_id);
+		cnt++;
+	}
+
+	memset(&flow->matches[cnt], 0, sizeof(flow->matches[cnt]));
+
+	cnt = 0;
+	if (key->bridge.group_id)
+		cnt++;
+	if (key->bridge.copy_to_cpu)
+		cnt++;
+
+	flow->actions = kcalloc((cnt + 1), sizeof(struct net_flow_action),
+				GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	cnt = 0;
+	if (key->bridge.group_id) {
+		flow->actions[cnt].args =
+				kcalloc(2,
+					sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+		if (!flow->actions[cnt].args) {
+			kfree(flow->matches);
+			kfree(flow->actions);
+			return -ENOMEM;
+		}
+
+		flow->actions[cnt].uid = ACTION_SET_L3_UNICAST_GROUP_ID;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U32;
+		flow->actions[cnt].args[0].value_u32 = key->bridge.group_id;
+		cnt++;
+	}
+
+	if (key->bridge.copy_to_cpu) {
+		flow->actions[cnt].uid = ACTION_COPY_TO_CPU;
+		flow->actions[cnt].args = NULL;
+		cnt++;
+	}
+
+	memset(&flow->actions[cnt], 0, sizeof(flow->actions[1]));
+	return 0;
+}
+
+static int rocker_acl_to_flow(struct rocker_flow_tbl_key *key,
+			      struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	if (key->acl.in_lport)
+		cnt++;
+	if (key->acl.eth_src)
+		cnt++;
+	if (key->acl.eth_dst)
+		cnt++;
+	if (key->acl.eth_type)
+		cnt++;
+	if (key->acl.vlan_id)
+		cnt++;
+	if (key->acl.ip_proto)
+		cnt++;
+	if (key->acl.ip_tos)
+		cnt++;
+
+	flow->matches = kcalloc((cnt + 1), sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	cnt = 0;
+
+	if (key->acl.in_lport) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IN_LPORT;
+		flow->matches[cnt].header = HEADER_METADATA;
+		flow->matches[cnt].field = HEADER_METADATA_IN_LPORT;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+		flow->matches[cnt].value_u32 = key->acl.in_lport;
+		flow->matches[cnt].mask_u32 = key->acl.in_lport_mask;
+		cnt++;
+	}
+
+	if (key->acl.eth_src) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_SRC_MAC;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U64;
+		flow->matches[cnt].value_u64 = *key->acl.eth_src;
+		flow->matches[cnt].mask_u64 = *key->acl.eth_src_mask;
+		cnt++;
+	}
+
+	if (key->acl.eth_dst) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_DST_MAC;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U64;
+		memcpy(&flow->matches[cnt].value_u64,
+		       key->acl.eth_dst, ETH_ALEN);
+		memcpy(&flow->matches[cnt].mask_u64,
+		       key->acl.eth_dst_mask, ETH_ALEN);
+		cnt++;
+	}
+
+	if (key->acl.eth_type) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_ETHERNET;
+		flow->matches[cnt].header = HEADER_ETHERNET;
+		flow->matches[cnt].field = HEADER_ETHERNET_ETHERTYPE;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->acl.eth_type);
+		cnt++;
+	}
+
+	if (key->acl.vlan_id) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_VLAN_OUTER;
+		flow->matches[cnt].header = HEADER_VLAN;
+		flow->matches[cnt].field = HEADER_VLAN_VID;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16;
+		flow->matches[cnt].value_u16 = ntohs(key->acl.vlan_id);
+		cnt++;
+	}
+
+	if (key->acl.ip_proto) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IPV4;
+		flow->matches[cnt].header = HEADER_IPV4;
+		flow->matches[cnt].field = HEADER_IPV4_PROTOCOL;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U8;
+		flow->matches[cnt].value_u8 = key->acl.ip_proto;
+		flow->matches[cnt].mask_u8 = key->acl.ip_proto_mask;
+		cnt++;
+	}
+
+	if (key->acl.ip_tos) {
+		flow->matches[cnt].instance = HEADER_INSTANCE_IPV4;
+		flow->matches[cnt].header = HEADER_IPV4;
+		flow->matches[cnt].field = HEADER_IPV4_DSCP;
+		flow->matches[cnt].mask_type = NET_FLOW_MASK_TYPE_LPM;
+		flow->matches[cnt].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U8;
+		flow->matches[cnt].value_u8 = key->acl.ip_tos;
+		flow->matches[cnt].mask_u8 = key->acl.ip_tos_mask;
+		cnt++;
+	}
+
+	memset(&flow->matches[cnt], 0, sizeof(flow->matches[cnt]));
+
+	flow->actions = kcalloc(2,
+				sizeof(struct net_flow_action),
+				GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].args = kcalloc(2,
+					sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+	if (!flow->actions[0].args) {
+		kfree(flow->matches);
+		kfree(flow->actions);
+		return -ENOMEM;
+	}
+
+	flow->actions[0].uid = ACTION_SET_L3_UNICAST_GROUP_ID;
+	flow->actions[0].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U32;
+	flow->actions[0].args[0].value_u32 = key->acl.group_id;
+
+	memset(&flow->actions[0].args[1], 0,
+	       sizeof(struct net_flow_action_arg));
+	memset(&flow->actions[1], 0, sizeof(flow->actions[1]));
+	return 0;
+}
+
+static int rocker_l3_unicast_to_flow(struct rocker_group_tbl_entry *entry,
+				     struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	flow->matches = kcalloc(2, sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	flow->matches[0].instance = HEADER_INSTANCE_L3_UNICAST_GROUP_ID;
+	flow->matches[0].header = HEADER_METADATA;
+	flow->matches[0].field = HEADER_METADATA_L3_UNICAST_GROUP_ID;
+	flow->matches[0].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+	flow->matches[0].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+	flow->matches[0].value_u32 = ~ROCKER_GROUP_TYPE_MASK & entry->group_id;
+
+	memset(&flow->matches[1], 0, sizeof(flow->matches[cnt]));
+
+	if (entry->l3_unicast.eth_src)
+		cnt++;
+	if (entry->l3_unicast.eth_dst)
+		cnt++;
+	if (entry->l3_unicast.vlan_id)
+		cnt++;
+	if (entry->l3_unicast.ttl_check)
+		cnt++;
+	if (entry->l3_unicast.group_id)
+		cnt++;
+
+	flow->actions = kcalloc(cnt, sizeof(struct net_flow_action),
+				GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	cnt = 0;
+
+	if (entry->l3_unicast.eth_src) {
+		flow->actions[cnt].args =
+				kcalloc(2,
+					sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_ETH_SRC;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U64;
+		ether_addr_copy(flow->actions[cnt].args[0].value_u64,
+				entry->l3_unicast.eth_src);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l3_unicast.eth_dst) {
+		flow->actions[cnt].args =
+			kcalloc(2,
+				sizeof(struct net_flow_action_arg),
+				GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_ETH_DST;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U64;
+		ether_addr_copy(&flow->actions[cnt].args[0].value_u64,
+				entry->l3_unicast.eth_dst);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l3_unicast.vlan_id) {
+		flow->actions[cnt].args =
+				kcalloc(2,
+					sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_VLAN_ID;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U16;
+		flow->actions[cnt].args[0].value_u16 =
+					ntohs(entry->l3_unicast.vlan_id);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l3_unicast.ttl_check) {
+		flow->actions[cnt].uid = ACTION_CHECK_TTL_DROP;
+		flow->actions[cnt].args = NULL;
+		cnt++;
+	}
+
+	if (entry->l3_unicast.group_id) {
+		flow->actions[cnt].args =
+				kcalloc(2,
+					sizeof(struct net_flow_action_arg),
+					GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_L2_GROUP_ID;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U32;
+		flow->actions[cnt].args[0].value_u32 =
+						entry->l3_unicast.group_id;
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	memset(&flow->actions[cnt], 0, sizeof(flow->actions[cnt]));
+	return 0;
+unwind_args:
+	kfree(flow->matches);
+	for (cnt--; cnt >= 0; cnt--)
+		kfree(flow->actions[cnt].args);
+	kfree(flow->actions);
+	return -ENOMEM;
+}
+
+static int rocker_l2_rewrite_to_flow(struct rocker_group_tbl_entry *entry,
+				     struct net_flow_flow *flow)
+{
+	int cnt = 0;
+
+	flow->matches = kcalloc(2, sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	flow->matches[0].instance = HEADER_INSTANCE_L2_REWRITE_GROUP_ID;
+	flow->matches[0].header = HEADER_METADATA;
+	flow->matches[0].field = HEADER_METADATA_L2_REWRITE_GROUP_ID;
+	flow->matches[0].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+	flow->matches[0].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+	flow->matches[0].value_u32 = ~ROCKER_GROUP_TYPE_MASK & entry->group_id;
+
+	memset(&flow->matches[1], 0, sizeof(flow->matches[cnt]));
+
+	if (entry->l2_rewrite.eth_src)
+		cnt++;
+	if (entry->l2_rewrite.eth_dst)
+		cnt++;
+	if (entry->l2_rewrite.vlan_id)
+		cnt++;
+	if (entry->l2_rewrite.group_id)
+		cnt++;
+
+	flow->actions = kcalloc(cnt, sizeof(struct net_flow_action),
+				GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	cnt = 0;
+
+	if (entry->l2_rewrite.eth_src) {
+		flow->actions[cnt].args =
+			kmalloc(2 * sizeof(struct net_flow_action_arg),
+				GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_ETH_SRC;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U64;
+		ether_addr_copy(flow->actions[cnt].args[0].value_u64,
+				entry->l2_rewrite.eth_src);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l2_rewrite.eth_dst) {
+		flow->actions[cnt].args =
+			kmalloc(2 * sizeof(struct net_flow_action_arg),
+				GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_ETH_DST;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U64;
+		ether_addr_copy(&flow->actions[cnt].args[0].value_u64,
+				entry->l2_rewrite.eth_dst);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l2_rewrite.vlan_id) {
+		flow->actions[cnt].args =
+			kmalloc(2 * sizeof(struct net_flow_action_arg),
+				GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_VLAN_ID;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U16;
+		flow->actions[cnt].args[0].value_u16 =
+					ntohs(entry->l2_rewrite.vlan_id);
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	if (entry->l2_rewrite.group_id) {
+		flow->actions[cnt].args =
+			kmalloc(2 * sizeof(struct net_flow_action_arg),
+				GFP_KERNEL);
+
+		if (!flow->actions[cnt].args)
+			goto unwind_args;
+
+		flow->actions[cnt].uid = ACTION_SET_L2_GROUP_ID;
+		flow->actions[cnt].args[0].type = NET_FLOW_ACTION_ARG_TYPE_U32;
+		flow->actions[cnt].args[0].value_u32 =
+			entry->l2_rewrite.group_id;
+		memset(&flow->actions[0].args[1], 0,
+		       sizeof(struct net_flow_action_arg));
+		cnt++;
+	}
+
+	memset(&flow->actions[cnt], 0, sizeof(flow->actions[cnt]));
+	return 0;
+unwind_args:
+	kfree(flow->matches);
+	for (cnt--; cnt >= 0; cnt--)
+		kfree(flow->actions[cnt].args);
+	kfree(flow->actions);
+	return -ENOMEM;
+}
+
+static int rocker_l2_interface_to_flow(struct rocker_group_tbl_entry *entry,
+				       struct net_flow_flow *flow)
+{
+	flow->matches = kmalloc(2 * sizeof(struct net_flow_field_ref),
+				GFP_KERNEL);
+	if (!flow->matches)
+		return -ENOMEM;
+
+	flow->matches[0].instance = HEADER_INSTANCE_L2_GROUP_ID;
+	flow->matches[0].header = HEADER_METADATA;
+	flow->matches[0].field = HEADER_METADATA_L2_GROUP_ID;
+	flow->matches[0].mask_type = NET_FLOW_MASK_TYPE_EXACT;
+	flow->matches[0].type = NET_FLOW_FIELD_REF_ATTR_TYPE_U32;
+	flow->matches[0].value_u32 = ~ROCKER_GROUP_TYPE_MASK & entry->group_id;
+
+	memset(&flow->matches[1], 0, sizeof(flow->matches[1]));
+
+	if (!entry->l2_interface.pop_vlan) {
+		flow->actions = NULL;
+		return 0;
+	}
+
+	flow->actions = kmalloc(2 * sizeof(struct net_flow_action), GFP_KERNEL);
+	if (!flow->actions) {
+		kfree(flow->matches);
+		return -ENOMEM;
+	}
+
+	if (entry->l2_interface.pop_vlan) {
+		flow->actions[0].uid = ACTION_POP_VLAN;
+		flow->actions[0].args = NULL;
+	}
+
+	memset(&flow->actions[1], 0, sizeof(flow->actions[1]));
+	return 0;
+}
+
+static int rocker_get_flows(struct sk_buff *skb, struct net_device *dev,
+			    int table, int min, int max)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct net_flow_flow flow;
+	struct rocker_flow_tbl_entry *entry;
+	struct rocker_group_tbl_entry *group;
+	struct hlist_node *tmp;
+	unsigned long flags;
+	int bkt, err;
+
+	spin_lock_irqsave(&rocker_port->rocker->flow_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->flow_tbl,
+			   bkt, tmp, entry, entry) {
+		struct rocker_flow_tbl_key *key = &entry->key;
+
+		if (rocker_goto_value(table) != key->tbl_id)
+			continue;
+
+		flow.table_id = table;
+		flow.uid = entry->cookie;
+		flow.priority = key->priority;
+
+		switch (table) {
+		case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
+			err = rocker_ig_port_to_flow(key, &flow);
+			if (err)
+				return err;
+			break;
+		case ROCKER_FLOW_TABLE_ID_VLAN:
+			err = rocker_vlan_to_flow(key, &flow);
+			if (err)
+				return err;
+			break;
+		case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
+			err = rocker_term_to_flow(key, &flow);
+			break;
+		case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
+			err = rocker_ucast_to_flow(key, &flow);
+			break;
+		case ROCKER_FLOW_TABLE_ID_BRIDGING:
+			err = rocker_bridge_to_flow(key, &flow);
+			break;
+		case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
+			err = rocker_acl_to_flow(key, &flow);
+			break;
+		default:
+			continue;
+		}
+
+		net_flow_put_flow(skb, &flow);
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->flow_tbl_lock, flags);
+
+	spin_lock_irqsave(&rocker_port->rocker->group_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->group_tbl,
+			   bkt, tmp, group, entry) {
+		if (rocker_goto_value(table) !=
+			ROCKER_GROUP_TYPE_GET(group->group_id))
+			continue;
+
+		flow.table_id = table;
+		flow.uid = group->group_id;
+		flow.priority = 1;
+
+		switch (table) {
+		case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L3_UNICAST:
+			err = rocker_l3_unicast_to_flow(group, &flow);
+			break;
+		case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2_REWRITE:
+			err = rocker_l2_rewrite_to_flow(group, &flow);
+			break;
+		case ROCKER_FLOW_TABLE_ID_GROUP_SLICE_L2:
+			err = rocker_l2_interface_to_flow(group, &flow);
+			break;
+		default:
+			continue;
+		}
+
+		net_flow_put_flow(skb, &flow);
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->group_tbl_lock, flags);
+
+	return 0;
+}
 #endif
 
 static const struct net_device_ops rocker_port_netdev_ops = {
@@ -4517,6 +5335,7 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 
 	.ndo_flow_set_flows		= rocker_set_flows,
 	.ndo_flow_del_flows		= rocker_del_flows,
+	.ndo_flow_get_flows		= rocker_get_flows,
 #endif
 };
 

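The conversion helpers in this patch all follow the same repetitive shape the commit message mentions: count the optional key fields, allocate a zero-terminated array with kcalloc, fill it in, and unwind allocations on error. A simplified, stand-alone sketch of that pattern, using hypothetical types rather than the rocker structures:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified match entry: field == 0 terminates the array,
 * mirroring the zeroed sentinel element the rocker helpers append. */
struct match {
	uint32_t field;
	uint32_t value;
};

/* Count the optional key fields, allocate cnt + 1 zeroed entries (the
 * extra one is the terminator), then fill in only the fields present. */
static struct match *key_to_matches(uint32_t in_port, uint32_t vlan_id)
{
	int cnt = !!in_port + !!vlan_id;
	struct match *m = calloc(cnt + 1, sizeof(*m));

	if (!m)
		return NULL;

	cnt = 0;
	if (in_port) {
		m[cnt].field = 1;	/* e.g. "ingress port" field id */
		m[cnt].value = in_port;
		cnt++;
	}
	if (vlan_id) {
		m[cnt].field = 2;	/* e.g. "vlan id" field id */
		m[cnt].value = vlan_id;
		cnt++;
	}
	/* m[cnt] is already zeroed by calloc, acting as the terminator. */
	return m;
}
```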

* [net-next PATCH v1 09/11] net: rocker: add cookie to group acls and use flow_id to set cookie
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (7 preceding siblings ...)
  2014-12-31 19:48 ` [net-next PATCH v1 08/11] net: rocker: add get flow API operation John Fastabend
@ 2014-12-31 19:49 ` John Fastabend
  2014-12-31 19:50 ` [net-next PATCH v1 10/11] net: rocker: have flow api calls set cookie value John Fastabend
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:49 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Rocker uses a cookie value to identify flows; however, the flow API
already has a unique id for each flow. To help with the translation,
add support to set the cookie value through the internal rocker
flow API, and then use the unique id in the cases where it is
available.

This patch extends the internal code paths to support the new
cookie value.

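The cookie handling described above reduces to a small rule: a non-zero flow API uid becomes the hardware cookie, otherwise the driver auto-assigns the next value. A stand-alone sketch under that assumption (names are illustrative, not the driver's):

```c
#include <stdint.h>

/* Next auto-assigned cookie; mirrors rocker's flow_tbl_next_cookie. */
static uint64_t flow_tbl_next_cookie = 1;

/* A caller-supplied flow_id (the flow API uid) overrides auto-assignment;
 * 0 means "let the driver pick", matching the "if (!found->cookie)"
 * fallback this patch adds to the flow-table add path. */
static uint64_t assign_cookie(uint64_t flow_id)
{
	return flow_id ? flow_id : flow_tbl_next_cookie++;
}
```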
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   64 ++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 997beb9..4d2d292 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -120,6 +120,7 @@ struct rocker_flow_tbl_entry {
 
 struct rocker_group_tbl_entry {
 	struct hlist_node entry;
+	u64 cookie;
 	u32 cmd;
 	u32 group_id; /* key */
 	u16 group_count;
@@ -2216,7 +2217,8 @@ static int rocker_flow_tbl_add(struct rocker_port *rocker_port,
 		kfree(match);
 	} else {
 		found = match;
-		found->cookie = rocker->flow_tbl_next_cookie++;
+		if (!found->cookie)
+			found->cookie = rocker->flow_tbl_next_cookie++;
 		hash_add(rocker->flow_tbl, &found->entry, found->key_crc32);
 		add_to_hw = true;
 	}
@@ -2294,7 +2296,7 @@ static int rocker_flow_tbl_do(struct rocker_port *rocker_port,
 		return rocker_flow_tbl_add(rocker_port, entry, nowait);
 }
 
-static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
+static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port, u64 flow_id,
 				   int flags, u32 in_lport, u32 in_lport_mask,
 				   enum rocker_of_dpa_table_id goto_tbl)
 {
@@ -2310,11 +2312,14 @@ static int rocker_flow_tbl_ig_port(struct rocker_port *rocker_port,
 	entry->key.ig_port.in_lport_mask = in_lport_mask;
 	entry->key.ig_port.goto_tbl = goto_tbl;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
-				int flags, u32 in_lport,
+				int flags, u64 flow_id, u32 in_lport,
 				__be16 vlan_id, __be16 vlan_id_mask,
 				enum rocker_of_dpa_table_id goto_tbl,
 				bool untagged, __be16 new_vlan_id)
@@ -2335,10 +2340,14 @@ static int rocker_flow_tbl_vlan(struct rocker_port *rocker_port,
 	entry->key.vlan.untagged = untagged;
 	entry->key.vlan.new_vlan_id = new_vlan_id;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
+				    u64 flow_id,
 				    u32 in_lport, u32 in_lport_mask,
 				    __be16 eth_type, const u8 *eth_dst,
 				    const u8 *eth_dst_mask, __be16 vlan_id,
@@ -2371,11 +2380,14 @@ static int rocker_flow_tbl_term_mac(struct rocker_port *rocker_port,
 	entry->key.term_mac.vlan_id_mask = vlan_id_mask;
 	entry->key.term_mac.copy_to_cpu = copy_to_cpu;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
-				  int flags,
+				  int flags, u64 flow_id,
 				  const u8 *eth_dst, const u8 *eth_dst_mask,
 				  __be16 vlan_id, u32 tunnel_id,
 				  enum rocker_of_dpa_table_id goto_tbl,
@@ -2425,11 +2437,14 @@ static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port,
 	entry->key.bridge.group_id = group_id;
 	entry->key.bridge.copy_to_cpu = copy_to_cpu;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
 static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
-			       int flags, u32 in_lport,
+			       int flags, u64 flow_id, u32 in_lport,
 			       u32 in_lport_mask,
 			       const u8 *eth_src, const u8 *eth_src_mask,
 			       const u8 *eth_dst, const u8 *eth_dst_mask,
@@ -2477,6 +2492,9 @@ static int rocker_flow_tbl_acl(struct rocker_port *rocker_port,
 	entry->key.acl.ip_tos_mask = ip_tos_mask;
 	entry->key.acl.group_id = group_id;
 
+	if (flow_id)
+		entry->cookie = flow_id;
+
 	return rocker_flow_tbl_do(rocker_port, flags, entry);
 }
 
@@ -2587,7 +2605,7 @@ static int rocker_group_tbl_do(struct rocker_port *rocker_port,
 }
 
 static int rocker_group_l2_interface(struct rocker_port *rocker_port,
-				     int flags, __be16 vlan_id,
+				     int flags, int flow_id, __be16 vlan_id,
 				     u32 out_lport, int pop_vlan)
 {
 	struct rocker_group_tbl_entry *entry;
@@ -2598,6 +2616,7 @@ static int rocker_group_l2_interface(struct rocker_port *rocker_port,
 
 	entry->group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 	entry->l2_interface.pop_vlan = pop_vlan;
+	entry->cookie = flow_id;
 
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
@@ -2696,7 +2715,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 	if (rocker_port->stp_state == BR_STATE_LEARNING ||
 	    rocker_port->stp_state == BR_STATE_FORWARDING) {
 		out_lport = rocker_port->lport;
-		err = rocker_group_l2_interface(rocker_port, flags,
+		err = rocker_group_l2_interface(rocker_port, flags, 0,
 						vlan_id, out_lport,
 						pop_vlan);
 		if (err) {
@@ -2722,7 +2741,7 @@ static int rocker_port_vlan_l2_groups(struct rocker_port *rocker_port,
 		return 0;
 
 	out_lport = 0;
-	err = rocker_group_l2_interface(rocker_port, flags,
+	err = rocker_group_l2_interface(rocker_port, flags, 0,
 					vlan_id, out_lport,
 					pop_vlan);
 	if (err) {
@@ -2796,7 +2815,7 @@ static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
 	u32 group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 	int err;
 
-	err = rocker_flow_tbl_acl(rocker_port, flags,
+	err = rocker_flow_tbl_acl(rocker_port, flags, 0,
 				  in_lport, in_lport_mask,
 				  eth_src, eth_src_mask,
 				  ctrl->eth_dst, ctrl->eth_dst_mask,
@@ -2825,7 +2844,7 @@ static int rocker_port_ctrl_vlan_bridge(struct rocker_port *rocker_port,
 	if (!rocker_port_is_bridged(rocker_port))
 		return 0;
 
-	err = rocker_flow_tbl_bridge(rocker_port, flags,
+	err = rocker_flow_tbl_bridge(rocker_port, flags, 0,
 				     ctrl->eth_dst, ctrl->eth_dst_mask,
 				     vlan_id, tunnel_id,
 				     goto_tbl, group_id, ctrl->copy_to_cpu);
@@ -2847,7 +2866,7 @@ static int rocker_port_ctrl_vlan_term(struct rocker_port *rocker_port,
 	if (ntohs(vlan_id) == 0)
 		vlan_id = rocker_port->internal_vlan_id;
 
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       ctrl->eth_type, ctrl->eth_dst,
 				       ctrl->eth_dst_mask, vlan_id,
@@ -2961,7 +2980,7 @@ static int rocker_port_vlan(struct rocker_port *rocker_port, int flags,
 		return err;
 	}
 
-	err = rocker_flow_tbl_vlan(rocker_port, flags,
+	err = rocker_flow_tbl_vlan(rocker_port, flags, 0,
 				   in_lport, vlan_id, vlan_id_mask,
 				   goto_tbl, untagged, internal_vlan_id);
 	if (err)
@@ -2986,7 +3005,7 @@ static int rocker_port_ig_tbl(struct rocker_port *rocker_port, int flags)
 	in_lport_mask = 0xffff0000;
 	goto_tbl = ROCKER_OF_DPA_TABLE_ID_VLAN;
 
-	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+	err = rocker_flow_tbl_ig_port(rocker_port, flags, 0,
 				      in_lport, in_lport_mask,
 				      goto_tbl);
 	if (err)
@@ -3036,7 +3055,7 @@ static int rocker_port_fdb_learn(struct rocker_port *rocker_port,
 		group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, out_lport);
 
 	if (!(flags & ROCKER_OP_FLAG_REFRESH)) {
-		err = rocker_flow_tbl_bridge(rocker_port, flags, addr, NULL,
+		err = rocker_flow_tbl_bridge(rocker_port, flags, 0, addr, NULL,
 					     vlan_id, tunnel_id, goto_tbl,
 					     group_id, copy_to_cpu);
 		if (err)
@@ -3171,7 +3190,7 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 		vlan_id = rocker_port->internal_vlan_id;
 
 	eth_type = htons(ETH_P_IP);
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       eth_type, rocker_port->dev->dev_addr,
 				       dst_mac_mask, vlan_id, vlan_id_mask,
@@ -3180,7 +3199,7 @@ static int rocker_port_router_mac(struct rocker_port *rocker_port,
 		return err;
 
 	eth_type = htons(ETH_P_IPV6);
-	err = rocker_flow_tbl_term_mac(rocker_port,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
 				       rocker_port->lport, in_lport_mask,
 				       eth_type, rocker_port->dev->dev_addr,
 				       dst_mac_mask, vlan_id, vlan_id_mask,
@@ -3215,7 +3234,7 @@ static int rocker_port_fwding(struct rocker_port *rocker_port)
 			continue;
 		vlan_id = htons(vid);
 		pop_vlan = rocker_vlan_id_is_internal(vlan_id);
-		err = rocker_group_l2_interface(rocker_port, flags,
+		err = rocker_group_l2_interface(rocker_port, flags, 0,
 						vlan_id, out_lport,
 						pop_vlan);
 		if (err) {
@@ -3919,7 +3938,7 @@ static int rocker_flow_set_ig_port(struct net_device *dev,
 	in_lport_mask = flow->matches[0].mask_u32;
 	goto_tbl = rocker_goto_value(flow->actions[0].args[0].value_u16);
 
-	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+	err = rocker_flow_tbl_ig_port(rocker_port, flags, 0,
 				      in_lport, in_lport_mask,
 				      goto_tbl);
 	return err;
@@ -3981,7 +4000,7 @@ static int rocker_flow_set_vlan(struct net_device *dev,
 	if (!have_in_lport)
 		return -EINVAL;
 
-	err = rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
+	err = rocker_flow_tbl_vlan(rocker_port, flags, 0, in_lport,
 				   vlan_id, vlan_id_mask, goto_tbl,
 				   untagged, new_vlan_id);
 	return err;
@@ -4063,7 +4082,8 @@ static int rocker_flow_set_term_mac(struct net_device *dev,
 		}
 	}
 
-	err = rocker_flow_tbl_term_mac(rocker_port, in_lport, in_lport_mask,
+	err = rocker_flow_tbl_term_mac(rocker_port, 0,
+				       in_lport, in_lport_mask,
 				       ethtype, eth_dst, eth_dst_mask,
 				       vlan_id, vlan_id_mask,
 				       copy_to_cpu, flags);
@@ -4162,7 +4182,7 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 	}
 
 	/* Ignoring eth_dst_mask it seems to cause a EINVAL return code */
-	err = rocker_flow_tbl_bridge(rocker_port, flags,
+	err = rocker_flow_tbl_bridge(rocker_port, flags, 0,
 				     eth_dst, eth_dst_mask,
 				     vlan_id, tunnel_id,
 				     goto_tbl, group_id, copy_to_cpu);
@@ -4269,7 +4289,7 @@ static int rocker_flow_set_acl(struct net_device *dev,
 		}
 	}
 
-	err = rocker_flow_tbl_acl(rocker_port, flags,
+	err = rocker_flow_tbl_acl(rocker_port, flags, 0,
 				  in_lport, in_lport_mask,
 				  eth_src, eth_src_mask,
 				  eth_dst, eth_dst_mask, ethtype,

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [net-next PATCH v1 10/11] net: rocker: have flow api calls set cookie value
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (8 preceding siblings ...)
  2014-12-31 19:49 ` [net-next PATCH v1 09/11] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
@ 2014-12-31 19:50 ` John Fastabend
  2014-12-31 19:50 ` [net-next PATCH v1 11/11] net: rocker: implement delete flow routine John Fastabend
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:50 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4d2d292..4ca95da 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3938,7 +3938,8 @@ static int rocker_flow_set_ig_port(struct net_device *dev,
 	in_lport_mask = flow->matches[0].mask_u32;
 	goto_tbl = rocker_goto_value(flow->actions[0].args[0].value_u16);
 
-	err = rocker_flow_tbl_ig_port(rocker_port, flags, 0,
+	err = rocker_flow_tbl_ig_port(rocker_port, flags,
+				      flow->uid, 
 				      in_lport, in_lport_mask,
 				      goto_tbl);
 	return err;
@@ -4000,7 +4001,7 @@ static int rocker_flow_set_vlan(struct net_device *dev,
 	if (!have_in_lport)
 		return -EINVAL;
 
-	err = rocker_flow_tbl_vlan(rocker_port, flags, 0, in_lport,
+	err = rocker_flow_tbl_vlan(rocker_port, flags, flow->uid, in_lport,
 				   vlan_id, vlan_id_mask, goto_tbl,
 				   untagged, new_vlan_id);
 	return err;
@@ -4082,7 +4083,7 @@ static int rocker_flow_set_term_mac(struct net_device *dev,
 		}
 	}
 
-	err = rocker_flow_tbl_term_mac(rocker_port, 0,
+	err = rocker_flow_tbl_term_mac(rocker_port, flow->uid,
 				       in_lport, in_lport_mask,
 				       ethtype, eth_dst, eth_dst_mask,
 				       vlan_id, vlan_id_mask,
@@ -4182,7 +4183,7 @@ static int rocker_flow_set_bridge(struct net_device *dev,
 	}
 
 	/* Ignoring eth_dst_mask it seems to cause a EINVAL return code */
-	err = rocker_flow_tbl_bridge(rocker_port, flags, 0,
+	err = rocker_flow_tbl_bridge(rocker_port, flags, flow->uid,
 				     eth_dst, eth_dst_mask,
 				     vlan_id, tunnel_id,
 				     goto_tbl, group_id, copy_to_cpu);
@@ -4289,7 +4290,7 @@ static int rocker_flow_set_acl(struct net_device *dev,
 		}
 	}
 
-	err = rocker_flow_tbl_acl(rocker_port, flags, 0,
+	err = rocker_flow_tbl_acl(rocker_port, flags, flow->uid,
 				  in_lport, in_lport_mask,
 				  eth_src, eth_src_mask,
 				  eth_dst, eth_dst_mask, ethtype,
@@ -4354,6 +4355,8 @@ static int rocker_flow_set_group_slice_l3_unicast(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = flow->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 
@@ -4409,6 +4412,8 @@ static int rocker_flow_set_group_slice_l2_rewrite(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = flow->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 
@@ -4464,6 +4469,8 @@ static int rocker_flow_set_group_slice_l2(struct net_device *dev,
 		}
 	}
 
+	entry->cookie = flow->uid;
+
 	return rocker_group_tbl_do(rocker_port, flags, entry);
 }
 
@@ -5307,7 +5314,7 @@ static int rocker_get_flows(struct sk_buff *skb, struct net_device *dev,
 			continue;
 
 		flow.table_id = table;
-		flow.uid = group->group_id;
+		flow.uid = group->cookie;
 		flow.priority = 1;
 
 		switch (table) {


* [net-next PATCH v1 11/11] net: rocker: implement delete flow routine
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (9 preceding siblings ...)
  2014-12-31 19:50 ` [net-next PATCH v1 10/11] net: rocker: have flow api calls set cookie value John Fastabend
@ 2014-12-31 19:50 ` John Fastabend
  2015-01-04  8:30 ` [net-next PATCH v1 00/11] A flow API Or Gerlitz
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 19:50 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |   39 +++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 4ca95da..fb1e3eb 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4523,7 +4523,44 @@ static int rocker_set_flows(struct net_device *dev,
 static int rocker_del_flows(struct net_device *dev,
 			    struct net_flow_flow *flow)
 {
-	return -EOPNOTSUPP;
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	struct rocker_flow_tbl_entry *entry;
+	struct rocker_group_tbl_entry *group;
+	struct hlist_node *tmp;
+	int bkt, err = -EEXIST;
+	unsigned long flags;
+
+	spin_lock_irqsave(&rocker_port->rocker->flow_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->flow_tbl,
+			   bkt, tmp, entry, entry) {
+		if (rocker_goto_value(flow->table_id) != entry->key.tbl_id ||
+		    flow->uid != entry->cookie)
+			continue;
+
+		hash_del(&entry->entry);
+		err = 0;
+		break;
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->flow_tbl_lock, flags);
+
+	if (!err)
+		return err;
+
+	spin_lock_irqsave(&rocker_port->rocker->group_tbl_lock, flags);
+	hash_for_each_safe(rocker_port->rocker->group_tbl,
+			   bkt, tmp, group, entry) {
+		if (rocker_goto_value(flow->table_id) !=
+			ROCKER_GROUP_TYPE_GET(group->group_id) ||
+		    flow->uid != group->cookie)
+			continue;
+
+		hash_del(&group->entry);
+		err = 0;
+		break;
+	}
+	spin_unlock_irqrestore(&rocker_port->rocker->group_tbl_lock, flags);
+
+	return err;
 }
 
 static int rocker_ig_port_to_flow(struct rocker_flow_tbl_key *key,


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
@ 2014-12-31 20:10   ` John Fastabend
  2015-01-04 11:12   ` Thomas Graf
  2015-01-06  5:25   ` Scott Feldman
  2 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2014-12-31 20:10 UTC (permalink / raw)
  To: tgraf, sfeldma, jiri, jhs, simon.horman; +Cc: netdev, davem, andy

On 12/31/2014 11:45 AM, John Fastabend wrote:
> Currently, we do not have an interface to query hardware and learn
> the capabilities of the device. This makes it very difficult to use
> hardware flow tables.
>

Oops, I missed a few dev_put calls, so this will need a new rev at
least. I'll wait a few days for feedback though.

[...]

> +
> +static int net_flow_cmd_get_actions(struct sk_buff *skb,
> +				    struct genl_info *info)
> +{
> +	struct net_flow_action **a;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +
> +	dev = net_flow_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	if (!dev->netdev_ops->ndo_flow_get_actions) {
> +		dev_put(dev);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	a = dev->netdev_ops->ndo_flow_get_actions(dev);
> +	if (!a)

missing dev_put(dev) here.

> +		return -EBUSY;
> +
> +	msg = net_flow_build_actions_msg(a, dev,
> +					 info->snd_portid,
> +					 info->snd_seq,
> +					 NET_FLOW_TABLE_CMD_GET_ACTIONS);
> +	dev_put(dev);
> +
> +	if (IS_ERR(msg))
> +		return PTR_ERR(msg);
> +
> +	return genlmsg_reply(msg, info);
> +}
> +
> +static int net_flow_put_table(struct net_device *dev,
> +			      struct sk_buff *skb,
> +			      struct net_flow_table *t)
> +{
> +	struct nlattr *matches, *actions;
> +	int i;
> +
> +	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
> +		return -EMSGSIZE;
> +
> +	matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
> +	if (!matches)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; t->matches[i].instance; i++)
> +		nla_put(skb, NET_FLOW_FIELD_REF,
> +			sizeof(struct net_flow_field_ref),
> +			&t->matches[i]);

need to check the return codes here.

> +	nla_nest_end(skb, matches);
> +
> +	actions = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_ACTIONS);
> +	if (!actions)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; t->actions[i]; i++) {
> +		if (nla_put_u32(skb,
> +				NET_FLOW_ACTION_ATTR_UID,
> +				t->actions[i])) {
> +			nla_nest_cancel(skb, actions);
> +			return -EMSGSIZE;
> +		}

remembered to do the check here though ;)

> +	}
> +	nla_nest_end(skb, actions);
> +
> +	return 0;
> +}
> +

[...]

> +
> +static struct sk_buff *net_flow_build_tables_msg(struct net_flow_table **t,
> +						 struct net_device *dev,
> +						 u32 portid, int seq, u8 cmd)
> +{
> +	struct genlmsghdr *hdr;
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!skb)
> +		return ERR_PTR(-ENOBUFS);
> +
> +	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +	if (!hdr)
> +		goto out;
> +
> +	if (nla_put_u32(skb,
> +			NET_FLOW_IDENTIFIER_TYPE,
> +			NET_FLOW_IDENTIFIER_IFINDEX) ||
> +	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
> +		err = -ENOBUFS;
> +		goto out;
> +	}
> +
> +	err = net_flow_put_tables(dev, skb, t);
> +	if (err < 0)
> +		goto out;
> +
> +	err = genlmsg_end(skb, hdr);
> +	if (err < 0)
> +		goto out;
> +
> +	return skb;
> +out:
> +	nlmsg_free(skb);
> +	return ERR_PTR(err);
> +}
> +
> +static int net_flow_cmd_get_tables(struct sk_buff *skb,
> +				   struct genl_info *info)
> +{
> +	struct net_flow_table **tables;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +
> +	dev = net_flow_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	if (!dev->netdev_ops->ndo_flow_get_tables) {
> +		dev_put(dev);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	tables = dev->netdev_ops->ndo_flow_get_tables(dev);
> +	if (!tables) /* transient failure should always have some table */

need dev_put()

> +		return -EBUSY;
> +
> +	msg = net_flow_build_tables_msg(tables, dev,
> +					info->snd_portid,
> +					info->snd_seq,
> +					NET_FLOW_TABLE_CMD_GET_TABLES);
> +	dev_put(dev);
> +
> +	if (IS_ERR(msg))
> +		return PTR_ERR(msg);
> +
> +	return genlmsg_reply(msg, info);
> +}
> +

[...]

> +
> +static int net_flow_put_headers(struct sk_buff *skb,
> +				struct net_flow_header **headers)
> +{
> +	struct nlattr *nest, *hdr, *fields;
> +	struct net_flow_header *h;
> +	int i, err;
> +
> +	nest = nla_nest_start(skb, NET_FLOW_HEADERS);
> +	if (!nest)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; headers[i]->uid; i++) {
> +		err = -EMSGSIZE;
> +		h = headers[i];
> +
> +		hdr = nla_nest_start(skb, NET_FLOW_HEADER);
> +		if (!hdr)
> +			goto hdr_put_failure;
> +
> +		if (nla_put_string(skb, NET_FLOW_HEADER_ATTR_NAME, h->name) ||
> +		    nla_put_u32(skb, NET_FLOW_HEADER_ATTR_UID, h->uid))
> +			goto attr_put_failure;
> +
> +		fields = nla_nest_start(skb, NET_FLOW_HEADER_ATTR_FIELDS);
> +		if (!fields)
> +			goto attr_put_failure;
> +
> +		err = net_flow_put_fields(skb, h);
> +		if (err)
> +			goto fields_put_failure;
> +
> +		nla_nest_end(skb, fields);
> +

Can remove this newline, I think; it doesn't add much.

> +		nla_nest_end(skb, hdr);
> +	}
> +	nla_nest_end(skb, nest);
> +
> +	return 0;
> +fields_put_failure:
> +	nla_nest_cancel(skb, fields);
> +attr_put_failure:
> +	nla_nest_cancel(skb, hdr);
> +hdr_put_failure:
> +	nla_nest_cancel(skb, nest);
> +	return err;
> +}
> +

[...]

> +
> +static int net_flow_cmd_get_headers(struct sk_buff *skb,
> +				    struct genl_info *info)
> +{
> +	struct net_flow_header **h;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +
> +	dev = net_flow_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	if (!dev->netdev_ops->ndo_flow_get_headers) {
> +		dev_put(dev);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	h = dev->netdev_ops->ndo_flow_get_headers(dev);
> +	if (!h)

dev_put again

> +		return -EBUSY;
> +
> +	msg = net_flow_build_headers_msg(h, dev,
> +					 info->snd_portid,
> +					 info->snd_seq,
> +					 NET_FLOW_TABLE_CMD_GET_HEADERS);
> +	dev_put(dev);
> +
> +	if (IS_ERR(msg))
> +		return PTR_ERR(msg);
> +
> +	return genlmsg_reply(msg, info);
> +}
> +

[...]

> +
> +static int net_flow_cmd_get_header_graph(struct sk_buff *skb,
> +					 struct genl_info *info)
> +{
> +	struct net_flow_hdr_node **h;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +
> +	dev = net_flow_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	if (!dev->netdev_ops->ndo_flow_get_hdr_graph) {
> +		dev_put(dev);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	h = dev->netdev_ops->ndo_flow_get_hdr_graph(dev);
> +	if (!h)

dev_put() again; seems I copy/pasted the same template for each cmd.

> +		return -EBUSY;
> +
> +	msg = net_flow_build_header_graph_msg(h, dev,
> +					      info->snd_portid,
> +					      info->snd_seq,
> +					      NET_FLOW_TABLE_CMD_GET_HDR_GRAPH);
> +	dev_put(dev);
> +
> +	if (IS_ERR(msg))
> +		return PTR_ERR(msg);
> +
> +	return genlmsg_reply(msg, info);
> +}
> +

[...]

> +
> +static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
> +					struct genl_info *info)
> +{
> +	struct net_flow_tbl_node **g;
> +	struct net_device *dev;
> +	struct sk_buff *msg;
> +
> +	dev = net_flow_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	if (!dev->netdev_ops->ndo_flow_get_tbl_graph) {
> +		dev_put(dev);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	g = dev->netdev_ops->ndo_flow_get_tbl_graph(dev);
> +	if (!g)

dev_put

> +		return -EBUSY;
> +

[...]


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 08/11] net: rocker: add get flow API operation
       [not found]   ` <CAKoUArm4z_i6Su9Q4ODB1QYR_Z098MjT2yN=WR7LbN387AvPsg@mail.gmail.com>
@ 2015-01-02 21:15     ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-02 21:15 UTC (permalink / raw)
  To: Rami Rosen
  Cc: simon.horman, Jamal Hadi Salim, Scott Feldman, David Miller,
	Andy Gospodarek, Thomas Graf, Netdev, Jiří Pírko

On 01/02/2015 12:46 PM, Rami Rosen wrote:
> Nice work!
>
>
> On Dec 31, 2014
>
>
>  > +static int rocker_get_flows(struct sk_buff *skb, struct net_device *dev,
>  > +                           int table, int min, int max)
>  > +{
>  > +       struct rocker_port *rocker_port = netdev_priv(dev);
>  > +       struct net_flow_flow flow;
>  > +       struct rocker_flow_tbl_entry *entry;
>  > +       struct rocker_group_tbl_entry *group;
>  > +       struct hlist_node *tmp;
>  > +       unsigned long flags;
>  > +       int bkt, err;
>  > +
>  > +       spin_lock_irqsave(&rocker_port->rocker->flow_tbl_lock, flags);
>  > +       hash_for_each_safe(rocker_port->rocker->flow_tbl,
>  > +                          bkt, tmp, entry, entry) {
>  > +               struct rocker_flow_tbl_key *key = &entry->key;
>  > +
>  > +               if (rocker_goto_value(table) != key->tbl_id)
>  > +                       continue;
>  > +
>  > +               flow.table_id = table;
>  > +               flow.uid = entry->cookie;
>  > +               flow.priority = key->priority;
>  > +
>  > +               switch (table) {
>  > +               case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
>  > +                       err = rocker_ig_port_to_flow(key, &flow);
>  > +                       if (err)
>  > +                               return err;
>  > +                       break;
>  > +               case ROCKER_FLOW_TABLE_ID_VLAN:
>  > +                       err = rocker_vlan_to_flow(key, &flow);
>  > +                       if (err)
>  > +                               return err;
>  > +                       break;
>  > +               case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
>  > +                       err = rocker_term_to_flow(key, &flow);
>
> Shouldn't it be here (and in the following 3 case entries) also:
>

Yes, thanks for catching this. I'll update it in v2, along with the
other fixes for the dev_put misses.

.John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 00/11] A flow API
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (10 preceding siblings ...)
  2014-12-31 19:50 ` [net-next PATCH v1 11/11] net: rocker: implement delete flow routine John Fastabend
@ 2015-01-04  8:30 ` Or Gerlitz
  2015-01-05  5:17   ` John Fastabend
  2015-01-06  2:42 ` Scott Feldman
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Or Gerlitz @ 2015-01-04  8:30 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, sfeldma, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On Wed, Dec 31, 2014 at 9:45 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> So... I could continue to mull over this and tweak bits and pieces
> here and there but I decided its best to get a wider group of folks
> looking at it and hopefulyl with any luck using it so here it is.
[...]
> I could use some help reviewing
[...]

Hi John,

It would be very helpful to get access to the actual patches; I don't
see them on the netdev patchwork queue, and assume that's because this
is still in the RFC stage. Cloning your github tree [1] and looking
there, I see some earlier/WIP versions of the code, but not the
submitted patches.

Or.

[1] https://github.com/jrfastab/flow-net-next.git


* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2014-12-31 19:47 ` [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch John Fastabend
@ 2015-01-04  8:43   ` Or Gerlitz
  2015-01-05  5:18     ` John Fastabend
  2015-01-06  7:01   ` Scott Feldman
  1 sibling, 1 reply; 60+ messages in thread
From: Or Gerlitz @ 2015-01-04  8:43 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, sfeldma, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On Wed, Dec 31, 2014 at 9:47 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> This adds rocker support for the net_flow_get_* operations. With this
> we can interrogate rocker.
>
> Here we see that for static configurations enabling the get operations
> is simply a matter of defining a pipeline model and returning the
> structures for the core infrastructure to encapsulate into netlink
> messages.

[..]

> diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
> new file mode 100644
> index 0000000..9544339
> --- /dev/null
> +++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
> @@ -0,0 +1,673 @@
> +#ifndef _MY_PIPELINE_H_
> +#define _MY_PIPELINE_H_
> +
> +#include <linux/if_flow.h>
> +
> +/* header definition */
> +#define HEADER_ETHERNET_SRC_MAC 1
> +#define HEADER_ETHERNET_DST_MAC 2
> +#define HEADER_ETHERNET_ETHERTYPE 3
> +struct net_flow_field ethernet_fields[3] = {
> +       { .name = "src_mac", .uid = HEADER_ETHERNET_SRC_MAC, .bitwidth = 48},
> +       { .name = "dst_mac", .uid = HEADER_ETHERNET_DST_MAC, .bitwidth = 48},
> +       { .name = "ethertype",
> +         .uid = HEADER_ETHERNET_ETHERTYPE,
> +         .bitwidth = 16},
> +};
> +
> +#define HEADER_ETHERNET 1
> +struct net_flow_header ethernet = {
> +       .name = "ethernet",
> +       .uid = HEADER_ETHERNET,
> +       .field_sz = 3,
> +       .fields = ethernet_fields,
> +};
> +
> +#define HEADER_VLAN_PCP 1
> +#define HEADER_VLAN_CFI 2
> +#define HEADER_VLAN_VID 3
> +#define HEADER_VLAN_ETHERTYPE 4
> +struct net_flow_field vlan_fields[4] = {
> +       { .name = "pcp", .uid = HEADER_VLAN_PCP, .bitwidth = 3,},
> +       { .name = "cfi", .uid = HEADER_VLAN_CFI, .bitwidth = 1,},
> +       { .name = "vid", .uid = HEADER_VLAN_VID, .bitwidth = 12,},
> +       { .name = "ethertype", .uid = HEADER_VLAN_ETHERTYPE, .bitwidth = 16,},
> +};
> +
> +#define HEADER_VLAN 2
> +struct net_flow_header vlan = {
> +       .name = "vlan",
> +       .uid = HEADER_VLAN,
> +       .field_sz = 4,
> +       .fields = vlan_fields,
> +};
> +
> +#define HEADER_IPV4_VERSION 1
> +#define HEADER_IPV4_IHL 2
> +#define HEADER_IPV4_DSCP 3
> +#define HEADER_IPV4_ECN 4
> +#define HEADER_IPV4_LENGTH 5
> +#define HEADER_IPV4_IDENTIFICATION 6
> +#define HEADER_IPV4_FLAGS 7
> +#define HEADER_IPV4_FRAGMENT_OFFSET 8
> +#define HEADER_IPV4_TTL 9
> +#define HEADER_IPV4_PROTOCOL 10
> +#define HEADER_IPV4_CSUM 11
> +#define HEADER_IPV4_SRC_IP 12
> +#define HEADER_IPV4_DST_IP 13
> +#define HEADER_IPV4_OPTIONS 14
> +struct net_flow_field ipv4_fields[14] = {
> +       { .name = "version",
> +         .uid = HEADER_IPV4_VERSION,
> +         .bitwidth = 4,},
> +       { .name = "ihl",
> +         .uid = HEADER_IPV4_IHL,
> +         .bitwidth = 4,},
> +       { .name = "dscp",
> +         .uid = HEADER_IPV4_DSCP,
> +         .bitwidth = 6,},
> +       { .name = "ecn",
> +         .uid = HEADER_IPV4_ECN,
> +         .bitwidth = 2,},
> +       { .name = "length",
> +         .uid = HEADER_IPV4_LENGTH,
> +         .bitwidth = 8,},
> +       { .name = "identification",
> +         .uid = HEADER_IPV4_IDENTIFICATION,
> +         .bitwidth = 8,},
> +       { .name = "flags",
> +         .uid = HEADER_IPV4_FLAGS,
> +         .bitwidth = 3,},
> +       { .name = "fragment_offset",
> +         .uid = HEADER_IPV4_FRAGMENT_OFFSET,
> +         .bitwidth = 13,},
> +       { .name = "ttl",
> +         .uid = HEADER_IPV4_TTL,
> +         .bitwidth = 1,},
> +       { .name = "protocol",
> +         .uid = HEADER_IPV4_PROTOCOL,
> +         .bitwidth = 8,},
> +       { .name = "csum",
> +         .uid = HEADER_IPV4_CSUM,
> +         .bitwidth = 8,},
> +       { .name = "src_ip",
> +         .uid = HEADER_IPV4_SRC_IP,
> +         .bitwidth = 32,},
> +       { .name = "dst_ip",
> +         .uid = HEADER_IPV4_DST_IP,
> +         .bitwidth = 32,},
> +       { .name = "options",
> +         .uid = HEADER_IPV4_OPTIONS,
> +         .bitwidth = -1,},
> +};
> +

John,

Repeating the feedback I gave you f2f in Dusseldorf when the WIP
code was still within ixgbe: some/many code pieces in this patch (e.g.
the above) are generic, and hence should not reside within a low-level
driver such as rocker; they also belong in a separate patch.

Or.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
  2014-12-31 20:10   ` John Fastabend
@ 2015-01-04 11:12   ` Thomas Graf
  2015-01-05 18:59     ` John Fastabend
  2015-01-06  5:25   ` Scott Feldman
  2 siblings, 1 reply; 60+ messages in thread
From: Thomas Graf @ 2015-01-04 11:12 UTC (permalink / raw)
  To: John Fastabend; +Cc: sfeldma, jiri, jhs, simon.horman, netdev, davem, andy

On 12/31/14 at 11:45am, John Fastabend wrote:

Impressive work John, some minor nits below. In general this looks
great. How large could tables grow? Any risk one of the nested
attributes could exceed 16K in size because of a very large parse
graph? Not a problem if we account for it and allow for jumbo
attributes.

> +
> +/**
> + * @struct net_flow_header
> + * @brief defines a match (header/field) an endpoint can use
> + *
> + * @uid unique identifier for header
> + * @field_sz number of fields are in the set
> + * @fields the set of fields in the net_flow_header

FWIW, name is not documented.

> + */
> +struct net_flow_header {
> +	char name[NET_FLOW_NAMSIZ];
> +	int uid;
> +	int field_sz;
> +	struct net_flow_field *fields;
> +};
> +
> +
> +/**
> + * @struct net_flow_table
> + * @brief define flow table with supported match/actions
> + *
> + * @uid unique identifier for table
> + * @source uid of parent table
> + * @size max number of entries for table or -1 for unbounded
> + * @matches null terminated set of supported match types given by match uid
> + * @actions null terminated set of supported action types given by action uid
> + * @flows set of flows

name is not documented; @flows seems to be a leftover

> + */
> +struct net_flow_table {
> +	char name[NET_FLOW_NAMSIZ];
> +	int uid;
> +	int source;
> +	int size;
> +	struct net_flow_field_ref *matches;
> +	int *actions;
> +};
> +
> +/* net_flow_hdr_node: node in a header graph of header fields.
> + *
> + * @uid : unique id of the graph node
> + * @flwo_header_ref : identify the hdrs that can handled by this node
> + * @net_flow_jump_table : give a case jump statement
> + */

needs more work too ;)

> +struct net_flow_hdr_node {
> +	char name[NET_FLOW_NAMSIZ];
> +	int uid;
> +	int *hdrs;
> +	struct net_flow_jump_table *jump;
> +};
> + */
> +
> +/* Netlink description:
> + *
> + * Table definition used to describe running tables. The following
> + * describes the netlink message returned from a flow API messages.
> + *
> + * Flow table definitions used to define tables.
> + *
> + * [NET_FLOW_TABLE_IDENTIFIER_TYPE]
> + * [NET_FLOW_TABLE_IDENTIFIER]
> + * [NET_FLOW_TABLE_TABLES]
> + *     [NET_FLOW_TABLE]
> + *       [NET_FLOW_TABLE_ATTR_NAME]
> + *       [NET_FLOW_TABLE_ATTR_UID]
> + *       [NET_FLOW_TABLE_ATTR_SOURCE]
> + *       [NET_FLOW_TABLE_ATTR_SIZE]
> + *	 [NET_FLOW_TABLE_ATTR_MATCHES]

The tab/space mix makes the indentation look wrong in the patch; it
looks correct unquoted, but consistency would make this perfect.

> +#ifndef _UAPI_LINUX_IF_FLOW
> +#define _UAPI_LINUX_IF_FLOW
> +
> +#include <linux/types.h>
> +#include <linux/netlink.h>
> +#include <linux/if.h>
> +
> +#define NET_FLOW_NAMSIZ 80

Did you consider allocating the memory for names? I don't have a grasp
for the typical number of net_flow_* instances in memory yet.

> +/**
> + * @struct net_flow_field_ref
> + * @brief uniquely identify field as header:field tuple
> + */
> +struct net_flow_field_ref {
> +	int instance;
> +	int header;
> +	int field;
> +	int mask_type;
> +	int type;
> +	union {	/* Are these all the required data types */
> +		__u8 value_u8;
> +		__u16 value_u16;
> +		__u32 value_u32;
> +		__u64 value_u64;
> +	};
> +	union {	/* Are these all the required data types */
> +		__u8 mask_u8;
> +		__u16 mask_u16;
> +		__u32 mask_u32;
> +		__u64 mask_u64;
> +	};
> +};

Does it make sense to write this as follows?

union {
        struct {
                __u8 value_u8;
                __u8 mask_u8;
        };
        struct {
                __u16 value_u16;
                __u16 mask_u16;
        };
        ...
};

> +#define NET_FLOW_TABLE_EGRESS_ROOT 1
> +#define	NET_FLOW_TABLE_INGRESS_ROOT 2

Tab/space mix.

> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
> +					   struct net_device *dev,
> +					   u32 portid, int seq, u8 cmd)
> +{
> +	struct genlmsghdr *hdr;
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);

genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);

> +static int net_flow_put_table(struct net_device *dev,
> +			      struct sk_buff *skb,
> +			      struct net_flow_table *t)
> +{
> +	struct nlattr *matches, *actions;
> +	int i;
> +
> +	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
> +		return -EMSGSIZE;
> +
> +	matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
> +	if (!matches)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; t->matches[i].instance; i++)
> +		nla_put(skb, NET_FLOW_FIELD_REF,
> +			sizeof(struct net_flow_field_ref),
> +			&t->matches[i]);

Unhandled nla_put() error


> +static struct sk_buff *net_flow_build_tables_msg(struct net_flow_table **t,
> +						 struct net_device *dev,
> +						 u32 portid, int seq, u8 cmd)
> +{
> +	struct genlmsghdr *hdr;
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);

genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);

> +static int net_flow_put_headers(struct sk_buff *skb,
> +				struct net_flow_header **headers)
> +{
> +	struct nlattr *nest, *hdr, *fields;
> +	struct net_flow_header *h;
> +	int i, err;
> +
> +	nest = nla_nest_start(skb, NET_FLOW_HEADERS);
> +	if (!nest)
> +		return -EMSGSIZE;
> +
> +	for (i = 0; headers[i]->uid; i++) {
> +		err = -EMSGSIZE;
> +		h = headers[i];
> +
> +		hdr = nla_nest_start(skb, NET_FLOW_HEADER);
> +		if (!hdr)
> +			goto hdr_put_failure;
> +
> +		if (nla_put_string(skb, NET_FLOW_HEADER_ATTR_NAME, h->name) ||
> +		    nla_put_u32(skb, NET_FLOW_HEADER_ATTR_UID, h->uid))
> +			goto attr_put_failure;
> +
> +		fields = nla_nest_start(skb, NET_FLOW_HEADER_ATTR_FIELDS);
> +		if (!fields)
> +			goto attr_put_failure;

You can jump to hdr_put_failure right away and get rid of the
attr_put_failure target as you cancel that nest anyway. You can apply 
this comment to several other places as well if you want.

> +
> +		err = net_flow_put_fields(skb, h);
> +		if (err)
> +			goto fields_put_failure;
> +
> +		nla_nest_end(skb, fields);
> +
> +		nla_nest_end(skb, hdr);
> +	}
> +	nla_nest_end(skb, nest);
> +
> +	return 0;
> +fields_put_failure:
> +	nla_nest_cancel(skb, fields);
> +attr_put_failure:
> +	nla_nest_cancel(skb, hdr);
> +hdr_put_failure:
> +	nla_nest_cancel(skb, nest);
> +	return err;
> +}
> +
> +static struct sk_buff *net_flow_build_headers_msg(struct net_flow_header **h,
> +						  struct net_device *dev,
> +						  u32 portid, int seq, u8 cmd)
> +{
> +	struct genlmsghdr *hdr;
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);

> +static
> +struct sk_buff *net_flow_build_graph_msg(struct net_flow_tbl_node **g,
> +					 struct net_device *dev,
> +					 u32 portid, int seq, u8 cmd)
> +{
> +	struct genlmsghdr *hdr;
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 00/11] A flow API
  2015-01-04  8:30 ` [net-next PATCH v1 00/11] A flow API Or Gerlitz
@ 2015-01-05  5:17   ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-05  5:17 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Thomas Graf, sfeldma, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On 01/04/2015 12:30 AM, Or Gerlitz wrote:
> On Wed, Dec 31, 2014 at 9:45 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> So... I could continue to mull over this and tweak bits and pieces
>> here and there but I decided its best to get a wider group of folks
>> looking at it and hopefulyl with any luck using it so here it is.
> [...]
>> I could use some help reviewing
> [...]
>
> Hi John,
>
> It would be very helpful to get access to the actual patches, I don't
> see them on the netdev patchwork queue, and assume
> it's b/c this is still in RFC stage. Cloning your github tree and
> looking there, I see some earlier/WIP versions of the code, but it's
> not
> the submitted patches.
>

The netdev mailed patches should be there (I didn't check); I'm guessing
you just need to set the filters correctly. There is an "Action
Required" filter that you most likely need to remove; I think it is on
by default. Seeing as I already commented on the series indicating I
would need a v2 to address some fixes, I'm guessing it's already been
cleared from the queue.

> Or.
>
> [1] https://github.com/jrfastab/flow-net-next.git
>

That link is a bit out of date. I pushed the exact series I sent
to a git repo here:

https://github.com/jrfastab/rocker-net-next

I'll update it tomorrow with some feedback though.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2015-01-04  8:43   ` Or Gerlitz
@ 2015-01-05  5:18     ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-05  5:18 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Thomas Graf, sfeldma, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On 01/04/2015 12:43 AM, Or Gerlitz wrote:
> On Wed, Dec 31, 2014 at 9:47 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> This adds rocker support for the net_flow_get_* operations. With this
>> we can interrogate rocker.
>>
>> Here we see that for static configurations enabling the get operations
>> is simply a matter of defining a pipeline model and returning the
>> structures for the core infrastructure to encapsulate into netlink
>> messages.
>
> [..]
>

[...]

>> +
>
> John,
>
> Repeating the feedback I provided you f2f in Dusseldorf when the WIP
> code was within ixgbe, some/many code pieces in this patch (e.g the
> above) are generic and hence should reside not within a low level
> driver such as rocker, and also on a separate patch.
>
> Or.
>

Yep... will address in v2, thanks for the reminder. I expect I'll have
a v2 ready some time soon.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-04 11:12   ` Thomas Graf
@ 2015-01-05 18:59     ` John Fastabend
  2015-01-05 21:48       ` Thomas Graf
                         ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-05 18:59 UTC (permalink / raw)
  To: Thomas Graf; +Cc: sfeldma, jiri, jhs, simon.horman, netdev, davem, andy

On 01/04/2015 03:12 AM, Thomas Graf wrote:
> On 12/31/14 at 11:45am, John Fastabend wrote:
>
> Impressive work John, some minor nits below. In general this looks
> great. How large could tables grow? Any risk one of the nested
> attribtues could exceed 16K in size because of a very large parse
> graph? Not a problem if we account for it and allow for jumbo
> attributes.
>

hmm it sounds large to me but maybe if you have an NPU that is trying
to parse into application data it could happen.

What does it take to allow for jumbo attributes?

>> +
>> +/**
>> + * @struct net_flow_header
>> + * @brief defines a match (header/field) an endpoint can use
>> + *
>> + * @uid unique identifier for header
>> + * @field_sz number of fields are in the set
>> + * @fields the set of fields in the net_flow_header
>
> FWIW, name is not documented.

Thanks, fixed up the documentation and spacing for v2.

[...]

>> +#ifndef _UAPI_LINUX_IF_FLOW
>> +#define _UAPI_LINUX_IF_FLOW
>> +
>> +#include <linux/types.h>
>> +#include <linux/netlink.h>
>> +#include <linux/if.h>
>> +
>> +#define NET_FLOW_NAMSIZ 80
>
> Did you consider allocating the memory for names? I don't have a grasp
> for the typical number of net_flow_* instances in memory yet.
>

<100k in the devices I have. Maybe Simon can pitch in what is typical
on the NPUs; I'm not sure about them.

Rocker tables can grow as large as needed at the moment.

Allocating the memory may help; I'll go ahead and give it a try.

>> +/**
>> + * @struct net_flow_field_ref
>> + * @brief uniquely identify field as header:field tuple
>> + */
>> +struct net_flow_field_ref {
>> +	int instance;
>> +	int header;
>> +	int field;
>> +	int mask_type;
>> +	int type;
>> +	union {	/* Are these all the required data types */
>> +		__u8 value_u8;
>> +		__u16 value_u16;
>> +		__u32 value_u32;
>> +		__u64 value_u64;
>> +	};
>> +	union {	/* Are these all the required data types */
>> +		__u8 mask_u8;
>> +		__u16 mask_u16;
>> +		__u32 mask_u32;
>> +		__u64 mask_u64;
>> +	};
>> +};
>
> Does it make sense to write this as follows?

Yes. I'll make this update it helps make it clear value/mask pairs are
needed.

>
> union {
>          struct {
>                  __u8 value_u8;
>                  __u8 mask_u8;
>          };
>          struct {
>                  __u16 value_u16;
>                  __u16 mask_u16;
>          };
>          ...
> };
>
>> +#define NET_FLOW_TABLE_EGRESS_ROOT 1
>> +#define	NET_FLOW_TABLE_INGRESS_ROOT 2
>
> Tab/space mix.
>

yep fixed.

>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>> +					   struct net_device *dev,
>> +					   u32 portid, int seq, u8 cmd)
>> +{
>> +	struct genlmsghdr *hdr;
>> +	struct sk_buff *skb;
>> +	int err = -ENOBUFS;
>> +
>> +	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);

fixed along with the other cases.

>
>> +static int net_flow_put_table(struct net_device *dev,
>> +			      struct sk_buff *skb,
>> +			      struct net_flow_table *t)
>> +{
>> +	struct nlattr *matches, *actions;
>> +	int i;
>> +
>> +	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
>> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
>> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
>> +	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
>> +		return -EMSGSIZE;
>> +
>> +	matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
>> +	if (!matches)
>> +		return -EMSGSIZE;
>> +
>> +	for (i = 0; t->matches[i].instance; i++)
>> +		nla_put(skb, NET_FLOW_FIELD_REF,
>> +			sizeof(struct net_flow_field_ref),
>> +			&t->matches[i]);
>
> Unhandled nla_put() error
>

thanks.

[...]

>> +static int net_flow_put_headers(struct sk_buff *skb,
>> +				struct net_flow_header **headers)
>> +{
>> +	struct nlattr *nest, *hdr, *fields;
>> +	struct net_flow_header *h;
>> +	int i, err;
>> +
>> +	nest = nla_nest_start(skb, NET_FLOW_HEADERS);
>> +	if (!nest)
>> +		return -EMSGSIZE;
>> +
>> +	for (i = 0; headers[i]->uid; i++) {
>> +		err = -EMSGSIZE;
>> +		h = headers[i];
>> +
>> +		hdr = nla_nest_start(skb, NET_FLOW_HEADER);
>> +		if (!hdr)
>> +			goto hdr_put_failure;
>> +
>> +		if (nla_put_string(skb, NET_FLOW_HEADER_ATTR_NAME, h->name) ||
>> +		    nla_put_u32(skb, NET_FLOW_HEADER_ATTR_UID, h->uid))
>> +			goto attr_put_failure;
>> +
>> +		fields = nla_nest_start(skb, NET_FLOW_HEADER_ATTR_FIELDS);
>> +		if (!fields)
>> +			goto attr_put_failure;
>
> You can jump to hdr_put_failure right away and get rid of the
> attr_put_failure target as you cancel that nest anyway. You can apply
> this comment to several other places as well if you want.
>

OK so to simplify the error paths we only need to cancel the outermost
nested attribute. I'll do this transformation.

.John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-05 18:59     ` John Fastabend
@ 2015-01-05 21:48       ` Thomas Graf
  2015-01-05 23:29       ` John Fastabend
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Thomas Graf @ 2015-01-05 21:48 UTC (permalink / raw)
  To: John Fastabend; +Cc: sfeldma, jiri, jhs, simon.horman, netdev, davem, andy

On 01/05/15 at 10:59am, John Fastabend wrote:
> On 01/04/2015 03:12 AM, Thomas Graf wrote:
> >On 12/31/14 at 11:45am, John Fastabend wrote:
> >
> >Impressive work John, some minor nits below. In general this looks
> >great. How large could tables grow? Any risk one of the nested
> >attribtues could exceed 16K in size because of a very large parse
> >graph? Not a problem if we account for it and allow for jumbo
> >attributes.
> >
> 
> hmm it sounds large to me but maybe if you have an NPU that is trying
> to parse into application data it could happen.
> 
> What does it take to allow for jumbo attributes?

We basically need to make user space aware of a new nlattr header
to be expected for certain attributes. We can reserve the 2nd bit
of the type to indicate a 32bit length field following the current
header. We can only do this for new attributes as its not backwards
compatible so we need to think about this before we start exposing
them.

I can send a patch introducing them in the next few days if you
want as it seems you'll have to respin this again anyway.

> >You can jump to hdr_put_failure right away and get rid of the
> >attr_put_failure target as you cancel that nest anyway. You can apply
> >this comment to several other places as well if you want.
> >
> 
> > OK so to simplify the error paths we only need to cancel the outermost
> nested attribute. I'll do this transformation.

It's a matter of style. I'm fine either way. Personally I prefer the
single abort error target.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-05 18:59     ` John Fastabend
  2015-01-05 21:48       ` Thomas Graf
@ 2015-01-05 23:29       ` John Fastabend
  2015-01-06  0:45       ` John Fastabend
  2015-01-07 10:07       ` Or Gerlitz
  3 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-05 23:29 UTC (permalink / raw)
  To: Thomas Graf; +Cc: sfeldma, jiri, jhs, simon.horman, netdev, davem, andy

On 01/05/2015 10:59 AM, John Fastabend wrote:

[...]

>>> +#ifndef _UAPI_LINUX_IF_FLOW
>>> +#define _UAPI_LINUX_IF_FLOW
>>> +
>>> +#include <linux/types.h>
>>> +#include <linux/netlink.h>
>>> +#include <linux/if.h>
>>> +
>>> +#define NET_FLOW_NAMSIZ 80
>>
>> Did you consider allocating the memory for names? I don't have a grasp
>> for the typical number of net_flow_* instances in memory yet.
>>
>
> <100k in the devices I have. Maybe Simon can pitch in what is typical
> on the NPUs; I'm not sure about them.
>
> Rocker tables can grow as large as needed at the moment.
>
> Allocating the memory may help; I'll go ahead and give it a try.
>

One issue with breaking this up is that a couple of structures are
being passed as attributes with name[] as a field. I think it's best to
break these up; passing empty arrays seems ugly at best. So I'll need
to adjust some of the messaging as well.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-05 18:59     ` John Fastabend
  2015-01-05 21:48       ` Thomas Graf
  2015-01-05 23:29       ` John Fastabend
@ 2015-01-06  0:45       ` John Fastabend
  2015-01-06  1:09         ` Simon Horman
  2015-01-07 10:07       ` Or Gerlitz
  3 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-06  0:45 UTC (permalink / raw)
  To: Thomas Graf; +Cc: sfeldma, jiri, jhs, simon.horman, netdev, davem, andy

[...]

>>> +/**
>>> + * @struct net_flow_field_ref
>>> + * @brief uniquely identify field as header:field tuple
>>> + */
>>> +struct net_flow_field_ref {
>>> +    int instance;
>>> +    int header;
>>> +    int field;
>>> +    int mask_type;
>>> +    int type;
>>> +    union {    /* Are these all the required data types */
>>> +        __u8 value_u8;
>>> +        __u16 value_u16;
>>> +        __u32 value_u32;
>>> +        __u64 value_u64;
>>> +    };
>>> +    union {    /* Are these all the required data types */
>>> +        __u8 mask_u8;
>>> +        __u16 mask_u16;
>>> +        __u32 mask_u32;
>>> +        __u64 mask_u64;
>>> +    };
>>> +};
>>
>> Does it make sense to write this as follows?
>
> Yes. I'll make this update it helps make it clear value/mask pairs are
> needed.
>
>>
>> union {
>>          struct {
>>                  __u8 value_u8;
>>                  __u8 mask_u8;
>>          };
>>          struct {
>>                  __u16 value_u16;
>>                  __u16 mask_u16;
>>          };
>>          ...
>> };

Another thought is to pull this entirely out of the structure and hide
it from the UAPI so we can add more value/mask types as needed without
having to spin versions of net_flow_field_ref. On the other hand I've
been able to fit all my fields in these types so far and I can't think
of any additions we need at the moment.

>>
>>> +#define NET_FLOW_TABLE_EGRESS_ROOT 1
>>> +#define    NET_FLOW_TABLE_INGRESS_ROOT 2
>>
>> Tab/space mix.
>>
>

[...]


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  0:45       ` John Fastabend
@ 2015-01-06  1:09         ` Simon Horman
  2015-01-06  1:19           ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Simon Horman @ 2015-01-06  1:09 UTC (permalink / raw)
  To: John Fastabend; +Cc: Thomas Graf, sfeldma, jiri, jhs, netdev, davem, andy

On Mon, Jan 05, 2015 at 04:45:50PM -0800, John Fastabend wrote:
> [...]
> 
> >>>+/**
> >>>+ * @struct net_flow_field_ref
> >>>+ * @brief uniquely identify field as header:field tuple
> >>>+ */
> >>>+struct net_flow_field_ref {
> >>>+    int instance;
> >>>+    int header;
> >>>+    int field;
> >>>+    int mask_type;
> >>>+    int type;
> >>>+    union {    /* Are these all the required data types */
> >>>+        __u8 value_u8;
> >>>+        __u16 value_u16;
> >>>+        __u32 value_u32;
> >>>+        __u64 value_u64;
> >>>+    };
> >>>+    union {    /* Are these all the required data types */
> >>>+        __u8 mask_u8;
> >>>+        __u16 mask_u16;
> >>>+        __u32 mask_u32;
> >>>+        __u64 mask_u64;
> >>>+    };
> >>>+};
> >>
> >>Does it make sense to write this as follows?
> >
> >Yes. I'll make this update it helps make it clear value/mask pairs are
> >needed.
> >
> >>
> >>union {
> >>         struct {
> >>                 __u8 value_u8;
> >>                 __u8 mask_u8;
> >>         };
> >>         struct {
> >>                 __u16 value_u16;
> >>                 __u16 mask_u16;
> >>         };
> >>         ...
> >>};
> 
> Another thought is to pull this entirely out of the structure and hide
> it from the UAPI so we can add more value/mask types as needed without
> having to spin versions of net_flow_field_ref. On the other hand I've
> been able to fit all my fields in these types so far and I can't think
> of any additions we need at the moment.

FWIW, I think it would be cleaner to break both field_ref and action_args
out into attributes and not expose the structures to user-space. But
perhaps there is an advantage to dealing with structures directly that
I am missing.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  1:09         ` Simon Horman
@ 2015-01-06  1:19           ` John Fastabend
  2015-01-06  2:05             ` Simon Horman
  0 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-06  1:19 UTC (permalink / raw)
  To: Simon Horman; +Cc: Thomas Graf, sfeldma, jiri, jhs, netdev, davem, andy

On 01/05/2015 05:09 PM, Simon Horman wrote:
> On Mon, Jan 05, 2015 at 04:45:50PM -0800, John Fastabend wrote:
>> [...]
>>
>>>>> +/**
>>>>> + * @struct net_flow_field_ref
>>>>> + * @brief uniquely identify field as header:field tuple
>>>>> + */
>>>>> +struct net_flow_field_ref {
>>>>> +    int instance;
>>>>> +    int header;
>>>>> +    int field;
>>>>> +    int mask_type;
>>>>> +    int type;
>>>>> +    union {    /* Are these all the required data types */
>>>>> +        __u8 value_u8;
>>>>> +        __u16 value_u16;
>>>>> +        __u32 value_u32;
>>>>> +        __u64 value_u64;
>>>>> +    };
>>>>> +    union {    /* Are these all the required data types */
>>>>> +        __u8 mask_u8;
>>>>> +        __u16 mask_u16;
>>>>> +        __u32 mask_u32;
>>>>> +        __u64 mask_u64;
>>>>> +    };
>>>>> +};
>>>>
>>>> Does it make sense to write this as follows?
>>>
>>> Yes. I'll make this update it helps make it clear value/mask pairs are
>>> needed.
>>>
>>>>
>>>> union {
>>>>          struct {
>>>>                  __u8 value_u8;
>>>>                  __u8 mask_u8;
>>>>          };
>>>>          struct {
>>>>                  __u16 value_u16;
>>>>                  __u16 mask_u16;
>>>>          };
>>>>          ...
>>>> };
>>
>> Another thought is to pull this entirely out of the structure and hide
>> it from the UAPI so we can add more value/mask types as needed without
>> having to spin versions of net_flow_field_ref. On the other hand I've
>> been able to fit all my fields in these types so far and I can't think
>> of any additions we need at the moment.
>
> FWIW, I think it would be cleaner to break both field_ref and action_args
> out into attributes and not expose the structures to user-space. But
> perhaps there is an advantage to dealing with structures directly that
> I am missing.
>

I came to the same conclusion just now as well. I'm reworking it now
for v2.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  1:19           ` John Fastabend
@ 2015-01-06  2:05             ` Simon Horman
  2015-01-06  2:54               ` Simon Horman
  0 siblings, 1 reply; 60+ messages in thread
From: Simon Horman @ 2015-01-06  2:05 UTC (permalink / raw)
  To: John Fastabend; +Cc: Thomas Graf, sfeldma, jiri, jhs, netdev, davem, andy

On Mon, Jan 05, 2015 at 05:19:26PM -0800, John Fastabend wrote:
> On 01/05/2015 05:09 PM, Simon Horman wrote:
> >On Mon, Jan 05, 2015 at 04:45:50PM -0800, John Fastabend wrote:
> >>[...]
> >>
> >>>>>+/**
> >>>>>+ * @struct net_flow_field_ref
> >>>>>+ * @brief uniquely identify field as header:field tuple
> >>>>>+ */
> >>>>>+struct net_flow_field_ref {
> >>>>>+    int instance;
> >>>>>+    int header;
> >>>>>+    int field;
> >>>>>+    int mask_type;
> >>>>>+    int type;
> >>>>>+    union {    /* Are these all the required data types */
> >>>>>+        __u8 value_u8;
> >>>>>+        __u16 value_u16;
> >>>>>+        __u32 value_u32;
> >>>>>+        __u64 value_u64;
> >>>>>+    };
> >>>>>+    union {    /* Are these all the required data types */
> >>>>>+        __u8 mask_u8;
> >>>>>+        __u16 mask_u16;
> >>>>>+        __u32 mask_u32;
> >>>>>+        __u64 mask_u64;
> >>>>>+    };
> >>>>>+};
> >>>>
> >>>>Does it make sense to write this as follows?
> >>>
> >>>Yes. I'll make this update it helps make it clear value/mask pairs are
> >>>needed.
> >>>
> >>>>
> >>>>union {
> >>>>         struct {
> >>>>                 __u8 value_u8;
> >>>>                 __u8 mask_u8;
> >>>>         };
> >>>>         struct {
> >>>>                 __u16 value_u16;
> >>>>                 __u16 mask_u16;
> >>>>         };
> >>>>         ...
> >>>>};
> >>
> >>Another thought is to pull this entirely out of the structure and hide
> >>it from the UAPI so we can add more value/mask types as needed without
> >>having to spin versions of net_flow_field_ref. On the other hand I've
> >>been able to fit all my fields in these types so far and I can't think
> >>of any additions we need at the moment.
> >
> >FWIW, I think it would be cleaner to break both field_ref and action_args
> >out into attributes and not expose the structures to user-space. But
> >perhaps there is an advantage to dealing with structures directly that
> >I am missing.
> >
> 
> I  came to the same conclusion just now as well. I'm reworking it now
> for v2.

Thanks.

BTW, I think there are a few problems with net_flow_put_flow_action().

I am not quite to the bottom of it but it seems that:
* It loops over a->args[i] and then calls net_flow_put_act_types()
  which performs a similar loop. This outer-loop appears to be incorrect.
* It passes a[i].args instead of a->args[i] to net_flow_put_act_types()

I can post a fix once I've got it working to my satisfaction.
But if you are reworking that code anyway perhaps it is easier for
you to handle it then.


* Re: [net-next PATCH v1 00/11] A flow API
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (11 preceding siblings ...)
  2015-01-04  8:30 ` [net-next PATCH v1 00/11] A flow API Or Gerlitz
@ 2015-01-06  2:42 ` Scott Feldman
  2015-01-06 12:23 ` Jamal Hadi Salim
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  2:42 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Wed, Dec 31, 2014 at 11:45 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Finally I have more patches to add support for creating and destroying
> tables. This allows users to define the pipeline at runtime rather
> than statically as rocker does now. After this set gets some traction
> I'll look at pushing them in a next round. However it likely requires
> adding another "world" to rocker.

Yes, it would require another "world" to be added to rocker.  It would
be cool if someone could work on this.  Currently, the only world
rocker supports is OF-DPA, which is based on Broadcom's published
OF-DPA spec.  OF-DPA is a fixed pipeline with predefined tables.  I'd
like to see a "universal machine" world added to rocker that presents
a programmable pipeline and programmable tables.  The nice thing is
most of the rocker device and driver code gets reused fro this new
world.  Nothing changes about the port interface, or the DMA/MSI/PCI
interfaces, or the I/O, cmd, and event paths, so much is reused.

> Another piece that I want to add is
> a description of the actions and metadata. This way user space can
> "learn" what an action is and how metadata interacts with the system.
> This work is under development.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  2:05             ` Simon Horman
@ 2015-01-06  2:54               ` Simon Horman
  2015-01-06  3:31                 ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Simon Horman @ 2015-01-06  2:54 UTC (permalink / raw)
  To: John Fastabend; +Cc: Thomas Graf, sfeldma, jiri, jhs, netdev, davem, andy

On Tue, Jan 06, 2015 at 11:05:14AM +0900, Simon Horman wrote:
> On Mon, Jan 05, 2015 at 05:19:26PM -0800, John Fastabend wrote:
> > On 01/05/2015 05:09 PM, Simon Horman wrote:
> > >On Mon, Jan 05, 2015 at 04:45:50PM -0800, John Fastabend wrote:
> > >>[...]
> > >>
> > >>>>>+/**
> > >>>>>+ * @struct net_flow_field_ref
> > >>>>>+ * @brief uniquely identify field as header:field tuple
> > >>>>>+ */
> > >>>>>+struct net_flow_field_ref {
> > >>>>>+    int instance;
> > >>>>>+    int header;
> > >>>>>+    int field;
> > >>>>>+    int mask_type;
> > >>>>>+    int type;
> > >>>>>+    union {    /* Are these all the required data types */
> > >>>>>+        __u8 value_u8;
> > >>>>>+        __u16 value_u16;
> > >>>>>+        __u32 value_u32;
> > >>>>>+        __u64 value_u64;
> > >>>>>+    };
> > >>>>>+    union {    /* Are these all the required data types */
> > >>>>>+        __u8 mask_u8;
> > >>>>>+        __u16 mask_u16;
> > >>>>>+        __u32 mask_u32;
> > >>>>>+        __u64 mask_u64;
> > >>>>>+    };
> > >>>>>+};
> > >>>>
> > >>>>Does it make sense to write this as follows?
> > >>>
> > >>>Yes. I'll make this update it helps make it clear value/mask pairs are
> > >>>needed.
> > >>>
> > >>>>
> > >>>>union {
> > >>>>         struct {
> > >>>>                 __u8 value_u8;
> > >>>>                 __u8 mask_u8;
> > >>>>         };
> > >>>>         struct {
> > >>>>                 __u16 value_u16;
> > >>>>                 __u16 mask_u16;
> > >>>>         };
> > >>>>         ...
> > >>>>};
> > >>
> > >>Another thought is to pull this entirely out of the structure and hide
> > >>it from the UAPI so we can add more value/mask types as needed without
> > >>having to spin versions of net_flow_field_ref. On the other hand I've
> > >>been able to fit all my fields in these types so far and I can't think
> > >>of any additions we need at the moment.
> > >
> > >FWIW, I think it would be cleaner to break both field_ref and action_args
> > >out into attributes and not expose the structures to user-space. But
> > >perhaps there is an advantage to dealing with structures directly that
> > >I am missing.
> > >
> > 
> > I  came to the same conclusion just now as well. I'm reworking it now
> > for v2.
> 
> Thanks.
> 
> BTW, I think there are a few problems with net_flow_put_flow_action().
> 
> I am not quite to the bottom of it but it seems that:
> * It loops over a->args[i] and then calls net_flow_put_act_types()
>   which performs a similar loop. This outer-loop appears to be incorrect.
> * It passes a[i].args instead of a->args[i] to net_flow_put_act_types()
> 
> I can post a fix once I've got it working to my satisfaction.
> But if you are reworking that code anyway perhaps it is easier for
> you to handle it then.

FWIW this got the current scheme working for me:

diff --git a/net/core/flow_table.c b/net/core/flow_table.c
index 5dbdc13..598afa2 100644
--- a/net/core/flow_table.c
+++ b/net/core/flow_table.c
@@ -946,7 +946,7 @@ static int net_flow_put_flow_action(struct sk_buff *skb,
 				    struct net_flow_action *a)
 {
 	struct nlattr *action, *sigs;
-	int i, err = 0;
+	int err = 0;
 
 	action = nla_nest_start(skb, NET_FLOW_ACTION);
 	if (!action)
@@ -958,21 +958,19 @@ static int net_flow_put_flow_action(struct sk_buff *skb,
 	if (!a->args)
 		goto done;
 
-	for (i = 0; a->args[i].type; i++) {
-		sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
-		if (!sigs) {
-			nla_nest_cancel(skb, action);
-			return -EMSGSIZE;
-		}
+	sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
+	if (!sigs) {
+		nla_nest_cancel(skb, action);
+		return -EMSGSIZE;
+	}
 
-		err = net_flow_put_act_types(skb, a[i].args);
-		if (err) {
-			nla_nest_cancel(skb, sigs);
-			nla_nest_cancel(skb, action);
-			return err;
-		}
-		nla_nest_end(skb, sigs);
+	err = net_flow_put_act_types(skb, a->args);
+	if (err) {
+		nla_nest_cancel(skb, sigs);
+		nla_nest_cancel(skb, action);
+		return err;
 	}
+	nla_nest_end(skb, sigs);
 
 done:
 	nla_nest_end(skb, action);
@@ -1103,6 +1101,7 @@ static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
 		}
 
 		a->args[count] = *(struct net_flow_action_arg *)nla_data(args);
+		count++;
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  2:54               ` Simon Horman
@ 2015-01-06  3:31                 ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-06  3:31 UTC (permalink / raw)
  To: Simon Horman; +Cc: Thomas Graf, sfeldma, jiri, jhs, netdev, davem, andy


[...]

>>
>> BTW, I think there are a few problems with net_flow_put_flow_action().
>>
>> I am not quite to the bottom of it but it seems that:
>> * It loops over a->args[i] and then calls net_flow_put_act_types()
>>    which performs a similar loop. This outer-loop appears to be incorrect.
>> * It passes a[i].args instead of a->args[i] to net_flow_put_act_types()
>>
>> I can post a fix once I've got it working to my satisfaction.
>> But if you are reworking that code anyway perhaps it is easier for
>> you to handle it then.
>
> FWIW this got the current scheme working for me:
>

Thanks Simon. I'll roll this in as well.

> diff --git a/net/core/flow_table.c b/net/core/flow_table.c
> index 5dbdc13..598afa2 100644
> --- a/net/core/flow_table.c
> +++ b/net/core/flow_table.c
> @@ -946,7 +946,7 @@ static int net_flow_put_flow_action(struct sk_buff *skb,
>   				    struct net_flow_action *a)
>   {
>   	struct nlattr *action, *sigs;
> -	int i, err = 0;
> +	int err = 0;
>
>   	action = nla_nest_start(skb, NET_FLOW_ACTION);
>   	if (!action)
> @@ -958,21 +958,19 @@ static int net_flow_put_flow_action(struct sk_buff *skb,
>   	if (!a->args)
>   		goto done;
>
> -	for (i = 0; a->args[i].type; i++) {
> -		sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
> -		if (!sigs) {
> -			nla_nest_cancel(skb, action);
> -			return -EMSGSIZE;
> -		}
> +	sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
> +	if (!sigs) {
> +		nla_nest_cancel(skb, action);
> +		return -EMSGSIZE;
> +	}
>
> -		err = net_flow_put_act_types(skb, a[i].args);
> -		if (err) {
> -			nla_nest_cancel(skb, sigs);
> -			nla_nest_cancel(skb, action);
> -			return err;
> -		}
> -		nla_nest_end(skb, sigs);
> +	err = net_flow_put_act_types(skb, a->args);
> +	if (err) {
> +		nla_nest_cancel(skb, sigs);
> +		nla_nest_cancel(skb, action);
> +		return err;
>   	}
> +	nla_nest_end(skb, sigs);
>
>   done:
>   	nla_nest_end(skb, action);
> @@ -1103,6 +1101,7 @@ static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
>   		}
>
>   		a->args[count] = *(struct net_flow_action_arg *)nla_data(args);
> +		count++;
>   	}
>   	return 0;
>   }
>


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
  2014-12-31 20:10   ` John Fastabend
  2015-01-04 11:12   ` Thomas Graf
@ 2015-01-06  5:25   ` Scott Feldman
  2015-01-06  6:04     ` John Fastabend
  2 siblings, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  5:25 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

Nice work John.  Some nits inline...

On Wed, Dec 31, 2014 at 11:45 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
> new file mode 100644
> index 0000000..1b6c1ea
> --- /dev/null
> +++ b/include/linux/if_flow.h
> @@ -0,0 +1,93 @@
> +/*
> + * include/linux/net/if_flow.h - Flow table interface for Switch devices
> + * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
> + *
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Author: John Fastabend <john.r.fastabend@intel.com>
> + */
> +
> +#ifndef _IF_FLOW_H
> +#define _IF_FLOW_H
> +
> +#include <uapi/linux/if_flow.h>
> +
> +/**
> + * @struct net_flow_header
> + * @brief defines a match (header/field) an endpoint can use
> + *
> + * @uid unique identifier for header
> + * @field_sz number of fields are in the set
> + * @fields the set of fields in the net_flow_header
> + */
> +struct net_flow_header {
> +       char name[NET_FLOW_NAMSIZ];
> +       int uid;
> +       int field_sz;
> +       struct net_flow_field *fields;
> +};
> +
> +/**
> + * @struct net_flow_action
> + * @brief a description of a endpoint defined action
> + *
> + * @name printable name
> + * @uid unique action identifier
> + * @types NET_FLOW_ACTION_TYPE_NULL terminated list of action types

s/types/args?

> + */
> +struct net_flow_action {
> +       char name[NET_FLOW_NAMSIZ];
> +       int uid;
> +       struct net_flow_action_arg *args;
> +};
> +
> +/**
> + * @struct net_flow_table
> + * @brief define flow table with supported match/actions
> + *
> + * @uid unique identifier for table
> + * @source uid of parent table

Is parent table the table previous in the pipeline?  If so, what if
you can get to table from N different parent tables, what goes in
source?

> + * @size max number of entries for table or -1 for unbounded
> + * @matches null terminated set of supported match types given by match uid
> + * @actions null terminated set of supported action types given by action uid
> + * @flows set of flows
> + */
> +struct net_flow_table {
> +       char name[NET_FLOW_NAMSIZ];
> +       int uid;
> +       int source;
> +       int size;
> +       struct net_flow_field_ref *matches;
> +       int *actions;
> +};
> +
> +/* net_flow_hdr_node: node in a header graph of header fields.
> + *
> + * @uid : unique id of the graph node
> + * @flwo_header_ref : identify the hdrs that can handled by this node

s/flwo_header_ref/hdrs?

> + * @net_flow_jump_table : give a case jump statement

s/net_flow_jump_table/jump

> + */
> +struct net_flow_hdr_node {
> +       char name[NET_FLOW_NAMSIZ];
> +       int uid;
> +       int *hdrs;
> +       struct net_flow_jump_table *jump;
> +};
> +
> +struct net_flow_tbl_node {
> +       int uid;
> +       __u32 flags;
> +       struct net_flow_jump_table *jump;
> +};
> +#endif
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 29c92ee..3c3c856 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -52,6 +52,11 @@
>  #include <linux/neighbour.h>
>  #include <uapi/linux/netdevice.h>
>
> +#ifdef CONFIG_NET_FLOW_TABLES
> +#include <linux/if_flow.h>
> +#include <uapi/linux/if_flow.h>

linux/if_flow.h already included uapi file

> +#endif
> +
>  struct netpoll_info;
>  struct device;
>  struct phy_device;
> @@ -1186,6 +1191,13 @@ struct net_device_ops {
>         int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>                                                               u8 state);
>  #endif
> +#ifdef CONFIG_NET_FLOW_TABLES
> +       struct net_flow_action  **(*ndo_flow_get_actions)(struct net_device *dev);
> +       struct net_flow_table   **(*ndo_flow_get_tables)(struct net_device *dev);
> +       struct net_flow_header  **(*ndo_flow_get_headers)(struct net_device *dev);
> +       struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);

hdr or header?  pick one, probably hdr.

> +       struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);

move this up next to get_tables

> +#endif
>  };
>
>  /**
> diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
> new file mode 100644
> index 0000000..2acdb38
> --- /dev/null
> +++ b/include/uapi/linux/if_flow.h
> @@ -0,0 +1,363 @@
> +/*
> + * include/uapi/linux/if_flow.h - Flow table interface for Switch devices
> + * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
> + *
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Author: John Fastabend <john.r.fastabend@intel.com>
> + */
> +
> +/* Netlink description:
> + *
> + * Table definition used to describe running tables. The following
> + * describes the netlink message returned from a flow API messages.

message?


> +
> +enum {
> +       NET_FLOW_MASK_TYPE_UNSPEC,
> +       NET_FLOW_MASK_TYPE_EXACT,
> +       NET_FLOW_MASK_TYPE_LPM,

As discussed in another thread, need third mask type that's not LPM;
e.g. 0b0101.


> +#define NET_FLOW_TABLE_GRAPH_NODE_MAX (__NET_FLOW_TABLE_GRAPH_NODE_MAX - 1)
> +
> +enum {
> +       NET_FLOW_TABLE_GRAPH_UNSPEC,
> +       NET_FLOW_TABLE_GRAPH_NODE,
> +       __NET_FLOW_TABLE_GRAPH_MAX,
> +};
> +#define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
> +
> +enum {
> +       NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */

Maybe add an NET_FLOW_IDENTIFIER_UNSPEC so NET_FLOW_IDENTIFIER_IFINDEX
isn't zero.

> diff --git a/net/Kconfig b/net/Kconfig
> index ff9ffc1..8380bfe 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -293,6 +293,13 @@ config NET_FLOW_LIMIT
>           with many clients some protection against DoS by a single (spoofed)
>           flow that greatly exceeds average workload.
>
> +config NET_FLOW_TABLES
> +       boolean "Support network flow tables"
> +       ---help---
> +       This feature provides an interface for device drivers to report
> +       flow tables and supported matches and actions. If you do not
> +       want to support hardware offloads for flow tables, say N here.
> +
>  menu "Network testing"
>
>  config NET_PKTGEN
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 235e6c5..1eea785 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
>  obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
>  obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
>  obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
> +obj-$(CONFIG_NET_FLOW_TABLES) += flow_table.o
> diff --git a/net/core/flow_table.c b/net/core/flow_table.c
> new file mode 100644
> index 0000000..ec3f06d
> --- /dev/null
> +++ b/net/core/flow_table.c
> @@ -0,0 +1,837 @@
> +/*
> + * include/uapi/linux/if_flow.h - Flow table interface for Switch devices
> + * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
> + *
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Author: John Fastabend <john.r.fastabend@intel.com>
> + */
> +
> +#include <uapi/linux/if_flow.h>
> +#include <linux/if_flow.h>
> +#include <linux/if_bridge.h>
> +#include <linux/types.h>
> +#include <net/netlink.h>
> +#include <net/genetlink.h>
> +#include <net/rtnetlink.h>
> +#include <linux/module.h>
> +
> +static struct genl_family net_flow_nl_family = {
> +       .id             = GENL_ID_GENERATE,
> +       .name           = NET_FLOW_GENL_NAME,
> +       .version        = NET_FLOW_GENL_VERSION,
> +       .maxattr        = NET_FLOW_MAX,
> +       .netnsok        = true,
> +};
> +
> +static struct net_device *net_flow_get_dev(struct genl_info *info)
> +{
> +       struct net *net = genl_info_net(info);
> +       int type, ifindex;
> +
> +       if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
> +           !info->attrs[NET_FLOW_IDENTIFIER])
> +               return NULL;
> +
> +       type = nla_get_u32(info->attrs[NET_FLOW_IDENTIFIER_TYPE]);
> +       switch (type) {
> +       case NET_FLOW_IDENTIFIER_IFINDEX:
> +               ifindex = nla_get_u32(info->attrs[NET_FLOW_IDENTIFIER]);
> +               break;
> +       default:
> +               return NULL;
> +       }
> +
> +       return dev_get_by_index(net, ifindex);
> +}
> +
> +static int net_flow_put_act_types(struct sk_buff *skb,
> +                                 struct net_flow_action_arg *args)
> +{
> +       int i, err;
> +
> +       for (i = 0; args[i].type; i++) {
> +               err = nla_put(skb, NET_FLOW_ACTION_ARG,
> +                             sizeof(struct net_flow_action_arg), &args[i]);
> +               if (err)
> +                       return -EMSGSIZE;
> +       }
> +       return 0;
> +}
> +
> +static const
> +struct nla_policy net_flow_action_policy[NET_FLOW_ACTION_ATTR_MAX + 1] = {
> +       [NET_FLOW_ACTION_ATTR_NAME]      = {.type = NLA_STRING,
> +                                           .len = NET_FLOW_NAMSIZ-1 },
> +       [NET_FLOW_ACTION_ATTR_UID]       = {.type = NLA_U32 },
> +       [NET_FLOW_ACTION_ATTR_SIGNATURE] = {.type = NLA_NESTED },
> +};
> +
> +static int net_flow_put_action(struct sk_buff *skb, struct net_flow_action *a)
> +{
> +       struct net_flow_action_arg *this;
> +       struct nlattr *nest;
> +       int err, args = 0;
> +
> +       if (a->name && nla_put_string(skb, NET_FLOW_ACTION_ATTR_NAME, a->name))
> +               return -EMSGSIZE;
> +
> +       if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
> +               return -EMSGSIZE;
> +
> +       if (!a->args)
> +               return 0;
> +
> +       for (this = &a->args[0]; strlen(this->name) > 0; this++)
> +               args++;
> +

Since you only need to know that there are > 0 args, but don't need
the actual count, can you simplify test with something like:

   bool has_args = strlen(a->args->name) > 0;

or

  bool has_args = !!a->args->type;

> +       if (args) {
> +               nest = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
> +               if (!nest)
> +                       goto nest_put_failure;

Maybe just return -EMSGSIZE here and skip goto.

> +
> +               err = net_flow_put_act_types(skb, a->args);
> +               if (err) {
> +                       nla_nest_cancel(skb, nest);
> +                       return err;
> +               }
> +               nla_nest_end(skb, nest);
> +       }
> +
> +       return 0;
> +nest_put_failure:
> +       return -EMSGSIZE;
> +}
> +
> +static int net_flow_put_actions(struct sk_buff *skb,
> +                               struct net_flow_action **acts)
> +{
> +       struct nlattr *actions;
> +       int err, i;
> +
> +       actions = nla_nest_start(skb, NET_FLOW_ACTIONS);
> +       if (!actions)
> +               return -EMSGSIZE;
> +
> +       for (i = 0; acts[i]->uid; i++) {

Using for (act = acts; act->uid; act++) will make the code a little nicer.

> +               struct nlattr *action = nla_nest_start(skb, NET_FLOW_ACTION);
> +
> +               if (!action)
> +                       goto action_put_failure;
> +
> +               err = net_flow_put_action(skb, acts[i]);
> +               if (err)
> +                       goto action_put_failure;
> +               nla_nest_end(skb, action);
> +       }
> +       nla_nest_end(skb, actions);
> +
> +       return 0;
> +action_put_failure:
> +       nla_nest_cancel(skb, actions);
> +       return -EMSGSIZE;
> +}
> +
> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
> +                                          struct net_device *dev,
> +                                          u32 portid, int seq, u8 cmd)
> +{
> +       struct genlmsghdr *hdr;
> +       struct sk_buff *skb;
> +       int err = -ENOBUFS;
> +
> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +       if (!skb)
> +               return ERR_PTR(-ENOBUFS);
> +
> +       hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +       if (!hdr)
> +               goto out;
> +
> +       if (nla_put_u32(skb,
> +                       NET_FLOW_IDENTIFIER_TYPE,
> +                       NET_FLOW_IDENTIFIER_IFINDEX) ||
> +           nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
> +               err = -ENOBUFS;
> +               goto out;
> +       }
> +
> +       err = net_flow_put_actions(skb, a);
> +       if (err < 0)
> +               goto out;
> +
> +       err = genlmsg_end(skb, hdr);
> +       if (err < 0)
> +               goto out;
> +
> +       return skb;
> +out:
> +       nlmsg_free(skb);
> +       return ERR_PTR(err);
> +}
> +
> +static int net_flow_cmd_get_actions(struct sk_buff *skb,
> +                                   struct genl_info *info)
> +{
> +       struct net_flow_action **a;
> +       struct net_device *dev;
> +       struct sk_buff *msg;
> +
> +       dev = net_flow_get_dev(info);
> +       if (!dev)
> +               return -EINVAL;
> +
> +       if (!dev->netdev_ops->ndo_flow_get_actions) {
> +               dev_put(dev);
> +               return -EOPNOTSUPP;
> +       }
> +
> +       a = dev->netdev_ops->ndo_flow_get_actions(dev);
> +       if (!a)
> +               return -EBUSY;

Is it assumed ndo_flow_get_actions() returns a pointer to a static
list of actions?  What if the device wants to give up a dynamic list
of actions?  I'm trying to understand the lifetime of pointer 'a'.
What would cause -EBUSY condition?

> +
> +       msg = net_flow_build_actions_msg(a, dev,
> +                                        info->snd_portid,
> +                                        info->snd_seq,
> +                                        NET_FLOW_TABLE_CMD_GET_ACTIONS);
> +       dev_put(dev);
> +
> +       if (IS_ERR(msg))
> +               return PTR_ERR(msg);
> +
> +       return genlmsg_reply(msg, info);
> +}
> +
> +static int net_flow_put_table(struct net_device *dev,
> +                             struct sk_buff *skb,
> +                             struct net_flow_table *t)
> +{
> +       struct nlattr *matches, *actions;
> +       int i;
> +
> +       if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
> +               return -EMSGSIZE;
> +
> +       matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
> +       if (!matches)
> +               return -EMSGSIZE;
> +
> +       for (i = 0; t->matches[i].instance; i++)

pointer-based loop better than i-based?  my personal preference, i guess.

> +               nla_put(skb, NET_FLOW_FIELD_REF,
> +                       sizeof(struct net_flow_field_ref),
> +                       &t->matches[i]);
> +       nla_nest_end(skb, matches);
> +


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  5:25   ` Scott Feldman
@ 2015-01-06  6:04     ` John Fastabend
  2015-01-06  6:40       ` Scott Feldman
  0 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-06  6:04 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

[...]

>> +/**
>> + * @struct net_flow_action
>> + * @brief a description of a endpoint defined action
>> + *
>> + * @name printable name
>> + * @uid unique action identifier
>> + * @types NET_FLOW_ACTION_TYPE_NULL terminated list of action types
>
> s/types/args?
>

yep typo fixed in upcoming v2.

>> + */
>> +struct net_flow_action {
>> +       char name[NET_FLOW_NAMSIZ];
>> +       int uid;
>> +       struct net_flow_action_arg *args;
>> +};
>> +
>> +/**
>> + * @struct net_flow_table
>> + * @brief define flow table with supported match/actions
>> + *
>> + * @uid unique identifier for table
>> + * @source uid of parent table
>
> Is parent table the table previous in the pipeline?  If so, what if
> you can get to table from N different parent tables, what goes in
> source?

No, you can get the layout of tables from the table graph ops.

Source is used when a single TCAM or other implementation mechanism
is sliced into a set of tables. The current rocker world doesn't use
this very much at the moment because it's static, and I just assumed
every table came out of the same virtual hardware namespace.

A simple example world would be to come up with a set of large virtual
TCAMs. Any given TCAM may be sliced into a set of tables. Users may
organize these either via some out-of-band configuration at init or
power-on time. In the rocker case we could specify this when we load
qemu. For now it is just informational. But if we start allowing users
to create/delete tables at runtime it is important to "know" where the
slices are being allocated/free'd from. The source gives you this
information.

The hardware devices I'm working on have multiple sources we can
allocate/free tables from. The source values would provide a way to
track down which tables are in which hardware namespaces.

Hope that helps?

>
>> + * @size max number of entries for table or -1 for unbounded
>> + * @matches null terminated set of supported match types given by match uid
>> + * @actions null terminated set of supported action types given by action uid
>> + * @flows set of flows
>> + */
>> +struct net_flow_table {
>> +       char name[NET_FLOW_NAMSIZ];
>> +       int uid;
>> +       int source;
>> +       int size;
>> +       struct net_flow_field_ref *matches;
>> +       int *actions;
>> +};
>> +
>> +/* net_flow_hdr_node: node in a header graph of header fields.
>> + *
>> + * @uid : unique id of the graph node
>> + * @flwo_header_ref : identify the hdrs that can handled by this node
>
> s/flwo_header_ref/hdrs?
>
>> + * @net_flow_jump_table : give a case jump statement
>
> s/net_flow_jump_table/jump

yep thanks.

>
>> + */
>> +struct net_flow_hdr_node {
>> +       char name[NET_FLOW_NAMSIZ];
>> +       int uid;
>> +       int *hdrs;
>> +       struct net_flow_jump_table *jump;
>> +};
>> +
>> +struct net_flow_tbl_node {
>> +       int uid;
>> +       __u32 flags;
>> +       struct net_flow_jump_table *jump;
>> +};
>> +#endif
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 29c92ee..3c3c856 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -52,6 +52,11 @@
>>   #include <linux/neighbour.h>
>>   #include <uapi/linux/netdevice.h>
>>
>> +#ifdef CONFIG_NET_FLOW_TABLES
>> +#include <linux/if_flow.h>
>> +#include <uapi/linux/if_flow.h>
>
> linux/if_flow.h already included uapi file
>

fixed.

>> +#endif
>> +
>>   struct netpoll_info;
>>   struct device;
>>   struct phy_device;
>> @@ -1186,6 +1191,13 @@ struct net_device_ops {
>>          int                     (*ndo_switch_port_stp_update)(struct net_device *dev,
>>                                                                u8 state);
>>   #endif
>> +#ifdef CONFIG_NET_FLOW_TABLES
>> +       struct net_flow_action  **(*ndo_flow_get_actions)(struct net_device *dev);
>> +       struct net_flow_table   **(*ndo_flow_get_tables)(struct net_device *dev);
>> +       struct net_flow_header  **(*ndo_flow_get_headers)(struct net_device *dev);
>> +       struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
>
> hdr or header?  pick one, probably hdr.

hdr is shorter and doesn't lose any clarity IMO. I'll use net_flow_hdr
and net_flow_hdr_node.

>
>> +       struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
>
> move this up next to get_tables

Sure. Also, what do you think about tbl instead of table?

>
>> +#endif
>>   };
>>
>>   /**
>> diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
>> new file mode 100644
>> index 0000000..2acdb38
>> --- /dev/null
>> +++ b/include/uapi/linux/if_flow.h
>> @@ -0,0 +1,363 @@
>> +/*
>> + * include/uapi/linux/if_flow.h - Flow table interface for Switch devices
>> + * Copyright (c) 2014 John Fastabend <john.r.fastabend@intel.com>
>> + *
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * The full GNU General Public License is included in this distribution in
>> + * the file called "COPYING".
>> + *
>> + * Author: John Fastabend <john.r.fastabend@intel.com>
>> + */
>> +
>> +/* Netlink description:
>> + *
>> + * Table definition used to describe running tables. The following
>> + * describes the netlink message returned from a flow API messages.
>
> message?
>

That sentence is a bit awkward all around. Changed it to

"The following describes the netlink format used by the flow API."

maybe that is better.

>
>> +
>> +enum {
>> +       NET_FLOW_MASK_TYPE_UNSPEC,
>> +       NET_FLOW_MASK_TYPE_EXACT,
>> +       NET_FLOW_MASK_TYPE_LPM,
>
> As discussed in another thread, need third mask type that's not LPM;
> e.g. 0b0101.
>

yep.

>
>> +#define NET_FLOW_TABLE_GRAPH_NODE_MAX (__NET_FLOW_TABLE_GRAPH_NODE_MAX - 1)
>> +
>> +enum {
>> +       NET_FLOW_TABLE_GRAPH_UNSPEC,
>> +       NET_FLOW_TABLE_GRAPH_NODE,
>> +       __NET_FLOW_TABLE_GRAPH_MAX,
>> +};
>> +#define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
>> +
>> +enum {
>> +       NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
>
> Maybe add an NET_FLOW_IDENTIFIER_UNSPEC so NET_FLOW_IDENTIFIER_IFINDEX
> isn't zero.
>

agreed I tend to like being able to test things with if (foo) { ... }

[...]

>> +
>> +static int net_flow_put_action(struct sk_buff *skb, struct net_flow_action *a)
>> +{
>> +       struct net_flow_action_arg *this;
>> +       struct nlattr *nest;
>> +       int err, args = 0;
>> +
>> +       if (a->name && nla_put_string(skb, NET_FLOW_ACTION_ATTR_NAME, a->name))
>> +               return -EMSGSIZE;
>> +
>> +       if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
>> +               return -EMSGSIZE;
>> +
>> +       if (!a->args)
>> +               return 0;
>> +
>> +       for (this = &a->args[0]; strlen(this->name) > 0; this++)
>> +               args++;
>> +
>
> Since you only need to know that there are > 0 args, but don't need
> the actual count, can you simplify test with something like:
>

Good catch, this is a holdover from some code I rewrote. I'll clean
this up like:


  static int net_flow_put_action(struct sk_buff *skb,
                                 struct net_flow_action *a)
  {
          struct nlattr *nest;
          int err;

          if (a->name && nla_put_string(skb, NET_FLOW_ACTION_ATTR_NAME,
                                        a->name))
                  return -EMSGSIZE;

          if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
                  return -EMSGSIZE;

          if (a->args && a->args[0].type) {
                  nest = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
                  if (!nest)
                          return -EMSGSIZE;

                  err = net_flow_put_act_types(skb, a->args);
                  if (err) {
                          nla_nest_cancel(skb, nest);
                          return err;
                  }
                  nla_nest_end(skb, nest);
          }

          return 0;
  }

I think that should probably work. Of course I'll compile it and test
it.


>     bool has_args = strlen(a->args->name) > 0;
>
> or
>
>    bool has_args = !!a->args->type;
>
>> +       if (args) {
>> +               nest = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
>> +               if (!nest)
>> +                       goto nest_put_failure;
>
> Maybe just return -EMSGSIZE here and skip goto.
>
>> +
>> +               err = net_flow_put_act_types(skb, a->args);
>> +               if (err) {
>> +                       nla_nest_cancel(skb, nest);
>> +                       return err;
>> +               }
>> +               nla_nest_end(skb, nest);
>> +       }
>> +
>> +       return 0;
>> +nest_put_failure:
>> +       return -EMSGSIZE;
>> +}
>> +
>> +static int net_flow_put_actions(struct sk_buff *skb,
>> +                               struct net_flow_action **acts)
>> +{
>> +       struct nlattr *actions;
>> +       int err, i;
>> +
>> +       actions = nla_nest_start(skb, NET_FLOW_ACTIONS);
>> +       if (!actions)
>> +               return -EMSGSIZE;
>> +
>> +       for (i = 0; acts[i]->uid; i++) {
>
> Using for (act = acts; act->uid; act++) will make the code a little nicer.
>

not entirely convinced it's any nicer that way, but sure, I'll convert it.

[...]

>> +static int net_flow_cmd_get_actions(struct sk_buff *skb,
>> +                                   struct genl_info *info)
>> +{
>> +       struct net_flow_action **a;
>> +       struct net_device *dev;
>> +       struct sk_buff *msg;
>> +
>> +       dev = net_flow_get_dev(info);
>> +       if (!dev)
>> +               return -EINVAL;
>> +
>> +       if (!dev->netdev_ops->ndo_flow_get_actions) {
>> +               dev_put(dev);
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>> +       a = dev->netdev_ops->ndo_flow_get_actions(dev);
>> +       if (!a)
>> +               return -EBUSY;
>
> Is it assumed ndo_flow_get_actions() returns a pointer to a static
> list of actions?  What if the device wants to give up a dynamic list
> of actions?  I'm trying to understand the lifetime of pointer 'a'.
> What would cause -EBUSY condition?
>

Ah, this is a good point. At the moment, if a driver dynamically changes
a structure it's going to break because there is no locking
involved. I think the best way to handle this is to use RCU here. We can
return RCU-dereferenced pointers, and drivers will then need to wait a
grace period before freeing the old pointer. To simplify drivers we
can do this from helper calls and document the semantics.

Currently rocker is static so we don't have any issues. If no one minds
I would like to do this in a follow up series.

>> +
>> +       msg = net_flow_build_actions_msg(a, dev,
>> +                                        info->snd_portid,
>> +                                        info->snd_seq,
>> +                                        NET_FLOW_TABLE_CMD_GET_ACTIONS);
>> +       dev_put(dev);
>> +
>> +       if (IS_ERR(msg))
>> +               return PTR_ERR(msg);
>> +
>> +       return genlmsg_reply(msg, info);
>> +}
>> +
>> +static int net_flow_put_table(struct net_device *dev,
>> +                             struct sk_buff *skb,
>> +                             struct net_flow_table *t)
>> +{
>> +       struct nlattr *matches, *actions;
>> +       int i;
>> +
>> +       if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
>> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
>> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
>> +           nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
>> +               return -EMSGSIZE;
>> +
>> +       matches = nla_nest_start(skb, NET_FLOW_TABLE_ATTR_MATCHES);
>> +       if (!matches)
>> +               return -EMSGSIZE;
>> +
>> +       for (i = 0; t->matches[i].instance; i++)
>
> pointer-based loop better than i-based?  my personal preference, i guess.

hmm I guess I tended to write these with indices. I might leave them
for now but can change them if the consensus is pointer loops are easier
to read.

>
>> +               nla_put(skb, NET_FLOW_FIELD_REF,
>> +                       sizeof(struct net_flow_field_ref),
>> +                       &t->matches[i]);
>> +       nla_nest_end(skb, matches);
>> +


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow
  2014-12-31 19:46 ` [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow John Fastabend
@ 2015-01-06  6:19   ` Scott Feldman
  2015-01-08 17:39   ` Jiri Pirko
  1 sibling, 0 replies; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  6:19 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Wed, Dec 31, 2014 at 11:46 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Now that the device capabilities are exposed we can add support to
> add and delete flows from the tables.
>
> The two operations are
>
> table_set_flows :
>
> >   The set flow operation is used to program a set of flows into a
> >   hardware device table. The message is consumed via a netlink-encoded
> >   message which is then decoded into a null terminated array of
> >   flow entry structures. A flow entry structure is defined as
>
>      struct net_flow_flow {
>                           int table_id;
>                           int uid;
>                           int priority;
>                           struct net_flow_field_ref *matches;
>                           struct net_flow_action *actions;
>      }
>
> >   The table id is the _uid_ returned from the 'get_tables' operation.
>   Matches is a set of match criteria for packets with a logical AND
>   operation done on the set so packets match the entire criteria.
>   Actions provide a set of actions to perform when the flow rule is
>   hit. Both matches and actions are null terminated arrays.
>
> >   The flows are configured in hardware using an ndo op. We do not
> >   provide a commit operation at the moment and expect the hardware to
> >   commit the flows one at a time. Future work may require a commit
> >   operation to tell the hardware we are done loading flow rules. On
> >   some hardware this will help with bulk updates.
>
> >   It's possible for hardware to return an error from a flow set
> >   operation. This can occur for many reasons, both transient and
> >   resource constraints. We have different error handling strategies
> >   built in, listed here,
>
>     *_ERROR_ABORT      abort on first error with errmsg
>
>     *_ERROR_CONTINUE   continue programming flows no errmsg
>
>     *_ERROR_ABORT_LOG  abort on first error and return flow that
>                        failed to user space in reply msg
>
>     *_ERROR_CONT_LOG   continue programming flows and return a list
>                        of flows that failed to user space in a reply
>                        msg.
>
> >   Notably missing is a rollback error strategy. I don't have a
> >   use for this in software yet, but such a strategy can be added as
> >   *_ERROR_ROLLBACK, for example.
>
> table_del_flows
>
> >   The delete flow operation uses the same structures and error
> >   handling strategies as the table_set_flows operation, although
> >   delete messages omit the matches/actions arrays because they are
> >   not needed to look up the flow.
>
> Also thanks to Simon Horman for fixes and other help.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  include/linux/if_flow.h      |   21 ++
>  include/linux/netdevice.h    |    8 +
>  include/uapi/linux/if_flow.h |   49 ++++
>  net/core/flow_table.c        |  501 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 579 insertions(+)
>
> diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
> index 1b6c1ea..20fa752 100644
> --- a/include/linux/if_flow.h
> +++ b/include/linux/if_flow.h
> @@ -90,4 +90,25 @@ struct net_flow_tbl_node {
>         __u32 flags;
>         struct net_flow_jump_table *jump;
>  };
> +
> +/**
> + * @struct net_flow_flow
> + * @brief describes the match/action entry
> + *
> + * @uid unique identifier for flow
> + * @priority priority to execute flow match/action in table

What is the convention on priority?  0 is lowest priority or highest?

> + * @matches null terminated set of match uids (match criteria)
> + * @actions null terminated set of action uids to apply on match
> + *
> + * Flows must match all entries in match set.
> + */
> +struct net_flow_flow {
> +       int table_id;
> +       int uid;
> +       int priority;
> +       struct net_flow_field_ref *matches;
> +       struct net_flow_action *actions;
> +};
> +
> +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow);
>  #endif
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c3c856..be8d4e4 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1197,6 +1197,14 @@ struct net_device_ops {
>         struct net_flow_header  **(*ndo_flow_get_headers)(struct net_device *dev);
>         struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
>         struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
> +       int                     (*ndo_flow_get_flows)(struct sk_buff *skb,
> +                                                     struct net_device *dev,
> +                                                     int table,
> +                                                     int min, int max);
> +       int                     (*ndo_flow_set_flows)(struct net_device *dev,
> +                                                     struct net_flow_flow *f);
> +       int                     (*ndo_flow_del_flows)(struct net_device *dev,
> +                                                     struct net_flow_flow *f);

Need doc for these in BIG comment block above this struct.  Same for
ndo_flow_xxx added in previous patch.

>  #endif
>  };
>
> diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
> index 2acdb38..125cdc6 100644
> --- a/include/uapi/linux/if_flow.h
> +++ b/include/uapi/linux/if_flow.h
> @@ -329,6 +329,48 @@ enum {
>  #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
>
>  enum {
> +       NET_FLOW_NET_FLOW_UNSPEC,
> +       NET_FLOW_FLOW,
> +       __NET_FLOW_NET_FLOW_MAX,
> +};
> +#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1)
> +
> +enum {
> +       NET_FLOW_TABLE_FLOWS_UNSPEC,
> +       NET_FLOW_TABLE_FLOWS_TABLE,
> +       NET_FLOW_TABLE_FLOWS_MINPRIO,
> +       NET_FLOW_TABLE_FLOWS_MAXPRIO,
> +       NET_FLOW_TABLE_FLOWS_FLOWS,
> +       __NET_FLOW_TABLE_FLOWS_MAX,
> +};
> +#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1)
> +
> +enum {

NET_FLOW_FLOWS_ERROR_UNSPEC?

> +       /* Abort with normal errmsg */
> +       NET_FLOW_FLOWS_ERROR_ABORT,
> +       /* Ignore errors and continue without logging */
> +       NET_FLOW_FLOWS_ERROR_CONTINUE,
> +       /* Abort and reply with invalid flow fields */
> +       NET_FLOW_FLOWS_ERROR_ABORT_LOG,
> +       /* Continue and reply with list of invalid flows */
> +       NET_FLOW_FLOWS_ERROR_CONT_LOG,
> +       __NET_FLOWS_FLOWS_ERROR_MAX,
> +};
> +#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1)
> +
> +enum {
> +       NET_FLOW_ATTR_UNSPEC,
> +       NET_FLOW_ATTR_ERROR,
> +       NET_FLOW_ATTR_TABLE,
> +       NET_FLOW_ATTR_UID,
> +       NET_FLOW_ATTR_PRIORITY,
> +       NET_FLOW_ATTR_MATCHES,
> +       NET_FLOW_ATTR_ACTIONS,
> +       __NET_FLOW_ATTR_MAX,
> +};
> +#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1)
> +
> +enum {
>         NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
>  };
>
> @@ -343,6 +385,9 @@ enum {
>         NET_FLOW_HEADER_GRAPH,
>         NET_FLOW_TABLE_GRAPH,
>
> +       NET_FLOW_FLOWS,
> +       NET_FLOW_FLOWS_ERROR,
> +
>         __NET_FLOW_MAX,
>         NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
>  };
> @@ -354,6 +399,10 @@ enum {
>         NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
>         NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
>
> +       NET_FLOW_TABLE_CMD_GET_FLOWS,
> +       NET_FLOW_TABLE_CMD_SET_FLOWS,
> +       NET_FLOW_TABLE_CMD_DEL_FLOWS,
> +
>         __NET_FLOW_CMD_MAX,
>         NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
>  };
> diff --git a/net/core/flow_table.c b/net/core/flow_table.c
> index ec3f06d..f4cf293 100644
> --- a/net/core/flow_table.c
> +++ b/net/core/flow_table.c
> @@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
>         return genlmsg_reply(msg, info);
>  }
>
> +static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
> +                                               u32 portid, int seq, u8 cmd,
> +                                               int min, int max, int table)
> +{
> +       struct genlmsghdr *hdr;
> +       struct nlattr *flows;
> +       struct sk_buff *skb;
> +       int err = -ENOBUFS;
> +
> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);

Does netlink msg size limit the number of flows we can return?  If I
have 10K unique prefix routes, resulting in 10K flows in my L3 table,
all at same priority, can it be dumped?

I'm wondering (out loud) if get_flows is something that should be
supported at the driver level, or should it be managed above the
driver.  In other words, whoever calls set_flows and del_flows knows
what's what and could return get_flows rather than calling down to
driver/device to get_flows.  Unless driver/device could create flows
independent of set_flows and del_flows.

> +       if (!skb)
> +               return ERR_PTR(-ENOBUFS);
> +
> +       hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +       if (!hdr)
> +               goto out;
> +
> +       if (nla_put_u32(skb,
> +                       NET_FLOW_IDENTIFIER_TYPE,
> +                       NET_FLOW_IDENTIFIER_IFINDEX) ||
> +           nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
> +               err = -ENOBUFS;
> +               goto out;
> +       }
> +
> +       flows = nla_nest_start(skb, NET_FLOW_FLOWS);
> +       if (!flows) {
> +               err = -EMSGSIZE;
> +               goto out;
> +       }
> +
> +       err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max);
> +       if (err < 0)
> +               goto out_cancel;
> +
> +       nla_nest_end(skb, flows);
> +
> +       err = genlmsg_end(skb, hdr);
> +       if (err < 0)
> +               goto out;
> +
> +       return skb;
> +out_cancel:
> +       nla_nest_cancel(skb, flows);
> +out:
> +       nlmsg_free(skb);
> +       return ERR_PTR(err);
> +}
> +
> +static const
> +struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = {
> +       [NET_FLOW_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
> +};
> +
> +static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
> +                                       struct genl_info *info)
> +{
> +       struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1];
> +       int table, min = -1, max = -1;
> +       struct net_device *dev;
> +       struct sk_buff *msg;
> +       int err = -EINVAL;
> +
> +       dev = net_flow_get_dev(info);
> +       if (!dev)
> +               return -EINVAL;
> +
> +       if (!dev->netdev_ops->ndo_flow_get_flows) {
> +               dev_put(dev);
> +               return -EOPNOTSUPP;
> +       }
> +
> +       if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
> +           !info->attrs[NET_FLOW_IDENTIFIER] ||
> +           !info->attrs[NET_FLOW_FLOWS])
> +               goto out;
> +
> +       err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX,
> +                              info->attrs[NET_FLOW_FLOWS],
> +                              net_flow_table_flows_policy);
> +       if (err)
> +               goto out;
> +
> +       if (!tb[NET_FLOW_TABLE_FLOWS_TABLE])
> +               goto out;
> +
> +       table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]);
> +
> +       if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO])
> +               min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]);
> +       if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO])
> +               max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]);

Just curious, what is the intended use of min/max prio?  Was it to
reduce number of flows returned in one get_flows call?


> +       msg = net_flow_build_flows_msg(dev,
> +                                      info->snd_portid,
> +                                      info->snd_seq,
> +                                      NET_FLOW_TABLE_CMD_GET_FLOWS,
> +                                      min, max, table);
> +       dev_put(dev);
> +
> +       if (IS_ERR(msg))
> +               return PTR_ERR(msg);
> +
> +       return genlmsg_reply(msg, info);
> +out:
> +       dev_put(dev);
> +       return err;
> +}
> +
> +static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
> +                                            struct genlmsghdr **hdr,
> +                                            u32 portid, int seq, u8 cmd)
> +{
> +       struct genlmsghdr *h;
> +       struct sk_buff *skb;
> +
> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +       if (!skb)
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +       if (!h)
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       if (nla_put_u32(skb,
> +                       NET_FLOW_IDENTIFIER_TYPE,
> +                       NET_FLOW_IDENTIFIER_IFINDEX) ||
> +           nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex))
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       *hdr = h;
> +       return skb;
> +}
> +
> +static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
> +                                               struct genlmsghdr *hdr)
> +{
> +       int err;
> +
> +       err = genlmsg_end(skb, hdr);
> +       if (err < 0) {
> +               nlmsg_free(skb);
> +               return ERR_PTR(err);
> +       }
> +
> +       return skb;
> +}
> +
> +static int net_flow_put_flow_action(struct sk_buff *skb,
> +                                   struct net_flow_action *a)
> +{
> +       struct nlattr *action, *sigs;
> +       int i, err = 0;
> +
> +       action = nla_nest_start(skb, NET_FLOW_ACTION);
> +       if (!action)
> +               return -EMSGSIZE;
> +
> +       if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
> +               return -EMSGSIZE;
> +
> +       if (!a->args)
> +               goto done;
> +
> +       for (i = 0; a->args[i].type; i++) {
> +               sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
> +               if (!sigs) {
> +                       nla_nest_cancel(skb, action);
> +                       return -EMSGSIZE;
> +               }
> +
> +               err = net_flow_put_act_types(skb, a[i].args);
> +               if (err) {
> +                       nla_nest_cancel(skb, action);
> +                       nla_nest_cancel(skb, sigs);

Order seems backwards. I think you can just cancel the outer nest.

> +                       return err;
> +               }
> +               nla_nest_end(skb, sigs);
> +       }
> +
> +done:
> +       nla_nest_end(skb, action);
> +       return 0;
> +}
> +
> +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow)
> +{
> +       struct nlattr *flows, *matches;
> +       struct nlattr *actions = NULL; /* must be null to unwind */
> +       int err, j, i = 0;
> +
> +       flows = nla_nest_start(skb, NET_FLOW_FLOW);
> +       if (!flows)
> +               goto put_failure;
> +
> +       if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) ||
> +           nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) ||
> +           nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority))
> +               goto flows_put_failure;
> +
> +       if (flow->matches) {
> +               matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES);
> +               if (!matches)
> +                       goto flows_put_failure;
> +
> +               for (j = 0; flow->matches && flow->matches[j].header; j++) {

for (match = flow->matches; match->header; match++)

> +                       struct net_flow_field_ref *f = &flow->matches[j];
> +
> +                       if (!f->header)
> +                               continue;

Already checked in for loop?

> +
> +                       nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f);

no err check

> +               }
> +               nla_nest_end(skb, matches);
> +       }
> +
> +       if (flow->actions) {
> +               actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS);
> +               if (!actions)
> +                       goto flows_put_failure;
> +
> +               for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +                       err = net_flow_put_flow_action(skb, &flow->actions[i]);
> +                       if (err) {
> +                               nla_nest_cancel(skb, actions);

just cancel outer (I think)

> +                               goto flows_put_failure;
> +                       }
> +               }
> +               nla_nest_end(skb, actions);
> +       }
> +
> +       nla_nest_end(skb, flows);
> +       return 0;
> +
> +flows_put_failure:
> +       nla_nest_cancel(skb, flows);
> +put_failure:
> +       return -EMSGSIZE;
> +}
> +EXPORT_SYMBOL(net_flow_put_flow);
> +
> +static int net_flow_get_field(struct net_flow_field_ref *field,
> +                             struct nlattr *nla)
> +{
> +       if (nla_type(nla) != NET_FLOW_FIELD_REF)
> +               return -EINVAL;
> +
> +       if (nla_len(nla) < sizeof(*field))
> +               return -EINVAL;
> +
> +       *field = *(struct net_flow_field_ref *)nla_data(nla);
> +       return 0;
> +}

maybe return struct net_flow_field_ref * to simplify return logic.

> +
> +static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
> +{
> +       struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1];
> +       struct nlattr *args;
> +       int rem;
> +       int err, count = 0;
> +
> +       if (nla_type(attr) != NET_FLOW_ACTION) {
> +               pr_warn("%s: expected NET_FLOW_ACTION\n", __func__);
> +               return 0;
> +       }
> +
> +       err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX,
> +                              attr, net_flow_action_policy);
> +       if (err < 0)
> +               return err;
> +
> +       if (!act[NET_FLOW_ACTION_ATTR_UID] ||
> +           !act[NET_FLOW_ACTION_ATTR_SIGNATURE])
> +               return -EINVAL;
> +
> +       a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]);
> +
> +       nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem)
> +               count++; /* unoptimized max possible */
> +
> +       a->args = kcalloc(count + 1,
> +                         sizeof(struct net_flow_action_arg),
> +                         GFP_KERNEL);

kcalloc failure?

> +       count = 0;
> +
> +       nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) {
> +               if (nla_type(args) != NET_FLOW_ACTION_ARG)
> +                       continue;
> +
> +               if (nla_len(args) < sizeof(struct net_flow_action_arg)) {
> +                       kfree(a->args);
> +                       return -EINVAL;
> +               }
> +
> +               a->args[count] = *(struct net_flow_action_arg *)nla_data(args);

count++?


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-06  6:04     ` John Fastabend
@ 2015-01-06  6:40       ` Scott Feldman
  0 siblings, 0 replies; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  6:40 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Mon, Jan 5, 2015 at 10:04 PM, John Fastabend
<john.fastabend@gmail.com> wrote:

>>> + * @uid unique identifier for table
>>> + * @source uid of parent table
>>
>>
>> Is parent table the table previous in the pipeline?  If so, what if
>> you can get to table from N different parent tables, what goes in
>> source?
>
>
> No, you can get the layout of tables from the table graph ops.
>
> Source is used when a single tcam or other implementation mechanism
> is sliced into a set of tables. The current rocker world doesn't use
> this very much at the moment because its static and I just assumed
> every table came out of the same virtual hardware namespace.
>
> A simple example would be a set of large virtual TCAMs. Any given
> TCAM may be sliced into a set of tables. Users may organize these
> either via some out-of-band configuration at init or power-on time.
> In the rocker case we could specify this when we load qemu. For now
> it is just informational. But if we start allowing users to create
> and delete tables at runtime, it is important to know where the
> slices are being allocated/freed from. The source gives you this
> information.
>
> The hardware devices I'm working on have multiple sources we can
> allocate/free tables from. The source values would provide a way to
> track down which tables are in which hardware namespaces.
>
> Hope that helps?

Got it, thanks.

Can source be encoded in tbl_id?


>> hdr or header?  pick one, probably hdr.
>
>
hdr is shorter and doesn't lose any clarity IMO. I'll use net_flow_hdr
and net_flow_hdr_node.
>
>>
>>> +       struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct
>>> net_device *dev);
>>
>>
>> move this up next to get_tables
>
>
Sure. Also, what do you think about tbl instead of table?

+1 for tbl.


* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2014-12-31 19:47 ` [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch John Fastabend
  2015-01-04  8:43   ` Or Gerlitz
@ 2015-01-06  7:01   ` Scott Feldman
  2015-01-06 17:00     ` John Fastabend
  1 sibling, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  7:01 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Wed, Dec 31, 2014 at 11:47 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> This adds rocker support for the net_flow_get_* operations. With this
> we can interrogate rocker.
>
> Here we see that for static configurations enabling the get operations
> is simply a matter of defining a pipeline model and returning the
> structures for the core infrastructure to encapsulate into netlink
> messages.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/rocker/rocker.c          |   35 +
>  drivers/net/ethernet/rocker/rocker_pipeline.h |  673 +++++++++++++++++++++++++
>  2 files changed, 708 insertions(+)
>  create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index fded127..4c6787a 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -36,6 +36,7 @@
>  #include <generated/utsrelease.h>
>
>  #include "rocker.h"
> +#include "rocker_pipeline.h"
>
>  static const char rocker_driver_name[] = "rocker";
>
> @@ -3780,6 +3781,33 @@ static int rocker_port_switch_port_stp_update(struct net_device *dev, u8 state)
>         return rocker_port_stp_update(rocker_port, state);
>  }
>
> +#ifdef CONFIG_NET_FLOW_TABLES

Can this #ifdef test be moved out of driver?  The if_flow core code
can stub out operations if CONFIG_NET_FLOW_TABLES isn't defined.

> +static struct net_flow_table **rocker_get_tables(struct net_device *d)
> +{
> +       return rocker_table_list;
> +}
> +
> +static struct net_flow_header **rocker_get_headers(struct net_device *d)
> +{
> +       return rocker_header_list;
> +}
> +
> +static struct net_flow_action **rocker_get_actions(struct net_device *d)
> +{
> +       return rocker_action_list;
> +}
> +
> +static struct net_flow_tbl_node **rocker_get_tgraph(struct net_device *d)
> +{
> +       return rocker_table_nodes;
> +}
> +
> +static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
> +{
> +       return rocker_header_nodes;
> +}
> +#endif
> +
>  static const struct net_device_ops rocker_port_netdev_ops = {
>         .ndo_open                       = rocker_port_open,
>         .ndo_stop                       = rocker_port_stop,
> @@ -3794,6 +3822,13 @@ static const struct net_device_ops rocker_port_netdev_ops = {
>         .ndo_bridge_getlink             = rocker_port_bridge_getlink,
>         .ndo_switch_parent_id_get       = rocker_port_switch_parent_id_get,
>         .ndo_switch_port_stp_update     = rocker_port_switch_port_stp_update,
> +#ifdef CONFIG_NET_FLOW_TABLES

same comment here

> +       .ndo_flow_get_tables            = rocker_get_tables,
> +       .ndo_flow_get_headers           = rocker_get_headers,
> +       .ndo_flow_get_actions           = rocker_get_actions,
> +       .ndo_flow_get_tbl_graph         = rocker_get_tgraph,
> +       .ndo_flow_get_hdr_graph         = rocker_get_hgraph,
> +#endif
>  };
>
>  /********************
> diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
> new file mode 100644
> index 0000000..9544339
> --- /dev/null
> +++ b/drivers/net/ethernet/rocker/rocker_pipeline.h

Add standard header info...copyright/license.

> @@ -0,0 +1,673 @@
> +#ifndef _MY_PIPELINE_H_
> +#define _MY_PIPELINE_H_

_ROCKER_PIPELINE_H_

> +
> +#include <linux/if_flow.h>
> +
> +/* header definition */
> +#define HEADER_ETHERNET_SRC_MAC 1
> +#define HEADER_ETHERNET_DST_MAC 2
> +#define HEADER_ETHERNET_ETHERTYPE 3

Use enum?

> +struct net_flow_field ethernet_fields[3] = {

ethernet_fields[] = ...   // let compiler size it

> +       { .name = "src_mac", .uid = HEADER_ETHERNET_SRC_MAC, .bitwidth = 48},
> +       { .name = "dst_mac", .uid = HEADER_ETHERNET_DST_MAC, .bitwidth = 48},
> +       { .name = "ethertype",
> +         .uid = HEADER_ETHERNET_ETHERTYPE,
> +         .bitwidth = 16},
> +};
> +
> +#define HEADER_ETHERNET 1
> +struct net_flow_header ethernet = {
> +       .name = "ethernet",
> +       .uid = HEADER_ETHERNET,
> +       .field_sz = 3,

ARRAY_SIZE()?

> +       .fields = ethernet_fields,
> +};
> +
> +#define HEADER_VLAN_PCP 1
> +#define HEADER_VLAN_CFI 2
> +#define HEADER_VLAN_VID 3
> +#define HEADER_VLAN_ETHERTYPE 4
> +struct net_flow_field vlan_fields[4] = {

[] = ...

> +       { .name = "pcp", .uid = HEADER_VLAN_PCP, .bitwidth = 3,},
> +       { .name = "cfi", .uid = HEADER_VLAN_CFI, .bitwidth = 1,},
> +       { .name = "vid", .uid = HEADER_VLAN_VID, .bitwidth = 12,},
> +       { .name = "ethertype", .uid = HEADER_VLAN_ETHERTYPE, .bitwidth = 16,},
> +};
> +
> +#define HEADER_VLAN 2
> +struct net_flow_header vlan = {
> +       .name = "vlan",
> +       .uid = HEADER_VLAN,
> +       .field_sz = 4,

ARRAY_SIZE()

> +       .fields = vlan_fields,
> +};
> +
> +#define HEADER_IPV4_VERSION 1
> +#define HEADER_IPV4_IHL 2
> +#define HEADER_IPV4_DSCP 3
> +#define HEADER_IPV4_ECN 4
> +#define HEADER_IPV4_LENGTH 5
> +#define HEADER_IPV4_IDENTIFICATION 6
> +#define HEADER_IPV4_FLAGS 7
> +#define HEADER_IPV4_FRAGMENT_OFFSET 8
> +#define HEADER_IPV4_TTL 9
> +#define HEADER_IPV4_PROTOCOL 10
> +#define HEADER_IPV4_CSUM 11
> +#define HEADER_IPV4_SRC_IP 12
> +#define HEADER_IPV4_DST_IP 13
> +#define HEADER_IPV4_OPTIONS 14
> +struct net_flow_field ipv4_fields[14] = {
> +       { .name = "version",
> +         .uid = HEADER_IPV4_VERSION,
> +         .bitwidth = 4,},
> +       { .name = "ihl",
> +         .uid = HEADER_IPV4_IHL,
> +         .bitwidth = 4,},
> +       { .name = "dscp",
> +         .uid = HEADER_IPV4_DSCP,
> +         .bitwidth = 6,},
> +       { .name = "ecn",
> +         .uid = HEADER_IPV4_ECN,
> +         .bitwidth = 2,},
> +       { .name = "length",
> +         .uid = HEADER_IPV4_LENGTH,
> +         .bitwidth = 8,},
> +       { .name = "identification",
> +         .uid = HEADER_IPV4_IDENTIFICATION,
> +         .bitwidth = 8,},
> +       { .name = "flags",
> +         .uid = HEADER_IPV4_FLAGS,
> +         .bitwidth = 3,},
> +       { .name = "fragment_offset",
> +         .uid = HEADER_IPV4_FRAGMENT_OFFSET,
> +         .bitwidth = 13,},
> +       { .name = "ttl",
> +         .uid = HEADER_IPV4_TTL,
> +         .bitwidth = 1,},
> +       { .name = "protocol",
> +         .uid = HEADER_IPV4_PROTOCOL,
> +         .bitwidth = 8,},
> +       { .name = "csum",
> +         .uid = HEADER_IPV4_CSUM,
> +         .bitwidth = 8,},
> +       { .name = "src_ip",
> +         .uid = HEADER_IPV4_SRC_IP,
> +         .bitwidth = 32,},
> +       { .name = "dst_ip",
> +         .uid = HEADER_IPV4_DST_IP,
> +         .bitwidth = 32,},
> +       { .name = "options",
> +         .uid = HEADER_IPV4_OPTIONS,
> +         .bitwidth = -1,},
> +};
> +
> +#define HEADER_IPV4 3
> +struct net_flow_header ipv4 = {
> +       .name = "ipv4",
> +       .uid = HEADER_IPV4,
> +       .field_sz = 14,
> +       .fields = ipv4_fields,
> +};
> +
> +#define HEADER_METADATA_IN_LPORT 1
> +#define HEADER_METADATA_GOTO_TBL 2
> +#define HEADER_METADATA_GROUP_ID 3
> +struct net_flow_field metadata_fields[3] = {
> +       { .name = "in_lport",
> +         .uid = HEADER_METADATA_IN_LPORT,
> +         .bitwidth = 32,},
> +       { .name = "goto_tbl",
> +         .uid = HEADER_METADATA_GOTO_TBL,
> +         .bitwidth = 16,},
> +       { .name = "group_id",
> +         .uid = HEADER_METADATA_GROUP_ID,
> +         .bitwidth = 32,},
> +};
> +
> +#define HEADER_METADATA 4
> +struct net_flow_header metadata_t = {
> +       .name = "metadata_t",
> +       .uid = HEADER_METADATA,
> +       .field_sz = 3,
> +       .fields = metadata_fields,
> +};
> +
> +struct net_flow_header null_hdr = {.name = "",
> +                                  .uid = 0,
> +                                  .field_sz = 0,
> +                                  .fields = NULL};
> +
> +struct net_flow_header *rocker_header_list[8] = {

[] = ...

Not sure where the [8] comes from.

> +       &ethernet,
> +       &vlan,
> +       &ipv4,
> +       &metadata_t,
> +       &null_hdr,

Seems just setting last entry to NULL would be cleaner:

struct foo *foo[] = {
    &one,
    &two,
    &three,
    NULL,
};

> +};
> +
> +/* action definitions */
> +struct net_flow_action_arg null_args[1] = {
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +struct net_flow_action null_action = {
> +       .name = "", .uid = 0, .args = NULL,
> +};
> +
> +struct net_flow_action_arg set_goto_table_args[2] = {
> +       {
> +               .name = "table",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U16,
> +               .value_u16 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_GOTO_TABLE 1
> +struct net_flow_action set_goto_table = {
> +       .name = "set_goto_table",
> +       .uid = ACTION_SET_GOTO_TABLE,
> +       .args = set_goto_table_args,
> +};
> +
> +struct net_flow_action_arg set_vlan_id_args[2] = {
> +       {
> +               .name = "vlan_id",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U16,
> +               .value_u16 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_VLAN_ID 2
> +struct net_flow_action set_vlan_id = {
> +       .name = "set_vlan_id",
> +       .uid = ACTION_SET_VLAN_ID,
> +       .args = set_vlan_id_args,
> +};
> +
> +/* TBD: what is the untagged bool about in vlan table */
> +#define ACTION_COPY_TO_CPU 3
> +struct net_flow_action copy_to_cpu = {
> +       .name = "copy_to_cpu",
> +       .uid = ACTION_COPY_TO_CPU,
> +       .args = null_args,
> +};
> +
> +struct net_flow_action_arg set_group_id_args[2] = {
> +       {
> +               .name = "group_id",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U32,
> +               .value_u32 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_GROUP_ID 4
> +struct net_flow_action set_group_id = {
> +       .name = "set_group_id",
> +       .uid = ACTION_SET_GROUP_ID,
> +       .args = set_group_id_args,
> +};
> +
> +#define ACTION_POP_VLAN 5
> +struct net_flow_action pop_vlan = {
> +       .name = "pop_vlan",
> +       .uid = ACTION_POP_VLAN,
> +       .args = null_args,
> +};
> +
> +struct net_flow_action_arg set_eth_src_args[2] = {
> +       {
> +               .name = "eth_src",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U64,
> +               .value_u64 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_ETH_SRC 6
> +struct net_flow_action set_eth_src = {
> +       .name = "set_eth_src",
> +       .uid = ACTION_SET_ETH_SRC,
> +       .args = set_eth_src_args,
> +};
> +
> +struct net_flow_action_arg set_eth_dst_args[2] = {
> +       {
> +               .name = "eth_dst",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U64,
> +               .value_u64 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_ETH_DST 7
> +struct net_flow_action set_eth_dst = {
> +       .name = "set_eth_dst",
> +       .uid = ACTION_SET_ETH_DST,
> +       .args = set_eth_dst_args,
> +};
> +
> +struct net_flow_action_arg set_out_port_args[2] = {
> +       {
> +               .name = "set_out_port",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_U32,
> +               .value_u32 = 0,
> +       },
> +       {
> +               .name = "",
> +               .type = NET_FLOW_ACTION_ARG_TYPE_NULL,
> +       },
> +};
> +
> +#define ACTION_SET_OUT_PORT 8
> +struct net_flow_action set_out_port = {
> +       .name = "set_out_port",
> +       .uid = ACTION_SET_OUT_PORT,
> +       .args = set_out_port_args,
> +};
> +
> +struct net_flow_action *rocker_action_list[8] = {
> +       &set_goto_table,
> +       &set_vlan_id,
> +       &copy_to_cpu,
> +       &set_group_id,
> +       &pop_vlan,
> +       &set_eth_src,
> +       &set_eth_dst,
> +       &null_action,
> +};
> +
> +/* headers graph */
> +#define HEADER_INSTANCE_ETHERNET 1
> +#define HEADER_INSTANCE_VLAN_OUTER 2
> +#define HEADER_INSTANCE_IPV4 3
> +#define HEADER_INSTANCE_IN_LPORT 4
> +#define HEADER_INSTANCE_GOTO_TABLE 5
> +#define HEADER_INSTANCE_GROUP_ID 6
> +
> +struct net_flow_jump_table parse_ethernet[3] = {
> +       {
> +               .field = {
> +                  .header = HEADER_ETHERNET,
> +                  .field = HEADER_ETHERNET_ETHERTYPE,
> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
> +                  .value_u16 = 0x0800,

How are the htons()/ntohs() conversions handled here?

Since these are network header fields, seems you want htons(0x0800).

> +               },
> +               .node = HEADER_INSTANCE_IPV4,
> +       },
> +       {
> +               .field = {
> +                  .header = HEADER_ETHERNET,
> +                  .field = HEADER_ETHERNET_ETHERTYPE,
> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
> +                  .value_u16 = 0x8100,
> +               },
> +               .node = HEADER_INSTANCE_VLAN_OUTER,
> +       },
> +       {
> +               .field = {0},
> +               .node = 0,
> +       },

just use an all-zero {0} entry as the terminator (these are structs, not pointers, so NULL doesn't apply),

> +};
> +
> +int ethernet_headers[2] = {HEADER_ETHERNET, 0};
> +
> +struct net_flow_hdr_node ethernet_header_node = {
> +       .name = "ethernet",
> +       .uid = HEADER_INSTANCE_ETHERNET,
> +       .hdrs = ethernet_headers,
> +       .jump = parse_ethernet,
> +};
> +
> +struct net_flow_jump_table parse_vlan[2] = {
> +       {
> +               .field = {
> +                  .header = HEADER_VLAN,
> +                  .field = HEADER_VLAN_ETHERTYPE,
> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
> +                  .value_u16 = 0x0800,
> +               },
> +               .node = HEADER_INSTANCE_IPV4,
> +       },
> +       {
> +               .field = {0},
> +               .node = 0,
> +       },
> +};
> +
> +int vlan_headers[2] = {HEADER_VLAN, 0};
> +struct net_flow_hdr_node vlan_header_node = {
> +       .name = "vlan",
> +       .uid = HEADER_INSTANCE_VLAN_OUTER,
> +       .hdrs = vlan_headers,
> +       .jump = parse_vlan,
> +};
> +
> +struct net_flow_jump_table terminal_headers[2] = {
> +       {
> +               .field = {0},
> +               .node = NET_FLOW_JUMP_TABLE_DONE,
> +       },
> +       {
> +               .field = {0},
> +               .node = 0,
> +       },
> +};
> +
> +int ipv4_headers[2] = {HEADER_IPV4, 0};
> +struct net_flow_hdr_node ipv4_header_node = {
> +       .name = "ipv4",
> +       .uid = HEADER_INSTANCE_IPV4,
> +       .hdrs = ipv4_headers,
> +       .jump = terminal_headers,
> +};
> +
> +int metadata_headers[2] = {HEADER_METADATA, 0};
> +struct net_flow_hdr_node in_lport_header_node = {
> +       .name = "in_lport",
> +       .uid = HEADER_INSTANCE_IN_LPORT,
> +       .hdrs = metadata_headers,
> +       .jump = terminal_headers,
> +};
> +
> +struct net_flow_hdr_node goto_table_header_node = {
> +       .name = "goto_table",
> +       .uid = HEADER_INSTANCE_GOTO_TABLE,
> +       .hdrs = metadata_headers,
> +       .jump = terminal_headers,
> +};
> +
> +struct net_flow_hdr_node group_id_header_node = {
> +       .name = "group_id",
> +       .uid = HEADER_INSTANCE_GROUP_ID,
> +       .hdrs = metadata_headers,
> +       .jump = terminal_headers,
> +};
> +
> +struct net_flow_hdr_node null_header = {.name = "", .uid = 0,};
> +
> +struct net_flow_hdr_node *rocker_header_nodes[7] = {
> +       &ethernet_header_node,
> +       &vlan_header_node,
> +       &ipv4_header_node,
> +       &in_lport_header_node,
> +       &goto_table_header_node,
> +       &group_id_header_node,
> +       &null_header,
> +};
> +
> +/* table definition */
> +struct net_flow_field_ref matches_ig_port[2] = {
> +       { .instance = HEADER_INSTANCE_IN_LPORT,
> +         .header = HEADER_METADATA,
> +         .field = HEADER_METADATA_IN_LPORT,
> +         .mask_type = NET_FLOW_MASK_TYPE_LPM},

Need a different mask type here, not LPM.


> +struct net_flow_table *rocker_table_list[7] = {
> +       &ingress_port_table,
> +       &vlan_table,
> +       &term_mac_table,
> +       &ucast_routing_table,
> +       &bridge_table,
> +       &acl_table,
> +       &null_table,
> +};

cool stuff

> +
> +/* Define the table graph layout */
> +struct net_flow_jump_table table_node_ig_port_next[2] = {
> +       { .field = {0}, .node = ROCKER_FLOW_TABLE_ID_VLAN},
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_ingress_port = {
> +       .uid = ROCKER_FLOW_TABLE_ID_INGRESS_PORT,
> +       .jump = table_node_ig_port_next};
> +
> +struct net_flow_jump_table table_node_vlan_next[2] = {
> +       { .field = {0}, .node = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC},
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_vlan = {
> +       .uid = ROCKER_FLOW_TABLE_ID_VLAN,
> +       .jump = table_node_vlan_next};
> +
> +struct net_flow_jump_table table_node_term_mac_next[2] = {
> +       { .field = {0}, .node = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING},
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_term_mac = {
> +       .uid = ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
> +       .jump = table_node_term_mac_next};
> +
> +struct net_flow_jump_table table_node_bridge_next[2] = {
> +       { .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_bridge = {
> +       .uid = ROCKER_FLOW_TABLE_ID_BRIDGING,
> +       .jump = table_node_bridge_next};
> +
> +struct net_flow_jump_table table_node_ucast_routing_next[2] = {
> +       { .field = {0}, .node = ROCKER_FLOW_TABLE_ID_ACL_POLICY},
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_ucast_routing = {
> +       .uid = ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
> +       .jump = table_node_ucast_routing_next};
> +
> +struct net_flow_jump_table table_node_acl_next[1] = {
> +       { .field = {0}, .node = 0},
> +};
> +
> +struct net_flow_tbl_node table_node_acl = {
> +       .uid = ROCKER_FLOW_TABLE_ID_ACL_POLICY,
> +       .jump = table_node_acl_next};
> +
> +struct net_flow_tbl_node table_node_nil = {.uid = 0, .jump = NULL};
> +
> +struct net_flow_tbl_node *rocker_table_nodes[7] = {
> +       &table_node_ingress_port,
> +       &table_node_vlan,
> +       &table_node_term_mac,
> +       &table_node_ucast_routing,
> +       &table_node_bridge,
> +       &table_node_acl,
> +       &table_node_nil,
> +};

Cool...getting tired but will review this again in v2

> +#endif /*_MY_PIPELINE_H*/

ROCKER (guard comment should say _ROCKER_PIPELINE_H to match the file name)

>


* Re: [net-next PATCH v1 05/11] net: rocker: add set flow rules
  2014-12-31 19:47 ` [net-next PATCH v1 05/11] net: rocker: add set flow rules John Fastabend
@ 2015-01-06  7:23   ` Scott Feldman
  2015-01-06 15:31     ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  7:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Wed, Dec 31, 2014 at 11:47 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Implement set flow operations for existing rocker tables.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/rocker/rocker.c          |  517 +++++++++++++++++++++++++
>  drivers/net/ethernet/rocker/rocker_pipeline.h |    3
>  2 files changed, 519 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index 4c6787a..c40c58d 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -3806,6 +3806,520 @@ static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
>  {
>         return rocker_header_nodes;
>  }
> +
> +static int is_valid_net_flow_action_arg(struct net_flow_action *a, int id)
> +{
> +       struct net_flow_action_arg *args = a->args;
> +       int i;
> +
> +       for (i = 0; args[i].type != NET_FLOW_ACTION_ARG_TYPE_NULL; i++) {
> +               if (a->args[i].type == NET_FLOW_ACTION_ARG_TYPE_NULL ||
> +                   args[i].type != a->args[i].type)
> +                       return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +static int is_valid_net_flow_action(struct net_flow_action *a, int *actions)
> +{
> +       int i;
> +
> +       for (i = 0; actions[i]; i++) {
> +               if (actions[i] == a->uid)
> +                       return is_valid_net_flow_action_arg(a, a->uid);
> +       }
> +       return -EINVAL;
> +}
> +
> +static int is_valid_net_flow_match(struct net_flow_field_ref *f,
> +                                  struct net_flow_field_ref *fields)
> +{
> +       int i;
> +
> +       for (i = 0; fields[i].header; i++) {
> +               if (f->header == fields[i].header &&
> +                   f->field == fields[i].field)
> +                       return 0;
> +       }
> +
> +       return -EINVAL;
> +}
> +
> +int is_valid_net_flow(struct net_flow_table *table, struct net_flow_flow *flow)
> +{
> +       struct net_flow_field_ref *fields = table->matches;
> +       int *actions = table->actions;
> +       int i, err;
> +
> +       for (i = 0; flow->actions[i].uid; i++) {
> +               err = is_valid_net_flow_action(&flow->actions[i], actions);
> +               if (err)
> +                       return -EINVAL;
> +       }
> +
> +       for (i = 0; flow->matches[i].header; i++) {
> +               err = is_valid_net_flow_match(&flow->matches[i], fields);
> +               if (err)
> +                       return -EINVAL;
> +       }
> +
> +       return 0;
> +}

All the above doesn't look rocker-specific...up-level?

> +
> +static u32 rocker_goto_value(u32 id)
> +{
> +       switch (id) {
> +       case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
> +               return ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
> +       case ROCKER_FLOW_TABLE_ID_VLAN:
> +               return ROCKER_OF_DPA_TABLE_ID_VLAN;
> +       case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
> +               return ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
> +       case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
> +               return ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
> +       case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
> +               return ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
> +       case ROCKER_FLOW_TABLE_ID_BRIDGING:
> +               return ROCKER_OF_DPA_TABLE_ID_BRIDGING;
> +       case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
> +               return ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
> +       default:
> +               return 0;
> +       }
> +}

Could the OF-DPA table IDs be used in the flow table defs?  I think I
remember your answer was no because OF-DPA uses INGRESS_PORT ID == 0,
and 0 is a special value for if_flow tables.  Bummer.

> +
> +static int rocker_flow_set_ig_port(struct net_device *dev,
> +                                  struct net_flow_flow *flow)
> +{
> +       struct rocker_port *rocker_port = netdev_priv(dev);
> +       enum rocker_of_dpa_table_id goto_tbl;
> +       u32 in_lport_mask = 0xffff0000;
> +       u32 in_lport = 0;

why initialize these two?

> +       int err, flags = 0;
> +
> +       err = is_valid_net_flow(&ingress_port_table, flow);
> +       if (err)
> +               return err;
> +
> +       /* ingress port table only supports one field/mask/action this
> +        * simplifies the key construction and we can assume the values
> +        * are the correct types/mask/action by valid check above. The
> +        * user could pass multiple match/actions in a message with the
> +        * same field multiple times currently the valid test does not
> +        * catch this and we just use the first specified.
> +        */
> +       in_lport = flow->matches[0].value_u32;
> +       in_lport_mask = flow->matches[0].mask_u32;
> +       goto_tbl = rocker_goto_value(flow->actions[0].args[0].value_u16);
> +
> +       err = rocker_flow_tbl_ig_port(rocker_port, flags,
> +                                     in_lport, in_lport_mask,
> +                                     goto_tbl);
> +       return err;
> +}
> +
> +static int rocker_flow_set_vlan(struct net_device *dev,
> +                               struct net_flow_flow *flow)
> +{
> +       enum rocker_of_dpa_table_id goto_tbl;
> +       struct rocker_port *rocker_port = netdev_priv(dev);

rocker style thing: put rocker_port decl first (sorry for being so pedantic).

> +       int i, err = 0, flags = 0;
> +       u32 in_lport;
> +       __be16 vlan_id, vlan_id_mask, new_vlan_id;
> +       bool untagged, have_in_lport = false;
> +
> +       err = is_valid_net_flow(&vlan_table, flow);
> +       if (err)
> +               return err;
> +
> +       goto_tbl = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
> +
> +       /* If user does not specify vid match default to any */
> +       vlan_id = 1;

htons()?

Not sure.  Rocker's convention is that vlan_id is network-order, but in
some places you'll see vid, which is host-order.

> +       vlan_id_mask = 0;
> +
> +       for (i = 0; flow->matches && flow->matches[i].instance; i++) {
> +               switch (flow->matches[i].instance) {
> +               case HEADER_INSTANCE_IN_LPORT:
> +                       in_lport = flow->matches[i].value_u32;
> +                       have_in_lport = true;
> +                       break;
> +               case HEADER_INSTANCE_VLAN_OUTER:
> +                       if (flow->matches[i].field != HEADER_VLAN_VID)
> +                               break;
> +
> +                       vlan_id = htons(flow->matches[i].value_u16);
> +                       vlan_id_mask = htons(flow->matches[i].mask_u16);
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       /* If user does not specify a new vlan id use default vlan id */
> +       new_vlan_id = rocker_port_vid_to_vlan(rocker_port, vlan_id, &untagged);
> +
> +       for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +               struct net_flow_action_arg *arg = &flow->actions[i].args[0];
> +
> +               switch (flow->actions[i].uid) {
> +               case ACTION_SET_GOTO_TABLE:
> +                       goto_tbl = rocker_goto_value(arg->value_u16);
> +                       break;
> +               case ACTION_SET_VLAN_ID:
> +                       new_vlan_id = htons(arg->value_u16);
> +                       if (new_vlan_id)
> +                               untagged = false;
> +                       break;
> +               }
> +       }
> +
> +       if (!have_in_lport)
> +               return -EINVAL;

This check can be moved up, before the second for loop

> +
> +       err = rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
> +                                  vlan_id, vlan_id_mask, goto_tbl,
> +                                  untagged, new_vlan_id);
> +       return err;
> +}
> +
> +static int rocker_flow_set_term_mac(struct net_device *dev,
> +                                   struct net_flow_flow *flow)
> +{
> +       struct rocker_port *rocker_port = netdev_priv(dev);
> +       __be16 vlan_id, vlan_id_mask, ethtype = 0;
> +       const u8 *eth_dst, *eth_dst_mask;
> +       u32 in_lport, in_lport_mask;
> +       int i, err = 0, flags = 0;
> +       bool copy_to_cpu;
> +
> +       eth_dst = NULL;
> +       eth_dst_mask = NULL;
> +

Needed?

> +       err = is_valid_net_flow(&term_mac_table, flow);
> +       if (err)
> +               return err;
> +
> +       /* If user does not specify vid match default to any */
> +       vlan_id = rocker_port->internal_vlan_id;
> +       vlan_id_mask = 0;
> +
> +       /* If user does not specify in_lport match default to any */
> +       in_lport = rocker_port->lport;
> +       in_lport_mask = 0;
> +
> +       /* If user does not specify a mac address match any */
> +       eth_dst = rocker_port->dev->dev_addr;
> +       eth_dst_mask = zero_mac;
> +
> +       for (i = 0; flow->matches && flow->matches[i].instance; i++) {
> +               switch (flow->matches[i].instance) {
> +               case HEADER_INSTANCE_IN_LPORT:
> +                       in_lport = flow->matches[i].value_u32;
> +                       in_lport_mask = flow->matches[i].mask_u32;
> +                       break;
> +               case HEADER_INSTANCE_VLAN_OUTER:
> +                       if (flow->matches[i].field != HEADER_VLAN_VID)
> +                               break;
> +
> +                       vlan_id = htons(flow->matches[i].value_u16);
> +                       vlan_id_mask = htons(flow->matches[i].mask_u16);
> +                       break;
> +               case HEADER_INSTANCE_ETHERNET:
> +                       switch (flow->matches[i].field) {
> +                       case HEADER_ETHERNET_DST_MAC:
> +                               eth_dst = (u8 *)&flow->matches[i].value_u64;
> +                               eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
> +                               break;
> +                       case HEADER_ETHERNET_ETHERTYPE:
> +                               ethtype = htons(flow->matches[i].value_u16);
> +                               break;
> +                       default:
> +                               return -EINVAL;
> +                       }
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       if (!ethtype)
> +               return -EINVAL;
> +
> +       /* By default do not copy to cpu */
> +       copy_to_cpu = false;
> +
> +       for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +               switch (flow->actions[i].uid) {
> +               case ACTION_COPY_TO_CPU:
> +                       copy_to_cpu = true;
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       err = rocker_flow_tbl_term_mac(rocker_port, in_lport, in_lport_mask,
> +                                      ethtype, eth_dst, eth_dst_mask,
> +                                      vlan_id, vlan_id_mask,
> +                                      copy_to_cpu, flags);
> +       return err;
> +}
> +
> +static int rocker_flow_set_ucast_routing(struct net_device *dev,
> +                                        struct net_flow_flow *flow)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static int rocker_flow_set_mcast_routing(struct net_device *dev,
> +                                        struct net_flow_flow *flow)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static int rocker_flow_set_bridge(struct net_device *dev,
> +                                 struct net_flow_flow *flow)
> +{
> +       enum rocker_of_dpa_table_id goto_tbl;
> +       struct rocker_port *rocker_port = netdev_priv(dev);
> +       u32 in_lport, in_lport_mask, group_id, tunnel_id;
> +       __be16 vlan_id, vlan_id_mask;
> +       const u8 *eth_dst, *eth_dst_mask;
> +       int i, err = 0, flags = 0;
> +       bool copy_to_cpu;
> +
> +       err = is_valid_net_flow(&bridge_table, flow);
> +       if (err)
> +               return err;
> +
> +       goto_tbl = ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
> +
> +       /* If user does not specify vid match default to any */
> +       vlan_id = rocker_port->internal_vlan_id;
> +       vlan_id_mask = 0;
> +
> +       /* If user does not specify in_lport match default to any */
> +       in_lport = rocker_port->lport;
> +       in_lport_mask = 0;
> +
> +       /* If user does not specify a mac address match any */
> +       eth_dst = rocker_port->dev->dev_addr;
> +       eth_dst_mask = NULL;
> +
> +       /* Do not support for tunnel_id yet. */
> +       tunnel_id = 0;
> +
> +       for (i = 0; flow->matches && flow->matches[i].instance; i++) {
> +               switch (flow->matches[i].instance) {
> +               case HEADER_INSTANCE_IN_LPORT:
> +                       in_lport = flow->matches[i].value_u32;
> +                       in_lport_mask = flow->matches[i].mask_u32;
> +                       break;
> +               case HEADER_INSTANCE_VLAN_OUTER:
> +                       if (flow->matches[i].field != HEADER_VLAN_VID)
> +                               break;
> +
> +                       vlan_id = htons(flow->matches[i].value_u16);
> +                       vlan_id_mask = htons(flow->matches[i].mask_u16);
> +                       break;
> +               case HEADER_INSTANCE_ETHERNET:
> +                       switch (flow->matches[i].field) {
> +                       case HEADER_ETHERNET_DST_MAC:
> +                               eth_dst = (u8 *)&flow->matches[i].value_u64;
> +                               eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
> +                               break;
> +                       default:
> +                               return -EINVAL;
> +                       }
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       /* By default do not copy to cpu and skip group assignment */
> +       copy_to_cpu = false;
> +       group_id = ROCKER_GROUP_NONE;
> +
> +       for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +               struct net_flow_action_arg *arg = &flow->actions[i].args[0];
> +
> +               switch (flow->actions[i].uid) {
> +               case ACTION_SET_GOTO_TABLE:
> +                       goto_tbl = rocker_goto_value(arg->value_u16);
> +                       break;
> +               case ACTION_COPY_TO_CPU:
> +                       copy_to_cpu = true;
> +                       break;
> +               case ACTION_SET_GROUP_ID:
> +                       group_id = arg->value_u32;
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       /* Ignoring eth_dst_mask it seems to cause a EINVAL return code */
> +       err = rocker_flow_tbl_bridge(rocker_port, flags,
> +                                    eth_dst, eth_dst_mask,
> +                                    vlan_id, tunnel_id,
> +                                    goto_tbl, group_id, copy_to_cpu);
> +       return err;
> +}
> +
> +static int rocker_flow_set_acl(struct net_device *dev,
> +                              struct net_flow_flow *flow)
> +{
> +       struct rocker_port *rocker_port = netdev_priv(dev);
> +       u32 in_lport, in_lport_mask, group_id, tunnel_id;
> +       __be16 vlan_id, vlan_id_mask, ethtype = 0;
> +       const u8 *eth_dst, *eth_src, *eth_dst_mask, *eth_src_mask;
> +       u8 protocol, protocol_mask, dscp, dscp_mask;
> +       int i, err = 0, flags = 0;
> +
> +       err = is_valid_net_flow(&bridge_table, flow);
> +       if (err)
> +               return err;
> +
> +       /* If user does not specify vid match default to any */
> +       vlan_id = rocker_port->internal_vlan_id;
> +       vlan_id_mask = 0;
> +
> +       /* If user does not specify in_lport match default to any */
> +       in_lport = rocker_port->lport;
> +       in_lport_mask = 0;
> +
> +       /* If user does not specify a mac address match any */
> +       eth_dst = rocker_port->dev->dev_addr;
> +       eth_src = zero_mac;
> +       eth_dst_mask = NULL;
> +       eth_src_mask = NULL;
> +
> +       /* If user does not set protocol/dscp mask them out */
> +       protocol = 0;
> +       dscp = 0;
> +       protocol_mask = 0;
> +       dscp_mask = 0;
> +
> +       /* Do not support for tunnel_id yet. */
> +       tunnel_id = 0;
> +
> +       for (i = 0; flow->matches && flow->matches[i].instance; i++) {
> +               switch (flow->matches[i].instance) {
> +               case HEADER_INSTANCE_IN_LPORT:
> +                       in_lport = flow->matches[i].value_u32;
> +                       in_lport_mask = flow->matches[i].mask_u32;
> +                       break;
> +               case HEADER_INSTANCE_VLAN_OUTER:
> +                       if (flow->matches[i].field != HEADER_VLAN_VID)
> +                               break;
> +
> +                       vlan_id = htons(flow->matches[i].value_u16);
> +                       vlan_id_mask = htons(flow->matches[i].mask_u16);
> +                       break;
> +               case HEADER_INSTANCE_ETHERNET:
> +                       switch (flow->matches[i].field) {
> +                       case HEADER_ETHERNET_SRC_MAC:
> +                               eth_src = (u8 *)&flow->matches[i].value_u64;
> +                               eth_src_mask = (u8 *)&flow->matches[i].mask_u64;
> +                               break;
> +                       case HEADER_ETHERNET_DST_MAC:
> +                               eth_dst = (u8 *)&flow->matches[i].value_u64;
> +                               eth_dst_mask = (u8 *)&flow->matches[i].mask_u64;
> +                               break;
> +                       case HEADER_ETHERNET_ETHERTYPE:
> +                               ethtype = htons(flow->matches[i].value_u16);
> +                               break;
> +                       default:
> +                               return -EINVAL;
> +                       }
> +                       break;
> +               case HEADER_INSTANCE_IPV4:
> +                       switch (flow->matches[i].field) {
> +                       case HEADER_IPV4_PROTOCOL:
> +                               protocol = flow->matches[i].value_u8;
> +                               protocol_mask = flow->matches[i].mask_u8;
> +                               break;
> +                       case HEADER_IPV4_DSCP:
> +                               dscp = flow->matches[i].value_u8;
> +                               dscp_mask = flow->matches[i].mask_u8;
> +                               break;
> +                       default:
> +                               return -EINVAL;
> +                       }
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       /* By default do not copy to cpu and skip group assignment */
> +       group_id = ROCKER_GROUP_NONE;
> +
> +       for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +               switch (flow->actions[i].uid) {
> +               case ACTION_SET_GROUP_ID:
> +                       group_id = flow->actions[i].args[0].value_u32;
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       err = rocker_flow_tbl_acl(rocker_port, flags,
> +                                 in_lport, in_lport_mask,
> +                                 eth_src, eth_src_mask,
> +                                 eth_dst, eth_dst_mask, ethtype,
> +                                 vlan_id, vlan_id_mask,
> +                                 protocol, protocol_mask,
> +                                 dscp, dscp_mask,
> +                                 group_id);
> +       return err;
> +}
> +
> +static int rocker_set_flows(struct net_device *dev,
> +                           struct net_flow_flow *flow)
> +{
> +       int err = -EINVAL;
> +
> +       if (!flow->matches || !flow->actions)
> +               return -EINVAL;
> +
> +       switch (flow->table_id) {
> +       case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
> +               err = rocker_flow_set_ig_port(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_VLAN:
> +               err = rocker_flow_set_vlan(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
> +               err = rocker_flow_set_term_mac(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
> +               err = rocker_flow_set_ucast_routing(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
> +               err = rocker_flow_set_mcast_routing(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_BRIDGING:
> +               err = rocker_flow_set_bridge(dev, flow);
> +               break;
> +       case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
> +               err = rocker_flow_set_acl(dev, flow);
> +               break;
> +       default:
> +               break;
> +       }
> +
> +       return err;
> +}
> +
> +static int rocker_del_flows(struct net_device *dev,
> +                           struct net_flow_flow *flow)
> +{
> +       return -EOPNOTSUPP;
> +}
>  #endif
>
>  static const struct net_device_ops rocker_port_netdev_ops = {
> @@ -3828,6 +4342,9 @@ static const struct net_device_ops rocker_port_netdev_ops = {
>         .ndo_flow_get_actions           = rocker_get_actions,
>         .ndo_flow_get_tbl_graph         = rocker_get_tgraph,
>         .ndo_flow_get_hdr_graph         = rocker_get_hgraph,
> +
> +       .ndo_flow_set_flows             = rocker_set_flows,
> +       .ndo_flow_del_flows             = rocker_del_flows,
>  #endif
>  };

Looks good overall to me

> diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
> index 9544339..701e139 100644
> --- a/drivers/net/ethernet/rocker/rocker_pipeline.h
> +++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
> @@ -527,6 +527,7 @@ enum rocker_flow_table_id_space {
>         ROCKER_FLOW_TABLE_ID_VLAN,
>         ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
>         ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
> +       ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
>         ROCKER_FLOW_TABLE_ID_BRIDGING,
>         ROCKER_FLOW_TABLE_ID_ACL_POLICY,
>         ROCKER_FLOW_TABLE_NULL = 0,
> @@ -588,7 +589,7 @@ struct net_flow_table acl_table = {
>
>  struct net_flow_table null_table = {
>         .name = "",
> -       .uid = 0,
> +       .uid = ROCKER_FLOW_TABLE_NULL,
>         .source = 0,
>         .size = 0,
>         .matches = NULL,
>

Move these changes to previous patch?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 08/11] net: rocker: add get flow API operation
  2014-12-31 19:48 ` [net-next PATCH v1 08/11] net: rocker: add get flow API operation John Fastabend
       [not found]   ` <CAKoUArm4z_i6Su9Q4ODB1QYR_Z098MjT2yN=WR7LbN387AvPsg@mail.gmail.com>
@ 2015-01-06  7:40   ` Scott Feldman
  2015-01-06 14:59     ` John Fastabend
  1 sibling, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06  7:40 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Wed, Dec 31, 2014 at 11:48 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Add operations to get flows. I wouldn't mind cleaning this code
> up a bit; my first attempt used macros, which shortened
> the code, but when I was done I decided it just made the code
> unreadable and unmaintainable.
>
> I might think about it a bit more, but this implementation, albeit
> a bit long and repetitive, is easier to understand IMO.

Dang, you put a lot of work into this one.

Something doesn't feel right though.  In this case, the rocker driver
just happened to have cached all the flow/group state in software hash
tables, so you don't need to query through to the device to extract
the if_flow info.  What doesn't feel right is all the work needed in
the driver.  For each and every driver.  get_flows needs to go above
the driver, somehow.

Seems the caller of if_flow already knows the flows pushed down with
add_flows/del_flows, and with the error handling it can't mess that up.

Is one use-case for get_flows to recover from a fatal OS/driver crash,
relying on the hardware to recover the flow set?  In this rocker
example, that's not going to work because the driver didn't go through
to the device for get_flows.  I think I'd like to know more about the
use-cases of get_flows.

-scott


* Re: [net-next PATCH v1 00/11] A flow API
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (12 preceding siblings ...)
  2015-01-06  2:42 ` Scott Feldman
@ 2015-01-06 12:23 ` Jamal Hadi Salim
  2015-01-09 18:27   ` John Fastabend
  2015-01-08 15:14 ` Or Gerlitz
  2015-01-08 18:03 ` Jiri Pirko
  15 siblings, 1 reply; 60+ messages in thread
From: Jamal Hadi Salim @ 2015-01-06 12:23 UTC (permalink / raw)
  To: John Fastabend, tgraf, sfeldma, jiri, simon.horman
  Cc: netdev, davem, andy, Shrijeet Mukherjee

John,

There are a lot of things to digest in your posting - I am interested
in commenting on many things, but I feel the need to pay attention to
the details given the importance of this interface (and the conference
is chewing up my netdev time at the moment). I need to actually sit
down and stare at the code and documentation.

I do think we need to have this discussion as part of the BOF
Shrijeet is running at netdev01.

General comments:
1) One of the things that I have learnt over time is that not
everything that sits in, or is abstracted from, hardware is a table.
You could have structs or simple scalars for config or runtime
control. How does what you are proposing here allow us to express
that? I don't think you'd need it for simple things, but if you don't
allow for it you run into the square-hole-round-peg syndrome of "yeah,
I can express that u32 variable as a single table with a single row
and a single column" ;-> or "you need another infrastructure for
that single scalar u32".

2) I understood the sense of replacing ethtool for classifier
access with a direct interface, mostly because that's what it was
already doing - but I am not sure why you need it for a generic
interface. Am I mistaken, or are you providing direct access to
hardware from user space? Would this essentially make the Linux
infrastructure a bypass (which vendors and their SDKs love)? IMHO, a
good example would be to pick something like netfilter or tc-filters
and show how that is offloaded. That keeps it in the same spirit as
what we are shooting for in L2/L3 at the moment.

Anyway, I apologize that I haven't spent as much time on this as I'd
like (the holiday period wasn't good for me, and netdev01 is picking
up and consuming my time), but I will try my best to respond and
comment, with some latency.

cheers,
jamal

On 12/31/14 14:45, John Fastabend wrote:
> So... I could continue to mull over this and tweak bits and pieces
> here and there, but I decided it's best to get a wider group of folks
> looking at it and hopefully, with any luck, using it. So here it is.
>
> This set creates a new netlink family and set of messages to configure
> flow tables in hardware. I tried to make the commit messages
> reasonably verbose at least in the flow_table patches.
>
> What we get at the end of this series is a working API to get device
> capabilities and program flows using the rocker switch.
>
> I created a user space tool 'flow' that I use to configure and query
> the devices; it is posted here,
>
> 	https://github.com/jrfastab/iprotue2-flow-tool
>
> For now it is a stand-alone tool, but once the kernel bits get sorted
> out (I'm guessing there will need to be a few versions of this series
> to get it right) I would like to port it into the iproute2 package.
> This way we can keep all of our tooling in one package; see 'bridge'
> for example.
>
> As far as testing, I've tested various combinations of tables and
> rules on the rocker switch and it seems to work. I have not tested
> 100% of the rocker code paths though. It would be great to get some
> sort of automated framework around the API to do this. I don't
> think that should gate the inclusion of the API though.
>
> I could use some help reviewing,
>
>    (a) error paths and netlink validation code paths
>
>    (b) Break down of structures vs netlink attributes. I
>        am trying to balance the flexibility given by having
>        netlink TLV attributes vs conciseness. So some
>        things are passed as structures.
>
>    (c) Are there any devices with pipelines that we
>        can't represent with this API? It would be good to
>        know about these so we can design them in, probably
>        in a future series.
>
> For some examples, and a somewhat more illustrative description, I
> posted a quickly typed-up set of notes on GitHub Pages. There we
> can show the description, along with images produced by the flow tool
> showing the pipeline. Once we settle a bit more on the API we should
> probably do a clean-up of this and the other ongoing threads, and
> commit something to the Documentation directory.
>
>   http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>
> Finally, I have more patches to add support for creating and destroying
> tables. This allows users to define the pipeline at runtime rather
> than statically, as rocker does now. After this set gets some traction
> I'll look at pushing them in a next round. However, it likely requires
> adding another "world" to rocker. Another piece that I want to add is
> a description of the actions and metadata. This way user space can
> "learn" what an action is and how metadata interacts with the system.
> This work is under development.
>
> Thanks! Any comments/feedback always welcome.
>
> And also thanks to everyone who has helped with this flow API so far:
> all the folks at Dusseldorf LPC, the OVS summit in Santa Clara, the P4
> authors for some inspiration, the collection of IETF ForCES documents
> I mulled over, the Netfilter workshop where I started to realize that
> fixing ethtool was most likely not going to work, etc.
>
> ---
>
> John Fastabend (11):
>        net: flow_table: create interface for hw match/action tables
>        net: flow_table: add flow, delete flow
>        net: flow_table: add apply action argument to tables
>        rocker: add pipeline model for rocker switch
>        net: rocker: add set flow rules
>        net: rocker: add group_id slices and drop explicit goto
>        net: rocker: add multicast path to bridging
>        net: rocker: add get flow API operation
>        net: rocker: add cookie to group acls and use flow_id to set cookie
>        net: rocker: have flow api calls set cookie value
>        net: rocker: implement delete flow routine
>
>
>   drivers/net/ethernet/rocker/rocker.c          | 1641 +++++++++++++++++++++++++
>   drivers/net/ethernet/rocker/rocker_pipeline.h |  793 ++++++++++++
>   include/linux/if_flow.h                       |  115 ++
>   include/linux/netdevice.h                     |   20
>   include/uapi/linux/if_flow.h                  |  413 ++++++
>   net/Kconfig                                   |    7
>   net/core/Makefile                             |    1
>   net/core/flow_table.c                         | 1339 ++++++++++++++++++++
>   8 files changed, 4312 insertions(+), 17 deletions(-)
>   create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
>   create mode 100644 include/linux/if_flow.h
>   create mode 100644 include/uapi/linux/if_flow.h
>   create mode 100644 net/core/flow_table.c
>


* Re: [net-next PATCH v1 08/11] net: rocker: add get flow API operation
  2015-01-06  7:40   ` Scott Feldman
@ 2015-01-06 14:59     ` John Fastabend
  2015-01-06 16:57       ` Scott Feldman
  0 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-06 14:59 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On 01/05/2015 11:40 PM, Scott Feldman wrote:
> On Wed, Dec 31, 2014 at 11:48 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Add operations to get flows. I wouldn't mind cleaning this code
>> up a bit; my first attempt used macros, which shortened
>> the code, but when I was done I decided it just made the code
>> unreadable and unmaintainable.
>>
>> I might think about it a bit more, but this implementation, albeit
>> a bit long and repetitive, is easier to understand IMO.
>
> Dang, you put a lot of work into this one.
>
> Something doesn't feel right though.  In this case, rocker driver just
> happened to have cached all the flow/group stuff in hash tables in
> software, so you don't need to query thru to the device to extract the
> if_flow info.  What doesn't feel right is all the work need in the
> driver.  For each and every driver.  get_flows needs to go above
> driver, somehow.

Another option is to have a software cache in flow_table.c. I was
trying to avoid caching, as I really don't expect 'get' operations
to be fast path, and going to hardware seems good enough to me.
Other than that, it's a bit annoying to write the mapping code.

If you don't have a cache then somewhere there has to be a mapping
from hardware flow descriptors to the flow descriptors used by the
flow API. As I noted, I tried to help by using macros and helper
routines, but in the end I simply decided they convoluted the code
too much and made it hard to debug.
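As a rough sketch of that mapping burden (the table IDs here are invented for illustration, not taken from the patches), each driver ends up writing both directions of the translation by hand:

```c
#include <assert.h>  /* for the usage checks below */

/* Invented IDs standing in for the hardware and API table ID spaces;
 * neither enum is from the rocker patches. */
enum { EX_HW_TBL_VLAN = 10, EX_HW_TBL_ACL = 60 };
enum { EX_API_TBL_VLAN = 2, EX_API_TBL_ACL = 6 };

/* The set path maps API table IDs to hardware table IDs... */
static int example_api_to_hw(int api_id)
{
	switch (api_id) {
	case EX_API_TBL_VLAN: return EX_HW_TBL_VLAN;
	case EX_API_TBL_ACL:  return EX_HW_TBL_ACL;
	default:              return -1;
	}
}

/* ...and a get path needs the inverse, written out per driver,
 * which is the repetitive code being discussed. */
static int example_hw_to_api(int hw_id)
{
	switch (hw_id) {
	case EX_HW_TBL_VLAN: return EX_API_TBL_VLAN;
	case EX_HW_TBL_ACL:  return EX_API_TBL_ACL;
	default:             return -1;
	}
}
```

A cache avoids the reverse half of this, at the cost of keeping the
cache coherent with the device.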

>
> Seems the caller of if_flow already knows the flows pushed down with
> add_flows/del_flows, and with the err handling can't mess it up.

Yes, the caller could know if it cached them, which it doesn't. We
can add a cache if it's helpful. You may have multiple users of the
API (both in-kernel and user space) though, so I don't think you can
push it much above flow_table.c.

>
> Is one use-case for get_flows to recover from a fatal OS/driver crash,
> and to rely on hardware to recover flow set?  In this rocker example,
> that's not going to work because driver didn't get thru to device to
> get_flows.  I think I'd like to know more about the use-cases of
> get_flows.

It's helpful for debugging. And if you have multiple consumers it
may be helpful to "learn" what other consumers are doing. I don't
have any concrete cases at the moment though.

For the CLI case it's handy to add some flows, forget what you did,
and then do a get to refresh your memory. Not likely a problem for
"real" management software.

At least it's not part of the UAPI, so we can tweak/improve it as
much as we want. Any better ideas? I'm open to suggestions on this
one.

>
> -scott
>


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 05/11] net: rocker: add set flow rules
  2015-01-06  7:23   ` Scott Feldman
@ 2015-01-06 15:31     ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-06 15:31 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On 01/05/2015 11:23 PM, Scott Feldman wrote:
> On Wed, Dec 31, 2014 at 11:47 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Implement set flow operations for existing rocker tables.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

[...]

>> +static int is_valid_net_flow_action(struct net_flow_action *a, int *actions)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; actions[i]; i++) {
>> +               if (actions[i] == a->uid)
>> +                       return is_valid_net_flow_action_arg(a, a->uid);
>> +       }
>> +       return -EINVAL;
>> +}
>> +
>> +static int is_valid_net_flow_match(struct net_flow_field_ref *f,
>> +                                  struct net_flow_field_ref *fields)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; fields[i].header; i++) {
>> +               if (f->header == fields[i].header &&
>> +                   f->field == fields[i].field)
>> +                       return 0;
>> +       }
>> +
>> +       return -EINVAL;
>> +}
>> +
>> +int is_valid_net_flow(struct net_flow_table *table, struct net_flow_flow *flow)
>> +{
>> +       struct net_flow_field_ref *fields = table->matches;
>> +       int *actions = table->actions;
>> +       int i, err;
>> +
>> +       for (i = 0; flow->actions[i].uid; i++) {
>> +               err = is_valid_net_flow_action(&flow->actions[i], actions);
>> +               if (err)
>> +                       return -EINVAL;
>> +       }
>> +
>> +       for (i = 0; flow->matches[i].header; i++) {
>> +               err = is_valid_net_flow_match(&flow->matches[i], fields);
>> +               if (err)
>> +                       return -EINVAL;
>> +       }
>> +
>> +       return 0;
>> +}
>
> All the above doesn't look rocker-specific...up-level?
>

Yes, already in the works for v2.

>> +
>> +static u32 rocker_goto_value(u32 id)
>> +{
>> +       switch (id) {
>> +       case ROCKER_FLOW_TABLE_ID_INGRESS_PORT:
>> +               return ROCKER_OF_DPA_TABLE_ID_INGRESS_PORT;
>> +       case ROCKER_FLOW_TABLE_ID_VLAN:
>> +               return ROCKER_OF_DPA_TABLE_ID_VLAN;
>> +       case ROCKER_FLOW_TABLE_ID_TERMINATION_MAC:
>> +               return ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
>> +       case ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING:
>> +               return ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING;
>> +       case ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING:
>> +               return ROCKER_OF_DPA_TABLE_ID_MULTICAST_ROUTING;
>> +       case ROCKER_FLOW_TABLE_ID_BRIDGING:
>> +               return ROCKER_OF_DPA_TABLE_ID_BRIDGING;
>> +       case ROCKER_FLOW_TABLE_ID_ACL_POLICY:
>> +               return ROCKER_OF_DPA_TABLE_ID_ACL_POLICY;
>> +       default:
>> +               return 0;
>> +       }
>> +}
>
> Could the OF-DPA table IDs be used in the flow table defs?  I think I
> remember your answer was no because OF-DPA uses INGRESS_PORT ID == 0,
> and 0 is a special value for if_flow tables.  Bummer.
>

A minor nuisance. I made table_id 0 a special delineating table.

>> +
>> +static int rocker_flow_set_ig_port(struct net_device *dev,
>> +                                  struct net_flow_flow *flow)
>> +{
>> +       struct rocker_port *rocker_port = netdev_priv(dev);
>> +       enum rocker_of_dpa_table_id goto_tbl;
>> +       u32 in_lport_mask = 0xffff0000;
>> +       u32 in_lport = 0;
>
> why initialize these two?

Apparently a holdover from some code before I added the
valid_net_flow() check. I'll remove it.

>
>> +       int err, flags = 0;
>> +
>> +       err = is_valid_net_flow(&ingress_port_table, flow);
>> +       if (err)
>> +               return err;
>> +
>> +       /* ingress port table only supports one field/mask/action this
>> +        * simplifies the key construction and we can assume the values
>> +        * are the correct types/mask/action by valid check above. The
>> +        * user could pass multiple match/actions in a message with the
>> +        * same field multiple times currently the valid test does not
>> +        * catch this and we just use the first specified.
>> +        */
>> +       in_lport = flow->matches[0].value_u32;
>> +       in_lport_mask = flow->matches[0].mask_u32;
>> +       goto_tbl = rocker_goto_value(flow->actions[0].args[0].value_u16);
>> +
>> +       err = rocker_flow_tbl_ig_port(rocker_port, flags,
>> +                                     in_lport, in_lport_mask,
>> +                                     goto_tbl);
>> +       return err;
>> +}
>> +
>> +static int rocker_flow_set_vlan(struct net_device *dev,
>> +                               struct net_flow_flow *flow)
>> +{
>> +       enum rocker_of_dpa_table_id goto_tbl;
>> +       struct rocker_port *rocker_port = netdev_priv(dev);
>
> rocker style thing: put rocker_port decl first (sorry for being so pedantic).

No problem, making the change.

>
>> +       int i, err = 0, flags = 0;
>> +       u32 in_lport;
>> +       __be16 vlan_id, vlan_id_mask, new_vlan_id;
>> +       bool untagged, have_in_lport = false;
>> +
>> +       err = is_valid_net_flow(&vlan_table, flow);
>> +       if (err)
>> +               return err;
>> +
>> +       goto_tbl = ROCKER_OF_DPA_TABLE_ID_TERMINATION_MAC;
>> +
>> +       /* If user does not specify vid match default to any */
>> +       vlan_id = 1;
>
> htons()?
>
> Not sure.  Rocker convention is vlan_id is network-order, but some
> places you'll see vid and that's host-order.
>

Yep this is needed.

>> +       vlan_id_mask = 0;
>> +
>> +       for (i = 0; flow->matches && flow->matches[i].instance; i++) {
>> +               switch (flow->matches[i].instance) {
>> +               case HEADER_INSTANCE_IN_LPORT:
>> +                       in_lport = flow->matches[i].value_u32;
>> +                       have_in_lport = true;
>> +                       break;
>> +               case HEADER_INSTANCE_VLAN_OUTER:
>> +                       if (flow->matches[i].field != HEADER_VLAN_VID)
>> +                               break;
>> +
>> +                       vlan_id = htons(flow->matches[i].value_u16);
>> +                       vlan_id_mask = htons(flow->matches[i].mask_u16);
>> +                       break;
>> +               default:
>> +                       return -EINVAL;
>> +               }
>> +       }
>> +
>> +       /* If user does not specify a new vlan id use default vlan id */
>> +       new_vlan_id = rocker_port_vid_to_vlan(rocker_port, vlan_id, &untagged);
>> +
>> +       for (i = 0; flow->actions && flow->actions[i].uid; i++) {
>> +               struct net_flow_action_arg *arg = &flow->actions[i].args[0];
>> +
>> +               switch (flow->actions[i].uid) {
>> +               case ACTION_SET_GOTO_TABLE:
>> +                       goto_tbl = rocker_goto_value(arg->value_u16);
>> +                       break;
>> +               case ACTION_SET_VLAN_ID:
>> +                       new_vlan_id = htons(arg->value_u16);
>> +                       if (new_vlan_id)
>> +                               untagged = false;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       if (!have_in_lport)
>> +               return -EINVAL;
>
> This can be moved up, before second for loop
>

done.

>> +
>> +       err = rocker_flow_tbl_vlan(rocker_port, flags, in_lport,
>> +                                  vlan_id, vlan_id_mask, goto_tbl,
>> +                                  untagged, new_vlan_id);
>> +       return err;
>> +}
>> +
>> +static int rocker_flow_set_term_mac(struct net_device *dev,
>> +                                   struct net_flow_flow *flow)
>> +{
>> +       struct rocker_port *rocker_port = netdev_priv(dev);
>> +       __be16 vlan_id, vlan_id_mask, ethtype = 0;
>> +       const u8 *eth_dst, *eth_dst_mask;
>> +       u32 in_lport, in_lport_mask;
>> +       int i, err = 0, flags = 0;
>> +       bool copy_to_cpu;
>> +
>> +       eth_dst = NULL;
>> +       eth_dst_mask = NULL;
>> +
>
> Needed?

Nope, same as above: a holdover from an older variant of valid_net_flow().

>
>> +       err = is_valid_net_flow(&term_mac_table, flow);
>> +       if (err)
>> +               return err;
>> +
>> +       /* If user does not specify vid match default to any */
>> +       vlan_id = rocker_port->internal_vlan_id;
>> +       vlan_id_mask = 0;
>> +

[...]

>>
>>   static const struct net_device_ops rocker_port_netdev_ops = {
>> @@ -3828,6 +4342,9 @@ static const struct net_device_ops rocker_port_netdev_ops = {
>>          .ndo_flow_get_actions           = rocker_get_actions,
>>          .ndo_flow_get_tbl_graph         = rocker_get_tgraph,
>>          .ndo_flow_get_hdr_graph         = rocker_get_hgraph,
>> +
>> +       .ndo_flow_set_flows             = rocker_set_flows,
>> +       .ndo_flow_del_flows             = rocker_del_flows,
>>   #endif
>>   };
>
> Looks good overall to me

good to hear.

>
>> diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
>> index 9544339..701e139 100644
>> --- a/drivers/net/ethernet/rocker/rocker_pipeline.h
>> +++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
>> @@ -527,6 +527,7 @@ enum rocker_flow_table_id_space {
>>          ROCKER_FLOW_TABLE_ID_VLAN,
>>          ROCKER_FLOW_TABLE_ID_TERMINATION_MAC,
>>          ROCKER_FLOW_TABLE_ID_UNICAST_ROUTING,
>> +       ROCKER_FLOW_TABLE_ID_MULTICAST_ROUTING,
>>          ROCKER_FLOW_TABLE_ID_BRIDGING,
>>          ROCKER_FLOW_TABLE_ID_ACL_POLICY,
>>          ROCKER_FLOW_TABLE_NULL = 0,
>> @@ -588,7 +589,7 @@ struct net_flow_table acl_table = {
>>
>>   struct net_flow_table null_table = {
>>          .name = "",
>> -       .uid = 0,
>> +       .uid = ROCKER_FLOW_TABLE_NULL,
>>          .source = 0,
>>          .size = 0,
>>          .matches = NULL,
>>
>
> Move these changes to previous patch?
>

yep will do.

Thanks!
John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 08/11] net: rocker: add get flow API operation
  2015-01-06 14:59     ` John Fastabend
@ 2015-01-06 16:57       ` Scott Feldman
  2015-01-06 17:50         ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06 16:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Tue, Jan 6, 2015 at 6:59 AM, John Fastabend <john.fastabend@gmail.com> wrote:
> On 01/05/2015 11:40 PM, Scott Feldman wrote:
>>
>> On Wed, Dec 31, 2014 at 11:48 AM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>>
>>> Add operations to get flows. I wouldn't mind cleaning this code
>>> up a bit; my first attempt used macros, which shortened
>>> the code, but when I was done I decided it just made the code
>>> unreadable and unmaintainable.
>>>
>>> I might think about it a bit more, but this implementation, albeit
>>> a bit long and repetitive, is easier to understand IMO.
>>
>>
>> Dang, you put a lot of work into this one.
>>
>> Something doesn't feel right though.  In this case, rocker driver just
>> happened to have cached all the flow/group stuff in hash tables in
>> software, so you don't need to query thru to the device to extract the
>> if_flow info.  What doesn't feel right is all the work need in the
>> driver.  For each and every driver.  get_flows needs to go above
>> driver, somehow.
>
>
> Another option is to have a software cache in flow_table.c. I was
> trying to avoid caching, as I really don't expect 'get' operations
> to be fast path, and going to hardware seems good enough to me.
> Other than that, it's a bit annoying to write the mapping code.

Caching in flow_table.c seems best to me, as drivers/devices don't
need to be involved and the cache can serve multiple users of the API.
Are there cases where the device could get flow table entries
installed/deleted outside the API?  For example, if the device was
learning MAC addresses and did automatic table insertions.  We worked
around that case with the recent L2 swdev support by pushing learned
MAC addrs up to the bridge's FDB so software and hardware tables stay
synced.
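For illustration only (none of these names exist in the patch set), a flow cache living above the drivers could be a small hash keyed by (table_id, flow_id), filled on every successful add and answering gets without touching the device:

```c
#include <assert.h>  /* for the usage checks below */
#include <stdlib.h>

/* Hypothetical cache entry: just enough state to answer a get_flows
 * query without involving the driver. Field names are illustrative. */
struct flow_cache_entry {
	int table_id;
	int flow_id;
	struct flow_cache_entry *next;
};

#define FLOW_CACHE_BUCKETS 64

static struct flow_cache_entry *flow_cache[FLOW_CACHE_BUCKETS];

static unsigned int flow_cache_hash(int table_id, int flow_id)
{
	return ((unsigned int)table_id * 31u + (unsigned int)flow_id) %
	       FLOW_CACHE_BUCKETS;
}

/* Would be called by the mid-layer after a successful set_flows. */
static int flow_cache_add(int table_id, int flow_id)
{
	unsigned int h = flow_cache_hash(table_id, flow_id);
	struct flow_cache_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->table_id = table_id;
	e->flow_id = flow_id;
	e->next = flow_cache[h];
	flow_cache[h] = e;
	return 0;
}

/* Backs get_flows: pure software lookup, no driver involvement. */
static struct flow_cache_entry *flow_cache_find(int table_id, int flow_id)
{
	unsigned int h = flow_cache_hash(table_id, flow_id);
	struct flow_cache_entry *e;

	for (e = flow_cache[h]; e; e = e->next)
		if (e->table_id == table_id && e->flow_id == flow_id)
			return e;
	return NULL;
}
```

The open question raised above still applies: hardware-initiated
insertions (e.g. learning) would have to be pushed up into this cache
to keep it coherent.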


* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2015-01-06  7:01   ` Scott Feldman
@ 2015-01-06 17:00     ` John Fastabend
  2015-01-06 17:16       ` Scott Feldman
  0 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-06 17:00 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On 01/05/2015 11:01 PM, Scott Feldman wrote:
> On Wed, Dec 31, 2014 at 11:47 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> This adds rocker support for the net_flow_get_* operations. With this
>> we can interrogate rocker.
>>
>> Here we see that for static configurations enabling the get operations
>> is simply a matter of defining a pipeline model and returning the
>> structures for the core infrastructure to encapsulate into netlink
>> messages.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   drivers/net/ethernet/rocker/rocker.c          |   35 +
>>   drivers/net/ethernet/rocker/rocker_pipeline.h |  673 +++++++++++++++++++++++++
>>   2 files changed, 708 insertions(+)
>>   create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
>>
>> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>> index fded127..4c6787a 100644
>> --- a/drivers/net/ethernet/rocker/rocker.c
>> +++ b/drivers/net/ethernet/rocker/rocker.c
>> @@ -36,6 +36,7 @@
>>   #include <generated/utsrelease.h>
>>
>>   #include "rocker.h"
>> +#include "rocker_pipeline.h"
>>
>>   static const char rocker_driver_name[] = "rocker";
>>
>> @@ -3780,6 +3781,33 @@ static int rocker_port_switch_port_stp_update(struct net_device *dev, u8 state)
>>          return rocker_port_stp_update(rocker_port, state);
>>   }
>>
>> +#ifdef CONFIG_NET_FLOW_TABLES
>
> Can this #ifdef test be moved out of driver?  The if_flow core code
> can stub out operations if CONFIG_NET_FLOW_TABLES isn't defined.

Sure, this one is easy enough.

>
>> +static struct net_flow_table **rocker_get_tables(struct net_device *d)
>> +{
>> +       return rocker_table_list;
>> +}
>> +
>> +static struct net_flow_header **rocker_get_headers(struct net_device *d)
>> +{
>> +       return rocker_header_list;
>> +}
>> +
>> +static struct net_flow_action **rocker_get_actions(struct net_device *d)
>> +{
>> +       return rocker_action_list;
>> +}
>> +
>> +static struct net_flow_tbl_node **rocker_get_tgraph(struct net_device *d)
>> +{
>> +       return rocker_table_nodes;
>> +}
>> +
>> +static struct net_flow_hdr_node **rocker_get_hgraph(struct net_device *d)
>> +{
>> +       return rocker_header_nodes;
>> +}
>> +#endif
>> +
>>   static const struct net_device_ops rocker_port_netdev_ops = {
>>          .ndo_open                       = rocker_port_open,
>>          .ndo_stop                       = rocker_port_stop,
>> @@ -3794,6 +3822,13 @@ static const struct net_device_ops rocker_port_netdev_ops = {
>>          .ndo_bridge_getlink             = rocker_port_bridge_getlink,
>>          .ndo_switch_parent_id_get       = rocker_port_switch_parent_id_get,
>>          .ndo_switch_port_stp_update     = rocker_port_switch_port_stp_update,
>> +#ifdef CONFIG_NET_FLOW_TABLES
>
> same comment here

We could, although then we'd need some 'depends on' logic in Kconfig
to be sure CONFIG_NET_FLOW_TABLES is enabled. I think we want to be
able to strip this code out of the main core code paths when it's not
needed, which means wrapping it in the ifdef in netdevice.h.

>
>> +       .ndo_flow_get_tables            = rocker_get_tables,
>> +       .ndo_flow_get_headers           = rocker_get_headers,
>> +       .ndo_flow_get_actions           = rocker_get_actions,
>> +       .ndo_flow_get_tbl_graph         = rocker_get_tgraph,
>> +       .ndo_flow_get_hdr_graph         = rocker_get_hgraph,
>> +#endif
>>   };
>>
>>   /********************
>> diff --git a/drivers/net/ethernet/rocker/rocker_pipeline.h b/drivers/net/ethernet/rocker/rocker_pipeline.h
>> new file mode 100644
>> index 0000000..9544339
>> --- /dev/null
>> +++ b/drivers/net/ethernet/rocker/rocker_pipeline.h
>
> Add standard header info...copyright/license.
>
>> @@ -0,0 +1,673 @@
>> +#ifndef _MY_PIPELINE_H_
>> +#define _MY_PIPELINE_H_
>
> _ROCKER_PIPELINE_H_
>
>> +
>> +#include <linux/if_flow.h>
>> +
>> +/* header definition */
>> +#define HEADER_ETHERNET_SRC_MAC 1
>> +#define HEADER_ETHERNET_DST_MAC 2
>> +#define HEADER_ETHERNET_ETHERTYPE 3
>
> Use enum?
>

Yep, I changed these all to enums and ARRAY_SIZE macros, and use a
simpler null terminator throughout. Thanks.
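As a sketch of that convention (all names here are invented for the example, not from the patches): the IDs become enums with 0 reserved as the terminator, so walkers can stop at a zeroed sentinel instead of carrying explicit array sizes:

```c
#include <assert.h>  /* for the usage checks below */
#include <stddef.h>

/* 0 is reserved as the terminator, so real field IDs start at 1. */
enum example_header_ethernet_fields {
	EX_HEADER_ETHERNET_UNSPEC = 0,
	EX_HEADER_ETHERNET_SRC_MAC,
	EX_HEADER_ETHERNET_DST_MAC,
	EX_HEADER_ETHERNET_ETHERTYPE,
};

struct example_field_ref {
	int header;
	int field;
};

/* Last entry zeroed: the null terminator the review settles on. */
static const struct example_field_ref example_matches[] = {
	{ .header = 1, .field = EX_HEADER_ETHERNET_SRC_MAC },
	{ .header = 1, .field = EX_HEADER_ETHERNET_DST_MAC },
	{ 0 },
};

/* Walkers stop at the sentinel rather than tracking a length. */
static size_t example_count_matches(const struct example_field_ref *refs)
{
	size_t n = 0;

	while (refs[n].header)
		n++;
	return n;
}
```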

[...]

>> +/* headers graph */
>> +#define HEADER_INSTANCE_ETHERNET 1
>> +#define HEADER_INSTANCE_VLAN_OUTER 2
>> +#define HEADER_INSTANCE_IPV4 3
>> +#define HEADER_INSTANCE_IN_LPORT 4
>> +#define HEADER_INSTANCE_GOTO_TABLE 5
>> +#define HEADER_INSTANCE_GROUP_ID 6
>> +
>> +struct net_flow_jump_table parse_ethernet[3] = {
>> +       {
>> +               .field = {
>> +                  .header = HEADER_ETHERNET,
>> +                  .field = HEADER_ETHERNET_ETHERTYPE,
>> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
>> +                  .value_u16 = 0x0800,
>
> How is htons/ntohs conversions happening here?

My current stance is to leave everything in host order in the model
and let the drivers do conversions as needed. For example, some
drivers want the vlan vid in host order, others in network order. I
think it's more readable above than with hton*() throughout.

>
> Since these are network header fields, seems you want htons(0x0800).
>
>> +               },
>> +               .node = HEADER_INSTANCE_IPV4,
>> +       },
>> +       {
>> +               .field = {
>> +                  .header = HEADER_ETHERNET,
>> +                  .field = HEADER_ETHERNET_ETHERTYPE,
>> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
>> +                  .value_u16 = 0x8100,
>> +               },
>> +               .node = HEADER_INSTANCE_VLAN_OUTER,
>> +       },
>> +       {
>> +               .field = {0},
>> +               .node = 0,
>> +       },
>
> just use NULL,

Yep done throughout.

[...]

>> +/* table definition */
>> +struct net_flow_field_ref matches_ig_port[2] = {
>> +       { .instance = HEADER_INSTANCE_IN_LPORT,
>> +         .header = HEADER_METADATA,
>> +         .field = HEADER_METADATA_IN_LPORT,
>> +         .mask_type = NET_FLOW_MASK_TYPE_LPM},
>
> Need other mask type, not LPM.

v2 will have the additional mask types.

>
>
>> +struct net_flow_table *rocker_table_list[7] = {
>> +       &ingress_port_table,
>> +       &vlan_table,
>> +       &term_mac_table,
>> +       &ucast_routing_table,
>> +       &bridge_table,
>> +       &acl_table,
>> +       &null_table,
>> +};
>
> cool stuff

A bit of work to get here, but sort of fun to start defining
pipelines like this.

[...]

>> +struct net_flow_tbl_node *rocker_table_nodes[7] = {
>> +       &table_node_ingress_port,
>> +       &table_node_vlan,
>> +       &table_node_term_mac,
>> +       &table_node_ucast_routing,
>> +       &table_node_bridge,
>> +       &table_node_acl,
>> +       &table_node_nil,
>> +};
>
> Cool...getting tired but will review this again in v2

Great thanks for the detailed feedback.

>
>> +#endif /*_MY_PIPELINE_H*/
>
> ROCKER
>
>>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2015-01-06 17:00     ` John Fastabend
@ 2015-01-06 17:16       ` Scott Feldman
  2015-01-06 17:49         ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Scott Feldman @ 2015-01-06 17:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On Tue, Jan 6, 2015 at 9:00 AM, John Fastabend <john.fastabend@gmail.com> wrote:
> On 01/05/2015 11:01 PM, Scott Feldman wrote:
>>> +
>>> +struct net_flow_jump_table parse_ethernet[3] = {
>>> +       {
>>> +               .field = {
>>> +                  .header = HEADER_ETHERNET,
>>> +                  .field = HEADER_ETHERNET_ETHERTYPE,
>>> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
>>> +                  .value_u16 = 0x0800,

ETH_P_IP, etc

>>
>>
>> How is htons/ntohs conversions happening here?
>
>
> my current stance is to leave everything in host order in the model
> and let the drivers do conversions as needed. For example, some drivers
> want the vlan vid in host order, others network order. I think it's
> more readable above than with hton*() throughout.

Hmmm...I would argue adding htons/htonl makes it more readable in the
sense that it's a reminder that this is a field in a network header,
to be used for matching against packet headers, which use
network-ordering.  Store the field in the order best for comparison
with the raw pkt data.  Drivers may still need to do some conversion
if the field is programmed in hardware in a diff order.
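Scott's approach, storing the value in network order so it compares byte-for-byte against raw packet data, can be sketched as below. The ETH_P_* values are the standard 0x0800/0x8100 ethertypes; `matches_wire` is a hypothetical helper, not part of the posted API.

```c
/* Sketch: a field value kept in network byte order compares
 * directly (memcmp) against the wire bytes of a packet header,
 * with no per-packet byte swapping, on any host endianness. */
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_P_IP	0x0800	/* IPv4 ethertype */
#define ETH_P_8021Q	0x8100	/* 802.1Q VLAN ethertype */

static int matches_wire(uint16_t stored_be, const uint8_t *pkt)
{
	/* network-order storage matches the wire byte-for-byte */
	return memcmp(&stored_be, pkt, sizeof(stored_be)) == 0;
}
```

A match table initialized with `htons(ETH_P_IP)` can then be compared against the raw frame without knowing the host's endianness, which is the readability argument being made above.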


* Re: [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch
  2015-01-06 17:16       ` Scott Feldman
@ 2015-01-06 17:49         ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-06 17:49 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On 01/06/2015 09:16 AM, Scott Feldman wrote:
> On Tue, Jan 6, 2015 at 9:00 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>> On 01/05/2015 11:01 PM, Scott Feldman wrote:
>>>> +
>>>> +struct net_flow_jump_table parse_ethernet[3] = {
>>>> +       {
>>>> +               .field = {
>>>> +                  .header = HEADER_ETHERNET,
>>>> +                  .field = HEADER_ETHERNET_ETHERTYPE,
>>>> +                  .type = NET_FLOW_FIELD_REF_ATTR_TYPE_U16,
>>>> +                  .value_u16 = 0x0800,
>
> ETH_P_IP, etc
>
>>>
>>>
>>> How is htons/ntohs conversions happening here?
>>
>>
>> my current stance is to leave everything in host order in the model
>> and let the drivers do conversions as needed. For example, some drivers
>> want the vlan vid in host order, others network order. I think it's
>> more readable above than with hton*() throughout.
>
> Hmmm...I would argue adding htons/htonl makes it more readable in the
> sense that it's a reminder that this is a field in a network header,
> to be used for matching against packet headers, which use
> network-ordering.  Store the field in the order best for comparison
> with the raw pkt data.  Drivers may still need to do some conversion
> if the field is programmed in hardware in a diff order.
>

Easy enough here, but then when we set_flows, what do we use:
network order or host order? If it can be network order in some
cases and host order in others, it's hard to resolve programmatically.
Humans at a CLI can most likely get it right for well-known fields
such as VLAN IDs, but for less common fields (maybe proprietary)
or management software it gets tricky.

I guess we could add a flag to indicate byte ordering.
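That flag idea could be sketched roughly as follows. The enum, struct, and helper names here are invented for illustration and are not part of the posted API.

```c
/* Hypothetical sketch of a per-field byte-order flag: the caller
 * annotates which order the value is in, and a helper normalizes
 * to host order, so management software never has to guess. */
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

enum demo_byte_order {
	DEMO_ORDER_HOST,
	DEMO_ORDER_NETWORK,
};

struct demo_field_value {
	enum demo_byte_order order;
	uint16_t value_u16;
};

/* return the value in host order regardless of how it was supplied */
static uint16_t demo_value_host(const struct demo_field_value *v)
{
	return v->order == DEMO_ORDER_NETWORK ? ntohs(v->value_u16)
					      : v->value_u16;
}
```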

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 08/11] net: rocker: add get flow API operation
  2015-01-06 16:57       ` Scott Feldman
@ 2015-01-06 17:50         ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-06 17:50 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Thomas Graf, Jiří Pírko, Jamal Hadi Salim,
	simon.horman, Netdev, David S. Miller, Andy Gospodarek

On 01/06/2015 08:57 AM, Scott Feldman wrote:
> On Tue, Jan 6, 2015 at 6:59 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>> On 01/05/2015 11:40 PM, Scott Feldman wrote:
>>>
>>> On Wed, Dec 31, 2014 at 11:48 AM, John Fastabend
>>> <john.fastabend@gmail.com> wrote:
>>>>
>>>> Add operations to get flows. I wouldn't mind cleaning this code
>>>> up a bit, but my first attempt to do this used macros which shortened
>>>> the code up but when I was done I decided it just made the code
>>>> unreadable and unmaintainable.
>>>>
>>>> I might think about it a bit more but this implementation, albeit
>>>> a bit long and repetitive, is easier to understand IMO.
>>>
>>>
>>> Dang, you put a lot of work into this one.
>>>
>>> Something doesn't feel right though.  In this case, rocker driver just
>>> happened to have cached all the flow/group stuff in hash tables in
>>> software, so you don't need to query thru to the device to extract the
>>> if_flow info.  What doesn't feel right is all the work need in the
>>> driver.  For each and every driver.  get_flows needs to go above
>>> driver, somehow.
>>
>>
>> Another option is to have a software cache in flow_table.c. I
>> was trying to avoid caching as I really don't expect 'get' operations
>> to be fast path, and going to hardware seems good enough for me.
>> Other than that, it's a bit annoying to write the mapping code.
>
> Caching in flow_table.c seems best to me as drivers/devices don't need
> to be involved and the cache can serve multiple users of the API.
> Are there cases where the device could get flow table entries
> installed/deleted outside the API?  For example, if the device was
> learning MAC addresses, and did automatic table insertions.  We worked
> around that case with the recent L2 swdev support by pushing learned
> MAC addrs up to bridge's FDB so software and hardware tables stay
> synced.
>

OK I guess I'm convinced. I'll go ahead and cache the flow entries in
software. I'll work this into v2.
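A minimal sketch of such a software-side cache, keyed by (table_id, uid), is below. The fixed array and every name here are stand-ins for whatever hashing the kernel side would actually use (e.g. rhashtable); only the idea — answer get_flows from a kept copy instead of a device round trip — comes from the thread.

```c
/* Sketch: flow_table.c keeps a copy of each programmed entry so
 * 'get' operations can be served without querying the device. */
#include <assert.h>
#include <stddef.h>

struct demo_flow {
	int table_id;
	int uid;
	int priority;
	int in_use;
};

#define DEMO_CACHE_SLOTS 16
static struct demo_flow demo_cache[DEMO_CACHE_SLOTS];

/* record a flow at set time; returns -1 when the cache is full */
static int demo_cache_set(int table_id, int uid, int priority)
{
	int i;

	for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
		if (!demo_cache[i].in_use) {
			demo_cache[i] = (struct demo_flow){
				table_id, uid, priority, 1 };
			return 0;
		}
	}
	return -1;
}

/* answer a get without touching hardware; NULL means not programmed */
static struct demo_flow *demo_cache_get(int table_id, int uid)
{
	int i;

	for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
		if (demo_cache[i].in_use &&
		    demo_cache[i].table_id == table_id &&
		    demo_cache[i].uid == uid)
			return &demo_cache[i];
	}
	return NULL;
}
```

Note Scott's caveat still applies: entries the device installs on its own (e.g. learned MACs) would have to be pushed up into the cache to keep software and hardware tables in sync.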

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-05 18:59     ` John Fastabend
                         ` (2 preceding siblings ...)
  2015-01-06  0:45       ` John Fastabend
@ 2015-01-07 10:07       ` Or Gerlitz
  2015-01-07 16:35         ` John Fastabend
  3 siblings, 1 reply; 60+ messages in thread
From: Or Gerlitz @ 2015-01-07 10:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On Mon, Jan 5, 2015 at 8:59 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>>> +                                          struct net_device *dev,
>>> +                                          u32 portid, int seq, u8 cmd)
>>> +{
>>> +       struct genlmsghdr *hdr;
>>> +       struct sk_buff *skb;
>>> +       int err = -ENOBUFS;
>>> +
>>> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>
>>
>> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
>
> fixed along with the other cases.

small nit here: net_flow_build_actions_msg can be made static; it's
called only from within this file.

A few more nits... checkpatch --strict produces a bunch of "CHECK: Please
use a blank line after function/struct/union/enum declarations"
comments, I guess worth fixing too.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-07 10:07       ` Or Gerlitz
@ 2015-01-07 16:35         ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-07 16:35 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On 01/07/2015 02:07 AM, Or Gerlitz wrote:
> On Mon, Jan 5, 2015 at 8:59 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>>>> +                                          struct net_device *dev,
>>>> +                                          u32 portid, int seq, u8 cmd)
>>>> +{
>>>> +       struct genlmsghdr *hdr;
>>>> +       struct sk_buff *skb;
>>>> +       int err = -ENOBUFS;
>>>> +
>>>> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>>
>>>
>>> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>
>>
>> fixed along with the other cases.
>
> small nit here: net_flow_build_actions_msg can be made static; it's
> called only from within this file.
>
> A few more nits... checkpatch --strict produces a bunch of "CHECK: Please
> use a blank line after function/struct/union/enum declarations"
> comments, I guess worth fixing too.
>

Thanks. will fix in v2.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 00/11] A flow API
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (13 preceding siblings ...)
  2015-01-06 12:23 ` Jamal Hadi Salim
@ 2015-01-08 15:14 ` Or Gerlitz
  2015-01-09 17:26   ` John Fastabend
  2015-01-08 18:03 ` Jiri Pirko
  15 siblings, 1 reply; 60+ messages in thread
From: Or Gerlitz @ 2015-01-08 15:14 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On Wed, Dec 31, 2014 at 9:45 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> For some examples and maybe a bit more illustrative description I
> posted a quickly typed up set of notes on github io pages. Here we
> can show the description along with images produced by the flow tool
> showing the pipeline. Once we settle a bit more on the API we should
> probably do a clean up of this and other threads happening and commit
> something to the Documentation directory.
>
>  http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html

John, Going over your excellent tutorial, the distinction between
action and operation isn’t clear... specifically, the paragraph
“Although this gives us a list of actions we can perform on a packet
and a set of argument to give the action so we can use them it does
not supply the operations performed on the packet” is a bit vague.

Or.


* Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow
  2014-12-31 19:46 ` [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow John Fastabend
  2015-01-06  6:19   ` Scott Feldman
@ 2015-01-08 17:39   ` Jiri Pirko
  2015-01-09  6:21     ` John Fastabend
  1 sibling, 1 reply; 60+ messages in thread
From: Jiri Pirko @ 2015-01-08 17:39 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

Wed, Dec 31, 2014 at 08:46:16PM CET, john.fastabend@gmail.com wrote:
>Now that the device capabilities are exposed we can add support to
>add and delete flows from the tables.
>
>The two operations are
>
>table_set_flows :
>
>  The set flow operations is used to program a set of flows into a
>  hardware device table. The message is consumed via netlink encoded
>  message which is then decoded into a null terminated array of
>  flow entry structures. A flow entry structure is defined as
>
>     struct net_flow_flow {
>			  int table_id;
>			  int uid;
>			  int priority;
>			  struct net_flow_field_ref *matches;
>			  struct net_flow_action *actions;
>     }
>
>  The table id is the _uid_ returned from 'get_tables' operations.
>  Matches is a set of match criteria for packets with a logical AND
>  operation done on the set so packets match the entire criteria.
>  Actions provide a set of actions to perform when the flow rule is
>  hit. Both matches and actions are null terminated arrays.
>
>  The flows are configured in hardware using an ndo op. We do not
>  provide a commit operation at the moment and expect hardware to
>  commit the flows one at a time. Future work may require a commit
>  operation to tell the hardware we are done loading flow rules. On
>  some hardware this will help bulk updates.
>
>  It's possible for hardware to return an error from a flow set
>  operation. This can occur for many reasons both transient and
>  resource constraints. We have different error handling strategies
>  built in and listed here,
>
>    *_ERROR_ABORT      abort on first error with errmsg
>
>    *_ERROR_CONTINUE   continue programming flows no errmsg
>
>    *_ERROR_ABORT_LOG  abort on first error and return flow that
> 		       failed to user space in reply msg
>
>    *_ERROR_CONT_LOG   continue programming flows and return a list
>		       of flows that failed to user space in a reply
>		       msg.
>
>  notably missing is a rollback error strategy. I don't have a
>  use for this in software yet but the strategy can be added with
>  *_ERROR_ROLLBACK for example.
>
>table_del_flows
>
>  The delete flow operation uses the same structures and error
>  handling strategies as the table_set_flows operations, although
>  delete messages omit the matches/actions arrays because they are
>  not needed to look up the flow.
>
>Also thanks to Simon Horman for fixes and other help.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> include/linux/if_flow.h      |   21 ++
> include/linux/netdevice.h    |    8 +
> include/uapi/linux/if_flow.h |   49 ++++
> net/core/flow_table.c        |  501 ++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 579 insertions(+)
>
>diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
>index 1b6c1ea..20fa752 100644
>--- a/include/linux/if_flow.h
>+++ b/include/linux/if_flow.h
>@@ -90,4 +90,25 @@ struct net_flow_tbl_node {
> 	__u32 flags;
> 	struct net_flow_jump_table *jump;
> };
>+
>+/**
>+ * @struct net_flow_flow
>+ * @brief describes the match/action entry
>+ *
>+ * @uid unique identifier for flow
>+ * @priority priority to execute flow match/action in table
>+ * @match null terminated set of match uids match criteria
>+ * @action null terminated set of action uids to apply to match
>+ *
>+ * Flows must match all entries in match set.
>+ */
>+struct net_flow_flow {
>+	int table_id;
>+	int uid;
>+	int priority;
>+	struct net_flow_field_ref *matches;
>+	struct net_flow_action *actions;
>+};
>+
>+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow);
> #endif
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 3c3c856..be8d4e4 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1197,6 +1197,14 @@ struct net_device_ops {
> 	struct net_flow_header	**(*ndo_flow_get_headers)(struct net_device *dev);
> 	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
> 	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
>+	int		        (*ndo_flow_get_flows)(struct sk_buff *skb,
>+						      struct net_device *dev,
>+						      int table,
>+						      int min, int max);
>+	int		        (*ndo_flow_set_flows)(struct net_device *dev,
>+						      struct net_flow_flow *f);
>+	int		        (*ndo_flow_del_flows)(struct net_device *dev,
>+						      struct net_flow_flow *f);
> #endif
> };
> 
>diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
>index 2acdb38..125cdc6 100644
>--- a/include/uapi/linux/if_flow.h
>+++ b/include/uapi/linux/if_flow.h
>@@ -329,6 +329,48 @@ enum {
> #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
> 
> enum {
>+	NET_FLOW_NET_FLOW_UNSPEC,
>+	NET_FLOW_FLOW,
>+	__NET_FLOW_NET_FLOW_MAX,
>+};
>+#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1)
>+
>+enum {
>+	NET_FLOW_TABLE_FLOWS_UNSPEC,
>+	NET_FLOW_TABLE_FLOWS_TABLE,
>+	NET_FLOW_TABLE_FLOWS_MINPRIO,
>+	NET_FLOW_TABLE_FLOWS_MAXPRIO,
>+	NET_FLOW_TABLE_FLOWS_FLOWS,
>+	__NET_FLOW_TABLE_FLOWS_MAX,
>+};
>+#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1)
>+
>+enum {
>+	/* Abort with normal errmsg */
>+	NET_FLOW_FLOWS_ERROR_ABORT,
>+	/* Ignore errors and continue without logging */
>+	NET_FLOW_FLOWS_ERROR_CONTINUE,
>+	/* Abort and reply with invalid flow fields */
>+	NET_FLOW_FLOWS_ERROR_ABORT_LOG,
>+	/* Continue and reply with list of invalid flows */
>+	NET_FLOW_FLOWS_ERROR_CONT_LOG,
>+	__NET_FLOWS_FLOWS_ERROR_MAX,
>+};
>+#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1)
>+
>+enum {
>+	NET_FLOW_ATTR_UNSPEC,
>+	NET_FLOW_ATTR_ERROR,
>+	NET_FLOW_ATTR_TABLE,
>+	NET_FLOW_ATTR_UID,
>+	NET_FLOW_ATTR_PRIORITY,
>+	NET_FLOW_ATTR_MATCHES,
>+	NET_FLOW_ATTR_ACTIONS,
>+	__NET_FLOW_ATTR_MAX,
>+};
>+#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1)
>+
>+enum {
> 	NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
> };
> 
>@@ -343,6 +385,9 @@ enum {
> 	NET_FLOW_HEADER_GRAPH,
> 	NET_FLOW_TABLE_GRAPH,
> 
>+	NET_FLOW_FLOWS,
>+	NET_FLOW_FLOWS_ERROR,
>+
> 	__NET_FLOW_MAX,
> 	NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
> };
>@@ -354,6 +399,10 @@ enum {
> 	NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
> 	NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
> 
>+	NET_FLOW_TABLE_CMD_GET_FLOWS,
>+	NET_FLOW_TABLE_CMD_SET_FLOWS,
>+	NET_FLOW_TABLE_CMD_DEL_FLOWS,
>+
> 	__NET_FLOW_CMD_MAX,
> 	NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
> };
>diff --git a/net/core/flow_table.c b/net/core/flow_table.c
>index ec3f06d..f4cf293 100644
>--- a/net/core/flow_table.c
>+++ b/net/core/flow_table.c
>@@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
> 	return genlmsg_reply(msg, info);
> }
> 
>+static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
>+						u32 portid, int seq, u8 cmd,
>+						int min, int max, int table)
>+{
>+	struct genlmsghdr *hdr;
>+	struct nlattr *flows;
>+	struct sk_buff *skb;
>+	int err = -ENOBUFS;
>+
>+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>+	if (!skb)
>+		return ERR_PTR(-ENOBUFS);
>+
>+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
>+	if (!hdr)
>+		goto out;
>+
>+	if (nla_put_u32(skb,
>+			NET_FLOW_IDENTIFIER_TYPE,
>+			NET_FLOW_IDENTIFIER_IFINDEX) ||
>+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
>+		err = -ENOBUFS;
>+		goto out;
>+	}
>+
>+	flows = nla_nest_start(skb, NET_FLOW_FLOWS);
>+	if (!flows) {
>+		err = -EMSGSIZE;
>+		goto out;
>+	}
>+
>+	err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max);
>+	if (err < 0)
>+		goto out_cancel;
>+
>+	nla_nest_end(skb, flows);
>+
>+	err = genlmsg_end(skb, hdr);
>+	if (err < 0)
>+		goto out;
>+
>+	return skb;
>+out_cancel:
>+	nla_nest_cancel(skb, flows);
>+out:
>+	nlmsg_free(skb);
>+	return ERR_PTR(err);
>+}
>+
>+static const
>+struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = {
>+	[NET_FLOW_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
>+};
>+
>+static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
>+					struct genl_info *info)
>+{
>+	struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1];
>+	int table, min = -1, max = -1;
>+	struct net_device *dev;
>+	struct sk_buff *msg;
>+	int err = -EINVAL;
>+
>+	dev = net_flow_get_dev(info);
>+	if (!dev)
>+		return -EINVAL;
>+
>+	if (!dev->netdev_ops->ndo_flow_get_flows) {
>+		dev_put(dev);
>+		return -EOPNOTSUPP;
>+	}
>+
>+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
>+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
>+	    !info->attrs[NET_FLOW_FLOWS])
>+		goto out;
>+
>+	err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX,
>+			       info->attrs[NET_FLOW_FLOWS],
>+			       net_flow_table_flows_policy);
>+	if (err)
>+		goto out;
>+
>+	if (!tb[NET_FLOW_TABLE_FLOWS_TABLE])
>+		goto out;
>+
>+	table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]);
>+
>+	if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO])
>+		min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]);
>+	if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO])
>+		max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]);
>+
>+	msg = net_flow_build_flows_msg(dev,
>+				       info->snd_portid,
>+				       info->snd_seq,
>+				       NET_FLOW_TABLE_CMD_GET_FLOWS,
>+				       min, max, table);
>+	dev_put(dev);
>+
>+	if (IS_ERR(msg))
>+		return PTR_ERR(msg);
>+
>+	return genlmsg_reply(msg, info);
>+out:
>+	dev_put(dev);
>+	return err;
>+}
>+
>+static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
>+					     struct genlmsghdr **hdr,
>+					     u32 portid, int seq, u8 cmd)
>+{
>+	struct genlmsghdr *h;
>+	struct sk_buff *skb;
>+
>+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>+	if (!skb)
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
>+	if (!h)
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	if (nla_put_u32(skb,
>+			NET_FLOW_IDENTIFIER_TYPE,
>+			NET_FLOW_IDENTIFIER_IFINDEX) ||
>+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex))
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	*hdr = h;
>+	return skb;
>+}
>+
>+static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
>+						struct genlmsghdr *hdr)
>+{
>+	int err;
>+
>+	err = genlmsg_end(skb, hdr);
>+	if (err < 0) {
>+		nlmsg_free(skb);
>+		return ERR_PTR(err);
>+	}
>+
>+	return skb;
>+}
>+
>+static int net_flow_put_flow_action(struct sk_buff *skb,
>+				    struct net_flow_action *a)
>+{
>+	struct nlattr *action, *sigs;
>+	int i, err = 0;
>+
>+	action = nla_nest_start(skb, NET_FLOW_ACTION);
>+	if (!action)
>+		return -EMSGSIZE;
>+
>+	if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
>+		return -EMSGSIZE;
>+
>+	if (!a->args)
>+		goto done;
>+
>+	for (i = 0; a->args[i].type; i++) {
>+		sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
>+		if (!sigs) {
>+			nla_nest_cancel(skb, action);
>+			return -EMSGSIZE;
>+		}
>+
>+		err = net_flow_put_act_types(skb, a[i].args);
>+		if (err) {
>+			nla_nest_cancel(skb, action);
>+			nla_nest_cancel(skb, sigs);
>+			return err;
>+		}
>+		nla_nest_end(skb, sigs);
>+	}
>+
>+done:
>+	nla_nest_end(skb, action);
>+	return 0;
>+}
>+
>+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow)
>+{
>+	struct nlattr *flows, *matches;
>+	struct nlattr *actions = NULL; /* must be null to unwind */
>+	int err, j, i = 0;
>+
>+	flows = nla_nest_start(skb, NET_FLOW_FLOW);
>+	if (!flows)
>+		goto put_failure;
>+
>+	if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) ||
>+	    nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) ||
>+	    nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority))
>+		goto flows_put_failure;
>+
>+	if (flow->matches) {
>+		matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES);
>+		if (!matches)
>+			goto flows_put_failure;
>+
>+		for (j = 0; flow->matches && flow->matches[j].header; j++) {
>+			struct net_flow_field_ref *f = &flow->matches[j];
>+
>+			if (!f->header)
>+				continue;
>+
>+			nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f);
>+		}
>+		nla_nest_end(skb, matches);
>+	}
>+
>+	if (flow->actions) {
>+		actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS);
>+		if (!actions)
>+			goto flows_put_failure;
>+
>+		for (i = 0; flow->actions && flow->actions[i].uid; i++) {
>+			err = net_flow_put_flow_action(skb, &flow->actions[i]);
>+			if (err) {
>+				nla_nest_cancel(skb, actions);
>+				goto flows_put_failure;
>+			}
>+		}
>+		nla_nest_end(skb, actions);
>+	}
>+
>+	nla_nest_end(skb, flows);
>+	return 0;
>+
>+flows_put_failure:
>+	nla_nest_cancel(skb, flows);
>+put_failure:
>+	return -EMSGSIZE;
>+}
>+EXPORT_SYMBOL(net_flow_put_flow);
>+
>+static int net_flow_get_field(struct net_flow_field_ref *field,
>+			      struct nlattr *nla)
>+{
>+	if (nla_type(nla) != NET_FLOW_FIELD_REF)
>+		return -EINVAL;
>+
>+	if (nla_len(nla) < sizeof(*field))
>+		return -EINVAL;
>+
>+	*field = *(struct net_flow_field_ref *)nla_data(nla);
>+	return 0;
>+}
>+
>+static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
>+{
>+	struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1];
>+	struct nlattr *args;
>+	int rem;
>+	int err, count = 0;
>+
>+	if (nla_type(attr) != NET_FLOW_ACTION) {
>+		pr_warn("%s: expected NET_FLOW_ACTION\n", __func__);
>+		return 0;
>+	}
>+
>+	err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX,
>+			       attr, net_flow_action_policy);
>+	if (err < 0)
>+		return err;
>+
>+	if (!act[NET_FLOW_ACTION_ATTR_UID] ||
>+	    !act[NET_FLOW_ACTION_ATTR_SIGNATURE])
>+		return -EINVAL;
>+
>+	a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]);
>+
>+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem)
>+		count++; /* unoptimized max possible */
>+
>+	a->args = kcalloc(count + 1,
>+			  sizeof(struct net_flow_action_arg),
>+			  GFP_KERNEL);
>+	count = 0;
>+
>+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) {
>+		if (nla_type(args) != NET_FLOW_ACTION_ARG)
>+			continue;
>+
>+		if (nla_len(args) < sizeof(struct net_flow_action_arg)) {
>+			kfree(a->args);
>+			return -EINVAL;
>+		}
>+
>+		a->args[count] = *(struct net_flow_action_arg *)nla_data(args);
>+	}
>+	return 0;
>+}
>+
>+static const
>+struct nla_policy net_flow_flow_policy[NET_FLOW_ATTR_MAX + 1] = {
>+	[NET_FLOW_ATTR_TABLE]		= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_UID]		= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_PRIORITY]	= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_MATCHES]		= { .type = NLA_NESTED },
>+	[NET_FLOW_ATTR_ACTIONS]		= { .type = NLA_NESTED },
>+};
>+
>+static int net_flow_get_flow(struct net_flow_flow *flow, struct nlattr *attr)
>+{
>+	struct nlattr *f[NET_FLOW_ATTR_MAX+1];
>+	struct nlattr *attr2;
>+	int rem, err;
>+	int count = 0;
>+
>+	err = nla_parse_nested(f, NET_FLOW_ATTR_MAX,
>+			       attr, net_flow_flow_policy);
>+	if (err < 0)
>+		return -EINVAL;
>+
>+	if (!f[NET_FLOW_ATTR_TABLE] || !f[NET_FLOW_ATTR_UID] ||
>+	    !f[NET_FLOW_ATTR_PRIORITY])
>+		return -EINVAL;
>+
>+	flow->table_id = nla_get_u32(f[NET_FLOW_ATTR_TABLE]);
>+	flow->uid = nla_get_u32(f[NET_FLOW_ATTR_UID]);
>+	flow->priority = nla_get_u32(f[NET_FLOW_ATTR_PRIORITY]);
>+
>+	flow->matches = NULL;
>+	flow->actions = NULL;
>+
>+	if (f[NET_FLOW_ATTR_MATCHES]) {
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem)
>+			count++;
>+
>+		/* Null terminated list of matches */
>+		flow->matches = kcalloc(count + 1,
>+					sizeof(struct net_flow_field_ref),
>+					GFP_KERNEL);
>+		if (!flow->matches)
>+			return -ENOMEM;
>+
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem) {
>+			err = net_flow_get_field(&flow->matches[count], attr2);
>+			if (err) {
>+				kfree(flow->matches);
>+				return err;
>+			}
>+			count++;
>+		}
>+	}
>+
>+	if (f[NET_FLOW_ATTR_ACTIONS]) {
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem)
>+			count++;
>+
>+		/* Null terminated list of actions */
>+		flow->actions = kcalloc(count + 1,
>+					sizeof(struct net_flow_action),
>+					GFP_KERNEL);
>+		if (!flow->actions) {
>+			kfree(flow->matches);
>+			return -ENOMEM;
>+		}
>+
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem) {
>+			err = net_flow_get_action(&flow->actions[count], attr2);
>+			if (err) {
>+				kfree(flow->matches);
>+				kfree(flow->actions);
>+				return err;
>+			}
>+			count++;
>+		}
>+	}
>+
>+	return 0;
>+}
>+
>+static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
>+				    struct genl_info *info)
>+{
>+	int rem, err_handle = NET_FLOW_FLOWS_ERROR_ABORT;
>+	struct sk_buff *skb = NULL;
>+	struct net_flow_flow this;
>+	struct genlmsghdr *hdr;
>+	struct net_device *dev;
>+	struct nlattr *flow, *flows;
>+	int cmd = info->genlhdr->cmd;
>+	int err = -EOPNOTSUPP;

I don't like the inconsistency in var naming. Sometimes, "flow" is of type
struct nlattr, sometimes it is of type struct net_flow_flow
(net_flow_get_flow). It is slightly confusing.

>+
>+	dev = net_flow_get_dev(info);
>+	if (!dev)
>+		return -EINVAL;
>+
>+	if (!dev->netdev_ops->ndo_flow_set_flows ||
>+	    !dev->netdev_ops->ndo_flow_del_flows)
>+		goto out;
>+
>+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
>+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
>+	    !info->attrs[NET_FLOW_FLOWS]) {
>+		err = -EINVAL;
>+		goto out;
>+	}
>+
>+	if (info->attrs[NET_FLOW_FLOWS_ERROR])
>+		err_handle = nla_get_u32(info->attrs[NET_FLOW_FLOWS_ERROR]);
>+
>+	nla_for_each_nested(flow, info->attrs[NET_FLOW_FLOWS], rem) {
>+		if (nla_type(flow) != NET_FLOW_FLOW)
>+			continue;
>+
>+		err = net_flow_get_flow(&this, flow);
>+		if (err)
>+			goto out;
>+
>+		switch (cmd) {
>+		case NET_FLOW_TABLE_CMD_SET_FLOWS:
>+			err = dev->netdev_ops->ndo_flow_set_flows(dev, &this);
>+			break;
>+		case NET_FLOW_TABLE_CMD_DEL_FLOWS:
>+			err = dev->netdev_ops->ndo_flow_del_flows(dev, &this);
>+			break;
>+		default:
>+			err = -EOPNOTSUPP;
>+			break;
>+		}
>+
>+		if (err && err_handle != NET_FLOW_FLOWS_ERROR_CONTINUE) {
>+			if (!skb) {
>+				skb = net_flow_start_errmsg(dev, &hdr,
>+							    info->snd_portid,
>+							    info->snd_seq,
>+							    cmd);
>+				if (IS_ERR(skb)) {
>+					err = PTR_ERR(skb);
>+					goto out_plus_free;
>+				}
>+
>+				flows = nla_nest_start(skb, NET_FLOW_FLOWS);
>+				if (!flows) {
>+					err = -EMSGSIZE;
>+					goto out_plus_free;
>+				}
>+			}
>+
>+			net_flow_put_flow(skb, &this);
>+		}
>+
>+		/* Cleanup flow */
>+		kfree(this.matches);
>+		kfree(this.actions);
>+
>+		if (err && err_handle == NET_FLOW_FLOWS_ERROR_ABORT)
>+			goto out;
>+	}
>+
>+	dev_put(dev);
>+
>+	if (skb) {
>+		nla_nest_end(skb, flows);
>+		net_flow_end_flow_errmsg(skb, hdr);
>+		return genlmsg_reply(skb, info);
>+	}
>+	return 0;
>+
>+out_plus_free:
>+	kfree(this.matches);
>+	kfree(this.actions);

	Maybe this can be done by some "flow_free" helper...

>+out:
>+	if (skb)
>+		nlmsg_free(skb);
>+	dev_put(dev);
>+	return -EINVAL;
>+}
>+
> static const struct nla_policy net_flow_cmd_policy[NET_FLOW_MAX + 1] = {
> 	[NET_FLOW_IDENTIFIER_TYPE] = {.type = NLA_U32, },
> 	[NET_FLOW_IDENTIFIER]	   = {.type = NLA_U32, },
>@@ -815,6 +1298,24 @@ static const struct genl_ops net_flow_table_nl_ops[] = {
> 		.policy = net_flow_cmd_policy,
> 		.flags = GENL_ADMIN_PERM,
> 	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_GET_FLOWS,
>+		.doit = net_flow_table_cmd_get_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_SET_FLOWS,
>+		.doit = net_flow_table_cmd_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_DEL_FLOWS,
>+		.doit = net_flow_table_cmd_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
> };
> 
> static int __init net_flow_nl_module_init(void)
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables
  2014-12-31 19:46 ` [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables John Fastabend
@ 2015-01-08 17:41   ` Jiri Pirko
  2015-01-09  6:17     ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Jiri Pirko @ 2015-01-08 17:41 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

Wed, Dec 31, 2014 at 08:46:44PM CET, john.fastabend@gmail.com wrote:
>Actions may not always be applied after exiting a table. For example
>some pipelines may accumulate actions and then apply them at the end
>of a pipeline.
>
>To model this we use a table type called APPLY. Tables who share an
>apply identifier have their actions applied in one step.

Why is this a separate patch? Perhaps this can be squashed into one of the
previous ones?

>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> include/linux/if_flow.h      |    1 +
> include/uapi/linux/if_flow.h |    1 +
> net/core/flow_table.c        |    1 +
> 3 files changed, 3 insertions(+)
>
>diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
>index 20fa752..a042a3d 100644
>--- a/include/linux/if_flow.h
>+++ b/include/linux/if_flow.h
>@@ -67,6 +67,7 @@ struct net_flow_table {
> 	char name[NET_FLOW_NAMSIZ];
> 	int uid;
> 	int source;
>+	int apply_action;
> 	int size;
> 	struct net_flow_field_ref *matches;
> 	int *actions;
>diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
>index 125cdc6..3c1a860 100644
>--- a/include/uapi/linux/if_flow.h
>+++ b/include/uapi/linux/if_flow.h
>@@ -265,6 +265,7 @@ enum {
> 	NET_FLOW_TABLE_ATTR_NAME,
> 	NET_FLOW_TABLE_ATTR_UID,
> 	NET_FLOW_TABLE_ATTR_SOURCE,
>+	NET_FLOW_TABLE_ATTR_APPLY,
> 	NET_FLOW_TABLE_ATTR_SIZE,
> 	NET_FLOW_TABLE_ATTR_MATCHES,
> 	NET_FLOW_TABLE_ATTR_ACTIONS,
>diff --git a/net/core/flow_table.c b/net/core/flow_table.c
>index f4cf293..97cdf92 100644
>--- a/net/core/flow_table.c
>+++ b/net/core/flow_table.c
>@@ -223,6 +223,7 @@ static int net_flow_put_table(struct net_device *dev,
> 	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
>+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_APPLY, t->apply_action) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
> 		return -EMSGSIZE;
> 
>


* Re: [net-next PATCH v1 00/11] A flow API
  2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
                   ` (14 preceding siblings ...)
  2015-01-08 15:14 ` Or Gerlitz
@ 2015-01-08 18:03 ` Jiri Pirko
  2015-01-09 18:10   ` John Fastabend
  15 siblings, 1 reply; 60+ messages in thread
From: Jiri Pirko @ 2015-01-08 18:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

Wed, Dec 31, 2014 at 08:45:19PM CET, john.fastabend@gmail.com wrote:
>So... I could continue to mull over this and tweak bits and pieces
>here and there but I decided its best to get a wider group of folks
>looking at it and hopefulyl with any luck using it so here it is.
>
>This set creates a new netlink family and set of messages to configure
>flow tables in hardware. I tried to make the commit messages
>reasonably verbose at least in the flow_table patches.
>
>What we get at the end of this series is a working API to get device
>capabilities and program flows using the rocker switch.
>
>I created a user space tool 'flow' that I use to configure and query
>the devices it is posted here,
>
>	https://github.com/jrfastab/iprotue2-flow-tool
>
>For now it is a stand-alone tool but once the kernel bits get sorted
>out (I'm guessing there will need to be a few versions of this series
>to get it right) I would like to port it into the iproute2 package.
>This way we can keep all of our tooling in one package see 'bridge'
>for example.
>
>As far as testing, I've tested various combinations of tables and
>rules on the rocker switch and it seems to work. I have not tested
>100% of the rocker code paths though. It would be great to get some
>sort of automated framework around the API to do this. I don't
>think should gate the inclusion of the API though.
>
>I could use some help reviewing,
>
>  (a) error paths and netlink validation code paths
>
>  (b) Break down of structures vs netlink attributes. I
>      am trying to balance flexibility given by having
>      netlinnk TLV attributes vs conciseness. So some
>      things are passed as structures.
>
>  (c) are there any devices that have pipelines that we
>      can't represent with this API? It would be good to
>      know about these so we can design it in probably
>      in a future series.
>
>For some examples and maybe a bit more illustrative description I
>posted a quickly typed up set of notes on github io pages. Here we
>can show the description along with images produced by the flow tool
>showing the pipeline. Once we settle a bit more on the API we should
>probably do a clean up of this and other threads happening and commit
>something to the Documentation directory.
>
> http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>
>Finally I have more patches to add support for creating and destroying
>tables. This allows users to define the pipeline at runtime rather
>than statically as rocker does now. After this set gets some traction
>I'll look at pushing them in a next round. However it likely requires
>adding another "world" to rocker. Another piece that I want to add is
>a description of the actions and metadata. This way user space can
>"learn" what an action is and how metadata interacts with the system.
>This work is under development.
>
>Thanks! Any comments/feedback always welcome.
>
>And also thanks to everyone who helped with this flow API so far. All
>the folks at Dusseldorf LPC, OVS summit Santa Clara, P4 authors for
>some inspiration, the collection of IETF FoRCES documents I mulled
>over, Netfilter workshop where I started to realize fixing ethtool
>was most likely not going to work, etc.
>
>---
>
>John Fastabend (11):
>      net: flow_table: create interface for hw match/action tables
>      net: flow_table: add flow, delete flow
>      net: flow_table: add apply action argument to tables
>      rocker: add pipeline model for rocker switch
>      net: rocker: add set flow rules
>      net: rocker: add group_id slices and drop explicit goto
>      net: rocker: add multicast path to bridging
>      net: rocker: add get flow API operation
>      net: rocker: add cookie to group acls and use flow_id to set cookie
>      net: rocker: have flow api calls set cookie value
>      net: rocker: implement delete flow routine

Truly impressive work, John (including the "flow" tool and documentation).
Hats off.

Currently, all is very userspace oriented and I understand the reason.
I also understand why Jamal is a bit nervous about that fact. I am as well..
Correct me if I'm wrong, but this amount of "direct hw access" is
unprecedented. The kernel has always been here to cover the hw differences;
I wonder if there is any way to continue in this direction with flows...

What I would love to see in this initial patchset is "the internal user".
For example tc. The tc code could query the capabilities and decide what
"flows" to put into hw tables.

Jiri


* Re: [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables
  2015-01-08 17:41   ` Jiri Pirko
@ 2015-01-09  6:17     ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-09  6:17 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

On 01/08/2015 09:41 AM, Jiri Pirko wrote:
> Wed, Dec 31, 2014 at 08:46:44PM CET, john.fastabend@gmail.com wrote:
>> Actions may not always be applied after exiting a table. For example
>> some pipelines may accumulate actions and then apply them at the end
>> of a pipeline.
>>
>> To model this we use a table type called APPLY. Tables who share an
>> apply identifier have their actions applied in one step.
>
> Why is this a separate patch? Perhaps this can be squashed into one of the
> previous ones?
>

Good point; it's mostly an artefact of how the code evolved. I'll squash it
into one of the previous patches as you suggest.

Thanks,
John


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow
  2015-01-08 17:39   ` Jiri Pirko
@ 2015-01-09  6:21     ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-09  6:21 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

On 01/08/2015 09:39 AM, Jiri Pirko wrote:
> Wed, Dec 31, 2014 at 08:46:16PM CET, john.fastabend@gmail.com wrote:
>> Now that the device capabilities are exposed we can add support to
>> add and delete flows from the tables.
>>
>> The two operations are
>>

[...]

>> +
>> +static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
>> +				    struct genl_info *info)
>> +{
>> +	int rem, err_handle = NET_FLOW_FLOWS_ERROR_ABORT;
>> +	struct sk_buff *skb = NULL;
>> +	struct net_flow_flow this;
>> +	struct genlmsghdr *hdr;
>> +	struct net_device *dev;
>> +	struct nlattr *flow, *flows;
>> +	int cmd = info->genlhdr->cmd;
>> +	int err = -EOPNOTSUPP;
>
> I don't like the inconsistency in var naming. Sometimes, "flow" is of type
> struct nlattr, sometimes it is of type struct net_flow_flow
> (net_flow_get_flow). It is slightly confusing.
>

Alexei made a similar comment I'll try to clean this up in v2.

>> +
>> +	dev = net_flow_get_dev(info);
>> +	if (!dev)
>> +		return -EINVAL;
>> +
>> +	if (!dev->netdev_ops->ndo_flow_set_flows ||
>> +	    !dev->netdev_ops->ndo_flow_del_flows)
>> +		goto out;
>> +
>> +	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
>> +	    !info->attrs[NET_FLOW_IDENTIFIER] ||
>> +	    !info->attrs[NET_FLOW_FLOWS]) {
>> +		err = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	if (info->attrs[NET_FLOW_FLOWS_ERROR])
>> +		err_handle = nla_get_u32(info->attrs[NET_FLOW_FLOWS_ERROR]);
>> +
>> +	nla_for_each_nested(flow, info->attrs[NET_FLOW_FLOWS], rem) {
>> +		if (nla_type(flow) != NET_FLOW_FLOW)
>> +			continue;
>> +
>> +		err = net_flow_get_flow(&this, flow);
>> +		if (err)
>> +			goto out;
>> +
>> +		switch (cmd) {
>> +		case NET_FLOW_TABLE_CMD_SET_FLOWS:
>> +			err = dev->netdev_ops->ndo_flow_set_flows(dev, &this);
>> +			break;
>> +		case NET_FLOW_TABLE_CMD_DEL_FLOWS:
>> +			err = dev->netdev_ops->ndo_flow_del_flows(dev, &this);
>> +			break;
>> +		default:
>> +			err = -EOPNOTSUPP;
>> +			break;
>> +		}
>> +
>> +		if (err && err_handle != NET_FLOW_FLOWS_ERROR_CONTINUE) {
>> +			if (!skb) {
>> +				skb = net_flow_start_errmsg(dev, &hdr,
>> +							    info->snd_portid,
>> +							    info->snd_seq,
>> +							    cmd);
>> +				if (IS_ERR(skb)) {
>> +					err = PTR_ERR(skb);
>> +					goto out_plus_free;
>> +				}
>> +
>> +				flows = nla_nest_start(skb, NET_FLOW_FLOWS);
>> +				if (!flows) {
>> +					err = -EMSGSIZE;
>> +					goto out_plus_free;
>> +				}
>> +			}
>> +
>> +			net_flow_put_flow(skb, &this);
>> +		}
>> +
>> +		/* Cleanup flow */
>> +		kfree(this.matches);
>> +		kfree(this.actions);
>> +
>> +		if (err && err_handle == NET_FLOW_FLOWS_ERROR_ABORT)
>> +			goto out;
>> +	}
>> +
>> +	dev_put(dev);
>> +
>> +	if (skb) {
>> +		nla_nest_end(skb, flows);
>> +		net_flow_end_flow_errmsg(skb, hdr);
>> +		return genlmsg_reply(skb, info);
>> +	}
>> +	return 0;
>> +
>> +out_plus_free:
>> +	kfree(this.matches);
>> +	kfree(this.actions);
>
> 	Maybe this can be done by some "flow_free" helper...
>

Agreed I already wrote helpers for this on my local tree. I'll push it
in the next version as well.

Thanks,
John

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 00/11] A flow API
  2015-01-08 15:14 ` Or Gerlitz
@ 2015-01-09 17:26   ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-09 17:26 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek

On 01/08/2015 07:14 AM, Or Gerlitz wrote:
> On Wed, Dec 31, 2014 at 9:45 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> For some examples and maybe a bit more illustrative description I
>> posted a quickly typed up set of notes on github io pages. Here we
>> can show the description along with images produced by the flow tool
>> showing the pipeline. Once we settle a bit more on the API we should
>> probably do a clean up of this and other threads happening and commit
>> something to the Documentation directory.
>>
>>   http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>
> John, Going over your excellent tutorial, the distinction between
> action and operation isn’t clear... specifically, the paragraph
> “Although this gives us a list of actions we can perform on a packet
> and a set of argument to give the action so we can use them it does
> not supply the operations performed on the packet” is a bit vague.
>
> Or.
>

Agreed, that is a bit confusing. What I was trying to show is that if two
hardware devices give you the same action but with different names,
showing they are equivalent is not possible with the current API.
So either (a) you need to enforce that every device names its actions
correctly, or (b) provide a mechanism to describe the actions so we
can evaluate their equivalence.

It's actually worse than this. What I want to eventually show is: if
device A has support for a set of actions and device B has support
for another set, I want to be able to say things about the devices,
like device A can support any action B can do, but it may require
applying 2 actions from A's collection of actions. (clear as mud?)
I'll try to clear it up in the documentation.

Thanks for looking it over.
.John

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 00/11] A flow API
  2015-01-08 18:03 ` Jiri Pirko
@ 2015-01-09 18:10   ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-09 18:10 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy

On 01/08/2015 10:03 AM, Jiri Pirko wrote:
> Wed, Dec 31, 2014 at 08:45:19PM CET, john.fastabend@gmail.com wrote:
>> So... I could continue to mull over this and tweak bits and pieces
>> here and there but I decided its best to get a wider group of folks
>> looking at it and hopefulyl with any luck using it so here it is.
>>
>> This set creates a new netlink family and set of messages to configure
>> flow tables in hardware. I tried to make the commit messages
>> reasonably verbose at least in the flow_table patches.
>>
>> What we get at the end of this series is a working API to get device
>> capabilities and program flows using the rocker switch.
>>
>> I created a user space tool 'flow' that I use to configure and query
>> the devices it is posted here,
>>
>> 	https://github.com/jrfastab/iprotue2-flow-tool
>>
>> For now it is a stand-alone tool but once the kernel bits get sorted
>> out (I'm guessing there will need to be a few versions of this series
>> to get it right) I would like to port it into the iproute2 package.
>> This way we can keep all of our tooling in one package see 'bridge'
>> for example.
>>
>> As far as testing, I've tested various combinations of tables and
>> rules on the rocker switch and it seems to work. I have not tested
>> 100% of the rocker code paths though. It would be great to get some
>> sort of automated framework around the API to do this. I don't
>> think should gate the inclusion of the API though.
>>
>> I could use some help reviewing,
>>
>>   (a) error paths and netlink validation code paths
>>
>>   (b) Break down of structures vs netlink attributes. I
>>       am trying to balance flexibility given by having
>>       netlinnk TLV attributes vs conciseness. So some
>>       things are passed as structures.
>>
>>   (c) are there any devices that have pipelines that we
>>       can't represent with this API? It would be good to
>>       know about these so we can design it in probably
>>       in a future series.
>>
>> For some examples and maybe a bit more illustrative description I
>> posted a quickly typed up set of notes on github io pages. Here we
>> can show the description along with images produced by the flow tool
>> showing the pipeline. Once we settle a bit more on the API we should
>> probably do a clean up of this and other threads happening and commit
>> something to the Documentation directory.
>>
>> http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>>
>> Finally I have more patches to add support for creating and destroying
>> tables. This allows users to define the pipeline at runtime rather
>> than statically as rocker does now. After this set gets some traction
>> I'll look at pushing them in a next round. However it likely requires
>> adding another "world" to rocker. Another piece that I want to add is
>> a description of the actions and metadata. This way user space can
>> "learn" what an action is and how metadata interacts with the system.
>> This work is under development.
>>
>> Thanks! Any comments/feedback always welcome.
>>
>> And also thanks to everyone who helped with this flow API so far. All
>> the folks at Dusseldorf LPC, OVS summit Santa Clara, P4 authors for
>> some inspiration, the collection of IETF FoRCES documents I mulled
>> over, Netfilter workshop where I started to realize fixing ethtool
>> was most likely not going to work, etc.
>>
>> ---
>>
>> John Fastabend (11):
>>       net: flow_table: create interface for hw match/action tables
>>       net: flow_table: add flow, delete flow
>>       net: flow_table: add apply action argument to tables
>>       rocker: add pipeline model for rocker switch
>>       net: rocker: add set flow rules
>>       net: rocker: add group_id slices and drop explicit goto
>>       net: rocker: add multicast path to bridging
>>       net: rocker: add get flow API operation
>>       net: rocker: add cookie to group acls and use flow_id to set cookie
>>       net: rocker: have flow api calls set cookie value
>>       net: rocker: implement delete flow routine
>
> Truly impressive work, John (including the "flow" tool and documentation).
> Hats off.
>
> Currently, all is very userspace oriented and I understand the reason.
> I also understand why Jamal is a bit nervous about that fact. I am as well..
> Correct me if I'm wrong, but this amount of "direct hw access" is
> unprecedented. The kernel has always been here to cover the hw differences;
> I wonder if there is any way to continue in this direction with flows...
>

As it is currently written the API allows for abstracting the hardware
programming and low level interface by using a common model and API that
can represent a large array of devices.

By "abstracting the hw differences" I'm not sure what this means except for
the above model/API. I intentionally didn't want to force _all_
hardware to expose a specific pipeline for example the OVS pipeline.

> What I would love to see in this initial patchset is "the internal user".
> For example tc. The tc code could query the capabilities and decide what
> "flows" to put into hw tables.

Sure, the biggest gap for me on this is that 'tc' is actually about
ports/queues and currently filters/tables are part of qdiscs. The
model in this series is a pipeline that has a set of egress endpoints
that can be reached by actions. The endpoints would be ports or tunnel
engines or could be other network function blocks.

That said I can imagine pushing the configuration into a per port table
in the hardware or most likely just requiring any matches on egress
qdiscs to use an implied egress_port match. On ingress, similarly use
an ingress_port match.
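
A sketch of that implied-match idea (toy structs and a hypothetical helper; nothing like this exists in the series):

```c
#include <stdlib.h>
#include <string.h>

/* Toy match entry standing in for struct net_flow_field_ref. */
struct match {
	int field_id;
	unsigned long long value;
};

#define FIELD_INGRESS_PORT 1  /* hypothetical field id */

/* Prepend an implied ingress_port match to a rule's match list, the
 * way a qdisc-attached filter could scope its rules to one port
 * before pushing them into a shared hardware table. Caller frees
 * the returned array.
 */
static struct match *prepend_port_match(const struct match *m, int n,
					int port, int *out_n)
{
	struct match *r = malloc(sizeof(*r) * (n + 1));

	if (!r)
		return NULL;
	r[0].field_id = FIELD_INGRESS_PORT;
	r[0].value = (unsigned long long)port;
	memcpy(&r[1], m, sizeof(*m) * n);
	*out_n = n + 1;
	return r;
}
```

An egress qdisc would do the same with an egress_port field id.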

I'll look at doing this next week but I think the series is useful even
without any "internal users" ;) I'll send out a v2 with all the feedback
I've received so far shortly, then think some more about this. Doing the
mapping from software filters/actions/tables onto the hardware tables
exposed by the API in this series is actually what I wanted to present
@ netdev conference so I think we are heading in the same direction.

.John


>
> Jiri
>


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 00/11] A flow API
  2015-01-06 12:23 ` Jamal Hadi Salim
@ 2015-01-09 18:27   ` John Fastabend
  2015-01-14 19:02     ` Thomas Graf
  0 siblings, 1 reply; 60+ messages in thread
From: John Fastabend @ 2015-01-09 18:27 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: tgraf, sfeldma, jiri, simon.horman, netdev, davem, andy,
	Shrijeet Mukherjee

On 01/06/2015 04:23 AM, Jamal Hadi Salim wrote:
> John,
>
> There are a lot of things to digest in your posting - I am interested
> in commenting on many things but feel need to pay attention to details
> in general given the importance of this interface (and conference is
> chewing my netdev time at the moment). I need to actually sit down
> and stare at code and documentation.

Any additional feedback would be great. Sorry, I tried to be concise
but this email got fairly long regardless. Also, I delayed the response
a few days as I mulled over some of it.

>
> I do think we need to have this discussion as part of the BOF
> Shrijeet is running at netdev01.

Maybe I was a bit ambitious thinking we could get this merged
by then? Maybe I can resolve concerns via email ;) What I wanted
to discuss at netdev01 was specifically the mapping between
software models and hardware model as exposed by this series.
I see value in doing this in user space for some consumers OVS
which is why the UAPI is there to support this.

Also I think in-kernel users are interesting as well and 'tc'
is a reasonable candidate to try and offload from in the kernel IMO.

>
> General comments:
> 1) one of the things that i have learnt over time is that not
> everything that sits or is abstracted from hardware is a table.
> You could have structs or simple scalars for config or runtime
> control. How does what you are proposing here allow to express that?
> I dont think you'd need it for simple things but if you dont allow
> for it you run into the square-hole-round-peg syndrome of "yeah
> i can express that u32 variable as a single table with a single row
> and a single column" ;-> or "you need another infrastructure for
> that single scalr u32"

The interface (both UAPI and kernel API) deals exclusively with the
flow table pipeline at the moment. I've allowed for table attributes
which allow you to give tables characteristics. Right now it only
supports basic attribs like ingress_root and egress_root but I have
some work not in this series to allow tables to be dynamic
(allocated/freed) at runtime. More attributes could be added as needed
here. But this still only covers tables.

I agree there are other things besides tables, of course. The first thing
that comes to mind for me is queues and QOS. How do we model these?
My take is you add another object type, call it QUEUE, and use a
'struct net_flow_queue' to model queues. Queues then have attributes
as well like length, QOS policies, etc. I would call this extending the
infrastructure not creating another one :). Maybe my naming it
'net_flow' is not ideal. With a queue structure I can connect queues
and tables together with an enqueue action. That would be one example I
can generate more, encrypt operations, etc.
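
To make the QUEUE object idea concrete, a purely hypothetical sketch (none of these names exist in the series; the layout just mirrors the attribute style of struct net_flow_table from patch 1):

```c
/* Hypothetical queue object in the style of struct net_flow_table.
 * Illustrative only; not part of the posted series.
 */
struct net_flow_queue {
	char name[32];   /* NET_FLOW_NAMSIZ-style name */
	int uid;         /* unique id, shared namespace with tables */
	int length;      /* max queue depth attribute */
	int qos_policy;  /* opaque QOS policy handle */
};

/* An "enqueue" action would then take a single argument, the queue
 * uid, which is how a table connects into a queue.
 */
static int enqueue_action_arg(const struct net_flow_queue *q)
{
	return q->uid;
}
```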

FWIW queues and QOS to me fit nicely into the existing infrastructure
and it may be easier to utilize the existing 'tc' UAPI for this.

In this series I just want to get the flow table piece down though.

>
> 2) So i understood the sense of replacing ethtool for classifier
> access with a direct interface mostly because thats what it was
> already doing - but i am not sure why you need
> it for a generic interface. Am i mistaken you are providing direct
> access to hardware from user space? Would this make essentially
> the Linux infrastructure a bypass (which vendors and their SDKs
> love)? IMHO, a good example is to pick something like netfilter
> or tc-filters and show how that is offloaded. This keeps it in
> the same spirit as what we are shooting for in L2/3 at the moment.
>

I'll try to knock these off one by one:

Yes we are providing an interface for userspace to interrogate the
hardware and program it. My take on this is that even if you embed this
into another netlink family (OVS, NFT, TCA) you end up with the same
operations w.r.t. table support: (a) query hardware for
resources/constraints/etc. and (b) an API to add/del rules in those
tables. It seems the intersection of these features with existing
netlink families is fairly small so I opted to create a new family.
The underlying hardware offload mechanisms in flow_table.c here could
be used by in-kernel consumers as well as user space. For some
consumers 'tc' perhaps this makes good sense for others 'OVS'
it does not IMO.

Direct access to the hardware? Hmm, not so sure about that; it's an
abstraction layer so I can talk to _any_ hardware device using the
same semantics. But yes, at the bottom of the interface there is
hardware. Although this provides a "raw" interface for userspace to
inspect and program the hardware, it equally provides an API for
in-kernel consumers to use the hardware offload APIs. For
example, if you want 'tc' to offload a queueing discipline with some
filters. For what it's worth, I did some experimental work here and for
some basic cases it's possible to do this offload. I'll explore
this more as Jiri/you suggest.

Would this essentially make the Linux infrastructure a bypass? Hmm,
I'm not sure exactly what you mean here. If switching is done in
the ASIC then the software dataplane is being bypassed. And I don't want
to couple management of the software dataplane with management of the
hardware dataplane. It would be valid to have these dataplanes
running two completely different pipelines/network functions. So I
assume you mean does this API bypass the existing Linux control plane
infrastructure for software dataplanes. I'll say tentatively
yes it does. But in many cases my goal is to unify them in userspace
where it is easier to make policy decisions. For OVS, NFT it
seems to me that user space libraries can handle the unification
of hardware/software dataplanes. Further I think it is the correct
place to unify the dataplanes. I don't want to encode complex
policies into the kernel. Even if you embed the netlink UAPI into
another netlink family the semantics look the same.

To address how to offload existing infrastructures, I'll try to
explain my ideas for each subsystem.

I looked into using netfilter but really didn't get much traction
with the existing infrastructure. The trouble being nft wants to use
expressions like payload that have registers, base, offset, len in
the kernel but the hardware (again at least all the hardware I'm
working with) doesn't work with these semantics; it needs a field-id,
possibly the logical operation to use and the value to match. Yes I can
map base/offset/len to a field_id but what do I do with register? And
this sort of complication continues with most the other expressions.
I could write a new expression that was primarily used by hardware
but could have a software user as well but I'm not convinced we would
ever use it in software when we already have the functionally more
generic expressions. To me this looks like a somewhat arbitrary
embedding into netfilter uapi where the gain of doing this is not
entirely clear to me.
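
The semantic gap can be sketched with a toy payload expression and a naive translation (all layouts, field ids, and offsets here are hypothetical; the real nft expressions and flow-table structs differ):

```c
/* nft-style payload match: load len bytes at (base, offset) into a
 * register; a separate compare expression then tests the register.
 * The register indirection is the part with no hardware analogue.
 */
struct payload_match {
	int base;    /* which header base, e.g. 1 = network header */
	int offset;  /* byte offset within that header */
	int len;     /* bytes loaded */
	int dreg;    /* destination register */
};

/* Naive mapping: succeed only when (base, offset, len) lands exactly
 * on a field the device advertises; return -1 (cannot offload)
 * otherwise. A real header-graph lookup would replace the if().
 */
static int map_payload_to_field(const struct payload_match *p)
{
	/* hypothetical header-graph entry: ipv4.dst at offset 16, 4 bytes */
	if (p->base == 1 && p->offset == 16 && p->len == 4)
		return 42;  /* hypothetical field id for ipv4.dst */
	return -1;
}
```

Anything the expression does with dreg afterwards (bitwise ops, chained compares) would have to be recognized and folded into the field match, which is where the mapping breaks down.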

OVS would seem to have similar trouble; all the policy is in user
space. And the netlink UAPI is tuned for OVS; we don't want to start
adding/removing bits to support a hardware API where very little of it
would be used in the software-only case, and vice versa, very little of
the OVS uapi messages as they exist today would be sufficient for the
hardware API. My point of view is the intersection is small enough here
that it's easier to write a clean API that stands on its own than try
to sync these hardware offload operations into the OVS UAPI. Further
OVS is very specific about what fields/tables it supports in its current
version and I don't want to force hardware into this model.

And finally 'tc'. Filters today can only be attached to qdisc's which
are bound to net_devices. So the model is netdevs have queues, queues
have a qdisc association, and qdiscs have filters. Here we are
modelling a pipeline associated with a set of ports in hardware.
The model is slightly different: we have queues that dequeue into
an ingress table and an egress table that enqueues packets into queues.
Queues may or may not be bound to the same port. Yes I know 'tc' can
forward to ports but it has no notion of a global table space.

We could build a new 'tc' filter that loaded the hardware tables and
then added rules or deleted rules via the hardware API, but we would need
some new mechanics to get out the capabilities/resources. Basically
the same set of operations supported in the UAPI of this series. This
would end up IMO to be basically this series only embedded in the TCA_
family with a new filter kind. But then what do we attach it to? Not
a specific qdisc because it is associated with a set of qdiscs. And
additionally why would we use this qdisc*/hw-filter in software when
we already have u32 and bpf? IMO 'tc' is about per-port (queue) QOS
and filters/actions to support this. That said I actually see offloading
'tc' qdisc/filters on the ports into the hardware as being useful
and using the operations added in this series to flow_table.c. See
my response to Jiri noting I'll go ahead and try to get this working.
OTOH I still think you need the UAPI proposed in this series for other
consumers.

Maybe I need to be enlightened but I thought for a bit about some grand
unification of ovs, bridge, tc, netlink, et al., but that seems like
an entirely different scope of project. (side note: filters/actions
are no longer locked by qdisc and could stand on their own) My thoughts
on this are not yet organized.

> Anyways I apologize i havent spent as much time (holiday period
> wasnt good for me and netdev01 is picking up and consuming my time
> but i will try my best to respond and comment with some latency)
>

Great, thanks. Maybe this will give you more to mull over. If it's
clear as mud, let me know and I'll draw up some pictures. I likely
need to do that regardless. Bottom line: I think the proposed API
here solves a real need.

Thanks!
John

> cheers,
> jamal
>
> On 12/31/14 14:45, John Fastabend wrote:
>> So... I could continue to mull over this and tweak bits and pieces
>> here and there, but I decided it's best to get a wider group of folks
>> looking at it and hopefully, with any luck, using it, so here it is.
>>
>> This set creates a new netlink family and set of messages to configure
>> flow tables in hardware. I tried to make the commit messages
>> reasonably verbose at least in the flow_table patches.
>>
>> What we get at the end of this series is a working API to get device
>> capabilities and program flows using the rocker switch.
>>
>> I created a user space tool 'flow' that I use to configure and query
>> the devices it is posted here,
>>
>>     https://github.com/jrfastab/iprotue2-flow-tool
>>
>> For now it is a stand-alone tool but once the kernel bits get sorted
>> out (I'm guessing there will need to be a few versions of this series
>> to get it right) I would like to port it into the iproute2 package.
>> This way we can keep all of our tooling in one package see 'bridge'
>> for example.
>>
>> As far as testing, I've tested various combinations of tables and
>> rules on the rocker switch and it seems to work. I have not tested
>> 100% of the rocker code paths though. It would be great to get some
>> sort of automated framework around the API to do this. I don't
>> think that should gate the inclusion of the API though.
>>
>> I could use some help reviewing,
>>
>>    (a) error paths and netlink validation code paths
>>
>>    (b) Break down of structures vs netlink attributes. I
>>        am trying to balance flexibility given by having
>>        netlink TLV attributes vs conciseness. So some
>>        things are passed as structures.
>>
>>    (c) are there any devices that have pipelines that we
>>        can't represent with this API? It would be good to
>>        know about these so we can design it in, probably
>>        in a future series.
>>
>> For some examples and maybe a bit more illustrative description I
>> posted a quickly typed up set of notes on github io pages. Here we
>> can show the description along with images produced by the flow tool
>> showing the pipeline. Once we settle a bit more on the API we should
>> probably do a clean up of this and other threads happening and commit
>> something to the Documentation directory.
>>
>>   http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>>
>> Finally I have more patches to add support for creating and destroying
>> tables. This allows users to define the pipeline at runtime rather
>> than statically as rocker does now. After this set gets some traction
>> I'll look at pushing them in a next round. However it likely requires
>> adding another "world" to rocker. Another piece that I want to add is
>> a description of the actions and metadata. This way user space can
>> "learn" what an action is and how metadata interacts with the system.
>> This work is under development.
>>
>> Thanks! Any comments/feedback always welcome.
>>
>> And also thanks to everyone who helped with this flow API so far. All
>> the folks at Dusseldorf LPC, OVS summit Santa Clara, P4 authors for
>> some inspiration, the collection of IETF FoRCES documents I mulled
>> over, Netfilter workshop where I started to realize fixing ethtool
>> was most likely not going to work, etc.
>>
>> ---
>>
>> John Fastabend (11):
>>        net: flow_table: create interface for hw match/action tables
>>        net: flow_table: add flow, delete flow
>>        net: flow_table: add apply action argument to tables
>>        rocker: add pipeline model for rocker switch
>>        net: rocker: add set flow rules
>>        net: rocker: add group_id slices and drop explicit goto
>>        net: rocker: add multicast path to bridging
>>        net: rocker: add get flow API operation
>>        net: rocker: add cookie to group acls and use flow_id to set cookie
>>        net: rocker: have flow api calls set cookie value
>>        net: rocker: implement delete flow routine
>>
>>
>>   drivers/net/ethernet/rocker/rocker.c          | 1641 +++++++++++++++++++++++++
>>   drivers/net/ethernet/rocker/rocker_pipeline.h |  793 ++++++++++++
>>   include/linux/if_flow.h                       |  115 ++
>>   include/linux/netdevice.h                     |   20
>>   include/uapi/linux/if_flow.h                  |  413 ++++++
>>   net/Kconfig                                   |    7
>>   net/core/Makefile                             |    1
>>   net/core/flow_table.c                         | 1339 ++++++++++++++++++++
>>   8 files changed, 4312 insertions(+), 17 deletions(-)
>>   create mode 100644 drivers/net/ethernet/rocker/rocker_pipeline.h
>>   create mode 100644 include/linux/if_flow.h
>>   create mode 100644 include/uapi/linux/if_flow.h
>>   create mode 100644 net/core/flow_table.c
>>
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [net-next PATCH v1 00/11] A flow API
  2015-01-09 18:27   ` John Fastabend
@ 2015-01-14 19:02     ` Thomas Graf
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Graf @ 2015-01-14 19:02 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, sfeldma, jiri, simon.horman, netdev, davem,
	andy, Shrijeet Mukherjee

On 01/09/15 at 10:27am, John Fastabend wrote:
> Yes we are providing an interface for userspace to interrogate the
> hardware and program it. My take on this is even if you embed this
> into another netlink family OVS, NFT, TCA you end up with the same
> operations w.r.t. table support (a) query hardware for
> resources/constraints/etc and (b) an API to add/del rules in those
> tables. It seems the intersection of these features with existing
> netlink families is fairly small so I opted to create a new family.
> The underlying hardware offload mechanisms in flow_table.c here could
> be used by in-kernel consumers as well as user space. For some
> consumers 'tc' perhaps this makes good sense for others 'OVS'
> it does not IMO.

+1

> [...]
>
> But in many cases my goal is to unify them in userspace
> where it is easier to make policy decisions. For OVS, NFT it
> seems to me that user space libraries can handle the unification
> of hardware/software dataplanes. Further I think it is the correct
> place to unify the dataplanes. I don't want to encode complex
> policies into the kernel. Even if you embed the netlink UAPI into
> another netlink family the semantics look the same.

I think we want the kernel to remain in control, but it does not
necessarily have to hold the offload decision logic for all users.
I think this is comparable to routing daemons. We do not want to
talk BGP or OSPF in the kernel, and we don't need to know about all
of the selection logic behind it, but we want to be in charge of
keeping track of the actual datapath routes.

Also, I think it is still an option to embed the proposed
attributes in existing Netlink families even if we shoot for a new
family for now. So far the attributes seem to be defined in a way
that would allow them to be embedded into other existing Netlink
families.

> Maybe I need to be enlightened but I thought for a bit about some grand
> unification of ovs, bridge, tc, netlink, et. al. but that seems like
> an entirely different scope of project. (side note: filters/actions
> are no longer locked by qdisc and could stand on their own) My thoughts
> on this are not yet organized.

I think everybody had this in the back of their mind at some point.
Be it based on BPF, NFT or TC. I don't think it's undoable but it
takes a lot of effort as each is based on a slightly different set
of assumptions with corresponding focus derived from that.


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-07 21:17 Alexei Starovoitov
@ 2015-01-07 22:00 ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-07 22:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev, David S. Miller,
	Andy Gospodarek

On 01/07/2015 01:17 PM, Alexei Starovoitov wrote:
> On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> - above plus put_header_graph() which will allow to
>>>     rearrange some fixed sized headers ?
>>
>> OK but I'm not sure where/if these devices exist. Maybe your
>> thinking of a software dataplane case? Would get_headers return
>> some headers/fields but not include them in the graph and then
>> expect the user to build a graph with them if the user needs
>> them. Are there restrictions on how the graph can be built
>> out? I guess I'm working with the assumption that the device
>> returns a complete parse graph for all combinations it supports.
>
> ahh. I thought that get_hdr_graph() will return one
> that is currently configured and put_hdr_graph()
> will try to configure new sequence of headers.
> I think returning all possible combinations is not practical,
> since number of such combinations can be very large for
> some hw.

Agreed, I think it should return the currently configured
and active hdr graph. Just to be clear, I had assumed that any driver
that supported put_header_graph() would also support a put_headers()
call; basically your case 3 below.

> Also it seems that 4/11 patch and rocker_header_nodes[]
> in particular describing one graph instead of
> all possible?

It returns the one and only graph rocker supports now, or at least
the graph of supported headers as I read the rocker code. As
rocker becomes more flexible I would expect this to grow to include
tunnels, stacked headers, TCP, etc.

>
>>> - above plus put_header() ?
>>>     I'm having a hard time envisioning how that would
>>>     look like.
>>
>> This case makes more sense to me. The user supplies the definition
>> of the headers and the graph showing how they are related and the
>> driver can program the parser to work correctly.
>
> yes, assuming that put_hdr_graph() programs one
> sequence of jumping through hdrs...
> but I think it's also fine if you do one put_hdrs_and_graph()
> function as you described.
>
>> To be honest though I would really be happy getting the 1st option
>> working.
>
> agree.
> as long as we don't screw up get*() semantics that
> prevent clean put*() logic :)
> To illustrate my point:
> if hw parser can parse 2 vlans and there is
> no way to configure it to do zero, one or three, it's perfectly
> fine for put_hdr_graph() to fail when it tries to configure
> something different.
> But if hw can be configured to do 1 vlan or 2 vlans, it
> would be overkill to pass both graphs in get().
> Just pass one that is currently active and let put() try things?

This is what I intended. I think it is good enough.

> I think get_hdrs() on its own is good enough indication
> to the user what hw is capable of and hdr_graph is
> just a jump table between them. If hw can parse vxlan
> without vxlan extensions it will be clearly seen in get_hdrs,
> so no point trying to do put_hdrs() with some new
> definition of vxlan unless parser is fully programmable.
> that's where I was going with my category 2 where
> only put_hdr_graph() exists... imo it will fit trident
> and alta models ?
> Personally I believe that we should design this API
> with as much as possible real hw in mind.
> rocker can support different models of hw...
>

Yep. Which is why at some point I would like to program up
a couple of other "worlds" for rocker that have different pipelines.
This would allow experimenting with more than the current static
model rocker uses.


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
@ 2015-01-07 21:17 Alexei Starovoitov
  2015-01-07 22:00 ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Alexei Starovoitov @ 2015-01-07 21:17 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev, David S. Miller,
	Andy Gospodarek

On Tue, Jan 6, 2015 at 9:37 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>> - above plus put_header_graph() which will allow to
>>    rearrange some fixed sized headers ?
>
> OK but I'm not sure where/if these devices exist. Maybe your
> thinking of a software dataplane case? Would get_headers return
> some headers/fields but not include them in the graph and then
> expect the user to build a graph with them if the user needs
> them. Are there restrictions on how the graph can be built
> out? I guess I'm working with the assumption that the device
> returns a complete parse graph for all combinations it supports.

Ahh. I thought that get_hdr_graph() would return the one
that is currently configured and put_hdr_graph()
would try to configure a new sequence of headers.
I think returning all possible combinations is not practical,
since the number of such combinations can be very large for
some hw.
Also it seems that the 4/11 patch, and rocker_header_nodes[]
in particular, describes one graph instead of
all possible ones?

>> - above plus put_header() ?
>>    I'm having a hard time envisioning how that would
>>    look like.
>
> This case makes more sense to me. The user supplies the definition
> of the headers and the graph showing how they are related and the
> driver can program the parser to work correctly.

yes, assuming that put_hdr_graph() programs one
sequence of jumping through hdrs...
but I think it's also fine if you do one put_hdrs_and_graph()
function as you described.

> To be honest though I would really be happy getting the 1st option
> working.

Agree.
As long as we don't screw up the get*() semantics in a way that
prevents clean put*() logic :)
To illustrate my point:
if hw parser can parse 2 vlans and there is
no way to configure it to do zero, one or three, it's perfectly
fine for put_hdr_graph() to fail when it tries to configure
something different.
But if hw can be configured to do 1 vlan or 2 vlans, it
would be overkill to pass both graphs in get().
Just pass one that is currently active and let put() try things?
I think get_hdrs() on its own is good enough indication
to the user what hw is capable of and hdr_graph is
just a jump table between them. If hw can parse vxlan
without vxlan extensions it will be clearly seen in get_hdrs,
so no point trying to do put_hdrs() with some new
definition of vxlan unless parser is fully programmable.
that's where I was going with my category 2 where
only put_hdr_graph() exists... imo it will fit trident
and alta models ?
Personally I believe that we should design this API
with as much as possible real hw in mind.
rocker can support different models of hw...


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
  2015-01-07  1:14 [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables Alexei Starovoitov
@ 2015-01-07  5:37 ` John Fastabend
  0 siblings, 0 replies; 60+ messages in thread
From: John Fastabend @ 2015-01-07  5:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev, David S. Miller,
	Andy Gospodarek

On 01/06/2015 05:14 PM, Alexei Starovoitov wrote:
> On Wed, Dec 31, 2014 at 11:45 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> + * [NET_FLOW_TABLE_IDENTIFIER_TYPE]
>> + * [NET_FLOW_TABLE_IDENTIFIER]
>> + * [NET_FLOW_TABLE_TABLES]
>> + *     [NET_FLOW_TABLE]
>> + *       [NET_FLOW_TABLE_ATTR_NAME]
>> + *       [NET_FLOW_TABLE_ATTR_UID]
>> + *       [NET_FLOW_TABLE_ATTR_SOURCE]
>> + *       [NET_FLOW_TABLE_ATTR_SIZE]
> ...
>> + * Header definitions used to define headers with user friendly
>> + * names.
>> + *
>> + * [NET_FLOW_TABLE_HEADERS]
>> + *   [NET_FLOW_HEADER]
>> + *     [NET_FLOW_HEADER_ATTR_NAME]
>> + *     [NET_FLOW_HEADER_ATTR_UID]
>> + *     [NET_FLOW_HEADER_ATTR_FIELDS]
>> + *       [NET_FLOW_HEADER_ATTR_FIELD]
>> + *         [NET_FLOW_FIELD_ATTR_NAME]
>> + *         [NET_FLOW_FIELD_ATTR_UID]
>> + *         [NET_FLOW_FIELD_ATTR_BITWIDTH]
>> + *       [NET_FLOW_HEADER_ATTR_FIELD]
>> + *         [...]
>> + *       [...]
>> + * Action definitions supported by tables
>> + *
>> + * [NET_FLOW_TABLE_ACTIONS]
>> + *   [NET_FLOW_TABLE_ATTR_ACTIONS]
>> + *     [NET_FLOW_ACTION]
>> + *       [NET_FLOW_ACTION_ATTR_NAME]
>> + *       [NET_FLOW_ACTION_ATTR_UID]
>> + *       [NET_FLOW_ACTION_ATTR_SIGNATURE]
>> + *              [NET_FLOW_ACTION_ARG]
> ..
>> + * Get Table Graph <Reply> description
>> + *
>> + * [NET_FLOW_TABLE_TABLE_GRAPH]
>> + *   [TABLE_GRAPH_NODE]
>> + *     [TABLE_GRAPH_NODE_UID]
>> + *     [TABLE_GRAPH_NODE_JUMP]
>
> I think NET_FLOW prefix everywhere is too verbose.
> Especially since you've missed it in the above 3.
> and in patch 2 it is:
> NET_FLOW_FLOW
> which is kinda awkward.
> Can you abbreviate it to NFL_ or something else ?

Hmm, I'm open to a better name. NFL_ might work but seems
a bit cryptic to me; maybe it is better than NET_FLOW.
Any other suggestions?

>
> I couldn't find get_headers() and get_header_graph()
> implementation on rocker side ?

It is in patch

[net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch

+#ifdef CONFIG_NET_FLOW_TABLES
+	.ndo_flow_get_tables		= rocker_get_tables,
+	.ndo_flow_get_headers		= rocker_get_headers,
+	.ndo_flow_get_actions		= rocker_get_actions,
+	.ndo_flow_get_tbl_graph		= rocker_get_tgraph,
+	.ndo_flow_get_hdr_graph		= rocker_get_hgraph,
+#endif

although v2 will address some good feedback and clean it up a bit.


> Could you describe how put_header_graph() will look like?

The signatures for get_hdrs and get_hdr_graph in my latest deck
(I'll push shortly, still need to sort out the caching for
get_flows) look like this,

   struct net_flow_hdr **(*ndo_flow_get_hdrs)(struct net_device *dev);
   struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);

I could then use the following signatures for put hdrs,

   int (*ndo_flow_put_hdrs)(struct net_device *dev, struct net_flow_hdr **hdrs);
   int (*ndo_flow_put_hdrs_graph)(struct net_device *dev, struct net_flow_hdr_graph **graph);

If the user supplies a new set of hdrs via put_hdrs it would
invalidate the hdr graph, though, so we could either smash those two
operations into one, or require both to occur while the device is down
and not allow it to come up without a graph operation. Currently my
preference is to smash the above two ops into this:

   int (*ndo_flow_put_hdrs)(struct net_device *dev, struct net_flow_hdr **hdrs, struct net_flow_hdr_graph **graph);

I've gone back and forth on this; doing updates to the hdrs/graph
while the device is online doesn't seem practical, and I don't have
access to any devices that would support this. If your device is a
software one though (futuristic eBPF-OVS) it may make some sense.

> When it comes to parsing I'm assuming that hw will fall
> into N categories:
> - that has get_headers() and get_header_graph() only
>    which would mean fixed parser

Right, this is rocker and what the initial series is trying to
address.

> - above plus put_header_graph() which will allow to
>    rearrange some fixed sized headers ?

OK, but I'm not sure where/if these devices exist. Maybe you're
thinking of a software dataplane case? Would get_headers return
some headers/fields but not include them in the graph, and then
expect the user to build a graph with them if needed? Are there
restrictions on how the graph can be built out? I guess I'm working
with the assumption that the device returns a complete parse graph
for all combinations it supports.
Are there really devices that could only support certain combinations,
and then if you shuffled the header graph around support others?
I'm just not aware of any device.

> - above plus put_header() ?
>    I'm having a hard time envisioning how that would
>    look like.

This case makes more sense to me. The user supplies the definition
of the headers and the graph showing how they are related, and the
driver can program the parser to work correctly. This implies a
flexible parser, but I think some devices could support this. You
would need some attributes to define the depth of the parser and
similar constraints, if the device restricts the parsers that can
be supported.

Maybe one concrete example would be to introduce a header tag that
was previously unknown to the device. You could define it using
the header/fields (bit/length/offset) notation and then give the
graph to let the parser "know" where to expect it. Finally this
could be passed to the driver and the parser could be generated.

To be honest though I would really be happy getting the 1st option
working.

> - ... ?
>
> also can we change a name from add_flow
> to add_entry or add_rule ?
> I think 'rule' fits better, since rule = field_ref+action
> and one real TCP flow may need multiple rules
> inserted into table, right?
> The whole thing can still be called 'flow API'...

add_rule/del_rule is fine by me.

>
> will there be a put_table_graph() ?
> probably not, right? since as soon as HW supports
> 'goto' action, the meaning of table_graph is lost and
> it's actually just a set of disconnected tables and the
> way to jump from one into another is through 'goto'.

Hmm, I have support in another tree for create/destroy table. This
allows users to create tables and destroy them.

I think the table_graph is still relevant; it's just that the graph
is completely connected. I'm not sure you will really see hardware
like this anytime soon though :) or maybe I just mean I haven't
seen any.

>
> I think OVS guys are quiet, since they're skeptical
> that headers+header_graph approach can work?
> Would be great if they can share the experience...
>

Hmm, maybe, but I could define all the headers OVS supports using
this, and then define a linear array of tables. It might be an
interesting exercise to build this on top of rocker.

-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
@ 2015-01-07  1:14 Alexei Starovoitov
  2015-01-07  5:37 ` John Fastabend
  0 siblings, 1 reply; 60+ messages in thread
From: Alexei Starovoitov @ 2015-01-07  1:14 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, netdev, David S. Miller,
	Andy Gospodarek

On Wed, Dec 31, 2014 at 11:45 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> + * [NET_FLOW_TABLE_IDENTIFIER_TYPE]
> + * [NET_FLOW_TABLE_IDENTIFIER]
> + * [NET_FLOW_TABLE_TABLES]
> + *     [NET_FLOW_TABLE]
> + *       [NET_FLOW_TABLE_ATTR_NAME]
> + *       [NET_FLOW_TABLE_ATTR_UID]
> + *       [NET_FLOW_TABLE_ATTR_SOURCE]
> + *       [NET_FLOW_TABLE_ATTR_SIZE]
...
> + * Header definitions used to define headers with user friendly
> + * names.
> + *
> + * [NET_FLOW_TABLE_HEADERS]
> + *   [NET_FLOW_HEADER]
> + *     [NET_FLOW_HEADER_ATTR_NAME]
> + *     [NET_FLOW_HEADER_ATTR_UID]
> + *     [NET_FLOW_HEADER_ATTR_FIELDS]
> + *       [NET_FLOW_HEADER_ATTR_FIELD]
> + *         [NET_FLOW_FIELD_ATTR_NAME]
> + *         [NET_FLOW_FIELD_ATTR_UID]
> + *         [NET_FLOW_FIELD_ATTR_BITWIDTH]
> + *       [NET_FLOW_HEADER_ATTR_FIELD]
> + *         [...]
> + *       [...]
> + * Action definitions supported by tables
> + *
> + * [NET_FLOW_TABLE_ACTIONS]
> + *   [NET_FLOW_TABLE_ATTR_ACTIONS]
> + *     [NET_FLOW_ACTION]
> + *       [NET_FLOW_ACTION_ATTR_NAME]
> + *       [NET_FLOW_ACTION_ATTR_UID]
> + *       [NET_FLOW_ACTION_ATTR_SIGNATURE]
> + *              [NET_FLOW_ACTION_ARG]
..
> + * Get Table Graph <Reply> description
> + *
> + * [NET_FLOW_TABLE_TABLE_GRAPH]
> + *   [TABLE_GRAPH_NODE]
> + *     [TABLE_GRAPH_NODE_UID]
> + *     [TABLE_GRAPH_NODE_JUMP]

I think NET_FLOW prefix everywhere is too verbose.
Especially since you've missed it in the above 3.
and in patch 2 it is:
NET_FLOW_FLOW
which is kinda awkward.
Can you abbreviate it to NFL_ or something else ?

I couldn't find get_headers() and get_header_graph()
implementation on rocker side ?
Could you describe how put_header_graph() will look like?
When it comes to parsing I'm assuming that hw will fall
into N categories:
- that has get_headers() and get_header_graph() only
  which would mean fixed parser
- above plus put_header_graph() which will allow to
  rearrange some fixed sized headers ?
- above plus put_header() ?
  I'm having a hard time envisioning how that would
  look like.
- ... ?

Also, can we change the name from add_flow
to add_entry or add_rule?
I think 'rule' fits better, since rule = field_ref + action
and one real TCP flow may need multiple rules
inserted into a table, right?
The whole thing can still be called 'flow API'...

Will there be a put_table_graph()?
Probably not, right? Since as soon as HW supports
a 'goto' action, the meaning of table_graph is lost and
it's actually just a set of disconnected tables, and the
way to jump from one into another is through 'goto'.

I think the OVS guys are quiet since they're skeptical
that the headers+header_graph approach can work?
Would be great if they could share their experience...


end of thread, other threads:[~2015-01-14 19:02 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-31 19:45 [net-next PATCH v1 00/11] A flow API John Fastabend
2014-12-31 19:45 ` [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables John Fastabend
2014-12-31 20:10   ` John Fastabend
2015-01-04 11:12   ` Thomas Graf
2015-01-05 18:59     ` John Fastabend
2015-01-05 21:48       ` Thomas Graf
2015-01-05 23:29       ` John Fastabend
2015-01-06  0:45       ` John Fastabend
2015-01-06  1:09         ` Simon Horman
2015-01-06  1:19           ` John Fastabend
2015-01-06  2:05             ` Simon Horman
2015-01-06  2:54               ` Simon Horman
2015-01-06  3:31                 ` John Fastabend
2015-01-07 10:07       ` Or Gerlitz
2015-01-07 16:35         ` John Fastabend
2015-01-06  5:25   ` Scott Feldman
2015-01-06  6:04     ` John Fastabend
2015-01-06  6:40       ` Scott Feldman
2014-12-31 19:46 ` [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow John Fastabend
2015-01-06  6:19   ` Scott Feldman
2015-01-08 17:39   ` Jiri Pirko
2015-01-09  6:21     ` John Fastabend
2014-12-31 19:46 ` [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables John Fastabend
2015-01-08 17:41   ` Jiri Pirko
2015-01-09  6:17     ` John Fastabend
2014-12-31 19:47 ` [net-next PATCH v1 04/11] rocker: add pipeline model for rocker switch John Fastabend
2015-01-04  8:43   ` Or Gerlitz
2015-01-05  5:18     ` John Fastabend
2015-01-06  7:01   ` Scott Feldman
2015-01-06 17:00     ` John Fastabend
2015-01-06 17:16       ` Scott Feldman
2015-01-06 17:49         ` John Fastabend
2014-12-31 19:47 ` [net-next PATCH v1 05/11] net: rocker: add set flow rules John Fastabend
2015-01-06  7:23   ` Scott Feldman
2015-01-06 15:31     ` John Fastabend
2014-12-31 19:48 ` [net-next PATCH v1 06/11] net: rocker: add group_id slices and drop explicit goto John Fastabend
2014-12-31 19:48 ` [net-next PATCH v1 07/11] net: rocker: add multicast path to bridging John Fastabend
2014-12-31 19:48 ` [net-next PATCH v1 08/11] net: rocker: add get flow API operation John Fastabend
     [not found]   ` <CAKoUArm4z_i6Su9Q4ODB1QYR_Z098MjT2yN=WR7LbN387AvPsg@mail.gmail.com>
2015-01-02 21:15     ` John Fastabend
2015-01-06  7:40   ` Scott Feldman
2015-01-06 14:59     ` John Fastabend
2015-01-06 16:57       ` Scott Feldman
2015-01-06 17:50         ` John Fastabend
2014-12-31 19:49 ` [net-next PATCH v1 09/11] net: rocker: add cookie to group acls and use flow_id to set cookie John Fastabend
2014-12-31 19:50 ` [net-next PATCH v1 10/11] net: rocker: have flow api calls set cookie value John Fastabend
2014-12-31 19:50 ` [net-next PATCH v1 11/11] net: rocker: implement delete flow routine John Fastabend
2015-01-04  8:30 ` [net-next PATCH v1 00/11] A flow API Or Gerlitz
2015-01-05  5:17   ` John Fastabend
2015-01-06  2:42 ` Scott Feldman
2015-01-06 12:23 ` Jamal Hadi Salim
2015-01-09 18:27   ` John Fastabend
2015-01-14 19:02     ` Thomas Graf
2015-01-08 15:14 ` Or Gerlitz
2015-01-09 17:26   ` John Fastabend
2015-01-08 18:03 ` Jiri Pirko
2015-01-09 18:10   ` John Fastabend
2015-01-07  1:14 [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables Alexei Starovoitov
2015-01-07  5:37 ` John Fastabend
2015-01-07 21:17 Alexei Starovoitov
2015-01-07 22:00 ` John Fastabend
