* [PATCH net-next v6 0/3] openvswitch: add NSH support
@ 2017-08-25 14:20 Yi Yang
  2017-08-25 14:20 ` [PATCH net-next v6 1/3] net: add NSH header structures and helpers Yi Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Yi Yang @ 2017-08-25 14:20 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

v5->v6
 - Address the remaining review comments on v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
 - Address many review comments from Jiri Benc and
   Eric Garver on v4.

v3->v4
 - Add new NSH match field ttl.
 - Update the NSH header to the latest format,
   which its author has confirmed will be the
   final format and won't change.
 - Address review comments on v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to a nested key to handle
   fixed-length and variable-length attributes more
   flexibly.
 - Remove struct ovs_action_push_nsh completely.
 - Add code to handle the nested attribute for SET_MASKED.
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Address review comments and coding-style issues from
   Jiri and Eric.

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   variable-length metadata.

This patch series enables NSH support in the OVS kernel
data path; it also adds NSH GSO support for large packets.
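For readers new to NSH, the base-header length encoding the series
relies on can be sketched as below.  The mask values are copied from
include/net/nsh.h in patch 1; the standalone helpers are illustrative
only, not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* First 16 bits of the NSH base header: Ver(2) O(1) U(1) TTL(6) Length(6).
 * Mask/shift values copied from include/net/nsh.h in patch 1. */
#define NSH_VER_MASK   0xc000
#define NSH_VER_SHIFT  14
#define NSH_TTL_MASK   0x0fc0
#define NSH_TTL_SHIFT  6
#define NSH_LEN_MASK   0x003f

/* Total NSH header length in bytes: the Length field counts 4-byte words. */
static inline unsigned int nsh_len_bytes(uint16_t ver_flags_ttl_len)
{
	return (ver_flags_ttl_len & NSH_LEN_MASK) << 2;
}

static inline unsigned int nsh_ttl(uint16_t ver_flags_ttl_len)
{
	return (ver_flags_ttl_len & NSH_TTL_MASK) >> NSH_TTL_SHIFT;
}
```

For example, an MD type 1 header carries Length 0x6, i.e. 6 words or
24 bytes (NSH_M_TYPE1_LEN in the patch).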

Yi Yang (3):
  net: add NSH header structures and helpers
  net: gso: Add GSO support for NSH
  openvswitch: enable NSH support

 drivers/net/vxlan.c              |   7 +
 include/linux/netdevice.h        |   1 +
 include/linux/skbuff.h           |   8 +-
 include/net/nsh.h                | 307 +++++++++++++++++++++++++++++
 include/uapi/linux/if_ether.h    |   1 +
 include/uapi/linux/openvswitch.h |  28 +++
 net/Kconfig                      |   1 +
 net/Makefile                     |   1 +
 net/core/dev.c                   |  14 ++
 net/ipv4/udp_offload.c           |   7 +
 net/nsh/Kconfig                  |  11 ++
 net/nsh/Makefile                 |   4 +
 net/nsh/nsh_gso.c                | 106 ++++++++++
 net/openvswitch/actions.c        | 178 +++++++++++++++++
 net/openvswitch/flow.c           |  55 ++++++
 net/openvswitch/flow.h           |  11 ++
 net/openvswitch/flow_netlink.c   | 404 ++++++++++++++++++++++++++++++++++++++-
 net/openvswitch/flow_netlink.h   |   4 +
 18 files changed, 1145 insertions(+), 3 deletions(-)
 create mode 100644 include/net/nsh.h
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh_gso.c

-- 
2.5.5


* [PATCH net-next v6 1/3] net: add NSH header structures and helpers
  2017-08-25 14:20 [PATCH net-next v6 0/3] openvswitch: add NSH support Yi Yang
@ 2017-08-25 14:20 ` Yi Yang
  2017-08-25 15:07   ` Jiri Benc
  2017-08-25 14:20 ` [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH Yi Yang
  2017-08-25 14:20 ` [PATCH net-next v6 3/3] openvswitch: enable NSH support Yi Yang
  2 siblings, 1 reply; 31+ messages in thread
From: Yi Yang @ 2017-08-25 14:20 UTC (permalink / raw)
  To: netdev; +Cc: dev, jbenc, e, blp, jan.scheurich, Yi Yang

NSH (Network Service Header)[1] is a new protocol for service
function chaining.  It can be handled as an L3 protocol like
IPv4 and IPv6; Eth + NSH + inner packet and VxLAN-gpe + NSH +
inner packet are two typical use cases.

This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
---
 include/net/nsh.h             | 307 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/if_ether.h |   1 +
 2 files changed, 308 insertions(+)
 create mode 100644 include/net/nsh.h
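As an aside, the TTL semantics quoted in the header comment added by
this patch (decrement by 1 before the forwarding lookup, a decrement
from 0 wraps to 63, drop when the post-decrement TTL is 0) can be
sketched as below.  The helper name is hypothetical; the patch only
documents the rule.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the documented NSH TTL rule: each SFF decrements TTL by 1
 * before the NSH forwarding lookup; the 6-bit field wraps from 0 to 63;
 * a post-decrement TTL of 0 means the packet must not be forwarded. */
static int nsh_dec_ttl(uint8_t *ttl)
{
	*ttl = (*ttl - 1) & 0x3f;	/* 6-bit field: 0 - 1 wraps to 63 */
	return *ttl != 0;		/* nonzero: OK to forward */
}
```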

diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 0000000..72d62df
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,307 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+/*
+ * Network Service Header:
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|U|    TTL    |   Length  |U|U|U|U|MD Type| Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          Service Path Identifier (SPI)        | Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * ~               Mandatory/Optional Context Headers              ~
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Version: The version field is used to ensure backward compatibility
+ * going forward with future NSH specification updates.  It MUST be set
+ * to 0x0 by the sender, in this first revision of NSH.  Given the
+ * widespread implementation of existing hardware that uses the first
+ * nibble after an MPLS label stack for ECMP decision processing, this
+ * document reserves version 01b and this value MUST NOT be used in
+ * future versions of the protocol.  Please see [RFC7325] for further
+ * discussion of MPLS-related forwarding requirements.
+ *
+ * O bit: Setting this bit indicates an Operations, Administration, and
+ * Maintenance (OAM) packet.  The actual format and processing of SFC
+ * OAM packets is outside the scope of this specification (see for
+ * example [I-D.ietf-sfc-oam-framework] for one approach).
+ *
+ * The O bit MUST be set for OAM packets and MUST NOT be set for non-OAM
+ * packets.  The O bit MUST NOT be modified along the SFP.
+ *
+ * SF/SFF/SFC Proxy/Classifier implementations that do not support SFC
+ * OAM procedures SHOULD discard packets with O bit set, but MAY support
+ * a configurable parameter to enable forwarding received SFC OAM
+ * packets unmodified to the next element in the chain.  Forwarding OAM
+ * packets unmodified by SFC elements that do not support SFC OAM
+ * procedures may be acceptable for a subset of OAM functions, but can
+ * result in unexpected outcomes for others, thus it is recommended to
+ * analyze the impact of forwarding an OAM packet for all OAM functions
+ * prior to enabling this behavior.  The configurable parameter MUST be
+ * disabled by default.
+ *
+ * TTL: Indicates the maximum SFF hops for an SFP.  This field is used
+ * for service plane loop detection.  The initial TTL value SHOULD be
+ * configurable via the control plane; the configured initial value can
+ * be specific to one or more SFPs.  If no initial value is explicitly
+ * provided, the default initial TTL value of 63 MUST be used.  Each SFF
+ * involved in forwarding an NSH packet MUST decrement the TTL value by
+ * 1 prior to NSH forwarding lookup.  Decrementing by 1 from an incoming
+ * value of 0 shall result in a TTL value of 63.  The packet MUST NOT be
+ * forwarded if TTL is, after decrement, 0.
+ *
+ * All other flag fields, marked U, are unassigned and available for
+ * future use, see Section 11.2.1.  Unassigned bits MUST be set to zero
+ * upon origination, and MUST be ignored and preserved unmodified by
+ * other NSH supporting elements.  Elements which do not understand the
+ * meaning of any of these bits MUST NOT modify their actions based on
+ * those unknown bits.
+ *
+ * Length: The total length, in 4-byte words, of NSH including the Base
+ * Header, the Service Path Header, the Fixed Length Context Header or
+ * Variable Length Context Header(s).  The length MUST be 0x6 for MD
+ * Type equal to 0x1, and MUST be 0x2 or greater for MD Type equal to
+ * 0x2.  The length of the NSH header MUST be an integer multiple of 4
+ * bytes, thus variable length metadata is always padded out to a
+ * multiple of 4 bytes.
+ *
+ * MD Type: Indicates the format of NSH beyond the mandatory Base Header
+ * and the Service Path Header.  MD Type defines the format of the
+ * metadata being carried.
+ *
+ * 0x0 - This is a reserved value.  Implementations SHOULD silently
+ * discard packets with MD Type 0x0.
+ *
+ * 0x1 - This indicates that the format of the header includes a fixed
+ * length Context Header (see Figure 4 below).
+ *
+ * 0x2 - This does not mandate any headers beyond the Base Header and
+ * Service Path Header, but may contain optional variable length Context
+ * Header(s).  The semantics of the variable length Context Header(s)
+ * are not defined in this document.  The format of the optional
+ * variable length Context Headers is provided in Section 2.5.1.
+ *
+ * 0xF - This value is reserved for experimentation and testing, as per
+ * [RFC3692].  Implementations not explicitly configured to be part of
+ * an experiment SHOULD silently discard packets with MD Type 0xF.
+ *
+ * Next Protocol: indicates the protocol type of the encapsulated data.
+ * NSH does not alter the inner payload, and the semantics on the inner
+ * protocol remain unchanged due to NSH service function chaining.
+ * Please see the IANA Considerations section below, Section 11.2.5.
+ *
+ * This document defines the following Next Protocol values:
+ *
+ * 0x1: IPv4
+ * 0x2: IPv6
+ * 0x3: Ethernet
+ * 0x4: NSH
+ * 0x5: MPLS
+ * 0xFE: Experiment 1
+ * 0xFF: Experiment 2
+ *
+ * Packets with Next Protocol values not supported SHOULD be silently
+ * dropped by default, although an implementation MAY provide a
+ * configuration parameter to forward them.  Additionally, an
+ * implementation not explicitly configured for a specific experiment
+ * [RFC3692] SHOULD silently drop packets with Next Protocol values 0xFE
+ * and 0xFF.
+ *
+ * Service Path Identifier (SPI): Identifies a service path.
+ * Participating nodes MUST use this identifier for Service Function
+ * Path selection.  The initial classifier MUST set the appropriate SPI
+ * for a given classification result.
+ *
+ * Service Index (SI): Provides location within the SFP.  The initial
+ * classifier for a given SFP SHOULD set the SI to 255, however the
+ * control plane MAY configure the initial value of SI as appropriate
+ * (i.e., taking into account the length of the service function path).
+ * The Service Index MUST be decremented by a value of 1 by Service
+ * Functions or by SFC Proxy nodes after performing required services
+ * and the new decremented SI value MUST be used in the egress packet's
+ * NSH.  The initial Classifier MUST send the packet to the first SFF in
+ * the identified SFP for forwarding along an SFP.  If re-classification
+ * occurs, and that re-classification results in a new SPI, the
+ * (re)classifier is, in effect, the initial classifier for the
+ * resultant SPI.
+ *
+ * The SI is used in conjunction with the Service Path Identifier for
+ * Service Function Path Selection and for determining the next SFF/SF
+ * in the path.  The SI is also valuable when troubleshooting or
+ * reporting service paths.  Additionally, while the TTL field is the
+ * main mechanism for service plane loop detection, the SI can also be
+ * used for detecting service plane loops.
+ *
+ * When the Base Header specifies MD Type = 0x1, a Fixed Length Context
+ * Header (16-bytes) MUST be present immediately following the Service
+ * Path Header. The value of a Fixed Length Context
+ * Header that carries no metadata MUST be set to zero.
+ *
+ * When the base header specifies MD Type = 0x2, zero or more Variable
+ * Length Context Headers MAY be added, immediately following the
+ * Service Path Header (see Figure 5).  Therefore, Length = 0x2,
+ * indicates that only the Base Header followed by the Service Path
+ * Header are present.  The optional Variable Length Context Headers
+ * MUST be of an integer number of 4-bytes.  The base header Length
+ * field MUST be used to determine the offset to locate the original
+ * packet or frame for SFC nodes that require access to that
+ * information.
+ *
+ * The format of the optional variable length Context Headers
+ *
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          Metadata Class       |      Type     |U|    Length   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                      Variable Metadata                        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Metadata Class (MD Class): Defines the scope of the 'Type' field to
+ * provide a hierarchical namespace.  The IANA Considerations
+ * Section 11.2.4 defines how the MD Class values can be allocated to
+ * standards bodies, vendors, and others.
+ *
+ * Type: Indicates the explicit type of metadata being carried.  The
+ * definition of the Type is the responsibility of the MD Class owner.
+ *
+ * Unassigned bit: One unassigned bit is available for future use. This
+ * bit MUST NOT be set, and MUST be ignored on receipt.
+ *
+ * Length: Indicates the length of the variable metadata, in bytes.  In
+ * case the metadata length is not an integer number of 4-byte words,
+ * the sender MUST add pad bytes immediately following the last metadata
+ * byte to extend the metadata to an integer number of 4-byte words.
+ * The receiver MUST round up the length field to the nearest 4-byte
+ * word boundary, to locate and process the next field in the packet.
+ * The receiver MUST access only those bytes in the metadata indicated
+ * by the length field (i.e., actual number of bytes) and MUST ignore
+ * the remaining bytes up to the nearest 4-byte word boundary.  The
+ * Length may be 0 or greater.
+ *
+ * A value of 0 denotes a Context Header without a Variable Metadata
+ * field.
+ *
+ * [0] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
+ */
+
+/**
+ * struct nsh_md1_ctx - fixed-length context headers for MD type 1
+ * @context: the four mandatory 32-bit context headers
+ */
+struct nsh_md1_ctx {
+	__be32 context[4];
+};
+
+struct nsh_md2_tlv {
+	__be16 md_class;
+	u8 type;
+	u8 length;
+	u8 md_value[];
+};
+
+struct nsh_hdr {
+	__be16 ver_flags_ttl_len;
+	u8 mdtype;
+	u8 np;
+	__be32 path_hdr;
+	union {
+	    struct nsh_md1_ctx md1;
+	    struct nsh_md2_tlv md2;
+	};
+};
+
+/* Masking NSH header fields. */
+#define NSH_VER_MASK       0xc000
+#define NSH_VER_SHIFT      14
+#define NSH_FLAGS_MASK     0x3000
+#define NSH_FLAGS_SHIFT    12
+#define NSH_TTL_MASK       0x0fc0
+#define NSH_TTL_SHIFT      6
+#define NSH_LEN_MASK       0x003f
+#define NSH_LEN_SHIFT      0
+
+#define NSH_MDTYPE_MASK    0x0f
+#define NSH_MDTYPE_SHIFT   0
+
+#define NSH_SPI_MASK       0xffffff00
+#define NSH_SPI_SHIFT      8
+#define NSH_SI_MASK        0x000000ff
+#define NSH_SI_SHIFT       0
+
+/* NSH Base Header Next Protocol. */
+#define NSH_P_IPV4        0x01
+#define NSH_P_IPV6        0x02
+#define NSH_P_ETHERNET    0x03
+#define NSH_P_NSH         0x04
+#define NSH_P_MPLS        0x05
+
+/* MD Type Registry. */
+#define NSH_M_TYPE1     0x01
+#define NSH_M_TYPE2     0x02
+#define NSH_M_EXP1      0xFE
+#define NSH_M_EXP2      0xFF
+
+/* NSH Base Header Length */
+#define NSH_BASE_HDR_LEN  8
+
+/* NSH MD Type 1 header Length. */
+#define NSH_M_TYPE1_LEN   24
+
+/* NSH header maximum Length. */
+#define NSH_HDR_MAX_LEN 256
+
+/* NSH context headers maximum Length. */
+#define NSH_CTX_HDRS_MAX_LEN 248
+
+static inline u16 nsh_hdr_len(const struct nsh_hdr *nsh)
+{
+	return ((ntohs(nsh->ver_flags_ttl_len) & NSH_LEN_MASK)
+		>> NSH_LEN_SHIFT) << 2;
+}
+
+static inline u8 nsh_get_ver(const struct nsh_hdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_VER_MASK)
+		>> NSH_VER_SHIFT;
+}
+
+static inline u8 nsh_get_flags(const struct nsh_hdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_FLAGS_MASK)
+		>> NSH_FLAGS_SHIFT;
+}
+
+static inline u8 nsh_get_ttl(const struct nsh_hdr *nsh)
+{
+	return (ntohs(nsh->ver_flags_ttl_len) & NSH_TTL_MASK)
+		>> NSH_TTL_SHIFT;
+}
+
+static inline void __nsh_set_xflag(struct nsh_hdr *nsh, u16 xflag, u16 xmask)
+{
+	nsh->ver_flags_ttl_len
+		= (nsh->ver_flags_ttl_len & ~htons(xmask)) | htons(xflag);
+}
+
+static inline void nsh_set_flags_and_ttl(struct nsh_hdr *nsh, u8 flags, u8 ttl)
+{
+	__nsh_set_xflag(nsh, ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) |
+			     ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK),
+			NSH_FLAGS_MASK | NSH_TTL_MASK);
+}
+
+static inline void nsh_set_flags_ttl_len(struct nsh_hdr *nsh, u8 flags,
+					 u8 ttl, u8 len)
+{
+	len = len >> 2;
+	__nsh_set_xflag(nsh, ((flags << NSH_FLAGS_SHIFT) & NSH_FLAGS_MASK) |
+			     ((ttl << NSH_TTL_SHIFT) & NSH_TTL_MASK) |
+			     ((len << NSH_LEN_SHIFT) & NSH_LEN_MASK),
+			NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK);
+}
+
+#endif /* __NET_NSH_H */
diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index efeb119..bc5b727 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -138,6 +138,7 @@
 #define ETH_P_IEEE802154 0x00F6		/* IEEE802.15.4 frame		*/
 #define ETH_P_CAIF	0x00F7		/* ST-Ericsson CAIF protocol	*/
 #define ETH_P_XDSA	0x00F8		/* Multiplexed DSA protocol	*/
+#define ETH_P_NSH	0x894F		/* Ethertype for NSH. */
 
 /*
  *	This is an Ethernet frame header.
-- 
2.5.5


* [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
  2017-08-25 14:20 [PATCH net-next v6 0/3] openvswitch: add NSH support Yi Yang
  2017-08-25 14:20 ` [PATCH net-next v6 1/3] net: add NSH header structures and helpers Yi Yang
@ 2017-08-25 14:20 ` Yi Yang
  2017-08-25 16:25   ` Jiri Benc
  2017-08-25 14:20 ` [PATCH net-next v6 3/3] openvswitch: enable NSH support Yi Yang
  2 siblings, 1 reply; 31+ messages in thread
From: Yi Yang @ 2017-08-25 14:20 UTC (permalink / raw)
  To: netdev; +Cc: dev, jbenc, e, blp, jan.scheurich, Yi Yang

NSH (Network Service Header)[1] is a new protocol for service
function chaining.  It can be handled as an L3 protocol like
IPv4 and IPv6; Eth + NSH + inner packet and VxLAN-gpe + NSH +
inner packet are two typical use cases.

We need to enable Open vSwitch to support NSH; this patch
makes the Linux network infrastructure able to support
NSH GSO for large packets.

[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
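The dispatch this patch adds to skb_udp_tunnel_segment() can be
modelled in userspace as below: the encap type recorded alongside the
inner protocol selects how the inner segmentation callback is found.
The names mirror the patch, but the harness itself (stub callbacks,
lookup table) is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encap types as in include/linux/skbuff.h after this patch. */
enum { ENCAP_TYPE_ETHER, ENCAP_TYPE_IPPROTO, ENCAP_TYPE_NSH };

typedef int (*gso_segment_fn)(void);

/* Stub segmentation callbacks standing in for the real kernel ones. */
static int segment_ether(void)     { return 1; }
static int segment_ipproto(void)   { return 2; }
static int segment_nsh_inner(void) { return 3; }

/* Stand-in for find_gso_segment_by_type(): resolve by ethertype. */
static gso_segment_fn find_gso_segment_by_type(uint16_t type)
{
	return type == 0x894F /* ETH_P_NSH */ ? segment_nsh_inner : NULL;
}

static gso_segment_fn pick_inner_segment(int encap_type, uint16_t inner_proto)
{
	switch (encap_type) {
	case ENCAP_TYPE_ETHER:
		return segment_ether;
	case ENCAP_TYPE_IPPROTO:
		return segment_ipproto;
	case ENCAP_TYPE_NSH:
		/* New case in this patch: resolve via the offload list. */
		return find_gso_segment_by_type(inner_proto);
	default:
		return NULL;	/* unknown encap: give up, as out_unlock does */
	}
}
```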

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
---
 include/linux/netdevice.h |   1 +
 include/linux/skbuff.h    |   8 +++-
 net/Kconfig               |   1 +
 net/Makefile              |   1 +
 net/core/dev.c            |  14 ++++++
 net/ipv4/udp_offload.c    |   7 +++
 net/nsh/Kconfig           |  11 +++++
 net/nsh/Makefile          |   4 ++
 net/nsh/nsh_gso.c         | 106 ++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 net/nsh/Kconfig
 create mode 100644 net/nsh/Makefile
 create mode 100644 net/nsh/nsh_gso.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b3..b017418 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3253,6 +3253,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
+struct packet_offload *find_gso_segment_by_type(__be16 type);
 
 static inline void napi_free_frags(struct napi_struct *napi)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7594e19..aafc8ff 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -766,7 +766,7 @@ struct sk_buff {
 	__u8			ndisc_nodetype:2;
 #endif
 	__u8			ipvs_property:1;
-	__u8			inner_protocol_type:1;
+	__u8			inner_protocol_type:2;
 	__u8			remcsum_offload:1;
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
@@ -2174,12 +2174,16 @@ static inline void skb_tailroom_reserve(struct sk_buff *skb, unsigned int mtu,
 
 #define ENCAP_TYPE_ETHER	0
 #define ENCAP_TYPE_IPPROTO	1
+#define ENCAP_TYPE_NSH		2
 
 static inline void skb_set_inner_protocol(struct sk_buff *skb,
 					  __be16 protocol)
 {
 	skb->inner_protocol = protocol;
-	skb->inner_protocol_type = ENCAP_TYPE_ETHER;
+	if (skb->inner_protocol == htons(ETH_P_NSH))
+		skb->inner_protocol_type = ENCAP_TYPE_NSH;
+	else
+		skb->inner_protocol_type = ENCAP_TYPE_ETHER;
 }
 
 static inline void skb_set_inner_ipproto(struct sk_buff *skb,
diff --git a/net/Kconfig b/net/Kconfig
index 7d57ef3..818df7a 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -240,6 +240,7 @@ source "net/switchdev/Kconfig"
 source "net/l3mdev/Kconfig"
 source "net/qrtr/Kconfig"
 source "net/ncsi/Kconfig"
+source "net/nsh/Kconfig"
 
 config RPS
 	bool
diff --git a/net/Makefile b/net/Makefile
index bed80fa..82bfac6 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -85,3 +85,4 @@ obj-y				+= l3mdev/
 endif
 obj-$(CONFIG_QRTR)		+= qrtr/
 obj-$(CONFIG_NET_NCSI)		+= ncsi/
+obj-$(CONFIG_NET_NSH_GSO)		+= nsh/
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b547..02a988d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4874,6 +4874,20 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 }
 EXPORT_SYMBOL(gro_find_complete_by_type);
 
+struct packet_offload *find_gso_segment_by_type(__be16 type)
+{
+	struct list_head *offload_head = &offload_base;
+	struct packet_offload *ptype;
+
+	list_for_each_entry_rcu(ptype, offload_head, list) {
+		if (ptype->type != type || !ptype->callbacks.gso_segment)
+			continue;
+		return ptype;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(find_gso_segment_by_type);
+
 static void napi_skb_free_stolen_head(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 97658bf..31f9383 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -155,6 +155,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 	__be16 protocol = skb->protocol;
 	const struct net_offload **offloads;
 	const struct net_offload *ops;
+	const struct packet_offload *po;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
 					     netdev_features_t features);
@@ -173,6 +174,13 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 			goto out_unlock;
 		gso_inner_segment = ops->callbacks.gso_segment;
 		break;
+	case ENCAP_TYPE_NSH:
+		protocol = skb->inner_protocol;
+		po = find_gso_segment_by_type(protocol);
+		if (!po || !po->callbacks.gso_segment)
+			goto out_unlock;
+		gso_inner_segment = po->callbacks.gso_segment;
+		break;
 	default:
 		goto out_unlock;
 	}
diff --git a/net/nsh/Kconfig b/net/nsh/Kconfig
new file mode 100644
index 0000000..0157b26
--- /dev/null
+++ b/net/nsh/Kconfig
@@ -0,0 +1,11 @@
+#
+# NSH GSO support
+#
+
+config NET_NSH_GSO
+	bool "NSH GSO support"
+	depends on INET
+	default y
+	---help---
+	  This allows segmentation of GSO packets that have had an NSH
+	  header pushed onto them and have thus become NSH GSO packets.
diff --git a/net/nsh/Makefile b/net/nsh/Makefile
new file mode 100644
index 0000000..eb4bca0
--- /dev/null
+++ b/net/nsh/Makefile
@@ -0,0 +1,4 @@
+#
+# Makefile for NSH GSO.
+#
+obj-$(CONFIG_NET_NSH_GSO) += nsh_gso.o
diff --git a/net/nsh/nsh_gso.c b/net/nsh/nsh_gso.c
new file mode 100644
index 0000000..4b6fb29
--- /dev/null
+++ b/net/nsh/nsh_gso.c
@@ -0,0 +1,106 @@
+/*
+ *	NSH GSO Support
+ *
+ *	Authors: Yi Yang (yi.y.yang@intel.com)
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *	Based on: net/mpls/mpls_gso.c
+ */
+
+#include <linux/err.h>
+#include <linux/netdev_features.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/protocol.h>
+#include <net/nsh.h>
+
+struct sk_buff *nsh_gso_segment(struct sk_buff *skb,
+				netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	int nshoff;
+	__be16 inner_proto;
+	struct nsh_hdr *nsh;
+	unsigned int nsh_hlen;
+	struct packet_offload *po;
+	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
+					     netdev_features_t features);
+
+	skb_reset_network_header(skb);
+	nshoff = skb_network_header(skb) - skb_mac_header(skb);
+
+	if (unlikely(!pskb_may_pull(skb, NSH_BASE_HDR_LEN)))
+		goto out;
+
+	nsh = (struct nsh_hdr *)skb_network_header(skb);
+	nsh_hlen = nsh_hdr_len(nsh);
+	if (unlikely(!pskb_may_pull(skb, nsh_hlen)))
+		goto out;
+
+	nsh = (struct nsh_hdr *)skb_network_header(skb);
+	__skb_pull(skb, nsh_hlen);
+
+	skb_reset_transport_header(skb);
+
+	switch (nsh->np) {
+	case NSH_P_ETHERNET:
+		inner_proto = htons(ETH_P_TEB);
+		gso_inner_segment = skb_mac_gso_segment;
+		break;
+	case NSH_P_IPV4:
+		inner_proto = htons(ETH_P_IP);
+		po = find_gso_segment_by_type(inner_proto);
+		if (!po || !po->callbacks.gso_segment)
+			goto out;
+		gso_inner_segment = po->callbacks.gso_segment;
+		break;
+	case NSH_P_IPV6:
+		inner_proto = htons(ETH_P_IPV6);
+		po = find_gso_segment_by_type(inner_proto);
+		if (!po || !po->callbacks.gso_segment)
+			goto out;
+		gso_inner_segment = po->callbacks.gso_segment;
+		break;
+	case NSH_P_NSH:
+		inner_proto = htons(ETH_P_NSH);
+		gso_inner_segment = nsh_gso_segment;
+		break;
+	default:
+		goto out;
+	}
+
+	segs = gso_inner_segment(skb, features);
+	if (IS_ERR_OR_NULL(segs))
+		goto out;
+
+	skb = segs;
+	do {
+		nsh = (struct nsh_hdr *)(skb_mac_header(skb) + nshoff);
+		skb->network_header = (u8 *)nsh - skb->head;
+	} while ((skb = skb->next));
+
+out:
+	return segs;
+}
+EXPORT_SYMBOL(nsh_gso_segment);
+
+static struct packet_offload nsh_offload __read_mostly = {
+	.type = cpu_to_be16(ETH_P_NSH),
+	.priority = 15,
+	.callbacks = {
+		.gso_segment    =	nsh_gso_segment,
+	},
+};
+
+static int __init nsh_gso_init(void)
+{
+	dev_add_offload(&nsh_offload);
+
+	return 0;
+}
+
+device_initcall(nsh_gso_init);
-- 
2.5.5


* [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-08-25 14:20 [PATCH net-next v6 0/3] openvswitch: add NSH support Yi Yang
  2017-08-25 14:20 ` [PATCH net-next v6 1/3] net: add NSH header structures and helpers Yi Yang
  2017-08-25 14:20 ` [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH Yi Yang
@ 2017-08-25 14:20 ` Yi Yang
  2017-08-30  9:53   ` Hannes Frederic Sowa
  2 siblings, 1 reply; 31+ messages in thread
From: Yi Yang @ 2017-08-25 14:20 UTC (permalink / raw)
  To: netdev; +Cc: dev, jbenc, e, blp, jan.scheurich, Yi Yang

The OVS master and 2.8 branches have merged the NSH
userspace patch series; this patch enables NSH support
in the kernel data path so that OVS can support NSH in
compat mode by porting it.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
---
 drivers/net/vxlan.c              |   7 +
 include/uapi/linux/openvswitch.h |  28 +++
 net/openvswitch/actions.c        | 178 +++++++++++++++++
 net/openvswitch/flow.c           |  55 ++++++
 net/openvswitch/flow.h           |  11 ++
 net/openvswitch/flow_netlink.c   | 404 ++++++++++++++++++++++++++++++++++++++-
 net/openvswitch/flow_netlink.h   |   4 +
 7 files changed, 686 insertions(+), 1 deletion(-)
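The Service Path Header manipulated by this patch packs the SPI
(24 bits) and SI (8 bits) into the single 32-bit path_hdr word.  The
mask/shift values below are copied from include/net/nsh.h in patch 1;
the helpers themselves are an illustrative sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Service Path Header layout: SPI in the top 24 bits, SI in the low 8.
 * Mask/shift values copied from include/net/nsh.h in patch 1. */
#define NSH_SPI_MASK  0xffffff00u
#define NSH_SPI_SHIFT 8
#define NSH_SI_MASK   0x000000ffu

/* Build a path_hdr word from an SPI and a Service Index. */
static inline uint32_t nsh_path_hdr(uint32_t spi, uint8_t si)
{
	return ((spi << NSH_SPI_SHIFT) & NSH_SPI_MASK) | si;
}

static inline uint32_t nsh_spi(uint32_t path_hdr)
{
	return (path_hdr & NSH_SPI_MASK) >> NSH_SPI_SHIFT;
}

static inline uint8_t nsh_si(uint32_t path_hdr)
{
	return path_hdr & NSH_SI_MASK;
}
```

For example, the initial classifier typically sets SI to 255 at the
head of a path, so a path with SPI 0x100 starts as path_hdr 0x000100FF.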

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da..a36c41e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -27,6 +27,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/vxlan.h>
+#include <net/nsh.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ip6_tunnel.h>
@@ -1268,6 +1269,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
 	case VXLAN_GPE_NP_IPV6:
 		*protocol = htons(ETH_P_IPV6);
 		break;
+	case VXLAN_GPE_NP_NSH:
+		*protocol = htons(ETH_P_NSH);
+		break;
 	case VXLAN_GPE_NP_ETHERNET:
 		*protocol = htons(ETH_P_TEB);
 		break;
@@ -1807,6 +1811,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
 	case htons(ETH_P_IPV6):
 		gpe->next_protocol = VXLAN_GPE_NP_IPV6;
 		return 0;
+	case htons(ETH_P_NSH):
+		gpe->next_protocol = VXLAN_GPE_NP_NSH;
+		return 0;
 	case htons(ETH_P_TEB):
 		gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
 		return 0;
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4c..91dee5b 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -333,6 +333,7 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_CT_LABELS,	/* 16-octet connection tracking label */
 	OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
 	OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
+	OVS_KEY_ATTR_NSH,       /* Nested set of ovs_nsh_key_* */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -491,6 +492,29 @@ struct ovs_key_ct_tuple_ipv6 {
 	__u8   ipv6_proto;
 };
 
+enum ovs_nsh_key_attr {
+	OVS_NSH_KEY_ATTR_BASE,  /* struct ovs_nsh_key_base. */
+	OVS_NSH_KEY_ATTR_MD1,   /* struct ovs_nsh_key_md1. */
+	OVS_NSH_KEY_ATTR_MD2,   /* variable-length octets for MD type 2. */
+	__OVS_NSH_KEY_ATTR_MAX
+};
+
+#define OVS_NSH_KEY_ATTR_MAX (__OVS_NSH_KEY_ATTR_MAX - 1)
+
+struct ovs_nsh_key_base {
+	__u8 flags;
+	__u8 ttl;
+	__u8 mdtype;
+	__u8 np;
+	__be32 path_hdr;
+};
+
+#define NSH_MD1_CONTEXT_SIZE 4
+
+struct ovs_nsh_key_md1 {
+	__be32 context[NSH_MD1_CONTEXT_SIZE];
+};
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
@@ -806,6 +830,8 @@ struct ovs_action_push_eth {
  * packet.
  * @OVS_ACTION_ATTR_POP_ETH: Pop the outermost Ethernet header off the
  * packet.
+ * @OVS_ACTION_ATTR_PUSH_NSH: push NSH header to the packet.
+ * @OVS_ACTION_ATTR_POP_NSH: pop the outermost NSH header off the packet.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -835,6 +861,8 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
 	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
 	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
+	OVS_ACTION_ATTR_PUSH_NSH,     /* Nested OVS_NSH_KEY_ATTR_*. */
+	OVS_ACTION_ATTR_POP_NSH,      /* No argument. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a54a556..9ca1a84 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -43,6 +43,7 @@
 #include "flow.h"
 #include "conntrack.h"
 #include "vport.h"
+#include "flow_netlink.h"
 
 struct deferred_action {
 	struct sk_buff *skb;
@@ -380,6 +381,95 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key,
 	return 0;
 }
 
+static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct nsh_hdr *nsh_src)
+{
+	struct nsh_hdr *nsh;
+	size_t length = nsh_hdr_len(nsh_src);
+	u8 next_proto;
+
+	if (key->mac_proto == MAC_PROTO_ETHERNET) {
+		next_proto = NSH_P_ETHERNET;
+	} else {
+		switch (ntohs(skb->protocol)) {
+		case ETH_P_IP:
+			next_proto = NSH_P_IPV4;
+			break;
+		case ETH_P_IPV6:
+			next_proto = NSH_P_IPV6;
+			break;
+		case ETH_P_NSH:
+			next_proto = NSH_P_NSH;
+			break;
+		default:
+			return -ENOTSUPP;
+		}
+	}
+
+	/* Add the NSH header */
+	if (skb_cow_head(skb, length) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, length);
+	nsh = (struct nsh_hdr *)(skb->data);
+	memcpy(nsh, nsh_src, length);
+	nsh->np = next_proto;
+	nsh->mdtype &= NSH_MDTYPE_MASK;
+
+	skb->protocol = htons(ETH_P_NSH);
+	key->eth.type = htons(ETH_P_NSH);
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+
+	/* safe right before invalidate_flow_key */
+	key->mac_proto = MAC_PROTO_NONE;
+	invalidate_flow_key(key);
+	return 0;
+}
+
+static int pop_nsh(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	struct nsh_hdr *nsh = (struct nsh_hdr *)(skb->data);
+	size_t length;
+	u16 inner_proto;
+
+	if (ovs_key_mac_proto(key) != MAC_PROTO_NONE ||
+	    skb->protocol != htons(ETH_P_NSH)) {
+		return -EINVAL;
+	}
+
+	switch (nsh->np) {
+	case NSH_P_ETHERNET:
+		inner_proto = htons(ETH_P_TEB);
+		break;
+	case NSH_P_IPV4:
+		inner_proto = htons(ETH_P_IP);
+		break;
+	case NSH_P_IPV6:
+		inner_proto = htons(ETH_P_IPV6);
+		break;
+	case NSH_P_NSH:
+		inner_proto = htons(ETH_P_NSH);
+		break;
+	default:
+		return -ENOTSUPP;
+	}
+
+	length = nsh_hdr_len(nsh);
+	skb_pull(skb, length);
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+	skb->protocol = inner_proto;
+
+	/* safe right before invalidate_flow_key */
+	if (inner_proto == htons(ETH_P_TEB))
+		key->mac_proto = MAC_PROTO_ETHERNET;
+	else
+		key->mac_proto = MAC_PROTO_NONE;
+	invalidate_flow_key(key);
+	return 0;
+}
+
 static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
 				  __be32 addr, __be32 new_addr)
 {
@@ -602,6 +692,53 @@ static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	return 0;
 }
 
+static int set_nsh(struct sk_buff *skb, struct sw_flow_key *flow_key,
+		   const struct ovs_key_nsh *key,
+		   const struct ovs_key_nsh *mask)
+{
+	struct nsh_hdr *nsh;
+	int err;
+	u8 flags;
+	u8 ttl;
+	int i;
+
+	err = skb_ensure_writable(skb, skb_network_offset(skb) +
+				  sizeof(struct nsh_hdr));
+	if (unlikely(err))
+		return err;
+
+	nsh = (struct nsh_hdr *)skb_network_header(skb);
+
+	flags = nsh_get_flags(nsh);
+	flags = OVS_MASKED(flags, key->flags, mask->flags);
+	flow_key->nsh.flags = flags;
+	ttl = nsh_get_ttl(nsh);
+	ttl = OVS_MASKED(ttl, key->ttl, mask->ttl);
+	flow_key->nsh.ttl = ttl;
+	nsh_set_flags_and_ttl(nsh, flags, ttl);
+	nsh->path_hdr = OVS_MASKED(nsh->path_hdr, key->path_hdr,
+				   mask->path_hdr);
+	flow_key->nsh.path_hdr = nsh->path_hdr;
+	switch (nsh->mdtype) {
+	case NSH_M_TYPE1:
+		for (i = 0; i < NSH_MD1_CONTEXT_SIZE; i++) {
+			nsh->md1.context[i] =
+			    OVS_MASKED(nsh->md1.context[i], key->context[i],
+				       mask->context[i]);
+		}
+		memcpy(flow_key->nsh.context, nsh->md1.context,
+		       sizeof(nsh->md1.context));
+		break;
+	case NSH_M_TYPE2:
+		memset(flow_key->nsh.context, 0,
+		       sizeof(flow_key->nsh.context));
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /* Must follow skb_ensure_writable() since that can move the skb data. */
 static void set_tp_port(struct sk_buff *skb, __be16 *port,
 			__be16 new_port, __sum16 *check)
@@ -1024,6 +1161,32 @@ static int execute_masked_set_action(struct sk_buff *skb,
 				   get_mask(a, struct ovs_key_ethernet *));
 		break;
 
+	case OVS_KEY_ATTR_NSH: {
+		struct ovs_key_nsh nsh;
+		struct ovs_key_nsh nsh_mask;
+		size_t size = nla_len(a) / 2;
+		struct {
+			struct nlattr nla;
+			u8 data[size];
+		} attr, mask;
+
+		attr.nla.nla_type = nla_type(a);
+		attr.nla.nla_len = NLA_HDRLEN + size;
+		memcpy(attr.data, (char *)(a + 1), size);
+
+		mask.nla = attr.nla;
+		memcpy(mask.data, (char *)(a + 1) + size, size);
+
+		err = nsh_key_from_nlattr(&attr.nla, &nsh);
+		if (err)
+			break;
+		err = nsh_key_from_nlattr(&mask.nla, &nsh_mask);
+		if (err)
+			break;
+		err = set_nsh(skb, flow_key, &nsh, &nsh_mask);
+		break;
+	}
+
 	case OVS_KEY_ATTR_IPV4:
 		err = set_ipv4(skb, flow_key, nla_data(a),
 			       get_mask(a, struct ovs_key_ipv4 *));
@@ -1210,6 +1373,21 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 		case OVS_ACTION_ATTR_POP_ETH:
 			err = pop_eth(skb, key);
 			break;
+
+		case OVS_ACTION_ATTR_PUSH_NSH: {
+			u8 buffer[NSH_HDR_MAX_LEN];
+			struct nsh_hdr *nsh_hdr = (struct nsh_hdr *)buffer;
+			const struct nsh_hdr *nsh_src = nsh_hdr;
+
+			nsh_hdr_from_nlattr(nla_data(a), nsh_hdr,
+					    NSH_HDR_MAX_LEN);
+			err = push_nsh(skb, key, nsh_src);
+			break;
+		}
+
+		case OVS_ACTION_ATTR_POP_NSH:
+			err = pop_nsh(skb, key);
+			break;
 		}
 
 		if (unlikely(err)) {
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 8c94cef..34865b5 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -46,6 +46,7 @@
 #include <net/ipv6.h>
 #include <net/mpls.h>
 #include <net/ndisc.h>
+#include <net/nsh.h>
 
 #include "conntrack.h"
 #include "datapath.h"
@@ -490,6 +491,56 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
 	return 0;
 }
 
+static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	struct nsh_hdr *nsh;
+	unsigned int nh_ofs = skb_network_offset(skb);
+	u8 version, length;
+	int err;
+
+	err = check_header(skb, nh_ofs + NSH_BASE_HDR_LEN);
+	if (unlikely(err))
+		return err;
+
+	nsh = (struct nsh_hdr *)skb_network_header(skb);
+	version = nsh_get_ver(nsh);
+	length = nsh_hdr_len(nsh);
+
+	if (version != 0)
+		return -EINVAL;
+
+	err = check_header(skb, nh_ofs + length);
+	if (unlikely(err))
+		return err;
+
+	nsh = (struct nsh_hdr *)skb_network_header(skb);
+	key->nsh.flags = nsh_get_flags(nsh);
+	key->nsh.ttl = nsh_get_ttl(nsh);
+	key->nsh.mdtype = nsh->mdtype;
+	key->nsh.np = nsh->np;
+	key->nsh.path_hdr = nsh->path_hdr;
+	switch (key->nsh.mdtype) {
+	case NSH_M_TYPE1:
+		if (length != NSH_M_TYPE1_LEN)
+			return -EINVAL;
+		memcpy(key->nsh.context, nsh->md1.context,
+		       sizeof(nsh->md1.context));
+		break;
+	case NSH_M_TYPE2:
+		/* Don't support MD type 2 metadata parsing yet */
+		if (length < NSH_BASE_HDR_LEN)
+			return -EINVAL;
+
+		memset(key->nsh.context, 0,
+		       sizeof(key->nsh.context));
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * key_extract - extracts a flow key from an Ethernet frame.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
@@ -735,6 +786,10 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 				memset(&key->tp, 0, sizeof(key->tp));
 			}
 		}
+	} else if (key->eth.type == htons(ETH_P_NSH)) {
+		error = parse_nsh(skb, key);
+		if (error)
+			return error;
 	}
 	return 0;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 1875bba..6a3cd9c 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -35,6 +35,7 @@
 #include <net/inet_ecn.h>
 #include <net/ip_tunnels.h>
 #include <net/dst_metadata.h>
+#include <net/nsh.h>
 
 struct sk_buff;
 
@@ -66,6 +67,15 @@ struct vlan_head {
 	(offsetof(struct sw_flow_key, recirc_id) +	\
 	FIELD_SIZEOF(struct sw_flow_key, recirc_id))
 
+struct ovs_key_nsh {
+	u8 flags;
+	u8 ttl;
+	u8 mdtype;
+	u8 np;
+	__be32 path_hdr;
+	__be32 context[NSH_MD1_CONTEXT_SIZE];
+};
+
 struct sw_flow_key {
 	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
 	u8 tun_opts_len;
@@ -144,6 +154,7 @@ struct sw_flow_key {
 			};
 		} ipv6;
 	};
+	struct ovs_key_nsh nsh;         /* network service header */
 	struct {
 		/* Connection tracking fields not packed above. */
 		struct {
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index e8eb427..3b7f166 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -78,9 +78,11 @@ static bool actions_may_change_flow(const struct nlattr *actions)
 		case OVS_ACTION_ATTR_HASH:
 		case OVS_ACTION_ATTR_POP_ETH:
 		case OVS_ACTION_ATTR_POP_MPLS:
+		case OVS_ACTION_ATTR_POP_NSH:
 		case OVS_ACTION_ATTR_POP_VLAN:
 		case OVS_ACTION_ATTR_PUSH_ETH:
 		case OVS_ACTION_ATTR_PUSH_MPLS:
+		case OVS_ACTION_ATTR_PUSH_NSH:
 		case OVS_ACTION_ATTR_PUSH_VLAN:
 		case OVS_ACTION_ATTR_SAMPLE:
 		case OVS_ACTION_ATTR_SET:
@@ -322,12 +324,27 @@ size_t ovs_tun_key_attr_size(void)
 		+ nla_total_size(2);   /* OVS_TUNNEL_KEY_ATTR_TP_DST */
 }
 
+size_t ovs_nsh_key_attr_size(void)
+{
+	/* Whenever adding new OVS_NSH_KEY_ FIELDS, we should consider
+	 * updating this function.
+	 */
+	return  nla_total_size(NSH_BASE_HDR_LEN) /* OVS_NSH_KEY_ATTR_BASE */
+		/* OVS_NSH_KEY_ATTR_MD1 and OVS_NSH_KEY_ATTR_MD2 are
+		 * mutually exclusive, so the bigger one can cover
+		 * the smaller one.
+		 *
+		 * OVS_NSH_KEY_ATTR_MD2
+		 */
+		+ nla_total_size(NSH_CTX_HDRS_MAX_LEN);
+}
+
 size_t ovs_key_attr_size(void)
 {
 	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
 	 * updating this function.
 	 */
-	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 28);
+	BUILD_BUG_ON(OVS_KEY_ATTR_TUNNEL_INFO != 29);
 
 	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
 		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
@@ -341,6 +358,8 @@ size_t ovs_key_attr_size(void)
 		+ nla_total_size(4)   /* OVS_KEY_ATTR_CT_MARK */
 		+ nla_total_size(16)  /* OVS_KEY_ATTR_CT_LABELS */
 		+ nla_total_size(40)  /* OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6 */
+		+ nla_total_size(0)   /* OVS_KEY_ATTR_NSH */
+		  + ovs_nsh_key_attr_size()
 		+ nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */
 		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
 		+ nla_total_size(4)   /* OVS_KEY_ATTR_VLAN */
@@ -373,6 +392,13 @@ static const struct ovs_len_tbl ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1]
 	[OVS_TUNNEL_KEY_ATTR_IPV6_DST]      = { .len = sizeof(struct in6_addr) },
 };
 
+static const struct ovs_len_tbl
+ovs_nsh_key_attr_lens[OVS_NSH_KEY_ATTR_MAX + 1] = {
+	[OVS_NSH_KEY_ATTR_BASE]     = { .len = 8 },
+	[OVS_NSH_KEY_ATTR_MD1]      = { .len = 16 },
+	[OVS_NSH_KEY_ATTR_MD2]      = { .len = OVS_ATTR_VARIABLE },
+};
+
 /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute.  */
 static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ENCAP]	 = { .len = OVS_ATTR_NESTED },
@@ -405,6 +431,8 @@ static const struct ovs_len_tbl ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 		.len = sizeof(struct ovs_key_ct_tuple_ipv4) },
 	[OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6] = {
 		.len = sizeof(struct ovs_key_ct_tuple_ipv6) },
+	[OVS_KEY_ATTR_NSH]       = { .len = OVS_ATTR_NESTED,
+				     .next = ovs_nsh_key_attr_lens, },
 };
 
 static bool check_attr_len(unsigned int attr_len, unsigned int expected_len)
@@ -1179,6 +1207,303 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 	return 0;
 }
 
+int nsh_hdr_from_nlattr(const struct nlattr *attr,
+			struct nsh_hdr *nsh, size_t size)
+{
+	struct nlattr *a;
+	int rem;
+	u8 flags = 0;
+	u8 ttl = 0;
+	int mdlen = 0;
+	bool has_md1 = false;
+	bool has_md2 = false;
+	u8 len;
+
+	nla_for_each_nested(a, attr, rem) {
+		int type = nla_type(a);
+
+		if (type > OVS_NSH_KEY_ATTR_MAX) {
+			OVS_NLERR(1, "nsh attr %d is out of range max %d",
+				  type, OVS_NSH_KEY_ATTR_MAX);
+			return -EINVAL;
+		}
+
+		if (!check_attr_len(nla_len(a),
+				    ovs_nsh_key_attr_lens[type].len)) {
+			OVS_NLERR(
+			    1,
+			    "nsh attr %d has unexpected len %d expected %d",
+			    type,
+			    nla_len(a),
+			    ovs_nsh_key_attr_lens[type].len
+			);
+			return -EINVAL;
+		}
+
+		switch (type) {
+		case OVS_NSH_KEY_ATTR_BASE: {
+			const struct ovs_nsh_key_base *base =
+				(struct ovs_nsh_key_base *)nla_data(a);
+			flags = base->flags;
+			ttl = base->ttl;
+			nsh->np = base->np;
+			nsh->mdtype = base->mdtype;
+			nsh->path_hdr = base->path_hdr;
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD1: {
+			const struct ovs_nsh_key_md1 *md1 =
+				(struct ovs_nsh_key_md1 *)nla_data(a);
+			struct nsh_md1_ctx *md1_dst = &nsh->md1;
+
+			has_md1 = true;
+			mdlen = nla_len(a);
+			if (((mdlen + NSH_BASE_HDR_LEN) != NSH_M_TYPE1_LEN) ||
+			    ((mdlen + NSH_BASE_HDR_LEN) > size) ||
+			    (mdlen <= 0)) {
+				OVS_NLERR(
+				    1,
+				    "length %d of nsh attr %d is invalid",
+				    mdlen,
+				    type
+				);
+				return -EINVAL;
+			}
+			memcpy(md1_dst, md1, mdlen);
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD2: {
+			const u8 *md2 = nla_data(a);
+			struct nsh_md2_tlv *md2_dst = &nsh->md2;
+
+			has_md2 = true;
+			mdlen = nla_len(a);
+			if (((mdlen + NSH_BASE_HDR_LEN) > size) ||
+			    (mdlen <= 0)) {
+				OVS_NLERR(
+				    1,
+				    "length %d of nsh attr %d is invalid",
+				    mdlen,
+				    type
+				);
+				return -EINVAL;
+			}
+			memcpy(md2_dst, md2, mdlen);
+			break;
+		}
+		default:
+			OVS_NLERR(1, "Unknown nsh attribute %d",
+				  type);
+			return -EINVAL;
+		}
+	}
+
+	if (rem > 0) {
+		OVS_NLERR(1, "nsh attribute has %d unknown bytes.", rem);
+		return -EINVAL;
+	}
+
+	if ((has_md1 && nsh->mdtype != NSH_M_TYPE1) ||
+	    (has_md2 && nsh->mdtype != NSH_M_TYPE2)) {
+		OVS_NLERR(1,
+			  "nsh attribute has unmatched MD type %d.",
+			  nsh->mdtype);
+		return -EINVAL;
+	}
+
+	if (unlikely(has_md1 && has_md2)) {
+		OVS_NLERR(1, "both nsh md1 and md2 attributes are present");
+		return -EINVAL;
+	}
+
+	if (unlikely(!has_md1 && !has_md2)) {
+		OVS_NLERR(1, "neither nsh md1 nor md2 attribute is present");
+		return -EINVAL;
+	}
+
+	/* nsh header length  = NSH_BASE_HDR_LEN + mdlen */
+	len = NSH_BASE_HDR_LEN + mdlen;
+	nsh_set_flags_ttl_len(nsh, flags, ttl, len);
+
+	return 0;
+}
+
+int nsh_key_from_nlattr(const struct nlattr *attr,
+			struct ovs_key_nsh *nsh)
+{
+	struct nlattr *a;
+	int rem;
+	bool has_md1 = false;
+
+	nla_for_each_nested(a, attr, rem) {
+		int type = nla_type(a);
+
+		if (type > OVS_NSH_KEY_ATTR_MAX) {
+			OVS_NLERR(1, "nsh attr %d is out of range max %d",
+				  type, OVS_NSH_KEY_ATTR_MAX);
+			return -EINVAL;
+		}
+
+		if (!check_attr_len(nla_len(a),
+				    ovs_nsh_key_attr_lens[type].len)) {
+			OVS_NLERR(
+			    1,
+			    "nsh attr %d has unexpected len %d expected %d",
+			    type,
+			    nla_len(a),
+			    ovs_nsh_key_attr_lens[type].len
+			);
+			return -EINVAL;
+		}
+
+		switch (type) {
+		case OVS_NSH_KEY_ATTR_BASE: {
+			const struct ovs_nsh_key_base *base =
+				(struct ovs_nsh_key_base *)nla_data(a);
+
+			memcpy(nsh, base, sizeof(*base));
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD1: {
+			const struct ovs_nsh_key_md1 *md1 =
+				(struct ovs_nsh_key_md1 *)nla_data(a);
+
+			has_md1 = true;
+			memcpy(nsh->context, md1->context, sizeof(*md1));
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD2:
+			/* Not supported yet */
+			return -ENOTSUPP;
+		default:
+			OVS_NLERR(1, "Unknown nsh attribute %d",
+				  type);
+			return -EINVAL;
+		}
+	}
+
+	if (rem > 0) {
+		OVS_NLERR(1, "nsh attribute has %d unknown bytes.", rem);
+		return -EINVAL;
+	}
+
+	if (has_md1 && nsh->mdtype != NSH_M_TYPE1) {
+		OVS_NLERR(1, "nsh attribute has unmatched MD type %d.",
+			  nsh->mdtype);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int nsh_key_put_from_nlattr(const struct nlattr *attr,
+				   struct sw_flow_match *match, bool is_mask,
+				   bool is_push_nsh, bool log)
+{
+	struct nlattr *a;
+	int rem;
+	bool has_md1 = false;
+	bool has_md2 = false;
+	u8 mdtype = 0;
+	int mdlen = 0;
+
+	nla_for_each_nested(a, attr, rem) {
+		int type = nla_type(a);
+		int i;
+
+		if (type > OVS_NSH_KEY_ATTR_MAX) {
+			OVS_NLERR(log, "nsh attr %d is out of range max %d",
+				  type, OVS_NSH_KEY_ATTR_MAX);
+			return -EINVAL;
+		}
+
+		if (!check_attr_len(nla_len(a),
+				    ovs_nsh_key_attr_lens[type].len)) {
+			OVS_NLERR(
+			    log,
+			    "nsh attr %d has unexpected len %d expected %d",
+			    type,
+			    nla_len(a),
+			    ovs_nsh_key_attr_lens[type].len
+			);
+			return -EINVAL;
+		}
+
+		switch (type) {
+		case OVS_NSH_KEY_ATTR_BASE: {
+			const struct ovs_nsh_key_base *base =
+				(struct ovs_nsh_key_base *)nla_data(a);
+
+			mdtype = base->mdtype;
+			SW_FLOW_KEY_PUT(match, nsh.flags,
+					base->flags, is_mask);
+			SW_FLOW_KEY_PUT(match, nsh.ttl,
+					base->ttl, is_mask);
+			SW_FLOW_KEY_PUT(match, nsh.mdtype,
+					base->mdtype, is_mask);
+			SW_FLOW_KEY_PUT(match, nsh.np,
+					base->np, is_mask);
+			SW_FLOW_KEY_PUT(match, nsh.path_hdr,
+					base->path_hdr, is_mask);
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD1: {
+			const struct ovs_nsh_key_md1 *md1 =
+				(struct ovs_nsh_key_md1 *)nla_data(a);
+
+			has_md1 = true;
+			for (i = 0; i < NSH_MD1_CONTEXT_SIZE; i++)
+				SW_FLOW_KEY_PUT(match, nsh.context[i],
+						md1->context[i], is_mask);
+			break;
+		}
+		case OVS_NSH_KEY_ATTR_MD2:
+			if (!is_push_nsh) /* Not supported MD type 2 yet */
+				return -ENOTSUPP;
+
+			has_md2 = true;
+			mdlen = nla_len(a);
+			if ((mdlen > NSH_CTX_HDRS_MAX_LEN) ||
+			    (mdlen <= 0))
+				return -EINVAL;
+			break;
+		default:
+			OVS_NLERR(log, "Unknown nsh attribute %d",
+				  type);
+			return -EINVAL;
+		}
+	}
+
+	if (rem > 0) {
+		OVS_NLERR(log, "nsh attribute has %d unknown bytes.", rem);
+		return -EINVAL;
+	}
+
+	if (!is_mask) {
+		if (has_md1 && mdtype != NSH_M_TYPE1) {
+			OVS_NLERR(log, "nsh attribute has unmatched MD type %d.",
+				  mdtype);
+			return -EINVAL;
+		}
+	}
+
+	if (is_push_nsh && !is_mask) {
+		if ((has_md1 && mdtype != NSH_M_TYPE1) ||
+		    (has_md2 && mdtype != NSH_M_TYPE2) ||
+		    (has_md1 && has_md2) ||
+		    (!has_md1 && !has_md2)) {
+			OVS_NLERR(
+			    log,
+			    "push nsh attributes are invalid for type %d.",
+			    mdtype
+			);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 				u64 attrs, const struct nlattr **a,
 				bool is_mask, bool log)
@@ -1306,6 +1631,13 @@ static int ovs_key_from_nlattrs(struct net *net, struct sw_flow_match *match,
 		attrs &= ~(1 << OVS_KEY_ATTR_ARP);
 	}
 
+	if (attrs & (1 << OVS_KEY_ATTR_NSH)) {
+		if (nsh_key_put_from_nlattr(a[OVS_KEY_ATTR_NSH], match,
+					    is_mask, false, log) < 0)
+			return -EINVAL;
+		attrs &= ~(1 << OVS_KEY_ATTR_NSH);
+	}
+
 	if (attrs & (1 << OVS_KEY_ATTR_MPLS)) {
 		const struct ovs_key_mpls *mpls_key;
 
@@ -1622,6 +1954,40 @@ static int ovs_nla_put_vlan(struct sk_buff *skb, const struct vlan_head *vh,
 	return 0;
 }
 
+static int nsh_key_to_nlattr(const struct ovs_key_nsh *nsh, bool is_mask,
+			     struct sk_buff *skb)
+{
+	struct nlattr *start;
+	struct ovs_nsh_key_base base;
+	struct ovs_nsh_key_md1 md1;
+
+	memcpy(&base, nsh, sizeof(base));
+
+	if (is_mask || nsh->mdtype == NSH_M_TYPE1)
+		memcpy(md1.context, nsh->context, sizeof(md1));
+
+	start = nla_nest_start(skb, OVS_KEY_ATTR_NSH);
+	if (!start)
+		return -EMSGSIZE;
+
+	if (nla_put(skb, OVS_NSH_KEY_ATTR_BASE, sizeof(base), &base))
+		goto nla_put_failure;
+
+	if (is_mask || nsh->mdtype == NSH_M_TYPE1) {
+		if (nla_put(skb, OVS_NSH_KEY_ATTR_MD1, sizeof(md1), &md1))
+			goto nla_put_failure;
+	}
+
+	/* Don't support MD type 2 yet */
+
+	nla_nest_end(skb, start);
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 			     const struct sw_flow_key *output, bool is_mask,
 			     struct sk_buff *skb)
@@ -1750,6 +2116,9 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 		ipv6_key->ipv6_tclass = output->ip.tos;
 		ipv6_key->ipv6_hlimit = output->ip.ttl;
 		ipv6_key->ipv6_frag = output->ip.frag;
+	} else if (swkey->eth.type == htons(ETH_P_NSH)) {
+		if (nsh_key_to_nlattr(&output->nsh, is_mask, skb))
+			goto nla_put_failure;
 	} else if (swkey->eth.type == htons(ETH_P_ARP) ||
 		   swkey->eth.type == htons(ETH_P_RARP)) {
 		struct ovs_key_arp *arp_key;
@@ -2242,6 +2611,19 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	return err;
 }
 
+static bool validate_nsh(const struct nlattr *attr, bool is_mask,
+			 bool is_push_nsh, bool log)
+{
+	struct sw_flow_match match;
+	struct sw_flow_key key;
+	int ret = 0;
+
+	ovs_match_init(&match, &key, true, NULL);
+	ret = nsh_key_put_from_nlattr(attr, &match, is_mask,
+				      is_push_nsh, log);
+	return !ret;
+}
+
 /* Return false if there are any non-masked bits set.
  * Mask follows data immediately, before any netlink padding.
  */
@@ -2384,6 +2766,11 @@ static int validate_set(const struct nlattr *a,
 
 		break;
 
+	case OVS_KEY_ATTR_NSH:
+		if (!validate_nsh(nla_data(a), masked, false, log))
+			return -EINVAL;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -2482,6 +2869,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc),
 			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
 			[OVS_ACTION_ATTR_POP_ETH] = 0,
+			[OVS_ACTION_ATTR_PUSH_NSH] = (u32)-1,
+			[OVS_ACTION_ATTR_POP_NSH] = 0,
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -2636,6 +3025,19 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			mac_proto = MAC_PROTO_ETHERNET;
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_NSH:
+			mac_proto = MAC_PROTO_NONE;
+			if (!validate_nsh(nla_data(a), false, true, true))
+				return -EINVAL;
+			break;
+
+		case OVS_ACTION_ATTR_POP_NSH:
+			if (key->nsh.np == NSH_P_ETHERNET)
+				mac_proto = MAC_PROTO_ETHERNET;
+			else
+				mac_proto = MAC_PROTO_NONE;
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 929c665..a849945 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -79,4 +79,8 @@ int ovs_nla_put_actions(const struct nlattr *attr,
 void ovs_nla_free_flow_actions(struct sw_flow_actions *);
 void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *);
 
+int nsh_key_from_nlattr(const struct nlattr *attr, struct ovs_key_nsh *nsh);
+int nsh_hdr_from_nlattr(const struct nlattr *attr, struct nsh_hdr *nsh_src,
+			size_t size);
+
 #endif /* flow_netlink.h */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 1/3] net: add NSH header structures and helpers
  2017-08-25 14:20 ` [PATCH net-next v6 1/3] net: add NSH header structures and helpers Yi Yang
@ 2017-08-25 15:07   ` Jiri Benc
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Benc @ 2017-08-25 15:07 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, e, blp, jan.scheurich

On Fri, 25 Aug 2017 22:20:03 +0800, Yi Yang wrote:
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -138,6 +138,7 @@
>  #define ETH_P_IEEE802154 0x00F6		/* IEEE802.15.4 frame		*/
>  #define ETH_P_CAIF	0x00F7		/* ST-Ericsson CAIF protocol	*/
>  #define ETH_P_XDSA	0x00F8		/* Multiplexed DSA protocol	*/
> +#define ETH_P_NSH	0x894F		/* Ethertype for NSH. */

This is in the wrong section. It belongs with the Ethernet protocols a few
lines higher in the file (after ETH_P_HSR).

 Jiri


* Re: [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
  2017-08-25 14:20 ` [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH Yi Yang
@ 2017-08-25 16:25   ` Jiri Benc
  2017-08-25 23:22     ` Jiri Benc
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Benc @ 2017-08-25 16:25 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, e, blp, jan.scheurich

On Fri, 25 Aug 2017 22:20:04 +0800, Yi Yang wrote:
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -766,7 +766,7 @@ struct sk_buff {
>  	__u8			ndisc_nodetype:2;
>  #endif
>  	__u8			ipvs_property:1;
> -	__u8			inner_protocol_type:1;
> +	__u8			inner_protocol_type:2;

Adding anything to sk_buff is pretty much forbidden. You can't add more
bytes to it and there are no more free bits to use, either.

Luckily, we still have one byte hole next to inner_ipproto that we can
use. What is needed is renaming of ENCAP_TYPE_IPPROTO to ENCAP_TYPE_L3
and storing the L3 type in the unused byte. It's not beautiful (would
be better to use ethertype than a custom enum) but it will work.

While looking at this, I realized that GSO for VXLAN-GPE is broken,
too. Let me fix it by implementing what I described above which will
make your patch much easier.

 Jiri


* Re: [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH
  2017-08-25 16:25   ` Jiri Benc
@ 2017-08-25 23:22     ` Jiri Benc
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Benc @ 2017-08-25 23:22 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, e, blp, jan.scheurich

On Fri, 25 Aug 2017 18:25:14 +0200, Jiri Benc wrote:
> While looking at this, I realized that GSO for VXLAN-GPE is broken,
> too. Let me fix it by implementing what I described above which will
> make your patch much easier.

Okay, it's not really broken and we don't need that complexity. At
least not immediately. Hw offloading in the VXLAN-GPE case probably does
not work correctly and would benefit from that change but that's a
different beast to tackle at a different time. Software segmentation
works fine for VXLAN-GPE.

There should not be many problems with NSH segmentation, either, if we
carefully store and set mac_header, mac_len and skb->protocol around
calls to skb_mac_gso_segment. Note that with zero mac_len (and correct
skb->protocol), skb_mac_gso_segment behaves in the same way that you
tried to achieve with find_gso_segment_by_type, which is thus completely
unnecessary.

More on Monday.

 Jiri


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-08-25 14:20 ` [PATCH net-next v6 3/3] openvswitch: enable NSH support Yi Yang
@ 2017-08-30  9:53   ` Hannes Frederic Sowa
       [not found]     ` <87wp5l7560.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-08-30  9:53 UTC (permalink / raw)
  To: Yi Yang; +Cc: netdev, dev, jbenc, e, blp, jan.scheurich

Hello,

Yi Yang <yi.y.yang@intel.com> writes:

[...]

> +struct ovs_key_nsh {
> +	u8 flags;
> +	u8 ttl;
> +	u8 mdtype;
> +	u8 np;
> +	__be32 path_hdr;
> +	__be32 context[NSH_MD1_CONTEXT_SIZE];
> +};
> +
>  struct sw_flow_key {
>  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
>  	u8 tun_opts_len;
> @@ -144,6 +154,7 @@ struct sw_flow_key {
>  			};
>  		} ipv6;
>  	};
> +	struct ovs_key_nsh nsh;         /* network service header */
>  	struct {
>  		/* Connection tracking fields not packed above. */
>  		struct {

Does it make sense to keep the context headers as part of the flow?
What is the reasoning behind it? With MD type 2 headers this might either
not work very well or increase the sw_flow_key size, causing slowdowns
for all protocols.

[...]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]     ` <87wp5l7560.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-08-30 11:36       ` Mooney, Sean K
  2017-08-30 15:15         ` [ovs-dev] " Hannes Frederic Sowa
  2017-09-04  2:38       ` Yang, Yi
  1 sibling, 1 reply; 31+ messages in thread
From: Mooney, Sean K @ 2017-08-30 11:36 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Yang, Yi Y
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, e,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA



> -----Original Message-----
> From: ovs-dev-bounces-yBygre7rU0TnMu66kgdUjQ@public.gmane.org [mailto:ovs-dev-
> bounces-yBygre7rU0TnMu66kgdUjQ@public.gmane.org] On Behalf Of Hannes Frederic Sowa
> Sent: Wednesday, August 30, 2017 10:53 AM
> To: Yang, Yi Y <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; jbenc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> e@erig.me
> Subject: Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH
> support
> 
> Hello,
> 
> Yi Yang <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> [...]
> 
> > +struct ovs_key_nsh {
> > +	u8 flags;
> > +	u8 ttl;
> > +	u8 mdtype;
> > +	u8 np;
> > +	__be32 path_hdr;
> > +	__be32 context[NSH_MD1_CONTEXT_SIZE]; };
> > +
> >  struct sw_flow_key {
> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
> >  	u8 tun_opts_len;
> > @@ -144,6 +154,7 @@ struct sw_flow_key {
> >  			};
> >  		} ipv6;
> >  	};
> > +	struct ovs_key_nsh nsh;         /* network service header */
> >  	struct {
> >  		/* Connection tracking fields not packed above. */
> >  		struct {
> 
> Does it makes sense to keep the context headers as part of the flow?
> What is the reasoning behind it? With mdtype 2 headers this might
> either not work very well or will increase sw_flow_key size causing
> slowdowns for all protocols.
[Mooney, Sean K]
Having the NSH context headers in the flow is quite useful: it allows
load balancing on values stored in the context headers, among other
uses. I believe ODL previously used context header 4 to store a flow
id, so this could potentially be used with the multipath action to have
OVS choose between several possible next hops in the chain.

Another example of where this is useful is branching chains. Assume
that both the classifier and the service function forwarder are
collocated in OVS on the host, and a packet is sent to a firewall
service function which tags it as suspicious by setting a context
header metadata field to 1. As the SDN controller, I can then install a
high-priority rule that reclassifies the packet into a separate service
function chain that performs DPI on it before returning it to the
original chain if it is deemed not a threat.

So while an SFF does not in general have to match on the context
headers, if I want to use OVS to implement a classifier or a service
function (e.g. a load balancer), then it is desirable to be able both
to match on the context headers in MD type 1 and to set them (this is
something classifiers and service functions are allowed to do).
> 
> [...]
> _______________________________________________
> dev mailing list
> dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


* Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-08-30 11:36       ` Mooney, Sean K
@ 2017-08-30 15:15         ` Hannes Frederic Sowa
       [not found]           ` <87inh56q8u.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-08-30 15:15 UTC (permalink / raw)
  To: Mooney, Sean K; +Cc: Yang, Yi Y, dev, netdev, jbenc, e

"Mooney, Sean K" <sean.k.mooney@intel.com> writes:

>> -----Original Message-----
>> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
>> bounces@openvswitch.org] On Behalf Of Hannes Frederic Sowa
>> Sent: Wednesday, August 30, 2017 10:53 AM
>> To: Yang, Yi Y <yi.y.yang@intel.com>
>> Cc: dev@openvswitch.org; netdev@vger.kernel.org; jbenc@redhat.com;
>> e@erig.me
>> Subject: Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH
>> support
>> 
>> Hello,
>> 
>> Yi Yang <yi.y.yang@intel.com> writes:
>> 
>> [...]
>> 
>> > +struct ovs_key_nsh {
>> > +	u8 flags;
>> > +	u8 ttl;
>> > +	u8 mdtype;
>> > +	u8 np;
>> > +	__be32 path_hdr;
>> > +	__be32 context[NSH_MD1_CONTEXT_SIZE]; };
>> > +
>> >  struct sw_flow_key {
>> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
>> >  	u8 tun_opts_len;
>> > @@ -144,6 +154,7 @@ struct sw_flow_key {
>> >  			};
>> >  		} ipv6;
>> >  	};
>> > +	struct ovs_key_nsh nsh;         /* network service header */
>> >  	struct {
>> >  		/* Connection tracking fields not packed above. */
>> >  		struct {
>> 
>> Does it makes sense to keep the context headers as part of the flow?
>> What is the reasoning behind it? With mdtype 2 headers this might
>> either not work very well or will increase sw_flow_key size causing
>> slowdowns for all protocols.
> [Mooney, Sean K]
> Having the NSH context headers in the flow is quite useful: it allows
> load balancing on values stored in the context headers, among other
> uses. I believe ODL previously used context header 4 to store a flow
> id, so this could potentially be used with the multipath action to
> have OVS choose between several possible next hops in the chain.

In OVS, masks are a list(!) for matching. How can this work for
different paths that might require different masks? If they can't be
unified, you even end up with exact matches. Thus, for OVS, the context
should not be part of the flow.

> Another example of where this is useful is branching chains. Assume
> that both the classifier and the service function forwarder are
> collocated in OVS on the host, and a packet is sent to a firewall
> service function which tags it as suspicious by setting a context
> header metadata field to 1. As the SDN controller, I can then install
> a high-priority rule that reclassifies the packet into a separate
> service function chain that performs DPI on it before returning it to
> the original chain if it is deemed not a threat.

You can do that with different path id's, too?

> So while an SFF does not in general have to be able to match on the
> context headers, if I assume I want to use OVS to implement a
> classifier or service function (e.g. a load balancer), then it is
> desirable to be able to both match on the context headers in MD type 1
> and also be able to set them (this is something classifiers and
> service functions are allowed to do).

I don't think it is practical at all?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]           ` <87inh56q8u.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-08-30 19:00             ` Mooney, Sean K
  2017-08-31 12:49               ` [ovs-dev] " Hannes Frederic Sowa
  2017-09-01 12:11             ` Jan Scheurich
  1 sibling, 1 reply; 31+ messages in thread
From: Mooney, Sean K @ 2017-08-30 19:00 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, jbenc-H+wXaHxf7aLQT0dZR+AlfA, e,
	netdev-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: Hannes Frederic Sowa [mailto:hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org]
> Sent: Wednesday, August 30, 2017 4:16 PM
> To: Mooney, Sean K <sean.k.mooney-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: Yang, Yi Y <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org;
> netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; jbenc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; e@erig.me
> Subject: Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH
> support
> 
> "Mooney, Sean K" <sean.k.mooney-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> >> -----Original Message-----
> >> From: ovs-dev-bounces-yBygre7rU0TnMu66kgdUjQ@public.gmane.org [mailto:ovs-dev-
> >> bounces-yBygre7rU0TnMu66kgdUjQ@public.gmane.org] On Behalf Of Hannes Frederic Sowa
> >> Sent: Wednesday, August 30, 2017 10:53 AM
> >> To: Yang, Yi Y <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >> Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; jbenc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org;
> >> e@erig.me
> >> Subject: Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable
> >> NSH support
> >>
> >> Hello,
> >>
> >> Yi Yang <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> >>
> >> [...]
> >>
> >> > +struct ovs_key_nsh {
> >> > +	u8 flags;
> >> > +	u8 ttl;
> >> > +	u8 mdtype;
> >> > +	u8 np;
> >> > +	__be32 path_hdr;
> >> > +	__be32 context[NSH_MD1_CONTEXT_SIZE]; };
> >> > +
> >> >  struct sw_flow_key {
> >> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
> >> >  	u8 tun_opts_len;
> >> > @@ -144,6 +154,7 @@ struct sw_flow_key {
> >> >  			};
> >> >  		} ipv6;
> >> >  	};
> >> > +	struct ovs_key_nsh nsh;         /* network service header */
> >> >  	struct {
> >> >  		/* Connection tracking fields not packed above. */
> >> >  		struct {
> >>
> >> Does it makes sense to keep the context headers as part of the flow?
> >> What is the reasoning behind it? With mdtype 2 headers this might
> >> either not work very well or will increase sw_flow_key size causing
> >> slowdowns for all protocols.
> > [Mooney, Sean K]
> > Having the nsh context headers in the flow is quite useful It would
> > allow loadblancing on values stored in the context headers Or other
> > use. I belive odl previously used context header 4 to store a Flow id
> > so this could potentialy be used with the multipath action to have
> ovs
> > Choose between several possible next hops in the chain.
> 
> In OVS, masks are a list(!) for matching. How can this work for
> different paths that might require different masks? If they can't be
> unified you even get exact matches. Thus, for OVS the context should
> not be part of the flow.
[Mooney, Sean K] I'm not sure what you mean about a list, as I never made reference to one.
MD type 1 context headers are 4 mandatory 32-bit fields.
From an OVS context they should be treated the same as the 32-bit register fields.
We do not necessarily need to support those in MD type 2, as all metadata there is optional.
> 
> > Another example of where this is usefull is branching chains.  if I
> > assume that both the classifier and Service function forwarder are
> > collocated in ovs on the host, and is send A packet to a firewall
> > service function which tags the packet as suspicious Via setting a
> > context header metadata field to 1, I as the sdn controller can
> > Install a high priority rule that will reclassify the packet as part
> > of as separate Service function chain the will prefer dpi on the
> > packet before returning it to The original chain if demand not a
> > threat.
> 
> You can do that with different path id's, too?
[Mooney, Sean K] A service function is not allowed to alter the SPI;
only a classifier can do that. You are correct that a different SPI is required for the branch
to the DPI SF, but packets that are not dropped can rejoin the original SPI.
Packets are not required to enter a service chain at the first hop.

Service function chains are explicitly not a list. They are a directed graph composed
of service function instances or service function groups that is generally
reducible to a directed acyclic graph and, in the simple case, to a simple list.
Branching and reconverging chains are expected, as is load balancing between
service function instances in the same service function group, which share the same SPI.
The SPI field in the NSH header is intended to transport the logical service plane path id,
not the rendered service path id.
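(Editor's aside: the SPI/SI split discussed here follows the NSH path-header layout, a 24-bit Service Path Identifier over an 8-bit Service Index. A minimal C sketch of the extraction; the helper names are illustrative, not from the patch series, and byte-order conversion from the on-wire __be32 is elided:)

```c
#include <stdint.h>

/* path_hdr = SPI (upper 24 bits) | SI (lower 8 bits), host byte order
 * assumed here for simplicity.  Names are hypothetical. */
#define NSH_SPI_MASK  0xffffff00u
#define NSH_SPI_SHIFT 8
#define NSH_SI_MASK   0x000000ffu

static inline uint32_t nsh_path_hdr_spi(uint32_t path_hdr)
{
        return (path_hdr & NSH_SPI_MASK) >> NSH_SPI_SHIFT;
}

static inline uint8_t nsh_path_hdr_si(uint32_t path_hdr)
{
        return (uint8_t)(path_hdr & NSH_SI_MASK);
}

/* An SFF decrements only the SI; the SPI is owned by the classifier. */
static inline uint32_t nsh_dec_si(uint32_t path_hdr)
{
        return (path_hdr & NSH_SPI_MASK) |
               (uint8_t)(nsh_path_hdr_si(path_hdr) - 1);
}
```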
> 
> > So while a sff dose not in general have to be able to match on the
> > context header If I assume I want to use ovs to implenet a classifier
> > or service function(e.g. loadblancer) The its desirable to be able to
> > both match on the context headers in md type1 and also be able To set
> > them(this is something classifies and service fuction are allowed to
> > do).
> 
> I don't think it is practical at all?
[Mooney, Sean K] Today in OpenStack we co-locate the classifier at every
OVS instance and use MPLS as a stand-in for NSH, doing proxying at every hop before sending
to an SF. Flow-based load balancing is not only practical but was demonstrated to work in 2015
with the ODL GBP classifier and SFC application controlling a patched OVS with NSH support.

If you look at slide 36 of http://events.linuxfoundation.org/sites/events/files/slides/odl%20summit%20sfc%20v5.pdf
you will see that prior to the Beryllium release of ODL, context headers 1 and 2 were used to allow multi-tenancy
and overlapping IPs. For the load-balancing implementation I believe they also used context header 4 to store a flow id.
This required ODL to generate OpenFlow rules that set the context headers as part of classification.

You can also see that context headers 1 and 2 are still used in the newer ODL netvirt SFC classifier implementation:
https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-classifier.rst#pipeline-changes
So for ODL's existing NSH support to work with MD type 1 we must have the ability to set the NSH context headers,
and we should also have the ability to match on them for load balancing between service functions in a service function group.


* Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-08-30 19:00             ` Mooney, Sean K
@ 2017-08-31 12:49               ` Hannes Frederic Sowa
  2017-09-04  9:38                 ` Jan Scheurich
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-08-31 12:49 UTC (permalink / raw)
  To: Mooney, Sean K; +Cc: Yang, Yi Y, dev, netdev, jbenc, e

Hello,

"Mooney, Sean K" <sean.k.mooney@intel.com> writes:


[...]

>> >> > +struct ovs_key_nsh {
>> >> > +	u8 flags;
>> >> > +	u8 ttl;
>> >> > +	u8 mdtype;
>> >> > +	u8 np;
>> >> > +	__be32 path_hdr;
>> >> > +	__be32 context[NSH_MD1_CONTEXT_SIZE]; };
>> >> > +
>> >> >  struct sw_flow_key {
>> >> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
>> >> >  	u8 tun_opts_len;
>> >> > @@ -144,6 +154,7 @@ struct sw_flow_key {
>> >> >  			};
>> >> >  		} ipv6;
>> >> >  	};
>> >> > +	struct ovs_key_nsh nsh;         /* network service header */
>> >> >  	struct {
>> >> >  		/* Connection tracking fields not packed above. */
>> >> >  		struct {
>> >>
>> >> Does it makes sense to keep the context headers as part of the flow?
>> >> What is the reasoning behind it? With mdtype 2 headers this might
>> >> either not work very well or will increase sw_flow_key size causing
>> >> slowdowns for all protocols.
>> > [Mooney, Sean K]
>> > Having the nsh context headers in the flow is quite useful It would
>> > allow loadblancing on values stored in the context headers Or other
>> > use. I belive odl previously used context header 4 to store a Flow id
>> > so this could potentialy be used with the multipath action to have
>> ovs
>> > Choose between several possible next hops in the chain.
>> 
>> In OVS, masks are a list(!) for matching. How can this work for
>> different paths that might require different masks? If they can't be
>> unified you even get exact matches. Thus, for OVS the context should
>> not be part of the flow.

> [Mooney, Sean K] I'm not sure what you mean about a list as I never
> made reference to one.  md type 1 context headers are 4 mandatory 32
> bit field.

The semantics of the context fields depend on the path id. Every path
can have its own interpretation of the context fields.

Thus the paths will certainly have their own masks. I hope I understood
this part of the standard correctly.

The dp only supports installing (approximated) disjoint megaflows, and
this affects performance a lot. Every new mask that gets added to the
dp is installed into a list that is walked sequentially for each
packet. This will kill performance.
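(Editor's aside: the per-mask list walk described here can be sketched in C. The datapath keeps one exact-match table per distinct mask, and a lookup tries each mask in turn, so cost grows linearly with the number of installed masks. This is a simplified toy model, not the actual ovs_flow_tbl_lookup code; all names are hypothetical:)

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: a single 32-bit flow key and one exact-match "table"
 * (a flat array here) per mask. */
struct flow { uint32_t masked_key; int action; };

struct mask_entry {
        uint32_t mask;
        const struct flow *flows;
        size_t n_flows;
        const struct mask_entry *next;
};

/* Megaflow lookup: apply each installed mask to the packet key and
 * probe that mask's table.  Every additional mask (e.g. a new per-path
 * NSH context mask) adds one more iteration for packets that miss the
 * earlier masks. */
static const struct flow *megaflow_lookup(const struct mask_entry *masks,
                                          uint32_t pkt_key)
{
        for (const struct mask_entry *m = masks; m; m = m->next) {
                uint32_t masked = pkt_key & m->mask;
                for (size_t i = 0; i < m->n_flows; i++)
                        if (m->flows[i].masked_key == masked)
                                return &m->flows[i];
        }
        return NULL;    /* miss -> upcall to userspace */
}
```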

I assume that user space slows down, too, depending on the algorithm it
uses to generate megaflows.

All in all, this stuff sounds extremely fragile to me. E.g. imagine you
configure a new match on your context and suddenly you start to
generate exact-match flows; they can't be offloaded anymore at more
than 10Gbit/s, and I don't think the network survives this event at
all.

> form an ovs context they should be treated the same as the 32 bit register fields.
> We do not need to necessarily support those in md type 2 as all metadata is optional.
>> 
>> > Another example of where this is usefull is branching chains.  if I
>> > assume that both the classifier and Service function forwarder are
>> > collocated in ovs on the host, and is send A packet to a firewall
>> > service function which tags the packet as suspicious Via setting a
>> > context header metadata field to 1, I as the sdn controller can
>> > Install a high priority rule that will reclassify the packet as part
>> > of as separate Service function chain the will prefer dpi on the
>> > packet before returning it to The original chain if demand not a
>> > threat.
>> 
>> You can do that with different path id's, too?
> [Mooney, Sean K] a service function is not allowed to alter the spi.
> Only a classifier can do that. You are correct that a different spi is required for the branch
> To the dpi sf but packets that are not droped can rejoin the original spi.
> Packet are not require you to enter a service chain at the first hop.
>
> service function chain are explicitly not A list. They are a directed graph composed
> of service function instance or service function groups that generally 
> are reducible to a directed acrylic graph and in the simple case to a simple list. 
> branching an recovering chains are expected, as is loadblancing between 
> service function instances in the same service function group which share the same spi. 
> The spi filed in the nsh heard is intended to transport the logical service plane path id
> not the rendered service path id.

Use an explicit action with a parameter to multiplex on a context
header, no problem with that.

>> 
>> > So while a sff dose not in general have to be able to match on the
>> > context header If I assume I want to use ovs to implenet a classifier
>> > or service function(e.g. loadblancer) The its desirable to be able to
>> > both match on the context headers in md type1 and also be able To set
>> > them(this is something classifies and service fuction are allowed to
>> > do).
>> 
>> I don't think it is practical at all?
> [Mooney, Sean K] To day in OpenStack we co-locate the clarifier at every
> Ovs instance and use mpls as a standing for nsh doing proxing at every hop before sending
> To an sf. Flow based load balancing is not only practical but has been demonstrated to work with odl gbp classifier
> and sfc application controlling a patched ovs with nsh support in 2015. 

Define an action to do so.

> If you look at slide 36 of http://events.linuxfoundation.org/sites/events/files/slides/odl%20summit%20sfc%20v5.pdf
> you will see that prior to the berilium release of odl context header 1 and 2 are used to allow multi-tenancy
> and overlapping ips. For the loadblancing implentaiton I belived they also used context header 4 to store a flow id.
> This require odl to generate openflow rules to set the context headers
> as part of classification.

Okay.

> You can also see that context header 1 and 2 are still used in the newer odl netvirt sfc classifier implementation
> https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-classifier.rst#pipeline-changes
> so for odl existing nsh support to work with md type one we must have the ability to set the nsh context headers
> and should have the ability to match on them also for load lancing between service function in service function group.

As I said, a separate action sounds much more useful.

Bye,
Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]           ` <87inh56q8u.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  2017-08-30 19:00             ` Mooney, Sean K
@ 2017-09-01 12:11             ` Jan Scheurich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Scheurich @ 2017-09-01 12:11 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Mooney, Sean K
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, e,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA

> > [Mooney, Sean K]
> > Having the nsh context headers in the flow is quite useful It would
> > allow loadblancing on values stored in the context headers Or other
> > use. I belive odl previously used context header 4 to store a Flow id
> > so this could potentialy be used with the multipath action to have ovs
> > Choose between several possible next hops in the chain.
> 
> In OVS, masks are a list(!) for matching. How can this work for different
> paths that might require different masks? If they can't be unified you even
> get exact matches. Thus, for OVS the context should not be part of the
> flow.

The NSH support in OVS 2.8 (for the user-space datapath only, so far) supports matching on and manipulating the fixed size MD1 context headers C1-C4. They are part of the flow and there are corresponding OXM fields defined. It is up to the SDN controller to program pipelines that match on or set these fields. 

The goal was to support all relevant NSH use cases for MD1: Classifier, SFF, and (with certain limitations) NSH proxy, and SF.

We also support MD2 TLV context headers but not yet for matching and setting, so MD2 TLVs are not part of the flow. OVS 2.8 can add MD2 context TLVs with the encap(nsh) action (classifier use case), can transparently forward MD2 headers (SFF use case) and pop an NSH header with MD2 context (final SFF use case). Support for matching and setting MD2 TLVs is FFS and can be added in a later release.

BR, Jan


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]     ` <87wp5l7560.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  2017-08-30 11:36       ` Mooney, Sean K
@ 2017-09-04  2:38       ` Yang, Yi
  2017-09-04 11:22         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 31+ messages in thread
From: Yang, Yi @ 2017-09-04  2:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

On Wed, Aug 30, 2017 at 05:53:27PM +0800, Hannes Frederic Sowa wrote:
> Hello,
> 
> Yi Yang <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> [...]
> 
> > +struct ovs_key_nsh {
> > +	u8 flags;
> > +	u8 ttl;
> > +	u8 mdtype;
> > +	u8 np;
> > +	__be32 path_hdr;
> > +	__be32 context[NSH_MD1_CONTEXT_SIZE];
> > +};
> > +
> >  struct sw_flow_key {
> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
> >  	u8 tun_opts_len;
> > @@ -144,6 +154,7 @@ struct sw_flow_key {
> >  			};
> >  		} ipv6;
> >  	};
> > +	struct ovs_key_nsh nsh;         /* network service header */
> >  	struct {
> >  		/* Connection tracking fields not packed above. */
> >  		struct {
> 
> Does it makes sense to keep the context headers as part of the flow?
> What is the reasoning behind it? With mdtype 2 headers this might either
> not work very well or will increase sw_flow_key size causing slowdowns
> for all protocols.

For userspace, the miniflow can handle such an issue, but the kernel
datapath doesn't provide such a mechanism, so I don't think we can come
up with a better way to fix this.

We have to have the context headers in the flow for match and set:
every hop in service function chaining can put its metadata into the
context headers on demand. MD type 2 is much more complicated than you
can imagine; it is impossible to handle MD type 1 and MD type 2 in a
uniform way. Our way is to keep the MD type 1 keys in struct
ovs_key_nsh and to reuse struct flow_tnl for MD type 2 metadata; in the
case of MD type 2, we set the context headers to 0 in struct ovs_key_nsh.
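(Editor's aside: the scheme Yi describes could look roughly like this in the flow-extraction path. The struct layout follows the patch, but types are user-space stand-ins for u8/__be32 and the parsed-header view and helper are illustrative, not from the series:)

```c
#include <stdint.h>
#include <string.h>

#define NSH_MD1_CONTEXT_SIZE 4
#define NSH_MD_TYPE1 0x01
#define NSH_MD_TYPE2 0x02

struct ovs_key_nsh {
        uint8_t  flags;
        uint8_t  ttl;
        uint8_t  mdtype;
        uint8_t  np;
        uint32_t path_hdr;
        uint32_t context[NSH_MD1_CONTEXT_SIZE];
};

/* Hypothetical parsed-header view of the packet's NSH header. */
struct nsh_hdr_view {
        uint8_t  flags, ttl, mdtype, np;
        uint32_t path_hdr;
        const uint32_t *md1_context;    /* valid only for MD type 1 */
};

/* Fill the flow key: MD1 context words are copied into the key, while
 * MD2 TLV metadata is carried elsewhere (reusing tunnel metadata in the
 * actual series), so the fixed context words are zeroed for MD2. */
static void nsh_key_extract(const struct nsh_hdr_view *h,
                            struct ovs_key_nsh *key)
{
        key->flags    = h->flags;
        key->ttl      = h->ttl;
        key->mdtype   = h->mdtype;
        key->np       = h->np;
        key->path_hdr = h->path_hdr;
        if (h->mdtype == NSH_MD_TYPE1)
                memcpy(key->context, h->md1_context, sizeof(key->context));
        else
                memset(key->context, 0, sizeof(key->context));
}
```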

I believe every newly-added key will result in a similar issue to the
one you are concerned about; maybe it would be better to introduce a
miniflow in the kernel datapath, but that isn't in the scope of this
patch series.

It would be great if you could make a better proposal to address your concern.

> 
> [...]


* RE: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-08-31 12:49               ` [ovs-dev] " Hannes Frederic Sowa
@ 2017-09-04  9:38                 ` Jan Scheurich
  2017-09-04 11:45                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Scheurich @ 2017-09-04  9:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Mooney, Sean K; +Cc: dev, e, jbenc, netdev

> >> >> Does it makes sense to keep the context headers as part of the flow?
> >> >> What is the reasoning behind it? With mdtype 2 headers this might
> >> >> either not work very well or will increase sw_flow_key size causing
> >> >> slowdowns for all protocols.
> >> > [Mooney, Sean K]
> >> > Having the nsh context headers in the flow is quite useful It would
> >> > allow loadblancing on values stored in the context headers Or other
> >> > use. I belive odl previously used context header 4 to store a Flow id
> >> > so this could potentialy be used with the multipath action to have
> >> ovs
> >> > Choose between several possible next hops in the chain.
> >>
> >> In OVS, masks are a list(!) for matching. How can this work for
> >> different paths that might require different masks? If they can't be
> >> unified you even get exact matches. Thus, for OVS the context should
> >> not be part of the flow.
> 
> > [Mooney, Sean K] I'm not sure what you mean about a list as I never
> > made reference to one.  md type 1 context headers are 4 mandatory 32
> > bit field.
> 
> The semantic of the context fields depend on the path id. Every path can
> have their own interpretation of context fields.
> 
> Thus the paths will cetainly have their own masks. I hope I understood
> this part of the standard correctly.
> 
> dp only supports installing (approximated) disjoint megaflows and this
> affects performance a lot. Every new mask that gets added to the dp will
> be installed into a list which will be walked sequentially for
> packets. This will kill performance.
> 
> I assume that user space slows down, too, depending on the algorithm
> they use generate megaflows.

The primary use case for OVS in SFC/NSH is SFF. In almost all SFF use 
cases OVS will not need to match on context headers. The ability of OVS
to match on context headers should not in general slow down the datapath.

When SFC controllers match on parts of the context headers (e.g. in the 
final SFF or for load-balancing purposes), we will get additional megaflow 
masks. This is a fact of life in OVS and nothing new in NSH. I don't think
this should prevent us from supporting valid use cases in OVS.

The slow-down of the megaflow lookup in the datapath with the number
of masks has been addressed in the user-space datapath by sorting the
mask lists according to hit frequency and maintaining one sorted list
per ingress port. There is further work in progress to improve on this
with a cuckoo hash to pick the most likely mask first.
It should be possible to apply similar optimization schemes in the
kernel datapath as well.

> > form an ovs context they should be treated the same as the 32 bit register fields.
> > We do not need to necessarily support those in md type 2 as all metadata is optional.

Support for matching on or updating MD2 TLV context headers is FFS for
a future OVS release. We do not foresee adding MD2 TLVs explicitly to
the flow struct.

> > You can also see that context header 1 and 2 are still used in the newer odl netvirt sfc classifier implementation
> > https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-
> classifier.rst#pipeline-changes
> > so for odl existing nsh support to work with md type one we must have the ability to set the nsh context headers
> > and should have the ability to match on them also for load lancing between service function in service function group.
> 
> As I said, a separate action sounds much more useful.

I don't think it is a good approach in OVS to introduce custom actions for NSH, e.g. for load sharing based on context header data. The primary tool for load sharing in OpenFlow is the Select Group. OVS already supports an extension to the OF 1.5 Group Mod message to specify which fields to hash on. This can be used to hash on the relevant NSH context headers.
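(Editor's aside: the select-group approach Jan mentions boils down to hashing the controller-chosen fields and picking a bucket. A simplified sketch; the hash function and equal-weight bucket layout are stand-ins to show the mechanism, not the OVS implementation:)

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the selected field bytes; OVS uses its own hash, this is
 * just a stand-in. */
static uint32_t hash_fields(const uint8_t *fields, size_t len)
{
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < len; i++) {
                h ^= fields[i];
                h *= 16777619u;
        }
        return h;
}

/* Select-group bucket choice: the controller specifies which fields to
 * hash on (e.g. the NSH context words); buckets are equal-weight here. */
static size_t select_bucket(const uint8_t *fields, size_t len,
                            size_t n_buckets)
{
        return hash_fields(fields, len) % n_buckets;
}
```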

BR, Jan


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-09-04  2:38       ` Yang, Yi
@ 2017-09-04 11:22         ` Hannes Frederic Sowa
  2017-09-04 11:57           ` Jan Scheurich
       [not found]           ` <87mv6abte5.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-04 11:22 UTC (permalink / raw)
  To: Yang, Yi; +Cc: netdev, dev, jbenc, e, blp, jan.scheurich

Hello,

"Yang, Yi" <yi.y.yang@intel.com> writes:

> On Wed, Aug 30, 2017 at 05:53:27PM +0800, Hannes Frederic Sowa wrote:
>> Hello,
>> 
>> Yi Yang <yi.y.yang@intel.com> writes:
>> 
>> [...]
>> 
>> > +struct ovs_key_nsh {
>> > +	u8 flags;
>> > +	u8 ttl;
>> > +	u8 mdtype;
>> > +	u8 np;
>> > +	__be32 path_hdr;
>> > +	__be32 context[NSH_MD1_CONTEXT_SIZE];
>> > +};
>> > +
>> >  struct sw_flow_key {
>> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
>> >  	u8 tun_opts_len;
>> > @@ -144,6 +154,7 @@ struct sw_flow_key {
>> >  			};
>> >  		} ipv6;
>> >  	};
>> > +	struct ovs_key_nsh nsh;         /* network service header */
>> >  	struct {
>> >  		/* Connection tracking fields not packed above. */
>> >  		struct {
>> 
>> Does it makes sense to keep the context headers as part of the flow?
>> What is the reasoning behind it? With mdtype 2 headers this might either
>> not work very well or will increase sw_flow_key size causing slowdowns
>> for all protocols.
>
> For userspace, miniflow can handle such issue, but kernel data path
> didn't provide such mechanism, so I don't think we can think of a better
> way to fix this.

I do think that both the kernel and DPDK dp are fine if you don't
match on context fields, which I think is the common case.

As soon as a few paths start to match on contexts at different offsets
I am not sure the miniflow will actually help; I don't know enough
about that. Set cover is an NP-hard problem, so constructing plain
simple flows will be an approximation somewhere at some point anyway.

If the context values are in some way discrete it might make sense, but
for load-balancing purposes your ingress router might fill one context
field with a flow hash of the inner flow to allow ECMP or load
balancing. You might want to treat that separately. Otherwise the
different paths might always need to match context[3] & 0x3 (the bits
covering the number of possible next hops) and put that into the flow
state. Because every hop and every path might have different numbers of
next hops etc., I don't think this is easily unifiable anymore.
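(Editor's aside: Hannes's hypothetical `context[3] & 0x3` case, sketched out. Each path carries its own next-hop count, so the relevant bits of the context word differ per path, which is why a per-path computation fits better than one shared flow-key mask. Names are hypothetical:)

```c
#include <stdint.h>

/* Per-path state: different paths have different numbers of next hops,
 * so the bits of context[3] that matter differ per path (0x1 for two
 * next hops, 0x3 for four, and so on; modulo generalizes this). */
struct sfc_path {
        uint32_t n_nexthops;
};

/* Pick a next hop from the entropy the ingress router stored in a
 * context word: a per-path computation, not a flow-key match. */
static uint32_t pick_nexthop(const struct sfc_path *p, uint32_t context3)
{
        return context3 % p->n_nexthops;
}
```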

> We have to have context headers in flow for match and set, every hop in
> service function chaining can put its metedata into context headers on
> demand, MD type 2 is much more complicated than you can imagine, it is
> impossible to use an uniform way to handle MD type 1 and MD type 2, our
> way is to keep MD type 1 keys in struct ovs_key_nsh and to reuse struct
> flow_tnl for MD type 2 metadata, in case of MD type 2, we can set
> context headers to 0 in struct ovs_key_nsh.

Understood and agreed.

> I beleive every newly-added key will result in similiar issue you
> concern, maybe it will be better to introduce miniflow in kernel data
> path, but it isn't in scope of this patch series.

You will see the same problem with offloading flows, e.g. to network
cards, very soon. Flow explosion here will cause your 100Gbit/s NIC to
suddenly fall back to the software slow path.

> It will be great if you can have a better proposal to fix your concern.

I do think that something like the miniflow might make sense, but I
don't know enough about miniflow.

New ACTIONS that are not bound only to NSH seem appealing to me at the
moment. E.g. a subtable match in the kernel dp, or allowing some
OXM % modulo -> select action, or a neighbor-alike thing, would
probably cover a lot of cases in the beginning. The submatch would
probably very much resemble the miniflow in the DPDK dp.

Thanks,
Hannes


* Re: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-09-04  9:38                 ` Jan Scheurich
@ 2017-09-04 11:45                   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-04 11:45 UTC (permalink / raw)
  To: Jan Scheurich; +Cc: Mooney, Sean K, dev, e, jbenc, netdev

Hello,

Jan Scheurich <jan.scheurich@ericsson.com> writes:

>> >> >> Does it makes sense to keep the context headers as part of the flow?
>> >> >> What is the reasoning behind it? With mdtype 2 headers this might
>> >> >> either not work very well or will increase sw_flow_key size causing
>> >> >> slowdowns for all protocols.
>> >> > [Mooney, Sean K]
>> >> > Having the nsh context headers in the flow is quite useful It would
>> >> > allow loadblancing on values stored in the context headers Or other
>> >> > use. I belive odl previously used context header 4 to store a Flow id
>> >> > so this could potentialy be used with the multipath action to have
>> >> ovs
>> >> > Choose between several possible next hops in the chain.
>> >>
>> >> In OVS, masks are a list(!) for matching. How can this work for
>> >> different paths that might require different masks? If they can't be
>> >> unified you even get exact matches. Thus, for OVS the context should
>> >> not be part of the flow.
>> 
>> > [Mooney, Sean K] I'm not sure what you mean about a list as I never
>> > made reference to one.  md type 1 context headers are 4 mandatory 32
>> > bit field.
>> 
>> The semantic of the context fields depend on the path id. Every path can
>> have their own interpretation of context fields.
>> 
>> Thus the paths will cetainly have their own masks. I hope I understood
>> this part of the standard correctly.
>> 
>> dp only supports installing (approximated) disjoint megaflows and this
>> affects performance a lot. Every new mask that gets added to the dp will
>> be installed into a list which will be walked sequentially for
>> packets. This will kill performance.
>> 
>> I assume that user space slows down, too, depending on the algorithm
>> they use generate megaflows.
>
> The primary use case for OVS in SFC/NSH is SFF. In almost all SFF use 
> cases OVS will not need to match on context headers. The ability of OVS
> to match on context headers should not in general slow down the datapath.

The sw_flow_key is huge nowadays. You will be expanding it into the
eighth cache line. But I agree that it should not generally slow down
the data path.

> When SFC controllers match on parts of the context headers (e.g. in the 
> final SFF or for load-balancing purposes), we will get additional megaflow 
> masks. This is a fact of life in OVS and nothing new in NSH. I don't think
> this should prevent us from supporting valid use cases in OVS.

Other protocols expose entropy that is implicitly used for load
balancing. Why not NSH?

I would argue that before NSH the masks were pretty constant. NSH
introduces context-dependent paths where the design emphasizes having
different masks per tenant. I don't think this is the case for any
other protocol processed by the kernel dp right now, so the megaflow
unification was rather effective so far.

I expect a major slowdown with just 10-20 masks.

Btw, this also affects the possibility of synthesizing those flows
into hardware at all.

> The slow-down of the megaflow lookup in the datapath with the number 
> of masks has been addressed in the user-space datapath with sorting the
> mask lists according to hit frequency and maintaining one sorted lists per 
> ingress port. There is further work in progress to improve on this with a
> cuckoo hash to pick the most likely mask first.
> It should be possible to apply similar optimization schemes also in the
> Kernel datapath.

Lots of optimizations could be done, I agree, but the fundamental
problem still exists, especially if you want to be traffic agnostic
(short-lived flows, 64-byte UDP/RTP traffic, etc.: you basically walk
through more layers of abstraction to find your flow resolution).

>
>> > form an ovs context they should be treated the same as the 32 bit register fields.
>> > We do not need to necessarily support those in md type 2 as all metadata is optional.
>
> Support for matching on or updating MD2 TLV context headers is FFS in a
> future OVS release. We do not foresee to add MD2 TLVs explicitly to the 
> flow struct.

In my opinion I don't see a difference between MD1 and MD2.

>> > You can also see that context header 1 and 2 are still used in the newer odl netvirt sfc classifier implementation
>> > https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-
>> classifier.rst#pipeline-changes
>> > so for odl existing nsh support to work with md type one we must have the ability to set the nsh context headers
>> > and should have the ability to match on them also for load lancing between service function in service function group.
>> 
>> As I said, a separate action sounds much more useful.
>
> I don't think it is a good approach in OVS to introduce custom actions
> for NSH, e.g. for load sharing based on context header data. The
> primary tool for load sharing in OpenFlow is the Select Group. OVS
> already supports an extension to the OF 1.5 Group Mod message to
> specify which fields to hash on. This can be used to hash on the
> relevant NSH context headers.

It doesn't need to be custom to NSH at all. A load balancing action
based on a flow hash seems pretty common to me (I am talking about
kernel actions here).
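A minimal sketch of such a generic, NSH-agnostic load-balancing action (hypothetical code; the tuple fields and hash constants are invented for illustration, not an existing kernel API): hash the flow, then pick one of N action chains from the hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow 5-tuple used as hash input. */
struct flow_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Simple multiplicative mix; any stable per-flow hash would do here. */
static uint32_t flow_hash(const struct flow_tuple *t)
{
    uint32_t h = 0x9e3779b9;

    h ^= t->saddr;  h *= 0x85ebca6b;
    h ^= t->daddr;  h *= 0xc2b2ae35;
    h ^= ((uint32_t)t->sport << 16) | t->dport;
    h *= 0x27d4eb2f;
    h ^= t->proto;
    return h ^ (h >> 16);
}

/* Generic "select" action: pick one of n_buckets action chains from
 * the flow hash, independent of NSH or any other protocol.  Packets of
 * the same flow always land in the same bucket. */
static unsigned int select_bucket(const struct flow_tuple *t,
                                  unsigned int n_buckets)
{
    return flow_hash(t) % n_buckets;
}
```

The point of the sketch is that nothing here matches on protocol-specific context fields; the selection works for any traffic the datapath can hash.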

Thanks and bye,
Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-09-04 11:22         ` Hannes Frederic Sowa
@ 2017-09-04 11:57           ` Jan Scheurich
       [not found]           ` <87mv6abte5.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Scheurich @ 2017-09-04 11:57 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Yang, Yi; +Cc: netdev, dev, jbenc, e, blp

> >> Does it make sense to keep the context headers as part of the flow?
> >> What is the reasoning behind it? With mdtype 2 headers this might either
> >> not work very well or will increase sw_flow_key size causing slowdowns
> >> for all protocols.
> >
> > For userspace, miniflow can handle such an issue, but the kernel data path
> > doesn't provide such a mechanism, so I can't think of a better
> > way to fix this.
> 
> I do think that both the kernel and dpdk dp are fine if you don't match on
> context fields, which I think is the common case.

Agree

> As soon as a few paths start to match on contexts at different offsets I
> am not sure if miniflow will actually help. I don't know enough about
> that. Set cover is an NP-hard problem, so constructing plain simple flows
> will be an approximation somewhere anyway at some point.
> 
> If the context values are in some way discrete it might make sense, but
> for load balancing purposes your ingress router might fill out one
> context field with a flow hash of the inner flow to allow ECMP or
> loadbalancing. You might want to treat that separately. Otherwise the
> different paths might always need to compute context[3] & 0x3 (bits as number of
> possible next hops) and put that into the flow state. Because every hop
> and every path might have different numbers of nexthops etc., I don't
> think this is easily unifiable anymore.

Yes, that would become a scaling challenge for the datapaths and also for 
schemes to off-load that datapath to HW. I believe it requires discipline
on the controller side to not let the number of needed masks explode for
OVS.

> > I believe every newly-added key will result in a similar issue to the one
> > you raise; maybe it will be better to introduce miniflow in the kernel data
> > path, but it isn't in scope of this patch series.
> 
> You will see the same problem with offloading flows e.g. to network
> cards very soon. Flow explosion here will cause your 100Gbit/s NIC
> suddenly to go to software slow path.
> 
> > It will be great if you can have a better proposal to fix your concern.
> 
> I do think that something alike miniflow might make sense, but I don't
> know enough about miniflow.

Struct miniflow in the netdev datapath is merely a compact representation
of struct flow. Struct flow is partitioned into 64-bit "stripes" and struct
miniflow only stores those 64-bit stripes that have a non-zero mask stripe.
It reduces the memory footprint for storing packet flow and megaflow entries
but does not provide any megaflow lookup performance improvement as such.
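The stripe scheme above can be sketched in a few lines (a toy model, not the actual OVS `struct miniflow`; sizes are shrunk for illustration): a presence bitmap says which stripes are stored, and only the non-zero stripes occupy memory.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_U64S 8    /* toy struct flow: 8 x 64-bit stripes */

struct toy_flow { uint64_t u64s[FLOW_U64S]; };

/* Compact form: a presence bitmap plus only the non-zero stripes,
 * packed densely at the front of values[]. */
struct toy_miniflow {
    uint8_t  map;               /* bit i set => stripe i is stored */
    uint64_t values[FLOW_U64S]; /* only the used prefix is meaningful */
};

static void miniflow_init(struct toy_miniflow *mf, const struct toy_flow *f)
{
    int n = 0;

    mf->map = 0;
    for (int i = 0; i < FLOW_U64S; i++) {
        if (f->u64s[i]) {
            mf->map |= 1u << i;
            mf->values[n++] = f->u64s[i];
        }
    }
}

/* Read stripe i back: count the set bits below i to find its slot. */
static uint64_t miniflow_get(const struct toy_miniflow *mf, int i)
{
    if (!(mf->map & (1u << i)))
        return 0;
    return mf->values[__builtin_popcount(mf->map & ((1u << i) - 1))];
}

static int miniflow_n_values(const struct toy_miniflow *mf)
{
    return __builtin_popcount(mf->map);
}
```

A mostly-zero flow key (the common case when few fields are matched) collapses to the bitmap plus a couple of stripes, which is exactly the memory saving described above.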

> New ACTIONS, that are not only bound to NSH, seem appealing to me at the
> moment. E.g. have a subtable match in the kernel dp or allowing for some
> OXM % modulo -> select action or neighbor alike thing would probably
> cover a lot of cases in the beginning. The submatch would probably very
> much resemble the miniflow in dpdk dp.

I would like to discuss your ideas to optimize scalable load-sharing in OVS.
But I think we should do that in a separate mail thread to let the immediate
NSH kernel datapath work proceed. 

Can you provide some more details of what you suggest? Perhaps take that
to the ovs-dev mailing list.

BR, Jan


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]           ` <87mv6abte5.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-05  2:11             ` Yang, Yi
       [not found]               ` <20170905021112.GA86057-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Yang, Yi @ 2017-09-05  2:11 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

On Mon, Sep 04, 2017 at 07:22:26PM +0800, Hannes Frederic Sowa wrote:
> Hello,
> 
> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> > On Wed, Aug 30, 2017 at 05:53:27PM +0800, Hannes Frederic Sowa wrote:
> >> Hello,
> >> 
> >> Yi Yang <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> >> 
> >> [...]
> >> 
> >> > +struct ovs_key_nsh {
> >> > +	u8 flags;
> >> > +	u8 ttl;
> >> > +	u8 mdtype;
> >> > +	u8 np;
> >> > +	__be32 path_hdr;
> >> > +	__be32 context[NSH_MD1_CONTEXT_SIZE];
> >> > +};
> >> > +
> >> >  struct sw_flow_key {
> >> >  	u8 tun_opts[IP_TUNNEL_OPTS_MAX];
> >> >  	u8 tun_opts_len;
> >> > @@ -144,6 +154,7 @@ struct sw_flow_key {
> >> >  			};
> >> >  		} ipv6;
> >> >  	};
> >> > +	struct ovs_key_nsh nsh;         /* network service header */
> >> >  	struct {
> >> >  		/* Connection tracking fields not packed above. */
> >> >  		struct {
> >> 
> >> Does it make sense to keep the context headers as part of the flow?
> >> What is the reasoning behind it? With mdtype 2 headers this might either
> >> not work very well or will increase sw_flow_key size causing slowdowns
> >> for all protocols.
> >
> > For userspace, miniflow can handle such an issue, but the kernel data path
> > doesn't provide such a mechanism, so I can't think of a better
> > way to fix this.
> 
> I do think that both the kernel and dpdk dp are fine if you don't match on
> context fields, which I think is the common case.
> 
> As soon as a few paths start to match on contexts at different offsets I
> am not sure if miniflow will actually help. I don't know enough about
> that. Set cover is an NP-hard problem, so constructing plain simple flows
> will be an approximation somewhere anyway at some point.
> 
> If the context values are in some way discrete it might make sense, but
> for load balancing purposes your ingress router might fill out one
> context field with a flow hash of the inner flow to allow ECMP or
> loadbalancing. You might want to treat that separately. Otherwise the
> different paths might always need to compute context[3] & 0x3 (bits as number of
> possible next hops) and put that into the flow state. Because every hop
> and every path might have different numbers of nexthops etc., I don't
> think this is easily unifiable anymore.

Anyway, this is a common issue, not an NSH-implementation-specific issue.
From my perspective, the kernel data path won't have better performance than
the userspace DPDK data path; maybe you're considering hardware offload, but
so far there isn't an infrastructure in OVS to offload NSH.

> 
> > We have to have context headers in the flow for match and set; every hop in
> > service function chaining can put its metadata into context headers on
> > demand. MD type 2 is much more complicated than you can imagine; it is
> > impossible to use a uniform way to handle MD type 1 and MD type 2. Our
> > way is to keep MD type 1 keys in struct ovs_key_nsh and to reuse struct
> > flow_tnl for MD type 2 metadata; in case of MD type 2, we can set
> > context headers to 0 in struct ovs_key_nsh.
> 
> Understood and agreed.
> 
> > I believe every newly-added key will result in a similar issue to the one
> > you raise; maybe it will be better to introduce miniflow in the kernel data
> > path, but it isn't in scope of this patch series.
> 
> You will see the same problem with offloading flows e.g. to network
> cards very soon. Flow explosion here will cause your 100Gbit/s NIC
> suddenly to go to software slow path.

Do you mean offloading the whole OVS data path? As I said, this is a common
issue; it isn't NSH-specific.

> 
> > It will be great if you can have a better proposal to fix your concern.
> 
> I do think that something alike miniflow might make sense, but I don't
> know enough about miniflow.
> 
> New ACTIONS, that are not only bound to NSH, seem appealing to me at the
> moment. E.g. have a subtable match in the kernel dp or allowing for some
> OXM % modulo -> select action or neighbor alike thing would probably
> cover a lot of cases in the beginning. The submatch would probably very
> much resemble the miniflow in dpdk dp.

I'm not sure what new action you expect to bring here; I think the group
action is just for this. As you said, it isn't only bound to NSH, so you can
start a new thread to discuss this. I don't think it is in the scope of NSH.

> 
> Thanks,
> Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]               ` <20170905021112.GA86057-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
@ 2017-09-05 10:30                 ` Hannes Frederic Sowa
       [not found]                   ` <87vakxsaj2.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  2017-09-05 12:19                   ` Jan Scheurich
  0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-05 10:30 UTC (permalink / raw)
  To: Yang, Yi
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

"Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:

> I'm not sure what new action you expect to bring here, I think group
> action is just for this, as you said it isn't only bound to NSH, you can
> start a new thread to discuss this. I don't think it is in scope of NSH.

It is in scope of this discussion as you will provide a user space API
that makes the NSH context fields accessible from user space in a
certain way. If you commit to this, there is no way going back.

I haven't yet grasped the idea on how those fields will be used in OVS
besides load balancing. Even for load balancing the tunnel itself
(vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
entropy to do per-flow load balancing. What else is needed?  Why a
context header for that? You just need multiple action chains and pick
one randomly.

The only protocol that I can compare that to is geneve with TLVs, but
the TLVs are global and unique and a property of the networking
forwarding backplane and not a property of the path inside a tenant. So
I expect this actually to be the first case where I think that matters.

Why are context labels so special that they are not part of tun_opts?

Thanks,
Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                   ` <87vakxsaj2.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-05 11:38                     ` Yang, Yi
       [not found]                       ` <20170905113848.GC92895-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Yang, Yi @ 2017-09-05 11:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

On Tue, Sep 05, 2017 at 12:30:09PM +0200, Hannes Frederic Sowa wrote:
> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> > I'm not sure what new action you expect to bring here, I think group
> > action is just for this, as you said it isn't only bound to NSH, you can
> > start a new thread to discuss this. I don't think it is in scope of NSH.
> 
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.

We can change this later if we really find a better way to handle it,
because it isn't defined in include/uapi/linux/openvswitch.h, so I still
have a backdoor to do this if needed :-)

> 
> I haven't yet grasped the idea on how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed?  Why a
> context header for that? You just need multiple action chains and pick
> one randomly.

For our SFC use case in OpenDaylight, we use context[0] for the tunnel ID and
context[1] for the destination IP of the reverse RSP; they are used for match
and set in the OpenFlow tables, and you can't limit users to using them in such
ways.
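To illustrate what "match and set on context headers" means in datapath terms, here is a hedged sketch built around the `struct ovs_key_nsh` shown earlier in the thread (a toy model: `__be32` is replaced with `uint32_t`, and the masked set/match helpers are invented for illustration, not the actual OVS code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSH_MD1_CONTEXT_SIZE 4

/* Mirrors struct ovs_key_nsh from the patch (host-endian here). */
struct toy_key_nsh {
    uint8_t  flags, ttl, mdtype, np;
    uint32_t path_hdr;
    uint32_t context[NSH_MD1_CONTEXT_SIZE];
};

/* Masked set, in the spirit of what OVS SET_MASKED applies to a key. */
static void nsh_set_context(struct toy_key_nsh *key, int i,
                            uint32_t value, uint32_t mask)
{
    key->context[i] = (key->context[i] & ~mask) | (value & mask);
}

/* Masked match on one context header, as a flow-table rule would do. */
static int nsh_match_context(const struct toy_key_nsh *key, int i,
                             uint32_t value, uint32_t mask)
{
    return (key->context[i] & mask) == (value & mask);
}
```

In the ODL SFC scheme described above, the classifier would set context[0] (tunnel ID) and context[1] (reverse-path IP), and later rules would match on them with full masks.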

If you check GENEVE implementation, tun_metadata* can be set or matched
as any other match field.

Actually the most important information in NSH is just these context
headers; you can't limit the imagination of users by removing them from flow
keys.

My point is to bring miniflow into kernel data path to fix your concern,
this will benefit your employer directly :-)

I'm just curious whether your company has hardware to offload NSH. Which
company are you from?

> 
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and unique and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
> 
> Why are context labels so special that they are not part of tun_opts?
> 
> Thanks,
> Hannes


* RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-09-05 10:30                 ` Hannes Frederic Sowa
       [not found]                   ` <87vakxsaj2.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-05 12:19                   ` Jan Scheurich
       [not found]                     ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5650-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Scheurich @ 2017-09-05 12:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Yang, Yi; +Cc: netdev, dev, jbenc, e, blp

Hi Hannes,

Please have a look at the Google doc that sketches the overall solution to support NSH in OVS. 
https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8

In its details it is slightly outdated, but the NSH MD1 support (and all prerequisites like PTAP and Generic Encap/Decap) has been implemented in OVS 2.8 (ofproto layer and the userspace datapath).

The NSH use cases are discussed in the chapter "Support for NSH". OVS is primarily targeting the Classifier and SFF use cases. Obviously the classifier must be able to set the MD1 context headers. The final SFF must be able to inspect the context headers, typically to determine the L2 or L3 forwarding context (e.g. VRF) in which to forward the decapsulated packet on to its final destination.

This information has end-to-end significance for the SFP and is encoded by the classifier in the NSH context headers. It cannot be put into transport tunnel options of e.g. a Geneve tunnel connecting two SFFs along the SFP.

We are now trying to establish feature parity in the kernel datapath. If the kernel datapath chooses not to support the MD1 fields, OVS with kernel datapath will not be able to address the classifier and final SFF use cases.

BR, Jan

> -----Original Message-----
> From: Hannes Frederic Sowa [mailto:hannes@stressinduktion.org]
> Sent: Tuesday, 05 September, 2017 12:30
> To: Yang, Yi <yi.y.yang@intel.com>
> Cc: netdev@vger.kernel.org; dev@openvswitch.org; jbenc@redhat.com; e@erig.me; blp@ovn.org; Jan Scheurich
> <jan.scheurich@ericsson.com>
> Subject: Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> 
> "Yang, Yi" <yi.y.yang@intel.com> writes:
> 
> > I'm not sure what new action you expect to bring here, I think group
> > action is just for this, as you said it isn't only bound to NSH, you can
> > start a new thread to discuss this. I don't think it is in scope of NSH.
> 
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.
> 
> I haven't yet grasped the idea on how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed?  Why a
> context header for that? You just need multiple action chains and pick
> one randomly.
> 
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and unique and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
> 
> Why are context labels so special that they are not part of tun_opts?
> 
> Thanks,
> Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                       ` <20170905113848.GC92895-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
@ 2017-09-05 13:12                         ` Hannes Frederic Sowa
       [not found]                           ` <878thtmgra.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-05 13:12 UTC (permalink / raw)
  To: Yang, Yi
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

"Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:

> On Tue, Sep 05, 2017 at 12:30:09PM +0200, Hannes Frederic Sowa wrote:
>> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
>> 
>> > I'm not sure what new action you expect to bring here, I think group
>> > action is just for this, as you said it isn't only bound to NSH, you can
>> > start a new thread to discuss this. I don't think it is in scope of NSH.
>> 
>> It is in scope of this discussion as you will provide a user space API
>> that makes the NSH context fields accessible from user space in a
>> certain way. If you commit to this, there is no way going back.
>
> We can change this later if we really find a better way to handle this
> because it isn't defined in include/uapi/linux/openvswitch.h, so I still
> have a backdoor to do this if needed :-)

Sorry, I can't follow you.

It doesn't matter if something is defined in uapi headers, the
observable behavior matters. If you allow users to configure flows with
specific fields, it should not stop working at a future point in time.

>> I haven't yet grasped the idea on how those fields will be used in OVS
>> besides load balancing. Even for load balancing the tunnel itself
>> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
>> entropy to do per-flow load balancing. What else is needed?  Why a
>> context header for that? You just need multiple action chains and pick
>> one randomly.
>
> For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
> context[1] for destination IP for reverse RSP, they are used to match
> and set in OpenFlow table, you can't limit users to use them in such
> ways.

So in your specific case you expect the masks to be completely stable
because you defined a protocol on top of NSH, understood. And that is
stable across all possible paths. Understood as well.

> If you check GENEVE implementation, tun_metadata* can be set or matched
> as any other match field.

Yes, I wrote that in my previous mail. I wonder why NSH context metadata
is not in tun_metadata as well?

> Actually the most important information in NSH are just these context
> headers, you can't limit imagination of users by removing them from flow
> keys.
>
> My point is to bring miniflow into kernel data path to fix your concern,
> this will benefit your employer directly :-)

Okay, interesting. It will probably not help if you still have a hash of
a packet inside the flow table and use that for load balancing.

[...]

BTW I don't want to stop this patch, I am merely interested in how the
bigger picture will look like in the end.

Bye,
Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                     ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5650-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
@ 2017-09-05 13:34                       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-05 13:34 UTC (permalink / raw)
  To: Jan Scheurich
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

Hello Jan,

Jan Scheurich <jan.scheurich-IzeFyvvaP7pWk0Htik3J/w@public.gmane.org> writes:

> Please have a look at the Google doc that sketches the overall
> solution to support NSH in OVS.
> https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8
>
> In details it is slightly outdated but the NSH MD1 support (and all
> prerequisites like PTAP and Generic Encap/Decap) have been implemented
> in OVS 2.8 (ofproto layer and the userspace datapath).
>
> The NSH use cases are discussed in the chapter "Support for NSH". OVS
> is primarily targeting the Classifier and SFF use cases. Obviously the
> classifier must be able to set the MD1 context headers. The final SFF
> must be able to inspect the context headers, typically to determine
> the L2 or L3 forwarding context (e.g. VRF) in which to forward the
> decapsulated packet on to its final destination.

From Yi Yang's mail I understood that you put a tunnel ID into
context[0]. In this case, I can understand that you want to match on
that. I was under the impression that the combination of tenant id from
the vxlan-gpe header and the path id plus ttl would give enough state
for the SDN to keep state on where the packet is in the network. I don't
understand what a tunnel id is.

I understood that the context fields are usable by the service function
chains, but they are actually in use by the outer SDN and basically
standardized.

> This information has end-to-end significance for the SFP and is
> encoded by the classifier in the NSH context headers. It cannot be put
> into transport tunnel options of e.g. a Geneve tunnel connecting two
> SFFs along the SFP.

Because the TLVs are not end to end in the geneve case? Yes, this makes
sense.

> We are now trying to establish feature parity in the kernel
> datapath. If the kernel datapath chooses not to support the MD1
> fields, OVS with kernel datapath will not be able to address the
> classifier and final SFF use cases.

As I understand this now, you are designing some kind of protocol or
particular implementation on top of NSH. Thus you don't expect masks to
grow limitlessly, and you don't allow any SFC elements to take part in the
decision what to communicate over the context fields.

I still wonder about the hash as part of the NSH context and how that
fits into the picture, which looked like the only standardized
application of context headers from your google doc.

Btw., I don't want to block this patch.

Bye,
Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                           ` <878thtmgra.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-06  0:53                             ` Yang, Yi
       [not found]                               ` <20170906005359.GA103260-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Yang, Yi @ 2017-09-06  0:53 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

On Tue, Sep 05, 2017 at 09:12:09PM +0800, Hannes Frederic Sowa wrote:
> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> 
> > We can change this later if we really find a better way to handle this
> > because it isn't defined in include/uapi/linux/openvswitch.h, so I still
> > have a backdoor to do this if needed :-)
> 
> Sorry, I can't follow you.
> 
> It doesn't matter if something is defined in uapi headers, the
> observable behavior matters. If you allow users to configure flows with
> specific fields, it should not stop working at a future point in time.

Anyway, this can be changed if it is really needed. So far the current way is
the best one we can come up with; we would be glad to use a better approach
if you can propose one. We have explained repeatedly that context headers must
be matched and set; this is the bottom line.

> 
> > For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
> > context[1] for destination IP for reverse RSP, they are used to match
> > and set in OpenFlow table, you can't limit users to use them in such
> > ways.
> 
> So in your specific case you expect the masks to be completely stable
> because you defined a protocol on top of NSH, understood. And that is
> stable across all possible paths. Understood as well.
> 
> > If you check GENEVE implementation, tun_metadata* can be set or matched
> > as any other match field.
> 
> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
> is not in tun_metadata as well?

tun_metadata is tunnel metadata; GENEVE needs a tunnel port, but NSH does
not, so NSH can't directly use tun_metadata. For MD type 2 we would need a
lot of rework on tun_metadata to make it shared between GENEVE and NSH, and
I don't think that can happen in the near term. So tun_metadata isn't an
option for this now.

> 
> > Actually the most important information in NSH are just these context
> > headers, you can't limit imagination of users by removing them from flow
> > keys.
> >
> > My point is to bring miniflow into kernel data path to fix your concern,
> > this will benefit your employer directly :-)
> 
> Okay, interesting. It will probably not help if you still have a hash of
> a packet inside the flow table and use that for load balancing.
> 
> [...]
> 
> BTW I don't want to stop this patch, I am merely interested in how the
> bigger picture will look like in the end.
> 
> Bye,
> Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                               ` <20170906005359.GA103260-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
@ 2017-09-06  8:03                                 ` Hannes Frederic Sowa
       [not found]                                   ` <87o9qo9ru6.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-06  8:03 UTC (permalink / raw)
  To: Yang, Yi
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

"Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:

> On Tue, Sep 05, 2017 at 09:12:09PM +0800, Hannes Frederic Sowa wrote:
>> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
>> 
>> > We can change this later if we really find a better way to handle this
>> > because it isn't defined in include/uapi/linux/openvswitch.h, so I still
>> > have a backdoor to do this if needed :-)
>> 
>> Sorry, I can't follow you.
>> 
>> It doesn't matter if something is defined in uapi headers, the
>> observable behavior matters. If you allow users to configure flows with
>> specific fields, it should not stop working at a future point in time.
>
> Anyway this can be changed if it is really needed, so far current way is
> the best one we can come up with, we would like to use it if you can
> have better proposal. We have explained repeatedly context headers must
> be matched and set, this is bottom line.

I understand that in the way you use those context fields you will have
to match on those. I understand that there is no way around that.

>> > For our sfc use case in Opendaylight, we use context[0] for tunnel ID,
>> > context[1] for destination IP for reverse RSP, they are used to match
>> > and set in OpenFlow table, you can't limit users to use them in such
>> > ways.
>> 
>> So in your specific case you expect the masks to be completely stable
>> because you defined a protocol on top of NSH, understood. And that is
>> stable across all possible paths. Understood as well.
>> 
>> > If you check GENEVE implementation, tun_metadata* can be set or matched
>> > as any other match field.
>> 
>> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
>> is not in tun_metadata as well?
>
> tun_metadata is tunnel metadata, GENEVE needs tunnel port, but NSH is
> not so, NSH can't directly use tun_metadata, for MD type 2, we need to a
> lot of rework on tun_metadata to make it shared between GENEVE and NSH,
> I don't think this can happen in near term. So tun_metadata isn't option
> for this now.

Sorry, I couldn't follow you. Why can't you store the context headers in
tun_metadata exactly?

[...]

Bye,
Hannes


* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                                   ` <87o9qo9ru6.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-06  8:27                                     ` Jan Scheurich
       [not found]                                       ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5D2E-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
  2017-09-06 11:03                                     ` Yang, Yi
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Scheurich @ 2017-09-06  8:27 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Yang, Yi
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, e,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA

> >> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
> >> is not in tun_metadata as well?
> >
> > tun_metadata is tunnel metadata, GENEVE needs tunnel port, but NSH is
> > not so, NSH can't directly use tun_metadata, for MD type 2, we need to a
> > lot of rework on tun_metadata to make it shared between GENEVE and NSH,
> > I don't think this can happen in near term. So tun_metadata isn't option
> > for this now.
> 
> Sorry, I couldn't follow you. Why can't you store the context headers in
> tun_metadata exactly?
> 

I think we are mixing things. Let me try to clarify:

1. NSH context metadata has end-to-end significance for the SFP. It must be part of the NSH header and cannot be transported as tunnel metadata, because transport tunnels (e.g. Geneve) only connect pairs of SFFs in the path. So we need OVS to be able to match on and set NSH context header fields, also for MD2 TLVs in the future.

2. OVS today has support for matching on TLV tunnel metadata after termination of a Geneve tunnel. This infrastructure is only usable for OVS tunnel ports (like Geneve) but not for matching on TLV headers of the NSH protocol, which is not modelled as an OVS tunnel port but handled in the OpenFlow pipeline (with generic encap/decap actions to enter/terminate an NSH SFP). This was a strategic decision by the OVS community two years ago.

There is no way we can re-use the existing TLV tunnel metadata infrastructure in OVS for matching and setting NSH MD2 TLV headers. We will need to introduce a new (perhaps similar) scheme for modelling generic TLV match registers in OVS that are assigned to protocol TLVs by the controller. This is FFS.
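The end-to-end vs. hop-by-hop distinction in point 1 can be sketched as a toy model (invented `toy_*` names, not OVS or kernel code): each SFF-to-SFF hop gets a fresh transport encapsulation whose metadata dies at the next SFF, while the NSH header, including the MD1 context set once by the classifier, rides along the whole path.

```c
#include <assert.h>
#include <stdint.h>

/* Toy NSH header: persists end to end along the SFP. */
struct toy_nsh {
    uint8_t  ttl;
    uint32_t context[4];   /* MD1 context, set once by the classifier */
};

/* Toy transport encapsulation: valid only between adjacent SFFs. */
struct toy_tunnel {
    uint32_t vni;          /* e.g. a VXLAN-GPE VNI, per hop */
    uint64_t tun_metadata; /* per-hop tunnel options, not end to end */
};

struct toy_packet {
    struct toy_tunnel tun; /* outermost, rewritten at every SFF */
    struct toy_nsh    nsh; /* inner, carried across the whole path */
};

/* Forward one hop: the SFF strips the old transport header,
 * decrements the NSH TTL, and pushes a fresh tunnel toward the
 * next SFF.  The NSH context headers are untouched. */
static void sff_forward(struct toy_packet *p, uint32_t next_vni)
{
    p->tun.vni = next_vni;   /* new transport header */
    p->tun.tun_metadata = 0; /* previous hop's options are gone */
    p->nsh.ttl--;            /* NSH header itself survives */
}
```

This is why information the final SFF needs (e.g. a forwarding context) has to live in the NSH context headers rather than in the tunnel options of any single hop.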

BR, Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                                       ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5D2E-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
@ 2017-09-06  9:37                                         ` Hannes Frederic Sowa
       [not found]                                           ` <87bmmo9ngt.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-06  9:37 UTC (permalink / raw)
  To: Jan Scheurich
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

Jan Scheurich <jan.scheurich-IzeFyvvaP7pWk0Htik3J/w@public.gmane.org> writes:

>> >> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
>> >> is not in tun_metadata as well?
>> >
>> > tun_metadata is tunnel metadata, GENEVE needs tunnel port, but NSH is
>> > not so, NSH can't directly use tun_metadata, for MD type 2, we need to do a
>> > lot of rework on tun_metadata to make it shared between GENEVE and NSH,
>> > I don't think this can happen in near term. So tun_metadata isn't an option
>> > for this now.
>> 
>> Sorry, I couldn't follow you. Why can't you store the context headers in
>> tun_metadata exactly?
>> 
>
> I think we are mixing things up. Let me try to clarify:
>
> 1. NSH context metadata has end-to-end significance for the SFP. They
> must be part of the NSH header and cannot be transported as tunnel
> metadata, because transport tunnels (e.g. Geneve) only connect pairs
> of SFFs in the path.

No questions asked. I am not talking about a design choice of the
protocol but an implementation detail of the patch.

> So we need OVS to be able to match on and set NSH context header
> fields, also for MD2 TLVs in the future.

So be it.

> 2. OVS today has support for matching on TLV tunnel metadata after
> termination of a Geneve tunnel. This infrastructure is only usable for
> OVS tunnel ports (like Geneve) but not for matching on TLV headers of
> the NSH protocol, which is not modelled as an OVS tunnel port but
> handled in the OpenFlow pipeline (with generic encap/decap actions to
> enter/terminate an NSH SFP). This was a strategic decision by the OVS
> community two years ago.

I am talking about the tun_opts field in the sw_flow_keys structure for
the kernel dp only.

> There is no way we can re-use the existing TLV tunnel metadata
> infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
> will need to introduce a new (perhaps similar) scheme for modelling
> generic TLV match registers in OVS that are assigned to protocol TLVs
> by the controller. This is FFS.

This is what I don't understand.

Why can't you just reuse the space in the struct sw_flow_key where
geneve would put in their metadata. There are 255 empty bytes at the
beginning if you don't have other tunnel metadata anyway.

If you receive packets over vxlan(gpe), tun_opts gets populated with an
ip_tunnel_key. Couldn't you use the options space in there after the
ip_tunnel_key to store the NSH context just for the sake of storing them
somewhere instead of adding 16 bytes to sw_flow_key?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                                           ` <87bmmo9ngt.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
@ 2017-09-06  9:54                                             ` Jan Scheurich
  2017-09-06 10:02                                               ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Scheurich @ 2017-09-06  9:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

> > There is no way we can re-use the existing TLV tunnel metadata
> > infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
> > will need to introduce a new (perhaps similar) scheme for modelling
> > generic TLV match registers in OVS that are assigned to protocol TLVs
> > by the controller. This is FFS.
> 
> This is what I don't understand.
> 
> Why can't you just reuse the space in the struct sw_flow_key where
> geneve would put in their metadata. There are 255 empty bytes at the
> beginning if you don't have other tunnel metadata anyway.
> 
> If you receive packets over vxlan(gpe), tun_opts gets populated with an
> ip_tunnel_key. Couldn't you use the options space in there after the
> ip_tunnel_key to store the NSH context just for the sake of storing them
> somewhere instead of adding 16 bytes to sw_flow_key?

There is a significant conceptual difference between tunnel metadata (copied from a popped tunnel header) and packet match fields extracted while parsing the packet. If we stored them in the same space in the sw_flow_key struct, we would be asking for trouble.

NSH is transport agnostic; it should work over Ethernet, VXLAN(GPE) and other transport tunnels. Think about an NSH packet arriving on a Geneve tunnel port. Any Geneve tunnel options have already been stored in the tun_opts metadata bytes. Now the datapath parses the NSH header and overwrites the tun_opts metadata with the NSH metadata. This would break the OVS semantics.

I absolutely understand your concern about efficient space utilization in the flow struct for TLV match fields, and finding a good balance between memory and run-time efficiency will be part of the design challenge for MD2 TLV support. But that is FFS. For the four fixed-size MD1 context headers, the decision has been to include them as additional attributes in the flow key.

BR, Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
  2017-09-06  9:54                                             ` Jan Scheurich
@ 2017-09-06 10:02                                               ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2017-09-06 10:02 UTC (permalink / raw)
  To: Jan Scheurich; +Cc: Yang, Yi, netdev, dev, jbenc, e, blp

Jan Scheurich <jan.scheurich@ericsson.com> writes:

>> > There is no way we can re-use the existing TLV tunnel metadata
>> > infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
>> > will need to introduce a new (perhaps similar) scheme for modelling
>> > generic TLV match registers in OVS that are assigned to protocol TLVs
>> > by the controller. This is FFS.
>> 
>> This is what I don't understand.
>> 
>> Why can't you just reuse the space in the struct sw_flow_key where
>> geneve would put in their metadata. There are 255 empty bytes at the
>> beginning if you don't have other tunnel metadata anyway.
>> 
>> If you receive packets over vxlan(gpe), tun_opts gets populated with an
>> ip_tunnel_key. Couldn't you use the options space in there after the
>> ip_tunnel_key to store the NSH context just for the sake of storing them
>> somewhere instead of adding 16 bytes to sw_flow_key?
>
> There is a significant conceptual difference between tunnel metadata
> (copied from a popped tunnel header) and packet match fields extracted
> while parsing the packet. If we stored them in the same space in
> the sw_flow_key struct, we would be asking for trouble.
>
> NSH is transport agnostic, it should work over Ethernet, VXLAN(GPE)
> and other transport tunnels. Think about an NSH packet arriving on a
> Geneve tunnel port. Any Geneve tunnel options have already been stored
> in the tun_opts metadata bytes. Now the datapath parses the NSH header
> and overwrites the tun_opts metadata with the NSH metadata. This would
> break the OVS semantics.

Obviously you would use key->tun_opts_len and start appending there and
not simply overwrite. Otherwise that would be rather silly.

> I absolutely understand your concern about efficient space utilization
> in the flow struct for TLV match fields and it will be part of the
> design challenge for MD2 TLV support to find a good balance between
> memory and run-time efficiency. But that is FFS. For the four fixed
> size MD1 headers the decision has been to include them as additional
> attributes in the flow key.

Okay, then.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
       [not found]                                   ` <87o9qo9ru6.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
  2017-09-06  8:27                                     ` Jan Scheurich
@ 2017-09-06 11:03                                     ` Yang, Yi
  1 sibling, 0 replies; 31+ messages in thread
From: Yang, Yi @ 2017-09-06 11:03 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	jbenc-H+wXaHxf7aLQT0dZR+AlfA, e

On Wed, Sep 06, 2017 at 04:03:29PM +0800, Hannes Frederic Sowa wrote:
> "Yang, Yi" <yi.y.yang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> writes:
> >> 
> >> > If you check GENEVE implementation, tun_metadata* can be set or matched
> >> > as any other match field.
> >> 
> >> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
> >> is not in tun_metadata as well?
> >
> > tun_metadata is tunnel metadata, GENEVE needs tunnel port, but NSH is
> > not so, NSH can't directly use tun_metadata, for MD type 2, we need to do a
> > lot of rework on tun_metadata to make it shared between GENEVE and NSH,
> > I don't think this can happen in near term. So tun_metadata isn't an option
> > for this now.
> 
> Sorry, I couldn't follow you. Why can't you store the context headers in
> tun_metadata exactly?

tun_metadata works only if in_port and out_port are GENEVE tunnel
ports; it was designed specifically for GENEVE. Eth+NSH can work
without any tunnel port involved, and the NSH context headers are not
tunnel metadata: they aren't tunnel-specific, so you can't treat them
as the same thing. In addition, tun_metadata can also be matched and
set, so it has the same issue you are concerned about.

> 
> [...]
> 
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-09-06 11:03 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-25 14:20 [PATCH net-next v6 0/3] openvswitch: add NSH support Yi Yang
2017-08-25 14:20 ` [PATCH net-next v6 1/3] net: add NSH header structures and helpers Yi Yang
2017-08-25 15:07   ` Jiri Benc
2017-08-25 14:20 ` [PATCH net-next v6 2/3] net: gso: Add GSO support for NSH Yi Yang
2017-08-25 16:25   ` Jiri Benc
2017-08-25 23:22     ` Jiri Benc
2017-08-25 14:20 ` [PATCH net-next v6 3/3] openvswitch: enable NSH support Yi Yang
2017-08-30  9:53   ` Hannes Frederic Sowa
     [not found]     ` <87wp5l7560.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-08-30 11:36       ` Mooney, Sean K
2017-08-30 15:15         ` [ovs-dev] " Hannes Frederic Sowa
     [not found]           ` <87inh56q8u.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-08-30 19:00             ` Mooney, Sean K
2017-08-31 12:49               ` [ovs-dev] " Hannes Frederic Sowa
2017-09-04  9:38                 ` Jan Scheurich
2017-09-04 11:45                   ` Hannes Frederic Sowa
2017-09-01 12:11             ` Jan Scheurich
2017-09-04  2:38       ` Yang, Yi
2017-09-04 11:22         ` Hannes Frederic Sowa
2017-09-04 11:57           ` Jan Scheurich
     [not found]           ` <87mv6abte5.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-09-05  2:11             ` Yang, Yi
     [not found]               ` <20170905021112.GA86057-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
2017-09-05 10:30                 ` Hannes Frederic Sowa
     [not found]                   ` <87vakxsaj2.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-09-05 11:38                     ` Yang, Yi
     [not found]                       ` <20170905113848.GC92895-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
2017-09-05 13:12                         ` Hannes Frederic Sowa
     [not found]                           ` <878thtmgra.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-09-06  0:53                             ` Yang, Yi
     [not found]                               ` <20170906005359.GA103260-re2EX8HDrk21gSHoDXDV2kEOCMrvLtNR@public.gmane.org>
2017-09-06  8:03                                 ` Hannes Frederic Sowa
     [not found]                                   ` <87o9qo9ru6.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-09-06  8:27                                     ` Jan Scheurich
     [not found]                                       ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5D2E-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
2017-09-06  9:37                                         ` Hannes Frederic Sowa
     [not found]                                           ` <87bmmo9ngt.fsf-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r@public.gmane.org>
2017-09-06  9:54                                             ` Jan Scheurich
2017-09-06 10:02                                               ` Hannes Frederic Sowa
2017-09-06 11:03                                     ` Yang, Yi
2017-09-05 12:19                   ` Jan Scheurich
     [not found]                     ` <CFF8EF42F1132E4CBE2BF0AB6C21C58D787F5650-hqolJogE5njKJFWPz4pdheaU1rCVNFv4@public.gmane.org>
2017-09-05 13:34                       ` Hannes Frederic Sowa
