netdev.vger.kernel.org archive mirror
* [RFC v4 00/21] Flow Based Tunneling for Open vSwitch
@ 2012-05-24  9:08 Simon Horman
  2012-05-24  9:08 ` [PATCH 02/21] datapath: Use tun_key on transmit Simon Horman
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery

Hi,

This series comprises a fresh batch of proposed changes to introduce
flow-based tunnelling.

At the heart of these changes is the following structure, which
is attached as a pointer to skb->cb.

struct ovs_key_ipv4_tunnel {
        __be64 tun_id;
        __u32  tun_flags;
        __be32 ipv4_src;
        __be32 ipv4_dst;
        __u8   ipv4_tos;
        __u8   ipv4_ttl;
        __u8   pad[2];
};
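
As a rough check of the layout, the following user-space sketch mirrors the
structure using fixed-width types from <stdint.h>. The mirror type and helper
names are hypothetical, and the kernel types (__be64, __be32, __u32, __u8) are
replaced by their plain unsigned equivalents. On common ABIs the members add
up to 24 bytes with no implicit padding, which is what the explicit pad[2]
is for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct ovs_key_ipv4_tunnel. */
struct tun_key_mirror {
	uint64_t tun_id;    /* __be64 in the kernel header */
	uint32_t tun_flags; /* __u32 */
	uint32_t ipv4_src;  /* __be32 */
	uint32_t ipv4_dst;  /* __be32 */
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];    /* explicit padding, keeps the size a multiple of 8 */
};

/* Sum of the member sizes, i.e. the struct size if the compiler adds
 * no implicit padding. */
static size_t tun_key_member_bytes(void)
{
	return sizeof(uint64_t) + 3 * sizeof(uint32_t) + 4 * sizeof(uint8_t);
}
```

Comparing sizeof(struct tun_key_mirror) with tun_key_member_bytes() is a quick
way to confirm that no hidden padding has crept in if fields are added later.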

This series does not introduce use of in-tree kernel tunneling code
by Open vSwitch. However, it is intended as preliminary work
for that goal and I believe attaching a structure similar
to the one above to skb->cb could be a mechanism to achieve that.

I have CCed netdev for any comment on that.

Some details of the implementation follow; they are not
particularly related to the use of in-tree kernel tunneling code.


Overview:

In general the approach that I have taken in user-space is to split
tunneling into realdevs and tundevs.  Tunnel realdevs are devices that look
to users like the existing port-based tunnelling implementation. Tunnel
tundevs exist in the datapath and are where tx and rx occur.  Tunnel
tundevs have very little configuration and are unable to operate without
flow information that describes at least the remote IP.

Changes:

* Do not attempt to configure a tundev realport; doing so fails, which
  prevents ovs-vswitchd from starting. I had not noticed this as
  ovs-vswitchd will start if there are no tundevs present in the database
  when it starts, and I usually test on a fresh install.

* Add a flags field to ovs_key_ipv4_tunnel (above) and use it
  to reinstate the functionality of various flags e.g. tunnel checksum,
  tunnel out key. Previously these flags were set on the 'mutable' of
  a tunnel device in the kernel, however this is no longer appropriate
  as a tunnel device may now handle multiple tunnels.

* Cleaned up output and parsing of tunnel flows.
  Test Suite enhancements to come.

* Do not use Linux kernel headers in lib/odp-util.c.
  This is achieved by defining a new structure flow_tun_key
  and using it instead of ovs_key_ipv4_tunnel. The structure
  is currently the same internally as ovs_key_ipv4_tunnel.
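
To illustrate the per-flow flags item above, here is a minimal sketch of how
flag bits carried in tun_flags could drive header construction for GRE. The
flag names and values are hypothetical and are not taken from the series; the
header sizes follow the GRE format (4 bytes base, 4 bytes for the optional
checksum word, 4 bytes for the optional key).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the names and values used by the series may
 * differ. */
#define SKETCH_TUN_F_CSUM (1u << 0) /* emit a tunnel checksum */
#define SKETCH_TUN_F_KEY  (1u << 1) /* emit tun_id as the tunnel key */

/* GRE header length implied by the per-flow flags. */
static unsigned int sketch_gre_hdr_len(uint32_t tun_flags)
{
	unsigned int len = 4;           /* flags/version + protocol type */

	if (tun_flags & SKETCH_TUN_F_CSUM)
		len += 4;               /* checksum + reserved field */
	if (tun_flags & SKETCH_TUN_F_KEY)
		len += 4;               /* key field */
	return len;
}
```

Because the flags travel with each flow rather than with the device, two
packets sent through the same tundev can end up with different header lengths.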

Limitations:

* In this series, realdevs exist in the kernel although I believe
  it should not be necessary for them to do so. The reason that they are
  there is to limit the changes that are needed to the user-space netdev
  code and to allow review of the series before making those changes.

* PMTU discovery is broken and I'm unsure if it has been fixed.
  Jesse Gross suggested that a user-space implementation of MSS clamping would
  be a good solution to this. I have made a start on that and sent a
  separate email about it.

* The header cache has been removed, but some remnants of the
  API remain. In particular the tunnel header is still created and updated,
  even though both occur for each transmit. It may make sense to
  recombine those calls into a single call if the header cache is
  to be permanently removed.

* Multicast could be implemented in user-space but currently isn't.
  This means that multicast remote IP for tunneling is broken.

* I have not implemented matches for tun_keys. This means
  that the current implementation only provides port-based tunneling
  implemented on top of flow-based tunneling. It is not yet possible for a
  controller to match on or set the tun_key of flows.

  I expect this to be a small body of work to complete.

* The way that I have split the patches is still somewhat arbitrary.
  I wanted to avoid one very large patch to aid review.  But a lot of the
  changes are inter-related, so a bisectable split seems rather difficult.
  Nonetheless, the split could be significantly improved.
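
For the PMTU limitation listed above, the arithmetic behind MSS clamping can
be sketched as follows. This is an illustration only and not the
implementation referred to in the separate email: the function and constant
names are hypothetical, and it assumes IPv4 and TCP headers without options
and an inner Ethernet frame carried inside the tunnel.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IPV4_HLEN 20u /* IPv4 header without options */
#define SKETCH_TCP_HLEN  20u /* TCP header without options */
#define SKETCH_ETH_HLEN  14u /* inner Ethernet header carried in the tunnel */

/* Return the MSS to advertise so that an encapsulated segment fits in
 * path_mtu.  tunnel_hlen is the tunnel header length (e.g. GRE).
 * Returns 0 if the MTU leaves no room for any TCP payload. */
static uint16_t sketch_clamp_mss(uint16_t advertised_mss,
				 unsigned int path_mtu,
				 unsigned int tunnel_hlen)
{
	/* outer IPv4 + tunnel header + inner Ethernet + inner IPv4 + TCP */
	unsigned int overhead = SKETCH_IPV4_HLEN + tunnel_hlen +
				SKETCH_ETH_HLEN + SKETCH_IPV4_HLEN +
				SKETCH_TCP_HLEN;
	unsigned int max_mss;

	if (path_mtu <= overhead)
		return 0;
	max_mss = path_mtu - overhead;
	/* Only ever lower the MSS, never raise it. */
	return advertised_mss < max_mss ? advertised_mss : (uint16_t)max_mss;
}
```

With a 1500 byte path MTU and an 8 byte tunnel header, an advertised MSS of
1460 would be lowered to 1418 under these assumptions.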

----------------------------------------------------------------
Simon Horman (21):
      datapath: tunnelling: Replace tun_id with tun_key
      datapath: Use tun_key on transmit
      odp-util: Add tun_key to parse_odp_key_attr()
      vswitchd: Add iface_parse_tunnel
      vswitchd: Add add_tunnel_ports()
      ofproto: Add set_tunnelling()
      vswitchd: Configure tunnel interfaces.
      ofproto: Add realdev_to_txdev()
      ofproto: Add tundev_to_realdev()
      classifier: Convert struct flow flow_metadata to use tun_key
      datapath, vport: Provide tunnel realdev and tundev classes and vports
      lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action()
      global: Remove OVS_KEY_ATTR_TUN_ID
      ofproto: Set flow tun_key in compose_output_action()
      datapath: Remove mlink element from tnl_mutable_config
      datapath: remove tunnel cache
      datapath: Always use tun_key addresses for route lookup
      dataptah: remove ttl and tos from tnl_mutable_config
      datapath: Simplify vport lookup
      datapath: Use tun_key flags for id and csum settings on transmit
      datapath: Always use tun_key flags

 datapath/Modules.mk             |   3 +-
 datapath/actions.c              |   6 +-
 datapath/datapath.c             |  11 +-
 datapath/datapath.h             |   5 +-
 datapath/flow.c                 |  35 +-
 datapath/flow.h                 |  27 +-
 datapath/tunnel.c               | 782 +++++-----------------------------------
 datapath/tunnel.h               |  98 +----
 datapath/vport-capwap.c         |  45 +--
 datapath/vport-gre.c            |  62 ++--
 datapath/vport-tunnel-realdev.c | 260 +++++++++++++
 datapath/vport.c                |   3 +-
 datapath/vport.h                |   1 +
 include/linux/openvswitch.h     |  24 +-
 include/openvswitch/tunnel.h    |   4 +
 lib/classifier.c                |   8 +-
 lib/dpif-linux.c                |   2 +-
 lib/dpif-netdev.c               |   2 +-
 lib/flow.c                      |  31 +-
 lib/flow.h                      |  21 +-
 lib/meta-flow.c                 |   4 +-
 lib/netdev-vport.c              | 333 ++++-------------
 lib/nx-match.c                  |   2 +-
 lib/odp-util.c                  |  72 +++-
 lib/odp-util.h                  |   5 +-
 lib/ofp-print.c                 |  12 +-
 lib/ofp-util.c                  |   4 +-
 ofproto/ofproto-dpif.c          | 347 ++++++++++++++++--
 ofproto/ofproto-provider.h      |  12 +
 ofproto/ofproto.c               |  28 ++
 ofproto/ofproto.h               |  46 +++
 tests/test-classifier.c         |   7 +-
 vswitchd/bridge.c               | 350 ++++++++++++++++++
 33 files changed, 1451 insertions(+), 1201 deletions(-)
 create mode 100644 datapath/vport-tunnel-realdev.c


* [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-05-24  9:08   ` Simon Horman
       [not found]     ` <1337850554-10339-2-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-06-03  9:15     ` [ovs-dev] " Jesse Gross
  2012-05-24  9:08   ` [PATCH 04/21] vswitchd: Add iface_parse_tunnel Simon Horman
                     ` (11 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

This is a first pass at providing a tun_key which can be used
as the basis for flow-based tunnelling. The tun_key includes and
replaces the tun_id in both struct ovs_skb_cb and struct sw_flow_key.

In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
when support for IPv6 is added, to an extent that inlining the structure
will result in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
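
The size argument can be sketched in user-space as below. Every type here is
hypothetical: in particular the IPv6 key layout is only a guess at what such
a key might contain, and the control-block structs keep just a flow pointer
next to the key. The point is that a key with two 16-byte addresses already
fills 48 bytes on its own, so only the pointer variant leaves room for the
other members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CB_BYTES 48 /* size of skb->cb, as stated above */

/* Hypothetical IPv6 tunnel key; not part of this patch. */
struct sketch_ipv6_tun_key {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint8_t  ipv6_src[16];
	uint8_t  ipv6_dst[16];
	uint8_t  tclass;
	uint8_t  hlimit;
	uint8_t  pad[2];
};

/* Control block with the key inlined. */
struct sketch_cb_inline {
	void *flow;
	struct sketch_ipv6_tun_key tun_key;
};

/* Control block holding only a pointer to the key. */
struct sketch_cb_pointer {
	void *flow;
	struct sketch_ipv6_tun_key *tun_key;
};

/* Compile-time check: the array size is negative, and compilation fails,
 * if the pointer variant does not fit in the control block. */
typedef char sketch_cb_pointer_fits[
	sizeof(struct sketch_cb_pointer) <= SKETCH_CB_BYTES ? 1 : -1];

static int sketch_fits_in_cb(size_t size)
{
	return size <= SKETCH_CB_BYTES;
}
```

The trade-off of the pointer variant is that whatever the pointer refers to
has to stay valid for as long as the skb's control block is in use.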

As OVS does not support IPv6 as the outer transport protocol for tunnels
the IPv6 portions of this change, which appeared in the previous revision,
have been dropped in order to limit the scope and size of this patch.

This patch does not make any effort to retain the existing tun_id behaviour
nor does it fully implement flow-based tunnels. As such it is incomplete
and can't be used in its current form (other than to break OVS tunnelling).

** Please do not apply **

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

---

v4
* Add tun_flags to ovs_key_ipv4_tunnel
* Correct format in format_odp_key_attr()

v3
* Rework, actually works in limited scenarios

v2
* Use pointer to struct ovs_key_ipv4_tunnel in OVS_CB()
  rather than having a struct ovs_key_ipv4_tunnel in OVS_CB()

v1
* Initial post
---
 datapath/actions.c          |  6 +++---
 datapath/datapath.c         | 10 +++++++++-
 datapath/datapath.h         |  5 +++--
 datapath/flow.c             | 34 +++++++++++++++++++++++-----------
 datapath/flow.h             | 27 +++++++++++++++++++++++----
 datapath/tunnel.c           | 24 +++++++++++++-----------
 datapath/tunnel.h           |  5 +++--
 datapath/vport-capwap.c     | 12 ++++++------
 datapath/vport-gre.c        | 21 +++++++++++----------
 datapath/vport.c            |  2 +-
 include/linux/openvswitch.h | 13 ++++++++++++-
 lib/dpif-netdev.c           |  1 +
 lib/odp-util.c              | 13 +++++++++++++
 lib/odp-util.h              |  5 +++--
 14 files changed, 124 insertions(+), 54 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 208f260..7b2ea25 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -342,8 +342,8 @@ static int execute_set_action(struct sk_buff *skb,
 		skb->priority = nla_get_u32(nested_attr);
 		break;
 
-	case OVS_KEY_ATTR_TUN_ID:
-		OVS_CB(skb)->tun_id = nla_get_be64(nested_attr);
+	case OVS_KEY_ATTR_IPV4_TUNNEL:
+		OVS_CB(skb)->tun_key = nla_data(nested_attr);
 		break;
 
 	case OVS_KEY_ATTR_ETHERNET:
@@ -469,7 +469,7 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
 		goto out_loop;
 	}
 
-	OVS_CB(skb)->tun_id = 0;
+	OVS_CB(skb)->tun_key = NULL;
 	error = do_execute_actions(dp, skb, acts->actions,
 					 acts->actions_len, false);
 
diff --git a/datapath/datapath.c b/datapath/datapath.c
index a4376a0..65dfe79 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -587,12 +587,20 @@ static int validate_set(const struct nlattr *a,
 
 	switch (key_type) {
 	const struct ovs_key_ipv4 *ipv4_key;
+	const struct ovs_key_ipv4_tunnel *tun_key;
 
 	case OVS_KEY_ATTR_PRIORITY:
 	case OVS_KEY_ATTR_TUN_ID:
 	case OVS_KEY_ATTR_ETHERNET:
 		break;
 
+	case OVS_KEY_ATTR_IPV4_TUNNEL:
+		tun_key = nla_data(ovs_key);
+		if (!tun_key->ipv4_dst) {
+			return -EINVAL;
+		}
+		break;
+
 	case OVS_KEY_ATTR_IPV4:
 		if (flow_key->eth.type != htons(ETH_P_IP))
 			return -EINVAL;
@@ -785,7 +793,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	err = ovs_flow_metadata_from_nlattrs(&flow->key.phy.priority,
 					     &flow->key.phy.in_port,
-					     &flow->key.phy.tun_id,
+					     &flow->key.phy.tun_key,
 					     a[OVS_PACKET_ATTR_KEY]);
 	if (err)
 		goto err_flow_put;
diff --git a/datapath/datapath.h b/datapath/datapath.h
index affbf0e..de0b28d 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -96,7 +96,7 @@ struct datapath {
 /**
  * struct ovs_skb_cb - OVS data in skb CB
  * @flow: The flow associated with this packet.  May be %NULL if no flow.
- * @tun_id: ID of the tunnel that encapsulated this packet.  It is 0 if the
+ * @tun_key: Key for the tunnel that encapsulated this packet.
  * @ip_summed: Consistently stores L4 checksumming status across different
  * kernel versions.
  * @csum_start: Stores the offset from which to start checksumming independent
@@ -107,7 +107,7 @@ struct datapath {
  */
 struct ovs_skb_cb {
 	struct sw_flow		*flow;
-	__be64			tun_id;
+	struct ovs_key_ipv4_tunnel  *tun_key;
 #ifdef NEED_CSUM_NORMALIZE
 	enum csum_type		ip_summed;
 	u16			csum_start;
@@ -192,4 +192,5 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
 					 u8 cmd);
 
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb);
+
 #endif /* datapath.h */
diff --git a/datapath/flow.c b/datapath/flow.c
index d07337c..49c0dd8 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -629,7 +629,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
 	memset(key, 0, sizeof(*key));
 
 	key->phy.priority = skb->priority;
-	key->phy.tun_id = OVS_CB(skb)->tun_id;
+	if (OVS_CB(skb)->tun_key)
+		key->phy.tun_key = *OVS_CB(skb)->tun_key;
 	key->phy.in_port = in_port;
 
 	skb_reset_mac_header(skb);
@@ -847,6 +848,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 
 	/* Not upstream. */
 	[OVS_KEY_ATTR_TUN_ID] = sizeof(__be64),
+	[OVS_KEY_ATTR_IPV4_TUNNEL] = sizeof(struct ovs_key_ipv4_tunnel),
 };
 
 static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
@@ -1022,9 +1024,11 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		swkey->phy.in_port = DP_MAX_PORTS;
 	}
 
-	if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
-		swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
-		attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
+	if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
+		struct ovs_key_ipv4_tunnel *tun_key;
+		tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
+		swkey->phy.tun_key = *tun_key;
+		attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
 	}
 
 	/* Data attributes. */
@@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
  * get the metadata, that is, the parts of the flow key that cannot be
  * extracted from the packet itself.
  */
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+				   struct ovs_key_ipv4_tunnel *tun_key,
 				   const struct nlattr *attr)
 {
 	const struct nlattr *nla;
 	int rem;
 
 	*in_port = DP_MAX_PORTS;
-	*tun_id = 0;
+	tun_key->tun_id = 0;
 	*priority = 0;
 
 	nla_for_each_nested(nla, attr, rem) {
@@ -1184,8 +1189,9 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
 				*priority = nla_get_u32(nla);
 				break;
 
-			case OVS_KEY_ATTR_TUN_ID:
-				*tun_id = nla_get_be64(nla);
+			case OVS_KEY_ATTR_IPV4_TUNNEL:
+				memcpy(tun_key, nla_data(nla),
+				       sizeof(*tun_key));
 				break;
 
 			case OVS_KEY_ATTR_IN_PORT:
@@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
 int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
 {
 	struct ovs_key_ethernet *eth_key;
+	struct ovs_key_ipv4_tunnel *tun_key;
 	struct nlattr *nla, *encap;
 
 	if (swkey->phy.priority &&
 	    nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
 		goto nla_put_failure;
 
-	if (swkey->phy.tun_id != cpu_to_be64(0) &&
-	    nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
-		goto nla_put_failure;
+	if (swkey->phy.tun_key.ipv4_dst) {
+		nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4_TUNNEL,
+				  sizeof(*tun_key));
+		if (!nla)
+			goto nla_put_failure;
+		tun_key = nla_data(nla);
+		*tun_key = swkey->phy.tun_key;
+	}
 
 	if (swkey->phy.in_port != DP_MAX_PORTS &&
 	    nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port))
diff --git a/datapath/flow.h b/datapath/flow.h
index 5be481e..bab5363 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -42,7 +42,7 @@ struct sw_flow_actions {
 
 struct sw_flow_key {
 	struct {
-		__be64	tun_id;		/* Encapsulating tunnel ID. */
+		struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
 		u32	priority;	/* Packet QoS priority. */
 		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
 	} phy;
@@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
  *                         ------  ---  ------  -----
  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
+ *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
  *  OVS_KEY_ATTR_IN_PORT       4    --     4      8
  *  OVS_KEY_ATTR_ETHERNET     12    --     4     16
  *  OVS_KEY_ATTR_8021Q         4    --     4      8
@@ -158,14 +159,15 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
  *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
  *  OVS_KEY_ATTR_ND           28    --     4     32
  *  -------------------------------------------------
- *  total                                       144
+ *  total                                       168
  */
-#define FLOW_BUFSIZE 144
+#define FLOW_BUFSIZE 168
 
 int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
 int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		      const struct nlattr *);
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+				   struct ovs_key_ipv4_tunnel *tun_key,
 				   const struct nlattr *);
 
 #define MAX_ACTIONS_BUFSIZE	(16 * 1024)
@@ -204,4 +206,21 @@ u32 ovs_flow_hash(const struct sw_flow_key *key, int key_len);
 struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *idx);
 extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1];
 
+static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
+{
+	__be32 ndst = tun_key->ipv4_src;
+	tun_key->ipv4_src = tun_key->ipv4_dst;
+	tun_key->ipv4_dst = ndst;
+}
+
+static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
+				const struct iphdr *iph, __be64 tun_id)
+{
+	tun_key->tun_id = tun_id;
+	tun_key->ipv4_src = iph->saddr;
+	tun_key->ipv4_dst = iph->daddr;
+	tun_key->ipv4_tos = iph->tos;
+	tun_key->ipv4_ttl = iph->ttl;
+}
+
 #endif /* flow.h */
diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index d651c11..010e513 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
 	return NULL;
 }
 
-static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
+static void ecn_decapsulate(struct sk_buff *skb)
 {
-	if (unlikely(INET_ECN_is_ce(tos))) {
+	if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {
 		__be16 protocol = skb->protocol;
 
 		skb_set_network_header(skb, ETH_HLEN);
@@ -416,7 +416,7 @@ static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
  * - skb->csum does not include the inner Ethernet header.
  * - The layer pointers are undefined.
  */
-void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb, u8 tos)
+void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb)
 {
 	struct ethhdr *eh;
 
@@ -433,7 +433,7 @@ void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb, u8 tos)
 	skb_clear_rxhash(skb);
 	secpath_reset(skb);
 
-	ecn_decapsulate(skb, tos);
+	ecn_decapsulate(skb);
 	vlan_set_tci(skb, 0);
 
 	if (unlikely(compute_ip_summed(skb, false))) {
@@ -613,12 +613,14 @@ static void ipv6_build_icmp(struct sk_buff *skb, struct sk_buff *nskb,
 
 bool ovs_tnl_frag_needed(struct vport *vport,
 			 const struct tnl_mutable_config *mutable,
-			 struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
+			 struct sk_buff *skb, unsigned int mtu,
+			 struct ovs_key_ipv4_tunnel *tun_key)
 {
 	unsigned int eth_hdr_len = ETH_HLEN;
 	unsigned int total_length = 0, header_length = 0, payload_length;
 	struct ethhdr *eh, *old_eh = eth_hdr(skb);
 	struct sk_buff *nskb;
+	struct ovs_key_ipv4_tunnel ntun_key;
 
 	/* Sanity check */
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
 	 * any way of synthesizing packets.
 	 */
 	if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
-	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
-		OVS_CB(nskb)->tun_id = flow_key;
+	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
+		ntun_key = *tun_key;
+		OVS_CB(nskb)->tun_key = &ntun_key;
+	}
 
 	if (unlikely(compute_ip_summed(nskb, false))) {
 		kfree_skb(nskb);
@@ -761,7 +765,7 @@ static bool check_mtu(struct sk_buff *skb,
 
 			if (packet_length > mtu &&
 			    ovs_tnl_frag_needed(vport, mutable, skb, mtu,
-						OVS_CB(skb)->tun_id))
+						OVS_CB(skb)->tun_key))
 				return false;
 		}
 	}
@@ -778,7 +782,7 @@ static bool check_mtu(struct sk_buff *skb,
 
 			if (packet_length > mtu &&
 			    ovs_tnl_frag_needed(vport, mutable, skb, mtu,
-						OVS_CB(skb)->tun_id))
+						OVS_CB(skb)->tun_key))
 				return false;
 		}
 	}
@@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport *vport,
 	iph->ihl	= sizeof(struct iphdr) >> 2;
 	iph->frag_off	= htons(IP_DF);
 	iph->protocol	= tnl_vport->tnl_ops->ipproto;
-	iph->tos	= mutable->tos;
 	iph->daddr	= rt->rt_dst;
 	iph->saddr	= rt->rt_src;
-	iph->ttl	= mutable->ttl;
 	if (!iph->ttl)
 		iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 1924017..7d78297 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -269,14 +269,15 @@ int ovs_tnl_set_addr(struct vport *vport, const unsigned char *addr);
 const char *ovs_tnl_get_name(const struct vport *vport);
 const unsigned char *ovs_tnl_get_addr(const struct vport *vport);
 int ovs_tnl_send(struct vport *vport, struct sk_buff *skb);
-void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb, u8 tos);
+void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb);
 
 struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
 				__be64 key, int tunnel_type,
 				const struct tnl_mutable_config **mutable);
 bool ovs_tnl_frag_needed(struct vport *vport,
 			 const struct tnl_mutable_config *mutable,
-			 struct sk_buff *skb, unsigned int mtu, __be64 flow_key);
+			 struct sk_buff *skb, unsigned int mtu,
+			 struct ovs_key_ipv4_tunnel *tun_key);
 void ovs_tnl_free_linked_skbs(struct sk_buff *skb);
 
 int ovs_tnl_init(void);
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index 05a099d..1e08d5a 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const struct vport *vport,
 		struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
 		struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key *)(wsi + 1);
 
-		opt->key = OVS_CB(skb)->tun_id;
+		opt->key = OVS_CB(skb)->tun_key->tun_id;
 	}
 
 	udph->len = htons(skb->len - skb_transport_offset(skb));
@@ -316,6 +316,7 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 	struct vport *vport;
 	const struct tnl_mutable_config *mutable;
 	struct iphdr *iph;
+	struct ovs_key_ipv4_tunnel tun_key;
 	__be64 key = 0;
 
 	if (unlikely(!pskb_may_pull(skb, CAPWAP_MIN_HLEN + ETH_HLEN)))
@@ -333,12 +334,11 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 		goto error;
 	}
 
-	if (mutable->flags & TNL_F_IN_KEY_MATCH)
-		OVS_CB(skb)->tun_id = key;
-	else
-		OVS_CB(skb)->tun_id = 0;
+	tun_key_init(&tun_key, iph,
+		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	OVS_CB(skb)->tun_key = &tun_key;
 
-	ovs_tnl_rcv(vport, skb, iph->tos);
+	ovs_tnl_rcv(vport, skb);
 	goto out;
 
 error:
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index ab89c5b..fd2b038 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct vport *vport,
 	__be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
 					       - GRE_HEADER_SECTION);
 
-	/* Work backwards over the options so the checksum is last. */
-	if (mutable->flags & TNL_F_OUT_KEY_ACTION)
-		*options = be64_get_low32(OVS_CB(skb)->tun_id);
-
 	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
 		options--;
 
@@ -285,7 +281,11 @@ static void gre_err(struct sk_buff *skb, u32 info)
 #endif
 
 	__skb_pull(skb, tunnel_hdr_len);
-	ovs_tnl_frag_needed(vport, mutable, skb, mtu, key);
+	{
+		struct ovs_key_ipv4_tunnel tun_key;
+		tun_key_init(&tun_key, iph, key);
+		ovs_tnl_frag_needed(vport, mutable, skb, mtu, &tun_key);
+	}
 	__skb_push(skb, tunnel_hdr_len);
 
 out:
@@ -327,6 +327,7 @@ static int gre_rcv(struct sk_buff *skb)
 	const struct tnl_mutable_config *mutable;
 	int hdr_len;
 	struct iphdr *iph;
+	struct ovs_key_ipv4_tunnel tun_key;
 	__be16 flags;
 	__be64 key;
 
@@ -351,15 +352,15 @@ static int gre_rcv(struct sk_buff *skb)
 		goto error;
 	}
 
-	if (mutable->flags & TNL_F_IN_KEY_MATCH)
-		OVS_CB(skb)->tun_id = key;
-	else
-		OVS_CB(skb)->tun_id = 0;
+
+	tun_key_init(&tun_key, iph,
+		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	OVS_CB(skb)->tun_key = &tun_key;
 
 	__skb_pull(skb, hdr_len);
 	skb_postpull_rcsum(skb, skb_transport_header(skb), hdr_len + ETH_HLEN);
 
-	ovs_tnl_rcv(vport, skb, iph->tos);
+	ovs_tnl_rcv(vport, skb);
 	return 0;
 
 error:
diff --git a/datapath/vport.c b/datapath/vport.c
index 172261a..0c77a1b 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
 		OVS_CB(skb)->flow = NULL;
 
 	if (!(vport->ops->flags & VPORT_F_TUN_ID))
-		OVS_CB(skb)->tun_id = 0;
+		OVS_CB(skb)->tun_key = NULL;
 
 	ovs_dp_process_received_packet(vport, skb);
 }
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index f5c9cca..c32bb58 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -278,7 +278,8 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
 	OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
 	OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
-	OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
+	OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
+	OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
 	__OVS_KEY_ATTR_MAX
 };
 
@@ -360,6 +361,16 @@ struct ovs_key_nd {
 	__u8  nd_tll[6];
 };
 
+struct ovs_key_ipv4_tunnel {
+	__be64 tun_id;
+	__u32  tun_flags;
+	__be32 ipv4_src;
+	__be32 ipv4_dst;
+	__u8   ipv4_tos;
+	__u8   ipv4_ttl;
+	__u8   pad[2];
+};
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index fb0a863..d065a3a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1165,6 +1165,7 @@ execute_set_action(struct ofpbuf *packet, const struct nlattr *a)
     case OVS_KEY_ATTR_TUN_ID:
     case OVS_KEY_ATTR_PRIORITY:
     case OVS_KEY_ATTR_IPV6:
+    case OVS_KEY_ATTR_IPV4_TUNNEL:
         /* not implemented */
         break;
 
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 8693d3c..23d1efe 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -106,6 +106,7 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
     case OVS_KEY_ATTR_ARP: return "arp";
     case OVS_KEY_ATTR_ND: return "nd";
     case OVS_KEY_ATTR_TUN_ID: return "tun_id";
+    case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
 
     case __OVS_KEY_ATTR_MAX:
     default:
@@ -614,6 +615,7 @@ odp_flow_key_attr_len(uint16_t type)
     case OVS_KEY_ATTR_ICMPV6: return sizeof(struct ovs_key_icmpv6);
     case OVS_KEY_ATTR_ARP: return sizeof(struct ovs_key_arp);
     case OVS_KEY_ATTR_ND: return sizeof(struct ovs_key_nd);
+    case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct ovs_key_ipv4_tunnel);
 
     case OVS_KEY_ATTR_UNSPEC:
     case __OVS_KEY_ATTR_MAX:
@@ -668,6 +670,7 @@ format_odp_key_attr(const struct nlattr *a, struct ds *ds)
     const struct ovs_key_icmpv6 *icmpv6_key;
     const struct ovs_key_arp *arp_key;
     const struct ovs_key_nd *nd_key;
+    const struct ovs_key_ipv4_tunnel *ipv4_tun_key;
     enum ovs_key_attr attr = nl_attr_type(a);
     int expected_len;
 
@@ -698,6 +701,16 @@ format_odp_key_attr(const struct nlattr *a, struct ds *ds)
         ds_put_format(ds, "(%#"PRIx64")", ntohll(nl_attr_get_be64(a)));
         break;
 
+    case OVS_KEY_ATTR_IPV4_TUNNEL:
+        ipv4_tun_key = nl_attr_get(a);
+        ds_put_format(ds, "(tun_id=%"PRIx64",flags=%"PRIx32
+                      ",src="IP_FMT",dst="IP_FMT",tos=%"PRIx8",ttl=%"PRIu8")",
+                      ntohll(ipv4_tun_key->tun_id), ipv4_tun_key->tun_flags,
+                      IP_ARGS(&ipv4_tun_key->ipv4_src),
+                      IP_ARGS(&ipv4_tun_key->ipv4_dst),
+                      ipv4_tun_key->ipv4_tos, ipv4_tun_key->ipv4_ttl);
+        break;
+
     case OVS_KEY_ATTR_IN_PORT:
         ds_put_format(ds, "(%"PRIu32")", nl_attr_get_u32(a));
         break;
diff --git a/lib/odp-util.h b/lib/odp-util.h
index d53f083..4e5a8a1 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
  *                         ------  ---  ------  -----
  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
+ *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
  *  OVS_KEY_ATTR_IN_PORT       4    --     4      8
  *  OVS_KEY_ATTR_ETHERNET     12    --     4     16
  *  OVS_KEY_ATTR_8021Q         4    --     4      8
@@ -80,9 +81,9 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
  *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
  *  OVS_KEY_ATTR_ND           28    --     4     32
  *  -------------------------------------------------
- *  total                                       144
+ *  total                                       168
  */
-#define ODPUTIL_FLOW_KEY_BYTES 144
+#define ODPUTIL_FLOW_KEY_BYTES 168
 
 /* A buffer with sufficient size and alignment to hold an nlattr-formatted flow
  * key.  An array of "struct nlattr" might not, in theory, be sufficiently
-- 
1.7.10.2.484.gcd07cc5
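
The two helpers added to datapath/flow.h in the patch above,
tun_key_init() and tun_key_swap_addr(), can be exercised in user-space
with a sketch along these lines. The sketch_* types are hypothetical
stand-ins for struct iphdr and struct ovs_key_ipv4_tunnel. Unlike the
helper in the patch, this variant zeroes the whole key first so that
tun_flags and pad hold known values before the key is copied or hashed;
that is an assumption about what a caller might want, not something the
patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields of struct iphdr used here. */
struct sketch_iphdr {
	uint8_t  tos;
	uint8_t  ttl;
	uint32_t saddr;
	uint32_t daddr;
};

/* Hypothetical mirror of struct ovs_key_ipv4_tunnel. */
struct sketch_tun_key {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Fill the key from an outer IP header, clearing unset fields first. */
static void sketch_tun_key_init(struct sketch_tun_key *key,
				const struct sketch_iphdr *iph,
				uint64_t tun_id)
{
	memset(key, 0, sizeof(*key)); /* also clears tun_flags and pad */
	key->tun_id = tun_id;
	key->ipv4_src = iph->saddr;
	key->ipv4_dst = iph->daddr;
	key->ipv4_tos = iph->tos;
	key->ipv4_ttl = iph->ttl;
}

/* Swap source and destination, e.g. to address a reply to the sender. */
static void sketch_tun_key_swap_addr(struct sketch_tun_key *key)
{
	uint32_t tmp = key->ipv4_src;

	key->ipv4_src = key->ipv4_dst;
	key->ipv4_dst = tmp;
}
```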


* [PATCH 02/21] datapath: Use tun_key on transmit
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
@ 2012-05-24  9:08 ` Simon Horman
  2012-05-24  9:08 ` [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr() Simon Horman
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

Use the tun_key, which is the basis of flow-based tunnelling, on transmit.
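
The selection logic this amounts to can be sketched in user-space as below:
prefer the addresses, TOS and TTL from the per-packet tun_key when one is
attached, and otherwise fall back to the port's configuration. All sketch_*
names are hypothetical, and the sketch leaves out the TOS/TTL inherit flags
and the route hop-limit fallback for a zero TTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet tunnel key (subset of fields). */
struct sketch_tun_key {
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
};

/* Hypothetical per-port configuration, standing in for
 * tnl_mutable_config. */
struct sketch_port_cfg {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  tos;
	uint8_t  ttl;
};

/* Parameters used to build the outer IP header on transmit. */
struct sketch_tx_params {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  tos;
	uint8_t  ttl;
};

/* Use the tun_key when present, otherwise the port configuration. */
static struct sketch_tx_params
sketch_select_tx(const struct sketch_tun_key *tun_key,
		 const struct sketch_port_cfg *cfg)
{
	struct sketch_tx_params p;

	if (tun_key) {
		p.saddr = tun_key->ipv4_src;
		p.daddr = tun_key->ipv4_dst;
		p.tos = tun_key->ipv4_tos;
		p.ttl = tun_key->ipv4_ttl;
	} else {
		p.saddr = cfg->saddr;
		p.daddr = cfg->daddr;
		p.tos = cfg->tos;
		p.ttl = cfg->ttl;
	}
	return p;
}
```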

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 010e513..61add96 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -1002,15 +1002,16 @@ unlock:
 }
 
 static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
-				   u8 ipproto, u8 tos)
+				   u8 ipproto, __be32 daddr, __be32 saddr,
+				   u8 tos)
 {
 	/* Tunnel configuration keeps DSCP part of TOS bits, But Linux
 	 * router expect RT_TOS bits only. */
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39)
 	struct flowi fl = { .nl_u = { .ip4_u = {
-					.daddr = mutable->key.daddr,
-					.saddr = mutable->key.saddr,
+					.daddr = daddr,
+					.saddr = saddr,
 					.tos   = RT_TOS(tos) } },
 					.proto = ipproto };
 	struct rtable *rt;
@@ -1020,8 +1021,8 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 	return rt;
 #else
-	struct flowi4 fl = { .daddr = mutable->key.daddr,
-			     .saddr = mutable->key.saddr,
+	struct flowi4 fl = { .daddr = daddr,
+			     .saddr = saddr,
 			     .flowi4_tos = RT_TOS(tos),
 			     .flowi4_proto = ipproto };
 
@@ -1031,7 +1032,8 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 static struct rtable *find_route(struct vport *vport,
 				 const struct tnl_mutable_config *mutable,
-				 u8 tos, struct tnl_cache **cache)
+				 u8 tos, __be32 daddr, __be32 saddr,
+				 struct tnl_cache **cache)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 	struct tnl_cache *cur_cache = rcu_dereference(tnl_vport->cache);
@@ -1039,14 +1041,16 @@ static struct rtable *find_route(struct vport *vport,
 	*cache = NULL;
 	tos = RT_TOS(tos);
 
-	if (likely(tos == RT_TOS(mutable->tos) &&
-	    check_cache_valid(cur_cache, mutable))) {
+	if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
+	    tos == RT_TOS(mutable->tos) &&
+	    check_cache_valid(cur_cache, mutable)) {
 		*cache = cur_cache;
 		return cur_cache->rt;
 	} else {
 		struct rtable *rt;
 
-		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto, tos);
+		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
+				  daddr, saddr, tos);
 		if (IS_ERR(rt))
 			return NULL;
 
@@ -1182,6 +1186,8 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct tnl_cache *cache;
 	int sent_len = 0;
 	__be16 frag_off = 0;
+	__be32 daddr;
+	__be32 saddr;
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
@@ -1221,11 +1227,21 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	if (mutable->flags & TNL_F_TOS_INHERIT)
 		tos = inner_tos;
+	else if (OVS_CB(skb)->tun_key)
+		tos = OVS_CB(skb)->tun_key->ipv4_tos;
 	else
 		tos = mutable->tos;
 
+	if (OVS_CB(skb)->tun_key) {
+		daddr = OVS_CB(skb)->tun_key->ipv4_dst;
+		saddr = OVS_CB(skb)->tun_key->ipv4_src;
+	} else {
+		daddr = mutable->key.daddr;
+		saddr = mutable->key.saddr;
+	}
+
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, &cache);
+	rt = find_route(vport, mutable, tos, daddr, saddr, &cache);
 	if (unlikely(!rt))
 		goto error_free;
 	if (unlikely(!cache))
@@ -1262,10 +1278,12 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	}
 
 	/* TTL */
-	ttl = mutable->ttl;
+	if (OVS_CB(skb)->tun_key)
+		ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
+	else
+		ttl = mutable->ttl;
 	if (!ttl)
 		ttl = ip4_dst_hoplimit(&rt_dst(rt));
-
 	if (mutable->flags & TNL_F_TTL_INHERIT) {
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = ip_hdr(skb)->ttl;
@@ -1444,7 +1462,8 @@ static int tnl_set_config(struct net *net, struct nlattr *options,
 		struct net_device *dev;
 		struct rtable *rt;
 
-		rt = __find_route(mutable, tnl_ops->ipproto, mutable->tos);
+		rt = __find_route(mutable, tnl_ops->ipproto, mutable->key.daddr,
+				  mutable->key.saddr, mutable->tos);
 		if (IS_ERR(rt))
 			return -EADDRNOTAVAIL;
 		dev = rt_dst(rt).dev;
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
  2012-05-24  9:08 ` [PATCH 02/21] datapath: Use tun_key on transmit Simon Horman
@ 2012-05-24  9:08 ` Simon Horman
       [not found]   ` <1337850554-10339-4-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-05-24  9:09 ` [PATCH 08/21] ofproto: Add realdev_to_txdev() Simon Horman
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

---

v4
* Correct parsing of the tunnel key in parse_odp_key_attr()
  so that it matches the output of format_odp_key_attr()
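
As a rough illustration of the text form being parsed, here is a
stand-alone sketch. It assumes the key is printed exactly as the sscanf()
pattern in this patch expects. The names demo_tunnel_text and
demo_parse_ipv4_tunnel are hypothetical, and the IP_SCAN_FMT macro is
replaced by plain %u conversions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical holder for the parsed fields; addresses are kept as
 * four octets each to avoid byte-order details in this sketch. */
struct demo_tunnel_text {
    unsigned long long tun_id;
    unsigned long long flags;
    unsigned int src[4];
    unsigned int dst[4];
    int tos;
    int ttl;
};

/* Returns the number of characters consumed, or -1 if 's' does not start
 * with a complete ipv4_tunnel(...) key. */
static int
demo_parse_ipv4_tunnel(const char *s, struct demo_tunnel_text *t)
{
    char tun_id_s[32];
    int n = -1;
    int matched;

    matched = sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
                     ",flags=%llx,src=%u.%u.%u.%u,dst=%u.%u.%u.%u"
                     ",tos=%i,ttl=%i)%n",
                     tun_id_s, &t->flags,
                     &t->src[0], &t->src[1], &t->src[2], &t->src[3],
                     &t->dst[0], &t->dst[1], &t->dst[2], &t->dst[3],
                     &t->tos, &t->ttl, &n);

    /* All 12 conversions must succeed, and %n is only stored once the
     * closing ')' has been matched as well. */
    if (matched != 12 || n <= 0) {
        return -1;
    }

    t->tun_id = strtoull(tun_id_s, NULL, 0);
    return n;
}
```

Checking both the conversion count and the %n result rejects input that
is truncated before the closing parenthesis.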

TODO: fix test suite

v3
* Initial post
---
 lib/odp-util.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/odp-util.c b/lib/odp-util.c
index 23d1efe..7cff00c 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -925,6 +925,35 @@ parse_odp_key_attr(const char *s, const struct simap *port_names,
     }
 
     {
+        ovs_be32 ipv4_src;
+        ovs_be32 ipv4_dst;
+        unsigned long long tun_flags;
+        int ipv4_tos;
+        int ipv4_ttl;
+        int n = -1;
+
+        if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
+                   ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
+                   ",tos=%i,ttl=%i)%n",
+                   tun_id_s, &tun_flags,
+                   IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
+                   &ipv4_tos, &ipv4_ttl, &n) > 0
+            && n > 0) {
+            struct ovs_key_ipv4_tunnel tun_key;
+
+            tun_key.tun_id = htonll(strtoull(tun_id_s, NULL, 0));
+            tun_key.tun_flags = tun_flags;
+            tun_key.ipv4_src = ipv4_src;
+            tun_key.ipv4_dst = ipv4_dst;
+            tun_key.ipv4_tos = ipv4_tos;
+            tun_key.ipv4_ttl = ipv4_ttl;
+            nl_msg_put_unspec(key, OVS_KEY_ATTR_IPV4_TUNNEL,
+                              &tun_key, sizeof tun_key);
+            return n;
+        }
+    }
+
+    {
         unsigned long long int in_port;
         int n = -1;
 
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/21] vswitchd: Add iface_parse_tunnel
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-05-24  9:08   ` [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key Simon Horman
@ 2012-05-24  9:08   ` Simon Horman
       [not found]     ` <1337850554-10339-5-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-05-24  9:08   ` [PATCH 05/21] vswitchd: Add add_tunnel_ports() Simon Horman
                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

This duplicates parse_tunnel_config(); the duplication will be minimised later.

iface_parse_tunnel() is currently only used to verify the configuration
by passing NULL as its third argument. It will later be used in storing
the configuration by passing a non-NULL argument. The purpose of verification
is to allow for error-free parsing later.
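
The verify-only calling convention can be sketched as follows. This is a
simplified stand-alone example rather than the code in this patch:
demo_settings and demo_parse_tunnel are hypothetical names, only the
remote IP is handled, and inet_pton() stands in for lookup_ip().

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced settings structure. */
struct demo_settings {
    uint32_t daddr;             /* Remote IP, network byte order. */
};

/* Validates 'remote_ip'.  If 'sp' is nonnull the parsed settings are also
 * stored there; a null 'sp' requests verification only.
 * Returns 0 on success, otherwise a positive errno value. */
static int
demo_parse_tunnel(const char *remote_ip, struct demo_settings *sp)
{
    struct demo_settings s = { 0 };
    struct in_addr addr;

    if (!remote_ip || inet_pton(AF_INET, remote_ip, &addr) != 1
        || !addr.s_addr) {
        return EINVAL;
    }
    s.daddr = addr.s_addr;

    if (sp) {
        *sp = s;
    }
    return 0;
}
```

A caller that only wants validation passes NULL as the last argument. A
later caller that needs the result passes a pointer and can rely on the
same checks having been applied.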

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
 include/openvswitch/tunnel.h |   2 +
 ofproto/ofproto.h            |  33 +++++++
 vswitchd/bridge.c            | 214 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 249 insertions(+)

diff --git a/include/openvswitch/tunnel.h b/include/openvswitch/tunnel.h
index c494791..5f55ecc 100644
--- a/include/openvswitch/tunnel.h
+++ b/include/openvswitch/tunnel.h
@@ -71,5 +71,7 @@ enum {
 #define TNL_F_PMTUD		(1 << 5) /* Enable path MTU discovery. */
 #define TNL_F_HDR_CACHE		(1 << 6) /* Enable tunnel header caching. */
 #define TNL_F_IPSEC		(1 << 7) /* Traffic is IPsec encrypted. */
+#define TNL_F_IN_KEY	(1 << 8) /* Tunnel port has input key. */
+#define TNL_F_OUT_KEY	(1 << 9) /* Tunnel port has output key. */
 
 #endif /* openvswitch/tunnel.h */
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index ea988e7..d8739b0 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -367,7 +367,40 @@ void ofproto_get_vlan_usage(struct ofproto *, unsigned long int *vlan_bitmap);
 bool ofproto_has_vlan_usage_changed(const struct ofproto *);
 int ofproto_port_set_realdev(struct ofproto *, uint16_t vlandev_ofp_port,
                              uint16_t realdev_ofp_port, int vid);
+\f
+#define TNL_F_CSUM          (1 << 0) /* Checksum packets. */
+#define TNL_F_TOS_INHERIT	(1 << 1) /* Inherit ToS from inner packet. */
+#define TNL_F_TTL_INHERIT	(1 << 2) /* Inherit TTL from inner packet. */
+#define TNL_F_DF_INHERIT	(1 << 3) /* Inherit DF bit from inner packet. */
+#define TNL_F_DF_DEFAULT	(1 << 4) /* Set DF bit if inherit off or
+                                      * not IP. */
+#define TNL_F_PMTUD		    (1 << 5) /* Enable path MTU discovery. */
+#define TNL_F_HDR_CACHE		(1 << 6) /* Enable tunnel header caching. */
+#define TNL_F_IPSEC		    (1 << 7) /* Traffic is IPsec encrypted. */
+#define TNL_F_IN_KEY	    (1 << 8) /* Tunnel port has input key. */
+#define TNL_F_OUT_KEY	    (1 << 9) /* Tunnel port has output key. */
+
+#define TNL_T_PROTO_GRE     0
+#define TNL_T_PROTO_CAPWAP  1
+
+#define TNL_T_KEY_EXACT     (1 << 6)
+#define TNL_T_KEY_MATCH     (1 << 7)
+
+/* Tunnel device support */
+struct tunnel_settings {
+    ovs_be64 in_key;
+    ovs_be64 out_key;
+    ovs_be32 saddr;
+    ovs_be32 daddr;
+    uint8_t tos;
+    uint8_t ttl;
+    uint16_t flags;
+    uint8_t type;
+};
 
+void ofproto_port_set_tunnel(struct ofproto *ofproto, uint16_t tundev_ofp_port,
+                             uint16_t realdev_ofp_port,
+                             const struct tunnel_settings *s);
 #ifdef  __cplusplus
 }
 #endif
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index d720952..f775ae7 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -20,6 +20,7 @@
 #include <inttypes.h>
 #include <stdlib.h>
 #include "bitmap.h"
+#include "byte-order.h"
 #include "bond.h"
 #include "cfm.h"
 #include "coverage.h"
@@ -625,6 +626,13 @@ bridge_update_ofprotos(void)
     }
 }
 
+static bool
+is_tunnel_realdev(const char *type)
+{
+    return !strcmp(type, "gre") || !strcmp(type, "ipsec_gre") ||
+            !strcmp(type, "capwap");
+}
+
 static void
 port_configure(struct port *port)
 {
@@ -1333,6 +1341,207 @@ error:
     return error;
 }
 
+
+static const char *
+get_key(const struct shash *args, const char *name)
+{
+    const char *s;
+
+    s = shash_find_data(args, name);
+    if (!s) {
+        s = shash_find_data(args, "key");
+        if (!s) {
+            s = "0";
+        }
+    }
+
+    if (!strcmp(s, "flow")) {
+        /* This is the default if no attribute is present. */
+        return NULL;
+    }
+
+    return s;
+}
+
+static int
+iface_parse_tunnel(const struct ovsrec_interface *iface_cfg,
+                   const char *type, struct tunnel_settings *sp)
+{
+    bool is_gre = false;
+    bool is_ipsec = false;
+    struct shash args;
+    struct shash_node *node;
+    struct tunnel_settings s = { .tos = 0 };
+    bool ipsec_mech_set = false;
+    int status = EINVAL;
+    const char *key;
+
+    shash_init(&args);
+    shash_from_ovs_idl_map(iface_cfg->key_options,
+                           iface_cfg->value_options,
+                           iface_cfg->n_options, &args);
+
+    s.flags = TNL_F_DF_DEFAULT | TNL_F_PMTUD | TNL_F_HDR_CACHE;
+    if (!strcmp(type, "gre")) {
+        is_gre = true;
+        s.type = TNL_T_PROTO_GRE;
+    } else if (!strcmp(type, "ipsec_gre")) {
+        is_gre = true;
+        s.type = TNL_T_PROTO_GRE;
+        is_ipsec = true;
+        s.flags |= TNL_F_IPSEC;
+        s.flags &= ~TNL_F_HDR_CACHE;
+    } else if (!strcmp(type, "capwap")) {
+        s.type = TNL_T_PROTO_CAPWAP;
+    }
+
+    SHASH_FOR_EACH (node, &args) {
+        if (!strcmp(node->name, "remote_ip")) {
+            struct in_addr in_addr;
+            if (lookup_ip(node->data, &in_addr)) {
+                VLOG_WARN("%s: bad %s 'remote_ip'", iface_cfg->name, type);
+            } else {
+                s.daddr = in_addr.s_addr;
+            }
+        } else if (!strcmp(node->name, "local_ip")) {
+            struct in_addr in_addr;
+            if (lookup_ip(node->data, &in_addr)) {
+                VLOG_WARN("%s: bad %s 'local_ip'", iface_cfg->name, type);
+            } else {
+                s.saddr = in_addr.s_addr;
+            }
+        } else if (!strcmp(node->name, "tos")) {
+            if (!strcmp(node->data, "inherit")) {
+                s.flags |= TNL_F_TOS_INHERIT;
+            } else {
+                s.tos = atoi(node->data);
+            }
+        } else if (!strcmp(node->name, "ttl")) {
+            if (!strcmp(node->data, "inherit")) {
+                s.flags |= TNL_F_TTL_INHERIT;
+            } else {
+                s.ttl = atoi(node->data);
+            }
+        } else if (!strcmp(node->name, "csum") && is_gre) {
+            if (!strcmp(node->data, "true")) {
+                s.flags |= TNL_F_CSUM;
+            }
+        } else if (!strcmp(node->name, "df_inherit")) {
+            if (!strcmp(node->data, "true")) {
+                s.flags |= TNL_F_DF_INHERIT;
+            }
+        } else if (!strcmp(node->name, "df_default")) {
+            if (!strcmp(node->data, "false")) {
+                s.flags &= ~TNL_F_DF_DEFAULT;
+            }
+        } else if (!strcmp(node->name, "pmtud")) {
+            if (!strcmp(node->data, "false")) {
+                s.flags &= ~TNL_F_PMTUD;
+            }
+        } else if (!strcmp(node->name, "header_cache")) {
+            if (!strcmp(node->data, "false")) {
+                s.flags &= ~TNL_F_HDR_CACHE;
+            }
+        } else if (!strcmp(node->name, "peer_cert") && is_ipsec) {
+            if (shash_find(&args, "certificate")) {
+                ipsec_mech_set = true;
+            } else {
+                const char *use_ssl_cert;
+
+                /* If the "use_ssl_cert" is true, then "certificate" and
+                 * "private_key" will be pulled from the SSL table.  The
+                 * use of this option is strongly discouraged, since it
+                 * will likely be removed when multiple SSL configurations
+                 * are supported by OVS.
+                 */
+                use_ssl_cert = shash_find_data(&args, "use_ssl_cert");
+                if (!use_ssl_cert || strcmp(use_ssl_cert, "true")) {
+                    VLOG_ERR("%s: 'peer_cert' requires 'certificate' argument",
+                             iface_cfg->name);
+                    goto err;
+                }
+                ipsec_mech_set = true;
+            }
+        } else if (!strcmp(node->name, "psk") && is_ipsec) {
+            ipsec_mech_set = true;
+        } else if (is_ipsec
+                && (!strcmp(node->name, "certificate")
+                    || !strcmp(node->name, "private_key")
+                    || !strcmp(node->name, "use_ssl_cert"))) {
+            /* Ignore options not used by the netdev. */
+        } else if (!strcmp(node->name, "key") ||
+                   !strcmp(node->name, "in_key") ||
+                   !strcmp(node->name, "out_key")) {
+            /* Handled separately below. */
+        } else {
+            VLOG_WARN("%s: unknown %s argument '%s'", iface_cfg->name,
+                      type, node->name);
+        }
+    }
+
+    if (is_ipsec) {
+        char *file_name = xasprintf("%s/%s", ovs_rundir(),
+                "ovs-monitor-ipsec.pid");
+        pid_t pid = read_pidfile(file_name);
+        free(file_name);
+        if (pid < 0) {
+            VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
+                     iface_cfg->name);
+            goto err;
+        }
+
+        if (shash_find(&args, "peer_cert") && shash_find(&args, "psk")) {
+            VLOG_ERR("%s: cannot define both 'peer_cert' and 'psk'",
+                     iface_cfg->name);
+            goto err;
+        }
+
+        if (!ipsec_mech_set) {
+            VLOG_ERR("%s: IPsec requires a 'peer_cert' or 'psk' argument",
+                     iface_cfg->name);
+            goto err;
+        }
+    }
+
+    if ((key = get_key(&args, "in_key"))) {
+        s.flags |= TNL_F_IN_KEY;
+        s.type |= TNL_T_KEY_EXACT;
+        s.in_key = htonll(strtoull(key, NULL, 0));
+    } else {
+        s.type |= TNL_T_KEY_MATCH;
+        s.in_key = 0ULL;
+    }
+    if ((key = get_key(&args, "out_key"))) {
+        s.flags |= TNL_F_OUT_KEY;
+        s.out_key = htonll(strtoull(key, NULL, 0));
+    } else {
+        s.out_key = 0ULL;
+    }
+
+    if (!s.daddr) {
+        VLOG_ERR("%s: %s type requires valid 'remote_ip' argument",
+                 iface_cfg->name, type);
+        goto err;
+    }
+
+    if (s.saddr) {
+        if (ip_is_multicast(s.daddr)) {
+            VLOG_WARN("%s: remote_ip is multicast, ignoring local_ip",
+                      iface_cfg->name);
+            s.saddr = 0;
+        }
+    }
+
+    if (sp) {
+        *sp = s;
+    }
+
+    status = 0;
+err:
+    shash_destroy(&args);
+    return status;
+}
+
 /* Creates a new iface on 'br' based on 'if_cfg'.  The new iface has OpenFlow
  * port number 'ofp_port'.  If ofp_port is negative, an OpenFlow port is
  * automatically allocated for the iface.  Takes ownership of and
@@ -1344,6 +1553,7 @@ iface_create(struct bridge *br, struct if_cfg *if_cfg, int ofp_port)
 {
     const struct ovsrec_interface *iface_cfg = if_cfg->cfg;
     const struct ovsrec_port *port_cfg = if_cfg->parent;
+    const char *type = iface_get_type(iface_cfg, br->cfg);
 
     struct netdev *netdev;
     struct iface *iface;
@@ -1355,6 +1565,10 @@ iface_create(struct bridge *br, struct if_cfg *if_cfg, int ofp_port)
     hmap_remove(&br->if_cfg_todo, &if_cfg->hmap_node);
     free(if_cfg);
 
+    if (is_tunnel_realdev(type) && iface_parse_tunnel(iface_cfg, type, NULL)) {
+        return false;
+    }
+
     /* Do the bits that can fail up front. */
     assert(!iface_lookup(br, iface_cfg->name));
     error = iface_do_create(br, iface_cfg, port_cfg, &ofp_port, &netdev);
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/21] vswitchd: Add add_tunnel_ports()
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-05-24  9:08   ` [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key Simon Horman
  2012-05-24  9:08   ` [PATCH 04/21] vswitchd: Add iface_parse_tunnel Simon Horman
@ 2012-05-24  9:08   ` Simon Horman
       [not found]     ` <1337850554-10339-6-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
  2012-05-24  9:08   ` [PATCH 06/21] ofproto: Add set_tunnelling() Simon Horman
                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Add tunnel tundevs for tunnel realdevs as needed.

In general the notion is that realdevs may be configured by users
and, from an end-user point of view, are compatible with the existing
port-based tunneling code, while tundevs exist in the datapath
and are actually used to send and receive packets, based on flows.
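
The realdev-to-tundev naming used by add_tunnel_ports() in this patch can
be summarised by the stand-alone sketch below. The helper and struct
names (demo_tundev, demo_tundev_for_realdev) are hypothetical; only the
type strings are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair describing the tundev backing a realdev. */
struct demo_tundev {
    const char *name;           /* Port name of the shared tundev. */
    const char *type;           /* Interface type of the tundev. */
};

/* Fills 'out' for a tunnel realdev of type 'realdev_type'.
 * Returns 1 for tunnel realdev types, 0 for anything else. */
static int
demo_tundev_for_realdev(const char *realdev_type, struct demo_tundev *out)
{
    if (!strcmp(realdev_type, "gre") || !strcmp(realdev_type, "ipsec_gre")) {
        /* GRE and IPsec GRE realdevs share a single GRE tundev. */
        out->name = "gre";
        out->type = "gre-tundev";
        return 1;
    }
    if (!strcmp(realdev_type, "capwap")) {
        out->name = "capwap";
        out->type = "capwap-tundev";
        return 1;
    }
    return 0;
}
```

Because the tundev name depends only on the protocol, any number of
realdevs of the same protocol map to one tundev per bridge.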

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
 vswitchd/bridge.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index f775ae7..3d187f0 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -268,6 +268,7 @@ static void configure_splinter_port(struct port *);
 static void add_vlan_splinter_ports(struct bridge *,
                                     const unsigned long int *splinter_vlans,
                                     struct shash *ports);
+static void add_tunnel_ports(struct bridge *, struct shash *ports);
 \f
 /* Public functions. */
 
@@ -2751,6 +2752,8 @@ bridge_add_del_ports(struct bridge *br,
         add_vlan_splinter_ports(br, splinter_vlans, &new_ports);
     }
 
+    add_tunnel_ports(br, &new_ports);
+
     /* Get rid of deleted ports.
      * Get rid of deleted interfaces on ports that still exist. */
     HMAP_FOR_EACH_SAFE (port, next, hmap_node, &br->ports) {
@@ -4153,6 +4156,70 @@ add_vlan_splinter_ports(struct bridge *br,
     }
 }
 
+static struct ovsrec_port *
+synthesize_tunnel_port(const char *name, const char *type)
+{
+    struct ovsrec_interface *iface;
+    struct ovsrec_port *port;
+
+    iface = xzalloc(sizeof *iface);
+    iface->name = xstrdup(name);
+    iface->type = type;
+
+    port = xzalloc(sizeof *port);
+    port->interfaces = xmemdup(&iface, sizeof iface);
+    port->n_interfaces = 1;
+    port->name = xstrdup(name);
+
+    register_block(iface);
+    register_block(iface->name);
+    register_block(port);
+    register_block(port->interfaces);
+    register_block(port->name);
+
+    return port;
+}
+
+/* For each interface in 'br' that is a tunnel realdev, adds the corresponding
+ * tundev ovsrec_port to 'ports' if it is not already present. */
+static void
+add_tunnel_ports(struct bridge *br, struct shash *ports)
+{
+    size_t i;
+
+    /* We iterate through 'br->cfg->ports' instead of 'ports' here because
+     * we're modifying 'ports'. */
+    for (i = 0; i < br->cfg->n_ports; i++) {
+        const char *name = br->cfg->ports[i]->name;
+        struct ovsrec_port *port_cfg = shash_find_data(ports, name);
+        size_t j;
+
+        for (j = 0; j < port_cfg->n_interfaces; j++) {
+            struct ovsrec_interface *iface_cfg = port_cfg->interfaces[j];
+            const char *type = iface_get_type(iface_cfg, br->cfg);
+            const char *tundev_name;
+            const char *tundev_type;
+
+            if (!is_tunnel_realdev(type)) {
+                continue;
+            }
+
+            tundev_name = strcmp(type, "ipsec_gre") ? type : "gre";
+            if (!strcmp(tundev_name, "gre")) {
+                tundev_type = "gre-tundev";
+            } else {
+                tundev_type = "capwap-tundev";
+            }
+
+            if (!shash_find(ports, tundev_name)) {
+                shash_add(ports, tundev_name,
+                          synthesize_tunnel_port(tundev_name,
+                                                 tundev_type));
+            }
+        }
+    }
+}
+
 static void
 mirror_refresh_stats(struct mirror *m)
 {
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/21] ofproto: Add set_tunnelling()
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2012-05-24  9:08   ` [PATCH 05/21] vswitchd: Add add_tunnel_ports() Simon Horman
@ 2012-05-24  9:08   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 07/21] vswitchd: Configure tunnel interfaces Simon Horman
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:08 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Allow configuration of tunneling in ofproto_port instances.

For tunnel realdevs this includes the remote IP and type of the tunnel,
and optionally the local IP, tos and ttl.

For tunnel tundevs it only includes the type.

realdevs and tundevs can be differentiated by examining the remote IP,
which is always zero for tundevs and always non-zero for realdevs.
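
A minimal sketch of that distinction, assuming addresses are held in
network byte order as in struct tunnel_settings. The names demo_port_kind,
demo_classify and demo_is_multicast are hypothetical; the multicast test
mirrors ipv4_is_multicast() from this patch.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_port_kind {
    DEMO_TUNDEV,                /* Remote IP is zero. */
    DEMO_REALDEV                /* Remote IP is non-zero. */
};

/* Classifies a tunnel port by its remote IP (network byte order). */
static enum demo_port_kind
demo_classify(uint32_t daddr_be)
{
    return daddr_be ? DEMO_REALDEV : DEMO_TUNDEV;
}

/* True if 'addr_be' (network byte order) lies in 224.0.0.0/4. */
static bool
demo_is_multicast(uint32_t addr_be)
{
    return (addr_be & htonl(0xf0000000)) == htonl(0xe0000000);
}
```

Zero is the same value in either byte order, so the realdev/tundev test
needs no conversion.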

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
 ofproto/ofproto-dpif.c     | 116 +++++++++++++++++++++++++++++++++++++++++++++
 ofproto/ofproto-provider.h |  12 +++++
 ofproto/ofproto.c          |  28 +++++++++++
 ofproto/ofproto.h          |  13 +++++
 4 files changed, 169 insertions(+)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index f2c2ca9..642b508 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -476,6 +476,13 @@ static void facet_account(struct facet *);
 
 static bool facet_is_controller_flow(struct facet *);
 
+struct ofport_dpif_tun {
+    struct tunnel_settings s;
+    uint16_t tundev_ofp_port;
+    struct hmap_node tundev_node;
+    struct ofport_dpif *ofport;  /* Containing ofport_dpif */
+};
+
 struct ofport_dpif {
     struct ofport up;
 
@@ -503,6 +510,9 @@ struct ofport_dpif {
      * widespread use, we will delete these interfaces. */
     uint16_t realdev_ofp_port;
     int vlandev_vid;
+
+    /* Tunneling */
+    struct ofport_dpif_tun *tun;
 };
 
 /* Node in 'ofport_dpif''s 'priorities' map.  Used to maintain a map from
@@ -535,6 +545,16 @@ static bool vsp_adjust_flow(const struct ofproto_dpif *, struct flow *);
 static void vsp_remove(struct ofport_dpif *);
 static void vsp_add(struct ofport_dpif *, uint16_t realdev_ofp_port, int vid);
 
+static unsigned key_local_remote_ports;
+static unsigned key_remote_ports;
+static unsigned local_remote_ports;
+static unsigned remote_ports;
+static unsigned key_multicast_ports;
+static unsigned multicast_ports;
+
+static int set_tunnelling(struct ofport *ofport_, uint16_t tundev_ofp_port,
+                          const struct tunnel_settings *s);
+
 static struct ofport_dpif *
 ofport_dpif_cast(const struct ofport *ofport)
 {
@@ -612,6 +632,9 @@ struct ofproto_dpif {
     /* VLAN splinters. */
     struct hmap realdev_vid_map; /* (realdev,vid) -> vlandev. */
     struct hmap vlandev_map;     /* vlandev -> (realdev,vid). */
+
+    /* Tunnelling */
+    struct hmap tundev_map;     /* tundev -> realdev */
 };
 
 /* Defer flow mod completion until "ovs-appctl ofproto/unclog"?  (Useful only
@@ -771,6 +794,8 @@ construct(struct ofproto *ofproto_)
     hmap_init(&ofproto->vlandev_map);
     hmap_init(&ofproto->realdev_vid_map);
 
+    hmap_init(&ofproto->tundev_map);
+
     hmap_insert(&all_ofproto_dpifs, &ofproto->all_ofproto_dpifs_node,
                 hash_string(ofproto->up.name, 0));
     memset(&ofproto->stats, 0, sizeof ofproto->stats);
@@ -1153,6 +1178,7 @@ port_construct(struct ofport *port_)
     hmap_init(&port->priorities);
     port->realdev_ofp_port = 0;
     port->vlandev_vid = 0;
+    port->tun = NULL;
     port->carrier_seq = netdev_get_carrier_resets(port->up.netdev);
 
     if (ofproto->sflow) {
@@ -1171,6 +1197,7 @@ port_destruct(struct ofport *port_)
     ofproto->need_revalidate = true;
     bundle_remove(port_);
     set_cfm(port_, NULL);
+    set_tunnelling(port_, 0, NULL);
     if (ofproto->sflow) {
         dpif_sflow_del_port(ofproto->sflow, port->odp_port);
     }
@@ -7097,6 +7124,94 @@ vsp_add(struct ofport_dpif *port, uint16_t realdev_ofp_port, int vid)
     }
 }
 \f
+static inline bool
+ipv4_is_multicast(__be32 addr)
+{
+    return (addr & htonl(0xf0000000)) == htonl(0xe0000000);
+}
+
+static unsigned int *
+tun_port_pool(const struct tunnel_settings *s)
+{
+    bool is_multicast = ipv4_is_multicast(s->daddr);
+
+    if (s->type & TNL_T_KEY_MATCH) {
+        if (s->saddr)
+            return &local_remote_ports;
+        else if (is_multicast)
+            return &multicast_ports;
+        else
+            return &remote_ports;
+    } else {
+        if (s->saddr)
+            return &key_local_remote_ports;
+        else if (is_multicast)
+            return &key_multicast_ports;
+        else
+            return &key_remote_ports;
+    }
+}
+
+static void
+tun_remove(struct ofport_dpif *ofport)
+{
+    struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+
+    if (!ofport->tun) {
+        return;
+    }
+
+    hmap_remove(&ofproto->tundev_map, &ofport->tun->tundev_node);
+    (*tun_port_pool(&ofport->tun->s))--;
+}
+
+static void
+tun_add(struct ofport_dpif *ofport, uint16_t tundev_ofp_port,
+        const struct tunnel_settings *s)
+{
+    struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+
+    ofport->tun->tundev_ofp_port = tundev_ofp_port;
+    ofport->tun->s = *s;
+    (*tun_port_pool(&ofport->tun->s))++;
+    hmap_insert(&ofproto->tundev_map, &ofport->tun->tundev_node,
+                hash_int(tundev_ofp_port, 0));
+}
+
+static int
+set_tunnelling(struct ofport *ofport_, uint16_t tundev_ofp_port,
+               const struct tunnel_settings *s)
+{
+    struct ofport_dpif *ofport = ofport_dpif_cast(ofport_);
+
+    if (!s) {
+        tun_remove(ofport);
+        free(ofport->tun);
+        ofport->tun = NULL;
+        return 0;
+    }
+
+    if (!ofport->tun) {
+        struct ofproto_dpif *ofproto;
+
+        ofproto = ofproto_dpif_cast(ofport->up.ofproto);
+        ofproto->need_revalidate = true;
+        ofport->tun = xzalloc(sizeof *ofport->tun);
+        ofport->tun->ofport = ofport;
+    }
+    else {
+        if (ofport->tun->tundev_ofp_port == tundev_ofp_port &&
+            tunnel_settings_equal(&ofport->tun->s, s)) {
+            return 0;
+        }
+        tun_remove(ofport);
+    }
+
+    tun_add(ofport, tundev_ofp_port, s);
+
+    return 0;
+}
+\f
 const struct ofproto_class ofproto_dpif_class = {
     enumerate_types,
     enumerate_names,
@@ -7159,4 +7274,5 @@ const struct ofproto_class ofproto_dpif_class = {
     forward_bpdu_changed,
     set_mac_idle_time,
     set_realdev,
+    set_tunnelling,
 };
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 1f3ad37..be39691 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -1168,6 +1168,18 @@ struct ofproto_class {
      * it. */
     int (*set_realdev)(struct ofport *ofport,
                        uint16_t realdev_ofp_port, int vid);
+
+    /* Configures tunneling for 'ofport'.
+     *
+     * If 's' is nonnull, configures tunneling
+     * according to its members.
+     *
+     * If 's' is null, then any tunnel configuration is
+     * removed.
+     *
+     * This function should be null if tunnelling is not supported. */
+    int (*set_tunnelling)(struct ofport *ofport, uint16_t tundev_ofp_port,
+			  const struct tunnel_settings *s);
 };
 
 extern const struct ofproto_class ofproto_dpif_class;
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 0bda06a..79f7a24 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -4184,3 +4184,31 @@ ofproto_port_set_realdev(struct ofproto *ofproto, uint16_t vlandev_ofp_port,
     }
     return error;
 }
+
+/* Configure tunneling parameters of a port
+ *
+ * This function has no effect if 'ofproto' does not have a port 'ofp_port'. */
+void
+ofproto_port_set_tunnel(struct ofproto *ofproto, uint16_t tundev_ofp_port,
+			uint16_t ofp_port, const struct tunnel_settings *s)
+{
+    struct ofport *ofport;
+    int error;
+
+    ofport = ofproto_get_port(ofproto, ofp_port);
+    if (!ofport) {
+        VLOG_WARN("%s: cannot configure tunnel on nonexistent port %"PRIu16,
+                  ofproto->name, ofp_port);
+        return;
+    }
+
+    error = (ofproto->ofproto_class->set_tunnelling
+             ? ofproto->ofproto_class->set_tunnelling(ofport,
+						      tundev_ofp_port, s)
+             : EOPNOTSUPP);
+    if (error) {
+        VLOG_WARN("%s: Tunnel configuration on port %"PRIu16" (%s) failed (%s)",
+                  ofproto->name, ofp_port,
+		  netdev_get_name(ofport->netdev), strerror(error));
+    }
+}
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index d8739b0..147a588 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -398,6 +398,19 @@ struct tunnel_settings {
     uint8_t type;
 };
 
+static inline bool
+tunnel_settings_equal(const struct tunnel_settings *a,
+                      const struct tunnel_settings *b)
+{
+        return a->daddr == b->daddr &&
+                a->in_key == b->in_key &&
+                a->out_key == b->out_key &&
+                a->saddr == b->saddr &&
+                a->flags == b->flags &&
+                a->tos == b->tos &&
+                a->ttl == b->ttl;
+}
+
 void ofproto_port_set_tunnel(struct ofproto *ofproto, uint16_t tundev_ofp_port,
                              uint16_t realdev_ofp_port,
                              const struct tunnel_settings *s);
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/21] vswitchd: Configure tunnel interfaces.
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2012-05-24  9:08   ` [PATCH 06/21] ofproto: Add set_tunnelling() Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 09/21] ofproto: Add tundev_to_realdev() Simon Horman
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

For tunnel realdevs this sets the remote IP and type,
and optionally source IP, ttl and tos. The remote IP
must be non-zero.

For tunnel tundevs only the type is configured.
The remote IP must be zero.

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
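As a reading aid, the type handling and the remote IP rule described
above can be sketched as follows. This is a minimal standalone
illustration, not the code in the patch: the EX_PROTO_* values and the
ex_* function names are hypothetical stand-ins for TNL_T_PROTO_* and
the helpers in vswitchd/bridge.c, and addresses are plain integers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the TNL_T_PROTO_* constants. */
enum { EX_PROTO_NONE = 0, EX_PROTO_GRE = 1, EX_PROTO_CAPWAP = 2 };

/* True if 'type' names a tunnel tundev, i.e. a datapath-side device. */
static bool
ex_is_tunnel_tundev(const char *type)
{
    return !strcmp(type, "gre-tundev") || !strcmp(type, "capwap-tundev");
}

/* Maps a tundev type string to its protocol.  Each type string must be
 * tested exactly once; unknown strings yield EX_PROTO_NONE. */
static uint8_t
ex_tundev_proto(const char *type)
{
    if (!strcmp(type, "gre-tundev")) {
        return EX_PROTO_GRE;
    }
    if (!strcmp(type, "capwap-tundev")) {
        return EX_PROTO_CAPWAP;
    }
    return EX_PROTO_NONE;
}

/* Remote IP rule from the description: a realdev needs a non-zero
 * remote IP, a tundev must have a zero remote IP. */
static bool
ex_remote_ip_valid(bool is_tundev, uint32_t remote_ip)
{
    return is_tundev ? remote_ip == 0 : remote_ip != 0;
}
```

The sketch returns EX_PROTO_NONE for unknown types instead of aborting
so that it can be exercised in isolation.
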
 vswitchd/bridge.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 3d187f0..a67f391 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -242,6 +242,7 @@ static void iface_set_ofport(const struct ovsrec_interface *, int64_t ofport);
 static void iface_clear_db_record(const struct ovsrec_interface *if_cfg);
 static void iface_configure_qos(struct iface *, const struct ovsrec_qos *);
 static void iface_configure_cfm(struct iface *);
+static void iface_configure_tunnel(struct iface *);
 static void iface_refresh_cfm_stats(struct iface *);
 static void iface_refresh_stats(struct iface *);
 static void iface_refresh_status(struct iface *);
@@ -535,6 +536,7 @@ bridge_reconfigure_continue(const struct ovsrec_open_vswitch *ovs_cfg)
             LIST_FOR_EACH (iface, port_elem, &port->ifaces) {
                 iface_configure_cfm(iface);
                 iface_configure_qos(iface, port->cfg->qos);
+                iface_configure_tunnel(iface);
                 iface_set_mac(iface);
             }
         }
@@ -627,6 +629,22 @@ bridge_update_ofprotos(void)
     }
 }
 
+static bool
+is_tunnel_tundev(const char *type)
+{
+    return !strcmp(type, "gre-tundev") || !strcmp(type, "capwap-tundev");
+}
+
+static uint8_t
+tunnel_tundev_type_from_str(const char *type)
+{
+    if (!strcmp(type, "gre-tundev"))
+        return TNL_T_PROTO_GRE;
+    if (!strcmp(type, "capwap-tundev"))
+        return TNL_T_PROTO_CAPWAP;
+    NOT_REACHED();
+}
+
 static bool
 is_tunnel_realdev(const char *type)
 {
@@ -648,6 +665,15 @@ port_configure(struct port *port)
         return;
     }
 
+    if (list_is_singleton(&port->ifaces)) {
+        iface = CONTAINER_OF(list_front(&port->ifaces),
+                             struct iface, port_elem);
+        if (is_tunnel_tundev(iface->type)) {
+            ofproto_bundle_unregister(port->bridge->ofproto, port);
+            return;
+        }
+    }
+
     /* Get name. */
     s.name = port->name;
 
@@ -3686,6 +3712,49 @@ iface_configure_cfm(struct iface *iface)
     ofproto_port_set_cfm(iface->port->bridge->ofproto, iface->ofp_port, &s);
 }
 
+static void
+iface_configure_tunnel_tundev(struct iface *iface)
+{
+    const char *type = iface_get_type(iface->cfg, iface->port->bridge->cfg);
+    struct tunnel_settings s = { .type = tunnel_tundev_type_from_str(type) };
+
+    ofproto_port_set_tunnel(iface->port->bridge->ofproto, 0,
+                            iface->ofp_port, &s);
+}
+
+static void
+iface_configure_tunnel_realdev(struct iface *iface)
+{
+    struct tunnel_settings s = { .tos = 0 };
+    const char *type = iface_get_type(iface->cfg, iface->port->bridge->cfg);
+    struct iface *tundev;
+
+    /* This will not fail as it has already been called
+     * to check for errors */
+    iface_parse_tunnel(iface->cfg, type, &s);
+
+    tundev = iface_lookup(iface->port->bridge, type);
+    assert(tundev);
+
+    ofproto_port_set_tunnel(iface->port->bridge->ofproto, tundev->ofp_port,
+                            iface->ofp_port, &s);
+}
+
+static void
+iface_configure_tunnel(struct iface *iface)
+{
+    const char *type = iface_get_type(iface->cfg, iface->port->bridge->cfg);
+
+    if (is_tunnel_realdev(type)) {
+        return iface_configure_tunnel_realdev(iface);
+    } else if (is_tunnel_tundev(type)) {
+        return iface_configure_tunnel_tundev(iface);
+    }
+
+    /* Nothing to do */
+    return;
+}
+
 /* Returns true if 'iface' is synthetic, that is, if we constructed it locally
  * instead of obtaining it from the database. */
 static bool
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/21] ofproto: Add realdev_to_txdev()
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
  2012-05-24  9:08 ` [PATCH 02/21] datapath: Use tun_key on transmit Simon Horman
  2012-05-24  9:08 ` [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr() Simon Horman
@ 2012-05-24  9:09 ` Simon Horman
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

This is used to map tunnel and VLAN realdevs to
tundevs and vlandevs respectively. It is used
on transmit to map from the interface used
in user-space to the interface used in the datapath.

In the case where an interface is not a tunnel
and does not have VLAN splinters configured,
an identity mapping is used.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
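The transmit-side mapping described above can be sketched in
isolation as follows. This is only an illustration under simplifying
assumptions: the ex_* types are hypothetical, and the VLAN splinter
lookup (which in the patch is keyed on the VLAN TCI) is collapsed into
a single pre-resolved field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel state attached to a realdev. */
struct ex_tun {
    uint32_t tundev_port;       /* Datapath port of the tundev. */
};

/* Hypothetical, simplified port. */
struct ex_port {
    uint32_t odp_port;          /* The port's own datapath port. */
    const struct ex_tun *tun;   /* NULL unless this is a tunnel realdev. */
    uint32_t vlandev_port;      /* 0 if no VLAN splinter applies. */
};

/* Returns the datapath port to transmit on: the tundev for a tunnel
 * realdev, the vlandev if a VLAN splinter applies, else the port itself. */
static uint32_t
ex_realdev_to_txdev(const struct ex_port *port)
{
    if (port->tun) {
        return port->tun->tundev_port;
    }
    if (port->vlandev_port) {
        return port->vlandev_port;
    }
    return port->odp_port;
}

/* The VLAN tag is only stripped when the port was remapped by a VLAN
 * splinter, never when it was remapped because it is a tunnel. */
static int
ex_should_pop_vlan(const struct ex_port *port)
{
    return ex_realdev_to_txdev(port) != port->odp_port && !port->tun;
}
```

The second helper mirrors the "!ofport->tun" condition that this patch
adds to the callers in send_packet() and compose_output_action__().
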
 ofproto/ofproto-dpif.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 642b508..c7ea391 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -539,8 +539,6 @@ struct vlan_splinter {
     int vid;
 };
 
-static uint32_t vsp_realdev_to_vlandev(const struct ofproto_dpif *,
-                                       uint32_t realdev, ovs_be16 vlan_tci);
 static bool vsp_adjust_flow(const struct ofproto_dpif *, struct flow *);
 static void vsp_remove(struct ofport_dpif *);
 static void vsp_add(struct ofport_dpif *, uint16_t realdev_ofp_port, int vid);
@@ -555,6 +553,10 @@ static unsigned multicast_ports;
 static int set_tunnelling(struct ofport *ofport_, uint16_t realdev_ofp_port,
                           const struct tunnel_settings *s);
 
+static uint32_t
+realdev_to_txdev(const struct ofproto_dpif *ofproto,
+                 const struct ofport_dpif *ofport, ovs_be16 vlan_tci);
+
 static struct ofport_dpif *
 ofport_dpif_cast(const struct ofport *ofport)
 {
@@ -4700,9 +4702,8 @@ send_packet(const struct ofport_dpif *ofport, struct ofpbuf *packet)
     int error;
 
     flow_extract((struct ofpbuf *) packet, 0, 0, 0, &flow);
-    odp_port = vsp_realdev_to_vlandev(ofproto, ofport->odp_port,
-                                      flow.vlan_tci);
-    if (odp_port != ofport->odp_port) {
+    odp_port = realdev_to_txdev(ofproto, ofport, flow.vlan_tci);
+    if (odp_port != ofport->odp_port && !ofport->tun) {
         eth_pop_vlan(packet);
         flow.vlan_tci = htons(0);
     }
@@ -4909,9 +4910,8 @@ compose_output_action__(struct action_xlate_ctx *ctx, uint16_t ofp_port,
          * later and we're pre-populating the flow table.  */
     }
 
-    out_port = vsp_realdev_to_vlandev(ctx->ofproto, odp_port,
-                                      ctx->flow.vlan_tci);
-    if (out_port != odp_port) {
+    out_port = realdev_to_txdev(ctx->ofproto, ofport, ctx->flow.vlan_tci);
+    if (out_port != odp_port && !ofport->tun) {
         ctx->flow.vlan_tci = htons(0);
     }
     commit_odp_actions(&ctx->flow, &ctx->base_flow, ctx->odp_actions);
@@ -7211,6 +7211,21 @@ set_tunnelling(struct ofport *ofport_, uint16_t tundev_ofp_port,
 
     return 0;
 }
+
+/* Maps a port to the port that it should be transmitted on.
+ * If tunneling is enabled then the associated tunnel port is returned.
+ * If VLAN splintering is enabled then the ofp_port of the vlandev is
+ * returned.
+ * Otherwise no mapping is in effect and ofport->odp_port is returned. */
+static uint32_t
+realdev_to_txdev(const struct ofproto_dpif *ofproto,
+                 const struct ofport_dpif *ofport, ovs_be16 vlan_tci)
+{
+    if (ofport->tun) {
+        return ofp_port_to_odp_port(ofport->tun->tundev_ofp_port);
+    }
+    return vsp_realdev_to_vlandev(ofproto, ofport->odp_port, vlan_tci);
+}
 \f
 const struct ofproto_class ofproto_dpif_class = {
     enumerate_types,
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/21] ofproto: Add tundev_to_realdev()
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (4 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 07/21] vswitchd: Configure tunnel interfaces Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 10/21] classifier: Convert struct flow flow_metadata to use tun_key Simon Horman
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

In essence this is a duplication of ovs_tnl_find_port(),
copying code from the datapath to vswitchd. It is planned
that the datapath version will be removed.

It is used to map from the tundev interface that a
packet is received by in the datapath to the tunnel realdev
interface used in user-space. It is the tunnel realdev
that has the tunnel configuration attached.

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
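The unicast part of the receive-side lookup can be sketched as
follows. This is a simplified illustration, not the code below: it
uses a linear table instead of the hmap, omits the multicast cases and
the per-class port counters, and all ex_* names are hypothetical.
Keys and addresses are plain host-order integers, and wildcard-key
entries are assumed to store an in_key of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel settings of one realdev. */
struct ex_tunnel {
    int      key_exact;  /* 1: match in_key exactly, 0: accept any key. */
    uint64_t in_key;     /* 0 when key_exact is 0. */
    uint32_t saddr;      /* Local address, 0 means "any". */
    uint32_t daddr;      /* Remote address. */
    int      ofp_port;
};

static const struct ex_tunnel *
ex_find(const struct ex_tunnel *tbl, size_t n, int key_exact,
        uint64_t key, uint32_t local, uint32_t remote)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].key_exact == key_exact && tbl[i].in_key == key
            && tbl[i].saddr == local && tbl[i].daddr == remote) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* Tries the most specific match first.  Note the swap: the packet's
 * destination is our local address, its source is the remote address. */
static const struct ex_tunnel *
ex_tundev_to_realdev(const struct ex_tunnel *tbl, size_t n,
                     uint64_t pkt_key, uint32_t pkt_src, uint32_t pkt_dst)
{
    const struct ex_tunnel *t;

    /* Exact key, local and remote address. */
    if ((t = ex_find(tbl, n, 1, pkt_key, pkt_dst, pkt_src))) {
        return t;
    }
    /* Exact key, remote address only. */
    if ((t = ex_find(tbl, n, 1, pkt_key, 0, pkt_src))) {
        return t;
    }
    /* Any key, local and remote address. */
    if ((t = ex_find(tbl, n, 0, 0, pkt_dst, pkt_src))) {
        return t;
    }
    /* Any key, remote address only. */
    return ex_find(tbl, n, 0, 0, 0, pkt_src);
}
```

A NULL result corresponds to the case where lookup_input_bundle()
falls back to get_ofp_port() on the unmodified in_port.
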
 ofproto/ofproto-dpif.c | 194 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 174 insertions(+), 20 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index c7ea391..03a86bc 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -183,7 +183,7 @@ static void bundle_del_port(struct ofport_dpif *);
 static void bundle_run(struct ofbundle *);
 static void bundle_wait(struct ofbundle *);
 static struct ofbundle *lookup_input_bundle(const struct ofproto_dpif *,
-                                            uint16_t in_port, bool warn,
+                                            const struct flow *, bool warn,
                                             struct ofport_dpif **in_ofportp);
 
 /* A controller may use OFPP_NONE as the ingress port to indicate that
@@ -550,8 +550,12 @@ static unsigned remote_ports;
 static unsigned key_multicast_ports;
 static unsigned multicast_ports;
 
+static bool tunnel_adjust_flow(const struct ofproto_dpif *ofproto,
+                               struct flow *flow);
 static int set_tunnelling(struct ofport *ofport_, uint16_t realdev_ofp_port,
                           const struct tunnel_settings *s);
+static struct ofport_dpif *tundev_to_realdev(const struct ofproto_dpif *ofproto,
+                                             const struct flow *flow);
 
 static uint32_t
 realdev_to_txdev(const struct ofproto_dpif *ofproto,
@@ -2998,6 +3002,7 @@ ofproto_dpif_extract_flow_key(const struct ofproto_dpif *ofproto,
                               struct ofpbuf *packet)
 {
     enum odp_key_fitness fitness;
+    bool adjusted = false;
 
     fitness = odp_flow_key_to_flow(key, key_len, flow);
     if (fitness == ODP_FIT_ERROR) {
@@ -3005,7 +3010,9 @@ ofproto_dpif_extract_flow_key(const struct ofproto_dpif *ofproto,
     }
     *initial_tci = flow->vlan_tci;
 
-    if (vsp_adjust_flow(ofproto, flow)) {
+    if (tunnel_adjust_flow(ofproto, flow)) {
+        adjusted = true;
+    } else if (vsp_adjust_flow(ofproto, flow)) {
         if (packet) {
             /* Make the packet resemble the flow, so that it gets sent to an
              * OpenFlow controller properly, so that it looks correct for
@@ -3023,11 +3030,12 @@ ofproto_dpif_extract_flow_key(const struct ofproto_dpif *ofproto,
              * since we don't need that header anymore. */
             eth_push_vlan(packet, flow->vlan_tci);
         }
+        adjusted = true;
+    }
 
-        /* Let the caller know that we can't reproduce 'key' from 'flow'. */
-        if (fitness == ODP_FIT_PERFECT) {
-            fitness = ODP_FIT_TOO_MUCH;
-        }
+    /* Let the caller know that we can't reproduce 'key' from 'flow'. */
+    if (adjusted && fitness == ODP_FIT_PERFECT) {
+        fitness = ODP_FIT_TOO_MUCH;
     }
 
     return fitness;
@@ -5934,7 +5942,7 @@ add_mirror_actions(struct action_xlate_ctx *ctx, const struct flow *orig_flow)
     const struct nlattr *a;
     size_t left;
 
-    in_bundle = lookup_input_bundle(ctx->ofproto, orig_flow->in_port,
+    in_bundle = lookup_input_bundle(ctx->ofproto, orig_flow,
                                     ctx->packet != NULL, NULL);
     if (!in_bundle) {
         return;
@@ -6095,13 +6103,17 @@ update_learning_table(struct ofproto_dpif *ofproto,
 }
 
 static struct ofbundle *
-lookup_input_bundle(const struct ofproto_dpif *ofproto, uint16_t in_port,
-                    bool warn, struct ofport_dpif **in_ofportp)
+lookup_input_bundle(const struct ofproto_dpif *ofproto,
+                    const struct flow *flow, bool warn,
+                    struct ofport_dpif **in_ofportp)
 {
     struct ofport_dpif *ofport;
 
     /* Find the port and bundle for the received packet. */
-    ofport = get_ofp_port(ofproto, in_port);
+    ofport = tundev_to_realdev(ofproto, flow);
+    if (!ofport) {
+        ofport = get_ofp_port(ofproto, flow->in_port);
+    }
     if (in_ofportp) {
         *in_ofportp = ofport;
     }
@@ -6111,7 +6123,7 @@ lookup_input_bundle(const struct ofproto_dpif *ofproto, uint16_t in_port,
 
     /* Special-case OFPP_NONE, which a controller may use as the ingress
      * port for traffic that it is sourcing. */
-    if (in_port == OFPP_NONE) {
+    if (flow->in_port == OFPP_NONE) {
         return &ofpp_none_bundle;
     }
 
@@ -6129,7 +6141,7 @@ lookup_input_bundle(const struct ofproto_dpif *ofproto, uint16_t in_port,
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
         VLOG_WARN_RL(&rl, "bridge %s: received packet on unknown "
-                     "port %"PRIu16, ofproto->up.name, in_port);
+                     "port %"PRIu16, ofproto->up.name, flow->in_port);
     }
     return NULL;
 }
@@ -6196,7 +6208,7 @@ xlate_normal(struct action_xlate_ctx *ctx)
 
     ctx->has_normal = true;
 
-    in_bundle = lookup_input_bundle(ctx->ofproto, ctx->flow.in_port,
+    in_bundle = lookup_input_bundle(ctx->ofproto, &ctx->flow,
                                     ctx->packet != NULL, &in_port);
     if (!in_bundle) {
         return;
@@ -7166,16 +7178,19 @@ tun_remove(struct ofport_dpif *ofport)
 }
 
 static void
-tun_add(struct ofport_dpif *ofport, uint16_t tundev_ofp_port,
-        const struct tunnel_settings *s)
+tun_add(struct ofport_dpif *ofport)
 {
     struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 
-    ofport->tun->tundev_ofp_port = tundev_ofp_port;
-    ofport->tun->s = *s;
+    /* Only add if the daddr is non-zero, in which case ofport is a
+     * realdev. Otherwise it is a tundev. */
+    if (ofport->tun->s.daddr == htonl(0)) {
+        return;
+    }
+
     (*tun_port_pool(&ofport->tun->s))++;
     hmap_insert(&ofproto->tundev_map, &ofport->tun->tundev_node,
-                hash_int(tundev_ofp_port, 0));
+                hash_int(ofport->tun->tundev_ofp_port, 0));
 }
 
 static int
@@ -7203,15 +7218,154 @@ set_tunnelling(struct ofport *ofport_, uint16_t tundev_ofp_port,
         if (ofport->tun->tundev_ofp_port == tundev_ofp_port &&
             tunnel_settings_equal(&ofport->tun->s, s)) {
             return 0;
-        }
+       }
         tun_remove(ofport);
     }
 
-    tun_add(ofport, tundev_ofp_port, s);
+    ofport->tun->s = *s;
+    ofport->tun->tundev_ofp_port = tundev_ofp_port;
+    tun_add(ofport);
 
     return 0;
 }
 
+struct tunnel_lookup_key {
+    ovs_be64 tun_id;
+    ovs_be32 ipv4_src;
+    ovs_be32 ipv4_dst;
+    uint8_t tun_type;
+};
+
+static struct ofport_dpif *
+tundev_find(const struct ofproto_dpif *ofproto, uint16_t tundev_ofp_port,
+            const struct tunnel_lookup_key *tun_key)
+{
+    struct ofport_dpif_tun *tun;
+
+    HMAP_FOR_EACH_WITH_HASH (tun, tundev_node, hash_int(tundev_ofp_port, 0),
+                             &ofproto->tundev_map) {
+        if (tun_key->tun_type == tun->s.type &&
+            tun_key->ipv4_dst == tun->s.daddr &&
+            tun_key->tun_id == tun->s.in_key &&
+            tun_key->ipv4_src == tun->s.saddr) {
+            return tun->ofport;
+        }
+    }
+
+    return NULL;
+}
+
+/* Returns the ofport of the tunnel realdev whose settings match the tunnel
+ * key of 'flow', which was received on a tunnel tundev.
+ *
+ * Returns NULL if no match is found. */
+static struct ofport_dpif *
+tundev_to_realdev(const struct ofproto_dpif *ofproto, const struct flow *flow)
+{
+    bool is_multicast = ipv4_is_multicast(flow->tun_key.ipv4_dst);
+    struct ofport_dpif *tundev_ofport;
+    struct ofport_dpif *realdev_ofport;
+    struct tunnel_lookup_key lookup;
+
+    /* Nothing to do if the packet wasn't unencapsulated on receive */
+    if (!flow->tun_key.ipv4_dst) {
+        return NULL;
+    }
+
+    /* Nothing to do if there are no tunnel devices configured */
+    if (hmap_is_empty(&ofproto->tundev_map)) {
+        return NULL;
+    }
+
+    /* Give up if the tunnel device can't be found
+     * or isn't a tunnel tundev */
+    tundev_ofport = get_ofp_port(ofproto, flow->in_port);
+    if (!tundev_ofport || !tundev_ofport->tun || tundev_ofport->tun->s.daddr) {
+        return NULL;
+    }
+
+    lookup.tun_id = flow->tun_key.tun_id;
+    lookup.ipv4_src = flow->tun_key.ipv4_dst;
+    lookup.ipv4_dst = flow->tun_key.ipv4_src;
+
+    /* First try for an exact match on the tun_id */
+    lookup.tun_id = flow->tun_key.tun_id;
+    lookup.tun_type = tundev_ofport->tun->s.type | TNL_T_KEY_EXACT;
+    if (!is_multicast && key_local_remote_ports) {
+        realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+        if (realdev_ofport)
+            return realdev_ofport;
+    }
+    if (key_remote_ports) {
+        lookup.ipv4_src = htonl(0);
+        realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+        if (realdev_ofport)
+            return realdev_ofport;
+        lookup.ipv4_src = flow->tun_key.ipv4_dst;
+    }
+
+    /* Then try matches that wildcard the tun_id. */
+    lookup.tun_id = htonll(0);
+    lookup.tun_type = tundev_ofport->tun->s.type | TNL_T_KEY_MATCH;
+    if (!is_multicast && local_remote_ports) {
+        realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+        if (realdev_ofport)
+            return realdev_ofport;
+    }
+    if (remote_ports) {
+        lookup.ipv4_src = htonl(0);
+        realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+        if (realdev_ofport)
+            return realdev_ofport;
+    }
+
+    if (is_multicast) {
+        lookup.ipv4_src = htonl(0);
+        lookup.ipv4_dst = flow->tun_key.ipv4_dst;
+        if (key_multicast_ports) {
+            lookup.tun_id = flow->tun_key.tun_id;
+            lookup.tun_type = tundev_ofport->tun->s.type | TNL_T_KEY_EXACT;
+            realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+            if (realdev_ofport)
+                return realdev_ofport;
+        }
+        if (multicast_ports) {
+            lookup.tun_id = htonll(0);
+            lookup.tun_type = tundev_ofport->tun->s.type | TNL_T_KEY_MATCH;
+            realdev_ofport = tundev_find(ofproto, flow->in_port, &lookup);
+            if (realdev_ofport)
+                return realdev_ofport;
+        }
+    }
+
+    return NULL;
+}
+
+/* Given 'flow', a flow representing a packet received on 'ofproto', checks
+ * whether 'flow->in_port' represents a Linux tunnel device.  If so, changes
+ * 'flow->in_port' to the "real" device backing the tunnel device, sets
+ * 'flow->tun_key' to use the real device's tunnel settings, and returns true.
+ * Otherwise (which is always the case unless tunneling is enabled), returns
+ * false without making any changes. */
+static bool
+tunnel_adjust_flow(const struct ofproto_dpif *ofproto, struct flow *flow)
+{
+    const struct ofport_dpif *realdev_ofport = tundev_to_realdev(ofproto, flow);
+    if (!realdev_ofport) {
+        return false;
+    }
+
+    /* Cause the flow to be processed as if it came in on the real device with
+     * the tunnel's key. */
+    flow->in_port = ofp_port_to_odp_port(realdev_ofport->up.ofp_port);
+    flow->tun_key.tun_id = realdev_ofport->tun->s.out_key;
+    flow->tun_key.ipv4_src = realdev_ofport->tun->s.saddr;
+    flow->tun_key.ipv4_dst = realdev_ofport->tun->s.daddr;
+    flow->tun_key.ipv4_tos = realdev_ofport->tun->s.tos;
+    flow->tun_key.ipv4_ttl = realdev_ofport->tun->s.ttl;
+    return true;
+}
+
 /* Maps a port to the port that it should be transmitted on.
  * If tunneling is enabled then the associated tunnel port is returned.
  * If VLAN splintering is enabled then the ofp_port of the vlandev is
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/21] classifier: Convert struct flow flow_metadata to use tun_key
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (5 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 09/21] ofproto: Add tundev_to_realdev() Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 11/21] datapath, vport: Provide tunnel realdev and tundev classes and vports Simon Horman
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

This allows the tun_key to be passed throughout user-space,
attached to a flow. This is the essence of flow-based tunneling.

This does not add matches or wildcards for tun_key fields, other than the
existing match for the tun_id. It is envisaged that most if not all fields
of the tun_key could be wildcarded.

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

---

v4
* flow_format() and ofp_print_packet_in() format strings:
  - Make more consistent with each other and format_odp_key_attr()
  - Update for flags field of tunnel
* Remove debugging message
* Add struct flow_tun_key to avoid needing to use
  ovs_key_ipv4_tunnel which is defined in a Linux kernel header.
  This code should be ofproto-provider agnostic.

v3
* Initial posting

classifier: don't use kernel tunnel structure
---
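The following standalone sketch illustrates two points of this patch:
why FLOW_SIG_SIZE grows from 110 to 126, and how a NULL tun_key is
handled when a flow is initialised. The ex_* names are hypothetical
and the fields use plain integer types rather than ovs_be64/ovs_be32;
the layout otherwise mirrors struct flow_tun_key.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as struct flow_tun_key, with plain integer types. */
struct ex_tun_key {
    uint64_t tun_id;
    uint32_t tun_flags;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
    uint8_t  pad[2];
};

/* 8 + 4 + 4 + 4 + 1 + 1 + 2 = 24 bytes. */
_Static_assert(sizeof(struct ex_tun_key) == 24, "tun_key is 24 bytes");

/* Replacing the 8-byte tun_id with the 24-byte key adds 16 bytes,
 * which is the change in FLOW_SIG_SIZE from 110 to 126. */
_Static_assert(sizeof(struct ex_tun_key) - sizeof(uint64_t) == 126 - 110,
               "flow grows by 16 bytes");

/* Hypothetical, heavily reduced stand-in for struct flow. */
struct ex_flow {
    struct ex_tun_key tun_key;
    uint16_t in_port;
};

/* Mirrors the tun_key handling in flow_extract(): the flow is zeroed
 * first, so a NULL 'tun_key' leaves an all-zero key. */
static void
ex_flow_init(struct ex_flow *flow, const struct ex_tun_key *tun_key,
             uint16_t in_port)
{
    memset(flow, 0, sizeof *flow);
    if (tun_key) {
        flow->tun_key = *tun_key;
    }
    flow->in_port = in_port;
}

/* The patch uses a non-zero ipv4_dst as the "has tunnel key" test. */
static int
ex_flow_is_tunnelled(const struct ex_flow *flow)
{
    return flow->tun_key.ipv4_dst != 0;
}
```

Using ipv4_dst rather than tun_id as the presence test means that a
tunnelled flow with a tun_id of zero is still recognised as tunnelled.
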
 lib/classifier.c        |  8 ++++----
 lib/dpif-linux.c        |  2 +-
 lib/flow.c              | 31 ++++++++++++++++++++++++++-----
 lib/flow.h              | 21 ++++++++++++++++-----
 lib/meta-flow.c         |  4 ++--
 lib/nx-match.c          |  2 +-
 lib/odp-util.c          | 24 ++++++++++++++++--------
 lib/ofp-print.c         | 12 ++++++++++--
 lib/ofp-util.c          |  4 ++--
 ofproto/ofproto-dpif.c  | 11 ++++++-----
 tests/test-classifier.c |  7 ++++---
 11 files changed, 88 insertions(+), 38 deletions(-)

diff --git a/lib/classifier.c b/lib/classifier.c
index e11a585..7dc6560 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -129,7 +129,7 @@ cls_rule_set_tun_id_masked(struct cls_rule *rule,
                            ovs_be64 tun_id, ovs_be64 mask)
 {
     rule->wc.tun_id_mask = mask;
-    rule->flow.tun_id = tun_id & mask;
+    rule->flow.tun_key.tun_id = tun_id & mask;
 }
 
 void
@@ -563,11 +563,11 @@ cls_rule_format(const struct cls_rule *rule, struct ds *s)
     case 0:
         break;
     case CONSTANT_HTONLL(UINT64_MAX):
-        ds_put_format(s, "tun_id=%#"PRIx64",", ntohll(f->tun_id));
+        ds_put_format(s, "tun_id=%#"PRIx64",", ntohll(f->tun_key.tun_id));
         break;
     default:
         ds_put_format(s, "tun_id=%#"PRIx64"/%#"PRIx64",",
-                      ntohll(f->tun_id), ntohll(wc->tun_id_mask));
+                      ntohll(f->tun_key.tun_id), ntohll(wc->tun_id_mask));
         break;
     }
     if (!(w & FWW_IN_PORT)) {
@@ -1187,7 +1187,7 @@ flow_equal_except(const struct flow *a, const struct flow *b,
         }
     }
 
-    return (!((a->tun_id ^ b->tun_id) & wildcards->tun_id_mask)
+    return (!((a->tun_key.tun_id ^ b->tun_key.tun_id) & wildcards->tun_id_mask)
             && !((a->nw_src ^ b->nw_src) & wildcards->nw_src_mask)
             && !((a->nw_dst ^ b->nw_dst) & wildcards->nw_dst_mask)
             && (wc & FWW_IN_PORT || a->in_port == b->in_port)
diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
index 256c9d6..0e5cdd2 100644
--- a/lib/dpif-linux.c
+++ b/lib/dpif-linux.c
@@ -1292,7 +1292,7 @@ dpif_linux_vport_send(int dp_ifindex, uint32_t port_no,
     uint64_t action;
 
     ofpbuf_use_const(&packet, data, size);
-    flow_extract(&packet, 0, htonll(0), 0, &flow);
+    flow_extract(&packet, 0, NULL, 0, &flow);
 
     ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
     odp_flow_key_from_flow(&key, &flow);
diff --git a/lib/flow.c b/lib/flow.c
index fc61610..8645e7d 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -330,7 +330,8 @@ invalid:
  *      present and has a correct length, and otherwise NULL.
  */
 void
-flow_extract(struct ofpbuf *packet, uint32_t skb_priority, ovs_be64 tun_id,
+flow_extract(struct ofpbuf *packet, uint32_t skb_priority,
+             const struct flow_tun_key *tun_key,
              uint16_t ofp_in_port, struct flow *flow)
 {
     struct ofpbuf b = *packet;
@@ -339,7 +340,9 @@ flow_extract(struct ofpbuf *packet, uint32_t skb_priority, ovs_be64 tun_id,
     COVERAGE_INC(flow_extract);
 
     memset(flow, 0, sizeof *flow);
-    flow->tun_id = tun_id;
+    if (tun_key) {
+        flow->tun_key = *tun_key;
+    }
     flow->in_port = ofp_in_port;
     flow->skb_priority = skb_priority;
 
@@ -449,7 +452,7 @@ flow_zero_wildcards(struct flow *flow, const struct flow_wildcards *wildcards)
     for (i = 0; i < FLOW_N_REGS; i++) {
         flow->regs[i] &= wildcards->reg_masks[i];
     }
-    flow->tun_id &= wildcards->tun_id_mask;
+    flow->tun_key.tun_id &= wildcards->tun_id_mask;
     flow->nw_src &= wildcards->nw_src_mask;
     flow->nw_dst &= wildcards->nw_dst_mask;
     if (wc & FWW_IN_PORT) {
@@ -508,7 +511,7 @@ flow_get_metadata(const struct flow *flow, struct flow_metadata *fmd)
 {
     BUILD_ASSERT_DECL(FLOW_WC_SEQ == 10);
 
-    fmd->tun_id = flow->tun_id;
+    fmd->tun_key = flow->tun_key;
     fmd->tun_id_mask = htonll(UINT64_MAX);
 
     memcpy(fmd->regs, flow->regs, sizeof fmd->regs);
@@ -528,11 +531,13 @@ flow_to_string(const struct flow *flow)
 void
 flow_format(struct ds *ds, const struct flow *flow)
 {
+    /* The tunnel key is also displayed as part of tunnel() below.
+     * It is here for backwards-compatibility */
     ds_put_format(ds, "priority:%"PRIu32
                       ",tunnel:%#"PRIx64
                       ",in_port:%04"PRIx16,
                       flow->skb_priority,
-                      ntohll(flow->tun_id),
+                      ntohll(flow->tun_key.tun_id),
                       flow->in_port);
 
     ds_put_format(ds, ",tci(");
@@ -579,6 +584,22 @@ flow_format(struct ds *ds, const struct flow *flow)
                 ETH_ADDR_ARGS(flow->arp_sha),
                 ETH_ADDR_ARGS(flow->arp_tha));
     }
+    if (!eth_addr_is_zero(flow->arp_sha) || !eth_addr_is_zero(flow->arp_tha)) {
+        ds_put_format(ds, " arp_ha("ETH_ADDR_FMT"->"ETH_ADDR_FMT")",
+                ETH_ADDR_ARGS(flow->arp_sha),
+                ETH_ADDR_ARGS(flow->arp_tha));
+    }
+    if (flow->tun_key.ipv4_dst != htonl(0)) {
+        ds_put_format(ds, " tunnel(tun_id:%"PRIx64",flags:%"PRIx32
+                          ",ip("IP_FMT"->"IP_FMT")"
+                          ",tos:%"PRIx8",ttl:%"PRIu8")",
+                          ntohll(flow->tun_key.tun_id),
+                          flow->tun_key.tun_flags,
+                          IP_ARGS(&flow->tun_key.ipv4_src),
+                          IP_ARGS(&flow->tun_key.ipv4_dst),
+                          flow->tun_key.ipv4_tos, flow->tun_key.ipv4_ttl);
+    }
+
 }
 
 void
diff --git a/lib/flow.h b/lib/flow.h
index 7ee9a26..0b5932f 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -52,8 +52,18 @@ BUILD_ASSERT_DECL(FLOW_N_REGS <= NXM_NX_MAX_REGS);
 BUILD_ASSERT_DECL(FLOW_NW_FRAG_ANY == NX_IP_FRAG_ANY);
 BUILD_ASSERT_DECL(FLOW_NW_FRAG_LATER == NX_IP_FRAG_LATER);
 
+struct flow_tun_key {
+	ovs_be64 tun_id;
+	uint32_t tun_flags;
+	ovs_be32 ipv4_src;
+	ovs_be32 ipv4_dst;
+	uint8_t  ipv4_tos;
+	uint8_t  ipv4_ttl;
+	uint8_t  pad[2];
+};
+
 struct flow {
-    ovs_be64 tun_id;            /* Encapsulating tunnel ID. */
+    struct flow_tun_key tun_key;/* Encapsulating tunnel. */
     struct in6_addr ipv6_src;   /* IPv6 source address. */
     struct in6_addr ipv6_dst;   /* IPv6 destination address. */
     struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
@@ -82,7 +92,7 @@ struct flow {
  * indicate which metadata fields are relevant in a given context.  Typically
  * they will be all 1 or all 0. */
 struct flow_metadata {
-    ovs_be64 tun_id;                 /* Encapsulating tunnel ID. */
+    struct flow_tun_key tun_key;     /* Encapsulating tunnel. */
     ovs_be64 tun_id_mask;            /* 1-bit in each significant tun_id bit.*/
 
     uint32_t regs[FLOW_N_REGS];      /* Registers. */
@@ -93,16 +103,17 @@ struct flow_metadata {
 
 /* Assert that there are FLOW_SIG_SIZE bytes of significant data in "struct
  * flow", followed by FLOW_PAD_SIZE bytes of padding. */
-#define FLOW_SIG_SIZE (110 + FLOW_N_REGS * 4)
+#define FLOW_SIG_SIZE (126 + FLOW_N_REGS * 4)
 #define FLOW_PAD_SIZE 2
 BUILD_ASSERT_DECL(offsetof(struct flow, nw_frag) == FLOW_SIG_SIZE - 1);
 BUILD_ASSERT_DECL(sizeof(((struct flow *)0)->nw_frag) == 1);
 BUILD_ASSERT_DECL(sizeof(struct flow) == FLOW_SIG_SIZE + FLOW_PAD_SIZE);
 
 /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
-BUILD_ASSERT_DECL(FLOW_SIG_SIZE == 142 && FLOW_WC_SEQ == 10);
+BUILD_ASSERT_DECL(FLOW_SIG_SIZE == 158 && FLOW_WC_SEQ == 10);
 
-void flow_extract(struct ofpbuf *, uint32_t priority, ovs_be64 tun_id,
+void flow_extract(struct ofpbuf *, uint32_t priority,
+	          const struct flow_tun_key *,
                   uint16_t in_port, struct flow *);
 void flow_zero_wildcards(struct flow *, const struct flow_wildcards *);
 void flow_get_metadata(const struct flow *, struct flow_metadata *);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index 8b60b35..0b47ea1 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -962,7 +962,7 @@ mf_get_value(const struct mf_field *mf, const struct flow *flow,
 {
     switch (mf->id) {
     case MFF_TUN_ID:
-        value->be64 = flow->tun_id;
+        value->be64 = flow->tun_key.tun_id;
         break;
 
     case MFF_IN_PORT:
@@ -1300,7 +1300,7 @@ mf_set_flow_value(const struct mf_field *mf,
 {
     switch (mf->id) {
     case MFF_TUN_ID:
-        flow->tun_id = value->be64;
+        flow->tun_key.tun_id = value->be64;
         break;
 
     case MFF_IN_PORT:
diff --git a/lib/nx-match.c b/lib/nx-match.c
index 34c8354..f97ef5d 100644
--- a/lib/nx-match.c
+++ b/lib/nx-match.c
@@ -541,7 +541,7 @@ nx_put_match(struct ofpbuf *b, const struct cls_rule *cr,
     }
 
     /* Tunnel ID. */
-    nxm_put_64m(b, NXM_NX_TUN_ID, flow->tun_id, cr->wc.tun_id_mask);
+    nxm_put_64m(b, NXM_NX_TUN_ID, flow->tun_key.tun_id, cr->wc.tun_id_mask);
 
     /* Registers. */
     for (i = 0; i < FLOW_N_REGS; i++) {
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 7cff00c..5f76f5e 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -1299,8 +1299,12 @@ odp_flow_key_from_flow(struct ofpbuf *buf, const struct flow *flow)
         nl_msg_put_u32(buf, OVS_KEY_ATTR_PRIORITY, flow->skb_priority);
     }
 
-    if (flow->tun_id != htonll(0)) {
-        nl_msg_put_be64(buf, OVS_KEY_ATTR_TUN_ID, flow->tun_id);
+    if (flow->tun_key.ipv4_dst != htonl(0)) {
+        struct flow_tun_key *tun_key;
+
+        tun_key = nl_msg_put_unspec_uninit(buf, OVS_KEY_ATTR_IPV4_TUNNEL,
+                                           sizeof *tun_key);
+        *tun_key = flow->tun_key;
     }
 
     if (flow->in_port != OFPP_NONE && flow->in_port != OFPP_CONTROLLER) {
@@ -1791,9 +1795,13 @@ odp_flow_key_to_flow(const struct nlattr *key, size_t key_len,
         expected_attrs |= UINT64_C(1) << OVS_KEY_ATTR_PRIORITY;
     }
 
-    if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_TUN_ID)) {
-        flow->tun_id = nl_attr_get_be64(attrs[OVS_KEY_ATTR_TUN_ID]);
-        expected_attrs |= UINT64_C(1) << OVS_KEY_ATTR_TUN_ID;
+    if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_IPV4_TUNNEL)) {
+        const struct flow_tun_key *tun_key;
+
+        tun_key = nl_attr_get(attrs[OVS_KEY_ATTR_IPV4_TUNNEL]);
+        flow->tun_key = *tun_key;
+
+        expected_attrs |= UINT64_C(1) << OVS_KEY_ATTR_IPV4_TUNNEL;
     }
 
     if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_IN_PORT)) {
@@ -1887,13 +1895,13 @@ static void
 commit_set_tun_id_action(const struct flow *flow, struct flow *base,
                          struct ofpbuf *odp_actions)
 {
-    if (base->tun_id == flow->tun_id) {
+    if (base->tun_key.tun_id == flow->tun_key.tun_id) {
         return;
     }
-    base->tun_id = flow->tun_id;
+    base->tun_key.tun_id = flow->tun_key.tun_id;
 
     commit_set_action(odp_actions, OVS_KEY_ATTR_TUN_ID,
-                      &base->tun_id, sizeof(base->tun_id));
+                      &base->tun_key.tun_id, sizeof(base->tun_key.tun_id));
 }
 
 static void
diff --git a/lib/ofp-print.c b/lib/ofp-print.c
index 1757a30..fff7454 100644
--- a/lib/ofp-print.c
+++ b/lib/ofp-print.c
@@ -106,11 +106,19 @@ ofp_print_packet_in(struct ds *string, const struct ofp_header *oh,
     ds_put_format(string, " total_len=%"PRIu16" in_port=", pin.total_len);
     ofputil_format_port(pin.fmd.in_port, string);
 
-    if (pin.fmd.tun_id_mask) {
-        ds_put_format(string, " tun_id=0x%"PRIx64, ntohll(pin.fmd.tun_id));
+    if (pin.fmd.tun_key.ipv4_dst != htonl(0)) {
+        ds_put_format(string, " tunnel(tun_id=0x%"PRIx64,
+                              ntohll(pin.fmd.tun_key.tun_id));
         if (pin.fmd.tun_id_mask != htonll(UINT64_MAX)) {
             ds_put_format(string, "/0x%"PRIx64, ntohll(pin.fmd.tun_id_mask));
         }
+        ds_put_format(string, ",flags=%"PRIx32",ip="IP_FMT"->"IP_FMT","
+                              "tos=%"PRIx8",ttl=%"PRIu8")",
+                              pin.fmd.tun_key.tun_flags,
+                              IP_ARGS(&pin.fmd.tun_key.ipv4_src),
+                              IP_ARGS(&pin.fmd.tun_key.ipv4_dst),
+                              pin.fmd.tun_key.ipv4_tos,
+                              pin.fmd.tun_key.ipv4_ttl);
     }
 
     for (i = 0; i < FLOW_N_REGS; i++) {
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index 90124ec..652a6bf 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -2096,7 +2096,7 @@ ofputil_decode_packet_in(struct ofputil_packet_in *pin,
 
         pin->fmd.in_port = rule.flow.in_port;
 
-        pin->fmd.tun_id = rule.flow.tun_id;
+        pin->fmd.tun_key.tun_id = rule.flow.tun_key.tun_id;
         pin->fmd.tun_id_mask = rule.wc.tun_id_mask;
 
         memcpy(pin->fmd.regs, rule.flow.regs, sizeof pin->fmd.regs);
@@ -2149,7 +2149,7 @@ ofputil_encode_packet_in(const struct ofputil_packet_in *pin,
                             + 2 + send_len);
 
         cls_rule_init_catchall(&rule, 0);
-        cls_rule_set_tun_id_masked(&rule, pin->fmd.tun_id,
+        cls_rule_set_tun_id_masked(&rule, pin->fmd.tun_key.tun_id,
                                    pin->fmd.tun_id_mask);
 
         for (i = 0; i < FLOW_N_REGS; i++) {
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 03a86bc..2a52f37 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3080,7 +3080,7 @@ handle_miss_upcalls(struct ofproto_dpif *ofproto, struct dpif_upcall *upcalls,
             continue;
         }
         flow_extract(upcall->packet, miss->flow.skb_priority,
-                     miss->flow.tun_id, miss->flow.in_port, &miss->flow);
+                     &miss->flow.tun_key, miss->flow.in_port, &miss->flow);
 
         /* Add other packets to a to-do list. */
         hash = flow_hash(&miss->flow, 0);
@@ -5464,7 +5464,7 @@ do_xlate_actions(const union ofp_action *in, size_t n_in,
         case OFPUTIL_NXAST_SET_TUNNEL:
             nast = (const struct nx_action_set_tunnel *) ia;
             tun_id = htonll(ntohl(nast->tun_id));
-            ctx->flow.tun_id = tun_id;
+            ctx->flow.tun_key.tun_id = tun_id;
             break;
 
         case OFPUTIL_NXAST_SET_QUEUE:
@@ -5492,7 +5492,7 @@ do_xlate_actions(const union ofp_action *in, size_t n_in,
 
         case OFPUTIL_NXAST_SET_TUNNEL64:
             tun_id = ((const struct nx_action_set_tunnel64 *) ia)->tun_id;
-            ctx->flow.tun_id = tun_id;
+            ctx->flow.tun_key.tun_id = tun_id;
             break;
 
         case OFPUTIL_NXAST_MULTIPATH:
@@ -5576,7 +5576,7 @@ action_xlate_ctx_init(struct action_xlate_ctx *ctx,
     ctx->ofproto = ofproto;
     ctx->flow = *flow;
     ctx->base_flow = ctx->flow;
-    ctx->base_flow.tun_id = 0;
+    ctx->base_flow.tun_key.ipv4_src = 0;
     ctx->base_flow.vlan_tci = initial_tci;
     ctx->rule = rule;
     ctx->packet = packet;
@@ -6739,6 +6739,7 @@ ofproto_unixctl_trace(struct unixctl_conn *conn, int argc, const char *argv[],
         const char *packet_s = argv[5];
         uint16_t in_port = ofp_port_to_odp_port(atoi(in_port_s));
         ovs_be64 tun_id = htonll(strtoull(tun_id_s, NULL, 0));
+        struct ovs_key_ipv4_tunnel tun_key = { .tun_id = tun_id };
         uint32_t priority = atoi(priority_s);
         const char *msg;
 
@@ -6753,7 +6754,7 @@ ofproto_unixctl_trace(struct unixctl_conn *conn, int argc, const char *argv[],
         ds_put_cstr(&result, s);
         free(s);
 
-        flow_extract(packet, priority, tun_id, in_port, &flow);
+        flow_extract(packet, priority, &tun_key, in_port, &flow);
         initial_tci = flow.vlan_tci;
     } else {
         unixctl_command_reply_error(conn, "Bad command syntax");
diff --git a/tests/test-classifier.c b/tests/test-classifier.c
index fcafdb2..5bb5df8 100644
--- a/tests/test-classifier.c
+++ b/tests/test-classifier.c
@@ -44,7 +44,7 @@
     /*                                    struct flow  all-caps */  \
     /*        FWW_* bit(s)                member name  name     */  \
     /*        --------------------------  -----------  -------- */  \
-    CLS_FIELD(0,                          tun_id,      TUN_ID)      \
+    CLS_FIELD(0,                          tun_key.tun_id,  TUN_ID)  \
     CLS_FIELD(0,                          nw_src,      NW_SRC)      \
     CLS_FIELD(0,                          nw_dst,      NW_DST)      \
     CLS_FIELD(FWW_IN_PORT,                in_port,     IN_PORT)     \
@@ -206,7 +206,8 @@ match(const struct cls_rule *wild, const struct flow *fixed)
             eq = !((fixed->vlan_tci ^ wild->flow.vlan_tci)
                    & wild->wc.vlan_tci_mask);
         } else if (f_idx == CLS_F_IDX_TUN_ID) {
-            eq = !((fixed->tun_id ^ wild->flow.tun_id) & wild->wc.tun_id_mask);
+            eq = !((fixed->tun_key.tun_id ^ wild->flow.tun_key.tun_id) &
+                   wild->wc.tun_id_mask);
         } else if (f_idx == CLS_F_IDX_NW_DSCP) {
             eq = !((fixed->nw_tos ^ wild->flow.nw_tos) & IP_DSCP_MASK);
         } else {
@@ -362,7 +363,7 @@ compare_classifiers(struct classifier *cls, struct tcls *tcls)
         x = rand () % N_FLOW_VALUES;
         flow.nw_src = nw_src_values[get_value(&x, N_NW_SRC_VALUES)];
         flow.nw_dst = nw_dst_values[get_value(&x, N_NW_DST_VALUES)];
-        flow.tun_id = tun_id_values[get_value(&x, N_TUN_ID_VALUES)];
+        flow.tun_key.tun_id = tun_id_values[get_value(&x, N_TUN_ID_VALUES)];
         flow.in_port = in_port_values[get_value(&x, N_IN_PORT_VALUES)];
         flow.vlan_tci = vlan_tci_values[get_value(&x, N_VLAN_TCI_VALUES)];
         flow.dl_type = dl_type_values[get_value(&x, N_DL_TYPE_VALUES)];
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/21] datapath, vport: Provide tunnel realdev and tundev classes and vports
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (6 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 10/21] classifier: Convert struct flow flow_metadata to use tun_key Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 12/21] lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action() Simon Horman
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

On the user-space side of things, the existing tunnel classes become tunnel
realdev classes and new classes are added to provide tunnel tundevs.

On the datapath side of things, the existing tunnel vports are used as
tundev vports. A new vport is added for tunnel realdevs.

It should be possible to remove realdevs entirely from the datapath,
however that requires teaching the user-space netdev to exclude them from
kernel-related operations. I have avoided that at this time in order to
allow review of other aspects of the approach taken in my flow-based
tunneling prototype.
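
As a rough illustration of one consequence of this split (a sketch, not
code from the patch): all realdevs share the single vport type
OVS_VPORT_TYPE_TUNNEL_REALDEV, so user-space has to recover the
user-visible type ("gre", "ipsec_gre" or "capwap") from the flags stored
with the realdev. The helper names and the EX_ prefix below are
hypothetical. EX_TNL_F_CAPWAP uses the value this patch adds to
openvswitch/tunnel.h; EX_TNL_F_IPSEC is only a placeholder, since the
real TNL_F_IPSEC value is not shown in this patch.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder bit: the real TNL_F_IPSEC is defined in tunnel.h and is
 * not visible in this patch, so this value is an assumption. */
#define EX_TNL_F_IPSEC  (1u << 7)
/* Same value as the TNL_F_CAPWAP define added by this patch. */
#define EX_TNL_F_CAPWAP (1u << 10)

/* Hypothetical helper: type string -> flags, following the logic of
 * the reworked parse_tunnel_config().  Plain "gre" sets no flag. */
static uint32_t ex_flags_from_type(const char *type)
{
    uint32_t flags = 0;

    if (!strcmp(type, "ipsec_gre")) {
        flags |= EX_TNL_F_IPSEC;
    } else if (!strcmp(type, "capwap")) {
        flags |= EX_TNL_F_CAPWAP;
    }
    return flags;
}

/* Hypothetical helper: flags -> type string, following the
 * OVS_VPORT_TYPE_TUNNEL_REALDEV case added to
 * netdev_vport_get_netdev_type().  CAPWAP is checked first. */
static const char *ex_type_from_flags(uint32_t flags)
{
    if (flags & EX_TNL_F_CAPWAP) {
        return "capwap";
    } else if (flags & EX_TNL_F_IPSEC) {
        return "ipsec_gre";
    } else {
        return "gre";
    }
}
```

With these helpers each of the three type names maps to flags and back
to the same name, which is the property the realdev vport relies on
when it reports its type to user-space.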

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

--

v4
* Tunnel tundevs should have a NULL set_config callback as their
  parse_config callback is NULL. Otherwise, reconfiguration will fail and
  ovs-vswitchd will exit if started with tundevs already configured.
* Remove unparse_tunnel_config, it is not used

v3
* Initial Post

remove unparse_tunnel_config
---
 datapath/Modules.mk             |   3 +-
 datapath/tunnel.c               | 158 +------------------
 datapath/vport-capwap.c         |   2 -
 datapath/vport-gre.c            |   2 -
 datapath/vport-tunnel-realdev.c | 260 +++++++++++++++++++++++++++++++
 datapath/vport.c                |   1 +
 datapath/vport.h                |   1 +
 include/linux/openvswitch.h     |   1 +
 include/openvswitch/tunnel.h    |   2 +
 lib/netdev-vport.c              | 333 +++++++++-------------------------------
 10 files changed, 343 insertions(+), 420 deletions(-)
 create mode 100644 datapath/vport-tunnel-realdev.c

diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index 24c1075..9aed4c3 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -26,7 +26,8 @@ openvswitch_sources = \
 	vport-gre.c \
 	vport-internal_dev.c \
 	vport-netdev.c \
-	vport-patch.c
+	vport-patch.c \
+	vport-tunnel-realdev.c
 
 openvswitch_headers = \
 	checksum.h \
diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 61add96..f07ec69 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -250,21 +250,6 @@ static void port_table_add_port(struct vport *vport)
 	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
 }
 
-static void port_table_move_port(struct vport *vport,
-		      struct tnl_mutable_config *new_mutable)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	u32 hash;
-
-	hash = port_hash(&new_mutable->key);
-	hlist_del_init_rcu(&tnl_vport->hash_node);
-	hlist_add_head_rcu(&tnl_vport->hash_node, find_bucket(hash));
-
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))--;
-	assign_config_rcu(vport, new_mutable);
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
-}
-
 static void port_table_remove_port(struct vport *vport)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
@@ -1381,71 +1366,20 @@ out:
 	return sent_len;
 }
 
-static const struct nla_policy tnl_policy[OVS_TUNNEL_ATTR_MAX + 1] = {
-	[OVS_TUNNEL_ATTR_FLAGS]    = { .type = NLA_U32 },
-	[OVS_TUNNEL_ATTR_DST_IPV4] = { .type = NLA_U32 },
-	[OVS_TUNNEL_ATTR_SRC_IPV4] = { .type = NLA_U32 },
-	[OVS_TUNNEL_ATTR_OUT_KEY]  = { .type = NLA_U64 },
-	[OVS_TUNNEL_ATTR_IN_KEY]   = { .type = NLA_U64 },
-	[OVS_TUNNEL_ATTR_TOS]      = { .type = NLA_U8 },
-	[OVS_TUNNEL_ATTR_TTL]      = { .type = NLA_U8 },
-};
-
 /* Sets OVS_TUNNEL_ATTR_* fields in 'mutable', which must initially be
  * zeroed. */
-static int tnl_set_config(struct net *net, struct nlattr *options,
+static int tnl_set_config(struct net *net,
 			  const struct tnl_ops *tnl_ops,
 			  const struct vport *cur_vport,
 			  struct tnl_mutable_config *mutable)
 {
 	const struct vport *old_vport;
 	const struct tnl_mutable_config *old_mutable;
-	struct nlattr *a[OVS_TUNNEL_ATTR_MAX + 1];
-	int err;
-
-	if (!options)
-		return -EINVAL;
-
-	err = nla_parse_nested(a, OVS_TUNNEL_ATTR_MAX, options, tnl_policy);
-	if (err)
-		return err;
-
-	if (!a[OVS_TUNNEL_ATTR_FLAGS] || !a[OVS_TUNNEL_ATTR_DST_IPV4])
-		return -EINVAL;
-
-	mutable->flags = nla_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]) & TNL_F_PUBLIC;
 
+	mutable->flags = 0;
 	port_key_set_net(&mutable->key, net);
-	mutable->key.daddr = nla_get_be32(a[OVS_TUNNEL_ATTR_DST_IPV4]);
-	if (a[OVS_TUNNEL_ATTR_SRC_IPV4]) {
-		if (ipv4_is_multicast(mutable->key.daddr))
-			return -EINVAL;
-		mutable->key.saddr = nla_get_be32(a[OVS_TUNNEL_ATTR_SRC_IPV4]);
-	}
-
-	if (a[OVS_TUNNEL_ATTR_TOS]) {
-		mutable->tos = nla_get_u8(a[OVS_TUNNEL_ATTR_TOS]);
-		/* Reject ToS config with ECN bits set. */
-		if (mutable->tos & INET_ECN_MASK)
-			return -EINVAL;
-	}
-
-	if (a[OVS_TUNNEL_ATTR_TTL])
-		mutable->ttl = nla_get_u8(a[OVS_TUNNEL_ATTR_TTL]);
-
+	mutable->key.daddr = htonl(0);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
-	if (!a[OVS_TUNNEL_ATTR_IN_KEY]) {
-		mutable->key.tunnel_type |= TNL_T_KEY_MATCH;
-		mutable->flags |= TNL_F_IN_KEY_MATCH;
-	} else {
-		mutable->key.tunnel_type |= TNL_T_KEY_EXACT;
-		mutable->key.in_key = nla_get_be64(a[OVS_TUNNEL_ATTR_IN_KEY]);
-	}
-
-	if (!a[OVS_TUNNEL_ATTR_OUT_KEY])
-		mutable->flags |= TNL_F_OUT_KEY_ACTION;
-	else
-		mutable->out_key = nla_get_be64(a[OVS_TUNNEL_ATTR_OUT_KEY]);
 
 	mutable->tunnel_hlen = tnl_ops->hdr_len(mutable);
 	if (mutable->tunnel_hlen < 0)
@@ -1458,21 +1392,6 @@ static int tnl_set_config(struct net *net, struct nlattr *options,
 		return -EEXIST;
 
 	mutable->mlink = 0;
-	if (ipv4_is_multicast(mutable->key.daddr)) {
-		struct net_device *dev;
-		struct rtable *rt;
-
-		rt = __find_route(mutable, tnl_ops->ipproto, mutable->tos,
-				  mutable->key.daddr, mutable->key.saddr);
-		if (IS_ERR(rt))
-			return -EADDRNOTAVAIL;
-		dev = rt_dst(rt).dev;
-		ip_rt_put(rt);
-		if (__in_dev_get_rtnl(dev) == NULL)
-			return -EADDRNOTAVAIL;
-		mutable->mlink = dev->ifindex;
-		ip_mc_inc_group(__in_dev_get_rtnl(dev), mutable->key.daddr);
-	}
 
 	return 0;
 }
@@ -1509,8 +1428,7 @@ struct vport *ovs_tnl_create(const struct vport_parms *parms,
 	get_random_bytes(&initial_frag_id, sizeof(int));
 	atomic_set(&tnl_vport->frag_id, initial_frag_id);
 
-	err = tnl_set_config(ovs_dp_get_net(parms->dp), parms->options, tnl_ops,
-			     NULL, mutable);
+	err = tnl_set_config(ovs_dp_get_net(parms->dp), tnl_ops, NULL, mutable);
 	if (err)
 		goto error_free_mutable;
 
@@ -1535,74 +1453,6 @@ error:
 	return ERR_PTR(err);
 }
 
-int ovs_tnl_set_options(struct vport *vport, struct nlattr *options)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	const struct tnl_mutable_config *old_mutable;
-	struct tnl_mutable_config *mutable;
-	int err;
-
-	mutable = kzalloc(sizeof(struct tnl_mutable_config), GFP_KERNEL);
-	if (!mutable) {
-		err = -ENOMEM;
-		goto error;
-	}
-
-	/* Copy fields whose values should be retained. */
-	old_mutable = rtnl_dereference(tnl_vport->mutable);
-	mutable->seq = old_mutable->seq + 1;
-	memcpy(mutable->eth_addr, old_mutable->eth_addr, ETH_ALEN);
-
-	/* Parse the others configured by userspace. */
-	err = tnl_set_config(ovs_dp_get_net(vport->dp), options, tnl_vport->tnl_ops,
-			     vport, mutable);
-	if (err)
-		goto error_free;
-
-	if (port_hash(&mutable->key) != port_hash(&old_mutable->key))
-		port_table_move_port(vport, mutable);
-	else
-		assign_config_rcu(vport, mutable);
-
-	return 0;
-
-error_free:
-	free_mutable_rtnl(mutable);
-	kfree(mutable);
-error:
-	return err;
-}
-
-int ovs_tnl_get_options(const struct vport *vport, struct sk_buff *skb)
-{
-	const struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	const struct tnl_mutable_config *mutable = rcu_dereference_rtnl(tnl_vport->mutable);
-
-	if (nla_put_u32(skb, OVS_TUNNEL_ATTR_FLAGS,
-		      mutable->flags & TNL_F_PUBLIC) ||
-	    nla_put_be32(skb, OVS_TUNNEL_ATTR_DST_IPV4, mutable->key.daddr))
-		goto nla_put_failure;
-
-	if (!(mutable->flags & TNL_F_IN_KEY_MATCH) &&
-	    nla_put_be64(skb, OVS_TUNNEL_ATTR_IN_KEY, mutable->key.in_key))
-		goto nla_put_failure;
-	if (!(mutable->flags & TNL_F_OUT_KEY_ACTION) &&
-	    nla_put_be64(skb, OVS_TUNNEL_ATTR_OUT_KEY, mutable->out_key))
-		goto nla_put_failure;
-	if (mutable->key.saddr &&
-	    nla_put_be32(skb, OVS_TUNNEL_ATTR_SRC_IPV4, mutable->key.saddr))
-		goto nla_put_failure;
-	if (mutable->tos && nla_put_u8(skb, OVS_TUNNEL_ATTR_TOS, mutable->tos))
-		goto nla_put_failure;
-	if (mutable->ttl && nla_put_u8(skb, OVS_TUNNEL_ATTR_TTL, mutable->ttl))
-		goto nla_put_failure;
-
-	return 0;
-
-nla_put_failure:
-	return -EMSGSIZE;
-}
-
 static void free_port_rcu(struct rcu_head *rcu)
 {
 	struct tnl_vport *tnl_vport = container_of(rcu,
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index 1e08d5a..f26a7d2 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -835,8 +835,6 @@ const struct vport_ops ovs_capwap_vport_ops = {
 	.set_addr	= ovs_tnl_set_addr,
 	.get_name	= ovs_tnl_get_name,
 	.get_addr	= ovs_tnl_get_addr,
-	.get_options	= ovs_tnl_get_options,
-	.set_options	= ovs_tnl_set_options,
 	.get_dev_flags	= ovs_vport_gen_get_dev_flags,
 	.is_running	= ovs_vport_gen_is_running,
 	.get_operstate	= ovs_vport_gen_get_operstate,
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index fd2b038..f610097 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -415,8 +415,6 @@ const struct vport_ops ovs_gre_vport_ops = {
 	.set_addr	= ovs_tnl_set_addr,
 	.get_name	= ovs_tnl_get_name,
 	.get_addr	= ovs_tnl_get_addr,
-	.get_options	= ovs_tnl_get_options,
-	.set_options	= ovs_tnl_set_options,
 	.get_dev_flags	= ovs_vport_gen_get_dev_flags,
 	.is_running	= ovs_vport_gen_is_running,
 	.get_operstate	= ovs_vport_gen_get_operstate,
diff --git a/datapath/vport-tunnel-realdev.c b/datapath/vport-tunnel-realdev.c
new file mode 100644
index 0000000..6225f70
--- /dev/null
+++ b/datapath/vport-tunnel-realdev.c
@@ -0,0 +1,260 @@
+/*
+ * Copyright (c) 2012 Horms Solution Ltd.
+ *
+ * Based on vport-patch.c
+ *
+ * Copyright (c) 2007-2012 Nicira, Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/jhash.h>
+#include <linux/list.h>
+#include <linux/rtnetlink.h>
+#include <net/net_namespace.h>
+
+#include "compat.h"
+#include "datapath.h"
+#include "vport.h"
+#include "vport-generic.h"
+
+struct realdev_config {
+	struct rcu_head rcu;
+
+	unsigned char eth_addr[ETH_ALEN];
+	__be32 daddr;
+	u32 flags;
+};
+
+struct realdev_vport {
+	struct rcu_head rcu;
+
+	char name[IFNAMSIZ];
+
+	struct realdev_config __rcu *realdevconf;
+};
+
+static struct realdev_vport *realdev_vport_priv(const struct vport *vport)
+{
+	return vport_priv(vport);
+}
+
+/* RCU callback. */
+static void free_config(struct rcu_head *rcu)
+{
+	struct realdev_config *c = container_of(rcu, struct realdev_config, rcu);
+	kfree(c);
+}
+
+static void assign_config_rcu(struct vport *vport,
+			      struct realdev_config *new_config)
+{
+	struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	struct realdev_config *old_config;
+
+	old_config = rtnl_dereference(realdev_vport->realdevconf);
+	rcu_assign_pointer(realdev_vport->realdevconf, new_config);
+	call_rcu(&old_config->rcu, free_config);
+}
+
+static int realdev_init(void)
+{
+	return 0;
+}
+
+static void realdev_exit(void)
+{
+}
+
+static const struct nla_policy realdev_policy[OVS_TUNNEL_ATTR_MAX + 1] = {
+	[OVS_TUNNEL_ATTR_FLAGS]    = { .type = NLA_U32 },
+	[OVS_TUNNEL_ATTR_DST_IPV4] = { .type = NLA_U32 },
+};
+
+static int realdev_set_config(struct vport *vport, const struct nlattr *options,
+			    struct realdev_config *realdevconf)
+{
+	struct nlattr *a[OVS_TUNNEL_ATTR_MAX + 1];
+	int err;
+
+	if (!options)
+		return -EINVAL;
+
+	err = nla_parse_nested(a, OVS_TUNNEL_ATTR_MAX, options, realdev_policy);
+	if (err)
+		return err;
+
+	if (!a[OVS_TUNNEL_ATTR_FLAGS] || !a[OVS_TUNNEL_ATTR_DST_IPV4])
+		return -EINVAL;
+
+	realdevconf->flags = nla_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]);
+	realdevconf->daddr = nla_get_be32(a[OVS_TUNNEL_ATTR_DST_IPV4]);
+
+	return 0;
+}
+
+
+static struct vport *realdev_create(const struct vport_parms *parms)
+{
+	struct vport *vport;
+	struct realdev_vport *realdev_vport;
+	struct realdev_config *realdevconf;
+	int err;
+
+	vport = ovs_vport_alloc(sizeof(struct realdev_vport),
+				&ovs_tunnel_realdev_vport_ops, parms);
+	if (IS_ERR(vport)) {
+		err = PTR_ERR(vport);
+		goto error;
+	}
+
+	realdev_vport = realdev_vport_priv(vport);
+
+	strcpy(realdev_vport->name, parms->name);
+
+	realdevconf = kmalloc(sizeof(struct realdev_config), GFP_KERNEL);
+	if (!realdevconf) {
+		err = -ENOMEM;
+		goto error_free_vport;
+	}
+
+	err = realdev_set_config(vport, parms->options, realdevconf);
+	if (err)
+		goto error_free_realdevconf;
+
+	random_ether_addr(realdevconf->eth_addr);
+
+	rcu_assign_pointer(realdev_vport->realdevconf, realdevconf);
+
+	return vport;
+
+error_free_realdevconf:
+	kfree(realdevconf);
+error_free_vport:
+	ovs_vport_free(vport);
+error:
+	return ERR_PTR(err);
+}
+
+static void free_port_rcu(struct rcu_head *rcu)
+{
+	struct realdev_vport *realdev_vport = container_of(rcu,
+					  struct realdev_vport, rcu);
+
+	kfree((struct realdev_config __force *)realdev_vport->realdevconf);
+	ovs_vport_free(vport_from_priv(realdev_vport));
+}
+
+static void realdev_destroy(struct vport *vport)
+{
+	struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	call_rcu(&realdev_vport->rcu, free_port_rcu);
+}
+
+static int realdev_set_addr(struct vport *vport, const unsigned char *addr)
+{
+	struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	struct realdev_config *realdevconf;
+
+	realdevconf = kmemdup(rtnl_dereference(realdev_vport->realdevconf),
+			  sizeof(struct realdev_config), GFP_KERNEL);
+	if (!realdevconf)
+		return -ENOMEM;
+
+	memcpy(realdevconf->eth_addr, addr, ETH_ALEN);
+	assign_config_rcu(vport, realdevconf);
+
+	return 0;
+}
+
+static int realdev_set_options(struct vport *vport, struct nlattr *options)
+{
+	struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	struct realdev_config *realdevconf;
+	int err;
+
+	realdevconf = kmemdup(rtnl_dereference(realdev_vport->realdevconf),
+			  sizeof(struct realdev_config), GFP_KERNEL);
+	if (!realdevconf) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	err = realdev_set_config(vport, options, realdevconf);
+	if (err)
+		goto error_free;
+
+	assign_config_rcu(vport, realdevconf);
+
+	return 0;
+error_free:
+	kfree(realdevconf);
+error:
+	return err;
+}
+
+static const char *realdev_get_name(const struct vport *vport)
+{
+	const struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	return realdev_vport->name;
+}
+
+static const unsigned char *realdev_get_addr(const struct vport *vport)
+{
+	const struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	return rcu_dereference_rtnl(realdev_vport->realdevconf)->eth_addr;
+}
+
+static int realdev_get_options(const struct vport *vport, struct sk_buff *skb)
+{
+	struct realdev_vport *realdev_vport = realdev_vport_priv(vport);
+	struct realdev_config *realdevconf =
+		rcu_dereference_rtnl(realdev_vport->realdevconf);
+	int err;
+
+	err = nla_put_u32(skb, OVS_TUNNEL_ATTR_FLAGS, realdevconf->flags);
+	if (err)
+		goto error;
+
+	err = nla_put_be32(skb, OVS_TUNNEL_ATTR_DST_IPV4, realdevconf->daddr);
+error:
+	return err;
+}
+
+static int realdev_send(struct vport *vport, struct sk_buff *skb)
+{
+	kfree_skb(skb);
+	ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
+	return 0;
+}
+
+const struct vport_ops ovs_tunnel_realdev_vport_ops = {
+	.type		= OVS_VPORT_TYPE_TUNNEL_REALDEV,
+	.init		= realdev_init,
+	.exit		= realdev_exit,
+	.create		= realdev_create,
+	.destroy	= realdev_destroy,
+	.set_addr	= realdev_set_addr,
+	.get_name	= realdev_get_name,
+	.get_addr	= realdev_get_addr,
+	.get_options	= realdev_get_options,
+	.set_options	= realdev_set_options,
+	.get_dev_flags	= ovs_vport_gen_get_dev_flags,
+	.is_running	= ovs_vport_gen_is_running,
+	.get_operstate	= ovs_vport_gen_get_operstate,
+	.send		= realdev_send,
+};
diff --git a/datapath/vport.c b/datapath/vport.c
index 0c77a1b..7759e07 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -44,6 +44,7 @@ static const struct vport_ops *base_vport_ops_list[] = {
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26)
 	&ovs_capwap_vport_ops,
 #endif
+	&ovs_tunnel_realdev_vport_ops,
 };
 
 static const struct vport_ops **vport_ops_list;
diff --git a/datapath/vport.h b/datapath/vport.h
index b0cdeae..893daaf 100644
--- a/datapath/vport.h
+++ b/datapath/vport.h
@@ -257,5 +257,6 @@ extern const struct vport_ops ovs_internal_vport_ops;
 extern const struct vport_ops ovs_patch_vport_ops;
 extern const struct vport_ops ovs_gre_vport_ops;
 extern const struct vport_ops ovs_capwap_vport_ops;
+extern const struct vport_ops ovs_tunnel_realdev_vport_ops;
 
 #endif /* vport.h */
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index c32bb58..87a3e22 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -185,6 +185,7 @@ enum ovs_vport_type {
 	OVS_VPORT_TYPE_PATCH = 100, /* virtual tunnel connecting two vports */
 	OVS_VPORT_TYPE_GRE,      /* GRE tunnel */
 	OVS_VPORT_TYPE_CAPWAP,   /* CAPWAP tunnel */
+	OVS_VPORT_TYPE_TUNNEL_REALDEV,  /* real tunnel device */
 	__OVS_VPORT_TYPE_MAX
 };
 
diff --git a/include/openvswitch/tunnel.h b/include/openvswitch/tunnel.h
index 5f55ecc..078a940 100644
--- a/include/openvswitch/tunnel.h
+++ b/include/openvswitch/tunnel.h
@@ -74,4 +74,6 @@ enum {
 #define TNL_F_IN_KEY	(1 << 8) /* Tunnel port has input key. */
 #define TNL_F_OUT_KEY	(1 << 9) /* Tunnel port has output key. */
 
+#define TNL_F_CAPWAP    (1 << 10)
+
 #endif /* openvswitch/tunnel.h */
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index a9eb3eb..7a9803b 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -155,15 +155,24 @@ netdev_vport_get_netdev_type(const struct dpif_linux_vport *vport)
         return "patch";
 
     case OVS_VPORT_TYPE_GRE:
-        if (tnl_port_config_from_nlattr(vport->options, vport->options_len,
-                                        a)) {
-            break;
-        }
-        return (nl_attr_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]) & TNL_F_IPSEC
-                ? "ipsec_gre" : "gre");
+        return "gre-tundev";
 
     case OVS_VPORT_TYPE_CAPWAP:
-        return "capwap";
+        return "capwap-tundev";
+
+    case OVS_VPORT_TYPE_TUNNEL_REALDEV:
+        if (tnl_port_config_from_nlattr(vport->options,
+                                        vport->options_len, a)) {
+            return "no-config";
+        }
+
+        if (nl_attr_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]) & TNL_F_CAPWAP) {
+            return "capwap";
+        } else if (nl_attr_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]) & TNL_F_IPSEC) {
+            return "ipsec_gre";
+        } else {
+            return "gre";
+        }
 
     case __OVS_VPORT_TYPE_MAX:
         break;
@@ -248,6 +257,10 @@ netdev_vport_get_config(struct netdev_dev *dev_, struct shash *args)
         ofpbuf_delete(buf);
     }
 
+    if (!vport_class->unparse_config) {
+        return 0;
+    }
+
     error = vport_class->unparse_config(name, netdev_class->type,
                                         dev->options->data,
                                         dev->options->size,
@@ -267,11 +280,13 @@ netdev_vport_set_config(struct netdev_dev *dev_, const struct shash *args)
     struct netdev_dev_vport *dev = netdev_dev_vport_cast(dev_);
     const char *name = netdev_dev_get_name(dev_);
     struct ofpbuf *options;
-    int error;
+    int error = 0;
 
     options = ofpbuf_new(64);
-    error = vport_class->parse_config(name, netdev_dev_get_type(dev_),
-                                      args, options);
+    if (vport_class->parse_config) {
+        error = vport_class->parse_config(name, netdev_dev_get_type(dev_),
+                                          args, options);
+    }
     if (!error
         && (!dev->options
             || options->size != dev->options->size
@@ -550,47 +565,18 @@ netdev_vport_poll_notify(const struct netdev *netdev)
 \f
 /* Code specific to individual vport types. */
 
-static void
-set_key(const struct shash *args, const char *name, uint16_t type,
-        struct ofpbuf *options)
-{
-    const char *s;
-
-    s = shash_find_data(args, name);
-    if (!s) {
-        s = shash_find_data(args, "key");
-        if (!s) {
-            s = "0";
-        }
-    }
-
-    if (!strcmp(s, "flow")) {
-        /* This is the default if no attribute is present. */
-    } else {
-        nl_msg_put_be64(options, type, htonll(strtoull(s, NULL, 0)));
-    }
-}
-
 static int
 parse_tunnel_config(const char *name, const char *type,
                     const struct shash *args, struct ofpbuf *options)
 {
-    bool is_gre = false;
-    bool is_ipsec = false;
-    struct shash_node *node;
-    bool ipsec_mech_set = false;
     ovs_be32 daddr = htonl(0);
-    ovs_be32 saddr = htonl(0);
-    uint32_t flags;
-
-    flags = TNL_F_DF_DEFAULT | TNL_F_PMTUD | TNL_F_HDR_CACHE;
-    if (!strcmp(type, "gre")) {
-        is_gre = true;
-    } else if (!strcmp(type, "ipsec_gre")) {
-        is_gre = true;
-        is_ipsec = true;
+    struct shash_node *node;
+    uint32_t flags = 0;
+
+    if (!strcmp(type, "ipsec_gre")) {
         flags |= TNL_F_IPSEC;
-        flags &= ~TNL_F_HDR_CACHE;
+    } else if (!strcmp(type, "capwap")) {
+        flags |= TNL_F_CAPWAP;
     }
 
     SHASH_FOR_EACH (node, args) {
@@ -601,112 +587,9 @@ parse_tunnel_config(const char *name, const char *type,
             } else {
                 daddr = in_addr.s_addr;
             }
-        } else if (!strcmp(node->name, "local_ip")) {
-            struct in_addr in_addr;
-            if (lookup_ip(node->data, &in_addr)) {
-                VLOG_WARN("%s: bad %s 'local_ip'", name, type);
-            } else {
-                saddr = in_addr.s_addr;
-            }
-        } else if (!strcmp(node->name, "tos")) {
-            if (!strcmp(node->data, "inherit")) {
-                flags |= TNL_F_TOS_INHERIT;
-            } else {
-                char *endptr;
-                int tos;
-                tos = strtol(node->data, &endptr, 0);
-                if (*endptr == '\0') {
-                    nl_msg_put_u8(options, OVS_TUNNEL_ATTR_TOS, tos);
-                }
-            }
-        } else if (!strcmp(node->name, "ttl")) {
-            if (!strcmp(node->data, "inherit")) {
-                flags |= TNL_F_TTL_INHERIT;
-            } else {
-                nl_msg_put_u8(options, OVS_TUNNEL_ATTR_TTL, atoi(node->data));
-            }
-        } else if (!strcmp(node->name, "csum") && is_gre) {
-            if (!strcmp(node->data, "true")) {
-                flags |= TNL_F_CSUM;
-            }
-        } else if (!strcmp(node->name, "df_inherit")) {
-            if (!strcmp(node->data, "true")) {
-                flags |= TNL_F_DF_INHERIT;
-            }
-        } else if (!strcmp(node->name, "df_default")) {
-            if (!strcmp(node->data, "false")) {
-                flags &= ~TNL_F_DF_DEFAULT;
-            }
-        } else if (!strcmp(node->name, "pmtud")) {
-            if (!strcmp(node->data, "false")) {
-                flags &= ~TNL_F_PMTUD;
-            }
-        } else if (!strcmp(node->name, "header_cache")) {
-            if (!strcmp(node->data, "false")) {
-                flags &= ~TNL_F_HDR_CACHE;
-            }
-        } else if (!strcmp(node->name, "peer_cert") && is_ipsec) {
-            if (shash_find(args, "certificate")) {
-                ipsec_mech_set = true;
-            } else {
-                const char *use_ssl_cert;
-
-                /* If the "use_ssl_cert" is true, then "certificate" and
-                 * "private_key" will be pulled from the SSL table.  The
-                 * use of this option is strongly discouraged, since it
-                 * will like be removed when multiple SSL configurations
-                 * are supported by OVS.
-                 */
-                use_ssl_cert = shash_find_data(args, "use_ssl_cert");
-                if (!use_ssl_cert || strcmp(use_ssl_cert, "true")) {
-                    VLOG_ERR("%s: 'peer_cert' requires 'certificate' argument",
-                             name);
-                    return EINVAL;
-                }
-                ipsec_mech_set = true;
-            }
-        } else if (!strcmp(node->name, "psk") && is_ipsec) {
-            ipsec_mech_set = true;
-        } else if (is_ipsec
-                && (!strcmp(node->name, "certificate")
-                    || !strcmp(node->name, "private_key")
-                    || !strcmp(node->name, "use_ssl_cert"))) {
-            /* Ignore options not used by the netdev. */
-        } else if (!strcmp(node->name, "key") ||
-                   !strcmp(node->name, "in_key") ||
-                   !strcmp(node->name, "out_key")) {
-            /* Handled separately below. */
-        } else {
-            VLOG_WARN("%s: unknown %s argument '%s'", name, type, node->name);
         }
     }
 
-    if (is_ipsec) {
-        char *file_name = xasprintf("%s/%s", ovs_rundir(),
-                "ovs-monitor-ipsec.pid");
-        pid_t pid = read_pidfile(file_name);
-        free(file_name);
-        if (pid < 0) {
-            VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
-                     name);
-            return EINVAL;
-        }
-
-        if (shash_find(args, "peer_cert") && shash_find(args, "psk")) {
-            VLOG_ERR("%s: cannot define both 'peer_cert' and 'psk'", name);
-            return EINVAL;
-        }
-
-        if (!ipsec_mech_set) {
-            VLOG_ERR("%s: IPsec requires an 'peer_cert' or psk' argument",
-                     name);
-            return EINVAL;
-        }
-    }
-
-    set_key(args, "in_key", OVS_TUNNEL_ATTR_IN_KEY, options);
-    set_key(args, "out_key", OVS_TUNNEL_ATTR_OUT_KEY, options);
-
     if (!daddr) {
         VLOG_ERR("%s: %s type requires valid 'remote_ip' argument",
                  name, type);
@@ -714,14 +597,6 @@ parse_tunnel_config(const char *name, const char *type,
     }
     nl_msg_put_be32(options, OVS_TUNNEL_ATTR_DST_IPV4, daddr);
 
-    if (saddr) {
-        if (ip_is_multicast(daddr)) {
-            VLOG_WARN("%s: remote_ip is multicast, ignoring local_ip", name);
-        } else {
-            nl_msg_put_be32(options, OVS_TUNNEL_ATTR_SRC_IPV4, saddr);
-        }
-    }
-
     nl_msg_put_u32(options, OVS_TUNNEL_ATTR_FLAGS, flags);
 
     return 0;
@@ -749,95 +624,6 @@ tnl_port_config_from_nlattr(const struct nlattr *options, size_t options_len,
     }
     return 0;
 }
-
-static uint64_t
-get_be64_or_zero(const struct nlattr *a)
-{
-    return a ? ntohll(nl_attr_get_be64(a)) : 0;
-}
-
-static int
-unparse_tunnel_config(const char *name OVS_UNUSED, const char *type OVS_UNUSED,
-                      const struct nlattr *options, size_t options_len,
-                      struct shash *args)
-{
-    struct nlattr *a[OVS_TUNNEL_ATTR_MAX + 1];
-    ovs_be32 daddr;
-    uint32_t flags;
-    int error;
-
-    error = tnl_port_config_from_nlattr(options, options_len, a);
-    if (error) {
-        return error;
-    }
-
-    flags = nl_attr_get_u32(a[OVS_TUNNEL_ATTR_FLAGS]);
-    if (!(flags & TNL_F_HDR_CACHE) == !(flags & TNL_F_IPSEC)) {
-        smap_add(args, "header_cache",
-                 flags & TNL_F_HDR_CACHE ? "true" : "false");
-    }
-
-    daddr = nl_attr_get_be32(a[OVS_TUNNEL_ATTR_DST_IPV4]);
-    shash_add(args, "remote_ip", xasprintf(IP_FMT, IP_ARGS(&daddr)));
-
-    if (a[OVS_TUNNEL_ATTR_SRC_IPV4]) {
-        ovs_be32 saddr = nl_attr_get_be32(a[OVS_TUNNEL_ATTR_SRC_IPV4]);
-        shash_add(args, "local_ip", xasprintf(IP_FMT, IP_ARGS(&saddr)));
-    }
-
-    if (!a[OVS_TUNNEL_ATTR_IN_KEY] && !a[OVS_TUNNEL_ATTR_OUT_KEY]) {
-        smap_add(args, "key", "flow");
-    } else {
-        uint64_t in_key = get_be64_or_zero(a[OVS_TUNNEL_ATTR_IN_KEY]);
-        uint64_t out_key = get_be64_or_zero(a[OVS_TUNNEL_ATTR_OUT_KEY]);
-
-        if (in_key && in_key == out_key) {
-            shash_add(args, "key", xasprintf("%"PRIu64, in_key));
-        } else {
-            if (!a[OVS_TUNNEL_ATTR_IN_KEY]) {
-                smap_add(args, "in_key", "flow");
-            } else if (in_key) {
-                shash_add(args, "in_key", xasprintf("%"PRIu64, in_key));
-            }
-
-            if (!a[OVS_TUNNEL_ATTR_OUT_KEY]) {
-                smap_add(args, "out_key", "flow");
-            } else if (out_key) {
-                shash_add(args, "out_key", xasprintf("%"PRIu64, out_key));
-            }
-        }
-    }
-
-    if (flags & TNL_F_TTL_INHERIT) {
-        smap_add(args, "tos", "inherit");
-    } else if (a[OVS_TUNNEL_ATTR_TTL]) {
-        int ttl = nl_attr_get_u8(a[OVS_TUNNEL_ATTR_TTL]);
-        shash_add(args, "tos", xasprintf("%d", ttl));
-    }
-
-    if (flags & TNL_F_TOS_INHERIT) {
-        smap_add(args, "tos", "inherit");
-    } else if (a[OVS_TUNNEL_ATTR_TOS]) {
-        int tos = nl_attr_get_u8(a[OVS_TUNNEL_ATTR_TOS]);
-        shash_add(args, "tos", xasprintf("0x%x", tos));
-    }
-
-    if (flags & TNL_F_CSUM) {
-        smap_add(args, "csum", "true");
-    }
-    if (flags & TNL_F_DF_INHERIT) {
-        smap_add(args, "df_inherit", "true");
-    }
-    if (!(flags & TNL_F_DF_DEFAULT)) {
-        smap_add(args, "df_default", "false");
-    }
-    if (!(flags & TNL_F_PMTUD)) {
-        smap_add(args, "pmtud", "false");
-    }
-
-    return 0;
-}
-
 static int
 parse_patch_config(const char *name, const char *type OVS_UNUSED,
                    const struct shash *args, struct ofpbuf *options)
@@ -894,15 +680,17 @@ unparse_patch_config(const char *name OVS_UNUSED, const char *type OVS_UNUSED,
     return 0;
 }
 \f
-#define VPORT_FUNCTIONS(GET_STATUS)                         \
+#define __VPORT_FUNCTIONS(RUN, WAIT, GET_CONFIG,            \
+                          SET_CONFIG, SEND, GET_STATS,      \
+                          SET_STATS, GET_STATUS)            \
     NULL,                                                   \
-    netdev_vport_run,                                       \
-    netdev_vport_wait,                                      \
+    RUN,                                                    \
+    WAIT,                                                   \
                                                             \
     netdev_vport_create,                                    \
     netdev_vport_destroy,                                   \
-    netdev_vport_get_config,                                \
-    netdev_vport_set_config,                                \
+    GET_CONFIG,                                             \
+    SET_CONFIG,                                             \
                                                             \
     netdev_vport_open,                                      \
     netdev_vport_close,                                     \
@@ -912,7 +700,7 @@ unparse_patch_config(const char *name OVS_UNUSED, const char *type OVS_UNUSED,
     NULL,                       /* recv_wait */             \
     NULL,                       /* drain */                 \
                                                             \
-    netdev_vport_send,          /* send */                  \
+    SEND,                       /* send */                  \
     NULL,                       /* send_wait */             \
                                                             \
     netdev_vport_set_etheraddr,                             \
@@ -923,8 +711,8 @@ unparse_patch_config(const char *name OVS_UNUSED, const char *type OVS_UNUSED,
     NULL,                       /* get_carrier */           \
     NULL,                       /* get_carrier_resets */    \
     NULL,                       /* get_miimon */            \
-    netdev_vport_get_stats,                                 \
-    netdev_vport_set_stats,                                 \
+    GET_STATS,                                              \
+    SET_STATS,                                              \
                                                             \
     NULL,                       /* get_features */          \
     NULL,                       /* set_advertisements */    \
@@ -953,24 +741,47 @@ unparse_patch_config(const char *name OVS_UNUSED, const char *type OVS_UNUSED,
                                                             \
     netdev_vport_change_seq
 
+#define VPORT_FUNCTIONS(SET_CONFIG, GET_STATUS)             \
+        __VPORT_FUNCTIONS(netdev_vport_run,                 \
+                          netdev_vport_wait,                \
+                          netdev_vport_get_config,          \
+                          SET_CONFIG,                       \
+                          netdev_vport_send,                \
+                          netdev_vport_get_stats,           \
+                          netdev_vport_set_stats,           \
+                          GET_STATUS)
+
+#define VPORT_TUNNEL_REALDEV_FUNCTIONS                      \
+        __VPORT_FUNCTIONS(NULL, NULL, NULL,                 \
+                          netdev_vport_set_config,          \
+                          NULL, NULL, NULL, NULL)
+
 void
 netdev_vport_register(void)
 {
     static const struct vport_class vport_classes[] = {
-        { OVS_VPORT_TYPE_GRE,
-          { "gre", VPORT_FUNCTIONS(netdev_vport_get_drv_info) },
-          parse_tunnel_config, unparse_tunnel_config },
+        { OVS_VPORT_TYPE_TUNNEL_REALDEV,
+          { "gre", VPORT_TUNNEL_REALDEV_FUNCTIONS },
+          parse_tunnel_config, NULL },
+
+        { OVS_VPORT_TYPE_TUNNEL_REALDEV,
+          { "ipsec_gre", VPORT_TUNNEL_REALDEV_FUNCTIONS },
+          parse_tunnel_config, NULL },
 
         { OVS_VPORT_TYPE_GRE,
-          { "ipsec_gre", VPORT_FUNCTIONS(netdev_vport_get_drv_info) },
-          parse_tunnel_config, unparse_tunnel_config },
+          { "gre-tundev", VPORT_FUNCTIONS(NULL, netdev_vport_get_drv_info) },
+          NULL, NULL },
+
+        { OVS_VPORT_TYPE_TUNNEL_REALDEV,
+          { "capwap", VPORT_TUNNEL_REALDEV_FUNCTIONS },
+          parse_tunnel_config, NULL },
 
         { OVS_VPORT_TYPE_CAPWAP,
-          { "capwap", VPORT_FUNCTIONS(netdev_vport_get_drv_info) },
-          parse_tunnel_config, unparse_tunnel_config },
+          { "capwap-tundev", VPORT_FUNCTIONS(NULL, netdev_vport_get_drv_info) },
+          NULL, NULL },
 
         { OVS_VPORT_TYPE_PATCH,
-          { "patch", VPORT_FUNCTIONS(NULL) },
+          { "patch", VPORT_FUNCTIONS(netdev_vport_set_config, NULL) },
           parse_patch_config, unparse_patch_config }
     };
 
-- 
1.7.10.2.484.gcd07cc5


* [PATCH 12/21] lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action()
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (7 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 11/21] datapath, vport: Provide tunnel realdev and tundev classes and vports Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 13/21] global: Remove OVS_KEY_ATTR_TUN_ID Simon Horman
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
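As a reading aid, the effect of the new comparison can be sketched in
userspace as below. This is only a sketch: the struct and function names
ending in _sketch are hypothetical stand-ins, fixed-width integer types
replace the kernel's __be64/__be32/__u8, and the return value of the commit
helper is an invention used to make the behaviour observable. The field list
in the comparison is copied from the helper in this patch, which as posted
does not compare tun_flags; the patch does not say whether that is intended.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel (assumption:
 * plain fixed-width types instead of the kernel's endian-annotated ones). */
struct tun_key_sketch {
    uint64_t tun_id;
    uint32_t tun_flags;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
    uint8_t  pad[2];
};

/* Compares the same fields as ovs_key_ipv4_tunnel_equal() in this patch.
 * Note that tun_flags is not part of the comparison. */
static int
tun_key_sketch_equal(const struct tun_key_sketch *a,
                     const struct tun_key_sketch *b)
{
    return a->ipv4_dst == b->ipv4_dst &&
           a->tun_id == b->tun_id &&
           a->ipv4_src == b->ipv4_src &&
           a->ipv4_tos == b->ipv4_tos &&
           a->ipv4_ttl == b->ipv4_ttl;
}

/* Hypothetical stand-in for commit_set_tunnel_action(): returns 1 when a
 * set action would be emitted and 0 when the update is skipped. */
static int
commit_tunnel_sketch(const struct tun_key_sketch *flow,
                     struct tun_key_sketch *base)
{
    if (tun_key_sketch_equal(base, flow)) {
        return 0;
    }
    *base = *flow;              /* Copy the whole key, as the patch does. */
    return 1;
}

/* Small self-check: a changed destination is committed once, then skipped. */
int
commit_tunnel_demo_sketch(void)
{
    struct tun_key_sketch base = {0};
    struct tun_key_sketch flow = {0};

    flow.ipv4_dst = 0x0a000001;
    assert(commit_tunnel_sketch(&flow, &base) == 1);
    assert(commit_tunnel_sketch(&flow, &base) == 0);
    return 0;
}
```

With this field list, a flow whose key differs from the base only in
tun_flags is treated as unchanged, so no new set action would be emitted
for it.
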
 include/linux/openvswitch.h | 11 +++++++++++
 lib/odp-util.c              | 12 ++++++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 87a3e22..f2d56ec 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -372,6 +372,17 @@ struct ovs_key_ipv4_tunnel {
 	__u8   pad[2];
 };
 
+static inline int
+ovs_key_ipv4_tunnel_equal(const struct ovs_key_ipv4_tunnel *a,
+                          const struct ovs_key_ipv4_tunnel *b)
+{
+	return a->ipv4_dst == b->ipv4_dst &&
+		a->tun_id == b->tun_id &&
+		a->ipv4_src == b->ipv4_src &&
+		a->ipv4_tos == b->ipv4_tos &&
+		a->ipv4_ttl == b->ipv4_ttl;
+}
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 5f76f5e..11b7a1b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -1892,16 +1892,16 @@ commit_set_action(struct ofpbuf *odp_actions, enum ovs_key_attr key_type,
 }
 
 static void
-commit_set_tun_id_action(const struct flow *flow, struct flow *base,
+commit_set_tunnel_action(const struct flow *flow, struct flow *base,
                          struct ofpbuf *odp_actions)
 {
-    if (base->tun_key.tun_id == flow->tun_key.tun_id) {
+    if (ovs_key_ipv4_tunnel_equal(&base->tun_key, &flow->tun_key)) {
         return;
     }
-    base->tun_key.tun_id = flow->tun_key.tun_id;
+    base->tun_key = flow->tun_key;
 
-    commit_set_action(odp_actions, OVS_KEY_ATTR_TUN_ID,
-                      &base->tun_key.tun_id, sizeof(base->tun_key.tun_id));
+    commit_set_action(odp_actions, OVS_KEY_ATTR_IPV4_TUNNEL,
+                      &base->tun_key, sizeof(base->tun_key));
 }
 
 static void
@@ -2072,7 +2072,7 @@ void
 commit_odp_actions(const struct flow *flow, struct flow *base,
                    struct ofpbuf *odp_actions)
 {
-    commit_set_tun_id_action(flow, base, odp_actions);
+    commit_set_tunnel_action(flow, base, odp_actions);
     commit_set_ether_addr_action(flow, base, odp_actions);
     commit_vlan_action(flow, base, odp_actions);
     commit_set_nw_action(flow, base, odp_actions);
-- 
1.7.10.2.484.gcd07cc5


* [PATCH 13/21] global: Remove OVS_KEY_ATTR_TUN_ID
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (8 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 12/21] lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action() Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 14/21] ofproto: Set flow tun_key in compose_output_action() Simon Horman
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

OVS_KEY_ATTR_TUN_ID may now be removed as it is
no longer used in any meaningful way.

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
 datapath/datapath.c         |  1 -
 datapath/flow.c             |  1 -
 include/linux/openvswitch.h |  1 -
 lib/dpif-netdev.c           |  1 -
 lib/odp-util.c              | 18 ------------------
 5 files changed, 22 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index 65dfe79..dcff4c6 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -590,7 +590,6 @@ static int validate_set(const struct nlattr *a,
 	const struct ovs_key_ipv4_tunnel *tun_key;
 
 	case OVS_KEY_ATTR_PRIORITY:
-	case OVS_KEY_ATTR_TUN_ID:
 	case OVS_KEY_ATTR_ETHERNET:
 		break;
 
diff --git a/datapath/flow.c b/datapath/flow.c
index 49c0dd8..9c898c6 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -847,7 +847,6 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
 
 	/* Not upstream. */
-	[OVS_KEY_ATTR_TUN_ID] = sizeof(__be64),
 	[OVS_KEY_ATTR_IPV4_TUNNEL] = sizeof(struct ovs_key_ipv4_tunnel),
 };
 
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index f2d56ec..9de3f20 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -279,7 +279,6 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
 	OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
 	OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
-	OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
 	OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
 	__OVS_KEY_ATTR_MAX
 };
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d065a3a..ff00e05 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1162,7 +1162,6 @@ execute_set_action(struct ofpbuf *packet, const struct nlattr *a)
     const struct ovs_key_udp *udp_key;
 
     switch (type) {
-    case OVS_KEY_ATTR_TUN_ID:
     case OVS_KEY_ATTR_PRIORITY:
     case OVS_KEY_ATTR_IPV6:
     case OVS_KEY_ATTR_IPV4_TUNNEL:
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 11b7a1b..d1fe9d8 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -105,7 +105,6 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
     case OVS_KEY_ATTR_ICMPV6: return "icmpv6";
     case OVS_KEY_ATTR_ARP: return "arp";
     case OVS_KEY_ATTR_ND: return "nd";
-    case OVS_KEY_ATTR_TUN_ID: return "tun_id";
     case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
 
     case __OVS_KEY_ATTR_MAX:
@@ -602,7 +601,6 @@ odp_flow_key_attr_len(uint16_t type)
     switch ((enum ovs_key_attr) type) {
     case OVS_KEY_ATTR_ENCAP: return -2;
     case OVS_KEY_ATTR_PRIORITY: return 4;
-    case OVS_KEY_ATTR_TUN_ID: return 8;
     case OVS_KEY_ATTR_IN_PORT: return 4;
     case OVS_KEY_ATTR_ETHERNET: return sizeof(struct ovs_key_ethernet);
     case OVS_KEY_ATTR_VLAN: return sizeof(ovs_be16);
@@ -697,10 +695,6 @@ format_odp_key_attr(const struct nlattr *a, struct ds *ds)
         ds_put_format(ds, "(%"PRIu32")", nl_attr_get_u32(a));
         break;
 
-    case OVS_KEY_ATTR_TUN_ID:
-        ds_put_format(ds, "(%#"PRIx64")", ntohll(nl_attr_get_be64(a)));
-        break;
-
     case OVS_KEY_ATTR_IPV4_TUNNEL:
         ipv4_tun_key = nl_attr_get(a);
         ds_put_format(ds, "(tun_id=%"PRIx64",flags=%"PRIx32
@@ -913,18 +907,6 @@ parse_odp_key_attr(const char *s, const struct simap *port_names,
     }
 
     {
-        char tun_id_s[32];
-        int n = -1;
-
-        if (sscanf(s, "tun_id(%31[x0123456789abcdefABCDEF])%n",
-                   tun_id_s, &n) > 0 && n > 0) {
-            uint64_t tun_id = strtoull(tun_id_s, NULL, 0);
-            nl_msg_put_be64(key, OVS_KEY_ATTR_TUN_ID, htonll(tun_id));
-            return n;
-        }
-    }
-
-    {
         ovs_be32 ipv4_src;
         ovs_be32 ipv4_dst;
         unsigned long long tun_flags;
-- 
1.7.10.2.484.gcd07cc5


* [PATCH 14/21] ofproto: Set flow tun_key in compose_output_action()
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (9 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 13/21] global: Remove OVS_KEY_ATTR_TUN_ID Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 15/21] datapath: Remove mlink element from tnl_mutable_config Simon Horman
  2012-05-24  9:09   ` [PATCH 18/21] dataptah: remove ttl and tos " Simon Horman
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

In essence this attaches the tun_key, if any,
to the output processing of a packet. This allows
the packet to be transmitted using flow-based
tunneling as necessary.

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

---

v4
* Set tun_flags field of flow.tun_key
* Remove debugging message

v3
* Initial release

datapath: Add flags to ovs_key_ipv4_tunnel

Add flags to ovs_key_ipv4_tunnel and set them from
the tunnel's realdev flags. This allows the datapath
to have access to flags on transmit which can be
used to affect the transmission - e.g. add a tunnel id.

Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
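For readers following along, the branch added to compose_output_action__()
can be sketched in userspace roughly as below. It is a sketch under stated
assumptions, not the OVS code: every name ending in _sketch is hypothetical,
the settings struct is only loosely modelled on the ofport->tun->s fields
read in the hunk, and the port_remapped argument stands in for the
"out_port != odp_port" test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel. */
struct tun_key_sketch {
    uint64_t tun_id;
    uint32_t tun_flags;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
    uint8_t  pad[2];
};

/* Hypothetical per-port tunnel settings (the realdev configuration). */
struct tun_settings_sketch {
    uint64_t out_key;
    uint32_t flags;
    uint32_t saddr;
    uint32_t daddr;
    uint8_t  tos;
    uint8_t  ttl;
};

/* Hypothetical cut-down flow: only the fields touched by the hunk. */
struct flow_sketch {
    struct tun_key_sketch tun_key;
    uint16_t vlan_tci;
};

/* When the output port was remapped, either copy the tunnel settings into
 * the flow's tun_key (tunnel port) or clear the VLAN TCI (non-tunnel port).
 * 'tun' is NULL for a non-tunnel port. */
static void
prepare_output_sketch(struct flow_sketch *flow,
                      const struct tun_settings_sketch *tun,
                      int port_remapped)
{
    if (!port_remapped) {
        return;
    }
    if (tun) {
        flow->tun_key.tun_id = tun->out_key;
        flow->tun_key.tun_flags = tun->flags;
        flow->tun_key.ipv4_src = tun->saddr;
        flow->tun_key.ipv4_dst = tun->daddr;
        flow->tun_key.ipv4_tos = tun->tos;
        flow->tun_key.ipv4_ttl = tun->ttl;
    } else {
        flow->vlan_tci = 0;
    }
}

/* Small self-check: a tunnel port fills tun_key and leaves vlan_tci alone. */
int
prepare_output_demo_sketch(void)
{
    struct tun_settings_sketch tun = { 42, 0, 0, 0x0a000002, 0, 64 };
    struct flow_sketch flow = { {0}, 5 };

    prepare_output_sketch(&flow, &tun, 1);
    assert(flow.tun_key.tun_id == 42);
    assert(flow.vlan_tci == 5);
    return 0;
}
```

The tun_key filled in this way is what the later commit step compares
against the base flow to decide whether a tunnel set action is needed.
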
 ofproto/ofproto-dpif.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 2a52f37..b1354a2 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4919,8 +4919,17 @@ compose_output_action__(struct action_xlate_ctx *ctx, uint16_t ofp_port,
     }
 
     out_port = realdev_to_txdev(ctx->ofproto, ofport, ctx->flow.vlan_tci);
-    if (out_port != odp_port && !ofport->tun) {
-        ctx->flow.vlan_tci = htons(0);
+    if (out_port != odp_port) {
+        if (ofport->tun) {
+            ctx->flow.tun_key.tun_id = ofport->tun->s.out_key;
+            ctx->flow.tun_key.tun_flags = ofport->tun->s.flags;
+            ctx->flow.tun_key.ipv4_src = ofport->tun->s.saddr;
+            ctx->flow.tun_key.ipv4_dst = ofport->tun->s.daddr;
+            ctx->flow.tun_key.ipv4_tos = ofport->tun->s.tos;
+            ctx->flow.tun_key.ipv4_ttl = ofport->tun->s.ttl;
+        } else {
+            ctx->flow.vlan_tci = htons(0);
+        }
     }
     commit_odp_actions(&ctx->flow, &ctx->base_flow, ctx->odp_actions);
     nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_OUTPUT, out_port);
@@ -5576,7 +5585,7 @@ action_xlate_ctx_init(struct action_xlate_ctx *ctx,
     ctx->ofproto = ofproto;
     ctx->flow = *flow;
     ctx->base_flow = ctx->flow;
-    ctx->base_flow.tun_key.ipv4_src = 0;
+    ctx->base_flow.tun_key.ipv4_src = htonl(0);
     ctx->base_flow.vlan_tci = initial_tci;
     ctx->rule = rule;
     ctx->packet = packet;
-- 
1.7.10.2.484.gcd07cc5


* [PATCH 15/21] datapath: Remove mlink element from tnl_mutable_config
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (10 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 14/21] ofproto: Set flow tun_key in compose_output_action() Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  2012-05-24  9:09   ` [PATCH 18/21] dataptah: remove ttl and tos " Simon Horman
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA

Multicast may be handled in user-space (but isn't yet).

Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
 datapath/tunnel.c | 22 ----------------------
 datapath/tunnel.h |  3 ---
 2 files changed, 25 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index f07ec69..cdcb0a7 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -162,21 +162,6 @@ static void free_cache_rcu(struct rcu_head *rcu)
 	free_cache(c);
 }
 
-/* Frees the portion of 'mutable' that requires RTNL and thus can't happen
- * within an RCU callback.  Fortunately this part doesn't require waiting for
- * an RCU grace period.
- */
-static void free_mutable_rtnl(struct tnl_mutable_config *mutable)
-{
-	ASSERT_RTNL();
-	if (ipv4_is_multicast(mutable->key.daddr) && mutable->mlink) {
-		struct in_device *in_dev;
-		in_dev = inetdev_by_index(port_key_get_net(&mutable->key), mutable->mlink);
-		if (in_dev)
-			ip_mc_dec_group(in_dev, mutable->key.daddr);
-	}
-}
-
 static void assign_config_rcu(struct vport *vport,
 			      struct tnl_mutable_config *new_config)
 {
@@ -186,7 +171,6 @@ static void assign_config_rcu(struct vport *vport,
 	old_config = rtnl_dereference(tnl_vport->mutable);
 	rcu_assign_pointer(tnl_vport->mutable, new_config);
 
-	free_mutable_rtnl(old_config);
 	call_rcu(&old_config->rcu, free_config_rcu);
 }
 
@@ -1391,8 +1375,6 @@ static int tnl_set_config(struct net *net,
 	if (old_vport && old_vport != cur_vport)
 		return -EEXIST;
 
-	mutable->mlink = 0;
-
 	return 0;
 }
 
@@ -1445,7 +1427,6 @@ struct vport *ovs_tnl_create(const struct vport_parms *parms,
 	return vport;
 
 error_free_mutable:
-	free_mutable_rtnl(mutable);
 	kfree(mutable);
 error_free_vport:
 	ovs_vport_free(vport);
@@ -1470,7 +1451,6 @@ void ovs_tnl_destroy(struct vport *vport)
 
 	mutable = rtnl_dereference(tnl_vport->mutable);
 	port_table_remove_port(vport);
-	free_mutable_rtnl(mutable);
 	call_rcu(&tnl_vport->rcu, free_port_rcu);
 }
 
@@ -1484,8 +1464,6 @@ int ovs_tnl_set_addr(struct vport *vport, const unsigned char *addr)
 	if (!mutable)
 		return -ENOMEM;
 
-	old_mutable->mlink = 0;
-
 	memcpy(mutable->eth_addr, addr, ETH_ALEN);
 	assign_config_rcu(vport, mutable);
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 7d78297..0af27ac 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -117,9 +117,6 @@ struct tnl_mutable_config {
 	u32	flags;
 	u8	tos;
 	u8	ttl;
-
-	/* Multicast configuration. */
-	int	mlink;
 };
 
 struct tnl_ops {
-- 
1.7.10.2.484.gcd07cc5


* [PATCH 16/21] datapath: remove tunnel cache
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
                   ` (3 preceding siblings ...)
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-05-24  9:09 ` Simon Horman
  2012-05-24  9:09 ` [PATCH 17/21] datapath: Always use tun_key addresses for route lookup Simon Horman
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

As tundevs no longer have a daddr, the cache can no longer be built in this
way. Furthermore, it's not clear to me what the value of keeping the cache is
in the context of moving towards allowing use of in-tree tunnelling.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
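To illustrate the first point, here is a toy model of a one-entry header
cache hung off a tunnel port. It is a deliberately simplified assumption
and not the tnl_cache code removed below: the names are hypothetical and the
only thing modelled is that a cached header is tied to one destination
address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-entry header cache attached to a tunnel port. */
struct hdr_cache_sketch {
    int valid;
    uint32_t daddr;         /* Destination the cached header was built for. */
    unsigned int rebuilds;  /* Number of times the header was rebuilt. */
};

/* Returns 1 on a cache hit and 0 when the header had to be rebuilt. */
static int
hdr_cache_lookup_sketch(struct hdr_cache_sketch *cache, uint32_t daddr)
{
    if (cache->valid && cache->daddr == daddr) {
        return 1;
    }
    cache->valid = 1;
    cache->daddr = daddr;
    cache->rebuilds++;
    return 0;
}

/* Sends 'n' packets to the given destinations through one port and returns
 * how many times the cached header had to be rebuilt. */
unsigned int
hdr_cache_demo_sketch(const uint32_t *dsts, unsigned int n)
{
    struct hdr_cache_sketch cache = { 0, 0, 0 };
    unsigned int i;

    for (i = 0; i < n; i++) {
        hdr_cache_lookup_sketch(&cache, dsts[i]);
    }
    return cache.rebuilds;
}
```

In this model a port with a fixed remote address rebuilds the header once,
while a port whose destination comes from each flow's tun_key rebuilds it
whenever consecutive packets go to different remotes.
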
 datapath/tunnel.c | 384 +++---------------------------------------------------
 datapath/tunnel.h |  52 --------
 2 files changed, 20 insertions(+), 416 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index cdcb0a7..b997cb8 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -52,43 +52,9 @@
 #include "vport-generic.h"
 #include "vport-internal_dev.h"
 
-#ifdef NEED_CACHE_TIMEOUT
-/*
- * On kernels where we can't quickly detect changes in the rest of the system
- * we use an expiration time to invalidate the cache.  A shorter expiration
- * reduces the length of time that we may potentially blackhole packets while
- * a longer time increases performance by reducing the frequency that the
- * cache needs to be rebuilt.  A variety of factors may cause the cache to be
- * invalidated before the expiration time but this is the maximum.  The time
- * is expressed in jiffies.
- */
-#define MAX_CACHE_EXP HZ
-#endif
-
-/*
- * Interval to check for and remove caches that are no longer valid.  Caches
- * are checked for validity before they are used for packet encapsulation and
- * old caches are removed at that time.  However, if no packets are sent through
- * the tunnel then the cache will never be destroyed.  Since it holds
- * references to a number of system objects, the cache will continue to use
- * system resources by not allowing those objects to be destroyed.  The cache
- * cleaner is periodically run to free invalid caches.  It does not
- * significantly affect system performance.  A lower interval will release
- * resources faster but will itself consume resources by requiring more frequent
- * checks.  A longer interval may result in messages being printed to the kernel
- * message buffer about unreleased resources.  The interval is expressed in
- * jiffies.
- */
-#define CACHE_CLEANER_INTERVAL (5 * HZ)
-
-#define CACHE_DATA_ALIGN 16
 #define PORT_TABLE_SIZE  1024
 
 static struct hlist_head *port_table __read_mostly;
-static int port_table_count;
-
-static void cache_cleaner(struct work_struct *work);
-static DECLARE_DELAYED_WORK(cache_cleaner_wq, cache_cleaner);
 
 /*
  * These are just used as an optimization: they don't require any kind of
@@ -108,60 +74,17 @@ static unsigned int multicast_ports __read_mostly;
 #define rt_dst(rt) (rt->u.dst)
 #endif
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,1,0)
-static struct hh_cache *rt_hh(struct rtable *rt)
-{
-	struct neighbour *neigh = dst_get_neighbour_noref(&rt->dst);
-	if (!neigh || !(neigh->nud_state & NUD_CONNECTED) ||
-			!neigh->hh.hh_len)
-		return NULL;
-	return &neigh->hh;
-}
-#else
-#define rt_hh(rt) (rt_dst(rt).hh)
-#endif
-
 static struct vport *tnl_vport_to_vport(const struct tnl_vport *tnl_vport)
 {
 	return vport_from_priv(tnl_vport);
 }
 
-/* This is analogous to rtnl_dereference for the tunnel cache.  It checks that
- * cache_lock is held, so it is only for update side code.
- */
-static struct tnl_cache *cache_dereference(struct tnl_vport *tnl_vport)
-{
-	return rcu_dereference_protected(tnl_vport->cache,
-				 lockdep_is_held(&tnl_vport->cache_lock));
-}
-
-static void schedule_cache_cleaner(void)
-{
-	schedule_delayed_work(&cache_cleaner_wq, CACHE_CLEANER_INTERVAL);
-}
-
-static void free_cache(struct tnl_cache *cache)
-{
-	if (!cache)
-		return;
-
-	ovs_flow_put(cache->flow);
-	ip_rt_put(cache->rt);
-	kfree(cache);
-}
-
 static void free_config_rcu(struct rcu_head *rcu)
 {
 	struct tnl_mutable_config *c = container_of(rcu, struct tnl_mutable_config, rcu);
 	kfree(c);
 }
 
-static void free_cache_rcu(struct rcu_head *rcu)
-{
-	struct tnl_cache *c = container_of(rcu, struct tnl_cache, rcu);
-	free_cache(c);
-}
-
 static void assign_config_rcu(struct vport *vport,
 			      struct tnl_mutable_config *new_config)
 {
@@ -174,18 +97,6 @@ static void assign_config_rcu(struct vport *vport,
 	call_rcu(&old_config->rcu, free_config_rcu);
 }
 
-static void assign_cache_rcu(struct vport *vport, struct tnl_cache *new_cache)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *old_cache;
-
-	old_cache = cache_dereference(tnl_vport);
-	rcu_assign_pointer(tnl_vport->cache, new_cache);
-
-	if (old_cache)
-		call_rcu(&old_cache->rcu, free_cache_rcu);
-}
-
 static unsigned int *find_port_pool(const struct tnl_mutable_config *mutable)
 {
 	bool is_multicast = ipv4_is_multicast(mutable->key.daddr);
@@ -223,13 +134,9 @@ static void port_table_add_port(struct vport *vport)
 	const struct tnl_mutable_config *mutable;
 	u32 hash;
 
-	if (port_table_count == 0)
-		schedule_cache_cleaner();
-
 	mutable = rtnl_dereference(tnl_vport->mutable);
 	hash = port_hash(&mutable->key);
 	hlist_add_head_rcu(&tnl_vport->hash_node, find_bucket(hash));
-	port_table_count++;
 
 	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
 }
@@ -240,10 +147,6 @@ static void port_table_remove_port(struct vport *vport)
 
 	hlist_del_init_rcu(&tnl_vport->hash_node);
 
-	port_table_count--;
-	if (port_table_count == 0)
-		cancel_delayed_work_sync(&cache_cleaner_wq);
-
 	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))--;
 }
 
@@ -780,11 +683,6 @@ static void create_tunnel_header(const struct vport *vport,
 	tnl_vport->tnl_ops->build_header(vport, mutable, iph + 1);
 }
 
-static void *get_cached_header(const struct tnl_cache *cache)
-{
-	return (void *)cache + ALIGN(sizeof(struct tnl_cache), CACHE_DATA_ALIGN);
-}
-
 #ifdef HAVE_RT_GENID
 static inline int rt_genid(struct net *net)
 {
@@ -792,184 +690,6 @@ static inline int rt_genid(struct net *net)
 }
 #endif
 
-static bool check_cache_valid(const struct tnl_cache *cache,
-			      const struct tnl_mutable_config *mutable)
-{
-	struct hh_cache *hh;
-
-	if (!cache)
-		return false;
-
-	hh = rt_hh(cache->rt);
-	return hh &&
-#ifdef NEED_CACHE_TIMEOUT
-		time_before(jiffies, cache->expiration) &&
-#endif
-#ifdef HAVE_RT_GENID
-		rt_genid(dev_net(rt_dst(cache->rt).dev)) == cache->rt->rt_genid &&
-#endif
-#ifdef HAVE_HH_SEQ
-		hh->hh_lock.sequence == cache->hh_seq &&
-#endif
-		mutable->seq == cache->mutable_seq &&
-		(!ovs_is_internal_dev(rt_dst(cache->rt).dev) ||
-		(cache->flow && !cache->flow->dead));
-}
-
-static void __cache_cleaner(struct tnl_vport *tnl_vport)
-{
-	const struct tnl_mutable_config *mutable =
-			rcu_dereference(tnl_vport->mutable);
-	const struct tnl_cache *cache = rcu_dereference(tnl_vport->cache);
-
-	if (cache && !check_cache_valid(cache, mutable) &&
-	    spin_trylock_bh(&tnl_vport->cache_lock)) {
-		assign_cache_rcu(tnl_vport_to_vport(tnl_vport), NULL);
-		spin_unlock_bh(&tnl_vport->cache_lock);
-	}
-}
-
-static void cache_cleaner(struct work_struct *work)
-{
-	int i;
-
-	schedule_cache_cleaner();
-
-	rcu_read_lock();
-	for (i = 0; i < PORT_TABLE_SIZE; i++) {
-		struct hlist_node *n;
-		struct hlist_head *bucket;
-		struct tnl_vport *tnl_vport;
-
-		bucket = &port_table[i];
-		hlist_for_each_entry_rcu(tnl_vport, n, bucket, hash_node)
-			__cache_cleaner(tnl_vport);
-	}
-	rcu_read_unlock();
-}
-
-static void create_eth_hdr(struct tnl_cache *cache, struct hh_cache *hh)
-{
-	void *cache_data = get_cached_header(cache);
-	int hh_off;
-
-#ifdef HAVE_HH_SEQ
-	unsigned hh_seq;
-
-	do {
-		hh_seq = read_seqbegin(&hh->hh_lock);
-		hh_off = HH_DATA_ALIGN(hh->hh_len) - hh->hh_len;
-		memcpy(cache_data, (void *)hh->hh_data + hh_off, hh->hh_len);
-		cache->hh_len = hh->hh_len;
-	} while (read_seqretry(&hh->hh_lock, hh_seq));
-
-	cache->hh_seq = hh_seq;
-#else
-	read_lock(&hh->hh_lock);
-	hh_off = HH_DATA_ALIGN(hh->hh_len) - hh->hh_len;
-	memcpy(cache_data, (void *)hh->hh_data + hh_off, hh->hh_len);
-	cache->hh_len = hh->hh_len;
-	read_unlock(&hh->hh_lock);
-#endif
-}
-
-static struct tnl_cache *build_cache(struct vport *vport,
-				     const struct tnl_mutable_config *mutable,
-				     struct rtable *rt)
-{
-	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *cache;
-	void *cache_data;
-	int cache_len;
-	struct hh_cache *hh;
-
-	if (!(mutable->flags & TNL_F_HDR_CACHE))
-		return NULL;
-
-	/*
-	 * If there is no entry in the ARP cache or if this device does not
-	 * support hard header caching just fall back to the IP stack.
-	 */
-
-	hh = rt_hh(rt);
-	if (!hh)
-		return NULL;
-
-	/*
-	 * If lock is contended fall back to directly building the header.
-	 * We're not going to help performance by sitting here spinning.
-	 */
-	if (!spin_trylock(&tnl_vport->cache_lock))
-		return NULL;
-
-	cache = cache_dereference(tnl_vport);
-	if (check_cache_valid(cache, mutable))
-		goto unlock;
-	else
-		cache = NULL;
-
-	cache_len = LL_RESERVED_SPACE(rt_dst(rt).dev) + mutable->tunnel_hlen;
-
-	cache = kzalloc(ALIGN(sizeof(struct tnl_cache), CACHE_DATA_ALIGN) +
-			cache_len, GFP_ATOMIC);
-	if (!cache)
-		goto unlock;
-
-	create_eth_hdr(cache, hh);
-	cache_data = get_cached_header(cache) + cache->hh_len;
-	cache->len = cache->hh_len + mutable->tunnel_hlen;
-
-	create_tunnel_header(vport, mutable, rt, cache_data);
-
-	cache->mutable_seq = mutable->seq;
-	cache->rt = rt;
-#ifdef NEED_CACHE_TIMEOUT
-	cache->expiration = jiffies + tnl_vport->cache_exp_interval;
-#endif
-
-	if (ovs_is_internal_dev(rt_dst(rt).dev)) {
-		struct sw_flow_key flow_key;
-		struct vport *dst_vport;
-		struct sk_buff *skb;
-		int err;
-		int flow_key_len;
-		struct sw_flow *flow;
-
-		dst_vport = ovs_internal_dev_get_vport(rt_dst(rt).dev);
-		if (!dst_vport)
-			goto done;
-
-		skb = alloc_skb(cache->len, GFP_ATOMIC);
-		if (!skb)
-			goto done;
-
-		__skb_put(skb, cache->len);
-		memcpy(skb->data, get_cached_header(cache), cache->len);
-
-		err = ovs_flow_extract(skb, dst_vport->port_no, &flow_key,
-				       &flow_key_len);
-
-		consume_skb(skb);
-		if (err)
-			goto done;
-
-		flow = ovs_flow_tbl_lookup(rcu_dereference(dst_vport->dp->table),
-					   &flow_key, flow_key_len);
-		if (flow) {
-			cache->flow = flow;
-			ovs_flow_hold(flow);
-		}
-	}
-
-done:
-	assign_cache_rcu(vport, cache);
-
-unlock:
-	spin_unlock(&tnl_vport->cache_lock);
-
-	return cache;
-}
-
 static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 				   u8 ipproto, __be32 daddr, __be32 saddr,
 				   u8 tos)
@@ -1001,33 +721,19 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
 
 static struct rtable *find_route(struct vport *vport,
 				 const struct tnl_mutable_config *mutable,
-				 u8 tos, __be32 daddr, __be32 saddr,
-				 struct tnl_cache **cache)
+				 u8 tos, __be32 daddr, __be32 saddr)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct tnl_cache *cur_cache = rcu_dereference(tnl_vport->cache);
+	struct rtable *rt;
 
-	*cache = NULL;
 	tos = RT_TOS(tos);
 
-	if (daddr == mutable->key.daddr && saddr == mutable->key.saddr &&
-	    tos == RT_TOS(mutable->tos) &&
-	    check_cache_valid(cur_cache, mutable)) {
-		*cache = cur_cache;
-		return cur_cache->rt;
-	} else {
-		struct rtable *rt;
-
-		rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
-				  daddr, saddr, tos);
-		if (IS_ERR(rt))
-			return NULL;
-
-		if (likely(tos == RT_TOS(mutable->tos)))
-			*cache = build_cache(vport, mutable, rt);
+	rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
+			  daddr, saddr, tos);
+	if (IS_ERR(rt))
+		return NULL;
 
-		return rt;
-	}
+	return rt;
 }
 
 static bool need_linearize(const struct sk_buff *skb)
@@ -1152,7 +858,6 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	enum vport_err_type err = VPORT_E_TX_ERROR;
 	struct rtable *rt;
 	struct dst_entry *unattached_dst = NULL;
-	struct tnl_cache *cache;
 	int sent_len = 0;
 	__be16 frag_off = 0;
 	__be32 daddr;
@@ -1210,11 +915,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	}
 
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, daddr, saddr, &cache);
+	rt = find_route(vport, mutable, tos, daddr, saddr);
 	if (unlikely(!rt))
 		goto error_free;
-	if (unlikely(!cache))
-		unattached_dst = &rt_dst(rt);
+	unattached_dst = &rt_dst(rt);
 
 	tos = INET_ECN_encapsulate(tos, inner_tos);
 
@@ -1239,11 +943,9 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	 * If we are over the MTU, allow the IP stack to handle fragmentation.
 	 * Fragmentation is a slow path anyways.
 	 */
-	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)) &&
-		     cache)) {
+	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)))) {
 		unattached_dst = &rt_dst(rt);
 		dst_hold(unattached_dst);
-		cache = NULL;
 	}
 
 	/* TTL */
@@ -1270,23 +972,15 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(vlan_deaccel_tag(skb)))
 			goto next;
 
-		if (likely(cache)) {
-			skb_push(skb, cache->len);
-			memcpy(skb->data, get_cached_header(cache), cache->len);
-			skb_reset_mac_header(skb);
-			skb_set_network_header(skb, cache->hh_len);
-
-		} else {
-			skb_push(skb, mutable->tunnel_hlen);
-			create_tunnel_header(vport, mutable, rt, skb->data);
-			skb_reset_network_header(skb);
-
-			if (next_skb)
-				skb_dst_set(skb, dst_clone(unattached_dst));
-			else {
-				skb_dst_set(skb, unattached_dst);
-				unattached_dst = NULL;
-			}
+		skb_push(skb, mutable->tunnel_hlen);
+		create_tunnel_header(vport, mutable, rt, skb->data);
+		skb_reset_network_header(skb);
+
+		if (next_skb)
+			skb_dst_set(skb, dst_clone(unattached_dst));
+		else {
+			skb_dst_set(skb, unattached_dst);
+			unattached_dst = NULL;
 		}
 		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct iphdr));
 
@@ -1301,37 +995,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(!skb))
 			goto next;
 
-		if (likely(cache)) {
-			int orig_len = skb->len - cache->len;
-			struct vport *cache_vport;
-
-			cache_vport = ovs_internal_dev_get_vport(rt_dst(rt).dev);
-			skb->protocol = htons(ETH_P_IP);
-			iph = ip_hdr(skb);
-			iph->tot_len = htons(skb->len - skb_network_offset(skb));
-			ip_send_check(iph);
-
-			if (cache_vport) {
-				if (unlikely(compute_ip_summed(skb, true))) {
-					kfree_skb(skb);
-					goto next;
-				}
-
-				OVS_CB(skb)->flow = cache->flow;
-				ovs_vport_receive(cache_vport, skb);
-				sent_len += orig_len;
-			} else {
-				int xmit_err;
-
-				skb->dev = rt_dst(rt).dev;
-				xmit_err = dev_queue_xmit(skb);
-
-				if (likely(net_xmit_eval(xmit_err) == 0))
-					sent_len += orig_len;
-			}
-		} else
-			sent_len += send_frags(skb, mutable);
-
+		sent_len += send_frags(skb, mutable);
 next:
 		skb = next_skb;
 	}
@@ -1414,13 +1078,6 @@ struct vport *ovs_tnl_create(const struct vport_parms *parms,
 	if (err)
 		goto error_free_mutable;
 
-	spin_lock_init(&tnl_vport->cache_lock);
-
-#ifdef NEED_CACHE_TIMEOUT
-	tnl_vport->cache_exp_interval = MAX_CACHE_EXP -
-				       (net_random() % (MAX_CACHE_EXP / 2));
-#endif
-
 	rcu_assign_pointer(tnl_vport->mutable, mutable);
 
 	port_table_add_port(vport);
@@ -1439,7 +1096,6 @@ static void free_port_rcu(struct rcu_head *rcu)
 	struct tnl_vport *tnl_vport = container_of(rcu,
 						   struct tnl_vport, rcu);
 
-	free_cache((struct tnl_cache __force *)tnl_vport->cache);
 	kfree((struct tnl_mutable __force *)tnl_vport->mutable);
 	ovs_vport_free(tnl_vport_to_vport(tnl_vport));
 }
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 0af27ac..ed3b4ec 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -172,58 +172,6 @@ struct tnl_ops {
 /* If we can't detect all system changes directly we need to use a timeout. */
 #define NEED_CACHE_TIMEOUT
 #endif
-struct tnl_cache {
-	struct rcu_head rcu;
-
-	int len;		/* Length of data to be memcpy'd from cache. */
-	int hh_len;		/* Hardware hdr length, cached from hh_cache. */
-
-	/* Sequence number of mutable->seq from which this cache was
-	 * generated. */
-	unsigned mutable_seq;
-
-#ifdef HAVE_HH_SEQ
-	/*
-	 * The sequence number from the seqlock protecting the hardware header
-	 * cache (in the ARP cache).  Since every write increments the counter
-	 * this gives us an easy way to tell if it has changed.
-	 */
-	unsigned hh_seq;
-#endif
-
-#ifdef NEED_CACHE_TIMEOUT
-	/*
-	 * If we don't have direct mechanisms to detect all important changes in
-	 * the system fall back to an expiration time.  This expiration time
-	 * can be relatively short since at high rates there will be millions of
-	 * packets per second, so we'll still get plenty of benefit from the
-	 * cache.  Note that if something changes we may blackhole packets
-	 * until the expiration time (depending on what changed and the kernel
-	 * version we may be able to detect the change sooner).  Expiration is
-	 * expressed as a time in jiffies.
-	 */
-	unsigned long expiration;
-#endif
-
-	/*
-	 * The routing table entry that is the result of looking up the tunnel
-	 * endpoints.  It also contains a sequence number (called a generation
-	 * ID) that can be compared to a global sequence to tell if the routing
-	 * table has changed (and therefore there is a potential that this
-	 * cached route has been invalidated).
-	 */
-	struct rtable *rt;
-
-	/*
-	 * If the output device for tunnel traffic is an OVS internal device,
-	 * the flow of that datapath.  Since all tunnel traffic will have the
-	 * same headers this allows us to cache the flow lookup.  NULL if the
-	 * output device is not OVS or if there is no flow installed.
-	 */
-	struct sw_flow *flow;
-
-	/* The cached header follows after padding for alignment. */
-};
 
 struct tnl_vport {
 	struct rcu_head rcu;
-- 
1.7.10.2.484.gcd07cc5

* [PATCH 17/21] datapath: Always use tun_key addresses for route lookup
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
                   ` (4 preceding siblings ...)
  2012-05-24  9:09 ` [PATCH 16/21] datapath: remove tunnel cache Simon Horman
@ 2012-05-24  9:09 ` Simon Horman
  2012-05-24  9:09 ` [PATCH 19/21] datapath: Simplify vport lookup Simon Horman
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

The tun_key should always be present and correct.
The mutable configuration (tnl_mutable_config) no longer stores
correct address information, and its saddr and daddr fields
will be removed.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)
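
A minimal user-space sketch of the lookup inputs after this change, for
readers following the series outside a kernel tree. The names
tun_key_model, route_query_model, MODEL_RT_TOS and build_route_query are
illustrative assumptions and do not appear in the patch. Kernel types are
replaced by stdint types, and the real code passes a struct flowi4 (or
struct flowi on older kernels) to ip_route_output_key().

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_model {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical stand-in for the flow fields the lookup fills in. */
struct route_query_model {
	uint32_t daddr;
	uint32_t saddr;
	uint8_t  tos;
	uint8_t  proto;
};

/* Same mask as the kernel's RT_TOS(), i.e. IPTOS_TOS_MASK (0x1E). */
#define MODEL_RT_TOS(tos) ((uint8_t)((tos) & 0x1E))

/*
 * Fill the route query from the per-packet tunnel key only.
 * Returns 0 on success, or -1 when no key is attached; there is no
 * fallback to per-port addresses any more.
 */
static int build_route_query(const struct tun_key_model *tun_key,
			     uint8_t ipproto, uint8_t tos,
			     struct route_query_model *q)
{
	if (!tun_key)
		return -1;

	q->daddr = tun_key->ipv4_dst;
	q->saddr = tun_key->ipv4_src;
	q->tos   = MODEL_RT_TOS(tos);
	q->proto = ipproto;
	return 0;
}
```

The -1 return corresponds to the early "goto error_free" that this patch
adds to ovs_tnl_send() when OVS_CB(skb)->tun_key is NULL.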

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index b997cb8..ba18055 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -690,46 +690,44 @@ static inline int rt_genid(struct net *net)
 }
 #endif
 
-static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
-				   u8 ipproto, __be32 daddr, __be32 saddr,
-				   u8 tos)
+static struct rtable *__find_route(struct net *net, u8 ipproto,
+				   struct ovs_key_ipv4_tunnel *tun_key, u8 tos)
 {
 	/* Tunnel configuration keeps DSCP part of TOS bits, But Linux
 	 * router expect RT_TOS bits only. */
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,39)
 	struct flowi fl = { .nl_u = { .ip4_u = {
-					.daddr = daddr,
-					.saddr = saddr,
+					.daddr = tun_key->ipv4_dst,
+					.saddr = tun_key->ipv4_src,
 					.tos   = RT_TOS(tos) } },
 					.proto = ipproto };
 	struct rtable *rt;
 
-	if (unlikely(ip_route_output_key(port_key_get_net(&mutable->key), &rt, &fl)))
+	if (unlikely(ip_route_output_key(net, &rt, &fl)))
 		return ERR_PTR(-EADDRNOTAVAIL);
 
 	return rt;
 #else
-	struct flowi4 fl = { .daddr = daddr,
-			     .saddr = saddr,
+	struct flowi4 fl = { .daddr = tun_key->ipv4_dst,
+			     .saddr = tun_key->ipv4_src,
 			     .flowi4_tos = RT_TOS(tos),
 			     .flowi4_proto = ipproto };
 
-	return ip_route_output_key(port_key_get_net(&mutable->key), &fl);
+	return ip_route_output_key(net, &fl);
 #endif
 }
 
-static struct rtable *find_route(struct vport *vport,
-				 const struct tnl_mutable_config *mutable,
-				 u8 tos, __be32 daddr, __be32 saddr)
+static struct rtable *find_route(struct vport *vport, struct net *net,
+				 struct ovs_key_ipv4_tunnel *tun_key, u8 tos)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 	struct rtable *rt;
 
 	tos = RT_TOS(tos);
 
-	rt = __find_route(mutable, tnl_vport->tnl_ops->ipproto,
-			  daddr, saddr, tos);
+	rt = __find_route(net, tnl_vport->tnl_ops->ipproto,
+			  tun_key, tos);
 	if (IS_ERR(rt))
 		return NULL;
 
@@ -860,12 +858,13 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct dst_entry *unattached_dst = NULL;
 	int sent_len = 0;
 	__be16 frag_off = 0;
-	__be32 daddr;
-	__be32 saddr;
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
 
+	if (!OVS_CB(skb)->tun_key)
+		goto error_free;
+
 	/* Validate the protocol headers before we try to use them. */
 	if (skb->protocol == htons(ETH_P_8021Q) &&
 	    !vlan_tx_tag_present(skb)) {
@@ -906,16 +905,9 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	else
 		tos = mutable->tos;
 
-	if (OVS_CB(skb)->tun_key) {
-		daddr = OVS_CB(skb)->tun_key->ipv4_dst;
-		saddr = OVS_CB(skb)->tun_key->ipv4_src;
-	} else {
-		daddr = mutable->key.daddr;
-		saddr = mutable->key.saddr;
-	}
-
 	/* Route lookup */
-	rt = find_route(vport, mutable, tos, daddr, saddr);
+	rt = find_route(vport, port_key_get_net(&mutable->key),
+			OVS_CB(skb)->tun_key, tos);
 	if (unlikely(!rt))
 		goto error_free;
 	unattached_dst = &rt_dst(rt);
-- 
1.7.10.2.484.gcd07cc5

* [PATCH 18/21] datapath: remove ttl and tos from tnl_mutable_config
       [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
                     ` (11 preceding siblings ...)
  2012-05-24  9:09   ` [PATCH 15/21] datapath: Remove mlink element from tnl_mutable_config Simon Horman
@ 2012-05-24  9:09   ` Simon Horman
  12 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev

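tun_key should always be present and correct in ovs_tnl_send(),
so the tos and ttl values are now taken from it and the
corresponding fields of tnl_mutable_config are removed.

It ought to be possible to handle the ttl entirely
in user-space. This is not implemented yet. However,
TNL_F_TOS_INHERIT is currently never set.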

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 10 ++--------
 datapath/tunnel.h |  4 ----
 2 files changed, 2 insertions(+), 12 deletions(-)
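
The selection logic that remains in ovs_tnl_send() after this patch can
be sketched in user space roughly as follows. This is only a model:
MODEL_F_TOS_INHERIT, select_tos and select_ttl are hypothetical names,
the flag's bit value is made up, and the TNL_F_TTL_INHERIT handling that
follows in the real function is left out because its body is not part of
this diff.

```c
#include <stdint.h>

/* Hypothetical bit; the real TNL_F_TOS_INHERIT value is not shown here. */
#define MODEL_F_TOS_INHERIT (1u << 0)

/*
 * TOS: inherit from the inner packet when the port asks for it,
 * otherwise take the value carried in the tunnel key.
 */
static uint8_t select_tos(uint32_t port_flags, uint8_t inner_tos,
			  uint8_t key_tos)
{
	return (port_flags & MODEL_F_TOS_INHERIT) ? inner_tos : key_tos;
}

/*
 * TTL: take the value from the tunnel key; zero means "not set", in
 * which case the route's default hop limit is used (the kernel code
 * calls ip4_dst_hoplimit() for this).
 */
static uint8_t select_ttl(uint8_t key_ttl, uint8_t route_hoplimit)
{
	return key_ttl ? key_ttl : route_hoplimit;
}
```

In both helpers the per-port configuration no longer supplies a fixed
value, which is why the tos and ttl members can go away.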

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index ba18055..39aa2af 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -900,10 +900,8 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	if (mutable->flags & TNL_F_TOS_INHERIT)
 		tos = inner_tos;
-	else if (OVS_CB(skb)->tun_key)
-		tos = OVS_CB(skb)->tun_key->ipv4_tos;
 	else
-		tos = mutable->tos;
+		tos = OVS_CB(skb)->tun_key->ipv4_tos;
 
 	/* Route lookup */
 	rt = find_route(vport, port_key_get_net(&mutable->key),
@@ -940,11 +938,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		dst_hold(unattached_dst);
 	}
 
-	/* TTL */
-	if (OVS_CB(skb)->tun_key)
-		ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
-	else
-		ttl = mutable->ttl;
+	ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
 	if (!ttl)
 		ttl = ip4_dst_hoplimit(&rt_dst(rt));
 	if (mutable->flags & TNL_F_TTL_INHERIT) {
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index ed3b4ec..330df27 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -99,8 +99,6 @@ static inline void port_key_set_net(struct port_lookup_key *key, struct net *net
  * (e.g. ICMP fragmentation needed messages).
  * @out_key: Key to use on output, 0 if this tunnel has no fixed output key.
  * @flags: TNL_F_* flags.
- * @tos: IPv4 TOS value to use for tunnel, 0 if no fixed TOS.
- * @ttl: IPv4 TTL value to use for tunnel, 0 if no fixed TTL.
  */
 struct tnl_mutable_config {
 	struct port_lookup_key key;
@@ -115,8 +113,6 @@ struct tnl_mutable_config {
 	/* Configured via OVS_TUNNEL_ATTR_* attributes. */
 	__be64	out_key;
 	u32	flags;
-	u8	tos;
-	u8	ttl;
 };
 
 struct tnl_ops {
-- 
1.7.10.2.484.gcd07cc5

* [PATCH 19/21] datapath: Simplify vport lookup
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
                   ` (5 preceding siblings ...)
  2012-05-24  9:09 ` [PATCH 17/21] datapath: Always use tun_key addresses for route lookup Simon Horman
@ 2012-05-24  9:09 ` Simon Horman
  2012-05-24  9:09 ` [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit Simon Horman
  2012-05-24  9:09 ` [PATCH 21/21] datapath: Always use tun_key flags Simon Horman
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

The lookup is now only based on the net and tunnel type.
It should be possible to either get rid of the lookup altogether
or push it into the GRE and CAPWAP implementations, but this
change is simpler for now.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c       | 110 +++---------------------------------------------
 datapath/tunnel.h       |  18 ++------
 datapath/vport-capwap.c |   7 +--
 datapath/vport-gre.c    |  10 ++---
 4 files changed, 16 insertions(+), 129 deletions(-)
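
As a rough illustration of what the reduced key means, here is a
user-space sketch of a lookup keyed only on namespace and tunnel type.
lookup_key_model, port_model and find_port are hypothetical names, and a
linear scan stands in for the RCU-protected hash table; the two protocol
values match TNL_T_PROTO_GRE and TNL_T_PROTO_CAPWAP in tunnel.h.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	MODEL_PROTO_GRE    = 0,	/* TNL_T_PROTO_GRE */
	MODEL_PROTO_CAPWAP = 1	/* TNL_T_PROTO_CAPWAP */
};

/* Model of the reduced struct port_lookup_key: no in_key, saddr, daddr. */
struct lookup_key_model {
	const void *net;	/* stands in for struct net * */
	uint32_t    tunnel_type;
};

struct port_model {
	struct lookup_key_model key;
	int port_no;
};

/* Return the port registered for (net, tunnel_type), or NULL if none. */
static const struct port_model *find_port(const struct port_model *ports,
					  size_t n_ports, const void *net,
					  uint32_t tunnel_type)
{
	size_t i;

	for (i = 0; i < n_ports; i++) {
		if (ports[i].key.net == net &&
		    ports[i].key.tunnel_type == tunnel_type)
			return &ports[i];
	}
	return NULL;
}
```

One consequence, visible in tnl_set_config() returning -EEXIST, is that
at most one tunnel vport of a given type can exist per namespace; the
remote address and key now travel with each packet in tun_key instead.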

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 39aa2af..a303d8d 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -56,18 +56,6 @@
 
 static struct hlist_head *port_table __read_mostly;
 
-/*
- * These are just used as an optimization: they don't require any kind of
- * synchronization because we could have just as easily read the value before
- * the port change happened.
- */
-static unsigned int key_local_remote_ports __read_mostly;
-static unsigned int key_remote_ports __read_mostly;
-static unsigned int key_multicast_ports __read_mostly;
-static unsigned int local_remote_ports __read_mostly;
-static unsigned int remote_ports __read_mostly;
-static unsigned int multicast_ports __read_mostly;
-
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,36)
 #define rt_dst(rt) (rt->dst)
 #else
@@ -97,27 +85,6 @@ static void assign_config_rcu(struct vport *vport,
 	call_rcu(&old_config->rcu, free_config_rcu);
 }
 
-static unsigned int *find_port_pool(const struct tnl_mutable_config *mutable)
-{
-	bool is_multicast = ipv4_is_multicast(mutable->key.daddr);
-
-	if (mutable->flags & TNL_F_IN_KEY_MATCH) {
-		if (mutable->key.saddr)
-			return &local_remote_ports;
-		else if (is_multicast)
-			return &multicast_ports;
-		else
-			return &remote_ports;
-	} else {
-		if (mutable->key.saddr)
-			return &key_local_remote_ports;
-		else if (is_multicast)
-			return &key_multicast_ports;
-		else
-			return &key_remote_ports;
-	}
-}
-
 static u32 port_hash(const struct port_lookup_key *key)
 {
 	return jhash2((u32 *)key, (PORT_KEY_LEN / sizeof(u32)), 0);
@@ -137,8 +104,6 @@ static void port_table_add_port(struct vport *vport)
 	mutable = rtnl_dereference(tnl_vport->mutable);
 	hash = port_hash(&mutable->key);
 	hlist_add_head_rcu(&tnl_vport->hash_node, find_bucket(hash));
-
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))++;
 }
 
 static void port_table_remove_port(struct vport *vport)
@@ -146,12 +111,9 @@ static void port_table_remove_port(struct vport *vport)
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
 
 	hlist_del_init_rcu(&tnl_vport->hash_node);
-
-	(*find_port_pool(rtnl_dereference(tnl_vport->mutable)))--;
 }
 
-static struct vport *port_table_lookup(struct port_lookup_key *key,
-				       const struct tnl_mutable_config **pmutable)
+static struct vport *port_table_lookup(struct port_lookup_key *key)
 {
 	struct hlist_node *n;
 	struct hlist_head *bucket;
@@ -164,79 +126,21 @@ static struct vport *port_table_lookup(struct port_lookup_key *key,
 		struct tnl_mutable_config *mutable;
 
 		mutable = rcu_dereference_rtnl(tnl_vport->mutable);
-		if (!memcmp(&mutable->key, key, PORT_KEY_LEN)) {
-			*pmutable = mutable;
+		if (!memcmp(&mutable->key, key, PORT_KEY_LEN))
 			return tnl_vport_to_vport(tnl_vport);
-		}
 	}
 
 	return NULL;
 }
 
-struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
-				__be64 key, int tunnel_type,
-				const struct tnl_mutable_config **mutable)
+struct vport *ovs_tnl_find_port(struct net *net, u32 tunnel_type)
 {
 	struct port_lookup_key lookup;
-	struct vport *vport;
-	bool is_multicast = ipv4_is_multicast(saddr);
 
 	port_key_set_net(&lookup, net);
-	lookup.saddr = saddr;
-	lookup.daddr = daddr;
-
-	/* First try for exact match on in_key. */
-	lookup.in_key = key;
-	lookup.tunnel_type = tunnel_type | TNL_T_KEY_EXACT;
-	if (!is_multicast && key_local_remote_ports) {
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
-	if (key_remote_ports) {
-		lookup.saddr = 0;
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-
-		lookup.saddr = saddr;
-	}
-
-	/* Then try matches that wildcard in_key. */
-	lookup.in_key = 0;
-	lookup.tunnel_type = tunnel_type | TNL_T_KEY_MATCH;
-	if (!is_multicast && local_remote_ports) {
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
-	if (remote_ports) {
-		lookup.saddr = 0;
-		vport = port_table_lookup(&lookup, mutable);
-		if (vport)
-			return vport;
-	}
+	lookup.tunnel_type = tunnel_type;
 
-	if (is_multicast) {
-		lookup.saddr = 0;
-		lookup.daddr = saddr;
-		if (key_multicast_ports) {
-			lookup.tunnel_type = tunnel_type | TNL_T_KEY_EXACT;
-			lookup.in_key = key;
-			vport = port_table_lookup(&lookup, mutable);
-			if (vport)
-				return vport;
-		}
-		if (multicast_ports) {
-			lookup.tunnel_type = tunnel_type | TNL_T_KEY_MATCH;
-			lookup.in_key = 0;
-			vport = port_table_lookup(&lookup, mutable);
-			if (vport)
-				return vport;
-		}
-	}
-
-	return NULL;
+	return port_table_lookup(&lookup);
 }
 
 static void ecn_decapsulate(struct sk_buff *skb)
@@ -1008,11 +912,9 @@ static int tnl_set_config(struct net *net,
 			  struct tnl_mutable_config *mutable)
 {
 	const struct vport *old_vport;
-	const struct tnl_mutable_config *old_mutable;
 
 	mutable->flags = 0;
 	port_key_set_net(&mutable->key, net);
-	mutable->key.daddr = htonl(0);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
 	mutable->tunnel_hlen = tnl_ops->hdr_len(mutable);
@@ -1021,7 +923,7 @@ static int tnl_set_config(struct net *net,
 
 	mutable->tunnel_hlen += sizeof(struct iphdr);
 
-	old_vport = port_table_lookup(&mutable->key, &old_mutable);
+	old_vport = port_table_lookup(&mutable->key);
 	if (old_vport && old_vport != cur_vport)
 		return -EEXIST;
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index 330df27..cddb88e 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -35,16 +35,9 @@
 
 /*
  * One of these goes in struct tnl_ops and in tnl_find_port().
- * These values are in the same namespace as other TNL_T_* values, so
- * only the least significant 10 bits are available to define protocol
- * identifiers.
  */
-#define TNL_T_PROTO_GRE		0
-#define TNL_T_PROTO_CAPWAP	1
-
-/* These flags are only needed when calling tnl_find_port(). */
-#define TNL_T_KEY_EXACT		(1 << 10)
-#define TNL_T_KEY_MATCH		(1 << 11)
+#define TNL_T_PROTO_GRE			0
+#define TNL_T_PROTO_CAPWAP		1
 
 /* Private flags not exposed to userspace in this form. */
 #define TNL_F_IN_KEY_MATCH	(1 << 16) /* Store the key in tun_id to
@@ -66,12 +59,9 @@
  * @tunnel_type: Set of TNL_T_* flags that define lookup.
  */
 struct port_lookup_key {
-	__be64 in_key;
 #ifdef CONFIG_NET_NS
 	struct net *net;
 #endif
-	__be32 saddr;
-	__be32 daddr;
 	u32    tunnel_type;
 };
 
@@ -212,9 +202,7 @@ const unsigned char *ovs_tnl_get_addr(const struct vport *vport);
 int ovs_tnl_send(struct vport *vport, struct sk_buff *skb);
 void ovs_tnl_rcv(struct vport *vport, struct sk_buff *skb);
 
-struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
-				__be64 key, int tunnel_type,
-				const struct tnl_mutable_config **mutable);
+struct vport *ovs_tnl_find_port(struct net *net, u32 tunnel_type);
 bool ovs_tnl_frag_needed(struct vport *vport,
 			 const struct tnl_mutable_config *mutable,
 			 struct sk_buff *skb, unsigned int mtu,
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index f26a7d2..a180b87 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -314,7 +314,6 @@ error:
 static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vport *vport;
-	const struct tnl_mutable_config *mutable;
 	struct iphdr *iph;
 	struct ovs_key_ipv4_tunnel tun_key;
 	__be64 key = 0;
@@ -327,15 +326,13 @@ static int capwap_rcv(struct sock *sk, struct sk_buff *skb)
 		goto out;
 
 	iph = ip_hdr(skb);
-	vport = ovs_tnl_find_port(sock_net(sk), iph->daddr, iph->saddr, key,
-				  TNL_T_PROTO_CAPWAP, &mutable);
+	vport = ovs_tnl_find_port(dev_net(skb->dev), TNL_T_PROTO_CAPWAP);
 	if (unlikely(!vport)) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 		goto error;
 	}
 
-	tun_key_init(&tun_key, iph,
-		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	tun_key_init(&tun_key, iph, key);
 	OVS_CB(skb)->tun_key = &tun_key;
 
 	ovs_tnl_rcv(vport, skb);
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index f610097..8fab193 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -170,6 +170,8 @@ static int parse_header(struct iphdr *iph, __be16 *flags, __be64 *key)
 /* Called with rcu_read_lock and BH disabled. */
 static void gre_err(struct sk_buff *skb, u32 info)
 {
+#warning fix gre_err
+#if 0
 	struct vport *vport;
 	const struct tnl_mutable_config *mutable;
 	const int type = icmp_hdr(skb)->type;
@@ -292,6 +294,7 @@ out:
 	skb_set_mac_header(skb, orig_mac_header);
 	skb_set_network_header(skb, orig_nw_header);
 	skb->protocol = htons(ETH_P_IP);
+#endif
 }
 
 static bool check_checksum(struct sk_buff *skb)
@@ -324,7 +327,6 @@ static bool check_checksum(struct sk_buff *skb)
 static int gre_rcv(struct sk_buff *skb)
 {
 	struct vport *vport;
-	const struct tnl_mutable_config *mutable;
 	int hdr_len;
 	struct iphdr *iph;
 	struct ovs_key_ipv4_tunnel tun_key;
@@ -345,16 +347,14 @@ static int gre_rcv(struct sk_buff *skb)
 		goto error;
 
 	iph = ip_hdr(skb);
-	vport = ovs_tnl_find_port(dev_net(skb->dev), iph->daddr, iph->saddr, key,
-				  TNL_T_PROTO_GRE, &mutable);
+	vport = ovs_tnl_find_port(dev_net(skb->dev), TNL_T_PROTO_GRE);
 	if (unlikely(!vport)) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 		goto error;
 	}
 
 
-	tun_key_init(&tun_key, iph,
-		     mutable->flags & TNL_F_IN_KEY_MATCH ? key : 0);
+	tun_key_init(&tun_key, iph, key);
 	OVS_CB(skb)->tun_key = &tun_key;
 
 	__skb_pull(skb, hdr_len);
-- 
1.7.10.2.484.gcd07cc5

* [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
                   ` (6 preceding siblings ...)
  2012-05-24  9:09 ` [PATCH 19/21] datapath: Simplify vport lookup Simon Horman
@ 2012-05-24  9:09 ` Simon Horman
  2012-05-24  9:09 ` [PATCH 21/21] datapath: Always use tun_key flags Simon Horman
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

The use of these flags in the tnl_mutable_config structure
is no longer correct, as a tunnel device may be used to
transmit packets for many different tunnels.

This change restores the checksum and out key behavior of
tunneling.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c       | 58 ++++++++++++++++++++++++-------------------------
 datapath/tunnel.h       | 12 +++-------
 datapath/vport-capwap.c | 28 ++++++++++++------------
 datapath/vport-gre.c    | 33 ++++++++++++++--------------
 4 files changed, 63 insertions(+), 68 deletions(-)
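
Because the header length now depends on per-packet flags rather than on
the port, it has to be computed for each skb. The sketch below shows one
way this could look for GRE in user space. It is an assumption-laden
model, not the patch's tunnel_hlen(): the flag names MODEL_TNL_F_CSUM and
MODEL_TNL_F_KEY and their bit values are invented for illustration. The
sizes are the standard ones: 20 bytes for an IPv4 header without options,
4 bytes for the base GRE header, and 4 bytes each for the optional
checksum and key fields.

```c
#include <stdint.h>

/* Hypothetical flag bits carried in tun_key->tun_flags. */
#define MODEL_TNL_F_CSUM (1u << 0)
#define MODEL_TNL_F_KEY  (1u << 1)

#define MODEL_IPV4_HLEN  20	/* sizeof(struct iphdr), no options */
#define MODEL_GRE_BASE    4	/* flags + protocol type */
#define MODEL_GRE_OPT     4	/* size of each optional GRE field */
#define MODEL_ETH_HLEN   14
#define MODEL_VLAN_HLEN   4

/* Outer header length for one packet, derived from its tunnel flags. */
static int gre_tunnel_hlen(uint32_t tun_flags)
{
	int len = MODEL_IPV4_HLEN + MODEL_GRE_BASE;

	if (tun_flags & MODEL_TNL_F_CSUM)
		len += MODEL_GRE_OPT;
	if (tun_flags & MODEL_TNL_F_KEY)
		len += MODEL_GRE_OPT;
	return len;
}

/*
 * Same arithmetic as check_mtu() in this patch:
 * dst_mtu - ETH_HLEN - tun_hlen - vlan_header.
 */
static int inner_mtu(int route_mtu, int tun_hlen, int has_vlan)
{
	return route_mtu - MODEL_ETH_HLEN - tun_hlen -
	       (has_vlan ? MODEL_VLAN_HLEN : 0);
}
```

Under these assumptions, two packets leaving the same tunnel device can
see different usable MTUs, which is why check_mtu(), handle_offloads()
and send_frags() now take tun_hlen as an argument.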

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index a303d8d..982de25 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -500,7 +500,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
 
 static bool check_mtu(struct sk_buff *skb,
 		      struct vport *vport,
-		      const struct tnl_mutable_config *mutable,
+		      const struct tnl_mutable_config *mutable, int tun_hlen,
 		      const struct rtable *rt, __be16 *frag_offp)
 {
 	bool df_inherit = mutable->flags & TNL_F_DF_INHERIT;
@@ -524,10 +524,7 @@ static bool check_mtu(struct sk_buff *skb,
 		    eth_hdr(skb)->h_proto == htons(ETH_P_8021Q))
 			vlan_header = VLAN_HLEN;
 
-		mtu = dst_mtu(&rt_dst(rt))
-			- ETH_HLEN
-			- mutable->tunnel_hlen
-			- vlan_header;
+		mtu = dst_mtu(&rt_dst(rt)) - ETH_HLEN - tun_hlen - vlan_header;
 	}
 
 	if (skb->protocol == htons(ETH_P_IP)) {
@@ -569,11 +566,10 @@ static bool check_mtu(struct sk_buff *skb,
 }
 
 static void create_tunnel_header(const struct vport *vport,
-				 const struct tnl_mutable_config *mutable,
-				 const struct rtable *rt, void *header)
+				 const struct rtable *rt, struct sk_buff *skb)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
-	struct iphdr *iph = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
 
 	iph->version	= 4;
 	iph->ihl	= sizeof(struct iphdr) >> 2;
@@ -584,7 +580,7 @@ static void create_tunnel_header(const struct vport *vport,
 	if (!iph->ttl)
 		iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
 
-	tnl_vport->tnl_ops->build_header(vport, mutable, iph + 1);
+	tnl_vport->tnl_ops->build_header(vport, skb);
 }
 
 #ifdef HAVE_RT_GENID
@@ -657,16 +653,14 @@ static bool need_linearize(const struct sk_buff *skb)
 	return false;
 }
 
-static struct sk_buff *handle_offloads(struct sk_buff *skb,
-				       const struct tnl_mutable_config *mutable,
+static struct sk_buff *handle_offloads(struct sk_buff *skb, int tun_hlen,
 				       const struct rtable *rt)
 {
 	int min_headroom;
 	int err;
 
 	min_headroom = LL_RESERVED_SPACE(rt_dst(rt).dev) + rt_dst(rt).header_len
-			+ mutable->tunnel_hlen
-			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
+			+ tun_hlen + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
 	if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
 		int head_delta = SKB_DATA_ALIGN(min_headroom -
@@ -719,15 +713,14 @@ error:
 	return ERR_PTR(err);
 }
 
-static int send_frags(struct sk_buff *skb,
-		      const struct tnl_mutable_config *mutable)
+static int send_frags(struct sk_buff *skb, int tun_hlen)
 {
 	int sent_len;
 
 	sent_len = 0;
 	while (skb) {
 		struct sk_buff *next = skb->next;
-		int frag_len = skb->len - mutable->tunnel_hlen;
+		int frag_len = skb->len - tun_hlen;
 		int err;
 
 		skb->next = NULL;
@@ -752,6 +745,14 @@ free_frags:
 	return sent_len;
 }
 
+static int tunnel_hlen(struct tnl_vport *tnl_vport, struct sk_buff *skb)
+{
+	int tun_hlen = tnl_vport->tnl_ops->hdr_len(skb);
+	if (tun_hlen < 0)
+		return tun_hlen;
+	return tun_hlen + sizeof(struct iphdr);
+}
+
 int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 {
 	struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
@@ -765,6 +766,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	u8 ttl;
 	u8 inner_tos;
 	u8 tos;
+	int tun_hlen;
 
 	if (!OVS_CB(skb)->tun_key)
 		goto error_free;
@@ -822,13 +824,17 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb_dst_drop(skb);
 	skb_clear_rxhash(skb);
 
+	tun_hlen = tunnel_hlen(tnl_vport, skb);
+	if (unlikely(tun_hlen < 0))
+		goto error;
+
 	/* Offloading */
-	skb = handle_offloads(skb, mutable, rt);
+	skb = handle_offloads(skb, tun_hlen, rt);
 	if (IS_ERR(skb))
 		goto error;
 
 	/* MTU */
-	if (unlikely(!check_mtu(skb, vport, mutable, rt, &frag_off))) {
+	if (unlikely(!check_mtu(skb, vport, mutable, tun_hlen, rt, &frag_off))) {
 		err = VPORT_E_TX_DROPPED;
 		goto error_free;
 	}
@@ -837,7 +843,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	 * If we are over the MTU, allow the IP stack to handle fragmentation.
 	 * Fragmentation is a slow path anyways.
 	 */
-	if (unlikely(skb->len + mutable->tunnel_hlen > dst_mtu(&rt_dst(rt)))) {
+	if (unlikely(skb->len + tun_hlen > dst_mtu(&rt_dst(rt)))) {
 		unattached_dst = &rt_dst(rt);
 		dst_hold(unattached_dst);
 	}
@@ -862,8 +868,8 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		if (unlikely(vlan_deaccel_tag(skb)))
 			goto next;
 
-		skb_push(skb, mutable->tunnel_hlen);
-		create_tunnel_header(vport, mutable, rt, skb->data);
+		skb_push(skb, tun_hlen);
+		create_tunnel_header(vport, rt, skb);
 		skb_reset_network_header(skb);
 
 		if (next_skb)
@@ -880,12 +886,12 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 		iph->frag_off = frag_off;
 		ip_select_ident(iph, &rt_dst(rt), NULL);
 
-		skb = tnl_vport->tnl_ops->update_header(vport, mutable,
+		skb = tnl_vport->tnl_ops->update_header(vport, tun_hlen,
 							&rt_dst(rt), skb);
 		if (unlikely(!skb))
 			goto next;
 
-		sent_len += send_frags(skb, mutable);
+		sent_len += send_frags(skb, tun_hlen);
 next:
 		skb = next_skb;
 	}
@@ -917,12 +923,6 @@ static int tnl_set_config(struct net *net,
 	port_key_set_net(&mutable->key, net);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
-	mutable->tunnel_hlen = tnl_ops->hdr_len(mutable);
-	if (mutable->tunnel_hlen < 0)
-		return mutable->tunnel_hlen;
-
-	mutable->tunnel_hlen += sizeof(struct iphdr);
-
 	old_vport = port_table_lookup(&mutable->key);
 	if (old_vport && old_vport != cur_vport)
 		return -EEXIST;
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index cddb88e..a32241f 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -84,10 +84,8 @@ static inline void port_key_set_net(struct port_lookup_key *key, struct net *net
  * attributes.
  * @rcu: RCU callback head for deferred destruction.
  * @seq: Sequence number for distinguishing configuration versions.
- * @tunnel_hlen: Tunnel header length.
  * @eth_addr: Source address for packets generated by tunnel itself
  * (e.g. ICMP fragmentation needed messages).
- * @out_key: Key to use on output, 0 if this tunnel has no fixed output key.
  * @flags: TNL_F_* flags.
  */
 struct tnl_mutable_config {
@@ -96,12 +94,9 @@ struct tnl_mutable_config {
 
 	unsigned seq;
 
-	unsigned tunnel_hlen;
-
 	unsigned char eth_addr[ETH_ALEN];
 
 	/* Configured via OVS_TUNNEL_ATTR_* attributes. */
-	__be64	out_key;
 	u32	flags;
 };
 
@@ -114,7 +109,7 @@ struct tnl_ops {
 	 * build_header() (i.e. excludes the IP header).  Returns a negative
 	 * error code if the configuration is invalid.
 	 */
-	int (*hdr_len)(const struct tnl_mutable_config *);
+	int (*hdr_len)(struct sk_buff *skb);
 
 	/*
 	 * Builds the static portion of the tunnel header, which is stored in
@@ -124,8 +119,7 @@ struct tnl_ops {
 	 * in some circumstances caching is disabled and this function will be
 	 * called for every packet, so try not to make it too slow.
 	 */
-	void (*build_header)(const struct vport *,
-			     const struct tnl_mutable_config *, void *header);
+	void (*build_header)(const struct vport *, struct sk_buff *);
 
 	/*
 	 * Updates the cached header of a packet to match the actual packet
@@ -136,7 +130,7 @@ struct tnl_ops {
 	 * of fragmentation).
 	 */
 	struct sk_buff *(*update_header)(const struct vport *,
-					 const struct tnl_mutable_config *,
+					 int tun_hlen,
 					 struct dst_entry *, struct sk_buff *);
 };
 
diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
index a180b87..102a207 100644
--- a/datapath/vport-capwap.c
+++ b/datapath/vport-capwap.c
@@ -155,16 +155,17 @@ static struct inet_frags frag_state = {
 	.secret_interval = CAPWAP_FRAG_SECRET_INTERVAL,
 };
 
-static int capwap_hdr_len(const struct tnl_mutable_config *mutable)
+static int capwap_hdr_len(struct sk_buff *skb)
 {
 	int size = CAPWAP_MIN_HLEN;
 
 	/* CAPWAP has no checksums. */
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		return -EINVAL;
 
 	/* if keys are specified, then add WSI field */
-	if (mutable->out_key || (mutable->flags & TNL_F_OUT_KEY_ACTION)) {
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		size += sizeof(struct capwaphdr_wsi) +
 			sizeof(struct capwaphdr_wsi_key);
 	}
@@ -172,11 +173,10 @@ static int capwap_hdr_len(const struct tnl_mutable_config *mutable)
 	return size;
 }
 
-static void capwap_build_header(const struct vport *vport,
-				const struct tnl_mutable_config *mutable,
-				void *header)
+static void capwap_build_header(const struct vport *vport, struct sk_buff *skb)
 {
-	struct udphdr *udph = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct udphdr *udph = (struct udphdr *)(iph + 1);
 	struct capwaphdr *cwh = (struct capwaphdr *)(udph + 1);
 
 	udph->source = htons(CAPWAP_SRC_PORT);
@@ -186,7 +186,8 @@ static void capwap_build_header(const struct vport *vport,
 	cwh->frag_id = 0;
 	cwh->frag_off = 0;
 
-	if (mutable->out_key || (mutable->flags & TNL_F_OUT_KEY_ACTION)) {
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
 
 		cwh->begin = CAPWAP_KEYED;
@@ -197,9 +198,9 @@ static void capwap_build_header(const struct vport *vport,
 		wsi->flags = CAPWAP_WSI_F_KEY64;
 		wsi->reserved_padding = 0;
 
-		if (mutable->out_key) {
+		if (OVS_CB(skb)->tun_key->tun_id) {
 			struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key *)(wsi + 1);
-			opt->key = mutable->out_key;
+			opt->key = OVS_CB(skb)->tun_key->tun_id;
 		}
 	} else {
 		/* make packet readable by old capwap code */
@@ -208,13 +209,12 @@ static void capwap_build_header(const struct vport *vport,
 }
 
 static struct sk_buff *capwap_update_header(const struct vport *vport,
-					    const struct tnl_mutable_config *mutable,
-					    struct dst_entry *dst,
+					    int tun_hlen, struct dst_entry *dst,
 					    struct sk_buff *skb)
 {
 	struct udphdr *udph = udp_hdr(skb);
 
-	if (mutable->flags & TNL_F_OUT_KEY_ACTION) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION) {
 		/* first field in WSI is key */
 		struct capwaphdr *cwh = (struct capwaphdr *)(udph + 1);
 		struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
@@ -226,7 +226,7 @@ static struct sk_buff *capwap_update_header(const struct vport *vport,
 	udph->len = htons(skb->len - skb_transport_offset(skb));
 
 	if (unlikely(skb->len - skb_network_offset(skb) > dst_mtu(dst))) {
-		unsigned int hlen = skb_transport_offset(skb) + capwap_hdr_len(mutable);
+		unsigned int hlen = skb_transport_offset(skb) + capwap_hdr_len(skb);
 		skb = fragment(skb, vport, dst, hlen);
 	}
 
diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
index 8fab193..b6a4308 100644
--- a/datapath/vport-gre.c
+++ b/datapath/vport-gre.c
@@ -45,16 +45,17 @@ struct gre_base_hdr {
 	__be16 protocol;
 };
 
-static int gre_hdr_len(const struct tnl_mutable_config *mutable)
+static int gre_hdr_len(struct sk_buff *skb)
 {
 	int len;
 
 	len = GRE_HEADER_SECTION;
 
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		len += GRE_HEADER_SECTION;
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		len += GRE_HEADER_SECTION;
 
 	return len;
@@ -70,41 +71,41 @@ static __be32 be64_get_low32(__be64 x)
 #endif
 }
 
-static void gre_build_header(const struct vport *vport,
-			     const struct tnl_mutable_config *mutable,
-			     void *header)
+static void gre_build_header(const struct vport *vport, struct sk_buff *skb)
 {
-	struct gre_base_hdr *greh = header;
+	struct iphdr *iph = (struct iphdr *)skb->data;
+	struct gre_base_hdr *greh = (struct gre_base_hdr *)(iph + 1);
 	__be32 *options = (__be32 *)(greh + 1);
 
 	greh->protocol = htons(ETH_P_TEB);
 	greh->flags = 0;
 
-	if (mutable->flags & TNL_F_CSUM) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM) {
 		greh->flags |= GRE_CSUM;
 		*options = 0;
 		options++;
 	}
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		greh->flags |= GRE_KEY;
 
-	if (mutable->out_key)
-		*options = be64_get_low32(mutable->out_key);
+	if (OVS_CB(skb)->tun_key->tun_id)
+		*options = be64_get_low32(OVS_CB(skb)->tun_key->tun_id);
 }
 
 static struct sk_buff *gre_update_header(const struct vport *vport,
-					 const struct tnl_mutable_config *mutable,
-					 struct dst_entry *dst,
+					 int tun_hlen, struct dst_entry *dst,
 					 struct sk_buff *skb)
 {
-	__be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
+	__be32 *options = (__be32 *)(skb_network_header(skb) + tun_hlen
 					       - GRE_HEADER_SECTION);
 
-	if (mutable->out_key || mutable->flags & TNL_F_OUT_KEY_ACTION)
+	if (OVS_CB(skb)->tun_key->tun_id ||
+	    OVS_CB(skb)->tun_key->tun_flags & TNL_F_OUT_KEY_ACTION)
 		options--;
 
-	if (mutable->flags & TNL_F_CSUM)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_CSUM)
 		*(__sum16 *)options = csum_fold(skb_checksum(skb,
 						skb_transport_offset(skb),
 						skb->len - skb_transport_offset(skb),
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 21/21] datapath: Always use tun_key flags
  2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
                   ` (7 preceding siblings ...)
  2012-05-24  9:09 ` [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit Simon Horman
@ 2012-05-24  9:09 ` Simon Horman
  8 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24  9:09 UTC (permalink / raw)
  To: dev; +Cc: netdev, Kyle Mestery, Simon Horman

The tun_key flags should always be valid, which allows the flags
element of tnl_mutable_config to be removed.

The flags in mutable were not actually being set due to a previous patch in
this series, so all flag-related features, except the outgoing key and csum,
which were restored in a previous patch, were disabled.

Cc: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 datapath/tunnel.c | 13 ++++++-------
 datapath/tunnel.h |  4 ----
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/datapath/tunnel.c b/datapath/tunnel.c
index 982de25..a91e319 100644
--- a/datapath/tunnel.c
+++ b/datapath/tunnel.c
@@ -482,7 +482,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
 	 * not symmetric then PMTUD needs to be disabled since we won't have
 	 * any way of synthesizing packets.
 	 */
-	if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
+	if ((OVS_CB(skb)->tun_key->tun_flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
 	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
 		ntun_key = *tun_key;
 		OVS_CB(nskb)->tun_key = &ntun_key;
@@ -503,9 +503,9 @@ static bool check_mtu(struct sk_buff *skb,
 		      const struct tnl_mutable_config *mutable, int tun_hlen,
 		      const struct rtable *rt, __be16 *frag_offp)
 {
-	bool df_inherit = mutable->flags & TNL_F_DF_INHERIT;
-	bool pmtud = mutable->flags & TNL_F_PMTUD;
-	__be16 frag_off = mutable->flags & TNL_F_DF_DEFAULT ? htons(IP_DF) : 0;
+	bool df_inherit = OVS_CB(skb)->tun_key->tun_flags & TNL_F_DF_INHERIT;
+	bool pmtud = OVS_CB(skb)->tun_key->tun_flags & TNL_F_PMTUD;
+	__be16 frag_off = OVS_CB(skb)->tun_key->tun_flags & TNL_F_DF_DEFAULT ? htons(IP_DF) : 0;
 	int mtu = 0;
 	unsigned int packet_length = skb->len - ETH_HLEN;
 
@@ -804,7 +804,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	else
 		inner_tos = 0;
 
-	if (mutable->flags & TNL_F_TOS_INHERIT)
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_TOS_INHERIT)
 		tos = inner_tos;
 	else
 		tos = OVS_CB(skb)->tun_key->ipv4_tos;
@@ -851,7 +851,7 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
 	ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
 	if (!ttl)
 		ttl = ip4_dst_hoplimit(&rt_dst(rt));
-	if (mutable->flags & TNL_F_TTL_INHERIT) {
+	if (OVS_CB(skb)->tun_key->tun_flags & TNL_F_TTL_INHERIT) {
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = ip_hdr(skb)->ttl;
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
@@ -919,7 +919,6 @@ static int tnl_set_config(struct net *net,
 {
 	const struct vport *old_vport;
 
-	mutable->flags = 0;
 	port_key_set_net(&mutable->key, net);
 	mutable->key.tunnel_type = tnl_ops->tunnel_type;
 
diff --git a/datapath/tunnel.h b/datapath/tunnel.h
index a32241f..4893903 100644
--- a/datapath/tunnel.h
+++ b/datapath/tunnel.h
@@ -86,7 +86,6 @@ static inline void port_key_set_net(struct port_lookup_key *key, struct net *net
  * @seq: Sequence number for distinguishing configuration versions.
  * @eth_addr: Source address for packets generated by tunnel itself
  * (e.g. ICMP fragmentation needed messages).
- * @flags: TNL_F_* flags.
  */
 struct tnl_mutable_config {
 	struct port_lookup_key key;
@@ -95,9 +94,6 @@ struct tnl_mutable_config {
 	unsigned seq;
 
 	unsigned char eth_addr[ETH_ALEN];
-
-	/* Configured via OVS_TUNNEL_ATTR_* attributes. */
-	u32	flags;
 };
 
 struct tnl_ops {
-- 
1.7.10.2.484.gcd07cc5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
       [not found]   ` <1337850554-10339-4-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-05-24 16:29     ` Ben Pfaff
       [not found]       ` <20120524162911.GD26173-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Pfaff @ 2012-05-24 16:29 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, May 24, 2012 at 06:08:56PM +0900, Simon Horman wrote:
> Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

But I don't see him CCed?

> +        ovs_be32 ipv4_src;
> +        ovs_be32 ipv4_dst;
> +        unsigned long long tun_flags;
> +        int ipv4_tos;
> +        int ipv4_ttl;
> +        int n = -1;
> +
> +        if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
> +                   ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
> +                   ",tos=%i,ttl=%i)%n",
> +                   tun_id_s, &tun_flags,
> +                   IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
> +                   &ipv4_tos, &ipv4_ttl, &n) > 0
> +            && n > 0) {

Does this compile?  I don't see a declaration of tun_id_s.

In the ODP printer and parser, we usually require fields that are
hexadecimal to be written with an explicit "0x" on output (using
something like "0x%x" or "%#x" on output), and then use "%i" on input,
so that it is always unambiguous at a glance whether a number is
decimal or hexadecimal.  I'd appreciate it if we could maintain that
here (I didn't look over at the printer code to see if it writes 0x,
but I'd like it to).
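
A minimal sketch of that convention, assuming two hypothetical helpers
(format_flags() and parse_flags() are illustrative names, not functions
from odp-util): the printer writes the value with "%#x" and the parser
reads it back with "%i", so "0x1f" and "31" are both accepted and are
never confused with each other.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical printer: writes the flags with an explicit "0x" prefix.
 * Note that "%#x" prints a zero value as plain "0", which "%i" also accepts. */
int format_flags(char *buf, size_t size, unsigned int flags)
{
    assert(buf != NULL);
    return snprintf(buf, size, "flags=%#x", flags);
}

/* Hypothetical parser: "%i" picks the base from the prefix ("0x" means
 * hexadecimal, a leading "0" means octal, anything else is decimal).
 * Returns the number of characters consumed, or -1 if nothing matched.
 * "%i" stores into a signed int, so this sketch only suits values that
 * fit in an int. */
int parse_flags(const char *s, unsigned int *flags)
{
    int value;
    int n = -1;

    assert(s != NULL && flags != NULL);
    if (sscanf(s, "flags=%i%n", &value, &n) != 1 || n < 0)
        return -1;
    *flags = (unsigned int)value;
    return n;
}
```

A string produced by format_flags() can be passed straight back to
parse_flags(), which is the round trip the printer and parser need.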

Otherwise, this looks good, thank you.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/21] vswitchd: Add iface_parse_tunnel
       [not found]     ` <1337850554-10339-5-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-05-24 16:47       ` Ben Pfaff
  2012-05-24 23:59         ` [ovs-dev] " Simon Horman
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Pfaff @ 2012-05-24 16:47 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

The concept seems OK to me here.  I have only a few minor comments.

On Thu, May 24, 2012 at 06:08:57PM +0900, Simon Horman wrote:
> +#define TNL_F_CSUM          (1 << 0) /* Checksum packets. */
> +#define TNL_F_TOS_INHERIT	(1 << 1) /* Inherit ToS from inner packet. */
> +#define TNL_F_TTL_INHERIT	(1 << 2) /* Inherit TTL from inner packet. */
> +#define TNL_F_DF_INHERIT	(1 << 3) /* Inherit DF bit from inner packet. */
> +#define TNL_F_DF_DEFAULT	(1 << 4) /* Set DF bit if inherit off or
> +                                      * not IP. */
> +#define TNL_F_PMTUD		    (1 << 5) /* Enable path MTU discovery. */
> +#define TNL_F_HDR_CACHE		(1 << 6) /* Enable tunnel header caching. */
> +#define TNL_F_IPSEC		    (1 << 7) /* Traffic is IPsec encrypted. */
> +#define TNL_F_IN_KEY	    (1 << 8) /* Tunnel port has input key. */
> +#define TNL_F_OUT_KEY	    (1 << 9) /* Tunnel port has output key. */

Some of the above definitions use all spaces, others use tabs.  It's
OVS userspace code so it's better to use all spaces, I think.

> +    if (is_ipsec) {
> +        char *file_name = xasprintf("%s/%s", ovs_rundir(),
> +                "ovs-monitor-ipsec.pid");
> +        pid_t pid = read_pidfile(file_name);
> +        free(file_name);
> +        if (pid < 0) {
> +            VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
> +                     iface_cfg->name);
> +            goto err;
> +        }

I just noticed that we re-read this pidfile every time we parse an
IPsec tunnel.  I guess that would be a big waste of time if we have a
lot of IPsec tunnels.  I'll make a note to consider fixing this
separately (it's not your problem).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ovs-dev] [PATCH 04/21] vswitchd: Add iface_parse_tunnel
  2012-05-24 16:47       ` Ben Pfaff
@ 2012-05-24 23:59         ` Simon Horman
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-24 23:59 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev, netdev

On Thu, May 24, 2012 at 09:47:38AM -0700, Ben Pfaff wrote:
> The concept seems OK to me here.  I have only a few minor comments.
> 
> On Thu, May 24, 2012 at 06:08:57PM +0900, Simon Horman wrote:
> > +#define TNL_F_CSUM          (1 << 0) /* Checksum packets. */
> > +#define TNL_F_TOS_INHERIT	(1 << 1) /* Inherit ToS from inner packet. */
> > +#define TNL_F_TTL_INHERIT	(1 << 2) /* Inherit TTL from inner packet. */
> > +#define TNL_F_DF_INHERIT	(1 << 3) /* Inherit DF bit from inner packet. */
> > +#define TNL_F_DF_DEFAULT	(1 << 4) /* Set DF bit if inherit off or
> > +                                      * not IP. */
> > +#define TNL_F_PMTUD		    (1 << 5) /* Enable path MTU discovery. */
> > +#define TNL_F_HDR_CACHE		(1 << 6) /* Enable tunnel header caching. */
> > +#define TNL_F_IPSEC		    (1 << 7) /* Traffic is IPsec encrypted. */
> > +#define TNL_F_IN_KEY	    (1 << 8) /* Tunnel port has input key. */
> > +#define TNL_F_OUT_KEY	    (1 << 9) /* Tunnel port has output key. */
> 
> Some of the above definitions use all spaces, others use tabs.  It's
> OVS userspace code so it's better to use all spaces, I think.

Sorry about that. I have a bit of trouble remembering to switch
tabbing modes in my editor depending on if I am in user-space or the
datapath.

> > +    if (is_ipsec) {
> > +        char *file_name = xasprintf("%s/%s", ovs_rundir(),
> > +                "ovs-monitor-ipsec.pid");
> > +        pid_t pid = read_pidfile(file_name);
> > +        free(file_name);
> > +        if (pid < 0) {
> > +            VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
> > +                     iface_cfg->name);
> > +            goto err;
> > +        }
> 
> I just noticed that we re-read this pidfile every time we parse an
> IPsec tunnel.  I guess that would be a big waste of time if we have a
> lot of IPsec tunnels.  I'll make a note to consider fixing this
> separately (it's not your problem).

I guess that it should be easy enough to set a flag if any of the parsed
configurations use ipsec and perform the pid check if so.

As it is, I wouldn't be at all surprised if my series breaks ipsec as
I haven't tested it (with or without my changes).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
       [not found]       ` <20120524162911.GD26173-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
@ 2012-05-25  0:01         ` Simon Horman
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-05-25  0:01 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, May 24, 2012 at 09:29:11AM -0700, Ben Pfaff wrote:
> On Thu, May 24, 2012 at 06:08:56PM +0900, Simon Horman wrote:
> > Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> 
> But I don't see him CCed?

Strange. I asked git send-email to CC him explicitly.

> > +        ovs_be32 ipv4_src;
> > +        ovs_be32 ipv4_dst;
> > +        unsigned long long tun_flags;
> > +        int ipv4_tos;
> > +        int ipv4_ttl;
> > +        int n = -1;
> > +
> > +        if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
> > +                   ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
> > +                   ",tos=%i,ttl=%i)%n",
> > +                   tun_id_s, &tun_flags,
> > +                   IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
> > +                   &ipv4_tos, &ipv4_ttl, &n) > 0
> > +            && n > 0) {
> 
> Does this compile?  I don't see a declaration of tun_id_s.
> 
> In the ODP printer and parser, we usually require fields that are
> hexadecimal to be written with an explicit "0x" on output (using
> something like "0x%x" or "%#x" on output), and then use "%i" on input,
> so that it is always unambiguous at a glance whether a number is
> decimal or hexadecimal.  I'd appreciate it if we could maintain that
> here (I didn't look over at the printer code to see if it writes 0x,
> but I'd like it to).
> 
> Otherwise, this looks good, thank you.

Sorry, perhaps this is not the latest revision, somehow.
I did have it compiling, and I'll update the patch accordingly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/21] vswitchd: Add add_tunnel_ports()
       [not found]     ` <1337850554-10339-6-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-05-25 17:18       ` Ben Pfaff
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Pfaff @ 2012-05-25 17:18 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, May 24, 2012 at 06:08:58PM +0900, Simon Horman wrote:
> Add tunnel tundevs for tunnel realdevs as needed.
> 
> In general the notion is that realdevs may be configured by users
> and from an end-user point of view are compatible with the existing
> port-based tunneling code. And that tundevs exist in the datapath
> and are actually used to send and receive packets, based on flows.
> 
> Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

This seems reasonable at a glance.  There are bits I might quibble
with as this gets closer, but the structure seems reasonable.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
       [not found]     ` <1337850554-10339-2-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-06-03  9:01       ` Jesse Gross
  0 siblings, 0 replies; 32+ messages in thread
From: Jesse Gross @ 2012-06-03  9:01 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA


On May 24, 2012, at 2:08 AM, Simon Horman wrote:

> This is a first pass at providing a tun_key which can be used
> as the basis for flow-based tunnelling. The tun_key includes and
> replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.
> 
> In ovs_skb_cb tun_key is a pointer, as it is envisaged that it will grow,
> when support for IPv6 is added, to an extent that inlining the structure
> would result in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
> 
> As OVS does not support IPv6 as the outer transport protocol for tunnels
> the IPv6 portions of this change, which appeared in the previous revision,
> have been dropped in order to limit the scope and size of this patch.
> 
> This patch does not make any effort to retain the existing tun_id behaviour
> nor does it fully implement flow-based tunnels. As such it is incomplete
> and can't be used in its current form (other than to break OVS tunnelling).
> 
> ** Please do not apply **
> 
> Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

Thanks and sorry again about being so slow to look at this.

Overall, this looks pretty good to me.  The main difficulty that I had was in figuring out what should go with the old behavior and what should go with the new since it's at an intermediate point between the two but I understand that it's difficult to break it up in a way that both encapsulates a particular set of functionality and isn't too large.  Otherwise, I noticed a few specific things that I noted below.

> diff --git a/datapath/flow.c b/datapath/flow.c
> index d07337c..49c0dd8 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>  * get the metadata, that is, the parts of the flow key that cannot be
>  * extracted from the packet itself.
>  */
> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> +				   struct ovs_key_ipv4_tunnel *tun_key,
> 				   const struct nlattr *attr)
> {
> 	const struct nlattr *nla;
> 	int rem;
> 
> 	*in_port = DP_MAX_PORTS;
> -	*tun_id = 0;
> +	tun_key->tun_id = 0;

I think we probably want to memset the entire tun_key to zero to avoid having potentially uninitialized data in the flow.
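
A userspace sketch of why the whole key matters, assuming a stand-in
struct (tun_key_sketch mirrors the fields of struct ovs_key_ipv4_tunnel
from the cover letter, with fixed-width types in place of the kernel's
__be64/__be32/__u8) and assuming keys are compared bytewise:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct ovs_key_ipv4_tunnel; illustrative only. */
struct tun_key_sketch {
    uint64_t tun_id;
    uint32_t tun_flags;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint8_t  ipv4_tos;
    uint8_t  ipv4_ttl;
    uint8_t  pad[2];
};

/* Zero every byte, including pad[], so no stale data reaches the flow. */
void tun_key_clear(struct tun_key_sketch *key)
{
    assert(key != NULL);
    memset(key, 0, sizeof *key);
}

/* Bytewise comparison, as a hash or memcmp-based lookup would do it. */
int tun_key_equal(const struct tun_key_sketch *a,
                  const struct tun_key_sketch *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

With only tun_id reset, two keys built on top of different stale memory
still compare unequal; after tun_key_clear() on both they match.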

> 
> @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> {
> 	struct ovs_key_ethernet *eth_key;
> +	struct ovs_key_ipv4_tunnel *tun_key;
> 	struct nlattr *nla, *encap;
> 
> 	if (swkey->phy.priority &&
> 	    nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> 		goto nla_put_failure;
> 
> -	if (swkey->phy.tun_id != cpu_to_be64(0) &&
> -	    nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> -		goto nla_put_failure;
> +	if (swkey->phy.tun_key.ipv4_dst) {

It's probably OK to use DIP equal to zero as a not present marker but we need to enforce that it's always true - for example we shouldn't allow somebody to set up a flow that way or receive packets with a zero address.  Alternately, we may be able to find a spare bit to indicate this, like is done with vlans.

In any case, I think we need to do some additional validation when setting up flows to check reserved space, for example, as otherwise that will never match.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 5be481e..bab5363 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -42,7 +42,7 @@ struct sw_flow_actions {
> 
> struct sw_flow_key {
> 	struct {
> -		__be64	tun_id;		/* Encapsulating tunnel ID. */
> +		struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */

This is an optimization but as we get closer I'd like to put the tun_key at the end of struct sw_flow_key so that packets that didn't come from a tunnel don't have to pay the cost during the lookup (this is especially true as we add support for IPv6 tunnels).

In a similar vein, struct ovs_key_ipv4_tunnel contains some fields that I think can never apply for lookup such as the flags so it would be nice if we could remove that for lookup.

> 
> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

If my math is correct, I think the size of the base struct ovs_key_ipv4_tunnel is 24 bytes.
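
The arithmetic can be checked with a small userspace sketch. The struct
below is a stand-in with fixed-width types, and the two constants mirror
NLA_ALIGNTO and NLA_HDRLEN from linux/netlink.h (both 4); the names are
illustrative. Under those assumptions a 24-byte payload needs no padding
and the attribute totals 28 bytes, so the row would presumably read
24 / -- / 4 / 28 (that row is an inference, not something stated above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ovs_key_ipv4_tunnel; illustrative only. */
struct tun_key_layout {
    uint64_t tun_id;      /*  8 bytes */
    uint32_t tun_flags;   /*  4 bytes */
    uint32_t ipv4_src;    /*  4 bytes */
    uint32_t ipv4_dst;    /*  4 bytes */
    uint8_t  ipv4_tos;    /*  1 byte  */
    uint8_t  ipv4_ttl;    /*  1 byte  */
    uint8_t  pad[2];      /*  2 bytes */
};

/* 8 + 4 + 4 + 4 + 1 + 1 + 2 = 24, with no implicit padding. */
static_assert(sizeof(struct tun_key_layout) == 24,
              "tunnel key stand-in should be 24 bytes");

#define SKETCH_NLA_ALIGNTO 4u  /* mirrors NLA_ALIGNTO */
#define SKETCH_NLA_HDRLEN  4u  /* mirrors NLA_HDRLEN  */

/* Total bytes for one netlink attribute: header plus payload rounded up
 * to the netlink alignment. */
size_t attr_total_size(size_t payload)
{
    size_t mask = SKETCH_NLA_ALIGNTO - 1;
    size_t padded = (payload + mask) & ~mask;

    return SKETCH_NLA_HDRLEN + padded;
}
```

The same function reproduces the existing rows of the table: 4 bytes of
payload gives 8 and 8 bytes gives 12.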

> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
> +{
> +	__be32 ndst = tun_key->ipv4_src;
> +	tun_key->ipv4_src = tun_key->ipv4_dst;
> +	tun_key->ipv4_dst = ndst;
> +}

I'm not quite sure when we would need to swap the addresses in a tunnel and I didn't see any uses of this function.

> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> +				const struct iphdr *iph, __be64 tun_id)
> +{
> +	tun_key->tun_id = tun_id;
> +	tun_key->ipv4_src = iph->saddr;
> +	tun_key->ipv4_dst = iph->daddr;
> +	tun_key->ipv4_tos = iph->tos;
> +	tun_key->ipv4_ttl = iph->ttl;
> +}
> 

Aren't there some fields that we need to zero out to avoid problems in the lookup?

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d651c11..010e513 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
> 	return NULL;
> }
> 
> -static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
> +static void ecn_decapsulate(struct sk_buff *skb)
> {
> -	if (unlikely(INET_ECN_is_ce(tos))) {
> +	if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {
> 		__be16 protocol = skb->protocol;

This might come in a later patch, although I didn't see it in a quick scan, but it should be possible to implement all the ECN encapsulation and decapsulation in userspace, just like we can do with the rest of the ToS and TTL.

> 
> bool ovs_tnl_frag_needed(struct vport *vport,
> 			 const struct tnl_mutable_config *mutable,
> -			 struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> +			 struct sk_buff *skb, unsigned int mtu,
> +			 struct ovs_key_ipv4_tunnel *tun_key)
> {
> 	unsigned int eth_hdr_len = ETH_HLEN;
> 	unsigned int total_length = 0, header_length = 0, payload_length;
> 	struct ethhdr *eh, *old_eh = eth_hdr(skb);
> 	struct sk_buff *nskb;
> +	struct ovs_key_ipv4_tunnel ntun_key;
> 
> 	/* Sanity check */
> 	if (skb->protocol == htons(ETH_P_IP)) {
> @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> 	 * any way of synthesizing packets.
> 	 */
> 	if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> -	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> -		OVS_CB(nskb)->tun_id = flow_key;
> +	    (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> +		ntun_key = *tun_key;
> +		OVS_CB(nskb)->tun_key = &ntun_key;
> +	}

I guess this is probably where you were going to use the function to reverse IP addresses.  The logic doesn't really work but it's moot since this is going away anyways.
> 
> @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport *vport,
> 	iph->ihl	= sizeof(struct iphdr) >> 2;
> 	iph->frag_off	= htons(IP_DF);
> 	iph->protocol	= tnl_vport->tnl_ops->ipproto;
> -	iph->tos	= mutable->tos;
> 	iph->daddr	= rt->rt_dst;
> 	iph->saddr	= rt->rt_src;
> -	iph->ttl	= mutable->ttl;
> 	if (!iph->ttl)
> 		iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 

I'm not sure that these changes quite belong in this patch (not that it shouldn't be done but it seems like the supporting code isn't there yet).
> 
> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> index ab89c5b..fd2b038 100644
> --- a/datapath/vport-gre.c
> +++ b/datapath/vport-gre.c
> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct vport *vport,
> 	__be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
> 					       - GRE_HEADER_SECTION);
> 
> -	/* Work backwards over the options so the checksum is last. */
> -	if (mutable->flags & TNL_F_OUT_KEY_ACTION)
> -		*options = be64_get_low32(OVS_CB(skb)->tun_id);

Why does this go away?

> diff --git a/datapath/vport.c b/datapath/vport.c
> index 172261a..0c77a1b 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
> 		OVS_CB(skb)->flow = NULL;
> 
> 	if (!(vport->ops->flags & VPORT_F_TUN_ID))
> -		OVS_CB(skb)->tun_id = 0;
> +		OVS_CB(skb)->tun_key = NULL;

We probably should rename this flag now.

> diff --git a/lib/odp-util.h b/lib/odp-util.h
> index d53f083..4e5a8a1 100644
> --- a/lib/odp-util.h
> +++ b/lib/odp-util.h
> @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

Same thing about the size here as well.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
  2012-05-24  9:08   ` [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key Simon Horman
       [not found]     ` <1337850554-10339-2-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
@ 2012-06-03  9:15     ` Jesse Gross
       [not found]       ` <CAEP_g=9hkP-7fuFK3zSJcR=2BTK0feq7qUa8LHs3dbGQBy+suw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 32+ messages in thread
From: Jesse Gross @ 2012-06-03  9:15 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev, netdev

On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@verge.net.au> wrote:
> this is a first pass at providing a tun_key which can be used
> as the basis for flow-based tunnelling. The tun_key includes and
> replaces the tun_id in both struct ovs_skb_cb and struct sw_flow_key.
>
> In ovs_skb_cb tun_key is a pointer, as it is envisaged that the key will
> grow when support for IPv6 is added, to an extent that inlining the
> structure would make ovs_skb_cb larger than the 48 bytes available in skb->cb.
>
> As OVS does not support IPv6 as the outer transport protocol for tunnels
> the IPv6 portions of this change, which appeared in the previous revision,
> have been dropped in order to limit the scope and size of this patch.
>
> This patch does not make any effort to retain the existing tun_id behaviour
> nor does it fully implement flow-based tunnels. As such it is incomplete
> and can't be used in its current form (other than to break OVS tunnelling).
>
> ** Please do not apply **
>
> Cc: Kyle Mestery <kmestery@cisco.com>
> Signed-off-by: Simon Horman <horms@verge.net.au>

Thanks and sorry again about being so slow to look at this.

Overall, this looks pretty good to me.  The main difficulty that I had
was in figuring out what should go with the old behavior and what
should go with the new since it's at an intermediate point between the
two but I understand that it's difficult to break it up in a way that
both encapsulates a particular set of functionality and isn't too
large.  Otherwise, I noticed a few specific things that I noted below.

> diff --git a/datapath/flow.c b/datapath/flow.c
> index d07337c..49c0dd8 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>  * get the metadata, that is, the parts of the flow key that cannot be
>  * extracted from the packet itself.
>  */
> -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> +                                  struct ovs_key_ipv4_tunnel *tun_key,
>                                   const struct nlattr *attr)
>  {
>        const struct nlattr *nla;
>        int rem;
>
>        *in_port = DP_MAX_PORTS;
> -       *tun_id = 0;
> +       tun_key->tun_id = 0;

I think we probably want to memset the entire tun_key to zero to avoid
having potentially uninitialized data in the flow.
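
As a rough sketch of what I mean (userspace C, not the datapath code):
the struct below mirrors the one in the cover letter, with stdint types
standing in for __be64/__be32/__u8, and tun_key_clear() is a
hypothetical helper name.  Clearing the whole key leaves no field,
including tun_flags and pad, holding stale bytes.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical helper: zero every byte of the key, not just tun_id. */
static void tun_key_clear(struct tun_key_sketch *key)
{
	memset(key, 0, sizeof(*key));
}

/* Returns 1 if every byte of the key is zero, 0 otherwise. */
static int tun_key_is_all_zero(const struct tun_key_sketch *key)
{
	static const struct tun_key_sketch zero; /* static storage: all zero */

	return memcmp(key, &zero, sizeof(zero)) == 0;
}
```

In ovs_flow_metadata_from_nlattrs() this would amount to replacing the
single tun_id assignment with one memset of *tun_key.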

> @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
>  {
>        struct ovs_key_ethernet *eth_key;
> +       struct ovs_key_ipv4_tunnel *tun_key;
>        struct nlattr *nla, *encap;
>
>        if (swkey->phy.priority &&
>            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
>                goto nla_put_failure;
>
> -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
> -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> -               goto nla_put_failure;
> +       if (swkey->phy.tun_key.ipv4_dst) {

It's probably OK to use DIP equal to zero as a not present marker but
we need to enforce that it's always true - for example we shouldn't
allow somebody to setup a flow that way or receive packets with a zero
address.  Alternately, we may be able to find a spare bit to indicate
this, like is done with vlans.

In any case, I think we need to do some additional validation when
setting up flows to check reserved space, for example, as otherwise
that will never match.
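
To make the validation idea concrete, here is a minimal userspace
sketch.  The struct is a stand-in for ovs_key_ipv4_tunnel with stdint
types, and tun_key_validate() is a hypothetical name; where such a check
would actually live in the flow setup path is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical check applied when userspace installs a flow. */
static int tun_key_validate(const struct tun_key_sketch *key)
{
	/* A zero destination is reserved to mean "no tunnel key present". */
	if (key->ipv4_dst == 0)
		return -EINVAL;

	/* Reserved bytes must be zero, otherwise the flow can never match
	 * a key that the datapath built with zeroed padding. */
	if (key->pad[0] || key->pad[1])
		return -EINVAL;

	return 0;
}
```

Rejecting such keys at setup time surfaces a misbehaving userspace
immediately instead of leaving a flow installed that silently never hits.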

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 5be481e..bab5363 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -42,7 +42,7 @@ struct sw_flow_actions {
>
>  struct sw_flow_key {
>        struct {
> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */

This is an optimization but as we get closer I'd like to put the
tun_key at the end of struct sw_flow_key so that packets that didn't
come from a tunnel don't have to pay the cost during the lookup (this
is especially true as we add support for IPv6 tunnels).

In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
that I think can never apply for lookup such as the flags so it would
be nice if we could remove that for lookup.

> @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

If my math is correct, I think the size of the base struct
ovs_key_ipv4_tunnel is 24 bytes.
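
The arithmetic can be checked with a small userspace sketch.  The
struct uses stdint stand-ins for the kernel types, and the sketch_*
helpers are local copies of the usual netlink rules (4-byte attribute
header, payload aligned to 4 bytes), not the kernel's own macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;     /* 8 */
	uint32_t tun_flags;  /* 4 */
	uint32_t ipv4_src;   /* 4 */
	uint32_t ipv4_dst;   /* 4 */
	uint8_t  ipv4_tos;   /* 1 */
	uint8_t  ipv4_ttl;   /* 1 */
	uint8_t  pad[2];     /* 2 */
};

/* 8 + 4 + 4 + 4 + 1 + 1 + 2 = 24, with no implicit padding. */
_Static_assert(sizeof(struct tun_key_sketch) == 24,
	       "tunnel key payload should be 24 bytes");

/* Local copy of the netlink attribute header size. */
static size_t sketch_nla_hdrlen(void)
{
	return 4;
}

/* Local copy of netlink payload alignment (round up to 4 bytes). */
static size_t sketch_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Total space one tunnel-key attribute occupies in a message. */
static size_t tun_key_attr_total(void)
{
	return sketch_nla_hdrlen() +
	       sketch_nla_align(sizeof(struct tun_key_sketch));
}
```

Under these assumptions the attribute needs no extra pad and the row in
the table would read 24, --, 4, 28 rather than 18, 2, 4, 24.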

> +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
> +{
> +       __be32 ndst = tun_key->ipv4_src;
> +       tun_key->ipv4_src = tun_key->ipv4_dst;
> +       tun_key->ipv4_dst = ndst;
> +}

I'm not quite sure when we would need to swap the addresses in a
tunnel and I didn't see any uses of this function.

> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> +                               const struct iphdr *iph, __be64 tun_id)
> +{
> +       tun_key->tun_id = tun_id;
> +       tun_key->ipv4_src = iph->saddr;
> +       tun_key->ipv4_dst = iph->daddr;
> +       tun_key->ipv4_tos = iph->tos;
> +       tun_key->ipv4_ttl = iph->ttl;
> +}

Aren't there some fields that we need to zero out to avoid problems in
the lookup?
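
For illustration, assuming the lookup hashes and compares the raw bytes
of the key: tun_key_init() as posted never writes tun_flags or pad, so
two keys built from identical headers can still differ.  A userspace
sketch of a variant that clears the key first (ip_fields_sketch is a
minimal stand-in for the struct iphdr fields used, and
tun_key_init_zeroed() is a hypothetical name):

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Only the struct iphdr fields that tun_key_init() reads. */
struct ip_fields_sketch {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  tos;
	uint8_t  ttl;
};

/* Like tun_key_init(), but clears tun_flags and pad as well. */
static void tun_key_init_zeroed(struct tun_key_sketch *key,
				const struct ip_fields_sketch *iph,
				uint64_t tun_id)
{
	memset(key, 0, sizeof(*key));
	key->tun_id = tun_id;
	key->ipv4_src = iph->saddr;
	key->ipv4_dst = iph->daddr;
	key->ipv4_tos = iph->tos;
	key->ipv4_ttl = iph->ttl;
}
```

With the memset in place, two keys initialised from the same header are
byte-for-byte identical whatever the memory held beforehand.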

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d651c11..010e513 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
>        return NULL;
>  }
>
> -static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
> +static void ecn_decapsulate(struct sk_buff *skb)
>  {
> -       if (unlikely(INET_ECN_is_ce(tos))) {
> +       if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {

This might come in a later patch, although I didn't see it in a quick
scan, but it should be possible to implement all the ECN encapsulation
and decapsulation in userspace, just like we can do with the rest of
the ToS and TTL.
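
As a sketch of the rule that would have to be reproduced outside the
kernel (the marking logic only; the names and SKETCH_* constants are
hypothetical, and a real implementation would also have to fix up the
inner IP checksum): on decapsulation, a CE mark on the outer header is
copied to the inner header when the inner packet is ECN-capable.

```c
#include <stdint.h>

#define SKETCH_ECN_MASK    0x03 /* low two bits of the ToS byte */
#define SKETCH_ECN_NOT_ECT 0x00 /* inner packet is not ECN-capable */
#define SKETCH_ECN_CE      0x03 /* congestion experienced */

/* Returns the inner ToS byte to use after decapsulation. */
static uint8_t ecn_decap_tos(uint8_t outer_tos, uint8_t inner_tos)
{
	int outer_ce = (outer_tos & SKETCH_ECN_MASK) == SKETCH_ECN_CE;
	int inner_capable = (inner_tos & SKETCH_ECN_MASK) != SKETCH_ECN_NOT_ECT;

	/* Propagate congestion only to packets that can carry the mark. */
	if (outer_ce && inner_capable)
		inner_tos |= SKETCH_ECN_CE;

	return inner_tos;
}
```

Since the outer ToS is already carried in the tun_key, everything this
function needs is visible to whoever sets up the flow.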

>  bool ovs_tnl_frag_needed(struct vport *vport,
>                         const struct tnl_mutable_config *mutable,
> -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> +                        struct sk_buff *skb, unsigned int mtu,
> +                        struct ovs_key_ipv4_tunnel *tun_key)
>  {
>        unsigned int eth_hdr_len = ETH_HLEN;
>        unsigned int total_length = 0, header_length = 0, payload_length;
>        struct ethhdr *eh, *old_eh = eth_hdr(skb);
>        struct sk_buff *nskb;
> +       struct ovs_key_ipv4_tunnel ntun_key;
>
>        /* Sanity check */
>        if (skb->protocol == htons(ETH_P_IP)) {
> @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
>         * any way of synthesizing packets.
>         */
>        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> -               OVS_CB(nskb)->tun_id = flow_key;
> +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> +               ntun_key = *tun_key;
> +               OVS_CB(nskb)->tun_key = &ntun_key;
> +       }

I guess this is probably where you were going to use the function to
reverse IP addresses.  The logic doesn't really work but it's moot
since this is going away anyways.

> @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport *vport,
>        iph->ihl        = sizeof(struct iphdr) >> 2;
>        iph->frag_off   = htons(IP_DF);
>        iph->protocol   = tnl_vport->tnl_ops->ipproto;
> -       iph->tos        = mutable->tos;
>        iph->daddr      = rt->rt_dst;
>        iph->saddr      = rt->rt_src;
> -       iph->ttl        = mutable->ttl;
>        if (!iph->ttl)
>                iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));

I'm not sure that these changes quite belong in this patch (not that
it shouldn't be done but it seems like the supporting code isn't there
yet).

> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> index ab89c5b..fd2b038 100644
> --- a/datapath/vport-gre.c
> +++ b/datapath/vport-gre.c
> @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct vport *vport,
>        __be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
>                                               - GRE_HEADER_SECTION);
>
> -       /* Work backwards over the options so the checksum is last. */
> -       if (mutable->flags & TNL_F_OUT_KEY_ACTION)
> -               *options = be64_get_low32(OVS_CB(skb)->tun_id);

Why does this go away?

> diff --git a/datapath/vport.c b/datapath/vport.c
> index 172261a..0c77a1b 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
>                OVS_CB(skb)->flow = NULL;
>
>        if (!(vport->ops->flags & VPORT_F_TUN_ID))
> -               OVS_CB(skb)->tun_id = 0;
> +               OVS_CB(skb)->tun_key = NULL;

We probably should rename this flag now.

> diff --git a/lib/odp-util.h b/lib/odp-util.h
> index d53f083..4e5a8a1 100644
> --- a/lib/odp-util.h
> +++ b/lib/odp-util.h
> @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
>  *                         ------  ---  ------  -----
>  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
>  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24

Same thing about the size here as well.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
       [not found]       ` <CAEP_g=9hkP-7fuFK3zSJcR=2BTK0feq7qUa8LHs3dbGQBy+suw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-06-04 22:34         ` Simon Horman
  2012-06-05  3:33           ` [ovs-dev] " Jesse Gross
  0 siblings, 1 reply; 32+ messages in thread
From: Simon Horman @ 2012-06-04 22:34 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@verge.net.au> wrote:
> > this is a first pass at providing a tun_key which can be used
> > as the basis for flow-based tunnelling. The tun_key includes and
> > replaces the tun_id in both struct ovs_skb_cb and struct sw_flow_key.
> >
> > In ovs_skb_cb tun_key is a pointer, as it is envisaged that the key will
> > grow when support for IPv6 is added, to an extent that inlining the
> > structure would make ovs_skb_cb larger than the 48 bytes available in skb->cb.
> >
> > As OVS does not support IPv6 as the outer transport protocol for tunnels
> > the IPv6 portions of this change, which appeared in the previous revision,
> > have been dropped in order to limit the scope and size of this patch.
> >
> > This patch does not make any effort to retain the existing tun_id behaviour
> > nor does it fully implement flow-based tunnels. As such it is incomplete
> > and can't be used in its current form (other than to break OVS tunnelling).
> >
> > ** Please do not apply **
> >
> > Cc: Kyle Mestery <kmestery@cisco.com>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> Thanks and sorry again about being so slow to look at this.
> 
> Overall, this looks pretty good to me.  The main difficulty that I had
> was in figuring out what should go with the old behavior and what
> should go with the new since it's at an intermediate point between the
> two but I understand that it's difficult to break it up in a way that
> both encapsulates a particular set of functionality and isn't too
> large.  Otherwise, I noticed a few specific things that I noted below.
> 
> > diff --git a/datapath/flow.c b/datapath/flow.c
> > index d07337c..49c0dd8 100644
> > --- a/datapath/flow.c
> > +++ b/datapath/flow.c
> > @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
> >  * get the metadata, that is, the parts of the flow key that cannot be
> >  * extracted from the packet itself.
> >  */
> > -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> > +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> > +                                  struct ovs_key_ipv4_tunnel *tun_key,
> >                                   const struct nlattr *attr)
> >  {
> >        const struct nlattr *nla;
> >        int rem;
> >
> >        *in_port = DP_MAX_PORTS;
> > -       *tun_id = 0;
> > +       tun_key->tun_id = 0;
> 
> I think we probably want to memset the entire tun_key to zero to avoid
> having potentially uninitialized data in the flow.

Sure, that is fine by me.

> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> >  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> >  {
> >        struct ovs_key_ethernet *eth_key;
> > +       struct ovs_key_ipv4_tunnel *tun_key;
> >        struct nlattr *nla, *encap;
> >
> >        if (swkey->phy.priority &&
> >            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> >                goto nla_put_failure;
> >
> > -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
> > -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> > -               goto nla_put_failure;
> > +       if (swkey->phy.tun_key.ipv4_dst) {
> 
> It's probably OK to use DIP equal to zero as a not present marker but
> we need to enforce that it's always true - for example we shouldn't
> allow somebody to setup a flow that way or receive packets with a zero
> address.  Alternately, we may be able to find a spare bit to indicate
> this, like is done with vlans.

When I originally wrote this there didn't seem to be any obvious
place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
is in part why the code relies on checking DIP.

However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
which only one bit is currently used. We could use one of the unused flag
bits.
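
Something along these lines, as a userspace sketch only: the struct is
a stand-in for ovs_key_ipv4_tunnel with stdint types, and the flag name
and bit position are hypothetical (the real choice would have to avoid
the bit that is already in use).

```c
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical flag: set whenever the key describes a real tunnel. */
#define SKETCH_TUN_F_PRESENT (UINT32_C(1) << 31)

static void tun_key_mark_present(struct tun_key_sketch *key)
{
	key->tun_flags |= SKETCH_TUN_F_PRESENT;
}

/* Replaces the "ipv4_dst != 0" test as the presence check. */
static int tun_key_is_present(const struct tun_key_sketch *key)
{
	return (key->tun_flags & SKETCH_TUN_F_PRESENT) != 0;
}
```

The presence test then no longer depends on the destination address
being non-zero.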

> In any case, I think we need to do some additional validation when
> setting up flows to check reserved space, for example, as otherwise
> that will never match.

Sure. My testing seems to indicate that matching does occur,
though I am quite happy to tighten things up.

> 
> > diff --git a/datapath/flow.h b/datapath/flow.h
> > index 5be481e..bab5363 100644
> > --- a/datapath/flow.h
> > +++ b/datapath/flow.h
> > @@ -42,7 +42,7 @@ struct sw_flow_actions {
> >
> >  struct sw_flow_key {
> >        struct {
> > -               __be64  tun_id;         /* Encapsulating tunnel ID. */
> > +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> 
> This is an optimization but as we get closer I'd like to put the
> tun_key at the end of struct sw_flow_key so that packets that didn't
> come from a tunnel don't have to pay the cost during the lookup (this
> is especially true as we add support for IPv6 tunnels).

Sure.

> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
> that I think can never apply for lookup such as the flags so it would
> be nice if we could remove that for lookup.

I think they need to be there to be passed around, so I'm not
sure if they can easily be removed from ovs_key_ipv4_tunnel if that
is what you are asking.

> > @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
> >  *                         ------  ---  ------  -----
> >  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
> >  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> > + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> If my math is correct, I think the size of the base struct
> ovs_key_ipv4_tunnel is 24 bytes.

Sorry, the size changed a few times and I seem to have forgotten to
update the above table accordingly.

> > +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
> > +{
> > +       __be32 ndst = tun_key->ipv4_src;
> > +       tun_key->ipv4_src = tun_key->ipv4_dst;
> > +       tun_key->ipv4_dst = ndst;
> > +}
> 
> I'm not quite sure when we would need to swap the addresses in a
> tunnel and I didn't see any uses of this function.

This should no longer be needed - and was always broken - I'll remove it.

> > +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> > +                               const struct iphdr *iph, __be64 tun_id)
> > +{
> > +       tun_key->tun_id = tun_id;
> > +       tun_key->ipv4_src = iph->saddr;
> > +       tun_key->ipv4_dst = iph->daddr;
> > +       tun_key->ipv4_tos = iph->tos;
> > +       tun_key->ipv4_ttl = iph->ttl;
> > +}
> 
> Aren't there some fields that we need to zero out to avoid problems in
> the lookup?

Thanks, I will check.

> > diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> > index d651c11..010e513 100644
> > --- a/datapath/tunnel.c
> > +++ b/datapath/tunnel.c
> > @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
> >        return NULL;
> >  }
> >
> > -static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
> > +static void ecn_decapsulate(struct sk_buff *skb)
> >  {
> > -       if (unlikely(INET_ECN_is_ce(tos))) {
> > +       if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {
> 
> This might come in a later patch, although I didn't see it in a quick
> scan, but it should be possible to implement all the ECN encapsulation
> and decapsulation in userspace, just like we can do with the rest of
> the ToS and TTL.

I hadn't considered that, I will add it to my TODO list.

> >  bool ovs_tnl_frag_needed(struct vport *vport,
> >                         const struct tnl_mutable_config *mutable,
> > -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> > +                        struct sk_buff *skb, unsigned int mtu,
> > +                        struct ovs_key_ipv4_tunnel *tun_key)
> >  {
> >        unsigned int eth_hdr_len = ETH_HLEN;
> >        unsigned int total_length = 0, header_length = 0, payload_length;
> >        struct ethhdr *eh, *old_eh = eth_hdr(skb);
> >        struct sk_buff *nskb;
> > +       struct ovs_key_ipv4_tunnel ntun_key;
> >
> >        /* Sanity check */
> >        if (skb->protocol == htons(ETH_P_IP)) {
> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >         * any way of synthesizing packets.
> >         */
> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> > -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> > -               OVS_CB(nskb)->tun_id = flow_key;
> > +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> > +               ntun_key = *tun_key;
> > +               OVS_CB(nskb)->tun_key = &ntun_key;
> > +       }
> 
> I guess this is probably where you were going to use the function to
> reverse IP addresses.  The logic doesn't really work but it's moot
> since this is going away anyways.

My latest series includes a clean up to ovs_tnl_frag_needed() to allow
it to work in some circumstances - i.e. those found in my test environment.
That series removes knowledge of tun_key from ovs_tnl_frag_needed().

I am however happy to remove ovs_tnl_frag_needed() completely if you think
that is appropriate.

> > @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport *vport,
> >        iph->ihl        = sizeof(struct iphdr) >> 2;
> >        iph->frag_off   = htons(IP_DF);
> >        iph->protocol   = tnl_vport->tnl_ops->ipproto;
> > -       iph->tos        = mutable->tos;
> >        iph->daddr      = rt->rt_dst;
> >        iph->saddr      = rt->rt_src;
> > -       iph->ttl        = mutable->ttl;
> >        if (!iph->ttl)
> >                iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I'm not sure that these changes quite belong in this patch (not that
> it shouldn't be done but it seems like the supporting code isn't there
> yet).

Sorry, I did some merging of my patches and this seems to have been
a case where I merged incorrectly. I think that change belongs in:

[PATCH 18/21] dataptah: remove ttl and tos from tnl_mutable_config

(I need to fix the typo in the title of that patch!)

> > diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> > index ab89c5b..fd2b038 100644
> > --- a/datapath/vport-gre.c
> > +++ b/datapath/vport-gre.c
> > @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct vport *vport,
> >        __be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
> >                                               - GRE_HEADER_SECTION);
> >
> > -       /* Work backwards over the options so the checksum is last. */
> > -       if (mutable->flags & TNL_F_OUT_KEY_ACTION)
> > -               *options = be64_get_low32(OVS_CB(skb)->tun_id);
> 
> Why does this go away?

I agree that looks wrong and my only explanation is that it is remnants
of some hack.

I have done some, hopefully more sensible, re-working of gre_update_header in:

[PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit

> > diff --git a/datapath/vport.c b/datapath/vport.c
> > index 172261a..0c77a1b 100644
> > --- a/datapath/vport.c
> > +++ b/datapath/vport.c
> > @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
> >                OVS_CB(skb)->flow = NULL;
> >
> >        if (!(vport->ops->flags & VPORT_F_TUN_ID))
> > -               OVS_CB(skb)->tun_id = 0;
> > +               OVS_CB(skb)->tun_key = NULL;
> 
> We probably should rename this flag now.

Yes, I think there are several flags that may now be removed.
I'll add that to my TODO list.

> > diff --git a/lib/odp-util.h b/lib/odp-util.h
> > index d53f083..4e5a8a1 100644
> > --- a/lib/odp-util.h
> > +++ b/lib/odp-util.h
> > @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
> >  *                         ------  ---  ------  -----
> >  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
> >  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> > + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> Same thing about the size here as well.

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
  2012-06-04 22:34         ` Simon Horman
@ 2012-06-05  3:33           ` Jesse Gross
  2012-06-05  8:12             ` Simon Horman
  0 siblings, 1 reply; 32+ messages in thread
From: Jesse Gross @ 2012-06-05  3:33 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev, netdev

On Tue, Jun 5, 2012 at 7:34 AM, Simon Horman <horms@verge.net.au> wrote:
> On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
>> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@verge.net.au> wrote:
>> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>> >  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
>> >  {
>> >        struct ovs_key_ethernet *eth_key;
>> > +       struct ovs_key_ipv4_tunnel *tun_key;
>> >        struct nlattr *nla, *encap;
>> >
>> >        if (swkey->phy.priority &&
>> >            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
>> >                goto nla_put_failure;
>> >
>> > -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
>> > -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
>> > -               goto nla_put_failure;
>> > +       if (swkey->phy.tun_key.ipv4_dst) {
>>
>> It's probably OK to use DIP equal to zero as a not present marker but
>> we need to enforce that it's always true - for example we shouldn't
>> allow somebody to setup a flow that way or receive packets with a zero
>> address.  Alternately, we may be able to find a spare bit to indicate
>> this, like is done with vlans.
>
> When I originally wrote this there didn't seem to be any obvious
> place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
> is in part why the code relies on checking DIP.
>
> However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
> which only one bit is currently used. We could use one of the unused flag
> bits.

I guess it depends on what we end up doing with the lookup struct.  If
it stays as it is today, there's plenty of space if you include those
padding bytes.  If we shrink it down and there isn't a place then I do
think it is fine to use DIP (since this is traversing an IP stack and
DIP = 0 is an invalid value, it's not like an L2 switch refusing to
carry an invalid IP packet).  In that case, we just need to do more
validation in other places to make sure that this is the only situation
in which the condition can arise.

>> In any case, I think we need to do some additional validation when
>> setting up flows to check reserved space, for example, as otherwise
>> that will never match.
>
> Sure. My testing seems to indicate that matching does occur,
> though I am quite happy to tighten things up.

I don't think it causes a problem as long as userspace is well
behaved; I just think it's best to detect problems early.

>> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
>> that I think can never apply for lookup such as the flags so it would
>> be nice if we could remove that for lookup.
>
> I think they need to be there to be passed around, so I'm not
> sure if they can easily be removed from ovs_key_ipv4_tunnel if that
> is what you are asking.

My guess is that we'll probably want to separate out the struct that
is used for lookup from the one that is used for communication with
userspace, which is what we do for most things so that we have more
freedom to optimize in the kernel.
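
Roughly what I have in mind, as a userspace sketch with hypothetical
names and stdint stand-ins for the kernel types: one struct for the
userspace interface, a second one for lookup that simply has no flags
field, and a conversion between them.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the struct exchanged with userspace (layout is ABI). */
struct tun_uapi_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical kernel-internal lookup key: no flags, free to change. */
struct tun_lookup_sketch {
	uint64_t tun_id;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
};

/* Build the lookup key; memset first so compiler padding is zeroed. */
static void tun_lookup_from_uapi(struct tun_lookup_sketch *dst,
				 const struct tun_uapi_sketch *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->tun_id = src->tun_id;
	dst->ipv4_src = src->ipv4_src;
	dst->ipv4_dst = src->ipv4_dst;
	dst->ipv4_tos = src->ipv4_tos;
	dst->ipv4_ttl = src->ipv4_ttl;
}

/* Byte-wise comparison, assuming lookup compares raw key bytes. */
static int tun_lookup_equal(const struct tun_lookup_sketch *a,
			    const struct tun_lookup_sketch *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

The flags can still be passed around alongside the packet; they just
stop taking part in the comparison.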

>> >  bool ovs_tnl_frag_needed(struct vport *vport,
>> >                         const struct tnl_mutable_config *mutable,
>> > -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
>> > +                        struct sk_buff *skb, unsigned int mtu,
>> > +                        struct ovs_key_ipv4_tunnel *tun_key)
>> >  {
>> >        unsigned int eth_hdr_len = ETH_HLEN;
>> >        unsigned int total_length = 0, header_length = 0, payload_length;
>> >        struct ethhdr *eh, *old_eh = eth_hdr(skb);
>> >        struct sk_buff *nskb;
>> > +       struct ovs_key_ipv4_tunnel ntun_key;
>> >
>> >        /* Sanity check */
>> >        if (skb->protocol == htons(ETH_P_IP)) {
>> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
>> >         * any way of synthesizing packets.
>> >         */
>> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
>> > -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
>> > -               OVS_CB(nskb)->tun_id = flow_key;
>> > +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
>> > +               ntun_key = *tun_key;
>> > +               OVS_CB(nskb)->tun_key = &ntun_key;
>> > +       }
>>
>> I guess this is probably where you were going to use the function to
>> reverse IP addresses.  The logic doesn't really work but it's moot
>> since this is going away anyways.
>
> My latest series includes a clean up to ovs_tnl_frag_needed() to allow
> it to work in some circumstances - i.e. those found in my test environment.
> That series removes knowledge of tun_key from ovs_tnl_frag_needed().
>
> I am however happy to remove ovs_tnl_frag_needed() completely if you think
> that is appropriate.

I think, in retrospect, that the path MTU discovery that I implemented
here was probably not the right choice and MSS clamping is the correct
way to do things.  It was better when it wasn't possible to do any
kind of flow-based manipulation of tunnels but the model is breaking
down more and more over time.  Given that, I would be hesitant to
submit it upstream and since that's the goal of this work, removing it
completely is probably the right thing to do.

* Re: [ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key
  2012-06-05  3:33           ` [ovs-dev] " Jesse Gross
@ 2012-06-05  8:12             ` Simon Horman
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2012-06-05  8:12 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev, netdev

On Tue, Jun 05, 2012 at 12:33:11PM +0900, Jesse Gross wrote:
> On Tue, Jun 5, 2012 at 7:34 AM, Simon Horman <horms@verge.net.au> wrote:
> > On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
> >> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> >> >  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> >> >  {
> >> >        struct ovs_key_ethernet *eth_key;
> >> > +       struct ovs_key_ipv4_tunnel *tun_key;
> >> >        struct nlattr *nla, *encap;
> >> >
> >> >        if (swkey->phy.priority &&
> >> >            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> >> >                goto nla_put_failure;
> >> >
> >> > -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
> >> > -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> >> > -               goto nla_put_failure;
> >> > +       if (swkey->phy.tun_key.ipv4_dst) {
> >>
> >> It's probably OK to use DIP equal to zero as a not present marker but
> >> we need to enforce that it's always true - for example we shouldn't
> >> allow somebody to setup a flow that way or receive packets with a zero
> >> address.  Alternately, we may be able to find a spare bit to indicate
> >> this, like is done with vlans.
> >
> > When I originally wrote this there didn't seem to be any obvious
> > place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
> > is in part why the code relies on checking DIP.
> >
> > However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
> > which only one bit is currently used. We could use one of the unused flag
> > bits.
> 
> I guess it depends on what we end up doing with the lookup struct.  If
> it stays as it is today, there's plenty of space if you include those
> padding bytes.  If we shrink it down and there isn't a place then I do
> think it is fine to use DIP (since this is traversing an IP stack and
> DIP = 0 is an invalid value, it's not like an L2 switch not allowing
> invalid IP packets).  In that case, we just need to do more validation
> in other places to make sure that this is the only situation that the
> condition can arise.

Ok, my suspicion is that there will be space.
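
For reference, the two "not present" markers discussed above could be sketched roughly as follows. This is only an illustration: the flag name TUN_KEY_F_PRESENT, its bit position, and both helper functions are hypothetical, and plain fixed-width types stand in for the kernel's __be64/__be32 so the fragment compiles in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;     /* __be64 in the kernel */
	uint32_t tun_flags;
	uint32_t ipv4_src;   /* __be32 in the kernel */
	uint32_t ipv4_dst;   /* __be32 in the kernel */
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical spare flag bit marking the key as active. */
#define TUN_KEY_F_PRESENT (1u << 31)

/* Option 1: an explicit bit, similar in spirit to what is done with vlans. */
static inline bool tun_key_present_by_flag(const struct tun_key_sketch *k)
{
	return (k->tun_flags & TUN_KEY_F_PRESENT) != 0;
}

/* Option 2: treat a zero destination IP as "no tunnel key".
 * Only safe if flows and packets with DIP == 0 are rejected elsewhere. */
static inline bool tun_key_present_by_dip(const struct tun_key_sketch *k)
{
	return k->ipv4_dst != 0;
}
```

The flag variant costs one bit but does not depend on any address ever being invalid; the DIP variant needs no extra space but relies on validation in every path that can create a key.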

> >> In any case, I think we need to do some additional validation when
> >> setting up flows to check reserved space, for example, as otherwise
> >> that will never match.
> >
> > Sure. My testing seems to indicate that matching does occur,
> > though I am quite happy to tighten things up.
> 
> I don't think it causes a problem as long as userspace is well
> behaved, I was thinking it's best to detect problems early.

Ok, good point. I'll make sure the data is clean.
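
A minimal sketch of the kind of early validation being asked for, assuming it runs when a flow is set up from userspace. The function name tun_key_validate and the mask TUN_KEY_F_KNOWN_MASK are hypothetical; the thread only says that one flag bit is currently used, so which bit that is remains an assumption here. Fixed-width userspace types replace the kernel's __be types.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for struct ovs_key_ipv4_tunnel from the cover letter. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Hypothetical: assume bit 0 is the single flag in use today. */
#define TUN_KEY_F_KNOWN_MASK 0x1u

/* Returns 0 if the key is clean, -EINVAL otherwise. */
static inline int tun_key_validate(const struct tun_key_sketch *k)
{
	/* Unknown flag bits would make the flow impossible to match. */
	if (k->tun_flags & ~TUN_KEY_F_KNOWN_MASK)
		return -EINVAL;

	/* Reserved padding must be zero for the same reason. */
	if (k->pad[0] || k->pad[1])
		return -EINVAL;

	/* If DIP == 0 means "no tunnel key", a real key must not use it. */
	if (k->ipv4_dst == 0)
		return -EINVAL;

	return 0;
}
```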
> 
> >> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
> >> that I think can never apply for lookup such as the flags so it would
> >> be nice if we could remove that for lookup.
> >
> > I think they need to be there to be passed around, so I'm not
> > sure if they can easily be removed from ovs_key_ipv4_tunnel if that
> > is what you are asking.
> 
> My guess is that we'll probably want to separate out the struct that
> is used for lookup from the one that is used for communication with
> userspace, which is what we do for most things so that we have more
> freedom to optimize in the kernel.

Understood, I'll look into doing that.
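
One possible shape for that split, as a rough sketch only. The struct and function names are hypothetical, and the choice of which fields belong in the lookup key (everything except the flags and padding) is an assumption based on the comment that the flags can never apply for lookup. Fixed-width userspace types replace the kernel's __be types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout exchanged with userspace: mirrors the cover-letter struct. */
struct tun_key_uapi_sketch {
	uint64_t tun_id;
	uint32_t tun_flags;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint8_t  pad[2];
};

/* Kernel-internal lookup key: no flags, no reserved bytes. */
struct tun_key_lookup_sketch {
	uint64_t tun_id;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
};

/* Copy only the fields that take part in flow lookup. */
static inline struct tun_key_lookup_sketch
tun_key_to_lookup(const struct tun_key_uapi_sketch *u)
{
	struct tun_key_lookup_sketch l = {
		.tun_id   = u->tun_id,
		.ipv4_src = u->ipv4_src,
		.ipv4_dst = u->ipv4_dst,
		.ipv4_tos = u->ipv4_tos,
		.ipv4_ttl = u->ipv4_ttl,
	};
	return l;
}

/* Field-wise comparison, so compiler-inserted padding never matters. */
static inline bool tun_lookup_equal(const struct tun_key_lookup_sketch *a,
				    const struct tun_key_lookup_sketch *b)
{
	return a->tun_id == b->tun_id &&
	       a->ipv4_src == b->ipv4_src &&
	       a->ipv4_dst == b->ipv4_dst &&
	       a->ipv4_tos == b->ipv4_tos &&
	       a->ipv4_ttl == b->ipv4_ttl;
}
```

With this arrangement two keys that differ only in tun_flags map to the same lookup key, and the internal layout can change without touching the userspace interface.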

> >> >  bool ovs_tnl_frag_needed(struct vport *vport,
> >> >                         const struct tnl_mutable_config *mutable,
> >> > -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> >> > +                        struct sk_buff *skb, unsigned int mtu,
> >> > +                        struct ovs_key_ipv4_tunnel *tun_key)
> >> >  {
> >> >        unsigned int eth_hdr_len = ETH_HLEN;
> >> >        unsigned int total_length = 0, header_length = 0, payload_length;
> >> >        struct ethhdr *eh, *old_eh = eth_hdr(skb);
> >> >        struct sk_buff *nskb;
> >> > +       struct ovs_key_ipv4_tunnel ntun_key;
> >> >
> >> >        /* Sanity check */
> >> >        if (skb->protocol == htons(ETH_P_IP)) {
> >> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >> >         * any way of synthesizing packets.
> >> >         */
> >> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> >> > -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> >> > -               OVS_CB(nskb)->tun_id = flow_key;
> >> > +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> >> > +               ntun_key = *tun_key;
> >> > +               OVS_CB(nskb)->tun_key = &ntun_key;
> >> > +       }
> >>
> >> I guess this is probably where you were going to use the function to
> >> reverse IP addresses.  The logic doesn't really work but it's moot
> >> since this is going away anyways.
> >
> > My latest series includes a clean up to ovs_tnl_frag_needed() to allow
> > it to work in some circumstances - i.e. those found in my test environment.
> > That series removes knowledge of tun_key from ovs_tnl_frag_needed().
> >
> > I am however happy to remove ovs_tnl_frag_needed() completely if you think
> > that is appropriate.
> 
> I think, in retrospect, that the path MTU discovery that I implemented
> here was probably not the right choice and MSS clamping is the correct
> way to do things.  It was better when it wasn't possible to do any
> kind of flow-based manipulation of tunnels but the model is breaking
> down more and more over time.  Given that, I would be hesitant to
> submit it upstream and since that's the goal of this work, removing it
> completely is probably the right thing to do.

Sure. My latest series also includes an implementation of MSS clamping,
so assuming it isn't doomed in some way we should be safe to remove
MTU discovery. I'll see about doing that for my next post.
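
For readers unfamiliar with MSS clamping, the core of the technique can be sketched as below. This is not the implementation from the series: the function names are hypothetical, it operates on a raw TCP header in a userspace buffer rather than an skb, it assumes an inner IPv4 header without options, and it deliberately leaves out the TCP checksum fix-up that a real datapath must perform after rewriting the option.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_TCPOPT_EOL  0
#define SK_TCPOPT_NOP  1
#define SK_TCPOPT_MSS  2
#define SK_TCPOLEN_MSS 4
#define SK_TCP_FLAG_SYN 0x02

/* Largest MSS that still fits once the tunnel overhead and the inner
 * IPv4 (20 bytes) and TCP (20 bytes) headers are subtracted from the MTU.
 * Returns 0 if nothing fits. */
static inline uint16_t clamp_target_mss(unsigned int mtu,
					unsigned int tunnel_overhead)
{
	const unsigned int inner_hdrs = 20 + 20;
	unsigned int mss;

	if (mtu <= tunnel_overhead + inner_hdrs)
		return 0;
	mss = mtu - tunnel_overhead - inner_hdrs;
	return mss > 0xffff ? 0xffff : (uint16_t)mss;
}

/* Lower the MSS option of a SYN segment to at most max_mss.
 * Returns 1 if the option was rewritten, 0 if nothing needed changing,
 * -1 if the header is malformed. Checksum update is NOT done here. */
static inline int clamp_mss_option(uint8_t *tcp, size_t len, uint16_t max_mss)
{
	size_t hdr_len, i;

	if (len < 20)
		return -1;
	hdr_len = (size_t)(tcp[12] >> 4) * 4;   /* data offset in bytes */
	if (hdr_len < 20 || hdr_len > len)
		return -1;
	if (!(tcp[13] & SK_TCP_FLAG_SYN))       /* MSS is only sent on SYN */
		return 0;

	for (i = 20; i < hdr_len; ) {
		uint8_t kind = tcp[i];
		uint8_t optlen;

		if (kind == SK_TCPOPT_EOL)
			break;
		if (kind == SK_TCPOPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= hdr_len)
			return -1;
		optlen = tcp[i + 1];
		if (optlen < 2 || i + optlen > hdr_len)
			return -1;
		if (kind == SK_TCPOPT_MSS && optlen == SK_TCPOLEN_MSS) {
			uint16_t mss = (uint16_t)((tcp[i + 2] << 8) | tcp[i + 3]);

			if (mss <= max_mss)
				return 0;
			tcp[i + 2] = (uint8_t)(max_mss >> 8);
			tcp[i + 3] = (uint8_t)(max_mss & 0xff);
			return 1;
		}
		i += optlen;
	}
	return 0;
}
```

Unlike path MTU discovery, this needs no synthesized ICMP packets: the endpoints simply negotiate a segment size that already accounts for the encapsulation overhead.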

end of thread, other threads:[~2012-06-05  8:12 UTC | newest]

Thread overview: 32+ messages
2012-05-24  9:08 [RFC v4 00/21] Flow Based Tunneling for Open vSwitch Simon Horman
2012-05-24  9:08 ` [PATCH 02/21] datapath: Use tun_key on transmit Simon Horman
2012-05-24  9:08 ` [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr() Simon Horman
     [not found]   ` <1337850554-10339-4-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
2012-05-24 16:29     ` Ben Pfaff
     [not found]       ` <20120524162911.GD26173-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
2012-05-25  0:01         ` Simon Horman
2012-05-24  9:09 ` [PATCH 08/21] ofproto: Add realdev_to_txdev() Simon Horman
     [not found] ` <1337850554-10339-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
2012-05-24  9:08   ` [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key Simon Horman
     [not found]     ` <1337850554-10339-2-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
2012-06-03  9:01       ` Jesse Gross
2012-06-03  9:15     ` [ovs-dev] " Jesse Gross
     [not found]       ` <CAEP_g=9hkP-7fuFK3zSJcR=2BTK0feq7qUa8LHs3dbGQBy+suw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-06-04 22:34         ` Simon Horman
2012-06-05  3:33           ` [ovs-dev] " Jesse Gross
2012-06-05  8:12             ` Simon Horman
2012-05-24  9:08   ` [PATCH 04/21] vswitchd: Add iface_parse_tunnel Simon Horman
     [not found]     ` <1337850554-10339-5-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
2012-05-24 16:47       ` Ben Pfaff
2012-05-24 23:59         ` [ovs-dev] " Simon Horman
2012-05-24  9:08   ` [PATCH 05/21] vswitchd: Add add_tunnel_ports() Simon Horman
     [not found]     ` <1337850554-10339-6-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
2012-05-25 17:18       ` Ben Pfaff
2012-05-24  9:08   ` [PATCH 06/21] ofproto: Add set_tunnelling() Simon Horman
2012-05-24  9:09   ` [PATCH 07/21] vswitchd: Configure tunnel interfaces Simon Horman
2012-05-24  9:09   ` [PATCH 09/21] ofproto: Add tundev_to_realdev() Simon Horman
2012-05-24  9:09   ` [PATCH 10/21] classifier: Convert struct flow flow_metadata to use tun_key Simon Horman
2012-05-24  9:09   ` [PATCH 11/21] datapath, vport: Provide tunnel realdev and tundev classes and vports Simon Horman
2012-05-24  9:09   ` [PATCH 12/21] lib: Replace commit_set_tun_id_action() with commit_set_tunnel_action() Simon Horman
2012-05-24  9:09   ` [PATCH 13/21] global: Remove OVS_KEY_ATTR_TUN_ID Simon Horman
2012-05-24  9:09   ` [PATCH 14/21] ofproto: Set flow tun_key in compose_output_action() Simon Horman
2012-05-24  9:09   ` [PATCH 15/21] datapath: Remove mlink element from tnl_mutable_config Simon Horman
2012-05-24  9:09   ` [PATCH 18/21] dataptah: remove ttl and tos " Simon Horman
2012-05-24  9:09 ` [PATCH 16/21] datapath: remove tunnel cache Simon Horman
2012-05-24  9:09 ` [PATCH 17/21] datapath: Always use tun_key addresses for route lookup Simon Horman
2012-05-24  9:09 ` [PATCH 19/21] datapath: Simplify vport lookup Simon Horman
2012-05-24  9:09 ` [PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit Simon Horman
2012-05-24  9:09 ` [PATCH 21/21] datapath: Always use tun_key flags Simon Horman
