netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next RFC 0/4] Openvswitch meter action
@ 2017-10-12 22:38 Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 1/4] openvswitch: Add meter netlink definitions Andy Zhou
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Andy Zhou @ 2017-10-12 22:38 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, joe, gvrose8192, Andy Zhou

This patch series is the first attempt to add openvswitch
meter support. We have previously experimented with adding
metering support in nftables. However 1) It was not clear
how to expose a named nftables object cleanly, and 2)
the logic that implements metering is quite small, < 100 lines
of code.

With those two observations, it seems cleaner to add meter
support in the openvswitch module directly.


Andy Zhou (4):
  openvswitch: Add meter netlink definitions
  openvswitch: export get_dp() API.
  openvswitch: Add meter infrastructure
  openvswitch: Add meter action support

 include/uapi/linux/openvswitch.h |  52 ++++
 net/openvswitch/Makefile         |   1 +
 net/openvswitch/actions.c        |  12 +
 net/openvswitch/datapath.c       |  43 +--
 net/openvswitch/datapath.h       |  35 +++
 net/openvswitch/flow_netlink.c   |   6 +
 net/openvswitch/meter.c          | 611 +++++++++++++++++++++++++++++++++++++++
 net/openvswitch/meter.h          |  54 ++++
 8 files changed, 783 insertions(+), 31 deletions(-)
 create mode 100644 net/openvswitch/meter.c
 create mode 100644 net/openvswitch/meter.h

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next RFC 1/4] openvswitch: Add meter netlink definitions
  2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
@ 2017-10-12 22:38 ` Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 2/4] openvswitch: export get_dp() API Andy Zhou
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Andy Zhou @ 2017-10-12 22:38 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, joe, gvrose8192, Andy Zhou

Meter has its own netlink family. Define netlink messages and attributes
for communicating with the user space programs.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 include/uapi/linux/openvswitch.h | 51 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 156ee4cab82e..325049a129e4 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -848,4 +848,55 @@ enum ovs_action_attr {
 
 #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
 
+/* Meters. */
+#define OVS_METER_FAMILY  "ovs_meter"
+#define OVS_METER_MCGROUP "ovs_meter"
+#define OVS_METER_VERSION 0x1
+
+enum ovs_meter_cmd {
+	OVS_METER_CMD_UNSPEC,
+	OVS_METER_CMD_FEATURES,	/* Get features supported by the datapath. */
+	OVS_METER_CMD_SET,	/* Add or modify a meter. */
+	OVS_METER_CMD_DEL,	/* Delete a meter. */
+	OVS_METER_CMD_GET	/* Get meter stats. */
+};
+
+enum ovs_meter_attr {
+	OVS_METER_ATTR_UNSPEC,
+	OVS_METER_ATTR_ID,	/* u32 meter ID within datapath. */
+	OVS_METER_ATTR_KBPS,	/* No argument. If set, units in kilobits
+				 * per second. Otherwise, units in
+				 * packets per second.
+				 */
+	OVS_METER_ATTR_STATS,	/* struct ovs_flow_stats for the meter. */
+	OVS_METER_ATTR_BANDS,	/* Nested attributes for meter bands. */
+	OVS_METER_ATTR_USED,	/* u64 msecs last used in monotonic time. */
+	OVS_METER_ATTR_CLEAR,	/* Flag to clear stats, used. */
+	OVS_METER_ATTR_MAX_METERS, /* u32 number of meters supported. */
+	OVS_METER_ATTR_MAX_BANDS,  /* u32 max number of bands per meter. */
+	OVS_METER_ATTR_PAD,
+	__OVS_METER_ATTR_MAX
+};
+
+#define OVS_METER_ATTR_MAX (__OVS_METER_ATTR_MAX - 1)
+
+enum ovs_band_attr {
+	OVS_BAND_ATTR_UNSPEC,
+	OVS_BAND_ATTR_TYPE,	/* u32 OVS_METER_BAND_TYPE_* constant. */
+	OVS_BAND_ATTR_RATE,	/* u32 band rate in meter units (see above). */
+	OVS_BAND_ATTR_BURST,	/* u32 burst size in meter units. */
+	OVS_BAND_ATTR_STATS,	/* struct ovs_flow_stats for the band. */
+	__OVS_BAND_ATTR_MAX
+};
+
+#define OVS_BAND_ATTR_MAX (__OVS_BAND_ATTR_MAX - 1)
+
+enum ovs_meter_band_type {
+	OVS_METER_BAND_TYPE_UNSPEC,
+	OVS_METER_BAND_TYPE_DROP,   /* Drop exceeding packets. */
+	__OVS_METER_BAND_TYPE_MAX
+};
+
+#define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
+
 #endif /* _LINUX_OPENVSWITCH_H */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [net-next RFC 2/4] openvswitch: export get_dp() API.
  2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 1/4] openvswitch: Add meter netlink definitions Andy Zhou
@ 2017-10-12 22:38 ` Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 3/4] openvswitch: Add meter infrastructure Andy Zhou
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Andy Zhou @ 2017-10-12 22:38 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, joe, gvrose8192, Andy Zhou

Later patches will invoke get_dp() outside of datapath.c. Export it.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 net/openvswitch/datapath.c | 29 -----------------------------
 net/openvswitch/datapath.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3aec6227c91..ac7154018676 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -142,35 +142,6 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
 				  const struct dp_upcall_info *,
 				  uint32_t cutlen);
 
-/* Must be called with rcu_read_lock. */
-static struct datapath *get_dp_rcu(struct net *net, int dp_ifindex)
-{
-	struct net_device *dev = dev_get_by_index_rcu(net, dp_ifindex);
-
-	if (dev) {
-		struct vport *vport = ovs_internal_dev_get_vport(dev);
-		if (vport)
-			return vport->dp;
-	}
-
-	return NULL;
-}
-
-/* The caller must hold either ovs_mutex or rcu_read_lock to keep the
- * returned dp pointer valid.
- */
-static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
-{
-	struct datapath *dp;
-
-	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());
-	rcu_read_lock();
-	dp = get_dp_rcu(net, dp_ifindex);
-	rcu_read_unlock();
-
-	return dp;
-}
-
 /* Must be called with rcu_read_lock or ovs_mutex. */
 const char *ovs_dp_name(const struct datapath *dp)
 {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 480600649d0b..ad14b571219d 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -30,6 +30,7 @@
 #include "conntrack.h"
 #include "flow.h"
 #include "flow_table.h"
+#include "vport-internal_dev.h"
 
 #define DP_MAX_PORTS           USHRT_MAX
 #define DP_VPORT_HASH_BUCKETS  1024
@@ -190,6 +191,36 @@ static inline struct vport *ovs_vport_ovsl(const struct datapath *dp, int port_n
 	return ovs_lookup_vport(dp, port_no);
 }
 
+/* Must be called with rcu_read_lock. */
+static inline struct datapath *get_dp_rcu(struct net *net, int dp_ifindex)
+{
+	struct net_device *dev = dev_get_by_index_rcu(net, dp_ifindex);
+
+	if (dev) {
+		struct vport *vport = ovs_internal_dev_get_vport(dev);
+
+		if (vport)
+			return vport->dp;
+	}
+
+	return NULL;
+}
+
+/* The caller must hold either ovs_mutex or rcu_read_lock to keep the
+ * returned dp pointer valid.
+ */
+static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
+{
+	struct datapath *dp;
+
+	WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_ovsl_is_held());
+	rcu_read_lock();
+	dp = get_dp_rcu(net, dp_ifindex);
+	rcu_read_unlock();
+
+	return dp;
+}
+
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_family dp_vport_genl_family;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [net-next RFC 3/4] openvswitch: Add meter infrastructure
  2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 1/4] openvswitch: Add meter netlink definitions Andy Zhou
  2017-10-12 22:38 ` [net-next RFC 2/4] openvswitch: export get_dp() API Andy Zhou
@ 2017-10-12 22:38 ` Andy Zhou
  2017-10-14  0:12   ` Pravin Shelar
  2017-10-12 22:38 ` [net-next RFC 4/4] openvswitch: Add meter action support Andy Zhou
  2017-10-14  0:09 ` [net-next RFC 0/4] Openvswitch meter action Pravin Shelar
  4 siblings, 1 reply; 12+ messages in thread
From: Andy Zhou @ 2017-10-12 22:38 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, joe, gvrose8192, Andy Zhou

OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 net/openvswitch/Makefile   |   1 +
 net/openvswitch/datapath.c |  14 +-
 net/openvswitch/datapath.h |   3 +
 net/openvswitch/meter.c    | 611 +++++++++++++++++++++++++++++++++++++++++++++
 net/openvswitch/meter.h    |  54 ++++
 5 files changed, 681 insertions(+), 2 deletions(-)
 create mode 100644 net/openvswitch/meter.c
 create mode 100644 net/openvswitch/meter.h

diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 60f809085b92..658383fbdf53 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -11,6 +11,7 @@ openvswitch-y := \
 	flow.o \
 	flow_netlink.o \
 	flow_table.o \
+	meter.o \
 	vport.o \
 	vport-internal_dev.o \
 	vport-netdev.o
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ac7154018676..eef8d3ea3aae 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -55,6 +55,7 @@
 #include "flow.h"
 #include "flow_table.h"
 #include "flow_netlink.h"
+#include "meter.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
@@ -174,6 +175,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
 	ovs_flow_tbl_destroy(&dp->table);
 	free_percpu(dp->stats_percpu);
 	kfree(dp->ports);
+	ovs_meters_exit(dp);
 	kfree(dp);
 }
 
@@ -1572,6 +1574,10 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
 		INIT_HLIST_HEAD(&dp->ports[i]);
 
+	err = ovs_meters_init(dp);
+	if (err)
+		goto err_destroy_ports_array;
+
 	/* Set up our datapath device. */
 	parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
 	parms.type = OVS_VPORT_TYPE_INTERNAL;
@@ -1600,7 +1606,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 				ovs_dp_reset_user_features(skb, info);
 		}
 
-		goto err_destroy_ports_array;
+		goto err_destroy_meters;
 	}
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
@@ -1615,8 +1621,10 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	ovs_notify(&dp_datapath_genl_family, reply, info);
 	return 0;
 
-err_destroy_ports_array:
+err_destroy_meters:
 	ovs_unlock();
+	ovs_meters_exit(dp);
+err_destroy_ports_array:
 	kfree(dp->ports);
 err_destroy_percpu:
 	free_percpu(dp->stats_percpu);
@@ -2244,6 +2252,7 @@ struct genl_family dp_vport_genl_family __ro_after_init = {
 	&dp_vport_genl_family,
 	&dp_flow_genl_family,
 	&dp_packet_genl_family,
+	&dp_meter_genl_family,
 };
 
 static void dp_unregister_genl(int n_families)
@@ -2424,3 +2433,4 @@ static void dp_cleanup(void)
 MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
+MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index ad14b571219d..d1ffa1d9fe57 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -92,6 +92,9 @@ struct datapath {
 	u32 user_features;
 
 	u32 max_headroom;
+
+	/* Switch meters. */
+	struct hlist_head *meters;
 };
 
 /**
diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
new file mode 100644
index 000000000000..f24ebb5f7af4
--- /dev/null
+++ b/net/openvswitch/meter.c
@@ -0,0 +1,611 @@
+/*
+ * Copyright (c) 2017 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/if.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/openvswitch.h>
+#include <linux/netlink.h>
+#include <linux/rculist.h>
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "datapath.h"
+#include "meter.h"
+
+#define METER_HASH_BUCKETS 1024
+
+static const struct nla_policy meter_policy[OVS_METER_ATTR_MAX + 1] = {
+	[OVS_METER_ATTR_ID] = { .type = NLA_U32, },
+	[OVS_METER_ATTR_KBPS] = { .type = NLA_FLAG },
+	[OVS_METER_ATTR_STATS] = { .len = sizeof(struct ovs_flow_stats) },
+	[OVS_METER_ATTR_BANDS] = { .type = NLA_NESTED },
+	[OVS_METER_ATTR_USED] = { .type = NLA_U64 },
+	[OVS_METER_ATTR_CLEAR] = { .type = NLA_FLAG },
+	[OVS_METER_ATTR_MAX_METERS] = { .type = NLA_U32 },
+	[OVS_METER_ATTR_MAX_BANDS] = { .type = NLA_U32 },
+};
+
+static const struct nla_policy band_policy[OVS_BAND_ATTR_MAX + 1] = {
+	[OVS_BAND_ATTR_TYPE] = { .type = NLA_U32, },
+	[OVS_BAND_ATTR_RATE] = { .type = NLA_U32, },
+	[OVS_BAND_ATTR_BURST] = { .type = NLA_U32, },
+	[OVS_BAND_ATTR_STATS] = { .len = sizeof(struct ovs_flow_stats) },
+};
+
+static void rcu_free_ovs_meter_callback(struct rcu_head *rcu)
+{
+	struct dp_meter *meter = container_of(rcu, struct dp_meter, rcu);
+
+	kfree(meter);
+}
+
+static void ovs_meter_free(struct dp_meter *meter)
+{
+	if (!meter)
+		return;
+
+	call_rcu(&meter->rcu, rcu_free_ovs_meter_callback);
+}
+
+static struct hlist_head *meter_hash_bucket(const struct datapath *dp,
+					    u32 meter_id)
+{
+	return &dp->meters[meter_id & (METER_HASH_BUCKETS - 1)];
+}
+
+/* Call with ovs_mutex or RCU read lock. */
+static struct dp_meter *lookup_meter(const struct datapath *dp,
+				     u32 meter_id)
+{
+	struct dp_meter *meter;
+	struct hlist_head *head;
+
+	head = meter_hash_bucket(dp, meter_id);
+	hlist_for_each_entry_rcu(meter, head, dp_hash_node) {
+		if (meter->id == meter_id)
+			return meter;
+	}
+	return NULL;
+}
+
+static void attach_meter(struct datapath *dp, struct dp_meter *meter)
+{
+	struct hlist_head *head = meter_hash_bucket(dp, meter->id);
+
+	hlist_add_head_rcu(&meter->dp_hash_node, head);
+}
+
+static void detach_meter(struct dp_meter *meter)
+{
+	ASSERT_OVSL();
+	if (meter)
+		hlist_del_rcu(&meter->dp_hash_node);
+}
+
+static struct sk_buff *
+ovs_meter_cmd_reply_start(struct genl_info *info, u8 cmd,
+			  struct ovs_header **ovs_reply_header)
+{
+	struct sk_buff *skb;
+	struct ovs_header *ovs_header = info->userhdr;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+	if (!skb)
+		return ERR_PTR(-ENOMEM);
+
+	*ovs_reply_header = genlmsg_put(skb, info->snd_portid,
+					info->snd_seq,
+					&dp_meter_genl_family, 0, cmd);
+	if (!ovs_reply_header) {
+		nlmsg_free(skb);
+		return ERR_PTR(-EMSGSIZE);
+	}
+	(*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex;
+
+	return skb;
+}
+
+static int ovs_meter_cmd_reply_stats(struct sk_buff *reply, u32 meter_id,
+				     struct dp_meter *meter)
+{
+	struct nlattr *nla;
+	struct dp_meter_band *band;
+	u16 i;
+
+	if (nla_put_u32(reply, OVS_METER_ATTR_ID, meter_id))
+		goto error;
+
+	if (!meter)
+		return 0;
+
+	if (nla_put(reply, OVS_METER_ATTR_STATS,
+		    sizeof(struct ovs_flow_stats), &meter->stats) ||
+	    nla_put_u64_64bit(reply, OVS_METER_ATTR_USED, meter->used,
+			      OVS_METER_ATTR_PAD))
+		goto error;
+
+	nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
+	if (!nla)
+		goto error;
+
+	band = meter->bands;
+
+	for (i = 0; i < meter->n_bands; ++i, ++band) {
+		struct nlattr *band_nla;
+
+		band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
+		if (!band_nla || nla_put(reply, OVS_BAND_ATTR_STATS,
+					 sizeof(struct ovs_flow_stats),
+					 &band->stats))
+			goto error;
+		nla_nest_end(reply, band_nla);
+	}
+	nla_nest_end(reply, nla);
+
+	return 0;
+error:
+	return -EMSGSIZE;
+}
+
+static int ovs_meter_cmd_features(struct sk_buff *skb, struct genl_info *info)
+{
+	struct datapath *dp;
+	struct ovs_header *ovs_header = info->userhdr;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct nlattr *nla, *band_nla;
+	int err;
+
+	/* Check that the datapath exists */
+	ovs_lock();
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	ovs_unlock();
+	if (!dp)
+		return -ENODEV;
+
+	reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_FEATURES,
+					  &ovs_reply_header);
+	if (!reply)
+		return PTR_ERR(reply);
+
+	if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
+	    nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
+		goto nla_put_failure;
+
+	nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
+	if (!nla)
+		goto nla_put_failure;
+
+	band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
+	if (!band_nla)
+		goto nla_put_failure;
+	/* Currently only DROP band type is supported. */
+	if (nla_put_u32(reply, OVS_BAND_ATTR_TYPE, OVS_METER_BAND_TYPE_DROP))
+		goto nla_put_failure;
+	nla_nest_end(reply, band_nla);
+	nla_nest_end(reply, nla);
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+nla_put_failure:
+	nlmsg_free(reply);
+	err = -EMSGSIZE;
+	return err;
+}
+
+static struct dp_meter *dp_meter_create(struct nlattr **a)
+{
+	struct nlattr *nla;
+	int rem;
+	u16 n_bands = 0;
+	struct dp_meter *meter;
+	struct dp_meter_band *band;
+	int err;
+
+	/* Validate attributes, count the bands. */
+	if (!a[OVS_METER_ATTR_BANDS])
+		return ERR_PTR(-EINVAL);
+
+	nla_for_each_nested(nla, a[OVS_METER_ATTR_BANDS], rem)
+		if (++n_bands > DP_MAX_BANDS)
+			return ERR_PTR(-EINVAL);
+
+	/* Allocate and set up the meter before locking anything. */
+	meter = kzalloc(n_bands * sizeof(struct dp_meter_band) +
+			sizeof(*meter), GFP_KERNEL);
+	if (!meter)
+		return ERR_PTR(-ENOMEM);
+
+	meter->used = ktime_get_ns() / 1000 / 1000;
+	meter->kbps = a[OVS_METER_ATTR_KBPS] ? 1 : 0;
+	meter->keep_stats = !a[OVS_METER_ATTR_CLEAR];
+	spin_lock_init(&meter->lock);
+	if (meter->keep_stats && a[OVS_METER_ATTR_STATS]) {
+		meter->stats = *(struct ovs_flow_stats *)
+			nla_data(a[OVS_METER_ATTR_STATS]);
+	}
+	meter->n_bands = n_bands;
+
+	/* Set up meter bands. */
+	band = meter->bands;
+	nla_for_each_nested(nla, a[OVS_METER_ATTR_BANDS], rem) {
+		struct nlattr *attr[OVS_BAND_ATTR_MAX + 1];
+		u32 band_max_delta_t;
+
+		err = nla_parse((struct nlattr **)&attr, OVS_BAND_ATTR_MAX,
+				nla_data(nla), nla_len(nla), band_policy,
+				NULL);
+		if (err)
+			goto exit_free_meter;
+
+		if (!attr[OVS_BAND_ATTR_TYPE] ||
+		    !attr[OVS_BAND_ATTR_RATE] ||
+		    !attr[OVS_BAND_ATTR_BURST]) {
+			err = -EINVAL;
+			goto exit_free_meter;
+		}
+
+		band->type = nla_get_u32(attr[OVS_BAND_ATTR_TYPE]);
+		band->rate = nla_get_u32(attr[OVS_BAND_ATTR_RATE]);
+		band->burst_size = nla_get_u32(attr[OVS_BAND_ATTR_BURST]);
+		/* Figure out max delta_t that is enough to fill any bucket.
+		 * Keep max_delta_t size to the bucket units:
+		 * pkts => 1/1000 packets, kilobits => bits.
+		 */
+		band_max_delta_t = (band->burst_size + band->rate) * 1000;
+		/* Start with a full bucket. */
+		band->bucket = band_max_delta_t;
+		if (band_max_delta_t > meter->max_delta_t)
+			meter->max_delta_t = band_max_delta_t;
+		band++;
+	}
+
+	return meter;
+
+exit_free_meter:
+	kfree(meter);
+	return ERR_PTR(err);
+}
+
+static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct dp_meter *meter, *old_meter;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct ovs_header *ovs_header = info->userhdr;
+	struct datapath *dp;
+	int err;
+	u32 meter_id;
+	bool failed;
+
+	meter = dp_meter_create(a);
+	if (IS_ERR_OR_NULL(meter))
+		return PTR_ERR(meter);
+
+	reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_SET,
+					  &ovs_reply_header);
+	if (IS_ERR(reply)) {
+		err = PTR_ERR(reply);
+		goto exit_free_meter;
+	}
+
+	ovs_lock();
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	if (!dp) {
+		err = -ENODEV;
+		goto exit_unlock;
+	}
+
+	if (!a[OVS_METER_ATTR_ID]) {
+		err = -ENODEV;
+		goto exit_unlock;
+	}
+
+	meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
+
+	/* Cannot fail after this. */
+	old_meter = lookup_meter(dp, meter_id);
+	attach_meter(dp, meter);
+	ovs_unlock();
+
+	/* Build response with the meter_id and stats from
+	 * the old meter, if any.
+	 */
+	failed = nla_put_u32(reply, OVS_METER_ATTR_ID, meter_id);
+	WARN_ON(failed);
+	if (old_meter) {
+		spin_lock_bh(&old_meter->lock);
+		if (old_meter->keep_stats) {
+			err = ovs_meter_cmd_reply_stats(reply, meter_id,
+							old_meter);
+			WARN_ON(err);
+		}
+		spin_unlock_bh(&old_meter->lock);
+		ovs_meter_free(old_meter);
+	}
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_unlock:
+	ovs_unlock();
+	nlmsg_free(reply);
+exit_free_meter:
+	kfree(meter);
+	return err;
+}
+
+static int ovs_meter_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	u32 meter_id;
+	struct ovs_header *ovs_header = info->userhdr;
+	struct ovs_header *ovs_reply_header;
+	struct datapath *dp;
+	int err;
+	struct sk_buff *reply;
+	struct dp_meter *meter;
+
+	if (!a[OVS_METER_ATTR_ID])
+		return -EINVAL;
+
+	meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
+
+	reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_GET,
+					  &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	ovs_lock();
+
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	if (!dp) {
+		err = -ENODEV;
+		goto exit_unlock;
+	}
+
+	/* Locate meter, copy stats. */
+	meter = lookup_meter(dp, meter_id);
+	if (!meter) {
+		err = -ENOENT;
+		goto exit_unlock;
+	}
+
+	spin_lock_bh(&meter->lock);
+	err = ovs_meter_cmd_reply_stats(reply, meter_id, meter);
+	spin_unlock_bh(&meter->lock);
+	if (err)
+		goto exit_unlock;
+
+	ovs_unlock();
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_unlock:
+	ovs_unlock();
+	nlmsg_free(reply);
+	return err;
+}
+
+static int ovs_meter_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	u32 meter_id;
+	struct ovs_header *ovs_header = info->userhdr;
+	struct ovs_header *ovs_reply_header;
+	struct datapath *dp;
+	int err;
+	struct sk_buff *reply;
+	struct dp_meter *old_meter;
+
+	if (!a[OVS_METER_ATTR_ID])
+		return -EINVAL;
+	meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
+
+	reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_DEL,
+					  &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	ovs_lock();
+
+	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
+	if (!dp) {
+		err = -ENODEV;
+		goto exit_unlock;
+	}
+
+	old_meter = lookup_meter(dp, meter_id);
+	if (old_meter) {
+		spin_lock_bh(&old_meter->lock);
+		err = ovs_meter_cmd_reply_stats(reply, meter_id, old_meter);
+		WARN_ON(err);
+		spin_unlock_bh(&old_meter->lock);
+		detach_meter(old_meter);
+	}
+	ovs_unlock();
+	ovs_meter_free(old_meter);
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_unlock:
+	ovs_unlock();
+	nlmsg_free(reply);
+	return err;
+}
+
+/* Meter action execution.
+ *
+ * Return true 'meter_id' drop band is triggered. The 'skb' should be
+ * dropped by the caller'.
+ */
+bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
+		       struct sw_flow_key *key, u32 meter_id)
+{
+	struct dp_meter *meter;
+	struct dp_meter_band *band;
+	long long int now_ms = ktime_get_ns() / 1000 / 1000;
+	long long int long_delta_ms;
+	u32 delta_ms;
+	u32 cost;
+	int i, band_exceeded_max = -1;
+	u32 band_exceeded_rate = 0;
+
+	meter = lookup_meter(dp, meter_id);
+	/* Do not drop the packet when there is no meter. */
+	if (!meter)
+		return false;
+
+	/* Lock the meter while using it. */
+	spin_lock(&meter->lock);
+
+	long_delta_ms = (now_ms - meter->used); /* ms */
+
+	/* Make sure delta_ms will not be too large, so that bucket will not
+	 * wrap around below.
+	 */
+	delta_ms = (long_delta_ms > (long long int)meter->max_delta_t)
+		   ? meter->max_delta_t : (u32)long_delta_ms;
+
+	/* Update meter statistics.
+	 */
+	meter->used = now_ms;
+	meter->stats.n_packets += 1;
+	meter->stats.n_bytes += skb->len;
+
+	/* Bucket rate is either in kilobits per second, or in packets per
+	 * second.  We maintain the bucket in the units of either bits or
+	 * 1/1000th of a packet, correspondingly.
+	 * Then, when rate is multiplied with milliseconds, we get the
+	 * bucket units:
+	 * msec * kbps = bits, and
+	 * msec * packets/sec = 1/1000 packets.
+	 *
+	 * 'cost' is the number of bucket units in this packet.
+	 */
+	cost = (meter->kbps) ? skb->len * 8 : 1000;
+
+	/* Update all bands and find the one hit with the highest rate. */
+	for (i = 0; i < meter->n_bands; ++i) {
+		long long int max_bucket_size;
+
+		band = &meter->bands[i];
+		max_bucket_size = (band->burst_size + band->rate) * 1000;
+
+		band->bucket += delta_ms * band->rate;
+		if (band->bucket > max_bucket_size)
+			band->bucket = max_bucket_size;
+
+		if (band->bucket >= cost) {
+			band->bucket -= cost;
+		} else if (band->rate > band_exceeded_rate) {
+			band_exceeded_rate = band->rate;
+			band_exceeded_max = i;
+		}
+	}
+
+	spin_unlock(&meter->lock);
+
+	if (band_exceeded_max >= 0) {
+		/* Update band statistics. */
+		band = &meter->bands[band_exceeded_max];
+		band->stats.n_packets += 1;
+		band->stats.n_bytes += skb->len;
+
+		/* Drop band triggered, let the caller drop the 'skb'.  */
+		if (band->type == OVS_METER_BAND_TYPE_DROP)
+			return true;
+	}
+
+	return false;
+}
+
+static struct genl_ops dp_meter_genl_ops[] = {
+	{ .cmd = OVS_METER_CMD_FEATURES,
+		.flags = 0,		  /* OK for unprivileged users. */
+		.policy = meter_policy,
+		.doit = ovs_meter_cmd_features
+	},
+	{ .cmd = OVS_METER_CMD_SET,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   *  privilege.
+					   */
+		.policy = meter_policy,
+		.doit = ovs_meter_cmd_set,
+	},
+	{ .cmd = OVS_METER_CMD_GET,
+		.flags = 0,		  /* OK for unprivileged users. */
+		.policy = meter_policy,
+		.doit = ovs_meter_cmd_get,
+	},
+	{ .cmd = OVS_METER_CMD_DEL,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   *  privilege.
+					   */
+		.policy = meter_policy,
+		.doit = ovs_meter_cmd_del
+	},
+};
+
+static const struct genl_multicast_group ovs_meter_multicast_group = {
+	.name = OVS_METER_MCGROUP,
+};
+
+struct genl_family dp_meter_genl_family __ro_after_init = {
+	.hdrsize = sizeof(struct ovs_header),
+	.name = OVS_METER_FAMILY,
+	.version = OVS_METER_VERSION,
+	.maxattr = OVS_METER_ATTR_MAX,
+	.netnsok = true,
+	.parallel_ops = true,
+	.ops = dp_meter_genl_ops,
+	.n_ops = ARRAY_SIZE(dp_meter_genl_ops),
+	.mcgrps = &ovs_meter_multicast_group,
+	.n_mcgrps = 1,
+	.module = THIS_MODULE,
+};
+
+int ovs_meters_init(struct datapath *dp)
+{
+	int i;
+
+	dp->meters = kmalloc_array(METER_HASH_BUCKETS,
+				   sizeof(struct hlist_head), GFP_KERNEL);
+
+	if (!dp->meters)
+		return -ENOMEM;
+
+	for (i = 0; i < METER_HASH_BUCKETS; i++)
+		INIT_HLIST_HEAD(&dp->meters[i]);
+
+	return 0;
+}
+
+void ovs_meters_exit(struct datapath *dp)
+{
+	int i;
+
+	for (i = 0; i < METER_HASH_BUCKETS; i++) {
+		struct hlist_head *head = &dp->meters[i];
+		struct dp_meter *meter;
+		struct hlist_node *n;
+
+		hlist_for_each_entry_safe(meter, n, head, dp_hash_node)
+			kfree(meter);
+	}
+
+	kfree(dp->meters);
+}
diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
new file mode 100644
index 000000000000..964ace2650f8
--- /dev/null
+++ b/net/openvswitch/meter.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2017 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#ifndef METER_H
+#define METER_H 1
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netlink.h>
+#include <linux/openvswitch.h>
+#include <linux/genetlink.h>
+#include <linux/skbuff.h>
+
+#include "flow.h"
+struct datapath;
+
+#define DP_MAX_BANDS		1
+
+struct dp_meter_band {
+	u32 type;
+	u32 rate;
+	u32 burst_size;
+	u32 bucket; /* 1/1000 packets, or in bits */
+	struct ovs_flow_stats stats;
+};
+
+struct dp_meter {
+	spinlock_t lock;    /* Per meter lock */
+	struct rcu_head rcu;
+	struct hlist_node dp_hash_node; /*Element in datapath->meters
+					 * hash table.
+					 */
+	u32 id;
+	u16 kbps:1, keep_stats:1;
+	u16 n_bands;
+	u32 max_delta_t;
+	u64 used;
+	struct ovs_flow_stats stats;
+	struct dp_meter_band bands[];
+};
+
+extern struct genl_family dp_meter_genl_family;
+int ovs_meters_init(struct datapath *dp);
+void ovs_meters_exit(struct datapath *dp);
+bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
+		       struct sw_flow_key *key, u32 meter_id);
+
+#endif /* meter.h */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [net-next RFC 4/4] openvswitch: Add meter action support
  2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
                   ` (2 preceding siblings ...)
  2017-10-12 22:38 ` [net-next RFC 3/4] openvswitch: Add meter infrastructure Andy Zhou
@ 2017-10-12 22:38 ` Andy Zhou
  2017-10-14  0:13   ` Pravin Shelar
  2017-10-14  0:09 ` [net-next RFC 0/4] Openvswitch meter action Pravin Shelar
  4 siblings, 1 reply; 12+ messages in thread
From: Andy Zhou @ 2017-10-12 22:38 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, joe, gvrose8192, Andy Zhou

Implements OVS kernel meter action support.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 include/uapi/linux/openvswitch.h |  1 +
 net/openvswitch/actions.c        | 12 ++++++++++++
 net/openvswitch/datapath.h       |  1 +
 net/openvswitch/flow_netlink.c   |  6 ++++++
 4 files changed, 20 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 325049a129e4..11fe1a06cdd6 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -835,6 +835,7 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
 	OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
 	OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
+	OVS_ACTION_ATTR_METER,        /* u32 meter ID. */
 
 	__OVS_ACTION_ATTR_MAX,	      /* Nothing past this will be accepted
 				       * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a54a556fcdb5..4eb160ac5a27 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1210,6 +1210,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 		case OVS_ACTION_ATTR_POP_ETH:
 			err = pop_eth(skb, key);
 			break;
+
+		case OVS_ACTION_ATTR_METER:
+			if (ovs_meter_execute(dp, skb, key, nla_get_u32(a))) {
+				consume_skb(skb);
+				return 0;
+			}
 		}
 
 		if (unlikely(err)) {
@@ -1341,6 +1347,12 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
 	err = do_execute_actions(dp, skb, key,
 				 acts->actions, acts->actions_len);
 
+	/* OVS action has dropped the packet, do not expose it
+	 * to the user.
+	 */
+	if (err == -ENODATA)
+		err = 0;
+
 	if (level == 1)
 		process_deferred_actions(dp);
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index d1ffa1d9fe57..cda40c6af40a 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -30,6 +30,7 @@
 #include "conntrack.h"
 #include "flow.h"
 #include "flow_table.h"
+#include "meter.h"
 #include "vport-internal_dev.h"
 
 #define DP_MAX_PORTS           USHRT_MAX
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index e8eb427ce6d1..39b548431f68 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -85,6 +85,7 @@ static bool actions_may_change_flow(const struct nlattr *actions)
 		case OVS_ACTION_ATTR_SAMPLE:
 		case OVS_ACTION_ATTR_SET:
 		case OVS_ACTION_ATTR_SET_MASKED:
+		case OVS_ACTION_ATTR_METER:
 		default:
 			return true;
 		}
@@ -2482,6 +2483,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			[OVS_ACTION_ATTR_TRUNC] = sizeof(struct ovs_action_trunc),
 			[OVS_ACTION_ATTR_PUSH_ETH] = sizeof(struct ovs_action_push_eth),
 			[OVS_ACTION_ATTR_POP_ETH] = 0,
+			[OVS_ACTION_ATTR_METER] = sizeof(u32),
 		};
 		const struct ovs_action_push_vlan *vlan;
 		int type = nla_type(a);
@@ -2636,6 +2638,10 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			mac_proto = MAC_PROTO_ETHERNET;
 			break;
 
+		case OVS_ACTION_ATTR_METER:
+			/* Non-existent meters are simply ignored.  */
+			break;
+
 		default:
 			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 0/4] Openvswitch meter action
  2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
                   ` (3 preceding siblings ...)
  2017-10-12 22:38 ` [net-next RFC 4/4] openvswitch: Add meter action support Andy Zhou
@ 2017-10-14  0:09 ` Pravin Shelar
  4 siblings, 0 replies; 12+ messages in thread
From: Pravin Shelar @ 2017-10-14  0:09 UTC (permalink / raw)
  To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
> This patch series is the first attempt to add openvswitch
> meter support. We have previously experimented with adding
> metering support in nftables. However 1) It was not clear
> how to expose a named nftables object cleanly, and 2)
> the logic that implements metering is quite small, < 100 lines
> of code.
>
> With those two observations, it seems cleaner to add meter
> support in the openvswitch module directly.
>
>
Thanks for working on this feature. It looks good to me. I have couple
of comments inlined.

> Andy Zhou (4):
>   openvswitch: Add meter netlink definitions
>   openvswitch: export get_dp() API.
>   openvswitch: Add meter infrastructure
>   openvswitch: Add meter action support
>
>  include/uapi/linux/openvswitch.h |  52 ++++
>  net/openvswitch/Makefile         |   1 +
>  net/openvswitch/actions.c        |  12 +
>  net/openvswitch/datapath.c       |  43 +--
>  net/openvswitch/datapath.h       |  35 +++
>  net/openvswitch/flow_netlink.c   |   6 +
>  net/openvswitch/meter.c          | 611 +++++++++++++++++++++++++++++++++++++++
>  net/openvswitch/meter.h          |  54 ++++
>  8 files changed, 783 insertions(+), 31 deletions(-)
>  create mode 100644 net/openvswitch/meter.c
>  create mode 100644 net/openvswitch/meter.h
>
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 3/4] openvswitch: Add meter infrastructure
  2017-10-12 22:38 ` [net-next RFC 3/4] openvswitch: Add meter infrastructure Andy Zhou
@ 2017-10-14  0:12   ` Pravin Shelar
  2017-10-16  7:05     ` Andy Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: Pravin Shelar @ 2017-10-14  0:12 UTC (permalink / raw)
  To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
> OVS kernel datapath so far does not support Openflow meter action.
> This is the first stab at adding kernel datapath meter support.
> This implementation supports only drop band type.
>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  net/openvswitch/Makefile   |   1 +
>  net/openvswitch/datapath.c |  14 +-
>  net/openvswitch/datapath.h |   3 +
>  net/openvswitch/meter.c    | 611 +++++++++++++++++++++++++++++++++++++++++++++
>  net/openvswitch/meter.h    |  54 ++++
>  5 files changed, 681 insertions(+), 2 deletions(-)
>  create mode 100644 net/openvswitch/meter.c
>  create mode 100644 net/openvswitch/meter.h
>
...

> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> new file mode 100644
> index 000000000000..f24ebb5f7af4
> --- /dev/null
> +++ b/net/openvswitch/meter.c

....
....
> +static int ovs_meter_cmd_features(struct sk_buff *skb, struct genl_info *info)
> +{
> +       struct datapath *dp;
> +       struct ovs_header *ovs_header = info->userhdr;
> +       struct sk_buff *reply;
> +       struct ovs_header *ovs_reply_header;
> +       struct nlattr *nla, *band_nla;
> +       int err;
> +
> +       /* Check that the datapath exists */
> +       ovs_lock();
> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
> +       ovs_unlock();
> +       if (!dp)
> +               return -ENODEV;
> +
why dp check is required for this API?

> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_FEATURES,
> +                                         &ovs_reply_header);
> +       if (!reply)
> +               return PTR_ERR(reply);
> +
> +       if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
> +           nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
> +               goto nla_put_failure;
> +
> +       nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
> +       if (!nla)
> +               goto nla_put_failure;
> +
> +       band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
> +       if (!band_nla)
> +               goto nla_put_failure;
> +       /* Currently only DROP band type is supported. */
> +       if (nla_put_u32(reply, OVS_BAND_ATTR_TYPE, OVS_METER_BAND_TYPE_DROP))
> +               goto nla_put_failure;
> +       nla_nest_end(reply, band_nla);
> +       nla_nest_end(reply, nla);
> +
> +       genlmsg_end(reply, ovs_reply_header);
> +       return genlmsg_reply(reply, info);
> +
> +nla_put_failure:
> +       nlmsg_free(reply);
> +       err = -EMSGSIZE;
> +       return err;
> +}
> +
....

> +static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
> +{
> +       struct nlattr **a = info->attrs;
> +       struct dp_meter *meter, *old_meter;
> +       struct sk_buff *reply;
> +       struct ovs_header *ovs_reply_header;
> +       struct ovs_header *ovs_header = info->userhdr;
> +       struct datapath *dp;
> +       int err;
> +       u32 meter_id;
> +       bool failed;
> +
> +       meter = dp_meter_create(a);
> +       if (IS_ERR_OR_NULL(meter))
> +               return PTR_ERR(meter);
> +
> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_SET,
> +                                         &ovs_reply_header);
> +       if (IS_ERR(reply)) {
> +               err = PTR_ERR(reply);
> +               goto exit_free_meter;
> +       }
> +
> +       ovs_lock();
> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
> +       if (!dp) {
> +               err = -ENODEV;
> +               goto exit_unlock;
> +       }
> +
> +       if (!a[OVS_METER_ATTR_ID]) {
> +               err = -ENODEV;
> +               goto exit_unlock;
> +       }
> +
> +       meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
> +
> +       /* Cannot fail after this. */
> +       old_meter = lookup_meter(dp, meter_id);
> +       attach_meter(dp, meter);
> +       ovs_unlock();
> +
After the unlock, it is not safe to keep the ref to old_meter. better
to release lock at the end. we could optimize it later if required.

> +       /* Build response with the meter_id and stats from
> +        * the old meter, if any.
> +        */
> +       failed = nla_put_u32(reply, OVS_METER_ATTR_ID, meter_id);
> +       WARN_ON(failed);
> +       if (old_meter) {
> +               spin_lock_bh(&old_meter->lock);
> +               if (old_meter->keep_stats) {
> +                       err = ovs_meter_cmd_reply_stats(reply, meter_id,
> +                                                       old_meter);
> +                       WARN_ON(err);
> +               }
> +               spin_unlock_bh(&old_meter->lock);
> +               ovs_meter_free(old_meter);
> +       }
> +
> +       genlmsg_end(reply, ovs_reply_header);
> +       return genlmsg_reply(reply, info);
> +
> +exit_unlock:
> +       ovs_unlock();
> +       nlmsg_free(reply);
> +exit_free_meter:
> +       kfree(meter);
> +       return err;
> +}
> +
....

> +bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
> +                      struct sw_flow_key *key, u32 meter_id)
> +{
> +       struct dp_meter *meter;
> +       struct dp_meter_band *band;
> +       long long int now_ms = ktime_get_ns() / 1000 / 1000;
> +       long long int long_delta_ms;
> +       u32 delta_ms;
> +       u32 cost;
> +       int i, band_exceeded_max = -1;
> +       u32 band_exceeded_rate = 0;
> +
> +       meter = lookup_meter(dp, meter_id);
> +       /* Do not drop the packet when there is no meter. */
> +       if (!meter)
> +               return false;
> +
> +       /* Lock the meter while using it. */
> +       spin_lock(&meter->lock);
> +
> +       long_delta_ms = (now_ms - meter->used); /* ms */
> +
> +       /* Make sure delta_ms will not be too large, so that bucket will not
> +        * wrap around below.
> +        */
> +       delta_ms = (long_delta_ms > (long long int)meter->max_delta_t)
> +                  ? meter->max_delta_t : (u32)long_delta_ms;
> +
> +       /* Update meter statistics.
> +        */
> +       meter->used = now_ms;
> +       meter->stats.n_packets += 1;
> +       meter->stats.n_bytes += skb->len;
> +
> +       /* Bucket rate is either in kilobits per second, or in packets per
> +        * second.  We maintain the bucket in the units of either bits or
> +        * 1/1000th of a packet, correspondingly.
> +        * Then, when rate is multiplied with milliseconds, we get the
> +        * bucket units:
> +        * msec * kbps = bits, and
> +        * msec * packets/sec = 1/1000 packets.
> +        *
> +        * 'cost' is the number of bucket units in this packet.
> +        */
> +       cost = (meter->kbps) ? skb->len * 8 : 1000;
> +
> +       /* Update all bands and find the one hit with the highest rate. */
> +       for (i = 0; i < meter->n_bands; ++i) {
> +               long long int max_bucket_size;
> +
> +               band = &meter->bands[i];
> +               max_bucket_size = (band->burst_size + band->rate) * 1000;
> +
> +               band->bucket += delta_ms * band->rate;
> +               if (band->bucket > max_bucket_size)
> +                       band->bucket = max_bucket_size;
> +
> +               if (band->bucket >= cost) {
> +                       band->bucket -= cost;
> +               } else if (band->rate > band_exceeded_rate) {
> +                       band_exceeded_rate = band->rate;
> +                       band_exceeded_max = i;
> +               }
> +       }
> +
> +       spin_unlock(&meter->lock);
> +
> +       if (band_exceeded_max >= 0) {
> +               /* Update band statistics. */
> +               band = &meter->bands[band_exceeded_max];
> +               band->stats.n_packets += 1;
> +               band->stats.n_bytes += skb->len;
> +
Is it safe to do outside of the sipinlock?

> +               /* Drop band triggered, let the caller drop the 'skb'.  */
> +               if (band->type == OVS_METER_BAND_TYPE_DROP)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 4/4] openvswitch: Add meter action support
  2017-10-12 22:38 ` [net-next RFC 4/4] openvswitch: Add meter action support Andy Zhou
@ 2017-10-14  0:13   ` Pravin Shelar
  2017-10-16  7:06     ` Andy Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: Pravin Shelar @ 2017-10-14  0:13 UTC (permalink / raw)
  To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
> Implements OVS kernel meter action support.
>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/actions.c        | 12 ++++++++++++
>  net/openvswitch/datapath.h       |  1 +
>  net/openvswitch/flow_netlink.c   |  6 ++++++
>  4 files changed, 20 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 325049a129e4..11fe1a06cdd6 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -835,6 +835,7 @@ enum ovs_action_attr {
>         OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
>         OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
>         OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
> +       OVS_ACTION_ATTR_METER,        /* u32 meter ID. */
>
>         __OVS_ACTION_ATTR_MAX,        /* Nothing past this will be accepted
>                                        * from userspace. */
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index a54a556fcdb5..4eb160ac5a27 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1210,6 +1210,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>                 case OVS_ACTION_ATTR_POP_ETH:
>                         err = pop_eth(skb, key);
>                         break;
> +
> +               case OVS_ACTION_ATTR_METER:
> +                       if (ovs_meter_execute(dp, skb, key, nla_get_u32(a))) {
> +                               consume_skb(skb);
> +                               return 0;
> +                       }
>                 }
>
>                 if (unlikely(err)) {
> @@ -1341,6 +1347,12 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>         err = do_execute_actions(dp, skb, key,
>                                  acts->actions, acts->actions_len);
>
> +       /* OVS action has dropped the packet, do not expose it
> +        * to the user.
> +        */
> +       if (err == -ENODATA)
> +               err = 0;
> +
I am not sure who is returning this error code?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 3/4] openvswitch: Add meter infrastructure
  2017-10-14  0:12   ` Pravin Shelar
@ 2017-10-16  7:05     ` Andy Zhou
  2017-10-16 17:49       ` Pravin Shelar
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Zhou @ 2017-10-16  7:05 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Fri, Oct 13, 2017 at 5:12 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
>> OVS kernel datapath so far does not support Openflow meter action.
>> This is the first stab at adding kernel datapath meter support.
>> This implementation supports only drop band type.
>>
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>>  net/openvswitch/Makefile   |   1 +
>>  net/openvswitch/datapath.c |  14 +-
>>  net/openvswitch/datapath.h |   3 +
>>  net/openvswitch/meter.c    | 611 +++++++++++++++++++++++++++++++++++++++++++++
>>  net/openvswitch/meter.h    |  54 ++++
>>  5 files changed, 681 insertions(+), 2 deletions(-)
>>  create mode 100644 net/openvswitch/meter.c
>>  create mode 100644 net/openvswitch/meter.h
>>
> ...
>
>> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
>> new file mode 100644
>> index 000000000000..f24ebb5f7af4
>> --- /dev/null
>> +++ b/net/openvswitch/meter.c
>
> ....
> ....
>> +static int ovs_meter_cmd_features(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +       struct datapath *dp;
>> +       struct ovs_header *ovs_header = info->userhdr;
>> +       struct sk_buff *reply;
>> +       struct ovs_header *ovs_reply_header;
>> +       struct nlattr *nla, *band_nla;
>> +       int err;
>> +
>> +       /* Check that the datapath exists */
>> +       ovs_lock();
>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>> +       ovs_unlock();
>> +       if (!dp)
>> +               return -ENODEV;
>> +
> why dp check is required for this API?
Is it possible for another core delete the dp, before ovs_lock()
returns? Then, in theory, get_dp() can
return NULL, no?
>
>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_FEATURES,
>> +                                         &ovs_reply_header);
>> +       if (!reply)
>> +               return PTR_ERR(reply);
>> +
>> +       if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
>> +           nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
>> +               goto nla_put_failure;
>> +
>> +       nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
>> +       if (!nla)
>> +               goto nla_put_failure;
>> +
>> +       band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
>> +       if (!band_nla)
>> +               goto nla_put_failure;
>> +       /* Currently only DROP band type is supported. */
>> +       if (nla_put_u32(reply, OVS_BAND_ATTR_TYPE, OVS_METER_BAND_TYPE_DROP))
>> +               goto nla_put_failure;
>> +       nla_nest_end(reply, band_nla);
>> +       nla_nest_end(reply, nla);
>> +
>> +       genlmsg_end(reply, ovs_reply_header);
>> +       return genlmsg_reply(reply, info);
>> +
>> +nla_put_failure:
>> +       nlmsg_free(reply);
>> +       err = -EMSGSIZE;
>> +       return err;
>> +}
>> +
> ....
>
>> +static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +       struct nlattr **a = info->attrs;
>> +       struct dp_meter *meter, *old_meter;
>> +       struct sk_buff *reply;
>> +       struct ovs_header *ovs_reply_header;
>> +       struct ovs_header *ovs_header = info->userhdr;
>> +       struct datapath *dp;
>> +       int err;
>> +       u32 meter_id;
>> +       bool failed;
>> +
>> +       meter = dp_meter_create(a);
>> +       if (IS_ERR_OR_NULL(meter))
>> +               return PTR_ERR(meter);
>> +
>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_SET,
>> +                                         &ovs_reply_header);
>> +       if (IS_ERR(reply)) {
>> +               err = PTR_ERR(reply);
>> +               goto exit_free_meter;
>> +       }
>> +
>> +       ovs_lock();
>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>> +       if (!dp) {
>> +               err = -ENODEV;
>> +               goto exit_unlock;
>> +       }
>> +
>> +       if (!a[OVS_METER_ATTR_ID]) {
>> +               err = -ENODEV;
>> +               goto exit_unlock;
>> +       }
>> +
>> +       meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
>> +
>> +       /* Cannot fail after this. */
>> +       old_meter = lookup_meter(dp, meter_id);
>> +       attach_meter(dp, meter);
>> +       ovs_unlock();
>> +
> After the unlock, it is not safe to keep the ref to old_meter. better
> to release lock at the end. we could optimize it later if required.

I see a problem here: the old_meter has not been removed from the list before
unlock. The code should have been:

old_meter = lookup_meter(dp, meter_id);
detch_meter(dp, old_meter);
attach_meter(dp, meter);
ovs_unlock();

Do you still see a problem w.r.t. unlock() here? Once detch_meter() is called,
another thread should not have access to 'old_meter' any more. right?
>
>> +       /* Build response with the meter_id and stats from
>> +        * the old meter, if any.
>> +        */
>> +       failed = nla_put_u32(reply, OVS_METER_ATTR_ID, meter_id);
>> +       WARN_ON(failed);
>> +       if (old_meter) {
>> +               spin_lock_bh(&old_meter->lock);
>> +               if (old_meter->keep_stats) {
>> +                       err = ovs_meter_cmd_reply_stats(reply, meter_id,
>> +                                                       old_meter);
>> +                       WARN_ON(err);
>> +               }
>> +               spin_unlock_bh(&old_meter->lock);
>> +               ovs_meter_free(old_meter);
>> +       }
>> +
>> +       genlmsg_end(reply, ovs_reply_header);
>> +       return genlmsg_reply(reply, info);
>> +
>> +exit_unlock:
>> +       ovs_unlock();
>> +       nlmsg_free(reply);
>> +exit_free_meter:
>> +       kfree(meter);
>> +       return err;
>> +}
>> +
> ....
>
>> +bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
>> +                      struct sw_flow_key *key, u32 meter_id)
>> +{
>> +       struct dp_meter *meter;
>> +       struct dp_meter_band *band;
>> +       long long int now_ms = ktime_get_ns() / 1000 / 1000;
>> +       long long int long_delta_ms;
>> +       u32 delta_ms;
>> +       u32 cost;
>> +       int i, band_exceeded_max = -1;
>> +       u32 band_exceeded_rate = 0;
>> +
>> +       meter = lookup_meter(dp, meter_id);
>> +       /* Do not drop the packet when there is no meter. */
>> +       if (!meter)
>> +               return false;
>> +
>> +       /* Lock the meter while using it. */
>> +       spin_lock(&meter->lock);
>> +
>> +       long_delta_ms = (now_ms - meter->used); /* ms */
>> +
>> +       /* Make sure delta_ms will not be too large, so that bucket will not
>> +        * wrap around below.
>> +        */
>> +       delta_ms = (long_delta_ms > (long long int)meter->max_delta_t)
>> +                  ? meter->max_delta_t : (u32)long_delta_ms;
>> +
>> +       /* Update meter statistics.
>> +        */
>> +       meter->used = now_ms;
>> +       meter->stats.n_packets += 1;
>> +       meter->stats.n_bytes += skb->len;
>> +
>> +       /* Bucket rate is either in kilobits per second, or in packets per
>> +        * second.  We maintain the bucket in the units of either bits or
>> +        * 1/1000th of a packet, correspondingly.
>> +        * Then, when rate is multiplied with milliseconds, we get the
>> +        * bucket units:
>> +        * msec * kbps = bits, and
>> +        * msec * packets/sec = 1/1000 packets.
>> +        *
>> +        * 'cost' is the number of bucket units in this packet.
>> +        */
>> +       cost = (meter->kbps) ? skb->len * 8 : 1000;
>> +
>> +       /* Update all bands and find the one hit with the highest rate. */
>> +       for (i = 0; i < meter->n_bands; ++i) {
>> +               long long int max_bucket_size;
>> +
>> +               band = &meter->bands[i];
>> +               max_bucket_size = (band->burst_size + band->rate) * 1000;
>> +
>> +               band->bucket += delta_ms * band->rate;
>> +               if (band->bucket > max_bucket_size)
>> +                       band->bucket = max_bucket_size;
>> +
>> +               if (band->bucket >= cost) {
>> +                       band->bucket -= cost;
>> +               } else if (band->rate > band_exceeded_rate) {
>> +                       band_exceeded_rate = band->rate;
>> +                       band_exceeded_max = i;
>> +               }
>> +       }
>> +
>> +       spin_unlock(&meter->lock);
>> +
>> +       if (band_exceeded_max >= 0) {
>> +               /* Update band statistics. */
>> +               band = &meter->bands[band_exceeded_max];
>> +               band->stats.n_packets += 1;
>> +               band->stats.n_bytes += skb->len;
>> +
> Is it safe to do outside of the sipinlock?

Good catch, those should be covered by the same lock.  Will fix.
>
>> +               /* Drop band triggered, let the caller drop the 'skb'.  */
>> +               if (band->type == OVS_METER_BAND_TYPE_DROP)
>> +                       return true;
>> +       }
>> +
>> +       return false;
>> +}
>> +

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 4/4] openvswitch: Add meter action support
  2017-10-14  0:13   ` Pravin Shelar
@ 2017-10-16  7:06     ` Andy Zhou
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Zhou @ 2017-10-16  7:06 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Fri, Oct 13, 2017 at 5:13 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
>> Implements OVS kernel meter action support.
>>
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>>  include/uapi/linux/openvswitch.h |  1 +
>>  net/openvswitch/actions.c        | 12 ++++++++++++
>>  net/openvswitch/datapath.h       |  1 +
>>  net/openvswitch/flow_netlink.c   |  6 ++++++
>>  4 files changed, 20 insertions(+)
>>
>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
>> index 325049a129e4..11fe1a06cdd6 100644
>> --- a/include/uapi/linux/openvswitch.h
>> +++ b/include/uapi/linux/openvswitch.h
>> @@ -835,6 +835,7 @@ enum ovs_action_attr {
>>         OVS_ACTION_ATTR_TRUNC,        /* u32 struct ovs_action_trunc. */
>>         OVS_ACTION_ATTR_PUSH_ETH,     /* struct ovs_action_push_eth. */
>>         OVS_ACTION_ATTR_POP_ETH,      /* No argument. */
>> +       OVS_ACTION_ATTR_METER,        /* u32 meter ID. */
>>
>>         __OVS_ACTION_ATTR_MAX,        /* Nothing past this will be accepted
>>                                        * from userspace. */
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index a54a556fcdb5..4eb160ac5a27 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -1210,6 +1210,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>                 case OVS_ACTION_ATTR_POP_ETH:
>>                         err = pop_eth(skb, key);
>>                         break;
>> +
>> +               case OVS_ACTION_ATTR_METER:
>> +                       if (ovs_meter_execute(dp, skb, key, nla_get_u32(a))) {
>> +                               consume_skb(skb);
>> +                               return 0;
>> +                       }
>>                 }
>>
>>                 if (unlikely(err)) {
>> @@ -1341,6 +1347,12 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>         err = do_execute_actions(dp, skb, key,
>>                                  acts->actions, acts->actions_len);
>>
>> +       /* OVS action has dropped the packet, do not expose it
>> +        * to the user.
>> +        */
>> +       if (err == -ENODATA)
>> +               err = 0;
>> +
> I am not sure who is returning this error code?
Ah, this hunk was left over from experimenting with per band actions
list. Will remove. Thanks for catching this!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 3/4] openvswitch: Add meter infrastructure
  2017-10-16  7:05     ` Andy Zhou
@ 2017-10-16 17:49       ` Pravin Shelar
  2017-10-17  7:40         ` Andy Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: Pravin Shelar @ 2017-10-16 17:49 UTC (permalink / raw)
  To: Andy Zhou; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Mon, Oct 16, 2017 at 12:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> On Fri, Oct 13, 2017 at 5:12 PM, Pravin Shelar <pshelar@ovn.org> wrote:
>> On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
>>> OVS kernel datapath so far does not support Openflow meter action.
>>> This is the first stab at adding kernel datapath meter support.
>>> This implementation supports only drop band type.
>>>
>>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>>> ---
>>>  net/openvswitch/Makefile   |   1 +
>>>  net/openvswitch/datapath.c |  14 +-
>>>  net/openvswitch/datapath.h |   3 +
>>>  net/openvswitch/meter.c    | 611 +++++++++++++++++++++++++++++++++++++++++++++
>>>  net/openvswitch/meter.h    |  54 ++++
>>>  5 files changed, 681 insertions(+), 2 deletions(-)
>>>  create mode 100644 net/openvswitch/meter.c
>>>  create mode 100644 net/openvswitch/meter.h
>>>
>> ...
>>
>>> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
>>> new file mode 100644
>>> index 000000000000..f24ebb5f7af4
>>> --- /dev/null
>>> +++ b/net/openvswitch/meter.c
>>
>> ....
>> ....
>>> +static int ovs_meter_cmd_features(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +       struct datapath *dp;
>>> +       struct ovs_header *ovs_header = info->userhdr;
>>> +       struct sk_buff *reply;
>>> +       struct ovs_header *ovs_reply_header;
>>> +       struct nlattr *nla, *band_nla;
>>> +       int err;
>>> +
>>> +       /* Check that the datapath exists */
>>> +       ovs_lock();
>>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>>> +       ovs_unlock();
>>> +       if (!dp)
>>> +               return -ENODEV;
>>> +
>> why dp check is required for this API?
> Is it possible for another core delete the dp, before ovs_lock()
> returns? Then, in theory, get_dp() can
> return NULL, no?

I do not see dp used anywhere in function. so the question was why is
dp lookup done here?

>>
>>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_FEATURES,
>>> +                                         &ovs_reply_header);
>>> +       if (!reply)
>>> +               return PTR_ERR(reply);
>>> +
>>> +       if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
>>> +           nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
>>> +               goto nla_put_failure;
>>> +
>>> +       nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
>>> +       if (!nla)
>>> +               goto nla_put_failure;
>>> +
>>> +       band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
>>> +       if (!band_nla)
>>> +               goto nla_put_failure;
>>> +       /* Currently only DROP band type is supported. */
>>> +       if (nla_put_u32(reply, OVS_BAND_ATTR_TYPE, OVS_METER_BAND_TYPE_DROP))
>>> +               goto nla_put_failure;
>>> +       nla_nest_end(reply, band_nla);
>>> +       nla_nest_end(reply, nla);
>>> +
>>> +       genlmsg_end(reply, ovs_reply_header);
>>> +       return genlmsg_reply(reply, info);
>>> +
>>> +nla_put_failure:
>>> +       nlmsg_free(reply);
>>> +       err = -EMSGSIZE;
>>> +       return err;
>>> +}
>>> +
>> ....
>>
>>> +static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
>>> +{
>>> +       struct nlattr **a = info->attrs;
>>> +       struct dp_meter *meter, *old_meter;
>>> +       struct sk_buff *reply;
>>> +       struct ovs_header *ovs_reply_header;
>>> +       struct ovs_header *ovs_header = info->userhdr;
>>> +       struct datapath *dp;
>>> +       int err;
>>> +       u32 meter_id;
>>> +       bool failed;
>>> +
>>> +       meter = dp_meter_create(a);
>>> +       if (IS_ERR_OR_NULL(meter))
>>> +               return PTR_ERR(meter);
>>> +
>>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_SET,
>>> +                                         &ovs_reply_header);
>>> +       if (IS_ERR(reply)) {
>>> +               err = PTR_ERR(reply);
>>> +               goto exit_free_meter;
>>> +       }
>>> +
>>> +       ovs_lock();
>>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>>> +       if (!dp) {
>>> +               err = -ENODEV;
>>> +               goto exit_unlock;
>>> +       }
>>> +
>>> +       if (!a[OVS_METER_ATTR_ID]) {
>>> +               err = -ENODEV;
>>> +               goto exit_unlock;
>>> +       }
>>> +
>>> +       meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
>>> +
>>> +       /* Cannot fail after this. */
>>> +       old_meter = lookup_meter(dp, meter_id);
>>> +       attach_meter(dp, meter);
>>> +       ovs_unlock();
>>> +
>> After the unlock, it is not safe to keep the ref to old_meter. better
>> to release lock at the end. we could optimize it later if required.
>
> I see a problem here: the old_meter has not been removed from the list before
> unlock. The code should have been:
>
> old_meter = lookup_meter(dp, meter_id);
> detch_meter(dp, old_meter);
> attach_meter(dp, meter);
> ovs_unlock();
>
> Do you still see a problem w.r.t. unlock() here? Once detch_meter() is called,
> another thread should not have access to 'old_meter' any more. right?

looks good.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next RFC 3/4] openvswitch: Add meter infrastructure
  2017-10-16 17:49       ` Pravin Shelar
@ 2017-10-17  7:40         ` Andy Zhou
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Zhou @ 2017-10-17  7:40 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers, Joe Stringer, Greg Rose

On Mon, Oct 16, 2017 at 10:49 AM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Mon, Oct 16, 2017 at 12:05 AM, Andy Zhou <azhou@ovn.org> wrote:
>> On Fri, Oct 13, 2017 at 5:12 PM, Pravin Shelar <pshelar@ovn.org> wrote:
>>> On Thu, Oct 12, 2017 at 3:38 PM, Andy Zhou <azhou@ovn.org> wrote:
>>>> OVS kernel datapath so far does not support Openflow meter action.
>>>> This is the first stab at adding kernel datapath meter support.
>>>> This implementation supports only drop band type.
>>>>
>>>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>>>> ---
>>>>  net/openvswitch/Makefile   |   1 +
>>>>  net/openvswitch/datapath.c |  14 +-
>>>>  net/openvswitch/datapath.h |   3 +
>>>>  net/openvswitch/meter.c    | 611 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  net/openvswitch/meter.h    |  54 ++++
>>>>  5 files changed, 681 insertions(+), 2 deletions(-)
>>>>  create mode 100644 net/openvswitch/meter.c
>>>>  create mode 100644 net/openvswitch/meter.h
>>>>
>>> ...
>>>
>>>> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
>>>> new file mode 100644
>>>> index 000000000000..f24ebb5f7af4
>>>> --- /dev/null
>>>> +++ b/net/openvswitch/meter.c
>>>
>>> ....
>>> ....
>>>> +static int ovs_meter_cmd_features(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> +       struct datapath *dp;
>>>> +       struct ovs_header *ovs_header = info->userhdr;
>>>> +       struct sk_buff *reply;
>>>> +       struct ovs_header *ovs_reply_header;
>>>> +       struct nlattr *nla, *band_nla;
>>>> +       int err;
>>>> +
>>>> +       /* Check that the datapath exists */
>>>> +       ovs_lock();
>>>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>>>> +       ovs_unlock();
>>>> +       if (!dp)
>>>> +               return -ENODEV;
>>>> +
>>> why dp check is required for this API?
>> Is it possible for another core delete the dp, before ovs_lock()
>> returns? Then, in theory, get_dp() can
>> return NULL, no?
>
> I do not see dp used anywhere in function. so the question was why is
> dp lookup done here?

Got it. I misunderstood. You are right, we don't need the dp pointer.
Just posted v2.

>
>>>
>>>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_FEATURES,
>>>> +                                         &ovs_reply_header);
>>>> +       if (!reply)
>>>> +               return PTR_ERR(reply);
>>>> +
>>>> +       if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
>>>> +           nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
>>>> +               goto nla_put_failure;
>>>> +
>>>> +       nla = nla_nest_start(reply, OVS_METER_ATTR_BANDS);
>>>> +       if (!nla)
>>>> +               goto nla_put_failure;
>>>> +
>>>> +       band_nla = nla_nest_start(reply, OVS_BAND_ATTR_UNSPEC);
>>>> +       if (!band_nla)
>>>> +               goto nla_put_failure;
>>>> +       /* Currently only DROP band type is supported. */
>>>> +       if (nla_put_u32(reply, OVS_BAND_ATTR_TYPE, OVS_METER_BAND_TYPE_DROP))
>>>> +               goto nla_put_failure;
>>>> +       nla_nest_end(reply, band_nla);
>>>> +       nla_nest_end(reply, nla);
>>>> +
>>>> +       genlmsg_end(reply, ovs_reply_header);
>>>> +       return genlmsg_reply(reply, info);
>>>> +
>>>> +nla_put_failure:
>>>> +       nlmsg_free(reply);
>>>> +       err = -EMSGSIZE;
>>>> +       return err;
>>>> +}
>>>> +
>>> ....
>>>
>>>> +static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> +       struct nlattr **a = info->attrs;
>>>> +       struct dp_meter *meter, *old_meter;
>>>> +       struct sk_buff *reply;
>>>> +       struct ovs_header *ovs_reply_header;
>>>> +       struct ovs_header *ovs_header = info->userhdr;
>>>> +       struct datapath *dp;
>>>> +       int err;
>>>> +       u32 meter_id;
>>>> +       bool failed;
>>>> +
>>>> +       meter = dp_meter_create(a);
>>>> +       if (IS_ERR_OR_NULL(meter))
>>>> +               return PTR_ERR(meter);
>>>> +
>>>> +       reply = ovs_meter_cmd_reply_start(info, OVS_METER_CMD_SET,
>>>> +                                         &ovs_reply_header);
>>>> +       if (IS_ERR(reply)) {
>>>> +               err = PTR_ERR(reply);
>>>> +               goto exit_free_meter;
>>>> +       }
>>>> +
>>>> +       ovs_lock();
>>>> +       dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>>>> +       if (!dp) {
>>>> +               err = -ENODEV;
>>>> +               goto exit_unlock;
>>>> +       }
>>>> +
>>>> +       if (!a[OVS_METER_ATTR_ID]) {
>>>> +               err = -ENODEV;
>>>> +               goto exit_unlock;
>>>> +       }
>>>> +
>>>> +       meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
>>>> +
>>>> +       /* Cannot fail after this. */
>>>> +       old_meter = lookup_meter(dp, meter_id);
>>>> +       attach_meter(dp, meter);
>>>> +       ovs_unlock();
>>>> +
>>> After the unlock, it is not safe to keep the ref to old_meter. better
>>> to release lock at the end. we could optimize it later if required.
>>
>> I see a problem here: the old_meter has not been removed from the list before
>> unlock. The code should have been:
>>
>> old_meter = lookup_meter(dp, meter_id);
>> detch_meter(dp, old_meter);
>> attach_meter(dp, meter);
>> ovs_unlock();
>>
>> Do you still see a problem w.r.t. unlock() here? Once detch_meter() is called,
>> another thread should not have access to 'old_meter' any more. right?
>
> looks good.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-17  7:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-12 22:38 [net-next RFC 0/4] Openvswitch meter action Andy Zhou
2017-10-12 22:38 ` [net-next RFC 1/4] openvswitch: Add meter netlink definitions Andy Zhou
2017-10-12 22:38 ` [net-next RFC 2/4] openvswitch: export get_dp() API Andy Zhou
2017-10-12 22:38 ` [net-next RFC 3/4] openvswitch: Add meter infrastructure Andy Zhou
2017-10-14  0:12   ` Pravin Shelar
2017-10-16  7:05     ` Andy Zhou
2017-10-16 17:49       ` Pravin Shelar
2017-10-17  7:40         ` Andy Zhou
2017-10-12 22:38 ` [net-next RFC 4/4] openvswitch: Add meter action support Andy Zhou
2017-10-14  0:13   ` Pravin Shelar
2017-10-16  7:06     ` Andy Zhou
2017-10-14  0:09 ` [net-next RFC 0/4] Openvswitch meter action Pravin Shelar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).