[PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
@ 2018-04-18  0:30 Yi-Hung Wei
  2018-04-18  0:30 ` [PATCH net-next v2 1/2] openvswitch: Add conntrack limit netlink definition Yi-Hung Wei
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-18  0:30 UTC (permalink / raw)
  To: netdev; +Cc: Yi-Hung Wei

Currently, nf_conntrack_max is used to limit the maximum number of
conntrack entries in the conntrack table for every network namespace.
For the VMs and containers that reside in the same namespace,
they share the same conntrack table, and the total # of conntrack entries
for all the VMs and containers are limited by nf_conntrack_max.  In this
case, if one of the VM/container abuses the usage the conntrack entries,
it blocks the others from committing valid conntrack entries into the
conntrack table.  Even if we can possibly put the VM in different network
namespace, the current nf_conntrack_max configuration is kind of rigid
that we cannot limit different VM/container to have different # conntrack
entries.

To address the aforementioned issue, this patch proposes to have a
fine-grained mechanism that could further limit the # of conntrack entries
per-zone.  For example, we can designate different zone to different VM,
and set conntrack limit to each zone.  By providing this isolation, a
mis-behaved VM only consumes the conntrack entries in its own zone, and
it will not influence other well-behaved VMs.  Moreover, the users can
set various conntrack limit to different zone based on their preference.

The proposed implementation utilizes Netfilter's nf_conncount backend
to count the number of connections in a particular zone.  If the number of
connection is above a configured limitation, OVS will return ENOMEM to the
userspace.  If userspace does not configure the zone limit, the limit
defaults to zero that is no limitation, which is backward compatible to
the behavior without this patch.

The first patch defines the conntrack limit netlink definition, and the
second patch provides the implementation.

v1->v2:
  - Fixes commit log typos suggested by Greg.
  - Fixes memory free issue that Julia found.

Yi-Hung Wei (2):
  openvswitch: Add conntrack limit netlink definition
  openvswitch: Support conntrack zone limit

 include/uapi/linux/openvswitch.h |  62 +++++
 net/openvswitch/Kconfig          |   3 +-
 net/openvswitch/conntrack.c      | 498 ++++++++++++++++++++++++++++++++++++++-
 net/openvswitch/conntrack.h      |   9 +-
 net/openvswitch/datapath.c       |   7 +-
 net/openvswitch/datapath.h       |   1 +
 6 files changed, 574 insertions(+), 6 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v2 1/2] openvswitch: Add conntrack limit netlink definition
  2018-04-18  0:30 [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
@ 2018-04-18  0:30 ` Yi-Hung Wei
  2018-04-18  0:30 ` [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
  2018-04-23 13:39 ` [PATCH net-next v2 0/2] " David Miller
  2 siblings, 0 replies; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-18  0:30 UTC (permalink / raw)
  To: netdev; +Cc: Yi-Hung Wei

Define netlink messages and attributes to support user kernel
communication that uses the conntrack limit feature.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
 include/uapi/linux/openvswitch.h | 62 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 713e56ce681f..ca63c16375ce 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -937,4 +937,66 @@ enum ovs_meter_band_type {
 
 #define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
 
+/* Conntrack limit */
+#define OVS_CT_LIMIT_FAMILY  "ovs_ct_limit"
+#define OVS_CT_LIMIT_MCGROUP "ovs_ct_limit"
+#define OVS_CT_LIMIT_VERSION 0x1
+
+enum ovs_ct_limit_cmd {
+	OVS_CT_LIMIT_CMD_UNSPEC,
+	OVS_CT_LIMIT_CMD_SET,		/* Add or modify ct limit. */
+	OVS_CT_LIMIT_CMD_DEL,		/* Delete ct limit. */
+	OVS_CT_LIMIT_CMD_GET		/* Get ct limit. */
+};
+
+enum ovs_ct_limit_attr {
+	OVS_CT_LIMIT_ATTR_UNSPEC,
+	OVS_CT_LIMIT_ATTR_OPTION,	/* Nested OVS_CT_LIMIT_ATTR_* */
+	__OVS_CT_LIMIT_ATTR_MAX
+};
+
+#define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
+
+/**
+ * @OVS_CT_ZONE_LIMIT_ATTR_SET_REQ: Contains either
+ * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a pair of
+ * OVS_CT_ZONE_LIMIT_ATTR_ZONE and OVS_CT_ZONE_LIMIT_ATTR_LIMIT.
+ * @OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
+ * @OVS_CT_ZONE_LIMIT_ATTR_GET_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
+ * @OVS_CT_ZONE_LIMIT_ATTR_GET_RLY: Contains either
+ * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a triple of
+ * OVS_CT_ZONE_LIMIT_ATTR_ZONE, OVS_CT_ZONE_LIMIT_ATTR_LIMIT and
+ * OVS_CT_ZONE_LIMIT_ATTR_COUNT.
+ */
+enum ovs_ct_limit_option_attr {
+	OVS_CT_LIMIT_OPTION_ATTR_UNSPEC,
+	OVS_CT_ZONE_LIMIT_ATTR_SET_REQ,	/* Nested OVS_CT_ZONE_LIMIT_ATTR_*
+					 * attributes. */
+	OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ,	/* Nested OVS_CT_ZONE_LIMIT_ATTR_*
+					 * attributes. */
+	OVS_CT_ZONE_LIMIT_ATTR_GET_REQ,	/* Nested OVS_CT_ZONE_LIMIT_ATTR_*
+					 * attributes. */
+	OVS_CT_ZONE_LIMIT_ATTR_GET_RLY,	/* Nested OVS_CT_ZONE_LIMIT_ATTR_*
+					 * attributes. */
+	__OVS_CT_LIMIT_OPTION_ATTR_MAX
+};
+
+#define OVS_CT_LIMIT_OPTION_ATTR_MAX (__OVS_CT_LIMIT_OPTION_ATTR_MAX - 1)
+
+enum ovs_ct_zone_limit_attr {
+	OVS_CT_ZONE_LIMIT_ATTR_UNSPEC,
+	OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT,	/* u32 default conntrack limit
+						 * for all zones. */
+	OVS_CT_ZONE_LIMIT_ATTR_ZONE,		/* u16 conntrack zone id. */
+	OVS_CT_ZONE_LIMIT_ATTR_LIMIT,		/* u32 max number of conntrack
+						 * entries allowed in the
+						 * corresponding zone. */
+	OVS_CT_ZONE_LIMIT_ATTR_COUNT,		/* u32 number of conntrack
+						 * entries in the corresponding
+						 * zone. */
+	__OVS_CT_ZONE_LIMIT_ATTR_MAX
+};
+
+#define OVS_CT_ZONE_LIMIT_ATTR_MAX (__OVS_CT_ZONE_LIMIT_ATTR_MAX - 1)
+
 #endif /* _LINUX_OPENVSWITCH_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit
  2018-04-18  0:30 [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
  2018-04-18  0:30 ` [PATCH net-next v2 1/2] openvswitch: Add conntrack limit netlink definition Yi-Hung Wei
@ 2018-04-18  0:30 ` Yi-Hung Wei
  2018-04-24  6:30   ` Pravin Shelar
  2018-04-23 13:39 ` [PATCH net-next v2 0/2] " David Miller
  2 siblings, 1 reply; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-18  0:30 UTC (permalink / raw)
  To: netdev; +Cc: Yi-Hung Wei

Currently, nf_conntrack_max is used to limit the maximum number of
conntrack entries in the conntrack table for every network namespace.
For the VMs and containers that reside in the same namespace,
they share the same conntrack table, and the total # of conntrack entries
for all the VMs and containers are limited by nf_conntrack_max.  In this
case, if one of the VM/container abuses the usage the conntrack entries,
it blocks the others from committing valid conntrack entries into the
conntrack table.  Even if we can possibly put the VM in different network
namespace, the current nf_conntrack_max configuration is kind of rigid
that we cannot limit different VM/container to have different # conntrack
entries.

To address the aforementioned issue, this patch proposes to have a
fine-grained mechanism that could further limit the # of conntrack entries
per-zone.  For example, we can designate different zone to different VM,
and set conntrack limit to each zone.  By providing this isolation, a
mis-behaved VM only consumes the conntrack entries in its own zone, and
it will not influence other well-behaved VMs.  Moreover, the users can
set various conntrack limit to different zone based on their preference.

The proposed implementation utilizes Netfilter's nf_conncount backend
to count the number of connections in a particular zone.  If the number of
connection is above a configured limitation, ovs will return ENOMEM to the
userspace.  If userspace does not configure the zone limit, the limit
defaults to zero that is no limitation, which is backward compatible to
the behavior without this patch.

The following high leve APIs are provided to the userspace:
  - OVS_CT_LIMIT_CMD_SET:
    * set default connection limit for all zones
    * set the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_DEL:
    * remove the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_GET:
    * get the default connection limit for all zones
    * get the connection limit for a particular zone

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
---
 net/openvswitch/Kconfig     |   3 +-
 net/openvswitch/conntrack.c | 498 +++++++++++++++++++++++++++++++++++++++++++-
 net/openvswitch/conntrack.h |   9 +-
 net/openvswitch/datapath.c  |   7 +-
 net/openvswitch/datapath.h  |   1 +
 5 files changed, 512 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2650205cdaf9..89da9512ec1e 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -9,7 +9,8 @@ config OPENVSWITCH
 		   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 				     (!NF_NAT || NF_NAT) && \
 				     (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
-				     (!NF_NAT_IPV6 || NF_NAT_IPV6)))
+				     (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
+				     (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
 	select LIBCRC32C
 	select MPLS
 	select NET_MPLS_GSO
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c5904f629091..d09b572f72b4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -17,7 +17,9 @@
 #include <linux/udp.h>
 #include <linux/sctp.h>
 #include <net/ip.h>
+#include <net/genetlink.h>
 #include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_count.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_labels.h>
 #include <net/netfilter/nf_conntrack_seqadj.h>
@@ -76,6 +78,38 @@ struct ovs_conntrack_info {
 #endif
 };
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+#define OVS_CT_LIMIT_UNLIMITED	0
+#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
+#define CT_LIMIT_HASH_BUCKETS 512
+
+struct ovs_ct_limit {
+	/* Elements in ovs_ct_limit_info->limits hash table */
+	struct hlist_node hlist_node;
+	struct rcu_head rcu;
+	u16 zone;
+	u32 limit;
+};
+
+struct ovs_ct_limit_info {
+	u32 default_limit;
+	struct hlist_head *limits;
+	struct nf_conncount_data *data __aligned(8);
+};
+
+static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = {
+	[OVS_CT_LIMIT_ATTR_OPTION] = { .type = NLA_NESTED, },
+};
+
+static const struct nla_policy
+	ct_zone_limit_policy[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1] = {
+		[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT] = { .type = NLA_U32, },
+		[OVS_CT_ZONE_LIMIT_ATTR_ZONE] = { .type = NLA_U16, },
+		[OVS_CT_ZONE_LIMIT_ATTR_LIMIT] = { .type = NLA_U32, },
+		[OVS_CT_ZONE_LIMIT_ATTR_COUNT] = { .type = NLA_U32, },
+};
+#endif
+
 static bool labels_nonzero(const struct ovs_key_ct_labels *labels);
 
 static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
@@ -1036,6 +1070,94 @@ static bool labels_nonzero(const struct ovs_key_ct_labels *labels)
 	return false;
 }
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static struct hlist_head *ct_limit_hash_bucket(
+	const struct ovs_ct_limit_info *info, u16 zone)
+{
+	return &info->limits[zone & (CT_LIMIT_HASH_BUCKETS - 1)];
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_set(const struct ovs_ct_limit_info *info,
+			 struct ovs_ct_limit *new_ct_limit)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, new_ct_limit->zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == new_ct_limit->zone) {
+			hlist_replace_rcu(&ct_limit->hlist_node,
+					  &new_ct_limit->hlist_node);
+			kfree_rcu(ct_limit, rcu);
+			return;
+		}
+	}
+
+	hlist_add_head_rcu(&new_ct_limit->hlist_node, head);
+}
+
+/* Call with ovs_mutex */
+static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == zone) {
+			hlist_del_rcu(&ct_limit->hlist_node);
+			kfree_rcu(ct_limit, rcu);
+			return;
+		}
+	}
+}
+
+/* Call with RCU read lock */
+static u32 ct_limit_get(const struct ovs_ct_limit_info *info, u16 zone)
+{
+	struct ovs_ct_limit *ct_limit;
+	struct hlist_head *head;
+
+	head = ct_limit_hash_bucket(info, zone);
+	hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
+		if (ct_limit->zone == zone)
+			return ct_limit->limit;
+	}
+
+	return info->default_limit;
+}
+
+static int ovs_ct_check_limit(struct net *net,
+			      const struct ovs_conntrack_info *info,
+			      const struct nf_conntrack_tuple *tuple)
+{
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+	const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	u32 per_zone_limit, connections;
+	u32 conncount_key[5];
+
+	conncount_key[0] = info->zone.id;
+
+	rcu_read_lock();
+	per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
+	if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	connections = nf_conncount_count(net, ct_limit_info->data,
+					 conncount_key, tuple, &info->zone);
+	if (connections > per_zone_limit) {
+		rcu_read_unlock();
+		return -ENOMEM;
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+#endif
+
 /* Lookup connection and confirm if unconfirmed. */
 static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 			 const struct ovs_conntrack_info *info,
@@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
 	if (!ct)
 		return 0;
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	err = ovs_ct_check_limit(net, info,
+				 &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+	if (err)
+		return err;
+#endif
+
 	/* Set the conntrack event mask if given.  NEW and DELETE events have
 	 * their own groups, but the NFNLGRP_CONNTRACK_UPDATE group listener
 	 * typically would receive many kinds of updates.  Setting the event
@@ -1655,7 +1784,364 @@ static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info)
 		nf_ct_tmpl_free(ct_info->ct);
 }
 
-void ovs_ct_init(struct net *net)
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net)
+{
+	int i, err;
+
+	ovs_net->ct_limit_info = kmalloc(sizeof *ovs_net->ct_limit_info,
+					 GFP_KERNEL);
+	if (!ovs_net->ct_limit_info)
+		return -ENOMEM;
+
+	ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT;
+	ovs_net->ct_limit_info->limits =
+		kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head),
+			      GFP_KERNEL);
+	if (!ovs_net->ct_limit_info->limits) {
+		kfree(ovs_net->ct_limit_info);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++)
+		INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]);
+
+	ovs_net->ct_limit_info->data =
+		nf_conncount_init(net, NFPROTO_INET, sizeof(u32));
+
+	if (IS_ERR(ovs_net->ct_limit_info->data)) {
+		err = PTR_ERR(ovs_net->ct_limit_info->data);
+		kfree(ovs_net->ct_limit_info->limits);
+		kfree(ovs_net->ct_limit_info);
+		return err;
+	}
+	return 0;
+}
+
+static void ovs_ct_limit_exit(struct net *net, struct ovs_net *ovs_net)
+{
+	const struct ovs_ct_limit_info *info = ovs_net->ct_limit_info;
+	int i;
+
+	nf_conncount_destroy(net, NFPROTO_INET, info->data);
+	for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
+		struct hlist_head *head = &info->limits[i];
+		struct ovs_ct_limit *ct_limit;
+
+		hlist_for_each_entry_rcu(ct_limit, head, hlist_node)
+			kfree_rcu(ct_limit, rcu);
+	}
+	kfree(ovs_net->ct_limit_info->limits);
+	kfree(ovs_net->ct_limit_info);
+}
+
+static struct sk_buff *
+ovs_ct_limit_cmd_reply_start(struct genl_info *info, u8 cmd,
+			     struct ovs_header **ovs_reply_header)
+{
+	struct sk_buff *skb;
+	struct ovs_header *ovs_header = info->userhdr;
+
+	skb = genlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ERR_PTR(-ENOMEM);
+
+	*ovs_reply_header = genlmsg_put(skb, info->snd_portid,
+					info->snd_seq,
+					&dp_ct_limit_genl_family, 0, cmd);
+
+	if (!*ovs_reply_header) {
+		nlmsg_free(skb);
+		return ERR_PTR(-EMSGSIZE);
+	}
+	(*ovs_reply_header)->dp_ifindex = ovs_header->dp_ifindex;
+
+	return skb;
+}
+
+static int ovs_ct_limit_set_zone_limit(struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info)
+{
+	struct nlattr *nla;
+	int rem, err;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+		struct ovs_ct_limit *ct_limit;
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_SET_REQ)
+			return  -EINVAL;
+
+		err = nla_parse((struct nlattr **)&attr,
+				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
+				nla_len(nla), ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]) {
+			u32 default_limit = nla_get_u32(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT]);
+			ovs_lock();
+			info->default_limit = default_limit;
+			ovs_unlock();
+		} else {
+			if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE] ||
+			    !attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]) {
+				return -EINVAL;
+			}
+
+			ct_limit = kmalloc(sizeof(*ct_limit), GFP_KERNEL);
+			if (!ct_limit)
+				return -ENOMEM;
+
+			ct_limit->zone = nla_get_u16(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+			ct_limit->limit = nla_get_u32(
+				attr[OVS_CT_ZONE_LIMIT_ATTR_LIMIT]);
+
+			ovs_lock();
+			ct_limit_set(info, ct_limit);
+			ovs_unlock();
+		}
+	}
+	return 0;
+}
+
+static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info)
+{
+	struct nlattr *nla;
+	int rem, err;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+		u16 zone;
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ)
+			return  -EINVAL;
+
+		err = nla_parse((struct nlattr **)&attr,
+				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
+				nla_len(nla), ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
+			return -EINVAL;
+
+		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+
+		ovs_lock();
+		ct_limit_del(info, zone);
+		ovs_unlock();
+	}
+	return 0;
+}
+
+static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
+					  struct sk_buff *reply)
+{
+	int err;
+	struct nlattr *nla_nested;
+
+	nla_nested = nla_nest_start(reply, OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
+
+	err = nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT,
+			  info->default_limit);
+	if (err)
+		return err;
+
+	nla_nest_end(reply, nla_nested);
+	return 0;
+}
+
+static int ovs_ct_limit_get_zone_limit(struct net *net,
+				       struct nlattr *nla_zone_limit,
+				       struct ovs_ct_limit_info *info,
+				       struct sk_buff *reply)
+{
+	struct nlattr *nla, *nla_nested;
+	int rem, err;
+	u16 zone;
+	u32 limit, count, conncount_key[5];
+	struct nf_conntrack_zone ct_zone;
+
+	nla_for_each_nested(nla, nla_zone_limit, rem) {
+		struct nlattr *attr[OVS_CT_ZONE_LIMIT_ATTR_MAX + 1];
+
+		if (nla_type(nla) != OVS_CT_ZONE_LIMIT_ATTR_GET_REQ)
+			return -EINVAL;
+
+		err = nla_parse((struct nlattr **)&attr,
+				OVS_CT_ZONE_LIMIT_ATTR_MAX, nla_data(nla),
+				nla_len(nla), ct_zone_limit_policy, NULL);
+		if (err)
+			return err;
+
+		if (!attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE])
+			return -EINVAL;
+
+		zone = nla_get_u16(attr[OVS_CT_ZONE_LIMIT_ATTR_ZONE]);
+		nf_ct_zone_init(&ct_zone, zone, NF_CT_DEFAULT_ZONE_DIR, 0);
+		rcu_read_lock();
+		limit = ct_limit_get(info, zone);
+		rcu_read_unlock();
+
+		conncount_key[0] = zone;
+		count = nf_conncount_count(net, info->data, conncount_key,
+					   NULL, &ct_zone);
+
+		nla_nested = nla_nest_start(reply,
+					    OVS_CT_ZONE_LIMIT_ATTR_GET_RLY);
+		if (nla_put_u16(reply, OVS_CT_ZONE_LIMIT_ATTR_ZONE, zone) ||
+		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_LIMIT, limit) ||
+		    nla_put_u32(reply, OVS_CT_ZONE_LIMIT_ATTR_COUNT, count))
+			return -EMSGSIZE;
+		nla_nest_end(reply, nla_nested);
+	}
+
+	return 0;
+}
+
+static int ovs_ct_limit_cmd_set(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_SET,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
+		return -EINVAL;
+
+	err = ovs_ct_limit_set_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
+					  ct_limit_info);
+	if (err)
+		goto exit_err;
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static int ovs_ct_limit_cmd_del(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct ovs_net *ovs_net = net_generic(sock_net(skb->sk), ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_DEL,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	if (!a[OVS_CT_LIMIT_ATTR_OPTION])
+		return -EINVAL;
+
+	err = ovs_ct_limit_del_zone_limit(a[OVS_CT_LIMIT_ATTR_OPTION],
+					  ct_limit_info);
+	if (err)
+		goto exit_err;
+
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static int ovs_ct_limit_cmd_get(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr **a = info->attrs;
+	struct nlattr *nla_reply;
+	struct sk_buff *reply;
+	struct ovs_header *ovs_reply_header;
+	struct net *net = sock_net(skb->sk);
+	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
+	struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
+	int err;
+
+	reply = ovs_ct_limit_cmd_reply_start(info, OVS_CT_LIMIT_CMD_GET,
+					     &ovs_reply_header);
+	if (IS_ERR(reply))
+		return PTR_ERR(reply);
+
+	nla_reply = nla_nest_start(reply, OVS_CT_LIMIT_ATTR_OPTION);
+
+	err = ovs_ct_limit_get_default_limit(ct_limit_info, reply);
+	if (err)
+		goto exit_err;
+
+	if (a[OVS_CT_LIMIT_ATTR_OPTION]) {
+		err = ovs_ct_limit_get_zone_limit(
+			net, a[OVS_CT_LIMIT_ATTR_OPTION], ct_limit_info,
+			reply);
+		if (err)
+			goto exit_err;
+	}
+
+	nla_nest_end(reply, nla_reply);
+	genlmsg_end(reply, ovs_reply_header);
+	return genlmsg_reply(reply, info);
+
+exit_err:
+	nlmsg_free(reply);
+	return err;
+}
+
+static struct genl_ops ct_limit_genl_ops[] = {
+	{ .cmd = OVS_CT_LIMIT_CMD_SET,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   * privilege. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_set,
+	},
+	{ .cmd = OVS_CT_LIMIT_CMD_DEL,
+		.flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN
+					   * privilege. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_del,
+	},
+	{ .cmd = OVS_CT_LIMIT_CMD_GET,
+		.flags = 0,		  /* OK for unprivileged users. */
+		.policy = ct_limit_policy,
+		.doit = ovs_ct_limit_cmd_get,
+	},
+};
+
+static const struct genl_multicast_group ovs_ct_limit_multicast_group = {
+	.name = OVS_CT_LIMIT_MCGROUP,
+};
+
+struct genl_family dp_ct_limit_genl_family __ro_after_init = {
+	.hdrsize = sizeof(struct ovs_header),
+	.name = OVS_CT_LIMIT_FAMILY,
+	.version = OVS_CT_LIMIT_VERSION,
+	.maxattr = OVS_CT_LIMIT_ATTR_MAX,
+	.netnsok = true,
+	.parallel_ops = true,
+	.ops = ct_limit_genl_ops,
+	.n_ops = ARRAY_SIZE(ct_limit_genl_ops),
+	.mcgrps = &ovs_ct_limit_multicast_group,
+	.n_mcgrps = 1,
+	.module = THIS_MODULE,
+};
+#endif
+
+int ovs_ct_init(struct net *net)
 {
 	unsigned int n_bits = sizeof(struct ovs_key_ct_labels) * BITS_PER_BYTE;
 	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
@@ -1666,12 +2152,22 @@ void ovs_ct_init(struct net *net)
 	} else {
 		ovs_net->xt_label = true;
 	}
+
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	return ovs_ct_limit_init(net, ovs_net);
+#else
+	return 0;
+#endif
 }
 
 void ovs_ct_exit(struct net *net)
 {
 	struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
 
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	ovs_ct_limit_exit(net, ovs_net);
+#endif
+
 	if (ovs_net->xt_label)
 		nf_connlabels_put(net);
 }
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 399dfdd2c4f9..900dadd70974 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -17,10 +17,11 @@
 #include "flow.h"
 
 struct ovs_conntrack_info;
+struct ovs_ct_limit_info;
 enum ovs_key_attr;
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-void ovs_ct_init(struct net *);
+int ovs_ct_init(struct net *);
 void ovs_ct_exit(struct net *);
 bool ovs_ct_verify(struct net *, enum ovs_key_attr attr);
 int ovs_ct_copy_action(struct net *, const struct nlattr *,
@@ -44,7 +45,7 @@ void ovs_ct_free_action(const struct nlattr *a);
 #else
 #include <linux/errno.h>
 
-static inline void ovs_ct_init(struct net *net) { }
+static inline int ovs_ct_init(struct net *net) { return 0; }
 
 static inline void ovs_ct_exit(struct net *net) { }
 
@@ -104,4 +105,8 @@ static inline void ovs_ct_free_action(const struct nlattr *a) { }
 
 #define CT_SUPPORTED_MASK 0
 #endif /* CONFIG_NF_CONNTRACK */
+
+#if IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+extern struct genl_family dp_ct_limit_genl_family;
+#endif
 #endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 015e24e08909..a61818e94396 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2288,6 +2288,9 @@ static struct genl_family * const dp_genl_families[] = {
 	&dp_flow_genl_family,
 	&dp_packet_genl_family,
 	&dp_meter_genl_family,
+#if	IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
+	&dp_ct_limit_genl_family,
+#endif
 };
 
 static void dp_unregister_genl(int n_families)
@@ -2323,8 +2326,7 @@ static int __net_init ovs_init_net(struct net *net)
 
 	INIT_LIST_HEAD(&ovs_net->dps);
 	INIT_WORK(&ovs_net->dp_notify_work, ovs_dp_notify_wq);
-	ovs_ct_init(net);
-	return 0;
+	return ovs_ct_init(net);
 }
 
 static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
@@ -2469,3 +2471,4 @@ MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
 MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
+MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 523d65526766..51bd4dcb6c8b 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -144,6 +144,7 @@ struct dp_upcall_info {
 struct ovs_net {
 	struct list_head dps;
 	struct work_struct dp_notify_work;
+	struct ovs_ct_limit_info *ct_limit_info;
 
 	/* Module reference for configuring conntrack. */
 	bool xt_label;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-18  0:30 [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
  2018-04-18  0:30 ` [PATCH net-next v2 1/2] openvswitch: Add conntrack limit netlink definition Yi-Hung Wei
  2018-04-18  0:30 ` [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
@ 2018-04-23 13:39 ` David Miller
  2018-04-23 20:10   ` Pravin Shelar
  2 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2018-04-23 13:39 UTC (permalink / raw)
  To: yihung.wei; +Cc: netdev, pshelar

From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Tue, 17 Apr 2018 17:30:27 -0700

> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max.  In this
> case, if one of the VM/container abuses the usage the conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table.  Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
> 
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone.  For example, we can designate different zone to different VM,
> and set conntrack limit to each zone.  By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs.  Moreover, the users can
> set various conntrack limit to different zone based on their preference.
> 
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone.  If the number of
> connection is above a configured limitation, OVS will return ENOMEM to the
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero that is no limitation, which is backward compatible to
> the behavior without this patch.
> 
> The first patch defines the conntrack limit netlink definition, and the
> second patch provides the implementation.

Pravin, I need this series reviewed.

Thank you.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-23 13:39 ` [PATCH net-next v2 0/2] " David Miller
@ 2018-04-23 20:10   ` Pravin Shelar
  2018-04-23 21:19     ` Yi-Hung Wei
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2018-04-23 20:10 UTC (permalink / raw)
  To: David Miller; +Cc: Yi-Hung Wei, Linux Kernel Network Developers

On Mon, Apr 23, 2018 at 6:39 AM, David Miller <davem@davemloft.net> wrote:
> From: Yi-Hung Wei <yihung.wei@gmail.com>
> Date: Tue, 17 Apr 2018 17:30:27 -0700
>
>> Currently, nf_conntrack_max is used to limit the maximum number of
>> conntrack entries in the conntrack table for every network namespace.
>> For the VMs and containers that reside in the same namespace,
>> they share the same conntrack table, and the total # of conntrack entries
>> for all the VMs and containers are limited by nf_conntrack_max.  In this
>> case, if one of the VM/container abuses the usage the conntrack entries,
>> it blocks the others from committing valid conntrack entries into the
>> conntrack table.  Even if we can possibly put the VM in different network
>> namespace, the current nf_conntrack_max configuration is kind of rigid
>> that we cannot limit different VM/container to have different # conntrack
>> entries.
>>

Hi
This looks like general problem related to nf zone usage limit, Did
you considered changing nf-conntrack to have a per zone limit, so that
all users of nf-filter can use it. I prefer this to adding a wrapper
in OVS nf-filter layer.

Thanks,
Pravin.

>> To address the aforementioned issue, this patch proposes to have a
>> fine-grained mechanism that could further limit the # of conntrack entries
>> per-zone.  For example, we can designate different zone to different VM,
>> and set conntrack limit to each zone.  By providing this isolation, a
>> mis-behaved VM only consumes the conntrack entries in its own zone, and
>> it will not influence other well-behaved VMs.  Moreover, the users can
>> set various conntrack limit to different zone based on their preference.
>>
>> The proposed implementation utilizes Netfilter's nf_conncount backend
>> to count the number of connections in a particular zone.  If the number of
>> connection is above a configured limitation, OVS will return ENOMEM to the
>> userspace.  If userspace does not configure the zone limit, the limit
>> defaults to zero that is no limitation, which is backward compatible to
>> the behavior without this patch.
>>
>> The first patch defines the conntrack limit netlink definition, and the
>> second patch provides the implementation.
>
> Pravin, I need this series reviewed.
>
> Thank you.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-23 20:10   ` Pravin Shelar
@ 2018-04-23 21:19     ` Yi-Hung Wei
  2018-04-24  6:34       ` Pravin Shelar
  0 siblings, 1 reply; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-23 21:19 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: David Miller, Linux Kernel Network Developers, Florian Westphal

On Mon, Apr 23, 2018 at 1:10 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Mon, Apr 23, 2018 at 6:39 AM, David Miller <davem@davemloft.net> wrote:
>> From: Yi-Hung Wei <yihung.wei@gmail.com>
>> Date: Tue, 17 Apr 2018 17:30:27 -0700
>>
>>> Currently, nf_conntrack_max is used to limit the maximum number of
>>> conntrack entries in the conntrack table for every network namespace.
>>> For the VMs and containers that reside in the same namespace,
>>> they share the same conntrack table, and the total # of conntrack entries
>>> for all the VMs and containers are limited by nf_conntrack_max.  In this
>>> case, if one of the VM/container abuses the usage the conntrack entries,
>>> it blocks the others from committing valid conntrack entries into the
>>> conntrack table.  Even if we can possibly put the VM in different network
>>> namespace, the current nf_conntrack_max configuration is kind of rigid
>>> that we cannot limit different VM/container to have different # conntrack
>>> entries.
>>>
>
> Hi
> This looks like general problem related to nf zone usage limit, Did
> you considered changing nf-conntrack to have a per zone limit, so that
> all users of nf-filter can use it. I prefer this to adding a wrapper
> in OVS nf-filter layer.
>
> Thanks,
> Pravin.
>

Hi Prvain,

Thanks for your comment.  Originally, I was thinking to add this
feature in nf_conntrack and had some discussion with Florian.  It
turns out that iptables and nft have their own way to keep track of
the connection limits, and it sounds reasonable to share the backend
that counts the number of connections, but each module can enforce the
connection limit in their own way.  Therefore, Florian helped to pull
out the common backend to nf_conncount in the following commit. The
nf_conncount then can be used by xtables, nft, and ovs.

commit 625c556118f3c2fd28bb8ef6da18c53bd4037be4
Author: Florian Westphal <fw@strlen.de>
Date:   Sat Dec 9 21:01:08 2017 +0100

    netfilter: connlimit: split xt_connlimit into front and backend

This allows to reuse xt_connlimit infrastructure from nf_tables.
The upcoming nf_tables frontend can just pass in an nftables register
as input key, this allows limiting by any nft-supported key, including
concatenations.  For xt_connlimit, pass in the zone and the ip/ipv6 addres.
....

Basically, to achieve conntrack zone limit in OVS.  We need the
following 3 parts.
1. Count the number of connections (this is provided by netfilter's
nf_conncount backend)
2. Keep track of the connection limits of zones, and check if it
exceeds the limit.
3. An API for userspace to set/delete/get the conntrack zone limit.

This patch series implements item 2 and 3, and it reuses the
nf_conncount from netfiler for the first part.

Thanks,

-Yi-Hung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit
  2018-04-18  0:30 ` [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
@ 2018-04-24  6:30   ` Pravin Shelar
  2018-04-25 21:51     ` Yi-Hung Wei
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2018-04-24  6:30 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers

On Tue, Apr 17, 2018 at 5:30 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max.  In this
> case, if one of the VM/container abuses the usage the conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table.  Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
>
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone.  For example, we can designate different zone to different VM,
> and set conntrack limit to each zone.  By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs.  Moreover, the users can
> set various conntrack limit to different zone based on their preference.
>
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone.  If the number of
> connection is above a configured limitation, ovs will return ENOMEM to the
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero that is no limitation, which is backward compatible to
> the behavior without this patch.
>
> The following high leve APIs are provided to the userspace:
>   - OVS_CT_LIMIT_CMD_SET:
>     * set default connection limit for all zones
>     * set the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_DEL:
>     * remove the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_GET:
>     * get the default connection limit for all zones
>     * get the connection limit for a particular zone
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  net/openvswitch/Kconfig     |   3 +-
>  net/openvswitch/conntrack.c | 498 +++++++++++++++++++++++++++++++++++++++++++-
>  net/openvswitch/conntrack.h |   9 +-
>  net/openvswitch/datapath.c  |   7 +-
>  net/openvswitch/datapath.h  |   1 +
>  5 files changed, 512 insertions(+), 6 deletions(-)
>
> diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
> index 2650205cdaf9..89da9512ec1e 100644
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@ -9,7 +9,8 @@ config OPENVSWITCH
>                    (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
>                                      (!NF_NAT || NF_NAT) && \
>                                      (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
> -                                    (!NF_NAT_IPV6 || NF_NAT_IPV6)))
> +                                    (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
> +                                    (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
>         select LIBCRC32C
>         select MPLS
>         select NET_MPLS_GSO
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index c5904f629091..d09b572f72b4 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -17,7 +17,9 @@
>  #include <linux/udp.h>
>  #include <linux/sctp.h>
>  #include <net/ip.h>
> +#include <net/genetlink.h>
>  #include <net/netfilter/nf_conntrack_core.h>
> +#include <net/netfilter/nf_conntrack_count.h>
>  #include <net/netfilter/nf_conntrack_helper.h>
>  #include <net/netfilter/nf_conntrack_labels.h>
>  #include <net/netfilter/nf_conntrack_seqadj.h>
> @@ -76,6 +78,38 @@ struct ovs_conntrack_info {
>  #endif
>  };
>
> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +#define OVS_CT_LIMIT_UNLIMITED 0
> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
> +#define CT_LIMIT_HASH_BUCKETS 512
> +
Can you use static key when the limit is not set.
This would avoid overhead in datapath when these limits are not used.

> +struct ovs_ct_limit {
> +       /* Elements in ovs_ct_limit_info->limits hash table */
> +       struct hlist_node hlist_node;
> +       struct rcu_head rcu;
> +       u16 zone;
> +       u32 limit;
> +};
> +
...

> +#endif
> +
>  /* Lookup connection and confirm if unconfirmed. */
>  static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>                          const struct ovs_conntrack_info *info,
> @@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>         if (!ct)
>                 return 0;
>
> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +       err = ovs_ct_check_limit(net, info,
> +                                &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> +       if (err)
> +               return err;
> +#endif
> +

This could be checked during flow install time, so that only permitted
flows would have 'ct commit' action, we can avoid per packet cost
checking the limit.
returning error code form ovs_ct_commit() is lost in datapath and it
would be hard to debug packet lost in case of the limit is reached. So
another advantage of checking the limit in flow install be better
traceability. datapath would return error to usespace and it can log
the error code.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-23 21:19     ` Yi-Hung Wei
@ 2018-04-24  6:34       ` Pravin Shelar
  2018-04-24 17:42         ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Pravin Shelar @ 2018-04-24  6:34 UTC (permalink / raw)
  To: Yi-Hung Wei
  Cc: David Miller, Linux Kernel Network Developers, Florian Westphal

On Mon, Apr 23, 2018 at 2:19 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 1:10 PM, Pravin Shelar <pshelar@ovn.org> wrote:
>> On Mon, Apr 23, 2018 at 6:39 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Yi-Hung Wei <yihung.wei@gmail.com>
>>> Date: Tue, 17 Apr 2018 17:30:27 -0700
>>>
>>>> Currently, nf_conntrack_max is used to limit the maximum number of
>>>> conntrack entries in the conntrack table for every network namespace.
>>>> For the VMs and containers that reside in the same namespace,
>>>> they share the same conntrack table, and the total # of conntrack entries
>>>> for all the VMs and containers are limited by nf_conntrack_max.  In this
>>>> case, if one of the VM/container abuses the usage the conntrack entries,
>>>> it blocks the others from committing valid conntrack entries into the
>>>> conntrack table.  Even if we can possibly put the VM in different network
>>>> namespace, the current nf_conntrack_max configuration is kind of rigid
>>>> that we cannot limit different VM/container to have different # conntrack
>>>> entries.
>>>>
>>
>> Hi
>> This looks like general problem related to nf zone usage limit, Did
>> you considered changing nf-conntrack to have a per zone limit, so that
>> all users of nf-filter can use it. I prefer this to adding a wrapper
>> in OVS nf-filter layer.
>>
>> Thanks,
>> Pravin.
>>
>
> Hi Prvain,
>
> Thanks for your comment.  Originally, I was thinking to add this
> feature in nf_conntrack and had some discussion with Florian.  It
> turns out that iptables and nft have their own way to keep track of
> the connection limits, and it sounds reasonable to share the backend
> that counts the number of connections, but each module can enforce the
> connection limit in their own way.  Therefore, Florian helped to pull
> out the common backend to nf_conncount in the following commit. The
> nf_conncount then can be used by xtables, nft, and ovs.
>
> commit 625c556118f3c2fd28bb8ef6da18c53bd4037be4
> Author: Florian Westphal <fw@strlen.de>
> Date:   Sat Dec 9 21:01:08 2017 +0100
>
>     netfilter: connlimit: split xt_connlimit into front and backend
>
> This allows to reuse xt_connlimit infrastructure from nf_tables.
> The upcoming nf_tables frontend can just pass in an nftables register
> as input key, this allows limiting by any nft-supported key, including
> concatenations.  For xt_connlimit, pass in the zone and the ip/ipv6 addres.
> ....
>
>
> Basically, to achieve conntrack zone limit in OVS.  We need the
> following 3 parts.
> 1. Count the number of connections (this is provided by netfilter's
> nf_conncount backend)
> 2. Keep track of the connection limits of zones, and check if it
> exceeds the limit.
> 3. An API for userspace to set/delete/get the conntrack zone limit.
>
> This patch series implements item 2 and 3, and it reuses the
> nf_conncount from netfiler for the first part.
>
OK. Thanks for the info.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-24  6:34       ` Pravin Shelar
@ 2018-04-24 17:42         ` David Miller
  2018-04-24 18:21           ` Yi-Hung Wei
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2018-04-24 17:42 UTC (permalink / raw)
  To: pshelar; +Cc: yihung.wei, netdev, fw

From: Pravin Shelar <pshelar@ovn.org>
Date: Mon, 23 Apr 2018 23:34:48 -0700

> OK. Thanks for the info.

So, ACK, Reviewed-by, etc.? :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-24 17:42         ` David Miller
@ 2018-04-24 18:21           ` Yi-Hung Wei
  2018-04-24 19:03             ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-24 18:21 UTC (permalink / raw)
  To: David Miller
  Cc: Pravin Shelar, Linux Kernel Network Developers, Florian Westphal

On Tue, Apr 24, 2018 at 10:42 AM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@ovn.org>
> Date: Mon, 23 Apr 2018 23:34:48 -0700
>
>> OK. Thanks for the info.
>
> So, ACK, Reviewed-by, etc.? :-)
>

Parvin provides feedback in a previous email.  I will address them and
send out v3.

Thanks,

-Yi-Hung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit
  2018-04-24 18:21           ` Yi-Hung Wei
@ 2018-04-24 19:03             ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2018-04-24 19:03 UTC (permalink / raw)
  To: yihung.wei; +Cc: pshelar, netdev, fw

From: Yi-Hung Wei <yihung.wei@gmail.com>
Date: Tue, 24 Apr 2018 11:21:33 -0700

> On Tue, Apr 24, 2018 at 10:42 AM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@ovn.org>
>> Date: Mon, 23 Apr 2018 23:34:48 -0700
>>
>>> OK. Thanks for the info.
>>
>> So, ACK, Reviewed-by, etc.? :-)
>>
> 
> Parvin provides feedback in a previous email.  I will address them and
> send out v3.

Aha, I see, thanks for explaining.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit
  2018-04-24  6:30   ` Pravin Shelar
@ 2018-04-25 21:51     ` Yi-Hung Wei
  2018-04-27  7:28       ` Pravin Shelar
  0 siblings, 1 reply; 13+ messages in thread
From: Yi-Hung Wei @ 2018-04-25 21:51 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Linux Kernel Network Developers

>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>> +#define OVS_CT_LIMIT_UNLIMITED 0
>> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
>> +#define CT_LIMIT_HASH_BUCKETS 512
>> +
> Can you use static key when the limit is not set.
> This would avoid overhead in datapath when these limits are not used.
>
Thanks Parvin for the review.  I'm not sure about the 'static key'
part, are you suggesting that say if we can have a flag to check if no
one has ever set the ct_limit?   If the ct_limit feature is not used
so far, then instead of look up the hash table for the zone limit, we
just return the default unlimited value.  So that we can avoid the
overhead of checking ct_limit.

>> +struct ovs_ct_limit {
>> +       /* Elements in ovs_ct_limit_info->limits hash table */
>> +       struct hlist_node hlist_node;
>> +       struct rcu_head rcu;
>> +       u16 zone;
>> +       u32 limit;
>> +};
>> +
> ...


>>  /* Lookup connection and confirm if unconfirmed. */
>>  static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>                          const struct ovs_conntrack_info *info,
>> @@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>         if (!ct)
>>                 return 0;
>>
>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>> +       err = ovs_ct_check_limit(net, info,
>> +                                &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
>> +       if (err)
>> +               return err;
>> +#endif
>> +
>
> This could be checked during flow install time, so that only permitted
> flows would have 'ct commit' action, we can avoid per packet cost
> checking the limit.
It seems to me that it would be hard to check the # of connections of
a flow in the flow installation stage.  For example, if we have a
datapath flow that performs “ct commit” action on all UDP traffic from
in_port 1, then it could be various combinations of 5-tuple that form
various # of connections.  Therefore, it would be hard to do the
admission control there.


> returning error code form ovs_ct_commit() is lost in datapath and it
> would be hard to debug packet lost in case of the limit is reached. So
> another advantage of checking the limit in flow install be better
> traceability. datapath would return error to usespace and it can log
> the error code.
Thanks for pointing out the error code from ovs_ct_commit() will be
lost in the datapath.  In this case, shall we consider to report the
packet drop by some rate_limit logging such as net_warn_ratelimited()
or net_info_ratelimited()?

Thanks,

-Yi-Hung

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit
  2018-04-25 21:51     ` Yi-Hung Wei
@ 2018-04-27  7:28       ` Pravin Shelar
  0 siblings, 0 replies; 13+ messages in thread
From: Pravin Shelar @ 2018-04-27  7:28 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers

On Wed, Apr 25, 2018 at 2:51 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
>>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>>> +#define OVS_CT_LIMIT_UNLIMITED 0
>>> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
>>> +#define CT_LIMIT_HASH_BUCKETS 512
>>> +
>> Can you use static key when the limit is not set.
>> This would avoid overhead in datapath when these limits are not used.
>>
> Thanks Parvin for the review.  I'm not sure about the 'static key'
> part, are you suggesting that say if we can have a flag to check if no
> one has ever set the ct_limit?   If the ct_limit feature is not used
> so far, then instead of look up the hash table for the zone limit, we
> just return the default unlimited value.  So that we can avoid the
> overhead of checking ct_limit.
>

Right, here documentaion of static keys:
https://www.kernel.org/doc/Documentation/static-keys.txt

>>> +struct ovs_ct_limit {
>>> +       /* Elements in ovs_ct_limit_info->limits hash table */
>>> +       struct hlist_node hlist_node;
>>> +       struct rcu_head rcu;
>>> +       u16 zone;
>>> +       u32 limit;
>>> +};
>>> +
>> ...
>
>
>>>  /* Lookup connection and confirm if unconfirmed. */
>>>  static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>>                          const struct ovs_conntrack_info *info,
>>> @@ -1054,6 +1176,13 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
>>>         if (!ct)
>>>                 return 0;
>>>
>>> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
>>> +       err = ovs_ct_check_limit(net, info,
>>> +                                &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
>>> +       if (err)
>>> +               return err;
>>> +#endif
>>> +
>>
>> This could be checked during flow install time, so that only permitted
>> flows would have 'ct commit' action, we can avoid per packet cost
>> checking the limit.
> It seems to me that it would be hard to check the # of connections of
> a flow in the flow installation stage.  For example, if we have a
> datapath flow that performs “ct commit” action on all UDP traffic from
> in_port 1, then it could be various combinations of 5-tuple that form
> various # of connections.  Therefore, it would be hard to do the
> admission control there.
>

Ok, I see the issue.
I think we could still avoid the lookup every time by checking the
limits for unconfirmed connections (ref: nf_ct_is_confirmed()).
So once connection confirmed and is under limit  we should allow all
packets for given connection.
This way we can avoid connection disruption when we are reaching near
connection limit.

>
>> returning error code form ovs_ct_commit() is lost in datapath and it
>> would be hard to debug packet lost in case of the limit is reached. So
>> another advantage of checking the limit in flow install be better
>> traceability. datapath would return error to usespace and it can log
>> the error code.
> Thanks for pointing out the error code from ovs_ct_commit() will be
> lost in the datapath.  In this case, shall we consider to report the
> packet drop by some rate_limit logging such as net_warn_ratelimited()
> or net_info_ratelimited()?
>

ok, that would be fine.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-04-27  7:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-18  0:30 [PATCH net-next v2 0/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
2018-04-18  0:30 ` [PATCH net-next v2 1/2] openvswitch: Add conntrack limit netlink definition Yi-Hung Wei
2018-04-18  0:30 ` [PATCH net-next v2 2/2] openvswitch: Support conntrack zone limit Yi-Hung Wei
2018-04-24  6:30   ` Pravin Shelar
2018-04-25 21:51     ` Yi-Hung Wei
2018-04-27  7:28       ` Pravin Shelar
2018-04-23 13:39 ` [PATCH net-next v2 0/2] " David Miller
2018-04-23 20:10   ` Pravin Shelar
2018-04-23 21:19     ` Yi-Hung Wei
2018-04-24  6:34       ` Pravin Shelar
2018-04-24 17:42         ` David Miller
2018-04-24 18:21           ` Yi-Hung Wei
2018-04-24 19:03             ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.