All of lore.kernel.org
 help / color / mirror / Atom feed
From: Saeed Mahameed <saeedm@mellanox.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Huy Nguyen <huyn@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>
Subject: [net-next 8/9] net/mlx5: E-Switch, Add support for VEPA in legacy mode.
Date: Fri, 22 Feb 2019 13:44:25 -0800	[thread overview]
Message-ID: <20190222214426.27833-9-saeedm@mellanox.com> (raw)
In-Reply-To: <20190222214426.27833-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips
the system internal virtual switch and forwards to external network
switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch.

To support this, an new FDB flow table are created with level 0 and
linked to the existing FDB flow table in legacy mode. By default,
VEPA is turned off and this FDB flow table is empty. When VEPA is
turned on, two rules are created. One rule to forward on uplink vport
traffic to the legacy FDB. The other rule forward all other traffic
to uplink vport.

Other design alternatives were not chosen as explained below:
1. Create a forward rule in ACL flow table (most efficient design).
This approach is the not chosen because firmware does not support
forward rule to uplink vport (0xffff) for ACL flow table.
2. Add additional source port criteria in all the FDB rules to make the
FDB rules to be received rules only. This approach is not chosen because
it is not efficient as there can many rules in the FDB and VEPA mode
cannot be controlled per vport.
3. Add a highest prioirty flow group in the existing legacy FDB Flow
Table instead of a new flow table. This approoach does not work because the
new flow group has the same match criteria as the promiscuous flow group
and mlx5_add_flow_rules does not allow specifying flow group.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 223 ++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   5 +
 2 files changed, 207 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6cb9710f76d5..84200d93cf9d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -64,6 +64,9 @@ enum {
 	PROMISC_CHANGE = BIT(3),
 };
 
+static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw);
+static void esw_cleanup_vepa_rules(struct mlx5_eswitch *esw);
+
 /* Vport context events */
 #define SRIOV_VPORT_EVENTS (UC_ADDR_CHANGE | \
 			    MC_ADDR_CHANGE | \
@@ -268,6 +271,37 @@ esw_fdb_set_vport_promisc_rule(struct mlx5_eswitch *esw, u16 vport)
 	return __esw_fdb_set_vport_rule(esw, vport, true, mac_c, mac_v);
 }
 
+enum {
+	LEGACY_VEPA_PRIO = 0,
+	LEGACY_FDB_PRIO,
+};
+
+static int esw_create_legacy_vepa_table(struct mlx5_eswitch *esw)
+{
+	struct mlx5_core_dev *dev = esw->dev;
+	struct mlx5_flow_namespace *root_ns;
+	struct mlx5_flow_table *fdb;
+	int err;
+
+	root_ns = mlx5_get_fdb_sub_ns(dev, 0);
+	if (!root_ns) {
+		esw_warn(dev, "Failed to get FDB flow namespace\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* num FTE 2, num FG 2 */
+	fdb = mlx5_create_auto_grouped_flow_table(root_ns, LEGACY_VEPA_PRIO,
+						  2, 2, 0, 0);
+	if (IS_ERR(fdb)) {
+		err = PTR_ERR(fdb);
+		esw_warn(dev, "Failed to create VEPA FDB err %d\n", err);
+		return err;
+	}
+	esw->fdb_table.legacy.vepa_fdb = fdb;
+
+	return 0;
+}
+
 static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
 {
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
@@ -296,8 +330,8 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
 		return -ENOMEM;
 
 	table_size = BIT(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
-
 	ft_attr.max_fte = table_size;
+	ft_attr.prio = LEGACY_FDB_PRIO;
 	fdb = mlx5_create_flow_table(root_ns, &ft_attr);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
@@ -356,41 +390,65 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
 	esw->fdb_table.legacy.promisc_grp = g;
 
 out:
-	if (err) {
-		if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.allmulti_grp)) {
-			mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
-			esw->fdb_table.legacy.allmulti_grp = NULL;
-		}
-		if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.addr_grp)) {
-			mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
-			esw->fdb_table.legacy.addr_grp = NULL;
-		}
-		if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.fdb)) {
-			mlx5_destroy_flow_table(esw->fdb_table.legacy.fdb);
-			esw->fdb_table.legacy.fdb = NULL;
-		}
-	}
+	if (err)
+		esw_destroy_legacy_fdb_table(esw);
 
 	kvfree(flow_group_in);
 	return err;
 }
 
+static void esw_destroy_legacy_vepa_table(struct mlx5_eswitch *esw)
+{
+	esw_debug(esw->dev, "Destroy VEPA Table\n");
+	if (!esw->fdb_table.legacy.vepa_fdb)
+		return;
+
+	mlx5_destroy_flow_table(esw->fdb_table.legacy.vepa_fdb);
+	esw->fdb_table.legacy.vepa_fdb = NULL;
+}
+
 static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw)
 {
+	esw_debug(esw->dev, "Destroy FDB Table\n");
 	if (!esw->fdb_table.legacy.fdb)
 		return;
 
-	esw_debug(esw->dev, "Destroy FDB Table\n");
-	mlx5_destroy_flow_group(esw->fdb_table.legacy.promisc_grp);
-	mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
-	mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
+	if (esw->fdb_table.legacy.promisc_grp)
+		mlx5_destroy_flow_group(esw->fdb_table.legacy.promisc_grp);
+	if (esw->fdb_table.legacy.allmulti_grp)
+		mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
+	if (esw->fdb_table.legacy.addr_grp)
+		mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
 	mlx5_destroy_flow_table(esw->fdb_table.legacy.fdb);
+
 	esw->fdb_table.legacy.fdb = NULL;
 	esw->fdb_table.legacy.addr_grp = NULL;
 	esw->fdb_table.legacy.allmulti_grp = NULL;
 	esw->fdb_table.legacy.promisc_grp = NULL;
 }
 
+static int esw_create_legacy_table(struct mlx5_eswitch *esw)
+{
+	int err;
+
+	err = esw_create_legacy_vepa_table(esw);
+	if (err)
+		return err;
+
+	err = esw_create_legacy_fdb_table(esw);
+	if (err)
+		esw_destroy_legacy_vepa_table(esw);
+
+	return err;
+}
+
+static void esw_destroy_legacy_table(struct mlx5_eswitch *esw)
+{
+	esw_cleanup_vepa_rules(esw);
+	esw_destroy_legacy_fdb_table(esw);
+	esw_destroy_legacy_vepa_table(esw);
+}
+
 /* E-Switch vport UC/MC lists management */
 typedef int (*vport_addr_action)(struct mlx5_eswitch *esw,
 				 struct vport_addr *vaddr);
@@ -1677,7 +1735,9 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode)
 	mlx5_lag_update(esw->dev);
 
 	if (mode == SRIOV_LEGACY) {
-		err = esw_create_legacy_fdb_table(esw);
+		err = esw_create_legacy_table(esw);
+		if (err)
+			goto abort;
 	} else {
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_ETH);
 		mlx5_reload_interface(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
@@ -1758,7 +1818,7 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 	esw_destroy_tsar(esw);
 
 	if (esw->mode == SRIOV_LEGACY)
-		esw_destroy_legacy_fdb_table(esw);
+		esw_destroy_legacy_table(esw);
 	else if (esw->mode == SRIOV_OFFLOADS)
 		esw_offloads_cleanup(esw);
 
@@ -2041,6 +2101,127 @@ int mlx5_eswitch_set_vport_spoofchk(struct mlx5_eswitch *esw,
 	return err;
 }
 
+static void esw_cleanup_vepa_rules(struct mlx5_eswitch *esw)
+{
+	if (esw->fdb_table.legacy.vepa_uplink_rule)
+		mlx5_del_flow_rules(esw->fdb_table.legacy.vepa_uplink_rule);
+
+	if (esw->fdb_table.legacy.vepa_star_rule)
+		mlx5_del_flow_rules(esw->fdb_table.legacy.vepa_star_rule);
+
+	esw->fdb_table.legacy.vepa_uplink_rule = NULL;
+	esw->fdb_table.legacy.vepa_star_rule = NULL;
+}
+
+static int _mlx5_eswitch_set_vepa_locked(struct mlx5_eswitch *esw,
+					 u8 setting)
+{
+	struct mlx5_flow_destination dest = {};
+	struct mlx5_flow_act flow_act = {};
+	struct mlx5_flow_handle *flow_rule;
+	struct mlx5_flow_spec *spec;
+	int err = 0;
+	void *misc;
+
+	if (!setting) {
+		esw_cleanup_vepa_rules(esw);
+		return 0;
+	}
+
+	if (esw->fdb_table.legacy.vepa_uplink_rule)
+		return 0;
+
+	spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
+	if (!spec)
+		return -ENOMEM;
+
+	/* Uplink rule forward uplink traffic to FDB */
+	misc = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters);
+	MLX5_SET(fte_match_set_misc, misc, source_port, MLX5_VPORT_UPLINK);
+
+	misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters);
+	MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
+
+	spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS;
+	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = esw->fdb_table.legacy.fdb;
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	flow_rule = mlx5_add_flow_rules(esw->fdb_table.legacy.vepa_fdb, spec,
+					&flow_act, &dest, 1);
+	if (IS_ERR(flow_rule)) {
+		err = PTR_ERR(flow_rule);
+		goto out;
+	} else {
+		esw->fdb_table.legacy.vepa_uplink_rule = flow_rule;
+	}
+
+	/* Star rule to forward all traffic to uplink vport */
+	memset(spec, 0, sizeof(*spec));
+	dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
+	dest.vport.num = MLX5_VPORT_UPLINK;
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	flow_rule = mlx5_add_flow_rules(esw->fdb_table.legacy.vepa_fdb, spec,
+					&flow_act, &dest, 1);
+	if (IS_ERR(flow_rule)) {
+		err = PTR_ERR(flow_rule);
+		goto out;
+	} else {
+		esw->fdb_table.legacy.vepa_star_rule = flow_rule;
+	}
+
+out:
+	kvfree(spec);
+	if (err)
+		esw_cleanup_vepa_rules(esw);
+	return err;
+}
+
+int mlx5_eswitch_set_vepa(struct mlx5_eswitch *esw, u8 setting)
+{
+	int err = 0;
+
+	if (!esw)
+		return -EOPNOTSUPP;
+
+	if (!ESW_ALLOWED(esw))
+		return -EPERM;
+
+	mutex_lock(&esw->state_lock);
+	if (esw->mode != SRIOV_LEGACY) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
+	err = _mlx5_eswitch_set_vepa_locked(esw, setting);
+
+out:
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
+
+int mlx5_eswitch_get_vepa(struct mlx5_eswitch *esw, u8 *setting)
+{
+	int err = 0;
+
+	if (!esw)
+		return -EOPNOTSUPP;
+
+	if (!ESW_ALLOWED(esw))
+		return -EPERM;
+
+	mutex_lock(&esw->state_lock);
+	if (esw->mode != SRIOV_LEGACY) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
+	*setting = esw->fdb_table.legacy.vepa_uplink_rule ? 1 : 0;
+
+out:
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
+
 int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw,
 				 int vport, bool setting)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index af5581a57e56..5f82e637410b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -137,6 +137,9 @@ struct mlx5_eswitch_fdb {
 			struct mlx5_flow_group *addr_grp;
 			struct mlx5_flow_group *allmulti_grp;
 			struct mlx5_flow_group *promisc_grp;
+			struct mlx5_flow_table *vepa_fdb;
+			struct mlx5_flow_handle *vepa_uplink_rule;
+			struct mlx5_flow_handle *vepa_star_rule;
 		} legacy;
 
 		struct offloads_fdb {
@@ -242,6 +245,8 @@ int mlx5_eswitch_set_vport_trust(struct mlx5_eswitch *esw,
 				 int vport_num, bool setting);
 int mlx5_eswitch_set_vport_rate(struct mlx5_eswitch *esw, int vport,
 				u32 max_rate, u32 min_rate);
+int mlx5_eswitch_set_vepa(struct mlx5_eswitch *esw, u8 setting);
+int mlx5_eswitch_get_vepa(struct mlx5_eswitch *esw, u8 *setting);
 int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 				  int vport, struct ifla_vf_info *ivi);
 int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
-- 
2.20.1


  parent reply	other threads:[~2019-02-22 21:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-22 21:44 [pull request][net-next 0/9] Mellanox, mlx5 updates 2019-02-21 Saeed Mahameed
2019-02-22 21:44 ` [net-next 1/9] net/mlx5: Use read-modify-write when changing PCMR register values Saeed Mahameed
2019-02-22 21:44 ` [net-next 2/9] net/mlx5: Introduce tunnel entropy control in PCMR register Saeed Mahameed
2019-02-22 21:44 ` [net-next 3/9] net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation Saeed Mahameed
2019-02-22 21:44 ` [net-next 4/9] net/mlx5e: Fix warn print in case of TX reporter creation failure Saeed Mahameed
2019-02-22 21:44 ` [net-next 5/9] net/mlx5e: Re-add support for TX timeout when TX reporter is not valid Saeed Mahameed
2019-02-22 21:44 ` [net-next 6/9] net/mlx5e: Fix return status of TX reporter timeout recover Saeed Mahameed
2019-02-22 21:44 ` [net-next 7/9] net/mlx5e: Fix mlx5e_tx_reporter_create return value Saeed Mahameed
2019-02-22 21:44 ` Saeed Mahameed [this message]
2019-02-22 21:44 ` [net-next 9/9] net/mlx5: Support ndo bridge_setlink and getlink Saeed Mahameed
2019-02-23 21:56 ` [pull request][net-next 0/9] Mellanox, mlx5 updates 2019-02-21 David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190222214426.27833-9-saeedm@mellanox.com \
    --to=saeedm@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=huyn@mellanox.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.