All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] mlx5: Create build configuration options
@ 2017-01-26 23:32 Tom Herbert
  2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Tom Herbert @ 2017-01-26 23:32 UTC (permalink / raw)
  To: saeedm, davem, netdev; +Cc: kernel-team

This patchset creates configuration options for sriov, vxlan, eswitch,
and tc features in the mlx5 driver. The purpose of this is to allow not
building these features. These features are optional advanced features
that are not required for a core Ethernet driver. A user can disable
these features which resuces the amount of code in the driver. Disabling
these features (and DCB) reduces the size of mlx5_core.o by about 16%.
This is also can reduce the complexity of backport and rebases since
user would no longer need to worry about dependencies with the rest of
the kernel that features which might not be of any interest to a user
may bring in.

Tested: Build and ran the driver with all features enabled (the default)
and with none enabled (including DCB). Did not see any issues. I did
not explicity test operation of ayy of features in the list.


Tom Herbert (4):
  mlx5: Make building eswitch configurable
  mlx5: Make building SR-IOV configurable
  mlx5: Make building tc hardware offload configurable
  mlx5: Make building vxlan hardware offload configurable

 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
 drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
 8 files changed, 205 insertions(+), 58 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
@ 2017-01-26 23:32 ` Tom Herbert
  2017-01-27  5:34   ` Or Gerlitz
  2017-01-27 18:19   ` Saeed Mahameed
  2017-01-26 23:32 ` [PATCH net-next 2/4] mlx5: Make building SR-IOV configurable Tom Herbert
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2017-01-26 23:32 UTC (permalink / raw)
  To: saeedm, davem, netdev; +Cc: kernel-team

Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
whether the eswitch code is built. Change Kconfig and Makefile
accordingly.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 +++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++++++---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c      |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    | 16 ++--
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
 7 files changed, 125 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index ddb4ca4..27aae58 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
 	  This flag is depended on the kernel's DCB support.
 
 	  If unsure, set to Y
+
+config MLX5_CORE_EN_ESWITCH
+	bool "Ethernet switch"
+	default y
+	depends on MLX5_CORE_EN
+	---help---
+	  Say Y here if you want to use Ethernet switch (E-switch). E-Switch
+	  is the software entity that represents and manages ConnectX4
+	  inter-HCA ethernet l2 switching.
+
+	  If unsure, set to Y
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9f43beb..17025d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -5,9 +5,11 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o
 
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
 		en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
 		en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
-		en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
+		en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
+
+mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e829143..1db4d98 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -38,7 +38,9 @@
 #include <linux/bpf.h>
 #include "en.h"
 #include "en_tc.h"
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 #include "eswitch.h"
+#endif
 #include "vxlan.h"
 
 struct mlx5e_rq_param {
@@ -585,10 +587,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 
 	switch (priv->params.rq_wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 		if (mlx5e_is_vf_vport_rep(priv)) {
 			err = -EINVAL;
 			goto err_rq_wq_destroy;
 		}
+#endif
 
 		rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
 		rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
@@ -617,10 +621,14 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 			goto err_rq_wq_destroy;
 		}
 
-		if (mlx5e_is_vf_vport_rep(priv))
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+		if (mlx5e_is_vf_vport_rep(priv)) {
 			rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
-		else
+		} else
+#endif
+		{
 			rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
+		}
 
 		rq->alloc_wqe = mlx5e_alloc_rx_wqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
@@ -2198,7 +2206,6 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
 int mlx5e_open_locked(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
-	struct mlx5_core_dev *mdev = priv->mdev;
 	int num_txqs;
 	int err;
 
@@ -2233,11 +2240,13 @@ int mlx5e_open_locked(struct net_device *netdev)
 	if (priv->profile->update_stats)
 		queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
 		err = mlx5e_add_sqs_fwd_rules(priv);
 		if (err)
 			goto err_close_channels;
 	}
+#endif
 	return 0;
 
 err_close_channels:
@@ -2262,7 +2271,6 @@ int mlx5e_open(struct net_device *netdev)
 int mlx5e_close_locked(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
-	struct mlx5_core_dev *mdev = priv->mdev;
 
 	/* May already be CLOSED in case a previous configuration operation
 	 * (e.g RX/TX queue size change) that involves close&open failed.
@@ -2272,8 +2280,10 @@ int mlx5e_close_locked(struct net_device *netdev)
 
 	clear_bit(MLX5E_STATE_OPENED, &priv->state);
 
-	if (MLX5_CAP_GEN(mdev, vport_group_manager))
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
 		mlx5e_remove_sqs_fwd_rules(priv);
+#endif
 
 	mlx5e_timestamp_cleanup(priv);
 	netif_carrier_off(priv->netdev);
@@ -2742,12 +2752,15 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	struct mlx5e_vport_stats *vstats = &priv->stats.vport;
 	struct mlx5e_pport_stats *pstats = &priv->stats.pport;
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	if (mlx5e_is_uplink_rep(priv)) {
 		stats->rx_packets = PPORT_802_3_GET(pstats, a_frames_received_ok);
 		stats->rx_bytes   = PPORT_802_3_GET(pstats, a_octets_received_ok);
 		stats->tx_packets = PPORT_802_3_GET(pstats, a_frames_transmitted_ok);
 		stats->tx_bytes   = PPORT_802_3_GET(pstats, a_octets_transmitted_ok);
-	} else {
+	} else
+#endif
+	{
 		stats->rx_packets = sstats->rx_packets;
 		stats->rx_bytes   = sstats->rx_bytes;
 		stats->tx_packets = sstats->tx_packets;
@@ -2991,6 +3004,7 @@ static int mlx5e_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	}
 }
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 static int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
@@ -3093,6 +3107,7 @@ static int mlx5e_get_vf_stats(struct net_device *dev,
 	return mlx5_eswitch_get_vport_stats(mdev->priv.eswitch, vf + 1,
 					    vf_stats);
 }
+#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
 
 void mlx5e_add_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti)
@@ -3329,6 +3344,7 @@ static const struct net_device_ops mlx5e_netdev_ops_basic = {
 #endif
 };
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 	.ndo_open                = mlx5e_open,
 	.ndo_stop                = mlx5e_close,
@@ -3358,14 +3374,15 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 	.ndo_get_vf_config       = mlx5e_get_vf_config,
 	.ndo_set_vf_link_state   = mlx5e_set_vf_link_state,
 	.ndo_get_vf_stats        = mlx5e_get_vf_stats,
+	.ndo_has_offload_stats	 = mlx5e_has_offload_stats,
+	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 	.ndo_tx_timeout          = mlx5e_tx_timeout,
 	.ndo_xdp		 = mlx5e_xdp,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller     = mlx5e_netpoll,
 #endif
-	.ndo_has_offload_stats	 = mlx5e_has_offload_stats,
-	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 };
+#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
 
 static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
@@ -3586,13 +3603,16 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	SET_NETDEV_DEV(netdev, &mdev->pdev->dev);
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
 		netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 		if (MLX5_CAP_GEN(mdev, qos))
 			netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
 #endif
-	} else {
+	} else
+#endif
+	{
 		netdev->netdev_ops = &mlx5e_netdev_ops_basic;
 	}
 
@@ -3795,14 +3815,16 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 {
 	struct net_device *netdev = priv->netdev;
 	struct mlx5_core_dev *mdev = priv->mdev;
-	struct mlx5_eswitch *esw = mdev->priv.eswitch;
-	struct mlx5_eswitch_rep rep;
 
 	mlx5_lag_add(mdev, netdev);
 
 	mlx5e_enable_async_events(priv);
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+		struct mlx5_eswitch *esw = mdev->priv.eswitch;
+		struct mlx5_eswitch_rep rep;
+
 		mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
 		rep.load = mlx5e_nic_rep_load;
 		rep.unload = mlx5e_nic_rep_unload;
@@ -3810,6 +3832,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 		rep.netdev = netdev;
 		mlx5_eswitch_register_vport_rep(esw, 0, &rep);
 	}
+#endif
 
 	if (netdev->reg_state != NETREG_REGISTERED)
 		return;
@@ -3827,11 +3850,14 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
-	struct mlx5_eswitch *esw = mdev->priv.eswitch;
 
 	queue_work(priv->wq, &priv->set_rx_mode_work);
-	if (MLX5_CAP_GEN(mdev, vport_group_manager))
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
+		struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 		mlx5_eswitch_unregister_vport_rep(esw, 0);
+	}
+#endif
 	mlx5e_disable_async_events(priv);
 	mlx5_lag_remove(mdev);
 }
@@ -3942,6 +3968,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 	return err;
 }
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
 {
 	struct mlx5_eswitch *esw = mdev->priv.eswitch;
@@ -3964,6 +3991,7 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
 		mlx5_eswitch_register_vport_rep(esw, vport, &rep);
 	}
 }
+#endif
 
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 {
@@ -4028,11 +4056,8 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
 
 static void *mlx5e_add(struct mlx5_core_dev *mdev)
 {
-	struct mlx5_eswitch *esw = mdev->priv.eswitch;
-	int total_vfs = MLX5_TOTAL_VPORTS(mdev);
 	void *ppriv = NULL;
 	void *priv;
-	int vport;
 	int err;
 	struct net_device *netdev;
 
@@ -4040,10 +4065,14 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
 	if (err)
 		return NULL;
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5e_register_vport_rep(mdev);
 
-	if (MLX5_CAP_GEN(mdev, vport_group_manager))
+	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+		struct mlx5_eswitch *esw = mdev->priv.eswitch;
 		ppriv = &esw->offloads.vport_reps[0];
+	}
+#endif
 
 	netdev = mlx5e_create_netdev(mdev, &mlx5e_nic_profile, ppriv);
 	if (!netdev) {
@@ -4074,8 +4103,16 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
 	mlx5e_destroy_netdev(mdev, priv);
 
 err_unregister_reps:
-	for (vport = 1; vport < total_vfs; vport++)
-		mlx5_eswitch_unregister_vport_rep(esw, vport);
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	{
+		int total_vfs = MLX5_TOTAL_VPORTS(mdev);
+		struct mlx5_eswitch *esw = mdev->priv.eswitch;
+		int vport;
+
+		for (vport = 1; vport < total_vfs; vport++)
+			mlx5_eswitch_unregister_vport_rep(esw, vport);
+	}
+#endif
 
 	return NULL;
 }
@@ -4093,13 +4130,18 @@ void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 
 static void mlx5e_remove(struct mlx5_core_dev *mdev, void *vpriv)
 {
-	struct mlx5_eswitch *esw = mdev->priv.eswitch;
-	int total_vfs = MLX5_TOTAL_VPORTS(mdev);
 	struct mlx5e_priv *priv = vpriv;
-	int vport;
 
-	for (vport = 1; vport < total_vfs; vport++)
-		mlx5_eswitch_unregister_vport_rep(esw, vport);
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	{
+		struct mlx5_eswitch *esw = mdev->priv.eswitch;
+		int total_vfs = MLX5_TOTAL_VPORTS(mdev);
+		int vport;
+
+		for (vport = 1; vport < total_vfs; vport++)
+			mlx5_eswitch_unregister_vport_rep(esw, vport);
+	}
+#endif
 
 	unregister_netdev(priv->netdev);
 	mlx5e_detach(mdev, vpriv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 640f10f..8e5f565 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -128,6 +128,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
 	return rule;
 }
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 static struct mlx5_flow_handle *
 mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
 		      struct mlx5_flow_spec *spec,
@@ -160,6 +161,7 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
 		kfree(e);
 	}
 }
+#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
 
 /* we get here also when setting rule to the FW failed, etc. It means that the
  * flow rule itself might not exist, but some offloading related to the actions
@@ -168,7 +170,6 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 			      struct mlx5e_tc_flow *flow)
 {
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct mlx5_fc *counter = NULL;
 
 	if (!IS_ERR(flow->rule)) {
@@ -177,11 +178,17 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 		mlx5_fc_destroy(priv->mdev, counter);
 	}
 
-	if (esw && esw->mode == SRIOV_OFFLOADS) {
-		mlx5_eswitch_del_vlan_action(esw, flow->attr);
-		if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
-			mlx5e_detach_encap(priv, flow);
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	{
+		struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+		if (esw && esw->mode == SRIOV_OFFLOADS) {
+			mlx5_eswitch_del_vlan_action(esw, flow->attr);
+			if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
+				mlx5e_detach_encap(priv, flow);
+		}
 	}
+#endif
 
 	if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
 		mlx5_destroy_flow_table(priv->fs.tc.t);
@@ -679,6 +686,7 @@ static inline int hash_encap_info(struct ip_tunnel_key *key)
 	return jhash(key, sizeof(*key), 0);
 }
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 static int mlx5e_route_lookup_ipv4(struct mlx5e_priv *priv,
 				   struct net_device *mirred_dev,
 				   struct net_device **out_dev,
@@ -1129,20 +1137,27 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 	}
 	return 0;
 }
+#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
 
 int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 			   struct tc_cls_flower_offload *f)
 {
 	struct mlx5e_tc_table *tc = &priv->fs.tc;
 	int err = 0;
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	bool fdb_flow = false;
+#endif
 	u32 flow_tag, action;
 	struct mlx5e_tc_flow *flow;
 	struct mlx5_flow_spec *spec;
-	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 
-	if (esw && esw->mode == SRIOV_OFFLOADS)
-		fdb_flow = true;
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
+	{
+		struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+
+		if (esw && esw->mode == SRIOV_OFFLOADS)
+			fdb_flow = true;
+	}
 
 	if (fdb_flow)
 		flow = kzalloc(sizeof(*flow) +
@@ -1150,6 +1165,9 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 			       GFP_KERNEL);
 	else
 		flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+#else
+	flow = kzalloc(sizeof(*flow), GFP_KERNEL);
+#endif
 
 	spec = mlx5_vzalloc(sizeof(*spec));
 	if (!spec || !flow) {
@@ -1163,13 +1181,16 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
 	if (err < 0)
 		goto err_free;
 
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	if (fdb_flow) {
 		flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
 		err = parse_tc_fdb_actions(priv, f->exts, flow);
 		if (err < 0)
 			goto err_free;
 		flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr);
-	} else {
+	} else
+#endif
+	{
 		err = parse_tc_nic_actions(priv, f->exts, &action, &flow_tag);
 		if (err < 0)
 			goto err_free;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index ea5d8d3..2c67c25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -35,7 +35,7 @@
 #include <linux/mlx5/driver.h>
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 #include "eswitch.h"
 #endif
 
@@ -462,7 +462,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
 			}
 			break;
 
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 		case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
 			mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
 			break;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 84f7970..224f499 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -53,7 +53,7 @@
 #include <net/devlink.h>
 #include "mlx5_core.h"
 #include "fs_core.h"
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 #include "eswitch.h"
 #endif
 
@@ -941,7 +941,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 		goto err_tables_cleanup;
 	}
 
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	err = mlx5_eswitch_init(dev);
 	if (err) {
 		dev_err(&pdev->dev, "Failed to init eswitch %d\n", err);
@@ -958,7 +958,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	return 0;
 
 err_eswitch_cleanup:
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 
 err_rl_cleanup:
@@ -981,7 +981,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 {
 	mlx5_sriov_cleanup(dev);
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
 	mlx5_cleanup_rl_table(dev);
@@ -1131,7 +1131,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto err_fs;
 	}
 
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_attach(dev->priv.eswitch);
 #endif
 
@@ -1162,7 +1162,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_sriov_detach(dev);
 
 err_sriov:
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_detach(dev->priv.eswitch);
 #endif
 	mlx5_cleanup_fs(dev);
@@ -1233,7 +1233,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		mlx5_detach_device(dev);
 
 	mlx5_sriov_detach(dev);
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_detach(dev->priv.eswitch);
 #endif
 	mlx5_cleanup_fs(dev);
@@ -1269,7 +1269,7 @@ struct mlx5_core_event_handler {
 };
 
 static const struct devlink_ops mlx5_devlink_ops = {
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	.eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
 	.eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
 	.eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index e086277..0664cc4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -33,7 +33,7 @@
 #include <linux/pci.h>
 #include <linux/mlx5/driver.h>
 #include "mlx5_core.h"
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_ESWITCH
 #include "eswitch.h"
 #endif
 
@@ -57,7 +57,7 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
 		return -EBUSY;
 	}
 
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_ESWITCH
 	err = mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY);
 	if (err) {
 		mlx5_core_warn(dev,
@@ -102,7 +102,7 @@ static void mlx5_device_disable_sriov(struct mlx5_core_dev *dev)
 		sriov->enabled_vfs--;
 	}
 
-#ifdef CONFIG_MLX5_CORE_EN
+#ifdef CONFIG_MLX5_CORE_ESWITCH
 	mlx5_eswitch_disable_sriov(dev->priv.eswitch);
 #endif
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 2/4] mlx5: Make building SR-IOV configurable
  2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
  2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
@ 2017-01-26 23:32 ` Tom Herbert
  2017-01-26 23:32 ` [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable Tom Herbert
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2017-01-26 23:32 UTC (permalink / raw)
  To: saeedm, davem, netdev; +Cc: kernel-team

Add a configuration option (CONFIG_MLX5_CORE_EN_SRIOV) for controlling
whether the eswitch code is built. Change Kconfig and Makefile
accordingly.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig  |  8 ++++++++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile |  4 +++-
 drivers/net/ethernet/mellanox/mlx5/core/lag.c    |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c   | 16 ++++++++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 27aae58..7ade61a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -41,3 +41,11 @@ config MLX5_CORE_EN_ESWITCH
 	  inter-HCA ethernet l2 switching.
 
 	  If unsure, set to Y
+
+config MLX5_CORE_EN_SRIOV
+	bool "SR-IOV"
+	default y
+	---help---
+	  Say Y here if you want to use SR-IOV.
+
+	  If unsure, set to Y
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 17025d8..6d38250 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE)		+= mlx5_core.o
 
 mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
-		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
+		mad.o transobj.o vport.o fs_cmd.o fs_core.o \
 		fs_counters.o rl.o lag.o dev.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
@@ -13,3 +13,5 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
+
+mlx5_core-$(CONFIG_MLX5_CORE_EN_SRIOV) += sriov.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index 5595724..6dc3792 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -223,11 +223,13 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 	mutex_unlock(&lag_mutex);
 
 	if (tracker.is_bonded && !mlx5_lag_is_bonded(ldev)) {
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 		if (mlx5_sriov_is_enabled(dev0) ||
 		    mlx5_sriov_is_enabled(dev1)) {
 			mlx5_core_warn(dev0, "LAG is not supported with SRIOV");
 			return;
 		}
+#endif
 
 		for (i = 0; i < MLX5_MAX_PORTS; i++)
 			mlx5_remove_dev_by_protocol(ldev->pf[i].dev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 224f499..cd6a9c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -949,15 +949,20 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 	}
 #endif
 
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	err = mlx5_sriov_init(dev);
 	if (err) {
 		dev_err(&pdev->dev, "Failed to init sriov %d\n", err);
 		goto err_eswitch_cleanup;
 	}
+#endif
 
 	return 0;
 
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 err_eswitch_cleanup:
+#endif
+
 #ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 
@@ -980,7 +985,9 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 
 static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 {
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	mlx5_sriov_cleanup(dev);
+#endif
 #ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 #endif
@@ -1135,11 +1142,13 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	mlx5_eswitch_attach(dev->priv.eswitch);
 #endif
 
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	err = mlx5_sriov_attach(dev);
 	if (err) {
 		dev_err(&pdev->dev, "sriov init failed %d\n", err);
 		goto err_sriov;
 	}
+#endif
 
 	if (mlx5_device_registered(dev)) {
 		mlx5_attach_device(dev);
@@ -1159,9 +1168,12 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	return 0;
 
 err_reg_dev:
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	mlx5_sriov_detach(dev);
 
 err_sriov:
+#endif
+
 #ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_detach(dev->priv.eswitch);
 #endif
@@ -1232,7 +1244,9 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	if (mlx5_device_registered(dev))
 		mlx5_detach_device(dev);
 
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	mlx5_sriov_detach(dev);
+#endif
 #ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 	mlx5_eswitch_detach(dev->priv.eswitch);
 #endif
@@ -1533,7 +1547,9 @@ static struct pci_driver mlx5_core_driver = {
 	.remove         = remove_one,
 	.shutdown	= shutdown,
 	.err_handler	= &mlx5_err_handler,
+#ifdef CONFIG_MLX5_CORE_EN_SRIOV
 	.sriov_configure   = mlx5_core_sriov_configure,
+#endif
 };
 
 static void mlx5_core_verify_params(void)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable
  2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
  2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
  2017-01-26 23:32 ` [PATCH net-next 2/4] mlx5: Make building SR-IOV configurable Tom Herbert
@ 2017-01-26 23:32 ` Tom Herbert
  2017-01-27  6:29   ` kbuild test robot
  2017-01-27 13:43   ` kbuild test robot
  2017-01-26 23:32 ` [PATCH net-next 4/4] mlx5: Make building vxlan " Tom Herbert
  2017-01-27 17:58 ` [PATCH net-next 0/4] mlx5: Create build configuration options Saeed Mahameed
  4 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2017-01-26 23:32 UTC (permalink / raw)
  To: saeedm, davem, netdev; +Cc: kernel-team

Add a configuration option (CONFIG_MLX5_CORE_EN_TC) for controlling
whether the eswitch code is built. Change Kconfig and Makefile
accordingly.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  8 ++++++++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  4 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 7ade61a..b38c920 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -49,3 +49,11 @@ config MLX5_CORE_EN_SRIOV
 	  Say Y here if you want to use SR-IOV.
 
 	  If unsure, set to Y
+
+config MLX5_CORE_EN_TC
+	bool "TC offload"
+	default y
+	---help---
+	  Say Y here if you want to use TC hardware offload support.
+
+	  If unsure, set to Y
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 6d38250..c308531 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -8,10 +8,12 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
 		en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
 		en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
-		en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
+		en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_SRIOV) += sriov.o
+
+mlx5_core-$(CONFIG_MLX5_CORE_EN_TC) += en_tc.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 1db4d98..2d2c982 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2690,6 +2690,7 @@ int mlx5e_modify_rqs_vsd(struct mlx5e_priv *priv, bool vsd)
 	return 0;
 }
 
+#ifdef CONFIG_MLX5_CORE_EN_TC
 static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -2743,6 +2744,7 @@ static int mlx5e_ndo_setup_tc(struct net_device *dev, u32 handle,
 
 	return mlx5e_setup_tc(dev, tc->tc);
 }
+#endif
 
 static void
 mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
@@ -3323,7 +3325,9 @@ static const struct net_device_ops mlx5e_netdev_ops_basic = {
 	.ndo_open                = mlx5e_open,
 	.ndo_stop                = mlx5e_close,
 	.ndo_start_xmit          = mlx5e_xmit,
+#ifdef CONFIG_MLX5_CORE_EN_TC
 	.ndo_setup_tc            = mlx5e_ndo_setup_tc,
+#endif
 	.ndo_select_queue        = mlx5e_select_queue,
 	.ndo_get_stats64         = mlx5e_get_stats,
 	.ndo_set_rx_mode         = mlx5e_set_rx_mode,
@@ -3349,7 +3353,9 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 	.ndo_open                = mlx5e_open,
 	.ndo_stop                = mlx5e_close,
 	.ndo_start_xmit          = mlx5e_xmit,
+#ifdef CONFIG_MLX5_CORE_EN_TC
 	.ndo_setup_tc            = mlx5e_ndo_setup_tc,
+#endif
 	.ndo_select_queue        = mlx5e_select_queue,
 	.ndo_get_stats64         = mlx5e_get_stats,
 	.ndo_set_rx_mode         = mlx5e_set_rx_mode,
@@ -3762,9 +3768,11 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 		goto err_destroy_direct_tirs;
 	}
 
+#ifdef CONFIG_MLX5_CORE_EN_TC
 	err = mlx5e_tc_init(priv);
 	if (err)
 		goto err_destroy_flow_steering;
+#endif
 
 	return 0;
 
@@ -3786,7 +3794,9 @@ static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
 {
 	int i;
 
+#ifdef CONFIG_MLX5_CORE_EN_TC
 	mlx5e_tc_cleanup(priv);
+#endif
 	mlx5e_destroy_flow_steering(priv);
 	mlx5e_destroy_direct_tirs(priv);
 	mlx5e_destroy_indirect_tirs(priv);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 4/4] mlx5: Make building vxlan hardware offload configurable
  2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
                   ` (2 preceding siblings ...)
  2017-01-26 23:32 ` [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable Tom Herbert
@ 2017-01-26 23:32 ` Tom Herbert
  2017-01-27 17:58 ` [PATCH net-next 0/4] mlx5: Create build configuration options Saeed Mahameed
  4 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2017-01-26 23:32 UTC (permalink / raw)
  To: saeedm, davem, netdev; +Cc: kernel-team

Add a configuration option (CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD)
for controlling whether the support for UDP encapsulation offlaod is
supported. Note that only VXLAN offload is supported currently,
however the config option is named to be generic for UDP offloads.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  8 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  4 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 27 +++++++++++++++++------
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index b38c920..d8ed54a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -57,3 +57,11 @@ config MLX5_CORE_EN_TC
 	  Say Y here if you want to use TC hardware offload support.
 
 	  If unsure, set to Y
+
+config MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
+	bool "UDP encapsulation offload"
+	default y
+	---help---
+	  Say Y here if you want to use UDP encapsulation hardware offload.
+	  Currently, VXLAN offload is uspported.
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index c308531..c08c9c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -7,7 +7,7 @@ mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
 		en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
-		en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
+		en_rx.o en_rx_am.o en_txrx.o en_clock.o \
 		en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
@@ -17,3 +17,5 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.
 mlx5_core-$(CONFIG_MLX5_CORE_EN_SRIOV) += sriov.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_TC) += en_tc.o
+
+mlx5_core-$(CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD) += vxlan.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2d2c982..31a8d88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -34,14 +34,16 @@
 #include <linux/crash_dump.h>
 #include <net/pkt_cls.h>
 #include <linux/mlx5/fs.h>
-#include <net/vxlan.h>
 #include <linux/bpf.h>
 #include "en.h"
 #include "en_tc.h"
 #ifdef CONFIG_MLX5_CORE_EN_ESWITCH
 #include "eswitch.h"
 #endif
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
+#include <net/vxlan.h>
 #include "vxlan.h"
+#endif
 
 struct mlx5e_rq_param {
 	u32			rqc[MLX5_ST_SZ_DW(rqc)];
@@ -3111,6 +3113,7 @@ static int mlx5e_get_vf_stats(struct net_device *dev,
 }
 #endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
 
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 void mlx5e_add_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti)
 {
@@ -3171,20 +3174,22 @@ static netdev_features_t mlx5e_vxlan_features_check(struct mlx5e_priv *priv,
 	/* Disable CSUM and GSO if the udp dport is not offloaded by HW */
 	return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
 }
+#endif
 
 static netdev_features_t mlx5e_features_check(struct sk_buff *skb,
 					      struct net_device *netdev,
 					      netdev_features_t features)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
-
 	features = vlan_features_check(skb, features);
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 	features = vxlan_features_check(skb, features);
 
 	/* Validate if the tunneled packet is being offloaded by HW */
 	if (skb->encapsulation &&
 	    (features & NETIF_F_CSUM_MASK || features & NETIF_F_GSO_MASK))
-		return mlx5e_vxlan_features_check(priv, skb, features);
+		return mlx5e_vxlan_features_check(netdev_priv(netdev),
+						  skb, features);
+#endif
 
 	return features;
 }
@@ -3365,8 +3370,10 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
 	.ndo_set_features        = mlx5e_set_features,
 	.ndo_change_mtu          = mlx5e_change_mtu,
 	.ndo_do_ioctl            = mlx5e_ioctl,
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 	.ndo_udp_tunnel_add	 = mlx5e_add_vxlan_port,
 	.ndo_udp_tunnel_del	 = mlx5e_del_vxlan_port,
+#endif
 	.ndo_set_tx_maxrate      = mlx5e_set_tx_maxrate,
 	.ndo_features_check      = mlx5e_features_check,
 #ifdef CONFIG_RFS_ACCEL
@@ -3643,6 +3650,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 	if (mlx5e_vxlan_allowed(mdev)) {
 		netdev->hw_features     |= NETIF_F_GSO_UDP_TUNNEL |
 					   NETIF_F_GSO_UDP_TUNNEL_CSUM |
@@ -3656,6 +3664,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 					   NETIF_F_GSO_PARTIAL;
 		netdev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM;
 	}
+#endif
 
 	mlx5_query_port_fcs(mdev, &fcs_supported, &fcs_enabled);
 
@@ -3717,16 +3726,18 @@ static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
 			   const struct mlx5e_profile *profile,
 			   void *ppriv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
-
 	mlx5e_build_nic_netdev_priv(mdev, netdev, profile, ppriv);
 	mlx5e_build_nic_netdev(netdev);
-	mlx5e_vxlan_init(priv);
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
+	mlx5e_vxlan_init(netdev_priv(netdev));
+#endif
 }
 
 static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 {
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 	mlx5e_vxlan_cleanup(priv);
+#endif
 
 	if (priv->xdp_prog)
 		bpf_prog_put(priv->xdp_prog);
@@ -3847,12 +3858,14 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	if (netdev->reg_state != NETREG_REGISTERED)
 		return;
 
+#ifdef CONFIG_MLX5_CORE_EN_UDP_ENCAP_OFFLOAD
 	/* Device already registered: sync netdev system state */
 	if (mlx5e_vxlan_allowed(mdev)) {
 		rtnl_lock();
 		udp_tunnel_get_rx_info(netdev);
 		rtnl_unlock();
 	}
+#endif
 
 	queue_work(priv->wq, &priv->set_rx_mode_work);
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
@ 2017-01-27  5:34   ` Or Gerlitz
  2017-01-27 17:38     ` Saeed Mahameed
  2017-01-27 18:19   ` Saeed Mahameed
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2017-01-27  5:34 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
> whether the eswitch code is built. Change Kconfig and Makefile
> accordingly.

Tom, FWIW, please note that the basic e-switch functionality is needed
also when SRIOV isn't of use, this is for a multi host configuration.

Or.

My WW (and same for the rest of the IL team..) has ended so I will be
able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable
  2017-01-26 23:32 ` [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable Tom Herbert
@ 2017-01-27  6:29   ` kbuild test robot
  2017-01-27 13:43   ` kbuild test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kbuild test robot @ 2017-01-27  6:29 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, saeedm, davem, netdev, kernel-team

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

Hi Tom,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Tom-Herbert/mlx5-Create-build-configuration-options/20170127-084348
config: x86_64-randconfig-s3-01271208 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: the linux-review/Tom-Herbert/mlx5-Create-build-configuration-options/20170127-084348 HEAD fe8939265468a7204ffc5b1c6c878b39bae7e7d0 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `mlx5e_configure_flower':
>> (.text+0x22a6a1): undefined reference to `mlx5e_vxlan_lookup_port'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31256 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable
  2017-01-26 23:32 ` [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable Tom Herbert
  2017-01-27  6:29   ` kbuild test robot
@ 2017-01-27 13:43   ` kbuild test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kbuild test robot @ 2017-01-27 13:43 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, saeedm, davem, netdev, kernel-team

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

Hi Tom,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Tom-Herbert/mlx5-Create-build-configuration-options/20170127-084348
config: x86_64-randconfig-s2-01271634 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: the linux-review/Tom-Herbert/mlx5-Create-build-configuration-options/20170127-084348 HEAD fe8939265468a7204ffc5b1c6c878b39bae7e7d0 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> ERROR: "mlx5e_vxlan_lookup_port" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27633 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27  5:34   ` Or Gerlitz
@ 2017-01-27 17:38     ` Saeed Mahameed
  2017-01-27 17:50       ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 17:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Tom Herbert, Saeed Mahameed, David Miller, Linux Netdev List,
	Kernel Team

On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>> whether the eswitch code is built. Change Kconfig and Makefile
>> accordingly.
>
> Tom, FWIW, please note that the basic e-switch functionality is needed
> also when SRIOV isn't of use, this is for a multi host configuration.
>

Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
MAC address wanted by VF or PF.
To keep one flow in the code, the implementation is done as part of eswitch.

so in multi-host configuration (where there are 4 PFs) each PF should
invoke set_l2_table_entry_cmd  for each one of its own UC MACs.

populating the l2 table is done using the whole eswitch event driven
mechanisms, it is not easy and IMH not right to separate eswitch
tables from l2 table (same management logic, different tables).

Anyways as Or stated this is just an FYI, eswitch needs to be enabled
on Multi-host configuration.

> Or.
>
> My WW (and same for the rest of the IL team..) has ended so I will be
> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 17:38     ` Saeed Mahameed
@ 2017-01-27 17:50       ` Tom Herbert
  2017-01-27 18:05         ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-27 17:50 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>> whether the eswitch code is built. Change Kconfig and Makefile
>>> accordingly.
>>
>> Tom, FWIW, please note that the basic e-switch functionality is needed
>> also when SRIOV isn't of use, this is for a multi host configuration.
>>
>
> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
> MAC address wanted by VF or PF.
> To keep one flow in the code, the implementation is done as part of eswitch.
>
> so in multi-host configuration (where there are 4 PFs) each PF should
> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>
> populating the l2 table is done using the whole eswitch event driven
> mechanisms, it is not easy and IMH not right to separate eswitch
> tables from l2 table (same management logic, different tables).
>
> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
> on Multi-host configuration.
>
What indicate a multi-host configuration?

>> Or.
>>
>> My WW (and same for the rest of the IL team..) has ended so I will be
>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
                   ` (3 preceding siblings ...)
  2017-01-26 23:32 ` [PATCH net-next 4/4] mlx5: Make building vxlan " Tom Herbert
@ 2017-01-27 17:58 ` Saeed Mahameed
  2017-01-27 18:13   ` Tom Herbert
  4 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 17:58 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
> This patchset creates configuration options for sriov, vxlan, eswitch,
> and tc features in the mlx5 driver. The purpose of this is to allow not
> building these features. These features are optional advanced features
> that are not required for a core Ethernet driver. A user can disable
> these features which resuces the amount of code in the driver. Disabling
> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
> This is also can reduce the complexity of backport and rebases since
> user would no longer need to worry about dependencies with the rest of
> the kernel that features which might not be of any interest to a user
> may bring in.
>
> Tested: Build and ran the driver with all features enabled (the default)
> and with none enabled (including DCB). Did not see any issues. I did
> not explicity test operation of ayy of features in the list.
>

Basically I am not against this kind of change, infact i am with it,
although I would have done some restructuring in the driver before i
did such change ;), filling the code with ifdefs is not a neat thing.

I agree this will simplify backporting and provide some kind of
feature separation inside the driver.
But this will also increase the testing matrix we need to cover and
increase the likelihood of kbuild breaks by an order of magnitude.

One more thing, do we really need a device specific flag per feature
per vendor per device?  can't we just use the same kconfig flag for
all drivers and if there is a more generic system wide flag that
covers the same feature
can't we just use it, for instance instead of
CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
for all drivers ?

Saeed.

>
>
> Tom Herbert (4):
>   mlx5: Make building eswitch configurable
>   mlx5: Make building SR-IOV configurable
>   mlx5: Make building tc hardware offload configurable
>   mlx5: Make building vxlan hardware offload configurable
>
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>  8 files changed, 205 insertions(+), 58 deletions(-)
>
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 17:50       ` Tom Herbert
@ 2017-01-27 18:05         ` Saeed Mahameed
  2017-01-27 18:16           ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 18:05 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>>> whether the eswitch code is built. Change Kconfig and Makefile
>>>> accordingly.
>>>
>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>
>>
>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>> MAC address wanted by VF or PF.
>> To keep one flow in the code, the implementation is done as part of eswitch.
>>
>> so in multi-host configuration (where there are 4 PFs) each PF should
>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>
>> populating the l2 table is done using the whole eswitch event driven
>> mechanisms, it is not easy and IMH not right to separate eswitch
>> tables from l2 table (same management logic, different tables).
>>
>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>> on Multi-host configuration.
>>
> What indicate a multi-host configuration?

nothing in the driver, it is transparent.

>
>>> Or.
>>>
>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-27 17:58 ` [PATCH net-next 0/4] mlx5: Create build configuration options Saeed Mahameed
@ 2017-01-27 18:13   ` Tom Herbert
  2017-01-28 11:38     ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-27 18:13 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>> This patchset creates configuration options for sriov, vxlan, eswitch,
>> and tc features in the mlx5 driver. The purpose of this is to allow not
>> building these features. These features are optional advanced features
>> that are not required for a core Ethernet driver. A user can disable
>> these features which resuces the amount of code in the driver. Disabling
>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>> This is also can reduce the complexity of backport and rebases since
>> user would no longer need to worry about dependencies with the rest of
>> the kernel that features which might not be of any interest to a user
>> may bring in.
>>
>> Tested: Build and ran the driver with all features enabled (the default)
>> and with none enabled (including DCB). Did not see any issues. I did
>> not explicity test operation of ayy of features in the list.
>>
>
> Basically I am not against this kind of change, infact i am with it,
> although I would have done some restructuring in the driver before i
> did such change ;), filling the code with ifdefs is not a neat thing.
>
If you wish, please take this as an RFC and feel free to structure the
code the right way. I think the intent is clear enough and looks like
davem isn't going to allow the directory restructuring so something
like this seems to be the best course of action now.

> I agree this will simplify backporting and provide some kind of
> feature separation inside the driver.
> But this will also increase the testing matrix we need to cover and
> increase the likelihood of kbuild breaks by an order of magnitude.
>
The testing matrix already exploded with the proliferation of
supported features. If anything this reduces the test matrix problem.
For instance, if we make a change to the core driver and functionality
properly isolated there is a much better chance that this won't affect
peripheral functionality and vice versa. It is just not feasible for
us to test every combination of NIC features for every change being
made.

> One more thing, do we really need a device specific flag per feature
> per vendor per device?  can't we just use the same kconfig flag for
> all drivers and if there is a more generic system wide flag that
> covers the same feature
> can't we just use it, for instance instead of
> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
> for all drivers ?
>
That sounds good to me. We already have CONFIG_RFS_ACCEL and others
that do that.

Tom

> Saeed.
>
>>
>>
>> Tom Herbert (4):
>>   mlx5: Make building eswitch configurable
>>   mlx5: Make building SR-IOV configurable
>>   mlx5: Make building tc hardware offload configurable
>>   mlx5: Make building vxlan hardware offload configurable
>>
>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>
>> --
>> 2.9.3
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:05         ` Saeed Mahameed
@ 2017-01-27 18:16           ` Tom Herbert
  2017-01-27 18:28             ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-27 18:16 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>>>> whether the eswitch code is built. Change Kconfig and Makefile
>>>>> accordingly.
>>>>
>>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>>
>>>
>>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>>> MAC address wanted by VF or PF.
>>> To keep one flow in the code, the implementation is done as part of eswitch.
>>>
>>> so in multi-host configuration (where there are 4 PFs) each PF should
>>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>>
>>> populating the l2 table is done using the whole eswitch event driven
>>> mechanisms, it is not easy and IMH not right to separate eswitch
>>> tables from l2 table (same management logic, different tables).
>>>
>>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>>> on Multi-host configuration.
>>>
>> What indicate a multi-host configuration?
>
> nothing in the driver, it is transparent.
>
So then we always need the eswitch code to be built even if someone
never uses any of it?

>>
>>>> Or.
>>>>
>>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
  2017-01-27  5:34   ` Or Gerlitz
@ 2017-01-27 18:19   ` Saeed Mahameed
  2017-01-27 18:33     ` Tom Herbert
  1 sibling, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 18:19 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
> whether the eswitch code is built. Change Kconfig and Makefile
> accordingly.
>
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 +++++++++++++++++------
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++++++---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |  4 +-
>  drivers/net/ethernet/mellanox/mlx5/core/main.c    | 16 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>  7 files changed, 125 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index ddb4ca4..27aae58 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>           This flag is depended on the kernel's DCB support.
>
>           If unsure, set to Y
> +
> +config MLX5_CORE_EN_ESWITCH
> +       bool "Ethernet switch"
> +       default y
> +       depends on MLX5_CORE_EN
> +       ---help---
> +         Say Y here if you want to use Ethernet switch (E-switch). E-Switch
> +         is the software entity that represents and manages ConnectX4
> +         inter-HCA ethernet l2 switching.
> +
> +         If unsure, set to Y
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 9f43beb..17025d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
>                 mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
>                 fs_counters.o rl.o lag.o dev.o
>
> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
>                 en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
>                 en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
> -               en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
> +               en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>
>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
> +
> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index e829143..1db4d98 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -38,7 +38,9 @@
>  #include <linux/bpf.h>
>  #include "en.h"
>  #include "en_tc.h"
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  #include "eswitch.h"
> +#endif

Wouldn't it be cleaner if we left the main code (en_main.c) untouched
and kept this
#include "eswitch.h" and instead of filling the main flow code with
#ifdefs, in eswitch.h
we can create eswitch mock API functions when
CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
clean from ifdefs and will complie with mock functions.

we did this with accelerated RFS feature. look for "#ifndef
CONFIG_RFS_ACCEL" in en.h

>  #include "vxlan.h"
>
>  struct mlx5e_rq_param {
> @@ -585,10 +587,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>
>         switch (priv->params.rq_wq_type) {
>         case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>                 if (mlx5e_is_vf_vport_rep(priv)) {
>                         err = -EINVAL;
>                         goto err_rq_wq_destroy;
>                 }
> +#endif
>
>                 rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
>                 rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
> @@ -617,10 +621,14 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>                         goto err_rq_wq_destroy;
>                 }
>
> -               if (mlx5e_is_vf_vport_rep(priv))
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +               if (mlx5e_is_vf_vport_rep(priv)) {
>                         rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
> -               else
> +               } else
> +#endif
> +               {
>                         rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
> +               }
>
>                 rq->alloc_wqe = mlx5e_alloc_rx_wqe;
>                 rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
> @@ -2198,7 +2206,6 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
>  int mlx5e_open_locked(struct net_device *netdev)
>  {
>         struct mlx5e_priv *priv = netdev_priv(netdev);
> -       struct mlx5_core_dev *mdev = priv->mdev;
>         int num_txqs;
>         int err;
>
> @@ -2233,11 +2240,13 @@ int mlx5e_open_locked(struct net_device *netdev)
>         if (priv->profile->update_stats)
>                 queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
>
> -       if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
>                 err = mlx5e_add_sqs_fwd_rules(priv);
>                 if (err)
>                         goto err_close_channels;
>         }
> +#endif
>         return 0;
>
>  err_close_channels:
> @@ -2262,7 +2271,6 @@ int mlx5e_open(struct net_device *netdev)
>  int mlx5e_close_locked(struct net_device *netdev)
>  {
>         struct mlx5e_priv *priv = netdev_priv(netdev);
> -       struct mlx5_core_dev *mdev = priv->mdev;
>
>         /* May already be CLOSED in case a previous configuration operation
>          * (e.g RX/TX queue size change) that involves close&open failed.
> @@ -2272,8 +2280,10 @@ int mlx5e_close_locked(struct net_device *netdev)
>
>         clear_bit(MLX5E_STATE_OPENED, &priv->state);
>
> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
>                 mlx5e_remove_sqs_fwd_rules(priv);
> +#endif
>
>         mlx5e_timestamp_cleanup(priv);
>         netif_carrier_off(priv->netdev);
> @@ -2742,12 +2752,15 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
>         struct mlx5e_vport_stats *vstats = &priv->stats.vport;
>         struct mlx5e_pport_stats *pstats = &priv->stats.pport;
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         if (mlx5e_is_uplink_rep(priv)) {
>                 stats->rx_packets = PPORT_802_3_GET(pstats, a_frames_received_ok);
>                 stats->rx_bytes   = PPORT_802_3_GET(pstats, a_octets_received_ok);
>                 stats->tx_packets = PPORT_802_3_GET(pstats, a_frames_transmitted_ok);
>                 stats->tx_bytes   = PPORT_802_3_GET(pstats, a_octets_transmitted_ok);
> -       } else {
> +       } else
> +#endif
> +       {
>                 stats->rx_packets = sstats->rx_packets;
>                 stats->rx_bytes   = sstats->rx_bytes;
>                 stats->tx_packets = sstats->tx_packets;
> @@ -2991,6 +3004,7 @@ static int mlx5e_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>         }
>  }
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  static int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
>  {
>         struct mlx5e_priv *priv = netdev_priv(dev);
> @@ -3093,6 +3107,7 @@ static int mlx5e_get_vf_stats(struct net_device *dev,
>         return mlx5_eswitch_get_vport_stats(mdev->priv.eswitch, vf + 1,
>                                             vf_stats);
>  }
> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>
>  void mlx5e_add_vxlan_port(struct net_device *netdev,
>                           struct udp_tunnel_info *ti)
> @@ -3329,6 +3344,7 @@ static const struct net_device_ops mlx5e_netdev_ops_basic = {
>  #endif
>  };
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  static const struct net_device_ops mlx5e_netdev_ops_sriov = {
>         .ndo_open                = mlx5e_open,
>         .ndo_stop                = mlx5e_close,
> @@ -3358,14 +3374,15 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
>         .ndo_get_vf_config       = mlx5e_get_vf_config,
>         .ndo_set_vf_link_state   = mlx5e_set_vf_link_state,
>         .ndo_get_vf_stats        = mlx5e_get_vf_stats,
> +       .ndo_has_offload_stats   = mlx5e_has_offload_stats,
> +       .ndo_get_offload_stats   = mlx5e_get_offload_stats,
>         .ndo_tx_timeout          = mlx5e_tx_timeout,
>         .ndo_xdp                 = mlx5e_xdp,
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>         .ndo_poll_controller     = mlx5e_netpoll,
>  #endif
> -       .ndo_has_offload_stats   = mlx5e_has_offload_stats,
> -       .ndo_get_offload_stats   = mlx5e_get_offload_stats,
>  };
> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>
>  static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
>  {
> @@ -3586,13 +3603,16 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
>         SET_NETDEV_DEV(netdev, &mdev->pdev->dev);
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
>                 netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
>  #ifdef CONFIG_MLX5_CORE_EN_DCB
>                 if (MLX5_CAP_GEN(mdev, qos))
>                         netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
>  #endif
> -       } else {
> +       } else
> +#endif
> +       {
>                 netdev->netdev_ops = &mlx5e_netdev_ops_basic;
>         }
>
> @@ -3795,14 +3815,16 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>  {
>         struct net_device *netdev = priv->netdev;
>         struct mlx5_core_dev *mdev = priv->mdev;
> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
> -       struct mlx5_eswitch_rep rep;
>
>         mlx5_lag_add(mdev, netdev);
>
>         mlx5e_enable_async_events(priv);
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
> +               struct mlx5_eswitch_rep rep;
> +
>                 mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
>                 rep.load = mlx5e_nic_rep_load;
>                 rep.unload = mlx5e_nic_rep_unload;
> @@ -3810,6 +3832,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>                 rep.netdev = netdev;
>                 mlx5_eswitch_register_vport_rep(esw, 0, &rep);
>         }
> +#endif
>
>         if (netdev->reg_state != NETREG_REGISTERED)
>                 return;
> @@ -3827,11 +3850,14 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>  static void mlx5e_nic_disable(struct mlx5e_priv *priv)
>  {
>         struct mlx5_core_dev *mdev = priv->mdev;
> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
>
>         queue_work(priv->wq, &priv->set_rx_mode_work);
> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>                 mlx5_eswitch_unregister_vport_rep(esw, 0);
> +       }
> +#endif
>         mlx5e_disable_async_events(priv);
>         mlx5_lag_remove(mdev);
>  }
> @@ -3942,6 +3968,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
>         return err;
>  }
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
>  {
>         struct mlx5_eswitch *esw = mdev->priv.eswitch;
> @@ -3964,6 +3991,7 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
>                 mlx5_eswitch_register_vport_rep(esw, vport, &rep);
>         }
>  }
> +#endif
>
>  void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
>  {
> @@ -4028,11 +4056,8 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
>
>  static void *mlx5e_add(struct mlx5_core_dev *mdev)
>  {
> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
> -       int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>         void *ppriv = NULL;
>         void *priv;
> -       int vport;
>         int err;
>         struct net_device *netdev;
>
> @@ -4040,10 +4065,14 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
>         if (err)
>                 return NULL;
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5e_register_vport_rep(mdev);
>
> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
> +       if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
>                 ppriv = &esw->offloads.vport_reps[0];
> +       }
> +#endif
>
>         netdev = mlx5e_create_netdev(mdev, &mlx5e_nic_profile, ppriv);
>         if (!netdev) {
> @@ -4074,8 +4103,16 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
>         mlx5e_destroy_netdev(mdev, priv);
>
>  err_unregister_reps:
> -       for (vport = 1; vport < total_vfs; vport++)
> -               mlx5_eswitch_unregister_vport_rep(esw, vport);
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       {
> +               int total_vfs = MLX5_TOTAL_VPORTS(mdev);
> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
> +               int vport;
> +
> +               for (vport = 1; vport < total_vfs; vport++)
> +                       mlx5_eswitch_unregister_vport_rep(esw, vport);
> +       }
> +#endif
>
>         return NULL;
>  }
> @@ -4093,13 +4130,18 @@ void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
>
>  static void mlx5e_remove(struct mlx5_core_dev *mdev, void *vpriv)
>  {
> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
> -       int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>         struct mlx5e_priv *priv = vpriv;
> -       int vport;
>
> -       for (vport = 1; vport < total_vfs; vport++)
> -               mlx5_eswitch_unregister_vport_rep(esw, vport);
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       {
> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
> +               int total_vfs = MLX5_TOTAL_VPORTS(mdev);
> +               int vport;
> +
> +               for (vport = 1; vport < total_vfs; vport++)
> +                       mlx5_eswitch_unregister_vport_rep(esw, vport);
> +       }
> +#endif
>
>         unregister_netdev(priv->netdev);
>         mlx5e_detach(mdev, vpriv);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 640f10f..8e5f565 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -128,6 +128,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
>         return rule;
>  }
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  static struct mlx5_flow_handle *
>  mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
>                       struct mlx5_flow_spec *spec,
> @@ -160,6 +161,7 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
>                 kfree(e);
>         }
>  }
> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>
>  /* we get here also when setting rule to the FW failed, etc. It means that the
>   * flow rule itself might not exist, but some offloading related to the actions
> @@ -168,7 +170,6 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
>  static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
>                               struct mlx5e_tc_flow *flow)
>  {
> -       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>         struct mlx5_fc *counter = NULL;
>
>         if (!IS_ERR(flow->rule)) {
> @@ -177,11 +178,17 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
>                 mlx5_fc_destroy(priv->mdev, counter);
>         }
>
> -       if (esw && esw->mode == SRIOV_OFFLOADS) {
> -               mlx5_eswitch_del_vlan_action(esw, flow->attr);
> -               if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
> -                       mlx5e_detach_encap(priv, flow);
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       {
> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
> +
> +               if (esw && esw->mode == SRIOV_OFFLOADS) {
> +                       mlx5_eswitch_del_vlan_action(esw, flow->attr);
> +                       if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
> +                               mlx5e_detach_encap(priv, flow);
> +               }
>         }
> +#endif
>
>         if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
>                 mlx5_destroy_flow_table(priv->fs.tc.t);
> @@ -679,6 +686,7 @@ static inline int hash_encap_info(struct ip_tunnel_key *key)
>         return jhash(key, sizeof(*key), 0);
>  }
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  static int mlx5e_route_lookup_ipv4(struct mlx5e_priv *priv,
>                                    struct net_device *mirred_dev,
>                                    struct net_device **out_dev,
> @@ -1129,20 +1137,27 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
>         }
>         return 0;
>  }
> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>
>  int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>                            struct tc_cls_flower_offload *f)
>  {
>         struct mlx5e_tc_table *tc = &priv->fs.tc;
>         int err = 0;
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         bool fdb_flow = false;
> +#endif
>         u32 flow_tag, action;
>         struct mlx5e_tc_flow *flow;
>         struct mlx5_flow_spec *spec;
> -       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>
> -       if (esw && esw->mode == SRIOV_OFFLOADS)
> -               fdb_flow = true;
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
> +       {
> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
> +
> +               if (esw && esw->mode == SRIOV_OFFLOADS)
> +                       fdb_flow = true;
> +       }
>
>         if (fdb_flow)
>                 flow = kzalloc(sizeof(*flow) +
> @@ -1150,6 +1165,9 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>                                GFP_KERNEL);
>         else
>                 flow = kzalloc(sizeof(*flow), GFP_KERNEL);
> +#else
> +       flow = kzalloc(sizeof(*flow), GFP_KERNEL);
> +#endif
>
>         spec = mlx5_vzalloc(sizeof(*spec));
>         if (!spec || !flow) {
> @@ -1163,13 +1181,16 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>         if (err < 0)
>                 goto err_free;
>
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         if (fdb_flow) {
>                 flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
>                 err = parse_tc_fdb_actions(priv, f->exts, flow);
>                 if (err < 0)
>                         goto err_free;
>                 flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr);
> -       } else {
> +       } else
> +#endif
> +       {
>                 err = parse_tc_nic_actions(priv, f->exts, &action, &flow_tag);
>                 if (err < 0)
>                         goto err_free;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index ea5d8d3..2c67c25 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -35,7 +35,7 @@
>  #include <linux/mlx5/driver.h>
>  #include <linux/mlx5/cmd.h>
>  #include "mlx5_core.h"
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  #include "eswitch.h"
>  #endif
>
> @@ -462,7 +462,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
>                         }
>                         break;
>
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>                 case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
>                         mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
>                         break;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 84f7970..224f499 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -53,7 +53,7 @@
>  #include <net/devlink.h>
>  #include "mlx5_core.h"
>  #include "fs_core.h"
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>  #include "eswitch.h"
>  #endif
>
> @@ -941,7 +941,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>                 goto err_tables_cleanup;
>         }
>
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         err = mlx5_eswitch_init(dev);
>         if (err) {
>                 dev_err(&pdev->dev, "Failed to init eswitch %d\n", err);
> @@ -958,7 +958,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>         return 0;
>
>  err_eswitch_cleanup:
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5_eswitch_cleanup(dev->priv.eswitch);
>
>  err_rl_cleanup:
> @@ -981,7 +981,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>  static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
>  {
>         mlx5_sriov_cleanup(dev);
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5_eswitch_cleanup(dev->priv.eswitch);
>  #endif
>         mlx5_cleanup_rl_table(dev);
> @@ -1131,7 +1131,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>                 goto err_fs;
>         }
>
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5_eswitch_attach(dev->priv.eswitch);
>  #endif
>
> @@ -1162,7 +1162,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>         mlx5_sriov_detach(dev);
>
>  err_sriov:
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5_eswitch_detach(dev->priv.eswitch);
>  #endif
>         mlx5_cleanup_fs(dev);
> @@ -1233,7 +1233,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>                 mlx5_detach_device(dev);
>
>         mlx5_sriov_detach(dev);
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         mlx5_eswitch_detach(dev->priv.eswitch);
>  #endif
>         mlx5_cleanup_fs(dev);
> @@ -1269,7 +1269,7 @@ struct mlx5_core_event_handler {
>  };
>
>  static const struct devlink_ops mlx5_devlink_ops = {
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>         .eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
>         .eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
>         .eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> index e086277..0664cc4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
> @@ -33,7 +33,7 @@
>  #include <linux/pci.h>
>  #include <linux/mlx5/driver.h>
>  #include "mlx5_core.h"
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>  #include "eswitch.h"
>  #endif
>
> @@ -57,7 +57,7 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
>                 return -EBUSY;
>         }
>
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>         err = mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY);
>         if (err) {
>                 mlx5_core_warn(dev,
> @@ -102,7 +102,7 @@ static void mlx5_device_disable_sriov(struct mlx5_core_dev *dev)
>                 sriov->enabled_vfs--;
>         }
>
> -#ifdef CONFIG_MLX5_CORE_EN
> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>         mlx5_eswitch_disable_sriov(dev->priv.eswitch);
>  #endif
>
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:16           ` Tom Herbert
@ 2017-01-27 18:28             ` Saeed Mahameed
  2017-01-27 18:42               ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 18:28 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>>> <saeedm@dev.mellanox.co.il> wrote:
>>>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>>>>> whether the eswitch code is built. Change Kconfig and Makefile
>>>>>> accordingly.
>>>>>
>>>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>>>
>>>>
>>>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>>>> MAC address wanted by VF or PF.
>>>> To keep one flow in the code, the implementation is done as part of eswitch.
>>>>
>>>> so in multi-host configuration (where there are 4 PFs) each PF should
>>>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>>>
>>>> populating the l2 table is done using the whole eswitch event driven
>>>> mechanisms, it is not easy and IMH not right to separate eswitch
>>>> tables from l2 table (same management logic, different tables).
>>>>
>>>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>>>> on Multi-host configuration.
>>>>
>>> What indicate a multi-host configuration?
>>
>> nothing in the driver, it is transparent.
>>
> So then we always need the eswitch code to be built even if someone
> never uses any of it?
>

yes.
but for your convenience all you need is to compile eswitch.c.
esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.

>>>
>>>>> Or.
>>>>>
>>>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>>>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:19   ` Saeed Mahameed
@ 2017-01-27 18:33     ` Tom Herbert
  2017-01-27 20:59       ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-27 18:33 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 10:19 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>> whether the eswitch code is built. Change Kconfig and Makefile
>> accordingly.
>>
>> Signed-off-by: Tom Herbert <tom@herbertland.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 +++++++++++++++++------
>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++++++---
>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |  4 +-
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    | 16 ++--
>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>>  7 files changed, 125 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> index ddb4ca4..27aae58 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>>           This flag is depended on the kernel's DCB support.
>>
>>           If unsure, set to Y
>> +
>> +config MLX5_CORE_EN_ESWITCH
>> +       bool "Ethernet switch"
>> +       default y
>> +       depends on MLX5_CORE_EN
>> +       ---help---
>> +         Say Y here if you want to use Ethernet switch (E-switch). E-Switch
>> +         is the software entity that represents and manages ConnectX4
>> +         inter-HCA ethernet l2 switching.
>> +
>> +         If unsure, set to Y
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> index 9f43beb..17025d8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
>>                 mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
>>                 fs_counters.o rl.o lag.o dev.o
>>
>> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
>> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
>>                 en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
>>                 en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
>> -               en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
>> +               en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>>
>>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
>> +
>> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> index e829143..1db4d98 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>> @@ -38,7 +38,9 @@
>>  #include <linux/bpf.h>
>>  #include "en.h"
>>  #include "en_tc.h"
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  #include "eswitch.h"
>> +#endif
>
> Wouldn't it be cleaner if we left the main code (en_main.c) untouched
> and kept this
> #include "eswitch.h" and instead of filling the main flow code with
> #ifdefs, in eswitch.h
> we can create eswitch mock API functions when
> CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
> clean from ifdefs and will complie with mock functions.
>
> we did this with accelerated RFS feature. look for "#ifndef
> CONFIG_RFS_ACCEL" in en.h
>
There are still occurrences of CONFIG_RFS_ACCEL in en_main.c and
main.c. For eswitch its a header problem, not everything related to
eswitch was extracted out of main though backend functions. There is a
lot of code that related to eswitch that is intertwined with the core
code.

>>  #include "vxlan.h"
>>
>>  struct mlx5e_rq_param {
>> @@ -585,10 +587,12 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>>
>>         switch (priv->params.rq_wq_type) {
>>         case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>                 if (mlx5e_is_vf_vport_rep(priv)) {
>>                         err = -EINVAL;
>>                         goto err_rq_wq_destroy;
>>                 }
>> +#endif
>>
>>                 rq->handle_rx_cqe = mlx5e_handle_rx_cqe_mpwrq;
>>                 rq->alloc_wqe = mlx5e_alloc_rx_mpwqe;
>> @@ -617,10 +621,14 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
>>                         goto err_rq_wq_destroy;
>>                 }
>>
>> -               if (mlx5e_is_vf_vport_rep(priv))
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +               if (mlx5e_is_vf_vport_rep(priv)) {
>>                         rq->handle_rx_cqe = mlx5e_handle_rx_cqe_rep;
>> -               else
>> +               } else
>> +#endif
>> +               {
>>                         rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
>> +               }
>>
>>                 rq->alloc_wqe = mlx5e_alloc_rx_wqe;
>>                 rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
>> @@ -2198,7 +2206,6 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
>>  int mlx5e_open_locked(struct net_device *netdev)
>>  {
>>         struct mlx5e_priv *priv = netdev_priv(netdev);
>> -       struct mlx5_core_dev *mdev = priv->mdev;
>>         int num_txqs;
>>         int err;
>>
>> @@ -2233,11 +2240,13 @@ int mlx5e_open_locked(struct net_device *netdev)
>>         if (priv->profile->update_stats)
>>                 queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
>>
>> -       if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
>>                 err = mlx5e_add_sqs_fwd_rules(priv);
>>                 if (err)
>>                         goto err_close_channels;
>>         }
>> +#endif
>>         return 0;
>>
>>  err_close_channels:
>> @@ -2262,7 +2271,6 @@ int mlx5e_open(struct net_device *netdev)
>>  int mlx5e_close_locked(struct net_device *netdev)
>>  {
>>         struct mlx5e_priv *priv = netdev_priv(netdev);
>> -       struct mlx5_core_dev *mdev = priv->mdev;
>>
>>         /* May already be CLOSED in case a previous configuration operation
>>          * (e.g RX/TX queue size change) that involves close&open failed.
>> @@ -2272,8 +2280,10 @@ int mlx5e_close_locked(struct net_device *netdev)
>>
>>         clear_bit(MLX5E_STATE_OPENED, &priv->state);
>>
>> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager))
>>                 mlx5e_remove_sqs_fwd_rules(priv);
>> +#endif
>>
>>         mlx5e_timestamp_cleanup(priv);
>>         netif_carrier_off(priv->netdev);
>> @@ -2742,12 +2752,15 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
>>         struct mlx5e_vport_stats *vstats = &priv->stats.vport;
>>         struct mlx5e_pport_stats *pstats = &priv->stats.pport;
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         if (mlx5e_is_uplink_rep(priv)) {
>>                 stats->rx_packets = PPORT_802_3_GET(pstats, a_frames_received_ok);
>>                 stats->rx_bytes   = PPORT_802_3_GET(pstats, a_octets_received_ok);
>>                 stats->tx_packets = PPORT_802_3_GET(pstats, a_frames_transmitted_ok);
>>                 stats->tx_bytes   = PPORT_802_3_GET(pstats, a_octets_transmitted_ok);
>> -       } else {
>> +       } else
>> +#endif
>> +       {
>>                 stats->rx_packets = sstats->rx_packets;
>>                 stats->rx_bytes   = sstats->rx_bytes;
>>                 stats->tx_packets = sstats->tx_packets;
>> @@ -2991,6 +3004,7 @@ static int mlx5e_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>>         }
>>  }
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  static int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
>>  {
>>         struct mlx5e_priv *priv = netdev_priv(dev);
>> @@ -3093,6 +3107,7 @@ static int mlx5e_get_vf_stats(struct net_device *dev,
>>         return mlx5_eswitch_get_vport_stats(mdev->priv.eswitch, vf + 1,
>>                                             vf_stats);
>>  }
>> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>>
>>  void mlx5e_add_vxlan_port(struct net_device *netdev,
>>                           struct udp_tunnel_info *ti)
>> @@ -3329,6 +3344,7 @@ static const struct net_device_ops mlx5e_netdev_ops_basic = {
>>  #endif
>>  };
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  static const struct net_device_ops mlx5e_netdev_ops_sriov = {
>>         .ndo_open                = mlx5e_open,
>>         .ndo_stop                = mlx5e_close,
>> @@ -3358,14 +3374,15 @@ static const struct net_device_ops mlx5e_netdev_ops_sriov = {
>>         .ndo_get_vf_config       = mlx5e_get_vf_config,
>>         .ndo_set_vf_link_state   = mlx5e_set_vf_link_state,
>>         .ndo_get_vf_stats        = mlx5e_get_vf_stats,
>> +       .ndo_has_offload_stats   = mlx5e_has_offload_stats,
>> +       .ndo_get_offload_stats   = mlx5e_get_offload_stats,
>>         .ndo_tx_timeout          = mlx5e_tx_timeout,
>>         .ndo_xdp                 = mlx5e_xdp,
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>         .ndo_poll_controller     = mlx5e_netpoll,
>>  #endif
>> -       .ndo_has_offload_stats   = mlx5e_has_offload_stats,
>> -       .ndo_get_offload_stats   = mlx5e_get_offload_stats,
>>  };
>> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>>
>>  static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
>>  {
>> @@ -3586,13 +3603,16 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>>
>>         SET_NETDEV_DEV(netdev, &mdev->pdev->dev);
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
>>                 netdev->netdev_ops = &mlx5e_netdev_ops_sriov;
>>  #ifdef CONFIG_MLX5_CORE_EN_DCB
>>                 if (MLX5_CAP_GEN(mdev, qos))
>>                         netdev->dcbnl_ops = &mlx5e_dcbnl_ops;
>>  #endif
>> -       } else {
>> +       } else
>> +#endif
>> +       {
>>                 netdev->netdev_ops = &mlx5e_netdev_ops_basic;
>>         }
>>
>> @@ -3795,14 +3815,16 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>>  {
>>         struct net_device *netdev = priv->netdev;
>>         struct mlx5_core_dev *mdev = priv->mdev;
>> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> -       struct mlx5_eswitch_rep rep;
>>
>>         mlx5_lag_add(mdev, netdev);
>>
>>         mlx5e_enable_async_events(priv);
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
>> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> +               struct mlx5_eswitch_rep rep;
>> +
>>                 mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
>>                 rep.load = mlx5e_nic_rep_load;
>>                 rep.unload = mlx5e_nic_rep_unload;
>> @@ -3810,6 +3832,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>>                 rep.netdev = netdev;
>>                 mlx5_eswitch_register_vport_rep(esw, 0, &rep);
>>         }
>> +#endif
>>
>>         if (netdev->reg_state != NETREG_REGISTERED)
>>                 return;
>> @@ -3827,11 +3850,14 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
>>  static void mlx5e_nic_disable(struct mlx5e_priv *priv)
>>  {
>>         struct mlx5_core_dev *mdev = priv->mdev;
>> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
>>
>>         queue_work(priv->wq, &priv->set_rx_mode_work);
>> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       if (MLX5_CAP_GEN(priv->mdev, vport_group_manager)) {
>> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>>                 mlx5_eswitch_unregister_vport_rep(esw, 0);
>> +       }
>> +#endif
>>         mlx5e_disable_async_events(priv);
>>         mlx5_lag_remove(mdev);
>>  }
>> @@ -3942,6 +3968,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
>>         return err;
>>  }
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
>>  {
>>         struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> @@ -3964,6 +3991,7 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
>>                 mlx5_eswitch_register_vport_rep(esw, vport, &rep);
>>         }
>>  }
>> +#endif
>>
>>  void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
>>  {
>> @@ -4028,11 +4056,8 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
>>
>>  static void *mlx5e_add(struct mlx5_core_dev *mdev)
>>  {
>> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> -       int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>>         void *ppriv = NULL;
>>         void *priv;
>> -       int vport;
>>         int err;
>>         struct net_device *netdev;
>>
>> @@ -4040,10 +4065,14 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
>>         if (err)
>>                 return NULL;
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5e_register_vport_rep(mdev);
>>
>> -       if (MLX5_CAP_GEN(mdev, vport_group_manager))
>> +       if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
>> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
>>                 ppriv = &esw->offloads.vport_reps[0];
>> +       }
>> +#endif
>>
>>         netdev = mlx5e_create_netdev(mdev, &mlx5e_nic_profile, ppriv);
>>         if (!netdev) {
>> @@ -4074,8 +4103,16 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
>>         mlx5e_destroy_netdev(mdev, priv);
>>
>>  err_unregister_reps:
>> -       for (vport = 1; vport < total_vfs; vport++)
>> -               mlx5_eswitch_unregister_vport_rep(esw, vport);
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       {
>> +               int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> +               int vport;
>> +
>> +               for (vport = 1; vport < total_vfs; vport++)
>> +                       mlx5_eswitch_unregister_vport_rep(esw, vport);
>> +       }
>> +#endif
>>
>>         return NULL;
>>  }
>> @@ -4093,13 +4130,18 @@ void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
>>
>>  static void mlx5e_remove(struct mlx5_core_dev *mdev, void *vpriv)
>>  {
>> -       struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> -       int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>>         struct mlx5e_priv *priv = vpriv;
>> -       int vport;
>>
>> -       for (vport = 1; vport < total_vfs; vport++)
>> -               mlx5_eswitch_unregister_vport_rep(esw, vport);
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       {
>> +               struct mlx5_eswitch *esw = mdev->priv.eswitch;
>> +               int total_vfs = MLX5_TOTAL_VPORTS(mdev);
>> +               int vport;
>> +
>> +               for (vport = 1; vport < total_vfs; vport++)
>> +                       mlx5_eswitch_unregister_vport_rep(esw, vport);
>> +       }
>> +#endif
>>
>>         unregister_netdev(priv->netdev);
>>         mlx5e_detach(mdev, vpriv);
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> index 640f10f..8e5f565 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> @@ -128,6 +128,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
>>         return rule;
>>  }
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  static struct mlx5_flow_handle *
>>  mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
>>                       struct mlx5_flow_spec *spec,
>> @@ -160,6 +161,7 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
>>                 kfree(e);
>>         }
>>  }
>> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>>
>>  /* we get here also when setting rule to the FW failed, etc. It means that the
>>   * flow rule itself might not exist, but some offloading related to the actions
>> @@ -168,7 +170,6 @@ static void mlx5e_detach_encap(struct mlx5e_priv *priv,
>>  static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
>>                               struct mlx5e_tc_flow *flow)
>>  {
>> -       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>>         struct mlx5_fc *counter = NULL;
>>
>>         if (!IS_ERR(flow->rule)) {
>> @@ -177,11 +178,17 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
>>                 mlx5_fc_destroy(priv->mdev, counter);
>>         }
>>
>> -       if (esw && esw->mode == SRIOV_OFFLOADS) {
>> -               mlx5_eswitch_del_vlan_action(esw, flow->attr);
>> -               if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
>> -                       mlx5e_detach_encap(priv, flow);
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       {
>> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>> +
>> +               if (esw && esw->mode == SRIOV_OFFLOADS) {
>> +                       mlx5_eswitch_del_vlan_action(esw, flow->attr);
>> +                       if (flow->attr->action & MLX5_FLOW_CONTEXT_ACTION_ENCAP)
>> +                               mlx5e_detach_encap(priv, flow);
>> +               }
>>         }
>> +#endif
>>
>>         if (!mlx5e_tc_num_filters(priv) && (priv->fs.tc.t)) {
>>                 mlx5_destroy_flow_table(priv->fs.tc.t);
>> @@ -679,6 +686,7 @@ static inline int hash_encap_info(struct ip_tunnel_key *key)
>>         return jhash(key, sizeof(*key), 0);
>>  }
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  static int mlx5e_route_lookup_ipv4(struct mlx5e_priv *priv,
>>                                    struct net_device *mirred_dev,
>>                                    struct net_device **out_dev,
>> @@ -1129,20 +1137,27 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
>>         }
>>         return 0;
>>  }
>> +#endif /* CONFIG_MLX5_CORE_EN_ESWITCH */
>>
>>  int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>>                            struct tc_cls_flower_offload *f)
>>  {
>>         struct mlx5e_tc_table *tc = &priv->fs.tc;
>>         int err = 0;
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         bool fdb_flow = false;
>> +#endif
>>         u32 flow_tag, action;
>>         struct mlx5e_tc_flow *flow;
>>         struct mlx5_flow_spec *spec;
>> -       struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>>
>> -       if (esw && esw->mode == SRIOV_OFFLOADS)
>> -               fdb_flow = true;
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>> +       {
>> +               struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
>> +
>> +               if (esw && esw->mode == SRIOV_OFFLOADS)
>> +                       fdb_flow = true;
>> +       }
>>
>>         if (fdb_flow)
>>                 flow = kzalloc(sizeof(*flow) +
>> @@ -1150,6 +1165,9 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>>                                GFP_KERNEL);
>>         else
>>                 flow = kzalloc(sizeof(*flow), GFP_KERNEL);
>> +#else
>> +       flow = kzalloc(sizeof(*flow), GFP_KERNEL);
>> +#endif
>>
>>         spec = mlx5_vzalloc(sizeof(*spec));
>>         if (!spec || !flow) {
>> @@ -1163,13 +1181,16 @@ int mlx5e_configure_flower(struct mlx5e_priv *priv, __be16 protocol,
>>         if (err < 0)
>>                 goto err_free;
>>
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         if (fdb_flow) {
>>                 flow->attr  = (struct mlx5_esw_flow_attr *)(flow + 1);
>>                 err = parse_tc_fdb_actions(priv, f->exts, flow);
>>                 if (err < 0)
>>                         goto err_free;
>>                 flow->rule = mlx5e_tc_add_fdb_flow(priv, spec, flow->attr);
>> -       } else {
>> +       } else
>> +#endif
>> +       {
>>                 err = parse_tc_nic_actions(priv, f->exts, &action, &flow_tag);
>>                 if (err < 0)
>>                         goto err_free;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> index ea5d8d3..2c67c25 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> @@ -35,7 +35,7 @@
>>  #include <linux/mlx5/driver.h>
>>  #include <linux/mlx5/cmd.h>
>>  #include "mlx5_core.h"
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  #include "eswitch.h"
>>  #endif
>>
>> @@ -462,7 +462,7 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr)
>>                         }
>>                         break;
>>
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>                 case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
>>                         mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
>>                         break;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> index 84f7970..224f499 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> @@ -53,7 +53,7 @@
>>  #include <net/devlink.h>
>>  #include "mlx5_core.h"
>>  #include "fs_core.h"
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>  #include "eswitch.h"
>>  #endif
>>
>> @@ -941,7 +941,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>>                 goto err_tables_cleanup;
>>         }
>>
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         err = mlx5_eswitch_init(dev);
>>         if (err) {
>>                 dev_err(&pdev->dev, "Failed to init eswitch %d\n", err);
>> @@ -958,7 +958,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>>         return 0;
>>
>>  err_eswitch_cleanup:
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5_eswitch_cleanup(dev->priv.eswitch);
>>
>>  err_rl_cleanup:
>> @@ -981,7 +981,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
>>  static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
>>  {
>>         mlx5_sriov_cleanup(dev);
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5_eswitch_cleanup(dev->priv.eswitch);
>>  #endif
>>         mlx5_cleanup_rl_table(dev);
>> @@ -1131,7 +1131,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>                 goto err_fs;
>>         }
>>
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5_eswitch_attach(dev->priv.eswitch);
>>  #endif
>>
>> @@ -1162,7 +1162,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>         mlx5_sriov_detach(dev);
>>
>>  err_sriov:
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5_eswitch_detach(dev->priv.eswitch);
>>  #endif
>>         mlx5_cleanup_fs(dev);
>> @@ -1233,7 +1233,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>                 mlx5_detach_device(dev);
>>
>>         mlx5_sriov_detach(dev);
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         mlx5_eswitch_detach(dev->priv.eswitch);
>>  #endif
>>         mlx5_cleanup_fs(dev);
>> @@ -1269,7 +1269,7 @@ struct mlx5_core_event_handler {
>>  };
>>
>>  static const struct devlink_ops mlx5_devlink_ops = {
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>         .eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
>>         .eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
>>         .eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> index e086277..0664cc4 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
>> @@ -33,7 +33,7 @@
>>  #include <linux/pci.h>
>>  #include <linux/mlx5/driver.h>
>>  #include "mlx5_core.h"
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>>  #include "eswitch.h"
>>  #endif
>>
>> @@ -57,7 +57,7 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev *dev, int num_vfs)
>>                 return -EBUSY;
>>         }
>>
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>>         err = mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY);
>>         if (err) {
>>                 mlx5_core_warn(dev,
>> @@ -102,7 +102,7 @@ static void mlx5_device_disable_sriov(struct mlx5_core_dev *dev)
>>                 sriov->enabled_vfs--;
>>         }
>>
>> -#ifdef CONFIG_MLX5_CORE_EN
>> +#ifdef CONFIG_MLX5_CORE_ESWITCH
>>         mlx5_eswitch_disable_sriov(dev->priv.eswitch);
>>  #endif
>>
>> --
>> 2.9.3
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:28             ` Saeed Mahameed
@ 2017-01-27 18:42               ` Tom Herbert
  2017-01-27 21:15                 ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-27 18:42 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 10:28 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>>>> <saeedm@dev.mellanox.co.il> wrote:
>>>>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>>>>>> whether the eswitch code is built. Change Kconfig and Makefile
>>>>>>> accordingly.
>>>>>>
>>>>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>>>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>>>>
>>>>>
>>>>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>>>>> MAC address wanted by VF or PF.
>>>>> To keep one flow in the code, the implementation is done as part of eswitch.
>>>>>
>>>>> so in multi-host configuration (where there are 4 PFs) each PF should
>>>>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>>>>
>>>>> populating the l2 table is done using the whole eswitch event driven
>>>>> mechanisms, it is not easy and IMH not right to separate eswitch
>>>>> tables from l2 table (same management logic, different tables).
>>>>>
>>>>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>>>>> on Multi-host configuration.
>>>>>
>>>> What indicate a multi-host configuration?
>>>
>>> nothing in the driver, it is transparent.
>>>
>> So then we always need the eswitch code to be built even if someone
>> never uses any of it?
>>
>
> yes.
> but for your convenience all you need is to compile eswitch.c.
> esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.
>
Well eswitch.c is 2200 LOC. en_rep.c and eswitch_offloads.c are 1600
LOC. If we _must_ have eswitch.c then there's probably not much point
then. But I am still finding it hard to fathom that eswitch has now
become a mandatory component of Ethernet drivers/devices.

Tom

>>>>
>>>>>> Or.
>>>>>>
>>>>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>>>>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:33     ` Tom Herbert
@ 2017-01-27 20:59       ` Saeed Mahameed
  0 siblings, 0 replies; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 20:59 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 8:33 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 27, 2017 at 10:19 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>> whether the eswitch code is built. Change Kconfig and Makefile
>>> accordingly.
>>>
>>> Signed-off-by: Tom Herbert <tom@herbertland.com>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   | 11 +++
>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  6 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 92 +++++++++++++++++------
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   | 39 +++++++---
>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |  4 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    | 16 ++--
>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |  6 +-
>>>  7 files changed, 125 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> index ddb4ca4..27aae58 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
>>> @@ -30,3 +30,14 @@ config MLX5_CORE_EN_DCB
>>>           This flag is depended on the kernel's DCB support.
>>>
>>>           If unsure, set to Y
>>> +
>>> +config MLX5_CORE_EN_ESWITCH
>>> +       bool "Ethernet switch"
>>> +       default y
>>> +       depends on MLX5_CORE_EN
>>> +       ---help---
>>> +         Say Y here if you want to use Ethernet switch (E-switch). E-Switch
>>> +         is the software entity that represents and manages ConnectX4
>>> +         inter-HCA ethernet l2 switching.
>>> +
>>> +         If unsure, set to Y
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> index 9f43beb..17025d8 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>> @@ -5,9 +5,11 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
>>>                 mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
>>>                 fs_counters.o rl.o lag.o dev.o
>>>
>>> -mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
>>> +mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o \
>>>                 en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
>>>                 en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
>>> -               en_tc.o en_arfs.o en_rep.o en_fs_ethtool.o en_selftest.o
>>> +               en_tc.o en_arfs.o en_fs_ethtool.o en_selftest.o
>>>
>>>  mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
>>> +
>>> +mlx5_core-$(CONFIG_MLX5_CORE_EN_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index e829143..1db4d98 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -38,7 +38,9 @@
>>>  #include <linux/bpf.h>
>>>  #include "en.h"
>>>  #include "en_tc.h"
>>> +#ifdef CONFIG_MLX5_CORE_EN_ESWITCH
>>>  #include "eswitch.h"
>>> +#endif
>>
>> Wouldn't it be cleaner if we left the main code (en_main.c) untouched
>> and kept this
>> #include "eswitch.h" and instead of filling the main flow code with
>> #ifdefs, in eswitch.h
>> we can create eswitch mock API functions when
>> CONFIG_MLX5_CORE_EN_ESWITCH is not enabled ? the main flow will be
>> clean from ifdefs and will complie with mock functions.
>>
>> we did this with accelerated RFS feature. look for "#ifndef
>> CONFIG_RFS_ACCEL" in en.h
>>
> There are still occurrences of CONFIG_RFS_ACCEL in en_main.c and
> main.c. For eswitch its a header problem, not everything related to
> eswitch was extracted out of main though backend functions. There is a
> lot of code that related to eswitch that is intertwined with the core
> code.
>

Interesting, i just did a quick look and it seems to me all eswitch
logic in en_main.c can be kept untouched if we have the right mock
functions, on the other hand it seems that there are a lot of
eswitch functions to mock, i am not sure it is a good thing anymore,
let's leave it as is for now.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 18:42               ` Tom Herbert
@ 2017-01-27 21:15                 ` Saeed Mahameed
  2017-01-27 23:23                   ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-27 21:15 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 8:42 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 27, 2017 at 10:28 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 8:16 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Fri, Jan 27, 2017 at 10:05 AM, Saeed Mahameed
>>> <saeedm@dev.mellanox.co.il> wrote:
>>>> On Fri, Jan 27, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Fri, Jan 27, 2017 at 9:38 AM, Saeed Mahameed
>>>>> <saeedm@dev.mellanox.co.il> wrote:
>>>>>> On Fri, Jan 27, 2017 at 7:34 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>> Add a configuration option (CONFIG_MLX5_CORE_ESWITCH) for controlling
>>>>>>>> whether the eswitch code is built. Change Kconfig and Makefile
>>>>>>>> accordingly.
>>>>>>>
>>>>>>> Tom, FWIW, please note that the basic e-switch functionality is needed
>>>>>>> also when SRIOV isn't of use, this is for a multi host configuration.
>>>>>>>
>>>>>>
>>>>>> Right, set_l2_table_entry@eswitch.c need to be called by PF for any UC
>>>>>> MAC address wanted by VF or PF.
>>>>>> To keep one flow in the code, the implementation is done as part of eswitch.
>>>>>>
>>>>>> so in multi-host configuration (where there are 4 PFs) each PF should
>>>>>> invoke set_l2_table_entry_cmd  for each one of its own UC MACs.
>>>>>>
>>>>>> populating the l2 table is done using the whole eswitch event driven
>>>>>> mechanisms, it is not easy and IMH not right to separate eswitch
>>>>>> tables from l2 table (same management logic, different tables).
>>>>>>
>>>>>> Anyways as Or stated this is just an FYI, eswitch needs to be enabled
>>>>>> on Multi-host configuration.
>>>>>>
>>>>> What indicate a multi-host configuration?
>>>>
>>>> nothing in the driver, it is transparent.
>>>>
>>> So then we always need the eswitch code to be built even if someone
>>> never uses any of it?
>>>
>>
>> yes.
>> but for your convenience all you need is to compile eswitch.c.
>> esiwtch_offoalds.c and en_rep.c can be compiled out for basic ethernet.
>>
> Well eswitch.c is 2200 LOC. en_rep.c and eswitch_offloads.c are 1600
> LOC. If we _must_ have eswitch.c then there's probably not much point
> then. But I am still finding it hard to fathom that eswitch has now
> become a mandatory component of Ethernet drivers/devices.
>

It is only mandatory for configurations that needs eswitch, where the
driver has no way to know about them, for a good old bare metal box,
eswitch is not needed.

we can do some work to strip the l2 table logic - needed for PFs to
work on multi-host - out of eswitch but again that would further
complicate the driver code since eswitch will still need to update l2
tables for VFs.


> Tom
>
>>>>>
>>>>>>> Or.
>>>>>>>
>>>>>>> My WW (and same for the rest of the IL team..) has ended so I will be
>>>>>>> able to further look on this series and comment on Sunday.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 21:15                 ` Saeed Mahameed
@ 2017-01-27 23:23                   ` Alexei Starovoitov
  2017-01-28 11:20                     ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2017-01-27 23:23 UTC (permalink / raw)
  To: Saeed Mahameed, Tom Herbert
  Cc: Or Gerlitz, Saeed Mahameed, David Miller, Linux Netdev List, Kernel Team

On 1/27/17 1:15 PM, Saeed Mahameed wrote:
> It is only mandatory for configurations that needs eswitch, where the
> driver has no way to know about them, for a good old bare metal box,
> eswitch is not needed.
>
> we can do some work to strip the l2 table logic - needed for PFs to
> work on multi-host - out of eswitch but again that would further
> complicate the driver code since eswitch will still need to update l2
> tables for VFs.

Saeed,
for multi-host setups every host in that multi-host doesn't
actually see the eswitch, no? Otherwise broken driver on one machine
can affect the other hosts in the same bundle? Please double check,
since this is absolutely critical HW requirement.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-27 23:23                   ` Alexei Starovoitov
@ 2017-01-28 11:20                     ` Saeed Mahameed
  2017-01-28 17:52                       ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-28 11:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov <ast@fb.com> wrote:
> On 1/27/17 1:15 PM, Saeed Mahameed wrote:
>>
>> It is only mandatory for configurations that needs eswitch, where the
>> driver has no way to know about them, for a good old bare metal box,
>> eswitch is not needed.
>>
>> we can do some work to strip the l2 table logic - needed for PFs to
>> work on multi-host - out of eswitch but again that would further
>> complicate the driver code since eswitch will still need to update l2
>> tables for VFs.
>
>
> Saeed,
> for multi-host setups every host in that multi-host doesn't
> actually see the eswitch, no? Otherwise broken driver on one machine
> can affect the other hosts in the same bundle? Please double check,

each host (PF) has its own eswitch, and each eswitch lives in its own
"steering-space"
 and it can't affect others.

> since this is absolutely critical HW requirement.
>

The only shared HW resources between hosts (PFs) is the simple l2 table,
and the only thing a host can ask from the l2 talbe (FW) is: "forward
UC MAC to me", and it is the responsibility of the the driver eswitch
to do so.

the l2 table is created and managed by FW, SW eswitch can only request
from FW, and the FW is trusted.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-27 18:13   ` Tom Herbert
@ 2017-01-28 11:38     ` Saeed Mahameed
  2017-01-28 17:19       ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-28 11:38 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Fri, Jan 27, 2017 at 8:13 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> This patchset creates configuration options for sriov, vxlan, eswitch,
>>> and tc features in the mlx5 driver. The purpose of this is to allow not
>>> building these features. These features are optional advanced features
>>> that are not required for a core Ethernet driver. A user can disable
>>> these features which resuces the amount of code in the driver. Disabling
>>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>>> This is also can reduce the complexity of backport and rebases since
>>> user would no longer need to worry about dependencies with the rest of
>>> the kernel that features which might not be of any interest to a user
>>> may bring in.
>>>
>>> Tested: Build and ran the driver with all features enabled (the default)
>>> and with none enabled (including DCB). Did not see any issues. I did
>>> not explicity test operation of ayy of features in the list.
>>>
>>
>> Basically I am not against this kind of change, infact i am with it,
>> although I would have done some restructuring in the driver before i
>> did such change ;), filling the code with ifdefs is not a neat thing.
>>
> If you wish, please take this as an RFC and feel free to structure the
> code the right way. I think the intent is clear enough and looks like
> davem isn't going to allow the directory restructuring so something
> like this seems to be the best course of action now.
>

Right.

>> I agree this will simplify backporting and provide some kind of
>> feature separation inside the driver.
>> But this will also increase the testing matrix we need to cover and
>> increase the likelihood of kbuild breaks by an order of magnitude.
>>
> The testing matrix already exploded with the proliferation of
> supported features. If anything this reduces the test matrix problem.
> For instance, if we make a change to the core driver and functionality
> properly isolated there is a much better chance that this won't affect
> peripheral functionality and vice versa. It is just not feasible for
> us to test every combination of NIC features for every change being
> made.
>

Yes for isolated features, but for base functionality, we need to test
it with all new device specific kconfig combinations on every patch!
since a misplaced code inside or outside the correct ifdef
can easily go unnoticed and break functionality.

>> One more thing, do we really need a device specific flag per feature
>> per vendor per device?  can't we just use the same kconfig flag for
>> all drivers and if there is a more generic system wide flag that
>> covers the same feature
>> can't we just use it, for instance instead of
>> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
>> for all drivers ?
>>
> That sounds good to me. We already have CONFIG_RFS_ACCEL and others
> that do that.
>
> Tom
>
>> Saeed.
>>
>>>
>>>
>>> Tom Herbert (4):
>>>   mlx5: Make building eswitch configurable
>>>   mlx5: Make building SR-IOV configurable
>>>   mlx5: Make building tc hardware offload configurable
>>>   mlx5: Make building vxlan hardware offload configurable
>>>
>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>>
>>> --
>>> 2.9.3
>>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-28 11:38     ` Saeed Mahameed
@ 2017-01-28 17:19       ` Tom Herbert
  2017-01-29  8:07         ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-28 17:19 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Sat, Jan 28, 2017 at 3:38 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 27, 2017 at 8:13 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> This patchset creates configuration options for sriov, vxlan, eswitch,
>>>> and tc features in the mlx5 driver. The purpose of this is to allow not
>>>> building these features. These features are optional advanced features
>>>> that are not required for a core Ethernet driver. A user can disable
>>>> these features which resuces the amount of code in the driver. Disabling
>>>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>>>> This is also can reduce the complexity of backport and rebases since
>>>> user would no longer need to worry about dependencies with the rest of
>>>> the kernel that features which might not be of any interest to a user
>>>> may bring in.
>>>>
>>>> Tested: Build and ran the driver with all features enabled (the default)
>>>> and with none enabled (including DCB). Did not see any issues. I did
>>>> not explicity test operation of ayy of features in the list.
>>>>
>>>
>>> Basically I am not against this kind of change, infact i am with it,
>>> although I would have done some restructuring in the driver before i
>>> did such change ;), filling the code with ifdefs is not a neat thing.
>>>
>> If you wish, please take this as an RFC and feel free to structure the
>> code the right way. I think the intent is clear enough and looks like
>> davem isn't going to allow the directory restructuring so something
>> like this seems to be the best course of action now.
>>
>
> Right.
>
>>> I agree this will simplify backporting and provide some kind of
>>> feature separation inside the driver.
>>> But this will also increase the testing matrix we need to cover and
>>> increase the likelihood of kbuild breaks by an order of magnitude.
>>>
>> The testing matrix already exploded with the proliferation of
>> supported features. If anything this reduces the test matrix problem.
>> For instance, if we make a change to the core driver and functionality
>> properly isolated there is a much better chance that this won't affect
>> peripheral functionality and vice versa. It is just not feasible for
>> us to test every combination of NIC features for every change being
>> made.
>>
>
> Yes for isolated features, but for base functionality, we need to test
> it with all new device specific kconfig combinations on every patch!

Sorry, but that is the price you need to pay for a feature rich device.

On the subject of testing, I don't really see any indication in these
patches on how patches are being tested. Also, there are patches that
fix things without any mention of how to repro the problems. It is
critical that we know IPv6 is tested as much or more than IPv4 (just
last week with hit yet another IPv6-only issue in an another upstream
driver that should have been caught with a simple load test-- this
really is not acceptable any more!). Please add a description of how
patches were tested to commit logs.

Tom

> since a misplaced code inside or outside the correct ifdef
> can easily go unnoticed and break functionality.
>
>>> One more thing, do we really need a device specific flag per feature
>>> per vendor per device?  can't we just use the same kconfig flag for
>>> all drivers and if there is a more generic system wide flag that
>>> covers the same feature
>>> can't we just use it, for instance instead of
>>> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
>>> for all drivers ?
>>>
>> That sounds good to me. We already have CONFIG_RFS_ACCEL and others
>> that do that.
>>
>> Tom
>>
>>> Saeed.
>>>
>>>>
>>>>
>>>> Tom Herbert (4):
>>>>   mlx5: Make building eswitch configurable
>>>>   mlx5: Make building SR-IOV configurable
>>>>   mlx5: Make building tc hardware offload configurable
>>>>   mlx5: Make building vxlan hardware offload configurable
>>>>
>>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>>>
>>>> --
>>>> 2.9.3
>>>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-28 11:20                     ` Saeed Mahameed
@ 2017-01-28 17:52                       ` Alexei Starovoitov
  2017-01-29  9:11                         ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2017-01-28 17:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On 1/28/17 3:20 AM, Saeed Mahameed wrote:
> On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov <ast@fb.com> wrote:
>> On 1/27/17 1:15 PM, Saeed Mahameed wrote:
>>>
>>> It is only mandatory for configurations that needs eswitch, where the
>>> driver has no way to know about them, for a good old bare metal box,
>>> eswitch is not needed.
>>>
>>> we can do some work to strip the l2 table logic - needed for PFs to
>>> work on multi-host - out of eswitch but again that would further
>>> complicate the driver code since eswitch will still need to update l2
>>> tables for VFs.
>>
>>
>> Saeed,
>> for multi-host setups every host in that multi-host doesn't
>> actually see the eswitch, no? Otherwise broken driver on one machine
>> can affect the other hosts in the same bundle? Please double check,
>
> each host (PF) has its own eswitch, and each eswitch lives in its own
> "steering-space"
>   and it can't affect others.
>
>> since this is absolutely critical HW requirement.
>>
>
> The only shared HW resources between hosts (PFs) is the simple l2 table,
> and the only thing a host can ask from the l2 talbe (FW) is: "forward
> UC MAC to me", and it is the responsibility of the the driver eswitch
> to do so.
>
> the l2 table is created and managed by FW, SW eswitch can only request
> from FW, and the FW is trusted.

ok. clear. thanks for explaining.
Could you describe the sequence of function calls within mlx5
that does the assignment of uc mac for PF ?
since I'm missing where eswitch is involved.
I can see:
mlx5e_nic_enable | mlx5e_set_mac
   queue_work(priv->wq, &priv->set_rx_mode_work);
     mlx5e_set_rx_mode_work
       mlx5e_apply_netdev_addr
         mlx5e_add_l2_flow_rule

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-28 17:19       ` Tom Herbert
@ 2017-01-29  8:07         ` Saeed Mahameed
  2017-01-30 20:00           ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-29  8:07 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Sat, Jan 28, 2017 at 7:19 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Sat, Jan 28, 2017 at 3:38 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 27, 2017 at 8:13 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
>>> <saeedm@dev.mellanox.co.il> wrote:
>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> This patchset creates configuration options for sriov, vxlan, eswitch,
>>>>> and tc features in the mlx5 driver. The purpose of this is to allow not
>>>>> building these features. These features are optional advanced features
>>>>> that are not required for a core Ethernet driver. A user can disable
>>>>> these features which resuces the amount of code in the driver. Disabling
>>>>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>>>>> This is also can reduce the complexity of backport and rebases since
>>>>> user would no longer need to worry about dependencies with the rest of
>>>>> the kernel that features which might not be of any interest to a user
>>>>> may bring in.
>>>>>
>>>>> Tested: Build and ran the driver with all features enabled (the default)
>>>>> and with none enabled (including DCB). Did not see any issues. I did
>>>>> not explicity test operation of ayy of features in the list.
>>>>>
>>>>
>>>> Basically I am not against this kind of change, infact i am with it,
>>>> although I would have done some restructuring in the driver before i
>>>> did such change ;), filling the code with ifdefs is not a neat thing.
>>>>
>>> If you wish, please take this as an RFC and feel free to structure the
>>> code the right way. I think the intent is clear enough and looks like
>>> davem isn't going to allow the directory restructuring so something
>>> like this seems to be the best course of action now.
>>>
>>
>> Right.
>>
>>>> I agree this will simplify backporting and provide some kind of
>>>> feature separation inside the driver.
>>>> But this will also increase the testing matrix we need to cover and
>>>> increase the likelihood of kbuild breaks by an order of magnitude.
>>>>
>>> The testing matrix already exploded with the proliferation of
>>> supported features. If anything this reduces the test matrix problem.
>>> For instance, if we make a change to the core driver and functionality
>>> properly isolated there is a much better chance that this won't affect
>>> peripheral functionality and vice versa. It is just not feasible for
>>> us to test every combination of NIC features for every change being
>>> made.
>>>
>>
>> Yes for isolated features, but for base functionality, we need to test
>> it with all new device specific kconfig combinations on every patch!
>
> Sorry, but that is the price you need to pay for a feature rich device.
>
> On the subject of testing, I don't really see any indication in these
> patches on how patches are being tested. Also, there are patches that
> fix things without any mention of how to repro the problems. It is

I Will do my best to provide more information in fixes commit
messages, Thanks for the input.

> critical that we know IPv6 is tested as much or more than IPv4 (just

For the record, for every IPv4 test in our automatic regression system
we have an IPv6 equivalent test,
not to mention IPv6-only directed tests.

> last week with hit yet another IPv6-only issue in an another upstream
> driver that should have been caught with a simple load test-- this

Is there any IPv6-only functionality issues with mlx4/mlx5 that we
don't know about ?
If you do see any of those, please report them so we take the needed
corrective actions to fix them and cover them in our regression.

> really is not acceptable any more!). Please add a description of how
> patches were tested to commit logs.
>

Acknowledge.

> Tom
>
>> since a misplaced code inside or outside the correct ifdef
>> can easily go unnoticed and break functionality.
>>
>>>> One more thing, do we really need a device specific flag per feature
>>>> per vendor per device?  can't we just use the same kconfig flag for
>>>> all drivers and if there is a more generic system wide flag that
>>>> covers the same feature
>>>> can't we just use it, for instance instead of
>>>> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
>>>> for all drivers ?
>>>>
>>> That sounds good to me. We already have CONFIG_RFS_ACCEL and others
>>> that do that.
>>>
>>> Tom
>>>
>>>> Saeed.
>>>>
>>>>>
>>>>>
>>>>> Tom Herbert (4):
>>>>>   mlx5: Make building eswitch configurable
>>>>>   mlx5: Make building SR-IOV configurable
>>>>>   mlx5: Make building tc hardware offload configurable
>>>>>   mlx5: Make building vxlan hardware offload configurable
>>>>>
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>>>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>>>>
>>>>> --
>>>>> 2.9.3
>>>>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-28 17:52                       ` Alexei Starovoitov
@ 2017-01-29  9:11                         ` Saeed Mahameed
  2017-01-30 16:45                           ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-29  9:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On Sat, Jan 28, 2017 at 7:52 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 1/28/17 3:20 AM, Saeed Mahameed wrote:
>>
>> On Sat, Jan 28, 2017 at 1:23 AM, Alexei Starovoitov <ast@fb.com> wrote:
>>>
>>> On 1/27/17 1:15 PM, Saeed Mahameed wrote:
>>>>
>>>>
>>>> It is only mandatory for configurations that needs eswitch, where the
>>>> driver has no way to know about them, for a good old bare metal box,
>>>> eswitch is not needed.
>>>>
>>>> we can do some work to strip the l2 table logic - needed for PFs to
>>>> work on multi-host - out of eswitch but again that would further
>>>> complicate the driver code since eswitch will still need to update l2
>>>> tables for VFs.
>>>
>>>
>>>
>>> Saeed,
>>> for multi-host setups every host in that multi-host doesn't
>>> actually see the eswitch, no? Otherwise broken driver on one machine
>>> can affect the other hosts in the same bundle? Please double check,
>>
>>
>> each host (PF) has its own eswitch, and each eswitch lives in its own
>> "steering-space"
>>   and it can't affect others.
>>
>>> since this is absolutely critical HW requirement.
>>>
>>
>> The only shared HW resources between hosts (PFs) is the simple l2 table,
>> and the only thing a host can ask from the l2 talbe (FW) is: "forward
>> UC MAC to me", and it is the responsibility of the the driver eswitch
>> to do so.
>>
>> the l2 table is created and managed by FW, SW eswitch can only request
>> from FW, and the FW is trusted.
>
>
> ok. clear. thanks for explaining.
> Could you describe the sequence of function calls within mlx5
> that does the assignment of uc mac for PF ?
> since I'm missing where eswitch is involved.
> I can see:
> mlx5e_nic_enable | mlx5e_set_mac
>   queue_work(priv->wq, &priv->set_rx_mode_work);
>     mlx5e_set_rx_mode_work
>       mlx5e_apply_netdev_addr
>         mlx5e_add_l2_flow_rule
>

It is a  little bit more complicated than this :).

ConnectX4/5 and hopefully so on .. provide three different isolated
steering layers:

3. vport layer: avaialbe for any PF/VF vport nic driver instance
(netdevice), it allows vlan/mac filtering
 ,RSS hashing and n-tuple steering (for both encapsulated and
nonencapsulated traffic) and RFS steering. ( the code above only
writes flow entries of a PF/VF to its own vport flow tables, there is
another mechanism to propagate l2 steering rules down to eswitch from
the vport layer.

2. eswitch layer: Available for PFs only with
HCA_CAP.vport_group_manager capability set.
it allows steering between PF and different VFs on the same host (vlan
mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
steering and offloads for switchdev mode - eswitch_offloads.c - )
if this table is not create the default is pass-throu traffic to PF

1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
capability set.
needed for MH configurations and only PF is allowed and should write
"request UC MAC - set_l2_table_entry" on behalf of the PF itself and
it's own VFs.

- On a bare metal machine only layer 3 is required (all traffic is
passed to the PF vport).
- On a MH configuration layer 3 and 1 are required.
- On a SRIOV configuration layer 3 and 2 are required.
- On MH with SRIOV all layers are required.

in the driver, eswitch and L2 layers are handled by PF@eswitch.c.

So for your question:

PF always init_eswitch ( no eswitch -sriov- tables are created), and
the eswitch will start listening for vport_change_events.

A PF/VF or netdev vport instance on any steering changes updates
should call  mlx5e_vport_context_update[1]

vport_context_update is A FW command that will store the current
UC/MC/VLAN list and promiscuity info of a vport.

The FW will generate an event to the PF driver eswitch manager (vport
manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
set_l2_table_entry for each UC mac on each vport change event of any
vport (including its own vport), in case of SRIOV is enabled it will
update eswitch tables as well.

To simplify my answer the function calls are:
Vport VF/PF netdevice:
mlx5e_set_rx_mode_work
    mlx5e_vport_context_update
       mlx5e_vport_context_update_addr_list  --> FW event will be
generated to the PF esiwtch manager

PF eswitch manager(eswitch.c) on a vport change FW event:
mlx5_eswitch_vport_event
      esw_vport_change_handler
           esw_vport_change_handle_locked
                   esw_apply_vport_addr_list
                              esw_add_uc_addr
                                     set_l2_table_entry --> this will
update the l2 table in case MH is enabled.

Sorry for the long answer :)
-Saeed

[1] http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c#L440
[2] http://lxr.free-electrons.com/source/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c#L1675

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-29  9:11                         ` Saeed Mahameed
@ 2017-01-30 16:45                           ` Alexei Starovoitov
  2017-01-30 21:18                             ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2017-01-30 16:45 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On 1/29/17 1:11 AM, Saeed Mahameed wrote:
>
> ConnectX4/5 and hopefully so on .. provide three different isolated
> steering layers:
>
> 3. vport layer: avaialbe for any PF/VF vport nic driver instance
> (netdevice), it allows vlan/mac filtering
>   ,RSS hashing and n-tuple steering (for both encapsulated and
> nonencapsulated traffic) and RFS steering. ( the code above only
> writes flow entries of a PF/VF to its own vport flow tables, there is
> another mechanism to propagate l2 steering rules down to eswitch from
> the vport layer.
>
> 2. eswitch layer: Available for PFs only with
> HCA_CAP.vport_group_manager capability set.
> it allows steering between PF and different VFs on the same host (vlan
> mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
> steering and offloads for switchdev mode - eswitch_offloads.c - )
> if this table is not create the default is pass-throu traffic to PF
>
> 1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
> capability set.
> needed for MH configurations and only PF is allowed and should write
> "request UC MAC - set_l2_table_entry" on behalf of the PF itself and
> it's own VFs.
>
> - On a bare metal machine only layer 3 is required (all traffic is
> passed to the PF vport).
> - On a MH configuration layer 3 and 1 are required.
> - On a SRIOV configuration layer 3 and 2 are required.
> - On MH with SRIOV all layers are required.
>
> in the driver, eswitch and L2 layers are handled by PF@eswitch.c.
>
> So for your question:
>
> PF always init_eswitch ( no eswitch -sriov- tables are created), and
> the eswitch will start listening for vport_change_events.
>
> A PF/VF or netdev vport instance on any steering changes updates
> should call  mlx5e_vport_context_update[1]
>
> vport_context_update is A FW command that will store the current
> UC/MC/VLAN list and promiscuity info of a vport.
>
> The FW will generate an event to the PF driver eswitch manager (vport
> manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
> set_l2_table_entry for each UC mac on each vport change event of any
> vport (including its own vport), in case of SRIOV is enabled it will
> update eswitch tables as well.
>
> To simplify my answer the function calls are:
> Vport VF/PF netdevice:
> mlx5e_set_rx_mode_work
>      mlx5e_vport_context_update
>         mlx5e_vport_context_update_addr_list  --> FW event will be
> generated to the PF esiwtch manager
>
> PF eswitch manager(eswitch.c) on a vport change FW event:
> mlx5_eswitch_vport_event
>        esw_vport_change_handler
>             esw_vport_change_handle_locked
>                     esw_apply_vport_addr_list
>                                esw_add_uc_addr
>                                       set_l2_table_entry --> this will
> update the l2 table in case MH is enabled.

all makes sense. To test this logic I added printk-s
to above functions, but I only see:
# ip link set eth0 addr 24:8a:07:47:2b:6e
[  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
[  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0

MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().
Yet nic seems to work fine. Packets come and go.

broken firmware or expected behavior?

# ethtool -i eth0
driver: mlx5_core
version: 3.0-1 (January 2015)
firmware-version: 14.16.2024

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-29  8:07         ` Saeed Mahameed
@ 2017-01-30 20:00           ` Tom Herbert
  2017-01-30 21:26             ` Saeed Mahameed
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2017-01-30 20:00 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Sun, Jan 29, 2017 at 12:07 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Jan 28, 2017 at 7:19 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Sat, Jan 28, 2017 at 3:38 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Fri, Jan 27, 2017 at 8:13 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
>>>> <saeedm@dev.mellanox.co.il> wrote:
>>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> This patchset creates configuration options for sriov, vxlan, eswitch,
>>>>>> and tc features in the mlx5 driver. The purpose of this is to allow not
>>>>>> building these features. These features are optional advanced features
>>>>>> that are not required for a core Ethernet driver. A user can disable
>>>>>> these features which resuces the amount of code in the driver. Disabling
>>>>>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>>>>>> This is also can reduce the complexity of backport and rebases since
>>>>>> user would no longer need to worry about dependencies with the rest of
>>>>>> the kernel that features which might not be of any interest to a user
>>>>>> may bring in.
>>>>>>
>>>>>> Tested: Build and ran the driver with all features enabled (the default)
>>>>>> and with none enabled (including DCB). Did not see any issues. I did
>>>>>> not explicity test operation of ayy of features in the list.
>>>>>>
>>>>>
>>>>> Basically I am not against this kind of change, infact i am with it,
>>>>> although I would have done some restructuring in the driver before i
>>>>> did such change ;), filling the code with ifdefs is not a neat thing.
>>>>>
>>>> If you wish, please take this as an RFC and feel free to structure the
>>>> code the right way. I think the intent is clear enough and looks like
>>>> davem isn't going to allow the directory restructuring so something
>>>> like this seems to be the best course of action now.
>>>>
>>>
>>> Right.
>>>
>>>>> I agree this will simplify backporting and provide some kind of
>>>>> feature separation inside the driver.
>>>>> But this will also increase the testing matrix we need to cover and
>>>>> increase the likelihood of kbuild breaks by an order of magnitude.
>>>>>
>>>> The testing matrix already exploded with the proliferation of
>>>> supported features. If anything this reduces the test matrix problem.
>>>> For instance, if we make a change to the core driver and functionality
>>>> properly isolated there is a much better chance that this won't affect
>>>> peripheral functionality and vice versa. It is just not feasible for
>>>> us to test every combination of NIC features for every change being
>>>> made.
>>>>
>>>
>>> Yes for isolated features, but for base functionality, we need to test
>>> it with all new device specific kconfig combinations on every patch!
>>
>> Sorry, but that is the price you need to pay for a feature rich device.
>>
>> On the subject of testing, I don't really see any indication in these
>> patches on how patches are being tested. Also, there are patches that
>> fix things without any mention of how to repro the problems. It is
>
> I Will do my best to provide more information in fixes commit
> messages, Thanks for the input.
>
>> critical that we know IPv6 is tested as much or more than IPv4 (just
>
> For the record, for every IPv4 test in our automatic regression system
> we have an IPv6 equivalent test,
> not to mention IPv6-only directed tests.
>
Great to know. Is there a publicly available description of what
specific tests are in the suite?

>> last week with hit yet another IPv6-only issue in an another upstream
>> driver that should have been caught with a simple load test-- this
>
> Is there any IPv6-only functionality issues with mlx4/mlx5 that we
> don't know about ?
> If you do see any of those, please report them so we take the needed
> corrective actions to fix them and cover them in our regression.
>
If we see them we will certainly let you know. This is not just
functionality either, performance regression versus IPv4 should also
be considered serious issues.

>> really is not acceptable any more!). Please add a description of how
>> patches were tested to commit logs.
>>
>
> Acknowledge.
>

Thanks,
Tom

>> Tom
>>
>>> since a misplaced code inside or outside the correct ifdef
>>> can easily go unnoticed and break functionality.
>>>
>>>>> One more thing, do we really need a device specific flag per feature
>>>>> per vendor per device?  can't we just use the same kconfig flag for
>>>>> all drivers and if there is a more generic system wide flag that
>>>>> covers the same feature
>>>>> can't we just use it, for instance instead of
>>>>> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
>>>>> for all drivers ?
>>>>>
>>>> That sounds good to me. We already have CONFIG_RFS_ACCEL and others
>>>> that do that.
>>>>
>>>> Tom
>>>>
>>>>> Saeed.
>>>>>
>>>>>>
>>>>>>
>>>>>> Tom Herbert (4):
>>>>>>   mlx5: Make building eswitch configurable
>>>>>>   mlx5: Make building SR-IOV configurable
>>>>>>   mlx5: Make building tc hardware offload configurable
>>>>>>   mlx5: Make building vxlan hardware offload configurable
>>>>>>
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>>>>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.9.3
>>>>>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-30 16:45                           ` Alexei Starovoitov
@ 2017-01-30 21:18                             ` Saeed Mahameed
  2017-01-31  3:32                               ` Alexei Starovoitov
  0 siblings, 1 reply; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-30 21:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On Mon, Jan 30, 2017 at 6:45 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 1/29/17 1:11 AM, Saeed Mahameed wrote:
>>
>>
>> ConnectX4/5 and hopefully so on .. provide three different isolated
>> steering layers:
>>
>> 3. vport layer: avaialbe for any PF/VF vport nic driver instance
>> (netdevice), it allows vlan/mac filtering
>>   ,RSS hashing and n-tuple steering (for both encapsulated and
>> nonencapsulated traffic) and RFS steering. ( the code above only
>> writes flow entries of a PF/VF to its own vport flow tables, there is
>> another mechanism to propagate l2 steering rules down to eswitch from
>> the vport layer.
>>
>> 2. eswitch layer: Available for PFs only with
>> HCA_CAP.vport_group_manager capability set.
>> it allows steering between PF and different VFs on the same host (vlan
>> mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
>> steering and offloads for switchdev mode - eswitch_offloads.c - )
>> if this table is not create the default is pass-throu traffic to PF
>>
>> 1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
>> capability set.
>> needed for MH configurations and only PF is allowed and should write
>> "request UC MAC - set_l2_table_entry" on behalf of the PF itself and
>> it's own VFs.
>>
>> - On a bare metal machine only layer 3 is required (all traffic is
>> passed to the PF vport).
>> - On a MH configuration layer 3 and 1 are required.
>> - On a SRIOV configuration layer 3 and 2 are required.
>> - On MH with SRIOV all layers are required.
>>
>> in the driver, eswitch and L2 layers are handled by PF@eswitch.c.
>>
>> So for your question:
>>
>> PF always init_eswitch ( no eswitch -sriov- tables are created), and
>> the eswitch will start listening for vport_change_events.
>>
>> A PF/VF or netdev vport instance on any steering changes updates
>> should call  mlx5e_vport_context_update[1]
>>
>> vport_context_update is A FW command that will store the current
>> UC/MC/VLAN list and promiscuity info of a vport.
>>
>> The FW will generate an event to the PF driver eswitch manager (vport
>> manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
>> set_l2_table_entry for each UC mac on each vport change event of any
>> vport (including its own vport), in case of SRIOV is enabled it will
>> update eswitch tables as well.
>>
>> To simplify my answer the function calls are:
>> Vport VF/PF netdevice:
>> mlx5e_set_rx_mode_work
>>      mlx5e_vport_context_update
>>         mlx5e_vport_context_update_addr_list  --> FW event will be
>> generated to the PF esiwtch manager
>>
>> PF eswitch manager(eswitch.c) on a vport change FW event:
>> mlx5_eswitch_vport_event
>>        esw_vport_change_handler
>>             esw_vport_change_handle_locked
>>                     esw_apply_vport_addr_list
>>                                esw_add_uc_addr
>>                                       set_l2_table_entry --> this will
>> update the l2 table in case MH is enabled.
>
>
> all makes sense. To test this logic I added printk-s
> to above functions, but I only see:
> # ip link set eth0 addr 24:8a:07:47:2b:6e
> [  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
> [  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0
>
> MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().

Strange, just double checked and i got those events on latest net-next
bare-metal box.

> Yet nic seems to work fine. Packets come and go.
>

Is it multi host configuration or bare metal ?
Do you have internal loopback traffic between different hosts ?

> broken firmware or expected behavior?

which driver did you test ? backported or net-next ?

if it is backported driver please verify that on driver load the
following occurs :

1. VPORTS change events are globally enabled:
in mlx5_start_eqs@eq.c:
async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);

2. UC address change events are enabled for vport 0 (PF):
In eswitch_attach or on eswitch_init (depends on the kernel version) @eswitch.c
esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.

>
> # ethtool -i eth0
> driver: mlx5_core
> version: 3.0-1 (January 2015)
> firmware-version: 14.16.2024
>

BTW folks, i am going to be on vacation for the rest of the week, so
please expect slow responses.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Create build configuration options
  2017-01-30 20:00           ` Tom Herbert
@ 2017-01-30 21:26             ` Saeed Mahameed
  0 siblings, 0 replies; 33+ messages in thread
From: Saeed Mahameed @ 2017-01-30 21:26 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Kernel Team

On Mon, Jan 30, 2017 at 10:00 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Sun, Jan 29, 2017 at 12:07 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Sat, Jan 28, 2017 at 7:19 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Sat, Jan 28, 2017 at 3:38 AM, Saeed Mahameed
>>> <saeedm@dev.mellanox.co.il> wrote:
>>>> On Fri, Jan 27, 2017 at 8:13 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Fri, Jan 27, 2017 at 9:58 AM, Saeed Mahameed
>>>>> <saeedm@dev.mellanox.co.il> wrote:
>>>>>> On Fri, Jan 27, 2017 at 1:32 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> This patchset creates configuration options for sriov, vxlan, eswitch,
>>>>>>> and tc features in the mlx5 driver. The purpose of this is to allow not
>>>>>>> building these features. These features are optional advanced features
>>>>>>> that are not required for a core Ethernet driver. A user can disable
>>>>>>> these features which resuces the amount of code in the driver. Disabling
>>>>>>> these features (and DCB) reduces the size of mlx5_core.o by about 16%.
>>>>>>> This is also can reduce the complexity of backport and rebases since
>>>>>>> user would no longer need to worry about dependencies with the rest of
>>>>>>> the kernel that features which might not be of any interest to a user
>>>>>>> may bring in.
>>>>>>>
>>>>>>> Tested: Build and ran the driver with all features enabled (the default)
>>>>>>> and with none enabled (including DCB). Did not see any issues. I did
>>>>>>> not explicity test operation of ayy of features in the list.
>>>>>>>
>>>>>>
>>>>>> Basically I am not against this kind of change, infact i am with it,
>>>>>> although I would have done some restructuring in the driver before i
>>>>>> did such change ;), filling the code with ifdefs is not a neat thing.
>>>>>>
>>>>> If you wish, please take this as an RFC and feel free to structure the
>>>>> code the right way. I think the intent is clear enough and looks like
>>>>> davem isn't going to allow the directory restructuring so something
>>>>> like this seems to be the best course of action now.
>>>>>
>>>>
>>>> Right.
>>>>
>>>>>> I agree this will simplify backporting and provide some kind of
>>>>>> feature separation inside the driver.
>>>>>> But this will also increase the testing matrix we need to cover and
>>>>>> increase the likelihood of kbuild breaks by an order of magnitude.
>>>>>>
>>>>> The testing matrix already exploded with the proliferation of
>>>>> supported features. If anything this reduces the test matrix problem.
>>>>> For instance, if we make a change to the core driver and functionality
>>>>> properly isolated there is a much better chance that this won't affect
>>>>> peripheral functionality and vice versa. It is just not feasible for
>>>>> us to test every combination of NIC features for every change being
>>>>> made.
>>>>>
>>>>
>>>> Yes for isolated features, but for base functionality, we need to test
>>>> it with all new device specific kconfig combinations on every patch!
>>>
>>> Sorry, but that is the price you need to pay for a feature rich device.
>>>
>>> On the subject of testing, I don't really see any indication in these
>>> patches on how patches are being tested. Also, there are patches that
>>> fix things without any mention of how to repro the problems. It is
>>
>> I Will do my best to provide more information in fixes commit
>> messages, Thanks for the input.
>>
>>> critical that we know IPv6 is tested as much or more than IPv4 (just
>>
>> For the record, for every IPv4 test in our automatic regression system
>> we have an IPv6 equivalent test,
>> not to mention IPv6-only directed tests.
>>
> Great to know. Is there a publicly available description of what
> specific tests are in the suite?
>

Nothing public, but i can collect some information and share it with
you if you wish.
but mostly traffic oriented tests and stress tests.
and configuration oriented tests:
  - basic configuration stuff: IPv6 with MTU/ring/offloads changes, etc..
  - Vxlan over IPv6, IPv6 over vxlan,
  - and some more

>>> last week with hit yet another IPv6-only issue in an another upstream
>>> driver that should have been caught with a simple load test-- this
>>
>> Is there any IPv6-only functionality issues with mlx4/mlx5 that we
>> don't know about ?
>> If you do see any of those, please report them so we take the needed
>> corrective actions to fix them and cover them in our regression.
>>
> If we see them we will certainly let you know. This is not just
> functionality either, performance regression versus IPv4 should also
> be considered serious issues.
>
>>> really is not acceptable any more!). Please add a description of how
>>> patches were tested to commit logs.
>>>
>>
>> Acknowledge.
>>
>
> Thanks,
> Tom
>
>>> Tom
>>>
>>>> since a misplaced code inside or outside the correct ifdef
>>>> can easily go unnoticed and break functionality.
>>>>
>>>>>> One more thing, do we really need a device specific flag per feature
>>>>>> per vendor per device?  can't we just use the same kconfig flag for
>>>>>> all drivers and if there is a more generic system wide flag that
>>>>>> covers the same feature
>>>>>> can't we just use it, for instance instead of
>>>>>> CONFIG_<DRIVER_NAME>_SRIOV why not use already existing CONFIG_PCI_IOV
>>>>>> for all drivers ?
>>>>>>
>>>>> That sounds good to me. We already have CONFIG_RFS_ACCEL and others
>>>>> that do that.
>>>>>
>>>>> Tom
>>>>>
>>>>>> Saeed.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Tom Herbert (4):
>>>>>>>   mlx5: Make building eswitch configurable
>>>>>>>   mlx5: Make building SR-IOV configurable
>>>>>>>   mlx5: Make building tc hardware offload configurable
>>>>>>>   mlx5: Make building vxlan hardware offload configurable
>>>>>>>
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Kconfig   |  35 ++++++
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/Makefile  |  16 ++-
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 129 ++++++++++++++++------
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  39 +++++--
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/eq.c      |   4 +-
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/lag.c     |   2 +
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/main.c    |  32 ++++--
>>>>>>>  drivers/net/ethernet/mellanox/mlx5/core/sriov.c   |   6 +-
>>>>>>>  8 files changed, 205 insertions(+), 58 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.9.3
>>>>>>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-30 21:18                             ` Saeed Mahameed
@ 2017-01-31  3:32                               ` Alexei Starovoitov
  2017-01-31 14:44                                 ` Mohamad Haj Yahia
  0 siblings, 1 reply; 33+ messages in thread
From: Alexei Starovoitov @ 2017-01-31  3:32 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Tom Herbert, Or Gerlitz, Saeed Mahameed, David Miller,
	Linux Netdev List, Kernel Team

On 1/30/17 1:18 PM, Saeed Mahameed wrote:
> On Mon, Jan 30, 2017 at 6:45 PM, Alexei Starovoitov <ast@fb.com> wrote:
>> On 1/29/17 1:11 AM, Saeed Mahameed wrote:
>>>
>>>
>>> ConnectX4/5 and hopefully so on .. provide three different isolated
>>> steering layers:
>>>
>>> 3. vport layer: avaialbe for any PF/VF vport nic driver instance
>>> (netdevice), it allows vlan/mac filtering
>>>    ,RSS hashing and n-tuple steering (for both encapsulated and
>>> nonencapsulated traffic) and RFS steering. ( the code above only
>>> writes flow entries of a PF/VF to its own vport flow tables, there is
>>> another mechanism to propagate l2 steering rules down to eswitch from
>>> the vport layer.
>>>
>>> 2. eswitch layer: Available for PFs only with
>>> HCA_CAP.vport_group_manager capability set.
>>> it allows steering between PF and different VFs on the same host (vlan
>>> mac steering and ACL filters in sriov legacy mode, and fancy n-tuple
>>> steering and offloads for switchdev mode - eswitch_offloads.c - )
>>> if this table is not create the default is pass-throu traffic to PF
>>>
>>> 1. L2 table: Available for PFs only with HCA_CAP.vport_group_manager
>>> capability set.
>>> needed for MH configurations and only PF is allowed and should write
>>> "request UC MAC - set_l2_table_entry" on behalf of the PF itself and
>>> it's own VFs.
>>>
>>> - On a bare metal machine only layer 3 is required (all traffic is
>>> passed to the PF vport).
>>> - On a MH configuration layer 3 and 1 are required.
>>> - On a SRIOV configuration layer 3 and 2 are required.
>>> - On MH with SRIOV all layers are required.
>>>
>>> in the driver, eswitch and L2 layers are handled by PF@eswitch.c.
>>>
>>> So for your question:
>>>
>>> PF always init_eswitch ( no eswitch -sriov- tables are created), and
>>> the eswitch will start listening for vport_change_events.
>>>
>>> A PF/VF or netdev vport instance on any steering changes updates
>>> should call  mlx5e_vport_context_update[1]
>>>
>>> vport_context_update is A FW command that will store the current
>>> UC/MC/VLAN list and promiscuity info of a vport.
>>>
>>> The FW will generate an event to the PF driver eswitch manager (vport
>>> manager) mlx5_eswitch_vport_event [2], and the PF eswitch will call
>>> set_l2_table_entry for each UC mac on each vport change event of any
>>> vport (including its own vport), in case of SRIOV is enabled it will
>>> update eswitch tables as well.
>>>
>>> To simplify my answer the function calls are:
>>> Vport VF/PF netdevice:
>>> mlx5e_set_rx_mode_work
>>>       mlx5e_vport_context_update
>>>          mlx5e_vport_context_update_addr_list  --> FW event will be
>>> generated to the PF esiwtch manager
>>>
>>> PF eswitch manager(eswitch.c) on a vport change FW event:
>>> mlx5_eswitch_vport_event
>>>         esw_vport_change_handler
>>>              esw_vport_change_handle_locked
>>>                      esw_apply_vport_addr_list
>>>                                 esw_add_uc_addr
>>>                                        set_l2_table_entry --> this will
>>> update the l2 table in case MH is enabled.
>>
>>
>> all makes sense. To test this logic I added printk-s
>> to above functions, but I only see:
>> # ip link set eth0 addr 24:8a:07:47:2b:6e
>> [  148.861914] mlx5e_vport_context_update_addr_list: is_uc 1 err 0
>> [  148.875152] mlx5e_vport_context_update_addr_list: is_uc 0 err 0
>>
>> MLX5_EVENT_TYPE_NIC_VPORT_CHANGE doesn't come into mlx5_eq_int().
>
> Strange, just double checked and i got those events on latest net-next
> bare-metal box.
>
>> Yet nic seems to work fine. Packets come and go.
>>
>
> Is it multi host configuration or bare metal ?

multihost

> Do you have internal loopback traffic between different hosts ?

in a multihost? how can I check that?
Is there an ethtool command?

>> broken firmware or expected behavior?
>
> which driver did you test ? backported or net-next ?

both backported and net-next with Tom's patches.

> if it is backported driver please verify that on driver load the
> following occurs :
>
> 1. VPORTS change events are globally enabled:
> in mlx5_start_eqs@eq.c:
> async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);

this one is done.

> 2. UC address change events are enabled for vport 0 (PF):
> In eswitch_attach or on eswitch_init (depends on the kernel version) @eswitch.c
> esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.

this one is not. Tom's proposal to compile out eswitch.c
removes invocation of mlx5_eswitch_attach() and
corresponding esw_enable_vport() call as well.
The question is why is it necessary?
What will break if it's not done?
so far we don't see any adverse effects in both multihost
and baremetal setups.

> BTW folks, i am going to be on vacation for the rest of the week, so
> please expect slow responses.

have a great time off. I hope other mlx folks can answer.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 1/4] mlx5: Make building eswitch configurable
  2017-01-31  3:32                               ` Alexei Starovoitov
@ 2017-01-31 14:44                                 ` Mohamad Haj Yahia
  0 siblings, 0 replies; 33+ messages in thread
From: Mohamad Haj Yahia @ 2017-01-31 14:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Saeed Mahameed, Tom Herbert, Or Gerlitz, Saeed Mahameed,
	David Miller, Linux Netdev List, Kernel Team

>> Is it multi host configuration or bare metal ?
>
>
> multihost
>
>> Do you have internal loopback traffic between different hosts ?
>
>
> in a multihost? how can I check that?
> Is there an ethtool command?
>

You can check that by sending traffic from one PF to another on the same port.

>>> broken firmware or expected behavior?
>>
>>
>> which driver did you test ? backported or net-next ?
>
>
> both backported and net-next with Tom's patches.
>
>> if it is backported driver please verify that on driver load the
>> following occurs :
>>
>> 1. VPORTS change events are globally enabled:
>> in mlx5_start_eqs@eq.c:
>> async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);
>
>
> this one is done.
>
>> 2. UC address change events are enabled for vport 0 (PF):
>> In eswitch_attach or on eswitch_init (depends on the kernel version)
>> @eswitch.c
>> esw_enable_vport(esw, 0, UC_ADDR_CHANGE); is called.
>
>
> this one is not. Tom's proposal to compile out eswitch.c
> removes invocation of mlx5_eswitch_attach() and
> corresponding esw_enable_vport() call as well.
> The question is why is it necessary?
> What will break if it's not done?
> so far we don't see any adverse effects in both multihost
> and baremetal setups.
>
Please pay attention that if you compile out the eswitch.c we will not
program the l2 table.
However, the FW populate the L2 table with default entries after
init_hca command (at the driver load stage).
For each PF, By default its permanent mac pointing to it in the l2 table.
This can explain why its working for you without setting the l2 entries.

Now, if you try to set another mac for that PF or try to reach mac
addresses that are not the permantent address (virtual mac addresses
over the PF) it will fail.
So for this reason its crucial that the PF program the l2 table (which
needs the UC_ADDR_CHANGE set in the PF vport context).

>> BTW folks, i am going to be on vacation for the rest of the week, so
>> please expect slow responses.
>
>
> have a great time off. I hope other mlx folks can answer.
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-01-31 14:44 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26 23:32 [PATCH net-next 0/4] mlx5: Create build configuration options Tom Herbert
2017-01-26 23:32 ` [PATCH net-next 1/4] mlx5: Make building eswitch configurable Tom Herbert
2017-01-27  5:34   ` Or Gerlitz
2017-01-27 17:38     ` Saeed Mahameed
2017-01-27 17:50       ` Tom Herbert
2017-01-27 18:05         ` Saeed Mahameed
2017-01-27 18:16           ` Tom Herbert
2017-01-27 18:28             ` Saeed Mahameed
2017-01-27 18:42               ` Tom Herbert
2017-01-27 21:15                 ` Saeed Mahameed
2017-01-27 23:23                   ` Alexei Starovoitov
2017-01-28 11:20                     ` Saeed Mahameed
2017-01-28 17:52                       ` Alexei Starovoitov
2017-01-29  9:11                         ` Saeed Mahameed
2017-01-30 16:45                           ` Alexei Starovoitov
2017-01-30 21:18                             ` Saeed Mahameed
2017-01-31  3:32                               ` Alexei Starovoitov
2017-01-31 14:44                                 ` Mohamad Haj Yahia
2017-01-27 18:19   ` Saeed Mahameed
2017-01-27 18:33     ` Tom Herbert
2017-01-27 20:59       ` Saeed Mahameed
2017-01-26 23:32 ` [PATCH net-next 2/4] mlx5: Make building SR-IOV configurable Tom Herbert
2017-01-26 23:32 ` [PATCH net-next 3/4] mlx5: Make building tc hardware offload configurable Tom Herbert
2017-01-27  6:29   ` kbuild test robot
2017-01-27 13:43   ` kbuild test robot
2017-01-26 23:32 ` [PATCH net-next 4/4] mlx5: Make building vxlan " Tom Herbert
2017-01-27 17:58 ` [PATCH net-next 0/4] mlx5: Create build configuration options Saeed Mahameed
2017-01-27 18:13   ` Tom Herbert
2017-01-28 11:38     ` Saeed Mahameed
2017-01-28 17:19       ` Tom Herbert
2017-01-29  8:07         ` Saeed Mahameed
2017-01-30 20:00           ` Tom Herbert
2017-01-30 21:26             ` Saeed Mahameed

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.