All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/10] net/mlx5: add support for multiport IB devices
@ 2019-02-28 18:02 Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 01/10] net/mlx5: add multiport IB device port structure Viacheslav Ovsiienko
                   ` (10 more replies)
  0 siblings, 11 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:02 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature. 
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context.

mlx5_pci_probe() routine is planned to be extended to probe
the ports of found Infiniband devices. All active ports,
belonging to the same Infiniband device will use the signle
shared Infiniband context of that device:
  - QP are created within shared context, with assigning to
    specified port
  - Verb flows are also created with specifying port index
  - Protection Domain is shared by all representors
  - Global Memory Regions Cache is shared by all representors,
    local ones are not (not in this patchset)
  - Shared Asynchronous Event Handler distributes events
    among the device ports (not in this patchset)
  - UAR mapping to user space is also shared (not in patchset)
	  
Each representor has its own attach network interface device,
interface name and index can be retrieved via IB device and
port index, so the link control is performed via Netlink
in the same way as in existing implementation.
	  
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Viacheslav Ovsiienko (10):
  net/mlx5: add multiport IB device port structure
  net/mlx5: modify get ifindex routine for multiport IB
  net/mlx5: add getting IB ports number for multiport IB
  net/mlx5: add multiport IB device support to probing
  net/mlx5: add IB shared context alloc/free functions
  net/mlx5: switch to the names in the shared IB context
  net/mlx5: switch to the shared Protection Domain
  net/mlx5: switch to the shared context IB attributes
  net/mlx5: switch to the shared IB device context
  net/mlx5: provide IB port for the object being created

 drivers/net/mlx5/mlx5.c            | 529 ++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5.h            |  34 ++-
 drivers/net/mlx5/mlx5_ethdev.c     |  29 +-
 drivers/net/mlx5/mlx5_flow.c       |   2 +
 drivers/net/mlx5/mlx5_flow_dv.c    |   8 +-
 drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
 drivers/net/mlx5/mlx5_mr.c         |   4 +-
 drivers/net/mlx5/mlx5_nl.c         |  65 ++++-
 drivers/net/mlx5/mlx5_rxq.c        |  41 +--
 drivers/net/mlx5/mlx5_socket.c     |   4 +-
 drivers/net/mlx5/mlx5_stats.c      |  22 +-
 drivers/net/mlx5/mlx5_trigger.c    |   2 +-
 drivers/net/mlx5/mlx5_txq.c        |  24 +-
 13 files changed, 545 insertions(+), 226 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [RFC 01/10] net/mlx5: add multiport IB device port structure
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
@ 2019-02-28 18:02 ` Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 02/10] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:02 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5384453..3487eed 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -186,10 +186,36 @@ struct mlx5_drop {
 
 struct mlx5_flow_tcf_context;
 
+/* Per port data of shared IB device. */
+struct mlx5_ibv_shared_port {
+	uint32_t port_id;
+};
+
+/*
+ * Shared Infiniband device context for Master/Representors
+ * which belong to same IB device with multiple IB ports.
+ **/
+struct mlx5_ibv_shared {
+	LIST_ENTRY(mlx5_ibv_shared) next;
+	uint32_t refcnt;
+	uint32_t devx:1; /* Opened with DV. */
+	uint32_t secondary:1; /* Temporary spawned by secondary process. */
+	uint32_t max_port; /* Maximal IB device port index. */
+	struct ibv_context *ctx; /* Verbs/DV context. */
+	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
+	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
+	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
+};
+
 struct mlx5_priv {
 	LIST_ENTRY(mlx5_priv) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
+	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
+	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 02/10] net/mlx5: modify get ifindex routine for multiport IB
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 01/10] net/mlx5: add multiport IB device port structure Viacheslav Ovsiienko
@ 2019-02-28 18:02 ` Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 03/10] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:02 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindexassociated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 740c5e2..d7226fc 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1420,7 +1420,7 @@ struct mlx5_dev_spawn_data {
 			list[i].ifindex = 0;
 		else
 			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name);
+				(nl_rdma, list[i].ibv_dev->name, 1);
 		if (nl_route < 0 ||
 		    !list[i].ifindex ||
 		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3487eed..b259841 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -443,7 +443,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_ifindex(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 664f485..ebc63df 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -222,7 +222,7 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index aaa65b5..825f94b 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
 	int mac_n; /**< Number of addresses in the array. */
 };
 
-/** Data structure used by mlx5_nl_ifindex_cb(). */
+/** Data structure used by mlx5_nl_cmdget_cb(). */
 struct mlx5_nl_ifindex_data {
 	const char *name; /**< IB device name (in). */
 	uint32_t ibindex; /**< IB device index (out). */
 	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number. */
 };
 
 /**
@@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
 {
 	struct mlx5_nl_ifindex_data *data = arg;
 	size_t off = NLMSG_HDRLEN;
 	uint32_t ibindex = 0;
 	uint32_t ifindex = 0;
+	uint32_t portnum = 0;
 	int found = 0;
 
 	if (nh->nlmsg_type !=
@@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
 		case RDMA_NLDEV_ATTR_NDEV_INDEX:
 			ifindex = *(uint32_t *)payload;
 			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			portnum = *(uint32_t *)payload;
+			break;
 		default:
 			break;
 		}
@@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
 	if (found) {
 		data->ibindex = ibindex;
 		data->ifindex = ifindex;
+		data->portnum = portnum;
 	}
 	return 0;
 error:
@@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
  *   Netlink socket of the RDMA kind (NETLINK_RDMA).
  * @param[in] name
  *   IB device name.
- *
+ * @param[in] pindex
+ *   IB device port index, starting from 1
  * @return
  *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
  *   is set.
  */
 unsigned int
-mlx5_nl_ifindex(int nl, const char *name)
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	static const uint32_t pindex = 1;
 	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
@@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ibindex)
@@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ifindex)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 03/10] net/mlx5: add getting IB ports number for multiport IB
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 01/10] net/mlx5: add multiport IB device port structure Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 02/10] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-02-28 18:02 ` Viacheslav Ovsiienko
  2019-02-28 18:02 ` [RFC 04/10] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:02 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on this ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h    |  1 +
 drivers/net/mlx5/mlx5_nl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b259841..6fba29c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -443,6 +443,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 825f94b..65292c4 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -826,6 +826,51 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	uint32_t seq = random();
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.ibindex = 0,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, seq);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!data.ibindex) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 04/10] net/mlx5: add multiport IB device support to probing
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (2 preceding siblings ...)
  2019-02-28 18:02 ` [RFC 03/10] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-02-28 18:02 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 05/10] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:02 UTC (permalink / raw)
  To: dev; +Cc: shahafs

mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the signle shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 210 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d7226fc..c6ddeb3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -131,6 +131,16 @@
 /** Driver-specific log messages type. */
 int mlx5_logtype;
 
+/** Data associated with devices to spawn. */
+struct mlx5_dev_spawn_data {
+	uint32_t ifindex; /**< Network interface index. */
+	uint32_t max_port; /**< IB device maximal port index. */
+	uint32_t ibv_port; /**< IB device physical port index. */
+	struct mlx5_switch_info info; /**< Switch information. */
+	struct ibv_device *ibv_dev; /**< Associated IB device. */
+	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
+};
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -717,12 +727,10 @@
  *
  * @param dpdk_dev
  *   Backing DPDK device.
- * @param ibv_dev
- *   Verbs device.
+ * @param spawn
+ *   Verbs device parameters (name, port, switch_info) to spawn.
  * @param config
  *   Device configuration parameters.
- * @param[in] switch_info
- *   Switch properties of Ethernet device.
  *
  * @return
  *   A valid Ethernet device object on success, NULL otherwise and rte_errno
@@ -733,10 +741,11 @@
  */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
-	       struct ibv_device *ibv_dev,
-	       struct mlx5_dev_config config,
-	       const struct mlx5_switch_info *switch_info)
+	       struct mlx5_dev_spawn_data *spawn,
+	       struct mlx5_dev_config config)
 {
+	const struct mlx5_switch_info *switch_info = &spawn->info;
+	struct ibv_device *ibv_dev = spawn->ibv_dev;
 	struct ibv_context *ctx = NULL;
 	struct ibv_device_attr_ex attr;
 	struct ibv_port_attr port_attr;
@@ -950,7 +959,7 @@
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, 1, &port_attr);
+	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -1295,14 +1304,6 @@
 	return NULL;
 }
 
-/** Data associated with devices to spawn. */
-struct mlx5_dev_spawn_data {
-	unsigned int ifindex; /**< Network interface index. */
-	struct mlx5_switch_info info; /**< Switch information. */
-	struct ibv_device *ibv_dev; /**< Associated IB device. */
-	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
-};
-
 /**
  * Comparison callback to sort device data.
  *
@@ -1359,7 +1360,9 @@ struct mlx5_dev_spawn_data {
 	       struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
-	unsigned int n = 0;
+	unsigned int nd = 0;
+	unsigned int np = 0;
+	unsigned int ns = 0;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -1371,8 +1374,14 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
 		return -rte_errno;
 	}
-
+	/*
+	 * First scan the list of all Infiniband devices to find
+	 * matching ones, gathering into the list.
+	 */
 	struct ibv_device *ibv_match[ret + 1];
+	int nl_route = -1;
+	int nl_rdma = -1;
+	unsigned int i;
 
 	while (ret-- > 0) {
 		struct rte_pci_addr pci_addr;
@@ -1387,77 +1396,183 @@ struct mlx5_dev_spawn_data {
 			continue;
 		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 			ibv_list[ret]->name);
-		ibv_match[n++] = ibv_list[ret];
+		ibv_match[nd++] = ibv_list[ret];
+	}
+	ibv_match[nd] = NULL;
+	if (!nd) {
+		/* No device macthes, just complain and bail out. */
+		mlx5_glue->free_device_list(ibv_list);
+		DRV_LOG(WARNING,
+			"no Verbs device matches PCI device " PCI_PRI_FMT ","
+			" are kernel drivers loaded?",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+		return ret;
+	}
+	nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
+	if (nd == 1) {
+		/*
+		 * Found single matching device may have multiple ports.
+		 * Each port may be representor, we have to check the port
+		 * number and check the representors existence.
+		 */
+		if (nl_rdma >= 0)
+			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name);
+		if (!np)
+			DRV_LOG(WARNING, "can not get IB device \"%s\""
+					 " ports number", ibv_match[0]->name);
 	}
-	ibv_match[n] = NULL;
-
-	struct mlx5_dev_spawn_data list[n];
-	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
-	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
-	unsigned int i;
-	unsigned int u;
-
 	/*
-	 * The existence of several matching entries (n > 1) means port
-	 * representors have been instantiated. No existing Verbs call nor
-	 * /sys entries can tell them apart, this can only be done through
-	 * Netlink calls assuming kernel drivers are recent enough to
-	 * support them.
-	 *
-	 * In the event of identification failure through Netlink, try again
-	 * through sysfs, then either:
-	 *
-	 * 1. No device matches (n == 0), complain and bail out.
-	 * 2. A single IB device matches (n == 1) and is not a representor,
-	 *    assume no switch support.
-	 * 3. Otherwise no safe assumptions can be made; complain louder and
-	 *    bail out.
+	 * Now we can determine the maximal
+	 * amount of devices to be spawned.
 	 */
-	for (i = 0; i != n; ++i) {
-		list[i].ibv_dev = ibv_match[i];
-		list[i].eth_dev = NULL;
-		if (nl_rdma < 0)
-			list[i].ifindex = 0;
-		else
-			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name, 1);
-		if (nl_route < 0 ||
-		    !list[i].ifindex ||
-		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
-					&list[i].info) ||
-		    ((!list[i].info.representor && !list[i].info.master) &&
-		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
-			list[i].ifindex = 0;
-			memset(&list[i].info, 0, sizeof(list[i].info));
-			continue;
+	struct mlx5_dev_spawn_data list[np ? np : nd];
+
+	if (np > 1) {
+		/*
+		 * Signle IB device with multiple ports found,
+		 * it may be E-Switch master device and representors.
+		 * We have to perform identification trough the ports.
+		 */
+		assert(nl_rdma >= 0);
+		assert(ns == 0);
+		assert(nd == 1);
+		for (i = 1; i <= np; ++i) {
+			list[ns].max_port = np;
+			list[ns].ibv_port = i;
+			list[ns].ibv_dev = ibv_match[0];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, i);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified port, it means there is no
+				 * representor on this port. It's OK,
+				 * there can be disabled ports, for example
+				 * if sriov_numvfs < sriov_totalvfs.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master))
+				ns++;
 		}
-	}
-	if (nl_rdma >= 0)
-		close(nl_rdma);
-	if (nl_route >= 0)
-		close(nl_route);
-	/* Count unidentified devices. */
-	for (u = 0, i = 0; i != n; ++i)
-		if (!list[i].info.master && !list[i].info.representor)
-			++u;
-	if (u) {
-		if (n == 1 && u == 1) {
-			/* Case #2. */
-			DRV_LOG(INFO, "no switch support detected");
-		} else {
-			/* Case #3. */
+		if (!ns) {
+			DRV_LOG(ERR,
+				"unable to recognize master/representors"
+				" on the IB device with multiple ports");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
+	} else {
+		/*
+		 * The existence of several matching entries (nd > 1) means
+		 * port representors have been instantiated. No existing Verbs
+		 * call nor sysfs entries can tell them apart, this can only
+		 * be done through Netlink calls assuming kernel drivers are
+		 * recent enough to support them.
+		 *
+		 * In the event of identification failure through Netlink,
+		 * try again through sysfs, then:
+		 *
+		 * 1. A single IB device matches (nd == 1) with single
+		 *    port (np=0/1) and is not a representor, assume
+		 *    no switch support.
+		 *
+		 * 2. Otherwise no safe assumptions can be made;
+		 *    complain louder and bail out.
+		 */
+		np = 1;
+		for (i = 0; i != nd; ++i) {
+			memset(&list[ns].info, 0, sizeof(list[ns].info));
+			list[ns].max_port = 1;
+			list[ns].ibv_port = 1;
+			list[ns].ibv_dev = ibv_match[i];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = 0;
+			if (nl_rdma >= 0)
+				list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, 1);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified device, it means there it is not
+				 * a representor/master.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master)) {
+				ns++;
+			} else if ((nd == 1) &&
+				   !list[ns].info.representor &&
+				   !list[ns].info.master) {
+				/*
+				 * Single IB device with
+				 * one physical port and
+				 * attached network device.
+				 * May be SRIOV is not enabled
+				 * or there is no representors.
+				 */
+				DRV_LOG(INFO, "no E-Switch support detected");
+				ns++;
+				break;
+			}
+		}
+		if (!ns) {
 			DRV_LOG(ERR,
-				"unable to tell which of the matching devices"
-				" is the master (lack of kernel support?)");
-			n = 0;
+				"unable to recognize master/representors"
+				" on the multiple IB devices");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
 		}
 	}
+	assert(ns);
 	/*
 	 * Sort list to probe devices in natural order for users convenience
 	 * (i.e. master first, then representors from lowest to highest ID).
 	 */
-	if (n)
-		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
+	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
 	/* Default configuration. */
 	dev_config = (struct mlx5_dev_config){
 		.hw_padding = 0,
@@ -1476,7 +1591,7 @@ struct mlx5_dev_spawn_data {
 			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
 		},
 	};
-	/* Device speicific configuration. */
+	/* Device specific configuration. */
 	switch (pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS_BLUEFIELD;
@@ -1493,12 +1608,12 @@ struct mlx5_dev_spawn_data {
 	/* Set architecture-dependent default value if unset. */
 	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
-	for (i = 0; i != n; ++i) {
+	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
-						 list[i].ibv_dev, dev_config,
-						 &list[i].info);
+						 &list[i],
+						 dev_config);
 		if (!list[i].eth_dev) {
 			if (rte_errno != EBUSY && rte_errno != EEXIST)
 				break;
@@ -1511,16 +1626,7 @@ struct mlx5_dev_spawn_data {
 		list[i].eth_dev->data->dev_flags |= restore;
 		rte_eth_dev_probing_finish(list[i].eth_dev);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!n) {
-		DRV_LOG(WARNING,
-			"no Verbs device matches PCI device " PCI_PRI_FMT ","
-			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		ret = -rte_errno;
-	} else if (i != n) {
+	if (i != ns) {
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
@@ -1542,6 +1648,18 @@ struct mlx5_dev_spawn_data {
 	} else {
 		ret = 0;
 	}
+exit:
+	/*
+	 * Do the routine cleanup:
+	 * - close opened Netlink sockets
+	 * - free the Infiniband device list
+	 */
+	if (nl_rdma >= 0)
+		close(nl_rdma);
+	if (nl_route >= 0)
+		close(nl_route);
+	assert(ibv_list);
+	mlx5_glue->free_device_list(ibv_list);
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 05/10] net/mlx5: add IB shared context alloc/free functions
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (3 preceding siblings ...)
  2019-02-28 18:02 ` [RFC 04/10] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 06/10] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The functions to allocate and free shared IB context for
multiport is added. The IB device context, Protection
Domain, device attributes, Infiniband names are going
to be relocated to the shared structure from the device
private one. mlx5_dev_spawn() is updated to create shared
context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 232 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 174 insertions(+), 58 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c6ddeb3..4df6496 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -141,6 +141,148 @@ struct mlx5_dev_spawn_data {
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
 };
 
+static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
+
+/**
+ * Allocate shared IB device context. If there is multiport device the
+ * master and representors will share this context, if there is single
+ * port dedicated IB device, the context will be used by only given
+ * port due to unification.
+ *
+ * Routine first searches the context for the spesified IB device name,
+ * if found the shared context assumed and reference counter is incremented.
+ * If no context found the new one is created and initialized with specified
+ * IB device context and parameters.
+ *
+ * @param[in] spawn
+ *   Pointer to the IB device attributes (name, port, etc).
+ *
+ * @return
+ *   Pointer to mlx5_ibv_shared object on success,
+ *   otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_ibv_shared *
+mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn)
+{
+	struct mlx5_ibv_shared *sh;
+	int err = 0;
+
+	assert(spawn);
+	/* Search for IB context by device name. */
+	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
+		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
+			assert(!sh->secondary);
+			sh->refcnt++;
+			return sh;
+		}
+	}
+	/* No device found, we have to create new sharted context. */
+	assert(spawn->max_port);
+	sh = rte_zmalloc("ethdev shared ib context",
+			 sizeof(struct mlx5_ibv_shared) +
+			 spawn->max_port *
+			 sizeof(struct mlx5_ibv_shared_port),
+			 RTE_CACHE_LINE_SIZE);
+	if (!sh) {
+		DRV_LOG(ERR, "shared context allocation failure");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	/* Try to open IB device with DV first, then usual Verbs. */
+	errno = 0;
+	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
+	if (sh->ctx) {
+		sh->devx = 1;
+		DRV_LOG(DEBUG, "DevX is supported");
+	} else {
+		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
+		if (!sh->ctx) {
+			err = errno ? errno : ENODEV;
+			goto error;
+		}
+		DRV_LOG(DEBUG, "DevX is NOT supported");
+	}
+	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh->device_attr);
+	if (err) {
+		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
+		goto error;
+	}
+	sh->refcnt = 1;
+	sh->max_port = spawn->max_port;
+	strncpy(sh->ibdev_name, sh->ctx->device->name,
+		sizeof(sh->ibdev_name));
+	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
+		sizeof(sh->ibdev_path));
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		/*
+		 * For secondary process we just open the IB device
+		 * and get attributes, there will no be real usage
+		 * of this structure, the secondary process will
+		 * use one from prpimary.
+		 */
+		sh->secondary = 1;
+		return sh;
+	}
+	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
+	if (sh->pd == NULL) {
+		DRV_LOG(ERR, "PD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	return sh;
+error:
+	assert(sh);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+	assert(err > 0);
+	rte_errno = err;
+	return NULL;
+}
+
+/**
+ * Free shared IB device context. Decrement counter and if zero free
+ * all allocated resources and close handles.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free
+ */
+static void
+mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh)
+{
+#ifndef NDEBUG
+	/* Check the object presence in the list. */
+	struct mlx5_ibv_shared *lctx;
+
+	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
+		if (lctx == sh)
+			break;
+	assert(lctx);
+	if (lctx != sh) {
+		DRV_LOG(ERR, "Freeing non-existing shared IB context");
+		return;
+	}
+#endif
+	assert(sh);
+	assert(sh->refcnt);
+	if (--sh->refcnt)
+		return;
+	/* Zero reference counter, we should release resources. */
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		assert(sh->secondary);
+		assert(sh->ctx);
+		assert(!sh->pd);
+	}
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+}
+
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -290,12 +432,10 @@ struct mlx5_dev_spawn_data {
 	}
 	mlx5_mprq_free_mp(dev);
 	mlx5_mr_release(dev);
-	if (priv->pd != NULL) {
-		assert(priv->ctx != NULL);
-		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
-		claim_zero(mlx5_glue->close_device(priv->ctx));
-	} else
-		assert(priv->ctx == NULL);
+	assert(priv->sh);
+	if (priv->sh)
+		mlx5_free_shared_ibctx(priv->sh);
+	priv->sh = NULL;
 	if (priv->rss_conf.rss_key != NULL)
 		rte_free(priv->rss_conf.rss_key);
 	if (priv->reta_idx != NULL)
@@ -745,11 +885,8 @@ struct mlx5_dev_spawn_data {
 	       struct mlx5_dev_config config)
 {
 	const struct mlx5_switch_info *switch_info = &spawn->info;
-	struct ibv_device *ibv_dev = spawn->ibv_dev;
-	struct ibv_context *ctx = NULL;
-	struct ibv_device_attr_ex attr;
+	struct mlx5_ibv_shared *sh;
 	struct ibv_port_attr port_attr;
-	struct ibv_pd *pd = NULL;
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
@@ -805,18 +942,10 @@ struct mlx5_dev_spawn_data {
 	}
 	/* Prepare shared data between primary and secondary process. */
 	mlx5_prepare_shared_data();
-	errno = 0;
-	ctx = mlx5_glue->dv_open_device(ibv_dev);
-	if (ctx) {
-		config.devx = 1;
-		DRV_LOG(DEBUG, "DEVX is supported");
-	} else {
-		ctx = mlx5_glue->open_device(ibv_dev);
-		if (!ctx) {
-			rte_errno = errno ? errno : ENODEV;
-			return NULL;
-		}
-	}
+	sh = mlx5_alloc_shared_ibctx(spawn);
+	if (!sh)
+		return NULL;
+	config.devx = sh->devx;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -830,7 +959,7 @@ struct mlx5_dev_spawn_data {
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-	mlx5_glue->dv_query_device(ctx, &dv_attr);
+	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
 	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
 		if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
 			DRV_LOG(DEBUG, "enhanced MPW is supported");
@@ -915,11 +1044,6 @@ struct mlx5_dev_spawn_data {
 		" old OFED/rdma-core version or firmware configuration");
 #endif
 	config.mpls_en = mpls_en;
-	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
-	if (err) {
-		DEBUG("ibv_query_device_ex() failed");
-		goto error;
-	}
 	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		eth_dev = rte_eth_dev_attach_secondary(name);
@@ -955,11 +1079,11 @@ struct mlx5_dev_spawn_data {
 		 */
 		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
 		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
-		claim_zero(mlx5_glue->close_device(ctx));
+		mlx5_free_shared_ibctx(sh);
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
+	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -973,13 +1097,7 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
 			mlx5_glue->port_state_str(port_attr.state),
 			port_attr.state);
-	/* Allocate protection domain. */
-	pd = mlx5_glue->alloc_pd(ctx);
-	if (pd == NULL) {
-		DRV_LOG(ERR, "PD allocation failure");
-		err = ENOMEM;
-		goto error;
-	}
+	/* Allocate private eth device data. */
 	priv = rte_zmalloc("ethdev private structure",
 			   sizeof(*priv),
 			   RTE_CACHE_LINE_SIZE);
@@ -988,13 +1106,11 @@ struct mlx5_dev_spawn_data {
 		err = ENOMEM;
 		goto error;
 	}
-	priv->ctx = ctx;
-	strncpy(priv->ibdev_name, priv->ctx->device->name,
-		sizeof(priv->ibdev_name));
-	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
-		sizeof(priv->ibdev_path));
-	priv->device_attr = attr;
-	priv->pd = pd;
+	priv->sh = sh;
+	priv->ctx = sh->ctx;
+	priv->ibv_port = spawn->ibv_port;
+	priv->device_attr = sh->device_attr;
+	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
@@ -1048,7 +1164,8 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
-	config.hw_csum = !!(attr.device_cap_flags_ex & IBV_DEVICE_RAW_IP_CSUM);
+	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
+			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
 		(config.hw_csum ? "" : "not "));
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
@@ -1062,7 +1179,7 @@ struct mlx5_dev_spawn_data {
 	}
 #endif
 	config.ind_table_max_size =
-		attr.rss_caps.max_rwq_indirection_table_size;
+		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
 	/*
 	 * Remove this check once DPDK supports larger/variable
 	 * indirection tables.
@@ -1071,18 +1188,18 @@ struct mlx5_dev_spawn_data {
 		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
 	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
 		config.ind_table_max_size);
-	config.hw_vlan_strip = !!(attr.raw_packet_caps &
+	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
 				  IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
 	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
 		(config.hw_vlan_strip ? "" : "not "));
-	config.hw_fcs_strip = !!(attr.raw_packet_caps &
+	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
 				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config.hw_fcs_strip ? "" : "not "));
 #if defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
-	hw_padding = !!attr.rx_pad_end_addr_align;
+	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
 #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
-	hw_padding = !!(attr.device_cap_flags_ex &
+	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
 			IBV_DEVICE_PCI_WRITE_END_PADDING);
 #endif
 	if (config.hw_padding && !hw_padding) {
@@ -1091,11 +1208,11 @@ struct mlx5_dev_spawn_data {
 	} else if (config.hw_padding) {
 		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
 	}
-	config.tso = (attr.tso_caps.max_tso > 0 &&
-		      (attr.tso_caps.supported_qpts &
+	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
+		      (sh->device_attr.tso_caps.supported_qpts &
 		       (1 << IBV_QPT_RAW_PACKET)));
 	if (config.tso)
-		config.tso_max_payload_sz = attr.tso_caps.max_tso;
+		config.tso_max_payload_sz = sh->device_attr.tso_caps.max_tso;
 	/*
 	 * MPW is disabled by default, while the Enhanced MPW is enabled
 	 * by default.
@@ -1236,7 +1353,8 @@ struct mlx5_dev_spawn_data {
 		.free = &mlx5_free_verbs_buf,
 		.data = priv,
 	};
-	mlx5_glue->dv_set_context_attr(ctx, MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
+	mlx5_glue->dv_set_context_attr(sh->ctx,
+				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
 				       (void *)((uintptr_t)&alctr));
 	/* Bring Ethernet device up. */
 	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
@@ -1290,15 +1408,13 @@ struct mlx5_dev_spawn_data {
 		if (eth_dev != NULL)
 			eth_dev->data->dev_private = NULL;
 	}
-	if (pd)
-		claim_zero(mlx5_glue->dealloc_pd(pd));
 	if (eth_dev != NULL) {
 		/* mac_addrs must not be freed alone because part of dev_private */
 		eth_dev->data->mac_addrs = NULL;
 		rte_eth_dev_release_port(eth_dev);
 	}
-	if (ctx)
-		claim_zero(mlx5_glue->close_device(ctx));
+	if (sh)
+		mlx5_free_shared_ibctx(sh);
 	assert(err > 0);
 	rte_errno = err;
 	return NULL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 06/10] net/mlx5: switch to the names in the shared IB context
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (4 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 05/10] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 07/10] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
 drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6fba29c..27f208e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -219,8 +219,6 @@ struct mlx5_priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
-	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ebc63df..3452bf9 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -138,8 +138,10 @@ struct ethtool_link_settings {
 	unsigned int dev_port_prev = ~0u;
 	char match[IF_NAMESIZE] = "";
 
+	assert(priv);
+	assert(priv->sh);
 	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
+		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
 
 		dir = opendir(path);
 		if (dir == NULL) {
@@ -159,7 +161,7 @@ struct ethtool_link_settings {
 			continue;
 
 		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
+		      priv->sh->ibdev_path, name,
 		      (dev_type ? "dev_id" : "dev_port"));
 
 		file = fopen(path, "rb");
@@ -222,7 +224,9 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma,
+				priv->sh->ibdev_name,
+				priv->ibv_port) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 6906dc8..5af199d 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -140,18 +140,22 @@
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
 {
 	FILE *file;
-	MKSTR(path, "%s/ports/1/hw_counters/%s",
-		  priv->ibdev_path,
-		  ctr_name);
+	if (priv->sh) {
+		MKSTR(path, "%s/ports/%d/hw_counters/%s",
+			  priv->sh->ibdev_path,
+			  priv->ibv_port,
+			  ctr_name);
 
-	file = fopen(path, "rb");
-	if (file) {
-		int n = fscanf(file, "%" SCNu64, stat);
+		file = fopen(path, "rb");
+		if (file) {
+			int n = fscanf(file, "%" SCNu64, stat);
 
-		fclose(file);
-		if (n != 1)
-			stat = 0;
+			fclose(file);
+			if (n == 1)
+				return;
+		}
 	}
+	*stat = 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 07/10] net/mlx5: switch to the shared Protection Domain
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (5 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 06/10] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 08/10] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The PMD code is updated to use Protected Domain from the
the shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     |  1 -
 drivers/net/mlx5/mlx5.h     |  1 -
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +++++-----
 drivers/net/mlx5/mlx5_txq.c |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4df6496..d895aec 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1110,7 +1110,6 @@ struct mlx5_dev_spawn_data {
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->device_attr = sh->device_attr;
-	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 27f208e..dd0b9fa 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -218,7 +218,6 @@ struct mlx5_priv {
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d..f7eb9a5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -719,7 +719,7 @@ struct mr_update_mp_data {
 	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
 	 * through mlx5_alloc_verbs_buf().
 	 */
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)data.start, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DEBUG("port %u fail to create a verbs MR for address (%p)",
@@ -1156,7 +1156,7 @@ struct mr_update_mp_data {
 	}
 	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
 		dev->data->port_id, mem_idx, mp->name);
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)addr, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DRV_LOG(WARNING,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2f60999..0496c4e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -867,7 +867,7 @@ struct mlx5_rxq_ibv *
 		.max_wr = wqe_n >> rxq_data->sges_n,
 		/* Max number of scatter/gather elements in a WR. */
 		.max_sge = 1 << rxq_data->sges_n,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.cq = tmpl->cq,
 		.comp_mask =
 			IBV_WQ_FLAGS_CVLAN_STRIPPING |
@@ -1831,7 +1831,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 },
 		 &qp_init_attr);
 #else
@@ -1850,7 +1850,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 });
 #endif
 	if (!qp) {
@@ -2006,7 +2006,7 @@ struct mlx5_rxq_ibv *
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
 			.max_sge = 1,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 			.cq = cq,
 		 });
 	if (!wq) {
@@ -2160,7 +2160,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = 0,
 				},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd
+			.pd = priv->sh->pd
 		 });
 	if (!qp) {
 		DEBUG("port %u cannot allocate QP for drop queue",
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d185617..d3a5498 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -426,7 +426,7 @@ struct mlx5_txq_ibv *
 		 * Tx burst.
 		 */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (txq_data->max_inline)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 08/10] net/mlx5: switch to the shared context IB attributes
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (6 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 07/10] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 09/10] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multuple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 -
 drivers/net/mlx5/mlx5.h        |  1 -
 drivers/net/mlx5/mlx5_ethdev.c |  6 +++---
 drivers/net/mlx5/mlx5_rxq.c    |  8 ++++----
 drivers/net/mlx5/mlx5_txq.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d895aec..3a5edb5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1109,7 +1109,6 @@ struct mlx5_dev_spawn_data {
 	priv->sh = sh;
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
-	priv->device_attr = sh->device_attr;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index dd0b9fa..f922526 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -217,7 +217,6 @@ struct mlx5_priv {
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3452bf9..368d71d 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -515,8 +515,8 @@ struct ethtool_link_settings {
 	 * Since we need one CQ per QP, the limit is the minimum number
 	 * between the two values.
 	 */
-	max = RTE_MIN(priv->device_attr.orig_attr.max_cq,
-		      priv->device_attr.orig_attr.max_qp);
+	max = RTE_MIN(priv->sh->device_attr.orig_attr.max_cq,
+		      priv->sh->device_attr.orig_attr.max_qp);
 	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
 	if (max >= 65535)
 		max = 65535;
@@ -577,7 +577,7 @@ struct ethtool_link_settings {
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_device_attr *attr = &priv->device_attr.orig_attr;
+	struct ibv_device_attr *attr = &priv->sh->device_attr.orig_attr;
 	size_t size = strnlen(attr->fw_ver, sizeof(attr->fw_ver)) + 1;
 
 	if (fw_size < size)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0496c4e..fd1c3a2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,10 +856,10 @@ struct mlx5_rxq_ibv *
 		rte_errno = ENOMEM;
 		goto error;
 	}
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	attr.wq.ibv = (struct ibv_wq_init_attr){
 		.wq_context = NULL, /* Could be useful in the future. */
 		.wq_type = IBV_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d3a5498..10a3040 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -407,15 +407,15 @@ struct mlx5_txq_ibv *
 		.cap = {
 			/* Max number of outstanding WRs. */
 			.max_send_wr =
-				((priv->device_attr.orig_attr.max_qp_wr <
+				((priv->sh->device_attr.orig_attr.max_qp_wr <
 				  desc) ?
-				 priv->device_attr.orig_attr.max_qp_wr :
+				 priv->sh->device_attr.orig_attr.max_qp_wr :
 				 desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
 			 * too much memory. TX gather is not impacted by the
-			 * priv->device_attr.max_sge limit and will still work
+			 * device_attr.max_sge limit and will still work
 			 * properly.
 			 */
 			.max_send_sge = 1,
@@ -780,10 +780,10 @@ struct mlx5_txq_ctrl *
 	tmpl->txq.elts_n = log2above(desc);
 	tmpl->idx = idx;
 	txq_set_params(tmpl);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	tmpl->txq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->txq.elts_n])(tmpl + 1);
 	tmpl->txq.stats.idx = idx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 09/10] net/mlx5: switch to the shared IB device context
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (7 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 08/10] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-02-28 18:03 ` [RFC 10/10] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created whithin this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Conflicts:
	drivers/net/mlx5/mlx5.c
	drivers/net/mlx5/mlx5_ethdev.c
	drivers/net/mlx5/mlx5_flow_verbs.c
	drivers/net/mlx5/mlx5_rxq.c
---
 drivers/net/mlx5/mlx5.c            |  3 +--
 drivers/net/mlx5/mlx5.h            |  1 -
 drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
 drivers/net/mlx5/mlx5_flow_dv.c    |  8 ++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
 drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
 drivers/net/mlx5/mlx5_socket.c     |  4 ++--
 drivers/net/mlx5/mlx5_trigger.c    |  2 +-
 drivers/net/mlx5/mlx5_txq.c        |  4 ++--
 9 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3a5edb5..083e617 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -406,7 +406,7 @@ struct mlx5_dev_spawn_data {
 
 	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
 		dev->data->port_id,
-		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
 	mlx5_dev_interrupt_handler_uninstall(dev);
 	mlx5_traffic_disable(dev);
@@ -1107,7 +1107,6 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 	priv->sh = sh;
-	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f922526..77bb83e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -216,7 +216,6 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
-	struct ibv_context *ctx; /* Verbs context. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 368d71d..a4fb4df 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	/* Read all message and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->ctx, &event))
+		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 			event.event_type == IBV_EVENT_PORT_ERR) &&
@@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
 	int flags;
 
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	assert(ctx->async_fd > 0);
+	flags = fcntl(ctx->async_fd, F_GETFL);
+	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
 		DRV_LOG(INFO,
 			"port %u failed to change file descriptor async event"
@@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	if (dev->data->dev_conf.intr_conf.lsc ||
 	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
+		priv->intr_handle.fd = ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rte_intr_callback_register(&priv->intr_handle,
 					   mlx5_dev_interrupt_handler, dev);
@@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	struct ibv_device_attr device_attr;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
+	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
 		return 1;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ebcdd15..d4a1149 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -836,7 +836,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_packet_reformat
-			(priv->ctx, cache_resource->size,
+			(priv->sh->ctx, cache_resource->size,
 			 (cache_resource->size ? cache_resource->buf : NULL),
 			 cache_resource->reformat_type,
 			 cache_resource->ft_type);
@@ -1468,7 +1468,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_modify_header
-					(priv->ctx,
+					(priv->sh->ctx,
 					 cache_resource->actions_num *
 					 sizeof(cache_resource->actions[0]),
 					 (uint64_t *)cache_resource->actions,
@@ -1528,7 +1528,7 @@ struct field_modify_info modify_tcp[] = {
 		ret = -ENOMEM;
 		goto error_exit;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
+	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
 	if (ret)
 		goto error_exit;
 	struct mlx5_flow_counter tmpl = {
@@ -2787,7 +2787,7 @@ struct field_modify_info modify_tcp[] = {
 	if (matcher->egress)
 		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
 	cache_matcher->matcher_object =
-		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr);
+		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx, &dv_attr);
 	if (!cache_matcher->matcher_object) {
 		rte_free(cache_matcher);
 		return rte_flow_error_set(error, ENOMEM,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 6c4f52f..9548d12 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -56,10 +56,11 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context ctx = priv->sh->ctx;
 	struct ibv_counter_set_init_attr init = {
 			 .counter_set_id = counter->id};
 
-	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
@@ -67,12 +68,13 @@
 	return 0;
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_counters_init_attr init = {0};
 	struct ibv_counter_attach_attr attach;
 	int ret;
 
 	memset(&attach, 0, sizeof(attach));
-	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counters(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd1c3a2..dcb97c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
 	}
 	tmpl->rxq_ctrl = rxq_ctrl;
 	if (rxq_ctrl->irq) {
-		tmpl->channel = mlx5_glue->create_comp_channel(priv->ctx);
+		tmpl->channel = mlx5_glue->create_comp_channel(priv->sh->ctx);
 		if (!tmpl->channel) {
 			DRV_LOG(ERR, "port %u: comp channel creation failure",
 				dev->data->port_id);
@@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
 	}
 #endif
 	tmpl->cq = mlx5_glue->cq_ex_to_cq
-		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
+		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
 					 &attr.cq.mlx5));
 	if (tmpl->cq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
@@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
 			.two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT,
 		};
 	}
-	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
+	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
 					   &attr.wq.mlx5);
 #else
-	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
+	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
 #endif
 	if (tmpl->wq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
@@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
 	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
 		wq[i] = wq[j];
 	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = wq_n,
 			.ind_tbl = wq,
@@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
 	}
 #endif
 	qp = mlx5_glue->dv_create_qp
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
 		 &qp_init_attr);
 #else
 	qp = mlx5_glue->create_qp_ex
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
 mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_cq *cq;
 	struct ibv_wq *wq = NULL;
 	struct mlx5_rxq_ibv *rxq;
 
 	if (priv->drop_queue.rxq)
 		return priv->drop_queue.rxq;
-	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
+	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		DEBUG("port %u cannot allocate CQ for drop queue",
 		      dev->data->port_id);
 		rte_errno = errno;
 		goto error;
 	}
-	wq = mlx5_glue->create_wq(priv->ctx,
+	wq = mlx5_glue->create_wq(ctx,
 		 &(struct ibv_wq_init_attr){
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
@@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
 	if (!rxq)
 		return NULL;
 	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = 0,
 			.ind_tbl = &rxq->wq,
@@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
 	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
 	if (!ind_tbl)
 		return NULL;
-	qp = mlx5_glue->create_qp_ex(priv->ctx,
+	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
diff --git a/drivers/net/mlx5/mlx5_socket.c b/drivers/net/mlx5/mlx5_socket.c
index 41cac3c..8fa6430 100644
--- a/drivers/net/mlx5/mlx5_socket.c
+++ b/drivers/net/mlx5/mlx5_socket.c
@@ -185,9 +185,9 @@
 	assert(cmsg != NULL);
 	cmsg->cmsg_level = SOL_SOCKET;
 	cmsg->cmsg_type = SCM_RIGHTS;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
+	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
 	fd = (int *)CMSG_DATA(cmsg);
-	*fd = priv->ctx->cmd_fd;
+	*fd = priv->sh->ctx->cmd_fd;
 	ret = sendmsg(conn_sock, &msg, 0);
 	if (ret < 0)
 		DRV_LOG(WARNING, "port %u cannot send response",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2137bdc..d13a1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -58,7 +58,7 @@
 			goto error;
 		}
 	}
-	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
+	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
 	if (ret) {
 		/* Adjust index for rollback. */
 		i = priv->txqs_n - 1;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 10a3040..25305b6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (is_empw_burst_func(tx_pkt_burst))
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
+	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
 			dev->data->port_id, idx);
@@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
 		attr.init.max_tso_header = txq_ctrl->max_tso_header;
 		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 	}
-	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
+	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
 			dev->data->port_id, idx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [RFC 10/10] net/mlx5: provide IB port for the object being created
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (8 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 09/10] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-02-28 18:03 ` Viacheslav Ovsiienko
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  10 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-02-28 18:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to provide IB port index for the Verbs
objects being created - QP and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c       | 2 ++
 drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
 drivers/net/mlx5/mlx5_txq.c        | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index dea38e2..d09fdff 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {
 int
 mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
 		struct ibv_flow_attr attr;
 		struct ibv_flow_spec_eth eth;
@@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
 	} flow_attr = {
 		.attr = {
 			.num_of_specs = 2,
+			.port = (uint8_t)priv->ibv_port,
 		},
 		.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 9548d12..b69af6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1550,6 +1550,7 @@
 	dev_flow->layers = item_flags;
 	dev_flow->verbs.attr->priority =
 		mlx5_flow_adjust_priority(dev, priority, subpriority);
+	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 25305b6..5062f5c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
 	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = 1,
+		/* IB device port number. */
+		.port_num = (uint8_t)priv->ibv_port,
 	};
 	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
 				   (IBV_QP_STATE | IBV_QP_PORT));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 00/14] net/mlx5: add support for multiport IB devices
  2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                   ` (9 preceding siblings ...)
  2019-02-28 18:03 ` [RFC 10/10] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-21  8:11 ` Viacheslav Ovsiienko
  2019-03-21  8:11   ` [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
                     ` (15 more replies)
  10 siblings, 16 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature. 
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context.

mlx5_pci_probe() routine is planned to be extended to probe
the ports of found Infiniband devices. All active ports,
belonging to the same Infiniband device will use the signle
shared Infiniband context of that device:
  - QP are created within shared context, with assigning to
    specified port
  - Verb flows are also created with specifying port index
  - Protection Domain is shared by all representors
  - Global Memory Regions Cache is shared by all representors,
    local ones are not (not in this patchset)
  - Shared Asynchronous Event Handler distributes events
    among the device ports (not in this patchset)
  - UAR mapping to user space is also shared (not in patchset)
		  
Each representor has its own attach network interface device,
interface name and index can be retrieved via IB device and
port index, so the link control is performed via Netlink
in the same way as in existing implementation.

This patchset also superseedes http://patches.dpdk.org/patch/50486/
		  
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Viacheslav Ovsiienko (14):
  net/mlx5: add representor recognition on kernels 5.x
  net/mlx5: introduce multiport IB device shared structure
  net/mlx5: modify get ifindex routine for multiport IB
  net/mlx5: add getting IB ports number for multiport IB
  net/mlx5: add multiport IB device support to probing
  net/mlx5: add IB shared context alloc/free functions
  net/mlx5: switch to the names in the shared IB context
  net/mlx5: switch to the shared Protection Domain
  net/mlx5: switch to the shared context IB attributes
  net/mlx5: switch to the shared IB device context
  net/mlx5: provide IB port for the object being created
  net/mlx5: update install/uninstall int handler routines
  net/mlx5: update event handler for multiport IB devices
  net/mlx5: add source vport match to the ingress rules

 drivers/net/mlx5/Makefile          |  10 +
 drivers/net/mlx5/meson.build       |   4 +
 drivers/net/mlx5/mlx5.c            | 549 +++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5.h            |  38 ++-
 drivers/net/mlx5/mlx5_ethdev.c     | 241 ++++++++++------
 drivers/net/mlx5/mlx5_flow.c       |   2 +
 drivers/net/mlx5/mlx5_flow_dv.c    |  47 +++-
 drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
 drivers/net/mlx5/mlx5_mr.c         |   4 +-
 drivers/net/mlx5/mlx5_nl.c         | 101 ++++++-
 drivers/net/mlx5/mlx5_rxq.c        |  41 +--
 drivers/net/mlx5/mlx5_socket.c     |   4 +-
 drivers/net/mlx5/mlx5_stats.c      |  22 +-
 drivers/net/mlx5/mlx5_trigger.c    |   2 +-
 drivers/net/mlx5/mlx5_txq.c        |  24 +-
 15 files changed, 795 insertions(+), 301 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:13     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 02/14] net/mlx5: introduce multiport IB device shared structure Viacheslav Ovsiienko
                     ` (14 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on quiering the VF number,
created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence of device symlink in device sysfs folder is
added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---

v3: - rebased over new port naming http://patches.dpdk.org/patch/51245/
    - master recognition is reinforced by checking vport for -1
      for new port naming schema

v2: - fopen replaced with opendir to detect whether directory exists

v1: http://patches.dpdk.org/patch/50411/
---
 drivers/net/mlx5/Makefile      | 10 ++++++++++
 drivers/net/mlx5/meson.build   |  4 ++++
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
 drivers/net/mlx5/mlx5_nl.c     | 36 +++++++++++++++++++++++++++++++++---
 6 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ed299d..3dd7e38 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IFLA_PHYS_SWITCH_ID \
 		linux/if_link.h \
 		enum IFLA_PHYS_SWITCH_ID \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 0cf2f08..e3cb9bc 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -133,6 +133,10 @@ if build
 		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
 		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
 		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
 		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
 		'IFLA_PHYS_SWITCH_ID' ],
 		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad1975c..ea3d00c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -13,7 +13,6 @@
 #include <errno.h>
 #include <net/if.h>
 #include <sys/mman.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 
 /* Verbs header. */
@@ -1001,6 +1000,7 @@
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
 	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
+	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
 	priv->representor_id =
 		switch_info->representor ? switch_info->port_name : -1;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a88cb4a..58bc37f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -214,6 +214,7 @@ struct mlx5_priv {
 	uint16_t mtu; /* Configured MTU. */
 	unsigned int isolated:1; /* Whether isolated mode is enabled. */
 	unsigned int representor:1; /* Device is a port representor. */
+	unsigned int master:1; /* Device is a E-Switch master. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	int32_t representor_id; /* Port representor identifier. */
 	/* RX/TX queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 84d761c..81f2a42 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		.port_name = 0,
 		.switch_id = 0,
 	};
+	DIR *dir;
 	bool port_name_set = false;
 	bool port_switch_id_set = false;
+	bool device_dir = false;
 	char c;
 
 	if (!if_indextoname(ifindex, ifname)) {
@@ -1375,6 +1377,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	      ifname);
 	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
 	      ifname);
+	MKSTR(pci_device, "/sys/class/net/%s/device",
+	      ifname);
 
 	file = fopen(phys_port_name, "rb");
 	if (file != NULL) {
@@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
 		c == '\n';
 	fclose(file);
-	data.master = port_switch_id_set && !port_name_set;
-	data.representor = port_switch_id_set && port_name_set;
+	dir = opendir(pci_device);
+	if (dir != NULL) {
+		closedir(dir);
+		device_dir = true;
+	}
+	data.master = port_switch_id_set && (!port_name_set || device_dir);
+	data.representor = port_switch_id_set && port_name_set && !device_dir;
 	*info = data;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 8a10109..aa49cb4 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -65,6 +65,12 @@
 #endif
 
 /* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
 #ifndef HAVE_IFLA_PHYS_SWITCH_ID
 #define IFLA_PHYS_SWITCH_ID 36
 #endif
@@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
 	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
 	bool port_name_set = false;
 	bool switch_id_set = false;
+	bool num_vf_set = false;
 
 	if (nh->nlmsg_type != RTM_NEWLINK)
 		goto error;
@@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
 		if (ra->rta_len > nh->nlmsg_len - off)
 			goto error;
 		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
 		case IFLA_PHYS_PORT_NAME:
 			port_name_set =
 				mlx5_translate_port_name((char *)payload,
@@ -864,8 +874,19 @@ struct mlx5_nl_ifindex_data {
 		}
 		off += RTA_ALIGN(ra->rta_len);
 	}
-	info.master = switch_id_set && !port_name_set;
-	info.representor = switch_id_set && port_name_set;
+	if (switch_id_set) {
+		if (info.port_name_new) {
+			/* New representors naming schema. */
+			if (port_name_set) {
+				info.master = (info.port_name == -1);
+				info.representor = (info.port_name != -1);
+			}
+		} else {
+			/* Legacy representors naming schema. */
+			info.master = (!port_name_set || num_vf_set);
+			info.representor = port_name_set && !num_vf_set;
+		}
+	}
 	memcpy(arg, &info, sizeof(info));
 	return 0;
 error:
@@ -893,9 +914,13 @@ struct mlx5_nl_ifindex_data {
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
 	} req = {
 		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
 			.nlmsg_type = RTM_GETLINK,
 			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
 		},
@@ -903,6 +928,11 @@ struct mlx5_nl_ifindex_data {
 			.ifi_family = AF_UNSPEC,
 			.ifi_index = ifindex,
 		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(int32_t)),
+		},
+		.extmask = RTE_LE32(1),
 	};
 	int ret;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 02/14] net/mlx5: introduce multiport IB device shared structure
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-03-21  8:11   ` [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21  8:11   ` [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 58bc37f..1716cd1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -196,10 +196,36 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_MAX_TABLES 1024
 #define MLX5_GROUP_FACTOR 1
 
+/* Per port data of shared IB device. */
+struct mlx5_ibv_shared_port {
+	uint32_t port_id;
+};
+
+/*
+ * Shared Infiniband device context for Master/Representors
+ * which belong to same IB device with multiple IB ports.
+ **/
+struct mlx5_ibv_shared {
+	LIST_ENTRY(mlx5_ibv_shared) next;
+	uint32_t refcnt;
+	uint32_t devx:1; /* Opened with DV. */
+	uint32_t secondary:1; /* Temporary spawned by secondary process. */
+	uint32_t max_port; /* Maximal IB device port index. */
+	struct ibv_context *ctx; /* Verbs/DV context. */
+	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
+	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
+	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
+};
+
 struct mlx5_priv {
 	LIST_ENTRY(mlx5_priv) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
+	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
+	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-03-21  8:11   ` [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-21  8:11   ` [PATCH 02/14] net/mlx5: introduce multiport IB device shared structure Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 04/14] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
                     ` (12 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ea3d00c..89c30af 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1441,7 +1441,7 @@ struct mlx5_dev_spawn_data {
 			list[i].ifindex = 0;
 		else
 			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name);
+				(nl_rdma, list[i].ibv_dev->name, 1);
 		if (nl_route < 0 ||
 		    !list[i].ifindex ||
 		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1716cd1..f5e29ca 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -470,7 +470,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_ifindex(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 81f2a42..5b44889 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -222,7 +222,7 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index aa49cb4..6e9145a 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
 	int mac_n; /**< Number of addresses in the array. */
 };
 
-/** Data structure used by mlx5_nl_ifindex_cb(). */
+/** Data structure used by mlx5_nl_cmdget_cb(). */
 struct mlx5_nl_ifindex_data {
 	const char *name; /**< IB device name (in). */
 	uint32_t ibindex; /**< IB device index (out). */
 	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number. */
 };
 
 /**
@@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
 {
 	struct mlx5_nl_ifindex_data *data = arg;
 	size_t off = NLMSG_HDRLEN;
 	uint32_t ibindex = 0;
 	uint32_t ifindex = 0;
+	uint32_t portnum = 0;
 	int found = 0;
 
 	if (nh->nlmsg_type !=
@@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
 		case RDMA_NLDEV_ATTR_NDEV_INDEX:
 			ifindex = *(uint32_t *)payload;
 			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			portnum = *(uint32_t *)payload;
+			break;
 		default:
 			break;
 		}
@@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
 	if (found) {
 		data->ibindex = ibindex;
 		data->ifindex = ifindex;
+		data->portnum = portnum;
 	}
 	return 0;
 error:
@@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
  *   Netlink socket of the RDMA kind (NETLINK_RDMA).
  * @param[in] name
  *   IB device name.
- *
+ * @param[in] pindex
+ *   IB device port index, starting from 1
  * @return
  *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
  *   is set.
  */
 unsigned int
-mlx5_nl_ifindex(int nl, const char *name)
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	static const uint32_t pindex = 1;
 	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
@@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ibindex)
@@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ifindex)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 04/14] net/mlx5: add getting IB ports number for multiport IB
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 05/14] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
                     ` (11 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on this ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h    |  1 +
 drivers/net/mlx5/mlx5_nl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f5e29ca..528ae02 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -470,6 +470,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 6e9145a..81483b8 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -826,6 +826,51 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	uint32_t seq = random();
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.ibindex = 0,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, seq);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!data.ibindex) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 05/14] net/mlx5: add multiport IB device support to probing
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 04/14] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
                     ` (10 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the signle shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 210 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 89c30af..100e9f4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -130,6 +130,16 @@
 /** Driver-specific log messages type. */
 int mlx5_logtype;
 
+/** Data associated with devices to spawn. */
+struct mlx5_dev_spawn_data {
+	uint32_t ifindex; /**< Network interface index. */
+	uint32_t max_port; /**< IB device maximal port index. */
+	uint32_t ibv_port; /**< IB device physical port index. */
+	struct mlx5_switch_info info; /**< Switch information. */
+	struct ibv_device *ibv_dev; /**< Associated IB device. */
+	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
+};
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -716,12 +726,10 @@
  *
  * @param dpdk_dev
  *   Backing DPDK device.
- * @param ibv_dev
- *   Verbs device.
+ * @param spawn
+ *   Verbs device parameters (name, port, switch_info) to spawn.
  * @param config
  *   Device configuration parameters.
- * @param[in] switch_info
- *   Switch properties of Ethernet device.
  *
  * @return
  *   A valid Ethernet device object on success, NULL otherwise and rte_errno
@@ -732,10 +740,11 @@
  */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
-	       struct ibv_device *ibv_dev,
-	       struct mlx5_dev_config config,
-	       const struct mlx5_switch_info *switch_info)
+	       struct mlx5_dev_spawn_data *spawn,
+	       struct mlx5_dev_config config)
 {
+	const struct mlx5_switch_info *switch_info = &spawn->info;
+	struct ibv_device *ibv_dev = spawn->ibv_dev;
 	struct ibv_context *ctx = NULL;
 	struct ibv_device_attr_ex attr;
 	struct ibv_port_attr port_attr;
@@ -952,7 +961,7 @@
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, 1, &port_attr);
+	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -1316,14 +1325,6 @@
 	return NULL;
 }
 
-/** Data associated with devices to spawn. */
-struct mlx5_dev_spawn_data {
-	unsigned int ifindex; /**< Network interface index. */
-	struct mlx5_switch_info info; /**< Switch information. */
-	struct ibv_device *ibv_dev; /**< Associated IB device. */
-	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
-};
-
 /**
  * Comparison callback to sort device data.
  *
@@ -1380,7 +1381,9 @@ struct mlx5_dev_spawn_data {
 	       struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
-	unsigned int n = 0;
+	unsigned int nd = 0;
+	unsigned int np = 0;
+	unsigned int ns = 0;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -1392,8 +1395,14 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
 		return -rte_errno;
 	}
-
+	/*
+	 * First scan the list of all Infiniband devices to find
+	 * matching ones, gathering into the list.
+	 */
 	struct ibv_device *ibv_match[ret + 1];
+	int nl_route = -1;
+	int nl_rdma = -1;
+	unsigned int i;
 
 	while (ret-- > 0) {
 		struct rte_pci_addr pci_addr;
@@ -1408,77 +1417,183 @@ struct mlx5_dev_spawn_data {
 			continue;
 		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 			ibv_list[ret]->name);
-		ibv_match[n++] = ibv_list[ret];
+		ibv_match[nd++] = ibv_list[ret];
+	}
+	ibv_match[nd] = NULL;
+	if (!nd) {
+		/* No device macthes, just complain and bail out. */
+		mlx5_glue->free_device_list(ibv_list);
+		DRV_LOG(WARNING,
+			"no Verbs device matches PCI device " PCI_PRI_FMT ","
+			" are kernel drivers loaded?",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+		return ret;
+	}
+	nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
+	if (nd == 1) {
+		/*
+		 * Found single matching device may have multiple ports.
+		 * Each port may be representor, we have to check the port
+		 * number and check the representors existence.
+		 */
+		if (nl_rdma >= 0)
+			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name);
+		if (!np)
+			DRV_LOG(WARNING, "can not get IB device \"%s\""
+					 " ports number", ibv_match[0]->name);
 	}
-	ibv_match[n] = NULL;
-
-	struct mlx5_dev_spawn_data list[n];
-	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
-	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
-	unsigned int i;
-	unsigned int u;
-
 	/*
-	 * The existence of several matching entries (n > 1) means port
-	 * representors have been instantiated. No existing Verbs call nor
-	 * /sys entries can tell them apart, this can only be done through
-	 * Netlink calls assuming kernel drivers are recent enough to
-	 * support them.
-	 *
-	 * In the event of identification failure through Netlink, try again
-	 * through sysfs, then either:
-	 *
-	 * 1. No device matches (n == 0), complain and bail out.
-	 * 2. A single IB device matches (n == 1) and is not a representor,
-	 *    assume no switch support.
-	 * 3. Otherwise no safe assumptions can be made; complain louder and
-	 *    bail out.
+	 * Now we can determine the maximal
+	 * amount of devices to be spawned.
 	 */
-	for (i = 0; i != n; ++i) {
-		list[i].ibv_dev = ibv_match[i];
-		list[i].eth_dev = NULL;
-		if (nl_rdma < 0)
-			list[i].ifindex = 0;
-		else
-			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name, 1);
-		if (nl_route < 0 ||
-		    !list[i].ifindex ||
-		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
-					&list[i].info) ||
-		    ((!list[i].info.representor && !list[i].info.master) &&
-		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
-			list[i].ifindex = 0;
-			memset(&list[i].info, 0, sizeof(list[i].info));
-			continue;
+	struct mlx5_dev_spawn_data list[np ? np : nd];
+
+	if (np > 1) {
+		/*
+		 * Signle IB device with multiple ports found,
+		 * it may be E-Switch master device and representors.
+		 * We have to perform identification trough the ports.
+		 */
+		assert(nl_rdma >= 0);
+		assert(ns == 0);
+		assert(nd == 1);
+		for (i = 1; i <= np; ++i) {
+			list[ns].max_port = np;
+			list[ns].ibv_port = i;
+			list[ns].ibv_dev = ibv_match[0];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, i);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified port, it means there is no
+				 * representor on this port. It's OK,
+				 * there can be disabled ports, for example
+				 * if sriov_numvfs < sriov_totalvfs.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master))
+				ns++;
 		}
-	}
-	if (nl_rdma >= 0)
-		close(nl_rdma);
-	if (nl_route >= 0)
-		close(nl_route);
-	/* Count unidentified devices. */
-	for (u = 0, i = 0; i != n; ++i)
-		if (!list[i].info.master && !list[i].info.representor)
-			++u;
-	if (u) {
-		if (n == 1 && u == 1) {
-			/* Case #2. */
-			DRV_LOG(INFO, "no switch support detected");
-		} else {
-			/* Case #3. */
+		if (!ns) {
+			DRV_LOG(ERR,
+				"unable to recognize master/representors"
+				" on the IB device with multiple ports");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
+	} else {
+		/*
+		 * The existence of several matching entries (nd > 1) means
+		 * port representors have been instantiated. No existing Verbs
+		 * call nor sysfs entries can tell them apart, this can only
+		 * be done through Netlink calls assuming kernel drivers are
+		 * recent enough to support them.
+		 *
+		 * In the event of identification failure through Netlink,
+		 * try again through sysfs, then:
+		 *
+		 * 1. A single IB device matches (nd == 1) with single
+		 *    port (np=0/1) and is not a representor, assume
+		 *    no switch support.
+		 *
+		 * 2. Otherwise no safe assumptions can be made;
+		 *    complain louder and bail out.
+		 */
+		np = 1;
+		for (i = 0; i != nd; ++i) {
+			memset(&list[ns].info, 0, sizeof(list[ns].info));
+			list[ns].max_port = 1;
+			list[ns].ibv_port = 1;
+			list[ns].ibv_dev = ibv_match[i];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = 0;
+			if (nl_rdma >= 0)
+				list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, 1);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified device, it means there it is not
+				 * a representor/master.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master)) {
+				ns++;
+			} else if ((nd == 1) &&
+				   !list[ns].info.representor &&
+				   !list[ns].info.master) {
+				/*
+				 * Single IB device with
+				 * one physical port and
+				 * attached network device.
+				 * May be SRIOV is not enabled
+				 * or there is no representors.
+				 */
+				DRV_LOG(INFO, "no E-Switch support detected");
+				ns++;
+				break;
+			}
+		}
+		if (!ns) {
 			DRV_LOG(ERR,
-				"unable to tell which of the matching devices"
-				" is the master (lack of kernel support?)");
-			n = 0;
+				"unable to recognize master/representors"
+				" on the multiple IB devices");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
 		}
 	}
+	assert(ns);
 	/*
 	 * Sort list to probe devices in natural order for users convenience
 	 * (i.e. master first, then representors from lowest to highest ID).
 	 */
-	if (n)
-		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
+	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
 	/* Default configuration. */
 	dev_config = (struct mlx5_dev_config){
 		.hw_padding = 0,
@@ -1497,7 +1612,7 @@ struct mlx5_dev_spawn_data {
 			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
 		},
 	};
-	/* Device speicific configuration. */
+	/* Device specific configuration. */
 	switch (pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS_BLUEFIELD;
@@ -1514,12 +1629,12 @@ struct mlx5_dev_spawn_data {
 	/* Set architecture-dependent default value if unset. */
 	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
-	for (i = 0; i != n; ++i) {
+	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
-						 list[i].ibv_dev, dev_config,
-						 &list[i].info);
+						 &list[i],
+						 dev_config);
 		if (!list[i].eth_dev) {
 			if (rte_errno != EBUSY && rte_errno != EEXIST)
 				break;
@@ -1532,16 +1647,7 @@ struct mlx5_dev_spawn_data {
 		list[i].eth_dev->data->dev_flags |= restore;
 		rte_eth_dev_probing_finish(list[i].eth_dev);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!n) {
-		DRV_LOG(WARNING,
-			"no Verbs device matches PCI device " PCI_PRI_FMT ","
-			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		ret = -rte_errno;
-	} else if (i != n) {
+	if (i != ns) {
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
@@ -1563,6 +1669,18 @@ struct mlx5_dev_spawn_data {
 	} else {
 		ret = 0;
 	}
+exit:
+	/*
+	 * Do the routine cleanup:
+	 * - close opened Netlink sockets
+	 * - free the Infiniband device list
+	 */
+	if (nl_rdma >= 0)
+		close(nl_rdma);
+	if (nl_route >= 0)
+		close(nl_route);
+	assert(ibv_list);
+	mlx5_glue->free_device_list(ibv_list);
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 05/14] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 07/14] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
                     ` (9 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The functions to allocate and free shared IB context for
multiport is added. The IB device context, Protection
Domain, device attributes, Infiniband names are going
to be relocated to the shared structure from the device
private one. mlx5_dev_spawn() is updated to create shared
context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 234 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 176 insertions(+), 58 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 100e9f4..b3060de 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,150 @@ struct mlx5_dev_spawn_data {
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
 };
 
+static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
+
+/**
+ * Allocate shared IB device context. If there is multiport device the
+ * master and representors will share this context, if there is single
+ * port dedicated IB device, the context will be used by only given
+ * port due to unification.
+ *
+ * Routine first searches the context for the spesified IB device name,
+ * if found the shared context assumed and reference counter is incremented.
+ * If no context found the new one is created and initialized with specified
+ * IB device context and parameters.
+ *
+ * @param[in] spawn
+ *   Pointer to the IB device attributes (name, port, etc).
+ *
+ * @return
+ *   Pointer to mlx5_ibv_shared object on success,
+ *   otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_ibv_shared *
+mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn)
+{
+	struct mlx5_ibv_shared *sh;
+	int err = 0;
+
+	assert(spawn);
+	/* Search for IB context by device name. */
+	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
+		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
+			assert(!sh->secondary);
+			sh->refcnt++;
+			return sh;
+		}
+	}
+	/* No device found, we have to create new sharted context. */
+	assert(spawn->max_port);
+	sh = rte_zmalloc("ethdev shared ib context",
+			 sizeof(struct mlx5_ibv_shared) +
+			 spawn->max_port *
+			 sizeof(struct mlx5_ibv_shared_port),
+			 RTE_CACHE_LINE_SIZE);
+	if (!sh) {
+		DRV_LOG(ERR, "shared context allocation failure");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	/* Try to open IB device with DV first, then usual Verbs. */
+	errno = 0;
+	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
+	if (sh->ctx) {
+		sh->devx = 1;
+		DRV_LOG(DEBUG, "DevX is supported");
+	} else {
+		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
+		if (!sh->ctx) {
+			err = errno ? errno : ENODEV;
+			goto error;
+		}
+		DRV_LOG(DEBUG, "DevX is NOT supported");
+	}
+	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh->device_attr);
+	if (err) {
+		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
+		goto error;
+	}
+	sh->refcnt = 1;
+	sh->max_port = spawn->max_port;
+	strncpy(sh->ibdev_name, sh->ctx->device->name,
+		sizeof(sh->ibdev_name));
+	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
+		sizeof(sh->ibdev_path));
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		/*
+		 * For secondary process we just open the IB device
+		 * and get attributes, there will no be real usage
+		 * of this structure, the secondary process will
+		 * use one from prpimary.
+		 */
+		sh->secondary = 1;
+		return sh;
+	}
+	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
+	if (sh->pd == NULL) {
+		DRV_LOG(ERR, "PD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
+	return sh;
+error:
+	assert(sh);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+	assert(err > 0);
+	rte_errno = err;
+	return NULL;
+}
+
+/**
+ * Free shared IB device context. Decrement counter and if zero free
+ * all allocated resources and close handles.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free
+ */
+static void
+mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh)
+{
+#ifndef NDEBUG
+	/* Check the object presence in the list. */
+	struct mlx5_ibv_shared *lctx;
+
+	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
+		if (lctx == sh)
+			break;
+	assert(lctx);
+	if (lctx != sh) {
+		DRV_LOG(ERR, "Freeing non-existing shared IB context");
+		return;
+	}
+#endif
+	assert(sh);
+	assert(sh->refcnt);
+	if (--sh->refcnt)
+		return;
+	/* Zero reference counter, we should release resources. */
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		assert(sh->secondary);
+		assert(sh->ctx);
+		assert(!sh->pd);
+	}
+	LIST_REMOVE(sh, next);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+}
+
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -289,12 +433,10 @@ struct mlx5_dev_spawn_data {
 	}
 	mlx5_mprq_free_mp(dev);
 	mlx5_mr_release(dev);
-	if (priv->pd != NULL) {
-		assert(priv->ctx != NULL);
-		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
-		claim_zero(mlx5_glue->close_device(priv->ctx));
-	} else
-		assert(priv->ctx == NULL);
+	assert(priv->sh);
+	if (priv->sh)
+		mlx5_free_shared_ibctx(priv->sh);
+	priv->sh = NULL;
 	if (priv->rss_conf.rss_key != NULL)
 		rte_free(priv->rss_conf.rss_key);
 	if (priv->reta_idx != NULL)
@@ -744,11 +886,8 @@ struct mlx5_dev_spawn_data {
 	       struct mlx5_dev_config config)
 {
 	const struct mlx5_switch_info *switch_info = &spawn->info;
-	struct ibv_device *ibv_dev = spawn->ibv_dev;
-	struct ibv_context *ctx = NULL;
-	struct ibv_device_attr_ex attr;
+	struct mlx5_ibv_shared *sh;
 	struct ibv_port_attr port_attr;
-	struct ibv_pd *pd = NULL;
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
@@ -807,18 +946,10 @@ struct mlx5_dev_spawn_data {
 	}
 	/* Prepare shared data between primary and secondary process. */
 	mlx5_prepare_shared_data();
-	errno = 0;
-	ctx = mlx5_glue->dv_open_device(ibv_dev);
-	if (ctx) {
-		config.devx = 1;
-		DRV_LOG(DEBUG, "DEVX is supported");
-	} else {
-		ctx = mlx5_glue->open_device(ibv_dev);
-		if (!ctx) {
-			rte_errno = errno ? errno : ENODEV;
-			return NULL;
-		}
-	}
+	sh = mlx5_alloc_shared_ibctx(spawn);
+	if (!sh)
+		return NULL;
+	config.devx = sh->devx;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -832,7 +963,7 @@ struct mlx5_dev_spawn_data {
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-	mlx5_glue->dv_query_device(ctx, &dv_attr);
+	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
 	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
 		if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
 			DRV_LOG(DEBUG, "enhanced MPW is supported");
@@ -917,11 +1048,6 @@ struct mlx5_dev_spawn_data {
 		" old OFED/rdma-core version or firmware configuration");
 #endif
 	config.mpls_en = mpls_en;
-	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
-	if (err) {
-		DEBUG("ibv_query_device_ex() failed");
-		goto error;
-	}
 	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		eth_dev = rte_eth_dev_attach_secondary(name);
@@ -957,11 +1083,11 @@ struct mlx5_dev_spawn_data {
 		 */
 		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
 		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
-		claim_zero(mlx5_glue->close_device(ctx));
+		mlx5_free_shared_ibctx(sh);
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
+	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -975,13 +1101,7 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
 			mlx5_glue->port_state_str(port_attr.state),
 			port_attr.state);
-	/* Allocate protection domain. */
-	pd = mlx5_glue->alloc_pd(ctx);
-	if (pd == NULL) {
-		DRV_LOG(ERR, "PD allocation failure");
-		err = ENOMEM;
-		goto error;
-	}
+	/* Allocate private eth device data. */
 	priv = rte_zmalloc("ethdev private structure",
 			   sizeof(*priv),
 			   RTE_CACHE_LINE_SIZE);
@@ -990,13 +1110,11 @@ struct mlx5_dev_spawn_data {
 		err = ENOMEM;
 		goto error;
 	}
-	priv->ctx = ctx;
-	strncpy(priv->ibdev_name, priv->ctx->device->name,
-		sizeof(priv->ibdev_name));
-	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
-		sizeof(priv->ibdev_path));
-	priv->device_attr = attr;
-	priv->pd = pd;
+	priv->sh = sh;
+	priv->ctx = sh->ctx;
+	priv->ibv_port = spawn->ibv_port;
+	priv->device_attr = sh->device_attr;
+	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
@@ -1051,7 +1169,8 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
-	config.hw_csum = !!(attr.device_cap_flags_ex & IBV_DEVICE_RAW_IP_CSUM);
+	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
+			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
 		(config.hw_csum ? "" : "not "));
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
@@ -1065,7 +1184,7 @@ struct mlx5_dev_spawn_data {
 	}
 #endif
 	config.ind_table_max_size =
-		attr.rss_caps.max_rwq_indirection_table_size;
+		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
 	/*
 	 * Remove this check once DPDK supports larger/variable
 	 * indirection tables.
@@ -1074,18 +1193,18 @@ struct mlx5_dev_spawn_data {
 		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
 	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
 		config.ind_table_max_size);
-	config.hw_vlan_strip = !!(attr.raw_packet_caps &
+	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
 				  IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
 	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
 		(config.hw_vlan_strip ? "" : "not "));
-	config.hw_fcs_strip = !!(attr.raw_packet_caps &
+	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
 				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config.hw_fcs_strip ? "" : "not "));
 #if defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
-	hw_padding = !!attr.rx_pad_end_addr_align;
+	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
 #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
-	hw_padding = !!(attr.device_cap_flags_ex &
+	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
 			IBV_DEVICE_PCI_WRITE_END_PADDING);
 #endif
 	if (config.hw_padding && !hw_padding) {
@@ -1094,11 +1213,11 @@ struct mlx5_dev_spawn_data {
 	} else if (config.hw_padding) {
 		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
 	}
-	config.tso = (attr.tso_caps.max_tso > 0 &&
-		      (attr.tso_caps.supported_qpts &
+	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
+		      (sh->device_attr.tso_caps.supported_qpts &
 		       (1 << IBV_QPT_RAW_PACKET)));
 	if (config.tso)
-		config.tso_max_payload_sz = attr.tso_caps.max_tso;
+		config.tso_max_payload_sz = sh->device_attr.tso_caps.max_tso;
 	/*
 	 * MPW is disabled by default, while the Enhanced MPW is enabled
 	 * by default.
@@ -1257,7 +1376,8 @@ struct mlx5_dev_spawn_data {
 		.free = &mlx5_free_verbs_buf,
 		.data = priv,
 	};
-	mlx5_glue->dv_set_context_attr(ctx, MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
+	mlx5_glue->dv_set_context_attr(sh->ctx,
+				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
 				       (void *)((uintptr_t)&alctr));
 	/* Bring Ethernet device up. */
 	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
@@ -1311,15 +1431,13 @@ struct mlx5_dev_spawn_data {
 		if (eth_dev != NULL)
 			eth_dev->data->dev_private = NULL;
 	}
-	if (pd)
-		claim_zero(mlx5_glue->dealloc_pd(pd));
 	if (eth_dev != NULL) {
 		/* mac_addrs must not be freed alone because part of dev_private */
 		eth_dev->data->mac_addrs = NULL;
 		rte_eth_dev_release_port(eth_dev);
 	}
-	if (ctx)
-		claim_zero(mlx5_glue->close_device(ctx));
+	if (sh)
+		mlx5_free_shared_ibctx(sh);
 	assert(err > 0);
 	rte_errno = err;
 	return NULL;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 07/14] net/mlx5: switch to the names in the shared IB context
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 08/14] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
                     ` (8 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
 drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 528ae02..fb9ed3b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -229,8 +229,6 @@ struct mlx5_priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
-	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5b44889..2ccc743 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -138,8 +138,10 @@ struct ethtool_link_settings {
 	unsigned int dev_port_prev = ~0u;
 	char match[IF_NAMESIZE] = "";
 
+	assert(priv);
+	assert(priv->sh);
 	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
+		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
 
 		dir = opendir(path);
 		if (dir == NULL) {
@@ -159,7 +161,7 @@ struct ethtool_link_settings {
 			continue;
 
 		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
+		      priv->sh->ibdev_path, name,
 		      (dev_type ? "dev_id" : "dev_port"));
 
 		file = fopen(path, "rb");
@@ -222,7 +224,9 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma,
+				priv->sh->ibdev_name,
+				priv->ibv_port) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 6906dc8..5af199d 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -140,18 +140,22 @@
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
 {
 	FILE *file;
-	MKSTR(path, "%s/ports/1/hw_counters/%s",
-		  priv->ibdev_path,
-		  ctr_name);
+	if (priv->sh) {
+		MKSTR(path, "%s/ports/%d/hw_counters/%s",
+			  priv->sh->ibdev_path,
+			  priv->ibv_port,
+			  ctr_name);
 
-	file = fopen(path, "rb");
-	if (file) {
-		int n = fscanf(file, "%" SCNu64, stat);
+		file = fopen(path, "rb");
+		if (file) {
+			int n = fscanf(file, "%" SCNu64, stat);
 
-		fclose(file);
-		if (n != 1)
-			stat = 0;
+			fclose(file);
+			if (n == 1)
+				return;
+		}
 	}
+	*stat = 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 08/14] net/mlx5: switch to the shared Protection Domain
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 07/14] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 09/14] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The PMD code is updated to use Protected Domain from the
the shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     |  1 -
 drivers/net/mlx5/mlx5.h     |  1 -
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +++++-----
 drivers/net/mlx5/mlx5_txq.c |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b3060de..16134de 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1114,7 +1114,6 @@ struct mlx5_dev_spawn_data {
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->device_attr = sh->device_attr;
-	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index fb9ed3b..219addd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -228,7 +228,6 @@ struct mlx5_priv {
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d..f7eb9a5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -719,7 +719,7 @@ struct mr_update_mp_data {
 	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
 	 * through mlx5_alloc_verbs_buf().
 	 */
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)data.start, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DEBUG("port %u fail to create a verbs MR for address (%p)",
@@ -1156,7 +1156,7 @@ struct mr_update_mp_data {
 	}
 	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
 		dev->data->port_id, mem_idx, mp->name);
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)addr, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DRV_LOG(WARNING,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2f60999..0496c4e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -867,7 +867,7 @@ struct mlx5_rxq_ibv *
 		.max_wr = wqe_n >> rxq_data->sges_n,
 		/* Max number of scatter/gather elements in a WR. */
 		.max_sge = 1 << rxq_data->sges_n,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.cq = tmpl->cq,
 		.comp_mask =
 			IBV_WQ_FLAGS_CVLAN_STRIPPING |
@@ -1831,7 +1831,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 },
 		 &qp_init_attr);
 #else
@@ -1850,7 +1850,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 });
 #endif
 	if (!qp) {
@@ -2006,7 +2006,7 @@ struct mlx5_rxq_ibv *
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
 			.max_sge = 1,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 			.cq = cq,
 		 });
 	if (!wq) {
@@ -2160,7 +2160,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = 0,
 				},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd
+			.pd = priv->sh->pd
 		 });
 	if (!qp) {
 		DEBUG("port %u cannot allocate QP for drop queue",
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d185617..d3a5498 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -426,7 +426,7 @@ struct mlx5_txq_ibv *
 		 * Tx burst.
 		 */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (txq_data->max_inline)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 09/14] net/mlx5: switch to the shared context IB attributes
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 08/14] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 10/14] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multuple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 -
 drivers/net/mlx5/mlx5.h        |  1 -
 drivers/net/mlx5/mlx5_ethdev.c |  6 +++---
 drivers/net/mlx5/mlx5_rxq.c    |  8 ++++----
 drivers/net/mlx5/mlx5_txq.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 16134de..a0f224b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1113,7 +1113,6 @@ struct mlx5_dev_spawn_data {
 	priv->sh = sh;
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
-	priv->device_attr = sh->device_attr;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 219addd..efa3a3c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -227,7 +227,6 @@ struct mlx5_priv {
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2ccc743..69f238b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -515,8 +515,8 @@ struct ethtool_link_settings {
 	 * Since we need one CQ per QP, the limit is the minimum number
 	 * between the two values.
 	 */
-	max = RTE_MIN(priv->device_attr.orig_attr.max_cq,
-		      priv->device_attr.orig_attr.max_qp);
+	max = RTE_MIN(priv->sh->device_attr.orig_attr.max_cq,
+		      priv->sh->device_attr.orig_attr.max_qp);
 	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
 	if (max >= 65535)
 		max = 65535;
@@ -577,7 +577,7 @@ struct ethtool_link_settings {
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_device_attr *attr = &priv->device_attr.orig_attr;
+	struct ibv_device_attr *attr = &priv->sh->device_attr.orig_attr;
 	size_t size = strnlen(attr->fw_ver, sizeof(attr->fw_ver)) + 1;
 
 	if (fw_size < size)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0496c4e..fd1c3a2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,10 +856,10 @@ struct mlx5_rxq_ibv *
 		rte_errno = ENOMEM;
 		goto error;
 	}
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	attr.wq.ibv = (struct ibv_wq_init_attr){
 		.wq_context = NULL, /* Could be useful in the future. */
 		.wq_type = IBV_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d3a5498..10a3040 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -407,15 +407,15 @@ struct mlx5_txq_ibv *
 		.cap = {
 			/* Max number of outstanding WRs. */
 			.max_send_wr =
-				((priv->device_attr.orig_attr.max_qp_wr <
+				((priv->sh->device_attr.orig_attr.max_qp_wr <
 				  desc) ?
-				 priv->device_attr.orig_attr.max_qp_wr :
+				 priv->sh->device_attr.orig_attr.max_qp_wr :
 				 desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
 			 * too much memory. TX gather is not impacted by the
-			 * priv->device_attr.max_sge limit and will still work
+			 * device_attr.max_sge limit and will still work
 			 * properly.
 			 */
 			.max_send_sge = 1,
@@ -780,10 +780,10 @@ struct mlx5_txq_ctrl *
 	tmpl->txq.elts_n = log2above(desc);
 	tmpl->idx = idx;
 	txq_set_params(tmpl);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	tmpl->txq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->txq.elts_n])(tmpl + 1);
 	tmpl->txq.stats.idx = idx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 10/14] net/mlx5: switch to the shared IB device context
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (8 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 09/14] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 11/14] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created whithin this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  7 +++----
 drivers/net/mlx5/mlx5.h            |  1 -
 drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
 drivers/net/mlx5/mlx5_flow_dv.c    |  9 +++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
 drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
 drivers/net/mlx5/mlx5_socket.c     |  4 ++--
 drivers/net/mlx5/mlx5_trigger.c    |  2 +-
 drivers/net/mlx5/mlx5_txq.c        |  4 ++--
 9 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a0f224b..312c42b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -407,7 +407,7 @@ struct mlx5_dev_spawn_data {
 
 	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
 		dev->data->port_id,
-		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
 	mlx5_dev_interrupt_handler_uninstall(dev);
 	mlx5_traffic_disable(dev);
@@ -1111,7 +1111,6 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 	priv->sh = sh;
-	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
@@ -1349,7 +1348,7 @@ struct mlx5_dev_spawn_data {
 		}
 	}
 #ifdef HAVE_MLX5DV_DR
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_INGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
@@ -1357,7 +1356,7 @@ struct mlx5_dev_spawn_data {
 			goto error;
 		}
 		priv->rx_ns = ns;
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_EGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index efa3a3c..d816d24 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -226,7 +226,6 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
-	struct ibv_context *ctx; /* Verbs context. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 69f238b..1b2173b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	/* Read all message and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->ctx, &event))
+		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 			event.event_type == IBV_EVENT_PORT_ERR) &&
@@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
 	int flags;
 
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	assert(ctx->async_fd > 0);
+	flags = fcntl(ctx->async_fd, F_GETFL);
+	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
 		DRV_LOG(INFO,
 			"port %u failed to change file descriptor async event"
@@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	if (dev->data->dev_conf.intr_conf.lsc ||
 	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
+		priv->intr_handle.fd = ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rte_intr_callback_register(&priv->intr_handle,
 					   mlx5_dev_interrupt_handler, dev);
@@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	struct ibv_device_attr device_attr;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
+	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
 		return 1;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 44b36ee..b8943da 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -840,7 +840,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_packet_reformat
-			(priv->ctx, cache_resource->reformat_type,
+			(priv->sh->ctx, cache_resource->reformat_type,
 			 cache_resource->ft_type, ns, cache_resource->flags,
 			 cache_resource->size,
 			 (cache_resource->size ? cache_resource->buf : NULL));
@@ -1569,7 +1569,8 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_modify_header
-					(priv->ctx, cache_resource->ft_type,
+					(priv->sh->ctx,
+					 cache_resource->ft_type,
 					 ns, 0,
 					 cache_resource->actions_num *
 					 sizeof(cache_resource->actions[0]),
@@ -1629,7 +1630,7 @@ struct field_modify_info modify_tcp[] = {
 		ret = -ENOMEM;
 		goto error_exit;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
+	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
 	if (ret)
 		goto error_exit;
 	struct mlx5_flow_counter tmpl = {
@@ -2982,7 +2983,7 @@ struct field_modify_info modify_tcp[] = {
 	if (matcher->egress)
 		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
 	cache_matcher->matcher_object =
-		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr,
+		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx, &dv_attr,
 						  tbl->obj);
 	if (!cache_matcher->matcher_object) {
 		rte_free(cache_matcher);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 6c4f52f..9548d12 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -56,10 +56,11 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context ctx = priv->sh->ctx;
 	struct ibv_counter_set_init_attr init = {
 			 .counter_set_id = counter->id};
 
-	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
@@ -67,12 +68,13 @@
 	return 0;
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_counters_init_attr init = {0};
 	struct ibv_counter_attach_attr attach;
 	int ret;
 
 	memset(&attach, 0, sizeof(attach));
-	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counters(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd1c3a2..dcb97c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
 	}
 	tmpl->rxq_ctrl = rxq_ctrl;
 	if (rxq_ctrl->irq) {
-		tmpl->channel = mlx5_glue->create_comp_channel(priv->ctx);
+		tmpl->channel = mlx5_glue->create_comp_channel(priv->sh->ctx);
 		if (!tmpl->channel) {
 			DRV_LOG(ERR, "port %u: comp channel creation failure",
 				dev->data->port_id);
@@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
 	}
 #endif
 	tmpl->cq = mlx5_glue->cq_ex_to_cq
-		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
+		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
 					 &attr.cq.mlx5));
 	if (tmpl->cq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
@@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
 			.two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT,
 		};
 	}
-	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
+	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
 					   &attr.wq.mlx5);
 #else
-	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
+	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
 #endif
 	if (tmpl->wq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
@@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
 	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
 		wq[i] = wq[j];
 	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = wq_n,
 			.ind_tbl = wq,
@@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
 	}
 #endif
 	qp = mlx5_glue->dv_create_qp
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
 		 &qp_init_attr);
 #else
 	qp = mlx5_glue->create_qp_ex
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
 mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_cq *cq;
 	struct ibv_wq *wq = NULL;
 	struct mlx5_rxq_ibv *rxq;
 
 	if (priv->drop_queue.rxq)
 		return priv->drop_queue.rxq;
-	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
+	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		DEBUG("port %u cannot allocate CQ for drop queue",
 		      dev->data->port_id);
 		rte_errno = errno;
 		goto error;
 	}
-	wq = mlx5_glue->create_wq(priv->ctx,
+	wq = mlx5_glue->create_wq(ctx,
 		 &(struct ibv_wq_init_attr){
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
@@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
 	if (!rxq)
 		return NULL;
 	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = 0,
 			.ind_tbl = &rxq->wq,
@@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
 	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
 	if (!ind_tbl)
 		return NULL;
-	qp = mlx5_glue->create_qp_ex(priv->ctx,
+	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
diff --git a/drivers/net/mlx5/mlx5_socket.c b/drivers/net/mlx5/mlx5_socket.c
index 41cac3c..8fa6430 100644
--- a/drivers/net/mlx5/mlx5_socket.c
+++ b/drivers/net/mlx5/mlx5_socket.c
@@ -185,9 +185,9 @@
 	assert(cmsg != NULL);
 	cmsg->cmsg_level = SOL_SOCKET;
 	cmsg->cmsg_type = SCM_RIGHTS;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
+	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
 	fd = (int *)CMSG_DATA(cmsg);
-	*fd = priv->ctx->cmd_fd;
+	*fd = priv->sh->ctx->cmd_fd;
 	ret = sendmsg(conn_sock, &msg, 0);
 	if (ret < 0)
 		DRV_LOG(WARNING, "port %u cannot send response",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2137bdc..d13a1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -58,7 +58,7 @@
 			goto error;
 		}
 	}
-	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
+	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
 	if (ret) {
 		/* Adjust index for rollback. */
 		i = priv->txqs_n - 1;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 10a3040..25305b6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (is_empw_burst_func(tx_pkt_burst))
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
+	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
 			dev->data->port_id, idx);
@@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
 		attr.init.max_tso_header = txq_ctrl->max_tso_header;
 		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 	}
-	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
+	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
 			dev->data->port_id, idx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 11/14] net/mlx5: provide IB port for the object being created
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (9 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 10/14] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 12/14] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to provide IB port index for the Verbs
objects being created - QP and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c       | 2 ++
 drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
 drivers/net/mlx5/mlx5_txq.c        | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 010d61a..40d6818 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {
 int
 mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
 		struct ibv_flow_attr attr;
 		struct ibv_flow_spec_eth eth;
@@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
 	} flow_attr = {
 		.attr = {
 			.num_of_specs = 2,
+			.port = (uint8_t)priv->ibv_port,
 		},
 		.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 9548d12..b69af6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1550,6 +1550,7 @@
 	dev_flow->layers = item_flags;
 	dev_flow->verbs.attr->priority =
 		mlx5_flow_adjust_priority(dev, priority, subpriority);
+	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 25305b6..5062f5c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
 	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = 1,
+		/* IB device port number. */
+		.port_num = (uint8_t)priv->ibv_port,
 	};
 	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
 				   (IBV_QP_STATE | IBV_QP_PORT));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 12/14] net/mlx5: update install/uninstall int handler routines
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (10 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 11/14] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 13/14] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

We are implementing the support for multport Infiniband device
withj representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  14 ++++-
 drivers/net/mlx5/mlx5.h        |   3 +-
 drivers/net/mlx5/mlx5_ethdev.c | 118 ++++++++++++++++++++++++++++++++---------
 3 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 312c42b..44b7a87 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -165,6 +165,7 @@ struct mlx5_dev_spawn_data {
 {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
+	uint32_t i;
 
 	assert(spawn);
 	/* Search for IB context by device name. */
@@ -212,6 +213,9 @@ struct mlx5_dev_spawn_data {
 		sizeof(sh->ibdev_name));
 	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
 		sizeof(sh->ibdev_path));
+	pthread_mutex_init(&sh->intr_mutex, NULL);
+	for (i = 0; i < sh->max_port; i++)
+		sh->port[i].port_id = RTE_MAX_ETHPORTS;
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
 		/*
 		 * For secondary process we just open the IB device
@@ -276,6 +280,15 @@ struct mlx5_dev_spawn_data {
 		assert(!sh->pd);
 	}
 	LIST_REMOVE(sh, next);
+	/*
+	 *  Ensure there is no async event handler installed.
+	 *  Only primary process handles async device events.
+	 **/
+	assert(!sh->intr_cnt);
+	if (sh->intr_cnt)
+		rte_intr_callback_unregister
+			(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
+	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -283,7 +296,6 @@ struct mlx5_dev_spawn_data {
 	rte_free(sh);
 }
 
-
 /**
  * Prepare shared data between primary and secondary process.
  */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d816d24..f23298e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -216,6 +216,8 @@ struct mlx5_ibv_shared {
 	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
+	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
@@ -245,7 +247,6 @@ struct mlx5_priv {
 	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
 	unsigned int (*reta_idx)[]; /* RETA index table. */
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 1b2173b..8358cd2 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Uninstall shared asynchronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].port_id >= RTE_MAX_ETHPORTS)
+		goto exit;
+	assert(sh->port[priv->ibv_port - 1].port_id ==
+					(uint32_t)dev->data->port_id);
+	assert(sh->intr_cnt);
+	sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
+	if (!sh->intr_cnt || --sh->intr_cnt)
+		goto exit;
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
+ * Install shared asyncronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_install(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+	int ret;
+	int flags;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].port_id < RTE_MAX_ETHPORTS) {
+		/* The handler is already installed for this port. */
+		assert(sh->intr_cnt++);
+		goto exit;
+	}
+	sh->port[priv->ibv_port - 1].port_id = (uint32_t)dev->data->port_id;
+	if (sh->intr_cnt) {
+		sh->intr_cnt++;
+		goto exit;
+	}
+	/* No shared handler installed. */
+	assert(sh->ctx->async_fd > 0);
+	flags = fcntl(sh->ctx->async_fd, F_GETFL);
+	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(INFO, "failed to change file descriptor"
+			      " async event queue");
+		/* Indicate there will be no interrupts. */
+		dev->data->dev_conf.intr_conf.lsc = 0;
+		dev->data->dev_conf.intr_conf.rmv = 0;
+		sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
+		goto exit;
+	}
+	sh->intr_handle.fd = sh->ctx->async_fd;
+	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	rte_intr_callback_register(&sh->intr_handle,
+				   mlx5_dev_interrupt_handler, sh);
+	sh->intr_cnt++;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
  * Uninstall interrupt handler.
  *
  * @param dev
@@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv)
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     mlx5_dev_interrupt_handler, dev);
+	mlx5_dev_shared_handler_uninstall(dev);
 	if (priv->primary_socket)
 		rte_intr_callback_unregister(&priv->intr_handle_socket,
 					     mlx5_dev_handler_socket, dev);
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 	priv->intr_handle_socket.fd = 0;
 	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;
 }
@@ -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
-	int flags;
 
-	assert(ctx->async_fd > 0);
-	flags = fcntl(ctx->async_fd, F_GETFL);
-	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (ret) {
-		DRV_LOG(INFO,
-			"port %u failed to change file descriptor async event"
-			" queue",
-			dev->data->port_id);
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-	}
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
-		rte_intr_callback_register(&priv->intr_handle,
-					   mlx5_dev_interrupt_handler, dev);
-	}
+	mlx5_dev_shared_handler_install(dev);
 	ret = mlx5_socket_init(dev);
 	if (ret)
 		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 13/14] net/mlx5: update event handler for multiport IB devices
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (11 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 12/14] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21  8:11   ` [PATCH 14/14] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 8358cd2..710e6b5 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Device status handler.
+ * Handle shared asynchronous events the NIC (removal event
+ * and link status change). Supports multiport IB device.
  *
- * @param dev
- *   Pointer to Ethernet device.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Events bitmap of callback process which can be called immediately.
+ * @param cb_arg
+ *   Callback argument.
  */
-static uint32_t
-mlx5_dev_status_handler(struct rte_eth_dev *dev)
+void
+mlx5_dev_interrupt_handler(void *cb_arg)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = cb_arg;
 	struct ibv_async_event event;
-	uint32_t ret = 0;
 
-	if (mlx5_link_update(dev, 0) == -EAGAIN) {
-		usleep(0);
-		return 0;
-	}
-	/* Read all message and acknowledge them. */
+	/* Read all message from the IB device and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
+		struct rte_eth_dev *dev;
+		uint32_t tmp;
+
+		if (mlx5_glue->get_async_event(sh->ctx, &event))
 			break;
+		/* Retrieve and check IB port index. */
+		tmp = (uint32_t)event.element.port_num;
+		assert(tmp && (tmp <= sh->max_port));
+		if (!tmp ||
+		    tmp > sh->max_port ||
+		    sh->port[tmp - 1].port_id >= RTE_MAX_ETHPORTS) {
+			/*
+			 * Invalid IB port index or no handler
+			 * installed for this port.
+			 */
+			mlx5_glue->ack_async_event(&event);
+			continue;
+		}
+		/* Retrieve ethernet device descriptor. */
+		tmp = sh->port[tmp - 1].port_id;
+		dev = &rte_eth_devices[tmp];
+		tmp = 0;
+		assert(dev);
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-			event.event_type == IBV_EVENT_PORT_ERR) &&
-			(dev->data->dev_conf.intr_conf.lsc == 1))
-			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
-		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			dev->data->dev_conf.intr_conf.rmv == 1)
-			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
-		else
-			DRV_LOG(DEBUG,
-				"port %u event type %d on not handled",
-				dev->data->port_id, event.event_type);
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+			dev->data->dev_conf.intr_conf.lsc) {
+			mlx5_glue->ack_async_event(&event);
+			if (mlx5_link_update(dev, 0) == -EAGAIN) {
+				usleep(0);
+				continue;
+			}
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+			continue;
+		}
+		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+		    dev->data->dev_conf.intr_conf.rmv) {
+			mlx5_glue->ack_async_event(&event);
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+			continue;
+		}
+		DRV_LOG(DEBUG,
+			"port %u event type %d on not handled",
+			dev->data->port_id, event.event_type);
 		mlx5_glue->ack_async_event(&event);
 	}
-	return ret;
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
- */
-void
-mlx5_dev_interrupt_handler(void *cb_arg)
-{
-	struct rte_eth_dev *dev = cb_arg;
-	uint32_t events;
-
-	events = mlx5_dev_status_handler(dev);
-	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
-	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (12 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 13/14] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-21  8:11   ` Viacheslav Ovsiienko
  2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21 12:13   ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
  15 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-21  8:11 UTC (permalink / raw)
  To: dev; +Cc: shahafs

For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b8943da..489b3bd 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add source vport match to the specified matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] port
+ *   Source vport value to match
+ * @param[in] mask
+ *   Mask
+ */
+static void
+flow_dv_translate_source_vport(void *matcher, void *key,
+			      int16_t port, uint16_t mask)
+{
+	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+
+	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
+	MLX5_SET(fte_match_set_misc, misc_v, source_port, port);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -3389,6 +3412,21 @@ struct field_modify_info modify_tcp[] = {
 	}
 	dev_flow->dv.actions_n = actions_n;
 	flow->actions = action_flags;
+	if (attr->ingress && !attr->transfer &&
+	    (priv->representor || priv->master)) {
+		/* It was validated - we support unidirections flows only. */
+		assert(!attr->egress);
+		/*
+		 * Add matching on source vport index only
+		 * for ingress rules in E-Switch configurations.
+		 */
+		flow_dv_translate_source_vport(matcher.mask.buf,
+					       dev_flow->dv.value.buf,
+					       priv->representor_id < 0 ?
+					       priv->representor_id :
+					       priv->representor_id + 1,
+					       0xffff);
+	}
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		void *match_mask = matcher.mask.buf;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/14] net/mlx5: add support for multiport IB devices
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (13 preceding siblings ...)
  2019-03-21  8:11   ` [PATCH 14/14] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-21 12:13   ` Shahaf Shuler
  2019-03-21 12:58     ` Slava Ovsiienko
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
  15 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:13 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 00/14] net/mlx5: add support for multiport IB devices
> 
> The Mellanox NICs support SR-IOV and have E-Switch feature.
> When SR-IOV is set up in switchdev mode and E-Switch is enabled we have
> so called VF representors in the system. All representors belonging to the
> same E-Switch are created on the basis of the single PCI function and with
> current implementation each representor has its own dedicated Infiniband
> device and operates within its own Infiniband context. It is proposed to
> provide representors as ports of the single Infiniband device and operate on
> the shared Infiniband context.
> 
> mlx5_pci_probe() routine is planned to be extended to probe the ports of
> found Infiniband devices. All active ports, belonging to the same Infiniband
> device will use the signle shared Infiniband context of that device:
>   - QP are created within shared context, with assigning to
>     specified port
>   - Verb flows are also created with specifying port index
>   - Protection Domain is shared by all representors
>   - Global Memory Regions Cache is shared by all representors,
>     local ones are not (not in this patchset)
>   - Shared Asynchronous Event Handler distributes events
>     among the device ports (not in this patchset)
>   - UAR mapping to user space is also shared (not in patchset)
> 
> Each representor has its own attach network interface device, interface
> name and index can be retrieved via IB device and port index, so the link
> control is performed via Netlink in the same way as in existing
> implementation.

Nice Work Slava,
Please see my comments. 

Also - where is the support for sharing of the memory regions? 

> 
> This patchset also superseedes http://patches.dpdk.org/patch/50486/
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Viacheslav Ovsiienko (14):
>   net/mlx5: add representor recognition on kernels 5.x
>   net/mlx5: introduce multiport IB device shared structure
>   net/mlx5: modify get ifindex routine for multiport IB
>   net/mlx5: add getting IB ports number for multiport IB
>   net/mlx5: add multiport IB device support to probing
>   net/mlx5: add IB shared context alloc/free functions
>   net/mlx5: switch to the names in the shared IB context
>   net/mlx5: switch to the shared Protection Domain
>   net/mlx5: switch to the shared context IB attributes
>   net/mlx5: switch to the shared IB device context
>   net/mlx5: provide IB port for the object being created
>   net/mlx5: update install/uninstall int handler routines
>   net/mlx5: update event handler for multiport IB devices
>   net/mlx5: add source vport match to the ingress rules
> 
>  drivers/net/mlx5/Makefile          |  10 +
>  drivers/net/mlx5/meson.build       |   4 +
>  drivers/net/mlx5/mlx5.c            | 549 +++++++++++++++++++++++++++------
> ----
>  drivers/net/mlx5/mlx5.h            |  38 ++-
>  drivers/net/mlx5/mlx5_ethdev.c     | 241 ++++++++++------
>  drivers/net/mlx5/mlx5_flow.c       |   2 +
>  drivers/net/mlx5/mlx5_flow_dv.c    |  47 +++-
>  drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
>  drivers/net/mlx5/mlx5_mr.c         |   4 +-
>  drivers/net/mlx5/mlx5_nl.c         | 101 ++++++-
>  drivers/net/mlx5/mlx5_rxq.c        |  41 +--
>  drivers/net/mlx5/mlx5_socket.c     |   4 +-
>  drivers/net/mlx5/mlx5_stats.c      |  22 +-
>  drivers/net/mlx5/mlx5_trigger.c    |   2 +-
>  drivers/net/mlx5/mlx5_txq.c        |  24 +-
>  15 files changed, 795 insertions(+), 301 deletions(-)
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21  8:11   ` [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-21 12:13     ` Shahaf Shuler
  2019-03-21 15:08       ` Stephen Hemminger
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:13 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Hi Slava,

Small comments below. Once fixed you can put my acked-by on the next version. 

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
> 
> The master device and VF representors were distinguished by presence of
> port name, master device did not have one. The new Linux kernels starting
> from 5.0 provide the port name for master device and the implemented
> representor recognizing method does not work.
> The new recognizing method is based on quiering the VF number, created on
> the base of the device.
> 
> The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> attribute is specified in the Netlink request message.
> 
> Also the presence of device symlink in device sysfs folder is added to
> distinguish representors with sysfs based method.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
> 
> v3: - rebased over new port naming http://patches.dpdk.org/patch/51245/
>     - master recognition is reinforced by checking vport for -1
>       for new port naming schema
> 
> v2: - fopen replaced with opendir to detect whether directory exists
> 
> v1: http://patches.dpdk.org/patch/50411/
> ---
>  drivers/net/mlx5/Makefile      | 10 ++++++++++
>  drivers/net/mlx5/meson.build   |  4 ++++
>  drivers/net/mlx5/mlx5.c        |  2 +-
>  drivers/net/mlx5/mlx5.h        |  1 +
>  drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
>  drivers/net/mlx5/mlx5_nl.c     | 36
> +++++++++++++++++++++++++++++++++---
>  6 files changed, 60 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile index
> 1ed299d..3dd7e38 100644
> --- a/drivers/net/mlx5/Makefile
> +++ b/drivers/net/mlx5/Makefile
> @@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-
> config-h.sh
>  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
>  		$(AUTOCONF_OUTPUT)
>  	$Q sh -- '$<' '$@' \
> +		HAVE_IFLA_NUM_VF \
> +		linux/if_link.h \
> +		enum IFLA_NUM_VF \
> +		$(AUTOCONF_OUTPUT)
> +	$Q sh -- '$<' '$@' \
> +		HAVE_IFLA_EXT_MASK \
> +		linux/if_link.h \
> +		enum IFLA_EXT_MASK \
> +		$(AUTOCONF_OUTPUT)
> +	$Q sh -- '$<' '$@' \
>  		HAVE_IFLA_PHYS_SWITCH_ID \
>  		linux/if_link.h \
>  		enum IFLA_PHYS_SWITCH_ID \
> diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
> index 0cf2f08..e3cb9bc 100644
> --- a/drivers/net/mlx5/meson.build
> +++ b/drivers/net/mlx5/meson.build
> @@ -133,6 +133,10 @@ if build
>  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
>  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
>  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> +		'IFLA_NUM_VF' ],
> +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> +		'IFLA_EXT_MASK' ],
>  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
>  		'IFLA_PHYS_SWITCH_ID' ],
>  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> ad1975c..ea3d00c 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -13,7 +13,6 @@
>  #include <errno.h>
>  #include <net/if.h>
>  #include <sys/mman.h>
> -#include <linux/netlink.h>
>  #include <linux/rtnetlink.h>
> 
>  /* Verbs header. */
> @@ -1001,6 +1000,7 @@
>  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
>  	priv->nl_sn = 0;
>  	priv->representor = !!switch_info->representor;
> +	priv->master = !!switch_info->master;
>  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
>  	priv->representor_id =
>  		switch_info->representor ? switch_info->port_name : -1; diff
> --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> a88cb4a..58bc37f 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -214,6 +214,7 @@ struct mlx5_priv {
>  	uint16_t mtu; /* Configured MTU. */
>  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
>  	unsigned int representor:1; /* Device is a port representor. */
> +	unsigned int master:1; /* Device is a E-Switch master. */
>  	uint16_t domain_id; /* Switch domain identifier. */
>  	int32_t representor_id; /* Port representor identifier. */
>  	/* RX/TX queues. */
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c index 84d761c..81f2a42 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
>  		.port_name = 0,
>  		.switch_id = 0,
>  	};
> +	DIR *dir;
>  	bool port_name_set = false;
>  	bool port_switch_id_set = false;
> +	bool device_dir = false;
>  	char c;
> 
>  	if (!if_indextoname(ifindex, ifname)) { @@ -1375,6 +1377,8 @@ int
> mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t
> fw_size)
>  	      ifname);
>  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
>  	      ifname);
> +	MKSTR(pci_device, "/sys/class/net/%s/device",
> +	      ifname);
> 
>  	file = fopen(phys_port_name, "rb");
>  	if (file != NULL) {
> @@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
>  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
>  		c == '\n';
>  	fclose(file);
> -	data.master = port_switch_id_set && !port_name_set;
> -	data.representor = port_switch_id_set && port_name_set;
> +	dir = opendir(pci_device);
> +	if (dir != NULL) {
> +		closedir(dir);
> +		device_dir = true;
> +	}
> +	data.master = port_switch_id_set && (!port_name_set ||
> device_dir);
> +	data.representor = port_switch_id_set && port_name_set &&
> !device_dir;

Add assert that device cannot be both master and representor. 

>  	*info = data;
>  	return 0;
>  }
> diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c index
> 8a10109..aa49cb4 100644
> --- a/drivers/net/mlx5/mlx5_nl.c
> +++ b/drivers/net/mlx5/mlx5_nl.c
> @@ -65,6 +65,12 @@
>  #endif
> 
>  /* These are normally found in linux/if_link.h. */
> +#ifndef HAVE_IFLA_NUM_VF
> +#define IFLA_NUM_VF 21
> +#endif
> +#ifndef HAVE_IFLA_EXT_MASK
> +#define IFLA_EXT_MASK 29
> +#endif
>  #ifndef HAVE_IFLA_PHYS_SWITCH_ID
>  #define IFLA_PHYS_SWITCH_ID 36
>  #endif
> @@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
>  	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
>  	bool port_name_set = false;
>  	bool switch_id_set = false;
> +	bool num_vf_set = false;
> 
>  	if (nh->nlmsg_type != RTM_NEWLINK)
>  		goto error;
> @@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
>  		if (ra->rta_len > nh->nlmsg_len - off)
>  			goto error;
>  		switch (ra->rta_type) {
> +		case IFLA_NUM_VF:
> +			num_vf_set = true;
> +			break;
>  		case IFLA_PHYS_PORT_NAME:
>  			port_name_set =
>  				mlx5_translate_port_name((char *)payload,
> @@ -864,8 +874,19 @@ struct mlx5_nl_ifindex_data {
>  		}
>  		off += RTA_ALIGN(ra->rta_len);
>  	}
> -	info.master = switch_id_set && !port_name_set;
> -	info.representor = switch_id_set && port_name_set;
> +	if (switch_id_set) {
> +		if (info.port_name_new) {
> +			/* New representors naming schema. */
> +			if (port_name_set) {
> +				info.master = (info.port_name == -1);
> +				info.representor = (info.port_name != -1);
> +			}
> +		} else {
> +			/* Legacy representors naming schema. */
> +			info.master = (!port_name_set || num_vf_set);
> +			info.representor = port_name_set && !num_vf_set;
> +		}
> +	}

Add assert that device cannot be both master and representor.

>  	memcpy(arg, &info, sizeof(info));
>  	return 0;
>  error:
> @@ -893,9 +914,13 @@ struct mlx5_nl_ifindex_data {
>  	struct {
>  		struct nlmsghdr nh;
>  		struct ifinfomsg info;
> +		struct rtattr rta;
> +		uint32_t extmask;
>  	} req = {
>  		.nh = {
> -			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
> +			.nlmsg_len = NLMSG_LENGTH
> +					(sizeof(req.info) +
> +					 RTA_LENGTH(sizeof(uint32_t))),
>  			.nlmsg_type = RTM_GETLINK,
>  			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
>  		},
> @@ -903,6 +928,11 @@ struct mlx5_nl_ifindex_data {
>  			.ifi_family = AF_UNSPEC,
>  			.ifi_index = ifindex,
>  		},
> +		.rta = {
> +			.rta_type = IFLA_EXT_MASK,
> +			.rta_len = RTA_LENGTH(sizeof(int32_t)),
> +		},
> +		.extmask = RTE_LE32(1),
>  	};
>  	int ret;
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-21  8:11   ` [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21 12:58       ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB
> 
> There is the routine mlx5_nl_ifindex() returning the network interface index
> associated with Infiniband device.
> We are going to support multiport IB devices, now function takes the IB port
> as argument and returns ifindex associated with tuple <IB device, IB port>
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c        |  2 +-
>  drivers/net/mlx5/mlx5.h        |  2 +-
>  drivers/net/mlx5/mlx5_ethdev.c |  2 +-
>  drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
>  4 files changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> ea3d00c..89c30af 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -1441,7 +1441,7 @@ struct mlx5_dev_spawn_data {
>  			list[i].ifindex = 0;
>  		else
>  			list[i].ifindex = mlx5_nl_ifindex
> -				(nl_rdma, list[i].ibv_dev->name);
> +				(nl_rdma, list[i].ibv_dev->name, 1);

Why hardcoded to 1?

>  		if (nl_route < 0 ||
>  		    !list[i].ifindex ||
>  		    mlx5_nl_switch_info(nl_route, list[i].ifindex, diff --git
> a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 1716cd1..f5e29ca 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -470,7 +470,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev
> *dev, struct ether_addr *mac,  void mlx5_nl_mac_addr_flush(struct
> rte_eth_dev *dev);  int mlx5_nl_promisc(struct rte_eth_dev *dev, int
> enable);  int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable); -
> unsigned int mlx5_nl_ifindex(int nl, const char *name);
> +unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t
> +pindex);
>  int mlx5_nl_switch_info(int nl, unsigned int ifindex,
>  			struct mlx5_switch_info *info);
> 
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c index 81f2a42..5b44889 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -222,7 +222,7 @@ struct ethtool_link_settings {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	unsigned int ifindex =
>  		priv->nl_socket_rdma >= 0 ?
> -		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) :
> 0;
> +		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name,
> 1) : 0;
> 
>  	if (!ifindex) {
>  		if (!priv->representor)
> diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c index
> aa49cb4..6e9145a 100644
> --- a/drivers/net/mlx5/mlx5_nl.c
> +++ b/drivers/net/mlx5/mlx5_nl.c
> @@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
>  	int mac_n; /**< Number of addresses in the array. */  };
> 
> -/** Data structure used by mlx5_nl_ifindex_cb(). */
> +/** Data structure used by mlx5_nl_cmdget_cb(). */
>  struct mlx5_nl_ifindex_data {
>  	const char *name; /**< IB device name (in). */
>  	uint32_t ibindex; /**< IB device index (out). */
>  	uint32_t ifindex; /**< Network interface index (out). */
> +	uint32_t portnum; /**< IB device max port number. */

What happen w/ old kernel which don't support it? the portnum is ignored? 

>  };
> 
>  /**
> @@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
>   *   0 on success, a negative errno value otherwise and rte_errno is set.
>   */
>  static int
> -mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
> +mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
>  {
>  	struct mlx5_nl_ifindex_data *data = arg;
>  	size_t off = NLMSG_HDRLEN;
>  	uint32_t ibindex = 0;
>  	uint32_t ifindex = 0;
> +	uint32_t portnum = 0;
>  	int found = 0;
> 
>  	if (nh->nlmsg_type !=
> @@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
>  		case RDMA_NLDEV_ATTR_NDEV_INDEX:
>  			ifindex = *(uint32_t *)payload;
>  			break;
> +		case RDMA_NLDEV_ATTR_PORT_INDEX:
> +			portnum = *(uint32_t *)payload;
> +			break;
>  		default:
>  			break;
>  		}
> @@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
>  	if (found) {
>  		data->ibindex = ibindex;
>  		data->ifindex = ifindex;
> +		data->portnum = portnum;
>  	}
>  	return 0;
>  error:
> @@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
>   *   Netlink socket of the RDMA kind (NETLINK_RDMA).
>   * @param[in] name
>   *   IB device name.
> - *
> + * @param[in] pindex
> + *   IB device port index, starting from 1
>   * @return
>   *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
>   *   is set.
>   */
>  unsigned int
> -mlx5_nl_ifindex(int nl, const char *name)
> +mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
>  {
> -	static const uint32_t pindex = 1;
>  	uint32_t seq = random();
>  	struct mlx5_nl_ifindex_data data = {
>  		.name = name,
> @@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
>  	ret = mlx5_nl_send(nl, &req.nh, seq);
>  	if (ret < 0)
>  		return 0;
> -	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
> +	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
>  	if (ret < 0)
>  		return 0;
>  	if (!data.ibindex)
> @@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
>  	ret = mlx5_nl_send(nl, &req.nh, seq);
>  	if (ret < 0)
>  		return 0;
> -	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
> +	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
>  	if (ret < 0)
>  		return 0;
>  	if (!data.ifindex)
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 04/14] net/mlx5: add getting IB ports number for multiport IB
  2019-03-21  8:11   ` [PATCH 04/14] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [dpdk-dev] [PATCH 04/14] net/mlx5: add getting IB ports number
> for multiport IB
> 
> There is the routine mlx5_nl_portnum() added to get the number of ports of
> multiport Infiniband device.
> It is assumed the Uplink/VF representors are attached on this ports.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/14] net/mlx5: add multiport IB device support to probing
  2019-03-21  8:11   ` [PATCH 05/14] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  2019-03-21 12:54       ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 05/14] net/mlx5: add multiport IB device support to probing
> 
> mlx5_pci_probe() routine is refactored to probe the ports of found
> Infiniband devices. All active ports (with attached network interface),
> belonging to the same Infiniband device will use the signle shared Infiniband
> context of that device.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---
> ------------
>  1 file changed, 210 insertions(+), 92 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 89c30af..100e9f4 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -130,6 +130,16 @@
>  /** Driver-specific log messages type. */  int mlx5_logtype;
> 
> +/** Data associated with devices to spawn. */ struct
> +mlx5_dev_spawn_data {
> +	uint32_t ifindex; /**< Network interface index. */
> +	uint32_t max_port; /**< IB device maximal port index. */
> +	uint32_t ibv_port; /**< IB device physical port index. */
> +	struct mlx5_switch_info info; /**< Switch information. */
> +	struct ibv_device *ibv_dev; /**< Associated IB device. */
> +	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */ };
> +
>  /**
>   * Prepare shared data between primary and secondary process.
>   */
> @@ -716,12 +726,10 @@
>   *
>   * @param dpdk_dev
>   *   Backing DPDK device.
> - * @param ibv_dev
> - *   Verbs device.
> + * @param spawn
> + *   Verbs device parameters (name, port, switch_info) to spawn.
>   * @param config
>   *   Device configuration parameters.
> - * @param[in] switch_info
> - *   Switch properties of Ethernet device.
>   *
>   * @return
>   *   A valid Ethernet device object on success, NULL otherwise and rte_errno
> @@ -732,10 +740,11 @@
>   */
>  static struct rte_eth_dev *
>  mlx5_dev_spawn(struct rte_device *dpdk_dev,
> -	       struct ibv_device *ibv_dev,
> -	       struct mlx5_dev_config config,
> -	       const struct mlx5_switch_info *switch_info)
> +	       struct mlx5_dev_spawn_data *spawn,
> +	       struct mlx5_dev_config config)
>  {
> +	const struct mlx5_switch_info *switch_info = &spawn->info;
> +	struct ibv_device *ibv_dev = spawn->ibv_dev;
>  	struct ibv_context *ctx = NULL;
>  	struct ibv_device_attr_ex attr;
>  	struct ibv_port_attr port_attr;
> @@ -952,7 +961,7 @@
>  		return eth_dev;
>  	}
>  	/* Check port status. */
> -	err = mlx5_glue->query_port(ctx, 1, &port_attr);
> +	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
>  	if (err) {
>  		DRV_LOG(ERR, "port query failed: %s", strerror(err));
>  		goto error;
> @@ -1316,14 +1325,6 @@
>  	return NULL;
>  }
> 
> -/** Data associated with devices to spawn. */ -struct
> mlx5_dev_spawn_data {
> -	unsigned int ifindex; /**< Network interface index. */
> -	struct mlx5_switch_info info; /**< Switch information. */
> -	struct ibv_device *ibv_dev; /**< Associated IB device. */
> -	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> -};
> -
>  /**
>   * Comparison callback to sort device data.
>   *
> @@ -1380,7 +1381,9 @@ struct mlx5_dev_spawn_data {
>  	       struct rte_pci_device *pci_dev)  {
>  	struct ibv_device **ibv_list;
> -	unsigned int n = 0;
> +	unsigned int nd = 0;
> +	unsigned int np = 0;
> +	unsigned int ns = 0;

This fields names are not informative. Find a better ones. 

>  	struct mlx5_dev_config dev_config;
>  	int ret;
> 
> @@ -1392,8 +1395,14 @@ struct mlx5_dev_spawn_data {
>  		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
>  		return -rte_errno;
>  	}
> -
> +	/*
> +	 * First scan the list of all Infiniband devices to find
> +	 * matching ones, gathering into the list.
> +	 */
>  	struct ibv_device *ibv_match[ret + 1];
> +	int nl_route = -1;
> +	int nl_rdma = -1;
> +	unsigned int i;
> 
>  	while (ret-- > 0) {
>  		struct rte_pci_addr pci_addr;
> @@ -1408,77 +1417,183 @@ struct mlx5_dev_spawn_data {
>  			continue;
>  		DRV_LOG(INFO, "PCI information matches for device
> \"%s\"",
>  			ibv_list[ret]->name);
> -		ibv_match[n++] = ibv_list[ret];
> +		ibv_match[nd++] = ibv_list[ret];
> +	}
> +	ibv_match[nd] = NULL;
> +	if (!nd) {
> +		/* No device macthes, just complain and bail out. */
> +		mlx5_glue->free_device_list(ibv_list);
> +		DRV_LOG(WARNING,
> +			"no Verbs device matches PCI device " PCI_PRI_FMT
> ","
> +			" are kernel drivers loaded?",
> +			pci_dev->addr.domain, pci_dev->addr.bus,
> +			pci_dev->addr.devid, pci_dev->addr.function);
> +		rte_errno = ENOENT;
> +		ret = -rte_errno;
> +		return ret;
> +	}
> +	nl_route = mlx5_nl_init(NETLINK_ROUTE);
> +	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
> +	if (nd == 1) {
> +		/*
> +		 * Found single matching device may have multiple ports.
> +		 * Each port may be representor, we have to check the port
> +		 * number and check the representors existence.
> +		 */
> +		if (nl_rdma >= 0)
> +			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]-
> >name);
> +		if (!np)
> +			DRV_LOG(WARNING, "can not get IB device \"%s\""
> +					 " ports number", ibv_match[0]-
> >name);

This warning is misleading. On old kernels it is expected to have multiple IB devices instead of a single one w/ multiple ports.
The level should be changed for debug, and the syntax to express it is not an error. 

>  	}
> -	ibv_match[n] = NULL;
> -
> -	struct mlx5_dev_spawn_data list[n];
> -	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
> -	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
> -	unsigned int i;
> -	unsigned int u;
> -
>  	/*
> -	 * The existence of several matching entries (n > 1) means port
> -	 * representors have been instantiated. No existing Verbs call nor
> -	 * /sys entries can tell them apart, this can only be done through
> -	 * Netlink calls assuming kernel drivers are recent enough to
> -	 * support them.
> -	 *
> -	 * In the event of identification failure through Netlink, try again
> -	 * through sysfs, then either:
> -	 *
> -	 * 1. No device matches (n == 0), complain and bail out.
> -	 * 2. A single IB device matches (n == 1) and is not a representor,
> -	 *    assume no switch support.
> -	 * 3. Otherwise no safe assumptions can be made; complain louder
> and
> -	 *    bail out.
> +	 * Now we can determine the maximal
> +	 * amount of devices to be spawned.
>  	 */
> -	for (i = 0; i != n; ++i) {
> -		list[i].ibv_dev = ibv_match[i];
> -		list[i].eth_dev = NULL;
> -		if (nl_rdma < 0)
> -			list[i].ifindex = 0;
> -		else
> -			list[i].ifindex = mlx5_nl_ifindex
> -				(nl_rdma, list[i].ibv_dev->name, 1);
> -		if (nl_route < 0 ||
> -		    !list[i].ifindex ||
> -		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
> -					&list[i].info) ||
> -		    ((!list[i].info.representor && !list[i].info.master) &&
> -		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
> -			list[i].ifindex = 0;
> -			memset(&list[i].info, 0, sizeof(list[i].info));
> -			continue;
> +	struct mlx5_dev_spawn_data list[np ? np : nd];
> +
> +	if (np > 1) {
> +		/*
> +		 * Signle IB device with multiple ports found,
> +		 * it may be E-Switch master device and representors.
> +		 * We have to perform identification trough the ports.
> +		 */
> +		assert(nl_rdma >= 0);
> +		assert(ns == 0);
> +		assert(nd == 1);
> +		for (i = 1; i <= np; ++i) {
> +			list[ns].max_port = np;
> +			list[ns].ibv_port = i;
> +			list[ns].ibv_dev = ibv_match[0];
> +			list[ns].eth_dev = NULL;
> +			list[ns].ifindex = mlx5_nl_ifindex
> +					(nl_rdma, list[ns].ibv_dev->name, i);
> +			if (!list[ns].ifindex) {
> +				/*
> +				 * No network interface index found for the
> +				 * specified port, it means there is no
> +				 * representor on this port. It's OK,
> +				 * there can be disabled ports, for example
> +				 * if sriov_numvfs < sriov_totalvfs.
> +				 */
> +				continue;
> +			}
> +			ret = -1;
> +			if (nl_route >= 0)
> +				ret = mlx5_nl_switch_info
> +					       (nl_route,
> +						list[ns].ifindex,
> +						&list[ns].info);
> +			if (ret || (!list[ns].info.representor &&
> +				    !list[ns].info.master)) {
> +				/*
> +				 * We failed to recognize representors with
> +				 * Netlink, let's try to perform the task
> +				 * with sysfs.
> +				 */
> +				ret =  mlx5_sysfs_switch_info
> +						(list[ns].ifindex,
> +						 &list[ns].info);
> +			}
> +			if (!ret && (list[ns].info.representor ^
> +				     list[ns].info.master))
> +				ns++;
>  		}
> -	}
> -	if (nl_rdma >= 0)
> -		close(nl_rdma);
> -	if (nl_route >= 0)
> -		close(nl_route);
> -	/* Count unidentified devices. */
> -	for (u = 0, i = 0; i != n; ++i)
> -		if (!list[i].info.master && !list[i].info.representor)
> -			++u;
> -	if (u) {
> -		if (n == 1 && u == 1) {
> -			/* Case #2. */
> -			DRV_LOG(INFO, "no switch support detected");
> -		} else {
> -			/* Case #3. */
> +		if (!ns) {
> +			DRV_LOG(ERR,
> +				"unable to recognize master/representors"
> +				" on the IB device with multiple ports");
> +			rte_errno = ENOENT;
> +			ret = -rte_errno;
> +			goto exit;
> +		}
> +	} else {
> +		/*
> +		 * The existence of several matching entries (nd > 1) means
> +		 * port representors have been instantiated. No existing
> Verbs
> +		 * call nor sysfs entries can tell them apart, this can only
> +		 * be done through Netlink calls assuming kernel drivers are
> +		 * recent enough to support them.
> +		 *
> +		 * In the event of identification failure through Netlink,
> +		 * try again through sysfs, then:
> +		 *
> +		 * 1. A single IB device matches (nd == 1) with single
> +		 *    port (np=0/1) and is not a representor, assume
> +		 *    no switch support.
> +		 *
> +		 * 2. Otherwise no safe assumptions can be made;
> +		 *    complain louder and bail out.
> +		 */
> +		np = 1;
> +		for (i = 0; i != nd; ++i) {
> +			memset(&list[ns].info, 0, sizeof(list[ns].info));
> +			list[ns].max_port = 1;
> +			list[ns].ibv_port = 1;
> +			list[ns].ibv_dev = ibv_match[i];
> +			list[ns].eth_dev = NULL;
> +			list[ns].ifindex = 0;
> +			if (nl_rdma >= 0)
> +				list[ns].ifindex = mlx5_nl_ifindex
> +					(nl_rdma, list[ns].ibv_dev->name, 1);
> +			if (!list[ns].ifindex) {
> +				/*
> +				 * No network interface index found for the
> +				 * specified device, it means there it is not
> +				 * a representor/master.
> +				 */
> +				continue;
> +			}
> +			ret = -1;
> +			if (nl_route >= 0)
> +				ret = mlx5_nl_switch_info
> +					       (nl_route,
> +						list[ns].ifindex,
> +						&list[ns].info);
> +			if (ret || (!list[ns].info.representor &&
> +				    !list[ns].info.master)) {
> +				/*
> +				 * We failed to recognize representors with
> +				 * Netlink, let's try to perform the task
> +				 * with sysfs.
> +				 */
> +				ret =  mlx5_sysfs_switch_info
> +						(list[ns].ifindex,
> +						 &list[ns].info);
> +			}
> +			if (!ret && (list[ns].info.representor ^
> +				     list[ns].info.master)) {
> +				ns++;
> +			} else if ((nd == 1) &&
> +				   !list[ns].info.representor &&
> +				   !list[ns].info.master) {
> +				/*
> +				 * Single IB device with
> +				 * one physical port and
> +				 * attached network device.
> +				 * May be SRIOV is not enabled
> +				 * or there is no representors.
> +				 */
> +				DRV_LOG(INFO, "no E-Switch support
> detected");
> +				ns++;
> +				break;
> +			}
> +		}
> +		if (!ns) {
>  			DRV_LOG(ERR,
> -				"unable to tell which of the matching
> devices"
> -				" is the master (lack of kernel support?)");
> -			n = 0;
> +				"unable to recognize master/representors"
> +				" on the multiple IB devices");
> +			rte_errno = ENOENT;
> +			ret = -rte_errno;
> +			goto exit;
>  		}
>  	}
> +	assert(ns);
>  	/*
>  	 * Sort list to probe devices in natural order for users convenience
>  	 * (i.e. master first, then representors from lowest to highest ID).
>  	 */
> -	if (n)
> -		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
> +	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
>  	/* Default configuration. */
>  	dev_config = (struct mlx5_dev_config){
>  		.hw_padding = 0,
> @@ -1497,7 +1612,7 @@ struct mlx5_dev_spawn_data {
>  			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
>  		},
>  	};
> -	/* Device speicific configuration. */
> +	/* Device specific configuration. */
>  	switch (pci_dev->id.device_id) {
>  	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
>  		dev_config.txqs_vec =
> MLX5_VPMD_MAX_TXQS_BLUEFIELD; @@ -1514,12 +1629,12 @@ struct
> mlx5_dev_spawn_data {
>  	/* Set architecture-dependent default value if unset. */
>  	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
>  		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
> -	for (i = 0; i != n; ++i) {
> +	for (i = 0; i != ns; ++i) {
>  		uint32_t restore;
> 
>  		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
> -						 list[i].ibv_dev, dev_config,
> -						 &list[i].info);
> +						 &list[i],
> +						 dev_config);
>  		if (!list[i].eth_dev) {
>  			if (rte_errno != EBUSY && rte_errno != EEXIST)
>  				break;
> @@ -1532,16 +1647,7 @@ struct mlx5_dev_spawn_data {
>  		list[i].eth_dev->data->dev_flags |= restore;
>  		rte_eth_dev_probing_finish(list[i].eth_dev);
>  	}
> -	mlx5_glue->free_device_list(ibv_list);
> -	if (!n) {
> -		DRV_LOG(WARNING,
> -			"no Verbs device matches PCI device " PCI_PRI_FMT
> ","
> -			" are kernel drivers loaded?",
> -			pci_dev->addr.domain, pci_dev->addr.bus,
> -			pci_dev->addr.devid, pci_dev->addr.function);
> -		rte_errno = ENOENT;
> -		ret = -rte_errno;
> -	} else if (i != n) {
> +	if (i != ns) {
>  		DRV_LOG(ERR,
>  			"probe of PCI device " PCI_PRI_FMT " aborted after"
>  			" encountering an error: %s",
> @@ -1563,6 +1669,18 @@ struct mlx5_dev_spawn_data {
>  	} else {
>  		ret = 0;
>  	}
> +exit:
> +	/*
> +	 * Do the routine cleanup:
> +	 * - close opened Netlink sockets
> +	 * - free the Infiniband device list
> +	 */
> +	if (nl_rdma >= 0)
> +		close(nl_rdma);
> +	if (nl_route >= 0)
> +		close(nl_route);
> +	assert(ibv_list);
> +	mlx5_glue->free_device_list(ibv_list);
>  	return ret;
>  }
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions
  2019-03-21  8:11   ` [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [dpdk-dev] [PATCH 06/14] net/mlx5: add IB shared context
> alloc/free functions
> 
> The functions to allocate and free shared IB context for multiport is added.
> The IB device context, Protection Domain, device attributes, Infiniband
> names are going to be relocated to the shared structure from the device
> private one. mlx5_dev_spawn() is updated to create shared context.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Patch #2 in the series should be part of this commit, since it doesn't make sense to be a standalone. 

> ---
>  drivers/net/mlx5/mlx5.c | 234
> ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 176 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 100e9f4..b3060de 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -140,6 +140,150 @@ struct mlx5_dev_spawn_data {
>  	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */  };
> 
> +static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list =
> +LIST_HEAD_INITIALIZER();
> +
> +/**
> + * Allocate shared IB device context. If there is multiport device the
> + * master and representors will share this context, if there is single
> + * port dedicated IB device, the context will be used by only given
> + * port due to unification.
> + *
> + * Routine first searches the context for the spesified IB device name,
> + * if found the shared context assumed and reference counter is
> incremented.
> + * If no context found the new one is created and initialized with
> +specified
> + * IB device context and parameters.
> + *
> + * @param[in] spawn
> + *   Pointer to the IB device attributes (name, port, etc).
> + *
> + * @return
> + *   Pointer to mlx5_ibv_shared object on success,
> + *   otherwise NULL and rte_errno is set.
> + */
> +static struct mlx5_ibv_shared *
> +mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn) {
> +	struct mlx5_ibv_shared *sh;
> +	int err = 0;

Since you having a global list you need to make the access thread safe. 

> +
> +	assert(spawn);
> +	/* Search for IB context by device name. */
> +	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
> +		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
> +			assert(!sh->secondary);

How do you enforce secondary not to call this function? 

> +			sh->refcnt++;
> +			return sh;
> +		}
> +	}
> +	/* No device found, we have to create new sharted context. */
> +	assert(spawn->max_port);
> +	sh = rte_zmalloc("ethdev shared ib context",
> +			 sizeof(struct mlx5_ibv_shared) +
> +			 spawn->max_port *
> +			 sizeof(struct mlx5_ibv_shared_port),
> +			 RTE_CACHE_LINE_SIZE);
> +	if (!sh) {
> +		DRV_LOG(ERR, "shared context allocation failure");
> +		rte_errno  = ENOMEM;
> +		return NULL;
> +	}
> +	/* Try to open IB device with DV first, then usual Verbs. */
> +	errno = 0;
> +	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);

This one is only for primary processes right? 

> +	if (sh->ctx) {
> +		sh->devx = 1;
> +		DRV_LOG(DEBUG, "DevX is supported");
> +	} else {
> +		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
> +		if (!sh->ctx) {
> +			err = errno ? errno : ENODEV;
> +			goto error;
> +		}
> +		DRV_LOG(DEBUG, "DevX is NOT supported");
> +	}
> +	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh-
> >device_attr);
> +	if (err) {
> +		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
> +		goto error;
> +	}
> +	sh->refcnt = 1;
> +	sh->max_port = spawn->max_port;
> +	strncpy(sh->ibdev_name, sh->ctx->device->name,
> +		sizeof(sh->ibdev_name));
> +	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
> +		sizeof(sh->ibdev_path));
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		/*
> +		 * For secondary process we just open the IB device
> +		 * and get attributes, there will no be real usage
> +		 * of this structure, the secondary process will
> +		 * use one from prpimary.
> +		 */
> +		sh->secondary = 1;

Secondary process should not open a new device. it should use the primary device + private structure for everything.
In fact, secondary process should not call this function nor reference any shared object. 

> +		return sh;
> +	}
> +	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
> +	if (sh->pd == NULL) {
> +		DRV_LOG(ERR, "PD allocation failure");
> +		err = ENOMEM;
> +		goto error;
> +	}
> +	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
> +	return sh;
> +error:
> +	assert(sh);
> +	if (sh->pd)
> +		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> +	if (sh->ctx)
> +		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	rte_free(sh);
> +	assert(err > 0);
> +	rte_errno = err;
> +	return NULL;
> +}
> +
> +/**
> + * Free shared IB device context. Decrement counter and if zero free
> + * all allocated resources and close handles.
> + *
> + * @param[in] sh
> + *   Pointer to mlx5_ibv_shared object to free
> + */
> +static void
> +mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh) { #ifndef NDEBUG
> +	/* Check the object presence in the list. */
> +	struct mlx5_ibv_shared *lctx;
> +
> +	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
> +		if (lctx == sh)
> +			break;
> +	assert(lctx);
> +	if (lctx != sh) {
> +		DRV_LOG(ERR, "Freeing non-existing shared IB context");
> +		return;
> +	}
> +#endif
> +	assert(sh);
> +	assert(sh->refcnt);
> +	if (--sh->refcnt)
> +		return;
> +	/* Zero reference counter, we should release resources. */
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		assert(sh->secondary);
> +		assert(sh->ctx);
> +		assert(!sh->pd);
> +	}
> +	LIST_REMOVE(sh, next);

Secondary process is not allowed to do all of the below. 

> +	if (sh->pd)
> +		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> +	if (sh->ctx)
> +		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	rte_free(sh);
> +}
> +
> +
>  /**
>   * Prepare shared data between primary and secondary process.
>   */
> @@ -289,12 +433,10 @@ struct mlx5_dev_spawn_data {
>  	}
>  	mlx5_mprq_free_mp(dev);
>  	mlx5_mr_release(dev);
> -	if (priv->pd != NULL) {
> -		assert(priv->ctx != NULL);
> -		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
> -		claim_zero(mlx5_glue->close_device(priv->ctx));
> -	} else
> -		assert(priv->ctx == NULL);
> +	assert(priv->sh);
> +	if (priv->sh)
> +		mlx5_free_shared_ibctx(priv->sh);
> +	priv->sh = NULL;
>  	if (priv->rss_conf.rss_key != NULL)
>  		rte_free(priv->rss_conf.rss_key);
>  	if (priv->reta_idx != NULL)
> @@ -744,11 +886,8 @@ struct mlx5_dev_spawn_data {
>  	       struct mlx5_dev_config config)
>  {
>  	const struct mlx5_switch_info *switch_info = &spawn->info;
> -	struct ibv_device *ibv_dev = spawn->ibv_dev;
> -	struct ibv_context *ctx = NULL;
> -	struct ibv_device_attr_ex attr;
> +	struct mlx5_ibv_shared *sh;
>  	struct ibv_port_attr port_attr;
> -	struct ibv_pd *pd = NULL;
>  	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
>  	struct rte_eth_dev *eth_dev = NULL;
>  	struct mlx5_priv *priv = NULL;
> @@ -807,18 +946,10 @@ struct mlx5_dev_spawn_data {
>  	}
>  	/* Prepare shared data between primary and secondary process. */
>  	mlx5_prepare_shared_data();
> -	errno = 0;
> -	ctx = mlx5_glue->dv_open_device(ibv_dev);
> -	if (ctx) {
> -		config.devx = 1;
> -		DRV_LOG(DEBUG, "DEVX is supported");
> -	} else {
> -		ctx = mlx5_glue->open_device(ibv_dev);
> -		if (!ctx) {
> -			rte_errno = errno ? errno : ENODEV;
> -			return NULL;
> -		}
> -	}
> +	sh = mlx5_alloc_shared_ibctx(spawn);
> +	if (!sh)
> +		return NULL;
> +	config.devx = sh->devx;
>  #ifdef HAVE_IBV_MLX5_MOD_SWP
>  	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;  #endif @@
> -832,7 +963,7 @@ struct mlx5_dev_spawn_data {  #ifdef
> HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
>  	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
> #endif
> -	mlx5_glue->dv_query_device(ctx, &dv_attr);
> +	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
>  	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
>  		if (dv_attr.flags &
> MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
>  			DRV_LOG(DEBUG, "enhanced MPW is supported");
> @@ -917,11 +1048,6 @@ struct mlx5_dev_spawn_data {
>  		" old OFED/rdma-core version or firmware configuration");
> #endif
>  	config.mpls_en = mpls_en;
> -	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
> -	if (err) {
> -		DEBUG("ibv_query_device_ex() failed");
> -		goto error;
> -	}
>  	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
>  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>  		eth_dev = rte_eth_dev_attach_secondary(name);
> @@ -957,11 +1083,11 @@ struct mlx5_dev_spawn_data {
>  		 */
>  		eth_dev->rx_pkt_burst =
> mlx5_select_rx_function(eth_dev);
>  		eth_dev->tx_pkt_burst =
> mlx5_select_tx_function(eth_dev);
> -		claim_zero(mlx5_glue->close_device(ctx));
> +		mlx5_free_shared_ibctx(sh);
>  		return eth_dev;
>  	}
>  	/* Check port status. */
> -	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
> +	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
>  	if (err) {
>  		DRV_LOG(ERR, "port query failed: %s", strerror(err));
>  		goto error;
> @@ -975,13 +1101,7 @@ struct mlx5_dev_spawn_data {
>  		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
>  			mlx5_glue->port_state_str(port_attr.state),
>  			port_attr.state);
> -	/* Allocate protection domain. */
> -	pd = mlx5_glue->alloc_pd(ctx);
> -	if (pd == NULL) {
> -		DRV_LOG(ERR, "PD allocation failure");
> -		err = ENOMEM;
> -		goto error;
> -	}
> +	/* Allocate private eth device data. */
>  	priv = rte_zmalloc("ethdev private structure",
>  			   sizeof(*priv),
>  			   RTE_CACHE_LINE_SIZE);
> @@ -990,13 +1110,11 @@ struct mlx5_dev_spawn_data {
>  		err = ENOMEM;
>  		goto error;
>  	}
> -	priv->ctx = ctx;
> -	strncpy(priv->ibdev_name, priv->ctx->device->name,
> -		sizeof(priv->ibdev_name));
> -	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
> -		sizeof(priv->ibdev_path));
> -	priv->device_attr = attr;
> -	priv->pd = pd;
> +	priv->sh = sh;
> +	priv->ctx = sh->ctx;
> +	priv->ibv_port = spawn->ibv_port;
> +	priv->device_attr = sh->device_attr;
> +	priv->pd = sh->pd;
>  	priv->mtu = ETHER_MTU;
>  #ifndef RTE_ARCH_64
>  	/* Initialize UAR access locks for 32bit implementations. */ @@ -
> 1051,7 +1169,8 @@ struct mlx5_dev_spawn_data {
>  			strerror(rte_errno));
>  		goto error;
>  	}
> -	config.hw_csum = !!(attr.device_cap_flags_ex &
> IBV_DEVICE_RAW_IP_CSUM);
> +	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
> +			    IBV_DEVICE_RAW_IP_CSUM);
>  	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
>  		(config.hw_csum ? "" : "not "));
>  #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \ @@ -1065,7
> +1184,7 @@ struct mlx5_dev_spawn_data {
>  	}
>  #endif
>  	config.ind_table_max_size =
> -		attr.rss_caps.max_rwq_indirection_table_size;
> +		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
>  	/*
>  	 * Remove this check once DPDK supports larger/variable
>  	 * indirection tables.
> @@ -1074,18 +1193,18 @@ struct mlx5_dev_spawn_data {
>  		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
>  	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
>  		config.ind_table_max_size);
> -	config.hw_vlan_strip = !!(attr.raw_packet_caps &
> +	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
> 
> IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
>  	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
>  		(config.hw_vlan_strip ? "" : "not "));
> -	config.hw_fcs_strip = !!(attr.raw_packet_caps &
> +	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
>  				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
>  	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
>  		(config.hw_fcs_strip ? "" : "not "));  #if
> defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
> -	hw_padding = !!attr.rx_pad_end_addr_align;
> +	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
>  #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
> -	hw_padding = !!(attr.device_cap_flags_ex &
> +	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
>  			IBV_DEVICE_PCI_WRITE_END_PADDING);
>  #endif
>  	if (config.hw_padding && !hw_padding) { @@ -1094,11 +1213,11 @@
> struct mlx5_dev_spawn_data {
>  	} else if (config.hw_padding) {
>  		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
>  	}
> -	config.tso = (attr.tso_caps.max_tso > 0 &&
> -		      (attr.tso_caps.supported_qpts &
> +	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
> +		      (sh->device_attr.tso_caps.supported_qpts &
>  		       (1 << IBV_QPT_RAW_PACKET)));
>  	if (config.tso)
> -		config.tso_max_payload_sz = attr.tso_caps.max_tso;
> +		config.tso_max_payload_sz = sh-
> >device_attr.tso_caps.max_tso;
>  	/*
>  	 * MPW is disabled by default, while the Enhanced MPW is enabled
>  	 * by default.
> @@ -1257,7 +1376,8 @@ struct mlx5_dev_spawn_data {
>  		.free = &mlx5_free_verbs_buf,
>  		.data = priv,
>  	};
> -	mlx5_glue->dv_set_context_attr(ctx,
> MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
> +	mlx5_glue->dv_set_context_attr(sh->ctx,
> +				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
>  				       (void *)((uintptr_t)&alctr));
>  	/* Bring Ethernet device up. */
>  	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up", @@ -
> 1311,15 +1431,13 @@ struct mlx5_dev_spawn_data {
>  		if (eth_dev != NULL)
>  			eth_dev->data->dev_private = NULL;
>  	}
> -	if (pd)
> -		claim_zero(mlx5_glue->dealloc_pd(pd));
>  	if (eth_dev != NULL) {
>  		/* mac_addrs must not be freed alone because part of
> dev_private */
>  		eth_dev->data->mac_addrs = NULL;
>  		rte_eth_dev_release_port(eth_dev);
>  	}
> -	if (ctx)
> -		claim_zero(mlx5_glue->close_device(ctx));
> +	if (sh)
> +		mlx5_free_shared_ibctx(sh);
>  	assert(err > 0);
>  	rte_errno = err;
>  	return NULL;
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 07/14] net/mlx5: switch to the names in the shared IB context
  2019-03-21  8:11   ` [PATCH 07/14] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 07/14] net/mlx5: switch to the names in the shared IB
> context
> 
> The IB device names are moved from device private data to the shared
> context, code involving the names is updated.
> The IB port index treatment is added where it is relevant.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellaox.com>

> ---
>  drivers/net/mlx5/mlx5.h        |  2 --
>  drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
> drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
>  3 files changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 528ae02..fb9ed3b 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -229,8 +229,6 @@ struct mlx5_priv {
>  	struct ibv_context *ctx; /* Verbs context. */
>  	struct ibv_device_attr_ex device_attr; /* Device properties. */
>  	struct ibv_pd *pd; /* Protection Domain. */
> -	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
> -	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for
> secondary */
>  	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC
> addresses. */
>  	BITFIELD_DECLARE(mac_own, uint64_t,
> MLX5_MAX_MAC_ADDRESSES);
>  	/* Bit-field of MAC addresses owned by the PMD. */ diff --git
> a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index
> 5b44889..2ccc743 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -138,8 +138,10 @@ struct ethtool_link_settings {
>  	unsigned int dev_port_prev = ~0u;
>  	char match[IF_NAMESIZE] = "";
> 
> +	assert(priv);
> +	assert(priv->sh);
>  	{
> -		MKSTR(path, "%s/device/net", priv->ibdev_path);
> +		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
> 
>  		dir = opendir(path);
>  		if (dir == NULL) {
> @@ -159,7 +161,7 @@ struct ethtool_link_settings {
>  			continue;
> 
>  		MKSTR(path, "%s/device/net/%s/%s",
> -		      priv->ibdev_path, name,
> +		      priv->sh->ibdev_path, name,
>  		      (dev_type ? "dev_id" : "dev_port"));
> 
>  		file = fopen(path, "rb");
> @@ -222,7 +224,9 @@ struct ethtool_link_settings {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	unsigned int ifindex =
>  		priv->nl_socket_rdma >= 0 ?
> -		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name,
> 1) : 0;
> +		mlx5_nl_ifindex(priv->nl_socket_rdma,
> +				priv->sh->ibdev_name,
> +				priv->ibv_port) : 0;
> 
>  	if (!ifindex) {
>  		if (!priv->representor)
> diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
> index 6906dc8..5af199d 100644
> --- a/drivers/net/mlx5/mlx5_stats.c
> +++ b/drivers/net/mlx5/mlx5_stats.c
> @@ -140,18 +140,22 @@
>  mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t
> *stat)  {
>  	FILE *file;
> -	MKSTR(path, "%s/ports/1/hw_counters/%s",
> -		  priv->ibdev_path,
> -		  ctr_name);
> +	if (priv->sh) {
> +		MKSTR(path, "%s/ports/%d/hw_counters/%s",
> +			  priv->sh->ibdev_path,
> +			  priv->ibv_port,
> +			  ctr_name);
> 
> -	file = fopen(path, "rb");
> -	if (file) {
> -		int n = fscanf(file, "%" SCNu64, stat);
> +		file = fopen(path, "rb");
> +		if (file) {
> +			int n = fscanf(file, "%" SCNu64, stat);
> 
> -		fclose(file);
> -		if (n != 1)
> -			stat = 0;
> +			fclose(file);
> +			if (n == 1)
> +				return;
> +		}
>  	}
> +	*stat = 0;
>  }
> 
>  /**
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 08/14] net/mlx5: switch to the shared Protection Domain
  2019-03-21  8:11   ` [PATCH 08/14] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 08/14] net/mlx5: switch to the shared Protection Domain
> 
> The PMD code is updated to use Protected Domain from the the shared IB
> device context. The Domain is shared between all devices belonging to the
> same multiport Infiniband device.
> If IB device has only one port, the PD is not shared, because there is only
> ethernet device created over IB one.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellaonx.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 09/14] net/mlx5: switch to the shared context IB attributes
  2019-03-21  8:11   ` [PATCH 09/14] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 09/14] net/mlx5: switch to the shared context IB attributes
> 
> The code is updated to use the shared IB device attributes, located in the
> shared IB context. It saves some memory if there are representors created
> over the single Infiniband device with multuple ports.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 10/14] net/mlx5: switch to the shared IB device context
  2019-03-21  8:11   ` [PATCH 10/14] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-03-21 12:14     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 10/14] net/mlx5: switch to the shared IB device context
> 
> The code is updated to use the shared IB device context and device handles.
> The IB device context is shared between reprentors created over the single
> multiport IB device. All Verbs and DevX objects will be created whithin this
> shared context.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5.c            |  7 +++----
>  drivers/net/mlx5/mlx5.h            |  1 -
>  drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
>  drivers/net/mlx5/mlx5_flow_dv.c    |  9 +++++----
>  drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
>  drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
>  drivers/net/mlx5/mlx5_socket.c     |  4 ++--
>  drivers/net/mlx5/mlx5_trigger.c    |  2 +-
>  drivers/net/mlx5/mlx5_txq.c        |  4 ++--
>  9 files changed, 36 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> a0f224b..312c42b 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -407,7 +407,7 @@ struct mlx5_dev_spawn_data {
> 
>  	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
>  		dev->data->port_id,
> -		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
> +		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name :
> ""));
>  	/* In case mlx5_dev_stop() has not been called. */
>  	mlx5_dev_interrupt_handler_uninstall(dev);
>  	mlx5_traffic_disable(dev);
> @@ -1111,7 +1111,6 @@ struct mlx5_dev_spawn_data {
>  		goto error;
>  	}
>  	priv->sh = sh;
> -	priv->ctx = sh->ctx;
>  	priv->ibv_port = spawn->ibv_port;
>  	priv->mtu = ETHER_MTU;
>  #ifndef RTE_ARCH_64
> @@ -1349,7 +1348,7 @@ struct mlx5_dev_spawn_data {
>  		}
>  	}
>  #ifdef HAVE_MLX5DV_DR
> -		ns = mlx5dv_dr_create_ns(ctx,
> +		ns = mlx5dv_dr_create_ns(sh->ctx,
> 
> MLX5DV_DR_NS_DOMAIN_INGRESS_BYPASS);
>  		if (ns == NULL) {
>  			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed"); @@ -
> 1357,7 +1356,7 @@ struct mlx5_dev_spawn_data {
>  			goto error;
>  		}
>  		priv->rx_ns = ns;
> -		ns = mlx5dv_dr_create_ns(ctx,
> +		ns = mlx5dv_dr_create_ns(sh->ctx,
> 
> MLX5DV_DR_NS_DOMAIN_EGRESS_BYPASS);
>  		if (ns == NULL) {
>  			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed"); diff --
> git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> efa3a3c..d816d24 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -226,7 +226,6 @@ struct mlx5_priv {
>  	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
>  	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
>  	uint32_t ibv_port; /* IB device port number. */
> -	struct ibv_context *ctx; /* Verbs context. */
>  	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC
> addresses. */
>  	BITFIELD_DECLARE(mac_own, uint64_t,
> MLX5_MAX_MAC_ADDRESSES);
>  	/* Bit-field of MAC addresses owned by the PMD. */ diff --git
> a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index
> 69f238b..1b2173b 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> char *fw_ver, size_t fw_size)
>  	}
>  	/* Read all message and acknowledge them. */
>  	for (;;) {
> -		if (mlx5_glue->get_async_event(priv->ctx, &event))
> +		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
>  			break;
>  		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
>  			event.event_type == IBV_EVENT_PORT_ERR) &&
> @@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
> mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct ibv_context *ctx = priv->sh->ctx;
>  	int ret;
>  	int flags;
> 
> -	assert(priv->ctx->async_fd > 0);
> -	flags = fcntl(priv->ctx->async_fd, F_GETFL);
> -	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> +	assert(ctx->async_fd > 0);
> +	flags = fcntl(ctx->async_fd, F_GETFL);
> +	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
>  	if (ret) {
>  		DRV_LOG(INFO,
>  			"port %u failed to change file descriptor async event"
> @@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> char *fw_ver, size_t fw_size)
>  	}
>  	if (dev->data->dev_conf.intr_conf.lsc ||
>  	    dev->data->dev_conf.intr_conf.rmv) {
> -		priv->intr_handle.fd = priv->ctx->async_fd;
> +		priv->intr_handle.fd = ctx->async_fd;
>  		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
>  		rte_intr_callback_register(&priv->intr_handle,
>  					   mlx5_dev_interrupt_handler, dev);
> @@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> char *fw_ver, size_t fw_size)
>  	struct ibv_device_attr device_attr;
>  	struct mlx5_priv *priv = dev->data->dev_private;
> 
> -	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
> +	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
>  		return 1;
>  	return 0;
>  }
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c
> b/drivers/net/mlx5/mlx5_flow_dv.c index 44b36ee..b8943da 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -840,7 +840,7 @@ struct field_modify_info modify_tcp[] = {
>  	*cache_resource = *resource;
>  	cache_resource->verbs_action =
>  		mlx5_glue->dv_create_flow_action_packet_reformat
> -			(priv->ctx, cache_resource->reformat_type,
> +			(priv->sh->ctx, cache_resource->reformat_type,
>  			 cache_resource->ft_type, ns, cache_resource-
> >flags,
>  			 cache_resource->size,
>  			 (cache_resource->size ? cache_resource->buf :
> NULL)); @@ -1569,7 +1569,8 @@ struct field_modify_info modify_tcp[] = {
>  	*cache_resource = *resource;
>  	cache_resource->verbs_action =
>  		mlx5_glue->dv_create_flow_action_modify_header
> -					(priv->ctx, cache_resource->ft_type,
> +					(priv->sh->ctx,
> +					 cache_resource->ft_type,
>  					 ns, 0,
>  					 cache_resource->actions_num *
>  					 sizeof(cache_resource->actions[0]),
> @@ -1629,7 +1630,7 @@ struct field_modify_info modify_tcp[] = {
>  		ret = -ENOMEM;
>  		goto error_exit;
>  	}
> -	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
> +	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
>  	if (ret)
>  		goto error_exit;
>  	struct mlx5_flow_counter tmpl = {
> @@ -2982,7 +2983,7 @@ struct field_modify_info modify_tcp[] = {
>  	if (matcher->egress)
>  		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
>  	cache_matcher->matcher_object =
> -		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr,
> +		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx,
> &dv_attr,
>  						  tbl->obj);
>  	if (!cache_matcher->matcher_object) {
>  		rte_free(cache_matcher);
> diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c
> b/drivers/net/mlx5/mlx5_flow_verbs.c
> index 6c4f52f..9548d12 100644
> --- a/drivers/net/mlx5/mlx5_flow_verbs.c
> +++ b/drivers/net/mlx5/mlx5_flow_verbs.c
> @@ -56,10 +56,11 @@
>  {
>  #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct ibv_context ctx = priv->sh->ctx;
>  	struct ibv_counter_set_init_attr init = {
>  			 .counter_set_id = counter->id};
> 
> -	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
> +	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
>  	if (!counter->cs) {
>  		rte_errno = ENOTSUP;
>  		return -ENOTSUP;
> @@ -67,12 +68,13 @@
>  	return 0;
>  #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct ibv_context *ctx = priv->sh->ctx;
>  	struct ibv_counters_init_attr init = {0};
>  	struct ibv_counter_attach_attr attach;
>  	int ret;
> 
>  	memset(&attach, 0, sizeof(attach));
> -	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
> +	counter->cs = mlx5_glue->create_counters(ctx, &init);
>  	if (!counter->cs) {
>  		rte_errno = ENOTSUP;
>  		return -ENOTSUP;
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index fd1c3a2..dcb97c2 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
>  	}
>  	tmpl->rxq_ctrl = rxq_ctrl;
>  	if (rxq_ctrl->irq) {
> -		tmpl->channel = mlx5_glue->create_comp_channel(priv-
> >ctx);
> +		tmpl->channel = mlx5_glue->create_comp_channel(priv-
> >sh->ctx);
>  		if (!tmpl->channel) {
>  			DRV_LOG(ERR, "port %u: comp channel creation
> failure",
>  				dev->data->port_id);
> @@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
>  	}
>  #endif
>  	tmpl->cq = mlx5_glue->cq_ex_to_cq
> -		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
> +		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
>  					 &attr.cq.mlx5));
>  	if (tmpl->cq == NULL) {
>  		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
> @@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
>  			.two_byte_shift_en =
> MLX5_MPRQ_TWO_BYTE_SHIFT,
>  		};
>  	}
> -	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
> +	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
>  					   &attr.wq.mlx5);
>  #else
> -	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
> +	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
>  #endif
>  	if (tmpl->wq == NULL) {
>  		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
> @@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
>  	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
>  		wq[i] = wq[j];
>  	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
> -		(priv->ctx,
> +		(priv->sh->ctx,
>  		 &(struct ibv_rwq_ind_table_init_attr){
>  			.log_ind_tbl_size = wq_n,
>  			.ind_tbl = wq,
> @@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
>  	}
>  #endif
>  	qp = mlx5_glue->dv_create_qp
> -		(priv->ctx,
> +		(priv->sh->ctx,
>  		 &(struct ibv_qp_init_attr_ex){
>  			.qp_type = IBV_QPT_RAW_PACKET,
>  			.comp_mask =
> @@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
>  		 &qp_init_attr);
>  #else
>  	qp = mlx5_glue->create_qp_ex
> -		(priv->ctx,
> +		(priv->sh->ctx,
>  		 &(struct ibv_qp_init_attr_ex){
>  			.qp_type = IBV_QPT_RAW_PACKET,
>  			.comp_mask =
> @@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
> mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct ibv_context *ctx = priv->sh->ctx;
>  	struct ibv_cq *cq;
>  	struct ibv_wq *wq = NULL;
>  	struct mlx5_rxq_ibv *rxq;
> 
>  	if (priv->drop_queue.rxq)
>  		return priv->drop_queue.rxq;
> -	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
> +	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
>  	if (!cq) {
>  		DEBUG("port %u cannot allocate CQ for drop queue",
>  		      dev->data->port_id);
>  		rte_errno = errno;
>  		goto error;
>  	}
> -	wq = mlx5_glue->create_wq(priv->ctx,
> +	wq = mlx5_glue->create_wq(ctx,
>  		 &(struct ibv_wq_init_attr){
>  			.wq_type = IBV_WQT_RQ,
>  			.max_wr = 1,
> @@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
>  	if (!rxq)
>  		return NULL;
>  	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
> -		(priv->ctx,
> +		(priv->sh->ctx,
>  		 &(struct ibv_rwq_ind_table_init_attr){
>  			.log_ind_tbl_size = 0,
>  			.ind_tbl = &rxq->wq,
> @@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
>  	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
>  	if (!ind_tbl)
>  		return NULL;
> -	qp = mlx5_glue->create_qp_ex(priv->ctx,
> +	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
>  		 &(struct ibv_qp_init_attr_ex){
>  			.qp_type = IBV_QPT_RAW_PACKET,
>  			.comp_mask =
> diff --git a/drivers/net/mlx5/mlx5_socket.c
> b/drivers/net/mlx5/mlx5_socket.c index 41cac3c..8fa6430 100644
> --- a/drivers/net/mlx5/mlx5_socket.c
> +++ b/drivers/net/mlx5/mlx5_socket.c
> @@ -185,9 +185,9 @@
>  	assert(cmsg != NULL);
>  	cmsg->cmsg_level = SOL_SOCKET;
>  	cmsg->cmsg_type = SCM_RIGHTS;
> -	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
> +	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
>  	fd = (int *)CMSG_DATA(cmsg);
> -	*fd = priv->ctx->cmd_fd;
> +	*fd = priv->sh->ctx->cmd_fd;
>  	ret = sendmsg(conn_sock, &msg, 0);
>  	if (ret < 0)
>  		DRV_LOG(WARNING, "port %u cannot send response", diff -
> -git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 2137bdc..d13a1a1 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -58,7 +58,7 @@
>  			goto error;
>  		}
>  	}
> -	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
> +	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
>  	if (ret) {
>  		/* Adjust index for rollback. */
>  		i = priv->txqs_n - 1;
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 10a3040..25305b6 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
>  		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
>  	if (is_empw_burst_func(tx_pkt_burst))
>  		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
> -	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
> +	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL,
> 0);
>  	if (tmpl.cq == NULL) {
>  		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
>  			dev->data->port_id, idx);
> @@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
>  		attr.init.max_tso_header = txq_ctrl->max_tso_header;
>  		attr.init.comp_mask |=
> IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
>  	}
> -	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
> +	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
>  	if (tmpl.qp == NULL) {
>  		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
>  			dev->data->port_id, idx);
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 11/14] net/mlx5: provide IB port for the object being created
  2019-03-21  8:11   ` [PATCH 11/14] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-21 12:15     ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:15 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 11/14] net/mlx5: provide IB port for the object being
> created
> 
> The code is updated to provide IB port index for the Verbs objects being
> created - QP and Verbs Flows.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

> ---
>  drivers/net/mlx5/mlx5_flow.c       | 2 ++
>  drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
>  drivers/net/mlx5/mlx5_txq.c        | 4 ++--
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 010d61a..40d6818 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {  int
> mlx5_flow_discover_priorities(struct rte_eth_dev *dev)  {
> +	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct {
>  		struct ibv_flow_attr attr;
>  		struct ibv_flow_spec_eth eth;
> @@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
>  	} flow_attr = {
>  		.attr = {
>  			.num_of_specs = 2,
> +			.port = (uint8_t)priv->ibv_port,
>  		},
>  		.eth = {
>  			.type = IBV_FLOW_SPEC_ETH,
> diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c
> b/drivers/net/mlx5/mlx5_flow_verbs.c
> index 9548d12..b69af6c 100644
> --- a/drivers/net/mlx5/mlx5_flow_verbs.c
> +++ b/drivers/net/mlx5/mlx5_flow_verbs.c
> @@ -1550,6 +1550,7 @@
>  	dev_flow->layers = item_flags;
>  	dev_flow->verbs.attr->priority =
>  		mlx5_flow_adjust_priority(dev, priority, subpriority);
> +	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
>  	return 0;
>  }
> 
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 25305b6..5062f5c 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
>  	attr.mod = (struct ibv_qp_attr){
>  		/* Move the QP to this state. */
>  		.qp_state = IBV_QPS_INIT,
> -		/* Primary port number. */
> -		.port_num = 1,
> +		/* IB device port number. */
> +		.port_num = (uint8_t)priv->ibv_port,
>  	};
>  	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
>  				   (IBV_QP_STATE | IBV_QP_PORT));
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 12/14] net/mlx5: update install/uninstall int handler routines
  2019-03-21  8:11   ` [PATCH 12/14] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21 14:01       ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:15 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 12/14] net/mlx5: update install/uninstall int handler routines
> 
> We are implementing the support for multport Infiniband device withj
> representors attached to these multiple ports. Asynchronous device event
> notifications (link status change, removal event, etc.) should be shared
> between ports. We are going to implement shared event handler and this
> patch introduces appropriate device structure changes and updated event
> handler install and uninstall routines.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c        |  14 ++++-
>  drivers/net/mlx5/mlx5.h        |   3 +-
>  drivers/net/mlx5/mlx5_ethdev.c | 118
> ++++++++++++++++++++++++++++++++---------
>  3 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 312c42b..44b7a87 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -165,6 +165,7 @@ struct mlx5_dev_spawn_data {  {
>  	struct mlx5_ibv_shared *sh;
>  	int err = 0;
> +	uint32_t i;
> 
>  	assert(spawn);
>  	/* Search for IB context by device name. */ @@ -212,6 +213,9 @@
> struct mlx5_dev_spawn_data {
>  		sizeof(sh->ibdev_name));
>  	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
>  		sizeof(sh->ibdev_path));
> +	pthread_mutex_init(&sh->intr_mutex, NULL);
> +	for (i = 0; i < sh->max_port; i++)
> +		sh->port[i].port_id = RTE_MAX_ETHPORTS;

Why you need struct here? You port array is not just of uint32_t type?

>  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
>  		/*
>  		 * For secondary process we just open the IB device @@ -
> 276,6 +280,15 @@ struct mlx5_dev_spawn_data {
>  		assert(!sh->pd);
>  	}
>  	LIST_REMOVE(sh, next);
> +	/*
> +	 *  Ensure there is no async event handler installed.
> +	 *  Only primary process handles async device events.
> +	 **/
> +	assert(!sh->intr_cnt);
> +	if (sh->intr_cnt)
> +		rte_intr_callback_unregister
> +			(&sh->intr_handle, mlx5_dev_interrupt_handler,
> sh);
> +	pthread_mutex_destroy(&sh->intr_mutex);
>  	if (sh->pd)
>  		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
>  	if (sh->ctx)
> @@ -283,7 +296,6 @@ struct mlx5_dev_spawn_data {
>  	rte_free(sh);
>  }
> 
> -
>  /**
>   * Prepare shared data between primary and secondary process.
>   */
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> d816d24..f23298e 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -216,6 +216,8 @@ struct mlx5_ibv_shared {
>  	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
>  	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for
> secondary */
>  	struct ibv_device_attr_ex device_attr; /* Device properties. */
> +	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
> +	uint32_t intr_cnt; /* Interrupt handler reference counter. */
>  	struct rte_intr_handle intr_handle; /* Interrupt handler for device.
> */
>  	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> }; @@ -245,7 +247,6 @@ struct mlx5_priv {
>  	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
>  	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ.
> */
>  	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
> -	struct rte_intr_handle intr_handle; /* Interrupt handler. */
>  	unsigned int (*reta_idx)[]; /* RETA index table. */
>  	unsigned int reta_idx_n; /* RETA index size. */
>  	struct mlx5_drop drop_queue; /* Flow drop queues. */ diff --git
> a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index
> 1b2173b..8358cd2 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)  }
> 
>  /**
> + * Uninstall shared asynchronous device events handler.
> + * This function is implemeted to support event sharing
> + * between multiple ports of single IB device.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + */
> +static void
> +mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev) {
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return;
> +	pthread_mutex_lock(&sh->intr_mutex);
> +	assert(priv->ibv_port);
> +	assert(priv->ibv_port <= sh->max_port);
> +	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
> +	if (sh->port[priv->ibv_port - 1].port_id >= RTE_MAX_ETHPORTS)
> +		goto exit;
> +	assert(sh->port[priv->ibv_port - 1].port_id ==
> +					(uint32_t)dev->data->port_id);
> +	assert(sh->intr_cnt);
> +	sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
> +	if (!sh->intr_cnt || --sh->intr_cnt)
> +		goto exit;
> +	rte_intr_callback_unregister(&sh->intr_handle,
> +				     mlx5_dev_interrupt_handler, sh);
> +	sh->intr_handle.fd = 0;
> +	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> +exit:
> +	pthread_mutex_unlock(&sh->intr_mutex);
> +}
> +
> +/**
> + * Install shared asyncronous device events handler.
> + * This function is implemeted to support event sharing
> + * between multiple ports of single IB device.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + */
> +static void
> +mlx5_dev_shared_handler_install(struct rte_eth_dev *dev) {
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = priv->sh;
> +	int ret;
> +	int flags;
> +
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return;
> +	pthread_mutex_lock(&sh->intr_mutex);
> +	assert(priv->ibv_port);
> +	assert(priv->ibv_port <= sh->max_port);
> +	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
> +	if (sh->port[priv->ibv_port - 1].port_id < RTE_MAX_ETHPORTS) {

I don't understand why need an array to understand handler is already exists. 
Why not the refcnt?

> +		/* The handler is already installed for this port. */
> +		assert(sh->intr_cnt++);

Asserts are compiled only in debug mode. You should not put any logic (++) into them. 

> +		goto exit;
> +	}
> +	sh->port[priv->ibv_port - 1].port_id = (uint32_t)dev->data->port_id;
> +	if (sh->intr_cnt) {
> +		sh->intr_cnt++;
> +		goto exit;
> +	}
> +	/* No shared handler installed. */
> +	assert(sh->ctx->async_fd > 0);
> +	flags = fcntl(sh->ctx->async_fd, F_GETFL);
> +	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> +	if (ret) {
> +		DRV_LOG(INFO, "failed to change file descriptor"
> +			      " async event queue");
> +		/* Indicate there will be no interrupts. */
> +		dev->data->dev_conf.intr_conf.lsc = 0;
> +		dev->data->dev_conf.intr_conf.rmv = 0;
> +		sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
> +		goto exit;
> +	}
> +	sh->intr_handle.fd = sh->ctx->async_fd;
> +	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
> +	rte_intr_callback_register(&sh->intr_handle,
> +				   mlx5_dev_interrupt_handler, sh);
> +	sh->intr_cnt++;
> +exit:
> +	pthread_mutex_unlock(&sh->intr_mutex);
> +}
> +
> +/**
>   * Uninstall interrupt handler.
>   *
>   * @param dev
> @@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> 
> -	if (dev->data->dev_conf.intr_conf.lsc ||
> -	    dev->data->dev_conf.intr_conf.rmv)
> -		rte_intr_callback_unregister(&priv->intr_handle,
> -					     mlx5_dev_interrupt_handler,
> dev);
> +	mlx5_dev_shared_handler_uninstall(dev);
>  	if (priv->primary_socket)
>  		rte_intr_callback_unregister(&priv->intr_handle_socket,
>  					     mlx5_dev_handler_socket, dev);
> -	priv->intr_handle.fd = 0;
> -	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>  	priv->intr_handle_socket.fd = 0;
>  	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;  }
> @@ -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
> mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
> -	struct ibv_context *ctx = priv->sh->ctx;
>  	int ret;
> -	int flags;
> 
> -	assert(ctx->async_fd > 0);
> -	flags = fcntl(ctx->async_fd, F_GETFL);
> -	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> -	if (ret) {
> -		DRV_LOG(INFO,
> -			"port %u failed to change file descriptor async event"
> -			" queue",
> -			dev->data->port_id);
> -		dev->data->dev_conf.intr_conf.lsc = 0;
> -		dev->data->dev_conf.intr_conf.rmv = 0;
> -	}
> -	if (dev->data->dev_conf.intr_conf.lsc ||
> -	    dev->data->dev_conf.intr_conf.rmv) {
> -		priv->intr_handle.fd = ctx->async_fd;
> -		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
> -		rte_intr_callback_register(&priv->intr_handle,
> -					   mlx5_dev_interrupt_handler, dev);
> -	}
> +	mlx5_dev_shared_handler_install(dev);
>  	ret = mlx5_socket_init(dev);
>  	if (ret)
>  		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 13/14] net/mlx5: update event handler for multiport IB devices
  2019-03-21  8:11   ` [PATCH 13/14] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21 14:08       ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:15 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 13/14] net/mlx5: update event handler for multiport IB
> devices
> 
> This patch modifies asynchronous event handler to support multiport
> Infiniband devices. Handler queries the event parameters, including event
> source port index, and invokes the handler for specific devices with
> appropriate port_id.

This commit should be along w/ the previous one, since interrupts will not work after commit #12 in this series. 

> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++------------
> --------
>  1 file changed, 51 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c index 8358cd2..710e6b5 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)  }
> 
>  /**
> - * Device status handler.
> + * Handle shared asynchronous events the NIC (removal event
> + * and link status change). Supports multiport IB device.
>   *
> - * @param dev
> - *   Pointer to Ethernet device.
> - * @param events
> - *   Pointer to event flags holder.
> - *
> - * @return
> - *   Events bitmap of callback process which can be called immediately.
> + * @param cb_arg
> + *   Callback argument.
>   */
> -static uint32_t
> -mlx5_dev_status_handler(struct rte_eth_dev *dev)
> +void
> +mlx5_dev_interrupt_handler(void *cb_arg)
>  {
> -	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_ibv_shared *sh = cb_arg;
>  	struct ibv_async_event event;
> -	uint32_t ret = 0;
> 
> -	if (mlx5_link_update(dev, 0) == -EAGAIN) {
> -		usleep(0);
> -		return 0;
> -	}
> -	/* Read all message and acknowledge them. */
> +	/* Read all message from the IB device and acknowledge them. */
>  	for (;;) {
> -		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
> +		struct rte_eth_dev *dev;
> +		uint32_t tmp;
> +
> +		if (mlx5_glue->get_async_event(sh->ctx, &event))
>  			break;
> +		/* Retrieve and check IB port index. */
> +		tmp = (uint32_t)event.element.port_num;
> +		assert(tmp && (tmp <= sh->max_port));
> +		if (!tmp ||
> +		    tmp > sh->max_port ||
> +		    sh->port[tmp - 1].port_id >= RTE_MAX_ETHPORTS) {
> +			/*
> +			 * Invalid IB port index or no handler
> +			 * installed for this port.
> +			 */
> +			mlx5_glue->ack_async_event(&event);
> +			continue;
> +		}
> +		/* Retrieve ethernet device descriptor. */
> +		tmp = sh->port[tmp - 1].port_id;
> +		dev = &rte_eth_devices[tmp];

Is there a guarantee that the representors ethedev indexes will be contiguous?  
I think we need a mapping between ibdev port and ethedv index. 

> +		tmp = 0;
> +		assert(dev);
>  		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
> -			event.event_type == IBV_EVENT_PORT_ERR) &&
> -			(dev->data->dev_conf.intr_conf.lsc == 1))
> -			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
> -		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
> -			dev->data->dev_conf.intr_conf.rmv == 1)
> -			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
> -		else
> -			DRV_LOG(DEBUG,
> -				"port %u event type %d on not handled",
> -				dev->data->port_id, event.event_type);
> +		     event.event_type == IBV_EVENT_PORT_ERR) &&
> +			dev->data->dev_conf.intr_conf.lsc) {
> +			mlx5_glue->ack_async_event(&event);
> +			if (mlx5_link_update(dev, 0) == -EAGAIN) {
> +				usleep(0);
> +				continue;
> +			}
> +			_rte_eth_dev_callback_process
> +				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
> +			continue;
> +		}
> +		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
> +		    dev->data->dev_conf.intr_conf.rmv) {
> +			mlx5_glue->ack_async_event(&event);
> +			_rte_eth_dev_callback_process
> +				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
> +			continue;
> +		}
> +		DRV_LOG(DEBUG,
> +			"port %u event type %d on not handled",
> +			dev->data->port_id, event.event_type);
>  		mlx5_glue->ack_async_event(&event);
>  	}
> -	return ret;
> -}
> -
> -/**
> - * Handle interrupts from the NIC.
> - *
> - * @param[in] intr_handle
> - *   Interrupt handler.
> - * @param cb_arg
> - *   Callback argument.
> - */
> -void
> -mlx5_dev_interrupt_handler(void *cb_arg) -{
> -	struct rte_eth_dev *dev = cb_arg;
> -	uint32_t events;
> -
> -	events = mlx5_dev_status_handler(dev);
> -	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
> -		_rte_eth_dev_callback_process(dev,
> RTE_ETH_EVENT_INTR_LSC, NULL);
> -	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
> -		_rte_eth_dev_callback_process(dev,
> RTE_ETH_EVENT_INTR_RMV, NULL);
>  }
> 
>  /**
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
  2019-03-21  8:11   ` [PATCH 14/14] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-21 12:15     ` Shahaf Shuler
  2019-03-21 14:11       ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-21 12:15 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> Subject: [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
> 
> For E-Switch configurations over multiport Infiniband devices we should add
> source vport match to correctly distribute traffic between representors.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow_dv.c | 38
> ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c
> b/drivers/net/mlx5/mlx5_flow_dv.c index b8943da..489b3bd 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {  }
> 
>  /**
> + * Add source vport match to the specified matcher.
> + *
> + * @param[in, out] matcher
> + *   Flow matcher.
> + * @param[in, out] key
> + *   Flow matcher value.
> + * @param[in] port
> + *   Source vport value to match
> + * @param[in] mask
> + *   Mask
> + */
> +static void
> +flow_dv_translate_source_vport(void *matcher, void *key,
> +			      int16_t port, uint16_t mask)
> +{
> +	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher,
> misc_parameters);
> +	void *misc_v = MLX5_ADDR_OF(fte_match_param, key,
> misc_parameters);
> +
> +	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
> +	MLX5_SET(fte_match_set_misc, misc_v, source_port, port); }
> +
> +/**
>   * Fill the flow with DV spec.
>   *
>   * @param[in] dev
> @@ -3389,6 +3412,21 @@ struct field_modify_info modify_tcp[] = {
>  	}
>  	dev_flow->dv.actions_n = actions_n;
>  	flow->actions = action_flags;
> +	if (attr->ingress && !attr->transfer &&
> +	    (priv->representor || priv->master)) {
> +		/* It was validated - we support unidirections flows only. */
> +		assert(!attr->egress);
> +		/*
> +		 * Add matching on source vport index only
> +		 * for ingress rules in E-Switch configurations.
> +		 */
> +		flow_dv_translate_source_vport(matcher.mask.buf,
> +					       dev_flow->dv.value.buf,
> +					       priv->representor_id < 0 ?
> +					       priv->representor_id :
> +					       priv->representor_id + 1,

The vport of representor_id 0 will be 1? 
Who owns vport 0? 

> +					       0xffff);
> +	}
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
>  		void *match_mask = matcher.mask.buf;
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/14] net/mlx5: add multiport IB device support to probing
  2019-03-21 12:14     ` Shahaf Shuler
@ 2019-03-21 12:54       ` Slava Ovsiienko
  2019-03-21 12:57         ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 12:54 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:15
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 05/14] net/mlx5: add multiport IB device support to
> probing
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 05/14] net/mlx5: add multiport IB device support to
> > probing
> >
> > mlx5_pci_probe() routine is refactored to probe the ports of found
> > Infiniband devices. All active ports (with attached network
> > interface), belonging to the same Infiniband device will use the
> > signle shared Infiniband context of that device.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---
> > ------------
> >  1 file changed, 210 insertions(+), 92 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > 89c30af..100e9f4 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -130,6 +130,16 @@
> >  /** Driver-specific log messages type. */  int mlx5_logtype;
> >
> > +/** Data associated with devices to spawn. */ struct
> > +mlx5_dev_spawn_data {
> > +	uint32_t ifindex; /**< Network interface index. */
> > +	uint32_t max_port; /**< IB device maximal port index. */
> > +	uint32_t ibv_port; /**< IB device physical port index. */
> > +	struct mlx5_switch_info info; /**< Switch information. */
> > +	struct ibv_device *ibv_dev; /**< Associated IB device. */
> > +	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */ };
> > +
> >  /**
> >   * Prepare shared data between primary and secondary process.
> >   */
> > @@ -716,12 +726,10 @@
> >   *
> >   * @param dpdk_dev
> >   *   Backing DPDK device.
> > - * @param ibv_dev
> > - *   Verbs device.
> > + * @param spawn
> > + *   Verbs device parameters (name, port, switch_info) to spawn.
> >   * @param config
> >   *   Device configuration parameters.
> > - * @param[in] switch_info
> > - *   Switch properties of Ethernet device.
> >   *
> >   * @return
> >   *   A valid Ethernet device object on success, NULL otherwise and
> rte_errno
> > @@ -732,10 +740,11 @@
> >   */
> >  static struct rte_eth_dev *
> >  mlx5_dev_spawn(struct rte_device *dpdk_dev,
> > -	       struct ibv_device *ibv_dev,
> > -	       struct mlx5_dev_config config,
> > -	       const struct mlx5_switch_info *switch_info)
> > +	       struct mlx5_dev_spawn_data *spawn,
> > +	       struct mlx5_dev_config config)
> >  {
> > +	const struct mlx5_switch_info *switch_info = &spawn->info;
> > +	struct ibv_device *ibv_dev = spawn->ibv_dev;
> >  	struct ibv_context *ctx = NULL;
> >  	struct ibv_device_attr_ex attr;
> >  	struct ibv_port_attr port_attr;
> > @@ -952,7 +961,7 @@
> >  		return eth_dev;
> >  	}
> >  	/* Check port status. */
> > -	err = mlx5_glue->query_port(ctx, 1, &port_attr);
> > +	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
> >  	if (err) {
> >  		DRV_LOG(ERR, "port query failed: %s", strerror(err));
> >  		goto error;
> > @@ -1316,14 +1325,6 @@
> >  	return NULL;
> >  }
> >
> > -/** Data associated with devices to spawn. */ -struct
> > mlx5_dev_spawn_data {
> > -	unsigned int ifindex; /**< Network interface index. */
> > -	struct mlx5_switch_info info; /**< Switch information. */
> > -	struct ibv_device *ibv_dev; /**< Associated IB device. */
> > -	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> > -};
> > -
> >  /**
> >   * Comparison callback to sort device data.
> >   *
> > @@ -1380,7 +1381,9 @@ struct mlx5_dev_spawn_data {
> >  	       struct rte_pci_device *pci_dev)  {
> >  	struct ibv_device **ibv_list;
> > -	unsigned int n = 0;
> > +	unsigned int nd = 0;
> > +	unsigned int np = 0;
> > +	unsigned int ns = 0;
> 
> This fields names are not informative. Find a better ones.

Would the adding clarifying comments be enough ?

nd - Number of (PCI) Devices   (nd != 1 means we have multiple devices with the same BDF - old schema)
np - Number of (device) Ports (nd =1, np 1...n means we have new multiport device)
ns - Number to Spawn  (deduced index - number of iterations)

This names are used as indices, long names may make code less readable, IMHO.

> 
> >  	struct mlx5_dev_config dev_config;
> >  	int ret;
> >
> > @@ -1392,8 +1395,14 @@ struct mlx5_dev_spawn_data {
> >  		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
> >  		return -rte_errno;
> >  	}
> > -
> > +	/*
> > +	 * First scan the list of all Infiniband devices to find
> > +	 * matching ones, gathering into the list.
> > +	 */
> >  	struct ibv_device *ibv_match[ret + 1];
> > +	int nl_route = -1;
> > +	int nl_rdma = -1;
> > +	unsigned int i;
> >
> >  	while (ret-- > 0) {
> >  		struct rte_pci_addr pci_addr;
> > @@ -1408,77 +1417,183 @@ struct mlx5_dev_spawn_data {
> >  			continue;
> >  		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
> >  			ibv_list[ret]->name);
> > -		ibv_match[n++] = ibv_list[ret];
> > +		ibv_match[nd++] = ibv_list[ret];
> > +	}
> > +	ibv_match[nd] = NULL;
> > +	if (!nd) {
> > +		/* No device macthes, just complain and bail out. */
> > +		mlx5_glue->free_device_list(ibv_list);
> > +		DRV_LOG(WARNING,
> > +			"no Verbs device matches PCI device " PCI_PRI_FMT
> > ","
> > +			" are kernel drivers loaded?",
> > +			pci_dev->addr.domain, pci_dev->addr.bus,
> > +			pci_dev->addr.devid, pci_dev->addr.function);
> > +		rte_errno = ENOENT;
> > +		ret = -rte_errno;
> > +		return ret;
> > +	}
> > +	nl_route = mlx5_nl_init(NETLINK_ROUTE);
> > +	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
> > +	if (nd == 1) {
> > +		/*
> > +		 * Found single matching device may have multiple ports.
> > +		 * Each port may be representor, we have to check the port
> > +		 * number and check the representors existence.
> > +		 */
> > +		if (nl_rdma >= 0)
> > +			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]-
> > >name);
> > +		if (!np)
> > +			DRV_LOG(WARNING, "can not get IB device \"%s\""
> > +					 " ports number", ibv_match[0]-
> > >name);
> 
> This warning is misleading. On old kernels it is expected to have multiple IB
> devices instead of a single one w/ multiple ports.
> The level should be changed for debug, and the syntax to express it is not an
> error.
> 
> >  	}
> > -	ibv_match[n] = NULL;
> > -
> > -	struct mlx5_dev_spawn_data list[n];
> > -	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
> > -	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
> > -	unsigned int i;
> > -	unsigned int u;
> > -
> >  	/*
> > -	 * The existence of several matching entries (n > 1) means port
> > -	 * representors have been instantiated. No existing Verbs call nor
> > -	 * /sys entries can tell them apart, this can only be done through
> > -	 * Netlink calls assuming kernel drivers are recent enough to
> > -	 * support them.
> > -	 *
> > -	 * In the event of identification failure through Netlink, try again
> > -	 * through sysfs, then either:
> > -	 *
> > -	 * 1. No device matches (n == 0), complain and bail out.
> > -	 * 2. A single IB device matches (n == 1) and is not a representor,
> > -	 *    assume no switch support.
> > -	 * 3. Otherwise no safe assumptions can be made; complain louder
> > and
> > -	 *    bail out.
> > +	 * Now we can determine the maximal
> > +	 * amount of devices to be spawned.
> >  	 */
> > -	for (i = 0; i != n; ++i) {
> > -		list[i].ibv_dev = ibv_match[i];
> > -		list[i].eth_dev = NULL;
> > -		if (nl_rdma < 0)
> > -			list[i].ifindex = 0;
> > -		else
> > -			list[i].ifindex = mlx5_nl_ifindex
> > -				(nl_rdma, list[i].ibv_dev->name, 1);
> > -		if (nl_route < 0 ||
> > -		    !list[i].ifindex ||
> > -		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
> > -					&list[i].info) ||
> > -		    ((!list[i].info.representor && !list[i].info.master) &&
> > -		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
> > -			list[i].ifindex = 0;
> > -			memset(&list[i].info, 0, sizeof(list[i].info));
> > -			continue;
> > +	struct mlx5_dev_spawn_data list[np ? np : nd];
> > +
> > +	if (np > 1) {
> > +		/*
> > +		 * Signle IB device with multiple ports found,
> > +		 * it may be E-Switch master device and representors.
> > +		 * We have to perform identification trough the ports.
> > +		 */
> > +		assert(nl_rdma >= 0);
> > +		assert(ns == 0);
> > +		assert(nd == 1);
> > +		for (i = 1; i <= np; ++i) {
> > +			list[ns].max_port = np;
> > +			list[ns].ibv_port = i;
> > +			list[ns].ibv_dev = ibv_match[0];
> > +			list[ns].eth_dev = NULL;
> > +			list[ns].ifindex = mlx5_nl_ifindex
> > +					(nl_rdma, list[ns].ibv_dev->name, i);
> > +			if (!list[ns].ifindex) {
> > +				/*
> > +				 * No network interface index found for the
> > +				 * specified port, it means there is no
> > +				 * representor on this port. It's OK,
> > +				 * there can be disabled ports, for example
> > +				 * if sriov_numvfs < sriov_totalvfs.
> > +				 */
> > +				continue;
> > +			}
> > +			ret = -1;
> > +			if (nl_route >= 0)
> > +				ret = mlx5_nl_switch_info
> > +					       (nl_route,
> > +						list[ns].ifindex,
> > +						&list[ns].info);
> > +			if (ret || (!list[ns].info.representor &&
> > +				    !list[ns].info.master)) {
> > +				/*
> > +				 * We failed to recognize representors with
> > +				 * Netlink, let's try to perform the task
> > +				 * with sysfs.
> > +				 */
> > +				ret =  mlx5_sysfs_switch_info
> > +						(list[ns].ifindex,
> > +						 &list[ns].info);
> > +			}
> > +			if (!ret && (list[ns].info.representor ^
> > +				     list[ns].info.master))
> > +				ns++;
> >  		}
> > -	}
> > -	if (nl_rdma >= 0)
> > -		close(nl_rdma);
> > -	if (nl_route >= 0)
> > -		close(nl_route);
> > -	/* Count unidentified devices. */
> > -	for (u = 0, i = 0; i != n; ++i)
> > -		if (!list[i].info.master && !list[i].info.representor)
> > -			++u;
> > -	if (u) {
> > -		if (n == 1 && u == 1) {
> > -			/* Case #2. */
> > -			DRV_LOG(INFO, "no switch support detected");
> > -		} else {
> > -			/* Case #3. */
> > +		if (!ns) {
> > +			DRV_LOG(ERR,
> > +				"unable to recognize master/representors"
> > +				" on the IB device with multiple ports");
> > +			rte_errno = ENOENT;
> > +			ret = -rte_errno;
> > +			goto exit;
> > +		}
> > +	} else {
> > +		/*
> > +		 * The existence of several matching entries (nd > 1) means
> > +		 * port representors have been instantiated. No existing
> > Verbs
> > +		 * call nor sysfs entries can tell them apart, this can only
> > +		 * be done through Netlink calls assuming kernel drivers are
> > +		 * recent enough to support them.
> > +		 *
> > +		 * In the event of identification failure through Netlink,
> > +		 * try again through sysfs, then:
> > +		 *
> > +		 * 1. A single IB device matches (nd == 1) with single
> > +		 *    port (np=0/1) and is not a representor, assume
> > +		 *    no switch support.
> > +		 *
> > +		 * 2. Otherwise no safe assumptions can be made;
> > +		 *    complain louder and bail out.
> > +		 */
> > +		np = 1;
> > +		for (i = 0; i != nd; ++i) {
> > +			memset(&list[ns].info, 0, sizeof(list[ns].info));
> > +			list[ns].max_port = 1;
> > +			list[ns].ibv_port = 1;
> > +			list[ns].ibv_dev = ibv_match[i];
> > +			list[ns].eth_dev = NULL;
> > +			list[ns].ifindex = 0;
> > +			if (nl_rdma >= 0)
> > +				list[ns].ifindex = mlx5_nl_ifindex
> > +					(nl_rdma, list[ns].ibv_dev->name, 1);
> > +			if (!list[ns].ifindex) {
> > +				/*
> > +				 * No network interface index found for the
> > +				 * specified device, it means there it is not
> > +				 * a representor/master.
> > +				 */
> > +				continue;
> > +			}
> > +			ret = -1;
> > +			if (nl_route >= 0)
> > +				ret = mlx5_nl_switch_info
> > +					       (nl_route,
> > +						list[ns].ifindex,
> > +						&list[ns].info);
> > +			if (ret || (!list[ns].info.representor &&
> > +				    !list[ns].info.master)) {
> > +				/*
> > +				 * We failed to recognize representors with
> > +				 * Netlink, let's try to perform the task
> > +				 * with sysfs.
> > +				 */
> > +				ret =  mlx5_sysfs_switch_info
> > +						(list[ns].ifindex,
> > +						 &list[ns].info);
> > +			}
> > +			if (!ret && (list[ns].info.representor ^
> > +				     list[ns].info.master)) {
> > +				ns++;
> > +			} else if ((nd == 1) &&
> > +				   !list[ns].info.representor &&
> > +				   !list[ns].info.master) {
> > +				/*
> > +				 * Single IB device with
> > +				 * one physical port and
> > +				 * attached network device.
> > +				 * May be SRIOV is not enabled
> > +				 * or there is no representors.
> > +				 */
> > +				DRV_LOG(INFO, "no E-Switch support
> > detected");
> > +				ns++;
> > +				break;
> > +			}
> > +		}
> > +		if (!ns) {
> >  			DRV_LOG(ERR,
> > -				"unable to tell which of the matching
> > devices"
> > -				" is the master (lack of kernel support?)");
> > -			n = 0;
> > +				"unable to recognize master/representors"
> > +				" on the multiple IB devices");
> > +			rte_errno = ENOENT;
> > +			ret = -rte_errno;
> > +			goto exit;
> >  		}
> >  	}
> > +	assert(ns);
> >  	/*
> >  	 * Sort list to probe devices in natural order for users convenience
> >  	 * (i.e. master first, then representors from lowest to highest ID).
> >  	 */
> > -	if (n)
> > -		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
> > +	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
> >  	/* Default configuration. */
> >  	dev_config = (struct mlx5_dev_config){
> >  		.hw_padding = 0,
> > @@ -1497,7 +1612,7 @@ struct mlx5_dev_spawn_data {
> >  			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
> >  		},
> >  	};
> > -	/* Device speicific configuration. */
> > +	/* Device specific configuration. */
> >  	switch (pci_dev->id.device_id) {
> >  	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
> >  		dev_config.txqs_vec =
> > MLX5_VPMD_MAX_TXQS_BLUEFIELD; @@ -1514,12 +1629,12 @@ struct
> > mlx5_dev_spawn_data {
> >  	/* Set architecture-dependent default value if unset. */
> >  	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
> >  		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
> > -	for (i = 0; i != n; ++i) {
> > +	for (i = 0; i != ns; ++i) {
> >  		uint32_t restore;
> >
> >  		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
> > -						 list[i].ibv_dev, dev_config,
> > -						 &list[i].info);
> > +						 &list[i],
> > +						 dev_config);
> >  		if (!list[i].eth_dev) {
> >  			if (rte_errno != EBUSY && rte_errno != EEXIST)
> >  				break;
> > @@ -1532,16 +1647,7 @@ struct mlx5_dev_spawn_data {
> >  		list[i].eth_dev->data->dev_flags |= restore;
> >  		rte_eth_dev_probing_finish(list[i].eth_dev);
> >  	}
> > -	mlx5_glue->free_device_list(ibv_list);
> > -	if (!n) {
> > -		DRV_LOG(WARNING,
> > -			"no Verbs device matches PCI device " PCI_PRI_FMT
> > ","
> > -			" are kernel drivers loaded?",
> > -			pci_dev->addr.domain, pci_dev->addr.bus,
> > -			pci_dev->addr.devid, pci_dev->addr.function);
> > -		rte_errno = ENOENT;
> > -		ret = -rte_errno;
> > -	} else if (i != n) {
> > +	if (i != ns) {
> >  		DRV_LOG(ERR,
> >  			"probe of PCI device " PCI_PRI_FMT " aborted after"
> >  			" encountering an error: %s",
> > @@ -1563,6 +1669,18 @@ struct mlx5_dev_spawn_data {
> >  	} else {
> >  		ret = 0;
> >  	}
> > +exit:
> > +	/*
> > +	 * Do the routine cleanup:
> > +	 * - close opened Netlink sockets
> > +	 * - free the Infiniband device list
> > +	 */
> > +	if (nl_rdma >= 0)
> > +		close(nl_rdma);
> > +	if (nl_route >= 0)
> > +		close(nl_route);
> > +	assert(ibv_list);
> > +	mlx5_glue->free_device_list(ibv_list);
> >  	return ret;
> >  }
> >
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/14] net/mlx5: add multiport IB device support to probing
  2019-03-21 12:54       ` Slava Ovsiienko
@ 2019-03-21 12:57         ` Slava Ovsiienko
  2019-03-24  9:00           ` Shahaf Shuler
  0 siblings, 1 reply; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 12:57 UTC (permalink / raw)
  To: Shahaf Shuler, dev

Sorry, missed some comments. Here is my extra answers.

> -----Original Message-----
> From: Slava Ovsiienko
> Sent: Thursday, March 21, 2019 14:54
> To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 05/14] net/mlx5: add multiport IB device support to
> probing
> 
> > -----Original Message-----
> > From: Shahaf Shuler
> > Sent: Thursday, March 21, 2019 14:15
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > Subject: RE: [PATCH 05/14] net/mlx5: add multiport IB device support
> > to probing
> >
> > Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > > Subject: [PATCH 05/14] net/mlx5: add multiport IB device support to
> > > probing
> > >
> > > mlx5_pci_probe() routine is refactored to probe the ports of found
> > > Infiniband devices. All active ports (with attached network
> > > interface), belonging to the same Infiniband device will use the
> > > signle shared Infiniband context of that device.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---
> > > ------------
> > >  1 file changed, 210 insertions(+), 92 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > > 89c30af..100e9f4 100644
> > > --- a/drivers/net/mlx5/mlx5.c
> > > +++ b/drivers/net/mlx5/mlx5.c
> > > @@ -130,6 +130,16 @@
> > >  /** Driver-specific log messages type. */  int mlx5_logtype;
> > >
> > > +/** Data associated with devices to spawn. */ struct
> > > +mlx5_dev_spawn_data {
> > > +	uint32_t ifindex; /**< Network interface index. */
> > > +	uint32_t max_port; /**< IB device maximal port index. */
> > > +	uint32_t ibv_port; /**< IB device physical port index. */
> > > +	struct mlx5_switch_info info; /**< Switch information. */
> > > +	struct ibv_device *ibv_dev; /**< Associated IB device. */
> > > +	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> > > +};
> > > +
> > >  /**
> > >   * Prepare shared data between primary and secondary process.
> > >   */
> > > @@ -716,12 +726,10 @@
> > >   *
> > >   * @param dpdk_dev
> > >   *   Backing DPDK device.
> > > - * @param ibv_dev
> > > - *   Verbs device.
> > > + * @param spawn
> > > + *   Verbs device parameters (name, port, switch_info) to spawn.
> > >   * @param config
> > >   *   Device configuration parameters.
> > > - * @param[in] switch_info
> > > - *   Switch properties of Ethernet device.
> > >   *
> > >   * @return
> > >   *   A valid Ethernet device object on success, NULL otherwise and
> > rte_errno
> > > @@ -732,10 +740,11 @@
> > >   */
> > >  static struct rte_eth_dev *
> > >  mlx5_dev_spawn(struct rte_device *dpdk_dev,
> > > -	       struct ibv_device *ibv_dev,
> > > -	       struct mlx5_dev_config config,
> > > -	       const struct mlx5_switch_info *switch_info)
> > > +	       struct mlx5_dev_spawn_data *spawn,
> > > +	       struct mlx5_dev_config config)
> > >  {
> > > +	const struct mlx5_switch_info *switch_info = &spawn->info;
> > > +	struct ibv_device *ibv_dev = spawn->ibv_dev;
> > >  	struct ibv_context *ctx = NULL;
> > >  	struct ibv_device_attr_ex attr;
> > >  	struct ibv_port_attr port_attr;
> > > @@ -952,7 +961,7 @@
> > >  		return eth_dev;
> > >  	}
> > >  	/* Check port status. */
> > > -	err = mlx5_glue->query_port(ctx, 1, &port_attr);
> > > +	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
> > >  	if (err) {
> > >  		DRV_LOG(ERR, "port query failed: %s", strerror(err));
> > >  		goto error;
> > > @@ -1316,14 +1325,6 @@
> > >  	return NULL;
> > >  }
> > >
> > > -/** Data associated with devices to spawn. */ -struct
> > > mlx5_dev_spawn_data {
> > > -	unsigned int ifindex; /**< Network interface index. */
> > > -	struct mlx5_switch_info info; /**< Switch information. */
> > > -	struct ibv_device *ibv_dev; /**< Associated IB device. */
> > > -	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
> > > -};
> > > -
> > >  /**
> > >   * Comparison callback to sort device data.
> > >   *
> > > @@ -1380,7 +1381,9 @@ struct mlx5_dev_spawn_data {
> > >  	       struct rte_pci_device *pci_dev)  {
> > >  	struct ibv_device **ibv_list;
> > > -	unsigned int n = 0;
> > > +	unsigned int nd = 0;
> > > +	unsigned int np = 0;
> > > +	unsigned int ns = 0;
> >
> > This fields names are not informative. Find a better ones.
> 
> Would the adding clarifying comments be enough ?
> 
> nd - Number of (PCI) Devices   (nd != 1 means we have multiple devices with
> the same BDF - old schema)
> np - Number of (device) Ports (nd =1, np 1...n means we have new multiport
> device) ns - Number to Spawn  (deduced index - number of iterations)
> 
> This names are used as indices, long names may make code less readable,
> IMHO.
> 
> >
> > >  	struct mlx5_dev_config dev_config;
> > >  	int ret;
> > >
> > > @@ -1392,8 +1395,14 @@ struct mlx5_dev_spawn_data {
> > >  		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
> > >  		return -rte_errno;
> > >  	}
> > > -
> > > +	/*
> > > +	 * First scan the list of all Infiniband devices to find
> > > +	 * matching ones, gathering into the list.
> > > +	 */
> > >  	struct ibv_device *ibv_match[ret + 1];
> > > +	int nl_route = -1;
> > > +	int nl_rdma = -1;
> > > +	unsigned int i;
> > >
> > >  	while (ret-- > 0) {
> > >  		struct rte_pci_addr pci_addr;
> > > @@ -1408,77 +1417,183 @@ struct mlx5_dev_spawn_data {
> > >  			continue;
> > >  		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
> > >  			ibv_list[ret]->name);
> > > -		ibv_match[n++] = ibv_list[ret];
> > > +		ibv_match[nd++] = ibv_list[ret];
> > > +	}
> > > +	ibv_match[nd] = NULL;
> > > +	if (!nd) {
> > > +		/* No device macthes, just complain and bail out. */
> > > +		mlx5_glue->free_device_list(ibv_list);
> > > +		DRV_LOG(WARNING,
> > > +			"no Verbs device matches PCI device " PCI_PRI_FMT
> > > ","
> > > +			" are kernel drivers loaded?",
> > > +			pci_dev->addr.domain, pci_dev->addr.bus,
> > > +			pci_dev->addr.devid, pci_dev->addr.function);
> > > +		rte_errno = ENOENT;
> > > +		ret = -rte_errno;
> > > +		return ret;
> > > +	}
> > > +	nl_route = mlx5_nl_init(NETLINK_ROUTE);
> > > +	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
> > > +	if (nd == 1) {
> > > +		/*
> > > +		 * Found single matching device may have multiple ports.
> > > +		 * Each port may be representor, we have to check the port
> > > +		 * number and check the representors existence.
> > > +		 */
> > > +		if (nl_rdma >= 0)
> > > +			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]-
> > > >name);
> > > +		if (!np)
> > > +			DRV_LOG(WARNING, "can not get IB device \"%s\""
> > > +					 " ports number", ibv_match[0]-
> > > >name);
> >
> > This warning is misleading. On old kernels it is expected to have
> > multiple IB devices instead of a single one w/ multiple ports.
> > The level should be changed for debug, and the syntax to express it is
> > not an error.

On old kernels we should get np = 1. If np == 0 it means an error, 
even if there is old kernel. Zero np means that is something is going
in wrong way and we should notify the user. We do not expect this
behavior from old/new kernels, so this message should not be
annoying.

> >
> > >  	}
> > > -	ibv_match[n] = NULL;
> > > -
> > > -	struct mlx5_dev_spawn_data list[n];
> > > -	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
> > > -	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
> > > -	unsigned int i;
> > > -	unsigned int u;
> > > -
> > >  	/*
> > > -	 * The existence of several matching entries (n > 1) means port
> > > -	 * representors have been instantiated. No existing Verbs call nor
> > > -	 * /sys entries can tell them apart, this can only be done through
> > > -	 * Netlink calls assuming kernel drivers are recent enough to
> > > -	 * support them.
> > > -	 *
> > > -	 * In the event of identification failure through Netlink, try again
> > > -	 * through sysfs, then either:
> > > -	 *
> > > -	 * 1. No device matches (n == 0), complain and bail out.
> > > -	 * 2. A single IB device matches (n == 1) and is not a representor,
> > > -	 *    assume no switch support.
> > > -	 * 3. Otherwise no safe assumptions can be made; complain louder
> > > and
> > > -	 *    bail out.
> > > +	 * Now we can determine the maximal
> > > +	 * amount of devices to be spawned.
> > >  	 */
> > > -	for (i = 0; i != n; ++i) {
> > > -		list[i].ibv_dev = ibv_match[i];
> > > -		list[i].eth_dev = NULL;
> > > -		if (nl_rdma < 0)
> > > -			list[i].ifindex = 0;
> > > -		else
> > > -			list[i].ifindex = mlx5_nl_ifindex
> > > -				(nl_rdma, list[i].ibv_dev->name, 1);
> > > -		if (nl_route < 0 ||
> > > -		    !list[i].ifindex ||
> > > -		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
> > > -					&list[i].info) ||
> > > -		    ((!list[i].info.representor && !list[i].info.master) &&
> > > -		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
> > > -			list[i].ifindex = 0;
> > > -			memset(&list[i].info, 0, sizeof(list[i].info));
> > > -			continue;
> > > +	struct mlx5_dev_spawn_data list[np ? np : nd];
> > > +
> > > +	if (np > 1) {
> > > +		/*
> > > +		 * Signle IB device with multiple ports found,
> > > +		 * it may be E-Switch master device and representors.
> > > +		 * We have to perform identification trough the ports.
> > > +		 */
> > > +		assert(nl_rdma >= 0);
> > > +		assert(ns == 0);
> > > +		assert(nd == 1);
> > > +		for (i = 1; i <= np; ++i) {
> > > +			list[ns].max_port = np;
> > > +			list[ns].ibv_port = i;
> > > +			list[ns].ibv_dev = ibv_match[0];
> > > +			list[ns].eth_dev = NULL;
> > > +			list[ns].ifindex = mlx5_nl_ifindex
> > > +					(nl_rdma, list[ns].ibv_dev->name, i);
> > > +			if (!list[ns].ifindex) {
> > > +				/*
> > > +				 * No network interface index found for the
> > > +				 * specified port, it means there is no
> > > +				 * representor on this port. It's OK,
> > > +				 * there can be disabled ports, for example
> > > +				 * if sriov_numvfs < sriov_totalvfs.
> > > +				 */
> > > +				continue;
> > > +			}
> > > +			ret = -1;
> > > +			if (nl_route >= 0)
> > > +				ret = mlx5_nl_switch_info
> > > +					       (nl_route,
> > > +						list[ns].ifindex,
> > > +						&list[ns].info);
> > > +			if (ret || (!list[ns].info.representor &&
> > > +				    !list[ns].info.master)) {
> > > +				/*
> > > +				 * We failed to recognize representors with
> > > +				 * Netlink, let's try to perform the task
> > > +				 * with sysfs.
> > > +				 */
> > > +				ret =  mlx5_sysfs_switch_info
> > > +						(list[ns].ifindex,
> > > +						 &list[ns].info);
> > > +			}
> > > +			if (!ret && (list[ns].info.representor ^
> > > +				     list[ns].info.master))
> > > +				ns++;
> > >  		}
> > > -	}
> > > -	if (nl_rdma >= 0)
> > > -		close(nl_rdma);
> > > -	if (nl_route >= 0)
> > > -		close(nl_route);
> > > -	/* Count unidentified devices. */
> > > -	for (u = 0, i = 0; i != n; ++i)
> > > -		if (!list[i].info.master && !list[i].info.representor)
> > > -			++u;
> > > -	if (u) {
> > > -		if (n == 1 && u == 1) {
> > > -			/* Case #2. */
> > > -			DRV_LOG(INFO, "no switch support detected");
> > > -		} else {
> > > -			/* Case #3. */
> > > +		if (!ns) {
> > > +			DRV_LOG(ERR,
> > > +				"unable to recognize master/representors"
> > > +				" on the IB device with multiple ports");
> > > +			rte_errno = ENOENT;
> > > +			ret = -rte_errno;
> > > +			goto exit;
> > > +		}
> > > +	} else {
> > > +		/*
> > > +		 * The existence of several matching entries (nd > 1) means
> > > +		 * port representors have been instantiated. No existing
> > > Verbs
> > > +		 * call nor sysfs entries can tell them apart, this can only
> > > +		 * be done through Netlink calls assuming kernel drivers are
> > > +		 * recent enough to support them.
> > > +		 *
> > > +		 * In the event of identification failure through Netlink,
> > > +		 * try again through sysfs, then:
> > > +		 *
> > > +		 * 1. A single IB device matches (nd == 1) with single
> > > +		 *    port (np=0/1) and is not a representor, assume
> > > +		 *    no switch support.
> > > +		 *
> > > +		 * 2. Otherwise no safe assumptions can be made;
> > > +		 *    complain louder and bail out.
> > > +		 */
> > > +		np = 1;
> > > +		for (i = 0; i != nd; ++i) {
> > > +			memset(&list[ns].info, 0, sizeof(list[ns].info));
> > > +			list[ns].max_port = 1;
> > > +			list[ns].ibv_port = 1;
> > > +			list[ns].ibv_dev = ibv_match[i];
> > > +			list[ns].eth_dev = NULL;
> > > +			list[ns].ifindex = 0;
> > > +			if (nl_rdma >= 0)
> > > +				list[ns].ifindex = mlx5_nl_ifindex
> > > +					(nl_rdma, list[ns].ibv_dev->name, 1);
> > > +			if (!list[ns].ifindex) {
> > > +				/*
> > > +				 * No network interface index found for the
> > > +				 * specified device, it means there it is not
> > > +				 * a representor/master.
> > > +				 */
> > > +				continue;
> > > +			}
> > > +			ret = -1;
> > > +			if (nl_route >= 0)
> > > +				ret = mlx5_nl_switch_info
> > > +					       (nl_route,
> > > +						list[ns].ifindex,
> > > +						&list[ns].info);
> > > +			if (ret || (!list[ns].info.representor &&
> > > +				    !list[ns].info.master)) {
> > > +				/*
> > > +				 * We failed to recognize representors with
> > > +				 * Netlink, let's try to perform the task
> > > +				 * with sysfs.
> > > +				 */
> > > +				ret =  mlx5_sysfs_switch_info
> > > +						(list[ns].ifindex,
> > > +						 &list[ns].info);
> > > +			}
> > > +			if (!ret && (list[ns].info.representor ^
> > > +				     list[ns].info.master)) {
> > > +				ns++;
> > > +			} else if ((nd == 1) &&
> > > +				   !list[ns].info.representor &&
> > > +				   !list[ns].info.master) {
> > > +				/*
> > > +				 * Single IB device with
> > > +				 * one physical port and
> > > +				 * attached network device.
> > > +				 * May be SRIOV is not enabled
> > > +				 * or there is no representors.
> > > +				 */
> > > +				DRV_LOG(INFO, "no E-Switch support
> > > detected");
> > > +				ns++;
> > > +				break;
> > > +			}
> > > +		}
> > > +		if (!ns) {
> > >  			DRV_LOG(ERR,
> > > -				"unable to tell which of the matching
> > > devices"
> > > -				" is the master (lack of kernel support?)");
> > > -			n = 0;
> > > +				"unable to recognize master/representors"
> > > +				" on the multiple IB devices");
> > > +			rte_errno = ENOENT;
> > > +			ret = -rte_errno;
> > > +			goto exit;
> > >  		}
> > >  	}
> > > +	assert(ns);
> > >  	/*
> > >  	 * Sort list to probe devices in natural order for users convenience
> > >  	 * (i.e. master first, then representors from lowest to highest ID).
> > >  	 */
> > > -	if (n)
> > > -		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
> > > +	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
> > >  	/* Default configuration. */
> > >  	dev_config = (struct mlx5_dev_config){
> > >  		.hw_padding = 0,
> > > @@ -1497,7 +1612,7 @@ struct mlx5_dev_spawn_data {
> > >  			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
> > >  		},
> > >  	};
> > > -	/* Device speicific configuration. */
> > > +	/* Device specific configuration. */
> > >  	switch (pci_dev->id.device_id) {
> > >  	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
> > >  		dev_config.txqs_vec =
> > > MLX5_VPMD_MAX_TXQS_BLUEFIELD; @@ -1514,12 +1629,12 @@ struct
> > > mlx5_dev_spawn_data {
> > >  	/* Set architecture-dependent default value if unset. */
> > >  	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
> > >  		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
> > > -	for (i = 0; i != n; ++i) {
> > > +	for (i = 0; i != ns; ++i) {
> > >  		uint32_t restore;
> > >
> > >  		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
> > > -						 list[i].ibv_dev, dev_config,
> > > -						 &list[i].info);
> > > +						 &list[i],
> > > +						 dev_config);
> > >  		if (!list[i].eth_dev) {
> > >  			if (rte_errno != EBUSY && rte_errno != EEXIST)
> > >  				break;
> > > @@ -1532,16 +1647,7 @@ struct mlx5_dev_spawn_data {
> > >  		list[i].eth_dev->data->dev_flags |= restore;
> > >  		rte_eth_dev_probing_finish(list[i].eth_dev);
> > >  	}
> > > -	mlx5_glue->free_device_list(ibv_list);
> > > -	if (!n) {
> > > -		DRV_LOG(WARNING,
> > > -			"no Verbs device matches PCI device " PCI_PRI_FMT
> > > ","
> > > -			" are kernel drivers loaded?",
> > > -			pci_dev->addr.domain, pci_dev->addr.bus,
> > > -			pci_dev->addr.devid, pci_dev->addr.function);
> > > -		rte_errno = ENOENT;
> > > -		ret = -rte_errno;
> > > -	} else if (i != n) {
> > > +	if (i != ns) {
> > >  		DRV_LOG(ERR,
> > >  			"probe of PCI device " PCI_PRI_FMT " aborted after"
> > >  			" encountering an error: %s",
> > > @@ -1563,6 +1669,18 @@ struct mlx5_dev_spawn_data {
> > >  	} else {
> > >  		ret = 0;
> > >  	}
> > > +exit:
> > > +	/*
> > > +	 * Do the routine cleanup:
> > > +	 * - close opened Netlink sockets
> > > +	 * - free the Infiniband device list
> > > +	 */
> > > +	if (nl_rdma >= 0)
> > > +		close(nl_rdma);
> > > +	if (nl_route >= 0)
> > > +		close(nl_route);
> > > +	assert(ibv_list);
> > > +	mlx5_glue->free_device_list(ibv_list);
> > >  	return ret;
> > >  }
> > >
> > > --
> > > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 00/14] net/mlx5: add support for multiport IB devices
  2019-03-21 12:13   ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
@ 2019-03-21 12:58     ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 12:58 UTC (permalink / raw)
  To: Shahaf Shuler, dev

Hi, Shahaf,

My special thanks for immediate review. PSB.

With best regards,
Slava

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:14
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 00/14] net/mlx5: add support for multiport IB devices
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 00/14] net/mlx5: add support for multiport IB devices
> >
> > The Mellanox NICs support SR-IOV and have E-Switch feature.
> > When SR-IOV is set up in switchdev mode and E-Switch is enabled we
> > have so called VF representors in the system. All representors
> > belonging to the same E-Switch are created on the basis of the single
> > PCI function and with current implementation each representor has its
> > own dedicated Infiniband device and operates within its own Infiniband
> > context. It is proposed to provide representors as ports of the single
> > Infiniband device and operate on the shared Infiniband context.
> >
> > mlx5_pci_probe() routine is planned to be extended to probe the ports
> > of found Infiniband devices. All active ports, belonging to the same
> > Infiniband device will use the signle shared Infiniband context of that
> device:
> >   - QP are created within shared context, with assigning to
> >     specified port
> >   - Verb flows are also created with specifying port index
> >   - Protection Domain is shared by all representors
> >   - Global Memory Regions Cache is shared by all representors,
> >     local ones are not (not in this patchset)
> >   - Shared Asynchronous Event Handler distributes events
> >     among the device ports (not in this patchset)
> >   - UAR mapping to user space is also shared (not in patchset)
> >
> > Each representor has its own attach network interface device,
> > interface name and index can be retrieved via IB device and port
> > index, so the link control is performed via Netlink in the same way as
> > in existing implementation.
> 
> Nice Work Slava,
> Please see my comments.
> 
> Also - where is the support for sharing of the memory regions?

It is coming soon, as separate patch/patchset.
Ditto regarding UAR sharing.

> 
> >
> > This patchset also superseedes http://patches.dpdk.org/patch/50486/
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >
> > Viacheslav Ovsiienko (14):
> >   net/mlx5: add representor recognition on kernels 5.x
> >   net/mlx5: introduce multiport IB device shared structure
> >   net/mlx5: modify get ifindex routine for multiport IB
> >   net/mlx5: add getting IB ports number for multiport IB
> >   net/mlx5: add multiport IB device support to probing
> >   net/mlx5: add IB shared context alloc/free functions
> >   net/mlx5: switch to the names in the shared IB context
> >   net/mlx5: switch to the shared Protection Domain
> >   net/mlx5: switch to the shared context IB attributes
> >   net/mlx5: switch to the shared IB device context
> >   net/mlx5: provide IB port for the object being created
> >   net/mlx5: update install/uninstall int handler routines
> >   net/mlx5: update event handler for multiport IB devices
> >   net/mlx5: add source vport match to the ingress rules
> >
> >  drivers/net/mlx5/Makefile          |  10 +
> >  drivers/net/mlx5/meson.build       |   4 +
> >  drivers/net/mlx5/mlx5.c            | 549 +++++++++++++++++++++++++++------
> > ----
> >  drivers/net/mlx5/mlx5.h            |  38 ++-
> >  drivers/net/mlx5/mlx5_ethdev.c     | 241 ++++++++++------
> >  drivers/net/mlx5/mlx5_flow.c       |   2 +
> >  drivers/net/mlx5/mlx5_flow_dv.c    |  47 +++-
> >  drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
> >  drivers/net/mlx5/mlx5_mr.c         |   4 +-
> >  drivers/net/mlx5/mlx5_nl.c         | 101 ++++++-
> >  drivers/net/mlx5/mlx5_rxq.c        |  41 +--
> >  drivers/net/mlx5/mlx5_socket.c     |   4 +-
> >  drivers/net/mlx5/mlx5_stats.c      |  22 +-
> >  drivers/net/mlx5/mlx5_trigger.c    |   2 +-
> >  drivers/net/mlx5/mlx5_txq.c        |  24 +-
> >  15 files changed, 795 insertions(+), 301 deletions(-)
> >
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-21 12:14     ` Shahaf Shuler
@ 2019-03-21 12:58       ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 12:58 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:14
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport
> IB
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 03/14] net/mlx5: modify get ifindex routine for
> > multiport IB
> >
> > There is the routine mlx5_nl_ifindex() returning the network interface
> > index associated with Infiniband device.
> > We are going to support multiport IB devices, now function takes the
> > IB port as argument and returns ifindex associated with tuple <IB
> > device, IB port>
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5.c        |  2 +-
> >  drivers/net/mlx5/mlx5.h        |  2 +-
> >  drivers/net/mlx5/mlx5_ethdev.c |  2 +-
> >  drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
> >  4 files changed, 16 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > ea3d00c..89c30af 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -1441,7 +1441,7 @@ struct mlx5_dev_spawn_data {
> >  			list[i].ifindex = 0;
> >  		else
> >  			list[i].ifindex = mlx5_nl_ifindex
> > -				(nl_rdma, list[i].ibv_dev->name);
> > +				(nl_rdma, list[i].ibv_dev->name, 1);
> 
> Why hardcoded to 1?

Because the given part of patchset regards only mlx5_nl_ifindex() change.
We are not introducing the multiport Infiniband device right now and here,
it is coming in the next parts of the patchset. This "hardcoded 1"
will be replaced by actual index (please, see the next parts).

> 
> >  		if (nl_route < 0 ||
> >  		    !list[i].ifindex ||
> >  		    mlx5_nl_switch_info(nl_route, list[i].ifindex, diff --git
> > a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > 1716cd1..f5e29ca 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -470,7 +470,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev
> > *dev, struct ether_addr *mac,  void mlx5_nl_mac_addr_flush(struct
> > rte_eth_dev *dev);  int mlx5_nl_promisc(struct rte_eth_dev *dev, int
> > enable);  int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable); -
> > unsigned int mlx5_nl_ifindex(int nl, const char *name);
> > +unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t
> > +pindex);
> >  int mlx5_nl_switch_info(int nl, unsigned int ifindex,
> >  			struct mlx5_switch_info *info);
> >
> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > b/drivers/net/mlx5/mlx5_ethdev.c index 81f2a42..5b44889 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -222,7 +222,7 @@ struct ethtool_link_settings {
> >  	struct mlx5_priv *priv = dev->data->dev_private;
> >  	unsigned int ifindex =
> >  		priv->nl_socket_rdma >= 0 ?
> > -		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) :
> > 0;
> > +		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name,
> > 1) : 0;
> >
> >  	if (!ifindex) {
> >  		if (!priv->representor)
> > diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
> > index aa49cb4..6e9145a 100644
> > --- a/drivers/net/mlx5/mlx5_nl.c
> > +++ b/drivers/net/mlx5/mlx5_nl.c
> > @@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
> >  	int mac_n; /**< Number of addresses in the array. */  };
> >
> > -/** Data structure used by mlx5_nl_ifindex_cb(). */
> > +/** Data structure used by mlx5_nl_cmdget_cb(). */
> >  struct mlx5_nl_ifindex_data {
> >  	const char *name; /**< IB device name (in). */
> >  	uint32_t ibindex; /**< IB device index (out). */
> >  	uint32_t ifindex; /**< Network interface index (out). */
> > +	uint32_t portnum; /**< IB device max port number. */
> 
> What happen w/ old kernel which don't support it? the portnum is ignored?

On old kernels there is always port 1 exists. So, it os OK to use 1 for this only port.

> 
> >  };
> >
> >  /**
> > @@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
> >   *   0 on success, a negative errno value otherwise and rte_errno is set.
> >   */
> >  static int
> > -mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
> > +mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
> >  {
> >  	struct mlx5_nl_ifindex_data *data = arg;
> >  	size_t off = NLMSG_HDRLEN;
> >  	uint32_t ibindex = 0;
> >  	uint32_t ifindex = 0;
> > +	uint32_t portnum = 0;
> >  	int found = 0;
> >
> >  	if (nh->nlmsg_type !=
> > @@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
> >  		case RDMA_NLDEV_ATTR_NDEV_INDEX:
> >  			ifindex = *(uint32_t *)payload;
> >  			break;
> > +		case RDMA_NLDEV_ATTR_PORT_INDEX:
> > +			portnum = *(uint32_t *)payload;
> > +			break;
> >  		default:
> >  			break;
> >  		}
> > @@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
> >  	if (found) {
> >  		data->ibindex = ibindex;
> >  		data->ifindex = ifindex;
> > +		data->portnum = portnum;
> >  	}
> >  	return 0;
> >  error:
> > @@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
> >   *   Netlink socket of the RDMA kind (NETLINK_RDMA).
> >   * @param[in] name
> >   *   IB device name.
> > - *
> > + * @param[in] pindex
> > + *   IB device port index, starting from 1
> >   * @return
> >   *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
> >   *   is set.
> >   */
> >  unsigned int
> > -mlx5_nl_ifindex(int nl, const char *name)
> > +mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
> >  {
> > -	static const uint32_t pindex = 1;
> >  	uint32_t seq = random();
> >  	struct mlx5_nl_ifindex_data data = {
> >  		.name = name,
> > @@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
> >  	ret = mlx5_nl_send(nl, &req.nh, seq);
> >  	if (ret < 0)
> >  		return 0;
> > -	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
> > +	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
> >  	if (ret < 0)
> >  		return 0;
> >  	if (!data.ibindex)
> > @@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
> >  	ret = mlx5_nl_send(nl, &req.nh, seq);
> >  	if (ret < 0)
> >  		return 0;
> > -	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
> > +	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
> >  	if (ret < 0)
> >  		return 0;
> >  	if (!data.ifindex)
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 12/14] net/mlx5: update install/uninstall int handler routines
  2019-03-21 12:15     ` Shahaf Shuler
@ 2019-03-21 14:01       ` Slava Ovsiienko
  2019-03-24  9:07         ` Shahaf Shuler
  0 siblings, 1 reply; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 14:01 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:15
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 12/14] net/mlx5: update install/uninstall int handler
> routines
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 12/14] net/mlx5: update install/uninstall int handler
> > routines
> >
> > We are implementing the support for multport Infiniband device withj
> > representors attached to these multiple ports. Asynchronous device
> > event notifications (link status change, removal event, etc.) should
> > be shared between ports. We are going to implement shared event
> > handler and this patch introduces appropriate device structure changes
> > and updated event handler install and uninstall routines.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5.c        |  14 ++++-
> >  drivers/net/mlx5/mlx5.h        |   3 +-
> >  drivers/net/mlx5/mlx5_ethdev.c | 118
> > ++++++++++++++++++++++++++++++++---------
> >  3 files changed, 107 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > 312c42b..44b7a87 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -165,6 +165,7 @@ struct mlx5_dev_spawn_data {  {
> >  	struct mlx5_ibv_shared *sh;
> >  	int err = 0;
> > +	uint32_t i;
> >
> >  	assert(spawn);
> >  	/* Search for IB context by device name. */ @@ -212,6 +213,9 @@
> > struct mlx5_dev_spawn_data {
> >  		sizeof(sh->ibdev_name));
> >  	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
> >  		sizeof(sh->ibdev_path));
> > +	pthread_mutex_init(&sh->intr_mutex, NULL);
> > +	for (i = 0; i < sh->max_port; i++)
> > +		sh->port[i].port_id = RTE_MAX_ETHPORTS;
> 
> Why you need struct here? You port array is not just of uint32_t type?

For the case if we would like to add some other per-port data
accessible only from shared context. For example - in interrupt
handler we have only one parameter - the shared context, and we
should deduce eth_dev for the some device (not DPDK port_id) port

Actually it is uint_32_t array for now, but it is easily extandable,
for example, we could add per-port context for interrupt
handler.
	
> 
> >  	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> >  		/*
> >  		 * For secondary process we just open the IB device @@ -
> > 276,6 +280,15 @@ struct mlx5_dev_spawn_data {
> >  		assert(!sh->pd);
> >  	}
> >  	LIST_REMOVE(sh, next);
> > +	/*
> > +	 *  Ensure there is no async event handler installed.
> > +	 *  Only primary process handles async device events.
> > +	 **/
> > +	assert(!sh->intr_cnt);
> > +	if (sh->intr_cnt)
> > +		rte_intr_callback_unregister
> > +			(&sh->intr_handle, mlx5_dev_interrupt_handler,
> > sh);
> > +	pthread_mutex_destroy(&sh->intr_mutex);
> >  	if (sh->pd)
> >  		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> >  	if (sh->ctx)
> > @@ -283,7 +296,6 @@ struct mlx5_dev_spawn_data {
> >  	rte_free(sh);
> >  }
> >
> > -
> >  /**
> >   * Prepare shared data between primary and secondary process.
> >   */
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > d816d24..f23298e 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -216,6 +216,8 @@ struct mlx5_ibv_shared {
> >  	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
> >  	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for
> secondary
> > */
> >  	struct ibv_device_attr_ex device_attr; /* Device properties. */
> > +	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
> > +	uint32_t intr_cnt; /* Interrupt handler reference counter. */
> >  	struct rte_intr_handle intr_handle; /* Interrupt handler for device.
> > */
> >  	struct mlx5_ibv_shared_port port[]; /* per device port data array.
> > */ }; @@ -245,7 +247,6 @@ struct mlx5_priv {
> >  	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
> >  	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ.
> > */
> >  	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
> > -	struct rte_intr_handle intr_handle; /* Interrupt handler. */
> >  	unsigned int (*reta_idx)[]; /* RETA index table. */
> >  	unsigned int reta_idx_n; /* RETA index size. */
> >  	struct mlx5_drop drop_queue; /* Flow drop queues. */ diff --git
> > a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> > index
> > 1b2173b..8358cd2 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > *dev, char *fw_ver, size_t fw_size)  }
> >
> >  /**
> > + * Uninstall shared asynchronous device events handler.
> > + * This function is implemeted to support event sharing
> > + * between multiple ports of single IB device.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + */
> > +static void
> > +mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev) {
> > +	struct mlx5_priv *priv = dev->data->dev_private;
> > +	struct mlx5_ibv_shared *sh = priv->sh;
> > +
> > +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > +		return;
> > +	pthread_mutex_lock(&sh->intr_mutex);
> > +	assert(priv->ibv_port);
> > +	assert(priv->ibv_port <= sh->max_port);
> > +	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
> > +	if (sh->port[priv->ibv_port - 1].port_id >= RTE_MAX_ETHPORTS)
> > +		goto exit;
> > +	assert(sh->port[priv->ibv_port - 1].port_id ==
> > +					(uint32_t)dev->data->port_id);
> > +	assert(sh->intr_cnt);
> > +	sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
> > +	if (!sh->intr_cnt || --sh->intr_cnt)
> > +		goto exit;
> > +	rte_intr_callback_unregister(&sh->intr_handle,
> > +				     mlx5_dev_interrupt_handler, sh);
> > +	sh->intr_handle.fd = 0;
> > +	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> > +exit:
> > +	pthread_mutex_unlock(&sh->intr_mutex);
> > +}
> > +
> > +/**
> > + * Install shared asyncronous device events handler.
> > + * This function is implemeted to support event sharing
> > + * between multiple ports of single IB device.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + */
> > +static void
> > +mlx5_dev_shared_handler_install(struct rte_eth_dev *dev) {
> > +	struct mlx5_priv *priv = dev->data->dev_private;
> > +	struct mlx5_ibv_shared *sh = priv->sh;
> > +	int ret;
> > +	int flags;
> > +
> > +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> > +		return;
> > +	pthread_mutex_lock(&sh->intr_mutex);
> > +	assert(priv->ibv_port);
> > +	assert(priv->ibv_port <= sh->max_port);
> > +	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
> > +	if (sh->port[priv->ibv_port - 1].port_id < RTE_MAX_ETHPORTS) {
> 
> I don't understand why need an array to understand handler is already
> exists.
> Why not the refcnt?

Array is needed to deduce the eth_dev from the device port number.
Here is interrupt handler flow:
- entry
- for()
 - get_event()
- get device port (note, this is IB port index, not DPDK port id) from event
- check in the array whether the handler is installed for this port 
  (array member is less than RTE_MAX_ETHPORTS)
-  get DPDK port_id from array()

Array member just indicates whether the handler for  given IB port is
installed. Reference counter is used for rte_intr_callback_register/
rte_intr_callback_unregister calls. 
rte_intr_callback_register() is called when the first handler for the port is
being installed.
rte_intr_callback_unregister() is called when the lastt handler for the port is
being gone away.

> 
> > +		/* The handler is already installed for this port. */
> > +		assert(sh->intr_cnt++);
> 
> Asserts are compiled only in debug mode. You should not put any logic (++)
> into them.

Yes, it is a bug, there should no be "++" at all. Thanks. 

> 
> > +		goto exit;
> > +	}
> > +	sh->port[priv->ibv_port - 1].port_id = (uint32_t)dev->data->port_id;
> > +	if (sh->intr_cnt) {
> > +		sh->intr_cnt++;
> > +		goto exit;
> > +	}
> > +	/* No shared handler installed. */
> > +	assert(sh->ctx->async_fd > 0);
> > +	flags = fcntl(sh->ctx->async_fd, F_GETFL);
> > +	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> > +	if (ret) {
> > +		DRV_LOG(INFO, "failed to change file descriptor"
> > +			      " async event queue");
> > +		/* Indicate there will be no interrupts. */
> > +		dev->data->dev_conf.intr_conf.lsc = 0;
> > +		dev->data->dev_conf.intr_conf.rmv = 0;
> > +		sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
> > +		goto exit;
> > +	}
> > +	sh->intr_handle.fd = sh->ctx->async_fd;
> > +	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
> > +	rte_intr_callback_register(&sh->intr_handle,
> > +				   mlx5_dev_interrupt_handler, sh);
> > +	sh->intr_cnt++;
> > +exit:
> > +	pthread_mutex_unlock(&sh->intr_mutex);
> > +}
> > +
> > +/**
> >   * Uninstall interrupt handler.
> >   *
> >   * @param dev
> > @@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > *dev, char *fw_ver, size_t fw_size)  {
> >  	struct mlx5_priv *priv = dev->data->dev_private;
> >
> > -	if (dev->data->dev_conf.intr_conf.lsc ||
> > -	    dev->data->dev_conf.intr_conf.rmv)
> > -		rte_intr_callback_unregister(&priv->intr_handle,
> > -					     mlx5_dev_interrupt_handler,
> > dev);
> > +	mlx5_dev_shared_handler_uninstall(dev);
> >  	if (priv->primary_socket)
> >  		rte_intr_callback_unregister(&priv->intr_handle_socket,
> >  					     mlx5_dev_handler_socket, dev);
> > -	priv->intr_handle.fd = 0;
> > -	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> >  	priv->intr_handle_socket.fd = 0;
> >  	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;  }
> @@
> > -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> > char *fw_ver, size_t fw_size)
> > mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)  {
> >  	struct mlx5_priv *priv = dev->data->dev_private;
> > -	struct ibv_context *ctx = priv->sh->ctx;
> >  	int ret;
> > -	int flags;
> >
> > -	assert(ctx->async_fd > 0);
> > -	flags = fcntl(ctx->async_fd, F_GETFL);
> > -	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> > -	if (ret) {
> > -		DRV_LOG(INFO,
> > -			"port %u failed to change file descriptor async event"
> > -			" queue",
> > -			dev->data->port_id);
> > -		dev->data->dev_conf.intr_conf.lsc = 0;
> > -		dev->data->dev_conf.intr_conf.rmv = 0;
> > -	}
> > -	if (dev->data->dev_conf.intr_conf.lsc ||
> > -	    dev->data->dev_conf.intr_conf.rmv) {
> > -		priv->intr_handle.fd = ctx->async_fd;
> > -		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
> > -		rte_intr_callback_register(&priv->intr_handle,
> > -					   mlx5_dev_interrupt_handler, dev);
> > -	}
> > +	mlx5_dev_shared_handler_install(dev);
> >  	ret = mlx5_socket_init(dev);
> >  	if (ret)
> >  		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 13/14] net/mlx5: update event handler for multiport IB devices
  2019-03-21 12:15     ` Shahaf Shuler
@ 2019-03-21 14:08       ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 14:08 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:15
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 13/14] net/mlx5: update event handler for multiport IB
> devices
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 13/14] net/mlx5: update event handler for multiport IB
> > devices
> >
> > This patch modifies asynchronous event handler to support multiport
> > Infiniband devices. Handler queries the event parameters, including
> > event source port index, and invokes the handler for specific devices
> > with appropriate port_id.
> 
> This commit should be along w/ the previous one, since interrupts will not
> work after commit #12 in this series.

It was 😊
I tried first to do these parts in single commit. But the resulting diff looked
as very weird thing - it intermixed the handler and install/uninstall
routines. So, it was intentionally  splitted in two for better readability.
Nonetheless, if you insist - the is no any problem to squash these ones.
Please, let me know.

> 
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_ethdev.c | 101
> > +++++++++++++++++++++------------
> > --------
> >  1 file changed, 51 insertions(+), 50 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > b/drivers/net/mlx5/mlx5_ethdev.c index 8358cd2..710e6b5 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > *dev, char *fw_ver, size_t fw_size)  }
> >
> >  /**
> > - * Device status handler.
> > + * Handle shared asynchronous events the NIC (removal event
> > + * and link status change). Supports multiport IB device.
> >   *
> > - * @param dev
> > - *   Pointer to Ethernet device.
> > - * @param events
> > - *   Pointer to event flags holder.
> > - *
> > - * @return
> > - *   Events bitmap of callback process which can be called immediately.
> > + * @param cb_arg
> > + *   Callback argument.
> >   */
> > -static uint32_t
> > -mlx5_dev_status_handler(struct rte_eth_dev *dev)
> > +void
> > +mlx5_dev_interrupt_handler(void *cb_arg)
> >  {
> > -	struct mlx5_priv *priv = dev->data->dev_private;
> > +	struct mlx5_ibv_shared *sh = cb_arg;
> >  	struct ibv_async_event event;
> > -	uint32_t ret = 0;
> >
> > -	if (mlx5_link_update(dev, 0) == -EAGAIN) {
> > -		usleep(0);
> > -		return 0;
> > -	}
> > -	/* Read all message and acknowledge them. */
> > +	/* Read all message from the IB device and acknowledge them. */
> >  	for (;;) {
> > -		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
> > +		struct rte_eth_dev *dev;
> > +		uint32_t tmp;
> > +
> > +		if (mlx5_glue->get_async_event(sh->ctx, &event))
> >  			break;
> > +		/* Retrieve and check IB port index. */
> > +		tmp = (uint32_t)event.element.port_num;
> > +		assert(tmp && (tmp <= sh->max_port));
> > +		if (!tmp ||
> > +		    tmp > sh->max_port ||
> > +		    sh->port[tmp - 1].port_id >= RTE_MAX_ETHPORTS) {
> > +			/*
> > +			 * Invalid IB port index or no handler
> > +			 * installed for this port.
> > +			 */
> > +			mlx5_glue->ack_async_event(&event);
> > +			continue;
> > +		}
> > +		/* Retrieve ethernet device descriptor. */
> > +		tmp = sh->port[tmp - 1].port_id;
> > +		dev = &rte_eth_devices[tmp];
> 
> Is there a guarantee that the representors ethedev indexes will be
> contiguous?
No any. And no assumptions of such kind were made during development.

> I think we need a mapping between ibdev port and ethedv index.
Yes, it is exactly done. Array contains the DPDK port_ids, array index is ibv_port.

> 
> > +		tmp = 0;
> > +		assert(dev);
> >  		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
> > -			event.event_type == IBV_EVENT_PORT_ERR) &&
> > -			(dev->data->dev_conf.intr_conf.lsc == 1))
> > -			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
> > -		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
> > -			dev->data->dev_conf.intr_conf.rmv == 1)
> > -			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
> > -		else
> > -			DRV_LOG(DEBUG,
> > -				"port %u event type %d on not handled",
> > -				dev->data->port_id, event.event_type);
> > +		     event.event_type == IBV_EVENT_PORT_ERR) &&
> > +			dev->data->dev_conf.intr_conf.lsc) {
> > +			mlx5_glue->ack_async_event(&event);
> > +			if (mlx5_link_update(dev, 0) == -EAGAIN) {
> > +				usleep(0);
> > +				continue;
> > +			}
> > +			_rte_eth_dev_callback_process
> > +				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
> > +			continue;
> > +		}
> > +		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
> > +		    dev->data->dev_conf.intr_conf.rmv) {
> > +			mlx5_glue->ack_async_event(&event);
> > +			_rte_eth_dev_callback_process
> > +				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
> > +			continue;
> > +		}
> > +		DRV_LOG(DEBUG,
> > +			"port %u event type %d on not handled",
> > +			dev->data->port_id, event.event_type);
> >  		mlx5_glue->ack_async_event(&event);
> >  	}
> > -	return ret;
> > -}
> > -
> > -/**
> > - * Handle interrupts from the NIC.
> > - *
> > - * @param[in] intr_handle
> > - *   Interrupt handler.
> > - * @param cb_arg
> > - *   Callback argument.
> > - */
> > -void
> > -mlx5_dev_interrupt_handler(void *cb_arg) -{
> > -	struct rte_eth_dev *dev = cb_arg;
> > -	uint32_t events;
> > -
> > -	events = mlx5_dev_status_handler(dev);
> > -	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
> > -		_rte_eth_dev_callback_process(dev,
> > RTE_ETH_EVENT_INTR_LSC, NULL);
> > -	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
> > -		_rte_eth_dev_callback_process(dev,
> > RTE_ETH_EVENT_INTR_RMV, NULL);
> >  }
> >
> >  /**
> > --
> > 1.8.3.1


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
  2019-03-21 12:15     ` Shahaf Shuler
@ 2019-03-21 14:11       ` Slava Ovsiienko
  2019-03-24  9:13         ` Shahaf Shuler
  0 siblings, 1 reply; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 14:11 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Thursday, March 21, 2019 14:15
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 14/14] net/mlx5: add source vport match to the ingress
> rules
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 14/14] net/mlx5: add source vport match to the ingress
> > rules
> >
> > For E-Switch configurations over multiport Infiniband devices we
> > should add source vport match to correctly distribute traffic between
> representors.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_flow_dv.c | 38
> > ++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 38 insertions(+)
> >
> > diff --git a/drivers/net/mlx5/mlx5_flow_dv.c
> > b/drivers/net/mlx5/mlx5_flow_dv.c index b8943da..489b3bd 100644
> > --- a/drivers/net/mlx5/mlx5_flow_dv.c
> > +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> > @@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {  }
> >
> >  /**
> > + * Add source vport match to the specified matcher.
> > + *
> > + * @param[in, out] matcher
> > + *   Flow matcher.
> > + * @param[in, out] key
> > + *   Flow matcher value.
> > + * @param[in] port
> > + *   Source vport value to match
> > + * @param[in] mask
> > + *   Mask
> > + */
> > +static void
> > +flow_dv_translate_source_vport(void *matcher, void *key,
> > +			      int16_t port, uint16_t mask) {
> > +	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher,
> > misc_parameters);
> > +	void *misc_v = MLX5_ADDR_OF(fte_match_param, key,
> > misc_parameters);
> > +
> > +	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
> > +	MLX5_SET(fte_match_set_misc, misc_v, source_port, port); }
> > +
> > +/**
> >   * Fill the flow with DV spec.
> >   *
> >   * @param[in] dev
> > @@ -3389,6 +3412,21 @@ struct field_modify_info modify_tcp[] = {
> >  	}
> >  	dev_flow->dv.actions_n = actions_n;
> >  	flow->actions = action_flags;
> > +	if (attr->ingress && !attr->transfer &&
> > +	    (priv->representor || priv->master)) {
> > +		/* It was validated - we support unidirections flows only. */
> > +		assert(!attr->egress);
> > +		/*
> > +		 * Add matching on source vport index only
> > +		 * for ingress rules in E-Switch configurations.
> > +		 */
> > +		flow_dv_translate_source_vport(matcher.mask.buf,
> > +					       dev_flow->dv.value.buf,
> > +					       priv->representor_id < 0 ?
> > +					       priv->representor_id :
> > +					       priv->representor_id + 1,
> 
> The vport of representor_id 0 will be 1?
> Who owns vport 0?

PF.
There is the foillowing vport mapping (for single E-Switch per PF):

-1 - wire
0 - PF (uplink + VF reps)
1 - VF0
2 - VF1
...
n+1 - VFn

This code is subject to change - (SF, multi E-Switch per function, etc),
this patch currently supports single E-Switch  per PF.

> 
> > +					       0xffff);
> > +	}
> >  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
> >  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> >  		void *match_mask = matcher.mask.buf;
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21 12:13     ` Shahaf Shuler
@ 2019-03-21 15:08       ` Stephen Hemminger
  2019-03-21 15:31         ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Stephen Hemminger @ 2019-03-21 15:08 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Slava Ovsiienko, dev

On Thu, 21 Mar 2019 12:13:50 +0000
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Hi Slava,
> 
> Small comments below. Once fixed you can put my acked-by on the next version. 
> 
> Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > Subject: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
> > 
> > The master device and VF representors were distinguished by presence of
> > port name, master device did not have one. The new Linux kernels starting
> > from 5.0 provide the port name for master device and the implemented
> > representor recognizing method does not work.
> > The new recognizing method is based on quiering the VF number, created on
> > the base of the device.
> > 
> > The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> > attribute is specified in the Netlink request message.
> > 
> > Also the presence of device symlink in device sysfs folder is added to
> > distinguish representors with sysfs based method.
> > 
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > 
> > ---
> > 
> > v3: - rebased over new port naming http://patches.dpdk.org/patch/51245/
> >     - master recognition is reinforced by checking vport for -1
> >       for new port naming schema
> > 
> > v2: - fopen replaced with opendir to detect whether directory exists
> > 
> > v1: http://patches.dpdk.org/patch/50411/
> > ---
> >  drivers/net/mlx5/Makefile      | 10 ++++++++++
> >  drivers/net/mlx5/meson.build   |  4 ++++
> >  drivers/net/mlx5/mlx5.c        |  2 +-
> >  drivers/net/mlx5/mlx5.h        |  1 +
> >  drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
> >  drivers/net/mlx5/mlx5_nl.c     | 36
> > +++++++++++++++++++++++++++++++++---
> >  6 files changed, 60 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile index
> > 1ed299d..3dd7e38 100644
> > --- a/drivers/net/mlx5/Makefile
> > +++ b/drivers/net/mlx5/Makefile
> > @@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-
> > config-h.sh
> >  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
> >  		$(AUTOCONF_OUTPUT)
> >  	$Q sh -- '$<' '$@' \
> > +		HAVE_IFLA_NUM_VF \
> > +		linux/if_link.h \
> > +		enum IFLA_NUM_VF \
> > +		$(AUTOCONF_OUTPUT)
> > +	$Q sh -- '$<' '$@' \
> > +		HAVE_IFLA_EXT_MASK \
> > +		linux/if_link.h \
> > +		enum IFLA_EXT_MASK \
> > +		$(AUTOCONF_OUTPUT)
> > +	$Q sh -- '$<' '$@' \
> >  		HAVE_IFLA_PHYS_SWITCH_ID \
> >  		linux/if_link.h \
> >  		enum IFLA_PHYS_SWITCH_ID \
> > diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
> > index 0cf2f08..e3cb9bc 100644
> > --- a/drivers/net/mlx5/meson.build
> > +++ b/drivers/net/mlx5/meson.build
> > @@ -133,6 +133,10 @@ if build
> >  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
> >  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
> >  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> > +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> > +		'IFLA_NUM_VF' ],
> > +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> > +		'IFLA_EXT_MASK' ],
> >  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
> >  		'IFLA_PHYS_SWITCH_ID' ],
> >  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> > a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > ad1975c..ea3d00c 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -13,7 +13,6 @@
> >  #include <errno.h>
> >  #include <net/if.h>
> >  #include <sys/mman.h>
> > -#include <linux/netlink.h>
> >  #include <linux/rtnetlink.h>
> > 
> >  /* Verbs header. */
> > @@ -1001,6 +1000,7 @@
> >  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
> >  	priv->nl_sn = 0;
> >  	priv->representor = !!switch_info->representor;
> > +	priv->master = !!switch_info->master;
> >  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> >  	priv->representor_id =
> >  		switch_info->representor ? switch_info->port_name : -1; diff
> > --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > a88cb4a..58bc37f 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -214,6 +214,7 @@ struct mlx5_priv {
> >  	uint16_t mtu; /* Configured MTU. */
> >  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
> >  	unsigned int representor:1; /* Device is a port representor. */
> > +	unsigned int master:1; /* Device is a E-Switch master. */
> >  	uint16_t domain_id; /* Switch domain identifier. */
> >  	int32_t representor_id; /* Port representor identifier. */
> >  	/* RX/TX queues. */
> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > b/drivers/net/mlx5/mlx5_ethdev.c index 84d761c..81f2a42 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > *dev, char *fw_ver, size_t fw_size)
> >  		.port_name = 0,
> >  		.switch_id = 0,
> >  	};
> > +	DIR *dir;
> >  	bool port_name_set = false;
> >  	bool port_switch_id_set = false;
> > +	bool device_dir = false;
> >  	char c;
> > 
> >  	if (!if_indextoname(ifindex, ifname)) { @@ -1375,6 +1377,8 @@ int
> > mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t
> > fw_size)
> >  	      ifname);
> >  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
> >  	      ifname);
> > +	MKSTR(pci_device, "/sys/class/net/%s/device",
> > +	      ifname);
> > 
> >  	file = fopen(phys_port_name, "rb");
> >  	if (file != NULL) {
> > @@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > *dev, char *fw_ver, size_t fw_size)
> >  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
> >  		c == '\n';
> >  	fclose(file);
> > -	data.master = port_switch_id_set && !port_name_set;
> > -	data.representor = port_switch_id_set && port_name_set;
> > +	dir = opendir(pci_device);
> > +	if (dir != NULL) {
> > +		closedir(dir);
> > +		device_dir = true;
> > +	}
> > +	data.master = port_switch_id_set && (!port_name_set ||
> > device_dir);
> > +	data.representor = port_switch_id_set && port_name_set &&
> > !device_dir;  
> 
> Add assert that device cannot be both master and representor. 

Error checking would be but assert() is usually not a useful in drivers.
It causes crash, and is often compiled out.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21 15:08       ` Stephen Hemminger
@ 2019-03-21 15:31         ` Slava Ovsiienko
  2019-03-21 19:08           ` Stephen Hemminger
  0 siblings, 1 reply; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-21 15:31 UTC (permalink / raw)
  To: Stephen Hemminger, Shahaf Shuler; +Cc: dev

Hi, Stephen

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, March 21, 2019 17:09
> To: Shahaf Shuler <shahafs@mellanox.com>
> Cc: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 01/14] net/mlx5: add representor recognition
> on kernels 5.x
> 
> On Thu, 21 Mar 2019 12:13:50 +0000
> Shahaf Shuler <shahafs@mellanox.com> wrote:
> 
> > Hi Slava,
> >
> > Small comments below. Once fixed you can put my acked-by on the next
> version.
> >
> > Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > > Subject: [PATCH 01/14] net/mlx5: add representor recognition on
> > > kernels 5.x
> > >
> > > The master device and VF representors were distinguished by presence
> > > of port name, master device did not have one. The new Linux kernels
> > > starting from 5.0 provide the port name for master device and the
> > > implemented representor recognizing method does not work.
> > > The new recognizing method is based on quiering the VF number,
> > > created on the base of the device.
> > >
> > > The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> > > attribute is specified in the Netlink request message.
> > >
> > > Also the presence of device symlink in device sysfs folder is added
> > > to distinguish representors with sysfs based method.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > >
> > > ---
> > >
> > > v3: - rebased over new port naming
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatche
> s.dpdk.org%2Fpatch%2F51245%2F&amp;data=02%7C01%7Cviacheslavo%40
> mellanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4
> d9ba6a4d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=FDq
> 950ksokxsNac8cBM293W263uPfVeY1xA7Cx%2F4FLk%3D&amp;reserved=0
> > >     - master recognition is reinforced by checking vport for -1
> > >       for new port naming schema
> > >
> > > v2: - fopen replaced with opendir to detect whether directory exists
> > >
> > > v1:
> > > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpat
> > >
> ches.dpdk.org%2Fpatch%2F50411%2F&amp;data=02%7C01%7Cviacheslavo%
> 40me
> > >
> llanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4d9b
> a6a4
> > >
> d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=JkWKbb6LV
> diIHW%2
> > > FpJEQHcD7hvFLWdGmM%2BTVhM%2F%2F80Uk%3D&amp;reserved=0
> > > ---
> > >  drivers/net/mlx5/Makefile      | 10 ++++++++++
> > >  drivers/net/mlx5/meson.build   |  4 ++++
> > >  drivers/net/mlx5/mlx5.c        |  2 +-
> > >  drivers/net/mlx5/mlx5.h        |  1 +
> > >  drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
> > >  drivers/net/mlx5/mlx5_nl.c     | 36
> > > +++++++++++++++++++++++++++++++++---
> > >  6 files changed, 60 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> > > index
> > > 1ed299d..3dd7e38 100644
> > > --- a/drivers/net/mlx5/Makefile
> > > +++ b/drivers/net/mlx5/Makefile
> > > @@ -231,6 +231,16 @@ mlx5_autoconf.h.new:
> > > $(RTE_SDK)/buildtools/auto- config-h.sh
> > >  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
> > >  		$(AUTOCONF_OUTPUT)
> > >  	$Q sh -- '$<' '$@' \
> > > +		HAVE_IFLA_NUM_VF \
> > > +		linux/if_link.h \
> > > +		enum IFLA_NUM_VF \
> > > +		$(AUTOCONF_OUTPUT)
> > > +	$Q sh -- '$<' '$@' \
> > > +		HAVE_IFLA_EXT_MASK \
> > > +		linux/if_link.h \
> > > +		enum IFLA_EXT_MASK \
> > > +		$(AUTOCONF_OUTPUT)
> > > +	$Q sh -- '$<' '$@' \
> > >  		HAVE_IFLA_PHYS_SWITCH_ID \
> > >  		linux/if_link.h \
> > >  		enum IFLA_PHYS_SWITCH_ID \
> > > diff --git a/drivers/net/mlx5/meson.build
> > > b/drivers/net/mlx5/meson.build index 0cf2f08..e3cb9bc 100644
> > > --- a/drivers/net/mlx5/meson.build
> > > +++ b/drivers/net/mlx5/meson.build
> > > @@ -133,6 +133,10 @@ if build
> > >  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
> > >  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
> > >  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> > > +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> > > +		'IFLA_NUM_VF' ],
> > > +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> > > +		'IFLA_EXT_MASK' ],
> > >  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
> > >  		'IFLA_PHYS_SWITCH_ID' ],
> > >  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> > > a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > > ad1975c..ea3d00c 100644
> > > --- a/drivers/net/mlx5/mlx5.c
> > > +++ b/drivers/net/mlx5/mlx5.c
> > > @@ -13,7 +13,6 @@
> > >  #include <errno.h>
> > >  #include <net/if.h>
> > >  #include <sys/mman.h>
> > > -#include <linux/netlink.h>
> > >  #include <linux/rtnetlink.h>
> > >
> > >  /* Verbs header. */
> > > @@ -1001,6 +1000,7 @@
> > >  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
> > >  	priv->nl_sn = 0;
> > >  	priv->representor = !!switch_info->representor;
> > > +	priv->master = !!switch_info->master;
> > >  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> > >  	priv->representor_id =
> > >  		switch_info->representor ? switch_info->port_name : -1; diff
> > > --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > > a88cb4a..58bc37f 100644
> > > --- a/drivers/net/mlx5/mlx5.h
> > > +++ b/drivers/net/mlx5/mlx5.h
> > > @@ -214,6 +214,7 @@ struct mlx5_priv {
> > >  	uint16_t mtu; /* Configured MTU. */
> > >  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
> > >  	unsigned int representor:1; /* Device is a port representor. */
> > > +	unsigned int master:1; /* Device is a E-Switch master. */
> > >  	uint16_t domain_id; /* Switch domain identifier. */
> > >  	int32_t representor_id; /* Port representor identifier. */
> > >  	/* RX/TX queues. */
> > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > > b/drivers/net/mlx5/mlx5_ethdev.c index 84d761c..81f2a42 100644
> > > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > > @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > > *dev, char *fw_ver, size_t fw_size)
> > >  		.port_name = 0,
> > >  		.switch_id = 0,
> > >  	};
> > > +	DIR *dir;
> > >  	bool port_name_set = false;
> > >  	bool port_switch_id_set = false;
> > > +	bool device_dir = false;
> > >  	char c;
> > >
> > >  	if (!if_indextoname(ifindex, ifname)) { @@ -1375,6 +1377,8 @@ int
> > > mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t
> > > fw_size)
> > >  	      ifname);
> > >  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
> > >  	      ifname);
> > > +	MKSTR(pci_device, "/sys/class/net/%s/device",
> > > +	      ifname);
> > >
> > >  	file = fopen(phys_port_name, "rb");
> > >  	if (file != NULL) {
> > > @@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > > *dev, char *fw_ver, size_t fw_size)
> > >  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
> > >  		c == '\n';
> > >  	fclose(file);
> > > -	data.master = port_switch_id_set && !port_name_set;
> > > -	data.representor = port_switch_id_set && port_name_set;
> > > +	dir = opendir(pci_device);
> > > +	if (dir != NULL) {
> > > +		closedir(dir);
> > > +		device_dir = true;
> > > +	}
> > > +	data.master = port_switch_id_set && (!port_name_set ||
> > > device_dir);
> > > +	data.representor = port_switch_id_set && port_name_set &&
> > > !device_dir;
> >
> > Add assert that device cannot be both master and representor.
> 
> Error checking would be but assert() is usually not a useful in drivers.
> It causes crash, and is often compiled out.

PMD is a user mode driver, so standard assert() seems to be relevant.
But I agree, it would be good for portable code to have its own
definition of assert. Say, "rte_assert". It would allow us to define/redefine
the code behavior if assertion fails.

As for me, I think asserts are EXTREMELY useful, it saves a lot of time while
debugging, and it is proved by my own practice of mlx5 PMD debugging
(beside other projects).  Assert inserted in right place stops the quite wrong
situation evolving and allows us to have a good catch and find
the root of the problem quickly.

WBR,
Slava

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21 15:31         ` Slava Ovsiienko
@ 2019-03-21 19:08           ` Stephen Hemminger
  2019-03-22  8:15             ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Stephen Hemminger @ 2019-03-21 19:08 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: Shahaf Shuler, dev

On Thu, 21 Mar 2019 15:31:36 +0000
Slava Ovsiienko <viacheslavo@mellanox.com> wrote:

> Hi, Stephen
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, March 21, 2019 17:09
> > To: Shahaf Shuler <shahafs@mellanox.com>
> > Cc: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 01/14] net/mlx5: add representor recognition
> > on kernels 5.x
> > 
> > On Thu, 21 Mar 2019 12:13:50 +0000
> > Shahaf Shuler <shahafs@mellanox.com> wrote:
> >   
> > > Hi Slava,
> > >
> > > Small comments below. Once fixed you can put my acked-by on the next  
> > version.  
> > >
> > > Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:  
> > > > Subject: [PATCH 01/14] net/mlx5: add representor recognition on
> > > > kernels 5.x
> > > >
> > > > The master device and VF representors were distinguished by presence
> > > > of port name, master device did not have one. The new Linux kernels
> > > > starting from 5.0 provide the port name for master device and the
> > > > implemented representor recognizing method does not work.
> > > > The new recognizing method is based on quiering the VF number,
> > > > created on the base of the device.
> > > >
> > > > The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> > > > attribute is specified in the Netlink request message.
> > > >
> > > > Also the presence of device symlink in device sysfs folder is added
> > > > to distinguish representors with sysfs based method.
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > >
> > > > ---
> > > >
> > > > v3: - rebased over new port naming  
> > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatche
> > s.dpdk.org%2Fpatch%2F51245%2F&amp;data=02%7C01%7Cviacheslavo%40
> > mellanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4
> > d9ba6a4d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=FDq
> > 950ksokxsNac8cBM293W263uPfVeY1xA7Cx%2F4FLk%3D&amp;reserved=0  
> > > >     - master recognition is reinforced by checking vport for -1
> > > >       for new port naming schema
> > > >
> > > > v2: - fopen replaced with opendir to detect whether directory exists
> > > >
> > > > v1:
> > > > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpat
> > > >  
> > ches.dpdk.org%2Fpatch%2F50411%2F&amp;data=02%7C01%7Cviacheslavo%
> > 40me  
> > > >  
> > llanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4d9b
> > a6a4  
> > > >  
> > d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=JkWKbb6LV
> > diIHW%2  
> > > > FpJEQHcD7hvFLWdGmM%2BTVhM%2F%2F80Uk%3D&amp;reserved=0
> > > > ---
> > > >  drivers/net/mlx5/Makefile      | 10 ++++++++++
> > > >  drivers/net/mlx5/meson.build   |  4 ++++
> > > >  drivers/net/mlx5/mlx5.c        |  2 +-
> > > >  drivers/net/mlx5/mlx5.h        |  1 +
> > > >  drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
> > > >  drivers/net/mlx5/mlx5_nl.c     | 36
> > > > +++++++++++++++++++++++++++++++++---
> > > >  6 files changed, 60 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> > > > index
> > > > 1ed299d..3dd7e38 100644
> > > > --- a/drivers/net/mlx5/Makefile
> > > > +++ b/drivers/net/mlx5/Makefile
> > > > @@ -231,6 +231,16 @@ mlx5_autoconf.h.new:
> > > > $(RTE_SDK)/buildtools/auto- config-h.sh
> > > >  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
> > > >  		$(AUTOCONF_OUTPUT)
> > > >  	$Q sh -- '$<' '$@' \
> > > > +		HAVE_IFLA_NUM_VF \
> > > > +		linux/if_link.h \
> > > > +		enum IFLA_NUM_VF \
> > > > +		$(AUTOCONF_OUTPUT)
> > > > +	$Q sh -- '$<' '$@' \
> > > > +		HAVE_IFLA_EXT_MASK \
> > > > +		linux/if_link.h \
> > > > +		enum IFLA_EXT_MASK \
> > > > +		$(AUTOCONF_OUTPUT)
> > > > +	$Q sh -- '$<' '$@' \
> > > >  		HAVE_IFLA_PHYS_SWITCH_ID \
> > > >  		linux/if_link.h \
> > > >  		enum IFLA_PHYS_SWITCH_ID \
> > > > diff --git a/drivers/net/mlx5/meson.build
> > > > b/drivers/net/mlx5/meson.build index 0cf2f08..e3cb9bc 100644
> > > > --- a/drivers/net/mlx5/meson.build
> > > > +++ b/drivers/net/mlx5/meson.build
> > > > @@ -133,6 +133,10 @@ if build
> > > >  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
> > > >  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
> > > >  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> > > > +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> > > > +		'IFLA_NUM_VF' ],
> > > > +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> > > > +		'IFLA_EXT_MASK' ],
> > > >  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
> > > >  		'IFLA_PHYS_SWITCH_ID' ],
> > > >  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> > > > a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > > > ad1975c..ea3d00c 100644
> > > > --- a/drivers/net/mlx5/mlx5.c
> > > > +++ b/drivers/net/mlx5/mlx5.c
> > > > @@ -13,7 +13,6 @@
> > > >  #include <errno.h>
> > > >  #include <net/if.h>
> > > >  #include <sys/mman.h>
> > > > -#include <linux/netlink.h>
> > > >  #include <linux/rtnetlink.h>
> > > >
> > > >  /* Verbs header. */
> > > > @@ -1001,6 +1000,7 @@
> > > >  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
> > > >  	priv->nl_sn = 0;
> > > >  	priv->representor = !!switch_info->representor;
> > > > +	priv->master = !!switch_info->master;
> > > >  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> > > >  	priv->representor_id =
> > > >  		switch_info->representor ? switch_info->port_name : -1; diff
> > > > --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > > > a88cb4a..58bc37f 100644
> > > > --- a/drivers/net/mlx5/mlx5.h
> > > > +++ b/drivers/net/mlx5/mlx5.h
> > > > @@ -214,6 +214,7 @@ struct mlx5_priv {
> > > >  	uint16_t mtu; /* Configured MTU. */
> > > >  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
> > > >  	unsigned int representor:1; /* Device is a port representor. */
> > > > +	unsigned int master:1; /* Device is a E-Switch master. */
> > > >  	uint16_t domain_id; /* Switch domain identifier. */
> > > >  	int32_t representor_id; /* Port representor identifier. */
> > > >  	/* RX/TX queues. */
> > > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > > > b/drivers/net/mlx5/mlx5_ethdev.c index 84d761c..81f2a42 100644
> > > > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > > > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > > > @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > > > *dev, char *fw_ver, size_t fw_size)
> > > >  		.port_name = 0,
> > > >  		.switch_id = 0,
> > > >  	};
> > > > +	DIR *dir;
> > > >  	bool port_name_set = false;
> > > >  	bool port_switch_id_set = false;
> > > > +	bool device_dir = false;
> > > >  	char c;
> > > >
> > > >  	if (!if_indextoname(ifindex, ifname)) { @@ -1375,6 +1377,8 @@ int
> > > > mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t
> > > > fw_size)
> > > >  	      ifname);
> > > >  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
> > > >  	      ifname);
> > > > +	MKSTR(pci_device, "/sys/class/net/%s/device",
> > > > +	      ifname);
> > > >
> > > >  	file = fopen(phys_port_name, "rb");
> > > >  	if (file != NULL) {
> > > > @@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > > > *dev, char *fw_ver, size_t fw_size)
> > > >  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
> > > >  		c == '\n';
> > > >  	fclose(file);
> > > > -	data.master = port_switch_id_set && !port_name_set;
> > > > -	data.representor = port_switch_id_set && port_name_set;
> > > > +	dir = opendir(pci_device);
> > > > +	if (dir != NULL) {
> > > > +		closedir(dir);
> > > > +		device_dir = true;
> > > > +	}
> > > > +	data.master = port_switch_id_set && (!port_name_set ||
> > > > device_dir);
> > > > +	data.representor = port_switch_id_set && port_name_set &&
> > > > !device_dir;  
> > >
> > > Add assert that device cannot be both master and representor.  
> > 
> > Error checking would be but assert() is usually not a useful in drivers.
> > It causes crash, and is often compiled out.  
> 
> PMD is a user mode driver, so standard assert() seems to be relevant.
> But I agree, it would be good for portable code to have its own
> definition of assert. Say, "rte_assert". It would allow us to define/redefine
> the code behavior if assertion fails.
> 
> As for me, I think asserts are EXTREMELY useful, it saves a lot of time while
> debugging, and it is proved by my own practice of mlx5 PMD debugging
> (beside other projects).  Assert inserted in right place stops the quite wrong
> situation evolving and allows us to have a good catch and find
> the root of the problem quickly.
> 
> WBR,
> Slava

You may misunderstand what I meant. Asserts are useful, but they need to be
used only when real error handling is not possible. In general, it is better
to log and return an error on invalid data than crash the whole application.
Especially since DPDK now supports hot plug and it could be that device is
added to a working application.

For me, the worry is that an application on Azure starts up using synthetic
datapath successfully, and the hot plug of VF (accelerated networking) might
expose a driver bug. In that case, there still is a fallback to ignore the
mlx device.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x
  2019-03-21 19:08           ` Stephen Hemminger
@ 2019-03-22  8:15             ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-22  8:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Shahaf Shuler, dev

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, March 21, 2019 21:09
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 01/14] net/mlx5: add representor recognition
> on kernels 5.x
> 
> On Thu, 21 Mar 2019 15:31:36 +0000
> Slava Ovsiienko <viacheslavo@mellanox.com> wrote:
> 
> > Hi, Stephen
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > Sent: Thursday, March 21, 2019 17:09
> > > To: Shahaf Shuler <shahafs@mellanox.com>
> > > Cc: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 01/14] net/mlx5: add representor
> > > recognition on kernels 5.x
> > >
> > > On Thu, 21 Mar 2019 12:13:50 +0000
> > > Shahaf Shuler <shahafs@mellanox.com> wrote:
> > >
> > > > Hi Slava,
> > > >
> > > > Small comments below. Once fixed you can put my acked-by on the
> > > > next
> > > version.
> > > >
> > > > Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > > > > Subject: [PATCH 01/14] net/mlx5: add representor recognition on
> > > > > kernels 5.x
> > > > >
> > > > > The master device and VF representors were distinguished by
> > > > > presence of port name, master device did not have one. The new
> > > > > Linux kernels starting from 5.0 provide the port name for master
> > > > > device and the implemented representor recognizing method does
> not work.
> > > > > The new recognizing method is based on quiering the VF number,
> > > > > created on the base of the device.
> > > > >
> > > > > The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> > > > > attribute is specified in the Netlink request message.
> > > > >
> > > > > Also the presence of device symlink in device sysfs folder is
> > > > > added to distinguish representors with sysfs based method.
> > > > >
> > > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > > >
> > > > > ---
> > > > >
> > > > > v3: - rebased over new port naming
> > > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpat
> > > che
> > >
> s.dpdk.org%2Fpatch%2F51245%2F&amp;data=02%7C01%7Cviacheslavo%40
> > >
> mellanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4
> > >
> d9ba6a4d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=FDq
> > >
> 950ksokxsNac8cBM293W263uPfVeY1xA7Cx%2F4FLk%3D&amp;reserved=0
> > > > >     - master recognition is reinforced by checking vport for -1
> > > > >       for new port naming schema
> > > > >
> > > > > v2: - fopen replaced with opendir to detect whether directory
> > > > > exists
> > > > >
> > > > > v1:
> > > > > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2
> > > > > Fpat
> > > > >
> > >
> ches.dpdk.org%2Fpatch%2F50411%2F&amp;data=02%7C01%7Cviacheslavo%
> > > 40me
> > > > >
> > >
> llanox.com%7C94cc885cbb8d4aade9dd08d6ae0f26cd%7Ca652971c7d2e4d9b
> > > a6a4
> > > > >
> > >
> d149256f461b%7C0%7C0%7C636887777455248723&amp;sdata=JkWKbb6LV
> > > diIHW%2
> > > > >
> FpJEQHcD7hvFLWdGmM%2BTVhM%2F%2F80Uk%3D&amp;reserved=0
> > > > > ---
> > > > >  drivers/net/mlx5/Makefile      | 10 ++++++++++
> > > > >  drivers/net/mlx5/meson.build   |  4 ++++
> > > > >  drivers/net/mlx5/mlx5.c        |  2 +-
> > > > >  drivers/net/mlx5/mlx5.h        |  1 +
> > > > >  drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++++--
> > > > >  drivers/net/mlx5/mlx5_nl.c     | 36
> > > > > +++++++++++++++++++++++++++++++++---
> > > > >  6 files changed, 60 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/mlx5/Makefile
> > > > > b/drivers/net/mlx5/Makefile index
> > > > > 1ed299d..3dd7e38 100644
> > > > > --- a/drivers/net/mlx5/Makefile
> > > > > +++ b/drivers/net/mlx5/Makefile
> > > > > @@ -231,6 +231,16 @@ mlx5_autoconf.h.new:
> > > > > $(RTE_SDK)/buildtools/auto- config-h.sh
> > > > >  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
> > > > >  		$(AUTOCONF_OUTPUT)
> > > > >  	$Q sh -- '$<' '$@' \
> > > > > +		HAVE_IFLA_NUM_VF \
> > > > > +		linux/if_link.h \
> > > > > +		enum IFLA_NUM_VF \
> > > > > +		$(AUTOCONF_OUTPUT)
> > > > > +	$Q sh -- '$<' '$@' \
> > > > > +		HAVE_IFLA_EXT_MASK \
> > > > > +		linux/if_link.h \
> > > > > +		enum IFLA_EXT_MASK \
> > > > > +		$(AUTOCONF_OUTPUT)
> > > > > +	$Q sh -- '$<' '$@' \
> > > > >  		HAVE_IFLA_PHYS_SWITCH_ID \
> > > > >  		linux/if_link.h \
> > > > >  		enum IFLA_PHYS_SWITCH_ID \
> > > > > diff --git a/drivers/net/mlx5/meson.build
> > > > > b/drivers/net/mlx5/meson.build index 0cf2f08..e3cb9bc 100644
> > > > > --- a/drivers/net/mlx5/meson.build
> > > > > +++ b/drivers/net/mlx5/meson.build
> > > > > @@ -133,6 +133,10 @@ if build
> > > > >  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
> > > > >  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
> > > > >  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> > > > > +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> > > > > +		'IFLA_NUM_VF' ],
> > > > > +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> > > > > +		'IFLA_EXT_MASK' ],
> > > > >  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
> > > > >  		'IFLA_PHYS_SWITCH_ID' ],
> > > > >  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> > > > > a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > > > > ad1975c..ea3d00c 100644
> > > > > --- a/drivers/net/mlx5/mlx5.c
> > > > > +++ b/drivers/net/mlx5/mlx5.c
> > > > > @@ -13,7 +13,6 @@
> > > > >  #include <errno.h>
> > > > >  #include <net/if.h>
> > > > >  #include <sys/mman.h>
> > > > > -#include <linux/netlink.h>
> > > > >  #include <linux/rtnetlink.h>
> > > > >
> > > > >  /* Verbs header. */
> > > > > @@ -1001,6 +1000,7 @@
> > > > >  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
> > > > >  	priv->nl_sn = 0;
> > > > >  	priv->representor = !!switch_info->representor;
> > > > > +	priv->master = !!switch_info->master;
> > > > >  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> > > > >  	priv->representor_id =
> > > > >  		switch_info->representor ? switch_info->port_name : -1; diff
> > > > > --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > > > > a88cb4a..58bc37f 100644
> > > > > --- a/drivers/net/mlx5/mlx5.h
> > > > > +++ b/drivers/net/mlx5/mlx5.h
> > > > > @@ -214,6 +214,7 @@ struct mlx5_priv {
> > > > >  	uint16_t mtu; /* Configured MTU. */
> > > > >  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
> > > > >  	unsigned int representor:1; /* Device is a port representor.
> > > > > */
> > > > > +	unsigned int master:1; /* Device is a E-Switch master. */
> > > > >  	uint16_t domain_id; /* Switch domain identifier. */
> > > > >  	int32_t representor_id; /* Port representor identifier. */
> > > > >  	/* RX/TX queues. */
> > > > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > > > > b/drivers/net/mlx5/mlx5_ethdev.c index 84d761c..81f2a42 100644
> > > > > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > > > > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > > > > @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct
> > > > > rte_eth_dev *dev, char *fw_ver, size_t fw_size)
> > > > >  		.port_name = 0,
> > > > >  		.switch_id = 0,
> > > > >  	};
> > > > > +	DIR *dir;
> > > > >  	bool port_name_set = false;
> > > > >  	bool port_switch_id_set = false;
> > > > > +	bool device_dir = false;
> > > > >  	char c;
> > > > >
> > > > >  	if (!if_indextoname(ifindex, ifname)) { @@ -1375,6 +1377,8 @@
> > > > > int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver,
> > > > > size_t
> > > > > fw_size)
> > > > >  	      ifname);
> > > > >  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
> > > > >  	      ifname);
> > > > > +	MKSTR(pci_device, "/sys/class/net/%s/device",
> > > > > +	      ifname);
> > > > >
> > > > >  	file = fopen(phys_port_name, "rb");
> > > > >  	if (file != NULL) {
> > > > > @@ -1391,8 +1395,13 @@ int mlx5_fw_version_get(struct
> > > > > rte_eth_dev *dev, char *fw_ver, size_t fw_size)
> > > > >  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
> > > > >  		c == '\n';
> > > > >  	fclose(file);
> > > > > -	data.master = port_switch_id_set && !port_name_set;
> > > > > -	data.representor = port_switch_id_set && port_name_set;
> > > > > +	dir = opendir(pci_device);
> > > > > +	if (dir != NULL) {
> > > > > +		closedir(dir);
> > > > > +		device_dir = true;
> > > > > +	}
> > > > > +	data.master = port_switch_id_set && (!port_name_set ||
> > > > > device_dir);
> > > > > +	data.representor = port_switch_id_set && port_name_set
> &&
> > > > > !device_dir;
> > > >
> > > > Add assert that device cannot be both master and representor.
> > >
> > > Error checking would be but assert() is usually not a useful in drivers.
> > > It causes crash, and is often compiled out.
> >
> > PMD is a user mode driver, so standard assert() seems to be relevant.
> > But I agree, it would be good for portable code to have its own
> > definition of assert. Say, "rte_assert". It would allow us to
> > define/redefine the code behavior if assertion fails.
> >
> > As for me, I think asserts are EXTREMELY useful, it saves a lot of
> > time while debugging, and it is proved by my own practice of mlx5 PMD
> > debugging (beside other projects).  Assert inserted in right place
> > stops the quite wrong situation evolving and allows us to have a good
> > catch and find the root of the problem quickly.
> >
> > WBR,
> > Slava
> 
> You may misunderstand what I meant. Asserts are useful, but they need to
> be used only when real error handling is not possible. In general, it is better
> to log and return an error on invalid data than crash the whole application.

Yes, assert is not a replacement for error handling in any way. Assert is a "last chance"
debug tool, if assert condition is not met it means something is going in very
wrong way and often we have a very few steps before final application crash. 
Assert just allows us to be a step ahead of crash, nothing else. I totally agree
that we should handle errors whenever it is possible, but asserts are not regarding
this. Error handling and asserts are orthogonal things, IMO.

For example, in my patch, Shahaf's proposed asserts is just to ensure the
logic above works in correct way. Actually we can just review this logic to make
sure there is no invalid output combinations produced (and we've reviewed, of course).
But this logic can be changed occasionally for some reasons.  And proposed asserts
will help to locate this erroneous/not covered change instantly. Obviously it is not a case
we should tell user/application about error, it is a case of developer failure 😊. We just
use assert here as free of charge debug check, nothing else.
 
> Especially since DPDK now supports hot plug and it could be that device is
> added to a working application.
> 
> For me, the worry is that an application on Azure starts up using synthetic
> datapath successfully, and the hot plug of VF (accelerated networking) might
> expose a driver bug. In that case, there still is a fallback to ignore the mlx
> device.
There should no be asserts in release version. Assert is just a debug tool, should
we be worried too much about debug versions? By the way, "to expose driver bug" is
exactly our intention when we are running the debug version, isn't it?

With best regards,
Slava

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 05/14] net/mlx5: add multiport IB device support to probing
  2019-03-21 12:57         ` Slava Ovsiienko
@ 2019-03-24  9:00           ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-24  9:00 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 2:58 PM, Slava Ovsiienko:
> Subject: RE: [PATCH 05/14] net/mlx5: add multiport IB device support to
> probing
> 
> Sorry, missed some comments. Here is my extra answers.
> 

[...]

> > -----Original 
callback to sort device data.
> > > >   *
> > > > @@ -1380,7 +1381,9 @@ struct mlx5_dev_spawn_data {
> > > >  	       struct rte_pci_device *pci_dev)  {
> > > >  	struct ibv_device **ibv_list;
> > > > -	unsigned int n = 0;
> > > > +	unsigned int nd = 0;
> > > > +	unsigned int np = 0;
> > > > +	unsigned int ns = 0;
> > >
> > > This fields names are not informative. Find a better ones.
> >
> > Would the adding clarifying comments be enough ?

Yes it will be OK.

> >
> > nd - Number of (PCI) Devices   (nd != 1 means we have multiple devices
> with
> > the same BDF - old schema)
> > np - Number of (device) Ports (nd =1, np 1...n means we have new
> > multiport
> > device) ns - Number to Spawn  (deduced index - number of iterations)
> >
> > This names are used as indices, long names may make code less
> > readable, IMHO.
> >
> > >
> > > >  	struct mlx5_dev_config dev_config;
> > > >  	int ret;
> > > >
> > > > @@ -1392,8 +1395,14 @@ struct mlx5_dev_spawn_data {
> > > >  		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
> > > >  		return -rte_errno;
> > > >  	}
> > > > -
> > > > +	/*
> > > > +	 * First scan the list of all Infiniband devices to find
> > > > +	 * matching ones, gathering into the list.
> > > > +	 */
> > > >  	struct ibv_device *ibv_match[ret + 1];
> > > > +	int nl_route = -1;
> > > > +	int nl_rdma = -1;
> > > > +	unsigned int i;
> > > >
> > > >  	while (ret-- > 0) {
> > > >  		struct rte_pci_addr pci_addr;
> > > > @@ -1408,77 +1417,183 @@ struct mlx5_dev_spawn_data {
> > > >  			continue;
> > > >  		DRV_LOG(INFO, "PCI information matches for device
> \"%s\"",
> > > >  			ibv_list[ret]->name);
> > > > -		ibv_match[n++] = ibv_list[ret];
> > > > +		ibv_match[nd++] = ibv_list[ret];
> > > > +	}
> > > > +	ibv_match[nd] = NULL;
> > > > +	if (!nd) {
> > > > +		/* No device macthes, just complain and bail out. */
> > > > +		mlx5_glue->free_device_list(ibv_list);
> > > > +		DRV_LOG(WARNING,
> > > > +			"no Verbs device matches PCI device " PCI_PRI_FMT
> > > > ","
> > > > +			" are kernel drivers loaded?",
> > > > +			pci_dev->addr.domain, pci_dev->addr.bus,
> > > > +			pci_dev->addr.devid, pci_dev->addr.function);
> > > > +		rte_errno = ENOENT;
> > > > +		ret = -rte_errno;
> > > > +		return ret;
> > > > +	}
> > > > +	nl_route = mlx5_nl_init(NETLINK_ROUTE);
> > > > +	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
> > > > +	if (nd == 1) {
> > > > +		/*
> > > > +		 * Found single matching device may have multiple ports.
> > > > +		 * Each port may be representor, we have to check the port
> > > > +		 * number and check the representors existence.
> > > > +		 */
> > > > +		if (nl_rdma >= 0)
> > > > +			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]-
> > > > >name);
> > > > +		if (!np)
> > > > +			DRV_LOG(WARNING, "can not get IB device \"%s\""
> > > > +					 " ports number", ibv_match[0]-
> > > > >name);
> > >
> > > This warning is misleading. On old kernels it is expected to have
> > > multiple IB devices instead of a single one w/ multiple ports.
> > > The level should be changed for debug, and the syntax to express it
> > > is not an error.
> 
> On old kernels we should get np = 1. If np == 0 it means an error, even if
> there is old kernel. Zero np means that is something is going in wrong way
> and we should notify the user. We do not expect this behavior from old/new
> kernels, so this message should not be annoying.

OK.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 12/14] net/mlx5: update install/uninstall int handler routines
  2019-03-21 14:01       ` Slava Ovsiienko
@ 2019-03-24  9:07         ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-24  9:07 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 4:02 PM, Slava Ovsiienko:
> To: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 12/14] net/mlx5: update install/uninstall int handler
> routines
> >
> > Thursday, March 21, 2019 10:11 AM, Viacheslav Ovsiienko:
> > > Subject: [PATCH 12/14] net/mlx5: update install/uninstall int
> > > handler routines
> > >
> > > We are implementing the support for multport Infiniband device withj
> > > representors attached to these multiple ports. Asynchronous device
> > > event notifications (link status change, removal event, etc.) should
> > > be shared between ports. We are going to implement shared event
> > > handler and this patch introduces appropriate device structure
> > > changes and updated event handler install and uninstall routines.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

[...]

> > >
> > >  	assert(spawn);
> > >  	/* Search for IB context by device name. */ @@ -212,6 +213,9 @@
> > > struct mlx5_dev_spawn_data {
> > >  		sizeof(sh->ibdev_name));
> > >  	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
> > >  		sizeof(sh->ibdev_path));
> > > +	pthread_mutex_init(&sh->intr_mutex, NULL);
> > > +	for (i = 0; i < sh->max_port; i++)
> > > +		sh->port[i].port_id = RTE_MAX_ETHPORTS;
> >
> > Why you need struct here? You port array is not just of uint32_t type?
> 
> For the case if we would like to add some other per-port data accessible only
> from shared context. For example - in interrupt handler we have only one
> parameter - the shared context, and we should deduce eth_dev for the
> some device (not DPDK port_id) port
> 
> Actually it is uint_32_t array for now, but it is easily extandable, for example,
> we could add per-port context for interrupt handler.

OK, then you need to doc it as such ("per port context for interrupt"). 

> 
> >

[...]

> > > +	assert(priv->ibv_port <= sh->max_port);
> > > +	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
> > > +	if (sh->port[priv->ibv_port - 1].port_id < RTE_MAX_ETHPORTS) {
> >
> > I don't understand why need an array to understand handler is already
> > exists.
> > Why not the refcnt?
> 
> Array is needed to deduce the eth_dev from the device port number.
> Here is interrupt handler flow:
> - entry
> - for()
>  - get_event()
> - get device port (note, this is IB port index, not DPDK port id) from event
> - check in the array whether the handler is installed for this port
>   (array member is less than RTE_MAX_ETHPORTS)
> -  get DPDK port_id from array()
> 
> Array member just indicates whether the handler for  given IB port is
> installed. Reference counter is used for rte_intr_callback_register/
> rte_intr_callback_unregister calls.
> rte_intr_callback_register() is called when the first handler for the port is
> being installed.
> rte_intr_callback_unregister() is called when the lastt handler for the port is
> being gone away.

OK, it will be much clear to have all the handler patches in a single patch. 

> 
> >
> > > +		/* The handler is already installed for this port. */
> > > +		assert(sh->intr_cnt++);
> >
> > Asserts are compiled only in debug mode. You should not put any logic
> > (++) into them.
> 
> Yes, it is a bug, there should no be "++" at all. Thanks.
> 
> >
> > > +		goto exit;
> > > +	}
> > > +	sh->port[priv->ibv_port - 1].port_id = (uint32_t)dev->data->port_id;
> > > +	if (sh->intr_cnt) {
> > > +		sh->intr_cnt++;
> > > +		goto exit;
> > > +	}
> > > +	/* No shared handler installed. */
> > > +	assert(sh->ctx->async_fd > 0);
> > > +	flags = fcntl(sh->ctx->async_fd, F_GETFL);
> > > +	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> > > +	if (ret) {
> > > +		DRV_LOG(INFO, "failed to change file descriptor"
> > > +			      " async event queue");
> > > +		/* Indicate there will be no interrupts. */
> > > +		dev->data->dev_conf.intr_conf.lsc = 0;
> > > +		dev->data->dev_conf.intr_conf.rmv = 0;
> > > +		sh->port[priv->ibv_port - 1].port_id = RTE_MAX_ETHPORTS;
> > > +		goto exit;
> > > +	}
> > > +	sh->intr_handle.fd = sh->ctx->async_fd;
> > > +	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
> > > +	rte_intr_callback_register(&sh->intr_handle,
> > > +				   mlx5_dev_interrupt_handler, sh);
> > > +	sh->intr_cnt++;
> > > +exit:
> > > +	pthread_mutex_unlock(&sh->intr_mutex);
> > > +}
> > > +
> > > +/**
> > >   * Uninstall interrupt handler.
> > >   *
> > >   * @param dev
> > > @@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> > > *dev, char *fw_ver, size_t fw_size)  {
> > >  	struct mlx5_priv *priv = dev->data->dev_private;
> > >
> > > -	if (dev->data->dev_conf.intr_conf.lsc ||
> > > -	    dev->data->dev_conf.intr_conf.rmv)
> > > -		rte_intr_callback_unregister(&priv->intr_handle,
> > > -					     mlx5_dev_interrupt_handler,
> > > dev);
> > > +	mlx5_dev_shared_handler_uninstall(dev);
> > >  	if (priv->primary_socket)
> > >  		rte_intr_callback_unregister(&priv->intr_handle_socket,
> > >  					     mlx5_dev_handler_socket, dev);
> > > -	priv->intr_handle.fd = 0;
> > > -	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> > >  	priv->intr_handle_socket.fd = 0;
> > >  	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;  }
> > @@
> > > -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> > > char *fw_ver, size_t fw_size)
> > > mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)  {
> > >  	struct mlx5_priv *priv = dev->data->dev_private;
> > > -	struct ibv_context *ctx = priv->sh->ctx;
> > >  	int ret;
> > > -	int flags;
> > >
> > > -	assert(ctx->async_fd > 0);
> > > -	flags = fcntl(ctx->async_fd, F_GETFL);
> > > -	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
> > > -	if (ret) {
> > > -		DRV_LOG(INFO,
> > > -			"port %u failed to change file descriptor async event"
> > > -			" queue",
> > > -			dev->data->port_id);
> > > -		dev->data->dev_conf.intr_conf.lsc = 0;
> > > -		dev->data->dev_conf.intr_conf.rmv = 0;
> > > -	}
> > > -	if (dev->data->dev_conf.intr_conf.lsc ||
> > > -	    dev->data->dev_conf.intr_conf.rmv) {
> > > -		priv->intr_handle.fd = ctx->async_fd;
> > > -		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
> > > -		rte_intr_callback_register(&priv->intr_handle,
> > > -					   mlx5_dev_interrupt_handler, dev);
> > > -	}
> > > +	mlx5_dev_shared_handler_install(dev);
> > >  	ret = mlx5_socket_init(dev);
> > >  	if (ret)
> > >  		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
> > > --
> > > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
  2019-03-21 14:11       ` Slava Ovsiienko
@ 2019-03-24  9:13         ` Shahaf Shuler
  2019-03-25  7:44           ` Slava Ovsiienko
  0 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-24  9:13 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Thursday, March 21, 2019 4:12 PM, Slava Ovsiienko:
> Subject: RE: [PATCH 14/14] net/mlx5: add source vport match to the ingress
> rules
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

[...]

> > > +		flow_dv_translate_source_vport(matcher.mask.buf,
> > > +					       dev_flow->dv.value.buf,
> > > +					       priv->representor_id < 0 ?
> > > +					       priv->representor_id :
> > > +					       priv->representor_id + 1,
> >
> > The vport of representor_id 0 will be 1?
> > Who owns vport 0?
> 
> PF.
> There is the foillowing vport mapping (for single E-Switch per PF):
> 
> -1 - wire

Wire, i.e. the uplink representor. indeed it's index is defined by PRM to -1.

> 0 - PF (uplink + VF reps)

I don't understand this part. When you have uplink representor you don't have PF.
Moreover, I would expect the first representor created to have vport_num=0 (w/ name pf0vf0).
Isn't it the case? 

> 1 - VF0
> 2 - VF1
> ...
> n+1 - VFn
> 
> This code is subject to change - (SF, multi E-Switch per function, etc), this
> patch currently supports single E-Switch  per PF.
> 
> >
> > > +					       0xffff);
> > > +	}
> > >  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
> > >  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> > >  		void *match_mask = matcher.mask.buf;
> > > --
> > > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH 14/14] net/mlx5: add source vport match to the ingress rules
  2019-03-24  9:13         ` Shahaf Shuler
@ 2019-03-25  7:44           ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-25  7:44 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Sunday, March 24, 2019 11:14
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH 14/14] net/mlx5: add source vport match to the ingress
> rules
> 
> Thursday, March 21, 2019 4:12 PM, Slava Ovsiienko:
> > Subject: RE: [PATCH 14/14] net/mlx5: add source vport match to the
> > ingress rules
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> [...]
> 
> > > > +		flow_dv_translate_source_vport(matcher.mask.buf,
> > > > +					       dev_flow->dv.value.buf,
> > > > +					       priv->representor_id < 0 ?
> > > > +					       priv->representor_id :
> > > > +					       priv->representor_id + 1,
> > >
> > > The vport of representor_id 0 will be 1?
> > > Who owns vport 0?
> >
> > PF.
> > There is the foillowing vport mapping (for single E-Switch per PF):
> >
> > -1 - wire
> 
> Wire, i.e. the uplink representor. indeed it's index is defined by PRM to -1.
> 
> > 0 - PF (uplink + VF reps)
> 
> I don't understand this part. When you have uplink representor you don't
> have PF.
We do. There is PF anyway. In meaning we always have PCI function, which
Is some kind of "container" for representors and also serves as E-Switch manager.
It may contain only "uplink rep" if there is no VF enabled, or may contain the bunch 
of uplink and VF reps. This PF has vport zero assigned.

> Moreover, I would expect the first representor created to have vport_num=0
> (w/ name pf0vf0).
> Isn't it the case?
No. Representors do not have dedicated vports, they share the vport zero (PF) instead.
And VFs have dedicated vports.

> 
> > 1 - VF0
> > 2 - VF1
> > ...
> > n+1 - VFn
> >
> > This code is subject to change - (SF, multi E-Switch per function,
> > etc), this patch currently supports single E-Switch  per PF.
> >
> > >
> > > > +					       0xffff);
> > > > +	}
> > > >  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
> > > >  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
> > > >  		void *match_mask = matcher.mask.buf;
> > > > --
> > > > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v2 00/14] net/mlx5: add support for multiport IB devices
  2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                     ` (14 preceding siblings ...)
  2019-03-21 12:13   ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
@ 2019-03-25 17:03   ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
                       ` (13 more replies)
  15 siblings, 14 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature. 
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context.

mlx5_pci_probe() routine is planned to be extended to probe
the ports of found Infiniband devices. All active ports,
belonging to the same Infiniband device will use the signle
shared Infiniband context of that device:
  - QP are created within shared context, with assigning to
    specified port
  - Verb flows are also created with specifying port index
  - Protection Domain is shared by all representors
  - Global Memory Regions Cache is shared by all representors,
    local ones are not (not in this patchset)
  - Shared Asynchronous Event Handler distributes events
    among the device ports (not in this patchset)
  - UAR mapping to user space is also shared (not in patchset)
		  
Each representor has its own attach network interface device,
interface name and index can be retrieved via IB device and
port index, so the link control is performed via Netlink
in the same way as in existing implementation.

This patchset also superseedes http://patches.dpdk.org/patch/50486/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v2:
	- clarifying comments added
	- asserts added
	- secondary process does not probe IB device at all and
	  does not create shared context
	- minor bugs (like ++ in assert) fixed
	- representor_id contains corresponding VF vport index,
	  no corrections +1 is needed anymore. vport index is
	  calculated in the only place in code and if indexing
	  schema is changed it would be easy to modify
	  
v1:
	http://patches.dpdk.org/cover/51451/

Viacheslav Ovsiienko (13):
  net/mlx5: add representor recognition on kernels 5.x
  net/mlx5: modify get ifindex routine for multiport IB
  net/mlx5: add getting IB ports number for multiport IB
  net/mlx5: add multiport IB device support to probing
  net/mlx5: add IB shared context alloc/free functions
  net/mlx5: switch to the names in the shared IB context
  net/mlx5: switch to the shared Protection Domain
  net/mlx5: switch to the shared context IB attributes
  net/mlx5: switch to the shared IB device context
  net/mlx5: provide IB port for the object being created
  net/mlx5: update install/uninstall int handler routines
  net/mlx5: update event handler for multiport IB devices
  net/mlx5: add source vport match to the ingress rules

 drivers/net/mlx5/Makefile          |  10 +
 drivers/net/mlx5/meson.build       |   4 +
 drivers/net/mlx5/mlx5.c            | 644 ++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5.h            |  44 ++-
 drivers/net/mlx5/mlx5_ethdev.c     | 242 +++++++++-----
 drivers/net/mlx5/mlx5_flow.c       |   2 +
 drivers/net/mlx5/mlx5_flow_dv.c    |  45 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
 drivers/net/mlx5/mlx5_mr.c         |   4 +-
 drivers/net/mlx5/mlx5_nl.c         | 102 +++++-
 drivers/net/mlx5/mlx5_rxq.c        |  41 +--
 drivers/net/mlx5/mlx5_socket.c     |   4 +-
 drivers/net/mlx5/mlx5_stats.c      |  22 +-
 drivers/net/mlx5/mlx5_trigger.c    |   2 +-
 drivers/net/mlx5/mlx5_txq.c        |  24 +-
 15 files changed, 859 insertions(+), 338 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 18:06       ` Stephen Hemminger
                         ` (2 more replies)
  2019-03-25 17:03     ` [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
                       ` (12 subsequent siblings)
  13 siblings, 3 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on quiering the VF number,
created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence of device symlink in device sysfs folder is
added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/Makefile      | 10 ++++++++++
 drivers/net/mlx5/meson.build   |  4 ++++
 drivers/net/mlx5/mlx5.c        | 17 ++++++++++++++---
 drivers/net/mlx5/mlx5.h        |  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 14 ++++++++++++--
 drivers/net/mlx5/mlx5_nl.c     | 37 ++++++++++++++++++++++++++++++++++---
 6 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ed299d..3dd7e38 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IFLA_PHYS_SWITCH_ID \
 		linux/if_link.h \
 		enum IFLA_PHYS_SWITCH_ID \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 0cf2f08..e3cb9bc 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -133,6 +133,10 @@ if build
 		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
 		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
 		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
 		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
 		'IFLA_PHYS_SWITCH_ID' ],
 		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad1975c..032f1f2 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -13,7 +13,6 @@
 #include <errno.h>
 #include <net/if.h>
 #include <sys/mman.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 
 /* Verbs header. */
@@ -1001,9 +1000,21 @@
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
 	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
+	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
-	priv->representor_id =
-		switch_info->representor ? switch_info->port_name : -1;
+	/*
+	 * Currently we support sinlge E-Switch per PF configurations
+	 * only and representors_id field contains the vport index for
+	 * corresponding VF, deduced from representor port name.
+	 * For exapmple, let's have the IB device port 10, it has
+	 * attached network device eth0, which has port name attribute
+	 * pf0vf2, we can deduce the VF number as 2, and set vport index
+	 * as 3 (2+1). This assigning schema should be changed if the
+	 * multiple E-Switch instance per PF configurations or/and PCI
+	 * subfunctions dded.
+	 */
+	priv->representor_id = switch_info->representor ?
+			       switch_info->port_name + 1 : -1;
 	/*
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a88cb4a..58bc37f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -214,6 +214,7 @@ struct mlx5_priv {
 	uint16_t mtu; /* Configured MTU. */
 	unsigned int isolated:1; /* Whether isolated mode is enabled. */
 	unsigned int representor:1; /* Device is a port representor. */
+	unsigned int master:1; /* Device is a E-Switch master. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	int32_t representor_id; /* Port representor identifier. */
 	/* RX/TX queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2e8a296..64c6b6f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		.port_name = 0,
 		.switch_id = 0,
 	};
+	DIR *dir;
 	bool port_name_set = false;
 	bool port_switch_id_set = false;
+	bool device_dir = false;
 	char c;
 	int ret;
 
@@ -1376,6 +1378,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	      ifname);
 	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
 	      ifname);
+	MKSTR(pci_device, "/sys/class/net/%s/device",
+	      ifname);
 
 	file = fopen(phys_port_name, "rb");
 	if (file != NULL) {
@@ -1394,8 +1398,14 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
 		c == '\n';
 	fclose(file);
-	data.master = port_switch_id_set && !port_name_set;
-	data.representor = port_switch_id_set && port_name_set;
+	dir = opendir(pci_device);
+	if (dir != NULL) {
+		closedir(dir);
+		device_dir = true;
+	}
+	data.master = port_switch_id_set && (!port_name_set || device_dir);
+	data.representor = port_switch_id_set && port_name_set && !device_dir;
+	assert(!(data.master && data.representor));
 	*info = data;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 8a10109..9d0fcba 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -65,6 +65,12 @@
 #endif
 
 /* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
 #ifndef HAVE_IFLA_PHYS_SWITCH_ID
 #define IFLA_PHYS_SWITCH_ID 36
 #endif
@@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
 	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
 	bool port_name_set = false;
 	bool switch_id_set = false;
+	bool num_vf_set = false;
 
 	if (nh->nlmsg_type != RTM_NEWLINK)
 		goto error;
@@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
 		if (ra->rta_len > nh->nlmsg_len - off)
 			goto error;
 		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
 		case IFLA_PHYS_PORT_NAME:
 			port_name_set =
 				mlx5_translate_port_name((char *)payload,
@@ -864,8 +874,20 @@ struct mlx5_nl_ifindex_data {
 		}
 		off += RTA_ALIGN(ra->rta_len);
 	}
-	info.master = switch_id_set && !port_name_set;
-	info.representor = switch_id_set && port_name_set;
+	if (switch_id_set) {
+		if (info.port_name_new) {
+			/* New representors naming schema. */
+			if (port_name_set) {
+				info.master = (info.port_name == -1);
+				info.representor = (info.port_name != -1);
+			}
+		} else {
+			/* Legacy representors naming schema. */
+			info.master = (!port_name_set || num_vf_set);
+			info.representor = port_name_set && !num_vf_set;
+		}
+	}
+	assert(!(data.master && data.representor));
 	memcpy(arg, &info, sizeof(info));
 	return 0;
 error:
@@ -893,9 +915,13 @@ struct mlx5_nl_ifindex_data {
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
 	} req = {
 		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
 			.nlmsg_type = RTM_GETLINK,
 			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
 		},
@@ -903,6 +929,11 @@ struct mlx5_nl_ifindex_data {
 			.ifi_family = AF_UNSPEC,
 			.ifi_index = ifindex,
 		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(int32_t)),
+		},
+		.extmask = RTE_LE32(1),
 	};
 	int ret;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 11:47       ` Shahaf Shuler
  2019-03-25 17:03     ` [PATCH v2 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
                       ` (11 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 032f1f2..25bd3e5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1452,7 +1452,7 @@ struct mlx5_dev_spawn_data {
 			list[i].ifindex = 0;
 		else
 			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name);
+				(nl_rdma, list[i].ibv_dev->name, 1);
 		if (nl_route < 0 ||
 		    !list[i].ifindex ||
 		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 58bc37f..626e1c0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -444,7 +444,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_ifindex(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 64c6b6f..5e0c945 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -222,7 +222,7 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 9d0fcba..f223670 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
 	int mac_n; /**< Number of addresses in the array. */
 };
 
-/** Data structure used by mlx5_nl_ifindex_cb(). */
+/** Data structure used by mlx5_nl_cmdget_cb(). */
 struct mlx5_nl_ifindex_data {
 	const char *name; /**< IB device name (in). */
 	uint32_t ibindex; /**< IB device index (out). */
 	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number. */
 };
 
 /**
@@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
 {
 	struct mlx5_nl_ifindex_data *data = arg;
 	size_t off = NLMSG_HDRLEN;
 	uint32_t ibindex = 0;
 	uint32_t ifindex = 0;
+	uint32_t portnum = 0;
 	int found = 0;
 
 	if (nh->nlmsg_type !=
@@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
 		case RDMA_NLDEV_ATTR_NDEV_INDEX:
 			ifindex = *(uint32_t *)payload;
 			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			portnum = *(uint32_t *)payload;
+			break;
 		default:
 			break;
 		}
@@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
 	if (found) {
 		data->ibindex = ibindex;
 		data->ifindex = ifindex;
+		data->portnum = portnum;
 	}
 	return 0;
 error:
@@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
  *   Netlink socket of the RDMA kind (NETLINK_RDMA).
  * @param[in] name
  *   IB device name.
- *
+ * @param[in] pindex
+ *   IB device port index, starting from 1
  * @return
  *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
  *   is set.
  */
 unsigned int
-mlx5_nl_ifindex(int nl, const char *name)
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	static const uint32_t pindex = 1;
 	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
@@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ibindex)
@@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ifindex)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 03/13] net/mlx5: add getting IB ports number for multiport IB
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
                       ` (10 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on this ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h    |  1 +
 drivers/net/mlx5/mlx5_nl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 626e1c0..d6967a3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -444,6 +444,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index f223670..010a25e 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -826,6 +826,51 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	uint32_t seq = random();
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.ibindex = 0,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, seq);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!data.ibindex) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (2 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 12:02       ` Shahaf Shuler
  2019-03-25 17:03     ` [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
                       ` (9 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the signle shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 317 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 225 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 25bd3e5..4338254 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -130,6 +130,16 @@
 /** Driver-specific log messages type. */
 int mlx5_logtype;
 
+/** Data associated with devices to spawn. */
+struct mlx5_dev_spawn_data {
+	uint32_t ifindex; /**< Network interface index. */
+	uint32_t max_port; /**< IB device maximal port index. */
+	uint32_t ibv_port; /**< IB device physical port index. */
+	struct mlx5_switch_info info; /**< Switch information. */
+	struct ibv_device *ibv_dev; /**< Associated IB device. */
+	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
+};
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -716,12 +726,10 @@
  *
  * @param dpdk_dev
  *   Backing DPDK device.
- * @param ibv_dev
- *   Verbs device.
+ * @param spawn
+ *   Verbs device parameters (name, port, switch_info) to spawn.
  * @param config
  *   Device configuration parameters.
- * @param[in] switch_info
- *   Switch properties of Ethernet device.
  *
  * @return
  *   A valid Ethernet device object on success, NULL otherwise and rte_errno
@@ -732,10 +740,11 @@
  */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
-	       struct ibv_device *ibv_dev,
-	       struct mlx5_dev_config config,
-	       const struct mlx5_switch_info *switch_info)
+	       struct mlx5_dev_spawn_data *spawn,
+	       struct mlx5_dev_config config)
 {
+	const struct mlx5_switch_info *switch_info = &spawn->info;
+	struct ibv_device *ibv_dev = spawn->ibv_dev;
 	struct ibv_context *ctx = NULL;
 	struct ibv_device_attr_ex attr;
 	struct ibv_port_attr port_attr;
@@ -952,7 +961,7 @@
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, 1, &port_attr);
+	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -1327,14 +1336,6 @@
 	return NULL;
 }
 
-/** Data associated with devices to spawn. */
-struct mlx5_dev_spawn_data {
-	unsigned int ifindex; /**< Network interface index. */
-	struct mlx5_switch_info info; /**< Switch information. */
-	struct ibv_device *ibv_dev; /**< Associated IB device. */
-	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
-};
-
 /**
  * Comparison callback to sort device data.
  *
@@ -1391,7 +1392,24 @@ struct mlx5_dev_spawn_data {
 	       struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
-	unsigned int n = 0;
+	/*
+	 * Number of found IB Devices matching with requested PCI BDF.
+	 * nd != 1 means there are multiple IB devices over the same
+	 * PCI device and we have representors and master.
+	 */
+	unsigned int nd = 0;
+	/*
+	 * Number of found IB device Ports. nd = 1 and np = 1..n means
+	 * we have the single multiport IB device, and there may be
+	 * representors attached to some of found ports.
+	 */
+	unsigned int np = 0;
+	/*
+	 * Number of DPDK ethernet devices to Spawn - either over
+	 * multiple IB devices or multiple ports of single IB device.
+	 * Actually this is the number of iterations to spawn.
+	 */
+	unsigned int ns = 0;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -1403,8 +1421,14 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
 		return -rte_errno;
 	}
-
+	/*
+	 * First scan the list of all Infiniband devices to find
+	 * matching ones, gathering into the list.
+	 */
 	struct ibv_device *ibv_match[ret + 1];
+	int nl_route = -1;
+	int nl_rdma = -1;
+	unsigned int i;
 
 	while (ret-- > 0) {
 		struct rte_pci_addr pci_addr;
@@ -1419,77 +1443,183 @@ struct mlx5_dev_spawn_data {
 			continue;
 		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 			ibv_list[ret]->name);
-		ibv_match[n++] = ibv_list[ret];
+		ibv_match[nd++] = ibv_list[ret];
+	}
+	ibv_match[nd] = NULL;
+	if (!nd) {
+		/* No device macthes, just complain and bail out. */
+		mlx5_glue->free_device_list(ibv_list);
+		DRV_LOG(WARNING,
+			"no Verbs device matches PCI device " PCI_PRI_FMT ","
+			" are kernel drivers loaded?",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+		return ret;
+	}
+	nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
+	if (nd == 1) {
+		/*
+		 * Found single matching device may have multiple ports.
+		 * Each port may be representor, we have to check the port
+		 * number and check the representors existence.
+		 */
+		if (nl_rdma >= 0)
+			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name);
+		if (!np)
+			DRV_LOG(WARNING, "can not get IB device \"%s\""
+					 " ports number", ibv_match[0]->name);
 	}
-	ibv_match[n] = NULL;
-
-	struct mlx5_dev_spawn_data list[n];
-	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
-	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
-	unsigned int i;
-	unsigned int u;
-
 	/*
-	 * The existence of several matching entries (n > 1) means port
-	 * representors have been instantiated. No existing Verbs call nor
-	 * /sys entries can tell them apart, this can only be done through
-	 * Netlink calls assuming kernel drivers are recent enough to
-	 * support them.
-	 *
-	 * In the event of identification failure through Netlink, try again
-	 * through sysfs, then either:
-	 *
-	 * 1. No device matches (n == 0), complain and bail out.
-	 * 2. A single IB device matches (n == 1) and is not a representor,
-	 *    assume no switch support.
-	 * 3. Otherwise no safe assumptions can be made; complain louder and
-	 *    bail out.
+	 * Now we can determine the maximal
+	 * amount of devices to be spawned.
 	 */
-	for (i = 0; i != n; ++i) {
-		list[i].ibv_dev = ibv_match[i];
-		list[i].eth_dev = NULL;
-		if (nl_rdma < 0)
-			list[i].ifindex = 0;
-		else
-			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name, 1);
-		if (nl_route < 0 ||
-		    !list[i].ifindex ||
-		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
-					&list[i].info) ||
-		    ((!list[i].info.representor && !list[i].info.master) &&
-		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
-			list[i].ifindex = 0;
-			memset(&list[i].info, 0, sizeof(list[i].info));
-			continue;
+	struct mlx5_dev_spawn_data list[np ? np : nd];
+
+	if (np > 1) {
+		/*
+		 * Signle IB device with multiple ports found,
+		 * it may be E-Switch master device and representors.
+		 * We have to perform identification trough the ports.
+		 */
+		assert(nl_rdma >= 0);
+		assert(ns == 0);
+		assert(nd == 1);
+		for (i = 1; i <= np; ++i) {
+			list[ns].max_port = np;
+			list[ns].ibv_port = i;
+			list[ns].ibv_dev = ibv_match[0];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, i);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified port, it means there is no
+				 * representor on this port. It's OK,
+				 * there can be disabled ports, for example
+				 * if sriov_numvfs < sriov_totalvfs.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master))
+				ns++;
 		}
-	}
-	if (nl_rdma >= 0)
-		close(nl_rdma);
-	if (nl_route >= 0)
-		close(nl_route);
-	/* Count unidentified devices. */
-	for (u = 0, i = 0; i != n; ++i)
-		if (!list[i].info.master && !list[i].info.representor)
-			++u;
-	if (u) {
-		if (n == 1 && u == 1) {
-			/* Case #2. */
-			DRV_LOG(INFO, "no switch support detected");
-		} else {
-			/* Case #3. */
+		if (!ns) {
 			DRV_LOG(ERR,
-				"unable to tell which of the matching devices"
-				" is the master (lack of kernel support?)");
-			n = 0;
+				"unable to recognize master/representors"
+				" on the IB device with multiple ports");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
+	} else {
+		/*
+		 * The existence of several matching entries (nd > 1) means
+		 * port representors have been instantiated. No existing Verbs
+		 * call nor sysfs entries can tell them apart, this can only
+		 * be done through Netlink calls assuming kernel drivers are
+		 * recent enough to support them.
+		 *
+		 * In the event of identification failure through Netlink,
+		 * try again through sysfs, then:
+		 *
+		 * 1. A single IB device matches (nd == 1) with single
+		 *    port (np=0/1) and is not a representor, assume
+		 *    no switch support.
+		 *
+		 * 2. Otherwise no safe assumptions can be made;
+		 *    complain louder and bail out.
+		 */
+		np = 1;
+		for (i = 0; i != nd; ++i) {
+			memset(&list[ns].info, 0, sizeof(list[ns].info));
+			list[ns].max_port = 1;
+			list[ns].ibv_port = 1;
+			list[ns].ibv_dev = ibv_match[i];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = 0;
+			if (nl_rdma >= 0)
+				list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, 1);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified device, it means there it is not
+				 * a representor/master.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master)) {
+				ns++;
+			} else if ((nd == 1) &&
+				   !list[ns].info.representor &&
+				   !list[ns].info.master) {
+				/*
+				 * Single IB device with
+				 * one physical port and
+				 * attached network device.
+				 * May be SRIOV is not enabled
+				 * or there is no representors.
+				 */
+				DRV_LOG(INFO, "no E-Switch support detected");
+				ns++;
+				break;
+			}
+		}
+		if (!ns) {
+			DRV_LOG(ERR,
+				"unable to recognize master/representors"
+				" on the multiple IB devices");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
 		}
 	}
+	assert(ns);
 	/*
 	 * Sort list to probe devices in natural order for users convenience
 	 * (i.e. master first, then representors from lowest to highest ID).
 	 */
-	if (n)
-		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
+	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
 	/* Default configuration. */
 	dev_config = (struct mlx5_dev_config){
 		.hw_padding = 0,
@@ -1508,7 +1638,7 @@ struct mlx5_dev_spawn_data {
 			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
 		},
 	};
-	/* Device speicific configuration. */
+	/* Device specific configuration. */
 	switch (pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS_BLUEFIELD;
@@ -1525,12 +1655,12 @@ struct mlx5_dev_spawn_data {
 	/* Set architecture-dependent default value if unset. */
 	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
-	for (i = 0; i != n; ++i) {
+	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
-						 list[i].ibv_dev, dev_config,
-						 &list[i].info);
+						 &list[i],
+						 dev_config);
 		if (!list[i].eth_dev) {
 			if (rte_errno != EBUSY && rte_errno != EEXIST)
 				break;
@@ -1543,16 +1673,7 @@ struct mlx5_dev_spawn_data {
 		list[i].eth_dev->data->dev_flags |= restore;
 		rte_eth_dev_probing_finish(list[i].eth_dev);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!n) {
-		DRV_LOG(WARNING,
-			"no Verbs device matches PCI device " PCI_PRI_FMT ","
-			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		ret = -rte_errno;
-	} else if (i != n) {
+	if (i != ns) {
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
@@ -1574,6 +1695,18 @@ struct mlx5_dev_spawn_data {
 	} else {
 		ret = 0;
 	}
+exit:
+	/*
+	 * Do the routine cleanup:
+	 * - close opened Netlink sockets
+	 * - free the Infiniband device list
+	 */
+	if (nl_rdma >= 0)
+		close(nl_rdma);
+	if (nl_route >= 0)
+		close(nl_route);
+	assert(ibv_list);
+	mlx5_glue->free_device_list(ibv_list);
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (3 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 12:10       ` Shahaf Shuler
  2019-03-25 17:03     ` [PATCH v2 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
                       ` (8 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Alsa the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 298 +++++++++++++++++++++++++++++++++---------------
 drivers/net/mlx5/mlx5.h |  32 ++++++
 2 files changed, 235 insertions(+), 95 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4338254..4c9621a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,141 @@ struct mlx5_dev_spawn_data {
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
 };
 
+static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
+
+/**
+ * Allocate shared IB device context. If there is multiport device the
+ * master and representors will share this context, if there is single
+ * port dedicated IB device, the context will be used by only given
+ * port due to unification.
+ *
+ * Routine first searches the context for the spesified IB device name,
+ * if found the shared context assumed and reference counter is incremented.
+ * If no context found the new one is created and initialized with specified
+ * IB device context and parameters.
+ *
+ * @param[in] spawn
+ *   Pointer to the IB device attributes (name, port, etc).
+ *
+ * @return
+ *   Pointer to mlx5_ibv_shared object on success,
+ *   otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_ibv_shared *
+mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn)
+{
+	struct mlx5_ibv_shared *sh;
+	int err = 0;
+
+	assert(spawn);
+	/* Secondary process should not create the shared context. */
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	/* Search for IB context by device name. */
+	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
+		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
+			sh->refcnt++;
+			return sh;
+		}
+	}
+	/* No device found, we have to create new sharted context. */
+	assert(spawn->max_port);
+	sh = rte_zmalloc("ethdev shared ib context",
+			 sizeof(struct mlx5_ibv_shared) +
+			 spawn->max_port *
+			 sizeof(struct mlx5_ibv_shared_port),
+			 RTE_CACHE_LINE_SIZE);
+	if (!sh) {
+		DRV_LOG(ERR, "shared context allocation failure");
+		rte_errno  = ENOMEM;
+		return NULL;
+	}
+	/* Try to open IB device with DV first, then usual Verbs. */
+	errno = 0;
+	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
+	if (sh->ctx) {
+		sh->devx = 1;
+		DRV_LOG(DEBUG, "DevX is supported");
+	} else {
+		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
+		if (!sh->ctx) {
+			err = errno ? errno : ENODEV;
+			goto error;
+		}
+		DRV_LOG(DEBUG, "DevX is NOT supported");
+	}
+	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh->device_attr);
+	if (err) {
+		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
+		goto error;
+	}
+	sh->refcnt = 1;
+	sh->max_port = spawn->max_port;
+	strncpy(sh->ibdev_name, sh->ctx->device->name,
+		sizeof(sh->ibdev_name));
+	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
+		sizeof(sh->ibdev_path));
+	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
+	if (sh->pd == NULL) {
+		DRV_LOG(ERR, "PD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
+	return sh;
+error:
+	assert(sh);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+	assert(err > 0);
+	rte_errno = err;
+	return NULL;
+}
+
+/**
+ * Free shared IB device context. Decrement counter and if zero free
+ * all allocated resources and close handles.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free
+ */
+static void
+mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh)
+{
+#ifndef NDEBUG
+	/* Check the object presence in the list. */
+	struct mlx5_ibv_shared *lctx;
+
+	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
+		if (lctx == sh)
+			break;
+	assert(lctx);
+	if (lctx != sh) {
+		DRV_LOG(ERR, "Freeing non-existing shared IB context");
+		return;
+	}
+#endif
+	assert(sh);
+	assert(sh->refcnt);
+	if (--sh->refcnt)
+		return;
+	/* Zero reference counter, we should release resources. */
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		assert(sh->secondary);
+		assert(sh->ctx);
+		assert(!sh->pd);
+	}
+	LIST_REMOVE(sh, next);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+}
+
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -289,12 +424,10 @@ struct mlx5_dev_spawn_data {
 	}
 	mlx5_mprq_free_mp(dev);
 	mlx5_mr_release(dev);
-	if (priv->pd != NULL) {
-		assert(priv->ctx != NULL);
-		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
-		claim_zero(mlx5_glue->close_device(priv->ctx));
-	} else
-		assert(priv->ctx == NULL);
+	assert(priv->sh);
+	if (priv->sh)
+		mlx5_free_shared_ibctx(priv->sh);
+	priv->sh = NULL;
 	if (priv->rss_conf.rss_key != NULL)
 		rte_free(priv->rss_conf.rss_key);
 	if (priv->reta_idx != NULL)
@@ -744,11 +877,8 @@ struct mlx5_dev_spawn_data {
 	       struct mlx5_dev_config config)
 {
 	const struct mlx5_switch_info *switch_info = &spawn->info;
-	struct ibv_device *ibv_dev = spawn->ibv_dev;
-	struct ibv_context *ctx = NULL;
-	struct ibv_device_attr_ex attr;
+	struct mlx5_ibv_shared *sh;
 	struct ibv_port_attr port_attr;
-	struct ibv_pd *pd = NULL;
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
@@ -807,18 +937,47 @@ struct mlx5_dev_spawn_data {
 	}
 	/* Prepare shared data between primary and secondary process. */
 	mlx5_prepare_shared_data();
-	errno = 0;
-	ctx = mlx5_glue->dv_open_device(ibv_dev);
-	if (ctx) {
-		config.devx = 1;
-		DRV_LOG(DEBUG, "DEVX is supported");
-	} else {
-		ctx = mlx5_glue->open_device(ibv_dev);
-		if (!ctx) {
-			rte_errno = errno ? errno : ENODEV;
-			return NULL;
+	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			DRV_LOG(ERR, "can not attach rte ethdev");
+			rte_errno = ENOMEM;
+			err = rte_errno;
+			goto error;
+		}
+		eth_dev->device = dpdk_dev;
+		eth_dev->dev_ops = &mlx5_dev_sec_ops;
+		err = mlx5_uar_init_secondary(eth_dev);
+		if (err) {
+			err = rte_errno;
+			goto error;
 		}
+		/* Receive command fd from primary process */
+		err = mlx5_socket_connect(eth_dev);
+		if (err < 0) {
+			err = rte_errno;
+			goto error;
+		}
+		/* Remap UAR for Tx queues. */
+		err = mlx5_tx_uar_remap(eth_dev, err);
+		if (err) {
+			err = rte_errno;
+			goto error;
+		}
+		/*
+		 * Ethdev pointer is still required as input since
+		 * the primary device is not accessible from the
+		 * secondary process.
+		 */
+		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
+		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
+		return eth_dev;
 	}
+	sh = mlx5_alloc_shared_ibctx(spawn);
+	if (!sh)
+		return NULL;
+	config.devx = sh->devx;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -832,7 +991,7 @@ struct mlx5_dev_spawn_data {
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-	mlx5_glue->dv_query_device(ctx, &dv_attr);
+	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
 	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
 		if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
 			DRV_LOG(DEBUG, "enhanced MPW is supported");
@@ -917,51 +1076,8 @@ struct mlx5_dev_spawn_data {
 		" old OFED/rdma-core version or firmware configuration");
 #endif
 	config.mpls_en = mpls_en;
-	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
-	if (err) {
-		DEBUG("ibv_query_device_ex() failed");
-		goto error;
-	}
-	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		eth_dev = rte_eth_dev_attach_secondary(name);
-		if (eth_dev == NULL) {
-			DRV_LOG(ERR, "can not attach rte ethdev");
-			rte_errno = ENOMEM;
-			err = rte_errno;
-			goto error;
-		}
-		eth_dev->device = dpdk_dev;
-		eth_dev->dev_ops = &mlx5_dev_sec_ops;
-		err = mlx5_uar_init_secondary(eth_dev);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Receive command fd from primary process */
-		err = mlx5_socket_connect(eth_dev);
-		if (err < 0) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Remap UAR for Tx queues. */
-		err = mlx5_tx_uar_remap(eth_dev, err);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/*
-		 * Ethdev pointer is still required as input since
-		 * the primary device is not accessible from the
-		 * secondary process.
-		 */
-		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
-		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
-		claim_zero(mlx5_glue->close_device(ctx));
-		return eth_dev;
-	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
+	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -975,13 +1091,7 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
 			mlx5_glue->port_state_str(port_attr.state),
 			port_attr.state);
-	/* Allocate protection domain. */
-	pd = mlx5_glue->alloc_pd(ctx);
-	if (pd == NULL) {
-		DRV_LOG(ERR, "PD allocation failure");
-		err = ENOMEM;
-		goto error;
-	}
+	/* Allocate private eth device data. */
 	priv = rte_zmalloc("ethdev private structure",
 			   sizeof(*priv),
 			   RTE_CACHE_LINE_SIZE);
@@ -990,13 +1100,11 @@ struct mlx5_dev_spawn_data {
 		err = ENOMEM;
 		goto error;
 	}
-	priv->ctx = ctx;
-	strncpy(priv->ibdev_name, priv->ctx->device->name,
-		sizeof(priv->ibdev_name));
-	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
-		sizeof(priv->ibdev_path));
-	priv->device_attr = attr;
-	priv->pd = pd;
+	priv->sh = sh;
+	priv->ctx = sh->ctx;
+	priv->ibv_port = spawn->ibv_port;
+	priv->device_attr = sh->device_attr;
+	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
@@ -1062,7 +1170,8 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
-	config.hw_csum = !!(attr.device_cap_flags_ex & IBV_DEVICE_RAW_IP_CSUM);
+	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
+			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
 		(config.hw_csum ? "" : "not "));
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
@@ -1076,7 +1185,7 @@ struct mlx5_dev_spawn_data {
 	}
 #endif
 	config.ind_table_max_size =
-		attr.rss_caps.max_rwq_indirection_table_size;
+		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
 	/*
 	 * Remove this check once DPDK supports larger/variable
 	 * indirection tables.
@@ -1085,18 +1194,18 @@ struct mlx5_dev_spawn_data {
 		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
 	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
 		config.ind_table_max_size);
-	config.hw_vlan_strip = !!(attr.raw_packet_caps &
+	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
 				  IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
 	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
 		(config.hw_vlan_strip ? "" : "not "));
-	config.hw_fcs_strip = !!(attr.raw_packet_caps &
+	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
 				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config.hw_fcs_strip ? "" : "not "));
 #if defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
-	hw_padding = !!attr.rx_pad_end_addr_align;
+	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
 #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
-	hw_padding = !!(attr.device_cap_flags_ex &
+	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
 			IBV_DEVICE_PCI_WRITE_END_PADDING);
 #endif
 	if (config.hw_padding && !hw_padding) {
@@ -1105,11 +1214,11 @@ struct mlx5_dev_spawn_data {
 	} else if (config.hw_padding) {
 		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
 	}
-	config.tso = (attr.tso_caps.max_tso > 0 &&
-		      (attr.tso_caps.supported_qpts &
+	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
+		      (sh->device_attr.tso_caps.supported_qpts &
 		       (1 << IBV_QPT_RAW_PACKET)));
 	if (config.tso)
-		config.tso_max_payload_sz = attr.tso_caps.max_tso;
+		config.tso_max_payload_sz = sh->device_attr.tso_caps.max_tso;
 	/*
 	 * MPW is disabled by default, while the Enhanced MPW is enabled
 	 * by default.
@@ -1243,7 +1352,7 @@ struct mlx5_dev_spawn_data {
 		}
 	}
 #ifdef HAVE_MLX5DV_DR
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_INGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
@@ -1251,7 +1360,7 @@ struct mlx5_dev_spawn_data {
 			goto error;
 		}
 		priv->rx_ns = ns;
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_EGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
@@ -1268,7 +1377,8 @@ struct mlx5_dev_spawn_data {
 		.free = &mlx5_free_verbs_buf,
 		.data = priv,
 	};
-	mlx5_glue->dv_set_context_attr(ctx, MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
+	mlx5_glue->dv_set_context_attr(sh->ctx,
+				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
 				       (void *)((uintptr_t)&alctr));
 	/* Bring Ethernet device up. */
 	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
@@ -1322,15 +1432,13 @@ struct mlx5_dev_spawn_data {
 		if (eth_dev != NULL)
 			eth_dev->data->dev_private = NULL;
 	}
-	if (pd)
-		claim_zero(mlx5_glue->dealloc_pd(pd));
 	if (eth_dev != NULL) {
 		/* mac_addrs must not be freed alone because part of dev_private */
 		eth_dev->data->mac_addrs = NULL;
 		rte_eth_dev_release_port(eth_dev);
 	}
-	if (ctx)
-		claim_zero(mlx5_glue->close_device(ctx));
+	if (sh)
+		mlx5_free_shared_ibctx(sh);
 	assert(err > 0);
 	rte_errno = err;
 	return NULL;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d6967a3..4568de7 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -196,10 +196,42 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_MAX_TABLES 1024
 #define MLX5_GROUP_FACTOR 1
 
+/* Per port data of shared IB device. */
+struct mlx5_ibv_shared_port {
+	uint32_t ih_port_id;
+	/*
+	 * Interrupt handler port_id. Used by shared interrupt
+	 * handler to find the corresponding rte_eth device
+	 * by IB port index. If value is equal or greater
+	 * RTE_MAX_ETHPORTS it means there is no subhandler
+	 * installed for specified IB port index.
+	 */
+};
+
+/*
+ * Shared Infiniband device context for Master/Representors
+ * which belong to same IB device with multiple IB ports.
+ **/
+struct mlx5_ibv_shared {
+	LIST_ENTRY(mlx5_ibv_shared) next;
+	uint32_t refcnt;
+	uint32_t devx:1; /* Opened with DV. */
+	uint32_t max_port; /* Maximal IB device port index. */
+	struct ibv_context *ctx; /* Verbs/DV context. */
+	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
+	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
+	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
+};
+
 struct mlx5_priv {
 	LIST_ENTRY(mlx5_priv) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
+	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
+	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 06/13] net/mlx5: switch to the names in the shared IB context
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (4 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
                       ` (7 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
 drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4568de7..859b752 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -235,8 +235,6 @@ struct mlx5_priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
-	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 5e0c945..e89b705 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -138,8 +138,10 @@ struct ethtool_link_settings {
 	unsigned int dev_port_prev = ~0u;
 	char match[IF_NAMESIZE] = "";
 
+	assert(priv);
+	assert(priv->sh);
 	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
+		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
 
 		dir = opendir(path);
 		if (dir == NULL) {
@@ -159,7 +161,7 @@ struct ethtool_link_settings {
 			continue;
 
 		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
+		      priv->sh->ibdev_path, name,
 		      (dev_type ? "dev_id" : "dev_port"));
 
 		file = fopen(path, "rb");
@@ -222,7 +224,9 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma,
+				priv->sh->ibdev_name,
+				priv->ibv_port) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 6906dc8..5af199d 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -140,18 +140,22 @@
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
 {
 	FILE *file;
-	MKSTR(path, "%s/ports/1/hw_counters/%s",
-		  priv->ibdev_path,
-		  ctr_name);
+	if (priv->sh) {
+		MKSTR(path, "%s/ports/%d/hw_counters/%s",
+			  priv->sh->ibdev_path,
+			  priv->ibv_port,
+			  ctr_name);
 
-	file = fopen(path, "rb");
-	if (file) {
-		int n = fscanf(file, "%" SCNu64, stat);
+		file = fopen(path, "rb");
+		if (file) {
+			int n = fscanf(file, "%" SCNu64, stat);
 
-		fclose(file);
-		if (n != 1)
-			stat = 0;
+			fclose(file);
+			if (n == 1)
+				return;
+		}
 	}
+	*stat = 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 07/13] net/mlx5: switch to the shared Protection Domain
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (5 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
                       ` (6 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The PMD code is updated to use Protected Domain from the
the shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     |  1 -
 drivers/net/mlx5/mlx5.h     |  1 -
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +++++-----
 drivers/net/mlx5/mlx5_txq.c |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4c9621a..cc6657a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1104,7 +1104,6 @@ struct mlx5_dev_spawn_data {
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->device_attr = sh->device_attr;
-	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 859b752..4e5f09a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -234,7 +234,6 @@ struct mlx5_priv {
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d..f7eb9a5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -719,7 +719,7 @@ struct mr_update_mp_data {
 	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
 	 * through mlx5_alloc_verbs_buf().
 	 */
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)data.start, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DEBUG("port %u fail to create a verbs MR for address (%p)",
@@ -1156,7 +1156,7 @@ struct mr_update_mp_data {
 	}
 	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
 		dev->data->port_id, mem_idx, mp->name);
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)addr, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DRV_LOG(WARNING,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2f60999..0496c4e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -867,7 +867,7 @@ struct mlx5_rxq_ibv *
 		.max_wr = wqe_n >> rxq_data->sges_n,
 		/* Max number of scatter/gather elements in a WR. */
 		.max_sge = 1 << rxq_data->sges_n,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.cq = tmpl->cq,
 		.comp_mask =
 			IBV_WQ_FLAGS_CVLAN_STRIPPING |
@@ -1831,7 +1831,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 },
 		 &qp_init_attr);
 #else
@@ -1850,7 +1850,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 });
 #endif
 	if (!qp) {
@@ -2006,7 +2006,7 @@ struct mlx5_rxq_ibv *
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
 			.max_sge = 1,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 			.cq = cq,
 		 });
 	if (!wq) {
@@ -2160,7 +2160,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = 0,
 				},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd
+			.pd = priv->sh->pd
 		 });
 	if (!qp) {
 		DEBUG("port %u cannot allocate QP for drop queue",
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d185617..d3a5498 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -426,7 +426,7 @@ struct mlx5_txq_ibv *
 		 * Tx burst.
 		 */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (txq_data->max_inline)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 08/13] net/mlx5: switch to the shared context IB attributes
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (6 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
                       ` (5 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multuple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 -
 drivers/net/mlx5/mlx5.h        |  1 -
 drivers/net/mlx5/mlx5_ethdev.c |  6 +++---
 drivers/net/mlx5/mlx5_rxq.c    |  8 ++++----
 drivers/net/mlx5/mlx5_txq.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cc6657a..93e4eac 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1103,7 +1103,6 @@ struct mlx5_dev_spawn_data {
 	priv->sh = sh;
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
-	priv->device_attr = sh->device_attr;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4e5f09a..4227010 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -233,7 +233,6 @@ struct mlx5_priv {
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index e89b705..46d90cd 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -515,8 +515,8 @@ struct ethtool_link_settings {
 	 * Since we need one CQ per QP, the limit is the minimum number
 	 * between the two values.
 	 */
-	max = RTE_MIN(priv->device_attr.orig_attr.max_cq,
-		      priv->device_attr.orig_attr.max_qp);
+	max = RTE_MIN(priv->sh->device_attr.orig_attr.max_cq,
+		      priv->sh->device_attr.orig_attr.max_qp);
 	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
 	if (max >= 65535)
 		max = 65535;
@@ -577,7 +577,7 @@ struct ethtool_link_settings {
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_device_attr *attr = &priv->device_attr.orig_attr;
+	struct ibv_device_attr *attr = &priv->sh->device_attr.orig_attr;
 	size_t size = strnlen(attr->fw_ver, sizeof(attr->fw_ver)) + 1;
 
 	if (fw_size < size)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0496c4e..fd1c3a2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,10 +856,10 @@ struct mlx5_rxq_ibv *
 		rte_errno = ENOMEM;
 		goto error;
 	}
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	attr.wq.ibv = (struct ibv_wq_init_attr){
 		.wq_context = NULL, /* Could be useful in the future. */
 		.wq_type = IBV_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d3a5498..10a3040 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -407,15 +407,15 @@ struct mlx5_txq_ibv *
 		.cap = {
 			/* Max number of outstanding WRs. */
 			.max_send_wr =
-				((priv->device_attr.orig_attr.max_qp_wr <
+				((priv->sh->device_attr.orig_attr.max_qp_wr <
 				  desc) ?
-				 priv->device_attr.orig_attr.max_qp_wr :
+				 priv->sh->device_attr.orig_attr.max_qp_wr :
 				 desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
 			 * too much memory. TX gather is not impacted by the
-			 * priv->device_attr.max_sge limit and will still work
+			 * device_attr.max_sge limit and will still work
 			 * properly.
 			 */
 			.max_send_sge = 1,
@@ -780,10 +780,10 @@ struct mlx5_txq_ctrl *
 	tmpl->txq.elts_n = log2above(desc);
 	tmpl->idx = idx;
 	txq_set_params(tmpl);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	tmpl->txq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->txq.elts_n])(tmpl + 1);
 	tmpl->txq.stats.idx = idx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 09/13] net/mlx5: switch to the shared IB device context
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (7 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
                       ` (4 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created whithin this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  3 +--
 drivers/net/mlx5/mlx5.h            |  1 -
 drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
 drivers/net/mlx5/mlx5_flow_dv.c    |  9 +++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
 drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
 drivers/net/mlx5/mlx5_socket.c     |  4 ++--
 drivers/net/mlx5/mlx5_trigger.c    |  2 +-
 drivers/net/mlx5/mlx5_txq.c        |  4 ++--
 9 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 93e4eac..27bd266 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -398,7 +398,7 @@ struct mlx5_dev_spawn_data {
 
 	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
 		dev->data->port_id,
-		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
 	mlx5_dev_interrupt_handler_uninstall(dev);
 	mlx5_traffic_disable(dev);
@@ -1101,7 +1101,6 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 	priv->sh = sh;
-	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4227010..597e3af 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -232,7 +232,6 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
-	struct ibv_context *ctx; /* Verbs context. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 46d90cd..a44a383 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	/* Read all message and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->ctx, &event))
+		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 			event.event_type == IBV_EVENT_PORT_ERR) &&
@@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
 	int flags;
 
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	assert(ctx->async_fd > 0);
+	flags = fcntl(ctx->async_fd, F_GETFL);
+	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
 		DRV_LOG(INFO,
 			"port %u failed to change file descriptor async event"
@@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	if (dev->data->dev_conf.intr_conf.lsc ||
 	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
+		priv->intr_handle.fd = ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rte_intr_callback_register(&priv->intr_handle,
 					   mlx5_dev_interrupt_handler, dev);
@@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	struct ibv_device_attr device_attr;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
+	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
 		return 1;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 44b36ee..b8943da 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -840,7 +840,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_packet_reformat
-			(priv->ctx, cache_resource->reformat_type,
+			(priv->sh->ctx, cache_resource->reformat_type,
 			 cache_resource->ft_type, ns, cache_resource->flags,
 			 cache_resource->size,
 			 (cache_resource->size ? cache_resource->buf : NULL));
@@ -1569,7 +1569,8 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_modify_header
-					(priv->ctx, cache_resource->ft_type,
+					(priv->sh->ctx,
+					 cache_resource->ft_type,
 					 ns, 0,
 					 cache_resource->actions_num *
 					 sizeof(cache_resource->actions[0]),
@@ -1629,7 +1630,7 @@ struct field_modify_info modify_tcp[] = {
 		ret = -ENOMEM;
 		goto error_exit;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
+	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
 	if (ret)
 		goto error_exit;
 	struct mlx5_flow_counter tmpl = {
@@ -2982,7 +2983,7 @@ struct field_modify_info modify_tcp[] = {
 	if (matcher->egress)
 		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
 	cache_matcher->matcher_object =
-		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr,
+		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx, &dv_attr,
 						  tbl->obj);
 	if (!cache_matcher->matcher_object) {
 		rte_free(cache_matcher);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 6c4f52f..9548d12 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -56,10 +56,11 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context ctx = priv->sh->ctx;
 	struct ibv_counter_set_init_attr init = {
 			 .counter_set_id = counter->id};
 
-	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
@@ -67,12 +68,13 @@
 	return 0;
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_counters_init_attr init = {0};
 	struct ibv_counter_attach_attr attach;
 	int ret;
 
 	memset(&attach, 0, sizeof(attach));
-	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counters(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd1c3a2..dcb97c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
 	}
 	tmpl->rxq_ctrl = rxq_ctrl;
 	if (rxq_ctrl->irq) {
-		tmpl->channel = mlx5_glue->create_comp_channel(priv->ctx);
+		tmpl->channel = mlx5_glue->create_comp_channel(priv->sh->ctx);
 		if (!tmpl->channel) {
 			DRV_LOG(ERR, "port %u: comp channel creation failure",
 				dev->data->port_id);
@@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
 	}
 #endif
 	tmpl->cq = mlx5_glue->cq_ex_to_cq
-		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
+		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
 					 &attr.cq.mlx5));
 	if (tmpl->cq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
@@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
 			.two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT,
 		};
 	}
-	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
+	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
 					   &attr.wq.mlx5);
 #else
-	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
+	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
 #endif
 	if (tmpl->wq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
@@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
 	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
 		wq[i] = wq[j];
 	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = wq_n,
 			.ind_tbl = wq,
@@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
 	}
 #endif
 	qp = mlx5_glue->dv_create_qp
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
 		 &qp_init_attr);
 #else
 	qp = mlx5_glue->create_qp_ex
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
 mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_cq *cq;
 	struct ibv_wq *wq = NULL;
 	struct mlx5_rxq_ibv *rxq;
 
 	if (priv->drop_queue.rxq)
 		return priv->drop_queue.rxq;
-	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
+	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		DEBUG("port %u cannot allocate CQ for drop queue",
 		      dev->data->port_id);
 		rte_errno = errno;
 		goto error;
 	}
-	wq = mlx5_glue->create_wq(priv->ctx,
+	wq = mlx5_glue->create_wq(ctx,
 		 &(struct ibv_wq_init_attr){
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
@@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
 	if (!rxq)
 		return NULL;
 	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = 0,
 			.ind_tbl = &rxq->wq,
@@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
 	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
 	if (!ind_tbl)
 		return NULL;
-	qp = mlx5_glue->create_qp_ex(priv->ctx,
+	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
diff --git a/drivers/net/mlx5/mlx5_socket.c b/drivers/net/mlx5/mlx5_socket.c
index 41cac3c..8fa6430 100644
--- a/drivers/net/mlx5/mlx5_socket.c
+++ b/drivers/net/mlx5/mlx5_socket.c
@@ -185,9 +185,9 @@
 	assert(cmsg != NULL);
 	cmsg->cmsg_level = SOL_SOCKET;
 	cmsg->cmsg_type = SCM_RIGHTS;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
+	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
 	fd = (int *)CMSG_DATA(cmsg);
-	*fd = priv->ctx->cmd_fd;
+	*fd = priv->sh->ctx->cmd_fd;
 	ret = sendmsg(conn_sock, &msg, 0);
 	if (ret < 0)
 		DRV_LOG(WARNING, "port %u cannot send response",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2137bdc..d13a1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -58,7 +58,7 @@
 			goto error;
 		}
 	}
-	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
+	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
 	if (ret) {
 		/* Adjust index for rollback. */
 		i = priv->txqs_n - 1;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 10a3040..25305b6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (is_empw_burst_func(tx_pkt_burst))
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
+	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
 			dev->data->port_id, idx);
@@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
 		attr.init.max_tso_header = txq_ctrl->max_tso_header;
 		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 	}
-	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
+	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
 			dev->data->port_id, idx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 10/13] net/mlx5: provide IB port for the object being created
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (8 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-25 17:03     ` [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
                       ` (3 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to provide IB port index for the Verbs
objects being created - QP and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c       | 2 ++
 drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
 drivers/net/mlx5/mlx5_txq.c        | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 010d61a..40d6818 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {
 int
 mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
 		struct ibv_flow_attr attr;
 		struct ibv_flow_spec_eth eth;
@@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
 	} flow_attr = {
 		.attr = {
 			.num_of_specs = 2,
+			.port = (uint8_t)priv->ibv_port,
 		},
 		.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 9548d12..b69af6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1550,6 +1550,7 @@
 	dev_flow->layers = item_flags;
 	dev_flow->verbs.attr->priority =
 		mlx5_flow_adjust_priority(dev, priority, subpriority);
+	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 25305b6..5062f5c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
 	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = 1,
+		/* IB device port number. */
+		.port_num = (uint8_t)priv->ibv_port,
 	};
 	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
 				   (IBV_QP_STATE | IBV_QP_PORT));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (9 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 12:14       ` Shahaf Shuler
  2019-03-25 17:03     ` [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
                       ` (2 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

We are implementing the support for multport Infiniband device
withj representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  19 ++++++-
 drivers/net/mlx5/mlx5.h        |   3 +-
 drivers/net/mlx5/mlx5_ethdev.c | 118 ++++++++++++++++++++++++++++++++---------
 3 files changed, 112 insertions(+), 28 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 27bd266..c78b139 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -165,6 +165,7 @@ struct mlx5_dev_spawn_data {
 {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
+	uint32_t i;
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -213,6 +214,14 @@ struct mlx5_dev_spawn_data {
 		sizeof(sh->ibdev_name));
 	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
 		sizeof(sh->ibdev_path));
+	pthread_mutex_init(&sh->intr_mutex, NULL);
+	/*
+	 * Setting port_id to max unallowed value means
+	 * there is no interrupt subhandler installed for
+	 * the given port index i.
+	 */
+	for (i = 0; i < sh->max_port; i++)
+		sh->port[i].ih_port_id = RTE_MAX_ETHPORTS;
 	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
 	if (sh->pd == NULL) {
 		DRV_LOG(ERR, "PD allocation failure");
@@ -267,6 +276,15 @@ struct mlx5_dev_spawn_data {
 		assert(!sh->pd);
 	}
 	LIST_REMOVE(sh, next);
+	/*
+	 *  Ensure there is no async event handler installed.
+	 *  Only primary process handles async device events.
+	 **/
+	assert(!sh->intr_cnt);
+	if (sh->intr_cnt)
+		rte_intr_callback_unregister
+			(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
+	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -274,7 +292,6 @@ struct mlx5_dev_spawn_data {
 	rte_free(sh);
 }
 
-
 /**
  * Prepare shared data between primary and secondary process.
  */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 597e3af..853655b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -222,6 +222,8 @@ struct mlx5_ibv_shared {
 	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
+	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
@@ -251,7 +253,6 @@ struct mlx5_priv {
 	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
 	unsigned int (*reta_idx)[]; /* RETA index table. */
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index a44a383..d41f66b 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Uninstall shared asynchronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id >= RTE_MAX_ETHPORTS)
+		goto exit;
+	assert(sh->port[priv->ibv_port - 1].port_id ==
+					(uint32_t)dev->data->port_id);
+	assert(sh->intr_cnt);
+	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+	if (!sh->intr_cnt || --sh->intr_cnt)
+		goto exit;
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
+ * Install shared asyncronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_install(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+	int ret;
+	int flags;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id < RTE_MAX_ETHPORTS) {
+		/* The handler is already installed for this port. */
+		assert(sh->intr_cnt);
+		goto exit;
+	}
+	sh->port[priv->ibv_port - 1].ih_port_id = (uint32_t)dev->data->port_id;
+	if (sh->intr_cnt) {
+		sh->intr_cnt++;
+		goto exit;
+	}
+	/* No shared handler installed. */
+	assert(sh->ctx->async_fd > 0);
+	flags = fcntl(sh->ctx->async_fd, F_GETFL);
+	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(INFO, "failed to change file descriptor"
+			      " async event queue");
+		/* Indicate there will be no interrupts. */
+		dev->data->dev_conf.intr_conf.lsc = 0;
+		dev->data->dev_conf.intr_conf.rmv = 0;
+		sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+		goto exit;
+	}
+	sh->intr_handle.fd = sh->ctx->async_fd;
+	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	rte_intr_callback_register(&sh->intr_handle,
+				   mlx5_dev_interrupt_handler, sh);
+	sh->intr_cnt++;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
  * Uninstall interrupt handler.
  *
  * @param dev
@@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv)
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     mlx5_dev_interrupt_handler, dev);
+	mlx5_dev_shared_handler_uninstall(dev);
 	if (priv->primary_socket)
 		rte_intr_callback_unregister(&priv->intr_handle_socket,
 					     mlx5_dev_handler_socket, dev);
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 	priv->intr_handle_socket.fd = 0;
 	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;
 }
@@ -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
-	int flags;
 
-	assert(ctx->async_fd > 0);
-	flags = fcntl(ctx->async_fd, F_GETFL);
-	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (ret) {
-		DRV_LOG(INFO,
-			"port %u failed to change file descriptor async event"
-			" queue",
-			dev->data->port_id);
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-	}
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
-		rte_intr_callback_register(&priv->intr_handle,
-					   mlx5_dev_interrupt_handler, dev);
-	}
+	mlx5_dev_shared_handler_install(dev);
 	ret = mlx5_socket_init(dev);
 	if (ret)
 		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (10 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 12:16       ` Shahaf Shuler
  2019-03-25 17:03     ` [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d41f66b..5871c62 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Device status handler.
+ * Handle shared asynchronous events the NIC (removal event
+ * and link status change). Supports multiport IB device.
  *
- * @param dev
- *   Pointer to Ethernet device.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Events bitmap of callback process which can be called immediately.
+ * @param cb_arg
+ *   Callback argument.
  */
-static uint32_t
-mlx5_dev_status_handler(struct rte_eth_dev *dev)
+void
+mlx5_dev_interrupt_handler(void *cb_arg)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = cb_arg;
 	struct ibv_async_event event;
-	uint32_t ret = 0;
 
-	if (mlx5_link_update(dev, 0) == -EAGAIN) {
-		usleep(0);
-		return 0;
-	}
-	/* Read all message and acknowledge them. */
+	/* Read all message from the IB device and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
+		struct rte_eth_dev *dev;
+		uint32_t tmp;
+
+		if (mlx5_glue->get_async_event(sh->ctx, &event))
 			break;
+		/* Retrieve and check IB port index. */
+		tmp = (uint32_t)event.element.port_num;
+		assert(tmp && (tmp <= sh->max_port));
+		if (!tmp ||
+		    tmp > sh->max_port ||
+		    sh->port[tmp - 1].ih_port_id >= RTE_MAX_ETHPORTS) {
+			/*
+			 * Invalid IB port index or no handler
+			 * installed for this port.
+			 */
+			mlx5_glue->ack_async_event(&event);
+			continue;
+		}
+		/* Retrieve ethernet device descriptor. */
+		tmp = sh->port[tmp - 1].ih_port_id;
+		dev = &rte_eth_devices[tmp];
+		tmp = 0;
+		assert(dev);
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-			event.event_type == IBV_EVENT_PORT_ERR) &&
-			(dev->data->dev_conf.intr_conf.lsc == 1))
-			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
-		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			dev->data->dev_conf.intr_conf.rmv == 1)
-			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
-		else
-			DRV_LOG(DEBUG,
-				"port %u event type %d on not handled",
-				dev->data->port_id, event.event_type);
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+			dev->data->dev_conf.intr_conf.lsc) {
+			mlx5_glue->ack_async_event(&event);
+			if (mlx5_link_update(dev, 0) == -EAGAIN) {
+				usleep(0);
+				continue;
+			}
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+			continue;
+		}
+		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+		    dev->data->dev_conf.intr_conf.rmv) {
+			mlx5_glue->ack_async_event(&event);
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+			continue;
+		}
+		DRV_LOG(DEBUG,
+			"port %u event type %d on not handled",
+			dev->data->port_id, event.event_type);
 		mlx5_glue->ack_async_event(&event);
 	}
-	return ret;
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
- */
-void
-mlx5_dev_interrupt_handler(void *cb_arg)
-{
-	struct rte_eth_dev *dev = cb_arg;
-	uint32_t events;
-
-	events = mlx5_dev_status_handler(dev);
-	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
-	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (11 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-25 17:03     ` Viacheslav Ovsiienko
  2019-03-26 12:21       ` Shahaf Shuler
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-25 17:03 UTC (permalink / raw)
  To: dev; +Cc: shahafs

For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b8943da..67b359f 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add source vport match to the specified matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] port
+ *   Source vport value to match
+ * @param[in] mask
+ *   Mask
+ */
+static void
+flow_dv_translate_source_vport(void *matcher, void *key,
+			      int16_t port, uint16_t mask)
+{
+	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+
+	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
+	MLX5_SET(fte_match_set_misc, misc_v, source_port, port);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -3389,6 +3412,19 @@ struct field_modify_info modify_tcp[] = {
 	}
 	dev_flow->dv.actions_n = actions_n;
 	flow->actions = action_flags;
+	if (attr->ingress && !attr->transfer &&
+	    (priv->representor || priv->master)) {
+		/* It was validated - we support unidirections flows only. */
+		assert(!attr->egress);
+		/*
+		 * Add matching on source vport index only
+		 * for ingress rules in E-Switch configurations.
+		 */
+		flow_dv_translate_source_vport(matcher.mask.buf,
+					       dev_flow->dv.value.buf,
+					       priv->representor_id,
+					       0xffff);
+	}
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		void *match_mask = matcher.mask.buf;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-25 18:06       ` Stephen Hemminger
  2019-03-25 18:07       ` Stephen Hemminger
  2019-03-26 12:20       ` Shahaf Shuler
  2 siblings, 0 replies; 114+ messages in thread
From: Stephen Hemminger @ 2019-03-25 18:06 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, shahafs

On Mon, 25 Mar 2019 17:03:22 +0000
Viacheslav Ovsiienko <viacheslavo@mellanox.com> wrote:

> +	 * Currently we support sinlge E-Switch per PF configurations
> +	 * only and representors_id field contains the vport index for
> +	 * corresponding VF, deduced from representor port name.
> +	 * For exapmple, let's have the IB device port 10, it has
> +	 * attached network device eth0, which has port name attribute
> +	 * pf0vf2, we can deduce the VF number as 2, and set vport index
> +	 * as 3 (2+1). This assigning schema should be changed if the
> +	 * multiple E-Switch instance per PF configurations or/and PCI
> +	 * subfunctions dded

Please spell check this comment (and others).

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-25 18:06       ` Stephen Hemminger
@ 2019-03-25 18:07       ` Stephen Hemminger
  2019-03-26  7:33         ` Slava Ovsiienko
  2019-03-26 12:20       ` Shahaf Shuler
  2 siblings, 1 reply; 114+ messages in thread
From: Stephen Hemminger @ 2019-03-25 18:07 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, shahafs

On Mon, 25 Mar 2019 17:03:22 +0000
Viacheslav Ovsiienko <viacheslavo@mellanox.com> wrote:

> +	if (switch_id_set) {
> +		if (info.port_name_new) {
> +			/* New representors naming schema. */
> +			if (port_name_set) {
> +				info.master = (info.port_name == -1);
> +				info.representor = (info.port_name != -1);
> +			}
> +		} else {
> +			/* Legacy representors naming schema. */
> +			info.master = (!port_name_set || num_vf_set);
> +			info.representor = port_name_set && !num_vf_set;
> +		}
> +	}
> +	assert(!(data.master && data.representor));
>  	memcpy(arg, &info, sizeof(info));
>  	return 0;

Since assert() is often removed in non-debug envirionments, why
not add a log message and return an error instead?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-25 18:07       ` Stephen Hemminger
@ 2019-03-26  7:33         ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-26  7:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Shahaf Shuler

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, March 25, 2019 20:08
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v2 01/13] net/mlx5: add representor
> recognition on kernels 5.x
> 
> On Mon, 25 Mar 2019 17:03:22 +0000
> Viacheslav Ovsiienko <viacheslavo@mellanox.com> wrote:
> 
> > +	if (switch_id_set) {
> > +		if (info.port_name_new) {
> > +			/* New representors naming schema. */
> > +			if (port_name_set) {
> > +				info.master = (info.port_name == -1);
> > +				info.representor = (info.port_name != -1);
> > +			}
> > +		} else {
> > +			/* Legacy representors naming schema. */
> > +			info.master = (!port_name_set || num_vf_set);
> > +			info.representor = port_name_set && !num_vf_set;
> > +		}
> > +	}
> > +	assert(!(data.master && data.representor));
> >  	memcpy(arg, &info, sizeof(info));
> >  	return 0;
> 
> Since assert() is often removed in non-debug envirionments,
Assert should be removed (compiled out) in non-debug environments. If not (suddenly) - it must be resolved.

> why not add a log message and return an error instead?
Because there is no code producing assert wrong condition. No valid combination of entry values to produce master&&representor. It is intended by design. We do not expect master&&representor condition at all. It should never happen. Otherwise, we should notify the developer (because design is corrupted somewhere - and definitely it is not a runtime/user problem). So, assert seems to be the most relevant entity here.

>>instead?
We could add error/log in parallel, not instead of.

With best regards,
Slava

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-25 17:03     ` [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-26 11:47       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 11:47 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:03 PM, Viacheslav Ovsiienko:
> Subject: [dpdk-dev] [PATCH v2 02/13] net/mlx5: modify get ifindex routine
> for multiport IB
> 
> There is the routine mlx5_nl_ifindex() returning the network interface index
> associated with Infiniband device.
> We are going to support multiport IB devices, now function takes the IB port
> as argument and returns ifindex associated with tuple <IB device, IB port>
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing
  2019-03-25 17:03     ` [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-26 12:02       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:02 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:03 PM, Viacheslav Ovsiienko:
> Subject: [dpdk-dev] [PATCH v2 04/13] net/mlx5: add multiport IB device
> support to probing
> 
> mlx5_pci_probe() routine is refactored to probe the ports of found
> Infiniband devices. All active ports (with attached network interface),
> belonging to the same Infiniband device will use the signle shared Infiniband
> context of that device.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions
  2019-03-25 17:03     ` [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-26 12:10       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:10 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:03 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free
> functions
> 
> The Mellanox NICs support SR-IOV and have E-Switch feature.
> When SR-IOV is set up in switchdev mode and E-Switch is enabled we have
> so called VF representors in the system. All representors belonging to the
> same E-Switch are created on the basis of the single PCI function and with
> current implementation each representor has its own dedicated Infiniband
> device and operates within its own Infiniband context. It is proposed to
> provide representors as ports of the single Infiniband device and operate on
> the shared Infiniband context saving various resources. This patch introduces
> appropriate structures.
> 
> Alsa the functions to allocate and free shared IB context for multiport are
> added. The IB device context, Protection Domain, device attributes,
> Infiniband names are going to be relocated to the shared structure from the
> device private one.
> mlx5_dev_spawn() is updated to support shared context.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c | 298 +++++++++++++++++++++++++++++++++---
> ------------
>  drivers/net/mlx5/mlx5.h |  32 ++++++
>  2 files changed, 235 insertions(+), 95 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 4338254..4c9621a 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -140,6 +140,141 @@ struct mlx5_dev_spawn_data {
>  	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */  };
> 
> +static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list =
> +LIST_HEAD_INITIALIZER();
> +
> +/**
> + * Allocate shared IB device context. If there is multiport device the
> + * master and representors will share this context, if there is single
> + * port dedicated IB device, the context will be used by only given
> + * port due to unification.
> + *
> + * Routine first searches the context for the spesified IB device name,
> + * if found the shared context assumed and reference counter is
> incremented.
> + * If no context found the new one is created and initialized with
> +specified
> + * IB device context and parameters.
> + *
> + * @param[in] spawn
> + *   Pointer to the IB device attributes (name, port, etc).
> + *
> + * @return
> + *   Pointer to mlx5_ibv_shared object on success,
> + *   otherwise NULL and rte_errno is set.
> + */
> +static struct mlx5_ibv_shared *
> +mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn) {

Sorry Slava, I mislead you.
Even though it is not expected for probe to be called by multiple thread it is safer to protect this list as it is global resources.

It is not on the critical datapath, so no big concerns w/ mutex. 
Please add proper synchronization when manipulating this shared contexts. 

> +	struct mlx5_ibv_shared *sh;
> +	int err = 0;
> +
> +	assert(spawn);
> +	/* Secondary process should not create the shared context. */
> +	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
> +	/* Search for IB context by device name. */
> +	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
> +		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
> +			sh->refcnt++;
> +			return sh;
> +		}
> +	}
> +	/* No device found, we have to create new sharted context. */
> +	assert(spawn->max_port);
> +	sh = rte_zmalloc("ethdev shared ib context",
> +			 sizeof(struct mlx5_ibv_shared) +
> +			 spawn->max_port *
> +			 sizeof(struct mlx5_ibv_shared_port),
> +			 RTE_CACHE_LINE_SIZE);
> +	if (!sh) {
> +		DRV_LOG(ERR, "shared context allocation failure");
> +		rte_errno  = ENOMEM;
> +		return NULL;
> +	}
> +	/* Try to open IB device with DV first, then usual Verbs. */
> +	errno = 0;
> +	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
> +	if (sh->ctx) {
> +		sh->devx = 1;
> +		DRV_LOG(DEBUG, "DevX is supported");
> +	} else {
> +		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
> +		if (!sh->ctx) {
> +			err = errno ? errno : ENODEV;
> +			goto error;
> +		}
> +		DRV_LOG(DEBUG, "DevX is NOT supported");
> +	}
> +	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh-
> >device_attr);
> +	if (err) {
> +		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
> +		goto error;
> +	}
> +	sh->refcnt = 1;
> +	sh->max_port = spawn->max_port;
> +	strncpy(sh->ibdev_name, sh->ctx->device->name,
> +		sizeof(sh->ibdev_name));
> +	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
> +		sizeof(sh->ibdev_path));
> +	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
> +	if (sh->pd == NULL) {
> +		DRV_LOG(ERR, "PD allocation failure");
> +		err = ENOMEM;
> +		goto error;
> +	}
> +	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
> +	return sh;
> +error:
> +	assert(sh);
> +	if (sh->pd)
> +		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> +	if (sh->ctx)
> +		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	rte_free(sh);
> +	assert(err > 0);
> +	rte_errno = err;
> +	return NULL;
> +}
> +
> +/**
> + * Free shared IB device context. Decrement counter and if zero free
> + * all allocated resources and close handles.
> + *
> + * @param[in] sh
> + *   Pointer to mlx5_ibv_shared object to free
> + */
> +static void
> +mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh) { #ifndef NDEBUG
> +	/* Check the object presence in the list. */
> +	struct mlx5_ibv_shared *lctx;
> +
> +	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
> +		if (lctx == sh)
> +			break;
> +	assert(lctx);
> +	if (lctx != sh) {
> +		DRV_LOG(ERR, "Freeing non-existing shared IB context");
> +		return;
> +	}
> +#endif
> +	assert(sh);
> +	assert(sh->refcnt);
> +	if (--sh->refcnt)
> +		return;
> +	/* Zero reference counter, we should release resources. */
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		assert(sh->secondary);
> +		assert(sh->ctx);
> +		assert(!sh->pd);
> +	}
> +	LIST_REMOVE(sh, next);
> +	if (sh->pd)
> +		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
> +	if (sh->ctx)
> +		claim_zero(mlx5_glue->close_device(sh->ctx));
> +	rte_free(sh);
> +}
> +
> +
>  /**
>   * Prepare shared data between primary and secondary process.
>   */
> @@ -289,12 +424,10 @@ struct mlx5_dev_spawn_data {
>  	}
>  	mlx5_mprq_free_mp(dev);
>  	mlx5_mr_release(dev);
> -	if (priv->pd != NULL) {
> -		assert(priv->ctx != NULL);
> -		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
> -		claim_zero(mlx5_glue->close_device(priv->ctx));
> -	} else
> -		assert(priv->ctx == NULL);
> +	assert(priv->sh);
> +	if (priv->sh)
> +		mlx5_free_shared_ibctx(priv->sh);
> +	priv->sh = NULL;
>  	if (priv->rss_conf.rss_key != NULL)
>  		rte_free(priv->rss_conf.rss_key);
>  	if (priv->reta_idx != NULL)
> @@ -744,11 +877,8 @@ struct mlx5_dev_spawn_data {
>  	       struct mlx5_dev_config config)
>  {
>  	const struct mlx5_switch_info *switch_info = &spawn->info;
> -	struct ibv_device *ibv_dev = spawn->ibv_dev;
> -	struct ibv_context *ctx = NULL;
> -	struct ibv_device_attr_ex attr;
> +	struct mlx5_ibv_shared *sh;
>  	struct ibv_port_attr port_attr;
> -	struct ibv_pd *pd = NULL;
>  	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
>  	struct rte_eth_dev *eth_dev = NULL;
>  	struct mlx5_priv *priv = NULL;
> @@ -807,18 +937,47 @@ struct mlx5_dev_spawn_data {
>  	}
>  	/* Prepare shared data between primary and secondary process. */
>  	mlx5_prepare_shared_data();
> -	errno = 0;
> -	ctx = mlx5_glue->dv_open_device(ibv_dev);
> -	if (ctx) {
> -		config.devx = 1;
> -		DRV_LOG(DEBUG, "DEVX is supported");
> -	} else {
> -		ctx = mlx5_glue->open_device(ibv_dev);
> -		if (!ctx) {
> -			rte_errno = errno ? errno : ENODEV;
> -			return NULL;
> +	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		eth_dev = rte_eth_dev_attach_secondary(name);
> +		if (eth_dev == NULL) {
> +			DRV_LOG(ERR, "can not attach rte ethdev");
> +			rte_errno = ENOMEM;
> +			err = rte_errno;
> +			goto error;
> +		}
> +		eth_dev->device = dpdk_dev;
> +		eth_dev->dev_ops = &mlx5_dev_sec_ops;
> +		err = mlx5_uar_init_secondary(eth_dev);
> +		if (err) {
> +			err = rte_errno;
> +			goto error;
>  		}
> +		/* Receive command fd from primary process */
> +		err = mlx5_socket_connect(eth_dev);
> +		if (err < 0) {
> +			err = rte_errno;
> +			goto error;
> +		}
> +		/* Remap UAR for Tx queues. */
> +		err = mlx5_tx_uar_remap(eth_dev, err);
> +		if (err) {
> +			err = rte_errno;
> +			goto error;
> +		}
> +		/*
> +		 * Ethdev pointer is still required as input since
> +		 * the primary device is not accessible from the
> +		 * secondary process.
> +		 */
> +		eth_dev->rx_pkt_burst =
> mlx5_select_rx_function(eth_dev);
> +		eth_dev->tx_pkt_burst =
> mlx5_select_tx_function(eth_dev);
> +		return eth_dev;
>  	}
> +	sh = mlx5_alloc_shared_ibctx(spawn);
> +	if (!sh)
> +		return NULL;
> +	config.devx = sh->devx;
>  #ifdef HAVE_IBV_MLX5_MOD_SWP
>  	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;  #endif @@
> -832,7 +991,7 @@ struct mlx5_dev_spawn_data {  #ifdef
> HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
>  	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
> #endif
> -	mlx5_glue->dv_query_device(ctx, &dv_attr);
> +	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
>  	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
>  		if (dv_attr.flags &
> MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
>  			DRV_LOG(DEBUG, "enhanced MPW is supported");
> @@ -917,51 +1076,8 @@ struct mlx5_dev_spawn_data {
>  		" old OFED/rdma-core version or firmware configuration");
> #endif
>  	config.mpls_en = mpls_en;
> -	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
> -	if (err) {
> -		DEBUG("ibv_query_device_ex() failed");
> -		goto error;
> -	}
> -	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
> -	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> -		eth_dev = rte_eth_dev_attach_secondary(name);
> -		if (eth_dev == NULL) {
> -			DRV_LOG(ERR, "can not attach rte ethdev");
> -			rte_errno = ENOMEM;
> -			err = rte_errno;
> -			goto error;
> -		}
> -		eth_dev->device = dpdk_dev;
> -		eth_dev->dev_ops = &mlx5_dev_sec_ops;
> -		err = mlx5_uar_init_secondary(eth_dev);
> -		if (err) {
> -			err = rte_errno;
> -			goto error;
> -		}
> -		/* Receive command fd from primary process */
> -		err = mlx5_socket_connect(eth_dev);
> -		if (err < 0) {
> -			err = rte_errno;
> -			goto error;
> -		}
> -		/* Remap UAR for Tx queues. */
> -		err = mlx5_tx_uar_remap(eth_dev, err);
> -		if (err) {
> -			err = rte_errno;
> -			goto error;
> -		}
> -		/*
> -		 * Ethdev pointer is still required as input since
> -		 * the primary device is not accessible from the
> -		 * secondary process.
> -		 */
> -		eth_dev->rx_pkt_burst =
> mlx5_select_rx_function(eth_dev);
> -		eth_dev->tx_pkt_burst =
> mlx5_select_tx_function(eth_dev);
> -		claim_zero(mlx5_glue->close_device(ctx));
> -		return eth_dev;
> -	}
>  	/* Check port status. */
> -	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
> +	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
>  	if (err) {
>  		DRV_LOG(ERR, "port query failed: %s", strerror(err));
>  		goto error;
> @@ -975,13 +1091,7 @@ struct mlx5_dev_spawn_data {
>  		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
>  			mlx5_glue->port_state_str(port_attr.state),
>  			port_attr.state);
> -	/* Allocate protection domain. */
> -	pd = mlx5_glue->alloc_pd(ctx);
> -	if (pd == NULL) {
> -		DRV_LOG(ERR, "PD allocation failure");
> -		err = ENOMEM;
> -		goto error;
> -	}
> +	/* Allocate private eth device data. */
>  	priv = rte_zmalloc("ethdev private structure",
>  			   sizeof(*priv),
>  			   RTE_CACHE_LINE_SIZE);
> @@ -990,13 +1100,11 @@ struct mlx5_dev_spawn_data {
>  		err = ENOMEM;
>  		goto error;
>  	}
> -	priv->ctx = ctx;
> -	strncpy(priv->ibdev_name, priv->ctx->device->name,
> -		sizeof(priv->ibdev_name));
> -	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
> -		sizeof(priv->ibdev_path));
> -	priv->device_attr = attr;
> -	priv->pd = pd;
> +	priv->sh = sh;
> +	priv->ctx = sh->ctx;
> +	priv->ibv_port = spawn->ibv_port;
> +	priv->device_attr = sh->device_attr;
> +	priv->pd = sh->pd;
>  	priv->mtu = ETHER_MTU;
>  #ifndef RTE_ARCH_64
>  	/* Initialize UAR access locks for 32bit implementations. */ @@ -
> 1062,7 +1170,8 @@ struct mlx5_dev_spawn_data {
>  			strerror(rte_errno));
>  		goto error;
>  	}
> -	config.hw_csum = !!(attr.device_cap_flags_ex &
> IBV_DEVICE_RAW_IP_CSUM);
> +	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
> +			    IBV_DEVICE_RAW_IP_CSUM);
>  	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
>  		(config.hw_csum ? "" : "not "));
>  #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \ @@ -1076,7
> +1185,7 @@ struct mlx5_dev_spawn_data {
>  	}
>  #endif
>  	config.ind_table_max_size =
> -		attr.rss_caps.max_rwq_indirection_table_size;
> +		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
>  	/*
>  	 * Remove this check once DPDK supports larger/variable
>  	 * indirection tables.
> @@ -1085,18 +1194,18 @@ struct mlx5_dev_spawn_data {
>  		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
>  	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
>  		config.ind_table_max_size);
> -	config.hw_vlan_strip = !!(attr.raw_packet_caps &
> +	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
> 
> IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
>  	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
>  		(config.hw_vlan_strip ? "" : "not "));
> -	config.hw_fcs_strip = !!(attr.raw_packet_caps &
> +	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
>  				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
>  	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
>  		(config.hw_fcs_strip ? "" : "not "));  #if
> defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
> -	hw_padding = !!attr.rx_pad_end_addr_align;
> +	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
>  #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
> -	hw_padding = !!(attr.device_cap_flags_ex &
> +	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
>  			IBV_DEVICE_PCI_WRITE_END_PADDING);
>  #endif
>  	if (config.hw_padding && !hw_padding) { @@ -1105,11 +1214,11 @@
> struct mlx5_dev_spawn_data {
>  	} else if (config.hw_padding) {
>  		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
>  	}
> -	config.tso = (attr.tso_caps.max_tso > 0 &&
> -		      (attr.tso_caps.supported_qpts &
> +	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
> +		      (sh->device_attr.tso_caps.supported_qpts &
>  		       (1 << IBV_QPT_RAW_PACKET)));
>  	if (config.tso)
> -		config.tso_max_payload_sz = attr.tso_caps.max_tso;
> +		config.tso_max_payload_sz = sh-
> >device_attr.tso_caps.max_tso;
>  	/*
>  	 * MPW is disabled by default, while the Enhanced MPW is enabled
>  	 * by default.
> @@ -1243,7 +1352,7 @@ struct mlx5_dev_spawn_data {
>  		}
>  	}
>  #ifdef HAVE_MLX5DV_DR
> -		ns = mlx5dv_dr_create_ns(ctx,
> +		ns = mlx5dv_dr_create_ns(sh->ctx,
> 
> MLX5DV_DR_NS_DOMAIN_INGRESS_BYPASS);
>  		if (ns == NULL) {
>  			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed"); @@ -
> 1251,7 +1360,7 @@ struct mlx5_dev_spawn_data {
>  			goto error;
>  		}
>  		priv->rx_ns = ns;
> -		ns = mlx5dv_dr_create_ns(ctx,
> +		ns = mlx5dv_dr_create_ns(sh->ctx,
> 
> MLX5DV_DR_NS_DOMAIN_EGRESS_BYPASS);
>  		if (ns == NULL) {
>  			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed"); @@ -
> 1268,7 +1377,8 @@ struct mlx5_dev_spawn_data {
>  		.free = &mlx5_free_verbs_buf,
>  		.data = priv,
>  	};
> -	mlx5_glue->dv_set_context_attr(ctx,
> MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
> +	mlx5_glue->dv_set_context_attr(sh->ctx,
> +				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
>  				       (void *)((uintptr_t)&alctr));
>  	/* Bring Ethernet device up. */
>  	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up", @@ -
> 1322,15 +1432,13 @@ struct mlx5_dev_spawn_data {
>  		if (eth_dev != NULL)
>  			eth_dev->data->dev_private = NULL;
>  	}
> -	if (pd)
> -		claim_zero(mlx5_glue->dealloc_pd(pd));
>  	if (eth_dev != NULL) {
>  		/* mac_addrs must not be freed alone because part of
> dev_private */
>  		eth_dev->data->mac_addrs = NULL;
>  		rte_eth_dev_release_port(eth_dev);
>  	}
> -	if (ctx)
> -		claim_zero(mlx5_glue->close_device(ctx));
> +	if (sh)
> +		mlx5_free_shared_ibctx(sh);
>  	assert(err > 0);
>  	rte_errno = err;
>  	return NULL;
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> d6967a3..4568de7 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -196,10 +196,42 @@ struct mlx5_flow_tbl_resource {  #define
> MLX5_MAX_TABLES 1024  #define MLX5_GROUP_FACTOR 1
> 
> +/* Per port data of shared IB device. */ struct mlx5_ibv_shared_port {
> +	uint32_t ih_port_id;
> +	/*
> +	 * Interrupt handler port_id. Used by shared interrupt
> +	 * handler to find the corresponding rte_eth device
> +	 * by IB port index. If value is equal or greater
> +	 * RTE_MAX_ETHPORTS it means there is no subhandler
> +	 * installed for specified IB port index.
> +	 */
> +};
> +
> +/*
> + * Shared Infiniband device context for Master/Representors
> + * which belong to same IB device with multiple IB ports.
> + **/
> +struct mlx5_ibv_shared {
> +	LIST_ENTRY(mlx5_ibv_shared) next;
> +	uint32_t refcnt;
> +	uint32_t devx:1; /* Opened with DV. */
> +	uint32_t max_port; /* Maximal IB device port index. */
> +	struct ibv_context *ctx; /* Verbs/DV context. */
> +	struct ibv_pd *pd; /* Protection Domain. */
> +	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
> +	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for
> secondary */
> +	struct ibv_device_attr_ex device_attr; /* Device properties. */
> +	struct rte_intr_handle intr_handle; /* Interrupt handler for device.
> */
> +	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
> +};
> +
>  struct mlx5_priv {
>  	LIST_ENTRY(mlx5_priv) mem_event_cb;
>  	/**< Called by memory event callback. */
>  	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
> +	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
> +	uint32_t ibv_port; /* IB device port number. */
>  	struct ibv_context *ctx; /* Verbs context. */
>  	struct ibv_device_attr_ex device_attr; /* Device properties. */
>  	struct ibv_pd *pd; /* Protection Domain. */
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines
  2019-03-25 17:03     ` [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-26 12:14       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:14 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:04 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v2 11/13] net/mlx5: update install/uninstall int handler
> routines
> 
> We are implementing the support for multport Infiniband device withj
> representors attached to these multiple ports. Asynchronous device event
> notifications (link status change, removal event, etc.) should be shared
> between ports. We are going to implement shared event handler and this
> patch introduces appropriate device structure changes and updated event
> handler install and uninstall routines.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices
  2019-03-25 17:03     ` [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-26 12:16       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:16 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:04 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v2 12/13] net/mlx5: update event handler for multiport IB
> devices
> 
> This patch modifies asynchronous event handler to support multiport
> Infiniband devices. Handler queries the event parameters, including event
> source port index, and invokes the handler for specific devices with
> appropriate port_id.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-25 18:06       ` Stephen Hemminger
  2019-03-25 18:07       ` Stephen Hemminger
@ 2019-03-26 12:20       ` Shahaf Shuler
  2 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:20 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:03 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v2 01/13] net/mlx5: add representor recognition on kernels
> 5.x
> 
> The master device and VF representors were distinguished by presence of
> port name, master device did not have one. The new Linux kernels starting
> from 5.0 provide the port name for master device and the implemented
> representor recognizing method does not work.
> The new recognizing method is based on quiering the VF number, created on
> the base of the device.
> 
> The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> attribute is specified in the Netlink request message.
> 
> Also the presence of device symlink in device sysfs folder is added to
> distinguish representors with sysfs based method.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/Makefile      | 10 ++++++++++
>  drivers/net/mlx5/meson.build   |  4 ++++
>  drivers/net/mlx5/mlx5.c        | 17 ++++++++++++++---
>  drivers/net/mlx5/mlx5.h        |  1 +
>  drivers/net/mlx5/mlx5_ethdev.c | 14 ++++++++++++--
>  drivers/net/mlx5/mlx5_nl.c     | 37
> ++++++++++++++++++++++++++++++++++---
>  6 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile index
> 1ed299d..3dd7e38 100644
> --- a/drivers/net/mlx5/Makefile
> +++ b/drivers/net/mlx5/Makefile
> @@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-
> config-h.sh
>  		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
>  		$(AUTOCONF_OUTPUT)
>  	$Q sh -- '$<' '$@' \
> +		HAVE_IFLA_NUM_VF \
> +		linux/if_link.h \
> +		enum IFLA_NUM_VF \
> +		$(AUTOCONF_OUTPUT)
> +	$Q sh -- '$<' '$@' \
> +		HAVE_IFLA_EXT_MASK \
> +		linux/if_link.h \
> +		enum IFLA_EXT_MASK \
> +		$(AUTOCONF_OUTPUT)
> +	$Q sh -- '$<' '$@' \
>  		HAVE_IFLA_PHYS_SWITCH_ID \
>  		linux/if_link.h \
>  		enum IFLA_PHYS_SWITCH_ID \
> diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
> index 0cf2f08..e3cb9bc 100644
> --- a/drivers/net/mlx5/meson.build
> +++ b/drivers/net/mlx5/meson.build
> @@ -133,6 +133,10 @@ if build
>  		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
>  		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
>  		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
> +		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
> +		'IFLA_NUM_VF' ],
> +		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
> +		'IFLA_EXT_MASK' ],
>  		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
>  		'IFLA_PHYS_SWITCH_ID' ],
>  		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h', diff --git
> a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index ad1975c..032f1f2
> 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -13,7 +13,6 @@
>  #include <errno.h>
>  #include <net/if.h>
>  #include <sys/mman.h>
> -#include <linux/netlink.h>
>  #include <linux/rtnetlink.h>
> 
>  /* Verbs header. */
> @@ -1001,9 +1000,21 @@
>  	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
>  	priv->nl_sn = 0;
>  	priv->representor = !!switch_info->representor;
> +	priv->master = !!switch_info->master;
>  	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
> -	priv->representor_id =
> -		switch_info->representor ? switch_info->port_name : -1;
> +	/*
> +	 * Currently we support sinlge E-Switch per PF configurations
> +	 * only and representors_id field contains the vport index for
> +	 * corresponding VF, deduced from representor port name.
> +	 * For exapmple, let's have the IB device port 10, it has
> +	 * attached network device eth0, which has port name attribute
> +	 * pf0vf2, we can deduce the VF number as 2, and set vport index
> +	 * as 3 (2+1). This assigning schema should be changed if the
> +	 * multiple E-Switch instance per PF configurations or/and PCI
> +	 * subfunctions dded.
> +	 */
> +	priv->representor_id = switch_info->representor ?
> +			       switch_info->port_name + 1 : -1;

The representor_id field is also being used on the info query from user. We shouldn't report back the vport id of the representor rather its index according to phys_port_name.
You can define a different field, maybe called vport. 

>  	/*
>  	 * Look for sibling devices in order to reuse their switch domain
>  	 * if any, otherwise allocate one.
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> a88cb4a..58bc37f 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -214,6 +214,7 @@ struct mlx5_priv {
>  	uint16_t mtu; /* Configured MTU. */
>  	unsigned int isolated:1; /* Whether isolated mode is enabled. */
>  	unsigned int representor:1; /* Device is a port representor. */
> +	unsigned int master:1; /* Device is a E-Switch master. */
>  	uint16_t domain_id; /* Switch domain identifier. */
>  	int32_t representor_id; /* Port representor identifier. */
>  	/* RX/TX queues. */
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c index 2e8a296..64c6b6f 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
>  		.port_name = 0,
>  		.switch_id = 0,
>  	};
> +	DIR *dir;
>  	bool port_name_set = false;
>  	bool port_switch_id_set = false;
> +	bool device_dir = false;
>  	char c;
>  	int ret;
> 
> @@ -1376,6 +1378,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev,
> char *fw_ver, size_t fw_size)
>  	      ifname);
>  	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
>  	      ifname);
> +	MKSTR(pci_device, "/sys/class/net/%s/device",
> +	      ifname);
> 
>  	file = fopen(phys_port_name, "rb");
>  	if (file != NULL) {
> @@ -1394,8 +1398,14 @@ int mlx5_fw_version_get(struct rte_eth_dev
> *dev, char *fw_ver, size_t fw_size)
>  		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
>  		c == '\n';
>  	fclose(file);
> -	data.master = port_switch_id_set && !port_name_set;
> -	data.representor = port_switch_id_set && port_name_set;
> +	dir = opendir(pci_device);
> +	if (dir != NULL) {
> +		closedir(dir);
> +		device_dir = true;
> +	}
> +	data.master = port_switch_id_set && (!port_name_set ||
> device_dir);
> +	data.representor = port_switch_id_set && port_name_set &&
> !device_dir;
> +	assert(!(data.master && data.representor));
>  	*info = data;
>  	return 0;
>  }
> diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c index
> 8a10109..9d0fcba 100644
> --- a/drivers/net/mlx5/mlx5_nl.c
> +++ b/drivers/net/mlx5/mlx5_nl.c
> @@ -65,6 +65,12 @@
>  #endif
> 
>  /* These are normally found in linux/if_link.h. */
> +#ifndef HAVE_IFLA_NUM_VF
> +#define IFLA_NUM_VF 21
> +#endif
> +#ifndef HAVE_IFLA_EXT_MASK
> +#define IFLA_EXT_MASK 29
> +#endif
>  #ifndef HAVE_IFLA_PHYS_SWITCH_ID
>  #define IFLA_PHYS_SWITCH_ID 36
>  #endif
> @@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
>  	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
>  	bool port_name_set = false;
>  	bool switch_id_set = false;
> +	bool num_vf_set = false;
> 
>  	if (nh->nlmsg_type != RTM_NEWLINK)
>  		goto error;
> @@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
>  		if (ra->rta_len > nh->nlmsg_len - off)
>  			goto error;
>  		switch (ra->rta_type) {
> +		case IFLA_NUM_VF:
> +			num_vf_set = true;
> +			break;
>  		case IFLA_PHYS_PORT_NAME:
>  			port_name_set =
>  				mlx5_translate_port_name((char *)payload,
> @@ -864,8 +874,20 @@ struct mlx5_nl_ifindex_data {
>  		}
>  		off += RTA_ALIGN(ra->rta_len);
>  	}
> -	info.master = switch_id_set && !port_name_set;
> -	info.representor = switch_id_set && port_name_set;
> +	if (switch_id_set) {
> +		if (info.port_name_new) {
> +			/* New representors naming schema. */
> +			if (port_name_set) {
> +				info.master = (info.port_name == -1);
> +				info.representor = (info.port_name != -1);
> +			}
> +		} else {
> +			/* Legacy representors naming schema. */
> +			info.master = (!port_name_set || num_vf_set);
> +			info.representor = port_name_set && !num_vf_set;
> +		}
> +	}
> +	assert(!(data.master && data.representor));
>  	memcpy(arg, &info, sizeof(info));
>  	return 0;
>  error:
> @@ -893,9 +915,13 @@ struct mlx5_nl_ifindex_data {
>  	struct {
>  		struct nlmsghdr nh;
>  		struct ifinfomsg info;
> +		struct rtattr rta;
> +		uint32_t extmask;
>  	} req = {
>  		.nh = {
> -			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
> +			.nlmsg_len = NLMSG_LENGTH
> +					(sizeof(req.info) +
> +					 RTA_LENGTH(sizeof(uint32_t))),
>  			.nlmsg_type = RTM_GETLINK,
>  			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
>  		},
> @@ -903,6 +929,11 @@ struct mlx5_nl_ifindex_data {
>  			.ifi_family = AF_UNSPEC,
>  			.ifi_index = ifindex,
>  		},
> +		.rta = {
> +			.rta_type = IFLA_EXT_MASK,
> +			.rta_len = RTA_LENGTH(sizeof(int32_t)),
> +		},
> +		.extmask = RTE_LE32(1),
>  	};
>  	int ret;
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules
  2019-03-25 17:03     ` [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-26 12:21       ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 12:21 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Monday, March 25, 2019 7:04 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v2 13/13] net/mlx5: add source vport match to the ingress
> rules
> 
> For E-Switch configurations over multiport Infiniband devices we should add
> source vport match to correctly distribute traffic between representors.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow_dv.c | 36
> ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow_dv.c
> b/drivers/net/mlx5/mlx5_flow_dv.c index b8943da..67b359f 100644
> --- a/drivers/net/mlx5/mlx5_flow_dv.c
> +++ b/drivers/net/mlx5/mlx5_flow_dv.c
> @@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {  }
> 
>  /**
> + * Add source vport match to the specified matcher.
> + *
> + * @param[in, out] matcher
> + *   Flow matcher.
> + * @param[in, out] key
> + *   Flow matcher value.
> + * @param[in] port
> + *   Source vport value to match
> + * @param[in] mask
> + *   Mask
> + */
> +static void
> +flow_dv_translate_source_vport(void *matcher, void *key,
> +			      int16_t port, uint16_t mask)
> +{
> +	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher,
> misc_parameters);
> +	void *misc_v = MLX5_ADDR_OF(fte_match_param, key,
> misc_parameters);
> +
> +	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
> +	MLX5_SET(fte_match_set_misc, misc_v, source_port, port); }
> +
> +/**
>   * Fill the flow with DV spec.
>   *
>   * @param[in] dev
> @@ -3389,6 +3412,19 @@ struct field_modify_info modify_tcp[] = {
>  	}
>  	dev_flow->dv.actions_n = actions_n;
>  	flow->actions = action_flags;
> +	if (attr->ingress && !attr->transfer &&
> +	    (priv->representor || priv->master)) {
> +		/* It was validated - we support unidirections flows only. */
> +		assert(!attr->egress);
> +		/*
> +		 * Add matching on source vport index only
> +		 * for ingress rules in E-Switch configurations.
> +		 */
> +		flow_dv_translate_source_vport(matcher.mask.buf,
> +					       dev_flow->dv.value.buf,
> +					       priv->representor_id,

Overall looks OK apart from the override of the representor_id. 

> +					       0xffff);
> +	}
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
>  		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
>  		void *match_mask = matcher.mask.buf;
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v3 00/14] net/mlx5: add support for multiport IB devices
  2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
                       ` (12 preceding siblings ...)
  2019-03-25 17:03     ` [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-26 15:35     ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
                         ` (14 more replies)
  13 siblings, 15 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature. 
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context.

mlx5_pci_probe() routine is planned to be extended to probe
the ports of found Infiniband devices. All active ports,
belonging to the same Infiniband device will use the signle
shared Infiniband context of that device:
  - QP are created within shared context, with assigning to
    specified port
  - Verb flows are also created with specifying port index
  - Protection Domain is shared by all representors
  - Global Memory Regions Cache is shared by all representors,
    local ones are not (not in this patchset)
  - Shared Asynchronous Event Handler distributes events
    among the device ports (not in this patchset)
  - UAR mapping to user space is also shared (not in patchset)
		  
Each representor has its own attached network interface device,
interface name and index can be retrieved via IB device and
port index, so the link control is performed via Netlink
in the same way as in existing implementation.

This patchset also superseedes http://patches.dpdk.org/patch/50486/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v3:
	- vport_id field is added, contains associated VF vport
	  index on E-Switch. representor_id field is reverted
	  to keep the VF index
	- some comment spellcheck
	- some error logging is added for master/representor
	  wrong recognition
	- shared IB context alloc/free routines mutex
	  protection added  

v2:
	http://patches.dpdk.org/patch/51642/
	- clarifying comments added
	- secondary process does not probe IB device at all and
	  does not create shared context
	- minor bugs (like ++ in assert) fixed
	- representor_id contains corresponding VF vport index,
	  no corrections +1 is needed anymore. vport index is
	  calculated in the only place in code and if indexing
	  schema is changed it would be easy to modify
	  
v1:
	http://patches.dpdk.org/cover/51451/

Viacheslav Ovsiienko (13):
  net/mlx5: add representor recognition on kernels 5.x
  net/mlx5: modify get ifindex routine for multiport IB
  net/mlx5: add getting IB ports number for multiport IB
  net/mlx5: add multiport IB device support to probing
  net/mlx5: add IB shared context alloc/free functions
  net/mlx5: switch to the names in the shared IB context
  net/mlx5: switch to the shared Protection Domain
  net/mlx5: switch to the shared context IB attributes
  net/mlx5: switch to the shared IB device context
  net/mlx5: provide IB port for the object being created
  net/mlx5: update install/uninstall int handler routines
  net/mlx5: update event handler for multiport IB devices
  net/mlx5: add source vport match to the ingress rules

 drivers/net/mlx5/Makefile          |  10 +
 drivers/net/mlx5/meson.build       |   4 +
 drivers/net/mlx5/mlx5.c            | 651 ++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5.h            |  45 ++-
 drivers/net/mlx5/mlx5_ethdev.c     | 248 +++++++++-----
 drivers/net/mlx5/mlx5_flow.c       |   2 +
 drivers/net/mlx5/mlx5_flow_dv.c    |  45 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
 drivers/net/mlx5/mlx5_mr.c         |   4 +-
 drivers/net/mlx5/mlx5_nl.c         | 108 +++++-
 drivers/net/mlx5/mlx5_rxq.c        |  41 +--
 drivers/net/mlx5/mlx5_socket.c     |   4 +-
 drivers/net/mlx5/mlx5_stats.c      |  22 +-
 drivers/net/mlx5/mlx5_trigger.c    |   2 +-
 drivers/net/mlx5/mlx5_txq.c        |  24 +-
 15 files changed, 879 insertions(+), 338 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 19:37         ` Shahaf Shuler
  2019-03-26 15:35       ` [PATCH v3 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
                         ` (13 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on querying the VF number,
has been created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/Makefile      | 10 ++++++++++
 drivers/net/mlx5/meson.build   |  4 ++++
 drivers/net/mlx5/mlx5.c        | 20 +++++++++++++++++---
 drivers/net/mlx5/mlx5.h        |  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 20 ++++++++++++++++++--
 drivers/net/mlx5/mlx5_nl.c     | 43 +++++++++++++++++++++++++++++++++++++++---
 6 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 1ed299d..3dd7e38 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -231,6 +231,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IFLA_PHYS_SWITCH_ID \
 		linux/if_link.h \
 		enum IFLA_PHYS_SWITCH_ID \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 0cf2f08..e3cb9bc 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -133,6 +133,10 @@ if build
 		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
 		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
 		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
 		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
 		'IFLA_PHYS_SWITCH_ID' ],
 		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad1975c..aac1966 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -13,7 +13,6 @@
 #include <errno.h>
 #include <net/if.h>
 #include <sys/mman.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 
 /* Verbs header. */
@@ -1001,9 +1000,24 @@
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
 	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
+	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
-	priv->representor_id =
-		switch_info->representor ? switch_info->port_name : -1;
+	/*
+	 * Currently we support single E-Switch per PF configurations
+	 * only and vport_id field contains the vport index for
+	 * associated VF, which is deduced from representor port name.
+	 * For exapmple, let's have the IB device port 10, it has
+	 * attached network device eth0, which has port name attribute
+	 * pf0vf2, we can deduce the VF number as 2, and set vport index
+	 * as 3 (2+1). This assigning schema should be changed if the
+	 * multiple E-Switch instances per PF configurations or/and PCI
+	 * subfunctions are added.
+	 */
+	priv->vport_id = switch_info->representor ?
+			 switch_info->port_name + 1 : -1;
+	/* representor_id field keeps the unmodified port/VF index. */
+	priv->representor_id = switch_info->representor ?
+			       switch_info->port_name : -1;
 	/*
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a88cb4a..a34e5cc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -214,7 +214,9 @@ struct mlx5_priv {
 	uint16_t mtu; /* Configured MTU. */
 	unsigned int isolated:1; /* Whether isolated mode is enabled. */
 	unsigned int representor:1; /* Device is a port representor. */
+	unsigned int master:1; /* Device is a E-Switch master. */
 	uint16_t domain_id; /* Switch domain identifier. */
+	uint16_t vport_id; /* Associated VF vport index (if any). */
 	int32_t representor_id; /* Port representor identifier. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2e8a296..933839c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		.port_name = 0,
 		.switch_id = 0,
 	};
+	DIR *dir;
 	bool port_name_set = false;
 	bool port_switch_id_set = false;
+	bool device_dir = false;
 	char c;
 	int ret;
 
@@ -1376,6 +1378,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	      ifname);
 	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
 	      ifname);
+	MKSTR(pci_device, "/sys/class/net/%s/device",
+	      ifname);
 
 	file = fopen(phys_port_name, "rb");
 	if (file != NULL) {
@@ -1394,9 +1398,21 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
 		c == '\n';
 	fclose(file);
-	data.master = port_switch_id_set && !port_name_set;
-	data.representor = port_switch_id_set && port_name_set;
+	dir = opendir(pci_device);
+	if (dir != NULL) {
+		closedir(dir);
+		device_dir = true;
+	}
+	data.master = port_switch_id_set && (!port_name_set || device_dir);
+	data.representor = port_switch_id_set && port_name_set && !device_dir;
 	*info = data;
+	assert(!(data.master && data.representor));
+	if (data.master && data.representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 8a10109..d045b95 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -65,6 +65,12 @@
 #endif
 
 /* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
 #ifndef HAVE_IFLA_PHYS_SWITCH_ID
 #define IFLA_PHYS_SWITCH_ID 36
 #endif
@@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
 	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
 	bool port_name_set = false;
 	bool switch_id_set = false;
+	bool num_vf_set = false;
 
 	if (nh->nlmsg_type != RTM_NEWLINK)
 		goto error;
@@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
 		if (ra->rta_len > nh->nlmsg_len - off)
 			goto error;
 		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
 		case IFLA_PHYS_PORT_NAME:
 			port_name_set =
 				mlx5_translate_port_name((char *)payload,
@@ -864,8 +874,20 @@ struct mlx5_nl_ifindex_data {
 		}
 		off += RTA_ALIGN(ra->rta_len);
 	}
-	info.master = switch_id_set && !port_name_set;
-	info.representor = switch_id_set && port_name_set;
+	if (switch_id_set) {
+		if (info.port_name_new) {
+			/* New representors naming schema. */
+			if (port_name_set) {
+				info.master = (info.port_name == -1);
+				info.representor = (info.port_name != -1);
+			}
+		} else {
+			/* Legacy representors naming schema. */
+			info.master = (!port_name_set || num_vf_set);
+			info.representor = port_name_set && !num_vf_set;
+		}
+	}
+	assert(!(data.master && data.representor));
 	memcpy(arg, &info, sizeof(info));
 	return 0;
 error:
@@ -893,9 +915,13 @@ struct mlx5_nl_ifindex_data {
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
 	} req = {
 		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
 			.nlmsg_type = RTM_GETLINK,
 			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
 		},
@@ -903,11 +929,22 @@ struct mlx5_nl_ifindex_data {
 			.ifi_family = AF_UNSPEC,
 			.ifi_index = ifindex,
 		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(int32_t)),
+		},
+		.extmask = RTE_LE32(1),
 	};
 	int ret;
 
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret >= 0)
 		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
 	return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 02/13] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
                         ` (12 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index aac1966..e8631a9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1455,7 +1455,7 @@ struct mlx5_dev_spawn_data {
 			list[i].ifindex = 0;
 		else
 			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name);
+				(nl_rdma, list[i].ibv_dev->name, 1);
 		if (nl_route < 0 ||
 		    !list[i].ifindex ||
 		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a34e5cc..f359ed6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -445,7 +445,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_ifindex(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 933839c..2b83898 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -222,7 +222,7 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index d045b95..cdf6eb6 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
 	int mac_n; /**< Number of addresses in the array. */
 };
 
-/** Data structure used by mlx5_nl_ifindex_cb(). */
+/** Data structure used by mlx5_nl_cmdget_cb(). */
 struct mlx5_nl_ifindex_data {
 	const char *name; /**< IB device name (in). */
 	uint32_t ibindex; /**< IB device index (out). */
 	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number. */
 };
 
 /**
@@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
 {
 	struct mlx5_nl_ifindex_data *data = arg;
 	size_t off = NLMSG_HDRLEN;
 	uint32_t ibindex = 0;
 	uint32_t ifindex = 0;
+	uint32_t portnum = 0;
 	int found = 0;
 
 	if (nh->nlmsg_type !=
@@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
 		case RDMA_NLDEV_ATTR_NDEV_INDEX:
 			ifindex = *(uint32_t *)payload;
 			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			portnum = *(uint32_t *)payload;
+			break;
 		default:
 			break;
 		}
@@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
 	if (found) {
 		data->ibindex = ibindex;
 		data->ifindex = ifindex;
+		data->portnum = portnum;
 	}
 	return 0;
 error:
@@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
  *   Netlink socket of the RDMA kind (NETLINK_RDMA).
  * @param[in] name
  *   IB device name.
- *
+ * @param[in] pindex
+ *   IB device port index, starting from 1
  * @return
  *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
  *   is set.
  */
 unsigned int
-mlx5_nl_ifindex(int nl, const char *name)
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	static const uint32_t pindex = 1;
 	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
@@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ibindex)
@@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ifindex)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 03/13] net/mlx5: add getting IB ports number for multiport IB
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
                         ` (11 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on these ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h    |  1 +
 drivers/net/mlx5/mlx5_nl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f359ed6..309232b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -445,6 +445,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index cdf6eb6..4653acf 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -826,6 +826,51 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	uint32_t seq = random();
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.ibindex = 0,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, seq);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!data.ibindex) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 04/13] net/mlx5: add multiport IB device support to probing
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (2 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
                         ` (10 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the signle shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 317 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 225 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e8631a9..9e3a98f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -130,6 +130,16 @@
 /** Driver-specific log messages type. */
 int mlx5_logtype;
 
+/** Data associated with devices to spawn. */
+struct mlx5_dev_spawn_data {
+	uint32_t ifindex; /**< Network interface index. */
+	uint32_t max_port; /**< IB device maximal port index. */
+	uint32_t ibv_port; /**< IB device physical port index. */
+	struct mlx5_switch_info info; /**< Switch information. */
+	struct ibv_device *ibv_dev; /**< Associated IB device. */
+	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
+};
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -716,12 +726,10 @@
  *
  * @param dpdk_dev
  *   Backing DPDK device.
- * @param ibv_dev
- *   Verbs device.
+ * @param spawn
+ *   Verbs device parameters (name, port, switch_info) to spawn.
  * @param config
  *   Device configuration parameters.
- * @param[in] switch_info
- *   Switch properties of Ethernet device.
  *
  * @return
  *   A valid Ethernet device object on success, NULL otherwise and rte_errno
@@ -732,10 +740,11 @@
  */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
-	       struct ibv_device *ibv_dev,
-	       struct mlx5_dev_config config,
-	       const struct mlx5_switch_info *switch_info)
+	       struct mlx5_dev_spawn_data *spawn,
+	       struct mlx5_dev_config config)
 {
+	const struct mlx5_switch_info *switch_info = &spawn->info;
+	struct ibv_device *ibv_dev = spawn->ibv_dev;
 	struct ibv_context *ctx = NULL;
 	struct ibv_device_attr_ex attr;
 	struct ibv_port_attr port_attr;
@@ -952,7 +961,7 @@
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, 1, &port_attr);
+	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -1330,14 +1339,6 @@
 	return NULL;
 }
 
-/** Data associated with devices to spawn. */
-struct mlx5_dev_spawn_data {
-	unsigned int ifindex; /**< Network interface index. */
-	struct mlx5_switch_info info; /**< Switch information. */
-	struct ibv_device *ibv_dev; /**< Associated IB device. */
-	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
-};
-
 /**
  * Comparison callback to sort device data.
  *
@@ -1394,7 +1395,24 @@ struct mlx5_dev_spawn_data {
 	       struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
-	unsigned int n = 0;
+	/*
+	 * Number of found IB Devices matching with requested PCI BDF.
+	 * nd != 1 means there are multiple IB devices over the same
+	 * PCI device and we have representors and master.
+	 */
+	unsigned int nd = 0;
+	/*
+	 * Number of found IB device Ports. nd = 1 and np = 1..n means
+	 * we have the single multiport IB device, and there may be
+	 * representors attached to some of found ports.
+	 */
+	unsigned int np = 0;
+	/*
+	 * Number of DPDK ethernet devices to Spawn - either over
+	 * multiple IB devices or multiple ports of single IB device.
+	 * Actually this is the number of iterations to spawn.
+	 */
+	unsigned int ns = 0;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -1406,8 +1424,14 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
 		return -rte_errno;
 	}
-
+	/*
+	 * First scan the list of all Infiniband devices to find
+	 * matching ones, gathering into the list.
+	 */
 	struct ibv_device *ibv_match[ret + 1];
+	int nl_route = -1;
+	int nl_rdma = -1;
+	unsigned int i;
 
 	while (ret-- > 0) {
 		struct rte_pci_addr pci_addr;
@@ -1422,77 +1446,183 @@ struct mlx5_dev_spawn_data {
 			continue;
 		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 			ibv_list[ret]->name);
-		ibv_match[n++] = ibv_list[ret];
+		ibv_match[nd++] = ibv_list[ret];
+	}
+	ibv_match[nd] = NULL;
+	if (!nd) {
+		/* No device macthes, just complain and bail out. */
+		mlx5_glue->free_device_list(ibv_list);
+		DRV_LOG(WARNING,
+			"no Verbs device matches PCI device " PCI_PRI_FMT ","
+			" are kernel drivers loaded?",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+		return ret;
+	}
+	nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
+	if (nd == 1) {
+		/*
+		 * Found single matching device may have multiple ports.
+		 * Each port may be representor, we have to check the port
+		 * number and check the representors existence.
+		 */
+		if (nl_rdma >= 0)
+			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name);
+		if (!np)
+			DRV_LOG(WARNING, "can not get IB device \"%s\""
+					 " ports number", ibv_match[0]->name);
 	}
-	ibv_match[n] = NULL;
-
-	struct mlx5_dev_spawn_data list[n];
-	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
-	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
-	unsigned int i;
-	unsigned int u;
-
 	/*
-	 * The existence of several matching entries (n > 1) means port
-	 * representors have been instantiated. No existing Verbs call nor
-	 * /sys entries can tell them apart, this can only be done through
-	 * Netlink calls assuming kernel drivers are recent enough to
-	 * support them.
-	 *
-	 * In the event of identification failure through Netlink, try again
-	 * through sysfs, then either:
-	 *
-	 * 1. No device matches (n == 0), complain and bail out.
-	 * 2. A single IB device matches (n == 1) and is not a representor,
-	 *    assume no switch support.
-	 * 3. Otherwise no safe assumptions can be made; complain louder and
-	 *    bail out.
+	 * Now we can determine the maximal
+	 * amount of devices to be spawned.
 	 */
-	for (i = 0; i != n; ++i) {
-		list[i].ibv_dev = ibv_match[i];
-		list[i].eth_dev = NULL;
-		if (nl_rdma < 0)
-			list[i].ifindex = 0;
-		else
-			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name, 1);
-		if (nl_route < 0 ||
-		    !list[i].ifindex ||
-		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
-					&list[i].info) ||
-		    ((!list[i].info.representor && !list[i].info.master) &&
-		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
-			list[i].ifindex = 0;
-			memset(&list[i].info, 0, sizeof(list[i].info));
-			continue;
+	struct mlx5_dev_spawn_data list[np ? np : nd];
+
+	if (np > 1) {
+		/*
+		 * Signle IB device with multiple ports found,
+		 * it may be E-Switch master device and representors.
+		 * We have to perform identification trough the ports.
+		 */
+		assert(nl_rdma >= 0);
+		assert(ns == 0);
+		assert(nd == 1);
+		for (i = 1; i <= np; ++i) {
+			list[ns].max_port = np;
+			list[ns].ibv_port = i;
+			list[ns].ibv_dev = ibv_match[0];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, i);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified port, it means there is no
+				 * representor on this port. It's OK,
+				 * there can be disabled ports, for example
+				 * if sriov_numvfs < sriov_totalvfs.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master))
+				ns++;
 		}
-	}
-	if (nl_rdma >= 0)
-		close(nl_rdma);
-	if (nl_route >= 0)
-		close(nl_route);
-	/* Count unidentified devices. */
-	for (u = 0, i = 0; i != n; ++i)
-		if (!list[i].info.master && !list[i].info.representor)
-			++u;
-	if (u) {
-		if (n == 1 && u == 1) {
-			/* Case #2. */
-			DRV_LOG(INFO, "no switch support detected");
-		} else {
-			/* Case #3. */
+		if (!ns) {
 			DRV_LOG(ERR,
-				"unable to tell which of the matching devices"
-				" is the master (lack of kernel support?)");
-			n = 0;
+				"unable to recognize master/representors"
+				" on the IB device with multiple ports");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
+	} else {
+		/*
+		 * The existence of several matching entries (nd > 1) means
+		 * port representors have been instantiated. No existing Verbs
+		 * call nor sysfs entries can tell them apart, this can only
+		 * be done through Netlink calls assuming kernel drivers are
+		 * recent enough to support them.
+		 *
+		 * In the event of identification failure through Netlink,
+		 * try again through sysfs, then:
+		 *
+		 * 1. A single IB device matches (nd == 1) with single
+		 *    port (np=0/1) and is not a representor, assume
+		 *    no switch support.
+		 *
+		 * 2. Otherwise no safe assumptions can be made;
+		 *    complain louder and bail out.
+		 */
+		np = 1;
+		for (i = 0; i != nd; ++i) {
+			memset(&list[ns].info, 0, sizeof(list[ns].info));
+			list[ns].max_port = 1;
+			list[ns].ibv_port = 1;
+			list[ns].ibv_dev = ibv_match[i];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = 0;
+			if (nl_rdma >= 0)
+				list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, 1);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified device, it means there it is not
+				 * a representor/master.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master)) {
+				ns++;
+			} else if ((nd == 1) &&
+				   !list[ns].info.representor &&
+				   !list[ns].info.master) {
+				/*
+				 * Single IB device with
+				 * one physical port and
+				 * attached network device.
+				 * May be SRIOV is not enabled
+				 * or there is no representors.
+				 */
+				DRV_LOG(INFO, "no E-Switch support detected");
+				ns++;
+				break;
+			}
+		}
+		if (!ns) {
+			DRV_LOG(ERR,
+				"unable to recognize master/representors"
+				" on the multiple IB devices");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
 		}
 	}
+	assert(ns);
 	/*
 	 * Sort list to probe devices in natural order for users convenience
 	 * (i.e. master first, then representors from lowest to highest ID).
 	 */
-	if (n)
-		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
+	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
 	/* Default configuration. */
 	dev_config = (struct mlx5_dev_config){
 		.hw_padding = 0,
@@ -1511,7 +1641,7 @@ struct mlx5_dev_spawn_data {
 			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
 		},
 	};
-	/* Device speicific configuration. */
+	/* Device specific configuration. */
 	switch (pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS_BLUEFIELD;
@@ -1528,12 +1658,12 @@ struct mlx5_dev_spawn_data {
 	/* Set architecture-dependent default value if unset. */
 	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
-	for (i = 0; i != n; ++i) {
+	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
-						 list[i].ibv_dev, dev_config,
-						 &list[i].info);
+						 &list[i],
+						 dev_config);
 		if (!list[i].eth_dev) {
 			if (rte_errno != EBUSY && rte_errno != EEXIST)
 				break;
@@ -1546,16 +1676,7 @@ struct mlx5_dev_spawn_data {
 		list[i].eth_dev->data->dev_flags |= restore;
 		rte_eth_dev_probing_finish(list[i].eth_dev);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!n) {
-		DRV_LOG(WARNING,
-			"no Verbs device matches PCI device " PCI_PRI_FMT ","
-			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		ret = -rte_errno;
-	} else if (i != n) {
+	if (i != ns) {
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
@@ -1577,6 +1698,18 @@ struct mlx5_dev_spawn_data {
 	} else {
 		ret = 0;
 	}
+exit:
+	/*
+	 * Do the routine cleanup:
+	 * - close opened Netlink sockets
+	 * - free the Infiniband device list
+	 */
+	if (nl_rdma >= 0)
+		close(nl_rdma);
+	if (nl_route >= 0)
+		close(nl_route);
+	assert(ibv_list);
+	mlx5_glue->free_device_list(ibv_list);
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (3 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 19:35         ` Shahaf Shuler
  2019-03-26 15:35       ` [PATCH v3 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
                         ` (9 subsequent siblings)
  14 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 302 +++++++++++++++++++++++++++++++++---------------
 drivers/net/mlx5/mlx5.h |  32 +++++
 2 files changed, 239 insertions(+), 95 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9e3a98f..dd70e53 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,145 @@ struct mlx5_dev_spawn_data {
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
 };
 
+static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
+static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/**
+ * Allocate shared IB device context. If there is multiport device the
+ * master and representors will share this context, if there is single
+ * port dedicated IB device, the context will be used by only given
+ * port due to unification.
+ *
+ * Routine first searches the context for the spesified IB device name,
+ * if found the shared context assumed and reference counter is incremented.
+ * If no context found the new one is created and initialized with specified
+ * IB device context and parameters.
+ *
+ * @param[in] spawn
+ *   Pointer to the IB device attributes (name, port, etc).
+ *
+ * @return
+ *   Pointer to mlx5_ibv_shared object on success,
+ *   otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_ibv_shared *
+mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn)
+{
+	struct mlx5_ibv_shared *sh;
+	int err = 0;
+
+	assert(spawn);
+	/* Secondary process should not create the shared context. */
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	pthread_mutex_lock(&mlx5_ibv_list_mutex);
+	/* Search for IB context by device name. */
+	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
+		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
+			sh->refcnt++;
+			goto exit;
+		}
+	}
+	/* No device found, we have to create new sharted context. */
+	assert(spawn->max_port);
+	sh = rte_zmalloc("ethdev shared ib context",
+			 sizeof(struct mlx5_ibv_shared) +
+			 spawn->max_port *
+			 sizeof(struct mlx5_ibv_shared_port),
+			 RTE_CACHE_LINE_SIZE);
+	if (!sh) {
+		DRV_LOG(ERR, "shared context allocation failure");
+		rte_errno  = ENOMEM;
+		goto exit;
+	}
+	/* Try to open IB device with DV first, then usual Verbs. */
+	errno = 0;
+	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
+	if (sh->ctx) {
+		sh->devx = 1;
+		DRV_LOG(DEBUG, "DevX is supported");
+	} else {
+		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
+		if (!sh->ctx) {
+			err = errno ? errno : ENODEV;
+			goto error;
+		}
+		DRV_LOG(DEBUG, "DevX is NOT supported");
+	}
+	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh->device_attr);
+	if (err) {
+		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
+		goto error;
+	}
+	sh->refcnt = 1;
+	sh->max_port = spawn->max_port;
+	strncpy(sh->ibdev_name, sh->ctx->device->name,
+		sizeof(sh->ibdev_name));
+	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
+		sizeof(sh->ibdev_path));
+	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
+	if (sh->pd == NULL) {
+		DRV_LOG(ERR, "PD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
+exit:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+	return sh;
+error:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+	assert(sh);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+	assert(err > 0);
+	rte_errno = err;
+	return NULL;
+}
+
+/**
+ * Free shared IB device context. Decrement counter and if zero free
+ * all allocated resources and close handles.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free
+ */
+static void
+mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh)
+{
+	pthread_mutex_lock(&mlx5_ibv_list_mutex);
+#ifndef NDEBUG
+	/* Check the object presence in the list. */
+	struct mlx5_ibv_shared *lctx;
+
+	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
+		if (lctx == sh)
+			break;
+	assert(lctx);
+	if (lctx != sh) {
+		DRV_LOG(ERR, "Freeing non-existing shared IB context");
+		goto exit;
+	}
+#endif
+	assert(sh);
+	assert(sh->refcnt);
+	/* Secondary process should not free the shared context. */
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	if (--sh->refcnt)
+		goto exit;
+	LIST_REMOVE(sh, next);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+exit:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+}
+
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -289,12 +428,10 @@ struct mlx5_dev_spawn_data {
 	}
 	mlx5_mprq_free_mp(dev);
 	mlx5_mr_release(dev);
-	if (priv->pd != NULL) {
-		assert(priv->ctx != NULL);
-		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
-		claim_zero(mlx5_glue->close_device(priv->ctx));
-	} else
-		assert(priv->ctx == NULL);
+	assert(priv->sh);
+	if (priv->sh)
+		mlx5_free_shared_ibctx(priv->sh);
+	priv->sh = NULL;
 	if (priv->rss_conf.rss_key != NULL)
 		rte_free(priv->rss_conf.rss_key);
 	if (priv->reta_idx != NULL)
@@ -744,11 +881,8 @@ struct mlx5_dev_spawn_data {
 	       struct mlx5_dev_config config)
 {
 	const struct mlx5_switch_info *switch_info = &spawn->info;
-	struct ibv_device *ibv_dev = spawn->ibv_dev;
-	struct ibv_context *ctx = NULL;
-	struct ibv_device_attr_ex attr;
+	struct mlx5_ibv_shared *sh;
 	struct ibv_port_attr port_attr;
-	struct ibv_pd *pd = NULL;
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
@@ -807,18 +941,47 @@ struct mlx5_dev_spawn_data {
 	}
 	/* Prepare shared data between primary and secondary process. */
 	mlx5_prepare_shared_data();
-	errno = 0;
-	ctx = mlx5_glue->dv_open_device(ibv_dev);
-	if (ctx) {
-		config.devx = 1;
-		DRV_LOG(DEBUG, "DEVX is supported");
-	} else {
-		ctx = mlx5_glue->open_device(ibv_dev);
-		if (!ctx) {
-			rte_errno = errno ? errno : ENODEV;
-			return NULL;
+	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			DRV_LOG(ERR, "can not attach rte ethdev");
+			rte_errno = ENOMEM;
+			err = rte_errno;
+			goto error;
+		}
+		eth_dev->device = dpdk_dev;
+		eth_dev->dev_ops = &mlx5_dev_sec_ops;
+		err = mlx5_uar_init_secondary(eth_dev);
+		if (err) {
+			err = rte_errno;
+			goto error;
 		}
+		/* Receive command fd from primary process */
+		err = mlx5_socket_connect(eth_dev);
+		if (err < 0) {
+			err = rte_errno;
+			goto error;
+		}
+		/* Remap UAR for Tx queues. */
+		err = mlx5_tx_uar_remap(eth_dev, err);
+		if (err) {
+			err = rte_errno;
+			goto error;
+		}
+		/*
+		 * Ethdev pointer is still required as input since
+		 * the primary device is not accessible from the
+		 * secondary process.
+		 */
+		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
+		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
+		return eth_dev;
 	}
+	sh = mlx5_alloc_shared_ibctx(spawn);
+	if (!sh)
+		return NULL;
+	config.devx = sh->devx;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -832,7 +995,7 @@ struct mlx5_dev_spawn_data {
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-	mlx5_glue->dv_query_device(ctx, &dv_attr);
+	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
 	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
 		if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
 			DRV_LOG(DEBUG, "enhanced MPW is supported");
@@ -917,51 +1080,8 @@ struct mlx5_dev_spawn_data {
 		" old OFED/rdma-core version or firmware configuration");
 #endif
 	config.mpls_en = mpls_en;
-	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
-	if (err) {
-		DEBUG("ibv_query_device_ex() failed");
-		goto error;
-	}
-	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		eth_dev = rte_eth_dev_attach_secondary(name);
-		if (eth_dev == NULL) {
-			DRV_LOG(ERR, "can not attach rte ethdev");
-			rte_errno = ENOMEM;
-			err = rte_errno;
-			goto error;
-		}
-		eth_dev->device = dpdk_dev;
-		eth_dev->dev_ops = &mlx5_dev_sec_ops;
-		err = mlx5_uar_init_secondary(eth_dev);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Receive command fd from primary process */
-		err = mlx5_socket_connect(eth_dev);
-		if (err < 0) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Remap UAR for Tx queues. */
-		err = mlx5_tx_uar_remap(eth_dev, err);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/*
-		 * Ethdev pointer is still required as input since
-		 * the primary device is not accessible from the
-		 * secondary process.
-		 */
-		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
-		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
-		claim_zero(mlx5_glue->close_device(ctx));
-		return eth_dev;
-	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
+	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -975,13 +1095,7 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
 			mlx5_glue->port_state_str(port_attr.state),
 			port_attr.state);
-	/* Allocate protection domain. */
-	pd = mlx5_glue->alloc_pd(ctx);
-	if (pd == NULL) {
-		DRV_LOG(ERR, "PD allocation failure");
-		err = ENOMEM;
-		goto error;
-	}
+	/* Allocate private eth device data. */
 	priv = rte_zmalloc("ethdev private structure",
 			   sizeof(*priv),
 			   RTE_CACHE_LINE_SIZE);
@@ -990,13 +1104,11 @@ struct mlx5_dev_spawn_data {
 		err = ENOMEM;
 		goto error;
 	}
-	priv->ctx = ctx;
-	strncpy(priv->ibdev_name, priv->ctx->device->name,
-		sizeof(priv->ibdev_name));
-	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
-		sizeof(priv->ibdev_path));
-	priv->device_attr = attr;
-	priv->pd = pd;
+	priv->sh = sh;
+	priv->ctx = sh->ctx;
+	priv->ibv_port = spawn->ibv_port;
+	priv->device_attr = sh->device_attr;
+	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
@@ -1065,7 +1177,8 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
-	config.hw_csum = !!(attr.device_cap_flags_ex & IBV_DEVICE_RAW_IP_CSUM);
+	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
+			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
 		(config.hw_csum ? "" : "not "));
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
@@ -1079,7 +1192,7 @@ struct mlx5_dev_spawn_data {
 	}
 #endif
 	config.ind_table_max_size =
-		attr.rss_caps.max_rwq_indirection_table_size;
+		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
 	/*
 	 * Remove this check once DPDK supports larger/variable
 	 * indirection tables.
@@ -1088,18 +1201,18 @@ struct mlx5_dev_spawn_data {
 		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
 	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
 		config.ind_table_max_size);
-	config.hw_vlan_strip = !!(attr.raw_packet_caps &
+	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
 				  IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
 	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
 		(config.hw_vlan_strip ? "" : "not "));
-	config.hw_fcs_strip = !!(attr.raw_packet_caps &
+	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
 				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config.hw_fcs_strip ? "" : "not "));
 #if defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
-	hw_padding = !!attr.rx_pad_end_addr_align;
+	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
 #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
-	hw_padding = !!(attr.device_cap_flags_ex &
+	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
 			IBV_DEVICE_PCI_WRITE_END_PADDING);
 #endif
 	if (config.hw_padding && !hw_padding) {
@@ -1108,11 +1221,11 @@ struct mlx5_dev_spawn_data {
 	} else if (config.hw_padding) {
 		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
 	}
-	config.tso = (attr.tso_caps.max_tso > 0 &&
-		      (attr.tso_caps.supported_qpts &
+	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
+		      (sh->device_attr.tso_caps.supported_qpts &
 		       (1 << IBV_QPT_RAW_PACKET)));
 	if (config.tso)
-		config.tso_max_payload_sz = attr.tso_caps.max_tso;
+		config.tso_max_payload_sz = sh->device_attr.tso_caps.max_tso;
 	/*
 	 * MPW is disabled by default, while the Enhanced MPW is enabled
 	 * by default.
@@ -1246,7 +1359,7 @@ struct mlx5_dev_spawn_data {
 		}
 	}
 #ifdef HAVE_MLX5DV_DR
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_INGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
@@ -1254,7 +1367,7 @@ struct mlx5_dev_spawn_data {
 			goto error;
 		}
 		priv->rx_ns = ns;
-		ns = mlx5dv_dr_create_ns(ctx,
+		ns = mlx5dv_dr_create_ns(sh->ctx,
 					 MLX5DV_DR_NS_DOMAIN_EGRESS_BYPASS);
 		if (ns == NULL) {
 			DRV_LOG(ERR, "mlx5dv_dr_create_ns failed");
@@ -1271,7 +1384,8 @@ struct mlx5_dev_spawn_data {
 		.free = &mlx5_free_verbs_buf,
 		.data = priv,
 	};
-	mlx5_glue->dv_set_context_attr(ctx, MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
+	mlx5_glue->dv_set_context_attr(sh->ctx,
+				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
 				       (void *)((uintptr_t)&alctr));
 	/* Bring Ethernet device up. */
 	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
@@ -1325,15 +1439,13 @@ struct mlx5_dev_spawn_data {
 		if (eth_dev != NULL)
 			eth_dev->data->dev_private = NULL;
 	}
-	if (pd)
-		claim_zero(mlx5_glue->dealloc_pd(pd));
 	if (eth_dev != NULL) {
 		/* mac_addrs must not be freed alone because part of dev_private */
 		eth_dev->data->mac_addrs = NULL;
 		rte_eth_dev_release_port(eth_dev);
 	}
-	if (ctx)
-		claim_zero(mlx5_glue->close_device(ctx));
+	if (sh)
+		mlx5_free_shared_ibctx(sh);
 	assert(err > 0);
 	rte_errno = err;
 	return NULL;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 309232b..f612577 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -196,10 +196,42 @@ struct mlx5_flow_tbl_resource {
 #define MLX5_MAX_TABLES 1024
 #define MLX5_GROUP_FACTOR 1
 
+/* Per port data of shared IB device. */
+struct mlx5_ibv_shared_port {
+	uint32_t ih_port_id;
+	/*
+	 * Interrupt handler port_id. Used by shared interrupt
+	 * handler to find the corresponding rte_eth device
+	 * by IB port index. If value is equal or greater
+	 * RTE_MAX_ETHPORTS it means there is no subhandler
+	 * installed for specified IB port index.
+	 */
+};
+
+/*
+ * Shared Infiniband device context for Master/Representors
+ * which belong to same IB device with multiple IB ports.
+ **/
+struct mlx5_ibv_shared {
+	LIST_ENTRY(mlx5_ibv_shared) next;
+	uint32_t refcnt;
+	uint32_t devx:1; /* Opened with DV. */
+	uint32_t max_port; /* Maximal IB device port index. */
+	struct ibv_context *ctx; /* Verbs/DV context. */
+	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
+	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
+	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
+};
+
 struct mlx5_priv {
 	LIST_ENTRY(mlx5_priv) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
+	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
+	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 06/13] net/mlx5: switch to the names in the shared IB context
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (4 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
                         ` (8 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
 drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f612577..e5edcde 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -235,8 +235,6 @@ struct mlx5_priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
-	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2b83898..0662594 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -138,8 +138,10 @@ struct ethtool_link_settings {
 	unsigned int dev_port_prev = ~0u;
 	char match[IF_NAMESIZE] = "";
 
+	assert(priv);
+	assert(priv->sh);
 	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
+		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
 
 		dir = opendir(path);
 		if (dir == NULL) {
@@ -159,7 +161,7 @@ struct ethtool_link_settings {
 			continue;
 
 		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
+		      priv->sh->ibdev_path, name,
 		      (dev_type ? "dev_id" : "dev_port"));
 
 		file = fopen(path, "rb");
@@ -222,7 +224,9 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma,
+				priv->sh->ibdev_name,
+				priv->ibv_port) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 6906dc8..5af199d 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -140,18 +140,22 @@
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
 {
 	FILE *file;
-	MKSTR(path, "%s/ports/1/hw_counters/%s",
-		  priv->ibdev_path,
-		  ctr_name);
+	if (priv->sh) {
+		MKSTR(path, "%s/ports/%d/hw_counters/%s",
+			  priv->sh->ibdev_path,
+			  priv->ibv_port,
+			  ctr_name);
 
-	file = fopen(path, "rb");
-	if (file) {
-		int n = fscanf(file, "%" SCNu64, stat);
+		file = fopen(path, "rb");
+		if (file) {
+			int n = fscanf(file, "%" SCNu64, stat);
 
-		fclose(file);
-		if (n != 1)
-			stat = 0;
+			fclose(file);
+			if (n == 1)
+				return;
+		}
 	}
+	*stat = 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 07/13] net/mlx5: switch to the shared Protection Domain
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (5 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
                         ` (7 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The PMD code is updated to use Protected Domain from the
the shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     |  1 -
 drivers/net/mlx5/mlx5.h     |  1 -
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +++++-----
 drivers/net/mlx5/mlx5_txq.c |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index dd70e53..ef49f2d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1108,7 +1108,6 @@ struct mlx5_dev_spawn_data {
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->device_attr = sh->device_attr;
-	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e5edcde..f7debf1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -234,7 +234,6 @@ struct mlx5_priv {
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d..f7eb9a5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -719,7 +719,7 @@ struct mr_update_mp_data {
 	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
 	 * through mlx5_alloc_verbs_buf().
 	 */
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)data.start, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DEBUG("port %u fail to create a verbs MR for address (%p)",
@@ -1156,7 +1156,7 @@ struct mr_update_mp_data {
 	}
 	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
 		dev->data->port_id, mem_idx, mp->name);
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)addr, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DRV_LOG(WARNING,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2f60999..0496c4e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -867,7 +867,7 @@ struct mlx5_rxq_ibv *
 		.max_wr = wqe_n >> rxq_data->sges_n,
 		/* Max number of scatter/gather elements in a WR. */
 		.max_sge = 1 << rxq_data->sges_n,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.cq = tmpl->cq,
 		.comp_mask =
 			IBV_WQ_FLAGS_CVLAN_STRIPPING |
@@ -1831,7 +1831,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 },
 		 &qp_init_attr);
 #else
@@ -1850,7 +1850,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 });
 #endif
 	if (!qp) {
@@ -2006,7 +2006,7 @@ struct mlx5_rxq_ibv *
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
 			.max_sge = 1,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 			.cq = cq,
 		 });
 	if (!wq) {
@@ -2160,7 +2160,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = 0,
 				},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd
+			.pd = priv->sh->pd
 		 });
 	if (!qp) {
 		DEBUG("port %u cannot allocate QP for drop queue",
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d185617..d3a5498 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -426,7 +426,7 @@ struct mlx5_txq_ibv *
 		 * Tx burst.
 		 */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (txq_data->max_inline)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 08/13] net/mlx5: switch to the shared context IB attributes
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (6 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
                         ` (6 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multuple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 -
 drivers/net/mlx5/mlx5.h        |  1 -
 drivers/net/mlx5/mlx5_ethdev.c |  6 +++---
 drivers/net/mlx5/mlx5_rxq.c    |  8 ++++----
 drivers/net/mlx5/mlx5_txq.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ef49f2d..d7580d5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1107,7 +1107,6 @@ struct mlx5_dev_spawn_data {
 	priv->sh = sh;
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
-	priv->device_attr = sh->device_attr;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f7debf1..3939a5e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -233,7 +233,6 @@ struct mlx5_priv {
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0662594..7003c32 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -515,8 +515,8 @@ struct ethtool_link_settings {
 	 * Since we need one CQ per QP, the limit is the minimum number
 	 * between the two values.
 	 */
-	max = RTE_MIN(priv->device_attr.orig_attr.max_cq,
-		      priv->device_attr.orig_attr.max_qp);
+	max = RTE_MIN(priv->sh->device_attr.orig_attr.max_cq,
+		      priv->sh->device_attr.orig_attr.max_qp);
 	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
 	if (max >= 65535)
 		max = 65535;
@@ -577,7 +577,7 @@ struct ethtool_link_settings {
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_device_attr *attr = &priv->device_attr.orig_attr;
+	struct ibv_device_attr *attr = &priv->sh->device_attr.orig_attr;
 	size_t size = strnlen(attr->fw_ver, sizeof(attr->fw_ver)) + 1;
 
 	if (fw_size < size)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0496c4e..fd1c3a2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,10 +856,10 @@ struct mlx5_rxq_ibv *
 		rte_errno = ENOMEM;
 		goto error;
 	}
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	attr.wq.ibv = (struct ibv_wq_init_attr){
 		.wq_context = NULL, /* Could be useful in the future. */
 		.wq_type = IBV_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d3a5498..10a3040 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -407,15 +407,15 @@ struct mlx5_txq_ibv *
 		.cap = {
 			/* Max number of outstanding WRs. */
 			.max_send_wr =
-				((priv->device_attr.orig_attr.max_qp_wr <
+				((priv->sh->device_attr.orig_attr.max_qp_wr <
 				  desc) ?
-				 priv->device_attr.orig_attr.max_qp_wr :
+				 priv->sh->device_attr.orig_attr.max_qp_wr :
 				 desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
 			 * too much memory. TX gather is not impacted by the
-			 * priv->device_attr.max_sge limit and will still work
+			 * device_attr.max_sge limit and will still work
 			 * properly.
 			 */
 			.max_send_sge = 1,
@@ -780,10 +780,10 @@ struct mlx5_txq_ctrl *
 	tmpl->txq.elts_n = log2above(desc);
 	tmpl->idx = idx;
 	txq_set_params(tmpl);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	tmpl->txq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->txq.elts_n])(tmpl + 1);
 	tmpl->txq.stats.idx = idx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 09/13] net/mlx5: switch to the shared IB device context
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (7 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
                         ` (5 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created whithin this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  3 +--
 drivers/net/mlx5/mlx5.h            |  1 -
 drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
 drivers/net/mlx5/mlx5_flow_dv.c    |  9 +++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
 drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
 drivers/net/mlx5/mlx5_socket.c     |  4 ++--
 drivers/net/mlx5/mlx5_trigger.c    |  2 +-
 drivers/net/mlx5/mlx5_txq.c        |  4 ++--
 9 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d7580d5..8f35546 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -402,7 +402,7 @@ struct mlx5_dev_spawn_data {
 
 	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
 		dev->data->port_id,
-		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
 	mlx5_dev_interrupt_handler_uninstall(dev);
 	mlx5_traffic_disable(dev);
@@ -1105,7 +1105,6 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 	priv->sh = sh;
-	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3939a5e..573629a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -232,7 +232,6 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
-	struct ibv_context *ctx; /* Verbs context. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7003c32..2ad3231 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	/* Read all message and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->ctx, &event))
+		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 			event.event_type == IBV_EVENT_PORT_ERR) &&
@@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
 	int flags;
 
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	assert(ctx->async_fd > 0);
+	flags = fcntl(ctx->async_fd, F_GETFL);
+	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
 		DRV_LOG(INFO,
 			"port %u failed to change file descriptor async event"
@@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	if (dev->data->dev_conf.intr_conf.lsc ||
 	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
+		priv->intr_handle.fd = ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rte_intr_callback_register(&priv->intr_handle,
 					   mlx5_dev_interrupt_handler, dev);
@@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	struct ibv_device_attr device_attr;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
+	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
 		return 1;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 44b36ee..b8943da 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -840,7 +840,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_packet_reformat
-			(priv->ctx, cache_resource->reformat_type,
+			(priv->sh->ctx, cache_resource->reformat_type,
 			 cache_resource->ft_type, ns, cache_resource->flags,
 			 cache_resource->size,
 			 (cache_resource->size ? cache_resource->buf : NULL));
@@ -1569,7 +1569,8 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_modify_header
-					(priv->ctx, cache_resource->ft_type,
+					(priv->sh->ctx,
+					 cache_resource->ft_type,
 					 ns, 0,
 					 cache_resource->actions_num *
 					 sizeof(cache_resource->actions[0]),
@@ -1629,7 +1630,7 @@ struct field_modify_info modify_tcp[] = {
 		ret = -ENOMEM;
 		goto error_exit;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
+	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
 	if (ret)
 		goto error_exit;
 	struct mlx5_flow_counter tmpl = {
@@ -2982,7 +2983,7 @@ struct field_modify_info modify_tcp[] = {
 	if (matcher->egress)
 		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
 	cache_matcher->matcher_object =
-		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr,
+		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx, &dv_attr,
 						  tbl->obj);
 	if (!cache_matcher->matcher_object) {
 		rte_free(cache_matcher);
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 6c4f52f..9548d12 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -56,10 +56,11 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context ctx = priv->sh->ctx;
 	struct ibv_counter_set_init_attr init = {
 			 .counter_set_id = counter->id};
 
-	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
@@ -67,12 +68,13 @@
 	return 0;
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_counters_init_attr init = {0};
 	struct ibv_counter_attach_attr attach;
 	int ret;
 
 	memset(&attach, 0, sizeof(attach));
-	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counters(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd1c3a2..dcb97c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
 	}
 	tmpl->rxq_ctrl = rxq_ctrl;
 	if (rxq_ctrl->irq) {
-		tmpl->channel = mlx5_glue->create_comp_channel(priv->ctx);
+		tmpl->channel = mlx5_glue->create_comp_channel(priv->sh->ctx);
 		if (!tmpl->channel) {
 			DRV_LOG(ERR, "port %u: comp channel creation failure",
 				dev->data->port_id);
@@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
 	}
 #endif
 	tmpl->cq = mlx5_glue->cq_ex_to_cq
-		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
+		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
 					 &attr.cq.mlx5));
 	if (tmpl->cq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
@@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
 			.two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT,
 		};
 	}
-	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
+	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
 					   &attr.wq.mlx5);
 #else
-	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
+	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
 #endif
 	if (tmpl->wq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
@@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
 	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
 		wq[i] = wq[j];
 	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = wq_n,
 			.ind_tbl = wq,
@@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
 	}
 #endif
 	qp = mlx5_glue->dv_create_qp
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
 		 &qp_init_attr);
 #else
 	qp = mlx5_glue->create_qp_ex
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
 mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_cq *cq;
 	struct ibv_wq *wq = NULL;
 	struct mlx5_rxq_ibv *rxq;
 
 	if (priv->drop_queue.rxq)
 		return priv->drop_queue.rxq;
-	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
+	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		DEBUG("port %u cannot allocate CQ for drop queue",
 		      dev->data->port_id);
 		rte_errno = errno;
 		goto error;
 	}
-	wq = mlx5_glue->create_wq(priv->ctx,
+	wq = mlx5_glue->create_wq(ctx,
 		 &(struct ibv_wq_init_attr){
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
@@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
 	if (!rxq)
 		return NULL;
 	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = 0,
 			.ind_tbl = &rxq->wq,
@@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
 	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
 	if (!ind_tbl)
 		return NULL;
-	qp = mlx5_glue->create_qp_ex(priv->ctx,
+	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
diff --git a/drivers/net/mlx5/mlx5_socket.c b/drivers/net/mlx5/mlx5_socket.c
index 41cac3c..8fa6430 100644
--- a/drivers/net/mlx5/mlx5_socket.c
+++ b/drivers/net/mlx5/mlx5_socket.c
@@ -185,9 +185,9 @@
 	assert(cmsg != NULL);
 	cmsg->cmsg_level = SOL_SOCKET;
 	cmsg->cmsg_type = SCM_RIGHTS;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
+	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
 	fd = (int *)CMSG_DATA(cmsg);
-	*fd = priv->ctx->cmd_fd;
+	*fd = priv->sh->ctx->cmd_fd;
 	ret = sendmsg(conn_sock, &msg, 0);
 	if (ret < 0)
 		DRV_LOG(WARNING, "port %u cannot send response",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2137bdc..d13a1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -58,7 +58,7 @@
 			goto error;
 		}
 	}
-	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
+	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
 	if (ret) {
 		/* Adjust index for rollback. */
 		i = priv->txqs_n - 1;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 10a3040..25305b6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (is_empw_burst_func(tx_pkt_burst))
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
+	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
 			dev->data->port_id, idx);
@@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
 		attr.init.max_tso_header = txq_ctrl->max_tso_header;
 		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 	}
-	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
+	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
 			dev->data->port_id, idx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 10/13] net/mlx5: provide IB port for the object being created
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (8 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
                         ` (4 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to provide IB port index for the Verbs
objects being created - QPs and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c       | 2 ++
 drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
 drivers/net/mlx5/mlx5_txq.c        | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 010d61a..40d6818 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {
 int
 mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
 		struct ibv_flow_attr attr;
 		struct ibv_flow_spec_eth eth;
@@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
 	} flow_attr = {
 		.attr = {
 			.num_of_specs = 2,
+			.port = (uint8_t)priv->ibv_port,
 		},
 		.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 9548d12..b69af6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1550,6 +1550,7 @@
 	dev_flow->layers = item_flags;
 	dev_flow->verbs.attr->priority =
 		mlx5_flow_adjust_priority(dev, priority, subpriority);
+	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 25305b6..5062f5c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
 	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = 1,
+		/* IB device port number. */
+		.port_num = (uint8_t)priv->ibv_port,
 	};
 	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
 				   (IBV_QP_STATE | IBV_QP_PORT));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 11/13] net/mlx5: update install/uninstall int handler routines
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (9 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
                         ` (3 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

We are implementing the support for multport Infiniband device
withj representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  19 ++++++-
 drivers/net/mlx5/mlx5.h        |   3 +-
 drivers/net/mlx5/mlx5_ethdev.c | 118 ++++++++++++++++++++++++++++++++---------
 3 files changed, 112 insertions(+), 28 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 8f35546..0c81fda 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -166,6 +166,7 @@ struct mlx5_dev_spawn_data {
 {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
+	uint32_t i;
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -215,6 +216,14 @@ struct mlx5_dev_spawn_data {
 		sizeof(sh->ibdev_name));
 	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
 		sizeof(sh->ibdev_path));
+	pthread_mutex_init(&sh->intr_mutex, NULL);
+	/*
+	 * Setting port_id to max unallowed value means
+	 * there is no interrupt subhandler installed for
+	 * the given port index i.
+	 */
+	for (i = 0; i < sh->max_port; i++)
+		sh->port[i].ih_port_id = RTE_MAX_ETHPORTS;
 	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
 	if (sh->pd == NULL) {
 		DRV_LOG(ERR, "PD allocation failure");
@@ -269,6 +278,15 @@ struct mlx5_dev_spawn_data {
 	if (--sh->refcnt)
 		goto exit;
 	LIST_REMOVE(sh, next);
+	/*
+	 *  Ensure there is no async event handler installed.
+	 *  Only primary process handles async device events.
+	 **/
+	assert(!sh->intr_cnt);
+	if (sh->intr_cnt)
+		rte_intr_callback_unregister
+			(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
+	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -278,7 +296,6 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 }
 
-
 /**
  * Prepare shared data between primary and secondary process.
  */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 573629a..23f3e34 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -222,6 +222,8 @@ struct mlx5_ibv_shared {
 	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
+	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
@@ -252,7 +254,6 @@ struct mlx5_priv {
 	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
 	unsigned int (*reta_idx)[]; /* RETA index table. */
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2ad3231..0e588da 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Uninstall shared asynchronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id >= RTE_MAX_ETHPORTS)
+		goto exit;
+	assert(sh->port[priv->ibv_port - 1].port_id ==
+					(uint32_t)dev->data->port_id);
+	assert(sh->intr_cnt);
+	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+	if (!sh->intr_cnt || --sh->intr_cnt)
+		goto exit;
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
+ * Install shared asyncronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_install(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+	int ret;
+	int flags;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id < RTE_MAX_ETHPORTS) {
+		/* The handler is already installed for this port. */
+		assert(sh->intr_cnt);
+		goto exit;
+	}
+	sh->port[priv->ibv_port - 1].ih_port_id = (uint32_t)dev->data->port_id;
+	if (sh->intr_cnt) {
+		sh->intr_cnt++;
+		goto exit;
+	}
+	/* No shared handler installed. */
+	assert(sh->ctx->async_fd > 0);
+	flags = fcntl(sh->ctx->async_fd, F_GETFL);
+	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(INFO, "failed to change file descriptor"
+			      " async event queue");
+		/* Indicate there will be no interrupts. */
+		dev->data->dev_conf.intr_conf.lsc = 0;
+		dev->data->dev_conf.intr_conf.rmv = 0;
+		sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+		goto exit;
+	}
+	sh->intr_handle.fd = sh->ctx->async_fd;
+	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	rte_intr_callback_register(&sh->intr_handle,
+				   mlx5_dev_interrupt_handler, sh);
+	sh->intr_cnt++;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
  * Uninstall interrupt handler.
  *
  * @param dev
@@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv)
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     mlx5_dev_interrupt_handler, dev);
+	mlx5_dev_shared_handler_uninstall(dev);
 	if (priv->primary_socket)
 		rte_intr_callback_unregister(&priv->intr_handle_socket,
 					     mlx5_dev_handler_socket, dev);
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 	priv->intr_handle_socket.fd = 0;
 	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;
 }
@@ -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
-	int flags;
 
-	assert(ctx->async_fd > 0);
-	flags = fcntl(ctx->async_fd, F_GETFL);
-	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (ret) {
-		DRV_LOG(INFO,
-			"port %u failed to change file descriptor async event"
-			" queue",
-			dev->data->port_id);
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-	}
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
-		rte_intr_callback_register(&priv->intr_handle,
-					   mlx5_dev_interrupt_handler, dev);
-	}
+	mlx5_dev_shared_handler_install(dev);
 	ret = mlx5_socket_init(dev);
 	if (ret)
 		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 12/13] net/mlx5: update event handler for multiport IB devices
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (10 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 15:35       ` [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
                         ` (2 subsequent siblings)
  14 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0e588da..ddd9545 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Device status handler.
+ * Handle shared asynchronous events the NIC (removal event
+ * and link status change). Supports multiport IB device.
  *
- * @param dev
- *   Pointer to Ethernet device.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Events bitmap of callback process which can be called immediately.
+ * @param cb_arg
+ *   Callback argument.
  */
-static uint32_t
-mlx5_dev_status_handler(struct rte_eth_dev *dev)
+void
+mlx5_dev_interrupt_handler(void *cb_arg)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = cb_arg;
 	struct ibv_async_event event;
-	uint32_t ret = 0;
 
-	if (mlx5_link_update(dev, 0) == -EAGAIN) {
-		usleep(0);
-		return 0;
-	}
-	/* Read all message and acknowledge them. */
+	/* Read all message from the IB device and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
+		struct rte_eth_dev *dev;
+		uint32_t tmp;
+
+		if (mlx5_glue->get_async_event(sh->ctx, &event))
 			break;
+		/* Retrieve and check IB port index. */
+		tmp = (uint32_t)event.element.port_num;
+		assert(tmp && (tmp <= sh->max_port));
+		if (!tmp ||
+		    tmp > sh->max_port ||
+		    sh->port[tmp - 1].ih_port_id >= RTE_MAX_ETHPORTS) {
+			/*
+			 * Invalid IB port index or no handler
+			 * installed for this port.
+			 */
+			mlx5_glue->ack_async_event(&event);
+			continue;
+		}
+		/* Retrieve ethernet device descriptor. */
+		tmp = sh->port[tmp - 1].ih_port_id;
+		dev = &rte_eth_devices[tmp];
+		tmp = 0;
+		assert(dev);
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-			event.event_type == IBV_EVENT_PORT_ERR) &&
-			(dev->data->dev_conf.intr_conf.lsc == 1))
-			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
-		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			dev->data->dev_conf.intr_conf.rmv == 1)
-			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
-		else
-			DRV_LOG(DEBUG,
-				"port %u event type %d on not handled",
-				dev->data->port_id, event.event_type);
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+			dev->data->dev_conf.intr_conf.lsc) {
+			mlx5_glue->ack_async_event(&event);
+			if (mlx5_link_update(dev, 0) == -EAGAIN) {
+				usleep(0);
+				continue;
+			}
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+			continue;
+		}
+		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+		    dev->data->dev_conf.intr_conf.rmv) {
+			mlx5_glue->ack_async_event(&event);
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+			continue;
+		}
+		DRV_LOG(DEBUG,
+			"port %u event type %d on not handled",
+			dev->data->port_id, event.event_type);
 		mlx5_glue->ack_async_event(&event);
 	}
-	return ret;
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
- */
-void
-mlx5_dev_interrupt_handler(void *cb_arg)
-{
-	struct rte_eth_dev *dev = cb_arg;
-	uint32_t events;
-
-	events = mlx5_dev_status_handler(dev);
-	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
-	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (11 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-26 15:35       ` Viacheslav Ovsiienko
  2019-03-26 19:38         ` Shahaf Shuler
  2019-03-27  6:00       ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
  14 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-26 15:35 UTC (permalink / raw)
  To: dev; +Cc: shahafs

For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index b8943da..71a626b 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -3095,6 +3095,29 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add source vport match to the specified matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] port
+ *   Source vport value to match
+ * @param[in] mask
+ *   Mask
+ */
+static void
+flow_dv_translate_source_vport(void *matcher, void *key,
+			      int16_t port, uint16_t mask)
+{
+	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+
+	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
+	MLX5_SET(fte_match_set_misc, misc_v, source_port, port);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -3389,6 +3412,19 @@ struct field_modify_info modify_tcp[] = {
 	}
 	dev_flow->dv.actions_n = actions_n;
 	flow->actions = action_flags;
+	if (attr->ingress && !attr->transfer &&
+	    (priv->representor || priv->master)) {
+		/* It was validated - we support unidirection flows only. */
+		assert(!attr->egress);
+		/*
+		 * Add matching on source vport index only
+		 * for ingress rules in E-Switch configurations.
+		 */
+		flow_dv_translate_source_vport(matcher.mask.buf,
+					       dev_flow->dv.value.buf,
+					       priv->vport_id,
+					       0xffff);
+	}
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		void *match_mask = matcher.mask.buf;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions
  2019-03-26 15:35       ` [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-26 19:35         ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 19:35 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Tuesday, March 26, 2019 5:35 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free
> functions
> 
> The Mellanox NICs support SR-IOV and have E-Switch feature.
> When SR-IOV is set up in switchdev mode and E-Switch is enabled we have
> so called VF representors in the system. All representors belonging to the
> same E-Switch are created on the basis of the single PCI function and with
> current implementation each representor has its own dedicated Infiniband
> device and operates within its own Infiniband context. It is proposed to
> provide representors as ports of the single Infiniband device and operate on
> the shared Infiniband context saving various resources. This patch introduces
> appropriate structures.
> 
> Also the functions to allocate and free shared IB context for multiport are
> added. The IB device context, Protection Domain, device attributes,
> Infiniband names are going to be relocated to the shared structure from the
> device private one.
> mlx5_dev_spawn() is updated to support shared context.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-26 15:35       ` [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-26 19:37         ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 19:37 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Tuesday, March 26, 2019 5:35 PM, Viacheslav Ovsiienko:
> Subject: [dpdk-dev] [PATCH v3 01/13] net/mlx5: add representor recognition
> on kernels 5.x
> 
> The master device and VF representors were distinguished by presence of
> port name, master device did not have one. The new Linux kernels starting
> from 5.0 provide the port name for master device and the implemented
> representor recognizing method does not work.
> The new recognizing method is based on querying the VF number, has been
> created on the base of the device.
> 
> The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
> attribute is specified in the Netlink request message.
> 
> Also the presence check of device symlink in device sysfs folder is added to
> distinguish representors with sysfs based method.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules
  2019-03-26 15:35       ` [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-26 19:38         ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-26 19:38 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Tuesday, March 26, 2019 5:35 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v3 13/13] net/mlx5: add source vport match to the ingress
> rules
> 
> For E-Switch configurations over multiport Infiniband devices we should add
> source vport match to correctly distribute traffic between representors.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Acked-by: Shahaf Shuler <shahafs@mellanox.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 00/14] net/mlx5: add support for multiport IB devices
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (12 preceding siblings ...)
  2019-03-26 15:35       ` [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-27  6:00       ` Shahaf Shuler
  2019-03-27  7:31         ` Slava Ovsiienko
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
  14 siblings, 1 reply; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-27  6:00 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Tuesday, March 26, 2019 5:35 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v3 00/14] net/mlx5: add support for multiport IB devices
> 
> The Mellanox NICs support SR-IOV and have E-Switch feature.
> When SR-IOV is set up in switchdev mode and E-Switch is enabled we have
> so called VF representors in the system. All representors belonging to the
> same E-Switch are created on the basis of the single PCI function and with
> current implementation each representor has its own dedicated Infiniband
> device and operates within its own Infiniband context. It is proposed to
> provide representors as ports of the single Infiniband device and operate on
> the shared Infiniband context.
> 
> mlx5_pci_probe() routine is planned to be extended to probe the ports of
> found Infiniband devices. All active ports, belonging to the same Infiniband
> device will use the signle shared Infiniband context of that device:
>   - QP are created within shared context, with assigning to
>     specified port
>   - Verb flows are also created with specifying port index
>   - Protection Domain is shared by all representors
>   - Global Memory Regions Cache is shared by all representors,
>     local ones are not (not in this patchset)
>   - Shared Asynchronous Event Handler distributes events
>     among the device ports (not in this patchset)
>   - UAR mapping to user space is also shared (not in patchset)
> 
> Each representor has its own attached network interface device, interface
> name and index can be retrieved via IB device and port index, so the link
> control is performed via Netlink in the same way as in existing
> implementation.
> 
> This patchset also superseedes http://patches.dpdk.org/patch/50486/
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Slava, basically I am OK w/ the series, in order for me to apply:
1. need to rebase on top of latest master. There are some merge conflicts.
2. at least patch[1] seems to be taken from another branch (checkout the HAVE_MLX5DV_DR flag). Please handle. 

[1]
https://patches.dpdk.org/patch/51739/


> 
> ---
> v3:
> 	- vport_id field is added, contains associated VF vport
> 	  index on E-Switch. representor_id field is reverted
> 	  to keep the VF index
> 	- some comment spellcheck
> 	- some error logging is added for master/representor
> 	  wrong recognition
> 	- shared IB context alloc/free routines mutex
> 	  protection added
> 
> v2:
> 	http://patches.dpdk.org/patch/51642/
> 	- clarifying comments added
> 	- secondary process does not probe IB device at all and
> 	  does not create shared context
> 	- minor bugs (like ++ in assert) fixed
> 	- representor_id contains corresponding VF vport index,
> 	  no corrections +1 is needed anymore. vport index is
> 	  calculated in the only place in code and if indexing
> 	  schema is changed it would be easy to modify
> 
> v1:
> 	http://patches.dpdk.org/cover/51451/
> 
> Viacheslav Ovsiienko (13):
>   net/mlx5: add representor recognition on kernels 5.x
>   net/mlx5: modify get ifindex routine for multiport IB
>   net/mlx5: add getting IB ports number for multiport IB
>   net/mlx5: add multiport IB device support to probing
>   net/mlx5: add IB shared context alloc/free functions
>   net/mlx5: switch to the names in the shared IB context
>   net/mlx5: switch to the shared Protection Domain
>   net/mlx5: switch to the shared context IB attributes
>   net/mlx5: switch to the shared IB device context
>   net/mlx5: provide IB port for the object being created
>   net/mlx5: update install/uninstall int handler routines
>   net/mlx5: update event handler for multiport IB devices
>   net/mlx5: add source vport match to the ingress rules
> 
>  drivers/net/mlx5/Makefile          |  10 +
>  drivers/net/mlx5/meson.build       |   4 +
>  drivers/net/mlx5/mlx5.c            | 651 ++++++++++++++++++++++++++--------
> ---
>  drivers/net/mlx5/mlx5.h            |  45 ++-
>  drivers/net/mlx5/mlx5_ethdev.c     | 248 +++++++++-----
>  drivers/net/mlx5/mlx5_flow.c       |   2 +
>  drivers/net/mlx5/mlx5_flow_dv.c    |  45 ++-
>  drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
>  drivers/net/mlx5/mlx5_mr.c         |   4 +-
>  drivers/net/mlx5/mlx5_nl.c         | 108 +++++-
>  drivers/net/mlx5/mlx5_rxq.c        |  41 +--
>  drivers/net/mlx5/mlx5_socket.c     |   4 +-
>  drivers/net/mlx5/mlx5_stats.c      |  22 +-
>  drivers/net/mlx5/mlx5_trigger.c    |   2 +-
>  drivers/net/mlx5/mlx5_txq.c        |  24 +-
>  15 files changed, 879 insertions(+), 338 deletions(-)
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v3 00/14] net/mlx5: add support for multiport IB devices
  2019-03-27  6:00       ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
@ 2019-03-27  7:31         ` Slava Ovsiienko
  0 siblings, 0 replies; 114+ messages in thread
From: Slava Ovsiienko @ 2019-03-27  7:31 UTC (permalink / raw)
  To: Shahaf Shuler, dev

> -----Original Message-----
> From: Shahaf Shuler
> Sent: Wednesday, March 27, 2019 8:00
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Subject: RE: [PATCH v3 00/14] net/mlx5: add support for multiport IB devices
> 
> Tuesday, March 26, 2019 5:35 PM, Viacheslav Ovsiienko:
> > Subject: [PATCH v3 00/14] net/mlx5: add support for multiport IB
> > devices
> >
> > The Mellanox NICs support SR-IOV and have E-Switch feature.
[..]
> >
> > This patchset also superseedes http://patches.dpdk.org/patch/50486/
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Slava, basically I am OK w/ the series, in order for me to apply:
> 1. need to rebase on top of latest master. There are some merge conflicts.
> 2. at least patch[1] seems to be taken from another branch (checkout the
> HAVE_MLX5DV_DR flag). Please handle.
> 
> [1]
> https://patches.dpdk.org/patch/51739/

Off the list:

My patchset is rebased on the top,  but there are three Ori's commits not merged
I was to told to base on:  http://patches.dpdk.org/patch/51403/

> 
> 
> >
> > ---
> > v3:
> > 	- vport_id field is added, contains associated VF vport
> > 	  index on E-Switch. representor_id field is reverted
> > 	  to keep the VF index
> > 	- some comment spellcheck
> > 	- some error logging is added for master/representor
> > 	  wrong recognition
> > 	- shared IB context alloc/free routines mutex
> > 	  protection added
> >
> > v2:
> > 	http://patches.dpdk.org/patch/51642/
> > 	- clarifying comments added
> > 	- secondary process does not probe IB device at all and
> > 	  does not create shared context
> > 	- minor bugs (like ++ in assert) fixed
> > 	- representor_id contains corresponding VF vport index,
> > 	  no corrections +1 is needed anymore. vport index is
> > 	  calculated in the only place in code and if indexing
> > 	  schema is changed it would be easy to modify
> >
> > v1:
> > 	http://patches.dpdk.org/cover/51451/
> >
> > Viacheslav Ovsiienko (13):
> >   net/mlx5: add representor recognition on kernels 5.x
> >   net/mlx5: modify get ifindex routine for multiport IB
> >   net/mlx5: add getting IB ports number for multiport IB
> >   net/mlx5: add multiport IB device support to probing
> >   net/mlx5: add IB shared context alloc/free functions
> >   net/mlx5: switch to the names in the shared IB context
> >   net/mlx5: switch to the shared Protection Domain
> >   net/mlx5: switch to the shared context IB attributes
> >   net/mlx5: switch to the shared IB device context
> >   net/mlx5: provide IB port for the object being created
> >   net/mlx5: update install/uninstall int handler routines
> >   net/mlx5: update event handler for multiport IB devices
> >   net/mlx5: add source vport match to the ingress rules
> >
> >  drivers/net/mlx5/Makefile          |  10 +
> >  drivers/net/mlx5/meson.build       |   4 +
> >  drivers/net/mlx5/mlx5.c            | 651 ++++++++++++++++++++++++++--------
> > ---
> >  drivers/net/mlx5/mlx5.h            |  45 ++-
> >  drivers/net/mlx5/mlx5_ethdev.c     | 248 +++++++++-----
> >  drivers/net/mlx5/mlx5_flow.c       |   2 +
> >  drivers/net/mlx5/mlx5_flow_dv.c    |  45 ++-
> >  drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
> >  drivers/net/mlx5/mlx5_mr.c         |   4 +-
> >  drivers/net/mlx5/mlx5_nl.c         | 108 +++++-
> >  drivers/net/mlx5/mlx5_rxq.c        |  41 +--
> >  drivers/net/mlx5/mlx5_socket.c     |   4 +-
> >  drivers/net/mlx5/mlx5_stats.c      |  22 +-
> >  drivers/net/mlx5/mlx5_trigger.c    |   2 +-
> >  drivers/net/mlx5/mlx5_txq.c        |  24 +-
> >  15 files changed, 879 insertions(+), 338 deletions(-)
> >
> > --
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v4 00/14] net/mlx5: add support for multiport IB devices
  2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
                         ` (13 preceding siblings ...)
  2019-03-27  6:00       ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
@ 2019-03-27 13:15       ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
                           ` (13 more replies)
  14 siblings, 14 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature. 
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context.

mlx5_pci_probe() routine is planned to be extended to probe
the ports of found Infiniband devices. All active ports,
belonging to the same Infiniband device will use the signle
shared Infiniband context of that device:
  - QP are created within shared context, with assigning to
    specified port
  - Verb flows are also created with specifying port index
  - Protection Domain is shared by all representors
  - Global Memory Regions Cache is shared by all representors,
    local ones are not (not in this patchset)
  - Shared Asynchronous Event Handler distributes events
    among the device ports (not in this patchset)
  - UAR mapping to user space is also shared (not in patchset)
		  
Each representor has its own attached network interface device,
interface name and index can be retrieved via IB device and
port index, so the link control is performed via Netlink
in the same way as in existing implementation.

This patchset also superseedes http://patches.dpdk.org/patch/50486/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v4:
	- minor typo fixes for debug compilation
	- rebased on current top of repo
	- removed dependency on DR (Direct Rules)

v3:
	http://patches.dpdk.org/cover/51728/
	- vport_id field is added, contains associated VF vport
	  index on E-Switch. representor_id field is reverted
	  to keep the VF index
	- some comment spellcheck
	- some error logging is added for master/representor
	  wrong recognition
	- shared IB context alloc/free routines mutex
	  protection added  

v2:
	http://patches.dpdk.org/patch/51642/
	- clarifying comments added
	- secondary process does not probe IB device at all and
	  does not create shared context
	- minor bugs (like ++ in assert) fixed
	- representor_id contains corresponding VF vport index,
	  no corrections +1 is needed anymore. vport index is
	  calculated in the only place in code and if indexing
	  schema is changed it would be easy to modify
	  
v1:
	http://patches.dpdk.org/cover/51451/

Viacheslav Ovsiienko (13):
  net/mlx5: add representor recognition on kernels 5.x
  net/mlx5: modify get ifindex routine for multiport IB
  net/mlx5: add getting IB ports number for multiport IB
  net/mlx5: add multiport IB device support to probing
  net/mlx5: add IB shared context alloc/free functions
  net/mlx5: switch to the names in the shared IB context
  net/mlx5: switch to the shared Protection Domain
  net/mlx5: switch to the shared context IB attributes
  net/mlx5: switch to the shared IB device context
  net/mlx5: provide IB port for the object being created
  net/mlx5: update install/uninstall int handler routines
  net/mlx5: update event handler for multiport IB devices
  net/mlx5: add source vport match to the ingress rules

 drivers/net/mlx5/Makefile          |  10 +
 drivers/net/mlx5/meson.build       |   4 +
 drivers/net/mlx5/mlx5.c            | 638 ++++++++++++++++++++++++++-----------
 drivers/net/mlx5/mlx5.h            |  45 ++-
 drivers/net/mlx5/mlx5_ethdev.c     | 248 +++++++++-----
 drivers/net/mlx5/mlx5_flow.c       |   2 +
 drivers/net/mlx5/mlx5_flow_dv.c    |  44 ++-
 drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
 drivers/net/mlx5/mlx5_mr.c         |   4 +-
 drivers/net/mlx5/mlx5_nl.c         | 108 ++++++-
 drivers/net/mlx5/mlx5_rxq.c        |  41 +--
 drivers/net/mlx5/mlx5_socket.c     |   4 +-
 drivers/net/mlx5/mlx5_stats.c      |  22 +-
 drivers/net/mlx5/mlx5_trigger.c    |   2 +-
 drivers/net/mlx5/mlx5_txq.c        |  24 +-
 15 files changed, 868 insertions(+), 335 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH v4 01/13] net/mlx5: add representor recognition on kernels 5.x
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
                           ` (12 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The master device and VF representors were distinguished by
presence of port name, master device did not have one. The new Linux
kernels starting from 5.0 provide the port name for master device
and the implemented representor recognizing method does not work.
The new recognizing method is based on querying the VF number,
has been created on the base of the device.

The IFLA_NUM_VF attribute is returned by kernel if IFLA_EXT_MASK
attribute is specified in the Netlink request message.

Also the presence check of device symlink in device sysfs folder
is added to distinguish representors with sysfs based method.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/Makefile      | 10 ++++++++++
 drivers/net/mlx5/meson.build   |  4 ++++
 drivers/net/mlx5/mlx5.c        | 20 +++++++++++++++++---
 drivers/net/mlx5/mlx5.h        |  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 20 ++++++++++++++++++--
 drivers/net/mlx5/mlx5_nl.c     | 43 +++++++++++++++++++++++++++++++++++++++---
 6 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 9a7da18..c326494 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -226,6 +226,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum RDMA_NLDEV_ATTR_NDEV_INDEX \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_NUM_VF \
+		linux/if_link.h \
+		enum IFLA_NUM_VF \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_IFLA_EXT_MASK \
+		linux/if_link.h \
+		enum IFLA_EXT_MASK \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IFLA_PHYS_SWITCH_ID \
 		linux/if_link.h \
 		enum IFLA_PHYS_SWITCH_ID \
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 0cf2f08..e3cb9bc 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -133,6 +133,10 @@ if build
 		'ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT' ],
 		[ 'HAVE_ETHTOOL_LINK_MODE_100G', 'linux/ethtool.h',
 		'ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT' ],
+		[ 'HAVE_IFLA_NUM_VF', 'linux/if_link.h',
+		'IFLA_NUM_VF' ],
+		[ 'HAVE_IFLA_EXT_MASK', 'linux/if_link.h',
+		'IFLA_EXT_MASK' ],
 		[ 'HAVE_IFLA_PHYS_SWITCH_ID', 'linux/if_link.h',
 		'IFLA_PHYS_SWITCH_ID' ],
 		[ 'HAVE_IFLA_PHYS_PORT_NAME', 'linux/if_link.h',
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ae4b716..de72b2f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -13,7 +13,6 @@
 #include <errno.h>
 #include <net/if.h>
 #include <sys/mman.h>
-#include <linux/netlink.h>
 #include <linux/rtnetlink.h>
 
 /* Verbs header. */
@@ -998,9 +997,24 @@
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
 	priv->nl_sn = 0;
 	priv->representor = !!switch_info->representor;
+	priv->master = !!switch_info->master;
 	priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
-	priv->representor_id =
-		switch_info->representor ? switch_info->port_name : -1;
+	/*
+	 * Currently we support single E-Switch per PF configurations
+	 * only and vport_id field contains the vport index for
+	 * associated VF, which is deduced from representor port name.
+	 * For exapmple, let's have the IB device port 10, it has
+	 * attached network device eth0, which has port name attribute
+	 * pf0vf2, we can deduce the VF number as 2, and set vport index
+	 * as 3 (2+1). This assigning schema should be changed if the
+	 * multiple E-Switch instances per PF configurations or/and PCI
+	 * subfunctions are added.
+	 */
+	priv->vport_id = switch_info->representor ?
+			 switch_info->port_name + 1 : -1;
+	/* representor_id field keeps the unmodified port/VF index. */
+	priv->representor_id = switch_info->representor ?
+			       switch_info->port_name : -1;
 	/*
 	 * Look for sibling devices in order to reuse their switch domain
 	 * if any, otherwise allocate one.
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 88ffb19..70a611f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -205,7 +205,9 @@ struct mlx5_priv {
 	uint16_t mtu; /* Configured MTU. */
 	unsigned int isolated:1; /* Whether isolated mode is enabled. */
 	unsigned int representor:1; /* Device is a port representor. */
+	unsigned int master:1; /* Device is a E-Switch master. */
 	uint16_t domain_id; /* Switch domain identifier. */
+	uint16_t vport_id; /* Associated VF vport index (if any). */
 	int32_t representor_id; /* Port representor identifier. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2e8a296..933839c 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1362,8 +1362,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		.port_name = 0,
 		.switch_id = 0,
 	};
+	DIR *dir;
 	bool port_name_set = false;
 	bool port_switch_id_set = false;
+	bool device_dir = false;
 	char c;
 	int ret;
 
@@ -1376,6 +1378,8 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	      ifname);
 	MKSTR(phys_switch_id, "/sys/class/net/%s/phys_switch_id",
 	      ifname);
+	MKSTR(pci_device, "/sys/class/net/%s/device",
+	      ifname);
 
 	file = fopen(phys_port_name, "rb");
 	if (file != NULL) {
@@ -1394,9 +1398,21 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 		fscanf(file, "%" SCNx64 "%c", &data.switch_id, &c) == 2 &&
 		c == '\n';
 	fclose(file);
-	data.master = port_switch_id_set && !port_name_set;
-	data.representor = port_switch_id_set && port_name_set;
+	dir = opendir(pci_device);
+	if (dir != NULL) {
+		closedir(dir);
+		device_dir = true;
+	}
+	data.master = port_switch_id_set && (!port_name_set || device_dir);
+	data.representor = port_switch_id_set && port_name_set && !device_dir;
 	*info = data;
+	assert(!(data.master && data.representor));
+	if (data.master && data.representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 8a10109..58bbfed 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -65,6 +65,12 @@
 #endif
 
 /* These are normally found in linux/if_link.h. */
+#ifndef HAVE_IFLA_NUM_VF
+#define IFLA_NUM_VF 21
+#endif
+#ifndef HAVE_IFLA_EXT_MASK
+#define IFLA_EXT_MASK 29
+#endif
 #ifndef HAVE_IFLA_PHYS_SWITCH_ID
 #define IFLA_PHYS_SWITCH_ID 36
 #endif
@@ -837,6 +843,7 @@ struct mlx5_nl_ifindex_data {
 	size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
 	bool port_name_set = false;
 	bool switch_id_set = false;
+	bool num_vf_set = false;
 
 	if (nh->nlmsg_type != RTM_NEWLINK)
 		goto error;
@@ -848,6 +855,9 @@ struct mlx5_nl_ifindex_data {
 		if (ra->rta_len > nh->nlmsg_len - off)
 			goto error;
 		switch (ra->rta_type) {
+		case IFLA_NUM_VF:
+			num_vf_set = true;
+			break;
 		case IFLA_PHYS_PORT_NAME:
 			port_name_set =
 				mlx5_translate_port_name((char *)payload,
@@ -864,8 +874,20 @@ struct mlx5_nl_ifindex_data {
 		}
 		off += RTA_ALIGN(ra->rta_len);
 	}
-	info.master = switch_id_set && !port_name_set;
-	info.representor = switch_id_set && port_name_set;
+	if (switch_id_set) {
+		if (info.port_name_new) {
+			/* New representors naming schema. */
+			if (port_name_set) {
+				info.master = (info.port_name == -1);
+				info.representor = (info.port_name != -1);
+			}
+		} else {
+			/* Legacy representors naming schema. */
+			info.master = (!port_name_set || num_vf_set);
+			info.representor = port_name_set && !num_vf_set;
+		}
+	}
+	assert(!(info.master && info.representor));
 	memcpy(arg, &info, sizeof(info));
 	return 0;
 error:
@@ -893,9 +915,13 @@ struct mlx5_nl_ifindex_data {
 	struct {
 		struct nlmsghdr nh;
 		struct ifinfomsg info;
+		struct rtattr rta;
+		uint32_t extmask;
 	} req = {
 		.nh = {
-			.nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
+			.nlmsg_len = NLMSG_LENGTH
+					(sizeof(req.info) +
+					 RTA_LENGTH(sizeof(uint32_t))),
 			.nlmsg_type = RTM_GETLINK,
 			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
 		},
@@ -903,11 +929,22 @@ struct mlx5_nl_ifindex_data {
 			.ifi_family = AF_UNSPEC,
 			.ifi_index = ifindex,
 		},
+		.rta = {
+			.rta_type = IFLA_EXT_MASK,
+			.rta_len = RTA_LENGTH(sizeof(int32_t)),
+		},
+		.extmask = RTE_LE32(1),
 	};
 	int ret;
 
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret >= 0)
 		ret = mlx5_nl_recv(nl, seq, mlx5_nl_switch_info_cb, info);
+	if (info->master && info->representor) {
+		DRV_LOG(ERR, "ifindex %u device is recognized as master"
+			     " and as representor", ifindex);
+		rte_errno = ENODEV;
+		ret = -rte_errno;
+	}
 	return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 02/13] net/mlx5: modify get ifindex routine for multiport IB
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
                           ` (11 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_ifindex() returning the
network interface index associated with Infiniband device.
We are going to support multiport IB devices, now function
takes the IB port as argument and returns ifindex associated
with tuple <IB device, IB port>

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  2 +-
 drivers/net/mlx5/mlx5.h        |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c |  2 +-
 drivers/net/mlx5/mlx5_nl.c     | 20 +++++++++++++-------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index de72b2f..cc2369b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1434,7 +1434,7 @@ struct mlx5_dev_spawn_data {
 			list[i].ifindex = 0;
 		else
 			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name);
+				(nl_rdma, list[i].ibv_dev->name, 1);
 		if (nl_route < 0 ||
 		    !list[i].ifindex ||
 		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 70a611f..befe0ec 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -422,7 +422,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
-unsigned int mlx5_nl_ifindex(int nl, const char *name);
+unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 933839c..2b83898 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -222,7 +222,7 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 58bbfed..4209801 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -85,11 +85,12 @@ struct mlx5_nl_mac_addr {
 	int mac_n; /**< Number of addresses in the array. */
 };
 
-/** Data structure used by mlx5_nl_ifindex_cb(). */
+/** Data structure used by mlx5_nl_cmdget_cb(). */
 struct mlx5_nl_ifindex_data {
 	const char *name; /**< IB device name (in). */
 	uint32_t ibindex; /**< IB device index (out). */
 	uint32_t ifindex; /**< Network interface index (out). */
+	uint32_t portnum; /**< IB device max port number. */
 };
 
 /**
@@ -695,12 +696,13 @@ struct mlx5_nl_ifindex_data {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_nl_ifindex_cb(struct nlmsghdr *nh, void *arg)
+mlx5_nl_cmdget_cb(struct nlmsghdr *nh, void *arg)
 {
 	struct mlx5_nl_ifindex_data *data = arg;
 	size_t off = NLMSG_HDRLEN;
 	uint32_t ibindex = 0;
 	uint32_t ifindex = 0;
+	uint32_t portnum = 0;
 	int found = 0;
 
 	if (nh->nlmsg_type !=
@@ -725,6 +727,9 @@ struct mlx5_nl_ifindex_data {
 		case RDMA_NLDEV_ATTR_NDEV_INDEX:
 			ifindex = *(uint32_t *)payload;
 			break;
+		case RDMA_NLDEV_ATTR_PORT_INDEX:
+			portnum = *(uint32_t *)payload;
+			break;
 		default:
 			break;
 		}
@@ -733,6 +738,7 @@ struct mlx5_nl_ifindex_data {
 	if (found) {
 		data->ibindex = ibindex;
 		data->ifindex = ifindex;
+		data->portnum = portnum;
 	}
 	return 0;
 error:
@@ -751,15 +757,15 @@ struct mlx5_nl_ifindex_data {
  *   Netlink socket of the RDMA kind (NETLINK_RDMA).
  * @param[in] name
  *   IB device name.
- *
+ * @param[in] pindex
+ *   IB device port index, starting from 1
  * @return
  *   A valid (nonzero) interface index on success, 0 otherwise and rte_errno
  *   is set.
  */
 unsigned int
-mlx5_nl_ifindex(int nl, const char *name)
+mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex)
 {
-	static const uint32_t pindex = 1;
 	uint32_t seq = random();
 	struct mlx5_nl_ifindex_data data = {
 		.name = name,
@@ -785,7 +791,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ibindex)
@@ -808,7 +814,7 @@ struct mlx5_nl_ifindex_data {
 	ret = mlx5_nl_send(nl, &req.nh, seq);
 	if (ret < 0)
 		return 0;
-	ret = mlx5_nl_recv(nl, seq, mlx5_nl_ifindex_cb, &data);
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
 	if (ret < 0)
 		return 0;
 	if (!data.ifindex)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 03/13] net/mlx5: add getting IB ports number for multiport IB
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
                           ` (10 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

There is the routine mlx5_nl_portnum() added to get
the number of ports of multiport Infiniband device.
It is assumed the Uplink/VF representors are attached
on these ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h    |  1 +
 drivers/net/mlx5/mlx5_nl.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index befe0ec..d69f589 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -422,6 +422,7 @@ int mlx5_nl_mac_addr_remove(struct rte_eth_dev *dev, struct ether_addr *mac,
 void mlx5_nl_mac_addr_flush(struct rte_eth_dev *dev);
 int mlx5_nl_promisc(struct rte_eth_dev *dev, int enable);
 int mlx5_nl_allmulti(struct rte_eth_dev *dev, int enable);
+unsigned int mlx5_nl_portnum(int nl, const char *name);
 unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex);
 int mlx5_nl_switch_info(int nl, unsigned int ifindex,
 			struct mlx5_switch_info *info);
diff --git a/drivers/net/mlx5/mlx5_nl.c b/drivers/net/mlx5/mlx5_nl.c
index 4209801..fd9226b 100644
--- a/drivers/net/mlx5/mlx5_nl.c
+++ b/drivers/net/mlx5/mlx5_nl.c
@@ -826,6 +826,51 @@ struct mlx5_nl_ifindex_data {
 }
 
 /**
+ * Get the number of physical ports of given IB device.
+ *
+ * @param nl
+ *   Netlink socket of the RDMA kind (NETLINK_RDMA).
+ * @param[in] name
+ *   IB device name.
+ *
+ * @return
+ *   A valid (nonzero) number of ports on success, 0 otherwise
+ *   and rte_errno is set.
+ */
+unsigned int
+mlx5_nl_portnum(int nl, const char *name)
+{
+	uint32_t seq = random();
+	struct mlx5_nl_ifindex_data data = {
+		.name = name,
+		.ibindex = 0,
+		.ifindex = 0,
+		.portnum = 0,
+	};
+	struct nlmsghdr req = {
+		.nlmsg_len = NLMSG_LENGTH(0),
+		.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV,
+					       RDMA_NLDEV_CMD_GET),
+		.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP,
+	};
+	int ret;
+
+	ret = mlx5_nl_send(nl, &req, seq);
+	if (ret < 0)
+		return 0;
+	ret = mlx5_nl_recv(nl, seq, mlx5_nl_cmdget_cb, &data);
+	if (ret < 0)
+		return 0;
+	if (!data.ibindex) {
+		rte_errno = ENODEV;
+		return 0;
+	}
+	if (!data.portnum)
+		rte_errno = EINVAL;
+	return data.portnum;
+}
+
+/**
  * Process switch information from Netlink message.
  *
  * @param nh
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 04/13] net/mlx5: add multiport IB device support to probing
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (2 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
                           ` (9 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

mlx5_pci_probe() routine is refactored to probe the ports
of found Infiniband devices. All active ports (with attached
network interface), belonging to the same Infiniband device
will use the signle shared Infiniband context of that device.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 317 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 225 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cc2369b..db4dcc5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -130,6 +130,16 @@
 /** Driver-specific log messages type. */
 int mlx5_logtype;
 
+/** Data associated with devices to spawn. */
+struct mlx5_dev_spawn_data {
+	uint32_t ifindex; /**< Network interface index. */
+	uint32_t max_port; /**< IB device maximal port index. */
+	uint32_t ibv_port; /**< IB device physical port index. */
+	struct mlx5_switch_info info; /**< Switch information. */
+	struct ibv_device *ibv_dev; /**< Associated IB device. */
+	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
+};
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -716,12 +726,10 @@
  *
  * @param dpdk_dev
  *   Backing DPDK device.
- * @param ibv_dev
- *   Verbs device.
+ * @param spawn
+ *   Verbs device parameters (name, port, switch_info) to spawn.
  * @param config
  *   Device configuration parameters.
- * @param[in] switch_info
- *   Switch properties of Ethernet device.
  *
  * @return
  *   A valid Ethernet device object on success, NULL otherwise and rte_errno
@@ -732,10 +740,11 @@
  */
 static struct rte_eth_dev *
 mlx5_dev_spawn(struct rte_device *dpdk_dev,
-	       struct ibv_device *ibv_dev,
-	       struct mlx5_dev_config config,
-	       const struct mlx5_switch_info *switch_info)
+	       struct mlx5_dev_spawn_data *spawn,
+	       struct mlx5_dev_config config)
 {
+	const struct mlx5_switch_info *switch_info = &spawn->info;
+	struct ibv_device *ibv_dev = spawn->ibv_dev;
 	struct ibv_context *ctx = NULL;
 	struct ibv_device_attr_ex attr;
 	struct ibv_port_attr port_attr;
@@ -949,7 +958,7 @@
 		return eth_dev;
 	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, 1, &port_attr);
+	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -1309,14 +1318,6 @@
 	return NULL;
 }
 
-/** Data associated with devices to spawn. */
-struct mlx5_dev_spawn_data {
-	unsigned int ifindex; /**< Network interface index. */
-	struct mlx5_switch_info info; /**< Switch information. */
-	struct ibv_device *ibv_dev; /**< Associated IB device. */
-	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
-};
-
 /**
  * Comparison callback to sort device data.
  *
@@ -1373,7 +1374,24 @@ struct mlx5_dev_spawn_data {
 	       struct rte_pci_device *pci_dev)
 {
 	struct ibv_device **ibv_list;
-	unsigned int n = 0;
+	/*
+	 * Number of found IB Devices matching with requested PCI BDF.
+	 * nd != 1 means there are multiple IB devices over the same
+	 * PCI device and we have representors and master.
+	 */
+	unsigned int nd = 0;
+	/*
+	 * Number of found IB device Ports. nd = 1 and np = 1..n means
+	 * we have the single multiport IB device, and there may be
+	 * representors attached to some of found ports.
+	 */
+	unsigned int np = 0;
+	/*
+	 * Number of DPDK ethernet devices to Spawn - either over
+	 * multiple IB devices or multiple ports of single IB device.
+	 * Actually this is the number of iterations to spawn.
+	 */
+	unsigned int ns = 0;
 	struct mlx5_dev_config dev_config;
 	int ret;
 
@@ -1385,8 +1403,14 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(ERR, "cannot list devices, is ib_uverbs loaded?");
 		return -rte_errno;
 	}
-
+	/*
+	 * First scan the list of all Infiniband devices to find
+	 * matching ones, gathering into the list.
+	 */
 	struct ibv_device *ibv_match[ret + 1];
+	int nl_route = -1;
+	int nl_rdma = -1;
+	unsigned int i;
 
 	while (ret-- > 0) {
 		struct rte_pci_addr pci_addr;
@@ -1401,77 +1425,183 @@ struct mlx5_dev_spawn_data {
 			continue;
 		DRV_LOG(INFO, "PCI information matches for device \"%s\"",
 			ibv_list[ret]->name);
-		ibv_match[n++] = ibv_list[ret];
+		ibv_match[nd++] = ibv_list[ret];
+	}
+	ibv_match[nd] = NULL;
+	if (!nd) {
+		/* No device macthes, just complain and bail out. */
+		mlx5_glue->free_device_list(ibv_list);
+		DRV_LOG(WARNING,
+			"no Verbs device matches PCI device " PCI_PRI_FMT ","
+			" are kernel drivers loaded?",
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+		rte_errno = ENOENT;
+		ret = -rte_errno;
+		return ret;
+	}
+	nl_route = mlx5_nl_init(NETLINK_ROUTE);
+	nl_rdma = mlx5_nl_init(NETLINK_RDMA);
+	if (nd == 1) {
+		/*
+		 * Found single matching device may have multiple ports.
+		 * Each port may be representor, we have to check the port
+		 * number and check the representors existence.
+		 */
+		if (nl_rdma >= 0)
+			np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name);
+		if (!np)
+			DRV_LOG(WARNING, "can not get IB device \"%s\""
+					 " ports number", ibv_match[0]->name);
 	}
-	ibv_match[n] = NULL;
-
-	struct mlx5_dev_spawn_data list[n];
-	int nl_route = n ? mlx5_nl_init(NETLINK_ROUTE) : -1;
-	int nl_rdma = n ? mlx5_nl_init(NETLINK_RDMA) : -1;
-	unsigned int i;
-	unsigned int u;
-
 	/*
-	 * The existence of several matching entries (n > 1) means port
-	 * representors have been instantiated. No existing Verbs call nor
-	 * /sys entries can tell them apart, this can only be done through
-	 * Netlink calls assuming kernel drivers are recent enough to
-	 * support them.
-	 *
-	 * In the event of identification failure through Netlink, try again
-	 * through sysfs, then either:
-	 *
-	 * 1. No device matches (n == 0), complain and bail out.
-	 * 2. A single IB device matches (n == 1) and is not a representor,
-	 *    assume no switch support.
-	 * 3. Otherwise no safe assumptions can be made; complain louder and
-	 *    bail out.
+	 * Now we can determine the maximal
+	 * amount of devices to be spawned.
 	 */
-	for (i = 0; i != n; ++i) {
-		list[i].ibv_dev = ibv_match[i];
-		list[i].eth_dev = NULL;
-		if (nl_rdma < 0)
-			list[i].ifindex = 0;
-		else
-			list[i].ifindex = mlx5_nl_ifindex
-				(nl_rdma, list[i].ibv_dev->name, 1);
-		if (nl_route < 0 ||
-		    !list[i].ifindex ||
-		    mlx5_nl_switch_info(nl_route, list[i].ifindex,
-					&list[i].info) ||
-		    ((!list[i].info.representor && !list[i].info.master) &&
-		     mlx5_sysfs_switch_info(list[i].ifindex, &list[i].info))) {
-			list[i].ifindex = 0;
-			memset(&list[i].info, 0, sizeof(list[i].info));
-			continue;
+	struct mlx5_dev_spawn_data list[np ? np : nd];
+
+	if (np > 1) {
+		/*
+		 * Signle IB device with multiple ports found,
+		 * it may be E-Switch master device and representors.
+		 * We have to perform identification trough the ports.
+		 */
+		assert(nl_rdma >= 0);
+		assert(ns == 0);
+		assert(nd == 1);
+		for (i = 1; i <= np; ++i) {
+			list[ns].max_port = np;
+			list[ns].ibv_port = i;
+			list[ns].ibv_dev = ibv_match[0];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, i);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified port, it means there is no
+				 * representor on this port. It's OK,
+				 * there can be disabled ports, for example
+				 * if sriov_numvfs < sriov_totalvfs.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master))
+				ns++;
 		}
-	}
-	if (nl_rdma >= 0)
-		close(nl_rdma);
-	if (nl_route >= 0)
-		close(nl_route);
-	/* Count unidentified devices. */
-	for (u = 0, i = 0; i != n; ++i)
-		if (!list[i].info.master && !list[i].info.representor)
-			++u;
-	if (u) {
-		if (n == 1 && u == 1) {
-			/* Case #2. */
-			DRV_LOG(INFO, "no switch support detected");
-		} else {
-			/* Case #3. */
+		if (!ns) {
 			DRV_LOG(ERR,
-				"unable to tell which of the matching devices"
-				" is the master (lack of kernel support?)");
-			n = 0;
+				"unable to recognize master/representors"
+				" on the IB device with multiple ports");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
+		}
+	} else {
+		/*
+		 * The existence of several matching entries (nd > 1) means
+		 * port representors have been instantiated. No existing Verbs
+		 * call nor sysfs entries can tell them apart, this can only
+		 * be done through Netlink calls assuming kernel drivers are
+		 * recent enough to support them.
+		 *
+		 * In the event of identification failure through Netlink,
+		 * try again through sysfs, then:
+		 *
+		 * 1. A single IB device matches (nd == 1) with single
+		 *    port (np=0/1) and is not a representor, assume
+		 *    no switch support.
+		 *
+		 * 2. Otherwise no safe assumptions can be made;
+		 *    complain louder and bail out.
+		 */
+		np = 1;
+		for (i = 0; i != nd; ++i) {
+			memset(&list[ns].info, 0, sizeof(list[ns].info));
+			list[ns].max_port = 1;
+			list[ns].ibv_port = 1;
+			list[ns].ibv_dev = ibv_match[i];
+			list[ns].eth_dev = NULL;
+			list[ns].ifindex = 0;
+			if (nl_rdma >= 0)
+				list[ns].ifindex = mlx5_nl_ifindex
+					(nl_rdma, list[ns].ibv_dev->name, 1);
+			if (!list[ns].ifindex) {
+				/*
+				 * No network interface index found for the
+				 * specified device, it means there it is not
+				 * a representor/master.
+				 */
+				continue;
+			}
+			ret = -1;
+			if (nl_route >= 0)
+				ret = mlx5_nl_switch_info
+					       (nl_route,
+						list[ns].ifindex,
+						&list[ns].info);
+			if (ret || (!list[ns].info.representor &&
+				    !list[ns].info.master)) {
+				/*
+				 * We failed to recognize representors with
+				 * Netlink, let's try to perform the task
+				 * with sysfs.
+				 */
+				ret =  mlx5_sysfs_switch_info
+						(list[ns].ifindex,
+						 &list[ns].info);
+			}
+			if (!ret && (list[ns].info.representor ^
+				     list[ns].info.master)) {
+				ns++;
+			} else if ((nd == 1) &&
+				   !list[ns].info.representor &&
+				   !list[ns].info.master) {
+				/*
+				 * Single IB device with
+				 * one physical port and
+				 * attached network device.
+				 * May be SRIOV is not enabled
+				 * or there is no representors.
+				 */
+				DRV_LOG(INFO, "no E-Switch support detected");
+				ns++;
+				break;
+			}
+		}
+		if (!ns) {
+			DRV_LOG(ERR,
+				"unable to recognize master/representors"
+				" on the multiple IB devices");
+			rte_errno = ENOENT;
+			ret = -rte_errno;
+			goto exit;
 		}
 	}
+	assert(ns);
 	/*
 	 * Sort list to probe devices in natural order for users convenience
 	 * (i.e. master first, then representors from lowest to highest ID).
 	 */
-	if (n)
-		qsort(list, n, sizeof(*list), mlx5_dev_spawn_data_cmp);
+	qsort(list, ns, sizeof(*list), mlx5_dev_spawn_data_cmp);
 	/* Default configuration. */
 	dev_config = (struct mlx5_dev_config){
 		.hw_padding = 0,
@@ -1490,7 +1620,7 @@ struct mlx5_dev_spawn_data {
 			.min_rxqs_num = MLX5_MPRQ_MIN_RXQS,
 		},
 	};
-	/* Device speicific configuration. */
+	/* Device specific configuration. */
 	switch (pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5BF:
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS_BLUEFIELD;
@@ -1507,12 +1637,12 @@ struct mlx5_dev_spawn_data {
 	/* Set architecture-dependent default value if unset. */
 	if (dev_config.txqs_vec == MLX5_ARG_UNSET)
 		dev_config.txqs_vec = MLX5_VPMD_MAX_TXQS;
-	for (i = 0; i != n; ++i) {
+	for (i = 0; i != ns; ++i) {
 		uint32_t restore;
 
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
-						 list[i].ibv_dev, dev_config,
-						 &list[i].info);
+						 &list[i],
+						 dev_config);
 		if (!list[i].eth_dev) {
 			if (rte_errno != EBUSY && rte_errno != EEXIST)
 				break;
@@ -1525,16 +1655,7 @@ struct mlx5_dev_spawn_data {
 		list[i].eth_dev->data->dev_flags |= restore;
 		rte_eth_dev_probing_finish(list[i].eth_dev);
 	}
-	mlx5_glue->free_device_list(ibv_list);
-	if (!n) {
-		DRV_LOG(WARNING,
-			"no Verbs device matches PCI device " PCI_PRI_FMT ","
-			" are kernel drivers loaded?",
-			pci_dev->addr.domain, pci_dev->addr.bus,
-			pci_dev->addr.devid, pci_dev->addr.function);
-		rte_errno = ENOENT;
-		ret = -rte_errno;
-	} else if (i != n) {
+	if (i != ns) {
 		DRV_LOG(ERR,
 			"probe of PCI device " PCI_PRI_FMT " aborted after"
 			" encountering an error: %s",
@@ -1556,6 +1677,18 @@ struct mlx5_dev_spawn_data {
 	} else {
 		ret = 0;
 	}
+exit:
+	/*
+	 * Do the routine cleanup:
+	 * - close opened Netlink sockets
+	 * - free the Infiniband device list
+	 */
+	if (nl_rdma >= 0)
+		close(nl_rdma);
+	if (nl_route >= 0)
+		close(nl_route);
+	assert(ibv_list);
+	mlx5_glue->free_device_list(ibv_list);
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 05/13] net/mlx5: add IB shared context alloc/free functions
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (3 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
                           ` (8 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The Mellanox NICs support SR-IOV and have E-Switch feature.
When SR-IOV is set up in switchdev mode and E-Switch is enabled
we have so called VF representors in the system. All representors
belonging to the same E-Switch are created on the basis of the
single PCI function and with current implementation each representor
has its own dedicated Infiniband device and operates within its
own Infiniband context. It is proposed to provide representors
as ports of the single Infiniband device and operate on the
shared Infiniband context saving various resources. This patch
introduces appropriate structures.

Also the functions to allocate and free shared IB context for
multiport are added. The IB device context, Protection Domain,
device attributes, Infiniband names are going to be relocated
to the shared structure from the device private one.
mlx5_dev_spawn() is updated to support shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 289 +++++++++++++++++++++++++++++++++---------------
 drivers/net/mlx5/mlx5.h |  32 ++++++
 2 files changed, 229 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index db4dcc5..988fff7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,145 @@ struct mlx5_dev_spawn_data {
 	struct rte_eth_dev *eth_dev; /**< Associated Ethernet device. */
 };
 
+static LIST_HEAD(, mlx5_ibv_shared) mlx5_ibv_list = LIST_HEAD_INITIALIZER();
+static pthread_mutex_t mlx5_ibv_list_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/**
+ * Allocate shared IB device context. If there is multiport device the
+ * master and representors will share this context, if there is single
+ * port dedicated IB device, the context will be used by only given
+ * port due to unification.
+ *
+ * Routine first searches the context for the spesified IB device name,
+ * if found the shared context assumed and reference counter is incremented.
+ * If no context found the new one is created and initialized with specified
+ * IB device context and parameters.
+ *
+ * @param[in] spawn
+ *   Pointer to the IB device attributes (name, port, etc).
+ *
+ * @return
+ *   Pointer to mlx5_ibv_shared object on success,
+ *   otherwise NULL and rte_errno is set.
+ */
+static struct mlx5_ibv_shared *
+mlx5_alloc_shared_ibctx(const struct mlx5_dev_spawn_data *spawn)
+{
+	struct mlx5_ibv_shared *sh;
+	int err = 0;
+
+	assert(spawn);
+	/* Secondary process should not create the shared context. */
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	pthread_mutex_lock(&mlx5_ibv_list_mutex);
+	/* Search for IB context by device name. */
+	LIST_FOREACH(sh, &mlx5_ibv_list, next) {
+		if (!strcmp(sh->ibdev_name, spawn->ibv_dev->name)) {
+			sh->refcnt++;
+			goto exit;
+		}
+	}
+	/* No device found, we have to create new sharted context. */
+	assert(spawn->max_port);
+	sh = rte_zmalloc("ethdev shared ib context",
+			 sizeof(struct mlx5_ibv_shared) +
+			 spawn->max_port *
+			 sizeof(struct mlx5_ibv_shared_port),
+			 RTE_CACHE_LINE_SIZE);
+	if (!sh) {
+		DRV_LOG(ERR, "shared context allocation failure");
+		rte_errno  = ENOMEM;
+		goto exit;
+	}
+	/* Try to open IB device with DV first, then usual Verbs. */
+	errno = 0;
+	sh->ctx = mlx5_glue->dv_open_device(spawn->ibv_dev);
+	if (sh->ctx) {
+		sh->devx = 1;
+		DRV_LOG(DEBUG, "DevX is supported");
+	} else {
+		sh->ctx = mlx5_glue->open_device(spawn->ibv_dev);
+		if (!sh->ctx) {
+			err = errno ? errno : ENODEV;
+			goto error;
+		}
+		DRV_LOG(DEBUG, "DevX is NOT supported");
+	}
+	err = mlx5_glue->query_device_ex(sh->ctx, NULL, &sh->device_attr);
+	if (err) {
+		DRV_LOG(DEBUG, "ibv_query_device_ex() failed");
+		goto error;
+	}
+	sh->refcnt = 1;
+	sh->max_port = spawn->max_port;
+	strncpy(sh->ibdev_name, sh->ctx->device->name,
+		sizeof(sh->ibdev_name));
+	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
+		sizeof(sh->ibdev_path));
+	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
+	if (sh->pd == NULL) {
+		DRV_LOG(ERR, "PD allocation failure");
+		err = ENOMEM;
+		goto error;
+	}
+	LIST_INSERT_HEAD(&mlx5_ibv_list, sh, next);
+exit:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+	return sh;
+error:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+	assert(sh);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+	assert(err > 0);
+	rte_errno = err;
+	return NULL;
+}
+
+/**
+ * Free shared IB device context. Decrement counter and if zero free
+ * all allocated resources and close handles.
+ *
+ * @param[in] sh
+ *   Pointer to mlx5_ibv_shared object to free
+ */
+static void
+mlx5_free_shared_ibctx(struct mlx5_ibv_shared *sh)
+{
+	pthread_mutex_lock(&mlx5_ibv_list_mutex);
+#ifndef NDEBUG
+	/* Check the object presence in the list. */
+	struct mlx5_ibv_shared *lctx;
+
+	LIST_FOREACH(lctx, &mlx5_ibv_list, next)
+		if (lctx == sh)
+			break;
+	assert(lctx);
+	if (lctx != sh) {
+		DRV_LOG(ERR, "Freeing non-existing shared IB context");
+		goto exit;
+	}
+#endif
+	assert(sh);
+	assert(sh->refcnt);
+	/* Secondary process should not free the shared context. */
+	assert(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	if (--sh->refcnt)
+		goto exit;
+	LIST_REMOVE(sh, next);
+	if (sh->pd)
+		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
+	if (sh->ctx)
+		claim_zero(mlx5_glue->close_device(sh->ctx));
+	rte_free(sh);
+exit:
+	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
+}
+
+
 /**
  * Prepare shared data between primary and secondary process.
  */
@@ -289,12 +428,10 @@ struct mlx5_dev_spawn_data {
 	}
 	mlx5_mprq_free_mp(dev);
 	mlx5_mr_release(dev);
-	if (priv->pd != NULL) {
-		assert(priv->ctx != NULL);
-		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
-		claim_zero(mlx5_glue->close_device(priv->ctx));
-	} else
-		assert(priv->ctx == NULL);
+	assert(priv->sh);
+	if (priv->sh)
+		mlx5_free_shared_ibctx(priv->sh);
+	priv->sh = NULL;
 	if (priv->rss_conf.rss_key != NULL)
 		rte_free(priv->rss_conf.rss_key);
 	if (priv->reta_idx != NULL)
@@ -744,11 +881,8 @@ struct mlx5_dev_spawn_data {
 	       struct mlx5_dev_config config)
 {
 	const struct mlx5_switch_info *switch_info = &spawn->info;
-	struct ibv_device *ibv_dev = spawn->ibv_dev;
-	struct ibv_context *ctx = NULL;
-	struct ibv_device_attr_ex attr;
+	struct mlx5_ibv_shared *sh = NULL;
 	struct ibv_port_attr port_attr;
-	struct ibv_pd *pd = NULL;
 	struct mlx5dv_context dv_attr = { .comp_mask = 0 };
 	struct rte_eth_dev *eth_dev = NULL;
 	struct mlx5_priv *priv = NULL;
@@ -804,18 +938,40 @@ struct mlx5_dev_spawn_data {
 	}
 	/* Prepare shared data between primary and secondary process. */
 	mlx5_prepare_shared_data();
-	errno = 0;
-	ctx = mlx5_glue->dv_open_device(ibv_dev);
-	if (ctx) {
-		config.devx = 1;
-		DRV_LOG(DEBUG, "DEVX is supported");
-	} else {
-		ctx = mlx5_glue->open_device(ibv_dev);
-		if (!ctx) {
-			rte_errno = errno ? errno : ENODEV;
+	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		eth_dev = rte_eth_dev_attach_secondary(name);
+		if (eth_dev == NULL) {
+			DRV_LOG(ERR, "can not attach rte ethdev");
+			rte_errno = ENOMEM;
 			return NULL;
 		}
+		eth_dev->device = dpdk_dev;
+		eth_dev->dev_ops = &mlx5_dev_sec_ops;
+		err = mlx5_uar_init_secondary(eth_dev);
+		if (err)
+			return NULL;
+		/* Receive command fd from primary process */
+		err = mlx5_socket_connect(eth_dev);
+		if (err < 0)
+			return NULL;
+		/* Remap UAR for Tx queues. */
+		err = mlx5_tx_uar_remap(eth_dev, err);
+		if (err)
+			return NULL;
+		/*
+		 * Ethdev pointer is still required as input since
+		 * the primary device is not accessible from the
+		 * secondary process.
+		 */
+		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
+		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
+		return eth_dev;
 	}
+	sh = mlx5_alloc_shared_ibctx(spawn);
+	if (!sh)
+		return NULL;
+	config.devx = sh->devx;
 #ifdef HAVE_IBV_MLX5_MOD_SWP
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_SWP;
 #endif
@@ -829,7 +985,7 @@ struct mlx5_dev_spawn_data {
 #ifdef HAVE_IBV_DEVICE_STRIDING_RQ_SUPPORT
 	dv_attr.comp_mask |= MLX5DV_CONTEXT_MASK_STRIDING_RQ;
 #endif
-	mlx5_glue->dv_query_device(ctx, &dv_attr);
+	mlx5_glue->dv_query_device(sh->ctx, &dv_attr);
 	if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED) {
 		if (dv_attr.flags & MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW) {
 			DRV_LOG(DEBUG, "enhanced MPW is supported");
@@ -914,51 +1070,8 @@ struct mlx5_dev_spawn_data {
 		" old OFED/rdma-core version or firmware configuration");
 #endif
 	config.mpls_en = mpls_en;
-	err = mlx5_glue->query_device_ex(ctx, NULL, &attr);
-	if (err) {
-		DEBUG("ibv_query_device_ex() failed");
-		goto error;
-	}
-	DRV_LOG(DEBUG, "naming Ethernet device \"%s\"", name);
-	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
-		eth_dev = rte_eth_dev_attach_secondary(name);
-		if (eth_dev == NULL) {
-			DRV_LOG(ERR, "can not attach rte ethdev");
-			rte_errno = ENOMEM;
-			err = rte_errno;
-			goto error;
-		}
-		eth_dev->device = dpdk_dev;
-		eth_dev->dev_ops = &mlx5_dev_sec_ops;
-		err = mlx5_uar_init_secondary(eth_dev);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Receive command fd from primary process */
-		err = mlx5_socket_connect(eth_dev);
-		if (err < 0) {
-			err = rte_errno;
-			goto error;
-		}
-		/* Remap UAR for Tx queues. */
-		err = mlx5_tx_uar_remap(eth_dev, err);
-		if (err) {
-			err = rte_errno;
-			goto error;
-		}
-		/*
-		 * Ethdev pointer is still required as input since
-		 * the primary device is not accessible from the
-		 * secondary process.
-		 */
-		eth_dev->rx_pkt_burst = mlx5_select_rx_function(eth_dev);
-		eth_dev->tx_pkt_burst = mlx5_select_tx_function(eth_dev);
-		claim_zero(mlx5_glue->close_device(ctx));
-		return eth_dev;
-	}
 	/* Check port status. */
-	err = mlx5_glue->query_port(ctx, spawn->ibv_port, &port_attr);
+	err = mlx5_glue->query_port(sh->ctx, spawn->ibv_port, &port_attr);
 	if (err) {
 		DRV_LOG(ERR, "port query failed: %s", strerror(err));
 		goto error;
@@ -972,13 +1085,7 @@ struct mlx5_dev_spawn_data {
 		DRV_LOG(DEBUG, "port is not active: \"%s\" (%d)",
 			mlx5_glue->port_state_str(port_attr.state),
 			port_attr.state);
-	/* Allocate protection domain. */
-	pd = mlx5_glue->alloc_pd(ctx);
-	if (pd == NULL) {
-		DRV_LOG(ERR, "PD allocation failure");
-		err = ENOMEM;
-		goto error;
-	}
+	/* Allocate private eth device data. */
 	priv = rte_zmalloc("ethdev private structure",
 			   sizeof(*priv),
 			   RTE_CACHE_LINE_SIZE);
@@ -987,13 +1094,11 @@ struct mlx5_dev_spawn_data {
 		err = ENOMEM;
 		goto error;
 	}
-	priv->ctx = ctx;
-	strncpy(priv->ibdev_name, priv->ctx->device->name,
-		sizeof(priv->ibdev_name));
-	strncpy(priv->ibdev_path, priv->ctx->device->ibdev_path,
-		sizeof(priv->ibdev_path));
-	priv->device_attr = attr;
-	priv->pd = pd;
+	priv->sh = sh;
+	priv->ctx = sh->ctx;
+	priv->ibv_port = spawn->ibv_port;
+	priv->device_attr = sh->device_attr;
+	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
@@ -1062,7 +1167,8 @@ struct mlx5_dev_spawn_data {
 			strerror(rte_errno));
 		goto error;
 	}
-	config.hw_csum = !!(attr.device_cap_flags_ex & IBV_DEVICE_RAW_IP_CSUM);
+	config.hw_csum = !!(sh->device_attr.device_cap_flags_ex &
+			    IBV_DEVICE_RAW_IP_CSUM);
 	DRV_LOG(DEBUG, "checksum offloading is %ssupported",
 		(config.hw_csum ? "" : "not "));
 #if !defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42) && \
@@ -1076,7 +1182,7 @@ struct mlx5_dev_spawn_data {
 	}
 #endif
 	config.ind_table_max_size =
-		attr.rss_caps.max_rwq_indirection_table_size;
+		sh->device_attr.rss_caps.max_rwq_indirection_table_size;
 	/*
 	 * Remove this check once DPDK supports larger/variable
 	 * indirection tables.
@@ -1085,18 +1191,18 @@ struct mlx5_dev_spawn_data {
 		config.ind_table_max_size = ETH_RSS_RETA_SIZE_512;
 	DRV_LOG(DEBUG, "maximum Rx indirection table size is %u",
 		config.ind_table_max_size);
-	config.hw_vlan_strip = !!(attr.raw_packet_caps &
+	config.hw_vlan_strip = !!(sh->device_attr.raw_packet_caps &
 				  IBV_RAW_PACKET_CAP_CVLAN_STRIPPING);
 	DRV_LOG(DEBUG, "VLAN stripping is %ssupported",
 		(config.hw_vlan_strip ? "" : "not "));
-	config.hw_fcs_strip = !!(attr.raw_packet_caps &
+	config.hw_fcs_strip = !!(sh->device_attr.raw_packet_caps &
 				 IBV_RAW_PACKET_CAP_SCATTER_FCS);
 	DRV_LOG(DEBUG, "FCS stripping configuration is %ssupported",
 		(config.hw_fcs_strip ? "" : "not "));
 #if defined(HAVE_IBV_WQ_FLAG_RX_END_PADDING)
-	hw_padding = !!attr.rx_pad_end_addr_align;
+	hw_padding = !!sh->device_attr.rx_pad_end_addr_align;
 #elif defined(HAVE_IBV_WQ_FLAGS_PCI_WRITE_END_PADDING)
-	hw_padding = !!(attr.device_cap_flags_ex &
+	hw_padding = !!(sh->device_attr.device_cap_flags_ex &
 			IBV_DEVICE_PCI_WRITE_END_PADDING);
 #endif
 	if (config.hw_padding && !hw_padding) {
@@ -1105,11 +1211,11 @@ struct mlx5_dev_spawn_data {
 	} else if (config.hw_padding) {
 		DRV_LOG(DEBUG, "Rx end alignment padding is enabled");
 	}
-	config.tso = (attr.tso_caps.max_tso > 0 &&
-		      (attr.tso_caps.supported_qpts &
+	config.tso = (sh->device_attr.tso_caps.max_tso > 0 &&
+		      (sh->device_attr.tso_caps.supported_qpts &
 		       (1 << IBV_QPT_RAW_PACKET)));
 	if (config.tso)
-		config.tso_max_payload_sz = attr.tso_caps.max_tso;
+		config.tso_max_payload_sz = sh->device_attr.tso_caps.max_tso;
 	/*
 	 * MPW is disabled by default, while the Enhanced MPW is enabled
 	 * by default.
@@ -1250,7 +1356,8 @@ struct mlx5_dev_spawn_data {
 		.free = &mlx5_free_verbs_buf,
 		.data = priv,
 	};
-	mlx5_glue->dv_set_context_attr(ctx, MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
+	mlx5_glue->dv_set_context_attr(sh->ctx,
+				       MLX5DV_CTX_ATTR_BUF_ALLOCATORS,
 				       (void *)((uintptr_t)&alctr));
 	/* Bring Ethernet device up. */
 	DRV_LOG(DEBUG, "port %u forcing Ethernet interface up",
@@ -1304,15 +1411,13 @@ struct mlx5_dev_spawn_data {
 		if (eth_dev != NULL)
 			eth_dev->data->dev_private = NULL;
 	}
-	if (pd)
-		claim_zero(mlx5_glue->dealloc_pd(pd));
 	if (eth_dev != NULL) {
 		/* mac_addrs must not be freed alone because part of dev_private */
 		eth_dev->data->mac_addrs = NULL;
 		rte_eth_dev_release_port(eth_dev);
 	}
-	if (ctx)
-		claim_zero(mlx5_glue->close_device(ctx));
+	if (sh)
+		mlx5_free_shared_ibctx(sh);
 	assert(err > 0);
 	rte_errno = err;
 	return NULL;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d69f589..6882460 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -187,10 +187,42 @@ struct mlx5_drop {
 
 struct mlx5_flow_tcf_context;
 
+/* Per port data of shared IB device. */
+struct mlx5_ibv_shared_port {
+	uint32_t ih_port_id;
+	/*
+	 * Interrupt handler port_id. Used by shared interrupt
+	 * handler to find the corresponding rte_eth device
+	 * by IB port index. If value is equal or greater
+	 * RTE_MAX_ETHPORTS it means there is no subhandler
+	 * installed for specified IB port index.
+	 */
+};
+
+/*
+ * Shared Infiniband device context for Master/Representors
+ * which belong to same IB device with multiple IB ports.
+ **/
+struct mlx5_ibv_shared {
+	LIST_ENTRY(mlx5_ibv_shared) next;
+	uint32_t refcnt;
+	uint32_t devx:1; /* Opened with DV. */
+	uint32_t max_port; /* Maximal IB device port index. */
+	struct ibv_context *ctx; /* Verbs/DV context. */
+	struct ibv_pd *pd; /* Protection Domain. */
+	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
+	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
+	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
+	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
+};
+
 struct mlx5_priv {
 	LIST_ENTRY(mlx5_priv) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
+	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
+	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 06/13] net/mlx5: switch to the names in the shared IB context
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (4 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
                           ` (7 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The IB device names are moved from device private data
to the shared context, code involving the names is updated.
The IB port index treatment is added where it is relevant.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.h        |  2 --
 drivers/net/mlx5/mlx5_ethdev.c | 10 +++++++---
 drivers/net/mlx5/mlx5_stats.c  | 22 +++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6882460..56270a6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -226,8 +226,6 @@ struct mlx5_priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
-	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2b83898..0662594 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -138,8 +138,10 @@ struct ethtool_link_settings {
 	unsigned int dev_port_prev = ~0u;
 	char match[IF_NAMESIZE] = "";
 
+	assert(priv);
+	assert(priv->sh);
 	{
-		MKSTR(path, "%s/device/net", priv->ibdev_path);
+		MKSTR(path, "%s/device/net", priv->sh->ibdev_path);
 
 		dir = opendir(path);
 		if (dir == NULL) {
@@ -159,7 +161,7 @@ struct ethtool_link_settings {
 			continue;
 
 		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ibdev_path, name,
+		      priv->sh->ibdev_path, name,
 		      (dev_type ? "dev_id" : "dev_port"));
 
 		file = fopen(path, "rb");
@@ -222,7 +224,9 @@ struct ethtool_link_settings {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	unsigned int ifindex =
 		priv->nl_socket_rdma >= 0 ?
-		mlx5_nl_ifindex(priv->nl_socket_rdma, priv->ibdev_name, 1) : 0;
+		mlx5_nl_ifindex(priv->nl_socket_rdma,
+				priv->sh->ibdev_name,
+				priv->ibv_port) : 0;
 
 	if (!ifindex) {
 		if (!priv->representor)
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 6906dc8..5af199d 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -140,18 +140,22 @@
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name, uint64_t *stat)
 {
 	FILE *file;
-	MKSTR(path, "%s/ports/1/hw_counters/%s",
-		  priv->ibdev_path,
-		  ctr_name);
+	if (priv->sh) {
+		MKSTR(path, "%s/ports/%d/hw_counters/%s",
+			  priv->sh->ibdev_path,
+			  priv->ibv_port,
+			  ctr_name);
 
-	file = fopen(path, "rb");
-	if (file) {
-		int n = fscanf(file, "%" SCNu64, stat);
+		file = fopen(path, "rb");
+		if (file) {
+			int n = fscanf(file, "%" SCNu64, stat);
 
-		fclose(file);
-		if (n != 1)
-			stat = 0;
+			fclose(file);
+			if (n == 1)
+				return;
+		}
 	}
+	*stat = 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 07/13] net/mlx5: switch to the shared Protection Domain
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (5 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
                           ` (6 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The PMD code is updated to use Protected Domain from the
the shared IB device context. The Domain is shared between
all devices belonging to the same multiport Infiniband device.
If IB device has only one port, the PD is not shared, because
there is only ethernet device created over IB one.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c     |  1 -
 drivers/net/mlx5/mlx5.h     |  1 -
 drivers/net/mlx5/mlx5_mr.c  |  4 ++--
 drivers/net/mlx5/mlx5_rxq.c | 10 +++++-----
 drivers/net/mlx5/mlx5_txq.c |  2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 988fff7..e458955 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1098,7 +1098,6 @@ struct mlx5_dev_spawn_data {
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->device_attr = sh->device_attr;
-	priv->pd = sh->pd;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 56270a6..4213866 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -225,7 +225,6 @@ struct mlx5_priv {
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 700d83d..f7eb9a5 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -719,7 +719,7 @@ struct mr_update_mp_data {
 	 * mlx5_alloc_buf_extern() which eventually calls rte_malloc_socket()
 	 * through mlx5_alloc_verbs_buf().
 	 */
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)data.start, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)data.start, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DEBUG("port %u fail to create a verbs MR for address (%p)",
@@ -1156,7 +1156,7 @@ struct mr_update_mp_data {
 	}
 	DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)",
 		dev->data->port_id, mem_idx, mp->name);
-	mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len,
+	mr->ibv_mr = mlx5_glue->reg_mr(priv->sh->pd, (void *)addr, len,
 				       IBV_ACCESS_LOCAL_WRITE);
 	if (mr->ibv_mr == NULL) {
 		DRV_LOG(WARNING,
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2f60999..0496c4e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -867,7 +867,7 @@ struct mlx5_rxq_ibv *
 		.max_wr = wqe_n >> rxq_data->sges_n,
 		/* Max number of scatter/gather elements in a WR. */
 		.max_sge = 1 << rxq_data->sges_n,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.cq = tmpl->cq,
 		.comp_mask =
 			IBV_WQ_FLAGS_CVLAN_STRIPPING |
@@ -1831,7 +1831,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 },
 		 &qp_init_attr);
 #else
@@ -1850,7 +1850,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = hash_fields,
 			},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 		 });
 #endif
 	if (!qp) {
@@ -2006,7 +2006,7 @@ struct mlx5_rxq_ibv *
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
 			.max_sge = 1,
-			.pd = priv->pd,
+			.pd = priv->sh->pd,
 			.cq = cq,
 		 });
 	if (!wq) {
@@ -2160,7 +2160,7 @@ struct mlx5_hrxq *
 				.rx_hash_fields_mask = 0,
 				},
 			.rwq_ind_tbl = ind_tbl->ind_table,
-			.pd = priv->pd
+			.pd = priv->sh->pd
 		 });
 	if (!qp) {
 		DEBUG("port %u cannot allocate QP for drop queue",
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d185617..d3a5498 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -426,7 +426,7 @@ struct mlx5_txq_ibv *
 		 * Tx burst.
 		 */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
+		.pd = priv->sh->pd,
 		.comp_mask = IBV_QP_INIT_ATTR_PD,
 	};
 	if (txq_data->max_inline)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 08/13] net/mlx5: switch to the shared context IB attributes
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (6 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
                           ` (5 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device attributes,
located in the shared IB context. It saves some memory if
there are representors created over the single Infiniband
device with multuple ports.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  1 -
 drivers/net/mlx5/mlx5.h        |  1 -
 drivers/net/mlx5/mlx5_ethdev.c |  6 +++---
 drivers/net/mlx5/mlx5_rxq.c    |  8 ++++----
 drivers/net/mlx5/mlx5_txq.c    | 14 +++++++-------
 5 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e458955..436d4a8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1097,7 +1097,6 @@ struct mlx5_dev_spawn_data {
 	priv->sh = sh;
 	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
-	priv->device_attr = sh->device_attr;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
 	/* Initialize UAR access locks for 32bit implementations. */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4213866..d74103f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -224,7 +224,6 @@ struct mlx5_priv {
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
 	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr_ex device_attr; /* Device properties. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0662594..7003c32 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -515,8 +515,8 @@ struct ethtool_link_settings {
 	 * Since we need one CQ per QP, the limit is the minimum number
 	 * between the two values.
 	 */
-	max = RTE_MIN(priv->device_attr.orig_attr.max_cq,
-		      priv->device_attr.orig_attr.max_qp);
+	max = RTE_MIN(priv->sh->device_attr.orig_attr.max_cq,
+		      priv->sh->device_attr.orig_attr.max_qp);
 	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
 	if (max >= 65535)
 		max = 65535;
@@ -577,7 +577,7 @@ struct ethtool_link_settings {
 int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_device_attr *attr = &priv->device_attr.orig_attr;
+	struct ibv_device_attr *attr = &priv->sh->device_attr.orig_attr;
 	size_t size = strnlen(attr->fw_ver, sizeof(attr->fw_ver)) + 1;
 
 	if (fw_size < size)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0496c4e..fd1c3a2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -856,10 +856,10 @@ struct mlx5_rxq_ibv *
 		rte_errno = ENOMEM;
 		goto error;
 	}
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	attr.wq.ibv = (struct ibv_wq_init_attr){
 		.wq_context = NULL, /* Could be useful in the future. */
 		.wq_type = IBV_WQT_RQ,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index d3a5498..10a3040 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -407,15 +407,15 @@ struct mlx5_txq_ibv *
 		.cap = {
 			/* Max number of outstanding WRs. */
 			.max_send_wr =
-				((priv->device_attr.orig_attr.max_qp_wr <
+				((priv->sh->device_attr.orig_attr.max_qp_wr <
 				  desc) ?
-				 priv->device_attr.orig_attr.max_qp_wr :
+				 priv->sh->device_attr.orig_attr.max_qp_wr :
 				 desc),
 			/*
 			 * Max number of scatter/gather elements in a WR,
 			 * must be 1 to prevent libmlx5 from trying to affect
 			 * too much memory. TX gather is not impacted by the
-			 * priv->device_attr.max_sge limit and will still work
+			 * device_attr.max_sge limit and will still work
 			 * properly.
 			 */
 			.max_send_sge = 1,
@@ -780,10 +780,10 @@ struct mlx5_txq_ctrl *
 	tmpl->txq.elts_n = log2above(desc);
 	tmpl->idx = idx;
 	txq_set_params(tmpl);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_qp_wr is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_qp_wr);
-	DRV_LOG(DEBUG, "port %u priv->device_attr.max_sge is %d",
-		dev->data->port_id, priv->device_attr.orig_attr.max_sge);
+	DRV_LOG(DEBUG, "port %u device_attr.max_qp_wr is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_qp_wr);
+	DRV_LOG(DEBUG, "port %u device_attr.max_sge is %d",
+		dev->data->port_id, priv->sh->device_attr.orig_attr.max_sge);
 	tmpl->txq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->txq.elts_n])(tmpl + 1);
 	tmpl->txq.stats.idx = idx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 09/13] net/mlx5: switch to the shared IB device context
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (7 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-04-02  4:49           ` Shahaf Shuler
  2019-03-27 13:15         ` [PATCH v4 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
                           ` (4 subsequent siblings)
  13 siblings, 1 reply; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to use the shared IB device context and
device handles. The IB device context is shared between
reprentors created over the single multiport IB device. All
Verbs and DevX objects will be created whithin this shared context.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c            |  3 +--
 drivers/net/mlx5/mlx5.h            |  1 -
 drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
 drivers/net/mlx5/mlx5_flow_dv.c    |  8 ++++----
 drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
 drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
 drivers/net/mlx5/mlx5_socket.c     |  4 ++--
 drivers/net/mlx5/mlx5_trigger.c    |  2 +-
 drivers/net/mlx5/mlx5_txq.c        |  4 ++--
 9 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 436d4a8..646dd52 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -402,7 +402,7 @@ struct mlx5_dev_spawn_data {
 
 	DRV_LOG(DEBUG, "port %u closing device \"%s\"",
 		dev->data->port_id,
-		((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+		((priv->sh->ctx != NULL) ? priv->sh->ctx->device->name : ""));
 	/* In case mlx5_dev_stop() has not been called. */
 	mlx5_dev_interrupt_handler_uninstall(dev);
 	mlx5_traffic_disable(dev);
@@ -1095,7 +1095,6 @@ struct mlx5_dev_spawn_data {
 		goto error;
 	}
 	priv->sh = sh;
-	priv->ctx = sh->ctx;
 	priv->ibv_port = spawn->ibv_port;
 	priv->mtu = ETHER_MTU;
 #ifndef RTE_ARCH_64
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d74103f..5045648 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -223,7 +223,6 @@ struct mlx5_priv {
 	struct rte_eth_dev_data *dev_data;  /* Pointer to device data. */
 	struct mlx5_ibv_shared *sh; /* Shared IB device context. */
 	uint32_t ibv_port; /* IB device port number. */
-	struct ibv_context *ctx; /* Verbs context. */
 	struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES]; /* MAC addresses. */
 	BITFIELD_DECLARE(mac_own, uint64_t, MLX5_MAX_MAC_ADDRESSES);
 	/* Bit-field of MAC addresses owned by the PMD. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7003c32..2ad3231 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1055,7 +1055,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	/* Read all message and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->ctx, &event))
+		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 			event.event_type == IBV_EVENT_PORT_ERR) &&
@@ -1142,12 +1142,13 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
 	int flags;
 
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	ret = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	assert(ctx->async_fd > 0);
+	flags = fcntl(ctx->async_fd, F_GETFL);
+	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (ret) {
 		DRV_LOG(INFO,
 			"port %u failed to change file descriptor async event"
@@ -1158,7 +1159,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	}
 	if (dev->data->dev_conf.intr_conf.lsc ||
 	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
+		priv->intr_handle.fd = ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rte_intr_callback_register(&priv->intr_handle,
 					   mlx5_dev_interrupt_handler, dev);
@@ -1303,7 +1304,7 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 	struct ibv_device_attr device_attr;
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (mlx5_glue->query_device(priv->ctx, &device_attr) == EIO)
+	if (mlx5_glue->query_device(priv->sh->ctx, &device_attr) == EIO)
 		return 1;
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index ebcdd15..d4a1149 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -836,7 +836,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_packet_reformat
-			(priv->ctx, cache_resource->size,
+			(priv->sh->ctx, cache_resource->size,
 			 (cache_resource->size ? cache_resource->buf : NULL),
 			 cache_resource->reformat_type,
 			 cache_resource->ft_type);
@@ -1468,7 +1468,7 @@ struct field_modify_info modify_tcp[] = {
 	*cache_resource = *resource;
 	cache_resource->verbs_action =
 		mlx5_glue->dv_create_flow_action_modify_header
-					(priv->ctx,
+					(priv->sh->ctx,
 					 cache_resource->actions_num *
 					 sizeof(cache_resource->actions[0]),
 					 (uint64_t *)cache_resource->actions,
@@ -1528,7 +1528,7 @@ struct field_modify_info modify_tcp[] = {
 		ret = -ENOMEM;
 		goto error_exit;
 	}
-	ret = mlx5_devx_cmd_flow_counter_alloc(priv->ctx, dcs);
+	ret = mlx5_devx_cmd_flow_counter_alloc(priv->sh->ctx, dcs);
 	if (ret)
 		goto error_exit;
 	struct mlx5_flow_counter tmpl = {
@@ -2787,7 +2787,7 @@ struct field_modify_info modify_tcp[] = {
 	if (matcher->egress)
 		dv_attr.flags |= IBV_FLOW_ATTR_FLAGS_EGRESS;
 	cache_matcher->matcher_object =
-		mlx5_glue->dv_create_flow_matcher(priv->ctx, &dv_attr);
+		mlx5_glue->dv_create_flow_matcher(priv->sh->ctx, &dv_attr);
 	if (!cache_matcher->matcher_object) {
 		rte_free(cache_matcher);
 		return rte_flow_error_set(error, ENOMEM,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 6c4f52f..9548d12 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -56,10 +56,11 @@
 {
 #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context ctx = priv->sh->ctx;
 	struct ibv_counter_set_init_attr init = {
 			 .counter_set_id = counter->id};
 
-	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
@@ -67,12 +68,13 @@
 	return 0;
 #elif defined(HAVE_IBV_DEVICE_COUNTERS_SET_V45)
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_counters_init_attr init = {0};
 	struct ibv_counter_attach_attr attach;
 	int ret;
 
 	memset(&attach, 0, sizeof(attach));
-	counter->cs = mlx5_glue->create_counters(priv->ctx, &init);
+	counter->cs = mlx5_glue->create_counters(ctx, &init);
 	if (!counter->cs) {
 		rte_errno = ENOTSUP;
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index fd1c3a2..dcb97c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -799,7 +799,7 @@ struct mlx5_rxq_ibv *
 	}
 	tmpl->rxq_ctrl = rxq_ctrl;
 	if (rxq_ctrl->irq) {
-		tmpl->channel = mlx5_glue->create_comp_channel(priv->ctx);
+		tmpl->channel = mlx5_glue->create_comp_channel(priv->sh->ctx);
 		if (!tmpl->channel) {
 			DRV_LOG(ERR, "port %u: comp channel creation failure",
 				dev->data->port_id);
@@ -848,7 +848,7 @@ struct mlx5_rxq_ibv *
 	}
 #endif
 	tmpl->cq = mlx5_glue->cq_ex_to_cq
-		(mlx5_glue->dv_create_cq(priv->ctx, &attr.cq.ibv,
+		(mlx5_glue->dv_create_cq(priv->sh->ctx, &attr.cq.ibv,
 					 &attr.cq.mlx5));
 	if (tmpl->cq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u CQ creation failure",
@@ -905,10 +905,10 @@ struct mlx5_rxq_ibv *
 			.two_byte_shift_en = MLX5_MPRQ_TWO_BYTE_SHIFT,
 		};
 	}
-	tmpl->wq = mlx5_glue->dv_create_wq(priv->ctx, &attr.wq.ibv,
+	tmpl->wq = mlx5_glue->dv_create_wq(priv->sh->ctx, &attr.wq.ibv,
 					   &attr.wq.mlx5);
 #else
-	tmpl->wq = mlx5_glue->create_wq(priv->ctx, &attr.wq.ibv);
+	tmpl->wq = mlx5_glue->create_wq(priv->sh->ctx, &attr.wq.ibv);
 #endif
 	if (tmpl->wq == NULL) {
 		DRV_LOG(ERR, "port %u Rx queue %u WQ creation failure",
@@ -1643,7 +1643,7 @@ struct mlx5_ind_table_ibv *
 	for (j = 0; i != (unsigned int)(1 << wq_n); ++i, ++j)
 		wq[i] = wq[j];
 	ind_tbl->ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = wq_n,
 			.ind_tbl = wq,
@@ -1817,7 +1817,7 @@ struct mlx5_hrxq *
 	}
 #endif
 	qp = mlx5_glue->dv_create_qp
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1836,7 +1836,7 @@ struct mlx5_hrxq *
 		 &qp_init_attr);
 #else
 	qp = mlx5_glue->create_qp_ex
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
@@ -1988,20 +1988,21 @@ struct mlx5_rxq_ibv *
 mlx5_rxq_ibv_drop_new(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
+	struct ibv_context *ctx = priv->sh->ctx;
 	struct ibv_cq *cq;
 	struct ibv_wq *wq = NULL;
 	struct mlx5_rxq_ibv *rxq;
 
 	if (priv->drop_queue.rxq)
 		return priv->drop_queue.rxq;
-	cq = mlx5_glue->create_cq(priv->ctx, 1, NULL, NULL, 0);
+	cq = mlx5_glue->create_cq(ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		DEBUG("port %u cannot allocate CQ for drop queue",
 		      dev->data->port_id);
 		rte_errno = errno;
 		goto error;
 	}
-	wq = mlx5_glue->create_wq(priv->ctx,
+	wq = mlx5_glue->create_wq(ctx,
 		 &(struct ibv_wq_init_attr){
 			.wq_type = IBV_WQT_RQ,
 			.max_wr = 1,
@@ -2078,7 +2079,7 @@ struct mlx5_ind_table_ibv *
 	if (!rxq)
 		return NULL;
 	tmpl.ind_table = mlx5_glue->create_rwq_ind_table
-		(priv->ctx,
+		(priv->sh->ctx,
 		 &(struct ibv_rwq_ind_table_init_attr){
 			.log_ind_tbl_size = 0,
 			.ind_tbl = &rxq->wq,
@@ -2145,7 +2146,7 @@ struct mlx5_hrxq *
 	ind_tbl = mlx5_ind_table_ibv_drop_new(dev);
 	if (!ind_tbl)
 		return NULL;
-	qp = mlx5_glue->create_qp_ex(priv->ctx,
+	qp = mlx5_glue->create_qp_ex(priv->sh->ctx,
 		 &(struct ibv_qp_init_attr_ex){
 			.qp_type = IBV_QPT_RAW_PACKET,
 			.comp_mask =
diff --git a/drivers/net/mlx5/mlx5_socket.c b/drivers/net/mlx5/mlx5_socket.c
index 41cac3c..8fa6430 100644
--- a/drivers/net/mlx5/mlx5_socket.c
+++ b/drivers/net/mlx5/mlx5_socket.c
@@ -185,9 +185,9 @@
 	assert(cmsg != NULL);
 	cmsg->cmsg_level = SOL_SOCKET;
 	cmsg->cmsg_type = SCM_RIGHTS;
-	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->ctx->cmd_fd));
+	cmsg->cmsg_len = CMSG_LEN(sizeof(priv->sh->ctx->cmd_fd));
 	fd = (int *)CMSG_DATA(cmsg);
-	*fd = priv->ctx->cmd_fd;
+	*fd = priv->sh->ctx->cmd_fd;
 	ret = sendmsg(conn_sock, &msg, 0);
 	if (ret < 0)
 		DRV_LOG(WARNING, "port %u cannot send response",
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 2137bdc..d13a1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -58,7 +58,7 @@
 			goto error;
 		}
 	}
-	ret = mlx5_tx_uar_remap(dev, priv->ctx->cmd_fd);
+	ret = mlx5_tx_uar_remap(dev, priv->sh->ctx->cmd_fd);
 	if (ret) {
 		/* Adjust index for rollback. */
 		i = priv->txqs_n - 1;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 10a3040..25305b6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -392,7 +392,7 @@ struct mlx5_txq_ibv *
 		((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
 	if (is_empw_burst_func(tx_pkt_burst))
 		cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
-	tmpl.cq = mlx5_glue->create_cq(priv->ctx, cqe_n, NULL, NULL, 0);
+	tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
 			dev->data->port_id, idx);
@@ -435,7 +435,7 @@ struct mlx5_txq_ibv *
 		attr.init.max_tso_header = txq_ctrl->max_tso_header;
 		attr.init.comp_mask |= IBV_QP_INIT_ATTR_MAX_TSO_HEADER;
 	}
-	tmpl.qp = mlx5_glue->create_qp_ex(priv->ctx, &attr.init);
+	tmpl.qp = mlx5_glue->create_qp_ex(priv->sh->ctx, &attr.init);
 	if (tmpl.qp == NULL) {
 		DRV_LOG(ERR, "port %u Tx queue %u QP creation failure",
 			dev->data->port_id, idx);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 10/13] net/mlx5: provide IB port for the object being created
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (8 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
                           ` (3 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

The code is updated to provide IB port index for the Verbs
objects being created - QPs and Verbs Flows.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c       | 2 ++
 drivers/net/mlx5/mlx5_flow_verbs.c | 1 +
 drivers/net/mlx5/mlx5_txq.c        | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index dea38e2..d09fdff 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -315,6 +315,7 @@ struct mlx5_flow_tunnel_info {
 int
 mlx5_flow_discover_priorities(struct rte_eth_dev *dev)
 {
+	struct mlx5_priv *priv = dev->data->dev_private;
 	struct {
 		struct ibv_flow_attr attr;
 		struct ibv_flow_spec_eth eth;
@@ -322,6 +323,7 @@ struct mlx5_flow_tunnel_info {
 	} flow_attr = {
 		.attr = {
 			.num_of_specs = 2,
+			.port = (uint8_t)priv->ibv_port,
 		},
 		.eth = {
 			.type = IBV_FLOW_SPEC_ETH,
diff --git a/drivers/net/mlx5/mlx5_flow_verbs.c b/drivers/net/mlx5/mlx5_flow_verbs.c
index 9548d12..b69af6c 100644
--- a/drivers/net/mlx5/mlx5_flow_verbs.c
+++ b/drivers/net/mlx5/mlx5_flow_verbs.c
@@ -1550,6 +1550,7 @@
 	dev_flow->layers = item_flags;
 	dev_flow->verbs.attr->priority =
 		mlx5_flow_adjust_priority(dev, priority, subpriority);
+	dev_flow->verbs.attr->port = (uint8_t)priv->ibv_port;
 	return 0;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 25305b6..5062f5c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -445,8 +445,8 @@ struct mlx5_txq_ibv *
 	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = 1,
+		/* IB device port number. */
+		.port_num = (uint8_t)priv->ibv_port,
 	};
 	ret = mlx5_glue->modify_qp(tmpl.qp, &attr.mod,
 				   (IBV_QP_STATE | IBV_QP_PORT));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 11/13] net/mlx5: update install/uninstall int handler routines
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (9 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
                           ` (2 subsequent siblings)
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

We are implementing the support for multport Infiniband device
withj representors attached to these multiple ports. Asynchronous
device event notifications (link status change, removal event, etc.)
should be shared between ports. We are going to implement shared
event handler and this patch introduces appropriate device
structure changes and updated event handler install and uninstall
routines.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5.c        |  19 ++++++-
 drivers/net/mlx5/mlx5.h        |   3 +-
 drivers/net/mlx5/mlx5_ethdev.c | 118 ++++++++++++++++++++++++++++++++---------
 3 files changed, 112 insertions(+), 28 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 646dd52..7d1a21e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -166,6 +166,7 @@ struct mlx5_dev_spawn_data {
 {
 	struct mlx5_ibv_shared *sh;
 	int err = 0;
+	uint32_t i;
 
 	assert(spawn);
 	/* Secondary process should not create the shared context. */
@@ -215,6 +216,14 @@ struct mlx5_dev_spawn_data {
 		sizeof(sh->ibdev_name));
 	strncpy(sh->ibdev_path, sh->ctx->device->ibdev_path,
 		sizeof(sh->ibdev_path));
+	pthread_mutex_init(&sh->intr_mutex, NULL);
+	/*
+	 * Setting port_id to max unallowed value means
+	 * there is no interrupt subhandler installed for
+	 * the given port index i.
+	 */
+	for (i = 0; i < sh->max_port; i++)
+		sh->port[i].ih_port_id = RTE_MAX_ETHPORTS;
 	sh->pd = mlx5_glue->alloc_pd(sh->ctx);
 	if (sh->pd == NULL) {
 		DRV_LOG(ERR, "PD allocation failure");
@@ -269,6 +278,15 @@ struct mlx5_dev_spawn_data {
 	if (--sh->refcnt)
 		goto exit;
 	LIST_REMOVE(sh, next);
+	/*
+	 *  Ensure there is no async event handler installed.
+	 *  Only primary process handles async device events.
+	 **/
+	assert(!sh->intr_cnt);
+	if (sh->intr_cnt)
+		rte_intr_callback_unregister
+			(&sh->intr_handle, mlx5_dev_interrupt_handler, sh);
+	pthread_mutex_destroy(&sh->intr_mutex);
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->ctx)
@@ -278,7 +296,6 @@ struct mlx5_dev_spawn_data {
 	pthread_mutex_unlock(&mlx5_ibv_list_mutex);
 }
 
-
 /**
  * Prepare shared data between primary and secondary process.
  */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 5045648..7402798 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -213,6 +213,8 @@ struct mlx5_ibv_shared {
 	char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */
 	char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for secondary */
 	struct ibv_device_attr_ex device_attr; /* Device properties. */
+	pthread_mutex_t intr_mutex; /* Interrupt config mutex. */
+	uint32_t intr_cnt; /* Interrupt handler reference counter. */
 	struct rte_intr_handle intr_handle; /* Interrupt handler for device. */
 	struct mlx5_ibv_shared_port port[]; /* per device port data array. */
 };
@@ -243,7 +245,6 @@ struct mlx5_priv {
 	struct mlx5_txq_data *(*txqs)[]; /* TX queues. */
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct rte_eth_rss_conf rss_conf; /* RSS configuration. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
 	unsigned int (*reta_idx)[]; /* RETA index table. */
 	unsigned int reta_idx_n; /* RETA index size. */
 	struct mlx5_drop drop_queue; /* Flow drop queues. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 2ad3231..55ccbce 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1109,6 +1109,96 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
+ * Uninstall shared asynchronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_uninstall(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id >= RTE_MAX_ETHPORTS)
+		goto exit;
+	assert(sh->port[priv->ibv_port - 1].ih_port_id ==
+					(uint32_t)dev->data->port_id);
+	assert(sh->intr_cnt);
+	sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+	if (!sh->intr_cnt || --sh->intr_cnt)
+		goto exit;
+	rte_intr_callback_unregister(&sh->intr_handle,
+				     mlx5_dev_interrupt_handler, sh);
+	sh->intr_handle.fd = 0;
+	sh->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
+ * Install shared asyncronous device events handler.
+ * This function is implemeted to support event sharing
+ * between multiple ports of single IB device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+static void
+mlx5_dev_shared_handler_install(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = priv->sh;
+	int ret;
+	int flags;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+	pthread_mutex_lock(&sh->intr_mutex);
+	assert(priv->ibv_port);
+	assert(priv->ibv_port <= sh->max_port);
+	assert(dev->data->port_id < RTE_MAX_ETHPORTS);
+	if (sh->port[priv->ibv_port - 1].ih_port_id < RTE_MAX_ETHPORTS) {
+		/* The handler is already installed for this port. */
+		assert(sh->intr_cnt);
+		goto exit;
+	}
+	sh->port[priv->ibv_port - 1].ih_port_id = (uint32_t)dev->data->port_id;
+	if (sh->intr_cnt) {
+		sh->intr_cnt++;
+		goto exit;
+	}
+	/* No shared handler installed. */
+	assert(sh->ctx->async_fd > 0);
+	flags = fcntl(sh->ctx->async_fd, F_GETFL);
+	ret = fcntl(sh->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(INFO, "failed to change file descriptor"
+			      " async event queue");
+		/* Indicate there will be no interrupts. */
+		dev->data->dev_conf.intr_conf.lsc = 0;
+		dev->data->dev_conf.intr_conf.rmv = 0;
+		sh->port[priv->ibv_port - 1].ih_port_id = RTE_MAX_ETHPORTS;
+		goto exit;
+	}
+	sh->intr_handle.fd = sh->ctx->async_fd;
+	sh->intr_handle.type = RTE_INTR_HANDLE_EXT;
+	rte_intr_callback_register(&sh->intr_handle,
+				   mlx5_dev_interrupt_handler, sh);
+	sh->intr_cnt++;
+exit:
+	pthread_mutex_unlock(&sh->intr_mutex);
+}
+
+/**
  * Uninstall interrupt handler.
  *
  * @param dev
@@ -1119,15 +1209,10 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv)
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     mlx5_dev_interrupt_handler, dev);
+	mlx5_dev_shared_handler_uninstall(dev);
 	if (priv->primary_socket)
 		rte_intr_callback_unregister(&priv->intr_handle_socket,
 					     mlx5_dev_handler_socket, dev);
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 	priv->intr_handle_socket.fd = 0;
 	priv->intr_handle_socket.type = RTE_INTR_HANDLE_UNKNOWN;
 }
@@ -1142,28 +1227,9 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 mlx5_dev_interrupt_handler_install(struct rte_eth_dev *dev)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
-	struct ibv_context *ctx = priv->sh->ctx;
 	int ret;
-	int flags;
 
-	assert(ctx->async_fd > 0);
-	flags = fcntl(ctx->async_fd, F_GETFL);
-	ret = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (ret) {
-		DRV_LOG(INFO,
-			"port %u failed to change file descriptor async event"
-			" queue",
-			dev->data->port_id);
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-	}
-	if (dev->data->dev_conf.intr_conf.lsc ||
-	    dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_handle.fd = ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
-		rte_intr_callback_register(&priv->intr_handle,
-					   mlx5_dev_interrupt_handler, dev);
-	}
+	mlx5_dev_shared_handler_install(dev);
 	ret = mlx5_socket_init(dev);
 	if (ret)
 		DRV_LOG(ERR, "port %u cannot initialise socket: %s",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 12/13] net/mlx5: update event handler for multiport IB devices
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (10 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-27 13:15         ` [PATCH v4 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
  2019-03-28  9:21         ` [PATCH v4 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

This patch modifies asynchronous event handler to support multiport
Infiniband devices. Handler queries the event parameters, including
event source port index, and invokes the handler for specific
devices with appropriate port_id.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 55ccbce..7273bd9 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1032,66 +1032,67 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char *fw_ver, size_t fw_size)
 }
 
 /**
- * Device status handler.
+ * Handle shared asynchronous events the NIC (removal event
+ * and link status change). Supports multiport IB device.
  *
- * @param dev
- *   Pointer to Ethernet device.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Events bitmap of callback process which can be called immediately.
+ * @param cb_arg
+ *   Callback argument.
  */
-static uint32_t
-mlx5_dev_status_handler(struct rte_eth_dev *dev)
+void
+mlx5_dev_interrupt_handler(void *cb_arg)
 {
-	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_ibv_shared *sh = cb_arg;
 	struct ibv_async_event event;
-	uint32_t ret = 0;
 
-	if (mlx5_link_update(dev, 0) == -EAGAIN) {
-		usleep(0);
-		return 0;
-	}
-	/* Read all message and acknowledge them. */
+	/* Read all message from the IB device and acknowledge them. */
 	for (;;) {
-		if (mlx5_glue->get_async_event(priv->sh->ctx, &event))
+		struct rte_eth_dev *dev;
+		uint32_t tmp;
+
+		if (mlx5_glue->get_async_event(sh->ctx, &event))
 			break;
+		/* Retrieve and check IB port index. */
+		tmp = (uint32_t)event.element.port_num;
+		assert(tmp && (tmp <= sh->max_port));
+		if (!tmp ||
+		    tmp > sh->max_port ||
+		    sh->port[tmp - 1].ih_port_id >= RTE_MAX_ETHPORTS) {
+			/*
+			 * Invalid IB port index or no handler
+			 * installed for this port.
+			 */
+			mlx5_glue->ack_async_event(&event);
+			continue;
+		}
+		/* Retrieve ethernet device descriptor. */
+		tmp = sh->port[tmp - 1].ih_port_id;
+		dev = &rte_eth_devices[tmp];
+		tmp = 0;
+		assert(dev);
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-			event.event_type == IBV_EVENT_PORT_ERR) &&
-			(dev->data->dev_conf.intr_conf.lsc == 1))
-			ret |= (1 << RTE_ETH_EVENT_INTR_LSC);
-		else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			dev->data->dev_conf.intr_conf.rmv == 1)
-			ret |= (1 << RTE_ETH_EVENT_INTR_RMV);
-		else
-			DRV_LOG(DEBUG,
-				"port %u event type %d on not handled",
-				dev->data->port_id, event.event_type);
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+			dev->data->dev_conf.intr_conf.lsc) {
+			mlx5_glue->ack_async_event(&event);
+			if (mlx5_link_update(dev, 0) == -EAGAIN) {
+				usleep(0);
+				continue;
+			}
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+			continue;
+		}
+		if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+		    dev->data->dev_conf.intr_conf.rmv) {
+			mlx5_glue->ack_async_event(&event);
+			_rte_eth_dev_callback_process
+				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+			continue;
+		}
+		DRV_LOG(DEBUG,
+			"port %u event type %d on not handled",
+			dev->data->port_id, event.event_type);
 		mlx5_glue->ack_async_event(&event);
 	}
-	return ret;
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
- */
-void
-mlx5_dev_interrupt_handler(void *cb_arg)
-{
-	struct rte_eth_dev *dev = cb_arg;
-	uint32_t events;
-
-	events = mlx5_dev_status_handler(dev);
-	if (events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
-	if (events & (1 << RTE_ETH_EVENT_INTR_RMV))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 13/13] net/mlx5: add source vport match to the ingress rules
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (11 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
@ 2019-03-27 13:15         ` Viacheslav Ovsiienko
  2019-03-28  9:21         ` [PATCH v4 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
  13 siblings, 0 replies; 114+ messages in thread
From: Viacheslav Ovsiienko @ 2019-03-27 13:15 UTC (permalink / raw)
  To: dev; +Cc: shahafs

For E-Switch configurations over multiport Infiniband devices
we should add source vport match to correctly distribute
traffic between representors.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow_dv.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index d4a1149..57847fb 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -2805,6 +2805,29 @@ struct field_modify_info modify_tcp[] = {
 }
 
 /**
+ * Add source vport match to the specified matcher.
+ *
+ * @param[in, out] matcher
+ *   Flow matcher.
+ * @param[in, out] key
+ *   Flow matcher value.
+ * @param[in] port
+ *   Source vport value to match
+ * @param[in] mask
+ *   Mask
+ */
+static void
+flow_dv_translate_source_vport(void *matcher, void *key,
+			      int16_t port, uint16_t mask)
+{
+	void *misc_m = MLX5_ADDR_OF(fte_match_param, matcher, misc_parameters);
+	void *misc_v = MLX5_ADDR_OF(fte_match_param, key, misc_parameters);
+
+	MLX5_SET(fte_match_set_misc, misc_m, source_port, mask);
+	MLX5_SET(fte_match_set_misc, misc_v, source_port, port);
+}
+
+/**
  * Fill the flow with DV spec.
  *
  * @param[in] dev
@@ -3088,6 +3111,19 @@ struct field_modify_info modify_tcp[] = {
 	}
 	dev_flow->dv.actions_n = actions_n;
 	flow->actions = action_flags;
+	if (attr->ingress && !attr->transfer &&
+	    (priv->representor || priv->master)) {
+		/* It was validated - we support unidirection flows only. */
+		assert(!attr->egress);
+		/*
+		 * Add matching on source vport index only
+		 * for ingress rules in E-Switch configurations.
+		 */
+		flow_dv_translate_source_vport(matcher.mask.buf,
+					       dev_flow->dv.value.buf,
+					       priv->vport_id,
+					       0xffff);
+	}
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; items++) {
 		int tunnel = !!(item_flags & MLX5_FLOW_LAYER_TUNNEL);
 		void *match_mask = matcher.mask.buf;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 00/14] net/mlx5: add support for multiport IB devices
  2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
                           ` (12 preceding siblings ...)
  2019-03-27 13:15         ` [PATCH v4 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
@ 2019-03-28  9:21         ` Shahaf Shuler
  13 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-03-28  9:21 UTC (permalink / raw)
  To: Slava Ovsiienko, dev

Wednesday, March 27, 2019 3:16 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v4 00/14] net/mlx5: add support for multiport IB devices
> 
> The Mellanox NICs support SR-IOV and have E-Switch feature.
> When SR-IOV is set up in switchdev mode and E-Switch is enabled we have
> so called VF representors in the system. All representors belonging to the
> same E-Switch are created on the basis of the single PCI function and with
> current implementation each representor has its own dedicated Infiniband
> device and operates within its own Infiniband context. It is proposed to
> provide representors as ports of the single Infiniband device and operate on
> the shared Infiniband context.
> 
> mlx5_pci_probe() routine is planned to be extended to probe the ports of
> found Infiniband devices. All active ports, belonging to the same Infiniband
> device will use the signle shared Infiniband context of that device:
>   - QP are created within shared context, with assigning to
>     specified port
>   - Verb flows are also created with specifying port index
>   - Protection Domain is shared by all representors
>   - Global Memory Regions Cache is shared by all representors,
>     local ones are not (not in this patchset)
>   - Shared Asynchronous Event Handler distributes events
>     among the device ports (not in this patchset)
>   - UAR mapping to user space is also shared (not in patchset)
> 
> Each representor has its own attached network interface device, interface
> name and index can be retrieved via IB device and port index, so the link
> control is performed via Netlink in the same way as in existing
> implementation.
> 
> This patchset also superseedes http://patches.dpdk.org/patch/50486/
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Series applied to next-net-mlx. 
Good job Slava. 

> 
> ---
> v4:
> 	- minor typo fixes for debug compilation
> 	- rebased on current top of repo
> 	- removed dependency on DR (Direct Rules)
> 
> v3:
> 	http://patches.dpdk.org/cover/51728/
> 	- vport_id field is added, contains associated VF vport
> 	  index on E-Switch. representor_id field is reverted
> 	  to keep the VF index
> 	- some comment spellcheck
> 	- some error logging is added for master/representor
> 	  wrong recognition
> 	- shared IB context alloc/free routines mutex
> 	  protection added
> 
> v2:
> 	http://patches.dpdk.org/patch/51642/
> 	- clarifying comments added
> 	- secondary process does not probe IB device at all and
> 	  does not create shared context
> 	- minor bugs (like ++ in assert) fixed
> 	- representor_id contains corresponding VF vport index,
> 	  no corrections +1 is needed anymore. vport index is
> 	  calculated in the only place in code and if indexing
> 	  schema is changed it would be easy to modify
> 
> v1:
> 	http://patches.dpdk.org/cover/51451/
> 
> Viacheslav Ovsiienko (13):
>   net/mlx5: add representor recognition on kernels 5.x
>   net/mlx5: modify get ifindex routine for multiport IB
>   net/mlx5: add getting IB ports number for multiport IB
>   net/mlx5: add multiport IB device support to probing
>   net/mlx5: add IB shared context alloc/free functions
>   net/mlx5: switch to the names in the shared IB context
>   net/mlx5: switch to the shared Protection Domain
>   net/mlx5: switch to the shared context IB attributes
>   net/mlx5: switch to the shared IB device context
>   net/mlx5: provide IB port for the object being created
>   net/mlx5: update install/uninstall int handler routines
>   net/mlx5: update event handler for multiport IB devices
>   net/mlx5: add source vport match to the ingress rules
> 
>  drivers/net/mlx5/Makefile          |  10 +
>  drivers/net/mlx5/meson.build       |   4 +
>  drivers/net/mlx5/mlx5.c            | 638 ++++++++++++++++++++++++++--------
> ---
>  drivers/net/mlx5/mlx5.h            |  45 ++-
>  drivers/net/mlx5/mlx5_ethdev.c     | 248 +++++++++-----
>  drivers/net/mlx5/mlx5_flow.c       |   2 +
>  drivers/net/mlx5/mlx5_flow_dv.c    |  44 ++-
>  drivers/net/mlx5/mlx5_flow_verbs.c |   7 +-
>  drivers/net/mlx5/mlx5_mr.c         |   4 +-
>  drivers/net/mlx5/mlx5_nl.c         | 108 ++++++-
>  drivers/net/mlx5/mlx5_rxq.c        |  41 +--
>  drivers/net/mlx5/mlx5_socket.c     |   4 +-
>  drivers/net/mlx5/mlx5_stats.c      |  22 +-
>  drivers/net/mlx5/mlx5_trigger.c    |   2 +-
>  drivers/net/mlx5/mlx5_txq.c        |  24 +-
>  15 files changed, 868 insertions(+), 335 deletions(-)
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 09/13] net/mlx5: switch to the shared IB device context
  2019-03-27 13:15         ` [PATCH v4 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
@ 2019-04-02  4:49           ` Shahaf Shuler
  0 siblings, 0 replies; 114+ messages in thread
From: Shahaf Shuler @ 2019-04-02  4:49 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Ferruh Yigit, Thomas Monjalon

Slava, 

Wednesday, March 27, 2019 3:16 PM, Viacheslav Ovsiienko:
> Subject: [PATCH v4 09/13] net/mlx5: switch to the shared IB device context
> 
> The code is updated to use the shared IB device context and device handles.
> The IB device context is shared between reprentors created over the single
> multiport IB device. All Verbs and DevX objects will be created whithin this
> shared context.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c            |  3 +--
>  drivers/net/mlx5/mlx5.h            |  1 -
>  drivers/net/mlx5/mlx5_ethdev.c     | 13 +++++++------
>  drivers/net/mlx5/mlx5_flow_dv.c    |  8 ++++----
>  drivers/net/mlx5/mlx5_flow_verbs.c |  6 ++++--
>  drivers/net/mlx5/mlx5_rxq.c        | 23 ++++++++++++-----------
>  drivers/net/mlx5/mlx5_socket.c     |  4 ++--
>  drivers/net/mlx5/mlx5_trigger.c    |  2 +-
>  drivers/net/mlx5/mlx5_txq.c        |  4 ++--
>  9 files changed, 33 insertions(+), 31 deletions(-)
> 
> 
> a/drivers/net/mlx5/mlx5_flow_verbs.c
> b/drivers/net/mlx5/mlx5_flow_verbs.c
> index 6c4f52f..9548d12 100644
> --- a/drivers/net/mlx5/mlx5_flow_verbs.c
> +++ b/drivers/net/mlx5/mlx5_flow_verbs.c
> @@ -56,10 +56,11 @@
>  {
>  #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
>  	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct ibv_context ctx = priv->sh->ctx;
>  	struct ibv_counter_set_init_attr init = {
>  			 .counter_set_id = counter->id};
> 
> -	counter->cs = mlx5_glue->create_counter_set(priv->ctx, &init);
> +	counter->cs = mlx5_glue->create_counter_set(ctx, &init);
>  	if (!counter->cs) {
>  		rte_errno = ENOTSUP;
>  		return -ENOTSUP;

This code piece was causing compilation error w/ older rdma-core versions.
It was replaced w/ the below during the merge. 

>   #if defined(HAVE_IBV_DEVICE_COUNTERS_SET_V42)
>          struct mlx5_priv *priv = dev->data->dev_private;
>  -       struct ibv_context ctx = priv->sh->ctx;
>  +       struct ibv_context *ctx = priv->sh->ctx;
>          struct ibv_counter_set_init_attr init = {
>                           .counter_set_id = counter->id};
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2019-04-02  4:49 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-28 18:02 [RFC 00/10] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
2019-02-28 18:02 ` [RFC 01/10] net/mlx5: add multiport IB device port structure Viacheslav Ovsiienko
2019-02-28 18:02 ` [RFC 02/10] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
2019-02-28 18:02 ` [RFC 03/10] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
2019-02-28 18:02 ` [RFC 04/10] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 05/10] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 06/10] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 07/10] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 08/10] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 09/10] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
2019-02-28 18:03 ` [RFC 10/10] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
2019-03-21  8:11 ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
2019-03-21  8:11   ` [PATCH 01/14] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
2019-03-21 12:13     ` Shahaf Shuler
2019-03-21 15:08       ` Stephen Hemminger
2019-03-21 15:31         ` Slava Ovsiienko
2019-03-21 19:08           ` Stephen Hemminger
2019-03-22  8:15             ` Slava Ovsiienko
2019-03-21  8:11   ` [PATCH 02/14] net/mlx5: introduce multiport IB device shared structure Viacheslav Ovsiienko
2019-03-21  8:11   ` [PATCH 03/14] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21 12:58       ` Slava Ovsiienko
2019-03-21  8:11   ` [PATCH 04/14] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 05/14] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21 12:54       ` Slava Ovsiienko
2019-03-21 12:57         ` Slava Ovsiienko
2019-03-24  9:00           ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 06/14] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 07/14] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 08/14] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 09/14] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 10/14] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
2019-03-21 12:14     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 11/14] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
2019-03-21 12:15     ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 12/14] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
2019-03-21 12:15     ` Shahaf Shuler
2019-03-21 14:01       ` Slava Ovsiienko
2019-03-24  9:07         ` Shahaf Shuler
2019-03-21  8:11   ` [PATCH 13/14] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
2019-03-21 12:15     ` Shahaf Shuler
2019-03-21 14:08       ` Slava Ovsiienko
2019-03-21  8:11   ` [PATCH 14/14] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
2019-03-21 12:15     ` Shahaf Shuler
2019-03-21 14:11       ` Slava Ovsiienko
2019-03-24  9:13         ` Shahaf Shuler
2019-03-25  7:44           ` Slava Ovsiienko
2019-03-21 12:13   ` [PATCH 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
2019-03-21 12:58     ` Slava Ovsiienko
2019-03-25 17:03   ` [PATCH v2 " Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
2019-03-25 18:06       ` Stephen Hemminger
2019-03-25 18:07       ` Stephen Hemminger
2019-03-26  7:33         ` Slava Ovsiienko
2019-03-26 12:20       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
2019-03-26 11:47       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
2019-03-26 12:02       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
2019-03-26 12:10       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
2019-03-25 17:03     ` [PATCH v2 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
2019-03-26 12:14       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
2019-03-26 12:16       ` Shahaf Shuler
2019-03-25 17:03     ` [PATCH v2 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
2019-03-26 12:21       ` Shahaf Shuler
2019-03-26 15:35     ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
2019-03-26 19:37         ` Shahaf Shuler
2019-03-26 15:35       ` [PATCH v3 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
2019-03-26 19:35         ` Shahaf Shuler
2019-03-26 15:35       ` [PATCH v3 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
2019-03-26 15:35       ` [PATCH v3 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
2019-03-26 19:38         ` Shahaf Shuler
2019-03-27  6:00       ` [PATCH v3 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler
2019-03-27  7:31         ` Slava Ovsiienko
2019-03-27 13:15       ` [PATCH v4 " Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 01/13] net/mlx5: add representor recognition on kernels 5.x Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 02/13] net/mlx5: modify get ifindex routine for multiport IB Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 03/13] net/mlx5: add getting IB ports number " Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 04/13] net/mlx5: add multiport IB device support to probing Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 05/13] net/mlx5: add IB shared context alloc/free functions Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 06/13] net/mlx5: switch to the names in the shared IB context Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 07/13] net/mlx5: switch to the shared Protection Domain Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 08/13] net/mlx5: switch to the shared context IB attributes Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 09/13] net/mlx5: switch to the shared IB device context Viacheslav Ovsiienko
2019-04-02  4:49           ` Shahaf Shuler
2019-03-27 13:15         ` [PATCH v4 10/13] net/mlx5: provide IB port for the object being created Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 11/13] net/mlx5: update install/uninstall int handler routines Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 12/13] net/mlx5: update event handler for multiport IB devices Viacheslav Ovsiienko
2019-03-27 13:15         ` [PATCH v4 13/13] net/mlx5: add source vport match to the ingress rules Viacheslav Ovsiienko
2019-03-28  9:21         ` [PATCH v4 00/14] net/mlx5: add support for multiport IB devices Shahaf Shuler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.