All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next v1 0/4] Query GID table API
@ 2020-09-14 11:11 Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-14 11:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Ariel Elior, Avihai Horon, linux-kernel,
	linux-rdma, Michal Kalderon

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v1:
 * Moved git_type logic to cma_set_default_gid_type - Patch #2
 * Changed signature of rdma_query_gid_table - Patch #3
 * Changed i to be unsigned - Patch #3
 * Fixed multiplication overflow - Patch #4
v0: https://lore.kernel.org/lkml/20200910142204.1309061-1-leon@kernel.org

------------------------------------------------------------------------------

From Avihai,

When an application is not using RDMA CM and if it is using multiple RDMA
devices with one or more RoCE ports, finding the right GID table entry is
a long process.

For example, with two RoCE dual-port devices in a system, when IP
failover is used between two RoCE ports, searching a suitable GID
entry for a given source IP, matching netdevice of given RoCEv1/v2 type
requires iterating over all 4 ports * 256 entry GID table.

Even though the best first match GID table for given criteria is used,
when the matching entry is on the 4th port, it requires reading
3 ports * 256 entries * 3 files (GID, netdev, type) = 2304 files.

The GID table needs to be referred on every QP creation during IP
failover on other netdevice of an RDMA device.

In an alternative approach, a GID cache may be maintained and updated on
GID change event was reported by the kernel. However, it comes with below
two limitations:
(a) Maintain a thread per application process instance to listen and update
 the cache.
(b) Without the thread, on cache miss event, query the GID table. Even in
 this approach, if multiple processes are used, a GID cache needs to be
 maintained on a per-process basis. With a large number of processes,
 this method doesn't scale.

Hence, we introduce this series of patches, which introduces an API to
query the complete GID tables of an RDMA device, that returns all valid
GID table entries.

This is done through single ioctl, eliminating 2304 read, 2304 open and
2304 close system calls to just a total of 2 calls (one for each device).

While at it, we also introduce an API to query an individual GID entry
over ioctl interface, which provides all GID attributes information.

Thanks

Avihai Horon (4):
  RDMA/core: Change rdma_get_gid_attr returned error code
  RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  RDMA/core: Introduce new GID table query API
  RDMA/uverbs: Expose the new GID query API to user space

 drivers/infiniband/core/cache.c               |  97 +++++++++-
 drivers/infiniband/core/cma.c                 |   4 +
 drivers/infiniband/core/cma_configfs.c        |   9 +-
 drivers/infiniband/core/sysfs.c               |   3 +-
 .../infiniband/core/uverbs_std_types_device.c | 182 +++++++++++++++++-
 drivers/infiniband/core/verbs.c               |   2 +-
 drivers/infiniband/hw/mlx5/cq.c               |   2 +-
 drivers/infiniband/hw/mlx5/main.c             |   4 +-
 drivers/infiniband/hw/qedr/verbs.c            |   4 +-
 include/rdma/ib_cache.h                       |   5 +
 include/rdma/ib_verbs.h                       |  19 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h       |  14 ++
 13 files changed, 341 insertions(+), 20 deletions(-)

--
2.26.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH rdma-next v1 1/4] RDMA/core: Change rdma_get_gid_attr returned error code
  2020-09-14 11:11 [PATCH rdma-next v1 0/4] Query GID table API Leon Romanovsky
@ 2020-09-14 11:11 ` Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-14 11:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.

This change is done in order to provide a more accurate error reporting
to be used by the new GID query API in user space. Nevertheless, -EINVAL
is still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c | 2 +-
 drivers/infiniband/core/sysfs.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index ffad73bb40ff..6079f1f7e678 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1220,7 +1220,7 @@ EXPORT_SYMBOL(ib_get_cached_port_state);
 const struct ib_gid_attr *
 rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
 {
-	const struct ib_gid_attr *attr = ERR_PTR(-EINVAL);
+	const struct ib_gid_attr *attr = ERR_PTR(-ENODATA);
 	struct ib_gid_table *table;
 	unsigned long flags;

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 01d85da00187..1a5507b79437 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -391,7 +391,8 @@ static ssize_t _show_port_gid_attr(

 	gid_attr = rdma_get_gid_attr(p->ibdev, p->port_num, tab_attr->index);
 	if (IS_ERR(gid_attr))
-		return PTR_ERR(gid_attr);
+		/* -EINVAL is returned for user space compatibility reasons. */
+		return -EINVAL;

 	ret = print(gid_attr, buf);
 	rdma_put_gid_attr(gid_attr);
--
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH rdma-next v1 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  2020-09-14 11:11 [PATCH rdma-next v1 0/4] Query GID table API Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
@ 2020-09-14 11:11 ` Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-14 11:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon

From: Avihai Horon <avihaih@nvidia.com>

Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values,
so enum ib_gid_type will match the gid types of the new query GID table
API which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c        |  4 ++++
 drivers/infiniband/core/cma.c          |  4 ++++
 drivers/infiniband/core/cma_configfs.c |  9 +++++----
 drivers/infiniband/core/verbs.c        |  2 +-
 drivers/infiniband/hw/mlx5/cq.c        |  2 +-
 drivers/infiniband/hw/mlx5/main.c      |  4 ++--
 drivers/infiniband/hw/qedr/verbs.c     |  4 +++-
 include/rdma/ib_verbs.h                | 17 ++++++++++-------
 8 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 6079f1f7e678..cf49ac0b0aa6 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -133,7 +133,11 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 }
 
 static const char * const gid_type_str[] = {
+	/* IB/RoCE v1 value is set for IB_GID_TYPE_IB and IB_GID_TYPE_ROCE for
+	 * user space compatibility reasons.
+	 */
 	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+	[IB_GID_TYPE_ROCE]	= "IB/RoCE v1",
 	[IB_GID_TYPE_ROCE_UDP_ENCAP]	= "RoCE v2",
 };
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 82338099ba09..0de5dff13dad 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -304,6 +304,10 @@ int cma_set_default_gid_type(struct cma_device *cma_dev,
 	if (!rdma_is_port_valid(cma_dev->device, port))
 		return -EINVAL;
 
+	if (default_gid_type == IB_GID_TYPE_IB &&
+	    rdma_protocol_roce_eth_encap(cma_dev->device, port))
+		default_gid_type = IB_GID_TYPE_ROCE;
+
 	supported_gids = roce_gid_type_mask_support(cma_dev->device, port);
 
 	if (!(supported_gids & 1 << default_gid_type))
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
index 3c1e2ca564fe..7ec4af2ed87a 100644
--- a/drivers/infiniband/core/cma_configfs.c
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -123,16 +123,17 @@ static ssize_t default_roce_mode_store(struct config_item *item,
 {
 	struct cma_device *cma_dev;
 	struct cma_dev_port_group *group;
-	int gid_type = ib_cache_gid_parse_type_str(buf);
+	int gid_type;
 	ssize_t ret;
 
-	if (gid_type < 0)
-		return -EINVAL;
-
 	ret = cma_configfs_params_get(item, &cma_dev, &group);
 	if (ret)
 		return ret;
 
+	gid_type = ib_cache_gid_parse_type_str(buf);
+	if (gid_type < 0)
+		return -EINVAL;
+
 	ret = cma_set_default_gid_type(cma_dev, group->port_num, gid_type);
 
 	cma_configfs_params_put(cma_dev);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6e02f26a30a0..ac4b02f5ae68 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -740,7 +740,7 @@ int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
 				       (struct in6_addr *)dgid);
 		return 0;
 	} else if (net_type == RDMA_NETWORK_IPV6 ||
-		   net_type == RDMA_NETWORK_IB) {
+		   net_type == RDMA_NETWORK_IB || RDMA_NETWORK_ROCE_V1) {
 		*dgid = hdr->ibgrh.dgid;
 		*sgid = hdr->ibgrh.sgid;
 		return 0;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 0748a5daa2dd..3d4e02ae6628 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -255,7 +255,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 
 	switch (roce_packet_type) {
 	case MLX5_CQE_ROCE_L3_HEADER_TYPE_GRH:
-		wc->network_hdr_type = RDMA_NETWORK_IB;
+		wc->network_hdr_type = RDMA_NETWORK_ROCE_V1;
 		break;
 	case MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV6:
 		wc->network_hdr_type = RDMA_NETWORK_IPV6;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8696ad50d8fb..5cf5266936de 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -546,7 +546,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 			 unsigned int index, const union ib_gid *gid,
 			 const struct ib_gid_attr *attr)
 {
-	enum ib_gid_type gid_type = IB_GID_TYPE_IB;
+	enum ib_gid_type gid_type = IB_GID_TYPE_ROCE;
 	u16 vlan_id = 0xffff;
 	u8 roce_version = 0;
 	u8 roce_l3_type = 0;
@@ -561,7 +561,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 	}
 
 	switch (gid_type) {
-	case IB_GID_TYPE_IB:
+	case IB_GID_TYPE_ROCE:
 		roce_version = MLX5_ROCE_VERSION_1;
 		break;
 	case IB_GID_TYPE_ROCE_UDP_ENCAP:
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 3337d4b6a841..0395523958da 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1138,7 +1138,7 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
 		SET_FIELD(qp_params->modify_flags,
 			  QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
 		break;
-	case RDMA_NETWORK_IB:
+	case RDMA_NETWORK_ROCE_V1:
 		memcpy(&qp_params->sgid.bytes[0], &gid_attr->gid.raw[0],
 		       sizeof(qp_params->sgid));
 		memcpy(&qp_params->dgid.bytes[0],
@@ -1158,6 +1158,8 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
 			  QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
 		qp_params->roce_mode = ROCE_V2_IPV4;
 		break;
+	default:
+		return -EINVAL;
 	}
 
 	for (i = 0; i < 4; i++) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bde17507f3ea..80cc0f5de5c3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,10 +138,9 @@ union ib_gid {
 extern union ib_gid zgid;
 
 enum ib_gid_type {
-	/* If link layer is Ethernet, this is RoCE V1 */
 	IB_GID_TYPE_IB        = 0,
-	IB_GID_TYPE_ROCE      = 0,
-	IB_GID_TYPE_ROCE_UDP_ENCAP = 1,
+	IB_GID_TYPE_ROCE      = 1,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
 	IB_GID_TYPE_SIZE
 };
 
@@ -180,7 +179,7 @@ rdma_node_get_transport(unsigned int node_type);
 
 enum rdma_network_type {
 	RDMA_NETWORK_IB,
-	RDMA_NETWORK_ROCE_V1 = RDMA_NETWORK_IB,
+	RDMA_NETWORK_ROCE_V1,
 	RDMA_NETWORK_IPV4,
 	RDMA_NETWORK_IPV6
 };
@@ -190,9 +189,10 @@ static inline enum ib_gid_type ib_network_to_gid_type(enum rdma_network_type net
 	if (network_type == RDMA_NETWORK_IPV4 ||
 	    network_type == RDMA_NETWORK_IPV6)
 		return IB_GID_TYPE_ROCE_UDP_ENCAP;
-
-	/* IB_GID_TYPE_IB same as RDMA_NETWORK_ROCE_V1 */
-	return IB_GID_TYPE_IB;
+	else if (network_type == RDMA_NETWORK_ROCE_V1)
+		return IB_GID_TYPE_ROCE;
+	else
+		return IB_GID_TYPE_IB;
 }
 
 static inline enum rdma_network_type
@@ -201,6 +201,9 @@ rdma_gid_attr_network_type(const struct ib_gid_attr *attr)
 	if (attr->gid_type == IB_GID_TYPE_IB)
 		return RDMA_NETWORK_IB;
 
+	if (attr->gid_type == IB_GID_TYPE_ROCE)
+		return RDMA_NETWORK_ROCE_V1;
+
 	if (ipv6_addr_v4mapped((struct in6_addr *)&attr->gid))
 		return RDMA_NETWORK_IPV4;
 	else
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH rdma-next v1 3/4] RDMA/core: Introduce new GID table query API
  2020-09-14 11:11 [PATCH rdma-next v1 0/4] Query GID table API Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
@ 2020-09-14 11:11 ` Leon Romanovsky
  2020-09-14 11:11 ` [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-14 11:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to
a provided buffer.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c         | 91 +++++++++++++++++++++++++
 include/rdma/ib_cache.h                 |  5 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h |  8 +++
 3 files changed, 104 insertions(+)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index cf49ac0b0aa6..d4a83b52a31e 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1247,6 +1247,97 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
 }
 EXPORT_SYMBOL(rdma_get_gid_attr);
 
+/**
+ * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
+ * @gid_attr: Pointer to the GID attribute.
+ * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
+ *
+ * Returns 0 on success or appropriate error code. The netdevice must be in UP
+ * state.
+ */
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
+{
+	struct net_device *ndev;
+	int ret = 0;
+
+	if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
+		*ndev_ifindex = 0;
+		return 0;
+	}
+
+	rcu_read_lock();
+	ndev = rcu_dereference(gid_attr->ndev);
+	if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	*ndev_ifindex = ndev->ifindex;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL(rdma_get_ndev_ifindex);
+
+/**
+ * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
+ * @device: The device to query.
+ * @entries: Entries where GID entries are returned.
+ * @max_entries: Maximum number of entries that can be returned.
+ * Entries array must be allocated to hold max_entries number of entries.
+ * @num_entries: Updated to the number of entries that were successfully read.
+ *
+ * Returns number of entries on success or appropriate error code.
+ */
+ssize_t rdma_query_gid_table(struct ib_device *device,
+			     struct ib_uverbs_gid_entry *entries,
+			     size_t max_entries)
+{
+	const struct ib_gid_attr *gid_attr;
+	ssize_t num_entries = 0, ret;
+	struct ib_gid_table *table;
+	unsigned int port_num, i;
+	unsigned long flags;
+
+	rdma_for_each_port(device, port_num) {
+		if (!rdma_ib_or_roce(device, port_num))
+			continue;
+
+		table = rdma_gid_table(device, port_num);
+		read_lock_irqsave(&table->rwlock, flags);
+		for (i = 0; i < table->sz; i++) {
+			if (!is_gid_entry_valid(table->data_vec[i]))
+				continue;
+			if (num_entries >= max_entries) {
+				ret = -EINVAL;
+				goto err;
+			}
+
+			gid_attr = &table->data_vec[i]->attr;
+
+			memcpy(&entries->gid, &gid_attr->gid,
+			       sizeof(gid_attr->gid));
+			entries->gid_index = gid_attr->index;
+			entries->port_num = gid_attr->port_num;
+			entries->gid_type = gid_attr->gid_type;
+			ret = rdma_get_ndev_ifindex(gid_attr,
+						    &entries->netdev_ifindex);
+			if (ret)
+				goto err;
+
+			num_entries++;
+			entries++;
+		}
+		read_unlock_irqrestore(&table->rwlock, flags);
+	}
+
+	return num_entries;
+err:
+	read_unlock_irqrestore(&table->rwlock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_query_gid_table);
+
 /**
  * rdma_put_gid_attr - Release reference to the GID attribute
  * @attr:		Pointer to the GID attribute whose reference
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 66a8f369a2fa..0f092ab0d160 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -110,5 +110,10 @@ const struct ib_gid_attr *rdma_get_gid_attr(struct ib_device *device,
 					    u8 port_num, int index);
 void rdma_put_gid_attr(const struct ib_gid_attr *attr);
 void rdma_hold_gid_attr(const struct ib_gid_attr *attr);
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr,
+			  u32 *ndev_ifindex);
+ssize_t rdma_query_gid_table(struct ib_device *device,
+			     struct ib_uverbs_gid_entry *entries,
+			     size_t max_entries);
 
 #endif /* _IB_CACHE_H */
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 5debab45ebcb..d5ac65ae2557 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,4 +250,12 @@ enum rdma_driver_id {
 	RDMA_DRIVER_SIW,
 };
 
+struct ib_uverbs_gid_entry {
+	__aligned_u64 gid[2];
+	__u32 gid_index;
+	__u32 port_num;
+	__u32 gid_type;
+	__u32 netdev_ifindex; /* It is 0 if there is no netdev associated with it */
+};
+
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-14 11:11 [PATCH rdma-next v1 0/4] Query GID table API Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-09-14 11:11 ` [PATCH rdma-next v1 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
@ 2020-09-14 11:11 ` Leon Romanovsky
  2020-09-14 23:07   ` kernel test robot
  3 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-14 11:11 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Expose the query GID table and entry API to user space by adding
two new methods and method handlers to the device object.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../infiniband/core/uverbs_std_types_device.c | 182 +++++++++++++++++-
 include/rdma/ib_verbs.h                       |   6 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h       |   6 +
 4 files changed, 206 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
index 7b03446b6936..b0aa00d8586e 100644
--- a/drivers/infiniband/core/uverbs_std_types_device.c
+++ b/drivers/infiniband/core/uverbs_std_types_device.c
@@ -3,11 +3,13 @@
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
  */
 
+#include <linux/overflow.h>
 #include <rdma/uverbs_std_types.h>
 #include "rdma_core.h"
 #include "uverbs.h"
 #include <rdma/uverbs_ioctl.h>
 #include <rdma/opa_addr.h>
+#include <rdma/ib_cache.h>
 
 /*
  * This ioctl method allows calling any defined write or write_ex
@@ -266,6 +268,158 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
 	return ucontext->device->ops.query_ucontext(ucontext, attrs);
 }
 
+static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
+				    struct ib_uverbs_gid_entry *entries,
+				    size_t num_entries, size_t user_entry_size)
+{
+	const struct uverbs_attr *attr;
+	void __user *user_entries;
+	size_t copy_len;
+	int ret;
+	int i;
+
+	if (user_entry_size == sizeof(*entries)) {
+		ret = uverbs_copy_to(attrs,
+				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+				     entries, sizeof(*entries) * num_entries);
+		return ret;
+	}
+
+	copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
+	attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+	if (IS_ERR(attr))
+		return PTR_ERR(attr);
+
+	user_entries = u64_to_user_ptr(attr->ptr_attr.data);
+	for (i = 0; i < num_entries; i++) {
+		if (copy_to_user(user_entries, entries, copy_len))
+			return -EFAULT;
+
+		if (user_entry_size > sizeof(*entries)) {
+			if (clear_user(user_entries + sizeof(*entries),
+				       user_entry_size - sizeof(*entries)))
+				return -EFAULT;
+		}
+
+		entries++;
+		user_entries += user_entry_size;
+	}
+
+	return uverbs_output_written(attrs,
+				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uverbs_gid_entry *entries;
+	struct ib_ucontext *ucontext;
+	struct ib_device *ib_dev;
+	size_t user_entry_size;
+	ssize_t num_entries;
+	size_t max_entries;
+	size_t num_bytes;
+	u32 flags;
+	int ret;
+
+	ret = uverbs_get_flags32(&flags, attrs,
+				 UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&user_entry_size, attrs,
+			       UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
+	if (ret)
+		return ret;
+
+	max_entries = uverbs_attr_ptr_get_array_size(
+		attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+		user_entry_size);
+	if (max_entries <= 0)
+		return -EINVAL;
+
+	ucontext = ib_uverbs_get_ucontext(attrs);
+	if (IS_ERR(ucontext))
+		return PTR_ERR(ucontext);
+	ib_dev = ucontext->device;
+
+	if (check_mul_overflow(max_entries, sizeof(*entries), &num_bytes))
+		return -EINVAL;
+
+	entries = uverbs_zalloc(attrs, num_bytes);
+	if (!entries)
+		return -ENOMEM;
+
+	num_entries = rdma_query_gid_table(ib_dev, entries, max_entries);
+	if (num_entries < 0)
+		return -EINVAL;
+
+	ret = copy_gid_entries_to_user(attrs, entries, num_entries,
+				       user_entry_size);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_to(attrs,
+			     UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+			     &num_entries, sizeof(num_entries));
+	return ret;
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
+	struct uverbs_attr_bundle *attrs)
+{
+	const struct ib_gid_attr *gid_attr;
+	struct ib_uverbs_gid_entry entry;
+	struct ib_ucontext *ucontext;
+	struct ib_device *ib_dev;
+	u32 gid_index;
+	u32 port_num;
+	u32 flags;
+	int ret;
+
+	ret = uverbs_get_flags32(&flags, attrs,
+				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&port_num, attrs,
+			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&gid_index, attrs,
+			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
+	if (ret)
+		return ret;
+
+	ucontext = ib_uverbs_get_ucontext(attrs);
+	if (IS_ERR(ucontext))
+		return PTR_ERR(ucontext);
+	ib_dev = ucontext->device;
+
+	if (!rdma_ib_or_roce(ib_dev, port_num))
+		return -EINVAL;
+
+	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
+	if (IS_ERR(gid_attr))
+		return PTR_ERR(gid_attr);
+
+	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
+	entry.gid_index = gid_attr->index;
+	entry.port_num = gid_attr->port_num;
+	entry.gid_type = gid_attr->gid_type;
+	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
+	if (ret)
+		goto out;
+
+	ret = uverbs_copy_to_struct_or_zero(
+		attrs, UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY, &entry,
+		sizeof(entry));
+out:
+	rdma_put_gid_attr(gid_attr);
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_GET_CONTEXT,
 	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_GET_CONTEXT_NUM_COMP_VECTORS,
@@ -300,12 +454,38 @@ DECLARE_UVERBS_NAMED_METHOD(
 				   reserved),
 		UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_QUERY_GID_TABLE,
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE, u64,
+			     UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, u32,
+			     UA_OPTIONAL),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+			    UVERBS_ATTR_MIN_SIZE(0), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+			    UVERBS_ATTR_TYPE(u64), UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_QUERY_GID_ENTRY,
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_PORT, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+			    UVERBS_ATTR_STRUCT(struct ib_uverbs_gid_entry,
+					       netdev_ifindex),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_GLOBAL_METHODS(UVERBS_OBJECT_DEVICE,
 			      &UVERBS_METHOD(UVERBS_METHOD_GET_CONTEXT),
 			      &UVERBS_METHOD(UVERBS_METHOD_INVOKE_WRITE),
 			      &UVERBS_METHOD(UVERBS_METHOD_INFO_HANDLES),
 			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_PORT),
-			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT));
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT),
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_TABLE),
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_ENTRY));
 
 const struct uapi_definition uverbs_def_obj_device[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_DEVICE),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 80cc0f5de5c3..1e170e09afb0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,9 +138,9 @@ union ib_gid {
 extern union ib_gid zgid;
 
 enum ib_gid_type {
-	IB_GID_TYPE_IB        = 0,
-	IB_GID_TYPE_ROCE      = 1,
-	IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
+	IB_GID_TYPE_IB = IB_UVERBS_GID_TYPE_IB,
+	IB_GID_TYPE_ROCE = IB_UVERBS_GID_TYPE_ROCE_V1,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = IB_UVERBS_GID_TYPE_ROCE_V2,
 	IB_GID_TYPE_SIZE
 };
 
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 99dcabf61a71..7968a1845355 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -70,6 +70,8 @@ enum uverbs_methods_device {
 	UVERBS_METHOD_QUERY_PORT,
 	UVERBS_METHOD_GET_CONTEXT,
 	UVERBS_METHOD_QUERY_CONTEXT,
+	UVERBS_METHOD_QUERY_GID_TABLE,
+	UVERBS_METHOD_QUERY_GID_ENTRY,
 };
 
 enum uverbs_attrs_invoke_write_cmd_attr_ids {
@@ -352,4 +354,18 @@ enum uverbs_attrs_async_event_create {
 	UVERBS_ATTR_ASYNC_EVENT_ALLOC_FD_HANDLE,
 };
 
+enum uverbs_attrs_query_gid_table_cmd_attr_ids {
+	UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE,
+	UVERBS_ATTR_QUERY_GID_TABLE_FLAGS,
+	UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+	UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+};
+
+enum uverbs_attrs_query_gid_entry_cmd_attr_ids {
+	UVERBS_ATTR_QUERY_GID_ENTRY_PORT,
+	UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX,
+	UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS,
+	UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+};
+
 #endif
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index d5ac65ae2557..cfea82acfe57 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,6 +250,12 @@ enum rdma_driver_id {
 	RDMA_DRIVER_SIW,
 };
 
+enum ib_uverbs_gid_type {
+	IB_UVERBS_GID_TYPE_IB,
+	IB_UVERBS_GID_TYPE_ROCE_V1,
+	IB_UVERBS_GID_TYPE_ROCE_V2,
+};
+
 struct ib_uverbs_gid_entry {
 	__aligned_u64 gid[2];
 	__u32 gid_index;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-14 11:11 ` [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
@ 2020-09-14 23:07   ` kernel test robot
  2020-09-15  6:40     ` Leon Romanovsky
  0 siblings, 1 reply; 7+ messages in thread
From: kernel test robot @ 2020-09-14 23:07 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4445 bytes --]

Hi Leon,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on v5.9-rc5 next-20200914]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Leon-Romanovsky/Query-GID-table-API/20200914-191347
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: microblaze-randconfig-r033-20200913 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/infiniband/core/uverbs_std_types_device.c:6:
   include/linux/overflow.h: In function 'array_size':
>> include/linux/overflow.h:258:10: error: 'SIZE_MAX' undeclared (first use in this function)
     258 |   return SIZE_MAX;
         |          ^~~~~~~~
   include/linux/overflow.h:6:1: note: 'SIZE_MAX' is defined in header '<stdint.h>'; did you forget to '#include <stdint.h>'?
       5 | #include <linux/compiler.h>
     +++ |+#include <stdint.h>
       6 | 
   include/linux/overflow.h:258:10: note: each undeclared identifier is reported only once for each function it appears in
     258 |   return SIZE_MAX;
         |          ^~~~~~~~
   include/linux/overflow.h: In function 'array3_size':
   include/linux/overflow.h:280:10: error: 'SIZE_MAX' undeclared (first use in this function)
     280 |   return SIZE_MAX;
         |          ^~~~~~~~
   include/linux/overflow.h:280:10: note: 'SIZE_MAX' is defined in header '<stdint.h>'; did you forget to '#include <stdint.h>'?
   include/linux/overflow.h: In function '__ab_c_size':
   include/linux/overflow.h:296:10: error: 'SIZE_MAX' undeclared (first use in this function)
     296 |   return SIZE_MAX;
         |          ^~~~~~~~
   include/linux/overflow.h:296:10: note: 'SIZE_MAX' is defined in header '<stdint.h>'; did you forget to '#include <stdint.h>'?

# https://github.com/0day-ci/linux/commit/97341ae6bc1e5075caa16ee753a3e4f9d1ed522b
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Leon-Romanovsky/Query-GID-table-API/20200914-191347
git checkout 97341ae6bc1e5075caa16ee753a3e4f9d1ed522b
vim +/SIZE_MAX +258 include/linux/overflow.h

0c66847793d198 Jason Gunthorpe 2018-08-01  241  
610b15c50e86eb Kees Cook       2018-05-07  242  /**
610b15c50e86eb Kees Cook       2018-05-07  243   * array_size() - Calculate size of 2-dimensional array.
610b15c50e86eb Kees Cook       2018-05-07  244   *
610b15c50e86eb Kees Cook       2018-05-07  245   * @a: dimension one
610b15c50e86eb Kees Cook       2018-05-07  246   * @b: dimension two
610b15c50e86eb Kees Cook       2018-05-07  247   *
610b15c50e86eb Kees Cook       2018-05-07  248   * Calculates size of 2-dimensional array: @a * @b.
610b15c50e86eb Kees Cook       2018-05-07  249   *
610b15c50e86eb Kees Cook       2018-05-07  250   * Returns: number of bytes needed to represent the array or SIZE_MAX on
610b15c50e86eb Kees Cook       2018-05-07  251   * overflow.
610b15c50e86eb Kees Cook       2018-05-07  252   */
610b15c50e86eb Kees Cook       2018-05-07  253  static inline __must_check size_t array_size(size_t a, size_t b)
610b15c50e86eb Kees Cook       2018-05-07  254  {
610b15c50e86eb Kees Cook       2018-05-07  255  	size_t bytes;
610b15c50e86eb Kees Cook       2018-05-07  256  
610b15c50e86eb Kees Cook       2018-05-07  257  	if (check_mul_overflow(a, b, &bytes))
610b15c50e86eb Kees Cook       2018-05-07 @258  		return SIZE_MAX;
610b15c50e86eb Kees Cook       2018-05-07  259  
610b15c50e86eb Kees Cook       2018-05-07  260  	return bytes;
610b15c50e86eb Kees Cook       2018-05-07  261  }
610b15c50e86eb Kees Cook       2018-05-07  262  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29552 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-14 23:07   ` kernel test robot
@ 2020-09-15  6:40     ` Leon Romanovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-09-15  6:40 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 263 bytes --]

On Tue, Sep 15, 2020 at 07:07:58AM +0800, kernel test robot wrote:
> Hi Leon,
>
> Thank you for the patch! Yet something to improve:

The fix for overflow.h was posted earlier.
https://lore.kernel.org/lkml/20200913102928.134985-1-leon(a)kernel.org

Thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-15  6:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-14 11:11 [PATCH rdma-next v1 0/4] Query GID table API Leon Romanovsky
2020-09-14 11:11 ` [PATCH rdma-next v1 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
2020-09-14 11:11 ` [PATCH rdma-next v1 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
2020-09-14 11:11 ` [PATCH rdma-next v1 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
2020-09-14 11:11 ` [PATCH rdma-next v1 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
2020-09-14 23:07   ` kernel test robot
2020-09-15  6:40     ` Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.