All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH rdma-next 0/4] Query GID table API
@ 2020-09-10 14:22 Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Ariel Elior, Avihai Horon, linux-kernel,
	linux-rdma, Michal Kalderon

From: Leon Romanovsky <leonro@nvidia.com>

From Avihai,

When an application is not using RDMA CM and if it is using multiple RDMA
devices with one or more RoCE ports, finding the right GID table entry is
a long process.

For example, with two RoCE dual-port devices in a system, when IP
failover is used between two RoCE ports, searching a suitable GID
entry for a given source IP, matching netdevice of given RoCEv1/v2 type
requires iterating over all 4 ports * 256 entry GID table.

Even though the best first match GID table for given criteria is used,
when the matching entry is on the 4th port, it requires reading
3 ports * 256 entries * 3 files (GID, netdev, type) = 2304 files.

The GID table needs to be referred on every QP creation during IP
failover on other netdevice of an RDMA device.

In an alternative approach, a GID cache may be maintained and updated on
GID change event was reported by the kernel. However, it comes with below
two limitations:
(a) Maintain a thread per application process instance to listen and update
 the cache.
(b) Without the thread, on cache miss event, query the GID table. Even in
 this approach, if multiple processes are used, a GID cache needs to be
 maintained on a per-process basis. With a large number of processes,
 this method doesn't scale.

Hence, we introduce this series of patches, which introduces an API to
query the complete GID tables of an RDMA device, that returns all valid
GID table entries.

This is done through single ioctl, eliminating 2304 read, 2304 open and
2304 close system calls to just a total of 2 calls (one for each device).

While at it, we also introduce an API to query an individual GID entry
over ioctl interface, which provides all GID attributes information.

Thanks

Avihai Horon (4):
  RDMA/core: Change rdma_get_gid_attr returned error code
  RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  RDMA/core: Introduce new GID table query API
  RDMA/uverbs: Expose the new GID query API to user space

 drivers/infiniband/core/cache.c               |  99 +++++++++-
 drivers/infiniband/core/cma.c                 |  10 -
 drivers/infiniband/core/cma_configfs.c        |  13 +-
 drivers/infiniband/core/cma_priv.h            |  10 +
 drivers/infiniband/core/sysfs.c               |   3 +-
 .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
 drivers/infiniband/core/verbs.c               |   2 +-
 drivers/infiniband/hw/mlx5/cq.c               |   2 +-
 drivers/infiniband/hw/mlx5/main.c             |   4 +-
 drivers/infiniband/hw/qedr/verbs.c            |   4 +-
 include/rdma/ib_cache.h                       |   5 +
 include/rdma/ib_verbs.h                       |  19 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h       |  14 ++
 14 files changed, 351 insertions(+), 30 deletions(-)

--
2.26.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code
  2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.

This change is done in order to provide a more accurate error reporting
to be used by the new GID query API in user space. Nevertheless, -EINVAL
is still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c | 2 +-
 drivers/infiniband/core/sysfs.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index ffad73bb40ff..6079f1f7e678 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1220,7 +1220,7 @@ EXPORT_SYMBOL(ib_get_cached_port_state);
 const struct ib_gid_attr *
 rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
 {
-	const struct ib_gid_attr *attr = ERR_PTR(-EINVAL);
+	const struct ib_gid_attr *attr = ERR_PTR(-ENODATA);
 	struct ib_gid_table *table;
 	unsigned long flags;
 
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 01d85da00187..1a5507b79437 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -391,7 +391,8 @@ static ssize_t _show_port_gid_attr(
 
 	gid_attr = rdma_get_gid_attr(p->ibdev, p->port_num, tab_attr->index);
 	if (IS_ERR(gid_attr))
-		return PTR_ERR(gid_attr);
+		/* -EINVAL is returned for user space compatibility reasons. */
+		return -EINVAL;
 
 	ret = print(gid_attr, buf);
 	rdma_put_gid_attr(gid_attr);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
  2020-09-11 19:11   ` Jason Gunthorpe
  2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
  3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon

From: Avihai Horon <avihaih@nvidia.com>

Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values,
so enum ib_gid_type will match the gid types of the new query GID table
API which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c        |  4 ++++
 drivers/infiniband/core/cma.c          | 10 ----------
 drivers/infiniband/core/cma_configfs.c | 13 +++++++++----
 drivers/infiniband/core/cma_priv.h     | 10 ++++++++++
 drivers/infiniband/core/verbs.c        |  2 +-
 drivers/infiniband/hw/mlx5/cq.c        |  2 +-
 drivers/infiniband/hw/mlx5/main.c      |  4 ++--
 drivers/infiniband/hw/qedr/verbs.c     |  4 +++-
 include/rdma/ib_verbs.h                | 17 ++++++++++-------
 9 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 6079f1f7e678..cf49ac0b0aa6 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -133,7 +133,11 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 }
 
 static const char * const gid_type_str[] = {
+	/* IB/RoCE v1 value is set for IB_GID_TYPE_IB and IB_GID_TYPE_ROCE for
+	 * user space compatibility reasons.
+	 */
 	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+	[IB_GID_TYPE_ROCE]	= "IB/RoCE v1",
 	[IB_GID_TYPE_ROCE_UDP_ENCAP]	= "RoCE v2",
 };
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 82338099ba09..0bb4e3212158 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -203,16 +203,6 @@ struct xarray *cma_pernet_xa(struct net *net, enum rdma_ucm_port_space ps)
 	}
 }
 
-struct cma_device {
-	struct list_head	list;
-	struct ib_device	*device;
-	struct completion	comp;
-	refcount_t refcount;
-	struct list_head	id_list;
-	enum ib_gid_type	*default_gid_type;
-	u8			*default_roce_tos;
-};
-
 struct rdma_bind_list {
 	enum rdma_ucm_port_space ps;
 	struct hlist_head	owners;
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
index 3c1e2ca564fe..f1b277793980 100644
--- a/drivers/infiniband/core/cma_configfs.c
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
 {
 	struct cma_device *cma_dev;
 	struct cma_dev_port_group *group;
-	int gid_type = ib_cache_gid_parse_type_str(buf);
+	int gid_type;
 	ssize_t ret;
 
-	if (gid_type < 0)
-		return -EINVAL;
-
 	ret = cma_configfs_params_get(item, &cma_dev, &group);
 	if (ret)
 		return ret;
 
+	gid_type = ib_cache_gid_parse_type_str(buf);
+	if (gid_type < 0)
+		return -EINVAL;
+
+	if (gid_type == IB_GID_TYPE_IB &&
+	    rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
+		gid_type = IB_GID_TYPE_ROCE;
+
 	ret = cma_set_default_gid_type(cma_dev, group->port_num, gid_type);
 
 	cma_configfs_params_put(cma_dev);
diff --git a/drivers/infiniband/core/cma_priv.h b/drivers/infiniband/core/cma_priv.h
index caece96ebcf5..5c99f6a53cd3 100644
--- a/drivers/infiniband/core/cma_priv.h
+++ b/drivers/infiniband/core/cma_priv.h
@@ -50,6 +50,16 @@ enum rdma_cm_state {
 	RDMA_CM_DESTROYING
 };
 
+struct cma_device {
+	struct list_head	list;
+	struct ib_device	*device;
+	struct completion	comp;
+	refcount_t refcount;
+	struct list_head	id_list;
+	enum ib_gid_type	*default_gid_type;
+	u8			*default_roce_tos;
+};
+
 struct rdma_id_private {
 	struct rdma_cm_id	id;
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6e02f26a30a0..ac4b02f5ae68 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -740,7 +740,7 @@ int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
 				       (struct in6_addr *)dgid);
 		return 0;
 	} else if (net_type == RDMA_NETWORK_IPV6 ||
-		   net_type == RDMA_NETWORK_IB) {
+		   net_type == RDMA_NETWORK_IB || RDMA_NETWORK_ROCE_V1) {
 		*dgid = hdr->ibgrh.dgid;
 		*sgid = hdr->ibgrh.sgid;
 		return 0;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 0748a5daa2dd..3d4e02ae6628 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -255,7 +255,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 
 	switch (roce_packet_type) {
 	case MLX5_CQE_ROCE_L3_HEADER_TYPE_GRH:
-		wc->network_hdr_type = RDMA_NETWORK_IB;
+		wc->network_hdr_type = RDMA_NETWORK_ROCE_V1;
 		break;
 	case MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV6:
 		wc->network_hdr_type = RDMA_NETWORK_IPV6;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8696ad50d8fb..5cf5266936de 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -546,7 +546,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 			 unsigned int index, const union ib_gid *gid,
 			 const struct ib_gid_attr *attr)
 {
-	enum ib_gid_type gid_type = IB_GID_TYPE_IB;
+	enum ib_gid_type gid_type = IB_GID_TYPE_ROCE;
 	u16 vlan_id = 0xffff;
 	u8 roce_version = 0;
 	u8 roce_l3_type = 0;
@@ -561,7 +561,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 	}
 
 	switch (gid_type) {
-	case IB_GID_TYPE_IB:
+	case IB_GID_TYPE_ROCE:
 		roce_version = MLX5_ROCE_VERSION_1;
 		break;
 	case IB_GID_TYPE_ROCE_UDP_ENCAP:
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 3337d4b6a841..0395523958da 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1138,7 +1138,7 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
 		SET_FIELD(qp_params->modify_flags,
 			  QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
 		break;
-	case RDMA_NETWORK_IB:
+	case RDMA_NETWORK_ROCE_V1:
 		memcpy(&qp_params->sgid.bytes[0], &gid_attr->gid.raw[0],
 		       sizeof(qp_params->sgid));
 		memcpy(&qp_params->dgid.bytes[0],
@@ -1158,6 +1158,8 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
 			  QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
 		qp_params->roce_mode = ROCE_V2_IPV4;
 		break;
+	default:
+		return -EINVAL;
 	}
 
 	for (i = 0; i < 4; i++) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bde17507f3ea..80cc0f5de5c3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,10 +138,9 @@ union ib_gid {
 extern union ib_gid zgid;
 
 enum ib_gid_type {
-	/* If link layer is Ethernet, this is RoCE V1 */
 	IB_GID_TYPE_IB        = 0,
-	IB_GID_TYPE_ROCE      = 0,
-	IB_GID_TYPE_ROCE_UDP_ENCAP = 1,
+	IB_GID_TYPE_ROCE      = 1,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
 	IB_GID_TYPE_SIZE
 };
 
@@ -180,7 +179,7 @@ rdma_node_get_transport(unsigned int node_type);
 
 enum rdma_network_type {
 	RDMA_NETWORK_IB,
-	RDMA_NETWORK_ROCE_V1 = RDMA_NETWORK_IB,
+	RDMA_NETWORK_ROCE_V1,
 	RDMA_NETWORK_IPV4,
 	RDMA_NETWORK_IPV6
 };
@@ -190,9 +189,10 @@ static inline enum ib_gid_type ib_network_to_gid_type(enum rdma_network_type net
 	if (network_type == RDMA_NETWORK_IPV4 ||
 	    network_type == RDMA_NETWORK_IPV6)
 		return IB_GID_TYPE_ROCE_UDP_ENCAP;
-
-	/* IB_GID_TYPE_IB same as RDMA_NETWORK_ROCE_V1 */
-	return IB_GID_TYPE_IB;
+	else if (network_type == RDMA_NETWORK_ROCE_V1)
+		return IB_GID_TYPE_ROCE;
+	else
+		return IB_GID_TYPE_IB;
 }
 
 static inline enum rdma_network_type
@@ -201,6 +201,9 @@ rdma_gid_attr_network_type(const struct ib_gid_attr *attr)
 	if (attr->gid_type == IB_GID_TYPE_IB)
 		return RDMA_NETWORK_IB;
 
+	if (attr->gid_type == IB_GID_TYPE_ROCE)
+		return RDMA_NETWORK_ROCE_V1;
+
 	if (ipv6_addr_v4mapped((struct in6_addr *)&attr->gid))
 		return RDMA_NETWORK_IPV4;
 	else
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
  2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
  2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
  2020-09-11 19:52   ` Jason Gunthorpe
  2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
  3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to
a provided buffer.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cache.c         | 93 +++++++++++++++++++++++++
 include/rdma/ib_cache.h                 |  5 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h |  8 +++
 3 files changed, 106 insertions(+)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index cf49ac0b0aa6..175e229eccd3 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
 }
 EXPORT_SYMBOL(rdma_get_gid_attr);
 
+/**
+ * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
+ * @gid_attr: Pointer to the GID attribute.
+ * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
+ *
+ * Returns 0 on success or appropriate error code. The netdevice must be in UP
+ * state.
+ */
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
+{
+	struct net_device *ndev;
+	int ret = 0;
+
+	if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
+		*ndev_ifindex = 0;
+		return 0;
+	}
+
+	rcu_read_lock();
+	ndev = rcu_dereference(gid_attr->ndev);
+	if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	*ndev_ifindex = ndev->ifindex;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL(rdma_get_ndev_ifindex);
+
+/**
+ * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
+ * @device: The device to query.
+ * @entries: Entries where GID entries are returned.
+ * @max_entries: Maximum number of entries that can be returned.
+ * Entries array must be allocated to hold max_entries number of entries.
+ * @num_entries: Updated to the number of entries that were successfully read.
+ *
+ * Returns 0 on success or appropriate error code.
+ */
+int rdma_query_gid_table(struct ib_device *device,
+			 struct ib_uverbs_gid_entry *entries,
+			 size_t max_entries, size_t *num_entries)
+{
+	const struct ib_gid_attr *gid_attr;
+	struct ib_gid_table *table;
+	unsigned int port_num;
+	unsigned long flags;
+	int ret;
+	int i;
+
+	*num_entries = 0;
+	rdma_for_each_port(device, port_num) {
+		if (!rdma_ib_or_roce(device, port_num))
+			continue;
+
+		table = rdma_gid_table(device, port_num);
+		read_lock_irqsave(&table->rwlock, flags);
+		for (i = 0; i < table->sz; i++) {
+			if (!is_gid_entry_valid(table->data_vec[i]))
+				continue;
+			if (*num_entries >= max_entries) {
+				ret = -EINVAL;
+				goto err;
+			}
+
+			gid_attr = &table->data_vec[i]->attr;
+
+			memcpy(&entries->gid, &gid_attr->gid,
+			       sizeof(gid_attr->gid));
+			entries->gid_index = gid_attr->index;
+			entries->port_num = gid_attr->port_num;
+			entries->gid_type = gid_attr->gid_type;
+			ret = rdma_get_ndev_ifindex(gid_attr,
+						    &entries->netdev_ifindex);
+			if (ret)
+				goto err;
+
+			(*num_entries)++;
+			entries++;
+		}
+		read_unlock_irqrestore(&table->rwlock, flags);
+	}
+
+	return 0;
+err:
+	read_unlock_irqrestore(&table->rwlock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_query_gid_table);
+
 /**
  * rdma_put_gid_attr - Release reference to the GID attribute
  * @attr:		Pointer to the GID attribute whose reference
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 66a8f369a2fa..94e4f729f8e4 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -110,5 +110,10 @@ const struct ib_gid_attr *rdma_get_gid_attr(struct ib_device *device,
 					    u8 port_num, int index);
 void rdma_put_gid_attr(const struct ib_gid_attr *attr);
 void rdma_hold_gid_attr(const struct ib_gid_attr *attr);
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr,
+			  u32 *ndev_ifindex);
+int rdma_query_gid_table(struct ib_device *device,
+			 struct ib_uverbs_gid_entry *entries,
+			 size_t max_entries, size_t *num_entries);
 
 #endif /* _IB_CACHE_H */
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 5debab45ebcb..d5ac65ae2557 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,4 +250,12 @@ enum rdma_driver_id {
 	RDMA_DRIVER_SIW,
 };
 
+struct ib_uverbs_gid_entry {
+	__aligned_u64 gid[2];
+	__u32 gid_index;
+	__u32 port_num;
+	__u32 gid_type;
+	__u32 netdev_ifindex; /* It is 0 if there is no netdev associated with it */
+};
+
 #endif
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
  2020-09-11 19:59   ` Jason Gunthorpe
  3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma

From: Avihai Horon <avihaih@nvidia.com>

Expose the query GID table and entry API to user space by adding
two new methods and method handlers to the device object.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
 include/rdma/ib_verbs.h                       |   6 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h       |   6 +
 4 files changed, 204 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
index 7b03446b6936..beba1e284264 100644
--- a/drivers/infiniband/core/uverbs_std_types_device.c
+++ b/drivers/infiniband/core/uverbs_std_types_device.c
@@ -8,6 +8,7 @@
 #include "uverbs.h"
 #include <rdma/uverbs_ioctl.h>
 #include <rdma/opa_addr.h>
+#include <rdma/ib_cache.h>
 
 /*
  * This ioctl method allows calling any defined write or write_ex
@@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
 	return ucontext->device->ops.query_ucontext(ucontext, attrs);
 }
 
+static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
+				    struct ib_uverbs_gid_entry *entries,
+				    size_t num_entries, size_t user_entry_size)
+{
+	const struct uverbs_attr *attr;
+	void __user *user_entries;
+	size_t copy_len;
+	int ret;
+	int i;
+
+	if (user_entry_size == sizeof(*entries)) {
+		ret = uverbs_copy_to(attrs,
+				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+				     entries, sizeof(*entries) * num_entries);
+		return ret;
+	}
+
+	copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
+	attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+	if (IS_ERR(attr))
+		return PTR_ERR(attr);
+
+	user_entries = u64_to_user_ptr(attr->ptr_attr.data);
+	for (i = 0; i < num_entries; i++) {
+		if (copy_to_user(user_entries, entries, copy_len))
+			return -EFAULT;
+
+		if (user_entry_size > sizeof(*entries)) {
+			if (clear_user(user_entries + sizeof(*entries),
+				       user_entry_size - sizeof(*entries)))
+				return -EFAULT;
+		}
+
+		entries++;
+		user_entries += user_entry_size;
+	}
+
+	return uverbs_output_written(attrs,
+				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uverbs_gid_entry *entries;
+	struct ib_ucontext *ucontext;
+	struct ib_device *ib_dev;
+	size_t user_entry_size;
+	size_t max_entries;
+	size_t num_entries;
+	u32 flags;
+	int ret;
+
+	ret = uverbs_get_flags32(&flags, attrs,
+				 UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&user_entry_size, attrs,
+			       UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
+	if (ret)
+		return ret;
+
+	max_entries = uverbs_attr_ptr_get_array_size(
+		attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+		user_entry_size);
+	if (max_entries <= 0)
+		return -EINVAL;
+
+	ucontext = ib_uverbs_get_ucontext(attrs);
+	if (IS_ERR(ucontext))
+		return PTR_ERR(ucontext);
+	ib_dev = ucontext->device;
+
+	entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));
+	if (!entries)
+		return -ENOMEM;
+
+	ret = rdma_query_gid_table(ib_dev, entries, max_entries, &num_entries);
+	if (ret)
+		return ret;
+
+	ret = copy_gid_entries_to_user(attrs, entries, num_entries,
+				       user_entry_size);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_to(attrs,
+			     UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+			     &num_entries, sizeof(num_entries));
+	return ret;
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
+	struct uverbs_attr_bundle *attrs)
+{
+	const struct ib_gid_attr *gid_attr;
+	struct ib_uverbs_gid_entry entry;
+	struct ib_ucontext *ucontext;
+	struct ib_device *ib_dev;
+	u32 gid_index;
+	u32 port_num;
+	u32 flags;
+	int ret;
+
+	ret = uverbs_get_flags32(&flags, attrs,
+				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&port_num, attrs,
+			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_const(&gid_index, attrs,
+			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
+	if (ret)
+		return ret;
+
+	ucontext = ib_uverbs_get_ucontext(attrs);
+	if (IS_ERR(ucontext))
+		return PTR_ERR(ucontext);
+	ib_dev = ucontext->device;
+
+	if (!rdma_is_port_valid(ib_dev, port_num))
+		return -EINVAL;
+
+	if (!rdma_ib_or_roce(ib_dev, port_num))
+		return -EINVAL;
+
+	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
+	if (IS_ERR(gid_attr))
+		return PTR_ERR(gid_attr);
+
+	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
+	entry.gid_index = gid_attr->index;
+	entry.port_num = gid_attr->port_num;
+	entry.gid_type = gid_attr->gid_type;
+	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
+	if (ret)
+		goto out;
+
+	ret = uverbs_copy_to_struct_or_zero(
+		attrs, UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY, &entry,
+		sizeof(entry));
+out:
+	rdma_put_gid_attr(gid_attr);
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_GET_CONTEXT,
 	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_GET_CONTEXT_NUM_COMP_VECTORS,
@@ -300,12 +452,38 @@ DECLARE_UVERBS_NAMED_METHOD(
 				   reserved),
 		UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_QUERY_GID_TABLE,
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE, u64,
+			     UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, u32,
+			     UA_OPTIONAL),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+			    UVERBS_ATTR_MIN_SIZE(0), UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+			    UVERBS_ATTR_TYPE(u64), UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_QUERY_GID_ENTRY,
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_PORT, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, u32,
+			     UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+			    UVERBS_ATTR_STRUCT(struct ib_uverbs_gid_entry,
+					       netdev_ifindex),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_GLOBAL_METHODS(UVERBS_OBJECT_DEVICE,
 			      &UVERBS_METHOD(UVERBS_METHOD_GET_CONTEXT),
 			      &UVERBS_METHOD(UVERBS_METHOD_INVOKE_WRITE),
 			      &UVERBS_METHOD(UVERBS_METHOD_INFO_HANDLES),
 			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_PORT),
-			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT));
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT),
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_TABLE),
+			      &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_ENTRY));
 
 const struct uapi_definition uverbs_def_obj_device[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_DEVICE),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 80cc0f5de5c3..1e170e09afb0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,9 +138,9 @@ union ib_gid {
 extern union ib_gid zgid;
 
 enum ib_gid_type {
-	IB_GID_TYPE_IB        = 0,
-	IB_GID_TYPE_ROCE      = 1,
-	IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
+	IB_GID_TYPE_IB = IB_UVERBS_GID_TYPE_IB,
+	IB_GID_TYPE_ROCE = IB_UVERBS_GID_TYPE_ROCE_V1,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = IB_UVERBS_GID_TYPE_ROCE_V2,
 	IB_GID_TYPE_SIZE
 };
 
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 99dcabf61a71..7968a1845355 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -70,6 +70,8 @@ enum uverbs_methods_device {
 	UVERBS_METHOD_QUERY_PORT,
 	UVERBS_METHOD_GET_CONTEXT,
 	UVERBS_METHOD_QUERY_CONTEXT,
+	UVERBS_METHOD_QUERY_GID_TABLE,
+	UVERBS_METHOD_QUERY_GID_ENTRY,
 };
 
 enum uverbs_attrs_invoke_write_cmd_attr_ids {
@@ -352,4 +354,18 @@ enum uverbs_attrs_async_event_create {
 	UVERBS_ATTR_ASYNC_EVENT_ALLOC_FD_HANDLE,
 };
 
+enum uverbs_attrs_query_gid_table_cmd_attr_ids {
+	UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE,
+	UVERBS_ATTR_QUERY_GID_TABLE_FLAGS,
+	UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+	UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+};
+
+enum uverbs_attrs_query_gid_entry_cmd_attr_ids {
+	UVERBS_ATTR_QUERY_GID_ENTRY_PORT,
+	UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX,
+	UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS,
+	UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+};
+
 #endif
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index d5ac65ae2557..cfea82acfe57 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,6 +250,12 @@ enum rdma_driver_id {
 	RDMA_DRIVER_SIW,
 };
 
+enum ib_uverbs_gid_type {
+	IB_UVERBS_GID_TYPE_IB,
+	IB_UVERBS_GID_TYPE_ROCE_V1,
+	IB_UVERBS_GID_TYPE_ROCE_V2,
+};
+
 struct ib_uverbs_gid_entry {
 	__aligned_u64 gid[2];
 	__u32 gid_index;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
@ 2020-09-11 19:11   ` Jason Gunthorpe
  2020-09-13  7:42     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:11 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon

On Thu, Sep 10, 2020 at 05:22:02PM +0300, Leon Romanovsky wrote:

> diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
> index 3c1e2ca564fe..f1b277793980 100644
> +++ b/drivers/infiniband/core/cma_configfs.c
> @@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
>  {
>  	struct cma_device *cma_dev;
>  	struct cma_dev_port_group *group;
> -	int gid_type = ib_cache_gid_parse_type_str(buf);
> +	int gid_type;
>  	ssize_t ret;
>  
> -	if (gid_type < 0)
> -		return -EINVAL;
> -
>  	ret = cma_configfs_params_get(item, &cma_dev, &group);
>  	if (ret)
>  		return ret;
>  
> +	gid_type = ib_cache_gid_parse_type_str(buf);
> +	if (gid_type < 0)
> +		return -EINVAL;
> +
> +	if (gid_type == IB_GID_TYPE_IB &&
> +	    rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
> +		gid_type = IB_GID_TYPE_ROCE;

This logic should be in cma_set_default_gid_type() so as not to move
struct cma_device

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
  2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
@ 2020-09-11 19:52   ` Jason Gunthorpe
  2020-09-13  8:02     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:52 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> From: Avihai Horon <avihaih@nvidia.com>
> 
> Introduce rdma_query_gid_table which enables querying all the GID tables
> of a given device and copying the attributes of all valid GID entries to
> a provided buffer.
> 
> This API provides a faster way to query a GID table using single call and
> will be used in libibverbs to improve current approach that requires
> multiple calls to open, close and read multiple sysfs files for a single
> GID table entry.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>  drivers/infiniband/core/cache.c         | 93 +++++++++++++++++++++++++
>  include/rdma/ib_cache.h                 |  5 ++
>  include/uapi/rdma/ib_user_ioctl_verbs.h |  8 +++
>  3 files changed, 106 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index cf49ac0b0aa6..175e229eccd3 100644
> +++ b/drivers/infiniband/core/cache.c
> @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
>  }
>  EXPORT_SYMBOL(rdma_get_gid_attr);
>  
> +/**
> + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> + * @gid_attr: Pointer to the GID attribute.
> + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> + *
> + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> + * state.
> + */
> +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> +{
> +	struct net_device *ndev;
> +	int ret = 0;
> +
> +	if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> +		*ndev_ifindex = 0;
> +		return 0;
> +	}
> +
> +	rcu_read_lock();
> +	ndev = rcu_dereference(gid_attr->ndev);
> +	if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> +		ret = -ENODEV;
> +		goto out;
> +	}

None of this is necessary to read the ifindex, especially since the
read_lock is being held.

> +/**
> + * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
> + * @device: The device to query.
> + * @entries: Entries where GID entries are returned.
> + * @max_entries: Maximum number of entries that can be returned.
> + * Entries array must be allocated to hold max_entries number of entries.
> + * @num_entries: Updated to the number of entries that were successfully read.
> + *
> + * Returns 0 on success or appropriate error code.
> + */
> +int rdma_query_gid_table(struct ib_device *device,
> +			 struct ib_uverbs_gid_entry *entries,
> +			 size_t max_entries, size_t *num_entries)

return ssize_t instead of the output pointer

> +{
> +	const struct ib_gid_attr *gid_attr;
> +	struct ib_gid_table *table;
> +	unsigned int port_num;
> +	unsigned long flags;
> +	int ret;
> +	int i;

i is unsigned

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
@ 2020-09-11 19:59   ` Jason Gunthorpe
  2020-09-13  9:13     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:59 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Thu, Sep 10, 2020 at 05:22:04PM +0300, Leon Romanovsky wrote:
> From: Avihai Horon <avihaih@nvidia.com>
> 
> Expose the query GID table and entry API to user space by adding
> two new methods and method handlers to the device object.
> 
> This API provides a faster way to query a GID table using single call and
> will be used in libibverbs to improve current approach that requires
> multiple calls to open, close and read multiple sysfs files for a single
> GID table entry.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>  .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
>  include/rdma/ib_verbs.h                       |   6 +-
>  include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
>  include/uapi/rdma/ib_user_ioctl_verbs.h       |   6 +
>  4 files changed, 204 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
> index 7b03446b6936..beba1e284264 100644
> +++ b/drivers/infiniband/core/uverbs_std_types_device.c
> @@ -8,6 +8,7 @@
>  #include "uverbs.h"
>  #include <rdma/uverbs_ioctl.h>
>  #include <rdma/opa_addr.h>
> +#include <rdma/ib_cache.h>
>  
>  /*
>   * This ioctl method allows calling any defined write or write_ex
> @@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
>  	return ucontext->device->ops.query_ucontext(ucontext, attrs);
>  }
>  
> +static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
> +				    struct ib_uverbs_gid_entry *entries,
> +				    size_t num_entries, size_t user_entry_size)
> +{
> +	const struct uverbs_attr *attr;
> +	void __user *user_entries;
> +	size_t copy_len;
> +	int ret;
> +	int i;
> +
> +	if (user_entry_size == sizeof(*entries)) {
> +		ret = uverbs_copy_to(attrs,
> +				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> +				     entries, sizeof(*entries) * num_entries);
> +		return ret;
> +	}
> +
> +	copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
> +	attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> +	if (IS_ERR(attr))
> +		return PTR_ERR(attr);
> +
> +	user_entries = u64_to_user_ptr(attr->ptr_attr.data);
> +	for (i = 0; i < num_entries; i++) {
> +		if (copy_to_user(user_entries, entries, copy_len))
> +			return -EFAULT;
> +
> +		if (user_entry_size > sizeof(*entries)) {
> +			if (clear_user(user_entries + sizeof(*entries),
> +				       user_entry_size - sizeof(*entries)))
> +				return -EFAULT;
> +		}
> +
> +		entries++;
> +		user_entries += user_entry_size;
> +	}
> +
> +	return uverbs_output_written(attrs,
> +				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> +}
> +
> +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
> +	struct uverbs_attr_bundle *attrs)
> +{
> +	struct ib_uverbs_gid_entry *entries;
> +	struct ib_ucontext *ucontext;
> +	struct ib_device *ib_dev;
> +	size_t user_entry_size;
> +	size_t max_entries;
> +	size_t num_entries;
> +	u32 flags;
> +	int ret;
> +
> +	ret = uverbs_get_flags32(&flags, attrs,
> +				 UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_get_const(&user_entry_size, attrs,
> +			       UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
> +	if (ret)
> +		return ret;
> +
> +	max_entries = uverbs_attr_ptr_get_array_size(
> +		attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> +		user_entry_size);
> +	if (max_entries <= 0)
> +		return -EINVAL;
> +
> +	ucontext = ib_uverbs_get_ucontext(attrs);
> +	if (IS_ERR(ucontext))
> +		return PTR_ERR(ucontext);
> +	ib_dev = ucontext->device;
> +
> +	entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));

This multiplication could overflow

> +
> +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> +	struct uverbs_attr_bundle *attrs)
> +{
> +	const struct ib_gid_attr *gid_attr;
> +	struct ib_uverbs_gid_entry entry;
> +	struct ib_ucontext *ucontext;
> +	struct ib_device *ib_dev;
> +	u32 gid_index;
> +	u32 port_num;
> +	u32 flags;
> +	int ret;
> +
> +	ret = uverbs_get_flags32(&flags, attrs,
> +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_get_const(&port_num, attrs,
> +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> +	if (ret)
> +		return ret;
> +
> +	ret = uverbs_get_const(&gid_index, attrs,
> +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> +	if (ret)
> +		return ret;
> +
> +	ucontext = ib_uverbs_get_ucontext(attrs);
> +	if (IS_ERR(ucontext))
> +		return PTR_ERR(ucontext);
> +	ib_dev = ucontext->device;

> +	if (!rdma_is_port_valid(ib_dev, port_num))
> +		return -EINVAL;
> +
> +	if (!rdma_ib_or_roce(ib_dev, port_num))
> +		return -EINVAL;

Why these two tests? I would expect rdma_get_gid_attr() to do them

> +	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> +	if (IS_ERR(gid_attr))
> +		return PTR_ERR(gid_attr);
> +
> +	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> +	entry.gid_index = gid_attr->index;
> +	entry.port_num = gid_attr->port_num;
> +	entry.gid_type = gid_attr->gid_type;
> +	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);

Use rdma_read_gid_attr_ndev_rcu()

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
  2020-09-11 19:11   ` Jason Gunthorpe
@ 2020-09-13  7:42     ` Leon Romanovsky
  0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon

On Fri, Sep 11, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:02PM +0300, Leon Romanovsky wrote:
>
> > diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
> > index 3c1e2ca564fe..f1b277793980 100644
> > +++ b/drivers/infiniband/core/cma_configfs.c
> > @@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
> >  {
> >  	struct cma_device *cma_dev;
> >  	struct cma_dev_port_group *group;
> > -	int gid_type = ib_cache_gid_parse_type_str(buf);
> > +	int gid_type;
> >  	ssize_t ret;
> >
> > -	if (gid_type < 0)
> > -		return -EINVAL;
> > -
> >  	ret = cma_configfs_params_get(item, &cma_dev, &group);
> >  	if (ret)
> >  		return ret;
> >
> > +	gid_type = ib_cache_gid_parse_type_str(buf);
> > +	if (gid_type < 0)
> > +		return -EINVAL;
> > +
> > +	if (gid_type == IB_GID_TYPE_IB &&
> > +	    rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
> > +		gid_type = IB_GID_TYPE_ROCE;
>
> This logic should be in cma_set_default_gid_type() so as not to move
> struct cma_device

We wanted to keep "this hack of git type" as close as possible to the
actual needed place.

I'll fix.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
  2020-09-11 19:52   ` Jason Gunthorpe
@ 2020-09-13  8:02     ` Leon Romanovsky
  2020-09-14 15:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13  8:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Fri, Sep 11, 2020 at 04:52:21PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> > From: Avihai Horon <avihaih@nvidia.com>
> >
> > Introduce rdma_query_gid_table which enables querying all the GID tables
> > of a given device and copying the attributes of all valid GID entries to
> > a provided buffer.
> >
> > This API provides a faster way to query a GID table using single call and
> > will be used in libibverbs to improve current approach that requires
> > multiple calls to open, close and read multiple sysfs files for a single
> > GID table entry.
> >
> > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >  drivers/infiniband/core/cache.c         | 93 +++++++++++++++++++++++++
> >  include/rdma/ib_cache.h                 |  5 ++
> >  include/uapi/rdma/ib_user_ioctl_verbs.h |  8 +++
> >  3 files changed, 106 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > index cf49ac0b0aa6..175e229eccd3 100644
> > +++ b/drivers/infiniband/core/cache.c
> > @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
> >  }
> >  EXPORT_SYMBOL(rdma_get_gid_attr);
> >
> > +/**
> > + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> > + * @gid_attr: Pointer to the GID attribute.
> > + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> > + *
> > + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> > + * state.
> > + */
> > +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> > +{
> > +	struct net_device *ndev;
> > +	int ret = 0;
> > +
> > +	if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> > +		*ndev_ifindex = 0;
> > +		return 0;
> > +	}
> > +
> > +	rcu_read_lock();
> > +	ndev = rcu_dereference(gid_attr->ndev);
> > +	if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> > +		ret = -ENODEV;
> > +		goto out;
> > +	}
>
> None of this is necessary to read the ifindex, especially since the
> read_lock is being held.

I see same rcu_read_lock->rcu_dereference->rcu_read_unlock pattern in
rdma_read_gid_l2_fields(), why this function is different?

>
> > +/**
> > + * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
> > + * @device: The device to query.
> > + * @entries: Entries where GID entries are returned.
> > + * @max_entries: Maximum number of entries that can be returned.
> > + * Entries array must be allocated to hold max_entries number of entries.
> > + * @num_entries: Updated to the number of entries that were successfully read.
> > + *
> > + * Returns 0 on success or appropriate error code.
> > + */
> > +int rdma_query_gid_table(struct ib_device *device,
> > +			 struct ib_uverbs_gid_entry *entries,
> > +			 size_t max_entries, size_t *num_entries)
>
> return ssize_t instead of the output pointer

I'll change.

>
> > +{
> > +	const struct ib_gid_attr *gid_attr;
> > +	struct ib_gid_table *table;
> > +	unsigned int port_num;
> > +	unsigned long flags;
> > +	int ret;
> > +	int i;
>
> i is unsigned

"i" is used as an iterator till table->sz while "sz" is declared as int.
I'll change it to be unsigned int, but it is not needed.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-11 19:59   ` Jason Gunthorpe
@ 2020-09-13  9:13     ` Leon Romanovsky
  2020-09-14 15:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13  9:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Fri, Sep 11, 2020 at 04:59:18PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:04PM +0300, Leon Romanovsky wrote:
> > From: Avihai Horon <avihaih@nvidia.com>
> >
> > Expose the query GID table and entry API to user space by adding
> > two new methods and method handlers to the device object.
> >
> > This API provides a faster way to query a GID table using single call and
> > will be used in libibverbs to improve current approach that requires
> > multiple calls to open, close and read multiple sysfs files for a single
> > GID table entry.
> >
> > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >  .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
> >  include/rdma/ib_verbs.h                       |   6 +-
> >  include/uapi/rdma/ib_user_ioctl_cmds.h        |  16 ++
> >  include/uapi/rdma/ib_user_ioctl_verbs.h       |   6 +
> >  4 files changed, 204 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
> > index 7b03446b6936..beba1e284264 100644
> > +++ b/drivers/infiniband/core/uverbs_std_types_device.c
> > @@ -8,6 +8,7 @@
> >  #include "uverbs.h"
> >  #include <rdma/uverbs_ioctl.h>
> >  #include <rdma/opa_addr.h>
> > +#include <rdma/ib_cache.h>
> >
> >  /*
> >   * This ioctl method allows calling any defined write or write_ex
> > @@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
> >  	return ucontext->device->ops.query_ucontext(ucontext, attrs);
> >  }
> >
> > +static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
> > +				    struct ib_uverbs_gid_entry *entries,
> > +				    size_t num_entries, size_t user_entry_size)
> > +{
> > +	const struct uverbs_attr *attr;
> > +	void __user *user_entries;
> > +	size_t copy_len;
> > +	int ret;
> > +	int i;
> > +
> > +	if (user_entry_size == sizeof(*entries)) {
> > +		ret = uverbs_copy_to(attrs,
> > +				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> > +				     entries, sizeof(*entries) * num_entries);
> > +		return ret;
> > +	}
> > +
> > +	copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
> > +	attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> > +	if (IS_ERR(attr))
> > +		return PTR_ERR(attr);
> > +
> > +	user_entries = u64_to_user_ptr(attr->ptr_attr.data);
> > +	for (i = 0; i < num_entries; i++) {
> > +		if (copy_to_user(user_entries, entries, copy_len))
> > +			return -EFAULT;
> > +
> > +		if (user_entry_size > sizeof(*entries)) {
> > +			if (clear_user(user_entries + sizeof(*entries),
> > +				       user_entry_size - sizeof(*entries)))
> > +				return -EFAULT;
> > +		}
> > +
> > +		entries++;
> > +		user_entries += user_entry_size;
> > +	}
> > +
> > +	return uverbs_output_written(attrs,
> > +				     UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> > +}
> > +
> > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
> > +	struct uverbs_attr_bundle *attrs)
> > +{
> > +	struct ib_uverbs_gid_entry *entries;
> > +	struct ib_ucontext *ucontext;
> > +	struct ib_device *ib_dev;
> > +	size_t user_entry_size;
> > +	size_t max_entries;
> > +	size_t num_entries;
> > +	u32 flags;
> > +	int ret;
> > +
> > +	ret = uverbs_get_flags32(&flags, attrs,
> > +				 UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = uverbs_get_const(&user_entry_size, attrs,
> > +			       UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
> > +	if (ret)
> > +		return ret;
> > +
> > +	max_entries = uverbs_attr_ptr_get_array_size(
> > +		attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> > +		user_entry_size);
> > +	if (max_entries <= 0)
> > +		return -EINVAL;
> > +
> > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > +	if (IS_ERR(ucontext))
> > +		return PTR_ERR(ucontext);
> > +	ib_dev = ucontext->device;
> > +
> > +	entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));
>
> This multiplication could overflow

Right, I fixed it.

>
> > +
> > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > +	struct uverbs_attr_bundle *attrs)
> > +{
> > +	const struct ib_gid_attr *gid_attr;
> > +	struct ib_uverbs_gid_entry entry;
> > +	struct ib_ucontext *ucontext;
> > +	struct ib_device *ib_dev;
> > +	u32 gid_index;
> > +	u32 port_num;
> > +	u32 flags;
> > +	int ret;
> > +
> > +	ret = uverbs_get_flags32(&flags, attrs,
> > +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = uverbs_get_const(&port_num, attrs,
> > +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = uverbs_get_const(&gid_index, attrs,
> > +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > +	if (IS_ERR(ucontext))
> > +		return PTR_ERR(ucontext);
> > +	ib_dev = ucontext->device;
>
> > +	if (!rdma_is_port_valid(ib_dev, port_num))
> > +		return -EINVAL;
> > +
> > +	if (!rdma_ib_or_roce(ib_dev, port_num))
> > +		return -EINVAL;
>
> Why these two tests? I would expect rdma_get_gid_attr() to do them

First check is not needed, but the second check doesn't exist in
rdma_get_gid_attr(). We don't check that table returned from
rdma_gid_table() call exists.

>
> > +	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > +	if (IS_ERR(gid_attr))
> > +		return PTR_ERR(gid_attr);
> > +
> > +	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > +	entry.gid_index = gid_attr->index;
> > +	entry.port_num = gid_attr->port_num;
> > +	entry.gid_type = gid_attr->gid_type;
> > +	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
>
> Use rdma_read_gid_attr_ndev_rcu()

I don't want to bring below logic to uverbs* file.

  1263         if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
  1264                 *ndev_ifindex = 0;
  1265                 return 0;
  1266         }

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
  2020-09-13  8:02     ` Leon Romanovsky
@ 2020-09-14 15:50       ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-14 15:50 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Sun, Sep 13, 2020 at 11:02:33AM +0300, Leon Romanovsky wrote:
> On Fri, Sep 11, 2020 at 04:52:21PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> > > From: Avihai Horon <avihaih@nvidia.com>
> > >
> > > Introduce rdma_query_gid_table which enables querying all the GID tables
> > > of a given device and copying the attributes of all valid GID entries to
> > > a provided buffer.
> > >
> > > This API provides a faster way to query a GID table using single call and
> > > will be used in libibverbs to improve current approach that requires
> > > multiple calls to open, close and read multiple sysfs files for a single
> > > GID table entry.
> > >
> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > >  drivers/infiniband/core/cache.c         | 93 +++++++++++++++++++++++++
> > >  include/rdma/ib_cache.h                 |  5 ++
> > >  include/uapi/rdma/ib_user_ioctl_verbs.h |  8 +++
> > >  3 files changed, 106 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > > index cf49ac0b0aa6..175e229eccd3 100644
> > > +++ b/drivers/infiniband/core/cache.c
> > > @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
> > >  }
> > >  EXPORT_SYMBOL(rdma_get_gid_attr);
> > >
> > > +/**
> > > + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> > > + * @gid_attr: Pointer to the GID attribute.
> > > + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> > > + *
> > > + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> > > + * state.
> > > + */
> > > +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> > > +{
> > > +	struct net_device *ndev;
> > > +	int ret = 0;
> > > +
> > > +	if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> > > +		*ndev_ifindex = 0;
> > > +		return 0;
> > > +	}
> > > +
> > > +	rcu_read_lock();
> > > +	ndev = rcu_dereference(gid_attr->ndev);
> > > +	if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> > > +		ret = -ENODEV;
> > > +		goto out;
> > > +	}
> >
> > None of this is necessary to read the ifindex, especially since the
> > read_lock is being held.
> 
> I see same rcu_read_lock->rcu_dereference->rcu_read_unlock pattern in
> rdma_read_gid_l2_fields(), why this function is different?

It doesn't hold the read_lock so it is using rcu to access
attr->ndev. With the read_lock held attr->ndev is stable and none of
this is needed.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-13  9:13     ` Leon Romanovsky
@ 2020-09-14 15:55       ` Jason Gunthorpe
  2020-09-15 11:47         ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-14 15:55 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > +	struct uverbs_attr_bundle *attrs)
> > > +{
> > > +	const struct ib_gid_attr *gid_attr;
> > > +	struct ib_uverbs_gid_entry entry;
> > > +	struct ib_ucontext *ucontext;
> > > +	struct ib_device *ib_dev;
> > > +	u32 gid_index;
> > > +	u32 port_num;
> > > +	u32 flags;
> > > +	int ret;
> > > +
> > > +	ret = uverbs_get_flags32(&flags, attrs,
> > > +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = uverbs_get_const(&port_num, attrs,
> > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = uverbs_get_const(&gid_index, attrs,
> > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > > +	if (IS_ERR(ucontext))
> > > +		return PTR_ERR(ucontext);
> > > +	ib_dev = ucontext->device;
> >
> > > +	if (!rdma_is_port_valid(ib_dev, port_num))
> > > +		return -EINVAL;
> > > +
> > > +	if (!rdma_ib_or_roce(ib_dev, port_num))
> > > +		return -EINVAL;
> >
> > Why these two tests? I would expect rdma_get_gid_attr() to do them
> 
> First check is not needed, but the second check doesn't exist in
> rdma_get_gid_attr(). We don't check that table returned from
> rdma_gid_table() call exists.

Oh that is a bit exciting, maybe it should be checked...

Ideally we should also block this uapi entirely if the device doesn't
have a gid table, so this should be -EOPNOTSUP and moved up to the
top so it can be moved once we figure it out.

> > > +	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > > +	if (IS_ERR(gid_attr))
> > > +		return PTR_ERR(gid_attr);
> > > +
> > > +	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > > +	entry.gid_index = gid_attr->index;
> > > +	entry.port_num = gid_attr->port_num;
> > > +	entry.gid_type = gid_attr->gid_type;
> > > +	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> >
> > Use rdma_read_gid_attr_ndev_rcu()
> 
> I don't want to bring below logic to uverbs* file.
> 
>   1263         if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
>   1264                 *ndev_ifindex = 0;
>   1265                 return 0;
>   1266         }

Shouldn't be needed, rdma_read_gid_attr_ndev_rcu() already returns -1
for IB

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-14 15:55       ` Jason Gunthorpe
@ 2020-09-15 11:47         ` Leon Romanovsky
  2020-09-15 19:06           ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-15 11:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > +	struct uverbs_attr_bundle *attrs)
> > > > +{
> > > > +	const struct ib_gid_attr *gid_attr;
> > > > +	struct ib_uverbs_gid_entry entry;
> > > > +	struct ib_ucontext *ucontext;
> > > > +	struct ib_device *ib_dev;
> > > > +	u32 gid_index;
> > > > +	u32 port_num;
> > > > +	u32 flags;
> > > > +	int ret;
> > > > +
> > > > +	ret = uverbs_get_flags32(&flags, attrs,
> > > > +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ret = uverbs_get_const(&port_num, attrs,
> > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ret = uverbs_get_const(&gid_index, attrs,
> > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > > > +	if (IS_ERR(ucontext))
> > > > +		return PTR_ERR(ucontext);
> > > > +	ib_dev = ucontext->device;
> > >
> > > > +	if (!rdma_is_port_valid(ib_dev, port_num))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > +		return -EINVAL;
> > >
> > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> >
> > First check is not needed, but the second check doesn't exist in
> > rdma_get_gid_attr(). We don't check that table returned from
> > rdma_gid_table() call exists.
>
> Oh that is a bit exciting, maybe it should be checked...
>
> Ideally we should also block this uapi entirely if the device doesn't
> have a gid table, so this should be -EOPNOTSUP and moved up to the
> top so it can be moved once we figure it out.

It is already in earliest possible stage, right after we get ib_device.

>
> > > > +	gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > > > +	if (IS_ERR(gid_attr))
> > > > +		return PTR_ERR(gid_attr);
> > > > +
> > > > +	memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > > > +	entry.gid_index = gid_attr->index;
> > > > +	entry.port_num = gid_attr->port_num;
> > > > +	entry.gid_type = gid_attr->gid_type;
> > > > +	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> > >
> > > Use rdma_read_gid_attr_ndev_rcu()
> >
> > I don't want to bring below logic to uverbs* file.
> >
> >   1263         if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> >   1264                 *ndev_ifindex = 0;
> >   1265                 return 0;
> >   1266         }
>
> Shouldn't be needed, rdma_read_gid_attr_ndev_rcu() already returns -1
> for IB

It will be something like this:

diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
index 8e957aa38531..071698af4b8e 100644
--- a/drivers/infiniband/core/uverbs_std_types_device.c
+++ b/drivers/infiniband/core/uverbs_std_types_device.c
@@ -368,10 +368,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
 static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
 	struct uverbs_attr_bundle *attrs)
 {
+	struct ib_uverbs_gid_entry entry = {};
 	const struct ib_gid_attr *gid_attr;
-	struct ib_uverbs_gid_entry entry;
 	struct ib_ucontext *ucontext;
 	struct ib_device *ib_dev;
+	struct net_device *ndev;
 	u32 gid_index;
 	u32 port_num;
 	u32 flags;
@@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
 	entry.gid_index = gid_attr->index;
 	entry.port_num = gid_attr->port_num;
 	entry.gid_type = gid_attr->gid_type;
-	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
-	if (ret)
-		goto out;
+
+	if (rdma_protocol_roce(ib_dev, port_num)) {
+		rcu_read_lock();
+		ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
+		if (IS_ERR(ndev)) {
+		       rcu_read_unlock();
+		       goto out;
+		}
+		entry.netdev_ifindex = ndev->ifindex;
+		rcu_read_unlock();
+	}

 	ret = uverbs_copy_to_struct_or_zero(
 		attrs, UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY, &entry,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-15 11:47         ` Leon Romanovsky
@ 2020-09-15 19:06           ` Jason Gunthorpe
  2020-09-16 10:37             ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-15 19:06 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Tue, Sep 15, 2020 at 02:47:04PM +0300, Leon Romanovsky wrote:
> On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> > On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > > +	struct uverbs_attr_bundle *attrs)
> > > > > +{
> > > > > +	const struct ib_gid_attr *gid_attr;
> > > > > +	struct ib_uverbs_gid_entry entry;
> > > > > +	struct ib_ucontext *ucontext;
> > > > > +	struct ib_device *ib_dev;
> > > > > +	u32 gid_index;
> > > > > +	u32 port_num;
> > > > > +	u32 flags;
> > > > > +	int ret;
> > > > > +
> > > > > +	ret = uverbs_get_flags32(&flags, attrs,
> > > > > +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	ret = uverbs_get_const(&port_num, attrs,
> > > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	ret = uverbs_get_const(&gid_index, attrs,
> > > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > > +
> > > > > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > > > > +	if (IS_ERR(ucontext))
> > > > > +		return PTR_ERR(ucontext);
> > > > > +	ib_dev = ucontext->device;
> > > >
> > > > > +	if (!rdma_is_port_valid(ib_dev, port_num))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > > +		return -EINVAL;
> > > >
> > > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> > >
> > > First check is not needed, but the second check doesn't exist in
> > > rdma_get_gid_attr(). We don't check that table returned from
> > > rdma_gid_table() call exists.
> >
> > Oh that is a bit exciting, maybe it should be checked...
> >
> > Ideally we should also block this uapi entirely if the device doesn't
> > have a gid table, so this should be -EOPNOTSUP and moved up to the
> > top so it can be moved once we figure it out.
> 
> It is already in earliest possible stage, right after we get ib_device.

I usually put it before the uverbs_get_XX.. but doesn't matter

> @@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
>  	entry.gid_index = gid_attr->index;
>  	entry.port_num = gid_attr->port_num;
>  	entry.gid_type = gid_attr->gid_type;
> -	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> -	if (ret)
> -		goto out;
> +
> +	if (rdma_protocol_roce(ib_dev, port_num)) {

Not sure this is needed? non-roce still has gid table entries, and the
attr->ndev should be null so it will do the ENODEV naturally.

> +		rcu_read_lock();
> +		ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> +		if (IS_ERR(ndev)) {
> +		       rcu_read_unlock();
> +		       goto out;
> +		}
> +		entry.netdev_ifindex = ndev->ifindex;
> +		rcu_read_unlock();
> +	}

This is better than what is in rdma_get_ndev_ifindex()

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-15 19:06           ` Jason Gunthorpe
@ 2020-09-16 10:37             ` Leon Romanovsky
  2020-09-16 12:04               ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 10:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Tue, Sep 15, 2020 at 04:06:14PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 02:47:04PM +0300, Leon Romanovsky wrote:
> > On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > > > +	struct uverbs_attr_bundle *attrs)
> > > > > > +{
> > > > > > +	const struct ib_gid_attr *gid_attr;
> > > > > > +	struct ib_uverbs_gid_entry entry;
> > > > > > +	struct ib_ucontext *ucontext;
> > > > > > +	struct ib_device *ib_dev;
> > > > > > +	u32 gid_index;
> > > > > > +	u32 port_num;
> > > > > > +	u32 flags;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	ret = uverbs_get_flags32(&flags, attrs,
> > > > > > +				 UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	ret = uverbs_get_const(&port_num, attrs,
> > > > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	ret = uverbs_get_const(&gid_index, attrs,
> > > > > > +			       UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	ucontext = ib_uverbs_get_ucontext(attrs);
> > > > > > +	if (IS_ERR(ucontext))
> > > > > > +		return PTR_ERR(ucontext);
> > > > > > +	ib_dev = ucontext->device;
> > > > >
> > > > > > +	if (!rdma_is_port_valid(ib_dev, port_num))
> > > > > > +		return -EINVAL;
> > > > > > +
> > > > > > +	if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > > > +		return -EINVAL;
> > > > >
> > > > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> > > >
> > > > First check is not needed, but the second check doesn't exist in
> > > > rdma_get_gid_attr(). We don't check that table returned from
> > > > rdma_gid_table() call exists.
> > >
> > > Oh that is a bit exciting, maybe it should be checked...
> > >
> > > Ideally we should also block this uapi entirely if the device doesn't
> > > have a gid table, so this should be -EOPNOTSUP and moved up to the
> > > top so it can be moved once we figure it out.
> >
> > It is already in earliest possible stage, right after we get ib_device.
>
> I usually put it before the uverbs_get_XX.. but doesn't matter
>
> > @@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> >  	entry.gid_index = gid_attr->index;
> >  	entry.port_num = gid_attr->port_num;
> >  	entry.gid_type = gid_attr->gid_type;
> > -	ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> > -	if (ret)
> > -		goto out;
> > +
> > +	if (rdma_protocol_roce(ib_dev, port_num)) {
>
> Not sure this is needed? non-roce still has gid table entries, and the
> attr->ndev should be null so it will do the ENODEV naturally.

It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
Current check allows us to ensure that any error returned by this call is
handled.

Otherwise we will find ourselves with something like this:
ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
if (IS_ERR(ndev)) {
	if (rdma_protocol_roce())
		goto error;
	if (ERR_PTR(ndev) != -ENODEV)
	        goto error;
}

>
> > +		rcu_read_lock();
> > +		ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > +		if (IS_ERR(ndev)) {
> > +		       rcu_read_unlock();
> > +		       goto out;
> > +		}
> > +		entry.netdev_ifindex = ndev->ifindex;
> > +		rcu_read_unlock();
> > +	}
>
> This is better than what is in rdma_get_ndev_ifindex()
>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-16 10:37             ` Leon Romanovsky
@ 2020-09-16 12:04               ` Jason Gunthorpe
  2020-09-16 12:44                 ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 12:04 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> Current check allows us to ensure that any error returned by this call is
> handled.
> 
> Otherwise we will find ourselves with something like this:
> ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> if (IS_ERR(ndev)) {
> 	if (rdma_protocol_roce())
> 		goto error;
> 	if (ERR_PTR(ndev) != -ENODEV)
> 	        goto error;
> }

Isn't it just 

if (IS_ERR(ndev)) {
   if (ERR_PTR(ndev) != -ENODEV)
        goto error;
   index = -1;
}

Which seems fine and clear enough

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-16 12:04               ` Jason Gunthorpe
@ 2020-09-16 12:44                 ` Leon Romanovsky
  2020-09-16 14:12                   ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 12:44 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > Current check allows us to ensure that any error returned by this call is
> > handled.
> >
> > Otherwise we will find ourselves with something like this:
> > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > if (IS_ERR(ndev)) {
> > 	if (rdma_protocol_roce())
> > 		goto error;
> > 	if (ERR_PTR(ndev) != -ENODEV)
> > 	        goto error;
> > }
>
> Isn't it just
>
> if (IS_ERR(ndev)) {
>    if (ERR_PTR(ndev) != -ENODEV)
>         goto error;
>    index = -1;
> }
>
> Which seems fine and clear enough

It is a problem if roce device returned -ENODEV.
I don't want to paper-out this case by setting index = 0.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-16 12:44                 ` Leon Romanovsky
@ 2020-09-16 14:12                   ` Jason Gunthorpe
  2020-09-16 14:34                     ` Leon Romanovsky
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 14:12 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > Current check allows us to ensure that any error returned by this call is
> > > handled.
> > >
> > > Otherwise we will find ourselves with something like this:
> > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > if (IS_ERR(ndev)) {
> > > 	if (rdma_protocol_roce())
> > > 		goto error;
> > > 	if (ERR_PTR(ndev) != -ENODEV)
> > > 	        goto error;
> > > }
> >
> > Isn't it just
> >
> > if (IS_ERR(ndev)) {
> >    if (ERR_PTR(ndev) != -ENODEV)
> >         goto error;
> >    index = -1;
> > }
> >
> > Which seems fine and clear enough
> 
> It is a problem if roce device returned -ENODEV.

Can it happen? RCU I suppose, but I think this is an issue in
rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
shows the gid_attr is being concurrently destroyed

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-16 14:12                   ` Jason Gunthorpe
@ 2020-09-16 14:34                     ` Leon Romanovsky
  2020-09-16 14:36                       ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Wed, Sep 16, 2020 at 11:12:02AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > > Current check allows us to ensure that any error returned by this call is
> > > > handled.
> > > >
> > > > Otherwise we will find ourselves with something like this:
> > > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > > if (IS_ERR(ndev)) {
> > > > 	if (rdma_protocol_roce())
> > > > 		goto error;
> > > > 	if (ERR_PTR(ndev) != -ENODEV)
> > > > 	        goto error;
> > > > }
> > >
> > > Isn't it just
> > >
> > > if (IS_ERR(ndev)) {
> > >    if (ERR_PTR(ndev) != -ENODEV)
> > >         goto error;
> > >    index = -1;
> > > }
> > >
> > > Which seems fine and clear enough
> >
> > It is a problem if roce device returned -ENODEV.
>
> Can it happen? RCU I suppose, but I think this is an issue in
> rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
> shows the gid_attr is being concurrently destroyed

From RoCE point of view, it is a problem if device is destroyed or gid
not valid, the different returned values won't change much. For the IB,
we don't care.

>
> Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
  2020-09-16 14:34                     ` Leon Romanovsky
@ 2020-09-16 14:36                       ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 14:36 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma

On Wed, Sep 16, 2020 at 05:34:25PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 16, 2020 at 11:12:02AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> > > On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > > > Current check allows us to ensure that any error returned by this call is
> > > > > handled.
> > > > >
> > > > > Otherwise we will find ourselves with something like this:
> > > > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > > > if (IS_ERR(ndev)) {
> > > > > 	if (rdma_protocol_roce())
> > > > > 		goto error;
> > > > > 	if (ERR_PTR(ndev) != -ENODEV)
> > > > > 	        goto error;
> > > > > }
> > > >
> > > > Isn't it just
> > > >
> > > > if (IS_ERR(ndev)) {
> > > >    if (ERR_PTR(ndev) != -ENODEV)
> > > >         goto error;
> > > >    index = -1;
> > > > }
> > > >
> > > > Which seems fine and clear enough
> > >
> > > It is a problem if roce device returned -ENODEV.
> >
> > Can it happen? RCU I suppose, but I think this is an issue in
> > rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
> > shows the gid_attr is being concurrently destroyed
> 
> From RoCE point of view, it is a problem if device is destroyed or gid
> not valid, the different returned values won't change much. For the IB,
> we don't care.

I'm saying I prefer things layered properly. Having random
rdma_protocol's all over the place has proven to be hard to maintain

One inside rdma_read_gid_attr_ndev_rcu() to make a NULL netdev an
error return for roce is much more understandable

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-09-16 21:09 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
2020-09-11 19:11   ` Jason Gunthorpe
2020-09-13  7:42     ` Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
2020-09-11 19:52   ` Jason Gunthorpe
2020-09-13  8:02     ` Leon Romanovsky
2020-09-14 15:50       ` Jason Gunthorpe
2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
2020-09-11 19:59   ` Jason Gunthorpe
2020-09-13  9:13     ` Leon Romanovsky
2020-09-14 15:55       ` Jason Gunthorpe
2020-09-15 11:47         ` Leon Romanovsky
2020-09-15 19:06           ` Jason Gunthorpe
2020-09-16 10:37             ` Leon Romanovsky
2020-09-16 12:04               ` Jason Gunthorpe
2020-09-16 12:44                 ` Leon Romanovsky
2020-09-16 14:12                   ` Jason Gunthorpe
2020-09-16 14:34                     ` Leon Romanovsky
2020-09-16 14:36                       ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.