* [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code
2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma
From: Avihai Horon <avihaih@nvidia.com>
Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.
This change is done in order to provide a more accurate error reporting
to be used by the new GID query API in user space. Nevertheless, -EINVAL
is still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cache.c | 2 +-
drivers/infiniband/core/sysfs.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index ffad73bb40ff..6079f1f7e678 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1220,7 +1220,7 @@ EXPORT_SYMBOL(ib_get_cached_port_state);
const struct ib_gid_attr *
rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
{
- const struct ib_gid_attr *attr = ERR_PTR(-EINVAL);
+ const struct ib_gid_attr *attr = ERR_PTR(-ENODATA);
struct ib_gid_table *table;
unsigned long flags;
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 01d85da00187..1a5507b79437 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -391,7 +391,8 @@ static ssize_t _show_port_gid_attr(
gid_attr = rdma_get_gid_attr(p->ibdev, p->port_num, tab_attr->index);
if (IS_ERR(gid_attr))
- return PTR_ERR(gid_attr);
+ /* -EINVAL is returned for user space compatibility reasons. */
+ return -EINVAL;
ret = print(gid_attr, buf);
rdma_put_gid_attr(gid_attr);
--
2.26.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
2020-09-11 19:11 ` Jason Gunthorpe
2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon
From: Avihai Horon <avihaih@nvidia.com>
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values,
so enum ib_gid_type will match the gid types of the new query GID table
API which will be introduced in the following patches.
This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cache.c | 4 ++++
drivers/infiniband/core/cma.c | 10 ----------
drivers/infiniband/core/cma_configfs.c | 13 +++++++++----
drivers/infiniband/core/cma_priv.h | 10 ++++++++++
drivers/infiniband/core/verbs.c | 2 +-
drivers/infiniband/hw/mlx5/cq.c | 2 +-
drivers/infiniband/hw/mlx5/main.c | 4 ++--
drivers/infiniband/hw/qedr/verbs.c | 4 +++-
include/rdma/ib_verbs.h | 17 ++++++++++-------
9 files changed, 40 insertions(+), 26 deletions(-)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 6079f1f7e678..cf49ac0b0aa6 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -133,7 +133,11 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
}
static const char * const gid_type_str[] = {
+ /* IB/RoCE v1 value is set for IB_GID_TYPE_IB and IB_GID_TYPE_ROCE for
+ * user space compatibility reasons.
+ */
[IB_GID_TYPE_IB] = "IB/RoCE v1",
+ [IB_GID_TYPE_ROCE] = "IB/RoCE v1",
[IB_GID_TYPE_ROCE_UDP_ENCAP] = "RoCE v2",
};
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 82338099ba09..0bb4e3212158 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -203,16 +203,6 @@ struct xarray *cma_pernet_xa(struct net *net, enum rdma_ucm_port_space ps)
}
}
-struct cma_device {
- struct list_head list;
- struct ib_device *device;
- struct completion comp;
- refcount_t refcount;
- struct list_head id_list;
- enum ib_gid_type *default_gid_type;
- u8 *default_roce_tos;
-};
-
struct rdma_bind_list {
enum rdma_ucm_port_space ps;
struct hlist_head owners;
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
index 3c1e2ca564fe..f1b277793980 100644
--- a/drivers/infiniband/core/cma_configfs.c
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
{
struct cma_device *cma_dev;
struct cma_dev_port_group *group;
- int gid_type = ib_cache_gid_parse_type_str(buf);
+ int gid_type;
ssize_t ret;
- if (gid_type < 0)
- return -EINVAL;
-
ret = cma_configfs_params_get(item, &cma_dev, &group);
if (ret)
return ret;
+ gid_type = ib_cache_gid_parse_type_str(buf);
+ if (gid_type < 0)
+ return -EINVAL;
+
+ if (gid_type == IB_GID_TYPE_IB &&
+ rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
+ gid_type = IB_GID_TYPE_ROCE;
+
ret = cma_set_default_gid_type(cma_dev, group->port_num, gid_type);
cma_configfs_params_put(cma_dev);
diff --git a/drivers/infiniband/core/cma_priv.h b/drivers/infiniband/core/cma_priv.h
index caece96ebcf5..5c99f6a53cd3 100644
--- a/drivers/infiniband/core/cma_priv.h
+++ b/drivers/infiniband/core/cma_priv.h
@@ -50,6 +50,16 @@ enum rdma_cm_state {
RDMA_CM_DESTROYING
};
+struct cma_device {
+ struct list_head list;
+ struct ib_device *device;
+ struct completion comp;
+ refcount_t refcount;
+ struct list_head id_list;
+ enum ib_gid_type *default_gid_type;
+ u8 *default_roce_tos;
+};
+
struct rdma_id_private {
struct rdma_cm_id id;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6e02f26a30a0..ac4b02f5ae68 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -740,7 +740,7 @@ int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
(struct in6_addr *)dgid);
return 0;
} else if (net_type == RDMA_NETWORK_IPV6 ||
- net_type == RDMA_NETWORK_IB) {
+ net_type == RDMA_NETWORK_IB || RDMA_NETWORK_ROCE_V1) {
*dgid = hdr->ibgrh.dgid;
*sgid = hdr->ibgrh.sgid;
return 0;
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 0748a5daa2dd..3d4e02ae6628 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -255,7 +255,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
switch (roce_packet_type) {
case MLX5_CQE_ROCE_L3_HEADER_TYPE_GRH:
- wc->network_hdr_type = RDMA_NETWORK_IB;
+ wc->network_hdr_type = RDMA_NETWORK_ROCE_V1;
break;
case MLX5_CQE_ROCE_L3_HEADER_TYPE_IPV6:
wc->network_hdr_type = RDMA_NETWORK_IPV6;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8696ad50d8fb..5cf5266936de 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -546,7 +546,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
unsigned int index, const union ib_gid *gid,
const struct ib_gid_attr *attr)
{
- enum ib_gid_type gid_type = IB_GID_TYPE_IB;
+ enum ib_gid_type gid_type = IB_GID_TYPE_ROCE;
u16 vlan_id = 0xffff;
u8 roce_version = 0;
u8 roce_l3_type = 0;
@@ -561,7 +561,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
}
switch (gid_type) {
- case IB_GID_TYPE_IB:
+ case IB_GID_TYPE_ROCE:
roce_version = MLX5_ROCE_VERSION_1;
break;
case IB_GID_TYPE_ROCE_UDP_ENCAP:
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 3337d4b6a841..0395523958da 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1138,7 +1138,7 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
SET_FIELD(qp_params->modify_flags,
QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
break;
- case RDMA_NETWORK_IB:
+ case RDMA_NETWORK_ROCE_V1:
memcpy(&qp_params->sgid.bytes[0], &gid_attr->gid.raw[0],
sizeof(qp_params->sgid));
memcpy(&qp_params->dgid.bytes[0],
@@ -1158,6 +1158,8 @@ static inline int get_gid_info_from_table(struct ib_qp *ibqp,
QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
qp_params->roce_mode = ROCE_V2_IPV4;
break;
+ default:
+ return -EINVAL;
}
for (i = 0; i < 4; i++) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bde17507f3ea..80cc0f5de5c3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,10 +138,9 @@ union ib_gid {
extern union ib_gid zgid;
enum ib_gid_type {
- /* If link layer is Ethernet, this is RoCE V1 */
IB_GID_TYPE_IB = 0,
- IB_GID_TYPE_ROCE = 0,
- IB_GID_TYPE_ROCE_UDP_ENCAP = 1,
+ IB_GID_TYPE_ROCE = 1,
+ IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
IB_GID_TYPE_SIZE
};
@@ -180,7 +179,7 @@ rdma_node_get_transport(unsigned int node_type);
enum rdma_network_type {
RDMA_NETWORK_IB,
- RDMA_NETWORK_ROCE_V1 = RDMA_NETWORK_IB,
+ RDMA_NETWORK_ROCE_V1,
RDMA_NETWORK_IPV4,
RDMA_NETWORK_IPV6
};
@@ -190,9 +189,10 @@ static inline enum ib_gid_type ib_network_to_gid_type(enum rdma_network_type net
if (network_type == RDMA_NETWORK_IPV4 ||
network_type == RDMA_NETWORK_IPV6)
return IB_GID_TYPE_ROCE_UDP_ENCAP;
-
- /* IB_GID_TYPE_IB same as RDMA_NETWORK_ROCE_V1 */
- return IB_GID_TYPE_IB;
+ else if (network_type == RDMA_NETWORK_ROCE_V1)
+ return IB_GID_TYPE_ROCE;
+ else
+ return IB_GID_TYPE_IB;
}
static inline enum rdma_network_type
@@ -201,6 +201,9 @@ rdma_gid_attr_network_type(const struct ib_gid_attr *attr)
if (attr->gid_type == IB_GID_TYPE_IB)
return RDMA_NETWORK_IB;
+ if (attr->gid_type == IB_GID_TYPE_ROCE)
+ return RDMA_NETWORK_ROCE_V1;
+
if (ipv6_addr_v4mapped((struct in6_addr *)&attr->gid))
return RDMA_NETWORK_IPV4;
else
--
2.26.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
@ 2020-09-11 19:11 ` Jason Gunthorpe
2020-09-13 7:42 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:11 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon
On Thu, Sep 10, 2020 at 05:22:02PM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
> index 3c1e2ca564fe..f1b277793980 100644
> +++ b/drivers/infiniband/core/cma_configfs.c
> @@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
> {
> struct cma_device *cma_dev;
> struct cma_dev_port_group *group;
> - int gid_type = ib_cache_gid_parse_type_str(buf);
> + int gid_type;
> ssize_t ret;
>
> - if (gid_type < 0)
> - return -EINVAL;
> -
> ret = cma_configfs_params_get(item, &cma_dev, &group);
> if (ret)
> return ret;
>
> + gid_type = ib_cache_gid_parse_type_str(buf);
> + if (gid_type < 0)
> + return -EINVAL;
> +
> + if (gid_type == IB_GID_TYPE_IB &&
> + rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
> + gid_type = IB_GID_TYPE_ROCE;
This logic should be in cma_set_default_gid_type() so as not to move
struct cma_device
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
2020-09-11 19:11 ` Jason Gunthorpe
@ 2020-09-13 7:42 ` Leon Romanovsky
0 siblings, 0 replies; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13 7:42 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, Avihai Horon, Ariel Elior, linux-rdma, Michal Kalderon
On Fri, Sep 11, 2020 at 04:11:16PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:02PM +0300, Leon Romanovsky wrote:
>
> > diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
> > index 3c1e2ca564fe..f1b277793980 100644
> > +++ b/drivers/infiniband/core/cma_configfs.c
> > @@ -123,16 +123,21 @@ static ssize_t default_roce_mode_store(struct config_item *item,
> > {
> > struct cma_device *cma_dev;
> > struct cma_dev_port_group *group;
> > - int gid_type = ib_cache_gid_parse_type_str(buf);
> > + int gid_type;
> > ssize_t ret;
> >
> > - if (gid_type < 0)
> > - return -EINVAL;
> > -
> > ret = cma_configfs_params_get(item, &cma_dev, &group);
> > if (ret)
> > return ret;
> >
> > + gid_type = ib_cache_gid_parse_type_str(buf);
> > + if (gid_type < 0)
> > + return -EINVAL;
> > +
> > + if (gid_type == IB_GID_TYPE_IB &&
> > + rdma_protocol_roce_eth_encap(cma_dev->device, group->port_num))
> > + gid_type = IB_GID_TYPE_ROCE;
>
> This logic should be in cma_set_default_gid_type() so as not to move
> struct cma_device
We wanted to keep "this hack of git type" as close as possible to the
actual needed place.
I'll fix.
Thanks
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 1/4] RDMA/core: Change rdma_get_gid_attr returned error code Leon Romanovsky
2020-09-10 14:22 ` [PATCH rdma-next 2/4] RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
2020-09-11 19:52 ` Jason Gunthorpe
2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma
From: Avihai Horon <avihaih@nvidia.com>
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to
a provided buffer.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cache.c | 93 +++++++++++++++++++++++++
include/rdma/ib_cache.h | 5 ++
include/uapi/rdma/ib_user_ioctl_verbs.h | 8 +++
3 files changed, 106 insertions(+)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index cf49ac0b0aa6..175e229eccd3 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
}
EXPORT_SYMBOL(rdma_get_gid_attr);
+/**
+ * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
+ * @gid_attr: Pointer to the GID attribute.
+ * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
+ *
+ * Returns 0 on success or appropriate error code. The netdevice must be in UP
+ * state.
+ */
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
+{
+ struct net_device *ndev;
+ int ret = 0;
+
+ if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
+ *ndev_ifindex = 0;
+ return 0;
+ }
+
+ rcu_read_lock();
+ ndev = rcu_dereference(gid_attr->ndev);
+ if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ *ndev_ifindex = ndev->ifindex;
+out:
+ rcu_read_unlock();
+ return ret;
+}
+EXPORT_SYMBOL(rdma_get_ndev_ifindex);
+
+/**
+ * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
+ * @device: The device to query.
+ * @entries: Entries where GID entries are returned.
+ * @max_entries: Maximum number of entries that can be returned.
+ * Entries array must be allocated to hold max_entries number of entries.
+ * @num_entries: Updated to the number of entries that were successfully read.
+ *
+ * Returns 0 on success or appropriate error code.
+ */
+int rdma_query_gid_table(struct ib_device *device,
+ struct ib_uverbs_gid_entry *entries,
+ size_t max_entries, size_t *num_entries)
+{
+ const struct ib_gid_attr *gid_attr;
+ struct ib_gid_table *table;
+ unsigned int port_num;
+ unsigned long flags;
+ int ret;
+ int i;
+
+ *num_entries = 0;
+ rdma_for_each_port(device, port_num) {
+ if (!rdma_ib_or_roce(device, port_num))
+ continue;
+
+ table = rdma_gid_table(device, port_num);
+ read_lock_irqsave(&table->rwlock, flags);
+ for (i = 0; i < table->sz; i++) {
+ if (!is_gid_entry_valid(table->data_vec[i]))
+ continue;
+ if (*num_entries >= max_entries) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ gid_attr = &table->data_vec[i]->attr;
+
+ memcpy(&entries->gid, &gid_attr->gid,
+ sizeof(gid_attr->gid));
+ entries->gid_index = gid_attr->index;
+ entries->port_num = gid_attr->port_num;
+ entries->gid_type = gid_attr->gid_type;
+ ret = rdma_get_ndev_ifindex(gid_attr,
+ &entries->netdev_ifindex);
+ if (ret)
+ goto err;
+
+ (*num_entries)++;
+ entries++;
+ }
+ read_unlock_irqrestore(&table->rwlock, flags);
+ }
+
+ return 0;
+err:
+ read_unlock_irqrestore(&table->rwlock, flags);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_query_gid_table);
+
/**
* rdma_put_gid_attr - Release reference to the GID attribute
* @attr: Pointer to the GID attribute whose reference
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 66a8f369a2fa..94e4f729f8e4 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -110,5 +110,10 @@ const struct ib_gid_attr *rdma_get_gid_attr(struct ib_device *device,
u8 port_num, int index);
void rdma_put_gid_attr(const struct ib_gid_attr *attr);
void rdma_hold_gid_attr(const struct ib_gid_attr *attr);
+int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr,
+ u32 *ndev_ifindex);
+int rdma_query_gid_table(struct ib_device *device,
+ struct ib_uverbs_gid_entry *entries,
+ size_t max_entries, size_t *num_entries);
#endif /* _IB_CACHE_H */
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 5debab45ebcb..d5ac65ae2557 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,4 +250,12 @@ enum rdma_driver_id {
RDMA_DRIVER_SIW,
};
+struct ib_uverbs_gid_entry {
+ __aligned_u64 gid[2];
+ __u32 gid_index;
+ __u32 port_num;
+ __u32 gid_type;
+ __u32 netdev_ifindex; /* It is 0 if there is no netdev associated with it */
+};
+
#endif
--
2.26.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
@ 2020-09-11 19:52 ` Jason Gunthorpe
2020-09-13 8:02 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:52 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> From: Avihai Horon <avihaih@nvidia.com>
>
> Introduce rdma_query_gid_table which enables querying all the GID tables
> of a given device and copying the attributes of all valid GID entries to
> a provided buffer.
>
> This API provides a faster way to query a GID table using single call and
> will be used in libibverbs to improve current approach that requires
> multiple calls to open, close and read multiple sysfs files for a single
> GID table entry.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> drivers/infiniband/core/cache.c | 93 +++++++++++++++++++++++++
> include/rdma/ib_cache.h | 5 ++
> include/uapi/rdma/ib_user_ioctl_verbs.h | 8 +++
> 3 files changed, 106 insertions(+)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index cf49ac0b0aa6..175e229eccd3 100644
> +++ b/drivers/infiniband/core/cache.c
> @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
> }
> EXPORT_SYMBOL(rdma_get_gid_attr);
>
> +/**
> + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> + * @gid_attr: Pointer to the GID attribute.
> + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> + *
> + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> + * state.
> + */
> +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> +{
> + struct net_device *ndev;
> + int ret = 0;
> +
> + if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> + *ndev_ifindex = 0;
> + return 0;
> + }
> +
> + rcu_read_lock();
> + ndev = rcu_dereference(gid_attr->ndev);
> + if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> + ret = -ENODEV;
> + goto out;
> + }
None of this is necessary to read the ifindex, especially since the
read_lock is being held.
> +/**
> + * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
> + * @device: The device to query.
> + * @entries: Entries where GID entries are returned.
> + * @max_entries: Maximum number of entries that can be returned.
> + * Entries array must be allocated to hold max_entries number of entries.
> + * @num_entries: Updated to the number of entries that were successfully read.
> + *
> + * Returns 0 on success or appropriate error code.
> + */
> +int rdma_query_gid_table(struct ib_device *device,
> + struct ib_uverbs_gid_entry *entries,
> + size_t max_entries, size_t *num_entries)
return ssize_t instead of the output pointer
> +{
> + const struct ib_gid_attr *gid_attr;
> + struct ib_gid_table *table;
> + unsigned int port_num;
> + unsigned long flags;
> + int ret;
> + int i;
i is unsigned
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
2020-09-11 19:52 ` Jason Gunthorpe
@ 2020-09-13 8:02 ` Leon Romanovsky
2020-09-14 15:50 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13 8:02 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Fri, Sep 11, 2020 at 04:52:21PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> > From: Avihai Horon <avihaih@nvidia.com>
> >
> > Introduce rdma_query_gid_table which enables querying all the GID tables
> > of a given device and copying the attributes of all valid GID entries to
> > a provided buffer.
> >
> > This API provides a faster way to query a GID table using single call and
> > will be used in libibverbs to improve current approach that requires
> > multiple calls to open, close and read multiple sysfs files for a single
> > GID table entry.
> >
> > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > drivers/infiniband/core/cache.c | 93 +++++++++++++++++++++++++
> > include/rdma/ib_cache.h | 5 ++
> > include/uapi/rdma/ib_user_ioctl_verbs.h | 8 +++
> > 3 files changed, 106 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > index cf49ac0b0aa6..175e229eccd3 100644
> > +++ b/drivers/infiniband/core/cache.c
> > @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
> > }
> > EXPORT_SYMBOL(rdma_get_gid_attr);
> >
> > +/**
> > + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> > + * @gid_attr: Pointer to the GID attribute.
> > + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> > + *
> > + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> > + * state.
> > + */
> > +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> > +{
> > + struct net_device *ndev;
> > + int ret = 0;
> > +
> > + if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> > + *ndev_ifindex = 0;
> > + return 0;
> > + }
> > +
> > + rcu_read_lock();
> > + ndev = rcu_dereference(gid_attr->ndev);
> > + if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> > + ret = -ENODEV;
> > + goto out;
> > + }
>
> None of this is necessary to read the ifindex, especially since the
> read_lock is being held.
I see same rcu_read_lock->rcu_dereference->rcu_read_unlock pattern in
rdma_read_gid_l2_fields(), why this function is different?
>
> > +/**
> > + * rdma_query_gid_table - Reads GID table entries of all the ports of a device up to max_entries.
> > + * @device: The device to query.
> > + * @entries: Entries where GID entries are returned.
> > + * @max_entries: Maximum number of entries that can be returned.
> > + * Entries array must be allocated to hold max_entries number of entries.
> > + * @num_entries: Updated to the number of entries that were successfully read.
> > + *
> > + * Returns 0 on success or appropriate error code.
> > + */
> > +int rdma_query_gid_table(struct ib_device *device,
> > + struct ib_uverbs_gid_entry *entries,
> > + size_t max_entries, size_t *num_entries)
>
> return ssize_t instead of the output pointer
I'll change.
>
> > +{
> > + const struct ib_gid_attr *gid_attr;
> > + struct ib_gid_table *table;
> > + unsigned int port_num;
> > + unsigned long flags;
> > + int ret;
> > + int i;
>
> i is unsigned
"i" is used as an iterator till table->sz while "sz" is declared as int.
I'll change it to be unsigned int, but it is not needed.
Thanks
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API
2020-09-13 8:02 ` Leon Romanovsky
@ 2020-09-14 15:50 ` Jason Gunthorpe
0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-14 15:50 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Sun, Sep 13, 2020 at 11:02:33AM +0300, Leon Romanovsky wrote:
> On Fri, Sep 11, 2020 at 04:52:21PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 10, 2020 at 05:22:03PM +0300, Leon Romanovsky wrote:
> > > From: Avihai Horon <avihaih@nvidia.com>
> > >
> > > Introduce rdma_query_gid_table which enables querying all the GID tables
> > > of a given device and copying the attributes of all valid GID entries to
> > > a provided buffer.
> > >
> > > This API provides a faster way to query a GID table using single call and
> > > will be used in libibverbs to improve current approach that requires
> > > multiple calls to open, close and read multiple sysfs files for a single
> > > GID table entry.
> > >
> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > drivers/infiniband/core/cache.c | 93 +++++++++++++++++++++++++
> > > include/rdma/ib_cache.h | 5 ++
> > > include/uapi/rdma/ib_user_ioctl_verbs.h | 8 +++
> > > 3 files changed, 106 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > > index cf49ac0b0aa6..175e229eccd3 100644
> > > +++ b/drivers/infiniband/core/cache.c
> > > @@ -1247,6 +1247,99 @@ rdma_get_gid_attr(struct ib_device *device, u8 port_num, int index)
> > > }
> > > EXPORT_SYMBOL(rdma_get_gid_attr);
> > >
> > > +/**
> > > + * rdma_get_ndev_ifindex - Reads ndev ifindex of the given gid attr.
> > > + * @gid_attr: Pointer to the GID attribute.
> > > + * @ndev_ifindex: Pointer through which the ndev ifindex is returned.
> > > + *
> > > + * Returns 0 on success or appropriate error code. The netdevice must be in UP
> > > + * state.
> > > + */
> > > +int rdma_get_ndev_ifindex(const struct ib_gid_attr *gid_attr, u32 *ndev_ifindex)
> > > +{
> > > + struct net_device *ndev;
> > > + int ret = 0;
> > > +
> > > + if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> > > + *ndev_ifindex = 0;
> > > + return 0;
> > > + }
> > > +
> > > + rcu_read_lock();
> > > + ndev = rcu_dereference(gid_attr->ndev);
> > > + if (!ndev || (READ_ONCE(ndev->flags) & IFF_UP) == 0) {
> > > + ret = -ENODEV;
> > > + goto out;
> > > + }
> >
> > None of this is necessary to read the ifindex, especially since the
> > read_lock is being held.
>
> I see same rcu_read_lock->rcu_dereference->rcu_read_unlock pattern in
> rdma_read_gid_l2_fields(), why this function is different?
It doesn't hold the read_lock so it is using rcu to access
attr->ndev. With the read_lock held attr->ndev is stable and none of
this is needed.
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-10 14:22 [PATCH rdma-next 0/4] Query GID table API Leon Romanovsky
` (2 preceding siblings ...)
2020-09-10 14:22 ` [PATCH rdma-next 3/4] RDMA/core: Introduce new GID table query API Leon Romanovsky
@ 2020-09-10 14:22 ` Leon Romanovsky
2020-09-11 19:59 ` Jason Gunthorpe
3 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-10 14:22 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe; +Cc: Avihai Horon, linux-rdma
From: Avihai Horon <avihaih@nvidia.com>
Expose the query GID table and entry API to user space by adding
two new methods and method handlers to the device object.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
.../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
include/rdma/ib_verbs.h | 6 +-
include/uapi/rdma/ib_user_ioctl_cmds.h | 16 ++
include/uapi/rdma/ib_user_ioctl_verbs.h | 6 +
4 files changed, 204 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
index 7b03446b6936..beba1e284264 100644
--- a/drivers/infiniband/core/uverbs_std_types_device.c
+++ b/drivers/infiniband/core/uverbs_std_types_device.c
@@ -8,6 +8,7 @@
#include "uverbs.h"
#include <rdma/uverbs_ioctl.h>
#include <rdma/opa_addr.h>
+#include <rdma/ib_cache.h>
/*
* This ioctl method allows calling any defined write or write_ex
@@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
return ucontext->device->ops.query_ucontext(ucontext, attrs);
}
+static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
+ struct ib_uverbs_gid_entry *entries,
+ size_t num_entries, size_t user_entry_size)
+{
+ const struct uverbs_attr *attr;
+ void __user *user_entries;
+ size_t copy_len;
+ int ret;
+ int i;
+
+ if (user_entry_size == sizeof(*entries)) {
+ ret = uverbs_copy_to(attrs,
+ UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+ entries, sizeof(*entries) * num_entries);
+ return ret;
+ }
+
+ copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
+ attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+ if (IS_ERR(attr))
+ return PTR_ERR(attr);
+
+ user_entries = u64_to_user_ptr(attr->ptr_attr.data);
+ for (i = 0; i < num_entries; i++) {
+ if (copy_to_user(user_entries, entries, copy_len))
+ return -EFAULT;
+
+ if (user_entry_size > sizeof(*entries)) {
+ if (clear_user(user_entries + sizeof(*entries),
+ user_entry_size - sizeof(*entries)))
+ return -EFAULT;
+ }
+
+ entries++;
+ user_entries += user_entry_size;
+ }
+
+ return uverbs_output_written(attrs,
+ UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_uverbs_gid_entry *entries;
+ struct ib_ucontext *ucontext;
+ struct ib_device *ib_dev;
+ size_t user_entry_size;
+ size_t max_entries;
+ size_t num_entries;
+ u32 flags;
+ int ret;
+
+ ret = uverbs_get_flags32(&flags, attrs,
+ UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
+ if (ret)
+ return ret;
+
+ ret = uverbs_get_const(&user_entry_size, attrs,
+ UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
+ if (ret)
+ return ret;
+
+ max_entries = uverbs_attr_ptr_get_array_size(
+ attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+ user_entry_size);
+ if (max_entries <= 0)
+ return -EINVAL;
+
+ ucontext = ib_uverbs_get_ucontext(attrs);
+ if (IS_ERR(ucontext))
+ return PTR_ERR(ucontext);
+ ib_dev = ucontext->device;
+
+ entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));
+ if (!entries)
+ return -ENOMEM;
+
+ ret = rdma_query_gid_table(ib_dev, entries, max_entries, &num_entries);
+ if (ret)
+ return ret;
+
+ ret = copy_gid_entries_to_user(attrs, entries, num_entries,
+ user_entry_size);
+ if (ret)
+ return ret;
+
+ ret = uverbs_copy_to(attrs,
+ UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+ &num_entries, sizeof(num_entries));
+ return ret;
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
+ struct uverbs_attr_bundle *attrs)
+{
+ const struct ib_gid_attr *gid_attr;
+ struct ib_uverbs_gid_entry entry;
+ struct ib_ucontext *ucontext;
+ struct ib_device *ib_dev;
+ u32 gid_index;
+ u32 port_num;
+ u32 flags;
+ int ret;
+
+ ret = uverbs_get_flags32(&flags, attrs,
+ UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
+ if (ret)
+ return ret;
+
+ ret = uverbs_get_const(&port_num, attrs,
+ UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
+ if (ret)
+ return ret;
+
+ ret = uverbs_get_const(&gid_index, attrs,
+ UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
+ if (ret)
+ return ret;
+
+ ucontext = ib_uverbs_get_ucontext(attrs);
+ if (IS_ERR(ucontext))
+ return PTR_ERR(ucontext);
+ ib_dev = ucontext->device;
+
+ if (!rdma_is_port_valid(ib_dev, port_num))
+ return -EINVAL;
+
+ if (!rdma_ib_or_roce(ib_dev, port_num))
+ return -EINVAL;
+
+ gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
+ if (IS_ERR(gid_attr))
+ return PTR_ERR(gid_attr);
+
+ memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
+ entry.gid_index = gid_attr->index;
+ entry.port_num = gid_attr->port_num;
+ entry.gid_type = gid_attr->gid_type;
+ ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
+ if (ret)
+ goto out;
+
+ ret = uverbs_copy_to_struct_or_zero(
+ attrs, UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY, &entry,
+ sizeof(entry));
+out:
+ rdma_put_gid_attr(gid_attr);
+ return ret;
+}
+
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_GET_CONTEXT,
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_GET_CONTEXT_NUM_COMP_VECTORS,
@@ -300,12 +452,38 @@ DECLARE_UVERBS_NAMED_METHOD(
reserved),
UA_MANDATORY));
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_QUERY_GID_TABLE,
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE, u64,
+ UA_MANDATORY),
+ UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, u32,
+ UA_OPTIONAL),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+ UVERBS_ATTR_MIN_SIZE(0), UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+ UVERBS_ATTR_TYPE(u64), UA_MANDATORY));
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_QUERY_GID_ENTRY,
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_PORT, u32,
+ UA_MANDATORY),
+ UVERBS_ATTR_CONST_IN(UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX, u32,
+ UA_MANDATORY),
+ UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, u32,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+ UVERBS_ATTR_STRUCT(struct ib_uverbs_gid_entry,
+ netdev_ifindex),
+ UA_MANDATORY));
+
DECLARE_UVERBS_GLOBAL_METHODS(UVERBS_OBJECT_DEVICE,
&UVERBS_METHOD(UVERBS_METHOD_GET_CONTEXT),
&UVERBS_METHOD(UVERBS_METHOD_INVOKE_WRITE),
&UVERBS_METHOD(UVERBS_METHOD_INFO_HANDLES),
&UVERBS_METHOD(UVERBS_METHOD_QUERY_PORT),
- &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT));
+ &UVERBS_METHOD(UVERBS_METHOD_QUERY_CONTEXT),
+ &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_TABLE),
+ &UVERBS_METHOD(UVERBS_METHOD_QUERY_GID_ENTRY));
const struct uapi_definition uverbs_def_obj_device[] = {
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_DEVICE),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 80cc0f5de5c3..1e170e09afb0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -138,9 +138,9 @@ union ib_gid {
extern union ib_gid zgid;
enum ib_gid_type {
- IB_GID_TYPE_IB = 0,
- IB_GID_TYPE_ROCE = 1,
- IB_GID_TYPE_ROCE_UDP_ENCAP = 2,
+ IB_GID_TYPE_IB = IB_UVERBS_GID_TYPE_IB,
+ IB_GID_TYPE_ROCE = IB_UVERBS_GID_TYPE_ROCE_V1,
+ IB_GID_TYPE_ROCE_UDP_ENCAP = IB_UVERBS_GID_TYPE_ROCE_V2,
IB_GID_TYPE_SIZE
};
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 99dcabf61a71..7968a1845355 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -70,6 +70,8 @@ enum uverbs_methods_device {
UVERBS_METHOD_QUERY_PORT,
UVERBS_METHOD_GET_CONTEXT,
UVERBS_METHOD_QUERY_CONTEXT,
+ UVERBS_METHOD_QUERY_GID_TABLE,
+ UVERBS_METHOD_QUERY_GID_ENTRY,
};
enum uverbs_attrs_invoke_write_cmd_attr_ids {
@@ -352,4 +354,18 @@ enum uverbs_attrs_async_event_create {
UVERBS_ATTR_ASYNC_EVENT_ALLOC_FD_HANDLE,
};
+enum uverbs_attrs_query_gid_table_cmd_attr_ids {
+ UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE,
+ UVERBS_ATTR_QUERY_GID_TABLE_FLAGS,
+ UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
+ UVERBS_ATTR_QUERY_GID_TABLE_RESP_NUM_ENTRIES,
+};
+
+enum uverbs_attrs_query_gid_entry_cmd_attr_ids {
+ UVERBS_ATTR_QUERY_GID_ENTRY_PORT,
+ UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX,
+ UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS,
+ UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY,
+};
+
#endif
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index d5ac65ae2557..cfea82acfe57 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -250,6 +250,12 @@ enum rdma_driver_id {
RDMA_DRIVER_SIW,
};
+enum ib_uverbs_gid_type {
+ IB_UVERBS_GID_TYPE_IB,
+ IB_UVERBS_GID_TYPE_ROCE_V1,
+ IB_UVERBS_GID_TYPE_ROCE_V2,
+};
+
struct ib_uverbs_gid_entry {
__aligned_u64 gid[2];
__u32 gid_index;
--
2.26.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-10 14:22 ` [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space Leon Romanovsky
@ 2020-09-11 19:59 ` Jason Gunthorpe
2020-09-13 9:13 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-11 19:59 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Thu, Sep 10, 2020 at 05:22:04PM +0300, Leon Romanovsky wrote:
> From: Avihai Horon <avihaih@nvidia.com>
>
> Expose the query GID table and entry API to user space by adding
> two new methods and method handlers to the device object.
>
> This API provides a faster way to query a GID table using single call and
> will be used in libibverbs to improve current approach that requires
> multiple calls to open, close and read multiple sysfs files for a single
> GID table entry.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
> include/rdma/ib_verbs.h | 6 +-
> include/uapi/rdma/ib_user_ioctl_cmds.h | 16 ++
> include/uapi/rdma/ib_user_ioctl_verbs.h | 6 +
> 4 files changed, 204 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
> index 7b03446b6936..beba1e284264 100644
> +++ b/drivers/infiniband/core/uverbs_std_types_device.c
> @@ -8,6 +8,7 @@
> #include "uverbs.h"
> #include <rdma/uverbs_ioctl.h>
> #include <rdma/opa_addr.h>
> +#include <rdma/ib_cache.h>
>
> /*
> * This ioctl method allows calling any defined write or write_ex
> @@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
> return ucontext->device->ops.query_ucontext(ucontext, attrs);
> }
>
> +static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
> + struct ib_uverbs_gid_entry *entries,
> + size_t num_entries, size_t user_entry_size)
> +{
> + const struct uverbs_attr *attr;
> + void __user *user_entries;
> + size_t copy_len;
> + int ret;
> + int i;
> +
> + if (user_entry_size == sizeof(*entries)) {
> + ret = uverbs_copy_to(attrs,
> + UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> + entries, sizeof(*entries) * num_entries);
> + return ret;
> + }
> +
> + copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
> + attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> + if (IS_ERR(attr))
> + return PTR_ERR(attr);
> +
> + user_entries = u64_to_user_ptr(attr->ptr_attr.data);
> + for (i = 0; i < num_entries; i++) {
> + if (copy_to_user(user_entries, entries, copy_len))
> + return -EFAULT;
> +
> + if (user_entry_size > sizeof(*entries)) {
> + if (clear_user(user_entries + sizeof(*entries),
> + user_entry_size - sizeof(*entries)))
> + return -EFAULT;
> + }
> +
> + entries++;
> + user_entries += user_entry_size;
> + }
> +
> + return uverbs_output_written(attrs,
> + UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> +}
> +
> +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
> + struct uverbs_attr_bundle *attrs)
> +{
> + struct ib_uverbs_gid_entry *entries;
> + struct ib_ucontext *ucontext;
> + struct ib_device *ib_dev;
> + size_t user_entry_size;
> + size_t max_entries;
> + size_t num_entries;
> + u32 flags;
> + int ret;
> +
> + ret = uverbs_get_flags32(&flags, attrs,
> + UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_get_const(&user_entry_size, attrs,
> + UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
> + if (ret)
> + return ret;
> +
> + max_entries = uverbs_attr_ptr_get_array_size(
> + attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> + user_entry_size);
> + if (max_entries <= 0)
> + return -EINVAL;
> +
> + ucontext = ib_uverbs_get_ucontext(attrs);
> + if (IS_ERR(ucontext))
> + return PTR_ERR(ucontext);
> + ib_dev = ucontext->device;
> +
> + entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));
This multiplication could overflow
> +
> +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> + struct uverbs_attr_bundle *attrs)
> +{
> + const struct ib_gid_attr *gid_attr;
> + struct ib_uverbs_gid_entry entry;
> + struct ib_ucontext *ucontext;
> + struct ib_device *ib_dev;
> + u32 gid_index;
> + u32 port_num;
> + u32 flags;
> + int ret;
> +
> + ret = uverbs_get_flags32(&flags, attrs,
> + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_get_const(&port_num, attrs,
> + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> + if (ret)
> + return ret;
> +
> + ret = uverbs_get_const(&gid_index, attrs,
> + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> + if (ret)
> + return ret;
> +
> + ucontext = ib_uverbs_get_ucontext(attrs);
> + if (IS_ERR(ucontext))
> + return PTR_ERR(ucontext);
> + ib_dev = ucontext->device;
> + if (!rdma_is_port_valid(ib_dev, port_num))
> + return -EINVAL;
> +
> + if (!rdma_ib_or_roce(ib_dev, port_num))
> + return -EINVAL;
Why these two tests? I would expect rdma_get_gid_attr() to do them
> + gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> + if (IS_ERR(gid_attr))
> + return PTR_ERR(gid_attr);
> +
> + memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> + entry.gid_index = gid_attr->index;
> + entry.port_num = gid_attr->port_num;
> + entry.gid_type = gid_attr->gid_type;
> + ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
Use rdma_read_gid_attr_ndev_rcu()
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-11 19:59 ` Jason Gunthorpe
@ 2020-09-13 9:13 ` Leon Romanovsky
2020-09-14 15:55 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-13 9:13 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Fri, Sep 11, 2020 at 04:59:18PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 05:22:04PM +0300, Leon Romanovsky wrote:
> > From: Avihai Horon <avihaih@nvidia.com>
> >
> > Expose the query GID table and entry API to user space by adding
> > two new methods and method handlers to the device object.
> >
> > This API provides a faster way to query a GID table using single call and
> > will be used in libibverbs to improve current approach that requires
> > multiple calls to open, close and read multiple sysfs files for a single
> > GID table entry.
> >
> > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > .../infiniband/core/uverbs_std_types_device.c | 180 +++++++++++++++++-
> > include/rdma/ib_verbs.h | 6 +-
> > include/uapi/rdma/ib_user_ioctl_cmds.h | 16 ++
> > include/uapi/rdma/ib_user_ioctl_verbs.h | 6 +
> > 4 files changed, 204 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
> > index 7b03446b6936..beba1e284264 100644
> > +++ b/drivers/infiniband/core/uverbs_std_types_device.c
> > @@ -8,6 +8,7 @@
> > #include "uverbs.h"
> > #include <rdma/uverbs_ioctl.h>
> > #include <rdma/opa_addr.h>
> > +#include <rdma/ib_cache.h>
> >
> > /*
> > * This ioctl method allows calling any defined write or write_ex
> > @@ -266,6 +267,157 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_CONTEXT)(
> > return ucontext->device->ops.query_ucontext(ucontext, attrs);
> > }
> >
> > +static int copy_gid_entries_to_user(struct uverbs_attr_bundle *attrs,
> > + struct ib_uverbs_gid_entry *entries,
> > + size_t num_entries, size_t user_entry_size)
> > +{
> > + const struct uverbs_attr *attr;
> > + void __user *user_entries;
> > + size_t copy_len;
> > + int ret;
> > + int i;
> > +
> > + if (user_entry_size == sizeof(*entries)) {
> > + ret = uverbs_copy_to(attrs,
> > + UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> > + entries, sizeof(*entries) * num_entries);
> > + return ret;
> > + }
> > +
> > + copy_len = min_t(size_t, user_entry_size, sizeof(*entries));
> > + attr = uverbs_attr_get(attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> > + if (IS_ERR(attr))
> > + return PTR_ERR(attr);
> > +
> > + user_entries = u64_to_user_ptr(attr->ptr_attr.data);
> > + for (i = 0; i < num_entries; i++) {
> > + if (copy_to_user(user_entries, entries, copy_len))
> > + return -EFAULT;
> > +
> > + if (user_entry_size > sizeof(*entries)) {
> > + if (clear_user(user_entries + sizeof(*entries),
> > + user_entry_size - sizeof(*entries)))
> > + return -EFAULT;
> > + }
> > +
> > + entries++;
> > + user_entries += user_entry_size;
> > + }
> > +
> > + return uverbs_output_written(attrs,
> > + UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES);
> > +}
> > +
> > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
> > + struct uverbs_attr_bundle *attrs)
> > +{
> > + struct ib_uverbs_gid_entry *entries;
> > + struct ib_ucontext *ucontext;
> > + struct ib_device *ib_dev;
> > + size_t user_entry_size;
> > + size_t max_entries;
> > + size_t num_entries;
> > + u32 flags;
> > + int ret;
> > +
> > + ret = uverbs_get_flags32(&flags, attrs,
> > + UVERBS_ATTR_QUERY_GID_TABLE_FLAGS, 0);
> > + if (ret)
> > + return ret;
> > +
> > + ret = uverbs_get_const(&user_entry_size, attrs,
> > + UVERBS_ATTR_QUERY_GID_TABLE_ENTRY_SIZE);
> > + if (ret)
> > + return ret;
> > +
> > + max_entries = uverbs_attr_ptr_get_array_size(
> > + attrs, UVERBS_ATTR_QUERY_GID_TABLE_RESP_ENTRIES,
> > + user_entry_size);
> > + if (max_entries <= 0)
> > + return -EINVAL;
> > +
> > + ucontext = ib_uverbs_get_ucontext(attrs);
> > + if (IS_ERR(ucontext))
> > + return PTR_ERR(ucontext);
> > + ib_dev = ucontext->device;
> > +
> > + entries = uverbs_zalloc(attrs, max_entries * sizeof(*entries));
>
> This multiplication could overflow
Right, I fixed it.
>
> > +
> > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > + struct uverbs_attr_bundle *attrs)
> > +{
> > + const struct ib_gid_attr *gid_attr;
> > + struct ib_uverbs_gid_entry entry;
> > + struct ib_ucontext *ucontext;
> > + struct ib_device *ib_dev;
> > + u32 gid_index;
> > + u32 port_num;
> > + u32 flags;
> > + int ret;
> > +
> > + ret = uverbs_get_flags32(&flags, attrs,
> > + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > + if (ret)
> > + return ret;
> > +
> > + ret = uverbs_get_const(&port_num, attrs,
> > + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > + if (ret)
> > + return ret;
> > +
> > + ret = uverbs_get_const(&gid_index, attrs,
> > + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > + if (ret)
> > + return ret;
> > +
> > + ucontext = ib_uverbs_get_ucontext(attrs);
> > + if (IS_ERR(ucontext))
> > + return PTR_ERR(ucontext);
> > + ib_dev = ucontext->device;
>
> > + if (!rdma_is_port_valid(ib_dev, port_num))
> > + return -EINVAL;
> > +
> > + if (!rdma_ib_or_roce(ib_dev, port_num))
> > + return -EINVAL;
>
> Why these two tests? I would expect rdma_get_gid_attr() to do them
First check is not needed, but the second check doesn't exist in
rdma_get_gid_attr(). We don't check that table returned from
rdma_gid_table() call exists.
>
> > + gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > + if (IS_ERR(gid_attr))
> > + return PTR_ERR(gid_attr);
> > +
> > + memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > + entry.gid_index = gid_attr->index;
> > + entry.port_num = gid_attr->port_num;
> > + entry.gid_type = gid_attr->gid_type;
> > + ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
>
> Use rdma_read_gid_attr_ndev_rcu()
I don't want to bring below logic to uverbs* file.
1263 if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
1264 *ndev_ifindex = 0;
1265 return 0;
1266 }
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-13 9:13 ` Leon Romanovsky
@ 2020-09-14 15:55 ` Jason Gunthorpe
2020-09-15 11:47 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-14 15:55 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > + struct uverbs_attr_bundle *attrs)
> > > +{
> > > + const struct ib_gid_attr *gid_attr;
> > > + struct ib_uverbs_gid_entry entry;
> > > + struct ib_ucontext *ucontext;
> > > + struct ib_device *ib_dev;
> > > + u32 gid_index;
> > > + u32 port_num;
> > > + u32 flags;
> > > + int ret;
> > > +
> > > + ret = uverbs_get_flags32(&flags, attrs,
> > > + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + ret = uverbs_get_const(&port_num, attrs,
> > > + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + ret = uverbs_get_const(&gid_index, attrs,
> > > + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + ucontext = ib_uverbs_get_ucontext(attrs);
> > > + if (IS_ERR(ucontext))
> > > + return PTR_ERR(ucontext);
> > > + ib_dev = ucontext->device;
> >
> > > + if (!rdma_is_port_valid(ib_dev, port_num))
> > > + return -EINVAL;
> > > +
> > > + if (!rdma_ib_or_roce(ib_dev, port_num))
> > > + return -EINVAL;
> >
> > Why these two tests? I would expect rdma_get_gid_attr() to do them
>
> First check is not needed, but the second check doesn't exist in
> rdma_get_gid_attr(). We don't check that table returned from
> rdma_gid_table() call exists.
Oh that is a bit exciting, maybe it should be checked...
Ideally we should also block this uapi entirely if the device doesn't
have a gid table, so this should be -EOPNOTSUP and moved up to the
top so it can be moved once we figure it out.
> > > + gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > > + if (IS_ERR(gid_attr))
> > > + return PTR_ERR(gid_attr);
> > > +
> > > + memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > > + entry.gid_index = gid_attr->index;
> > > + entry.port_num = gid_attr->port_num;
> > > + entry.gid_type = gid_attr->gid_type;
> > > + ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> >
> > Use rdma_read_gid_attr_ndev_rcu()
>
> I don't want to bring below logic to uverbs* file.
>
> 1263 if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> 1264 *ndev_ifindex = 0;
> 1265 return 0;
> 1266 }
Shouldn't be needed, rdma_read_gid_attr_ndev_rcu() already returns -1
for IB
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-14 15:55 ` Jason Gunthorpe
@ 2020-09-15 11:47 ` Leon Romanovsky
2020-09-15 19:06 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-15 11:47 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > + struct uverbs_attr_bundle *attrs)
> > > > +{
> > > > + const struct ib_gid_attr *gid_attr;
> > > > + struct ib_uverbs_gid_entry entry;
> > > > + struct ib_ucontext *ucontext;
> > > > + struct ib_device *ib_dev;
> > > > + u32 gid_index;
> > > > + u32 port_num;
> > > > + u32 flags;
> > > > + int ret;
> > > > +
> > > > + ret = uverbs_get_flags32(&flags, attrs,
> > > > + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = uverbs_get_const(&port_num, attrs,
> > > > + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = uverbs_get_const(&gid_index, attrs,
> > > > + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ucontext = ib_uverbs_get_ucontext(attrs);
> > > > + if (IS_ERR(ucontext))
> > > > + return PTR_ERR(ucontext);
> > > > + ib_dev = ucontext->device;
> > >
> > > > + if (!rdma_is_port_valid(ib_dev, port_num))
> > > > + return -EINVAL;
> > > > +
> > > > + if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > + return -EINVAL;
> > >
> > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> >
> > First check is not needed, but the second check doesn't exist in
> > rdma_get_gid_attr(). We don't check that table returned from
> > rdma_gid_table() call exists.
>
> Oh that is a bit exciting, maybe it should be checked...
>
> Ideally we should also block this uapi entirely if the device doesn't
> have a gid table, so this should be -EOPNOTSUP and moved up to the
> top so it can be moved once we figure it out.
It is already in earliest possible stage, right after we get ib_device.
>
> > > > + gid_attr = rdma_get_gid_attr(ib_dev, port_num, gid_index);
> > > > + if (IS_ERR(gid_attr))
> > > > + return PTR_ERR(gid_attr);
> > > > +
> > > > + memcpy(&entry.gid, &gid_attr->gid, sizeof(gid_attr->gid));
> > > > + entry.gid_index = gid_attr->index;
> > > > + entry.port_num = gid_attr->port_num;
> > > > + entry.gid_type = gid_attr->gid_type;
> > > > + ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> > >
> > > Use rdma_read_gid_attr_ndev_rcu()
> >
> > I don't want to bring below logic to uverbs* file.
> >
> > 1263 if (rdma_protocol_ib(gid_attr->device, gid_attr->port_num)) {
> > 1264 *ndev_ifindex = 0;
> > 1265 return 0;
> > 1266 }
>
> Shouldn't be needed, rdma_read_gid_attr_ndev_rcu() already returns -1
> for IB
It will be something like this:
diff --git a/drivers/infiniband/core/uverbs_std_types_device.c b/drivers/infiniband/core/uverbs_std_types_device.c
index 8e957aa38531..071698af4b8e 100644
--- a/drivers/infiniband/core/uverbs_std_types_device.c
+++ b/drivers/infiniband/core/uverbs_std_types_device.c
@@ -368,10 +368,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_TABLE)(
static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
struct uverbs_attr_bundle *attrs)
{
+ struct ib_uverbs_gid_entry entry = {};
const struct ib_gid_attr *gid_attr;
- struct ib_uverbs_gid_entry entry;
struct ib_ucontext *ucontext;
struct ib_device *ib_dev;
+ struct net_device *ndev;
u32 gid_index;
u32 port_num;
u32 flags;
@@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
entry.gid_index = gid_attr->index;
entry.port_num = gid_attr->port_num;
entry.gid_type = gid_attr->gid_type;
- ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
- if (ret)
- goto out;
+
+ if (rdma_protocol_roce(ib_dev, port_num)) {
+ rcu_read_lock();
+ ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
+ if (IS_ERR(ndev)) {
+ rcu_read_unlock();
+ goto out;
+ }
+ entry.netdev_ifindex = ndev->ifindex;
+ rcu_read_unlock();
+ }
ret = uverbs_copy_to_struct_or_zero(
attrs, UVERBS_ATTR_QUERY_GID_ENTRY_RESP_ENTRY, &entry,
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-15 11:47 ` Leon Romanovsky
@ 2020-09-15 19:06 ` Jason Gunthorpe
2020-09-16 10:37 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-15 19:06 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Tue, Sep 15, 2020 at 02:47:04PM +0300, Leon Romanovsky wrote:
> On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> > On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > > + struct uverbs_attr_bundle *attrs)
> > > > > +{
> > > > > + const struct ib_gid_attr *gid_attr;
> > > > > + struct ib_uverbs_gid_entry entry;
> > > > > + struct ib_ucontext *ucontext;
> > > > > + struct ib_device *ib_dev;
> > > > > + u32 gid_index;
> > > > > + u32 port_num;
> > > > > + u32 flags;
> > > > > + int ret;
> > > > > +
> > > > > + ret = uverbs_get_flags32(&flags, attrs,
> > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + ret = uverbs_get_const(&port_num, attrs,
> > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + ret = uverbs_get_const(&gid_index, attrs,
> > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > + ucontext = ib_uverbs_get_ucontext(attrs);
> > > > > + if (IS_ERR(ucontext))
> > > > > + return PTR_ERR(ucontext);
> > > > > + ib_dev = ucontext->device;
> > > >
> > > > > + if (!rdma_is_port_valid(ib_dev, port_num))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > > + return -EINVAL;
> > > >
> > > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> > >
> > > First check is not needed, but the second check doesn't exist in
> > > rdma_get_gid_attr(). We don't check that table returned from
> > > rdma_gid_table() call exists.
> >
> > Oh that is a bit exciting, maybe it should be checked...
> >
> > Ideally we should also block this uapi entirely if the device doesn't
> > have a gid table, so this should be -EOPNOTSUP and moved up to the
> > top so it can be moved once we figure it out.
>
> It is already in earliest possible stage, right after we get ib_device.
I usually put it before the uverbs_get_XX.. but doesn't matter
> @@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> entry.gid_index = gid_attr->index;
> entry.port_num = gid_attr->port_num;
> entry.gid_type = gid_attr->gid_type;
> - ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> - if (ret)
> - goto out;
> +
> + if (rdma_protocol_roce(ib_dev, port_num)) {
Not sure this is needed? non-roce still has gid table entries, and the
attr->ndev should be null so it will do the ENODEV naturally.
> + rcu_read_lock();
> + ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> + if (IS_ERR(ndev)) {
> + rcu_read_unlock();
> + goto out;
> + }
> + entry.netdev_ifindex = ndev->ifindex;
> + rcu_read_unlock();
> + }
This is better than what is in rdma_get_ndev_ifindex()
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-15 19:06 ` Jason Gunthorpe
@ 2020-09-16 10:37 ` Leon Romanovsky
2020-09-16 12:04 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 10:37 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Tue, Sep 15, 2020 at 04:06:14PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 02:47:04PM +0300, Leon Romanovsky wrote:
> > On Mon, Sep 14, 2020 at 12:55:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Sep 13, 2020 at 12:13:02PM +0300, Leon Romanovsky wrote:
> > > > > > +static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > > > > > + struct uverbs_attr_bundle *attrs)
> > > > > > +{
> > > > > > + const struct ib_gid_attr *gid_attr;
> > > > > > + struct ib_uverbs_gid_entry entry;
> > > > > > + struct ib_ucontext *ucontext;
> > > > > > + struct ib_device *ib_dev;
> > > > > > + u32 gid_index;
> > > > > > + u32 port_num;
> > > > > > + u32 flags;
> > > > > > + int ret;
> > > > > > +
> > > > > > + ret = uverbs_get_flags32(&flags, attrs,
> > > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_FLAGS, 0);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > > +
> > > > > > + ret = uverbs_get_const(&port_num, attrs,
> > > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_PORT);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > > +
> > > > > > + ret = uverbs_get_const(&gid_index, attrs,
> > > > > > + UVERBS_ATTR_QUERY_GID_ENTRY_GID_INDEX);
> > > > > > + if (ret)
> > > > > > + return ret;
> > > > > > +
> > > > > > + ucontext = ib_uverbs_get_ucontext(attrs);
> > > > > > + if (IS_ERR(ucontext))
> > > > > > + return PTR_ERR(ucontext);
> > > > > > + ib_dev = ucontext->device;
> > > > >
> > > > > > + if (!rdma_is_port_valid(ib_dev, port_num))
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + if (!rdma_ib_or_roce(ib_dev, port_num))
> > > > > > + return -EINVAL;
> > > > >
> > > > > Why these two tests? I would expect rdma_get_gid_attr() to do them
> > > >
> > > > First check is not needed, but the second check doesn't exist in
> > > > rdma_get_gid_attr(). We don't check that table returned from
> > > > rdma_gid_table() call exists.
> > >
> > > Oh that is a bit exciting, maybe it should be checked...
> > >
> > > Ideally we should also block this uapi entirely if the device doesn't
> > > have a gid table, so this should be -EOPNOTSUP and moved up to the
> > > top so it can be moved once we figure it out.
> >
> > It is already in earliest possible stage, right after we get ib_device.
>
> I usually put it before the uverbs_get_XX.. but doesn't matter
>
> > @@ -408,9 +409,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_GID_ENTRY)(
> > entry.gid_index = gid_attr->index;
> > entry.port_num = gid_attr->port_num;
> > entry.gid_type = gid_attr->gid_type;
> > - ret = rdma_get_ndev_ifindex(gid_attr, &entry.netdev_ifindex);
> > - if (ret)
> > - goto out;
> > +
> > + if (rdma_protocol_roce(ib_dev, port_num)) {
>
> Not sure this is needed? non-roce still has gid table entries, and the
> attr->ndev should be null so it will do the ENODEV naturally.
It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
Current check allows us to ensure that any error returned by this call is
handled.
Otherwise we will find ourselves with something like this:
ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
if (IS_ERR(ndev)) {
if (rdma_protocol_roce())
goto error;
if (ERR_PTR(ndev) != -ENODEV)
goto error;
}
>
> > + rcu_read_lock();
> > + ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > + if (IS_ERR(ndev)) {
> > + rcu_read_unlock();
> > + goto out;
> > + }
> > + entry.netdev_ifindex = ndev->ifindex;
> > + rcu_read_unlock();
> > + }
>
> This is better than what is in rdma_get_ndev_ifindex()
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-16 10:37 ` Leon Romanovsky
@ 2020-09-16 12:04 ` Jason Gunthorpe
2020-09-16 12:44 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 12:04 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> Current check allows us to ensure that any error returned by this call is
> handled.
>
> Otherwise we will find ourselves with something like this:
> ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> if (IS_ERR(ndev)) {
> if (rdma_protocol_roce())
> goto error;
> if (ERR_PTR(ndev) != -ENODEV)
> goto error;
> }
Isn't it just
if (IS_ERR(ndev)) {
if (ERR_PTR(ndev) != -ENODEV)
goto error;
index = -1;
}
Which seems fine and clear enough
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-16 12:04 ` Jason Gunthorpe
@ 2020-09-16 12:44 ` Leon Romanovsky
2020-09-16 14:12 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 12:44 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > Current check allows us to ensure that any error returned by this call is
> > handled.
> >
> > Otherwise we will find ourselves with something like this:
> > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > if (IS_ERR(ndev)) {
> > if (rdma_protocol_roce())
> > goto error;
> > if (ERR_PTR(ndev) != -ENODEV)
> > goto error;
> > }
>
> Isn't it just
>
> if (IS_ERR(ndev)) {
> if (ERR_PTR(ndev) != -ENODEV)
> goto error;
> index = -1;
> }
>
> Which seems fine and clear enough
It is a problem if roce device returned -ENODEV.
I don't want to paper-out this case by setting index = 0.
Thanks
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-16 12:44 ` Leon Romanovsky
@ 2020-09-16 14:12 ` Jason Gunthorpe
2020-09-16 14:34 ` Leon Romanovsky
0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 14:12 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > Current check allows us to ensure that any error returned by this call is
> > > handled.
> > >
> > > Otherwise we will find ourselves with something like this:
> > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > if (IS_ERR(ndev)) {
> > > if (rdma_protocol_roce())
> > > goto error;
> > > if (ERR_PTR(ndev) != -ENODEV)
> > > goto error;
> > > }
> >
> > Isn't it just
> >
> > if (IS_ERR(ndev)) {
> > if (ERR_PTR(ndev) != -ENODEV)
> > goto error;
> > index = -1;
> > }
> >
> > Which seems fine and clear enough
>
> It is a problem if roce device returned -ENODEV.
Can it happen? RCU I suppose, but I think this is an issue in
rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
shows the gid_attr is being concurrently destroyed
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-16 14:12 ` Jason Gunthorpe
@ 2020-09-16 14:34 ` Leon Romanovsky
2020-09-16 14:36 ` Jason Gunthorpe
0 siblings, 1 reply; 21+ messages in thread
From: Leon Romanovsky @ 2020-09-16 14:34 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Wed, Sep 16, 2020 at 11:12:02AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > > Current check allows us to ensure that any error returned by this call is
> > > > handled.
> > > >
> > > > Otherwise we will find ourselves with something like this:
> > > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > > if (IS_ERR(ndev)) {
> > > > if (rdma_protocol_roce())
> > > > goto error;
> > > > if (ERR_PTR(ndev) != -ENODEV)
> > > > goto error;
> > > > }
> > >
> > > Isn't it just
> > >
> > > if (IS_ERR(ndev)) {
> > > if (ERR_PTR(ndev) != -ENODEV)
> > > goto error;
> > > index = -1;
> > > }
> > >
> > > Which seems fine and clear enough
> >
> > It is a problem if roce device returned -ENODEV.
>
> Can it happen? RCU I suppose, but I think this is an issue in
> rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
> shows the gid_attr is being concurrently destroyed
From RoCE point of view, it is a problem if device is destroyed or gid
not valid, the different returned values won't change much. For the IB,
we don't care.
>
> Jason
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH rdma-next 4/4] RDMA/uverbs: Expose the new GID query API to user space
2020-09-16 14:34 ` Leon Romanovsky
@ 2020-09-16 14:36 ` Jason Gunthorpe
0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2020-09-16 14:36 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Doug Ledford, Avihai Horon, linux-rdma
On Wed, Sep 16, 2020 at 05:34:25PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 16, 2020 at 11:12:02AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 03:44:29PM +0300, Leon Romanovsky wrote:
> > > On Wed, Sep 16, 2020 at 09:04:40AM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Sep 16, 2020 at 01:37:10PM +0300, Leon Romanovsky wrote:
> > > > > It depends on how you want to treat errors from rdma_read_gid_attr_ndev_rcu().
> > > > > Current check allows us to ensure that any error returned by this call is
> > > > > handled.
> > > > >
> > > > > Otherwise we will find ourselves with something like this:
> > > > > ndev = rdma_read_gid_attr_ndev_rcu(gid_attr);
> > > > > if (IS_ERR(ndev)) {
> > > > > if (rdma_protocol_roce())
> > > > > goto error;
> > > > > if (ERR_PTR(ndev) != -ENODEV)
> > > > > goto error;
> > > > > }
> > > >
> > > > Isn't it just
> > > >
> > > > if (IS_ERR(ndev)) {
> > > > if (ERR_PTR(ndev) != -ENODEV)
> > > > goto error;
> > > > index = -1;
> > > > }
> > > >
> > > > Which seems fine and clear enough
> > >
> > > It is a problem if roce device returned -ENODEV.
> >
> > Can it happen? RCU I suppose, but I think this is an issue in
> > rdma_read_gid_attr_ndev_rcu() - it should not return ENODEV if the RCU
> > shows the gid_attr is being concurrently destroyed
>
> From RoCE point of view, it is a problem if device is destroyed or gid
> not valid, the different returned values won't change much. For the IB,
> we don't care.
I'm saying I prefer things layered properly. Having random
rdma_protocol's all over the place has proven to be hard to maintain
One inside rdma_read_gid_attr_ndev_rcu() to make a NULL netdev an
error return for roce is much more understandable
Jason
^ permalink raw reply [flat|nested] 21+ messages in thread