All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC ABI V2 0/8] SG-based RDMA ABI Proposal
@ 2016-07-19 15:23 Matan Barak
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

The following patch set comes to enrich security model as a follow up
to commit e6bd18f57aad ('IB/security: Restrict use of the write() interface').

DISCLAIMER:
These patches are far from being completed. They present working init_ucontext
and query_device (both regular and extended version). In addition, they are
given as a basis of discussions.

COMMENTS GIVEN ON V1 AREN'T HANDLED IN THIS SERIES, BUT WILL BE HANDLED IN
THE NEXT ONE.

The ideas presented here are based on our V1 series in addition to some ideas
presented in OFVWG and Sean's series.

This patch series add ioctl() interface to the existing write() interface and
provide an easy route to backport this change to legacy supported systems.
Analyzing the current uverbs role in dispatching and parsing commands, we find
that:
(a) uverbs validates the basic properties of the command
(b) uverbs is responsible of doing all the IDR and uobject management and
    locking.
(c) uverbs transforms the user<-->kernel ABI to kernel API.

(a) and (b) are valid for every kABI. Although the nature of commands could
change, they still have to be validated and transform to kernel pointers.
In order to avoid duplications between the various drivers, we would like to
keep (a) and (b) as shared code.

In addition, this is a good time to expand the ABI to be more scalable, so we
added a few goals:
(1) Command's attributes shall be extensible in an easy one. Either by allowing
    drivers to have their own extensible set of attributes or core code
    extensible attributes. Moreover, driver's specific attributes could some
    day become core's standard attributes. We would like to still support
    old user-space while avoid duplicating the code in kernel.
(2) Each driver may have specific type system (i.e QP, CQ, ....). It may
    or may not even implement the standard type system. It could extend this
    type system in the future.
(3) Do not change or recompile driver libraries and don't copy their data.
(4) Efficient dispatching.

Thus, in order to allow this flexibility, we decide giving (a) and (b) as a
common infrastructure, but use per-driver guidelines in order to do that
parsing and uobject management. Handlers are also set by the drivers
themselves (though they can point to either shared common code) or
driver specific code.

Since types are no longer enforced by the common infrastructure, there is no
point of pre-allocating common IDR types in the common code. Instead, we
provide an API for driver to add new types. We use one IDR per driver
for all its types. The driver should add all its usable types before
any application runs. The order of which the driver adds its types (and the
common types it uses) dictates the process release order. After that,
all uboject, reference counts and types are handled automatically for the
driver by the infrastructure.

Scatter gather was chosen in order to allow us not to recompile user space
drivers. By using pointers to driver specific data, we could just use it
without introduce copying data and without changing the user-space driver at
all.

We chose to go with non blocking lock user objects. When exclusive
(WRITE or DESTROY) access is required, we dispatch the action if and only if
no other action needs this object as well. Otherwise, -EBUSY is returned to
the user-space. Device removal is synced with SRCU as of today.
If we were using locks, we would have need to sort the given user-space handles.
Otherwise, a user-space application may result in causing a deadlock.
Moving to a non blocking lock based behaviour, the dispatching in kernel
becomes more efficient.

Further uverbs related subsystem (such as RDMA-CM) may use other fds or use
other ioctl codes.

Regards,
Liran, Haggai, Leon and Matan

TODO:
1. Address Jason's comments

Changes from V1:
1. Refined locking system
	a. try_read_lock and write lock to sync exclusive access
	b. SRCU to sync device removal from commands execution
	c. Future rwsem to sync close context from commands execution
2. Added temporary udata usage for vendor's data
3. Add query_device and init_ucontext command with mlx5 implementation
4. Fixed bugs in ioctl dispatching
5. Change callbacks to get ib_uverbs_file instead of ucontext
6. Add general types initialization and cleanups

Haggai Eran (1):
  RDMA/core: Add support for custom types

Leon Romanovsky (2):
  RDMA/core: Export RDMA IOCTL declarations
  RDMA/core: Refactor IDR to be per-device

Matan Barak (5):
  RDMA/core: Introduce add/remove uobj from types
  RDMA/core: Add new ioctl interface
  RDMA/core: Add initialize and cleanup of common types
  RDMA/core: Add common code for querying device and init context
  RDMA/mlx5: Add mlx5 initial support of the new infrastructure

 drivers/infiniband/core/Makefile           |   3 +-
 drivers/infiniband/core/device.c           |  18 ++
 drivers/infiniband/core/rdma_core.c        | 378 ++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h        |  90 +++++++
 drivers/infiniband/core/user_mad.c         |   2 +-
 drivers/infiniband/core/uverbs.h           |  29 +-
 drivers/infiniband/core/uverbs_cmd.c       | 157 ++++++-----
 drivers/infiniband/core/uverbs_ioctl.c     | 279 ++++++++++++++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 410 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      | 125 +--------
 drivers/infiniband/hw/mlx5/Makefile        |   3 +-
 drivers/infiniband/hw/mlx5/main.c          |  12 +-
 drivers/infiniband/hw/mlx5/mlx5_user_cmd.c |  69 +++++
 drivers/infiniband/hw/mlx5/user.h          |   3 +
 include/rdma/ib_verbs.h                    |  27 +-
 include/rdma/rdma_ioctl.h                  |  38 +++
 include/rdma/uverbs_ioctl.h                | 234 ++++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            | 134 ++++++++++
 include/uapi/rdma/Kbuild                   |   1 +
 include/uapi/rdma/ib_user_mad.h            |  12 -
 include/uapi/rdma/rdma_user_ioctl.h        |  82 ++++++
 21 files changed, 1877 insertions(+), 229 deletions(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_user_cmd.c
 create mode 100644 include/rdma/rdma_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h
 create mode 100644 include/uapi/rdma/rdma_user_ioctl.h

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-19 15:23   ` Matan Barak
       [not found]     ` <1468941812-32286-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device Matan Barak
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Place all external RDMA IOCTL declarations into one UAPI exported
header file and move all legacy MAD commands to that file.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/user_mad.c  |  2 +-
 include/rdma/rdma_ioctl.h           | 38 ++++++++++++++++++++++++
 include/uapi/rdma/Kbuild            |  1 +
 include/uapi/rdma/ib_user_mad.h     | 12 --------
 include/uapi/rdma/rdma_user_ioctl.h | 59 +++++++++++++++++++++++++++++++++++++
 5 files changed, 99 insertions(+), 13 deletions(-)
 create mode 100644 include/rdma/rdma_ioctl.h
 create mode 100644 include/uapi/rdma/rdma_user_ioctl.h

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..70e974e 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -52,8 +52,8 @@
 
 #include <asm/uaccess.h>
 
+#include <rdma/rdma_ioctl.h>
 #include <rdma/ib_mad.h>
-#include <rdma/ib_user_mad.h>
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand userspace MAD packet access");
diff --git a/include/rdma/rdma_ioctl.h b/include/rdma/rdma_ioctl.h
new file mode 100644
index 0000000..f62a359
--- /dev/null
+++ b/include/rdma/rdma_ioctl.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_IOCTL_H
+#define RDMA_IOCTL_H
+
+#include <rdma/rdma_user_ioctl.h>
+
+#endif /* RDMA_IOCTL_H */
diff --git a/include/uapi/rdma/Kbuild b/include/uapi/rdma/Kbuild
index 231901b..537e40e 100644
--- a/include/uapi/rdma/Kbuild
+++ b/include/uapi/rdma/Kbuild
@@ -1,5 +1,6 @@
 # UAPI Header export list
 header-y += ib_user_cm.h
+header-y += rdma_user_ioctl.h
 header-y += ib_user_mad.h
 header-y += ib_user_sa.h
 header-y += ib_user_verbs.h
diff --git a/include/uapi/rdma/ib_user_mad.h b/include/uapi/rdma/ib_user_mad.h
index 09f809f..7e722e7 100644
--- a/include/uapi/rdma/ib_user_mad.h
+++ b/include/uapi/rdma/ib_user_mad.h
@@ -230,16 +230,4 @@ struct ib_user_mad_reg_req2 {
 	__u8	reserved[3];
 };
 
-#define IB_IOCTL_MAGIC		0x1b
-
-#define IB_USER_MAD_REGISTER_AGENT	_IOWR(IB_IOCTL_MAGIC, 1, \
-					      struct ib_user_mad_reg_req)
-
-#define IB_USER_MAD_UNREGISTER_AGENT	_IOW(IB_IOCTL_MAGIC, 2, __u32)
-
-#define IB_USER_MAD_ENABLE_PKEY		_IO(IB_IOCTL_MAGIC, 3)
-
-#define IB_USER_MAD_REGISTER_AGENT2     _IOWR(IB_IOCTL_MAGIC, 4, \
-					      struct ib_user_mad_reg_req2)
-
 #endif /* IB_USER_MAD_H */
diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
new file mode 100644
index 0000000..5c1117a
--- /dev/null
+++ b/include/uapi/rdma/rdma_user_ioctl.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_USER_IOCTL_H
+#define RDMA_USER_IOCTL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define RDMA_IOCTL_MAGIC		0x1b
+/*
+ * For hfi1 driver
+ */
+#define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
+
+/* Legacy part
+ * !!!! NOTE: It uses the same command index as VERBS
+ */
+#include <rdma/ib_user_mad.h>
+#define IB_USER_MAD_REGISTER_AGENT	_IOWR(RDMA_IOCTL_MAGIC, 1, \
+					      struct ib_user_mad_reg_req)
+
+#define IB_USER_MAD_UNREGISTER_AGENT	_IOW(RDMA_IOCTL_MAGIC, 2, __u32)
+
+#define IB_USER_MAD_ENABLE_PKEY		_IO(RDMA_IOCTL_MAGIC, 3)
+
+#define IB_USER_MAD_REGISTER_AGENT2     _IOWR(RDMA_IOCTL_MAGIC, 4, \
+					      struct ib_user_mad_reg_req2)
+
+#endif /* RDMA_USER_IOCTL_H */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
       [not found]     ` <1468941812-32286-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 3/8] RDMA/core: Add support for custom types Matan Barak
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The current code creates an IDR per type. Since types are currently
common for all vendors and known in advance, this was good enough.
However, the proposed ioctl based infrastructure allows each vendor
to declare only some of the common types and declare its own specific
types.

Thus, we decided to implement IDR to be per device and refactor it to
use a new file.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c      |  14 ++++
 drivers/infiniband/core/uverbs.h      |  14 +---
 drivers/infiniband/core/uverbs_cmd.c  | 149 ++++++++++++++++------------------
 drivers/infiniband/core/uverbs_main.c |  36 ++------
 include/rdma/ib_verbs.h               |   4 +
 5 files changed, 100 insertions(+), 117 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 5516fb0..b0135c2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -168,11 +168,23 @@ static int alloc_name(char *name)
 	return 0;
 }
 
+static void ib_device_allocate_idrs(struct ib_device *device)
+{
+	spin_lock_init(&device->idr_lock);
+	idr_init(&device->idr);
+}
+
+static void ib_device_destroy_idrs(struct ib_device *device)
+{
+	idr_destroy(&device->idr);
+}
+
 static void ib_device_release(struct device *device)
 {
 	struct ib_device *dev = container_of(device, struct ib_device, dev);
 
 	ib_cache_release_one(dev);
+	ib_device_destroy_idrs(dev);
 	kfree(dev->port_immutable);
 	kfree(dev);
 }
@@ -219,6 +231,8 @@ struct ib_device *ib_alloc_device(size_t size)
 	if (!device)
 		return NULL;
 
+	ib_device_allocate_idrs(device);
+
 	device->dev.class = &ib_class;
 	device_initialize(&device->dev);
 
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 612ccfd..60d7c3d 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -38,7 +38,6 @@
 #define UVERBS_H
 
 #include <linux/kref.h>
-#include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/cdev.h>
@@ -171,18 +170,7 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
-extern spinlock_t ib_uverbs_idr_lock;
-extern struct idr ib_uverbs_pd_idr;
-extern struct idr ib_uverbs_mr_idr;
-extern struct idr ib_uverbs_mw_idr;
-extern struct idr ib_uverbs_ah_idr;
-extern struct idr ib_uverbs_cq_idr;
-extern struct idr ib_uverbs_qp_idr;
-extern struct idr ib_uverbs_srq_idr;
-extern struct idr ib_uverbs_xrcd_idr;
-extern struct idr ib_uverbs_rule_idr;
-
-void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
+void idr_remove_uobj(struct ib_uobject *uobj);
 
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1a8babb..6af2c29 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -118,37 +118,36 @@ static void put_uobj_write(struct ib_uobject *uobj)
 	put_uobj(uobj);
 }
 
-static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
+int idr_add_uobj(struct ib_uobject *uobj)
 {
 	int ret;
 
 	idr_preload(GFP_KERNEL);
-	spin_lock(&ib_uverbs_idr_lock);
+	spin_lock(&uobj->context->device->idr_lock);
 
-	ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
 	if (ret >= 0)
 		uobj->id = ret;
 
-	spin_unlock(&ib_uverbs_idr_lock);
+	spin_unlock(&uobj->context->device->idr_lock);
 	idr_preload_end();
 
 	return ret < 0 ? ret : 0;
 }
 
-void idr_remove_uobj(struct idr *idr, struct ib_uobject *uobj)
+void idr_remove_uobj(struct ib_uobject *uobj)
 {
-	spin_lock(&ib_uverbs_idr_lock);
-	idr_remove(idr, uobj->id);
-	spin_unlock(&ib_uverbs_idr_lock);
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
 }
 
-static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
+static struct ib_uobject *__idr_get_uobj(int id, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
 	rcu_read_lock();
-	uobj = idr_find(idr, id);
+	uobj = idr_find(&context->device->idr, id);
 	if (uobj) {
 		if (uobj->context == context)
 			kref_get(&uobj->ref);
@@ -160,12 +159,12 @@ static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static struct ib_uobject *idr_read_uobj(struct idr *idr, int id,
-					struct ib_ucontext *context, int nested)
+static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
+					int nested)
 {
 	struct ib_uobject *uobj;
 
-	uobj = __idr_get_uobj(idr, id, context);
+	uobj = __idr_get_uobj(id, context);
 	if (!uobj)
 		return NULL;
 
@@ -181,12 +180,11 @@ static struct ib_uobject *idr_read_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static struct ib_uobject *idr_write_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
+struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
-	uobj = __idr_get_uobj(idr, id, context);
+	uobj = __idr_get_uobj(id, context);
 	if (!uobj)
 		return NULL;
 
@@ -199,18 +197,18 @@ static struct ib_uobject *idr_write_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static void *idr_read_obj(struct idr *idr, int id, struct ib_ucontext *context,
+static void *idr_read_obj(int id, struct ib_ucontext *context,
 			  int nested)
 {
 	struct ib_uobject *uobj;
 
-	uobj = idr_read_uobj(idr, id, context, nested);
+	uobj = idr_read_uobj(id, context, nested);
 	return uobj ? uobj->object : NULL;
 }
 
-static struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
+struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_pd_idr, pd_handle, context, 0);
+	return idr_read_obj(pd_handle, context, 0);
 }
 
 static void put_pd_read(struct ib_pd *pd)
@@ -220,7 +218,7 @@ static void put_pd_read(struct ib_pd *pd)
 
 static struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context, int nested)
 {
-	return idr_read_obj(&ib_uverbs_cq_idr, cq_handle, context, nested);
+	return idr_read_obj(cq_handle, context, nested);
 }
 
 static void put_cq_read(struct ib_cq *cq)
@@ -228,26 +226,26 @@ static void put_cq_read(struct ib_cq *cq)
 	put_uobj_read(cq->uobject);
 }
 
-static struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
+static void put_ah_read(struct ib_ah *ah)
 {
-	return idr_read_obj(&ib_uverbs_ah_idr, ah_handle, context, 0);
+	put_uobj_read(ah->uobject);
 }
 
-static void put_ah_read(struct ib_ah *ah)
+struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
 {
-	put_uobj_read(ah->uobject);
+	return idr_read_obj(ah_handle, context, 0);
 }
 
-static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
+struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_qp_idr, qp_handle, context, 0);
+	return idr_read_obj(qp_handle, context, 0);
 }
 
-static struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
+struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, qp_handle, context);
+	uobj = idr_write_uobj(qp_handle, context);
 	return uobj ? uobj->object : NULL;
 }
 
@@ -261,9 +259,9 @@ static void put_qp_write(struct ib_qp *qp)
 	put_uobj_write(qp->uobject);
 }
 
-static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
+struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_srq_idr, srq_handle, context, 0);
+	return idr_read_obj(srq_handle, context, 0);
 }
 
 static void put_srq_read(struct ib_srq *srq)
@@ -271,10 +269,10 @@ static void put_srq_read(struct ib_srq *srq)
 	put_uobj_read(srq->uobject);
 }
 
-static struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
-				     struct ib_uobject **uobj)
+struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
+			      struct ib_uobject **uobj)
 {
-	*uobj = idr_read_uobj(&ib_uverbs_xrcd_idr, xrcd_handle, context, 0);
+	*uobj = idr_read_uobj(xrcd_handle, context, 0);
 	return *uobj ? (*uobj)->object : NULL;
 }
 
@@ -282,7 +280,6 @@ static void put_xrcd_read(struct ib_uobject *uobj)
 {
 	put_uobj_read(uobj);
 }
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -550,7 +547,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	atomic_set(&pd->usecnt, 0);
 
 	uobj->object = pd;
-	ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_idr;
 
@@ -574,7 +571,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_idr:
 	ib_dealloc_pd(pd);
@@ -597,7 +594,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_pd_idr, cmd.pd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.pd_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	pd = uobj->object;
@@ -615,7 +612,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	uobj->live = 0;
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -791,7 +788,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 
 	atomic_set(&obj->refcnt, 0);
 	obj->uobject.object = xrcd;
-	ret = idr_add_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_idr;
 
@@ -835,7 +832,7 @@ err_copy:
 	}
 
 err_insert_xrcd:
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_idr:
 	ib_dealloc_xrcd(xrcd);
@@ -869,7 +866,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 		return -EFAULT;
 
 	mutex_lock(&file->device->xrcd_tree_mutex);
-	uobj = idr_write_uobj(&ib_uverbs_xrcd_idr, cmd.xrcd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.xrcd_handle, file->ucontext);
 	if (!uobj) {
 		ret = -EINVAL;
 		goto out;
@@ -902,7 +899,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 	if (inode && !live)
 		xrcd_table_delete(file->device, inode);
 
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+	idr_remove_uobj(uobj);
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
 	mutex_unlock(&file->mutex);
@@ -995,7 +992,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mr;
-	ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unreg;
 
@@ -1023,7 +1020,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unreg:
 	ib_dereg_mr(mr);
@@ -1068,8 +1065,7 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
 	     (cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK)))
 			return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 
 	if (!uobj)
 		return -EINVAL;
@@ -1138,7 +1134,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1153,7 +1149,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1213,7 +1209,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mw;
-	ret = idr_add_uobj(&ib_uverbs_mw_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unalloc;
 
@@ -1240,7 +1236,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unalloc:
 	uverbs_dealloc_mw(mw);
@@ -1266,7 +1262,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof(cmd)))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mw_idr, cmd.mw_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mw_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1281,7 +1277,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1395,7 +1391,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	atomic_set(&cq->usecnt, 0);
 
 	obj->uobject.object = cq;
-	ret = idr_add_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_free;
 
@@ -1421,7 +1417,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	return obj;
 
 err_cb:
-	idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_free:
 	ib_destroy_cq(cq);
@@ -1691,7 +1687,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_cq_idr, cmd.cq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.cq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	cq      = uobj->object;
@@ -1707,7 +1703,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1878,7 +1874,7 @@ static int create_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -1924,7 +1920,7 @@ static int create_qp(struct ib_uverbs_file *file,
 
 	return 0;
 err_cb:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2108,7 +2104,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -2137,7 +2133,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	return in_len;
 
 err_remove:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2377,7 +2373,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 
 	memset(&resp, 0, sizeof resp);
 
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, cmd.qp_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.qp_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	qp  = uobj->object;
@@ -2400,7 +2396,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 	if (obj->uxrcd)
 		atomic_dec(&obj->uxrcd->refcnt);
 
-	idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -2852,7 +2848,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	ah->uobject  = uobj;
 	uobj->object = ah;
 
-	ret = idr_add_uobj(&ib_uverbs_ah_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_destroy;
 
@@ -2877,7 +2873,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_destroy:
 	ib_destroy_ah(ah);
@@ -2902,7 +2898,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_ah_idr, cmd.ah_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.ah_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	ah = uobj->object;
@@ -2916,7 +2912,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3184,7 +3180,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 	flow_id->uobject = uobj;
 	uobj->object = flow_id;
 
-	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
+	err = idr_add_uobj(uobj);
 	if (err)
 		goto destroy_flow;
 
@@ -3209,7 +3205,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		kfree(kern_flow_attr);
 	return 0;
 err_copy:
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 destroy_flow:
 	ib_destroy_flow(flow_id);
 err_free:
@@ -3244,8 +3240,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 	if (cmd.comp_mask)
 		return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.flow_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	flow_id = uobj->object;
@@ -3256,7 +3251,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3344,7 +3339,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	atomic_set(&srq->usecnt, 0);
 
 	obj->uevent.uobject.object = srq;
-	ret = idr_add_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -3378,7 +3373,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	return 0;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_srq(srq);
@@ -3554,7 +3549,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_srq_idr, cmd.srq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.srq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	srq = uobj->object;
@@ -3575,7 +3570,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 		atomic_dec(&us->uxrcd->refcnt);
 	}
 
-	idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 31f422a..333ed70 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -66,17 +66,6 @@ enum {
 
 static struct class *uverbs_class;
 
-DEFINE_SPINLOCK(ib_uverbs_idr_lock);
-DEFINE_IDR(ib_uverbs_pd_idr);
-DEFINE_IDR(ib_uverbs_mr_idr);
-DEFINE_IDR(ib_uverbs_mw_idr);
-DEFINE_IDR(ib_uverbs_ah_idr);
-DEFINE_IDR(ib_uverbs_cq_idr);
-DEFINE_IDR(ib_uverbs_qp_idr);
-DEFINE_IDR(ib_uverbs_srq_idr);
-DEFINE_IDR(ib_uverbs_xrcd_idr);
-DEFINE_IDR(ib_uverbs_rule_idr);
-
 static DEFINE_SPINLOCK(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
@@ -227,7 +216,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
 		struct ib_ah *ah = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_ah(ah);
 		kfree(uobj);
 	}
@@ -236,7 +225,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mw_list, list) {
 		struct ib_mw *mw = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+		idr_remove_uobj(uobj);
 		uverbs_dealloc_mw(mw);
 		kfree(uobj);
 	}
@@ -244,7 +233,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
 		struct ib_flow *flow_id = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_flow(flow_id);
 		kfree(uobj);
 	}
@@ -254,7 +243,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uqp_object *uqp =
 			container_of(uobj, struct ib_uqp_object, uevent.uobject);
 
-		idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+		idr_remove_uobj(uobj);
 		if (qp != qp->real_qp) {
 			ib_close_qp(qp);
 		} else {
@@ -270,7 +259,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uevent_object *uevent =
 			container_of(uobj, struct ib_uevent_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_srq(srq);
 		ib_uverbs_release_uevent(file, uevent);
 		kfree(uevent);
@@ -282,7 +271,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_ucq_object *ucq =
 			container_of(uobj, struct ib_ucq_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_cq(cq);
 		ib_uverbs_release_ucq(file, ev_file, ucq);
 		kfree(ucq);
@@ -291,7 +280,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
 		struct ib_mr *mr = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dereg_mr(mr);
 		kfree(uobj);
 	}
@@ -302,7 +291,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uxrcd_object *uxrcd =
 			container_of(uobj, struct ib_uxrcd_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_uverbs_dealloc_xrcd(file->device, xrcd);
 		kfree(uxrcd);
 	}
@@ -311,7 +300,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->pd_list, list) {
 		struct ib_pd *pd = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dealloc_pd(pd);
 		kfree(uobj);
 	}
@@ -1326,13 +1315,6 @@ static void __exit ib_uverbs_cleanup(void)
 	unregister_chrdev_region(IB_UVERBS_BASE_DEV, IB_UVERBS_MAX_DEVICES);
 	if (overflow_maj)
 		unregister_chrdev_region(overflow_maj, IB_UVERBS_MAX_DEVICES);
-	idr_destroy(&ib_uverbs_pd_idr);
-	idr_destroy(&ib_uverbs_mr_idr);
-	idr_destroy(&ib_uverbs_mw_idr);
-	idr_destroy(&ib_uverbs_ah_idr);
-	idr_destroy(&ib_uverbs_cq_idr);
-	idr_destroy(&ib_uverbs_qp_idr);
-	idr_destroy(&ib_uverbs_srq_idr);
 }
 
 module_init(ib_uverbs_init);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 432bed5..14bfe3b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1706,6 +1706,10 @@ struct ib_device {
 
 	struct iw_cm_verbs	     *iwcm;
 
+	struct idr		idr;
+	/* Global lock in use to safely release device IDR */
+	spinlock_t		idr_lock;
+
 	/**
 	 * alloc_hw_stats - Allocate a struct rdma_hw_stats and fill in the
 	 *   driver initialized data.  The struct is kfree()'ed by the sysfs
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 3/8] RDMA/core: Add support for custom types
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
  2016-07-19 15:23   ` [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
  2016-07-19 15:23   ` [RFC ABI V2 4/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

From: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The new ioctl infrastructure supports vendor specific objects.
This is implemented by having a list of uverbs_uobject_type in the
ib_device. Each element of this list corresponds to a specific type
and specifies its free function, vendor's type_id, etc.
The order of elements dictates the order to object's destruction.
The whole type_list should be initialized before any ucontext was
created.

When a ucontext is created, a new list is created in this ib_ucontext.
This list corresponds to the ib_device's type list, as any element
in the ib_dev's type list has a corresponding element in this list.
Each element in the ucontext's list points to its respective
corresponding element in the ib_dev's type list.
In addition, it has a data structure (currently implemented by a list,
but should probably move to using a hash) that maps all ib_uobjects
of the same ib_ucontext and the respective type.

+-------------------------------------------------------------------+
|    ib_device                                                      |
|   +--------------+      +--------------+      +----------------+  |
|   |uobject_type  |      |              |      |                |  |
|   |free_fn       |      |              |      |                |  |
|   |              +----->+              +----->+                |  |
|   |              |      |              |      |                |  |
|   +----^---------+      +--------------+      +----------------+  |
| +------|                                                          |
+-------------------------------------------------------------------+
  |
  |
  |
+-------------------------------------------------------------------+
| |  ib_ucontext                                                    |
| | +--------------+      +--------------+      +----------------+  |
| | |uobject_list  |      |              |      |                |  |
| +-+type          |      |              |      |                |  |
|   |list+         +----->+              +----->+                |  |
|   |    |         |      |              |      |                |  |
|   +--------------+      +--------------+      +----------------+  |
|        |                                                          |
+-------------------------------------------------------------------+
         |
         |
         |
         |
         |  +-----------+        +------------+
         |  | ib_uobject|        |ib_uobject  |
         +-->           +------> |            |
            |           |        |            |
            +-----------+        +------------+

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile    |   3 +-
 drivers/infiniband/core/device.c    |   1 +
 drivers/infiniband/core/rdma_core.c | 102 ++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h |  69 ++++++++++++++++++++++++
 include/rdma/ib_verbs.h             |   5 ++
 5 files changed, 179 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..1819623 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -28,4 +28,5 @@ ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+				rdma_core.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index b0135c2..5f09c40 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
 	spin_lock_init(&device->client_data_lock);
 	INIT_LIST_HEAD(&device->client_data_list);
 	INIT_LIST_HEAD(&device->port_list);
+	INIT_LIST_HEAD(&device->type_list);
 
 	return device;
 }
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
new file mode 100644
index 0000000..672ce82
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_verbs.h>
+#include "uverbs.h"
+#include "rdma_core.h"
+
+/*
+ * lockless - the list shouldn't change. If disassociate is carrie out during
+ * this, we'll wait until all current executing commands are finished.
+ */
+struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
+					    uint16_t type)
+{
+	struct uverbs_uobject_type *uobj_type;
+
+	list_for_each_entry(uobj_type, &ibdev->type_list, type_list) {
+		if (uobj_type->obj_type == type)
+			return uobj_type;
+	}
+
+	return NULL;
+}
+
+int ib_uverbs_uobject_type_add(struct list_head	*head,
+			       void (*free)(struct uverbs_uobject_type *uobject_type,
+					    struct ib_uobject *uobject,
+					    struct ib_ucontext *ucontext),
+			       uint16_t	obj_type)
+{
+	/*
+	 * Allocate a new object type for the vendor, this should be done when a
+	 * vendor is initialized.
+	 */
+	struct uverbs_uobject_type *uobject_type;
+
+	uobject_type = kzalloc(sizeof(*uobject_type), GFP_KERNEL);
+	if (!uobject_type)
+		return -ENOMEM;
+
+	uobject_type->free = free;
+	uobject_type->obj_type = obj_type;
+	list_add_tail(&uobject_type->type_list, head);
+	return 0;
+}
+EXPORT_SYMBOL(ib_uverbs_uobject_type_add);
+
+/* Should only be called when device is destroyed (remove_one?) */
+static void ib_uverbs_uobject_type_remove(struct uverbs_uobject_type *uobject_type)
+{
+	/*
+	 * Allocate a new object type for the vendor, this should be done when a
+	 * vendor is initialized.
+	 */
+	WARN_ON(list_empty(&uobject_type->type_list));
+	list_del(&uobject_type->type_list);
+	kfree(uobject_type);
+}
+EXPORT_SYMBOL(ib_uverbs_uobject_type_remove);
+
+/*
+ * Done when device is destroyed. No one should touch the list or use its
+ * elements here.
+ */
+void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev)
+{
+	struct uverbs_uobject_type *iter, *temp;
+
+	list_for_each_entry_safe(iter, temp, &ib_dev->type_list, type_list)
+		ib_uverbs_uobject_type_remove(iter);
+}
+EXPORT_SYMBOL(ib_uverbs_uobject_types_remove);
+
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
new file mode 100644
index 0000000..c734a76
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef UOBJECT_H
+#define UOBJECT_H
+
+#include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
+
+struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
+					    uint16_t type);
+int ib_uverbs_uobject_type_add(struct list_head	*head,
+			       void (*free)(struct uverbs_uobject_type *uobject_type,
+					    struct ib_uobject *uobject,
+					    struct ib_ucontext *ucontext),
+			       uint16_t	obj_type);
+
+struct uverbs_uobject_type {
+	struct list_head	type_list;
+	void (*free)(struct uverbs_uobject_type *uobject_type,
+		     struct ib_uobject *uobject,
+		     struct ib_ucontext *ucontext);
+	u16			obj_type;
+	size_t			obj_size;
+};
+
+/* embed in ucontext per type */
+struct uverbs_uobject_list {
+	struct uverbs_uobject_type	*type;
+	/* lock of the uobject data type */
+	struct mutex			uobj_lock;
+	struct list_head		list;
+	struct list_head		type_list;
+};
+
+#endif /* UIDR_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 14bfe3b..6d7964f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1325,6 +1325,8 @@ struct ib_ucontext {
 	struct list_head	rule_list;
 	int			closing;
 
+	struct list_head	uobjects_lists;
+
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	struct rb_root      umem_tree;
@@ -1960,6 +1962,9 @@ struct ib_device {
 	 * in fast paths.
 	 */
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
+	struct list_head type_list;
+
+	const struct uverbs_types	*types;
 };
 
 struct ib_client {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 4/8] RDMA/core: Introduce add/remove uobj from types
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-07-19 15:23   ` [RFC ABI V2 3/8] RDMA/core: Add support for custom types Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
  2016-07-19 15:23   ` [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface Matan Barak
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

We introduce here adding new user objects to the type list.
Adding an object is done in two parts.
First, an object is allocated and added to IDR. Then, the command's
handlers (in downstream patches) could work on this object and fill
in its required details.
After a successful command, ib_uverbs_uobject_enable is called and
this user objects becomes ucontext visible.

Removing an uboject is done by calling ib_uverbs_uobject_remove.

We should make sure IDR (per-device) and lists (per-ucontext) could
be accessed concurrently without corrupting them.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/rdma_core.c | 276 ++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h |  21 +++
 include/rdma/ib_verbs.h             |  16 ++-
 include/rdma/uverbs_ioctl.h         | 155 ++++++++++++++++++++
 4 files changed, 466 insertions(+), 2 deletions(-)
 create mode 100644 include/rdma/uverbs_ioctl.h

diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 672ce82..d833874 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -33,6 +33,7 @@
 #include <rdma/ib_verbs.h>
 #include "uverbs.h"
 #include "rdma_core.h"
+#include <rdma/uverbs_ioctl.h>
 
 /*
  * lockless - the list shouldn't change. If disassociate is carrie out during
@@ -51,6 +52,189 @@ struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
 	return NULL;
 }
 
+static int uverbs_lock_object(struct ib_uobject *uobj,
+			      enum uverbs_idr_access access)
+{
+	if (access == UVERBS_IDR_ACCESS_READ)
+		return __atomic_add_unless(&uobj->usecnt, 1, -1) == -1 ?
+			-EBUSY : 0;
+	else
+		/* lock is either WRITE or DESTROY - should be exclusive */
+		return atomic_cmpxchg(&uobj->usecnt, 0, -1) == 0 ? 0 : -EBUSY;
+}
+
+static struct ib_uobject *get_uobj(int id, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	rcu_read_lock();
+	uobj = idr_find(&context->device->idr, id);
+	if (uobj && uobj->live) {
+		if (uobj->context != context)
+			uobj = NULL;
+	}
+	rcu_read_unlock();
+
+	return uobj;
+}
+
+static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
+		      struct ib_ucontext *context)
+{
+	uobj->user_handle = user_handle;
+	uobj->context     = context;
+	uobj->live        = 0;
+}
+
+static int add_uobj(struct ib_uobject *uobj)
+{
+	int ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&uobj->context->device->idr_lock);
+
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
+	if (ret >= 0)
+		uobj->id = ret;
+
+	spin_unlock(&uobj->context->device->idr_lock);
+	idr_preload_end();
+
+	return ret < 0 ? ret : 0;
+}
+
+static void remove_uobj(struct ib_uobject *uobj)
+{
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
+}
+
+static void put_uobj(struct ib_uobject *uobj)
+{
+	kfree_rcu(uobj, rcu);
+}
+
+static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
+						   const struct uverbs_uobject_type *type,
+						   u32 idr,
+						   enum uverbs_idr_access access)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	rcu_read_lock();
+	uobj = get_uobj(idr, ucontext);
+	if (!uobj)
+		goto free;
+
+	if (uobj->type->type != type) {
+		uobj = NULL;
+		goto free;
+	}
+
+	ret = uverbs_lock_object(uobj, access);
+	if (ret)
+		uobj = ERR_PTR(ret);
+free:
+	rcu_read_unlock();
+	return uobj;
+
+	return NULL;
+}
+
+struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
+					    struct ib_ucontext *ucontext,
+					    enum uverbs_idr_access access,
+					    uint32_t idr)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	if (access == UVERBS_IDR_ACCESS_NEW) {
+		uobj = kmalloc(type->obj_size, GFP_KERNEL);
+		if (!uobj)
+			return ERR_PTR(-ENOMEM);
+
+		init_uobj(uobj, 0, ucontext);
+
+		/* lock idr */
+		ret = ib_uverbs_uobject_add(uobj, type);
+		if (ret) {
+			kfree(uobj);
+			return ERR_PTR(ret);
+		}
+
+	} else {
+		uobj = get_uobject_from_context(ucontext, type, idr,
+						access);
+
+		if (!uobj)
+			return ERR_PTR(-ENOENT);
+	}
+
+	return uobj;
+}
+
+static void uverbs_unlock_object(struct ib_uobject *uobj,
+				 enum uverbs_idr_access access,
+				 bool success)
+{
+	switch (access) {
+	case UVERBS_IDR_ACCESS_READ:
+		atomic_dec(&uobj->usecnt);
+		break;
+	case UVERBS_IDR_ACCESS_NEW:
+		if (success) {
+			atomic_set(&uobj->usecnt, 0);
+			ib_uverbs_uobject_enable(uobj);
+		} else {
+			remove_uobj(uobj);
+			put_uobj(uobj);
+		}
+		break;
+	case UVERBS_IDR_ACCESS_WRITE:
+		atomic_set(&uobj->usecnt, 0);
+		break;
+	case UVERBS_IDR_ACCESS_DESTROY:
+		if (success)
+			ib_uverbs_uobject_remove(uobj);
+		else
+			atomic_set(&uobj->usecnt, 0);
+		break;
+	}
+}
+
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct action_spec *chain,
+			   bool success)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		struct uverbs_attr_array *attr_spec_array = &attr_array[i];
+		const struct uverbs_attr_chain_spec *chain_spec =
+			chain->validator_chains[i];
+		unsigned int j;
+
+		for (j = 0; j < attr_spec_array->num_attrs; j++) {
+			struct uverbs_attr *attr = &attr_spec_array->attrs[j];
+			struct uverbs_attr_spec *spec = &chain_spec->attrs[j];
+
+			if (spec->type != UVERBS_ATTR_TYPE_IDR || !attr->valid)
+				continue;
+
+			/*
+			 * refcounts should be handled at the object level and
+			 * not at the uobject level.
+			 */
+			uverbs_unlock_object(attr->obj_attr.uobject,
+					     spec->idr.access, success);
+		}
+	}
+}
+
 int ib_uverbs_uobject_type_add(struct list_head	*head,
 			       void (*free)(struct uverbs_uobject_type *uobject_type,
 					    struct ib_uobject *uobject,
@@ -100,3 +284,95 @@ void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev)
 }
 EXPORT_SYMBOL(ib_uverbs_uobject_types_remove);
 
+void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext *ucontext)
+{
+	struct uverbs_uobject_list *uobject_list, *next_list;
+
+	list_for_each_entry_safe(uobject_list, next_list,
+				 &ucontext->uobjects_lists, type_list) {
+		struct ib_uobject *obj, *next_obj;
+
+		/*
+		 * No need to take lock here, as cleanup should be called
+		 * after all commands finished executing. Newly executed
+		 * commands should fail.
+		 */
+		list_for_each_entry_safe(obj, next_obj, &uobject_list->list,
+					 list)
+			ib_uverbs_uobject_remove(obj);
+
+		list_del(&uobject_list->type_list);
+	}
+}
+
+int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext *ucontext,
+					       struct list_head *type_list)
+{
+	/* create typed list in ucontext */
+	struct uverbs_uobject_type *type;
+	int err;
+
+	INIT_LIST_HEAD(&ucontext->uobjects_lists);
+
+	list_for_each_entry(type, type_list, type_list) {
+		struct uverbs_uobject_list *cur;
+
+		cur = kzalloc(sizeof(*cur), GFP_KERNEL);
+		if (!cur) {
+			err = -ENOMEM;
+			goto err;
+		}
+
+		cur->type = type;
+		INIT_LIST_HEAD(&cur->list);
+		list_add_tail(&cur->type_list, &ucontext->uobjects_lists);
+		mutex_init(&cur->uobj_lock);
+	}
+
+	return 0;
+
+err:
+	ib_uverbs_uobject_type_cleanup_ucontext(ucontext);
+	return err;
+}
+
+int ib_uverbs_uobject_add(struct ib_uobject *uobject,
+			  struct uverbs_uobject_type *uobject_type)
+{
+	int ret = -EINVAL;
+	struct uverbs_uobject_list *type;
+
+	/* No need for locking here is type list shouldn't be changed */
+	list_for_each_entry(type, &uobject->context->uobjects_lists, type_list)
+		if (type->type == uobject_type) {
+			uobject->type = type;
+			ret = add_uobj(uobject);
+			return ret;
+		}
+
+	return ret;
+}
+
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
+{
+	mutex_lock(&uobject->type->uobj_lock);
+	list_add(&uobject->list, &uobject->type->list);
+	mutex_unlock(&uobject->type->uobj_lock);
+	uobject->live = 1;
+}
+
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject)
+{
+	/*
+	 * Calling remove requires exclusive access, so it's not possible
+	 * another thread will use our object.
+	 */
+	uobject->live = 0;
+	uobject->type->type->free(uobject->type->type, uobject,
+				  uobject->context);
+	mutex_lock(&uobject->type->uobj_lock);
+	list_del(&uobject->list);
+	mutex_unlock(&uobject->type->uobj_lock);
+	remove_uobj(uobject);
+	put_uobj(uobject);
+}
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index c734a76..3bf6667 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -37,16 +37,32 @@
 #ifndef UOBJECT_H
 #define UOBJECT_H
 
+#include <linux/idr.h>
+#include <rdma/uverbs_ioctl.h>
 #include <rdma/ib_verbs.h>
 #include <linux/mutex.h>
 
 struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
 					    uint16_t type);
+struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
+					    struct ib_ucontext *ucontext,
+					    enum uverbs_idr_access access,
+					    uint32_t idr);
 int ib_uverbs_uobject_type_add(struct list_head	*head,
 			       void (*free)(struct uverbs_uobject_type *uobject_type,
 					    struct ib_uobject *uobject,
 					    struct ib_ucontext *ucontext),
 			       uint16_t	obj_type);
+void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct action_spec *chain,
+			   bool success);
+
+int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext *ucontext,
+					       struct list_head *type_list);
 
 struct uverbs_uobject_type {
 	struct list_head	type_list;
@@ -66,4 +82,9 @@ struct uverbs_uobject_list {
 	struct list_head		type_list;
 };
 
+int ib_uverbs_uobject_add(struct ib_uobject *uobject,
+			  struct uverbs_uobject_type *uobject_type);
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
+
 #endif /* UIDR_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6d7964f..85d4c41 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1346,16 +1346,28 @@ struct ib_ucontext {
 #endif
 };
 
+struct uverbs_object_list;
+
+#define OLD_ABI_COMPAT
+
 struct ib_uobject {
 	u64			user_handle;	/* handle given to us by userspace */
 	struct ib_ucontext     *context;	/* associated user context */
 	void		       *object;		/* containing object */
 	struct list_head	list;		/* link to context's list */
 	int			id;		/* index into kernel idr */
-	struct kref		ref;
-	struct rw_semaphore	mutex;		/* protects .live */
+#ifdef OLD_ABI_COMPAT
+	struct kref             ref;
+#endif
+	atomic_t		usecnt;
+#ifdef OLD_ABI_COMPAT
+	struct rw_semaphore     mutex;          /* protects .live */
+#endif
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
+	/* List of object under uverbs_object_type */
+	struct list_head	idr_list;
+	struct uverbs_uobject_list *type;	/* ptr to ucontext type */
 };
 
 struct ib_udata {
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
new file mode 100644
index 0000000..7733b11
--- /dev/null
+++ b/include/rdma/uverbs_ioctl.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_
+#define _UVERBS_IOCTL_
+
+#include <linux/kernel.h>
+
+struct uverbs_object_type;
+struct ib_ucontext;
+struct ib_device;
+
+/*
+ * =======================================
+ *	Verbs action specifications
+ * =======================================
+ */
+
+enum uverbs_attr_type {
+	UVERBS_ATTR_TYPE_PTR_IN,
+	UVERBS_ATTR_TYPE_PTR_OUT,
+	UVERBS_ATTR_TYPE_IDR,
+	/*
+	 * TODO: we could add FD type for command which will migrate the events
+	 * to a specific FD.
+	 */
+};
+
+enum uverbs_idr_access {
+	UVERBS_IDR_ACCESS_READ,
+	UVERBS_IDR_ACCESS_WRITE,
+	UVERBS_IDR_ACCESS_NEW,
+	UVERBS_IDR_ACCESS_DESTROY
+};
+
+struct uverbs_attr_spec {
+	u16				len;
+	enum uverbs_attr_type		type;
+	struct {
+		u16			new_size;
+		u16			idr_type;
+		u8			access;
+	} idr;
+	/* TODO: In case of FD, we could validate here the fops pointer */
+};
+
+struct uverbs_attr_chain_spec {
+	struct uverbs_attr_spec		*attrs;
+	size_t				num_attrs;
+};
+
+struct action_spec {
+	const struct uverbs_attr_chain_spec		**validator_chains;
+	/* if > 0 -> validator, otherwise, error */
+	int (*dist)(__u16 *attr_id, void *priv);
+	void						*priv;
+	size_t						num_chains;
+};
+
+struct uverbs_attr_array;
+struct ib_uverbs_file;
+
+struct uverbs_action {
+	struct action_spec chain;
+	void *priv;
+	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
+		       struct uverbs_attr_array *ctx, size_t num, void *priv);
+};
+
+struct uverbs_type_actions {
+	size_t				num_actions;
+	const struct uverbs_action	*actions;
+};
+
+struct uverbs_types {
+	size_t					num_types;
+	const struct uverbs_type_actions	**types;
+};
+
+/* =================================================
+ *              Parsing infrastructure
+ * =================================================
+ */
+
+struct uverbs_ptr_attr {
+	void	* __user ptr;
+	__u16		len;
+};
+
+struct uverbs_obj_attr {
+	/*  idr handle */
+	__u32	idr;
+	/* pointer to the kernel descriptor -> type, access, etc */
+	const struct uverbs_attr_spec *val;
+	struct ib_uobject		*uobject;
+	struct uverbs_uobject_type	*type;
+};
+
+struct uverbs_attr {
+	bool valid;
+	union {
+		struct uverbs_ptr_attr	cmd_attr;
+		struct uverbs_obj_attr	obj_attr;
+	};
+};
+
+/* output of one validator */
+struct uverbs_attr_array {
+	size_t num_attrs;
+	/* arrays of attrubytes, index is the id i.e SEND_CQ */
+	struct uverbs_attr *attrs;
+};
+
+/* =================================================
+ *              Types infrastructure
+ * =================================================
+ */
+
+int ib_uverbs_uobject_type_add(struct list_head	*head,
+			       void (*free)(struct uverbs_uobject_type *uobject_type,
+					    struct ib_uobject *uobject,
+					    struct ib_ucontext *ucontext),
+			       uint16_t	obj_type);
+void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
+
+#endif
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-07-19 15:23   ` [RFC ABI V2 4/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
       [not found]     ` <1468941812-32286-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types Matan Barak
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

In this proposed ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported types are declared.
For each type we list all supported actions. Each action has a
specifies a handler, which could be either be a standard command or
a vendor specific command.
Along with the handler, a chain is specified as well. This chain
lists all supported attributes and is used for automatically
fetching and validating the command, response and its related objects.

Each chain consists of a validator_chains and a distribution function
which maps different command attributes to the different validators.
It also maps the attribute from the kABI world (which is vendor
specific) to the common kernel language.
Currently, in order to ease transition, we have a standard
distribution function which maps the attributes using the upper
attribute type bit.

A chain is an array of attributes. Each attribute has a type (PTR_IN,
PTR_OUT, IDR and in the future maybe FD, which could be used for
completion channel) and a length. Future work here might add
validation of mandatory attributes (i.e - make sure a specific
attribute was given).

If an IDR attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease). Automatically
reference count isn't coded yet, but the general concept here is to add
(INCREASE_REF, DECREASE_REF) to the access specification field and
update these reference in success automatically.

 types
+--------+
|        |                                                                          +--------+
|        |   actions    action      action_spec                           +-----+   |len     |
+--------+  +------+   +-------+   +----------------+   +------------+    |attr1+-> |type    |
| type   +> |action+-> |chain  +-> +validator_chains+-> |common_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |distribution_fn |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |vendor chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *ctx, size_t num, void *priv);

Where ctx is an array of uverbs_attr_array. Each element in this array
is an array of attributes which corresponds to one validator.
For example, in the usually used case:

 ctx                               core
+----------------------------+     +------------+
| core: uverbs_attr_array    +---> | valid      |
+----------------------------+     | cmd_attr   |
| vendor: uverbs_attr_array  |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  vendor
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+

Ctx array's indices corresponds to the validators order. The indices
of core and vendor corresponds to the attributes name spaces of each
validator. Thus, we could think of the following as one object:
1. Set of attribute specification (with their attribute IDs)
2. Validator which owns (1) specifications
3. A function which could handle this attributes which the handler
   could call

That means that core and vendor are the validated function (3)
parameters and their types exist in this function namespace.

Since the requent used case, we propose a wrapper that
allows a definition of the following handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *common,
               struct uverbs_attr_array *vendor,
               void *priv);

Upon success, reference count of uobjects and use count will be a
updated automatically according to the specification.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile           |   2 +-
 drivers/infiniband/core/device.c           |   3 +
 drivers/infiniband/core/uverbs.h           |   4 +
 drivers/infiniband/core/uverbs_cmd.c       |   1 +
 drivers/infiniband/core/uverbs_ioctl.c     | 279 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c |  78 ++++++++
 drivers/infiniband/core/uverbs_main.c      |   6 +
 include/rdma/ib_verbs.h                    |   2 +
 include/rdma/uverbs_ioctl_cmd.h            |  68 +++++++
 include/uapi/rdma/rdma_user_ioctl.h        |  23 +++
 10 files changed, 465 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 1819623..769a299 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -29,4 +29,4 @@ ib_umad-y :=			user_mad.o
 ib_ucm-y :=			ucm.o
 
 ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
-				rdma_core.o
+				rdma_core.o uverbs_ioctl.o uverbs_ioctl_cmd.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 5f09c40..b2e2ee2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -245,6 +245,9 @@ struct ib_device *ib_alloc_device(size_t size)
 	INIT_LIST_HEAD(&device->port_list);
 	INIT_LIST_HEAD(&device->type_list);
 
+	/* TODO: don't forget to initialize device->driver_id, so verbs handshake between
+	 * user space<->kernel space will work for other values than driver_id == 0.
+	 */
 	return device;
 }
 EXPORT_SYMBOL(ib_alloc_device);
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 60d7c3d..86be861 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -41,6 +41,7 @@
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/cdev.h>
+#include <linux/rwsem.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -83,6 +84,8 @@
  * released when the CQ is destroyed.
  */
 
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+
 struct ib_uverbs_device {
 	atomic_t				refcount;
 	int					num_comp_vectors;
@@ -121,6 +124,7 @@ struct ib_uverbs_file {
 	struct ib_uverbs_event_file	       *async_file;
 	struct list_head			list;
 	int					is_closed;
+	struct rw_semaphore			close_sem;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 6af2c29..9648b80 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -361,6 +361,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	}
 
 	file->ucontext = ucontext;
+	ucontext->ufile = file;
 
 	fd_install(resp.async_fd, filp);
 
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
new file mode 100644
index 0000000..57ff7cb
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -0,0 +1,279 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/rdma_ioctl.h>
+#include <rdma/uverbs_ioctl.h>
+#include "rdma_core.h"
+#include "uverbs.h"
+
+static int uverbs_validate_attr(struct ib_device *ibdev,
+				struct ib_ucontext *ucontext,
+				const struct ib_uverbs_attr *uattr,
+				u16 attr_id,
+				const struct uverbs_attr_chain_spec *chain_spec,
+				struct uverbs_attr *elements)
+{
+	const struct uverbs_attr_spec *spec;
+	struct uverbs_attr *e;
+
+	if (uattr->reserved)
+		return -EINVAL;
+
+	if (attr_id >= chain_spec->num_attrs)
+		return -EINVAL;
+
+	spec = &chain_spec->attrs[attr_id];
+	e = &elements[attr_id];
+
+	if (e->valid)
+		return -EINVAL;
+
+	switch (spec->type) {
+	case UVERBS_ATTR_TYPE_PTR_IN:
+	case UVERBS_ATTR_TYPE_PTR_OUT:
+		/* If spec->len is zero, don't validate (flexible) */
+		if (spec->len && uattr->len != spec->len)
+			return -EINVAL;
+		e->cmd_attr.ptr = (void * __user)uattr->ptr_idr;
+		e->cmd_attr.len = uattr->len;
+		break;
+
+	case UVERBS_ATTR_TYPE_IDR:
+		if (uattr->len != 0 || (uattr->ptr_idr >> 32) || (!ucontext))
+			return -EINVAL;
+
+		e->obj_attr.idr = (uint32_t)uattr->ptr_idr;
+		e->obj_attr.val = spec;
+		e->obj_attr.type = uverbs_get_type(ibdev,
+						   spec->idr.idr_type);
+		if (!e->obj_attr.type)
+			return -EINVAL;
+
+		e->obj_attr.uobject = uverbs_get_type_from_idr(e->obj_attr.type,
+							       ucontext,
+							       spec->idr.access,
+							       e->obj_attr.idr);
+		if (!e->obj_attr.uobject)
+			return -EINVAL;
+
+		break;
+	};
+
+	e->valid = 1;
+	return 0;
+}
+
+static int uverbs_validate(struct ib_device *ibdev,
+			   struct ib_ucontext *ucontext,
+			   const struct ib_uverbs_attr *uattrs,
+			   size_t num_attrs,
+			   const struct action_spec *action_spec,
+			   struct uverbs_attr_array *attr_array)
+{
+	size_t i;
+	int ret;
+	int n_val = -1;
+
+	for (i = 0; i < num_attrs; i++) {
+		const struct ib_uverbs_attr *uattr = &uattrs[i];
+		__u16 attr_id = uattr->attr_id;
+		const struct uverbs_attr_chain_spec *chain_spec;
+
+		ret = action_spec->dist(&attr_id, action_spec->priv);
+		if (ret < 0)
+			return ret;
+
+		if (ret > n_val)
+			n_val = ret;
+
+		chain_spec = action_spec->validator_chains[ret];
+		ret = uverbs_validate_attr(ibdev, ucontext, uattr, attr_id,
+					   chain_spec, attr_array[ret].attrs);
+		if (ret)
+			return ret;
+
+	}
+
+	return n_val >= 0 ? n_val + 1 : n_val;
+}
+
+static int uverbs_handle_action(const struct ib_uverbs_attr *uattrs,
+				size_t num_attrs,
+				struct ib_device *ibdev,
+				struct ib_uverbs_file *ufile,
+				const struct uverbs_action *handler,
+				struct uverbs_attr_array *attr_array)
+{
+	int ret;
+	int n_val;
+
+	n_val = uverbs_validate(ibdev, ufile->ucontext, uattrs, num_attrs,
+				&handler->chain, attr_array);
+	if (n_val <= 0)
+		return n_val;
+
+	ret = handler->handler(ibdev, ufile, attr_array, n_val,
+			       handler->priv);
+	uverbs_unlock_objects(attr_array, n_val, &handler->chain, !ret);
+
+	return ret;
+}
+
+static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
+				struct ib_uverbs_file *file,
+				struct ib_uverbs_ioctl_hdr *hdr,
+				void __user *buf)
+{
+	const struct uverbs_type_actions *type;
+	const struct uverbs_action *action;
+	const struct action_spec *action_spec;
+	long err = 0;
+	unsigned int num_specs = 0;
+	unsigned int i;
+	struct {
+		struct ib_uverbs_attr		*uattrs;
+		struct uverbs_attr_array	*uverbs_attr_array;
+	} *ctx = NULL;
+	struct uverbs_attr *curr_attr;
+	size_t ctx_size;
+
+	if (!ib_dev)
+		return -EIO;
+
+	if (ib_dev->types->num_types < hdr->object_type ||
+	    ib_dev->driver_id != hdr->driver_id)
+		return -EINVAL;
+
+	type = ib_dev->types->types[hdr->object_type];
+	if (!type || type->num_actions < hdr->action)
+		return -EOPNOTSUPP;
+
+	action = &type->actions[hdr->action];
+	if (!action)
+		return -EOPNOTSUPP;
+
+	action_spec = &action->chain;
+	for (i = 0; i < action_spec->num_chains;
+	     num_specs += action_spec->validator_chains[i]->num_attrs, i++)
+		;
+
+	ctx_size = sizeof(*ctx->uattrs) * hdr->num_attrs +
+		   sizeof(*ctx->uverbs_attr_array->attrs) * num_specs +
+		   sizeof(struct uverbs_attr_array) * action_spec->num_chains +
+		   sizeof(*ctx);
+
+	ctx = kzalloc(ctx_size, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->uverbs_attr_array = (void *)ctx + sizeof(*ctx);
+	ctx->uattrs = (void *)(ctx->uverbs_attr_array +
+			       action_spec->num_chains);
+	curr_attr = (void *)(ctx->uattrs + hdr->num_attrs);
+	for (i = 0; i < action_spec->num_chains; i++) {
+		ctx->uverbs_attr_array[i].attrs = curr_attr;
+		ctx->uverbs_attr_array[i].num_attrs =
+			action_spec->validator_chains[i]->num_attrs;
+		curr_attr += action_spec->validator_chains[i]->num_attrs;
+	}
+
+	err = copy_from_user(ctx->uattrs, buf,
+			     sizeof(*ctx->uattrs) * hdr->num_attrs);
+	if (err) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	err = uverbs_handle_action(ctx->uattrs, hdr->num_attrs, ib_dev,
+				   file, action, ctx->uverbs_attr_array);
+out:
+	kfree(ctx);
+	return err;
+}
+
+#define IB_UVERBS_MAX_CMD_SZ 4096
+
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct ib_uverbs_file *file = filp->private_data;
+	struct ib_uverbs_ioctl_hdr __user *user_hdr =
+		(struct ib_uverbs_ioctl_hdr __user *)arg;
+	struct ib_uverbs_ioctl_hdr hdr;
+	struct ib_device *ib_dev;
+	int srcu_key;
+	long err;
+
+	srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
+	ib_dev = srcu_dereference(file->device->ib_dev,
+				  &file->device->disassociate_srcu);
+	if (!ib_dev) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (cmd == RDMA_DIRECT_IOCTL) {
+		/* TODO? */
+		err = -ENOSYS;
+		goto out;
+	} else {
+		if (cmd != RDMA_VERBS_IOCTL) {
+			err = -ENOIOCTLCMD;
+			goto out;
+		}
+
+		err = copy_from_user(&hdr, user_hdr, sizeof(hdr));
+
+		if (err || hdr.length > IB_UVERBS_MAX_CMD_SZ ||
+		    hdr.length <= sizeof(hdr) ||
+		    hdr.length != sizeof(hdr) + hdr.num_attrs * sizeof(struct ib_uverbs_attr)) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* currently there are no flags supported */
+		if (hdr.flags) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
+		/* We're closing, fail all commands */
+		if (!down_read_trylock(&file->close_sem))
+			return -EIO;
+		err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr,
+					  (__user void *)arg + sizeof(hdr));
+		up_read(&file->close_sem);
+	}
+out:
+	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
+
+	return err;
+}
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
new file mode 100644
index 0000000..4c2d30e4
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/uverbs_ioctl_cmd.h>
+#include <rdma/ib_verbs.h>
+#include <linux/bug.h>
+#include "uverbs.h"
+
+#define IB_UVERBS_VENDOR_FLAG	0x8000
+
+int ib_uverbs_std_dist(__u16 *attr_id, void *priv)
+{
+	if (*attr_id & IB_UVERBS_VENDOR_FLAG) {
+		*attr_id &= ~IB_UVERBS_VENDOR_FLAG;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(ib_uverbs_std_dist);
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_uverbs_file *ufile,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv)
+{
+	struct uverbs_action_std_handler *priv = _priv;
+
+	if (!ufile->ucontext)
+		return -EINVAL;
+
+	WARN_ON((num != 1) && (num != 2));
+	return priv->handler(ib_dev, ufile->ucontext, &ctx[0],
+			     (num == 2 ? &ctx[1] : NULL),
+			     priv->priv);
+}
+EXPORT_SYMBOL(uverbs_action_std_handle);
+
+int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
+				 struct ib_uverbs_file *ufile,
+				 struct uverbs_attr_array *ctx, size_t num,
+				 void *_priv)
+{
+	struct uverbs_action_std_ctx_handler *priv = _priv;
+
+	WARN_ON((num != 1) && (num != 2));
+	return priv->handler(ib_dev, ufile, &ctx[0], (num == 2 ? &ctx[1] : NULL), priv->priv);
+}
+EXPORT_SYMBOL(uverbs_action_std_ctx_handle);
+
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 333ed70..fa51568 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -49,6 +49,7 @@
 #include <asm/uaccess.h>
 
 #include <rdma/ib.h>
+#include <rdma/rdma_ioctl.h>
 
 #include "uverbs.h"
 
@@ -211,6 +212,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 {
 	struct ib_uobject *uobj, *tmp;
 
+	down_write(&file->close_sem);
 	context->closing = 1;
 
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
@@ -306,6 +308,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	}
 
 	put_pid(context->tgid);
+	up_write(&file->close_sem);
 
 	return context->device->dealloc_ucontext(context);
 }
@@ -915,6 +918,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 		goto err;
 	}
 
+	init_rwsem(&file->close_sem);
 	file->device	 = dev;
 	file->ucontext	 = NULL;
 	file->async_file = NULL;
@@ -973,6 +977,7 @@ static const struct file_operations uverbs_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static const struct file_operations uverbs_mmap_fops = {
@@ -982,6 +987,7 @@ static const struct file_operations uverbs_mmap_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static struct ib_client uverbs_client = {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 85d4c41..4688463 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1314,6 +1314,7 @@ struct ib_umem;
 
 struct ib_ucontext {
 	struct ib_device       *device;
+	struct ib_uverbs_file  *ufile;
 	struct list_head	pd_list;
 	struct list_head	mr_list;
 	struct list_head	mw_list;
@@ -1976,6 +1977,7 @@ struct ib_device {
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
 	struct list_head type_list;
 
+	u16				driver_id;
 	const struct uverbs_types	*types;
 };
 
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
new file mode 100644
index 0000000..19806df
--- /dev/null
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_CMD_
+#define _UVERBS_IOCTL_CMD_
+
+#include <rdma/uverbs_ioctl.h>
+
+int ib_uverbs_std_dist(__u16 *attr_id, void *priv);
+
+/* common validators */
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_uverbs_file *ufile,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv);
+int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
+				 struct ib_uverbs_file *ufile,
+				 struct uverbs_attr_array *ctx, size_t num,
+				 void *_priv);
+
+struct uverbs_action_std_handler {
+	int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+	void *priv;
+};
+
+struct uverbs_action_std_ctx_handler {
+	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+	void *priv;
+};
+
+#endif
+
diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
index 5c1117a..f6927fc 100644
--- a/include/uapi/rdma/rdma_user_ioctl.h
+++ b/include/uapi/rdma/rdma_user_ioctl.h
@@ -42,6 +42,29 @@
  */
 #define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
 
+#define RDMA_VERBS_IOCTL \
+	_IOWR(RDMA_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
+
+#define RDMA_DIRECT_IOCTL \
+	_IOWR(RDMA_IOCTL_MAGIC, 2, struct ib_uverbs_ioctl_hdr)
+
+struct ib_uverbs_attr {
+	__u16 attr_id;		/* command specific type attribute */
+	__u16 len;		/* NA for idr */
+	__u32 reserved;
+	 __u64 ptr_idr;		/* ptr typeo command/idr handle */
+};
+
+struct ib_uverbs_ioctl_hdr {
+	__u16 length;
+	__u16 flags;
+	__u16 object_type;
+	__u16 driver_id;
+	__u16 action;
+	__u16 num_attrs;
+	struct ib_uverbs_attr  attrs[0];
+};
+
 /* Legacy part
  * !!!! NOTE: It uses the same command index as VERBS
  */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-07-19 15:23   ` [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
       [not found]     ` <1468941812-32286-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-19 15:23   ` [RFC ABI V2 7/8] RDMA/core: Add common code for querying device and init context Matan Barak
  2016-07-19 15:23   ` [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure Matan Barak
  7 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

The new ABI infrastructure introduced types per driver. Since current
drivers use common types, we mimics the current release ucontext
process using the new process.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h           |   9 +-
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 130 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      | 101 +---------------------
 include/rdma/uverbs_ioctl_cmd.h            |  15 ++++
 4 files changed, 156 insertions(+), 99 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 86be861..b65a865 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -174,8 +174,6 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
-void idr_remove_uobj(struct ib_uobject *uobj);
-
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
 					int is_async);
@@ -197,6 +195,13 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);
 
 int uverbs_dealloc_mw(struct ib_mw *mw);
+void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
+			   struct ib_uverbs_event_file *ev_file,
+			   struct ib_ucq_object *uobj);
+void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
+			      struct ib_uevent_object *uobj);
+void ib_uverbs_detach_umcast(struct ib_qp *qp,
+			     struct ib_uqp_object *uobj);
 
 struct ib_uverbs_flow_spec {
 	union {
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index 4c2d30e4..ea50c08 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -31,8 +31,11 @@
  */
 
 #include <rdma/uverbs_ioctl_cmd.h>
+#include <rdma/ib_user_verbs.h>
 #include <rdma/ib_verbs.h>
 #include <linux/bug.h>
+#include <linux/file.h>
+#include "rdma_core.h"
 #include "uverbs.h"
 
 #define IB_UVERBS_VENDOR_FLAG	0x8000
@@ -76,3 +79,130 @@ int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
 }
 EXPORT_SYMBOL(uverbs_action_std_ctx_handle);
 
+static void free_ah(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	ib_destroy_ah((struct ib_ah *)uobject->object);
+}
+
+static void free_flow(struct uverbs_uobject_type *uobject_type,
+		      struct ib_uobject *uobject,
+		      struct ib_ucontext *ucontext)
+{
+	ib_destroy_flow((struct ib_flow *)uobject->object);
+}
+
+static void free_mw(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	uverbs_dealloc_mw((struct ib_mw *)uobject->object);
+}
+
+static void free_qp(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	struct ib_qp *qp = uobject->object;
+	struct ib_uqp_object *uqp =
+		container_of(uobject, struct ib_uqp_object, uevent.uobject);
+
+	if (qp != qp->real_qp) {
+		ib_close_qp(qp);
+	} else {
+		ib_uverbs_detach_umcast(qp, uqp);
+		ib_destroy_qp(qp);
+	}
+	ib_uverbs_release_uevent(ucontext->ufile, &uqp->uevent);
+}
+
+static void free_srq(struct uverbs_uobject_type *uobject_type,
+		     struct ib_uobject *uobject,
+		     struct ib_ucontext *ucontext)
+{
+	struct ib_srq *srq = uobject->object;
+	struct ib_uevent_object *uevent =
+		container_of(uobject, struct ib_uevent_object, uobject);
+
+	ib_destroy_srq(srq);
+	ib_uverbs_release_uevent(ucontext->ufile, uevent);
+}
+
+static void free_cq(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	struct ib_cq *cq = uobject->object;
+	struct ib_uverbs_event_file *ev_file = cq->cq_context;
+	struct ib_ucq_object *ucq =
+		container_of(uobject, struct ib_ucq_object, uobject);
+
+	ib_destroy_cq(cq);
+	ib_uverbs_release_ucq(ucontext->ufile, ev_file, ucq);
+}
+
+static void free_mr(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	ib_dereg_mr((struct ib_mr *)uobject);
+}
+
+static void free_xrcd(struct uverbs_uobject_type *uobject_type,
+		      struct ib_uobject *uobject,
+		      struct ib_ucontext *ucontext)
+{
+	struct ib_xrcd *xrcd = uobject->object;
+
+	mutex_lock(&ucontext->ufile->device->xrcd_tree_mutex);
+	ib_uverbs_dealloc_xrcd(ucontext->ufile->device, xrcd);
+	mutex_unlock(&ucontext->ufile->device->xrcd_tree_mutex);
+}
+
+static void free_pd(struct uverbs_uobject_type *uobject_type,
+		    struct ib_uobject *uobject,
+		    struct ib_ucontext *ucontext)
+{
+	ib_dealloc_pd((struct ib_pd *)uobject);
+}
+
+int rdma_initialize_common_types(struct ib_device *ib_dev, unsigned int types)
+{
+	static const struct
+	{
+		enum uverbs_common_types type;
+		void (*free)(struct uverbs_uobject_type *uobject_type,
+			     struct ib_uobject *uobject,
+			     struct ib_ucontext *ucontext);
+
+	} common_types[] = { /* by release order */
+		{.type = UVERBS_TYPE_AH,	.free = free_ah},
+		{.type = UVERBS_TYPE_MW,	.free = free_mw},
+		{.type = UVERBS_TYPE_FLOW,	.free = free_flow},
+		{.type = UVERBS_TYPE_QP,	.free = free_qp},
+		{.type = UVERBS_TYPE_SRQ,	.free = free_srq},
+		{.type = UVERBS_TYPE_CQ,	.free = free_cq},
+		{.type = UVERBS_TYPE_MR,	.free = free_mr},
+		{.type = UVERBS_TYPE_XRCD,	.free = free_xrcd},
+		{.type = UVERBS_TYPE_PD,	.free = free_pd},
+	};
+	int ret = 0;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(common_types); i++) {
+		if (types & common_types[i].type) {
+			ret = ib_uverbs_uobject_type_add(&ib_dev->type_list,
+							 common_types[i].free,
+							 common_types[i].type);
+			if (ret)
+				goto free;
+		}
+	}
+
+free:
+	ib_uverbs_uobject_types_remove(ib_dev);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_initialize_common_types);
+
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index fa51568..9137698 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -52,6 +52,7 @@
 #include <rdma/rdma_ioctl.h>
 
 #include "uverbs.h"
+#include "rdma_core.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand userspace verbs access");
@@ -195,8 +196,8 @@ void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
 	spin_unlock_irq(&file->async_file->lock);
 }
 
-static void ib_uverbs_detach_umcast(struct ib_qp *qp,
-				    struct ib_uqp_object *uobj)
+void ib_uverbs_detach_umcast(struct ib_qp *qp,
+			     struct ib_uqp_object *uobj)
 {
 	struct ib_uverbs_mcast_entry *mcast, *tmp;
 
@@ -210,103 +211,9 @@ static void ib_uverbs_detach_umcast(struct ib_qp *qp,
 static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 				      struct ib_ucontext *context)
 {
-	struct ib_uobject *uobj, *tmp;
-
 	down_write(&file->close_sem);
 	context->closing = 1;
-
-	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
-		struct ib_ah *ah = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_destroy_ah(ah);
-		kfree(uobj);
-	}
-
-	/* Remove MWs before QPs, in order to support type 2A MWs. */
-	list_for_each_entry_safe(uobj, tmp, &context->mw_list, list) {
-		struct ib_mw *mw = uobj->object;
-
-		idr_remove_uobj(uobj);
-		uverbs_dealloc_mw(mw);
-		kfree(uobj);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
-		struct ib_flow *flow_id = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_destroy_flow(flow_id);
-		kfree(uobj);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->qp_list, list) {
-		struct ib_qp *qp = uobj->object;
-		struct ib_uqp_object *uqp =
-			container_of(uobj, struct ib_uqp_object, uevent.uobject);
-
-		idr_remove_uobj(uobj);
-		if (qp != qp->real_qp) {
-			ib_close_qp(qp);
-		} else {
-			ib_uverbs_detach_umcast(qp, uqp);
-			ib_destroy_qp(qp);
-		}
-		ib_uverbs_release_uevent(file, &uqp->uevent);
-		kfree(uqp);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
-		struct ib_srq *srq = uobj->object;
-		struct ib_uevent_object *uevent =
-			container_of(uobj, struct ib_uevent_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_destroy_srq(srq);
-		ib_uverbs_release_uevent(file, uevent);
-		kfree(uevent);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->cq_list, list) {
-		struct ib_cq *cq = uobj->object;
-		struct ib_uverbs_event_file *ev_file = cq->cq_context;
-		struct ib_ucq_object *ucq =
-			container_of(uobj, struct ib_ucq_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_destroy_cq(cq);
-		ib_uverbs_release_ucq(file, ev_file, ucq);
-		kfree(ucq);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
-		struct ib_mr *mr = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_dereg_mr(mr);
-		kfree(uobj);
-	}
-
-	mutex_lock(&file->device->xrcd_tree_mutex);
-	list_for_each_entry_safe(uobj, tmp, &context->xrcd_list, list) {
-		struct ib_xrcd *xrcd = uobj->object;
-		struct ib_uxrcd_object *uxrcd =
-			container_of(uobj, struct ib_uxrcd_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_uverbs_dealloc_xrcd(file->device, xrcd);
-		kfree(uxrcd);
-	}
-	mutex_unlock(&file->device->xrcd_tree_mutex);
-
-	list_for_each_entry_safe(uobj, tmp, &context->pd_list, list) {
-		struct ib_pd *pd = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_dealloc_pd(pd);
-		kfree(uobj);
-	}
-
+	ib_uverbs_uobject_type_initialize_ucontext(context, &file->device->ib_dev->type_list);
 	put_pid(context->tgid);
 	up_write(&file->close_sem);
 
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index 19806df..18e652d 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -64,5 +64,20 @@ struct uverbs_action_std_ctx_handler {
 	void *priv;
 };
 
+int rdma_initialize_common_types(struct ib_device *ib_dev, unsigned int types);
+
+enum uverbs_common_types {
+	UVERBS_TYPE_DEVICE, /* Don't use IDRs here */
+	UVERBS_TYPE_PD,
+	UVERBS_TYPE_CQ,
+	UVERBS_TYPE_QP,
+	UVERBS_TYPE_SRQ,
+	UVERBS_TYPE_AH,
+	UVERBS_TYPE_MR,
+	UVERBS_TYPE_MW,
+	UVERBS_TYPE_FLOW,
+	UVERBS_TYPE_XRCD,
+};
+
 #endif
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 7/8] RDMA/core: Add common code for querying device and init context
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-07-19 15:23   ` [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
  2016-07-19 15:23   ` [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure Matan Barak
  7 siblings, 0 replies; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

We add the common (core) code for init context and querying
device. This includes the following parts:
* Macros for defining commands and validators
* Creating a context command
    * handler
    * validator
* Querying a device
    * handler
    * validator

Drivers could use the these validators in their validator's chain
and use the handler directly in the action (or just wrap it in the
driver's custom code).

Currently we use ib_udata to pass vendor specific information to the
driver. This should probably be refactored in the future.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h           |   4 +
 drivers/infiniband/core/uverbs_cmd.c       |   7 +-
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 202 +++++++++++++++++++++++++++++
 include/rdma/uverbs_ioctl.h                |  79 +++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  51 ++++++++
 5 files changed, 339 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index b65a865..0e06a8e 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -195,6 +195,10 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);
 
 int uverbs_dealloc_mw(struct ib_mw *mw);
+void uverbs_copy_query_dev_fields(struct ib_device *ib_dev,
+				  struct ib_uverbs_query_device_resp *resp,
+				  struct ib_device_attr *attr);
+
 void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
 			   struct ib_uverbs_event_file *ev_file,
 			   struct ib_ucq_object *uobj);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9648b80..e913dc0 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -385,8 +385,7 @@ err:
 	return ret;
 }
 
-static void copy_query_dev_fields(struct ib_uverbs_file *file,
-				  struct ib_device *ib_dev,
+void uverbs_copy_query_dev_fields(struct ib_device *ib_dev,
 				  struct ib_uverbs_query_device_resp *resp,
 				  struct ib_device_attr *attr)
 {
@@ -447,7 +446,7 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
 		return -EFAULT;
 
 	memset(&resp, 0, sizeof resp);
-	copy_query_dev_fields(file, ib_dev, &resp, &ib_dev->attrs);
+	uverbs_copy_query_dev_fields(ib_dev, &resp, &ib_dev->attrs);
 
 	if (copy_to_user((void __user *) (unsigned long) cmd.response,
 			 &resp, sizeof resp))
@@ -3623,7 +3622,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	if (err)
 		return err;
 
-	copy_query_dev_fields(file, ib_dev, &resp.base, &attr);
+	uverbs_copy_query_dev_fields(ib_dev, &resp.base, &attr);
 
 	if (ucore->outlen < resp.response_length + sizeof(resp.odp_caps))
 		goto end;
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index ea50c08..dc3990f 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -206,3 +206,205 @@ free:
 }
 EXPORT_SYMBOL(rdma_initialize_common_types);
 
+static void create_udata(struct uverbs_attr_array *vendor,
+			 struct ib_udata *udata)
+{
+	/*
+	 * This is for ease of conversion. The purpose is to convert all vendors
+	 * to use uverbs_attr_array instead of ib_udata.
+	 * Assume attr == 0 is input and attr == 1 is output.
+	 */
+	void * __user inbuf;
+	size_t inbuf_len = 0;
+	void * __user outbuf;
+	size_t outbuf_len = 0;
+
+	if(vendor) {
+		WARN_ON(vendor->num_attrs > 2);
+
+		if (vendor->attrs[0].valid) {
+			inbuf = vendor->attrs[0].cmd_attr.ptr;
+			inbuf_len = vendor->attrs[0].cmd_attr.len;
+		}
+
+		if (vendor->num_attrs == 2 && vendor->attrs[1].valid) {
+			outbuf = vendor->attrs[1].cmd_attr.ptr;
+			outbuf_len = vendor->attrs[1].cmd_attr.len;
+		}
+	}
+	INIT_UDATA_BUF_OR_NULL(udata, inbuf, outbuf, inbuf_len, outbuf_len);
+}
+
+DECLARE_UVERBS_ATTR_CHAIN_SPEC(
+	uverbs_get_context_spec,
+	UVERBS_ATTR_PTR_OUT(GET_CONTEXT_RESP,
+			    sizeof(struct ib_uverbs_get_context_resp)));
+EXPORT_SYMBOL(uverbs_get_context_spec);
+
+int uverbs_get_context(struct ib_device *ib_dev,
+		       struct ib_uverbs_file *file,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv)
+{
+	struct ib_udata uhw;
+	struct ib_uverbs_get_context_resp resp;
+	struct ib_ucontext		 *ucontext;
+	struct file			 *filp;
+	int ret;
+
+	if (!common->attrs[GET_CONTEXT_RESP].valid)
+		return -EINVAL;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	mutex_lock(&file->mutex);
+
+	if (file->ucontext) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ucontext = ib_dev->alloc_ucontext(ib_dev, &uhw);
+	if (IS_ERR(ucontext)) {
+		ret = PTR_ERR(ucontext);
+		goto err;
+	}
+
+	ucontext->device = ib_dev;
+	ret = ib_uverbs_uobject_type_initialize_ucontext(ucontext,
+							 &ib_dev->type_list);
+	if (ret)
+		goto err_context;
+
+	rcu_read_lock();
+	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
+	rcu_read_unlock();
+	ucontext->closing = 0;
+
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+	ucontext->umem_tree = RB_ROOT;
+	init_rwsem(&ucontext->umem_rwsem);
+	ucontext->odp_mrs_count = 0;
+	INIT_LIST_HEAD(&ucontext->no_private_counters);
+
+	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
+		ucontext->invalidate_range = NULL;
+
+#endif
+
+	resp.num_comp_vectors = file->device->num_comp_vectors;
+
+	ret = get_unused_fd_flags(O_CLOEXEC);
+	if (ret < 0)
+		goto err_free;
+	resp.async_fd = ret;
+
+	filp = ib_uverbs_alloc_event_file(file, ib_dev, 1);
+	if (IS_ERR(filp)) {
+		ret = PTR_ERR(filp);
+		goto err_fd;
+	}
+
+	if (copy_to_user(common->attrs[GET_CONTEXT_RESP].cmd_attr.ptr,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err_file;
+	}
+
+	file->ucontext = ucontext;
+	ucontext->ufile = file;
+
+	fd_install(resp.async_fd, filp);
+
+	mutex_unlock(&file->mutex);
+
+	return 0;
+
+err_file:
+	ib_uverbs_free_async_event_file(file);
+	fput(filp);
+
+err_fd:
+	put_unused_fd(resp.async_fd);
+
+err_free:
+	put_pid(ucontext->tgid);
+err_context:
+	ib_dev->dealloc_ucontext(ucontext);
+
+err:
+	mutex_unlock(&file->mutex);
+	return ret;
+}
+EXPORT_SYMBOL(uverbs_get_context);
+
+DECLARE_UVERBS_ATTR_CHAIN_SPEC(
+	uverbs_query_device_spec,
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_RESP, sizeof(struct ib_uverbs_query_device_resp)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_ODP, sizeof(struct ib_uverbs_odp_caps)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_TIMESTAMP_MASK, sizeof(__u64)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_HCA_CORE_CLOCK, sizeof(__u64)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_CAP_FLAGS, sizeof(__u64)));
+EXPORT_SYMBOL(uverbs_query_device_spec);
+
+int uverbs_query_device_handler(struct ib_device *ib_dev,
+				struct ib_ucontext *ucontext,
+				struct uverbs_attr_array *common,
+				struct uverbs_attr_array *vendor,
+				void *priv)
+{
+	struct ib_device_attr attr = {};
+	struct ib_udata uhw;
+	int err;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	err = ib_dev->query_device(ib_dev, &attr, &uhw);
+	if (err)
+		return err;
+
+	if (common->attrs[QUERY_DEVICE_RESP].valid) {
+		struct ib_uverbs_query_device_resp resp = {};
+
+		uverbs_copy_query_dev_fields(ib_dev, &resp, &attr);
+		if (copy_to_user(common->attrs[QUERY_DEVICE_RESP].cmd_attr.ptr,
+				 &resp, sizeof(resp)))
+			return -EFAULT;
+	}
+
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+	if (common->attrs[QUERY_DEVICE_ODP].valid) {
+		struct ib_uverbs_odp_caps odp_caps;
+
+		odp_caps.general_caps = attr.odp_caps.general_caps;
+		odp_caps.per_transport_caps.rc_odp_caps =
+			attr.odp_caps.per_transport_caps.rc_odp_caps;
+		odp_caps.per_transport_caps.uc_odp_caps =
+			attr.odp_caps.per_transport_caps.uc_odp_caps;
+		odp_caps.per_transport_caps.ud_odp_caps =
+			attr.odp_caps.per_transport_caps.ud_odp_caps;
+
+		if (copy_to_user(common->attrs[QUERY_DEVICE_ODP].cmd_attr.ptr,
+				 &odp_caps, sizeof(odp_caps)))
+			return -EFAULT;
+	}
+#endif
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_TIMESTAMP_MASK,
+			   &attr.timestamp_mask) == -EFAULT)
+		return -EFAULT;
+
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_HCA_CORE_CLOCK,
+			   &attr.hca_core_clock) == -EFAULT)
+		return -EFAULT;
+
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_CAP_FLAGS,
+			   &attr.device_cap_flags) == -EFAULT)
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_query_device_handler);
+
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 7733b11..fb6c864 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -106,6 +106,85 @@ struct uverbs_types {
 	const struct uverbs_type_actions	**types;
 };
 
+#define UVERBS_ATTR(_id, _len, _type)					\
+	[_id] = {.len = _len, .type = _type}
+#define UVERBS_ATTR_PTR_IN(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_IN)
+#define UVERBS_ATTR_PTR_OUT(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_OUT)
+#define UVERBS_ATTR_IDR_SZ_IN(_id, _idr_type, _access, _new_sz)		\
+	[_id] = {.type = UVERBS_ATTR_TYPE_IDR,				\
+		 .idr = {.idr_type = _idr_type,				\
+			 .access = _access,				\
+			 .new_size = _new_sz} }
+#define UVERBS_ATTR_IDR_IN(_id, _idr_type, _access)			\
+	UVERBS_ATTR_IDR_SZ_IN(_id, _idr_type, _access, sizeof(struct ib_uobject))
+#define UVERBS_ATTR_CHAIN_SPEC_SZ(...)					\
+	(sizeof((const struct uverbs_attr_spec[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_attr_spec))
+#define UVERBS_ATTR_CHAIN_SPEC(...)					\
+	(const struct uverbs_attr_chain_spec)				\
+	{.attrs = (struct uverbs_attr_spec[]){__VA_ARGS__},		\
+	 .num_attrs = UVERBS_ATTR_CHAIN_SPEC_SZ(__VA_ARGS__)}
+#define DECLARE_UVERBS_ATTR_CHAIN_SPEC(name, ...)			\
+	const struct uverbs_attr_chain_spec name =			\
+		UVERBS_ATTR_CHAIN_SPEC(__VA_ARGS__)
+#define UVERBS_ATTR_ACTION_SPEC_SZ(...)					  \
+	(sizeof((const struct uverbs_attr_chain_spec *[]){__VA_ARGS__}) / \
+				 sizeof(const struct uverbs_attr_chain_spec *))
+#define UVERBS_ATTR_ACTION_SPEC(_distfn, _priv, ...)			\
+	{.dist = _distfn,						\
+	 .priv = _priv,							\
+	 .num_chains =	UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),	\
+	 .validator_chains = (const struct uverbs_attr_chain_spec *[]){__VA_ARGS__} }
+#define UVERBS_STD_ACTION_SPEC(...)						\
+	UVERBS_ATTR_ACTION_SPEC(ib_uverbs_std_dist,				\
+				(void *)UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),\
+				__VA_ARGS__)
+#define UVERBS_STD_ACTION(_handler, _priv, ...)				\
+	{								\
+		.priv = &(struct uverbs_action_std_handler)		\
+			{.handler = _handler,				\
+			 .priv = _priv},				\
+		.handler = uverbs_action_std_handle,			\
+		.chain = UVERBS_STD_ACTION_SPEC(__VA_ARGS__)}
+#define UVERBS_STD_CTX_ACTION(_handler, _priv, ...)			\
+	{								\
+		.priv = &(struct uverbs_action_std_ctx_handler)		\
+			{.handler = _handler,				\
+			 .priv = _priv},				\
+		.handler = uverbs_action_std_ctx_handle,		\
+		.chain = UVERBS_STD_ACTION_SPEC(__VA_ARGS__)}
+#define UVERBS_ACTIONS_SZ(...)					\
+	(sizeof((const struct uverbs_action []){__VA_ARGS__}) /		\
+	 sizeof(const struct uverbs_action))
+#define UVERBS_ACTION(action_idx, _handler, _priv,  ...)		\
+	[action_idx] = UVERBS_STD_ACTION(_handler, _priv, __VA_ARGS__)
+#define UVERBS_CTX_ACTION(action_idx, _handler, _priv,  ...)		\
+	[action_idx] = UVERBS_STD_CTX_ACTION(_handler, _priv, __VA_ARGS__)
+#define UVERBS_ACTIONS(...)					\
+	((const struct uverbs_type_actions)				\
+	  {.num_actions = UVERBS_ACTIONS_SZ(__VA_ARGS__),		\
+	   .actions = (const struct uverbs_action[]){__VA_ARGS__} })
+#define DECLARE_UVERBS_TYPE(name, ...)					\
+	const struct uverbs_type_actions name = UVERBS_ACTIONS(__VA_ARGS__)
+#define UVERBS_TYPES_SZ(...)						\
+	(sizeof((const struct uverbs_type_actions *[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_type_actions *))
+#define UVERBS_TYPE_ACTIONS(type_idx, ...)				\
+	[type_idx] = &UVERBS_ACTIONS(__VA_ARGS__)
+#define UVERBS_TYPE(type_idx, type_ptr)					\
+	[type_idx] = ((const struct uverbs_type_actions * const)&type_ptr)
+#define UVERBS_TYPES(...)						\
+	{.num_types = UVERBS_TYPES_SZ(__VA_ARGS__),			\
+	 .types = (const struct uverbs_type_actions *[]){__VA_ARGS__} }
+
+#define UVERBS_COPY_TO(attr_array, idx, from)				\
+	((attr_array)->attrs[idx].valid ?				\
+	 (copy_to_user((attr_array)->attrs[idx].cmd_attr.ptr, (from),	\
+		       (attr_array)->attrs[idx].cmd_attr.len) ?		\
+	  -EFAULT : 0) : -ENOENT)
+
 /* =================================================
  *              Parsing infrastructure
  * =================================================
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index 18e652d..185fed6 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -79,5 +79,56 @@ enum uverbs_common_types {
 	UVERBS_TYPE_XRCD,
 };
 
+#define UVERBS_COMMON_TYPES						       \
+	(UVERBS_TYPE_PD | UVERBS_TYPE_CQ | UVERBS_TYPE_QP | UVERBS_TYPE_SRQ |  \
+	 UVERBS_TYPE_AH | UVERBS_TYPE_MR | UVERBS_TYPE_MW | UVERBS_TYPE_FLOW | \
+	 UVERBS_TYPE_XRCD)
+
+enum uverbs_create_qp_cmd_attr {
+	CREATE_QP_CMD,
+	CREATE_QP_RESP,
+	CREATE_QP_QP,
+	CREATE_QP_PD,
+	CREATE_QP_RECV_CQ,
+	CREATE_QP_SEND_CQ,
+};
+
+enum uverbs_destroy_qp_cmd_attr {
+	DESTROY_QP_RESP,
+	DESTROY_QP_QP,
+};
+
+enum uverbs_create_cq_cmd_attr {
+	CREATE_CQ_CMD,
+	CREATE_CQ_RESP,
+};
+
+enum uverbs_get_context {
+	GET_CONTEXT_RESP,
+};
+
+enum uverbs_query_device {
+	QUERY_DEVICE_RESP,
+	QUERY_DEVICE_ODP,
+	QUERY_DEVICE_TIMESTAMP_MASK,
+	QUERY_DEVICE_HCA_CORE_CLOCK,
+	QUERY_DEVICE_CAP_FLAGS,
+};
+
+extern const struct uverbs_attr_chain_spec uverbs_get_context_spec;
+extern const struct uverbs_attr_chain_spec uverbs_query_device_spec;
+
+int uverbs_get_context(struct ib_device *ib_dev,
+		       struct ib_uverbs_file *file,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+
+int uverbs_query_device_handler(struct ib_device *ib_dev,
+				struct ib_ucontext *ucontext,
+				struct uverbs_attr_array *common,
+				struct uverbs_attr_array *vendor,
+				void *priv);
+
 #endif
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure
       [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-07-19 15:23   ` [RFC ABI V2 7/8] RDMA/core: Add common code for querying device and init context Matan Barak
@ 2016-07-19 15:23   ` Matan Barak
       [not found]     ` <1468941812-32286-9-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  7 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-19 15:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Liran Liss,
	Haggai Eran, Tal Alon, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky, Matan Barak

It adds the following:
* Creating a context command
    * handler
    * validator
* Querying a device
    * handler
    * validator

It also populate the type table in the IB device.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/Makefile        |  3 +-
 drivers/infiniband/hw/mlx5/main.c          | 12 +++++-
 drivers/infiniband/hw/mlx5/mlx5_user_cmd.c | 69 ++++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/user.h          |  3 ++
 4 files changed, 84 insertions(+), 3 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_user_cmd.c

diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index 7493a83..c1fed38 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
 
-mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o ib_virt.o
+mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o \
+		ib_virt.o mlx5_user_cmd.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c72797c..9ba97b0 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -45,6 +45,7 @@
 #include <rdma/ib_user_verbs.h>
 #include <rdma/ib_addr.h>
 #include <rdma/ib_cache.h>
+#include <rdma/uverbs_ioctl_cmd.h>
 #include <linux/mlx5/port.h>
 #include <linux/mlx5/vport.h>
 #include <rdma/ib_smi.h>
@@ -2467,10 +2468,16 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	if (err)
 		goto err_rsrc;
 
-	err = ib_register_device(&dev->ib_dev, NULL);
+	err = rdma_initialize_common_types(&dev->ib_dev, UVERBS_COMMON_TYPES);
 	if (err)
 		goto err_odp;
 
+	dev->ib_dev.types = &mlx5_types;
+
+	err = ib_register_device(&dev->ib_dev, NULL);
+	if (err)
+		goto err_remove_common_types;
+
 	err = create_umr_res(dev);
 	if (err)
 		goto err_dev;
@@ -2491,7 +2498,8 @@ err_umrc:
 
 err_dev:
 	ib_unregister_device(&dev->ib_dev);
-
+err_remove_common_types:
+	ib_uverbs_uobject_types_remove(&dev->ib_dev);
 err_odp:
 	mlx5_ib_odp_remove_one(dev);
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_user_cmd.c b/drivers/infiniband/hw/mlx5/mlx5_user_cmd.c
new file mode 100644
index 0000000..d1f7a02
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_user_cmd.c
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/uverbs_ioctl_cmd.h>
+#include "user.h"
+
+/* Refactor this to separate the versions */
+enum mlx5_alloc_ucontext_args {
+	ALLOC_UCONTEXT_IN,
+	ALLOC_UCONTEXT_OUT,
+};
+
+enum mlx5_device_actions {
+	DEVICE_ALLOC_CONTEXT,
+	DEVICE_QUERY,
+};
+
+DECLARE_UVERBS_TYPE(
+	mlx5_device,
+	UVERBS_CTX_ACTION(
+		DEVICE_ALLOC_CONTEXT, uverbs_get_context, NULL,
+		&uverbs_get_context_spec,
+		&UVERBS_ATTR_CHAIN_SPEC(
+			/*
+			 * Declared with size 0 as we current provide
+			 * backward compatibility (0 = variable size)
+			 */
+			UVERBS_ATTR_PTR_IN(ALLOC_UCONTEXT_IN, 0),
+			UVERBS_ATTR_PTR_OUT(ALLOC_UCONTEXT_OUT, 0),
+			),
+	),
+	UVERBS_ACTION(
+		DEVICE_QUERY, uverbs_query_device_handler, NULL,
+		&uverbs_query_device_spec,
+	),
+);
+
+struct uverbs_types mlx5_types = UVERBS_TYPES(
+	UVERBS_TYPE(UVERBS_TYPE_DEVICE, mlx5_device)
+);
diff --git a/drivers/infiniband/hw/mlx5/user.h b/drivers/infiniband/hw/mlx5/user.h
index 61bc308..ae6d8ab 100644
--- a/drivers/infiniband/hw/mlx5/user.h
+++ b/drivers/infiniband/hw/mlx5/user.h
@@ -194,4 +194,7 @@ static inline int get_srq_user_index(struct mlx5_ib_ucontext *ucontext,
 
 	return verify_assign_uidx(cqe_version, ucmd->uidx, user_index);
 }
+
+extern struct uverbs_types mlx5_types;
+
 #endif /* MLX5_IB_USER_H */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]     ` <1468941812-32286-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20  1:06       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.20.1607192004510.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-07-21  8:07       ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-20  1:06 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Tue, 19 Jul 2016, Matan Barak wrote:

> Place all external RDMA IOCTL declarations into one UAPI exported
> header file and move all legacy MAD commands to that file.

Reviewed-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

Isnt this actually a proper cleanup because the legacy ioctl
definitions for MAD commands should have been a part of the uapi?
Thus this patch could be merged immediately?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]     ` <1468941812-32286-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20  1:15       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.20.1607192014010.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-07-20 17:41       ` Jason Gunthorpe
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-20  1:15 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Tue, 19 Jul 2016, Matan Barak wrote:

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 432bed5..14bfe3b 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1706,6 +1706,10 @@ struct ib_device {
>
>  	struct iw_cm_verbs	     *iwcm;
>
> +	struct idr		idr;
> +	/* Global lock in use to safely release device IDR */
> +	spinlock_t		idr_lock;

Global? In what sense? This is a lock for the idrs for the particular
device right? The comment is a bit confusing. Maybe better remove it?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]     ` <1468941812-32286-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20  1:25       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.20.1607192021500.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-20  1:25 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Tue, 19 Jul 2016, Matan Barak wrote:

> diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
> index 5c1117a..f6927fc 100644
> --- a/include/uapi/rdma/rdma_user_ioctl.h
> +++ b/include/uapi/rdma/rdma_user_ioctl.h
> @@ -42,6 +42,29 @@
>   */
>  #define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
>
> +#define RDMA_VERBS_IOCTL \
> +	_IOWR(RDMA_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
> +
> +#define RDMA_DIRECT_IOCTL \
> +	_IOWR(RDMA_IOCTL_MAGIC, 2, struct ib_uverbs_ioctl_hdr)
> +
> +struct ib_uverbs_attr {
> +	__u16 attr_id;		/* command specific type attribute */
> +	__u16 len;		/* NA for idr */
> +	__u32 reserved;
> +	 __u64 ptr_idr;		/* ptr typeo command/idr handle */
> +};
> +
> +struct ib_uverbs_ioctl_hdr {
> +	__u16 length;
> +	__u16 flags;
> +	__u16 object_type;
> +	__u16 driver_id;

driver id??? I would have expected that the driver to be used is
already selected through the file handle as passed to the ioctl.

Having the ability to issue ioctls to multiple drivers via a
single file handle is unusual and will likely trigger security concerns.
In particular various access mechanisms rely on policing access through
file handles.

I guess we have a good reason for doing so? If so then please document
that somewhere. Or did I not see that?


+	 __u16 action; > +	__u16 num_attrs;
> +	struct ib_uverbs_attr  attrs[0];
> +};
> +
>  /* Legacy part
>   * !!!! NOTE: It uses the same command index as VERBS
>   */
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]         ` <alpine.DEB.2.20.1607192004510.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-20  3:15           ` Leon Romanovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2016-07-20  3:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny

[-- Attachment #1: Type: text/plain, Size: 882 bytes --]

On Tue, Jul 19, 2016 at 08:06:35PM -0500, Christoph Lameter wrote:
> On Tue, 19 Jul 2016, Matan Barak wrote:
> 
> > Place all external RDMA IOCTL declarations into one UAPI exported
> > header file and move all legacy MAD commands to that file.
> 
> Reviewed-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> 
> Isnt this actually a proper cleanup because the legacy ioctl
> definitions for MAD commands should have been a part of the uapi?
> Thus this patch could be merged immediately?

Yes, it will.
We will respin this patch along with IDR refactoring patch as a separate
patchset to RFC ABI patches.

Thanks.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]         ` <alpine.DEB.2.20.1607192014010.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-20  3:20           ` Leon Romanovsky
       [not found]             ` <20160720032037.GR20674-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Leon Romanovsky @ 2016-07-20  3:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

On Tue, Jul 19, 2016 at 08:15:41PM -0500, Christoph Lameter wrote:
> On Tue, 19 Jul 2016, Matan Barak wrote:
> 
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 432bed5..14bfe3b 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -1706,6 +1706,10 @@ struct ib_device {
> >
> >  	struct iw_cm_verbs	     *iwcm;
> >
> > +	struct idr		idr;
> > +	/* Global lock in use to safely release device IDR */
> > +	spinlock_t		idr_lock;
> 
> Global? In what sense? This is a lock for the idrs for the particular
> device right? The comment is a bit confusing. Maybe better remove it?

I'll fix it and will remove "global" word from the comment.
It should be /* Lock in use to safety release device IDR */. I prefer to
add comment to spinlock to pass checkpatch without errors/warnings about
missing comment for the spinlock.

Thanks.

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]         ` <alpine.DEB.2.20.1607192021500.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-20  8:11           ` Matan Barak
       [not found]             ` <53dd8337-0779-341e-49f5-ad675269bfe6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-20  8:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On 20/07/2016 04:25, Christoph Lameter wrote:
> On Tue, 19 Jul 2016, Matan Barak wrote:
>
>> diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
>> index 5c1117a..f6927fc 100644
>> --- a/include/uapi/rdma/rdma_user_ioctl.h
>> +++ b/include/uapi/rdma/rdma_user_ioctl.h
>> @@ -42,6 +42,29 @@
>>   */
>>  #define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
>>
>> +#define RDMA_VERBS_IOCTL \
>> +	_IOWR(RDMA_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
>> +
>> +#define RDMA_DIRECT_IOCTL \
>> +	_IOWR(RDMA_IOCTL_MAGIC, 2, struct ib_uverbs_ioctl_hdr)
>> +
>> +struct ib_uverbs_attr {
>> +	__u16 attr_id;		/* command specific type attribute */
>> +	__u16 len;		/* NA for idr */
>> +	__u32 reserved;
>> +	 __u64 ptr_idr;		/* ptr typeo command/idr handle */
>> +};
>> +
>> +struct ib_uverbs_ioctl_hdr {
>> +	__u16 length;
>> +	__u16 flags;
>> +	__u16 object_type;
>> +	__u16 driver_id;
>
> driver id??? I would have expected that the driver to be used is
> already selected through the file handle as passed to the ioctl.
>

If I recall, Jason proposed that. I think the main reason here is for 
strace and debugging. Since the ABI is now driver specific, this helps 
you parse the structs via strace.

> Having the ability to issue ioctls to multiple drivers via a
> single file handle is unusual and will likely trigger security concerns.
> In particular various access mechanisms rely on policing access through
> file handles.
>

This is actually an ongoing discussion we have in the OFVWG. The benefit 
of unifying rdma-cm and uverbs is that sharing objects (and opening a 
device through rdma-cm and pass it to user-space) becomes easier. That 
means that security should be handled via selinux.

> I guess we have a good reason for doing so? If so then please document
> that somewhere. Or did I not see that?
>
>

The current v2 approach only adds the driver_id for information and 
strace only. I would really like to hear your insights about using a 
single file (rdma_uapi) vs multiple files as we have today.

> +	 __u16 action; > +	__u16 num_attrs;
>> +	struct ib_uverbs_attr  attrs[0];
>> +};
>> +
>>  /* Legacy part
>>   * !!!! NOTE: It uses the same command index as VERBS
>>   */
>>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]             ` <53dd8337-0779-341e-49f5-ad675269bfe6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20 10:03               ` Christoph Lameter
       [not found]                 ` <alpine.DEB.2.20.1607200448370.12616-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-20 10:03 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On Wed, 20 Jul 2016, Matan Barak wrote:

> > > +struct ib_uverbs_ioctl_hdr {
> > > +	__u16 length;
> > > +	__u16 flags;
> > > +	__u16 object_type;
> > > +	__u16 driver_id;
> >
> > driver id??? I would have expected that the driver to be used is
> > already selected through the file handle as passed to the ioctl.
> >
>
> If I recall, Jason proposed that. I think the main reason here is for strace
> and debugging. Since the ABI is now driver specific, this helps you parse the
> structs via strace.

You specify the driver_id in the ioctl data structure. The driver in an
ioctl is specified by the descriptor. strace etc would have an easier time
if we would follow the standard conventions for devices and not add
another device_id somewhere.

> This is actually an ongoing discussion we have in the OFVWG. The benefit of
> unifying rdma-cm and uverbs is that sharing objects (and opening a device
> through rdma-cm and pass it to user-space) becomes easier. That means that
> security should be handled via selinux.

Why would there be an issue of sharing data between multiple descriptors?
Data could be a subsystem specific state and not device specific if you
want to share.

> > I guess we have a good reason for doing so? If so then please document
> > that somewhere. Or did I not see that?
> The current v2 approach only adds the driver_id for information and strace
> only. I would really like to hear your insights about using a single file
> (rdma_uapi) vs multiple files as we have today.

Why do you need that information a second time if the descriptor already
provides the device information?

The simple approach would be to have either /dev or /sysfs based
descriptors that can be opened and then used for ioctls. F.e. open

/sys/class/infiniband/mlx5_0/desc or

/sys/class/infiniband/mlx5_0/ports/1/desc

or create a new hierachy using udev/systemd

f.e. /dev/rdma/mlx5_0

and then run ioctls on that file to configure the device. That is very
similar to the traditional use of ioctls. Security for each device can be
controlled separately without inspecting the data being passed (which
would require modifications to the security code). strace and
other tools would just natively know that the descriptor refers to a device.

Additional system calls would allow a straightforward war to control
passing the descriptor on exec and other fuunky things (see manpage
for fcntl f.e.). The usual machinery for device descriptor control and
management would come in almost for free.


IOCTL(2)                                            Linux Programmer's
Manual                                            IOCTL(2)

NAME
       ioctl - control device

SYNOPSIS
       #include <sys/ioctl.h>

       int ioctl(int d, int request, ...);

DESCRIPTION
       The ioctl() function manipulates the underlying device parameters
of special files.  In particular, many operating charac‐
       teristics of character special files (e.g., terminals) may be
controlled with ioctl() requests.  The argument d must be an
       open file descriptor.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                 ` <alpine.DEB.2.20.1607200448370.12616-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-20 11:56                   ` Matan Barak
       [not found]                     ` <0fc727fd-bc35-b2bd-f4a7-04efd937d4c3-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-20 17:21                   ` Jason Gunthorpe
  1 sibling, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-20 11:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On 20/07/2016 13:03, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Matan Barak wrote:
>
>>>> +struct ib_uverbs_ioctl_hdr {
>>>> +	__u16 length;
>>>> +	__u16 flags;
>>>> +	__u16 object_type;
>>>> +	__u16 driver_id;
>>>
>>> driver id??? I would have expected that the driver to be used is
>>> already selected through the file handle as passed to the ioctl.
>>>
>>
>> If I recall, Jason proposed that. I think the main reason here is for strace
>> and debugging. Since the ABI is now driver specific, this helps you parse the
>> structs via strace.
>
> You specify the driver_id in the ioctl data structure. The driver in an
> ioctl is specified by the descriptor. strace etc would have an easier time
> if we would follow the standard conventions for devices and not add
> another device_id somewhere.
>

If you use only one file (ie rdma_uapi), how will strace know which 
internal RDMA device is used?
It'll need to same some bits for the entire session to figure that out.

>> This is actually an ongoing discussion we have in the OFVWG. The benefit of
>> unifying rdma-cm and uverbs is that sharing objects (and opening a device
>> through rdma-cm and pass it to user-space) becomes easier. That means that
>> security should be handled via selinux.
>
> Why would there be an issue of sharing data between multiple descriptors?
> Data could be a subsystem specific state and not device specific if you
> want to share.
>

Currently uverbs and rdma-cm IDRs are separated. If we would like to 
introduce a semantics where rdma-cm opens the device (and maybe even 
access the device's objects) it needs to somehow share ids with the 
ib_dev context.
So do you suggest that rdma-cm will get a fd and open the required 
device on this fd while returning a handle bounded to this fd (you still 
need to bind the fd to its objects due to security reasons).

>>> I guess we have a good reason for doing so? If so then please document
>>> that somewhere. Or did I not see that?
>> The current v2 approach only adds the driver_id for information and strace
>> only. I would really like to hear your insights about using a single file
>> (rdma_uapi) vs multiple files as we have today.
>
> Why do you need that information a second time if the descriptor already
> provides the device information?
>
> The simple approach would be to have either /dev or /sysfs based
> descriptors that can be opened and then used for ioctls. F.e. open
>
> /sys/class/infiniband/mlx5_0/desc or
>
> /sys/class/infiniband/mlx5_0/ports/1/desc
>
> or create a new hierachy using udev/systemd
>
> f.e. /dev/rdma/mlx5_0
>
> and then run ioctls on that file to configure the device. That is very
> similar to the traditional use of ioctls. Security for each device can be
> controlled separately without inspecting the data being passed (which
> would require modifications to the security code). strace and
> other tools would just natively know that the descriptor refers to a device.
>
> Additional system calls would allow a straightforward war to control
> passing the descriptor on exec and other fuunky things (see manpage
> for fcntl f.e.). The usual machinery for device descriptor control and
> management would come in almost for free.
>
>

That's exactly where we stand right now. There are some supporters for 
unifying all devices to one file. Even if you want to unify rdma-cm to 
the same fd and open the device through it, strace can't know which 
physical device is used.

If we keep file per device as you described, we don't need this field.

> IOCTL(2)                                            Linux Programmer's
> Manual                                            IOCTL(2)
>
> NAME
>        ioctl - control device
>
> SYNOPSIS
>        #include <sys/ioctl.h>
>
>        int ioctl(int d, int request, ...);
>
> DESCRIPTION
>        The ioctl() function manipulates the underlying device parameters
> of special files.  In particular, many operating charac‐
>        teristics of character special files (e.g., terminals) may be
> controlled with ioctl() requests.  The argument d must be an
>        open file descriptor.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]             ` <20160720032037.GR20674-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-07-20 17:08               ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Lameter, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny

On Wed, Jul 20, 2016 at 06:20:37AM +0300, Leon Romanovsky wrote:
> > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > > index 432bed5..14bfe3b 100644
> > > +++ b/include/rdma/ib_verbs.h
> > > @@ -1706,6 +1706,10 @@ struct ib_device {
> > >
> > >  	struct iw_cm_verbs	     *iwcm;
> > >
> > > +	struct idr		idr;
> > > +	/* Global lock in use to safely release device IDR */
> > > +	spinlock_t		idr_lock;
> > 
> > Global? In what sense? This is a lock for the idrs for the particular
> > device right? The comment is a bit confusing. Maybe better remove it?
> 
> I'll fix it and will remove "global" word from the comment.
> It should be /* Lock in use to safety release device IDR */. I
> prefer to

That doesn't seem to be the right comment either.

+	spin_lock(&uobj->context->device->idr_lock);
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
+	spin_unlock(&uobj->context->device->idr_lock);

idr_lock appears to just be the standard spinlock protecting an IDR.

If it being re-used for something about device removal then that is
probably wrong....

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                 ` <alpine.DEB.2.20.1607200448370.12616-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-07-20 11:56                   ` Matan Barak
@ 2016-07-20 17:21                   ` Jason Gunthorpe
       [not found]                     ` <20160720172143.GG21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:21 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, Jul 20, 2016 at 05:03:46AM -0500, Christoph Lameter wrote:
> > If I recall, Jason proposed that. I think the main reason here is for strace
> > and debugging. Since the ABI is now driver specific, this helps you parse the
> > structs via strace.
> 
> You specify the driver_id in the ioctl data structure. The driver in an
> ioctl is specified by the descriptor. strace etc would have an easier time
> if we would follow the standard conventions for devices and not add
> another device_id somewhere.

There isn't enough ioctl #'s for that. We probably need something like
over 500 ioctls by the time we are done all the drivers and
interfaces.. There is only about 100 reserved for RDMA today, and the
ioctl space is looking pretty full to steal another 4 blocks.

So the basic proposal is to use only a small number of ioctls and
have an 'extended ioctl #' in the struct.

The basic requirement is the same as in ioctl, the ioctl # and
extended # in the struct must be globally unique.

The device_id part of the extension allows each driver to have its own
globally unique number space without requireing another ioctl block.

> Why would there be an issue of sharing data between multiple descriptors?
> Data could be a subsystem specific state and not device specific if you
> want to share.

The state should be fd specific.

> Why do you need that information a second time if the descriptor already
> provides the device information?

Because the descriptor only indirectly implies a specific device.

It is the same reason overlapping ioctls are discouraged in the
kernel, even though I know I opened /dev/foo I still should send
globally unique 'FOO' ioctls to what I opened.

> and then run ioctls on that file to configure the device. That is very
> similar to the traditional use of ioctls. Security for each device can be
> controlled separately without inspecting the data being passed
> (which

It turns out that doesn't work today anyhow, since we have
multi-device requirements in the rdma_cm and you end up with a rdma_cm
descriptor that is world accessible and can interact with the network
even without device permissions.

netdev doesn't work in this screwy way, I think we should move away
from it as well...

> would require modifications to the security code). strace and
> other tools would just natively know that the descriptor refers to a device.

No, strace doesn't know what an open FD is, it just inspects the ioctl
# and content to do its decode. Requiring strace to decode differently
depending on what was opened would be the deviation from what we do
today.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure
       [not found]     ` <1468941812-32286-9-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20 17:39       ` Jason Gunthorpe
       [not found]         ` <20160720173927.GH21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:39 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 06:23:32PM +0300, Matan Barak wrote:
> +DECLARE_UVERBS_TYPE(
> +	mlx5_device,
> +	UVERBS_CTX_ACTION(
> +		DEVICE_ALLOC_CONTEXT, uverbs_get_context, NULL,
> +		&uverbs_get_context_spec,
> +		&UVERBS_ATTR_CHAIN_SPEC(
> +			/*
> +			 * Declared with size 0 as we current provide
> +			 * backward compatibility (0 = variable size)
> +			 */
> +			UVERBS_ATTR_PTR_IN(ALLOC_UCONTEXT_IN, 0),
> +			UVERBS_ATTR_PTR_OUT(ALLOC_UCONTEXT_OUT, 0),
> +			),
> +	),
> +	UVERBS_ACTION(
> +		DEVICE_QUERY, uverbs_query_device_handler, NULL,
> +		&uverbs_query_device_spec,
> +	),
> +);

The entire point of getting rid of the lists and changing the destruct
ordering was to avoid the need to have this kind of stuff in the
drivers.

I really don't want to see driver changes to implement the basic
API..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types
       [not found]     ` <1468941812-32286-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20 17:40       ` Jason Gunthorpe
       [not found]         ` <20160720174052.GI21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:40 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 06:23:30PM +0300, Matan Barak wrote:
> @@ -210,103 +211,9 @@ static void ib_uverbs_detach_umcast(struct ib_qp *qp,
>  static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
>  				      struct ib_ucontext *context)
>  {
> +	ib_uverbs_uobject_type_initialize_ucontext(context, &file->device->ib_dev->type_list);

There is a certain dissonance to having an initialize function called
inside a cleanup function - is that right (it doesn't look that
right)? If it is right, find a better name...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]     ` <1468941812-32286-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-20  1:15       ` Christoph Lameter
@ 2016-07-20 17:41       ` Jason Gunthorpe
       [not found]         ` <20160720174122.GJ21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:41 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 06:23:26PM +0300, Matan Barak wrote:
> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The current code creates an IDR per type. Since types are currently
> common for all vendors and known in advance, this was good enough.
> However, the proposed ioctl based infrastructure allows each vendor
> to declare only some of the common types and declare its own specific
> types.
> 
> Thus, we decided to implement IDR to be per device and refactor it to
> use a new file.

I still think this should be per-fd...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                     ` <0fc727fd-bc35-b2bd-f4a7-04efd937d4c3-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-21  0:38                       ` Christoph Lameter
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Lameter @ 2016-07-21  0:38 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, 20 Jul 2016, Matan Barak wrote:

> If you use only one file (ie rdma_uapi), how will strace know which internal
> RDMA device is used?
> It'll need to same some bits for the entire session to figure that out.

Yes I thought that was the current approach? I do not like that approach.

> Currently uverbs and rdma-cm IDRs are separated. If we would like to introduce
> a semantics where rdma-cm opens the device (and maybe even access the device's
> objects) it needs to somehow share ids with the ib_dev context.
> So do you suggest that rdma-cm will get a fd and open the required device on
> this fd while returning a handle bounded to this fd (you still need to bind
> the fd to its objects due to security reasons).

Both operate on the same device. So they would use the same device handle
(descriptor).

> > Additional system calls would allow a straightforward war to control
> > passing the descriptor on exec and other fuunky things (see manpage
> > for fcntl f.e.). The usual machinery for device descriptor control and
> > management would come in almost for free.
> >
> >
>
> That's exactly where we stand right now. There are some supporters for
> unifying all devices to one file. Even if you want to unify rdma-cm to the
> same fd and open the device through it, strace can't know which physical
> device is used.

strace can know the device through the descriptor in the same way as done
for character and block devices if we would follow the convention of one
descriptor referring to one device. strace cannot know that if you use a
single file for multiple devices.

I think the idea of using a single file as a way to do RPC calls to a
subsystem needs to be retired. Operations occur on devices and ioctls
refer
to devices.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                     ` <20160720172143.GG21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21  0:57                       ` Christoph Lameter
       [not found]                         ` <alpine.DEB.2.20.1607201950160.24804-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-21  0:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, 20 Jul 2016, Jason Gunthorpe wrote:

> > You specify the driver_id in the ioctl data structure. The driver in an
> > ioctl is specified by the descriptor. strace etc would have an easier time
> > if we would follow the standard conventions for devices and not add
> > another device_id somewhere.
>
> There isn't enough ioctl #'s for that. We probably need something like
> over 500 ioctls by the time we are done all the drivers and
> interfaces.. There is only about 100 reserved for RDMA today, and the
> ioctl space is looking pretty full to steal another 4 blocks.

That sounds crazy. Surely there is a way to reduce that number.

> So the basic proposal is to use only a small number of ioctls and
> have an 'extended ioctl #' in the struct.

Ok then you already have much less.

> The device_id part of the extension allows each driver to have its own
> globally unique number space without requireing another ioctl block.

But you already have that uniqueness if the ioctl filehandle provides
that.

> > Why would there be an issue of sharing data between multiple descriptors?
> > Data could be a subsystem specific state and not device specific if you
> > want to share.
>
> The state should be fd specific.

I think that is what we want. Driver specific ioctls and driver specific
types. If you want subsystem wide types then something special would have
to happen permissions wise.

> > Why do you need that information a second time if the descriptor already
> > provides the device information?
>
> Because the descriptor only indirectly implies a specific device.

If you require specify a device on ioctl then it will directly indicate
the device you are using. AFAICT this is the way its supposed to be.

> > and then run ioctls on that file to configure the device. That is very
> > similar to the traditional use of ioctls. Security for each device can be
> > controlled separately without inspecting the data being passed
> > (which
>
> It turns out that doesn't work today anyhow, since we have
> multi-device requirements in the rdma_cm and you end up with a rdma_cm
> descriptor that is world accessible and can interact with the network
> even without device permissions.

multi-device requirements in the network code require a new device that
aggregates the other ones. The aggregation device is not world accessible.
The same approach could be used for RDMA.

> > would require modifications to the security code). strace and
> > other tools would just natively know that the descriptor refers to a device.
>
> No, strace doesn't know what an open FD is, it just inspects the ioctl
> # and content to do its decode. Requiring strace to decode differently
> depending on what was opened would be the deviation from what we do
> today.

It is the  basic Unix convention to have a descriptor/file handle refer to
a device. There is nothing special required here from strace.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                         ` <alpine.DEB.2.20.1607201950160.24804-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-21  1:56                           ` Jason Gunthorpe
       [not found]                             ` <20160721015613.GB8279-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21  1:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, Jul 20, 2016 at 07:57:09PM -0500, Christoph Lameter wrote:

> > The device_id part of the extension allows each driver to have its own
> > globally unique number space without requireing another ioctl block.
> 
> But you already have that uniqueness if the ioctl filehandle provides
> that.

The argument is that ioctls should be self-describing and never rely
on the filehandle to uniq them. That is basically standard in the
kernel, and why we have Documentation/ioctl/ioctl-number.txt
describing how to uniquely assign ioctl numbers.

There is nothing special here, we are just creating a much larger
ioctl # space by using (ioctl #, operation, device_id) as the globally
unique key, following existing best-practice if self-identifying ioctls.

> multi-device requirements in the network code require a new device that
> aggregates the other ones. The aggregation device is not world accessible.
> The same approach could be used for RDMA.

Some do, some don't. The rdma_cm requirement is more like
listen(0.0.0.0) which does not require special net device aggregation.

> > No, strace doesn't know what an open FD is, it just inspects the ioctl
> > # and content to do its decode. Requiring strace to decode differently
> > depending on what was opened would be the deviation from what we do
> > today.
> 
> It is the  basic Unix convention to have a descriptor/file handle refer to
> a device. There is nothing special required here from strace.

Sometimes yes, sometimes no. There are lots of examples both ways.

/dev/pts/ptmx, /dev/mapper/control, /dev/btrfs-control, etc, etc are
all examples of multi-device control fds similar to the proposed
single char dev.

.. and bear in mind that /dev/uverbs0 doesn't even really make that
much sense as it aggregates two physical ports. There is already no
way to split permissions up by port, which is a logical thing to want
for some dual-rail configurations.

What we have today really doesn't make much sense.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                             ` <20160721015613.GB8279-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21  2:41                               ` Christoph Lameter
       [not found]                                 ` <alpine.DEB.2.20.1607202132080.26260-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-21  2:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, 20 Jul 2016, Jason Gunthorpe wrote:

> >
> > But you already have that uniqueness if the ioctl filehandle provides
> > that.
>
> The argument is that ioctls should be self-describing and never rely
> on the filehandle to uniq them. That is basically standard in the
> kernel, and why we have Documentation/ioctl/ioctl-number.txt
> describing how to uniquely assign ioctl numbers.

Fully agree with that. But this statement does not relate to what we
were talking about. So I guess I need to restate it one more time:

device_id is useless if the driver is already determined by the device filehanle.

> There is nothing special here, we are just creating a much larger
> ioctl # space by using (ioctl #, operation, device_id) as the globally
> unique key, following existing best-practice if self-identifying ioctls.
>
> > multi-device requirements in the network code require a new device that
> > aggregates the other ones. The aggregation device is not world accessible.
> > The same approach could be used for RDMA.
>
> Some do, some don't. The rdma_cm requirement is more like
> listen(0.0.0.0) which does not require special net device aggregation.

Well that concept is for a packetized protocol stack. In that kind of a
system packets can be routed over multiple devices depending on the
address. We do not have that in the IB stack. Routing of packets would
require kernel code to run for each packet. Instead we have mostly point
to point RDMA requests. So thos consideration is not for the IB stack.

> > It is the  basic Unix convention to have a descriptor/file handle refer to
> > a device. There is nothing special required here from strace.
> Sometimes yes, sometimes no. There are lots of examples both ways.
>
> /dev/pts/ptmx, /dev/mapper/control, /dev/btrfs-control, etc, etc are
> all examples of multi-device control fds similar to the proposed
> single char dev.

Yes but those are not used for communications. They are used to control a
subsystem and access to those requires special priviledges. We are talking
about a device accessible witout special priviledges to do data
communications. /dev/rdmaXXXX to control global behavior of the stack
for all processes would be fine. But we are controlling the interaction of
a process with a device.

> .. and bear in mind that /dev/uverbs0 doesn't even really make that
> much sense as it aggregates two physical ports. There is already no
> way to split permissions up by port, which is a logical thing to want
> for some dual-rail configurations.

Right lets get rid of it. device specific files only.

> What we have today really doesn't make much sense.

Indeed lets adopt the standard filehandles for ioctls and other function
calls as much as posssible. One may then be able to reuse other existing
ioctls and other function calls for those filehandles (like fcntl which I
mentioned earlier).

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                 ` <alpine.DEB.2.20.1607202132080.26260-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-21  3:28                                   ` Jason Gunthorpe
       [not found]                                     ` <20160721032803.GA12093-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21  3:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, Jul 20, 2016 at 09:41:34PM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Jason Gunthorpe wrote:
> 
> > >
> > > But you already have that uniqueness if the ioctl filehandle provides
> > > that.
> >
> > The argument is that ioctls should be self-describing and never rely
> > on the filehandle to uniq them. That is basically standard in the
> > kernel, and why we have Documentation/ioctl/ioctl-number.txt
> > describing how to uniquely assign ioctl numbers.
> 
> Fully agree with that. But this statement does not relate to what we
> were talking about. So I guess I need to restate it one more time:

> device_id is useless if the driver is already determined by the device filehanle.

It isn't useless, it preserves the self describing property that is
the kernel standard for ioctls.

The filehandle major/minor will never uniquely describe the device, we
are not going to give mlx4,mlx5, etc unique major/minor numbers. Yes,
the kernel always implicitly knows what device driver the ioctl will
be delivered to.

But the kernel has that implicit knowledge for ioctls as well and we
still go through the trouble in Documentation/ioctl/ioctl-number.txt
to make them globally unique. device_id is exactly the same thing.

In otherwords, we are going to do something like this:

arg.driver_id = MLX4;
arg.driver_op = MLX4_FROBNICATE;
ioctl(IB_DRIVER_DO_SOMETHING,arg);

Something like strace looks at this and sees
IB_DRIVER_DO_SOMETHING,driver_id,driver_op and knows excatly how to
parse arg (struct mlx4_op_frobincate).

And it can tell it apart from this:

arg.driver_id = MLX5;
arg.driver_op = MLX5_BROBNICATE;
ioctl(IB_DRIVER_DO_SOMETHING,arg);

Which would have a different layout for arg.

Which is *exactly* what you expect for ioctl, and is the basic kernel
standard.

Think of driver_id and driver_op as being a 32 bit globally unique
value assigned to every driver function, run under the
IB_DRIVER_DO_SOMETHING multiplexor.

> > Some do, some don't. The rdma_cm requirement is more like
> > listen(0.0.0.0) which does not require special net device aggregation.
> 
> Well that concept is for a packetized protocol stack. In that kind of a
> system packets can be routed over multiple devices depending on the
> address. We do not have that in the IB stack.

We do have routing, RDMA_CM does it. Only once the connection is
established does it crystalize into specific hardware, which is
basically the same process as the net stack, rdma_cm uses the same
routing table functions to determine the RDMA device to route the QP
too, which is part of the problem with having things spread across all
these distinct FDs. They don't coorporate like they need to.

.. and there is also the issue of namespaces :| People want RDMA
namespaces, which I think really clouds how these per-device file
descriptors would sanely work, esepcially with the separate RDMA CM
fd.

> > /dev/pts/ptmx, /dev/mapper/control, /dev/btrfs-control, etc, etc are
> > all examples of multi-device control fds similar to the proposed
> > single char dev.
> 
> Yes but those are not used for communications. They are used to control a
> subsystem and access to those requires special priviledges. We are talking
> about a device accessible witout special priviledges to do data
> communications. /dev/rdmaXXXX to control global behavior of the stack
> for all processes would be fine. But we are controlling the interaction of
> a process with a device.

Eh? btrfs-control is mutliplexed across all mounted block devices used
by BTRFS, I wouldn't say it is any different than what we are talking
about with rdma.

Do you have an actual use for the currently somewhat broken fine-grained
permissions we have with uverbs0 ?

The only people a single char dev really impacts are users with
multiple cards, and I think that is fairly rare..

> > .. and bear in mind that /dev/uverbs0 doesn't even really make that
> > much sense as it aggregates two physical ports. There is already no
> > way to split permissions up by port, which is a logical thing to want
> > for some dual-rail configurations.
> 
> Right lets get rid of it. device specific files only.

There is a good reason why we bundle ports together - that is part of
how the IN spec works for PDs, APM and connection management. It is
not something we should throw away.

Even if we wanted to we just don't have the overall infrastructure to
restrict on a port by port basis (SELinux should ultimately fix that,
but SELinux will work the same in both single and multi char dev models)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                     ` <20160721032803.GA12093-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21  5:00                                       ` Christoph Lameter
       [not found]                                         ` <alpine.DEB.2.20.1607202340280.26638-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-21  5:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, 20 Jul 2016, Jason Gunthorpe wrote:

> > device_id is useless if the driver is already determined by the device filehanle.
>
> It isn't useless, it preserves the self describing property that is
> the kernel standard for ioctls.

What? Never heard about that and certainly do not see that in the ioctls I
have used. Something like the device is specified on on some special
sockets that control network stack behavior and other control files.

> But the kernel has that implicit knowledge for ioctls as well and we
> still go through the trouble in Documentation/ioctl/ioctl-number.txt
> to make them globally unique. device_id is exactly the same thing.

Nope. device_id is not part of the ioctl number. The uniqueness only
applies to the functionality requested from the device. Please do not mix
that up with device identification.


> In otherwords, we are going to do something like this:
>
> arg.driver_id = MLX4;
> arg.driver_op = MLX4_FROBNICATE;
> ioctl(IB_DRIVER_DO_SOMETHING,arg);

Uhhh.. The first argument to ioctl specifies the device!!! The
established way of doing things would be:

f = open("/dev/rdma/mlx5_0")

err = ioctl(f, IOCTL_RDMA_DO_SOMETHING, parameters)

>> Something like strace looks at this and sees
> IB_DRIVER_DO_SOMETHING,driver_id,driver_op and knows excatly how to
> parse arg (struct mlx4_op_frobincate).

Well there is no need to do that with the standard way. The filehandle
identifies the driver. No need to modify strace and no need to
schlepp the device_id around.


> And it can tell it apart from this:
>
> arg.driver_id = MLX5;
> arg.driver_op = MLX5_BROBNICATE;
> ioctl(IB_DRIVER_DO_SOMETHING,arg);
>
> Which would have a different layout for arg.
>
> Which is *exactly* what you expect for ioctl, and is the basic kernel
> standard.

I am sorry but ioctl really has a device parameter the first one. ioctls
without specifying a device do not exist as you seem to assume here.
Please read the manpage.


> Think of driver_id and driver_op as being a 32 bit globally unique
> value assigned to every driver function, run under the
> IB_DRIVER_DO_SOMETHING multiplexor.

No multiplexer please. Lets operate on the device specified.

> > Well that concept is for a packetized protocol stack. In that kind of a
> > system packets can be routed over multiple devices depending on the
> > address. We do not have that in the IB stack.
>
> We do have routing, RDMA_CM does it. Only once the connection is
> established does it crystalize into specific hardware, which is
> basically the same process as the net stack, rdma_cm uses the same
> routing table functions to determine the RDMA device to route the QP
> too, which is part of the problem with having things spread across all
> these distinct FDs. They don't coorporate like they need to.

OMG. And now we are adding vendor specific extensions. This is going to be
great fun to use. Case statements for vendor specific extension after we
have figured out which device to use?

Trivially the routing function could return a filehandle to use

> .. and there is also the issue of namespaces :| People want RDMA
> namespaces, which I think really clouds how these per-device file
> descriptors would sanely work, esepcially with the separate RDMA CM
> fd.

Namespaces already work for filehandles and always have. Look how devices
are handled in different namespaces.

> > > /dev/pts/ptmx, /dev/mapper/control, /dev/btrfs-control, etc, etc are
> > > all examples of multi-device control fds similar to the proposed
> > > single char dev.
> >
> > Yes but those are not used for communications. They are used to control a
> > subsystem and access to those requires special priviledges. We are talking
> > about a device accessible witout special priviledges to do data
> > communications. /dev/rdmaXXXX to control global behavior of the stack
> > for all processes would be fine. But we are controlling the interaction of
> > a process with a device.
>
> Eh? btrfs-control is mutliplexed across all mounted block devices used
> by BTRFS, I wouldn't say it is any different than what we are talking
> about with rdma.

But this is used to modify the operations of btrfs and do administration.
We are discussing here an endpoint establishing connections and
configurations specific to an application.

> Do you have an actual use for the currently somewhat broken fine-grained
> permissions we have with uverbs0 ?

Well a file exposed to one namespace and not another comes to mind.

A system may have multiple RDMA adapters and you want one application
running under one userid to have access to one and not the other. The
system may run a variety of more or less trustworthy apps and you would
like to restrict access. This is not possible with the existing scheme.

Also the filehandle trivially allows security subsystems to monitor the
usage of a device. One can identify all processes using one RDMA device
that may be going down etc etc.

> The only people a single char dev really impacts are users with
> multiple cards, and I think that is fairly rare..

Hmmm.... This is not rare here. We have both Ethernet RDMA and IB RDMA
devices (and sometimes multiple) in our systems.

> > Right lets get rid of it. device specific files only.
>
> There is a good reason why we bundle ports together - that is part of
> how the IN spec works for PDs, APM and connection management. It is
> not something we should throw away.

Ok what do those terms mean?

> Even if we wanted to we just don't have the overall infrastructure to
> restrict on a port by port basis (SELinux should ultimately fix that,
> but SELinux will work the same in both single and multi char dev models)

We already have the security infrastructure to control access by
filehandle both single device and multiple device. The multiplexer device
will cause additional security concerns because the ioctl packet must be
inspected to find the device. Please do not do this.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                         ` <alpine.DEB.2.20.1607202340280.26638-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-21  5:44                                           ` Jason Gunthorpe
       [not found]                                             ` <20160721054439.GA19329-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21  5:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Thu, Jul 21, 2016 at 12:00:34AM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Jason Gunthorpe wrote:
> 
> > > device_id is useless if the driver is already determined by the device filehanle.
> >
> > It isn't useless, it preserves the self describing property that is
> > the kernel standard for ioctls.
> 
> What? Never heard about that and certainly do not see that in the ioctls I
> have used. Something like the device is specified on on some special
> sockets that control network stack behavior and other control files.

How many ioctls have you used that use a complex variable sized struct
as the parameter?

There are a few and they ones I've used are self-describing one way or
another. Perhaps they have an op code or something in the struct, or a
netlink-like format, or *something* but you don't need to know what
the fd is connected to in order to parse the struct. The ioctl # and
the argument should be enough to parse.

> > But the kernel has that implicit knowledge for ioctls as well and we
> >
> > arg.driver_id = MLX4;
> > arg.driver_op = MLX4_FROBNICATE;
> > ioctl(IB_DRIVER_DO_SOMETHING,arg);
> 
> Uhhh.. The first argument to ioctl specifies the device!!! The
> established way of doing things would be:

Yeah, I dropped it for brevity, because it doesn't really matter, we
all know ioctls work on fds...

> Well there is no need to do that with the standard way. The filehandle
> identifies the driver. No need to modify strace and no need to
> schlepp the device_id around.

Nope, that isn't how strace works, strace never checks the filehandle.

> > We do have routing, RDMA_CM does it. Only once the connection is
> > established does it crystalize into specific hardware, which is
> > basically the same process as the net stack, rdma_cm uses the same
> > routing table functions to determine the RDMA device to route the QP
> > too, which is part of the problem with having things spread across all
> > these distinct FDs. They don't coorporate like they need to.
> 
> OMG. And now we are adding vendor specific extensions. This is going to be
> great fun to use. Case statements for vendor specific extension after we
> have figured out which device to use?

There won't be case statements.

You are really surprised by the rdma_cm architecture??? I know it is
goofy, but we are stuck with it..

.. and we have been doing the un-described structs like you are asking
for today. libmlx4 just assumes the fd it is talking to is a mlx4
driver (because sysfs said so) and jams in untagged driver-specific
structs to the command flow. strace cannot parse them and there is no
kernel support for debugging or accidental misuse..

Adding a little tag to the driver specific struct seems totally
harmless, every iteration of the various proposals has had some kind
of struct tag one way or another. I don't know what you see strange
about this.

> Namespaces already work for filehandles and always have. Look how devices
> are handled in different namespaces.

People want to put different ports in different name spaces, the
uverbs fd is a full device thing, it doesn't work and doesn't make
sense as the namespace control point.

Heck, people want to put certain pkeys into namespaces.

The char dev alone is totally unsuitable for the namespace needs.

> Also the filehandle trivially allows security subsystems to monitor the
> usage of a device. One can identify all processes using one RDMA device
> that may be going down etc etc.

Not really, you can have rdma_cm things open listening that don't use
uverbs until a connection is triggered. Perhaps that is rare though.

> We already have the security infrastructure to control access by
> filehandle both single device and multiple device. The multiplexer device
> will cause additional security concerns because the ioctl packet must be
> inspected to find the device. Please do not do this.

I mean in IB, we don't have the ability to securely strip a single
port out of a device. This is why /dev/uverbs0 referes to both ports
on a card. Adding that ability would damage API capabilities we have.

We already have two command multiplexor fds in the current design and
they have exactly the security concerns you allude to. This is why we
have a SELinux patch set under consideration because labeling dev
nodes is not nearly enough. This is why the  namespace patches are
incomplete, etc..

There is no proposal to eliminate the multiplexors, I don't even know
how that could work...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                             ` <20160721054439.GA19329-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21  7:11                                               ` Christoph Lameter
       [not found]                                                 ` <alpine.DEB.2.20.1607210202400.29415-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  2016-07-21  8:04                                               ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-21  7:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Wed, 20 Jul 2016, Jason Gunthorpe wrote:

> > Well there is no need to do that with the standard way. The filehandle
> > identifies the driver. No need to modify strace and no need to
> > schlepp the device_id around.
>
> Nope, that isn't how strace works, strace never checks the filehandle.

Ok why would strace check a filehandle in the first place? The descriptor
is the filehandle and you can simply find the operation that created that
file descriptor to find the device it refers to.

> > We already have the security infrastructure to control access by
> > filehandle both single device and multiple device. The multiplexer device
> > will cause additional security concerns because the ioctl packet must be
> > inspected to find the device. Please do not do this.
>
> I mean in IB, we don't have the ability to securely strip a single
> port out of a device. This is why /dev/uverbs0 referes to both ports
> on a card. Adding that ability would damage API capabilities we have.

We could easily do that following naming conventions for partitions or so.
Why would doing so damage the API capabilities? Seems that they are
sufficiently screwed up already. Cleaning that up could help quite a bit.

> We already have two command multiplexor fds in the current design and
> they have exactly the security concerns you allude to. This is why we
> have a SELinux patch set under consideration because labeling dev
> nodes is not nearly enough. This is why the  namespace patches are
> incomplete, etc..

Ok then this is the opportunity to get rid of these things.

> There is no proposal to eliminate the multiplexors, I don't even know
> how that could work...

I thought I just tried to outline how that could work. Consistently use
the device semantics already provided in the kernel and use the
functionality through ioctls, fnctls etc etc as already provided.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                             ` <20160721054439.GA19329-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-07-21  7:11                                               ` Christoph Lameter
@ 2016-07-21  8:04                                               ` Christoph Hellwig
       [not found]                                                 ` <20160721080433.GA21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2016-07-21  8:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Lameter, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny, Leon Romanovsky

On Wed, Jul 20, 2016 at 11:44:39PM -0600, Jason Gunthorpe wrote:
> How many ioctls have you used that use a complex variable sized struct
> as the parameter?
> 
> There are a few and they ones I've used are self-describing one way or
> another. Perhaps they have an op code or something in the struct, or a
> netlink-like format, or *something* but you don't need to know what
> the fd is connected to in order to parse the struct. The ioctl # and
> the argument should be enough to parse.

In that case they are designed by taste-less fools.  No need to
to spread that BS even further.

> > Well there is no need to do that with the standard way. The filehandle
> > identifies the driver. No need to modify strace and no need to
> > schlepp the device_id around.
> 
> Nope, that isn't how strace works, strace never checks the filehandle.

Strace is just a random tracing tool.  No need to design interface
for it.  Just use a ioctl values that don't confluct with anything
heavily used and we'll be fine.

> You are really surprised by the rdma_cm architecture??? I know it is
> goofy, but we are stuck with it..

Are we?  We'll need to stick to the wire format obviously, but the
software architectures needs to die rather sooner than later.  It's the
worst pile of junk in the RDMA subsystem, and that really means
something.  It's almost impossible to use it correctly due to all the
warts and un- or under-documented assumtions in it's APIs.

> .. and we have been doing the un-described structs like you are asking
> for today. libmlx4 just assumes the fd it is talking to is a mlx4
> driver (because sysfs said so) and jams in untagged driver-specific
> structs to the command flow. strace cannot parse them and there is no
> kernel support for debugging or accidental misuse..
> 
> Adding a little tag to the driver specific struct seems totally
> harmless, every iteration of the various proposals has had some kind
> of struct tag one way or another. I don't know what you see strange
> about this.

Just use different ioctl codes for different drivers and you're done.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]     ` <1468941812-32286-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-20  1:06       ` Christoph Lameter
@ 2016-07-21  8:07       ` Christoph Hellwig
       [not found]         ` <20160721080719.GB21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Hellwig @ 2016-07-21  8:07 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 06:23:25PM +0300, Matan Barak wrote:
> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Place all external RDMA IOCTL declarations into one UAPI exported
> header file and move all legacy MAD commands to that file.

FYI, we also need to have all the structures used for the UAPI in
a uapi header.  I started on that a while ago, but it needs to be
resurrected.  Given that you're busy in this area it might be a good
idea if you pick this work up:

http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-uapi

> +#ifndef RDMA_IOCTL_H
> +#define RDMA_IOCTL_H
> +
> +#include <rdma/rdma_user_ioctl.h>
> +
> +#endif /* RDMA_IOCTL_H */

What's the point of this wrappers?  Either way no need to include
guards for a no-op header if it will be kept around for some reason.

> +/*
> + * For hfi1 driver
> + */
> +#define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
> +
> +/* Legacy part
> + * !!!! NOTE: It uses the same command index as VERBS
> + */

In that case it should be in the same header just to avoid confusion.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]         ` <20160721080719.GB21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-07-21 10:49           ` Leon Romanovsky
  0 siblings, 0 replies; 50+ messages in thread
From: Leon Romanovsky @ 2016-07-21 10:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny, Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 1592 bytes --]

On Thu, Jul 21, 2016 at 01:07:19AM -0700, Christoph Hellwig wrote:
> On Tue, Jul 19, 2016 at 06:23:25PM +0300, Matan Barak wrote:
> > From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > 
> > Place all external RDMA IOCTL declarations into one UAPI exported
> > header file and move all legacy MAD commands to that file.
> 
> FYI, we also need to have all the structures used for the UAPI in
> a uapi header.  I started on that a while ago, but it needs to be
> resurrected.  Given that you're busy in this area it might be a good
> idea if you pick this work up:
> 
> http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-uapi
> 
> > +#ifndef RDMA_IOCTL_H
> > +#define RDMA_IOCTL_H
> > +
> > +#include <rdma/rdma_user_ioctl.h>
> > +
> > +#endif /* RDMA_IOCTL_H */
> 
> What's the point of this wrappers?  Either way no need to include
> guards for a no-op header if it will be kept around for some reason.

I dropped this file, it is not needed.

> 
> > +/*
> > + * For hfi1 driver
> > + */
> > +#define IB_IOCTL_MAGIC			RDMA_IOCTL_MAGIC
> > +
> > +/* Legacy part
> > + * !!!! NOTE: It uses the same command index as VERBS
> > + */
> 
> In that case it should be in the same header just to avoid confusion.

I prepared proper series for this stuff and will send next week.

Thanks

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure
       [not found]         ` <20160720173927.GH21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21 11:38           ` Matan Barak
       [not found]             ` <8ffd0f92-2564-cc39-10ab-5db287399e82-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-21 11:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 20/07/2016 20:39, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 06:23:32PM +0300, Matan Barak wrote:
>> +DECLARE_UVERBS_TYPE(
>> +	mlx5_device,
>> +	UVERBS_CTX_ACTION(
>> +		DEVICE_ALLOC_CONTEXT, uverbs_get_context, NULL,
>> +		&uverbs_get_context_spec,
>> +		&UVERBS_ATTR_CHAIN_SPEC(
>> +			/*
>> +			 * Declared with size 0 as we current provide
>> +			 * backward compatibility (0 = variable size)
>> +			 */
>> +			UVERBS_ATTR_PTR_IN(ALLOC_UCONTEXT_IN, 0),
>> +			UVERBS_ATTR_PTR_OUT(ALLOC_UCONTEXT_OUT, 0),
>> +			),
>> +	),
>> +	UVERBS_ACTION(
>> +		DEVICE_QUERY, uverbs_query_device_handler, NULL,
>> +		&uverbs_query_device_spec,
>> +	),
>> +);
>
> The entire point of getting rid of the lists and changing the destruct
> ordering was to avoid the need to have this kind of stuff in the
> drivers.
>
> I really don't want to see driver changes to implement the basic
> API..
>

You could declare entire types in a common space (and these two examples 
qualify for common declarations). However, if one wants to add driver 
specific arguments for  this command (besides some general arguments 
which could be driver dependent), it needs to be in a driver specific files.

I think this is a good trade-off between driver specific arguments, 
commands and types while sharing common handlers and command descriptors.


> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types
       [not found]         ` <20160720174052.GI21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21 11:40           ` Matan Barak
  0 siblings, 0 replies; 50+ messages in thread
From: Matan Barak @ 2016-07-21 11:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 20/07/2016 20:40, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 06:23:30PM +0300, Matan Barak wrote:
>> @@ -210,103 +211,9 @@ static void ib_uverbs_detach_umcast(struct ib_qp *qp,
>>  static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
>>  				      struct ib_ucontext *context)
>>  {
>> +	ib_uverbs_uobject_type_initialize_ucontext(context, &file->device->ib_dev->type_list);
>
> There is a certain dissonance to having an initialize function called
> inside a cleanup function - is that right (it doesn't look that
> right)? If it is right, find a better name...
>

That's a bug, thanks.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]         ` <20160720174122.GJ21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-21 11:52           ` Matan Barak
       [not found]             ` <13309138-dd3f-7411-d750-4ff370d55daa-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Matan Barak @ 2016-07-21 11:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 20/07/2016 20:41, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 06:23:26PM +0300, Matan Barak wrote:
>> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> The current code creates an IDR per type. Since types are currently
>> common for all vendors and known in advance, this was good enough.
>> However, the proposed ioctl based infrastructure allows each vendor
>> to declare only some of the common types and declare its own specific
>> types.
>>
>> Thus, we decided to implement IDR to be per device and refactor it to
>> use a new file.
>
> I still think this should be per-fd...
>

If we unite rdma-cm and uverbs to one fd, so I agree - it's better to 
have one IDR per file (even though it might make sharing objects between 
processes a little bit harder). If we keep things as they are today 
(separate fds), I think both options are valid.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                 ` <20160721080433.GA21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-07-21 16:20                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20160721162033.GB19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21 16:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Lameter, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny, Leon Romanovsky

On Thu, Jul 21, 2016 at 01:04:33AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 20, 2016 at 11:44:39PM -0600, Jason Gunthorpe wrote:
> > How many ioctls have you used that use a complex variable sized struct
> > as the parameter?
> > 
> > There are a few and they ones I've used are self-describing one way or
> > another. Perhaps they have an op code or something in the struct, or a
> > netlink-like format, or *something* but you don't need to know what
> > the fd is connected to in order to parse the struct. The ioctl # and
> > the argument should be enough to parse.
> 
> In that case they are designed by taste-less fools.  No need to
> to spread that BS even further.

Well, what do you suggest then? This sounds like we are back to the
~500 ioctls. Is that OK & better?

This is not a small interface, and when you add in all the
driver-specific stuff there are *alot* of different API
call signatures..

I'm not so sure I want to see MLX5_CREATE_QP, MLX4_CREATE_QP,
QIB_CREATE_QP, etc, etc as ioctl numbers, that seems difficult
to implement the common code for.

> > You are really surprised by the rdma_cm architecture??? I know it is
> > goofy, but we are stuck with it..
> 
> Are we?  We'll need to stick to the wire format obviously, but the
> software architectures needs to die rather sooner than later.  It's the
> worst pile of junk in the RDMA subsystem, and that really means
> something.  It's almost impossible to use it correctly due to all the
> warts and un- or under-documented assumtions in it's APIs.

I think we are, it is baked into user space ...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                 ` <alpine.DEB.2.20.1607210202400.29415-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-21 16:41                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20160721164133.GD19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21 16:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Thu, Jul 21, 2016 at 02:11:56AM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2016, Jason Gunthorpe wrote:
> 
> > > Well there is no need to do that with the standard way. The filehandle
> > > identifies the driver. No need to modify strace and no need to
> > > schlepp the device_id around.
> >
> > Nope, that isn't how strace works, strace never checks the filehandle.
> 
> Ok why would strace check a filehandle in the first place? The descriptor
> is the filehandle and you can simply find the operation that created that
> file descriptor to find the device it refers to.

strace is stateless and can attach to a running process, it can't
watch for open() to figure things out. This is also why it doesn't
inspect the filehandle...

We don't *need* strace to work, but it should would be nice :|

> > I mean in IB, we don't have the ability to securely strip a single
> > port out of a device. This is why /dev/uverbs0 referes to both ports
> > on a card. Adding that ability would damage API capabilities we have.
> 
> We could easily do that following naming conventions for partitions or so.
> Why would doing so damage the API capabilities? Seems that they are
> sufficiently screwed up already. Cleaning that up could help quite a bit.

The current API is problematic because we try to both be like netdev
in that all devices are accessible (rdma_cm) and at the same with with
individual per-device chardevs (uverbs0).

The mismatch sucks and the two do not get along together as well at
they should, which leads to even more goofyness in the design of
things like rdma_cm :(

So, if you want to move fully to the per-char-dev model then I think
we'd give up the global netdev like behaviors, things like
listen(0.0.0) and output route selection, and so forth. I doubt there
is any support for that.

If we go the other way to a full netdev-like module then we give up
fine grained (currently mildly broken) file system permissions.

> > There is no proposal to eliminate the multiplexors, I don't even know
> > how that could work...
> 
> I thought I just tried to outline how that could work. Consistently use
> the device semantics already provided in the kernel and use the
> functionality through ioctls, fnctls etc etc as already provided.

You haven't explained how we can mesh the rdma_cm, netdev-like
listen(0.0.0.0) type semantics, continue to implement multi-port APM
functionality, share PDs across ports, etc, etc. These are all the
actual things done today that break when we drop the multiplexors.

This isn't a simple API that is 1:1 tied to a single physical object,
it is a sprawling thing with lots of built-in cross-device semantics. :(

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure
       [not found]             ` <8ffd0f92-2564-cc39-10ab-5db287399e82-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-21 16:49               ` Jason Gunthorpe
       [not found]                 ` <20160721164952.GE19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21 16:49 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jul 21, 2016 at 02:38:28PM +0300, Matan Barak wrote:
> On 20/07/2016 20:39, Jason Gunthorpe wrote:
> >On Tue, Jul 19, 2016 at 06:23:32PM +0300, Matan Barak wrote:
> >>+DECLARE_UVERBS_TYPE(
> >>+	mlx5_device,
> >>+	UVERBS_CTX_ACTION(
> >>+		DEVICE_ALLOC_CONTEXT, uverbs_get_context, NULL,
> >>+		&uverbs_get_context_spec,
> >>+		&UVERBS_ATTR_CHAIN_SPEC(
> >>+			/*
> >>+			 * Declared with size 0 as we current provide
> >>+			 * backward compatibility (0 = variable size)
> >>+			 */
> >>+			UVERBS_ATTR_PTR_IN(ALLOC_UCONTEXT_IN, 0),
> >>+			UVERBS_ATTR_PTR_OUT(ALLOC_UCONTEXT_OUT, 0),
> >>+			),
> >>+	),
> >>+	UVERBS_ACTION(
> >>+		DEVICE_QUERY, uverbs_query_device_handler, NULL,
> >>+		&uverbs_query_device_spec,
> >>+	),
> >>+);
> >
> >The entire point of getting rid of the lists and changing the destruct
> >ordering was to avoid the need to have this kind of stuff in the
> >drivers.
> >
> >I really don't want to see driver changes to implement the basic
> >API..
> >
> 
> You could declare entire types in a common space (and these two examples
> qualify for common declarations). However, if one wants to add driver
> specific arguments for  this command (besides some general arguments which
> could be driver dependent), it needs to be in a driver specific files.
> 
> I think this is a good trade-off between driver specific arguments, commands
> and types while sharing common handlers and command descriptors.

Are you going to fix and test every driver in the kernel?

In the interests of sanity, I think you need to provide a
straightforward way to retain the status-quo udata, at least as an
interm step..

I'm not excited to see things start here..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device
       [not found]             ` <13309138-dd3f-7411-d750-4ff370d55daa-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-21 16:51               ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-21 16:51 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jul 21, 2016 at 02:52:03PM +0300, Matan Barak wrote:
> On 20/07/2016 20:41, Jason Gunthorpe wrote:
> >On Tue, Jul 19, 2016 at 06:23:26PM +0300, Matan Barak wrote:
> >>From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >>The current code creates an IDR per type. Since types are currently
> >>common for all vendors and known in advance, this was good enough.
> >>However, the proposed ioctl based infrastructure allows each vendor
> >>to declare only some of the common types and declare its own specific
> >>types.
> >>
> >>Thus, we decided to implement IDR to be per device and refactor it to
> >>use a new file.
> >
> >I still think this should be per-fd...
> >
> 
> If we unite rdma-cm and uverbs to one fd, so I agree - it's better to have
> one IDR per file (even though it might make sharing objects between
> processes a little bit harder). If we keep things as they are today
> (separate fds), I think both options are valid.

Sharing objects without having both fds in the same process for the
share seems inherently insecure.... Do we do sharing today? XRC?

I think namespaces are also a good motivation for the per-fd idr..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure
       [not found]                 ` <20160721164952.GE19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-24  6:18                   ` Matan Barak
  0 siblings, 0 replies; 50+ messages in thread
From: Matan Barak @ 2016-07-24  6:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 21/07/2016 19:49, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2016 at 02:38:28PM +0300, Matan Barak wrote:
>> On 20/07/2016 20:39, Jason Gunthorpe wrote:
>>> On Tue, Jul 19, 2016 at 06:23:32PM +0300, Matan Barak wrote:
>>>> +DECLARE_UVERBS_TYPE(
>>>> +	mlx5_device,
>>>> +	UVERBS_CTX_ACTION(
>>>> +		DEVICE_ALLOC_CONTEXT, uverbs_get_context, NULL,
>>>> +		&uverbs_get_context_spec,
>>>> +		&UVERBS_ATTR_CHAIN_SPEC(
>>>> +			/*
>>>> +			 * Declared with size 0 as we current provide
>>>> +			 * backward compatibility (0 = variable size)
>>>> +			 */
>>>> +			UVERBS_ATTR_PTR_IN(ALLOC_UCONTEXT_IN, 0),
>>>> +			UVERBS_ATTR_PTR_OUT(ALLOC_UCONTEXT_OUT, 0),
>>>> +			),
>>>> +	),
>>>> +	UVERBS_ACTION(
>>>> +		DEVICE_QUERY, uverbs_query_device_handler, NULL,
>>>> +		&uverbs_query_device_spec,
>>>> +	),
>>>> +);
>>>
>>> The entire point of getting rid of the lists and changing the destruct
>>> ordering was to avoid the need to have this kind of stuff in the
>>> drivers.
>>>
>>> I really don't want to see driver changes to implement the basic
>>> API..
>>>
>>
>> You could declare entire types in a common space (and these two examples
>> qualify for common declarations). However, if one wants to add driver
>> specific arguments for  this command (besides some general arguments which
>> could be driver dependent), it needs to be in a driver specific files.
>>
>> I think this is a good trade-off between driver specific arguments, commands
>> and types while sharing common handlers and command descriptors.
>
> Are you going to fix and test every driver in the kernel?
>
> In the interests of sanity, I think you need to provide a
> straightforward way to retain the status-quo udata, at least as an
> interm step..
>
> I'm not excited to see things start here..

As an interm step, we use udata (see the actual example). By using that, 
these actions could be in a common place. However, IMHO since we 
declared this is a driver specific ABI where each driver could implement 
and use whatever attributes it wants and create new types and actions, I 
don't think we should differentiate between common attributes and driver 
specific attributes (as it's the driver which decides whats common and 
whats not).
Therefore, udata is great as an interm step avoiding changing 
everything, but as a last step it might be better to go with standard 
array of attributes.

>
> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                     ` <20160721164133.GD19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-25 16:30                                                       ` Christoph Lameter
       [not found]                                                         ` <alpine.DEB.2.20.1607251104370.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-25 16:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Thu, 21 Jul 2016, Jason Gunthorpe wrote:

> > Ok why would strace check a filehandle in the first place? The descriptor
> > is the filehandle and you can simply find the operation that created that
> > file descriptor to find the device it refers to.
>
> strace is stateless and can attach to a running process, it can't
> watch for open() to figure things out. This is also why it doesn't
> inspect the filehandle...

Well you can still lookup in the file handle in /proc/pid/.... if you want
that. Not sure why you are so focused on this.

> We don't *need* strace to work, but it should would be nice :|

It *is* nice. And it works fine for devices. Lets ensure that devices
are used in a standard way in the IB subsystem so that we can take full
advantage of the syscall infrastructure and the standard system calls.

> > We could easily do that following naming conventions for partitions or so.
> > Why would doing so damage the API capabilities? Seems that they are
> > sufficiently screwed up already. Cleaning that up could help quite a bit.
>
> The current API is problematic because we try to both be like netdev
> in that all devices are accessible (rdma_cm) and at the same with with
> individual per-device chardevs (uverbs0).

Device? uverbs is not a device. A particular connectx3 connected to the
pci bus is. And it should follow establish naming conventions etc. Please
lets drop the crap that is there now. If you use the notion of a device
the way it is designed to then we would have less issues.

> So, if you want to move fully to the per-char-dev model then I think
> we'd give up the global netdev like behaviors, things like
> listen(0.0.0) and output route selection, and so forth. I doubt there
> is any support for that.

Can the official listen() syscall be made to work over
infiniband devices? That would be best maybe?

I think in general one does the connection initiation via TCP and IP
protocol regardless... So really infiniband does only matter as the
underlying protocol over which we have imposed IP semantics via IPoIB.

> If we go the other way to a full netdev-like module then we give up
> fine grained (currently mildly broken) file system permissions.

Maybe go with a device semantic and not with full netdev because this is
not a classic packet based network.

> You haven't explained how we can mesh the rdma_cm, netdev-like
> listen(0.0.0.0) type semantics, continue to implement multi-port APM
> functionality, share PDs across ports, etc, etc. These are all the
> actual things done today that break when we drop the multiplexors.

I am not not *the* expert on this. Frankly this whole RDMA request stuff
is not that interesting. The basic thing that the RDMA API needs to do for
my use case is fast messaging bypassing the kernel. And having gazillion
of special ioctls on the site is not that productive. Can we please reuse
the standard system calls and ioctls as much as possible?

No idea what you mean by multiport "APMs". There is an obvius way to
aggreate devices by creating a new one like done in the storage subsystem.

Sharing PDs? Those are from the same address space using multiple devices.
It would be natural to share that in such a case since they are more bound
to the memory layout of a single process and not so much to the devices.
So PDs could be per process instead of per device.

> This isn't a simple API that is 1:1 tied to a single physical object,
> it is a sprawling thing with lots of built-in cross-device semantics. :(

Yes please simplify this sprawl as much as possible. Follow standard
convention instead of reinvention things like device aggregation.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                     ` <20160721162033.GB19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-25 16:32                                                       ` Christoph Lameter
       [not found]                                                         ` <alpine.DEB.2.20.1607251131000.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-25 16:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny, Leon Romanovsky

On Thu, 21 Jul 2016, Jason Gunthorpe wrote:

> I'm not so sure I want to see MLX5_CREATE_QP, MLX4_CREATE_QP,
> QIB_CREATE_QP, etc, etc as ioctl numbers, that seems difficult
> to implement the common code for.

Nobody wants that.... Why would that follow? Of course we would have
common ioctls that create QPs in the same way for all drivers.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                         ` <alpine.DEB.2.20.1607251131000.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-25 17:39                                                           ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-25 17:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Christoph Hellwig, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny, Leon Romanovsky

On Mon, Jul 25, 2016 at 11:32:23AM -0500, Christoph Lameter wrote:
> On Thu, 21 Jul 2016, Jason Gunthorpe wrote:
> 
> > I'm not so sure I want to see MLX5_CREATE_QP, MLX4_CREATE_QP,
> > QIB_CREATE_QP, etc, etc as ioctl numbers, that seems difficult
> > to implement the common code for.
> 
> Nobody wants that.... Why would that follow? Of course we would have
> common ioctls that create QPs in the same way for all drivers.

No, not of course. We don't have that today. Today every single
command is split between a commmon part and a variable sized (two-way)
driver-specific part. Nobody has come up with some way to avoid that,
so the plan is to continue that scheme.

Thus, every ioctl has a unique struct layout for every driver.

Christoph H. just said dynamic variable sized ioctl input structures
are ugly, so this is an alternative. Is this alternative less ugly
than the variable struct?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                         ` <alpine.DEB.2.20.1607251104370.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-25 18:01                                                           ` Jason Gunthorpe
       [not found]                                                             ` <20160725180119.GB18773-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Jason Gunthorpe @ 2016-07-25 18:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Mon, Jul 25, 2016 at 11:30:48AM -0500, Christoph Lameter wrote:

> > > We could easily do that following naming conventions for partitions or so.
> > > Why would doing so damage the API capabilities? Seems that they are
> > > sufficiently screwed up already. Cleaning that up could help quite a bit.
> >
> > The current API is problematic because we try to both be like netdev
> > in that all devices are accessible (rdma_cm) and at the same with with
> > individual per-device chardevs (uverbs0).
> 
> Device? uverbs is not a device. A particular connectx3 connected to the
> pci bus is. And it should follow establish naming conventions etc. Please
> lets drop the crap that is there now. If you use the notion of a device
> the way it is designed to then we would have less issues.

device in the RDMA sense, a device has a specific set of properties
related specifically to RDMA and the libibverbs API is designed around
this notion. We can't totally get rid of it, if we change the kernel
communication then 'device' has to be emulated somehow in userspace.

I don't know what you mean by 'device the way it is designed now' -
what infrastructure are you talking about?

> > So, if you want to move fully to the per-char-dev model then I think
> > we'd give up the global netdev like behaviors, things like
> > listen(0.0.0) and output route selection, and so forth. I doubt there
> > is any support for that.
> 
> Can the official listen() syscall be made to work over
> infiniband devices? That would be best maybe?

I have no idea if this is feasible, AFAIK nobody is looking into that
option.

If we do that, then we get a fd out of the standard scheme, what is
that FD? How does it get linked back to the actual driver ioctl interface?

> I think in general one does the connection initiation via TCP and IP
> protocol regardless... So really infiniband does only matter as the
> underlying protocol over which we have imposed IP semantics via IPoIB.

No, it is mostly all done over native protocols, not IP. IP is just
used for ARP and the routing table.

> > If we go the other way to a full netdev-like module then we give up
> > fine grained (currently mildly broken) file system permissions.
> 
> Maybe go with a device semantic and not with full netdev because this is
> not a classic packet based network.

Well, what do you mean? We are emulating netdev and strongly linking
ipoib netdev devices to the RDMA infrastructure - that is the mismatch
I keep talking about.

> > You haven't explained how we can mesh the rdma_cm, netdev-like
> > listen(0.0.0.0) type semantics, continue to implement multi-port APM
> > functionality, share PDs across ports, etc, etc. These are all the
> > actual things done today that break when we drop the multiplexors.
> 
> I am not not *the* expert on this. Frankly this whole RDMA request stuff
> is not that interesting. The basic thing that the RDMA API needs to do for
> my use case is fast messaging bypassing the kernel. And having gazillion
> of special ioctls on the site is not that productive. Can we please reuse
> the standard system calls and ioctls as much as possible?

I know it is not interesting for you, but this is the majority use
model for RDMA, so it has to be the main purpose for the design.

Your DPDK-like use case really has little need for most of the RDMA
API.

> No idea what you mean by multiport "APMs". There is an obvius way to
> aggreate devices by creating a new one like done in the storage subsystem.

That APM is a special RDMA feature for fast fail over and recovery, it
has dedicated hardware support and at least with our current API
cannot be modeled with device stacking. It is functionally different
from teaming/bonding like we see in ethernet.

> Sharing PDs? Those are from the same address space using multiple devices.
> It would be natural to share that in such a case since they are more bound
> to the memory layout of a single process and not so much to the devices.
> So PDs could be per process instead of per device.

I generally agree that we got this upside down, Devices/ports should
have been created under PDs.

But the fact remains, the hardware we have is very restricted on how
PD resources can be shared between ports. Today only ports on the same
RDMA device can share a PD. That is why we have the upside down model..

So, when I talk about sharing a PD, I mean in the sense that an app
cannot just create a single PD and use that for all RDMA. It currently
needs a PD per RDMA device.

I don't know if anyone is thinking this is a pain point.

> Yes please simplify this sprawl as much as possible. Follow standard
> convention instead of reinvention things like device aggregation.

I can't help that IBTA created a different scheme for device
aggregation, addressing, and basically everything else. 'standard
convention' in the kernel really just means ethernet, and this isn't
like ethernet, except superficially.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                             ` <20160725180119.GB18773-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-27 15:03                                                               ` Christoph Lameter
       [not found]                                                                 ` <alpine.DEB.2.20.1607270954150.25436-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-27 15:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Liran Liss, Haggai Eran, Tal Alon, Majd Dibbiny,
	Leon Romanovsky

On Mon, 25 Jul 2016, Jason Gunthorpe wrote:

> I can't help that IBTA created a different scheme for device
> aggregation, addressing, and basically everything else. 'standard
> convention' in the kernel really just means ethernet, and this isn't
> like ethernet, except superficially.

Well you can create your own non ethernet protocol. Why is there not an IB
ulp that allows basic messaging, listen() and all the other stuff using
standard system calls? There should be a netdev. I already see devices
in /sys/class/infiniband/*! But they are not listed in /proc/net/dev!!!

Why can we not use these with regular syscalls? Could provide this basic
support for native infiniband? And maybe offload some of the need for
esoteric IB ioctl functions that could just be provided by the standard
calls? F.e. statistics and device configuration would be simplified and possible using
the standard tools. IB devices have MACs as well. We could reuse quite a
bit of software there.

I think Christoph H made the point already that the wire format needs to
stay the same and also the user verbs API (to some degree) but below that
we could do pretty much what we want.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                                 ` <alpine.DEB.2.20.1607270954150.25436-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-27 17:28                                                                   ` Parav Pandit
       [not found]                                                                     ` <CAG53R5XZ9RVg-2yGFbr-7To1SMbRoGG3xiZHf-vNnGM4K+R4pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Parav Pandit @ 2016-07-27 17:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Doug Ledford, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny, Leon Romanovsky

On Wed, Jul 27, 2016 at 8:33 PM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:
>
> On Mon, 25 Jul 2016, Jason Gunthorpe wrote:
>
> > I can't help that IBTA created a different scheme for device
> > aggregation, addressing, and basically everything else. 'standard
> > convention' in the kernel really just means ethernet, and this isn't
> > like ethernet, except superficially.
>
> Well you can create your own non ethernet protocol. Why is there not an IB
> ulp that allows basic messaging, listen() and all the other stuff using
> standard system calls? There should be a netdev. I already see devices
> in /sys/class/infiniband/*! But they are not listed in /proc/net/dev!!!
>
> Why can we not use these with regular syscalls? Could provide this basic
> support for native infiniband? And maybe offload some of the need for
> esoteric IB ioctl functions that could just be provided by the standard
> calls? F.e. statistics and device configuration would be simplified and possible using
> the standard tools. IB devices have MACs as well. We could reuse quite a
> bit of software there.
>

I recall this functionality is being called as SDP (Socket Direct
Protocol) in Annex-4 of IB spec. 1.3 which used be present in older
2.6.x kernel as well as an ULP.
Vaguely recall it as non zcopy implementation.

Few older papers on this:
1. Asynchronous Zero-copy Communication for Synchronous Sockets in the
Sockets Direct Protocol (SDP) over InfiniBand
2. Zero Copy Sockets Direct Protocol over InfiniBand - Preliminary
Implementation and Performance Analysis
3. Transparently Achieving Superior Socket Performance Using Zero Copy
Socket Direct Protocol over IB
4. One more paper from Microsoft, I don't have ready reference too
which did zcopy implementation

Parav

> I think Christoph H made the point already that the wire format needs to
> stay the same and also the user verbs API (to some degree) but below that
> we could do pretty much what we want.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                                     ` <CAG53R5XZ9RVg-2yGFbr-7To1SMbRoGG3xiZHf-vNnGM4K+R4pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-07-27 18:03                                                                       ` Christoph Lameter
       [not found]                                                                         ` <alpine.DEB.2.20.1607271300150.27032-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2016-07-27 18:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Doug Ledford, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny, Leon Romanovsky

On Wed, 27 Jul 2016, Parav Pandit wrote:

> I recall this functionality is being called as SDP (Socket Direct
> Protocol) in Annex-4 of IB spec. 1.3 which used be present in older
> 2.6.x kernel as well as an ULP.
> Vaguely recall it as non zcopy implementation.

It does not need to be zcopy if the functionality is only there for
administrative purposes and for other things where we are just too lazy to
get the whole QP stuff fully operational from user space and just fall
back to the old sockets paradigm where the QP stuff is done by th kernel
for us to give us a standard socket API. Diagnostics and configuration
would be using the standard tools and the established
infrastructure. F.e. ethtool would work without IPoIB using the native IB
addressing.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                                         ` <alpine.DEB.2.20.1607271300150.27032-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
@ 2016-07-27 18:56                                                                           ` Parav Pandit
       [not found]                                                                             ` <CAG53R5V07Q-O8cmm3Ad23qsH3oDQiw9nL87yG2=ki=Rgg9LQAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Parav Pandit @ 2016-07-27 18:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Doug Ledford, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny, Leon Romanovsky

On Wed, Jul 27, 2016 at 11:33 PM, Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, 27 Jul 2016, Parav Pandit wrote:
>
>> I recall this functionality is being called as SDP (Socket Direct
>> Protocol) in Annex-4 of IB spec. 1.3 which used be present in older
>> 2.6.x kernel as well as an ULP.
>> Vaguely recall it as non zcopy implementation.
>
> It does not need to be zcopy if the functionality is only there for
> administrative purposes and for other things where we are just too lazy to
> get the whole QP stuff fully operational from user space and just fall
> back to the old sockets paradigm where the QP stuff is done by th kernel
> for us to give us a standard socket API.
Agree. Would rsockets socket emulation be sufficient/good enough or
socket has to come from kernel?
It may be likely that some handful of applications might find socket
interface easy to integrate those application with pretty minor
changes which can still benefit from transport offload acceleration,
with cost of zcopy and syscall.

> Diagnostics and configuration
> would be using the standard tools and the established
> infrastructure. F.e. ethtool would work without IPoIB using the native IB
> addressing.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface
       [not found]                                                                             ` <CAG53R5V07Q-O8cmm3Ad23qsH3oDQiw9nL87yG2=ki=Rgg9LQAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-07-27 18:58                                                                               ` Christoph Lameter
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Lameter @ 2016-07-27 18:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Doug Ledford, Sean Hefty, Liran Liss, Haggai Eran, Tal Alon,
	Majd Dibbiny, Leon Romanovsky

On Thu, 28 Jul 2016, Parav Pandit wrote:

> > It does not need to be zcopy if the functionality is only there for
> > administrative purposes and for other things where we are just too lazy to
> > get the whole QP stuff fully operational from user space and just fall
> > back to the old sockets paradigm where the QP stuff is done by th kernel
> > for us to give us a standard socket API.
> Agree. Would rsockets socket emulation be sufficient/good enough or
> socket has to come from kernel?

The implementation has to fully integrate with the kernel to take
advantage of all the diagnostics etc etc. User space wont help.

> changes which can still benefit from transport offload acceleration,
> with cost of zcopy and syscall.

Sure we could add ioctls etc to do QP setup etc etc. That then also has
the character of a general QP facility that could also be used by other
protocols in the future. Would make RDMA universally available if the
ioctls are also supported by other protocols. Which would be great.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2016-07-27 18:58 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19 15:23 [RFC ABI V2 0/8] SG-based RDMA ABI Proposal Matan Barak
     [not found] ` <1468941812-32286-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-19 15:23   ` [RFC ABI V2 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
     [not found]     ` <1468941812-32286-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20  1:06       ` Christoph Lameter
     [not found]         ` <alpine.DEB.2.20.1607192004510.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-20  3:15           ` Leon Romanovsky
2016-07-21  8:07       ` Christoph Hellwig
     [not found]         ` <20160721080719.GB21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-07-21 10:49           ` Leon Romanovsky
2016-07-19 15:23   ` [RFC ABI V2 2/8] RDMA/core: Refactor IDR to be per-device Matan Barak
     [not found]     ` <1468941812-32286-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20  1:15       ` Christoph Lameter
     [not found]         ` <alpine.DEB.2.20.1607192014010.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-20  3:20           ` Leon Romanovsky
     [not found]             ` <20160720032037.GR20674-2ukJVAZIZ/Y@public.gmane.org>
2016-07-20 17:08               ` Jason Gunthorpe
2016-07-20 17:41       ` Jason Gunthorpe
     [not found]         ` <20160720174122.GJ21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21 11:52           ` Matan Barak
     [not found]             ` <13309138-dd3f-7411-d750-4ff370d55daa-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-21 16:51               ` Jason Gunthorpe
2016-07-19 15:23   ` [RFC ABI V2 3/8] RDMA/core: Add support for custom types Matan Barak
2016-07-19 15:23   ` [RFC ABI V2 4/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
2016-07-19 15:23   ` [RFC ABI V2 5/8] RDMA/core: Add new ioctl interface Matan Barak
     [not found]     ` <1468941812-32286-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20  1:25       ` Christoph Lameter
     [not found]         ` <alpine.DEB.2.20.1607192021500.5828-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-20  8:11           ` Matan Barak
     [not found]             ` <53dd8337-0779-341e-49f5-ad675269bfe6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20 10:03               ` Christoph Lameter
     [not found]                 ` <alpine.DEB.2.20.1607200448370.12616-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-20 11:56                   ` Matan Barak
     [not found]                     ` <0fc727fd-bc35-b2bd-f4a7-04efd937d4c3-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-21  0:38                       ` Christoph Lameter
2016-07-20 17:21                   ` Jason Gunthorpe
     [not found]                     ` <20160720172143.GG21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21  0:57                       ` Christoph Lameter
     [not found]                         ` <alpine.DEB.2.20.1607201950160.24804-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-21  1:56                           ` Jason Gunthorpe
     [not found]                             ` <20160721015613.GB8279-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21  2:41                               ` Christoph Lameter
     [not found]                                 ` <alpine.DEB.2.20.1607202132080.26260-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-21  3:28                                   ` Jason Gunthorpe
     [not found]                                     ` <20160721032803.GA12093-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21  5:00                                       ` Christoph Lameter
     [not found]                                         ` <alpine.DEB.2.20.1607202340280.26638-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-21  5:44                                           ` Jason Gunthorpe
     [not found]                                             ` <20160721054439.GA19329-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21  7:11                                               ` Christoph Lameter
     [not found]                                                 ` <alpine.DEB.2.20.1607210202400.29415-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-21 16:41                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20160721164133.GD19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-25 16:30                                                       ` Christoph Lameter
     [not found]                                                         ` <alpine.DEB.2.20.1607251104370.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-25 18:01                                                           ` Jason Gunthorpe
     [not found]                                                             ` <20160725180119.GB18773-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-27 15:03                                                               ` Christoph Lameter
     [not found]                                                                 ` <alpine.DEB.2.20.1607270954150.25436-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-27 17:28                                                                   ` Parav Pandit
     [not found]                                                                     ` <CAG53R5XZ9RVg-2yGFbr-7To1SMbRoGG3xiZHf-vNnGM4K+R4pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-27 18:03                                                                       ` Christoph Lameter
     [not found]                                                                         ` <alpine.DEB.2.20.1607271300150.27032-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-27 18:56                                                                           ` Parav Pandit
     [not found]                                                                             ` <CAG53R5V07Q-O8cmm3Ad23qsH3oDQiw9nL87yG2=ki=Rgg9LQAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-07-27 18:58                                                                               ` Christoph Lameter
2016-07-21  8:04                                               ` Christoph Hellwig
     [not found]                                                 ` <20160721080433.GA21060-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-07-21 16:20                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20160721162033.GB19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-25 16:32                                                       ` Christoph Lameter
     [not found]                                                         ` <alpine.DEB.2.20.1607251131000.23039-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2016-07-25 17:39                                                           ` Jason Gunthorpe
2016-07-19 15:23   ` [RFC ABI V2 6/8] RDMA/core: Add initialize and cleanup of common types Matan Barak
     [not found]     ` <1468941812-32286-7-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20 17:40       ` Jason Gunthorpe
     [not found]         ` <20160720174052.GI21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21 11:40           ` Matan Barak
2016-07-19 15:23   ` [RFC ABI V2 7/8] RDMA/core: Add common code for querying device and init context Matan Barak
2016-07-19 15:23   ` [RFC ABI V2 8/8] RDMA/mlx5: Add mlx5 initial support of the new infrastructure Matan Barak
     [not found]     ` <1468941812-32286-9-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20 17:39       ` Jason Gunthorpe
     [not found]         ` <20160720173927.GH21460-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-21 11:38           ` Matan Barak
     [not found]             ` <8ffd0f92-2564-cc39-10ab-5db287399e82-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-21 16:49               ` Jason Gunthorpe
     [not found]                 ` <20160721164952.GE19849-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-24  6:18                   ` Matan Barak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.