* [RFC ABI V5 00/10] SG-based RDMA ABI Proposal
@ 2016-10-27 14:43 Matan Barak
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

The following patch set enriches the security model, as a follow-up to
commit e6bd18f57aad ('IB/security: Restrict use of the write() interface').

DISCLAIMER:
These patches are far from complete. They present working init_ucontext
and query_device commands (both the regular and the extended versions) and
are intended as a basis for discussion.

NOT ALL COMMENTS GIVEN ON PREVIOUS VERSIONS ARE HANDLED IN THIS SERIES;
SOME OF THEM WILL BE HANDLED IN FUTURE REVISIONS.

The ideas presented here are based on our previous series, on ideas
presented at the OFVWG, and on Sean's series.

This patch series adds an ioctl() interface alongside the existing write()
interface and provides an easy route for backporting this change to
supported legacy systems.
Analyzing the current role of uverbs in dispatching and parsing commands, we
find that:
(a) uverbs validates the basic properties of the command.
(b) uverbs is responsible for all the IDR and uobject management and
    locking. It is also responsible for handling completion FDs.
(c) uverbs translates the user<-->kernel ABI into the kernel API.

(a) and (b) hold for every kABI. Although the nature of the commands could
change, they still have to be validated and translated into kernel pointers.
In order to avoid duplication between the various drivers, we would like to
keep (a) and (b) as shared code.

In addition, this is a good time to expand the ABI to be more scalable, so we
added a few goals:
(1) A command's attributes shall be easily extensible, either by allowing
    drivers to have their own extensible set of attributes or by extending
    the core attributes. Moreover, driver-specific attributes could some
    day become standard core attributes. We would like to keep supporting
    old user-space while avoiding code duplication in the kernel.
(2) Each driver may have its own type system (i.e. QP, CQ, ....). It may
    or may not implement the standard type system, and it could extend this
    type system in the future. Avoid duplicating existing types or
    actions.
(3) Do not require changing or recompiling driver libraries, and do not
    copy their data.
(4) Efficient dispatching.

Thus, in order to allow this flexibility, we decided to provide (a) and (b)
as common infrastructure, while using per-driver guidelines to drive the
parsing and uobject management. Handlers are also set by the drivers
themselves, and each handler can point to either shared common code or
driver-specific code.

Since types are no longer enforced by the common infrastructure, there is no
point in pre-allocating common IDR types in the common code. Instead, we
provide an API for drivers to add new types, and use one IDR per driver
for all of its types. The driver declares all of its supported types, their
free functions and their release order. After that, all uobject handling,
exclusive access and type management are done automatically for the driver
by the infrastructure.
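
For illustration only, a per-driver type declaration could look roughly like
the sketch below. The field names follow the structures added in patch 02 of
this series, but the initializer itself is a hypothetical sketch, not the
exact declaration syntax used later:

/* Sketch: one per-driver object type with its allocation size, destruction
 * order and free function, as consumed by the new infrastructure. The
 * free_fn signature is inferred from the usage in patch 02.
 */
static void example_free_qp(const struct uverbs_type_alloc_action *type,
			    struct ib_uobject *uobj)
{
	ib_destroy_qp(uobj->object);
}

static const struct uverbs_type_alloc_action example_qp_alloc = {
	.type	  = UVERBS_ATTR_TYPE_IDR,
	.obj_size = sizeof(struct ib_uqp_object),
	.order	  = 1,	/* lower orders are torn down first at cleanup */
	.free_fn  = example_free_qp,
};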

Scatter-gather was chosen so that user-space drivers do not have to be
recompiled. By using pointers to driver-specific data, we can consume that
data in place, without introducing extra copies and without changing the
user-space driver at all.
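
As an illustration (a hypothetical layout, not the exact uAPI introduced by
this series), each attribute essentially carries a pointer/length pair that
refers to a buffer user space already owns:

/* Hypothetical SG-style attribute: the kernel is handed a pointer to the
 * provider library's existing, driver-specific buffer, so no repacking or
 * recompilation of the user-space driver is needed.
 */
struct example_ioctl_attr {
	__u16	attr_id;	/* which attribute this is */
	__u16	reserved;
	__u32	len;		/* length of the referenced buffer */
	__u64	ptr;		/* user pointer to driver-specific data */
};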

We chose to use non-blocking locks for user objects. When exclusive
(WRITE or DESTROY) access is required, we dispatch the action if and only if
no other action needs this object as well. Otherwise, -EBUSY is returned to
user-space. Device removal is synchronized with SRCU, as it is today.
If we were using blocking locks, we would have had to sort the given
user-space handles; otherwise, a user-space application could cause a
deadlock. With this non-blocking locking behaviour, dispatching in the
kernel also becomes more efficient.
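
This boils down to trylock semantics on a per-uobject rwsem; renamed and
simplified from uverbs_lock_object() in patch 02:

/* Shared (READ) access takes a read trylock, exclusive (WRITE/DESTROY)
 * access takes a write trylock. A busy object fails the command with
 * -EBUSY instead of sleeping, so handles never have to be sorted to avoid
 * lock-ordering deadlocks.
 */
static int example_lock_object(struct ib_uobject *uobj, bool exclusive)
{
	if (!exclusive)
		return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;

	return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
}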

We implement a compatibility layer between the old write() implementation and
the new ioctl() based implementation by:
(a) Creating an IOCTL header and attribute descriptors.
(b) Mapping the attribute descriptors straight to the user-space supplied
    buffers. We expect that every subset of consecutive fields in the old ABI
    can be directly mapped to an attribute in the new ABI.
(c) Passing the DS of the headers to the IOCTL processing command.
(d) Having the IOCTL processing command parse the headers, switch to USER_DS
    to handle the data, and then return to the original DS (see the sketch
    after this list).
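
A minimal sketch of the DS handling in (c)/(d), assuming the compat layer
builds the header and descriptors in kernel memory (function and parameter
names are hypothetical):

static long example_process_ioctl(void *hdr, mm_segment_t hdr_ds)
{
	mm_segment_t old_fs = get_fs();
	long ret = 0;

	/* parse the header/descriptors under the DS supplied by the caller */
	set_fs(hdr_ds);
	/* ... header and attribute descriptor parsing ... */

	/* attribute payloads live in user memory */
	set_fs(USER_DS);
	/* ... handlers use copy_from_user()/copy_to_user() on the payloads ... */

	set_fs(old_fs);
	return ret;
}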

Other uverbs-related subsystems (such as RDMA-CM) may use other fds or
other ioctl codes.

Note: we might switch to submitting one task (i.e. change the locking
scheme) once the concepts are more mature.

This series is based on Doug's k.o/for-4.9-fixed branch + Leon's series [0].
A partially working libibverbs implementation, which is still based on the
stand-alone libibverbs git, can be found at [1].

Regards,
Liran, Haggai, Leon and Matan

[0] RDMA/core: Unify style of IOCTL commands series
[1] https://github.com/matanb10/libibverbs/tree/abi_poc1

TODO:
1. Check other models for implementing FDs (as suggested in OFVWG).
2. Currently, this code only works with the new ioctl-based libibverbs.
   Make it compatible with the old version.

Changes from V4:
1. Rebased over Doug's k.o/for-4.9-fixed branch.
2. Added create_qp and modify_qp commands.
3. Added libibverbs POC code. Started implementing the bits required for
   ibv_rc_pingpong.
4. Added a patch that lays the foundations of a compatibility layer
   between write commands and ioctl commands. It has the limitation that
   every subset of the old write ABI must be directly mappable to an
   attribute of the new ABI.
5. Implemented write's get_context using this compatibility layer.

Changes from V3:
1. Added create_cq and create_comp_channel.
2. Added FD as an ib_uobject to the type system.

Changes from V2:
1. Used type declarations to declare the release order and free function.
2. Allowed the driver to extend and use existing building blocks at any level:
        a. Add more types
        b. Add actions to existing types
        c. Add attributes to existing actions (existed in V2)
   Such a driver only duplicates the structs it actually changes.
3. Fixed bugs in ucontext teardown and type allocation/locking.
4. Added reg_mr and init_pd.

Changes from V1:
1. Refined locking system
	a. try_read_lock and write lock to synchronize exclusive access
	b. SRCU to synchronize device removal against command execution
	c. Future rwsem to synchronize context teardown against command execution
2. Added temporary udata usage for vendor data
3. Added query_device and init_ucontext commands with an mlx5 implementation
4. Fixed bugs in ioctl dispatching
5. Changed callbacks to receive an ib_uverbs_file instead of a ucontext
6. Added general type initialization and cleanup

Leon Romanovsky (1):
  RDMA/core: Refactor IDR to be per-device

Matan Barak (9):
  RDMA/core: Add support for custom types
  RDMA/core: Add new ioctl interface
  RDMA/core: Add initialize and cleanup of common types
  RDMA/core: Add uverbs types, actions, handlers and attributes
  IB/mlx5: Implement common uverb objects
  IB/core: Support getting IOCTL header/SGEs from kernel space
  IB/core: Implement compatibility layer for get context command
  IB/core: Add create_qp command to the new ABI
  IB/core: Add modify_qp command to the new ABI

 drivers/infiniband/core/Makefile           |    3 +-
 drivers/infiniband/core/core_priv.h        |   14 +
 drivers/infiniband/core/device.c           |   18 +
 drivers/infiniband/core/rdma_core.c        |  505 ++++++++++++
 drivers/infiniband/core/rdma_core.h        |   77 ++
 drivers/infiniband/core/uverbs.h           |   38 +-
 drivers/infiniband/core/uverbs_cmd.c       |  344 ++++----
 drivers/infiniband/core/uverbs_ioctl.c     |  311 ++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 1169 ++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      |  188 ++---
 drivers/infiniband/hw/mlx5/main.c          |    2 +
 include/rdma/ib_verbs.h                    |   33 +-
 include/rdma/uverbs_ioctl.h                |  342 ++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  330 ++++++++
 include/uapi/rdma/ib_user_verbs.h          |   39 +
 include/uapi/rdma/rdma_user_ioctl.h        |   23 +
 16 files changed, 3093 insertions(+), 343 deletions(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 include/rdma/uverbs_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h

-- 
2.7.4


* [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-10-27 14:43   ` Matan Barak
       [not found]     ` <1477579398-6875-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-10-27 14:43   ` [RFC ABI V5 02/10] RDMA/core: Add support for custom types Matan Barak
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The current code creates an IDR per type. Since the types are currently
common to all vendors and known in advance, this was good enough.
However, the proposed ioctl-based infrastructure allows each vendor
to declare only some of the common types and to declare its own specific
types.

Thus, we decided to make the IDR per-device and to refactor the related
code into a new file.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c      |  14 +++
 drivers/infiniband/core/uverbs.h      |  16 +---
 drivers/infiniband/core/uverbs_cmd.c  | 157 ++++++++++++++++------------------
 drivers/infiniband/core/uverbs_main.c |  42 +++------
 include/rdma/ib_verbs.h               |   4 +
 5 files changed, 106 insertions(+), 127 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 760ef60..c3b68f5 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -168,11 +168,23 @@ static int alloc_name(char *name)
 	return 0;
 }
 
+static void ib_device_allocate_idrs(struct ib_device *device)
+{
+	spin_lock_init(&device->idr_lock);
+	idr_init(&device->idr);
+}
+
+static void ib_device_destroy_idrs(struct ib_device *device)
+{
+	idr_destroy(&device->idr);
+}
+
 static void ib_device_release(struct device *device)
 {
 	struct ib_device *dev = container_of(device, struct ib_device, dev);
 
 	ib_cache_release_one(dev);
+	ib_device_destroy_idrs(dev);
 	kfree(dev->port_immutable);
 	kfree(dev);
 }
@@ -219,6 +231,8 @@ struct ib_device *ib_alloc_device(size_t size)
 	if (!device)
 		return NULL;
 
+	ib_device_allocate_idrs(device);
+
 	device->dev.class = &ib_class;
 	device_initialize(&device->dev);
 
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index df26a74..8074705 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -38,7 +38,6 @@
 #define UVERBS_H
 
 #include <linux/kref.h>
-#include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/cdev.h>
@@ -176,20 +175,7 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
-extern spinlock_t ib_uverbs_idr_lock;
-extern struct idr ib_uverbs_pd_idr;
-extern struct idr ib_uverbs_mr_idr;
-extern struct idr ib_uverbs_mw_idr;
-extern struct idr ib_uverbs_ah_idr;
-extern struct idr ib_uverbs_cq_idr;
-extern struct idr ib_uverbs_qp_idr;
-extern struct idr ib_uverbs_srq_idr;
-extern struct idr ib_uverbs_xrcd_idr;
-extern struct idr ib_uverbs_rule_idr;
-extern struct idr ib_uverbs_wq_idr;
-extern struct idr ib_uverbs_rwq_ind_tbl_idr;
-
-void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
+void idr_remove_uobj(struct ib_uobject *uobj);
 
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..84daf2c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -120,37 +120,36 @@ static void put_uobj_write(struct ib_uobject *uobj)
 	put_uobj(uobj);
 }
 
-static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
+static int idr_add_uobj(struct ib_uobject *uobj)
 {
 	int ret;
 
 	idr_preload(GFP_KERNEL);
-	spin_lock(&ib_uverbs_idr_lock);
+	spin_lock(&uobj->context->device->idr_lock);
 
-	ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
 	if (ret >= 0)
 		uobj->id = ret;
 
-	spin_unlock(&ib_uverbs_idr_lock);
+	spin_unlock(&uobj->context->device->idr_lock);
 	idr_preload_end();
 
 	return ret < 0 ? ret : 0;
 }
 
-void idr_remove_uobj(struct idr *idr, struct ib_uobject *uobj)
+void idr_remove_uobj(struct ib_uobject *uobj)
 {
-	spin_lock(&ib_uverbs_idr_lock);
-	idr_remove(idr, uobj->id);
-	spin_unlock(&ib_uverbs_idr_lock);
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
 }
 
-static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
+static struct ib_uobject *__idr_get_uobj(int id, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
 	rcu_read_lock();
-	uobj = idr_find(idr, id);
+	uobj = idr_find(&context->device->idr, id);
 	if (uobj) {
 		if (uobj->context == context)
 			kref_get(&uobj->ref);
@@ -162,12 +161,12 @@ static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static struct ib_uobject *idr_read_uobj(struct idr *idr, int id,
-					struct ib_ucontext *context, int nested)
+static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
+					int nested)
 {
 	struct ib_uobject *uobj;
 
-	uobj = __idr_get_uobj(idr, id, context);
+	uobj = __idr_get_uobj(id, context);
 	if (!uobj)
 		return NULL;
 
@@ -183,12 +182,11 @@ static struct ib_uobject *idr_read_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static struct ib_uobject *idr_write_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
+static struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
-	uobj = __idr_get_uobj(idr, id, context);
+	uobj = __idr_get_uobj(id, context);
 	if (!uobj)
 		return NULL;
 
@@ -201,18 +199,18 @@ static struct ib_uobject *idr_write_uobj(struct idr *idr, int id,
 	return uobj;
 }
 
-static void *idr_read_obj(struct idr *idr, int id, struct ib_ucontext *context,
+static void *idr_read_obj(int id, struct ib_ucontext *context,
 			  int nested)
 {
 	struct ib_uobject *uobj;
 
-	uobj = idr_read_uobj(idr, id, context, nested);
+	uobj = idr_read_uobj(id, context, nested);
 	return uobj ? uobj->object : NULL;
 }
 
 static struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_pd_idr, pd_handle, context, 0);
+	return idr_read_obj(pd_handle, context, 0);
 }
 
 static void put_pd_read(struct ib_pd *pd)
@@ -222,7 +220,7 @@ static void put_pd_read(struct ib_pd *pd)
 
 static struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context, int nested)
 {
-	return idr_read_obj(&ib_uverbs_cq_idr, cq_handle, context, nested);
+	return idr_read_obj(cq_handle, context, nested);
 }
 
 static void put_cq_read(struct ib_cq *cq)
@@ -230,24 +228,24 @@ static void put_cq_read(struct ib_cq *cq)
 	put_uobj_read(cq->uobject);
 }
 
-static struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
+static void put_ah_read(struct ib_ah *ah)
 {
-	return idr_read_obj(&ib_uverbs_ah_idr, ah_handle, context, 0);
+	put_uobj_read(ah->uobject);
 }
 
-static void put_ah_read(struct ib_ah *ah)
+static struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
 {
-	put_uobj_read(ah->uobject);
+	return idr_read_obj(ah_handle, context, 0);
 }
 
 static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_qp_idr, qp_handle, context, 0);
+	return idr_read_obj(qp_handle, context, 0);
 }
 
 static struct ib_wq *idr_read_wq(int wq_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_wq_idr, wq_handle, context, 0);
+	return idr_read_obj(wq_handle, context, 0);
 }
 
 static void put_wq_read(struct ib_wq *wq)
@@ -258,7 +256,7 @@ static void put_wq_read(struct ib_wq *wq)
 static struct ib_rwq_ind_table *idr_read_rwq_indirection_table(int ind_table_handle,
 							       struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_rwq_ind_tbl_idr, ind_table_handle, context, 0);
+	return idr_read_obj(ind_table_handle, context, 0);
 }
 
 static void put_rwq_indirection_table_read(struct ib_rwq_ind_table *ind_table)
@@ -270,7 +268,7 @@ static struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
 {
 	struct ib_uobject *uobj;
 
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, qp_handle, context);
+	uobj = idr_write_uobj(qp_handle, context);
 	return uobj ? uobj->object : NULL;
 }
 
@@ -286,7 +284,7 @@ static void put_qp_write(struct ib_qp *qp)
 
 static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
 {
-	return idr_read_obj(&ib_uverbs_srq_idr, srq_handle, context, 0);
+	return idr_read_obj(srq_handle, context, 0);
 }
 
 static void put_srq_read(struct ib_srq *srq)
@@ -297,7 +295,7 @@ static void put_srq_read(struct ib_srq *srq)
 static struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
 				     struct ib_uobject **uobj)
 {
-	*uobj = idr_read_uobj(&ib_uverbs_xrcd_idr, xrcd_handle, context, 0);
+	*uobj = idr_read_uobj(xrcd_handle, context, 0);
 	return *uobj ? (*uobj)->object : NULL;
 }
 
@@ -305,7 +303,6 @@ static void put_xrcd_read(struct ib_uobject *uobj)
 {
 	put_uobj_read(uobj);
 }
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -575,7 +572,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	atomic_set(&pd->usecnt, 0);
 
 	uobj->object = pd;
-	ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_idr;
 
@@ -599,7 +596,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_idr:
 	ib_dealloc_pd(pd);
@@ -622,7 +619,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_pd_idr, cmd.pd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.pd_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	pd = uobj->object;
@@ -640,7 +637,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	uobj->live = 0;
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -816,7 +813,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 
 	atomic_set(&obj->refcnt, 0);
 	obj->uobject.object = xrcd;
-	ret = idr_add_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_idr;
 
@@ -860,7 +857,7 @@ err_copy:
 	}
 
 err_insert_xrcd:
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_idr:
 	ib_dealloc_xrcd(xrcd);
@@ -894,7 +891,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 		return -EFAULT;
 
 	mutex_lock(&file->device->xrcd_tree_mutex);
-	uobj = idr_write_uobj(&ib_uverbs_xrcd_idr, cmd.xrcd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.xrcd_handle, file->ucontext);
 	if (!uobj) {
 		ret = -EINVAL;
 		goto out;
@@ -927,7 +924,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 	if (inode && !live)
 		xrcd_table_delete(file->device, inode);
 
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+	idr_remove_uobj(uobj);
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
 	mutex_unlock(&file->mutex);
@@ -1020,7 +1017,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mr;
-	ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unreg;
 
@@ -1048,7 +1045,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unreg:
 	ib_dereg_mr(mr);
@@ -1093,8 +1090,7 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
 	     (cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK)))
 			return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 
 	if (!uobj)
 		return -EINVAL;
@@ -1163,7 +1159,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1178,7 +1174,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1238,7 +1234,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mw;
-	ret = idr_add_uobj(&ib_uverbs_mw_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unalloc;
 
@@ -1265,7 +1261,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unalloc:
 	uverbs_dealloc_mw(mw);
@@ -1291,7 +1287,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof(cmd)))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mw_idr, cmd.mw_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mw_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1306,7 +1302,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1420,7 +1416,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	atomic_set(&cq->usecnt, 0);
 
 	obj->uobject.object = cq;
-	ret = idr_add_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_free;
 
@@ -1446,7 +1442,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	return obj;
 
 err_cb:
-	idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_free:
 	ib_destroy_cq(cq);
@@ -1716,7 +1712,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_cq_idr, cmd.cq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.cq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	cq      = uobj->object;
@@ -1732,7 +1728,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1939,7 +1935,7 @@ static int create_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -1987,7 +1983,7 @@ static int create_qp(struct ib_uverbs_file *file,
 
 	return 0;
 err_cb:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2173,7 +2169,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -2202,7 +2198,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	return in_len;
 
 err_remove:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2442,7 +2438,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 
 	memset(&resp, 0, sizeof resp);
 
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, cmd.qp_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.qp_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	qp  = uobj->object;
@@ -2465,7 +2461,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 	if (obj->uxrcd)
 		atomic_dec(&obj->uxrcd->refcnt);
 
-	idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -2917,7 +2913,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	ah->uobject  = uobj;
 	uobj->object = ah;
 
-	ret = idr_add_uobj(&ib_uverbs_ah_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_destroy;
 
@@ -2942,7 +2938,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_destroy:
 	ib_destroy_ah(ah);
@@ -2967,7 +2963,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_ah_idr, cmd.ah_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.ah_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	ah = uobj->object;
@@ -2981,7 +2977,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3263,7 +3259,7 @@ int ib_uverbs_ex_create_wq(struct ib_uverbs_file *file,
 	atomic_inc(&cq->usecnt);
 	wq->uobject = &obj->uevent.uobject;
 	obj->uevent.uobject.object = wq;
-	err = idr_add_uobj(&ib_uverbs_wq_idr, &obj->uevent.uobject);
+	err = idr_add_uobj(&obj->uevent.uobject);
 	if (err)
 		goto destroy_wq;
 
@@ -3290,7 +3286,7 @@ int ib_uverbs_ex_create_wq(struct ib_uverbs_file *file,
 	return 0;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_wq_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 destroy_wq:
 	ib_destroy_wq(wq);
 err_put_cq:
@@ -3339,7 +3335,7 @@ int ib_uverbs_ex_destroy_wq(struct ib_uverbs_file *file,
 		return -EOPNOTSUPP;
 
 	resp.response_length = required_resp_len;
-	uobj = idr_write_uobj(&ib_uverbs_wq_idr, cmd.wq_handle,
+	uobj = idr_write_uobj(cmd.wq_handle,
 			      file->ucontext);
 	if (!uobj)
 		return -EINVAL;
@@ -3354,7 +3350,7 @@ int ib_uverbs_ex_destroy_wq(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_wq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3522,7 +3518,7 @@ int ib_uverbs_ex_create_rwq_ind_table(struct ib_uverbs_file *file,
 	for (i = 0; i < num_wq_handles; i++)
 		atomic_inc(&wqs[i]->usecnt);
 
-	err = idr_add_uobj(&ib_uverbs_rwq_ind_tbl_idr, uobj);
+	err = idr_add_uobj(uobj);
 	if (err)
 		goto destroy_ind_tbl;
 
@@ -3550,7 +3546,7 @@ int ib_uverbs_ex_create_rwq_ind_table(struct ib_uverbs_file *file,
 	return 0;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_rwq_ind_tbl_idr, uobj);
+	idr_remove_uobj(uobj);
 destroy_ind_tbl:
 	ib_destroy_rwq_ind_table(rwq_ind_tbl);
 err_uobj:
@@ -3593,7 +3589,7 @@ int ib_uverbs_ex_destroy_rwq_ind_table(struct ib_uverbs_file *file,
 	if (cmd.comp_mask)
 		return -EOPNOTSUPP;
 
-	uobj = idr_write_uobj(&ib_uverbs_rwq_ind_tbl_idr, cmd.ind_tbl_handle,
+	uobj = idr_write_uobj(cmd.ind_tbl_handle,
 			      file->ucontext);
 	if (!uobj)
 		return -EINVAL;
@@ -3609,7 +3605,7 @@ int ib_uverbs_ex_destroy_rwq_ind_table(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_rwq_ind_tbl_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3749,7 +3745,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 	flow_id->uobject = uobj;
 	uobj->object = flow_id;
 
-	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
+	err = idr_add_uobj(uobj);
 	if (err)
 		goto destroy_flow;
 
@@ -3774,7 +3770,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		kfree(kern_flow_attr);
 	return 0;
 err_copy:
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 destroy_flow:
 	ib_destroy_flow(flow_id);
 err_free:
@@ -3809,8 +3805,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 	if (cmd.comp_mask)
 		return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.flow_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	flow_id = uobj->object;
@@ -3821,7 +3816,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3909,7 +3904,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	atomic_set(&srq->usecnt, 0);
 
 	obj->uevent.uobject.object = srq;
-	ret = idr_add_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -3943,7 +3938,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	return 0;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_srq(srq);
@@ -4119,7 +4114,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_srq_idr, cmd.srq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.srq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	srq = uobj->object;
@@ -4140,7 +4135,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 		atomic_dec(&us->uxrcd->refcnt);
 	}
 
-	idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 0012fa5..f783723 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -66,19 +66,6 @@ enum {
 
 static struct class *uverbs_class;
 
-DEFINE_SPINLOCK(ib_uverbs_idr_lock);
-DEFINE_IDR(ib_uverbs_pd_idr);
-DEFINE_IDR(ib_uverbs_mr_idr);
-DEFINE_IDR(ib_uverbs_mw_idr);
-DEFINE_IDR(ib_uverbs_ah_idr);
-DEFINE_IDR(ib_uverbs_cq_idr);
-DEFINE_IDR(ib_uverbs_qp_idr);
-DEFINE_IDR(ib_uverbs_srq_idr);
-DEFINE_IDR(ib_uverbs_xrcd_idr);
-DEFINE_IDR(ib_uverbs_rule_idr);
-DEFINE_IDR(ib_uverbs_wq_idr);
-DEFINE_IDR(ib_uverbs_rwq_ind_tbl_idr);
-
 static DEFINE_SPINLOCK(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
@@ -234,7 +221,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
 		struct ib_ah *ah = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_ah(ah);
 		kfree(uobj);
 	}
@@ -243,7 +230,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mw_list, list) {
 		struct ib_mw *mw = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+		idr_remove_uobj(uobj);
 		uverbs_dealloc_mw(mw);
 		kfree(uobj);
 	}
@@ -251,7 +238,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
 		struct ib_flow *flow_id = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_flow(flow_id);
 		kfree(uobj);
 	}
@@ -261,7 +248,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uqp_object *uqp =
 			container_of(uobj, struct ib_uqp_object, uevent.uobject);
 
-		idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+		idr_remove_uobj(uobj);
 		if (qp != qp->real_qp) {
 			ib_close_qp(qp);
 		} else {
@@ -276,7 +263,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_rwq_ind_table *rwq_ind_tbl = uobj->object;
 		struct ib_wq **ind_tbl = rwq_ind_tbl->ind_tbl;
 
-		idr_remove_uobj(&ib_uverbs_rwq_ind_tbl_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_rwq_ind_table(rwq_ind_tbl);
 		kfree(ind_tbl);
 		kfree(uobj);
@@ -287,7 +274,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uwq_object *uwq =
 			container_of(uobj, struct ib_uwq_object, uevent.uobject);
 
-		idr_remove_uobj(&ib_uverbs_wq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_wq(wq);
 		ib_uverbs_release_uevent(file, &uwq->uevent);
 		kfree(uwq);
@@ -298,7 +285,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uevent_object *uevent =
 			container_of(uobj, struct ib_uevent_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_srq(srq);
 		ib_uverbs_release_uevent(file, uevent);
 		kfree(uevent);
@@ -310,7 +297,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_ucq_object *ucq =
 			container_of(uobj, struct ib_ucq_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_cq(cq);
 		ib_uverbs_release_ucq(file, ev_file, ucq);
 		kfree(ucq);
@@ -319,7 +306,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
 		struct ib_mr *mr = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dereg_mr(mr);
 		kfree(uobj);
 	}
@@ -330,7 +317,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uxrcd_object *uxrcd =
 			container_of(uobj, struct ib_uxrcd_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_uverbs_dealloc_xrcd(file->device, xrcd);
 		kfree(uxrcd);
 	}
@@ -339,7 +326,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->pd_list, list) {
 		struct ib_pd *pd = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dealloc_pd(pd);
 		kfree(uobj);
 	}
@@ -1375,13 +1362,6 @@ static void __exit ib_uverbs_cleanup(void)
 	unregister_chrdev_region(IB_UVERBS_BASE_DEV, IB_UVERBS_MAX_DEVICES);
 	if (overflow_maj)
 		unregister_chrdev_region(overflow_maj, IB_UVERBS_MAX_DEVICES);
-	idr_destroy(&ib_uverbs_pd_idr);
-	idr_destroy(&ib_uverbs_mr_idr);
-	idr_destroy(&ib_uverbs_mw_idr);
-	idr_destroy(&ib_uverbs_ah_idr);
-	idr_destroy(&ib_uverbs_cq_idr);
-	idr_destroy(&ib_uverbs_qp_idr);
-	idr_destroy(&ib_uverbs_srq_idr);
 }
 
 module_init(ib_uverbs_init);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d3fba0a..b5d2075 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1835,6 +1835,10 @@ struct ib_device {
 
 	struct iw_cm_verbs	     *iwcm;
 
+	struct idr		idr;
+	/* Per-device lock protecting the IDR */
+	spinlock_t		idr_lock;
+
 	/**
 	 * alloc_hw_stats - Allocate a struct rdma_hw_stats and fill in the
 	 *   driver initialized data.  The struct is kfree()'ed by the sysfs
-- 
2.7.4


* [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-10-27 14:43   ` [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
       [not found]     ` <1477579398-6875-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-10-27 14:43   ` [RFC ABI V5 03/10] RDMA/core: Add new ioctl interface Matan Barak
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

The new ioctl infrastructure supports driver-specific objects.
Each such object type has a free function, an allocation size and a
destruction order. This information is embedded in the same table that
describes the various actions allowed on the object, similarly to
object-oriented programming.

When a ucontext is created, a new list is created in this ib_ucontext.
This list contains all objects created under this ib_ucontext.
When an ib_ucontext is destroyed, we traverse this list several times,
destroying the objects in the order given by the object type
description. If several object types have the same destruction order,
they are destroyed in the reverse of their creation order.

Adding an object is done in two parts.
First, the object is allocated and added to the IDR/fd table. Then, the
command's handlers (in downstream patches) can work on this object
and fill in its required details.
After a successful command, ib_uverbs_uobject_enable is called and
the user object becomes visible to the ucontext.
Removing a uobject is done by calling ib_uverbs_uobject_remove.

We make sure that the IDR (per-device) and the list (per-ucontext) can
be accessed concurrently without being corrupted.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile      |   3 +-
 drivers/infiniband/core/device.c      |   1 +
 drivers/infiniband/core/rdma_core.c   | 489 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h   |  75 ++++++
 drivers/infiniband/core/uverbs.h      |   1 +
 drivers/infiniband/core/uverbs_main.c |   2 +-
 include/rdma/ib_verbs.h               |  28 +-
 include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
 8 files changed, 789 insertions(+), 5 deletions(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 include/rdma/uverbs_ioctl.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..1819623 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -28,4 +28,5 @@ ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+				rdma_core.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c3b68f5..43994b1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
 	spin_lock_init(&device->client_data_lock);
 	INIT_LIST_HEAD(&device->client_data_list);
 	INIT_LIST_HEAD(&device->port_list);
+	INIT_LIST_HEAD(&device->type_list);
 
 	return device;
 }
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
new file mode 100644
index 0000000..337abc2
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.c
@@ -0,0 +1,489 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <rdma/ib_verbs.h>
+#include "uverbs.h"
+#include "rdma_core.h"
+#include <rdma/uverbs_ioctl.h>
+
+const struct uverbs_type *uverbs_get_type(const struct ib_device *ibdev,
+					  uint16_t type)
+{
+	const struct uverbs_types_group *groups = ibdev->types_group;
+	const struct uverbs_types *types;
+	int ret = groups->dist(&type, groups->priv);
+
+	if (ret >= groups->num_groups)
+		return NULL;
+
+	types = groups->type_groups[ret];
+
+	if (type >= types->num_types)
+		return NULL;
+
+	return types->types[type];
+}
+
+static int uverbs_lock_object(struct ib_uobject *uobj,
+			      enum uverbs_idr_access access)
+{
+	if (access == UVERBS_IDR_ACCESS_READ)
+		return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
+
+	/* lock is either WRITE or DESTROY - should be exclusive */
+	return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
+}
+
+static struct ib_uobject *get_uobj(int id, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	rcu_read_lock();
+	uobj = idr_find(&context->device->idr, id);
+	if (uobj && uobj->live) {
+		if (uobj->context != context)
+			uobj = NULL;
+	}
+	rcu_read_unlock();
+
+	return uobj;
+}
+
+struct ib_ucontext_lock {
+	struct kref  ref;
+	/* locking the uobjects_list */
+	struct mutex lock;
+};
+
+static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
+{
+	mutex_init(&lock->lock);
+	kref_init(&lock->ref);
+}
+
+static void release_uobjects_list_lock(struct kref *ref)
+{
+	struct ib_ucontext_lock *lock = container_of(ref,
+						     struct ib_ucontext_lock,
+						     ref);
+
+	kfree(lock);
+}
+
+static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
+		      struct ib_ucontext *context)
+{
+	init_rwsem(&uobj->usecnt);
+	uobj->user_handle = user_handle;
+	uobj->context     = context;
+	uobj->live        = 0;
+}
+
+static int add_uobj(struct ib_uobject *uobj)
+{
+	int ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&uobj->context->device->idr_lock);
+
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
+	if (ret >= 0)
+		uobj->id = ret;
+
+	spin_unlock(&uobj->context->device->idr_lock);
+	idr_preload_end();
+
+	return ret < 0 ? ret : 0;
+}
+
+static void remove_uobj(struct ib_uobject *uobj)
+{
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
+}
+
+static void put_uobj(struct ib_uobject *uobj)
+{
+	kfree_rcu(uobj, rcu);
+}
+
+static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
+						   const struct uverbs_type_alloc_action *type,
+						   u32 idr,
+						   enum uverbs_idr_access access)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	rcu_read_lock();
+	uobj = get_uobj(idr, ucontext);
+	if (!uobj)
+		goto free;
+
+	if (uobj->type != type) {
+		uobj = NULL;
+		goto free;
+	}
+
+	ret = uverbs_lock_object(uobj, access);
+	if (ret)
+		uobj = ERR_PTR(ret);
+free:
+	rcu_read_unlock();
+	return uobj;
+}
+
+static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
+				 const struct uverbs_type_alloc_action *uobject_type)
+{
+	uobject->type = uobject_type;
+	return add_uobj(uobject);
+}
+
+struct ib_uobject *uverbs_get_type_from_idr(const struct uverbs_type_alloc_action *type,
+					    struct ib_ucontext *ucontext,
+					    enum uverbs_idr_access access,
+					    uint32_t idr)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	if (access == UVERBS_IDR_ACCESS_NEW) {
+		uobj = kmalloc(type->obj_size, GFP_KERNEL);
+		if (!uobj)
+			return ERR_PTR(-ENOMEM);
+
+		init_uobj(uobj, 0, ucontext);
+
+		/* lock idr */
+		ret = ib_uverbs_uobject_add(uobj, type);
+		if (ret) {
+			kfree(uobj);
+			return ERR_PTR(ret);
+		}
+
+	} else {
+		uobj = get_uobject_from_context(ucontext, type, idr,
+						access);
+
+		if (!uobj)
+			return ERR_PTR(-ENOENT);
+	}
+
+	return uobj;
+}
+
+struct ib_uobject *uverbs_get_type_from_fd(const struct uverbs_type_alloc_action *type,
+					   struct ib_ucontext *ucontext,
+					   enum uverbs_idr_access access,
+					   int fd)
+{
+	if (access == UVERBS_IDR_ACCESS_NEW) {
+		int _fd;
+		struct ib_uobject *uobj = NULL;
+		struct file *filp;
+
+		_fd = get_unused_fd_flags(O_CLOEXEC);
+		if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct ib_uobject)))
+			return ERR_PTR(_fd);
+
+		uobj = kmalloc(type->obj_size, GFP_KERNEL);
+		if (!uobj) {
+			put_unused_fd(_fd);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		init_uobj(uobj, 0, ucontext);
+
+		filp = anon_inode_getfile(type->fd.name, type->fd.fops,
+					  uobj + 1, type->fd.flags);
+		if (IS_ERR(filp)) {
+			put_unused_fd(_fd);
+			kfree(uobj);
+			return (void *)filp;
+		}
+
+		uobj->type = type;
+		uobj->id = _fd;
+		uobj->object = filp;
+
+		return uobj;
+	} else if (access == UVERBS_IDR_ACCESS_READ) {
+		struct file *f = fget(fd);
+		struct ib_uobject *uobject;
+
+		if (!f)
+			return ERR_PTR(-EBADF);
+
+		uobject = f->private_data - sizeof(struct ib_uobject);
+		if (f->f_op != type->fd.fops ||
+		    !uobject->live) {
+			fput(f);
+			return ERR_PTR(-EBADF);
+		}
+
+		/*
+		 * No need to protect it with a ref count, as fget increases
+		 * f_count.
+		 */
+		return uobject;
+	} else {
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+}
+
+static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
+{
+	mutex_lock(&uobject->context->uobjects_lock->lock);
+	list_add(&uobject->list, &uobject->context->uobjects);
+	mutex_unlock(&uobject->context->uobjects_lock->lock);
+	uobject->live = 1;
+}
+
+static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool lock)
+{
+	/*
+	 * Calling remove requires exclusive access, so it's not possible
+	 * another thread will use our object.
+	 */
+	uobject->live = 0;
+	uobject->type->free_fn(uobject->type, uobject);
+	if (lock)
+		mutex_lock(&uobject->context->uobjects_lock->lock);
+	list_del(&uobject->list);
+	if (lock)
+		mutex_unlock(&uobject->context->uobjects_lock->lock);
+	remove_uobj(uobject);
+	put_uobj(uobject);
+}
+
+static void uverbs_unlock_idr(struct ib_uobject *uobj,
+			      enum uverbs_idr_access access,
+			      bool success)
+{
+	switch (access) {
+	case UVERBS_IDR_ACCESS_READ:
+		up_read(&uobj->usecnt);
+		break;
+	case UVERBS_IDR_ACCESS_NEW:
+		if (success) {
+			ib_uverbs_uobject_enable(uobj);
+		} else {
+			remove_uobj(uobj);
+			put_uobj(uobj);
+		}
+		break;
+	case UVERBS_IDR_ACCESS_WRITE:
+		up_write(&uobj->usecnt);
+		break;
+	case UVERBS_IDR_ACCESS_DESTROY:
+		if (success)
+			ib_uverbs_uobject_remove(uobj, true);
+		else
+			up_write(&uobj->usecnt);
+		break;
+	}
+}
+
+static void uverbs_unlock_fd(struct ib_uobject *uobj,
+			     enum uverbs_idr_access access,
+			     bool success)
+{
+	struct file *filp = uobj->object;
+
+	if (access == UVERBS_IDR_ACCESS_NEW) {
+		if (success) {
+			kref_get(&uobj->context->ufile->ref);
+			uobj->uobjects_lock = uobj->context->uobjects_lock;
+			kref_get(&uobj->uobjects_lock->ref);
+			ib_uverbs_uobject_enable(uobj);
+			fd_install(uobj->id, uobj->object);
+		} else {
+			fput(uobj->object);
+			put_unused_fd(uobj->id);
+			kfree(uobj);
+		}
+	} else {
+		fput(filp);
+	}
+}
+
+void uverbs_unlock_object(struct ib_uobject *uobj,
+			  enum uverbs_idr_access access,
+			  bool success)
+{
+	if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
+		uverbs_unlock_idr(uobj, access, success);
+	else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
+		uverbs_unlock_fd(uobj, access, success);
+	else
+		WARN_ON(true);
+}
+
+static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
+{
+	/*
+	 * user should release the uobject in the release
+	 * callback.
+	 */
+	if (uobject->live) {
+		uobject->live = 0;
+		list_del(&uobject->list);
+		uobject->type->free_fn(uobject->type, uobject);
+		kref_put(&uobject->context->ufile->ref, ib_uverbs_release_file);
+		uobject->context = NULL;
+	}
+}
+
+void ib_uverbs_close_fd(struct file *f)
+{
+	struct ib_uobject *uobject = f->private_data - sizeof(struct ib_uobject);
+
+	mutex_lock(&uobject->uobjects_lock->lock);
+	if (uobject->live) {
+		uobject->live = 0;
+		list_del(&uobject->list);
+		kref_put(&uobject->context->ufile->ref, ib_uverbs_release_file);
+		uobject->context = NULL;
+	}
+	mutex_unlock(&uobject->uobjects_lock->lock);
+	kref_put(&uobject->uobjects_lock->ref, release_uobjects_list_lock);
+}
+
+void ib_uverbs_cleanup_fd(void *private_data)
+{
+	struct ib_uobject *uobject = private_data - sizeof(struct ib_uobject);
+
+	kfree(uobject);
+}
+
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct uverbs_action_spec *spec,
+			   bool success)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		struct uverbs_attr_array *attr_spec_array = &attr_array[i];
+		const struct uverbs_attr_group_spec *group_spec =
+			spec->attr_groups[i];
+		unsigned int j;
+
+		for (j = 0; j < attr_spec_array->num_attrs; j++) {
+			struct uverbs_attr *attr = &attr_spec_array->attrs[j];
+			struct uverbs_attr_spec *spec = &group_spec->attrs[j];
+
+			if (!attr->valid)
+				continue;
+
+			if (spec->type == UVERBS_ATTR_TYPE_IDR ||
+			    spec->type == UVERBS_ATTR_TYPE_FD)
+				/*
+				 * refcounts should be handled at the object
+				 * level and not at the uobject level.
+				 */
+				uverbs_unlock_object(attr->obj_attr.uobject,
+						     spec->obj.access, success);
+		}
+	}
+}
+
+static unsigned int get_type_orders(const struct uverbs_types_group *types_group)
+{
+	unsigned int i;
+	unsigned int max = 0;
+
+	for (i = 0; i < types_group->num_groups; i++) {
+		unsigned int j;
+		const struct uverbs_types *types = types_group->type_groups[i];
+
+		for (j = 0; j < types->num_types; j++) {
+			if (!types->types[j] || !types->types[j]->alloc)
+				continue;
+			if (types->types[j]->alloc->order > max)
+				max = types->types[j]->alloc->order;
+		}
+	}
+
+	return max;
+}
+
+void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext *ucontext,
+					     const struct uverbs_types_group *types_group)
+{
+	unsigned int num_orders = get_type_orders(types_group);
+	unsigned int i;
+
+	for (i = 0; i <= num_orders; i++) {
+		struct ib_uobject *obj, *next_obj;
+
+		/*
+		 * No need to take lock here, as cleanup should be called
+		 * after all commands finished executing. Newly executed
+		 * commands should fail.
+		 */
+		mutex_lock(&ucontext->uobjects_lock->lock);
+		list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects,
+					 list)
+			if (obj->type->order == i) {
+				if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
+					ib_uverbs_uobject_remove(obj, false);
+				else
+					ib_uverbs_remove_fd(obj);
+			}
+		mutex_unlock(&ucontext->uobjects_lock->lock);
+	}
+	kref_put(&ucontext->uobjects_lock->ref, release_uobjects_list_lock);
+}
+
+int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext *ucontext)
+{
+	ucontext->uobjects_lock = kmalloc(sizeof(*ucontext->uobjects_lock),
+					  GFP_KERNEL);
+	if (!ucontext->uobjects_lock)
+		return -ENOMEM;
+
+	init_uobjects_list_lock(ucontext->uobjects_lock);
+	INIT_LIST_HEAD(&ucontext->uobjects);
+
+	return 0;
+}
+
+void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext *ucontext)
+{
+	kfree(ucontext->uobjects_lock);
+}
+
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
new file mode 100644
index 0000000..8990115
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef UOBJECT_H
+#define UOBJECT_H
+
+#include <linux/idr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
+
+const struct uverbs_type *uverbs_get_type(const struct ib_device *ibdev,
+					  uint16_t type);
+struct ib_uobject *uverbs_get_type_from_idr(const struct uverbs_type_alloc_action *type,
+					    struct ib_ucontext *ucontext,
+					    enum uverbs_idr_access access,
+					    uint32_t idr);
+struct ib_uobject *uverbs_get_type_from_fd(const struct uverbs_type_alloc_action *type,
+					   struct ib_ucontext *ucontext,
+					   enum uverbs_idr_access access,
+					   int fd);
+void uverbs_unlock_object(struct ib_uobject *uobj,
+			  enum uverbs_idr_access access,
+			  bool success);
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct uverbs_action_spec *spec,
+			   bool success);
+
+void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext *ucontext,
+					     const struct uverbs_types_group *types_group);
+int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext *ucontext);
+void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext *ucontext);
+void ib_uverbs_close_fd(struct file *f);
+void ib_uverbs_cleanup_fd(void *private_data);
+
+static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
+{
+	return uobj + 1;
+}
+
+#endif /* UOBJECT_H */
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 8074705..ae7d4b8 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
 					int is_async);
+void ib_uverbs_release_file(struct kref *ref);
 void ib_uverbs_free_async_event_file(struct ib_uverbs_file *uverbs_file);
 struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index f783723..e63357a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct ib_uverbs_device *dev)
 	complete(&dev->comp);
 }
 
-static void ib_uverbs_release_file(struct kref *ref)
+void ib_uverbs_release_file(struct kref *ref)
 {
 	struct ib_uverbs_file *file =
 		container_of(ref, struct ib_uverbs_file, ref);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b5d2075..7240615 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
 
 struct ib_umem;
 
+struct ib_ucontext_lock;
+
 struct ib_ucontext {
 	struct ib_device       *device;
+	struct ib_uverbs_file  *ufile;
 	struct list_head	pd_list;
 	struct list_head	mr_list;
 	struct list_head	mw_list;
@@ -1344,6 +1347,10 @@ struct ib_ucontext {
 	struct list_head	rwq_ind_tbl_list;
 	int			closing;
 
+	/* lock for uobjects list */
+	struct ib_ucontext_lock	*uobjects_lock;
+	struct list_head	uobjects;
+
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	struct rb_root      umem_tree;
@@ -1363,16 +1370,28 @@ struct ib_ucontext {
 #endif
 };
 
+struct uverbs_object_list;
+
+#define OLD_ABI_COMPAT
+
 struct ib_uobject {
 	u64			user_handle;	/* handle given to us by userspace */
 	struct ib_ucontext     *context;	/* associated user context */
 	void		       *object;		/* containing object */
 	struct list_head	list;		/* link to context's list */
-	int			id;		/* index into kernel idr */
-	struct kref		ref;
-	struct rw_semaphore	mutex;		/* protects .live */
+	int			id;		/* index into kernel idr/fd */
+#ifdef OLD_ABI_COMPAT
+	struct kref             ref;
+#endif
+	struct rw_semaphore	usecnt;		/* protects exclusive access */
+#ifdef OLD_ABI_COMPAT
+	struct rw_semaphore     mutex;          /* protects .live */
+#endif
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
+
+	const struct uverbs_type_alloc_action *type;
+	struct ib_ucontext_lock	*uobjects_lock;
 };
 
 struct ib_udata {
@@ -2101,6 +2120,9 @@ struct ib_device {
 	 */
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
 	void (*get_dev_fw_str)(struct ib_device *, char *str, size_t str_len);
+	struct list_head type_list;
+
+	const struct uverbs_types_group	*types_group;
 };
 
 struct ib_client {
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
new file mode 100644
index 0000000..2f50045
--- /dev/null
+++ b/include/rdma/uverbs_ioctl.h
@@ -0,0 +1,195 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_
+#define _UVERBS_IOCTL_
+
+#include <linux/kernel.h>
+
+struct uverbs_object_type;
+struct ib_ucontext;
+struct ib_uobject;
+struct ib_device;
+struct uverbs_uobject_type;
+
+/*
+ * =======================================
+ *	Verbs action specifications
+ * =======================================
+ */
+
+enum uverbs_attr_type {
+	UVERBS_ATTR_TYPE_PTR_IN,
+	UVERBS_ATTR_TYPE_PTR_OUT,
+	UVERBS_ATTR_TYPE_IDR,
+	UVERBS_ATTR_TYPE_FD,
+};
+
+enum uverbs_idr_access {
+	UVERBS_IDR_ACCESS_READ,
+	UVERBS_IDR_ACCESS_WRITE,
+	UVERBS_IDR_ACCESS_NEW,
+	UVERBS_IDR_ACCESS_DESTROY
+};
+
+struct uverbs_attr_spec {
+	u16				len;
+	enum uverbs_attr_type		type;
+	struct {
+		u16			obj_type;
+		u8			access;
+	} obj;
+};
+
+struct uverbs_attr_group_spec {
+	struct uverbs_attr_spec		*attrs;
+	size_t				num_attrs;
+};
+
+struct uverbs_action_spec {
+	const struct uverbs_attr_group_spec		**attr_groups;
+	/* returns the attribute group index (>= 0), or a negative error */
+	int (*dist)(__u16 *attr_id, void *priv);
+	void						*priv;
+	size_t						num_groups;
+};
+
+struct uverbs_attr_array;
+struct ib_uverbs_file;
+
+struct uverbs_action {
+	struct uverbs_action_spec spec;
+	void *priv;
+	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
+		       struct uverbs_attr_array *ctx, size_t num, void *priv);
+};
+
+struct uverbs_type_alloc_action;
+typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type,
+			  struct ib_uobject *uobject);
+
+struct uverbs_type_alloc_action {
+	enum uverbs_attr_type		type;
+	int				order;
+	size_t				obj_size;
+	free_type			free_fn;
+	struct {
+		const struct file_operations	*fops;
+		const char			*name;
+		int				flags;
+	} fd;
+};
+
+struct uverbs_type_actions_group {
+	size_t					num_actions;
+	const struct uverbs_action		**actions;
+};
+
+struct uverbs_type {
+	size_t					num_groups;
+	const struct uverbs_type_actions_group	**action_groups;
+	const struct uverbs_type_alloc_action	*alloc;
+	int (*dist)(__u16 *action_id, void *priv);
+	void					*priv;
+};
+
+struct uverbs_types {
+	size_t					num_types;
+	const struct uverbs_type		**types;
+};
+
+struct uverbs_types_group {
+	const struct uverbs_types		**type_groups;
+	size_t					num_groups;
+	int (*dist)(__u16 *type_id, void *priv);
+	void					*priv;
+};
+
+/* =================================================
+ *              Parsing infrastructure
+ * =================================================
+ */
+
+struct uverbs_ptr_attr {
+	void __user	*ptr;
+	__u16		len;
+};
+
+struct uverbs_fd_attr {
+	int		fd;
+};
+
+struct uverbs_uobj_attr {
+	/*  idr handle */
+	__u32	idr;
+};
+
+struct uverbs_obj_attr {
+	/* pointer to the kernel descriptor -> type, access, etc */
+	const struct uverbs_attr_spec *val;
+	struct ib_uverbs_attr __user	*uattr;
+	const struct uverbs_type_alloc_action	*type;
+	struct ib_uobject		*uobject;
+	union {
+		struct uverbs_fd_attr		fd;
+		struct uverbs_uobj_attr		uobj;
+	};
+};
+
+struct uverbs_attr {
+	bool valid;
+	union {
+		struct uverbs_ptr_attr	cmd_attr;
+		struct uverbs_obj_attr	obj_attr;
+	};
+};
+
+/* output of one validator */
+struct uverbs_attr_array {
+	size_t num_attrs;
+	/* arrays of attributes, index is the attribute id, e.g. SEND_CQ */
+	struct uverbs_attr *attrs;
+};
+
+/* =================================================
+ *              Types infrastructure
+ * =================================================
+ */
+
+int ib_uverbs_uobject_type_add(struct list_head	*head,
+			       void (*free)(struct uverbs_uobject_type *type,
+					    struct ib_uobject *uobject,
+					    struct ib_ucontext *ucontext),
+			       uint16_t	obj_type);
+void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC ABI V5 03/10] RDMA/core: Add new ioctl interface
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-10-27 14:43   ` [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 02/10] RDMA/core: Add support for custom types Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 04/10] RDMA/core: Add initialize and cleanup of common types Matan Barak
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

In this proposed ioctl interface, processing a command starts with
parsing the command's properties and fetching the appropriate user
objects before calling the handler.

Parsing and validation are done according to a specifier declared by
the driver's code. The driver declares all the types it supports.
These types are separated into different type groups, each of which
could be declared in a different place (for example, common types and
driver specific types).

For each type we list all supported actions. Similarly to types,
actions are separated into action groups. Each group is declared
separately, which makes it possible to add actions to an existing
type.
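
As an illustration only (not part of the patch itself), a type, its
allocation descriptor and one action group could be wired together with
the structures added below roughly as in the sketch here. All example_*
names are placeholders, and the order/obj_size values are assumptions:

#include <rdma/ib_verbs.h>
#include <rdma/uverbs_ioctl.h>
#include <rdma/uverbs_ioctl_cmd.h>

/* Placeholder destructor matching the free_type typedef. */
static void example_free_pd(const struct uverbs_type_alloc_action *type,
			    struct ib_uobject *uobject)
{
	ib_dealloc_pd(uobject->object);
}

static const struct uverbs_type_alloc_action example_pd_alloc = {
	.type		= UVERBS_ATTR_TYPE_IDR,
	.order		= 1,	/* assumed release order on context teardown */
	.obj_size	= sizeof(struct ib_uobject),
	.free_fn	= example_free_pd,
};

/* In real code this table holds pointers to struct uverbs_action entries. */
static const struct uverbs_action *example_pd_action_table[1];

static const struct uverbs_type_actions_group example_pd_actions = {
	.num_actions	= ARRAY_SIZE(example_pd_action_table),
	.actions	= example_pd_action_table,
};

static const struct uverbs_type_actions_group *example_pd_action_groups[] = {
	&example_pd_actions,	/* a vendor action group could follow here */
};

static const struct uverbs_type example_pd_type = {
	.num_groups	= ARRAY_SIZE(example_pd_action_groups),
	.action_groups	= example_pd_action_groups,
	.alloc		= &example_pd_alloc,
	.dist		= ib_uverbs_std_dist,	/* upper bit: common vs. vendor */
};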

Each action specifies a handler, which could be either a standard
command or a driver specific command.
Along with the handler, a group of attributes is specified as well.
This group lists all supported attributes and is used to automatically
fetch and validate the command, the response and their related objects.

When a group of elements is used, a distribution function is also
specified at the higher level (i.e. a type specifies a distribution
function for the actions it contains). The distribution function
chooses which group should be used and maps the element from the
kABI language (which is driver specific) to the common kernel
language. The standard distribution function just uses the upper
bit for that.

A group of attributes is actually an array of attributes. Each
attribute has a type (PTR_IN, PTR_OUT, IDR and in the future maybe
FD, which could be used for the completion channel) and a length.
Future work here might add validation of mandatory attributes
(i.e. make sure a specific attribute was given).
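
From user space, packing such attributes into the proposed header could
look roughly like the following sketch (illustrative only: the
object_type, action and attr_id values are made up, and resp is assumed
to point at a response buffer of resp_len bytes):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <rdma/rdma_user_ioctl.h>

static int example_ioctl(int cmd_fd, void *resp, __u16 resp_len)
{
	union {
		struct ib_uverbs_ioctl_hdr hdr;
		char buf[sizeof(struct ib_uverbs_ioctl_hdr) +
			 sizeof(struct ib_uverbs_attr)];
	} cmd;

	memset(&cmd, 0, sizeof(cmd));	/* flags and reserved must be zero */
	cmd.hdr.length = sizeof(cmd);
	cmd.hdr.object_type = 0;	/* made-up type id */
	cmd.hdr.driver_id = 0;		/* must match the device's driver_id */
	cmd.hdr.action = 0;		/* made-up action id */
	cmd.hdr.num_attrs = 1;

	cmd.hdr.attrs[0].attr_id = 0;	/* made-up attribute id (PTR_OUT) */
	cmd.hdr.attrs[0].len = resp_len;
	cmd.hdr.attrs[0].ptr_idr = (__u64)(uintptr_t)resp;

	return ioctl(cmd_fd, RDMA_VERBS_IOCTL, &cmd);
}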

If an IDR/fd attribute is specified, the kernel spec also states the
object type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning the infrastructure will fail concurrent commands when at
least one of them requires exclusive access (WRITE/DESTROY),
synchronize actions with device removal (disassociate context events)
and take care of reference counting (increase/decrease) for
concurrently invoked actions. The reference counts on the actual
kernel objects shall be handled by the handlers.
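
On the kernel side, such an IDR attribute entry boils down to a
uverbs_attr_spec stating the object type and access. A raw sketch
without helper macros (UVERBS_TYPE_CQ is the common type id added later
in this series; where this entry sits in its array is arbitrary):

#include <rdma/uverbs_ioctl.h>
#include <rdma/uverbs_ioctl_cmd.h>

static const struct uverbs_attr_spec example_cq_idr_attr = {
	.type	= UVERBS_ATTR_TYPE_IDR,
	.obj	= {
		.obj_type	= UVERBS_TYPE_CQ,
		.access		= UVERBS_IDR_ACCESS_READ,
	},
};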

 types
+--------+
|        |
|        |   actions                                                                +--------+
|        |   group      action      action_spec                           +-----+   |len     |
+--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
| type   +> |action+-> | spec  +-> +  attr_groups   +-> |common_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |vendor chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

[d] = distribution function used

The right types table is also chosen by using a distribution function
over uverbs_types_groups.
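
A sketch of that level of the hierarchy (again illustrative only, with
placeholder example_* names) could assemble the per-device table like
this and hang it off ib_dev->types_group:

#include <rdma/uverbs_ioctl.h>
#include <rdma/uverbs_ioctl_cmd.h>

/* Indexed by type id; filled with pointers to struct uverbs_type entries. */
static const struct uverbs_type *example_common_type_table[1];
static const struct uverbs_type *example_vendor_type_table[1];

static const struct uverbs_types example_common_types = {
	.num_types	= ARRAY_SIZE(example_common_type_table),
	.types		= example_common_type_table,
};

static const struct uverbs_types example_vendor_types = {
	.num_types	= ARRAY_SIZE(example_vendor_type_table),
	.types		= example_vendor_type_table,
};

static const struct uverbs_types *example_type_groups[] = {
	&example_common_types,
	&example_vendor_types,	/* chosen when the type id's upper bit is set */
};

static const struct uverbs_types_group example_types_group = {
	.type_groups	= example_type_groups,
	.num_groups	= ARRAY_SIZE(example_type_groups),
	.dist		= ib_uverbs_std_dist,
};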

Once validation and object fetching (or creation) have completed, we
call the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *ctx, size_t num, void *priv);

Here ctx is an array of uverbs_attr_array. Each element in this array
is an array of attributes which corresponds to one group of attributes.
For example, in the usual case:

 ctx                               core
+----------------------------+     +------------+
| core: uverbs_attr_array    +---> | valid      |
+----------------------------+     | cmd_attr   |
| driver: uverbs_attr_array  |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  vendor
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+

The ctx array's indices correspond to the order of the attribute
groups. The indices within core and driver correspond to the attribute
namespaces of each group. Thus, we could think of the following as one
object:
1. A set of attribute specifications (with their attribute IDs)
2. The attribute group which owns the specifications of (1)
3. A function which could handle these attributes and which the
   handler could call
4. The allocation descriptor of this type, uverbs_type_alloc_action.

That means that core and driver are the validated parameters of
function (3), and their types exist in that function's namespace.

Since the frequently used case is a group of common and vendor
specific elements, we propose a wrapper that allows defining the
following handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *common,
               struct uverbs_attr_array *vendor,
               void *priv);

Upon success, the reference counts and use counts of uobjects will be
updated automatically according to the specification.
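
A handler written against this wrapper then only looks at the
pre-validated attribute arrays. A minimal sketch, assuming index 0 is an
IDR attribute and index 1 a PTR_OUT attribute (these indices and the
example_* name are not defined by this series):

#include <linux/uaccess.h>
#include <rdma/ib_verbs.h>
#include <rdma/uverbs_ioctl.h>

static int example_handler(struct ib_device *ib_dev,
			   struct ib_ucontext *ucontext,
			   struct uverbs_attr_array *common,
			   struct uverbs_attr_array *vendor,
			   void *priv)
{
	struct ib_uobject *uobj;
	u64 resp = 0;

	/* Object attributes were already fetched and locked by the core. */
	if (!common->attrs[0].valid)
		return -EINVAL;
	uobj = common->attrs[0].obj_attr.uobject;

	/* ... operate on uobj->object ... */

	/* PTR_OUT attributes are copied back explicitly by the handler. */
	if (common->num_attrs > 1 && common->attrs[1].valid &&
	    copy_to_user(common->attrs[1].cmd_attr.ptr, &resp, sizeof(resp)))
		return -EFAULT;

	return 0;
}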

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile           |   2 +-
 drivers/infiniband/core/device.c           |   3 +
 drivers/infiniband/core/rdma_core.c        |  16 ++
 drivers/infiniband/core/rdma_core.h        |   2 +
 drivers/infiniband/core/uverbs.h           |   4 +
 drivers/infiniband/core/uverbs_cmd.c       |   1 +
 drivers/infiniband/core/uverbs_ioctl.c     | 306 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c |  77 ++++++++
 drivers/infiniband/core/uverbs_main.c      |   6 +
 include/rdma/ib_verbs.h                    |   3 +-
 include/rdma/uverbs_ioctl_cmd.h            |  70 +++++++
 include/uapi/rdma/rdma_user_ioctl.h        |  23 +++
 12 files changed, 511 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 1819623..769a299 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -29,4 +29,4 @@ ib_umad-y :=			user_mad.o
 ib_ucm-y :=			ucm.o
 
 ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
-				rdma_core.o
+				rdma_core.o uverbs_ioctl.o uverbs_ioctl_cmd.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 43994b1..c0c6365 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -245,6 +245,9 @@ struct ib_device *ib_alloc_device(size_t size)
 	INIT_LIST_HEAD(&device->port_list);
 	INIT_LIST_HEAD(&device->type_list);
 
+	/* TODO: don't forget to initialize device->driver_id, so that the verbs
+	 * handshake between user space and kernel space works for driver_id
+	 * values other than 0.
+	 */
 	return device;
 }
 EXPORT_SYMBOL(ib_alloc_device);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 337abc2..bf8e741 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -55,6 +55,22 @@ const struct uverbs_type *uverbs_get_type(const struct ib_device *ibdev,
 	return types->types[type];
 }
 
+const struct uverbs_action *uverbs_get_action(const struct uverbs_type *type,
+					      uint16_t action)
+{
+	const struct uverbs_type_actions_group *actions_group;
+	int ret = type->dist(&action, type->priv);
+
+	if (ret >= type->num_groups)
+		return NULL;
+
+	actions_group = type->action_groups[ret];
+	if (action >= actions_group->num_actions)
+		return NULL;
+
+	return actions_group->actions[action];
+}
+
 static int uverbs_lock_object(struct ib_uobject *uobj,
 			      enum uverbs_idr_access access)
 {
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index 8990115..f3661ef 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -44,6 +44,8 @@
 
 const struct uverbs_type *uverbs_get_type(const struct ib_device *ibdev,
 					  uint16_t type);
+const struct uverbs_action *uverbs_get_action(const struct uverbs_type *type,
+					      uint16_t action);
 struct ib_uobject *uverbs_get_type_from_idr(const struct uverbs_type_alloc_action *type,
 					    struct ib_ucontext *ucontext,
 					    enum uverbs_idr_access access,
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index ae7d4b8..d78c060 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -41,6 +41,7 @@
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/cdev.h>
+#include <linux/rwsem.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -83,6 +84,8 @@
  * released when the CQ is destroyed.
  */
 
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+
 struct ib_uverbs_device {
 	atomic_t				refcount;
 	int					num_comp_vectors;
@@ -122,6 +125,7 @@ struct ib_uverbs_file {
 	struct ib_uverbs_event_file	       *async_file;
 	struct list_head			list;
 	int					is_closed;
+	struct rw_semaphore			close_sem;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 84daf2c..ac17b9c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -386,6 +386,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	}
 
 	file->ucontext = ucontext;
+	ucontext->ufile = file;
 
 	fd_install(resp.async_fd, filp);
 
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
new file mode 100644
index 0000000..9d56b17
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/rdma_user_ioctl.h>
+#include <rdma/uverbs_ioctl.h>
+#include "rdma_core.h"
+#include "uverbs.h"
+
+static int uverbs_validate_attr(struct ib_device *ibdev,
+				struct ib_ucontext *ucontext,
+				const struct ib_uverbs_attr *uattr,
+				u16 attr_id,
+				const struct uverbs_attr_group_spec *group_spec,
+				struct uverbs_attr *elements,
+				struct ib_uverbs_attr __user *uattr_ptr)
+{
+	const struct uverbs_attr_spec *spec;
+	struct uverbs_attr *e;
+	const struct uverbs_type *type;
+	struct uverbs_obj_attr *o_attr;
+
+	if (uattr->reserved)
+		return -EINVAL;
+
+	if (attr_id >= group_spec->num_attrs)
+		return -EINVAL;
+
+	spec = &group_spec->attrs[attr_id];
+	e = &elements[attr_id];
+
+	if (e->valid)
+		return -EINVAL;
+
+	switch (spec->type) {
+	case UVERBS_ATTR_TYPE_PTR_IN:
+	case UVERBS_ATTR_TYPE_PTR_OUT:
+		/* If spec->len is zero, don't validate (flexible) */
+		if (spec->len && uattr->len != spec->len)
+			return -EINVAL;
+		e->cmd_attr.ptr = (void __user *)(unsigned long)uattr->ptr_idr;
+		e->cmd_attr.len = uattr->len;
+		break;
+
+	case UVERBS_ATTR_TYPE_IDR:
+	case UVERBS_ATTR_TYPE_FD:
+		if (uattr->len != 0 || (uattr->ptr_idr >> 32) || (!ucontext))
+			return -EINVAL;
+
+		o_attr = &e->obj_attr;
+		o_attr->val = spec;
+		type = uverbs_get_type(ibdev, spec->obj.obj_type);
+		if (!type)
+			return -EINVAL;
+		o_attr->type = type->alloc;
+		o_attr->uattr = uattr_ptr;
+
+		if (spec->type == UVERBS_ATTR_TYPE_IDR) {
+			o_attr->uobj.idr = (uint32_t)uattr->ptr_idr;
+			o_attr->uobject = uverbs_get_type_from_idr(o_attr->type,
+								   ucontext,
+								   spec->obj.access,
+								   o_attr->uobj.idr);
+		} else {
+			o_attr->fd.fd = (int)uattr->ptr_idr;
+			o_attr->uobject = uverbs_get_type_from_fd(o_attr->type,
+								  ucontext,
+								  spec->obj.access,
+								  o_attr->fd.fd);
+		}
+
+		if (IS_ERR(o_attr->uobject))
+			return -EINVAL;
+
+		if (spec->obj.access == UVERBS_IDR_ACCESS_NEW) {
+			__u64 idr = o_attr->uobject->id;
+
+			if (put_user(idr, &o_attr->uattr->ptr_idr)) {
+				uverbs_unlock_object(o_attr->uobject,
+						     UVERBS_IDR_ACCESS_NEW,
+						     false);
+				return -EFAULT;
+			}
+		}
+
+		break;
+	}
+
+	e->valid = 1;
+	return 0;
+}
+
+static int uverbs_validate(struct ib_device *ibdev,
+			   struct ib_ucontext *ucontext,
+			   const struct ib_uverbs_attr *uattrs,
+			   size_t num_attrs,
+			   const struct uverbs_action_spec *action_spec,
+			   struct uverbs_attr_array *attr_array,
+			   struct ib_uverbs_attr __user *uattr_ptr)
+{
+	size_t i;
+	int ret;
+	int n_val = -1;
+
+	for (i = 0; i < num_attrs; i++) {
+		const struct ib_uverbs_attr *uattr = &uattrs[i];
+		__u16 attr_id = uattr->attr_id;
+		const struct uverbs_attr_group_spec *group_spec;
+
+		ret = action_spec->dist(&attr_id, action_spec->priv);
+		if (ret < 0)
+			return ret;
+
+		if (ret > n_val)
+			n_val = ret;
+
+		group_spec = action_spec->attr_groups[ret];
+		ret = uverbs_validate_attr(ibdev, ucontext, uattr, attr_id,
+					   group_spec, attr_array[ret].attrs,
+					   uattr_ptr++);
+		if (ret)
+			return ret;
+	}
+
+	return n_val >= 0 ? n_val + 1 : n_val;
+}
+
+static int uverbs_handle_action(struct ib_uverbs_attr __user *uattr_ptr,
+				const struct ib_uverbs_attr *uattrs,
+				size_t num_attrs,
+				struct ib_device *ibdev,
+				struct ib_uverbs_file *ufile,
+				const struct uverbs_action *handler,
+				struct uverbs_attr_array *attr_array)
+{
+	int ret;
+	int n_val;
+
+	n_val = uverbs_validate(ibdev, ufile->ucontext, uattrs, num_attrs,
+				&handler->spec, attr_array, uattr_ptr);
+	if (n_val <= 0)
+		return n_val;
+
+	ret = handler->handler(ibdev, ufile, attr_array, n_val,
+			       handler->priv);
+	uverbs_unlock_objects(attr_array, n_val, &handler->spec, !ret);
+
+	return ret;
+}
+
+static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
+				struct ib_uverbs_file *file,
+				struct ib_uverbs_ioctl_hdr *hdr,
+				void __user *buf)
+{
+	const struct uverbs_type *type;
+	const struct uverbs_action *action;
+	const struct uverbs_action_spec *action_spec;
+	long err = 0;
+	unsigned int num_specs = 0;
+	unsigned int i;
+	struct {
+		struct ib_uverbs_attr		*uattrs;
+		struct uverbs_attr_array	*uverbs_attr_array;
+	} *ctx = NULL;
+	struct uverbs_attr *curr_attr;
+	size_t ctx_size;
+
+	if (!ib_dev)
+		return -EIO;
+
+	if (ib_dev->driver_id != hdr->driver_id)
+		return -EINVAL;
+
+	type = uverbs_get_type(ib_dev, hdr->object_type);
+	if (!type)
+		return -EOPNOTSUPP;
+
+	action = uverbs_get_action(type, hdr->action);
+	if (!action)
+		return -EOPNOTSUPP;
+
+	action_spec = &action->spec;
+	for (i = 0; i < action_spec->num_groups;
+	     num_specs += action_spec->attr_groups[i]->num_attrs, i++)
+		;
+
+	ctx_size = sizeof(*ctx->uattrs) * hdr->num_attrs +
+		   sizeof(*ctx->uverbs_attr_array->attrs) * num_specs +
+		   sizeof(struct uverbs_attr_array) * action_spec->num_groups +
+		   sizeof(*ctx);
+
+	ctx = kzalloc(ctx_size, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->uverbs_attr_array = (void *)ctx + sizeof(*ctx);
+	ctx->uattrs = (void *)(ctx->uverbs_attr_array +
+			       action_spec->num_groups);
+	curr_attr = (void *)(ctx->uattrs + hdr->num_attrs);
+	for (i = 0; i < action_spec->num_groups; i++) {
+		ctx->uverbs_attr_array[i].attrs = curr_attr;
+		ctx->uverbs_attr_array[i].num_attrs =
+			action_spec->attr_groups[i]->num_attrs;
+		curr_attr += action_spec->attr_groups[i]->num_attrs;
+	}
+
+	err = copy_from_user(ctx->uattrs, buf,
+			     sizeof(*ctx->uattrs) * hdr->num_attrs);
+	if (err) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	err = uverbs_handle_action(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
+				   file, action, ctx->uverbs_attr_array);
+out:
+	kfree(ctx);
+	return err;
+}
+
+#define IB_UVERBS_MAX_CMD_SZ 4096
+
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct ib_uverbs_file *file = filp->private_data;
+	struct ib_uverbs_ioctl_hdr __user *user_hdr =
+		(struct ib_uverbs_ioctl_hdr __user *)arg;
+	struct ib_uverbs_ioctl_hdr hdr;
+	struct ib_device *ib_dev;
+	int srcu_key;
+	long err;
+
+	srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
+	ib_dev = srcu_dereference(file->device->ib_dev,
+				  &file->device->disassociate_srcu);
+	if (!ib_dev) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (cmd == RDMA_DIRECT_IOCTL) {
+		/* TODO? */
+		err = -ENOSYS;
+		goto out;
+	} else {
+		if (cmd != RDMA_VERBS_IOCTL) {
+			err = -ENOIOCTLCMD;
+			goto out;
+		}
+
+		err = copy_from_user(&hdr, user_hdr, sizeof(hdr));
+
+		if (err || hdr.length > IB_UVERBS_MAX_CMD_SZ ||
+		    hdr.length <= sizeof(hdr) ||
+		    hdr.length != sizeof(hdr) + hdr.num_attrs * sizeof(struct ib_uverbs_attr)) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* currently there are no flags supported */
+		if (hdr.flags) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
+		/* We're closing, fail all commands */
+		if (!down_read_trylock(&file->close_sem)) {
+			err = -EIO;
+			goto out;
+		}
+		err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr,
+					  (void __user *)arg + sizeof(hdr));
+		up_read(&file->close_sem);
+	}
+out:
+	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
+
+	return err;
+}
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
new file mode 100644
index 0000000..cde86b9
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/uverbs_ioctl_cmd.h>
+#include <rdma/ib_verbs.h>
+#include <linux/bug.h>
+#include "uverbs.h"
+
+int ib_uverbs_std_dist(__u16 *id, void *priv)
+{
+	if (*id & IB_UVERBS_VENDOR_FLAG) {
+		*id &= ~IB_UVERBS_VENDOR_FLAG;
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(ib_uverbs_std_dist);
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_uverbs_file *ufile,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv)
+{
+	struct uverbs_action_std_handler *priv = _priv;
+
+	if (!ufile->ucontext)
+		return -EINVAL;
+
+	WARN_ON((num != 1) && (num != 2));
+	return priv->handler(ib_dev, ufile->ucontext, &ctx[0],
+			     (num == 2 ? &ctx[1] : NULL),
+			     priv->priv);
+}
+EXPORT_SYMBOL(uverbs_action_std_handle);
+
+int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
+				 struct ib_uverbs_file *ufile,
+				 struct uverbs_attr_array *ctx, size_t num,
+				 void *_priv)
+{
+	struct uverbs_action_std_ctx_handler *priv = _priv;
+
+	WARN_ON((num != 1) && (num != 2));
+	return priv->handler(ib_dev, ufile, &ctx[0],
+			     (num == 2 ? &ctx[1] : NULL), priv->priv);
+}
+EXPORT_SYMBOL(uverbs_action_std_ctx_handle);
+
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e63357a..4bb2fc6 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -49,6 +49,7 @@
 #include <asm/uaccess.h>
 
 #include <rdma/ib.h>
+#include <rdma/rdma_user_ioctl.h>
 
 #include "uverbs.h"
 
@@ -216,6 +217,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 {
 	struct ib_uobject *uobj, *tmp;
 
+	down_write(&file->close_sem);
 	context->closing = 1;
 
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
@@ -332,6 +334,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	}
 
 	put_pid(context->tgid);
+	up_write(&file->close_sem);
 
 	return context->device->dealloc_ucontext(context);
 }
@@ -951,6 +954,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 		goto err;
 	}
 
+	init_rwsem(&file->close_sem);
 	file->device	 = dev;
 	file->ucontext	 = NULL;
 	file->async_file = NULL;
@@ -1012,6 +1016,7 @@ static const struct file_operations uverbs_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static const struct file_operations uverbs_mmap_fops = {
@@ -1021,6 +1026,7 @@ static const struct file_operations uverbs_mmap_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static struct ib_client uverbs_client = {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7240615..2801367 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2122,7 +2122,8 @@ struct ib_device {
 	void (*get_dev_fw_str)(struct ib_device *, char *str, size_t str_len);
 	struct list_head type_list;
 
-	const struct uverbs_types_group	*types_group;
+	u16					driver_id;
+	const struct uverbs_types_group		*types_group;
 };
 
 struct ib_client {
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
new file mode 100644
index 0000000..728389e
--- /dev/null
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_CMD_
+#define _UVERBS_IOCTL_CMD_
+
+#include <rdma/uverbs_ioctl.h>
+
+#define IB_UVERBS_VENDOR_FLAG	0x8000
+
+int ib_uverbs_std_dist(__u16 *attr_id, void *priv);
+
+/* common validators */
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_uverbs_file *ufile,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv);
+int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
+				 struct ib_uverbs_file *ufile,
+				 struct uverbs_attr_array *ctx, size_t num,
+				 void *_priv);
+
+struct uverbs_action_std_handler {
+	int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+	void *priv;
+};
+
+struct uverbs_action_std_ctx_handler {
+	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+	void *priv;
+};
+
+#endif
+
diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
index 9388125..5a45518 100644
--- a/include/uapi/rdma/rdma_user_ioctl.h
+++ b/include/uapi/rdma/rdma_user_ioctl.h
@@ -43,6 +43,29 @@
 /* Legacy name, for user space application which already use it */
 #define IB_IOCTL_MAGIC		RDMA_IOCTL_MAGIC
 
+#define RDMA_VERBS_IOCTL \
+	_IOWR(RDMA_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
+
+#define RDMA_DIRECT_IOCTL \
+	_IOWR(RDMA_IOCTL_MAGIC, 2, struct ib_uverbs_ioctl_hdr)
+
+struct ib_uverbs_attr {
+	__u16 attr_id;		/* command specific type attribute */
+	__u16 len;		/* NA for idr */
+	__u32 reserved;
+	__u64 ptr_idr;		/* ptr to command / idr handle */
+};
+
+struct ib_uverbs_ioctl_hdr {
+	__u16 length;
+	__u16 flags;
+	__u16 object_type;
+	__u16 driver_id;
+	__u16 action;
+	__u16 num_attrs;
+	struct ib_uverbs_attr  attrs[0];
+};
+
 /*
  * General blocks assignments
  * It is closed on purpose do not expose it it user space
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC ABI V5 04/10] RDMA/core: Add initialize and cleanup of common types
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 03/10] RDMA/core: Add new ioctl interface Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 05/10] RDMA/core: Add uverbs types, actions, handlers and attributes Matan Barak
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

The new ABI infrastructure introduces per-driver types. Since current
drivers use common types, we mimic the current ucontext release
process using the new infrastructure.
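
To illustrate how the old, hard-coded destruction order maps onto the
per-type descriptors (a sketch only: the order values are invented and
just their relative order matters, e.g. MWs still go before QPs to
support type 2A MWs; obj_size is assumed to be the allocation size of
the containing uobject):

#include <rdma/uverbs_ioctl.h>
#include <rdma/uverbs_ioctl_cmd.h>
#include "uverbs.h"	/* for struct ib_uqp_object */

static const struct uverbs_type_alloc_action example_mw_alloc = {
	.type		= UVERBS_ATTR_TYPE_IDR,
	.order		= 2,
	.obj_size	= sizeof(struct ib_uobject),
	.free_fn	= uverbs_free_mw,
};

static const struct uverbs_type_alloc_action example_qp_alloc = {
	.type		= UVERBS_ATTR_TYPE_IDR,
	.order		= 3,
	.obj_size	= sizeof(struct ib_uqp_object),
	.free_fn	= uverbs_free_qp,
};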

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h           |   9 +-
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 128 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      | 123 ++-------------------------
 include/rdma/uverbs_ioctl_cmd.h            |  42 ++++++++++
 4 files changed, 182 insertions(+), 120 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index d78c060..fa37c2a 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -179,8 +179,6 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
-void idr_remove_uobj(struct ib_uobject *uobj);
-
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
 					int is_async);
@@ -204,6 +202,13 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);
 
 int uverbs_dealloc_mw(struct ib_mw *mw);
+void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
+			   struct ib_uverbs_event_file *ev_file,
+			   struct ib_ucq_object *uobj);
+void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
+			      struct ib_uevent_object *uobj);
+void ib_uverbs_detach_umcast(struct ib_qp *qp,
+			     struct ib_uqp_object *uobj);
 
 struct ib_uverbs_flow_spec {
 	union {
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index cde86b9..9ec76d9 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -31,8 +31,11 @@
  */
 
 #include <rdma/uverbs_ioctl_cmd.h>
+#include <rdma/ib_user_verbs.h>
 #include <rdma/ib_verbs.h>
 #include <linux/bug.h>
+#include <linux/file.h>
+#include "rdma_core.h"
 #include "uverbs.h"
 
 int ib_uverbs_std_dist(__u16 *id, void *priv)
@@ -75,3 +78,128 @@ int uverbs_action_std_ctx_handle(struct ib_device *ib_dev,
 }
 EXPORT_SYMBOL(uverbs_action_std_ctx_handle);
 
+void uverbs_free_ah(const struct uverbs_type_alloc_action *uobject_type,
+		    struct ib_uobject *uobject)
+{
+	ib_destroy_ah((struct ib_ah *)uobject->object);
+}
+EXPORT_SYMBOL(uverbs_free_ah);
+
+void uverbs_free_flow(const struct uverbs_type_alloc_action *type_alloc_action,
+		      struct ib_uobject *uobject)
+{
+	ib_destroy_flow((struct ib_flow *)uobject->object);
+}
+EXPORT_SYMBOL(uverbs_free_flow);
+
+void uverbs_free_mw(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	uverbs_dealloc_mw((struct ib_mw *)uobject->object);
+}
+EXPORT_SYMBOL(uverbs_free_mw);
+
+void uverbs_free_qp(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	struct ib_qp *qp = uobject->object;
+	struct ib_uqp_object *uqp =
+		container_of(uobject, struct ib_uqp_object, uevent.uobject);
+
+	if (qp != qp->real_qp) {
+		ib_close_qp(qp);
+	} else {
+		ib_uverbs_detach_umcast(qp, uqp);
+		ib_destroy_qp(qp);
+	}
+	ib_uverbs_release_uevent(uobject->context->ufile, &uqp->uevent);
+}
+EXPORT_SYMBOL(uverbs_free_qp);
+
+void uverbs_free_rwq_ind_tbl(const struct uverbs_type_alloc_action *type_alloc_action,
+			     struct ib_uobject *uobject)
+{
+	struct ib_rwq_ind_table *rwq_ind_tbl = uobject->object;
+	struct ib_wq **ind_tbl = rwq_ind_tbl->ind_tbl;
+
+	ib_destroy_rwq_ind_table(rwq_ind_tbl);
+	kfree(ind_tbl);
+}
+EXPORT_SYMBOL(uverbs_free_rwq_ind_tbl);
+
+void uverbs_free_wq(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	struct ib_wq *wq = uobject->object;
+	struct ib_uwq_object *uwq =
+		container_of(uobject, struct ib_uwq_object, uevent.uobject);
+
+	ib_destroy_wq(wq);
+	ib_uverbs_release_uevent(uobject->context->ufile, &uwq->uevent);
+}
+EXPORT_SYMBOL(uverbs_free_wq);
+
+void uverbs_free_srq(const struct uverbs_type_alloc_action *type_alloc_action,
+		     struct ib_uobject *uobject)
+{
+	struct ib_srq *srq = uobject->object;
+	struct ib_uevent_object *uevent =
+		container_of(uobject, struct ib_uevent_object, uobject);
+
+	ib_destroy_srq(srq);
+	ib_uverbs_release_uevent(uobject->context->ufile, uevent);
+}
+EXPORT_SYMBOL(uverbs_free_srq);
+
+void uverbs_free_cq(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	struct ib_cq *cq = uobject->object;
+	struct ib_uverbs_event_file *ev_file = cq->cq_context;
+	struct ib_ucq_object *ucq =
+		container_of(uobject, struct ib_ucq_object, uobject);
+
+	ib_destroy_cq(cq);
+	ib_uverbs_release_ucq(uobject->context->ufile, ev_file, ucq);
+}
+EXPORT_SYMBOL(uverbs_free_cq);
+
+void uverbs_free_mr(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	ib_dereg_mr((struct ib_mr *)uobject->object);
+}
+EXPORT_SYMBOL(uverbs_free_mr);
+
+void uverbs_free_xrcd(const struct uverbs_type_alloc_action *type_alloc_action,
+		      struct ib_uobject *uobject)
+{
+	struct ib_xrcd *xrcd = uobject->object;
+
+	mutex_lock(&uobject->context->ufile->device->xrcd_tree_mutex);
+	ib_uverbs_dealloc_xrcd(uobject->context->ufile->device, xrcd);
+	mutex_unlock(&uobject->context->ufile->device->xrcd_tree_mutex);
+}
+EXPORT_SYMBOL(uverbs_free_xrcd);
+
+void uverbs_free_pd(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject)
+{
+	ib_dealloc_pd((struct ib_pd *)uobject->object);
+}
+EXPORT_SYMBOL(uverbs_free_pd);
+
+void uverbs_free_event_file(const struct uverbs_type_alloc_action *type_alloc_action,
+			    struct ib_uobject *uobject)
+{
+	struct ib_uverbs_event_file *event_file = (void *)(uobject + 1);
+
+	spin_lock_irq(&event_file->lock);
+	event_file->is_closed = 1;
+	spin_unlock_irq(&event_file->lock);
+
+	wake_up_interruptible(&event_file->poll_wait);
+	kill_fasync(&event_file->async_queue, SIGIO, POLL_IN);
+}
+EXPORT_SYMBOL(uverbs_free_event_file);
+
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 4bb2fc6..ff43ff4 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -52,6 +52,7 @@
 #include <rdma/rdma_user_ioctl.h>
 
 #include "uverbs.h"
+#include "rdma_core.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand userspace verbs access");
@@ -200,8 +201,8 @@ void ib_uverbs_release_uevent(struct ib_uverbs_file *file,
 	spin_unlock_irq(&file->async_file->lock);
 }
 
-static void ib_uverbs_detach_umcast(struct ib_qp *qp,
-				    struct ib_uqp_object *uobj)
+void ib_uverbs_detach_umcast(struct ib_qp *qp,
+			     struct ib_uqp_object *uobj)
 {
 	struct ib_uverbs_mcast_entry *mcast, *tmp;
 
@@ -215,124 +216,10 @@ static void ib_uverbs_detach_umcast(struct ib_qp *qp,
 static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 				      struct ib_ucontext *context)
 {
-	struct ib_uobject *uobj, *tmp;
-
 	down_write(&file->close_sem);
 	context->closing = 1;
-
-	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
-		struct ib_ah *ah = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_destroy_ah(ah);
-		kfree(uobj);
-	}
-
-	/* Remove MWs before QPs, in order to support type 2A MWs. */
-	list_for_each_entry_safe(uobj, tmp, &context->mw_list, list) {
-		struct ib_mw *mw = uobj->object;
-
-		idr_remove_uobj(uobj);
-		uverbs_dealloc_mw(mw);
-		kfree(uobj);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
-		struct ib_flow *flow_id = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_destroy_flow(flow_id);
-		kfree(uobj);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->qp_list, list) {
-		struct ib_qp *qp = uobj->object;
-		struct ib_uqp_object *uqp =
-			container_of(uobj, struct ib_uqp_object, uevent.uobject);
-
-		idr_remove_uobj(uobj);
-		if (qp != qp->real_qp) {
-			ib_close_qp(qp);
-		} else {
-			ib_uverbs_detach_umcast(qp, uqp);
-			ib_destroy_qp(qp);
-		}
-		ib_uverbs_release_uevent(file, &uqp->uevent);
-		kfree(uqp);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->rwq_ind_tbl_list, list) {
-		struct ib_rwq_ind_table *rwq_ind_tbl = uobj->object;
-		struct ib_wq **ind_tbl = rwq_ind_tbl->ind_tbl;
-
-		idr_remove_uobj(uobj);
-		ib_destroy_rwq_ind_table(rwq_ind_tbl);
-		kfree(ind_tbl);
-		kfree(uobj);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->wq_list, list) {
-		struct ib_wq *wq = uobj->object;
-		struct ib_uwq_object *uwq =
-			container_of(uobj, struct ib_uwq_object, uevent.uobject);
-
-		idr_remove_uobj(uobj);
-		ib_destroy_wq(wq);
-		ib_uverbs_release_uevent(file, &uwq->uevent);
-		kfree(uwq);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) {
-		struct ib_srq *srq = uobj->object;
-		struct ib_uevent_object *uevent =
-			container_of(uobj, struct ib_uevent_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_destroy_srq(srq);
-		ib_uverbs_release_uevent(file, uevent);
-		kfree(uevent);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->cq_list, list) {
-		struct ib_cq *cq = uobj->object;
-		struct ib_uverbs_event_file *ev_file = cq->cq_context;
-		struct ib_ucq_object *ucq =
-			container_of(uobj, struct ib_ucq_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_destroy_cq(cq);
-		ib_uverbs_release_ucq(file, ev_file, ucq);
-		kfree(ucq);
-	}
-
-	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
-		struct ib_mr *mr = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_dereg_mr(mr);
-		kfree(uobj);
-	}
-
-	mutex_lock(&file->device->xrcd_tree_mutex);
-	list_for_each_entry_safe(uobj, tmp, &context->xrcd_list, list) {
-		struct ib_xrcd *xrcd = uobj->object;
-		struct ib_uxrcd_object *uxrcd =
-			container_of(uobj, struct ib_uxrcd_object, uobject);
-
-		idr_remove_uobj(uobj);
-		ib_uverbs_dealloc_xrcd(file->device, xrcd);
-		kfree(uxrcd);
-	}
-	mutex_unlock(&file->device->xrcd_tree_mutex);
-
-	list_for_each_entry_safe(uobj, tmp, &context->pd_list, list) {
-		struct ib_pd *pd = uobj->object;
-
-		idr_remove_uobj(uobj);
-		ib_dealloc_pd(pd);
-		kfree(uobj);
-	}
-
+	ib_uverbs_uobject_type_cleanup_ucontext(context,
+						context->device->types_group);
 	put_pid(context->tgid);
 	up_write(&file->close_sem);
 
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index 728389e..e0299a2 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -66,5 +66,47 @@ struct uverbs_action_std_ctx_handler {
 	void *priv;
 };
 
+void uverbs_free_ah(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_flow(const struct uverbs_type_alloc_action *type_alloc_action,
+		      struct ib_uobject *uobject);
+void uverbs_free_mw(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_qp(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_rwq_ind_tbl(const struct uverbs_type_alloc_action *type_alloc_action,
+			     struct ib_uobject *uobject);
+void uverbs_free_wq(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_srq(const struct uverbs_type_alloc_action *type_alloc_action,
+		     struct ib_uobject *uobject);
+void uverbs_free_cq(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_mr(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_xrcd(const struct uverbs_type_alloc_action *type_alloc_action,
+		      struct ib_uobject *uobject);
+void uverbs_free_pd(const struct uverbs_type_alloc_action *type_alloc_action,
+		    struct ib_uobject *uobject);
+void uverbs_free_event_file(const struct uverbs_type_alloc_action *type_alloc_action,
+			    struct ib_uobject *uobject);
+
+enum uverbs_common_types {
+	UVERBS_TYPE_DEVICE, /* Don't use IDRs here */
+	UVERBS_TYPE_PD,
+	UVERBS_TYPE_COMP_CHANNEL,
+	UVERBS_TYPE_CQ,
+	UVERBS_TYPE_QP,
+	UVERBS_TYPE_SRQ,
+	UVERBS_TYPE_AH,
+	UVERBS_TYPE_MR,
+	UVERBS_TYPE_MW,
+	UVERBS_TYPE_FLOW,
+	UVERBS_TYPE_XRCD,
+	UVERBS_TYPE_RWQ_IND_TBL,
+	UVERBS_TYPE_WQ,
+	UVERBS_TYPE_LAST,
+};
+
 #endif
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC ABI V5 05/10] RDMA/core: Add uverbs types, actions, handlers and attributes
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 04/10] RDMA/core: Add initialize and cleanup of common types Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 06/10] IB/mlx5: Implement common uverb objects Matan Barak
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

We add the common (core) code for init context, query device,
reg_mr, create_cq, create_comp_channel and init_pd.
This includes the following parts:
* Macros for defining commands and validators
* For each command
    * type declarations
          - destruction order
          - free function
          - uverbs action group
    * actions
    * handlers
    * attributes

Drivers could use these attributes, actions or types when they want to
alter or add a new type. They could use the uverbs handler directly in
the action (or just wrap it in the driver's custom code).

Currently we use ib_udata to pass vendor specific information to the
driver. This should probably be refactored in the future.
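
As a rough illustration of what a declaration built from these macros
looks like (mirroring the usage added below; the EXAMPLE_ALLOC_PD_RESP
id and the spec name are placeholders, ib_uverbs_alloc_pd_resp is the
existing uapi response struct):

#include <rdma/ib_user_verbs.h>
#include <rdma/uverbs_ioctl_cmd.h>

enum {
	EXAMPLE_ALLOC_PD_RESP,
};

DECLARE_UVERBS_ATTR_SPEC(
	example_alloc_pd_spec,
	UVERBS_ATTR_PTR_OUT(EXAMPLE_ALLOC_PD_RESP,
			    sizeof(struct ib_uverbs_alloc_pd_resp)));

/*
 * The matching action is then declared in the same way as
 * uverbs_action_get_context below, pairing a handler with this spec and
 * with uverbs_uhw_compat_spec for the driver-specific data.
 */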

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h           |   6 +
 drivers/infiniband/core/uverbs_cmd.c       |   7 +-
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 568 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_main.c      |  37 ++
 include/rdma/uverbs_ioctl.h                | 147 ++++++++
 include/rdma/uverbs_ioctl_cmd.h            | 143 ++++++++
 include/uapi/rdma/ib_user_verbs.h          |  13 +
 7 files changed, 917 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index fa37c2a..ad4af37 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -179,6 +179,8 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
+extern const struct file_operations uverbs_refactored_event_fops;
+
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
 					int is_async);
@@ -202,6 +204,10 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
 void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);
 
 int uverbs_dealloc_mw(struct ib_mw *mw);
+void uverbs_copy_query_dev_fields(struct ib_device *ib_dev,
+				  struct ib_uverbs_query_device_resp *resp,
+				  struct ib_device_attr *attr);
+
 void ib_uverbs_release_ucq(struct ib_uverbs_file *file,
 			   struct ib_uverbs_event_file *ev_file,
 			   struct ib_ucq_object *uobj);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index ac17b9c..8fc1557 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -410,8 +410,7 @@ err:
 	return ret;
 }
 
-static void copy_query_dev_fields(struct ib_uverbs_file *file,
-				  struct ib_device *ib_dev,
+void uverbs_copy_query_dev_fields(struct ib_device *ib_dev,
 				  struct ib_uverbs_query_device_resp *resp,
 				  struct ib_device_attr *attr)
 {
@@ -472,7 +471,7 @@ ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file,
 		return -EFAULT;
 
 	memset(&resp, 0, sizeof resp);
-	copy_query_dev_fields(file, ib_dev, &resp, &ib_dev->attrs);
+	uverbs_copy_query_dev_fields(ib_dev, &resp, &ib_dev->attrs);
 
 	if (copy_to_user((void __user *) (unsigned long) cmd.response,
 			 &resp, sizeof resp))
@@ -4188,7 +4187,7 @@ int ib_uverbs_ex_query_device(struct ib_uverbs_file *file,
 	if (err)
 		return err;
 
-	copy_query_dev_fields(file, ib_dev, &resp.base, &attr);
+	uverbs_copy_query_dev_fields(ib_dev, &resp.base, &attr);
 
 	if (ucore->outlen < resp.response_length + sizeof(resp.odp_caps))
 		goto end;
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index 9ec76d9..623e02e 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -203,3 +203,571 @@ void uverbs_free_event_file(const struct uverbs_type_alloc_action *type_alloc_ac
 };
 EXPORT_SYMBOL(uverbs_free_event_file);
 
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_uhw_compat_spec,
+	UVERBS_ATTR_PTR_IN(UVERBS_UHW_IN, 0),
+	UVERBS_ATTR_PTR_OUT(UVERBS_UHW_OUT, 0));
+EXPORT_SYMBOL(uverbs_uhw_compat_spec);
+
+static void create_udata(struct uverbs_attr_array *vendor,
+			 struct ib_udata *udata)
+{
+	/*
+	 * This is for ease of conversion. The purpose is to convert all vendors
+	 * to use uverbs_attr_array instead of ib_udata.
+	 * Assume attr == 0 is input and attr == 1 is output.
+	 */
+	void __user *inbuf = NULL;
+	size_t inbuf_len = 0;
+	void __user *outbuf = NULL;
+	size_t outbuf_len = 0;
+
+	if (vendor) {
+		WARN_ON(vendor->num_attrs > 2);
+
+		if (vendor->attrs[0].valid) {
+			inbuf = vendor->attrs[0].cmd_attr.ptr;
+			inbuf_len = vendor->attrs[0].cmd_attr.len;
+		}
+
+		if (vendor->num_attrs == 2 && vendor->attrs[1].valid) {
+			outbuf = vendor->attrs[1].cmd_attr.ptr;
+			outbuf_len = vendor->attrs[1].cmd_attr.len;
+		}
+	}
+	INIT_UDATA_BUF_OR_NULL(udata, inbuf, outbuf, inbuf_len, outbuf_len);
+}
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_get_context_spec,
+	UVERBS_ATTR_PTR_OUT(GET_CONTEXT_RESP,
+			    sizeof(struct ib_uverbs_get_context_resp)));
+EXPORT_SYMBOL(uverbs_get_context_spec);
+
+int uverbs_get_context(struct ib_device *ib_dev,
+		       struct ib_uverbs_file *file,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv)
+{
+	struct ib_udata uhw;
+	struct ib_uverbs_get_context_resp resp;
+	struct ib_ucontext		 *ucontext;
+	struct file			 *filp;
+	int ret;
+
+	if (!common->attrs[GET_CONTEXT_RESP].valid)
+		return -EINVAL;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	mutex_lock(&file->mutex);
+
+	if (file->ucontext) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ucontext = ib_dev->alloc_ucontext(ib_dev, &uhw);
+	if (IS_ERR(ucontext)) {
+		ret = PTR_ERR(ucontext);
+		goto err;
+	}
+
+	ucontext->device = ib_dev;
+	ret = ib_uverbs_uobject_type_initialize_ucontext(ucontext);
+	if (ret)
+		goto err_ctx;
+
+	rcu_read_lock();
+	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
+	rcu_read_unlock();
+	ucontext->closing = 0;
+
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+	ucontext->umem_tree = RB_ROOT;
+	init_rwsem(&ucontext->umem_rwsem);
+	ucontext->odp_mrs_count = 0;
+	INIT_LIST_HEAD(&ucontext->no_private_counters);
+
+	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
+		ucontext->invalidate_range = NULL;
+
+#endif
+
+	resp.num_comp_vectors = file->device->num_comp_vectors;
+
+	ret = get_unused_fd_flags(O_CLOEXEC);
+	if (ret < 0)
+		goto err_free;
+	resp.async_fd = ret;
+
+	filp = ib_uverbs_alloc_event_file(file, ib_dev, 1);
+	if (IS_ERR(filp)) {
+		ret = PTR_ERR(filp);
+		goto err_fd;
+	}
+
+	if (copy_to_user(common->attrs[GET_CONTEXT_RESP].cmd_attr.ptr,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err_file;
+	}
+
+	file->ucontext = ucontext;
+	ucontext->ufile = file;
+
+	fd_install(resp.async_fd, filp);
+
+	mutex_unlock(&file->mutex);
+
+	return 0;
+
+err_file:
+	ib_uverbs_free_async_event_file(file);
+	fput(filp);
+
+err_fd:
+	put_unused_fd(resp.async_fd);
+
+err_free:
+	put_pid(ucontext->tgid);
+	ib_uverbs_uobject_type_release_ucontext(ucontext);
+
+err_ctx:
+	ib_dev->dealloc_ucontext(ucontext);
+err:
+	mutex_unlock(&file->mutex);
+	return ret;
+}
+EXPORT_SYMBOL(uverbs_get_context);
+DECLARE_UVERBS_CTX_ACTION(uverbs_action_get_context, uverbs_get_context, NULL,
+			  &uverbs_get_context_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_get_context);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_query_device_spec,
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_RESP, sizeof(struct ib_uverbs_query_device_resp)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_ODP, sizeof(struct ib_uverbs_odp_caps)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_TIMESTAMP_MASK, sizeof(__u64)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_HCA_CORE_CLOCK, sizeof(__u64)),
+	UVERBS_ATTR_PTR_OUT(QUERY_DEVICE_CAP_FLAGS, sizeof(__u64)));
+EXPORT_SYMBOL(uverbs_query_device_spec);
+
+int uverbs_query_device_handler(struct ib_device *ib_dev,
+				struct ib_ucontext *ucontext,
+				struct uverbs_attr_array *common,
+				struct uverbs_attr_array *vendor,
+				void *priv)
+{
+	struct ib_device_attr attr = {};
+	struct ib_udata uhw;
+	int err;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	err = ib_dev->query_device(ib_dev, &attr, &uhw);
+	if (err)
+		return err;
+
+	if (common->attrs[QUERY_DEVICE_RESP].valid) {
+		struct ib_uverbs_query_device_resp resp = {};
+
+		uverbs_copy_query_dev_fields(ib_dev, &resp, &attr);
+		if (copy_to_user(common->attrs[QUERY_DEVICE_RESP].cmd_attr.ptr,
+				 &resp, sizeof(resp)))
+			return -EFAULT;
+	}
+
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+	if (common->attrs[QUERY_DEVICE_ODP].valid) {
+		struct ib_uverbs_odp_caps odp_caps;
+
+		odp_caps.general_caps = attr.odp_caps.general_caps;
+		odp_caps.per_transport_caps.rc_odp_caps =
+			attr.odp_caps.per_transport_caps.rc_odp_caps;
+		odp_caps.per_transport_caps.uc_odp_caps =
+			attr.odp_caps.per_transport_caps.uc_odp_caps;
+		odp_caps.per_transport_caps.ud_odp_caps =
+			attr.odp_caps.per_transport_caps.ud_odp_caps;
+
+		if (copy_to_user(common->attrs[QUERY_DEVICE_ODP].cmd_attr.ptr,
+				 &odp_caps, sizeof(odp_caps)))
+			return -EFAULT;
+	}
+#endif
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_TIMESTAMP_MASK,
+			   &attr.timestamp_mask) == -EFAULT)
+		return -EFAULT;
+
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_HCA_CORE_CLOCK,
+			   &attr.hca_core_clock) == -EFAULT)
+		return -EFAULT;
+
+	if (UVERBS_COPY_TO(common, QUERY_DEVICE_CAP_FLAGS,
+			   &attr.device_cap_flags) == -EFAULT)
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_query_device_handler);
+DECLARE_UVERBS_ACTION(uverbs_action_query_device, uverbs_query_device_handler,
+		      NULL, &uverbs_query_device_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_query_device);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_alloc_pd_spec,
+	UVERBS_ATTR_IDR(ALLOC_PD_HANDLE, UVERBS_TYPE_PD,
+			UVERBS_IDR_ACCESS_NEW));
+EXPORT_SYMBOL(uverbs_alloc_pd_spec);
+
+int uverbs_alloc_pd_handler(struct ib_device *ib_dev,
+			    struct ib_ucontext *ucontext,
+			    struct uverbs_attr_array *common,
+			    struct uverbs_attr_array *vendor,
+			    void *priv)
+{
+	struct ib_udata uhw;
+	struct ib_uobject *uobject;
+	struct ib_pd *pd;
+
+	if (!common->attrs[ALLOC_PD_HANDLE].valid)
+		return -EINVAL;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	pd = ib_dev->alloc_pd(ib_dev, ucontext, &uhw);
+	if (IS_ERR(pd))
+		return PTR_ERR(pd);
+
+	uobject = common->attrs[ALLOC_PD_HANDLE].obj_attr.uobject;
+	pd->device  = ib_dev;
+	pd->uobject = uobject;
+	pd->__internal_mr = NULL;
+	uobject->object = pd;
+	atomic_set(&pd->usecnt, 0);
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_alloc_pd_handler);
+DECLARE_UVERBS_ACTION(uverbs_action_alloc_pd, uverbs_alloc_pd_handler, NULL,
+		      &uverbs_alloc_pd_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_alloc_pd);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_reg_mr_spec,
+	UVERBS_ATTR_IDR(REG_MR_HANDLE, UVERBS_TYPE_MR, UVERBS_IDR_ACCESS_NEW),
+	UVERBS_ATTR_IDR(REG_MR_PD_HANDLE, UVERBS_TYPE_PD, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_PTR_IN(REG_MR_CMD, sizeof(struct ib_uverbs_ioctl_reg_mr)),
+	UVERBS_ATTR_PTR_OUT(REG_MR_RESP, sizeof(struct ib_uverbs_ioctl_reg_mr_resp)));
+EXPORT_SYMBOL(uverbs_reg_mr_spec);
+
+int uverbs_reg_mr_handler(struct ib_device *ib_dev,
+			  struct ib_ucontext *ucontext,
+			  struct uverbs_attr_array *common,
+			  struct uverbs_attr_array *vendor,
+			  void *priv)
+{
+	struct ib_uverbs_ioctl_reg_mr		cmd;
+	struct ib_uverbs_ioctl_reg_mr_resp	resp;
+	struct ib_udata uhw;
+	struct ib_uobject *uobject;
+	struct ib_pd                *pd;
+	struct ib_mr                *mr;
+	int                          ret;
+
+	if (!common->attrs[REG_MR_HANDLE].valid ||
+	    !common->attrs[REG_MR_PD_HANDLE].valid ||
+	    !common->attrs[REG_MR_CMD].valid ||
+	    !common->attrs[REG_MR_RESP].valid)
+		return -EINVAL;
+
+	if (copy_from_user(&cmd, common->attrs[REG_MR_CMD].cmd_attr.ptr, sizeof(cmd)))
+		return -EFAULT;
+
+	if ((cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK))
+		return -EINVAL;
+
+	ret = ib_check_mr_access(cmd.access_flags);
+	if (ret)
+		return ret;
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	uobject = common->attrs[REG_MR_HANDLE].obj_attr.uobject;
+	pd = common->attrs[REG_MR_PD_HANDLE].obj_attr.uobject->object;
+
+	if (cmd.access_flags & IB_ACCESS_ON_DEMAND) {
+		if (!(pd->device->attrs.device_cap_flags &
+		      IB_DEVICE_ON_DEMAND_PAGING)) {
+			pr_debug("ODP support not available\n");
+			return -EINVAL;
+		}
+	}
+
+	mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
+				     cmd.access_flags, &uhw);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device  = pd->device;
+	mr->pd      = pd;
+	mr->uobject = uobject;
+	atomic_inc(&pd->usecnt);
+	uobject->object = mr;
+
+	resp.lkey      = mr->lkey;
+	resp.rkey      = mr->rkey;
+
+	if (copy_to_user(common->attrs[REG_MR_RESP].cmd_attr.ptr,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	return 0;
+
+err:
+	ib_dereg_mr(mr);
+	return ret;
+}
+EXPORT_SYMBOL(uverbs_reg_mr_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_reg_mr, uverbs_reg_mr_handler, NULL,
+		      &uverbs_reg_mr_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_reg_mr);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_dereg_mr_spec,
+	UVERBS_ATTR_IDR(DEREG_MR_HANDLE, UVERBS_TYPE_MR, UVERBS_IDR_ACCESS_DESTROY));
+EXPORT_SYMBOL(uverbs_dereg_mr_spec);
+
+int uverbs_dereg_mr_handler(struct ib_device *ib_dev,
+			    struct ib_ucontext *ucontext,
+			    struct uverbs_attr_array *common,
+			    struct uverbs_attr_array *vendor,
+			    void *priv)
+{
+	struct ib_mr             *mr;
+
+	if (!common->attrs[DEREG_MR_HANDLE].valid)
+		return -EINVAL;
+
+	mr = common->attrs[DEREG_MR_HANDLE].obj_attr.uobject->object;
+
+	/* dereg_mr doesn't support vendor data */
+	return ib_dereg_mr(mr);
+};
+EXPORT_SYMBOL(uverbs_dereg_mr_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_dereg_mr, uverbs_dereg_mr_handler, NULL,
+		      &uverbs_dereg_mr_spec);
+EXPORT_SYMBOL(uverbs_action_dereg_mr);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_create_comp_channel_spec,
+	UVERBS_ATTR_FD(CREATE_COMP_CHANNEL_FD, UVERBS_TYPE_COMP_CHANNEL, UVERBS_IDR_ACCESS_NEW));
+EXPORT_SYMBOL(uverbs_create_comp_channel_spec);
+
+int uverbs_create_comp_channel_handler(struct ib_device *ib_dev,
+				       struct ib_ucontext *ucontext,
+				       struct uverbs_attr_array *common,
+				       struct uverbs_attr_array *vendor,
+				       void *priv)
+{
+	struct ib_uverbs_event_file *ev_file;
+
+	if (!common->attrs[CREATE_COMP_CHANNEL_FD].valid)
+		return -EINVAL;
+
+	ev_file = uverbs_fd_to_priv(common->attrs[CREATE_COMP_CHANNEL_FD].obj_attr.uobject);
+	kref_init(&ev_file->ref);
+	spin_lock_init(&ev_file->lock);
+	INIT_LIST_HEAD(&ev_file->event_list);
+	init_waitqueue_head(&ev_file->poll_wait);
+	ev_file->async_queue = NULL;
+	ev_file->is_closed   = 0;
+
+	/*
+	 * The original code puts the handle in an event list.
+	 * Currently, it is kept on our context.
+	 */
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_create_comp_channel_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_create_comp_channel, uverbs_create_comp_channel_handler, NULL,
+		      &uverbs_create_comp_channel_spec);
+EXPORT_SYMBOL(uverbs_action_create_comp_channel);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_create_cq_spec,
+	UVERBS_ATTR_IDR(CREATE_CQ_HANDLE, UVERBS_TYPE_CQ, UVERBS_IDR_ACCESS_NEW),
+	UVERBS_ATTR_PTR_IN(CREATE_CQ_CQE, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(CREATE_CQ_USER_HANDLE, sizeof(__u64)),
+	UVERBS_ATTR_FD(CREATE_CQ_COMP_CHANNEL, UVERBS_TYPE_COMP_CHANNEL, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_PTR_IN(CREATE_CQ_COMP_VECTOR, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(CREATE_CQ_FLAGS, sizeof(__u32)),
+	UVERBS_ATTR_PTR_OUT(CREATE_CQ_RESP_CQE, sizeof(__u32)));
+EXPORT_SYMBOL(uverbs_create_cq_spec);
+
+int uverbs_create_cq_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv)
+{
+	struct ib_ucq_object           *obj;
+	struct ib_udata uhw;
+	int ret;
+	__u64 user_handle = 0;
+	struct ib_cq_init_attr attr = {};
+	struct ib_cq                   *cq;
+	struct ib_uverbs_event_file    *ev_file = NULL;
+
+	/*
+	 * Currently, COMP_VECTOR is mandatory, but that could be lifted in the
+	 * future.
+	 */
+	if (!common->attrs[CREATE_CQ_HANDLE].valid ||
+	    !common->attrs[CREATE_CQ_RESP_CQE].valid)
+		return -EINVAL;
+
+	ret = UVERBS_COPY_FROM(&attr.comp_vector, common, CREATE_CQ_COMP_VECTOR);
+	if (!ret)
+		ret = UVERBS_COPY_FROM(&attr.cqe, common, CREATE_CQ_CQE);
+	if (ret)
+		return ret;
+
+	/* Optional params */
+	if (UVERBS_COPY_FROM(&attr.flags, common, CREATE_CQ_FLAGS) == -EFAULT ||
+	    UVERBS_COPY_FROM(&user_handle, common, CREATE_CQ_USER_HANDLE) == -EFAULT)
+		return -EFAULT;
+
+	if (attr.comp_vector >= ucontext->ufile->device->num_comp_vectors)
+		return -EINVAL;
+
+	if (common->attrs[CREATE_CQ_COMP_CHANNEL].valid) {
+		ev_file = uverbs_fd_to_priv(common->attrs[CREATE_CQ_COMP_CHANNEL].obj_attr.uobject);
+		kref_get(&ev_file->ref);
+	}
+
+	obj = container_of(common->attrs[CREATE_CQ_HANDLE].obj_attr.uobject,
+			   typeof(*obj), uobject);
+	obj->uverbs_file	   = ucontext->ufile;
+	obj->comp_events_reported  = 0;
+	obj->async_events_reported = 0;
+	INIT_LIST_HEAD(&obj->comp_list);
+	INIT_LIST_HEAD(&obj->async_list);
+
+	/* Temporary, only until vendors get the new uverbs_attr_array */
+	create_udata(vendor, &uhw);
+
+	cq = ib_dev->create_cq(ib_dev, &attr, ucontext, &uhw);
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);
+
+	cq->device        = ib_dev;
+	cq->uobject       = &obj->uobject;
+	cq->comp_handler  = ib_uverbs_comp_handler;
+	cq->event_handler = ib_uverbs_cq_event_handler;
+	cq->cq_context    = ev_file;
+	obj->uobject.object = cq;
+	obj->uobject.user_handle = user_handle;
+	atomic_set(&cq->usecnt, 0);
+
+	ret = UVERBS_COPY_TO(common, CREATE_CQ_RESP_CQE, &cq->cqe);
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	ib_destroy_cq(cq);
+	return ret;
+};
+EXPORT_SYMBOL(uverbs_create_cq_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_create_cq, uverbs_create_cq_handler, NULL,
+		      &uverbs_create_cq_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_create_cq);
+
+DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_comp_channel,
+	ADD_UVERBS_ACTION_PTR(UVERBS_COMP_CHANNEL_CREATE, &uverbs_action_create_comp_channel),
+);
+EXPORT_SYMBOL(uverbs_actions_comp_channel);
+
+DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_cq,
+	ADD_UVERBS_ACTION_PTR(UVERBS_CQ_CREATE, &uverbs_action_create_cq),
+);
+EXPORT_SYMBOL(uverbs_actions_cq);
+
+DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_mr,
+	ADD_UVERBS_ACTION_PTR(UVERBS_MR_REG, &uverbs_action_reg_mr),
+	ADD_UVERBS_ACTION_PTR(UVERBS_MR_DEREG, &uverbs_action_dereg_mr),
+);
+EXPORT_SYMBOL(uverbs_actions_mr);
+
+DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_pd,
+	ADD_UVERBS_ACTION_PTR(UVERBS_PD_ALLOC, &uverbs_action_alloc_pd),
+);
+EXPORT_SYMBOL(uverbs_actions_pd);
+
+DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_device,
+	ADD_UVERBS_ACTION_PTR(UVERBS_DEVICE_QUERY, &uverbs_action_query_device),
+	ADD_UVERBS_ACTION_PTR(UVERBS_DEVICE_ALLOC_CONTEXT, &uverbs_action_get_context),
+);
+EXPORT_SYMBOL(uverbs_actions_device);
+
+DECLARE_UVERBS_TYPE(uverbs_type_comp_channel,
+		    /* 1 is used in order to free the comp_channel after the CQs */
+		    &UVERBS_TYPE_ALLOC_FD(1, sizeof(struct ib_uobject) + sizeof(struct ib_uverbs_event_file),
+					  uverbs_free_event_file,
+					  &uverbs_refactored_event_fops,
+					  "[infinibandevent]", O_RDONLY),
+		    &uverbs_actions_comp_channel);
+EXPORT_SYMBOL(uverbs_type_comp_channel);
+
+DECLARE_UVERBS_TYPE(uverbs_type_cq,
+		    /* order 0 is used so the CQ is freed before the comp_channel (order 1) */
+		    &UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_ucq_object), 0,
+					      uverbs_free_cq),
+		    &uverbs_actions_cq);
+EXPORT_SYMBOL(uverbs_type_cq);
+
+DECLARE_UVERBS_TYPE(uverbs_type_mr,
+		    /* 1 is used in order to free the MR after all the MWs */
+		    &UVERBS_TYPE_ALLOC_IDR(1, uverbs_free_mr),
+		    &uverbs_actions_mr);
+EXPORT_SYMBOL(uverbs_type_mr);
+
+DECLARE_UVERBS_TYPE(uverbs_type_pd,
+		    /* 2 is used in order to free the PD after all objects */
+		    &UVERBS_TYPE_ALLOC_IDR(2, uverbs_free_pd),
+		    &uverbs_actions_pd);
+EXPORT_SYMBOL(uverbs_type_pd);
+
+DECLARE_UVERBS_TYPE(uverbs_type_device, NULL, &uverbs_actions_device);
+EXPORT_SYMBOL(uverbs_type_device);
+
+DECLARE_UVERBS_TYPES(uverbs_types,
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_DEVICE, uverbs_type_device),
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_PD, uverbs_type_pd),
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_MR, uverbs_type_mr),
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_COMP_CHANNEL, uverbs_type_comp_channel),
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_CQ, uverbs_type_cq),
+);
+EXPORT_SYMBOL(uverbs_types);
+
+DECLARE_UVERBS_TYPES_GROUP(uverbs_types_group, &uverbs_types);
+EXPORT_SYMBOL(uverbs_types_group);
+
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index ff43ff4..efa908a 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -369,6 +369,43 @@ static int ib_uverbs_event_close(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static void ib_uverbs_release_refactored_event_file(struct kref *ref)
+{
+	struct ib_uverbs_event_file *file =
+		container_of(ref, struct ib_uverbs_event_file, ref);
+
+	ib_uverbs_cleanup_fd(file);
+}
+
+/* TODO: REFACTOR */
+static int ib_uverbs_event_refactored_close(struct inode *inode, struct file *filp)
+{
+	struct ib_uverbs_event_file *file = filp->private_data;
+	struct ib_uverbs_event *entry, *tmp;
+
+	spin_lock_irq(&file->lock);
+	list_for_each_entry_safe(entry, tmp, &file->event_list, list) {
+		if (entry->counter)
+			list_del(&entry->obj_list);
+		kfree(entry);
+	}
+	spin_unlock_irq(&file->lock);
+
+	ib_uverbs_close_fd(filp);
+	kref_put(&file->ref, ib_uverbs_release_refactored_event_file);
+
+	return 0;
+}
+
+const struct file_operations uverbs_refactored_event_fops = {
+	.owner	 = THIS_MODULE,
+	.read	 = ib_uverbs_event_read,
+	.poll    = ib_uverbs_event_poll,
+	.release = ib_uverbs_event_refactored_close,
+	.fasync  = ib_uverbs_event_fasync,
+	.llseek	 = no_llseek,
+};
+
 static const struct file_operations uverbs_event_fops = {
 	.owner	 = THIS_MODULE,
 	.read	 = ib_uverbs_event_read,
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 2f50045..12642cc 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -134,6 +134,153 @@ struct uverbs_types_group {
 	void					*priv;
 };
 
+#define UVERBS_ATTR(_id, _len, _type)					\
+	[_id] = {.len = _len, .type = _type}
+#define UVERBS_ATTR_PTR_IN(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_IN)
+#define UVERBS_ATTR_PTR_OUT(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_OUT)
+#define UVERBS_ATTR_IDR(_id, _idr_type, _access)			\
+	[_id] = {.type = UVERBS_ATTR_TYPE_IDR,				\
+		 .obj = {.obj_type = _idr_type,				\
+			 .access = _access				\
+		 } }
+#define UVERBS_ATTR_FD(_id, _fd_type, _access)				\
+	[_id] = {.type = UVERBS_ATTR_TYPE_FD,				\
+		 .obj = {.obj_type = _fd_type,				\
+			 .access = _access + BUILD_BUG_ON_ZERO(		\
+				_access != UVERBS_IDR_ACCESS_NEW &&	\
+				_access != UVERBS_IDR_ACCESS_READ)	\
+		 } }
+#define _UVERBS_ATTR_SPEC_SZ(...)					\
+	(sizeof((const struct uverbs_attr_spec[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_attr_spec))
+#define UVERBS_ATTR_SPEC(...)					\
+	((const struct uverbs_attr_group_spec)				\
+	 {.attrs = (struct uverbs_attr_spec[]){__VA_ARGS__},		\
+	  .num_attrs = _UVERBS_ATTR_SPEC_SZ(__VA_ARGS__)})
+#define DECLARE_UVERBS_ATTR_SPEC(name, ...)			\
+	const struct uverbs_attr_group_spec name =			\
+		UVERBS_ATTR_SPEC(__VA_ARGS__)
+#define _UVERBS_ATTR_ACTION_SPEC_SZ(...)				  \
+	(sizeof((const struct uverbs_attr_group_spec *[]){__VA_ARGS__}) / \
+	 sizeof(const struct uverbs_attr_group_spec *))
+#define _UVERBS_ATTR_ACTION_SPEC(_distfn, _priv, ...)			\
+	{.dist = _distfn,						\
+	 .priv = _priv,							\
+	 .num_groups =	_UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),	\
+	 .attr_groups = (const struct uverbs_attr_group_spec *[]){__VA_ARGS__} }
+#define UVERBS_ACTION_SPEC(...)						\
+	_UVERBS_ATTR_ACTION_SPEC(ib_uverbs_std_dist,			\
+				(void *)_UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),\
+				__VA_ARGS__)
+#define UVERBS_ACTION(_handler, _priv, ...)				\
+	((const struct uverbs_action) {					\
+		.priv = &(struct uverbs_action_std_handler)		\
+			{.handler = _handler,				\
+			 .priv = _priv},				\
+		.handler = uverbs_action_std_handle,			\
+		.spec = UVERBS_ACTION_SPEC(__VA_ARGS__)})
+#define UVERBS_CTX_ACTION(_handler, _priv, ...)			\
+	((const struct uverbs_action){					\
+		.priv = &(struct uverbs_action_std_ctx_handler)		\
+			{.handler = _handler,				\
+			 .priv = _priv},				\
+		.handler = uverbs_action_std_ctx_handle,		\
+		.spec = UVERBS_ACTION_SPEC(__VA_ARGS__)})
+#define _UVERBS_ACTIONS_SZ(...)					\
+	(sizeof((const struct uverbs_action *[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_action *))
+#define ADD_UVERBS_ACTION(action_idx, _handler, _priv,  ...)		\
+	[action_idx] = &UVERBS_ACTION(_handler, _priv, __VA_ARGS__)
+#define DECLARE_UVERBS_ACTION(name, _handler, _priv, ...)		\
+	const struct uverbs_action name =				\
+		UVERBS_ACTION(_handler, _priv, __VA_ARGS__)
+#define ADD_UVERBS_CTX_ACTION(action_idx, _handler, _priv,  ...)	\
+	[action_idx] = &UVERBS_CTX_ACTION(_handler, _priv, __VA_ARGS__)
+#define DECLARE_UVERBS_CTX_ACTION(name, _handler, _priv, ...)	\
+	const struct uverbs_action name =				\
+		UVERBS_CTX_ACTION(_handler, _priv, __VA_ARGS__)
+#define ADD_UVERBS_ACTION_PTR(idx, ptr)					\
+	[idx] = ptr
+#define UVERBS_ACTIONS(...)						\
+	((const struct uverbs_type_actions_group)			\
+	  {.num_actions = _UVERBS_ACTIONS_SZ(__VA_ARGS__),		\
+	   .actions = (const struct uverbs_action *[]){__VA_ARGS__} })
+#define DECLARE_UVERBS_ACTIONS(name, ...)				\
+	const struct  uverbs_type_actions_group name =			\
+		UVERBS_ACTIONS(__VA_ARGS__)
+#define _UVERBS_ACTIONS_GROUP_SZ(...)					\
+	(sizeof((const struct uverbs_type_actions_group*[]){__VA_ARGS__}) / \
+	 sizeof(const struct uverbs_type_actions_group *))
+#define UVERBS_TYPE_ALLOC_FD(_order, _obj_size, _free_fn, _fops, _name, _flags)\
+	((const struct uverbs_type_alloc_action)			\
+	 {.type = UVERBS_ATTR_TYPE_FD,					\
+	 .order = _order,						\
+	 .obj_size = _obj_size,						\
+	 .free_fn = _free_fn,						\
+	 .fd = {.fops = _fops,						\
+		.name = _name,						\
+		.flags = _flags} })
+#define UVERBS_TYPE_ALLOC_IDR_SZ(_size, _order, _free_fn)		\
+	((const struct uverbs_type_alloc_action)			\
+	 {.type = UVERBS_ATTR_TYPE_IDR,					\
+	 .order = _order,						\
+	 .free_fn = _free_fn,						\
+	 .obj_size = _size,})
+#define UVERBS_TYPE_ALLOC_IDR(_order, _free_fn)				\
+	 UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uobject), _order, _free_fn)
+#define _DECLARE_UVERBS_TYPE(name, _alloc, _dist, _priv, ...)		\
+	const struct uverbs_type name = {				\
+		.alloc = _alloc,					\
+		.dist = _dist,						\
+		.priv = _priv,						\
+		.num_groups = _UVERBS_ACTIONS_GROUP_SZ(__VA_ARGS__),	\
+		.action_groups = (const struct uverbs_type_actions_group *[]){__VA_ARGS__} \
+	}
+#define DECLARE_UVERBS_TYPE(name, _alloc,  ...)				\
+	_DECLARE_UVERBS_TYPE(name, _alloc, ib_uverbs_std_dist, NULL,	\
+			     __VA_ARGS__)
+#define _UVERBS_TYPE_SZ(...)						\
+	(sizeof((const struct uverbs_type *[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_type *))
+#define ADD_UVERBS_TYPE_ACTIONS(type_idx, ...)				\
+	[type_idx] = &UVERBS_ACTIONS(__VA_ARGS__)
+#define ADD_UVERBS_TYPE(type_idx, type_ptr)				\
+	[type_idx] = ((const struct uverbs_type * const)&type_ptr)
+#define UVERBS_TYPES(...)  ((const struct uverbs_types)			\
+	{.num_types = _UVERBS_TYPE_SZ(__VA_ARGS__),			\
+	 .types = (const struct uverbs_type *[]){__VA_ARGS__} })
+#define DECLARE_UVERBS_TYPES(name, ...)				\
+	const struct uverbs_types name = UVERBS_TYPES(__VA_ARGS__)
+
+#define _UVERBS_TYPES_SZ(...)						\
+	(sizeof((const struct uverbs_types *[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_types *))
+
+#define UVERBS_TYPES_GROUP(_dist, _priv, ...)				\
+	((const struct uverbs_types_group){				\
+		.dist = _dist,						\
+		.priv = _priv,						\
+		.type_groups = (const struct uverbs_types *[]){__VA_ARGS__},\
+		.num_groups = _UVERBS_TYPES_SZ(__VA_ARGS__)})
+#define _DECLARE_UVERBS_TYPES_GROUP(name, _dist, _priv, ...)		\
+	const struct uverbs_types_group name = UVERBS_TYPES_GROUP(_dist, _priv,\
+								  __VA_ARGS__)
+#define DECLARE_UVERBS_TYPES_GROUP(name, ...)		\
+	_DECLARE_UVERBS_TYPES_GROUP(name, ib_uverbs_std_dist, NULL, __VA_ARGS__)
+
+#define UVERBS_COPY_TO(attr_array, idx, from)				\
+	((attr_array)->attrs[idx].valid ?				\
+	 (copy_to_user((attr_array)->attrs[idx].cmd_attr.ptr, (from),	\
+		       (attr_array)->attrs[idx].cmd_attr.len) ?		\
+	  -EFAULT : 0) : -ENOENT)
+#define UVERBS_COPY_FROM(to, attr_array, idx)				\
+	((attr_array)->attrs[idx].valid ?				\
+	 (copy_from_user((to), (attr_array)->attrs[idx].cmd_attr.ptr,	\
+			 (attr_array)->attrs[idx].cmd_attr.len) ?	\
+	  -EFAULT : 0) : -ENOENT)
+
 /* =================================================
  *              Parsing infrastructure
  * =================================================
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index e0299a2..7a2d8f1 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -37,6 +37,11 @@
 
 #define IB_UVERBS_VENDOR_FLAG	0x8000
 
+enum {
+	UVERBS_UHW_IN,
+	UVERBS_UHW_OUT,
+};
+
 int ib_uverbs_std_dist(__u16 *attr_id, void *priv);
 
 /* common validators */
@@ -108,5 +113,143 @@ enum uverbs_common_types {
 	UVERBS_TYPE_LAST,
 };
 
+enum uverbs_create_cq_cmd_attr {
+	CREATE_CQ_HANDLE,
+	CREATE_CQ_CQE,
+	CREATE_CQ_USER_HANDLE,
+	CREATE_CQ_COMP_CHANNEL,
+	CREATE_CQ_COMP_VECTOR,
+	CREATE_CQ_FLAGS,
+	CREATE_CQ_RESP_CQE,
+};
+
+enum uverbs_create_comp_channel_cmd_attr {
+	CREATE_COMP_CHANNEL_FD,
+};
+
+enum uverbs_get_context {
+	GET_CONTEXT_RESP,
+};
+
+enum uverbs_query_device {
+	QUERY_DEVICE_RESP,
+	QUERY_DEVICE_ODP,
+	QUERY_DEVICE_TIMESTAMP_MASK,
+	QUERY_DEVICE_HCA_CORE_CLOCK,
+	QUERY_DEVICE_CAP_FLAGS,
+};
+
+enum uverbs_alloc_pd {
+	ALLOC_PD_HANDLE,
+};
+
+enum uverbs_reg_mr {
+	REG_MR_HANDLE,
+	REG_MR_PD_HANDLE,
+	REG_MR_CMD,
+	REG_MR_RESP
+};
+
+enum uverbs_dereg_mr {
+	DEREG_MR_HANDLE,
+};
+
+extern const struct uverbs_attr_group_spec uverbs_uhw_compat_spec;
+extern const struct uverbs_attr_group_spec uverbs_get_context_spec;
+extern const struct uverbs_attr_group_spec uverbs_query_device_spec;
+extern const struct uverbs_attr_group_spec uverbs_alloc_pd_spec;
+extern const struct uverbs_attr_group_spec uverbs_reg_mr_spec;
+extern const struct uverbs_attr_group_spec uverbs_dereg_mr_spec;
+
+int uverbs_get_context(struct ib_device *ib_dev,
+		       struct ib_uverbs_file *file,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+
+int uverbs_query_device_handler(struct ib_device *ib_dev,
+				struct ib_ucontext *ucontext,
+				struct uverbs_attr_array *common,
+				struct uverbs_attr_array *vendor,
+				void *priv);
+
+int uverbs_alloc_pd_handler(struct ib_device *ib_dev,
+			    struct ib_ucontext *ucontext,
+			    struct uverbs_attr_array *common,
+			    struct uverbs_attr_array *vendor,
+			    void *priv);
+
+int uverbs_reg_mr_handler(struct ib_device *ib_dev,
+			  struct ib_ucontext *ucontext,
+			  struct uverbs_attr_array *common,
+			  struct uverbs_attr_array *vendor,
+			  void *priv);
+
+int uverbs_dereg_mr_handler(struct ib_device *ib_dev,
+			    struct ib_ucontext *ucontext,
+			    struct uverbs_attr_array *common,
+			    struct uverbs_attr_array *vendor,
+			    void *priv);
+
+int uverbs_create_comp_channel_handler(struct ib_device *ib_dev,
+				       struct ib_ucontext *ucontext,
+				       struct uverbs_attr_array *common,
+				       struct uverbs_attr_array *vendor,
+				       void *priv);
+
+int uverbs_create_cq_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv);
+
+extern const struct uverbs_action uverbs_action_get_context;
+extern const struct uverbs_action uverbs_action_create_cq;
+extern const struct uverbs_action uverbs_action_create_comp_channel;
+extern const struct uverbs_action uverbs_action_query_device;
+extern const struct uverbs_action uverbs_action_alloc_pd;
+extern const struct uverbs_action uverbs_action_reg_mr;
+extern const struct uverbs_action uverbs_action_dereg_mr;
+
+enum uverbs_actions_mr_ops {
+	UVERBS_MR_REG,
+	UVERBS_MR_DEREG,
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_mr;
+
+enum uverbs_actions_comp_channel_ops {
+	UVERBS_COMP_CHANNEL_CREATE,
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_comp_channel;
+
+enum uverbs_actions_cq_ops {
+	UVERBS_CQ_CREATE,
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_cq;
+
+enum uverbs_actions_pd_ops {
+	UVERBS_PD_ALLOC
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_pd;
+
+enum uverbs_actions_device_ops {
+	UVERBS_DEVICE_ALLOC_CONTEXT,
+	UVERBS_DEVICE_QUERY,
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_device;
+
+extern const struct uverbs_type uverbs_type_cq;
+extern const struct uverbs_type uverbs_type_comp_channel;
+extern const struct uverbs_type uverbs_type_mr;
+extern const struct uverbs_type uverbs_type_pd;
+extern const struct uverbs_type uverbs_type_device;
+
+extern const struct uverbs_types uverbs_common_types;
+extern const struct uverbs_types_group uverbs_types_group;
 #endif
 
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25225eb..0e9821b 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -317,12 +317,25 @@ struct ib_uverbs_reg_mr {
 	__u64 driver_data[0];
 };
 
+struct ib_uverbs_ioctl_reg_mr {
+	__u64 start;
+	__u64 length;
+	__u64 hca_va;
+	__u32 access_flags;
+	__u32 reserved;
+};
+
 struct ib_uverbs_reg_mr_resp {
 	__u32 mr_handle;
 	__u32 lkey;
 	__u32 rkey;
 };
 
+struct ib_uverbs_ioctl_reg_mr_resp {
+	__u32 lkey;
+	__u32 rkey;
+};
+
 struct ib_uverbs_rereg_mr {
 	__u64 response;
 	__u32 mr_handle;
-- 
2.7.4


* [RFC ABI V5 06/10] IB/mlx5: Implement common uverb objects
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 05/10] RDMA/core: Add uverbs types, actions, handlers and attributes Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space Matan Barak
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

This patch simply tells mlx5 to use the uverbs objects declared by
the common layer.
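
The following is only an illustrative sketch (the mlx5_example_* names are
made up and not part of this patch): a driver that wants to expose a subset
of the common objects, or to add its own types later, can compose a private
group with the same macros instead of pointing at the shared
uverbs_types_group:

  DECLARE_UVERBS_TYPES(mlx5_example_types,
                       ADD_UVERBS_TYPE(UVERBS_TYPE_DEVICE, uverbs_type_device),
                       ADD_UVERBS_TYPE(UVERBS_TYPE_PD, uverbs_type_pd),
                       ADD_UVERBS_TYPE(UVERBS_TYPE_MR, uverbs_type_mr));

  DECLARE_UVERBS_TYPES_GROUP(mlx5_example_types_group, &mlx5_example_types);

  /* and at registration time: */
  dev->ib_dev.types_group = &mlx5_example_types_group;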

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f4160d5..40204d4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -51,6 +51,7 @@
 #include <linux/list.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_umem.h>
+#include <rdma/uverbs_ioctl_cmd.h>
 #include <linux/in.h>
 #include <linux/etherdevice.h>
 #include <linux/mlx5/fs.h>
@@ -3128,6 +3129,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	if (err)
 		goto err_odp;
 
+	dev->ib_dev.types_group = &uverbs_types_group;
 	err = ib_register_device(&dev->ib_dev, NULL);
 	if (err)
 		goto err_q_cnt;
-- 
2.7.4


* [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 06/10] IB/mlx5: Implement common uverb objects Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
       [not found]     ` <1477579398-6875-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-10-27 14:43   ` [RFC ABI V5 08/10] IB/core: Implement compatibility layer for get context command Matan Barak
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

In order to allow a compatibility layer, allow passing
ib_uverbs_ioctl_hdr and ib_uverbs_attr from kernel space.
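
A minimal sketch of the intended kernel-space caller (the hdr/attrs
variables are assumed to be prepared by the caller; this is the pattern the
write() compatibility layer in the next patch uses): the header and
attribute array live in kernel memory, so the caller switches to KERNEL_DS
around the call and hands its original segment to ib_uverbs_cmd_verbs(), so
that user-space buffers referenced by the attributes are still accessed with
the normal checks:

  mm_segment_t oldfs = get_fs();
  long err;

  set_fs(KERNEL_DS);
  err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr, attrs, oldfs);
  set_fs(oldfs);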

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_ioctl.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index 9d56b17..c02b6d5 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -176,10 +176,11 @@ static int uverbs_handle_action(struct ib_uverbs_attr __user *uattr_ptr,
 	return ret;
 }
 
-static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
-				struct ib_uverbs_file *file,
-				struct ib_uverbs_ioctl_hdr *hdr,
-				void __user *buf)
+long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
+			 struct ib_uverbs_file *file,
+			 struct ib_uverbs_ioctl_hdr *hdr,
+			 void __user *buf,
+			 mm_segment_t oldfs)
 {
 	const struct uverbs_type *type;
 	const struct uverbs_action *action;
@@ -193,6 +194,7 @@ static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
 	} *ctx = NULL;
 	struct uverbs_attr *curr_attr;
 	size_t ctx_size;
+	mm_segment_t currentfs = get_fs();
 
 	if (!ib_dev)
 		return -EIO;
@@ -240,8 +242,10 @@ static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
 		goto out;
 	}
 
+	set_fs(oldfs);
 	err = uverbs_handle_action(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
 				   file, action, ctx->uverbs_attr_array);
+	set_fs(currentfs);
 out:
 	kfree(ctx);
 	return err;
@@ -296,7 +300,8 @@ long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (!down_read_trylock(&file->close_sem))
 			return -EIO;
 		err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr,
-					  (__user void *)arg + sizeof(hdr));
+					  (__user void *)arg + sizeof(hdr),
+					  get_fs());
 		up_read(&file->close_sem);
 	}
 out:
-- 
2.7.4


* [RFC ABI V5 08/10] IB/core: Implement compatibility layer for get context command
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 09/10] IB/core: Add create_qp command to the new ABI Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 10/10] IB/core: Add modify_qp " Matan Barak
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

Implement a write() -> ioctl() compatibility layer for
ib_uverbs_get_context by translating the write() header into an ioctl
header and invoking the ioctl parser.
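
The same translation pattern should extend to the other write() commands.
A rough sketch of a hypothetical wrapper for the legacy query_device
command, reusing the helpers added below (given the usual write() handler
arguments; vendor trailer handling via fill_hw_attrs() and error unwinding
are omitted for brevity):

  struct ib_uverbs_query_device cmd;
  struct {
          struct ib_uverbs_ioctl_hdr hdr;
          struct ib_uverbs_attr      attrs[1];    /* just QUERY_DEVICE_RESP */
  } ioctl_cmd;
  mm_segment_t oldfs = get_fs();
  long err;

  if (copy_from_user(&cmd, buf, sizeof(cmd)))
          return -EFAULT;

  init_ioctl_hdr(&ioctl_cmd.hdr, ib_dev, ARRAY_SIZE(ioctl_cmd.attrs),
                 UVERBS_TYPE_DEVICE, UVERBS_DEVICE_QUERY);
  fill_attr_ptr(&ioctl_cmd.attrs[0], QUERY_DEVICE_RESP,
                sizeof(struct ib_uverbs_query_device_resp),
                (const void * __user)cmd.response);

  set_fs(KERNEL_DS);
  err = ib_uverbs_cmd_verbs(ib_dev, file, &ioctl_cmd.hdr, ioctl_cmd.attrs,
                            oldfs);
  set_fs(oldfs);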

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h     |   6 ++
 drivers/infiniband/core/uverbs_cmd.c | 169 +++++++++++++++++------------------
 2 files changed, 87 insertions(+), 88 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index ad4af37..9f9f748 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -84,7 +84,13 @@
  * released when the CQ is destroyed.
  */
 
+struct ib_uverbs_ioctl_hdr;
 long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
+			 struct ib_uverbs_file *file,
+			 struct ib_uverbs_ioctl_hdr *hdr,
+			 void __user *buf,
+			 mm_segment_t oldfs);
 
 struct ib_uverbs_device {
 	atomic_t				refcount;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 8fc1557..b1baa67 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -37,6 +37,8 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <rdma/rdma_user_ioctl.h>
+#include <rdma/uverbs_ioctl_cmd.h>
 
 #include <asm/uaccess.h>
 
@@ -303,6 +305,55 @@ static void put_xrcd_read(struct ib_uobject *uobj)
 {
 	put_uobj_read(uobj);
 }
+
+static int get_vendor_num_attrs(size_t cmd, size_t resp, int in_len,
+				int out_len)
+{
+	return !!(cmd != in_len) + !!(resp != out_len);
+}
+
+static void init_ioctl_hdr(struct ib_uverbs_ioctl_hdr *hdr,
+			   struct ib_device *ib_dev,
+			   size_t num_attrs,
+			   u16 object_type,
+			   u16 action)
+{
+	hdr->length = sizeof(*hdr) + num_attrs * sizeof(hdr->attrs[0]);
+	hdr->flags = 0;
+	hdr->object_type = object_type;
+	hdr->driver_id = ib_dev->driver_id;
+	hdr->action = action;
+	hdr->num_attrs = num_attrs;
+}
+
+static void fill_attr_ptr(struct ib_uverbs_attr *attr, u16 attr_id, u16 len,
+			  const void * __user source)
+{
+	attr->attr_id = attr_id;
+	attr->len = len;
+	attr->reserved = 0;
+	attr->ptr_idr = (__u64)source;
+}
+
+static void fill_hw_attrs(struct ib_uverbs_attr *hw_attrs,
+			  const void __user *in_buf,
+			  const void __user *out_buf,
+			  size_t cmd_size, size_t resp_size,
+			  int in_len, int out_len)
+{
+	if (in_len > cmd_size)
+		fill_attr_ptr(&hw_attrs[UVERBS_UHW_IN],
+			      UVERBS_UHW_IN | IB_UVERBS_VENDOR_FLAG,
+			      in_len - cmd_size,
+			      in_buf + cmd_size);
+
+	if (out_len > resp_size)
+		fill_attr_ptr(&hw_attrs[UVERBS_UHW_OUT],
+			      UVERBS_UHW_OUT | IB_UVERBS_VENDOR_FLAG,
+			      out_len - resp_size,
+			      out_buf + resp_size);
+}
+
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -310,104 +361,46 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 {
 	struct ib_uverbs_get_context      cmd;
 	struct ib_uverbs_get_context_resp resp;
-	struct ib_udata                   udata;
-	struct ib_ucontext		 *ucontext;
-	struct file			 *filp;
-	int ret;
+	struct {
+		struct ib_uverbs_ioctl_hdr hdr;
+		struct ib_uverbs_attr  cmd_attrs[GET_CONTEXT_RESP + 1];
+		struct ib_uverbs_attr  hw_attrs[UVERBS_UHW_OUT + 1];
+	} ioctl_cmd;
+	mm_segment_t oldfs = get_fs();
+	long err;
 
 	if (out_len < sizeof resp)
 		return -ENOSPC;
 
-	if (copy_from_user(&cmd, buf, sizeof cmd))
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
 		return -EFAULT;
 
-	mutex_lock(&file->mutex);
-
-	if (file->ucontext) {
-		ret = -EINVAL;
-		goto err;
-	}
-
-	INIT_UDATA(&udata, buf + sizeof cmd,
-		   (unsigned long) cmd.response + sizeof resp,
-		   in_len - sizeof cmd, out_len - sizeof resp);
+	init_ioctl_hdr(&ioctl_cmd.hdr, ib_dev, ARRAY_SIZE(ioctl_cmd.cmd_attrs) +
+		       get_vendor_num_attrs(sizeof(cmd), sizeof(resp), in_len,
+					    out_len),
+		       UVERBS_TYPE_DEVICE, UVERBS_DEVICE_ALLOC_CONTEXT);
 
-	ucontext = ib_dev->alloc_ucontext(ib_dev, &udata);
-	if (IS_ERR(ucontext)) {
-		ret = PTR_ERR(ucontext);
+	/*
+	 * We have to have a direct mapping between the new format and the old
+	 * format. It's easily achievable with new attributes.
+	 */
+	fill_attr_ptr(&ioctl_cmd.cmd_attrs[GET_CONTEXT_RESP],
+		      GET_CONTEXT_RESP, sizeof(resp),
+		      (const void * __user)cmd.response);
+	fill_hw_attrs(ioctl_cmd.hw_attrs, buf,
+		      (const void * __user)cmd.response, sizeof(cmd),
+		      sizeof(resp), in_len, out_len);
+
+	set_fs(KERNEL_DS);
+	err = ib_uverbs_cmd_verbs(ib_dev, file, &ioctl_cmd.hdr,
+				  ioctl_cmd.cmd_attrs, oldfs);
+	set_fs(oldfs);
+
+	if (err < 0)
 		goto err;
-	}
-
-	ucontext->device = ib_dev;
-	INIT_LIST_HEAD(&ucontext->pd_list);
-	INIT_LIST_HEAD(&ucontext->mr_list);
-	INIT_LIST_HEAD(&ucontext->mw_list);
-	INIT_LIST_HEAD(&ucontext->cq_list);
-	INIT_LIST_HEAD(&ucontext->qp_list);
-	INIT_LIST_HEAD(&ucontext->srq_list);
-	INIT_LIST_HEAD(&ucontext->ah_list);
-	INIT_LIST_HEAD(&ucontext->wq_list);
-	INIT_LIST_HEAD(&ucontext->rwq_ind_tbl_list);
-	INIT_LIST_HEAD(&ucontext->xrcd_list);
-	INIT_LIST_HEAD(&ucontext->rule_list);
-	rcu_read_lock();
-	ucontext->tgid = get_task_pid(current->group_leader, PIDTYPE_PID);
-	rcu_read_unlock();
-	ucontext->closing = 0;
-
-#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-	ucontext->umem_tree = RB_ROOT;
-	init_rwsem(&ucontext->umem_rwsem);
-	ucontext->odp_mrs_count = 0;
-	INIT_LIST_HEAD(&ucontext->no_private_counters);
-
-	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
-		ucontext->invalidate_range = NULL;
-
-#endif
-
-	resp.num_comp_vectors = file->device->num_comp_vectors;
-
-	ret = get_unused_fd_flags(O_CLOEXEC);
-	if (ret < 0)
-		goto err_free;
-	resp.async_fd = ret;
-
-	filp = ib_uverbs_alloc_event_file(file, ib_dev, 1);
-	if (IS_ERR(filp)) {
-		ret = PTR_ERR(filp);
-		goto err_fd;
-	}
-
-	if (copy_to_user((void __user *) (unsigned long) cmd.response,
-			 &resp, sizeof resp)) {
-		ret = -EFAULT;
-		goto err_file;
-	}
-
-	file->ucontext = ucontext;
-	ucontext->ufile = file;
-
-	fd_install(resp.async_fd, filp);
-
-	mutex_unlock(&file->mutex);
-
-	return in_len;
-
-err_file:
-	ib_uverbs_free_async_event_file(file);
-	fput(filp);
-
-err_fd:
-	put_unused_fd(resp.async_fd);
-
-err_free:
-	put_pid(ucontext->tgid);
-	ib_dev->dealloc_ucontext(ucontext);
 
 err:
-	mutex_unlock(&file->mutex);
-	return ret;
+	return err == 0 ? in_len : err;
 }
 
 void uverbs_copy_query_dev_fields(struct ib_device *ib_dev,
-- 
2.7.4


* [RFC ABI V5 09/10] IB/core: Add create_qp command to the new ABI
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 08/10] IB/core: Implement compatibility layer for get context command Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  2016-10-27 14:43   ` [RFC ABI V5 10/10] IB/core: Add modify_qp " Matan Barak
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

Since creating an XRC_TGT QP requires a different set of attributes,
we introduce a separate command in order to create such a QP.
The XRC_TGT QP creation command isn't tested at all.
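
The handlers below distinguish mandatory from optional attributes purely by
how they treat the UVERBS_COPY_FROM() return value; as a short excerpt-style
sketch of the idiom (the variables are the handler's locals):

  /* mandatory: a missing attribute (-ENOENT) is an error as well */
  ret = UVERBS_COPY_FROM(&cmd, common, CREATE_QP_CMD);
  if (ret)
          return ret;

  /* optional: only a faulting user pointer is an error, -ENOENT is ignored */
  if (UVERBS_COPY_FROM(&create_flags, common, CREATE_QP_CMD_FLAGS) == -EFAULT)
          return -EFAULT;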

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 235 +++++++++++++++++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  43 ++++++
 include/uapi/rdma/ib_user_verbs.h          |  19 +++
 3 files changed, 297 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index 623e02e..e0c1af9 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -696,6 +696,227 @@ DECLARE_UVERBS_ACTION(uverbs_action_create_cq, uverbs_create_cq_handler, NULL,
 		      &uverbs_create_cq_spec, &uverbs_uhw_compat_spec);
 EXPORT_SYMBOL(uverbs_action_create_cq);
 
+static int qp_fill_attrs(struct ib_qp_init_attr *attr, struct ib_ucontext *ctx,
+			 const struct ib_uverbs_ioctl_create_qp *cmd,
+			 u32 create_flags)
+{
+	if (create_flags & ~(IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
+			     IB_QP_CREATE_CROSS_CHANNEL |
+			     IB_QP_CREATE_MANAGED_SEND |
+			     IB_QP_CREATE_MANAGED_RECV |
+			     IB_QP_CREATE_SCATTER_FCS))
+		return -EINVAL;
+
+	attr->event_handler = ib_uverbs_qp_event_handler;
+	attr->qp_context = ctx->ufile;
+	attr->sq_sig_type = cmd->sq_sig_all ? IB_SIGNAL_ALL_WR :
+		IB_SIGNAL_REQ_WR;
+	attr->qp_type = cmd->qp_type;
+
+	attr->cap.max_send_wr     = cmd->max_send_wr;
+	attr->cap.max_recv_wr     = cmd->max_recv_wr;
+	attr->cap.max_send_sge    = cmd->max_send_sge;
+	attr->cap.max_recv_sge    = cmd->max_recv_sge;
+	attr->cap.max_inline_data = cmd->max_inline_data;
+
+	return 0;
+}
+
+static void qp_init_uqp(struct ib_uqp_object *obj)
+{
+	obj->uevent.events_reported     = 0;
+	INIT_LIST_HEAD(&obj->uevent.event_list);
+	INIT_LIST_HEAD(&obj->mcast_list);
+}
+
+static int qp_write_resp(const struct ib_qp_init_attr *attr,
+			 const struct ib_qp *qp,
+			 struct uverbs_attr_array *common)
+{
+	struct ib_uverbs_ioctl_create_qp_resp resp = {
+		.qpn = qp->qp_num,
+		.max_recv_sge    = attr->cap.max_recv_sge,
+		.max_send_sge    = attr->cap.max_send_sge,
+		.max_recv_wr     = attr->cap.max_recv_wr,
+		.max_send_wr     = attr->cap.max_send_wr,
+		.max_inline_data = attr->cap.max_inline_data};
+
+	return UVERBS_COPY_TO(common, CREATE_QP_RESP, &resp);
+}
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_create_qp_spec,
+	UVERBS_ATTR_IDR(CREATE_QP_HANDLE, UVERBS_TYPE_QP, UVERBS_IDR_ACCESS_NEW),
+	UVERBS_ATTR_IDR(CREATE_QP_PD_HANDLE, UVERBS_TYPE_PD, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_IDR(CREATE_QP_SEND_CQ, UVERBS_TYPE_CQ, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_IDR(CREATE_QP_RECV_CQ, UVERBS_TYPE_CQ, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_IDR(CREATE_QP_SRQ, UVERBS_TYPE_SRQ, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_USER_HANDLE, sizeof(__u64)),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_CMD, sizeof(struct ib_uverbs_ioctl_create_qp)),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_CMD_FLAGS, sizeof(__u32)),
+	UVERBS_ATTR_PTR_OUT(CREATE_QP_RESP, sizeof(struct ib_uverbs_ioctl_create_qp_resp)));
+EXPORT_SYMBOL(uverbs_create_qp_spec);
+
+int uverbs_create_qp_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv)
+{
+	struct ib_uqp_object           *obj;
+	struct ib_udata uhw;
+	int ret;
+	__u64 user_handle = 0;
+	__u32 create_flags = 0;
+	struct ib_uverbs_ioctl_create_qp cmd;
+	struct ib_qp_init_attr attr = {};
+	struct ib_qp                   *qp;
+	struct ib_pd			*pd;
+
+	if (!common->attrs[CREATE_QP_HANDLE].valid ||
+	    !common->attrs[CREATE_QP_PD_HANDLE].valid ||
+	    !common->attrs[CREATE_QP_SEND_CQ].valid ||
+	    !common->attrs[CREATE_QP_RECV_CQ].valid ||
+	    !common->attrs[CREATE_QP_RESP].valid)
+		return -EINVAL;
+
+	ret = UVERBS_COPY_FROM(&cmd, common, CREATE_QP_CMD);
+	if (ret)
+		return ret;
+
+	/* Optional params */
+	if (UVERBS_COPY_FROM(&create_flags, common, CREATE_QP_CMD_FLAGS) == -EFAULT ||
+	    UVERBS_COPY_FROM(&user_handle, common, CREATE_QP_USER_HANDLE) == -EFAULT)
+		return -EFAULT;
+
+	if (cmd.qp_type == IB_QPT_XRC_INI) {
+		cmd.max_recv_wr = 0;
+		cmd.max_recv_sge = 0;
+	}
+
+	ret = qp_fill_attrs(&attr, ucontext, &cmd, create_flags);
+	if (ret)
+		return ret;
+
+	pd = common->attrs[CREATE_QP_PD_HANDLE].obj_attr.uobject->object;
+	attr.send_cq = common->attrs[CREATE_QP_SEND_CQ].obj_attr.uobject->object;
+	attr.recv_cq = common->attrs[CREATE_QP_RECV_CQ].obj_attr.uobject->object;
+	if (common->attrs[CREATE_QP_SRQ].valid)
+		attr.srq = common->attrs[CREATE_QP_SRQ].obj_attr.uobject->object;
+	obj = (struct ib_uqp_object *)common->attrs[CREATE_QP_HANDLE].obj_attr.uobject;
+
+	if (attr.srq && attr.srq->srq_type != IB_SRQT_BASIC)
+		return -EINVAL;
+
+	qp_init_uqp(obj);
+	create_udata(vendor, &uhw);
+	qp = pd->device->create_qp(pd, &attr, &uhw);
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);
+	qp->real_qp	  = qp;
+	qp->device	  = pd->device;
+	qp->pd		  = pd;
+	qp->send_cq	  = attr.send_cq;
+	qp->recv_cq	  = attr.recv_cq;
+	qp->srq		  = attr.srq;
+	qp->event_handler = attr.event_handler;
+	qp->qp_context	  = attr.qp_context;
+	qp->qp_type	  = attr.qp_type;
+	atomic_set(&qp->usecnt, 0);
+	atomic_inc(&pd->usecnt);
+	atomic_inc(&attr.send_cq->usecnt);
+	if (attr.recv_cq)
+		atomic_inc(&attr.recv_cq->usecnt);
+	if (attr.srq)
+		atomic_inc(&attr.srq->usecnt);
+	qp->uobject = &obj->uevent.uobject;
+	obj->uevent.uobject.object = qp;
+	obj->uevent.uobject.user_handle = user_handle;
+
+	ret = qp_write_resp(&attr, qp, common);
+	if (ret) {
+		ib_destroy_qp(qp);
+		return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_create_qp_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_create_qp, uverbs_create_qp_handler, NULL,
+		      &uverbs_create_qp_spec, &uverbs_uhw_compat_spec);
+EXPORT_SYMBOL(uverbs_action_create_qp);
+
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_create_qp_xrc_tgt_spec,
+	UVERBS_ATTR_IDR(CREATE_QP_XRC_TGT_HANDLE, UVERBS_TYPE_QP, UVERBS_IDR_ACCESS_NEW),
+	UVERBS_ATTR_IDR(CREATE_QP_XRC_TGT_XRCD, UVERBS_TYPE_XRCD, UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_XRC_TGT_USER_HANDLE, sizeof(__u64)),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_XRC_TGT_CMD, sizeof(struct ib_uverbs_ioctl_create_qp)),
+	UVERBS_ATTR_PTR_IN(CREATE_QP_XRC_TGT_CMD_FLAGS, sizeof(__u32)),
+	UVERBS_ATTR_PTR_OUT(CREATE_QP_XRC_TGT_RESP, sizeof(struct ib_uverbs_ioctl_create_qp_resp)));
+EXPORT_SYMBOL(uverbs_create_qp_xrc_tgt_spec);
+
+int uverbs_create_qp_xrc_tgt_handler(struct ib_device *ib_dev,
+				     struct ib_ucontext *ucontext,
+				     struct uverbs_attr_array *common,
+				     struct uverbs_attr_array *vendor,
+				     void *priv)
+{
+	struct ib_uqp_object           *obj;
+	int ret;
+	__u64 user_handle = 0;
+	__u32 create_flags = 0;
+	struct ib_uverbs_ioctl_create_qp cmd;
+	struct ib_qp_init_attr attr = {};
+	struct ib_qp                   *qp;
+
+	if (!common->attrs[CREATE_QP_XRC_TGT_HANDLE].valid ||
+	    !common->attrs[CREATE_QP_XRC_TGT_XRCD].valid ||
+	    !common->attrs[CREATE_QP_XRC_TGT_RESP].valid)
+		return -EINVAL;
+
+	ret = UVERBS_COPY_FROM(&cmd, common, CREATE_QP_XRC_TGT_CMD);
+	if (ret)
+		return ret;
+
+	/* Optional params */
+	if (UVERBS_COPY_FROM(&create_flags, common, CREATE_QP_XRC_TGT_CMD_FLAGS) == -EFAULT ||
+	    UVERBS_COPY_FROM(&user_handle, common, CREATE_QP_XRC_TGT_USER_HANDLE) == -EFAULT)
+		return -EFAULT;
+
+	ret = qp_fill_attrs(&attr, ucontext, &cmd, create_flags);
+	if (ret)
+		return ret;
+
+	obj = (struct ib_uqp_object *)common->attrs[CREATE_QP_XRC_TGT_HANDLE].obj_attr.uobject;
+	obj->uxrcd = container_of(common->attrs[CREATE_QP_XRC_TGT_XRCD].obj_attr.uobject,
+				  struct ib_uxrcd_object, uobject);
+	attr.xrcd = obj->uxrcd->uobject.object;
+
+	qp_init_uqp(obj);
+	qp = ib_create_qp(NULL, &attr);
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);
+	qp->uobject = &obj->uevent.uobject;
+	obj->uevent.uobject.object = qp;
+	obj->uevent.uobject.user_handle = user_handle;
+	atomic_inc(&obj->uxrcd->refcnt);
+
+	ret = qp_write_resp(&attr, qp, common);
+	if (ret) {
+		ib_destroy_qp(qp);
+		return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(uverbs_create_qp_xrc_tgt_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_create_qp_xrc_tgt, uverbs_create_qp_xrc_tgt_handler,
+		      NULL, &uverbs_create_qp_xrc_tgt_spec);
+EXPORT_SYMBOL(uverbs_action_create_qp_xrc_tgt);
+
 DECLARE_UVERBS_ACTIONS(
 	uverbs_actions_comp_channel,
 	ADD_UVERBS_ACTION_PTR(UVERBS_COMP_CHANNEL_CREATE, &uverbs_action_create_comp_channel),
@@ -709,6 +930,13 @@ DECLARE_UVERBS_ACTIONS(
 EXPORT_SYMBOL(uverbs_actions_cq);
 
 DECLARE_UVERBS_ACTIONS(
+	uverbs_actions_qp,
+	ADD_UVERBS_ACTION_PTR(UVERBS_QP_CREATE, &uverbs_action_create_qp),
+	ADD_UVERBS_ACTION_PTR(UVERBS_QP_CREATE_XRC_TGT, &uverbs_action_create_qp_xrc_tgt),
+);
+EXPORT_SYMBOL(uverbs_actions_qp);
+
+DECLARE_UVERBS_ACTIONS(
 	uverbs_actions_mr,
 	ADD_UVERBS_ACTION_PTR(UVERBS_MR_REG, &uverbs_action_reg_mr),
 	ADD_UVERBS_ACTION_PTR(UVERBS_MR_DEREG, &uverbs_action_dereg_mr),
@@ -744,6 +972,13 @@ DECLARE_UVERBS_TYPE(uverbs_type_cq,
 		    &uverbs_actions_cq);
 EXPORT_SYMBOL(uverbs_type_cq);
 
+DECLARE_UVERBS_TYPE(uverbs_type_qp,
+		    /* order 0 is used so the QP is freed before its PD (order 2) */
+		    &UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), 0,
+					      uverbs_free_qp),
+		    &uverbs_actions_qp);
+EXPORT_SYMBOL(uverbs_type_qp);
+
 DECLARE_UVERBS_TYPE(uverbs_type_mr,
 		    /* 1 is used in order to free the MR after all the MWs */
 		    &UVERBS_TYPE_ALLOC_IDR(1, uverbs_free_mr),
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index 7a2d8f1..ca82138 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -113,6 +113,18 @@ enum uverbs_common_types {
 	UVERBS_TYPE_LAST,
 };
 
+enum uverbs_create_qp_cmd_attr {
+	CREATE_QP_HANDLE,
+	CREATE_QP_PD_HANDLE,
+	CREATE_QP_SEND_CQ,
+	CREATE_QP_RECV_CQ,
+	CREATE_QP_SRQ,
+	CREATE_QP_USER_HANDLE,
+	CREATE_QP_CMD,
+	CREATE_QP_CMD_FLAGS,
+	CREATE_QP_RESP
+};
+
 enum uverbs_create_cq_cmd_attr {
 	CREATE_CQ_HANDLE,
 	CREATE_CQ_CQE,
@@ -123,6 +135,15 @@ enum uverbs_create_cq_cmd_attr {
 	CREATE_CQ_RESP_CQE,
 };
 
+enum uverbs_create_qp_xrc_tgt_cmd_attr {
+	CREATE_QP_XRC_TGT_HANDLE,
+	CREATE_QP_XRC_TGT_XRCD,
+	CREATE_QP_XRC_TGT_USER_HANDLE,
+	CREATE_QP_XRC_TGT_CMD,
+	CREATE_QP_XRC_TGT_CMD_FLAGS,
+	CREATE_QP_XRC_TGT_RESP
+};
+
 enum uverbs_create_comp_channel_cmd_attr {
 	CREATE_COMP_CHANNEL_FD,
 };
@@ -203,6 +224,18 @@ int uverbs_create_cq_handler(struct ib_device *ib_dev,
 			     struct uverbs_attr_array *vendor,
 			     void *priv);
 
+int uverbs_create_qp_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv);
+
+int uverbs_create_qp_xrc_tgt_handler(struct ib_device *ib_dev,
+				     struct ib_ucontext *ucontext,
+				     struct uverbs_attr_array *common,
+				     struct uverbs_attr_array *vendor,
+				     void *priv);
+
 extern const struct uverbs_action uverbs_action_get_context;
 extern const struct uverbs_action uverbs_action_create_cq;
 extern const struct uverbs_action uverbs_action_create_comp_channel;
@@ -210,6 +243,8 @@ extern const struct uverbs_action uverbs_action_query_device;
 extern const struct uverbs_action uverbs_action_alloc_pd;
 extern const struct uverbs_action uverbs_action_reg_mr;
 extern const struct uverbs_action uverbs_action_dereg_mr;
+extern const struct uverbs_action uverbs_action_create_qp;
+extern const struct uverbs_action uverbs_action_create_qp_xrc_tgt;
 
 enum uverbs_actions_mr_ops {
 	UVERBS_MR_REG,
@@ -230,6 +265,13 @@ enum uverbs_actions_cq_ops {
 
 extern const struct uverbs_type_actions_group uverbs_actions_cq;
 
+enum uverbs_actions_qp_ops {
+	UVERBS_QP_CREATE,
+	UVERBS_QP_CREATE_XRC_TGT,
+};
+
+extern const struct uverbs_type_actions_group uverbs_actions_qp;
+
 enum uverbs_actions_pd_ops {
 	UVERBS_PD_ALLOC
 };
@@ -244,6 +286,7 @@ enum uverbs_actions_device_ops {
 extern const struct uverbs_type_actions_group uverbs_actions_device;
 
 extern const struct uverbs_type uverbs_type_cq;
+extern const struct uverbs_type uverbs_type_qp;
 extern const struct uverbs_type uverbs_type_comp_channel;
 extern const struct uverbs_type uverbs_type_mr;
 extern const struct uverbs_type uverbs_type_pd;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 0e9821b..14aff5f 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -579,6 +579,16 @@ struct ib_uverbs_ex_create_qp {
 	__u32  reserved1;
 };
 
+struct ib_uverbs_ioctl_create_qp {
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u8  sq_sig_all;
+	__u8  qp_type;
+};
+
 struct ib_uverbs_open_qp {
 	__u64 response;
 	__u64 user_handle;
@@ -601,6 +611,15 @@ struct ib_uverbs_create_qp_resp {
 	__u32 reserved;
 };
 
+struct ib_uverbs_ioctl_create_qp_resp {
+	__u32 qpn;
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+};
+
 struct ib_uverbs_ex_create_qp_resp {
 	struct ib_uverbs_create_qp_resp base;
 	__u32 comp_mask;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC ABI V5 10/10] IB/core: Add modify_qp command to the new ABI
       [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-10-27 14:43   ` [RFC ABI V5 09/10] IB/core: Add create_qp command to the new ABI Matan Barak
@ 2016-10-27 14:43   ` Matan Barak
  9 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-10-27 14:43 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Christoph Lameter,
	Liran Liss, Haggai Eran, Majd Dibbiny, Matan Barak, Tal Alon,
	Leon Romanovsky

Partially tested.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h        |  14 +++
 drivers/infiniband/core/uverbs_cmd.c       |  14 ---
 drivers/infiniband/core/uverbs_ioctl_cmd.c | 161 +++++++++++++++++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  32 ++++++
 include/uapi/rdma/ib_user_verbs.h          |   7 ++
 5 files changed, 214 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 19d499d..fccc7bc 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -153,4 +153,18 @@ int ib_nl_handle_set_timeout(struct sk_buff *skb,
 int ib_nl_handle_ip_res_resp(struct sk_buff *skb,
 			     struct netlink_callback *cb);
 
+/* Remove ignored fields set in the attribute mask */
+static inline int modify_qp_mask(enum ib_qp_type qp_type, int mask)
+{
+	switch (qp_type) {
+	case IB_QPT_XRC_INI:
+		return mask & ~(IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
+	case IB_QPT_XRC_TGT:
+		return mask & ~(IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
+				IB_QP_RNR_RETRY);
+	default:
+		return mask;
+	}
+}
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b1baa67..f4704bc 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2303,20 +2303,6 @@ out:
 	return ret ? ret : in_len;
 }
 
-/* Remove ignored fields set in the attribute mask */
-static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
-{
-	switch (qp_type) {
-	case IB_QPT_XRC_INI:
-		return mask & ~(IB_QP_MAX_DEST_RD_ATOMIC | IB_QP_MIN_RNR_TIMER);
-	case IB_QPT_XRC_TGT:
-		return mask & ~(IB_QP_MAX_QP_RD_ATOMIC | IB_QP_RETRY_CNT |
-				IB_QP_RNR_RETRY);
-	default:
-		return mask;
-	}
-}
-
 ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 			    struct ib_device *ib_dev,
 			    const char __user *buf, int in_len,
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index e0c1af9..519b5e3 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -37,6 +37,7 @@
 #include <linux/file.h>
 #include "rdma_core.h"
 #include "uverbs.h"
+#include "core_priv.h"
 
 int ib_uverbs_std_dist(__u16 *id, void *priv)
 {
@@ -917,6 +918,164 @@ DECLARE_UVERBS_ACTION(uverbs_action_create_qp_xrc_tgt, uverbs_create_qp_xrc_tgt_
 		      NULL, &uverbs_create_qp_spec);
 EXPORT_SYMBOL(uverbs_action_create_qp_xrc_tgt);
 
+DECLARE_UVERBS_ATTR_SPEC(
+	uverbs_modify_qp_spec,
+	UVERBS_ATTR_IDR(MODIFY_QP_HANDLE, UVERBS_TYPE_QP, UVERBS_IDR_ACCESS_WRITE),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_STATE, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_CUR_STATE, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_EN_SQD_ASYNC_NOTIFY, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_ACCESS_FLAGS, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_PKEY_INDEX, sizeof(__u16)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_PORT, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_QKEY, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_AV, sizeof(struct ib_uverbs_qp_dest)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_PATH_MTU, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_TIMEOUT, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_RETRY_CNT, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_RNR_RETRY, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_RQ_PSN, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_MAX_RD_ATOMIC, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_ALT_PATH, sizeof(struct ib_uverbs_qp_alt_path)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_MIN_RNR_TIMER, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_SQ_PSN, sizeof(__u32)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_MAX_DEST_RD_ATOMIC, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_PATH_MIG_STATE, sizeof(__u8)),
+	UVERBS_ATTR_PTR_IN(MODIFY_QP_DEST_QPN, sizeof(__u32)));
+EXPORT_SYMBOL(uverbs_modify_qp_spec);
+
+int uverbs_modify_qp_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv)
+{
+	struct ib_udata            uhw;
+	struct ib_qp              *qp;
+	struct ib_qp_attr         *attr;
+	struct ib_uverbs_qp_dest  av;
+	struct ib_uverbs_qp_alt_path alt_path;
+	__u32 attr_mask = 0;
+	int ret = 0;
+
+	if (!common->attrs[MODIFY_QP_HANDLE].valid)
+		return -EINVAL;
+
+	qp = common->attrs[MODIFY_QP_HANDLE].obj_attr.uobject->object;
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return -ENOMEM;
+
+#define MODIFY_QP_CPY(_param, _fld, _attr)				\
+	({								\
+		int ret = UVERBS_COPY_FROM(_fld, common, _param);	\
+		if (!ret)						\
+			attr_mask |= _attr;				\
+		ret == -EFAULT ? ret : 0;				\
+	})
+
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_STATE, &attr->qp_state,
+				   IB_QP_STATE);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_CUR_STATE, &attr->cur_qp_state,
+				   IB_QP_CUR_STATE);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_EN_SQD_ASYNC_NOTIFY,
+				   &attr->en_sqd_async_notify,
+				   IB_QP_EN_SQD_ASYNC_NOTIFY);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_ACCESS_FLAGS,
+				   &attr->qp_access_flags, IB_QP_ACCESS_FLAGS);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_PKEY_INDEX, &attr->pkey_index,
+				   IB_QP_PKEY_INDEX);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_PORT, &attr->port_num, IB_QP_PORT);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_QKEY, &attr->qkey, IB_QP_QKEY);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_PATH_MTU, &attr->path_mtu,
+				   IB_QP_PATH_MTU);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_TIMEOUT, &attr->timeout,
+				   IB_QP_TIMEOUT);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_RETRY_CNT, &attr->retry_cnt,
+				   IB_QP_RETRY_CNT);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_RNR_RETRY, &attr->rnr_retry,
+				   IB_QP_RNR_RETRY);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_RQ_PSN, &attr->rq_psn,
+				   IB_QP_RQ_PSN);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_MAX_RD_ATOMIC,
+				   &attr->max_rd_atomic,
+				   IB_QP_MAX_QP_RD_ATOMIC);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_MIN_RNR_TIMER,
+				   &attr->min_rnr_timer, IB_QP_MIN_RNR_TIMER);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_SQ_PSN, &attr->sq_psn,
+				   IB_QP_SQ_PSN);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_MAX_DEST_RD_ATOMIC,
+				   &attr->max_dest_rd_atomic,
+				   IB_QP_MAX_DEST_RD_ATOMIC);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_PATH_MIG_STATE,
+				   &attr->path_mig_state, IB_QP_PATH_MIG_STATE);
+	ret = ret ?: MODIFY_QP_CPY(MODIFY_QP_DEST_QPN, &attr->dest_qp_num,
+				   IB_QP_DEST_QPN);
+
+	if (ret)
+		goto err;
+
+	ret = UVERBS_COPY_FROM(&av, common, MODIFY_QP_AV);
+	if (!ret) {
+		attr_mask |= IB_QP_AV;
+		attr->ah_attr.grh.flow_label        = av.flow_label;
+		attr->ah_attr.grh.sgid_index        = av.sgid_index;
+		attr->ah_attr.grh.hop_limit         = av.hop_limit;
+		attr->ah_attr.grh.traffic_class     = av.traffic_class;
+		attr->ah_attr.dlid		    = av.dlid;
+		attr->ah_attr.sl		    = av.sl;
+		attr->ah_attr.src_path_bits	    = av.src_path_bits;
+		attr->ah_attr.static_rate	    = av.static_rate;
+		attr->ah_attr.ah_flags		    = av.is_global ? IB_AH_GRH : 0;
+		attr->ah_attr.port_num		    = av.port_num;
+	} else if (ret == -EFAULT) {
+		goto err;
+	}
+
+	ret = UVERBS_COPY_FROM(&alt_path, common, MODIFY_QP_ALT_PATH);
+	if (!ret) {
+		attr_mask |= IB_QP_ALT_PATH;
+		attr->alt_ah_attr.grh.flow_label    = alt_path.dest.flow_label;
+		attr->alt_ah_attr.grh.sgid_index    = alt_path.dest.sgid_index;
+		attr->alt_ah_attr.grh.hop_limit     = alt_path.dest.hop_limit;
+		attr->alt_ah_attr.grh.traffic_class = alt_path.dest.traffic_class;
+		attr->alt_ah_attr.dlid		    = alt_path.dest.dlid;
+		attr->alt_ah_attr.sl		    = alt_path.dest.sl;
+		attr->alt_ah_attr.src_path_bits     = alt_path.dest.src_path_bits;
+		attr->alt_ah_attr.static_rate       = alt_path.dest.static_rate;
+		attr->alt_ah_attr.ah_flags	    = alt_path.dest.is_global ? IB_AH_GRH : 0;
+		attr->alt_ah_attr.port_num	    = alt_path.dest.port_num;
+		attr->alt_pkey_index		    = alt_path.pkey_index;
+		attr->alt_port_num		    = alt_path.port_num;
+		attr->alt_timeout		    = alt_path.timeout;
+	} else if (ret == -EFAULT) {
+		goto err;
+	}
+
+	create_udata(vendor, &uhw);
+
+	if (qp->real_qp == qp) {
+		ret = ib_resolve_eth_dmac(qp, attr, &attr_mask);
+		if (ret)
+			goto err;
+		ret = qp->device->modify_qp(qp, attr,
+			modify_qp_mask(qp->qp_type, attr_mask), &uhw);
+	} else {
+		ret = ib_modify_qp(qp, attr, modify_qp_mask(qp->qp_type, attr_mask));
+	}
+
+	if (ret)
+		goto err;
+	kfree(attr);
+	return 0;
+err:
+	kfree(attr);
+	return ret;
+}
+EXPORT_SYMBOL(uverbs_modify_qp_handler);
+
+DECLARE_UVERBS_ACTION(uverbs_action_modify_qp, uverbs_modify_qp_handler, NULL,
+		      &uverbs_modify_qp_spec, &uverbs_uhw_compat_spec);
+
 DECLARE_UVERBS_ACTIONS(
 	uverbs_actions_comp_channel,
 	ADD_UVERBS_ACTION_PTR(UVERBS_COMP_CHANNEL_CREATE, &uverbs_action_create_comp_channel),
@@ -933,6 +1092,7 @@ DECLARE_UVERBS_ACTIONS(
 	uverbs_actions_qp,
 	ADD_UVERBS_ACTION_PTR(UVERBS_QP_CREATE, &uverbs_action_create_qp),
 	ADD_UVERBS_ACTION_PTR(UVERBS_QP_CREATE_XRC_TGT, &uverbs_action_create_qp_xrc_tgt),
+	ADD_UVERBS_ACTION_PTR(UVERBS_QP_MODIFY, &uverbs_action_modify_qp),
 );
 EXPORT_SYMBOL(uverbs_actions_qp);
 
@@ -1000,6 +1160,7 @@ DECLARE_UVERBS_TYPES(uverbs_types,
 		     ADD_UVERBS_TYPE(UVERBS_TYPE_MR, uverbs_type_mr),
 		     ADD_UVERBS_TYPE(UVERBS_TYPE_COMP_CHANNEL, uverbs_type_comp_channel),
 		     ADD_UVERBS_TYPE(UVERBS_TYPE_CQ, uverbs_type_cq),
+		     ADD_UVERBS_TYPE(UVERBS_TYPE_QP, uverbs_type_qp),
 );
 EXPORT_SYMBOL(uverbs_types);
 
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index ca82138..50fdaba 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -144,6 +144,30 @@ enum uverbs_create_qp_xrc_tgt_cmd_attr {
 	CREATE_QP_XRC_TGT_RESP
 };
 
+enum uverbs_modify_qp_cmd_attr {
+	MODIFY_QP_HANDLE,
+	MODIFY_QP_STATE,
+	MODIFY_QP_CUR_STATE,
+	MODIFY_QP_EN_SQD_ASYNC_NOTIFY,
+	MODIFY_QP_ACCESS_FLAGS,
+	MODIFY_QP_PKEY_INDEX,
+	MODIFY_QP_PORT,
+	MODIFY_QP_QKEY,
+	MODIFY_QP_AV,
+	MODIFY_QP_PATH_MTU,
+	MODIFY_QP_TIMEOUT,
+	MODIFY_QP_RETRY_CNT,
+	MODIFY_QP_RNR_RETRY,
+	MODIFY_QP_RQ_PSN,
+	MODIFY_QP_MAX_RD_ATOMIC,
+	MODIFY_QP_ALT_PATH,
+	MODIFY_QP_MIN_RNR_TIMER,
+	MODIFY_QP_SQ_PSN,
+	MODIFY_QP_MAX_DEST_RD_ATOMIC,
+	MODIFY_QP_PATH_MIG_STATE,
+	MODIFY_QP_DEST_QPN
+};
+
 enum uverbs_create_comp_channel_cmd_attr {
 	CREATE_COMP_CHANNEL_FD,
 };
@@ -236,6 +260,12 @@ int uverbs_create_qp_xrc_tgt_handler(struct ib_device *ib_dev,
 				     struct uverbs_attr_array *vendor,
 				     void *priv);
 
+int uverbs_modify_qp_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv);
+
 extern const struct uverbs_action uverbs_action_get_context;
 extern const struct uverbs_action uverbs_action_create_cq;
 extern const struct uverbs_action uverbs_action_create_comp_channel;
@@ -245,6 +275,7 @@ extern const struct uverbs_action uverbs_action_reg_mr;
 extern const struct uverbs_action uverbs_action_dereg_mr;
 extern const struct uverbs_action uverbs_action_create_qp;
 extern const struct uverbs_action uverbs_action_create_qp_xrc_tgt;
+extern const struct uverbs_action uverbs_action_modify_qp;
 
 enum uverbs_actions_mr_ops {
 	UVERBS_MR_REG,
@@ -268,6 +299,7 @@ extern const struct uverbs_type_actions_group uverbs_actions_cq;
 enum uverbs_actions_qp_ops {
 	UVERBS_QP_CREATE,
 	UVERBS_QP_CREATE_XRC_TGT,
+	UVERBS_QP_MODIFY,
 };
 
 extern const struct uverbs_type_actions_group uverbs_actions_qp;
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 14aff5f..0b06c4d 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -645,6 +645,13 @@ struct ib_uverbs_qp_dest {
 	__u8  port_num;
 };
 
+struct ib_uverbs_qp_alt_path {
+	struct ib_uverbs_qp_dest dest;
+	__u16 pkey_index;
+	__u8  port_num;
+	__u8  timeout;
+};
+
 struct ib_uverbs_query_qp {
 	__u64 response;
 	__u32 qp_handle;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]     ` <1477579398-6875-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-10-28  6:59       ` Christoph Hellwig
       [not found]         ` <20161028065943.GA10418-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2016-10-28  6:59 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon, Leon Romanovsky

> +	mm_segment_t currentfs = get_fs();
>  
>  	if (!ib_dev)
>  		return -EIO;
> @@ -240,8 +242,10 @@ static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
>  		goto out;
>  	}
>  
> +	set_fs(oldfs);
>  	err = uverbs_handle_action(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
>  				   file, action, ctx->uverbs_attr_array);
> +	set_fs(currentfs);

Adding this magic in new code is not acceptable.  Any given API
must take either a kernel or a user pointer.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]         ` <20161028065943.GA10418-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-10-28 15:16           ` Leon Romanovsky
       [not found]             ` <20161028151606.GN3617-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2016-10-28 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

On Thu, Oct 27, 2016 at 11:59:43PM -0700, Christoph Hellwig wrote:
> > +	mm_segment_t currentfs = get_fs();
> >
> >  	if (!ib_dev)
> >  		return -EIO;
> > @@ -240,8 +242,10 @@ static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
> >  		goto out;
> >  	}
> >
> > +	set_fs(oldfs);
> >  	err = uverbs_handle_action(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
> >  				   file, action, ctx->uverbs_attr_array);
> > +	set_fs(currentfs);
>
> Adding this magic in new code is not acceptable.  Any given API
> must take either a kernel or a user pointer.

And that is indeed what happens for new code. This magic is needed to allow
the legacy write interface to be converted to the new ioctl interface
internally in the kernel.



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]             ` <20161028151606.GN3617-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-10-28 15:21               ` Christoph Hellwig
       [not found]                 ` <20161028152138.GA16421-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2016-10-28 15:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon

On Fri, Oct 28, 2016 at 06:16:06PM +0300, Leon Romanovsky wrote:
> And that is indeed what happens for new code. This magic is needed to allow
> the legacy write interface to be converted to the new ioctl interface
> internally in the kernel.

You just added these statements, which are by definition new code.
Don't do that - create a clean kernel internal interface instead.

The canonical one would be to only pass kernel pointers in the internal
interface.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                 ` <20161028152138.GA16421-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-10-28 15:33                   ` Leon Romanovsky
       [not found]                     ` <20161028153306.GO3617-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2016-10-28 15:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]

On Fri, Oct 28, 2016 at 08:21:38AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 28, 2016 at 06:16:06PM +0300, Leon Romanovsky wrote:
> > And that is indeed what happens for new code. This magic is needed to allow
> > the legacy write interface to be converted to the new ioctl interface
> > internally in the kernel.
>
> You just added these statements, which are by definition new code.
> Don't do that - create a clean kernel internal interface instead.
>
> The canonical one would be to only pass kernel pointers in the internal
> interface.

Just to summarize, to be sure that I understood you correctly.

---------    --------------------
| write | -> | conversion logic | ---
---------    --------------------   |      ----------------------
                                    -----> | internal interface |
---------                           |      ----------------------
| ioctl | ---------------------------
---------

Am I right?


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                     ` <20161028153306.GO3617-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-10-28 15:37                       ` Christoph Hellwig
       [not found]                         ` <20161028153725.GA14166-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2016-10-28 15:37 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Matan Barak,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon

On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
> Just to summarize, to be sure that I understood you correctly.
> 
> ---------    --------------------
> | write | -> | conversion logic | ---
> ---------    --------------------   |      ----------------------
>                                     -----> | internal interface |
> ---------                           |      ----------------------
> | ioctl | ---------------------------
> ---------
> 
> Am I right?

Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.
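
In other words, something along these lines -- a purely illustrative
sketch; every urdma_* name and struct below is invented and nothing here
is from the series:

#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#include <rdma/ib_verbs.h>

struct urdma_uattr {			/* what userspace hands to ioctl() */
	__u16		id;
	__u16		reserved;
	__u32		len;
	__aligned_u64	data;		/* user pointer */
};

struct urdma_kattr {			/* what the internal interface sees */
	u16		id;
	u32		len;
	void		*data;		/* already in kernel memory */
};

/* shared internal interface: kernel pointers only, no set_fs() anywhere */
static int urdma_run_cmd(struct ib_device *ib_dev,
			 struct urdma_kattr *attrs, unsigned int n)
{
	/* ... dispatch to the per-action handler ... */
	return 0;
}

/*
 * The ioctl box: does all the copying from user space, then calls the
 * internal interface.  The write box would build the same urdma_kattr
 * array from the legacy command layout and call urdma_run_cmd() directly.
 */
static long urdma_ioctl_cmd(struct ib_device *ib_dev,
			    struct urdma_uattr __user *uattrs, unsigned int n)
{
	struct urdma_kattr *attrs;
	unsigned int i = 0;
	long ret;

	attrs = kcalloc(n, sizeof(*attrs), GFP_KERNEL);
	if (!attrs)
		return -ENOMEM;

	for (; i < n; i++) {
		struct urdma_uattr u;

		if (copy_from_user(&u, &uattrs[i], sizeof(u))) {
			ret = -EFAULT;
			goto out;
		}
		attrs[i].id = u.id;
		attrs[i].len = u.len;
		attrs[i].data = memdup_user(u64_to_user_ptr(u.data), u.len);
		if (IS_ERR(attrs[i].data)) {
			ret = PTR_ERR(attrs[i].data);
			attrs[i].data = NULL;
			goto out;
		}
	}

	ret = urdma_run_cmd(ib_dev, attrs, n);
out:
	while (i--)
		kfree(attrs[i].data);
	kfree(attrs);
	return ret;
}
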
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                         ` <20161028153725.GA14166-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-10-28 15:46                           ` Leon Romanovsky
       [not found]                             ` <20161028154628.GP3617-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2016-10-28 15:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
> > Just to summarize, to be sure that I understood you correctly.
> >
> > ---------    --------------------
> > | write | -> | conversion logic | ---
> > ---------    --------------------   |      ----------------------
> >                                     -----> | internal interface |
> > ---------                           |      ----------------------
> > | ioctl | ---------------------------
> > ---------
> >
> > Am I right?
>
> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.

Thanks


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
       [not found]     ` <1477579398-6875-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-10-28 22:53       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A445F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Hefty, Sean @ 2016-10-28 22:53 UTC (permalink / raw)
  To: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky

> The current code creates an IDR per type. Since types are currently
> common for all vendors and known in advance, this was good enough.
> However, the proposed ioctl based infrastructure allows each vendor
> to declare only some of the common types and declare its own specific
> types.
> 
> Thus, we decided to implement IDR to be per device and refactor it to
> use a new file.

I think this needs to be more abstract.  I would consider introducing the concept of an 'ioctl provider', with the idr per ioctl provider.  You could then make each ib_device an ioctl provider.  (Just embed the structure.)  I believe this will be necessary to support the rdma_cm and ib_cm, as well as devices that export different sets of ioctls, where an ib_device isn't necessarily available.

Essentially, I would treat plugging into the uABI as independent from plugging into the kernel verbs API.  Otherwise, I think we'll end up with multiple ioctl 'frameworks'.
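
As a strawman, it could look something like this (urdma_ioctl_provider is an invented name; nothing below exists today):

#include <linux/idr.h>
#include <linux/spinlock.h>

struct uverbs_types_group;		/* from this series */

struct urdma_ioctl_provider {
	struct idr			idr;		/* user object handles */
	spinlock_t			idr_lock;
	const struct uverbs_types_group	*types;		/* types/actions it exports */
};

/*
 * ib_device, rdma_cm, ib_cm, ... would each embed one of these and register
 * it with the ioctl dispatcher, instead of the dispatcher assuming an
 * ib_device is always there.
 */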

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                             ` <20161028154628.GP3617-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-10-30  8:48                               ` Matan Barak
       [not found]                                 ` <CAAKD3BB0k1UxV2qO3SqAD_t1vM2pcduOXiz8aJ5c+JXAmq_aWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-30  8:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Matan Barak, linux-rdma, Doug Ledford,
	Jason Gunthorpe, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon

On Fri, Oct 28, 2016 at 5:46 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
>> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
>> > Just to summarize, to be sure that I understood you correctly.
>> >
>> > ---------    --------------------
>> > | write | -> | conversion logic | ---
>> > ---------    --------------------   |      ----------------------
>> >                                     -----> | internal interface |
>> > ---------                           |      ----------------------
>> > | ioctl | ---------------------------
>> > ---------
>> >
>> > Am I right?
>>
>> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.
>
> Thanks
>

If we accept the limitation here (i.e. all of a command's attributes come
either from the kernel or from user space, but you can't mix them - which
means the write compatibility layer either needs to copy all attributes
or use a direct mapping for all of them), I could either break
ib_uverbs_cmd_verbs into a few functions or just pass a callback that
does the descriptor copy (the "boxes" above).
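
Roughly like this (names invented just to show the shape; the __user cast
in the first callback is exactly the part that makes this option ugly):

#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct urdma_attr_desc {		/* stand-in for the parsed descriptor */
	u16	id;
	u16	len;
	u64	data;
};

typedef int (*copy_descs_fn)(struct urdma_attr_desc *dst, const void *src,
			     size_t count);

/* supplied by the ioctl path; src is really a user pointer */
static int copy_descs_user(struct urdma_attr_desc *dst, const void *src,
			   size_t count)
{
	return copy_from_user(dst, (const void __user *)src,
			      count * sizeof(*dst)) ? -EFAULT : 0;
}

/* supplied by the write()-compat path, which already built the
 * descriptors in kernel memory */
static int copy_descs_kernel(struct urdma_attr_desc *dst, const void *src,
			     size_t count)
{
	memcpy(dst, src, count * sizeof(*dst));
	return 0;
}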

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A445F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-10-30  9:13           ` Leon Romanovsky
  2016-11-07 23:55           ` Jason Gunthorpe
  1 sibling, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2016-10-30  9:13 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

On Fri, Oct 28, 2016 at 10:53:13PM +0000, Hefty, Sean wrote:
> > The current code creates an IDR per type. Since types are currently
> > common for all vendors and known in advance, this was good enough.
> > However, the proposed ioctl based infrastructure allows each vendor
> > to declare only some of the common types and declare its own specific
> > types.
> >
> > Thus, we decided to implement IDR to be per device and refactor it to
> > use a new file.
>
> I think this needs to be more abstract.  I would consider introducing the concept of an 'ioctl provider', with the idr per ioctl provider.  You could then make each ib_device an ioctl provider.  (Just embed the structure).  I believe this will be necessary to support the rdma_cm, ib_cm, as well as devices that export different sets of ioctls, where an ib_device isn't necessarily available.

IDR management is internal to the kernel, and it looks easy enough to extend in the future.

>
> Essentially, I would treat plugging into the uABI independent from plugging into the kernel verbs API.  Otherwise, I think we'll end up with multiple ioctl 'frameworks'.
>
> - Sean
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found]     ` <1477579398-6875-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-10-30 19:28       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A47BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Hefty, Sean @ 2016-10-30 19:28 UTC (permalink / raw)
  To: 'Matan Barak', linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon, Leon Romanovsky

I found this patch very hard to follow.  This was in part due to the output of the patch command itself, but also because it lacked sufficient documentation on what the new data structures are for and on the terms being used.  As a result, I had to bounce around the patch to figure things out, adding comments as I went along, until I finally just gave up trying to read it.

> The new ioctl infrastructure supports driver specific objects.
> Each such object type has a free function, allocation size and an

You can replace the allocation size with an alloc function, to pair with the free call.  Then the object can be initialized by the user.
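
I.e., something like the below (urdma_type_ops is hypothetical, shown only to illustrate the pairing; the series today carries obj_size plus free_fn):

struct ib_uobject;
struct ib_ucontext;

struct urdma_type_ops {
	int	order;					/* destruction order */
	struct ib_uobject *(*alloc_fn)(const struct urdma_type_ops *ops,
				       struct ib_ucontext *ucontext);
	void	(*free_fn)(const struct urdma_type_ops *ops,
			   struct ib_uobject *uobject);
};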

> order of destruction. This information is embedded in the same
> table describing the various action allowed on the object, similarly
> to object oriented programming.
> 
> When a ucontext is created, a new list is created in this ib_ucontext.
> This list contains all objects created under this ib_ucontext.
> When a ib_ucontext is destroyed, we traverse this list several time
> destroying the various objects by the order mentioned in the object
> type description. If few object types have the same destruction order,
> they are destroyed in an order opposite to their creation order.

Could we simply walk the list backwards, destroying all objects with a reference count of 1 - repeat if necessary?  Basically avoid complex rules for this.

In fact, it would be great if we could just clean up the list in the reverse order that items were created.  Maybe this requires supporting a pre-cleanup handler, so that the driver can pluck items out of the list that may need to be destroyed out of order.
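
A minimal sketch of what I mean, sitting next to ib_uverbs_uobject_type_cleanup_ucontext() and reusing the uobjects list and usecnt rwsem from this patch; a failed write-trylock stands in for "refcount > 1", and the idr removal is elided:

static void cleanup_ucontext_reverse(struct ib_ucontext *ucontext)
{
	struct ib_uobject *obj, *tmp;
	bool progress = true;

	while (!list_empty(&ucontext->uobjects) && progress) {
		progress = false;
		list_for_each_entry_safe_reverse(obj, tmp,
						 &ucontext->uobjects, list) {
			/* still used by someone else?  pick it up on the
			 * next pass, once its users are gone */
			if (!down_write_trylock(&obj->usecnt))
				continue;
			obj->type->free_fn(obj->type, obj);
			list_del(&obj->list);
			kfree_rcu(obj, rcu);
			progress = true;
		}
	}
}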

> Adding an object is done in two parts.
> First, an object is allocated and added to IDR/fd table. Then, the
> command's handlers (in downstream patches) could work on this object
> and fill in its required details.
> After a successful command, ib_uverbs_uobject_enable is called and
> this user objects becomes ucontext visible.

If you have a way to mark that an object is used for exclusive access, you may be able to use that instead of introducing a new variable.  (I.e. acquire the object's write lock).  I think we want to make an effort to minimize the size of the kernel structure needed to track every user space object (within reason).
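
For instance (sketch only, alongside uverbs_lock_object() and reusing the usecnt rwsem this patch already adds; the not-yet-enabled creation window would still need its own handling):

/* an object under destruction keeps the write side held until it drops out
 * of the idr, so a racing reader simply fails the trylock -- no separate
 * 'live' field needed for that race */
static bool uobj_get_read(struct ib_uobject *uobj)
{
	return down_read_trylock(&uobj->usecnt) == 1;
}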

> Removing an uboject is done by calling ib_uverbs_uobject_remove.
> 
> We should make sure IDR (per-device) and list (per-ucontext) could
> be accessed concurrently without corrupting them.
> 
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---

As a general comment, I do have concerns that the resulting generalized parsing of everything will negatively impact performance for operations that do have to transition into the kernel.  Not all devices offload all operations to user space.  Plus the resulting code is extremely difficult to read and non-trivial to use.  It's equivalent to reading C++ code that has 4 layers of inheritance with overrides to basic operators...

Pre and post operators per command that can do straightforward validation seem like a better option.
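
E.g., something along these lines (urdma_cmd_ops and urdma_do_cmd are invented, just to illustrate the shape):

#include <linux/types.h>

struct ib_uverbs_file;

struct urdma_cmd_ops {
	int (*pre)(struct ib_uverbs_file *file, void *cmd, size_t len);
	int (*handler)(struct ib_uverbs_file *file, void *cmd, void *resp);
	int (*post)(struct ib_uverbs_file *file, void *resp, size_t len);
};

static int urdma_do_cmd(struct ib_uverbs_file *file,
			const struct urdma_cmd_ops *ops,
			void *cmd, size_t cmd_len,
			void *resp, size_t resp_len)
{
	int ret;

	if (ops->pre) {
		/* plain, per-command validation on the flat struct */
		ret = ops->pre(file, cmd, cmd_len);
		if (ret)
			return ret;
	}

	ret = ops->handler(file, cmd, resp);
	if (ret)
		return ret;

	return ops->post ? ops->post(file, resp, resp_len) : 0;
}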


>  drivers/infiniband/core/Makefile      |   3 +-
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   | 489
> ++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>  drivers/infiniband/core/uverbs.h      |   1 +
>  drivers/infiniband/core/uverbs_main.c |   2 +-
>  include/rdma/ib_verbs.h               |  28 +-
>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>  8 files changed, 789 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/infiniband/core/rdma_core.c
>  create mode 100644 drivers/infiniband/core/rdma_core.h
>  create mode 100644 include/rdma/uverbs_ioctl.h
> 
> diff --git a/drivers/infiniband/core/Makefile
> b/drivers/infiniband/core/Makefile
> index edaae9f..1819623 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -28,4 +28,5 @@ ib_umad-y :=			user_mad.o
> 
>  ib_ucm-y :=			ucm.o
> 
> -ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o
> uverbs_marshall.o
> +ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o
> uverbs_marshall.o \
> +				rdma_core.o
> diff --git a/drivers/infiniband/core/device.c
> b/drivers/infiniband/core/device.c
> index c3b68f5..43994b1 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>  	spin_lock_init(&device->client_data_lock);
>  	INIT_LIST_HEAD(&device->client_data_list);
>  	INIT_LIST_HEAD(&device->port_list);
> +	INIT_LIST_HEAD(&device->type_list);
> 
>  	return device;
>  }
> diff --git a/drivers/infiniband/core/rdma_core.c
> b/drivers/infiniband/core/rdma_core.c
> new file mode 100644
> index 0000000..337abc2
> --- /dev/null
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -0,0 +1,489 @@
> +/*
> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <linux/file.h>
> +#include <linux/anon_inodes.h>
> +#include <rdma/ib_verbs.h>
> +#include "uverbs.h"
> +#include "rdma_core.h"
> +#include <rdma/uverbs_ioctl.h>
> +
> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> *ibdev,
> +					  uint16_t type)
> +{
> +	const struct uverbs_types_group *groups = ibdev->types_group;
> +	const struct uverbs_types *types;
> +	int ret = groups->dist(&type, groups->priv);
> +
> +	if (ret >= groups->num_groups)
> +		return NULL;
> +
> +	types = groups->type_groups[ret];
> +
> +	if (type >= types->num_types)
> +		return NULL;
> +
> +	return types->types[type];
> +}
> +
> +static int uverbs_lock_object(struct ib_uobject *uobj,
> +			      enum uverbs_idr_access access)
> +{
> +	if (access == UVERBS_IDR_ACCESS_READ)
> +		return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
> +
> +	/* lock is either WRITE or DESTROY - should be exclusive */
> +	return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;

This function could take the lock type directly (read or write), versus inferring it based on some other access type.

> +}
> +
> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
> *context)
> +{
> +	struct ib_uobject *uobj;
> +
> +	rcu_read_lock();
> +	uobj = idr_find(&context->device->idr, id);
> +	if (uobj && uobj->live) {
> +		if (uobj->context != context)
> +			uobj = NULL;
> +	}
> +	rcu_read_unlock();
> +
> +	return uobj;
> +}
> +
> +struct ib_ucontext_lock {
> +	struct kref  ref;
> +	/* locking the uobjects_list */
> +	struct mutex lock;
> +};
> +
> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
> +{
> +	mutex_init(&lock->lock);
> +	kref_init(&lock->ref);
> +}
> +
> +static void release_uobjects_list_lock(struct kref *ref)
> +{
> +	struct ib_ucontext_lock *lock = container_of(ref,
> +						     struct ib_ucontext_lock,
> +						     ref);
> +
> +	kfree(lock);
> +}
> +
> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
> +		      struct ib_ucontext *context)
> +{
> +	init_rwsem(&uobj->usecnt);
> +	uobj->user_handle = user_handle;
> +	uobj->context     = context;
> +	uobj->live        = 0;
> +}
> +
> +static int add_uobj(struct ib_uobject *uobj)
> +{
> +	int ret;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&uobj->context->device->idr_lock);
> +
> +	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
> GFP_NOWAIT);
> +	if (ret >= 0)
> +		uobj->id = ret;
> +
> +	spin_unlock(&uobj->context->device->idr_lock);
> +	idr_preload_end();
> +
> +	return ret < 0 ? ret : 0;
> +}
> +
> +static void remove_uobj(struct ib_uobject *uobj)
> +{
> +	spin_lock(&uobj->context->device->idr_lock);
> +	idr_remove(&uobj->context->device->idr, uobj->id);
> +	spin_unlock(&uobj->context->device->idr_lock);
> +}
> +
> +static void put_uobj(struct ib_uobject *uobj)
> +{
> +	kfree_rcu(uobj, rcu);
> +}
> +
> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext
> *ucontext,
> +						   const struct
> uverbs_type_alloc_action *type,
> +						   u32 idr,
> +						   enum uverbs_idr_access access)
> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	rcu_read_lock();
> +	uobj = get_uobj(idr, ucontext);
> +	if (!uobj)
> +		goto free;
> +
> +	if (uobj->type != type) {
> +		uobj = NULL;
> +		goto free;
> +	}
> +
> +	ret = uverbs_lock_object(uobj, access);
> +	if (ret)
> +		uobj = ERR_PTR(ret);
> +free:
> +	rcu_read_unlock();
> +	return uobj;
> +
> +	return NULL;
> +}
> +
> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
> +				 const struct uverbs_type_alloc_action
> *uobject_type)
> +{
> +	uobject->type = uobject_type;
> +	return add_uobj(uobject);
> +}
> +
> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> uverbs_type_alloc_action *type,
> +					    struct ib_ucontext *ucontext,
> +					    enum uverbs_idr_access access,
> +					    uint32_t idr)
> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		uobj = kmalloc(type->obj_size, GFP_KERNEL);
> +		if (!uobj)
> +			return ERR_PTR(-ENOMEM);
> +
> +		init_uobj(uobj, 0, ucontext);
> +
> +		/* lock idr */

Command to lock idr, but no lock is obtained.

> +		ret = ib_uverbs_uobject_add(uobj, type);
> +		if (ret) {
> +			kfree(uobj);
> +			return ERR_PTR(ret);
> +		}
> +
> +	} else {
> +		uobj = get_uobject_from_context(ucontext, type, idr,
> +						access);
> +
> +		if (!uobj)
> +			return ERR_PTR(-ENOENT);
> +	}
> +
> +	return uobj;
> +}
> +
> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> uverbs_type_alloc_action *type,
> +					   struct ib_ucontext *ucontext,
> +					   enum uverbs_idr_access access,
> +					   int fd)
> +{
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		int _fd;
> +		struct ib_uobject *uobj = NULL;
> +		struct file *filp;
> +
> +		_fd = get_unused_fd_flags(O_CLOEXEC);
> +		if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
> ib_uobject)))
> +			return ERR_PTR(_fd);
> +
> +		uobj = kmalloc(type->obj_size, GFP_KERNEL);
> +		init_uobj(uobj, 0, ucontext);
> +
> +		if (!uobj)
> +			return ERR_PTR(-ENOMEM);
> +
> +		filp = anon_inode_getfile(type->fd.name, type->fd.fops,
> +					  uobj + 1, type->fd.flags);
> +		if (IS_ERR(filp)) {
> +			put_unused_fd(_fd);
> +			kfree(uobj);
> +			return (void *)filp;
> +		}
> +
> +		uobj->type = type;
> +		uobj->id = _fd;
> +		uobj->object = filp;
> +
> +		return uobj;
> +	} else if (access == UVERBS_IDR_ACCESS_READ) {
> +		struct file *f = fget(fd);
> +		struct ib_uobject *uobject;
> +
> +		if (!f)
> +			return ERR_PTR(-EBADF);
> +
> +		uobject = f->private_data - sizeof(struct ib_uobject);
> +		if (f->f_op != type->fd.fops ||
> +		    !uobject->live) {
> +			fput(f);
> +			return ERR_PTR(-EBADF);
> +		}
> +
> +		/*
> +		 * No need to protect it with a ref count, as fget
> increases
> +		 * f_count.
> +		 */
> +		return uobject;
> +	} else {
> +		return ERR_PTR(-EOPNOTSUPP);
> +	}
> +}
> +
> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
> +{
> +	mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_add(&uobject->list, &uobject->context->uobjects);
> +	mutex_unlock(&uobject->context->uobjects_lock->lock);

Why not just insert the object into the list on creation?

> +	uobject->live = 1;

See my comments above on removing the live field.

> +}
> +
> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool
> lock)
> +{
> +	/*
> +	 * Calling remove requires exclusive access, so it's not possible
> +	 * another thread will use our object.
> +	 */
> +	uobject->live = 0;
> +	uobject->type->free_fn(uobject->type, uobject);
> +	if (lock)
> +		mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_del(&uobject->list);
> +	if (lock)
> +		mutex_unlock(&uobject->context->uobjects_lock->lock);
> +	remove_uobj(uobject);
> +	put_uobj(uobject);
> +}
> +
> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
> +			      enum uverbs_idr_access access,
> +			      bool success)
> +{
> +	switch (access) {
> +	case UVERBS_IDR_ACCESS_READ:
> +		up_read(&uobj->usecnt);
> +		break;
> +	case UVERBS_IDR_ACCESS_NEW:
> +		if (success) {
> +			ib_uverbs_uobject_enable(uobj);
> +		} else {
> +			remove_uobj(uobj);
> +			put_uobj(uobj);
> +		}
> +		break;
> +	case UVERBS_IDR_ACCESS_WRITE:
> +		up_write(&uobj->usecnt);
> +		break;
> +	case UVERBS_IDR_ACCESS_DESTROY:
> +		if (success)
> +			ib_uverbs_uobject_remove(uobj, true);
> +		else
> +			up_write(&uobj->usecnt);
> +		break;
> +	}
> +}
> +
> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
> +			     enum uverbs_idr_access access,
> +			     bool success)
> +{
> +	struct file *filp = uobj->object;
> +
> +	if (access == UVERBS_IDR_ACCESS_NEW) {
> +		if (success) {
> +			kref_get(&uobj->context->ufile->ref);
> +			uobj->uobjects_lock = uobj->context->uobjects_lock;
> +			kref_get(&uobj->uobjects_lock->ref);
> +			ib_uverbs_uobject_enable(uobj);
> +			fd_install(uobj->id, uobj->object);

I don't get this.  The function is unlocking something, but there are calls to get krefs?

> +		} else {
> +			fput(uobj->object);
> +			put_unused_fd(uobj->id);
> +			kfree(uobj);
> +		}
> +	} else {
> +		fput(filp);
> +	}
> +}
> +
> +void uverbs_unlock_object(struct ib_uobject *uobj,
> +			  enum uverbs_idr_access access,
> +			  bool success)
> +{
> +	if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
> +		uverbs_unlock_idr(uobj, access, success);
> +	else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
> +		uverbs_unlock_fd(uobj, access, success);
> +	else
> +		WARN_ON(true);
> +}
> +
> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
> +{
> +	/*
> +	 * user should release the uobject in the release
> +	 * callback.
> +	 */
> +	if (uobject->live) {
> +		uobject->live = 0;
> +		list_del(&uobject->list);
> +		uobject->type->free_fn(uobject->type, uobject);
> +		kref_put(&uobject->context->ufile->ref,
> ib_uverbs_release_file);
> +		uobject->context = NULL;
> +	}
> +}
> +
> +void ib_uverbs_close_fd(struct file *f)
> +{
> +	struct ib_uobject *uobject = f->private_data - sizeof(struct
> ib_uobject);
> +
> +	mutex_lock(&uobject->uobjects_lock->lock);
> +	if (uobject->live) {
> +		uobject->live = 0;
> +		list_del(&uobject->list);
> +		kref_put(&uobject->context->ufile->ref,
> ib_uverbs_release_file);
> +		uobject->context = NULL;
> +	}
> +	mutex_unlock(&uobject->uobjects_lock->lock);
> +	kref_put(&uobject->uobjects_lock->ref,
> release_uobjects_list_lock);
> +}
> +
> +void ib_uverbs_cleanup_fd(void *private_data)
> +{
> +	struct ib_uboject *uobject = private_data - sizeof(struct
> ib_uobject);
> +
> +	kfree(uobject);
> +}
> +
> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> +			   size_t num,
> +			   const struct uverbs_action_spec *spec,
> +			   bool success)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < num; i++) {
> +		struct uverbs_attr_array *attr_spec_array = &attr_array[i];
> +		const struct uverbs_attr_group_spec *group_spec =
> +			spec->attr_groups[i];
> +		unsigned int j;
> +
> +		for (j = 0; j < attr_spec_array->num_attrs; j++) {
> +			struct uverbs_attr *attr = &attr_spec_array-
> >attrs[j];
> +			struct uverbs_attr_spec *spec = &group_spec-
> >attrs[j];
> +
> +			if (!attr->valid)
> +				continue;
> +
> +			if (spec->type == UVERBS_ATTR_TYPE_IDR ||
> +			    spec->type == UVERBS_ATTR_TYPE_FD)
> +				/*
> +				 * refcounts should be handled at the object
> +				 * level and not at the uobject level.
> +				 */
> +				uverbs_unlock_object(attr->obj_attr.uobject,
> +						     spec->obj.access, success);
> +		}
> +	}
> +}
> +
> +static unsigned int get_type_orders(const struct uverbs_types_group
> *types_group)
> +{
> +	unsigned int i;
> +	unsigned int max = 0;
> +
> +	for (i = 0; i < types_group->num_groups; i++) {
> +		unsigned int j;
> +		const struct uverbs_types *types = types_group-
> >type_groups[i];
> +
> +		for (j = 0; j < types->num_types; j++) {
> +			if (!types->types[j] || !types->types[j]->alloc)
> +				continue;
> +			if (types->types[j]->alloc->order > max)
> +				max = types->types[j]->alloc->order;
> +		}
> +	}
> +
> +	return max;
> +}
> +
> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> *ucontext,
> +					     const struct uverbs_types_group
> *types_group)
> +{
> +	unsigned int num_orders = get_type_orders(types_group);
> +	unsigned int i;
> +
> +	for (i = 0; i <= num_orders; i++) {
> +		struct ib_uobject *obj, *next_obj;
> +
> +		/*
> +		 * No need to take lock here, as cleanup should be called
> +		 * after all commands finished executing. Newly executed
> +		 * commands should fail.
> +		 */
> +		mutex_lock(&ucontext->uobjects_lock->lock);

It's really confusing to see a comment about 'no need to take lock' immediately followed by a call to lock.

> +		list_for_each_entry_safe(obj, next_obj, &ucontext-
> >uobjects,
> +					 list)
> +			if (obj->type->order == i) {
> +				if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
> +					ib_uverbs_uobject_remove(obj, false);
> +				else
> +					ib_uverbs_remove_fd(obj);
> +			}
> +		mutex_unlock(&ucontext->uobjects_lock->lock);
> +	}
> +	kref_put(&ucontext->uobjects_lock->ref,
> release_uobjects_list_lock);
> +}
> +
> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> *ucontext)

Please work on the function names.  This is horrendously long and still doesn't help describe what it does.

> +{
> +	ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
> >uobjects_lock),
> +					  GFP_KERNEL);
> +	if (!ucontext->uobjects_lock)
> +		return -ENOMEM;
> +
> +	init_uobjects_list_lock(ucontext->uobjects_lock);
> +	INIT_LIST_HEAD(&ucontext->uobjects);
> +
> +	return 0;
> +}
> +
> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> *ucontext)
> +{
> +	kfree(ucontext->uobjects_lock);
> +}

No need to wrap a call to 'free'.

> +
> diff --git a/drivers/infiniband/core/rdma_core.h
> b/drivers/infiniband/core/rdma_core.h
> new file mode 100644
> index 0000000..8990115
> --- /dev/null
> +++ b/drivers/infiniband/core/rdma_core.h
> @@ -0,0 +1,75 @@
> +/*
> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef UOBJECT_H
> +#define UOBJECT_H
> +
> +#include <linux/idr.h>
> +#include <rdma/uverbs_ioctl.h>
> +#include <rdma/ib_verbs.h>
> +#include <linux/mutex.h>
> +
> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> *ibdev,
> +					  uint16_t type);
> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> uverbs_type_alloc_action *type,
> +					    struct ib_ucontext *ucontext,
> +					    enum uverbs_idr_access access,
> +					    uint32_t idr);
> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> uverbs_type_alloc_action *type,
> +					   struct ib_ucontext *ucontext,
> +					   enum uverbs_idr_access access,
> +					   int fd);
> +void uverbs_unlock_object(struct ib_uobject *uobj,
> +			  enum uverbs_idr_access access,
> +			  bool success);
> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> +			   size_t num,
> +			   const struct uverbs_action_spec *spec,
> +			   bool success);
> +
> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> *ucontext,
> +					     const struct uverbs_types_group
> *types_group);
> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> *ucontext);
> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> *ucontext);
> +void ib_uverbs_close_fd(struct file *f);
> +void ib_uverbs_cleanup_fd(void *private_data);
> +
> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
> +{
> +	return uobj + 1;
> +}

This seems like a rather useless function.

> +
> +#endif /* UIDR_H */
> diff --git a/drivers/infiniband/core/uverbs.h
> b/drivers/infiniband/core/uverbs.h
> index 8074705..ae7d4b8 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
> *uverbs_file,
>  					struct ib_device *ib_dev,
>  					int is_async);
> +void ib_uverbs_release_file(struct kref *ref);
>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
> *uverbs_file);
>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
> 
> diff --git a/drivers/infiniband/core/uverbs_main.c
> b/drivers/infiniband/core/uverbs_main.c
> index f783723..e63357a 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
> ib_uverbs_device *dev)
>  	complete(&dev->comp);
>  }
> 
> -static void ib_uverbs_release_file(struct kref *ref)
> +void ib_uverbs_release_file(struct kref *ref)
>  {
>  	struct ib_uverbs_file *file =
>  		container_of(ref, struct ib_uverbs_file, ref);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index b5d2075..7240615 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
> 
>  struct ib_umem;
> 
> +struct ib_ucontext_lock;
> +
>  struct ib_ucontext {
>  	struct ib_device       *device;
> +	struct ib_uverbs_file  *ufile;
>  	struct list_head	pd_list;
>  	struct list_head	mr_list;
>  	struct list_head	mw_list;
> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>  	struct list_head	rwq_ind_tbl_list;
>  	int			closing;
> 
> +	/* lock for uobjects list */
> +	struct ib_ucontext_lock	*uobjects_lock;
> +	struct list_head	uobjects;
> +
>  	struct pid             *tgid;
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>  	struct rb_root      umem_tree;
> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>  #endif
>  };
> 
> +struct uverbs_object_list;
> +
> +#define OLD_ABI_COMPAT
> +
>  struct ib_uobject {
>  	u64			user_handle;	/* handle given to us by userspace
> */
>  	struct ib_ucontext     *context;	/* associated user context
> */
>  	void		       *object;		/* containing object */
>  	struct list_head	list;		/* link to context's list */
> -	int			id;		/* index into kernel idr */
> -	struct kref		ref;
> -	struct rw_semaphore	mutex;		/* protects .live */
> +	int			id;		/* index into kernel idr/fd */
> +#ifdef OLD_ABI_COMPAT
> +	struct kref             ref;
> +#endif
> +	struct rw_semaphore	usecnt;		/* protects exclusive
> access */
> +#ifdef OLD_ABI_COMPAT
> +	struct rw_semaphore     mutex;          /* protects .live */
> +#endif
>  	struct rcu_head		rcu;		/* kfree_rcu() overhead */
>  	int			live;
> +
> +	const struct uverbs_type_alloc_action *type;
> +	struct ib_ucontext_lock	*uobjects_lock;
>  };
> 
>  struct ib_udata {
> @@ -2101,6 +2120,9 @@ struct ib_device {
>  	 */
>  	int (*get_port_immutable)(struct ib_device *, u8, struct
> ib_port_immutable *);
>  	void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
> str_len);
> +	struct list_head type_list;
> +
> +	const struct uverbs_types_group	*types_group;
>  };
> 
>  struct ib_client {
> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
> new file mode 100644
> index 0000000..2f50045
> --- /dev/null
> +++ b/include/rdma/uverbs_ioctl.h
> @@ -0,0 +1,195 @@
> +/*
> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef _UVERBS_IOCTL_
> +#define _UVERBS_IOCTL_
> +
> +#include <linux/kernel.h>
> +
> +struct uverbs_object_type;
> +struct ib_ucontext;
> +struct ib_uobject;
> +struct ib_device;
> +struct uverbs_uobject_type;
> +
> +/*
> + * =======================================
> + *	Verbs action specifications
> + * =======================================
> + */

I intentionally used urdma (though condensed to 3 letters that I don't recall atm), rather than uverbs.  This will need to work with non-verbs devices and interfaces -- again, consider how this fits with the rdma cm.  Verbs has a very specific meaning, which gets lost if we start referring to everything as 'verbs'.  It's bad enough that we're stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also means ethernet and rdma means nothing. 

> +
> +enum uverbs_attr_type {
> +	UVERBS_ATTR_TYPE_PTR_IN,
> +	UVERBS_ATTR_TYPE_PTR_OUT,
> +	UVERBS_ATTR_TYPE_IDR,
> +	UVERBS_ATTR_TYPE_FD,
> +};
> +
> +enum uverbs_idr_access {
> +	UVERBS_IDR_ACCESS_READ,
> +	UVERBS_IDR_ACCESS_WRITE,
> +	UVERBS_IDR_ACCESS_NEW,
> +	UVERBS_IDR_ACCESS_DESTROY
> +};
> +
> +struct uverbs_attr_spec {
> +	u16				len;
> +	enum uverbs_attr_type		type;
> +	struct {
> +		u16			obj_type;
> +		u8			access;

Is access intended to be an enum uverbs_idr_access value?

> +	} obj;

I would remove (flatten) the substructure and re-order the fields for better alignment.

> +};
> +
> +struct uverbs_attr_group_spec {
> +	struct uverbs_attr_spec		*attrs;
> +	size_t				num_attrs;
> +};
> +
> +struct uverbs_action_spec {
> +	const struct uverbs_attr_group_spec		**attr_groups;
> +	/* if > 0 -> validator, otherwise, error */

? not sure what this comment means

> +	int (*dist)(__u16 *attr_id, void *priv);

What does 'dist' stand for?

> +	void						*priv;
> +	size_t						num_groups;
> +};
> +
> +struct uverbs_attr_array;
> +struct ib_uverbs_file;
> +
> +struct uverbs_action {
> +	struct uverbs_action_spec spec;
> +	void *priv;
> +	int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
> *ufile,
> +		       struct uverbs_attr_array *ctx, size_t num, void
> *priv);
> +};
> +
> +struct uverbs_type_alloc_action;
> +typedef void (*free_type)(const struct uverbs_type_alloc_action
> *uobject_type,
> +			  struct ib_uobject *uobject);
> +
> +struct uverbs_type_alloc_action {
> +	enum uverbs_attr_type		type;
> +	int				order;

I think this is being used as destroy order, in which case I would rename it to clarify the intent.  Though I'd prefer we come up with a more efficient destruction mechanism than the repeated nested looping.

> +	size_t				obj_size;

This can be alloc_fn

> +	free_type			free_fn;
> +	struct {
> +		const struct file_operations	*fops;
> +		const char			*name;
> +		int				flags;
> +	} fd;
> +};
> +
> +struct uverbs_type_actions_group {
> +	size_t					num_actions;
> +	const struct uverbs_action		**actions;
> +};
> +
> +struct uverbs_type {
> +	size_t					num_groups;
> +	const struct uverbs_type_actions_group	**action_groups;
> +	const struct uverbs_type_alloc_action	*alloc;
> +	int (*dist)(__u16 *action_id, void *priv);
> +	void					*priv;
> +};
> +
> +struct uverbs_types {
> +	size_t					num_types;
> +	const struct uverbs_type		**types;
> +};
> +
> +struct uverbs_types_group {
> +	const struct uverbs_types		**type_groups;
> +	size_t					num_groups;
> +	int (*dist)(__u16 *type_id, void *priv);
> +	void					*priv;
> +};
> +
> +/* =================================================
> + *              Parsing infrastructure
> + * =================================================
> + */
> +
> +struct uverbs_ptr_attr {
> +	void	* __user ptr;
> +	__u16		len;
> +};
> +
> +struct uverbs_fd_attr {
> +	int		fd;
> +};
> +
> +struct uverbs_uobj_attr {
> +	/*  idr handle */
> +	__u32	idr;
> +};
> +
> +struct uverbs_obj_attr {
> +	/* pointer to the kernel descriptor -> type, access, etc */
> +	const struct uverbs_attr_spec *val;
> +	struct ib_uverbs_attr __user	*uattr;
> +	const struct uverbs_type_alloc_action	*type;
> +	struct ib_uobject		*uobject;
> +	union {
> +		struct uverbs_fd_attr		fd;
> +		struct uverbs_uobj_attr		uobj;
> +	};
> +};
> +
> +struct uverbs_attr {
> +	bool valid;
> +	union {
> +		struct uverbs_ptr_attr	cmd_attr;
> +		struct uverbs_obj_attr	obj_attr;
> +	};
> +};

It's odd to have a union that's part of a structure without some field to indicate which union field is accessible.

> +
> +/* output of one validator */
> +struct uverbs_attr_array {
> +	size_t num_attrs;
> +	/* arrays of attrubytes, index is the id i.e SEND_CQ */
> +	struct uverbs_attr *attrs;
> +};
> +
> +/* =================================================
> + *              Types infrastructure
> + * =================================================
> + */
> +
> +int ib_uverbs_uobject_type_add(struct list_head	*head,
> +			       void (*free)(struct uverbs_uobject_type *type,
> +					    struct ib_uobject *uobject,
> +					    struct ib_ucontext *ucontext),
> +			       uint16_t	obj_type);
> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
> +
> +#endif
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A47BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-10-31 22:58           ` Matan Barak
       [not found]             ` <CAAKD3BDWyb10baLrDu=m_mYPB64i9OOPEPVYKtDo9zVbvMM-UA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matan Barak @ 2016-10-31 22:58 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon, Leon Romanovsky

On Sun, Oct 30, 2016 at 9:28 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I found this patch very hard to follow.  This was in part due to the output of the patch command itself, but also because there lacked sufficient documentation on what the new data structures were for and the terms being used.  As a result, I had to bounce around the patch to figure things out, adding comments as I went along, until I finally just gave up trying to read it.

Actually, there are some helpful slides in the OFVWG presentations. I
guess it would be best to clarify the model in the commit message.

>
>> The new ioctl infrastructure supports driver specific objects.
>> Each such object type has a free function, allocation size and an
>
> You can replace the allocation size with an alloc function, to pair with the free call.  Then the object can be initialized by the user.
>

I had thought about that, but the user can initialize its part of the
object in the handler itself. The handler can't allocate the object, as
we need it in order to allocate the IDR entry and so on. The assumption
here is that the "unlock" stage can't fail.

>> order of destruction. This information is embedded in the same
>> table describing the various action allowed on the object, similarly
>> to object oriented programming.
>>
>> When a ucontext is created, a new list is created in this ib_ucontext.
>> This list contains all objects created under this ib_ucontext.
>> When a ib_ucontext is destroyed, we traverse this list several time
>> destroying the various objects by the order mentioned in the object
>> type description. If few object types have the same destruction order,
>> they are destroyed in an order opposite to their creation order.
>
> Could we simply walk the list backwards, destroying all objects with a reference count of 1 - repeat if necessary?  Basically avoid complex rules for this.
>

That's problematic in the MW case: an MW could be disassociated from
its MR by a remote peer, and the kernel can't track that.

> In fact, it would be great if we could just cleanup the list in the reverse order that items were created.  Maybe this requires supporting a pre-cleanup handler, so that the driver can pluck items out of the list that may need to be destroyed out of order.
>

So that's essentially one layer of ordering. Why do you consider a
driver iterating over all objects simpler than this model?

>> Adding an object is done in two parts.
>> First, an object is allocated and added to IDR/fd table. Then, the
>> command's handlers (in downstream patches) could work on this object
>> and fill in its required details.
>> After a successful command, ib_uverbs_uobject_enable is called and
>> this user objects becomes ucontext visible.
>
> If you have a way to mark that an object is used for exclusive access, you may be able to use that instead of introducing a new variable.  (I.e. acquire the object's write lock).  I think we want to make an effort to minimize the size of the kernel structure needed to track every user space object (within reason).
>

I didn't really follow. A command attribute states the nature of the
locking (for example, in MODIFY_QP the QP could be exclusively locked,
but in QUERY_QP it's only locked for reading). I don't want to block on
a lock, as that could lead to a deadlock (user-space could pass handles
in a colliding order). It could be solved by sorting the handles, but
that would degrade performance without a good reason.
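
To make the colliding-order point concrete (toy illustration only, not
code from the series; 'a' and 'b' are the two uobjects a single command
references):

	/*
	 * Thread 1 handles a command referencing (QP1, QP2), thread 2
	 * a command referencing (QP2, QP1).  With a blocking
	 * down_write() each thread could take its first QP and then
	 * sleep forever on the second.  With trylock, as in
	 * uverbs_lock_object(), the second acquisition just fails:
	 */
	if (!down_write_trylock(&a->usecnt))
		return -EBUSY;
	if (!down_write_trylock(&b->usecnt)) {
		up_write(&a->usecnt);	/* roll back, don't sleep holding 'a' */
		return -EBUSY;		/* fail fast instead of deadlocking  */
	}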

>> Removing an uboject is done by calling ib_uverbs_uobject_remove.
>>
>> We should make sure IDR (per-device) and list (per-ucontext) could
>> be accessed concurrently without corrupting them.
>>
>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>
> As a general comment, I do have concerns that the resulting generalized parsing of everything will negatively impact performance for operations that do have to transition into the kernel.  Not all devices offload all operations to user space.  Plus the resulting code is extremely difficult to read and non-trivial to use.  It's equivalent to reading C++ code that has 4 layers of inheritance with overrides to basic operators...

There are two parts here. I think the handlers themselves are simpler,
easier to read and less error-prone, and they contain less code
duplication. The macro-based definition language explicitly declares
all attributes, their types, sizes, etc.
The model here is a bit more complex because we want to achieve both
code reuse and the ability to add/override types/actions/attributes.


>
> Pre and post operators per command that can do straightforward validation seem like a better option.
>
>

I think that would duplicate a lot of code and be more error-prone than
one infrastructure that automates all that work for you.

>>  drivers/infiniband/core/Makefile      |   3 +-
>>  drivers/infiniband/core/device.c      |   1 +
>>  drivers/infiniband/core/rdma_core.c   | 489
>> ++++++++++++++++++++++++++++++++++
>>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>>  drivers/infiniband/core/uverbs.h      |   1 +
>>  drivers/infiniband/core/uverbs_main.c |   2 +-
>>  include/rdma/ib_verbs.h               |  28 +-
>>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>>  8 files changed, 789 insertions(+), 5 deletions(-)
>>  create mode 100644 drivers/infiniband/core/rdma_core.c
>>  create mode 100644 drivers/infiniband/core/rdma_core.h
>>  create mode 100644 include/rdma/uverbs_ioctl.h
>>
>> diff --git a/drivers/infiniband/core/Makefile
>> b/drivers/infiniband/core/Makefile
>> index edaae9f..1819623 100644
>> --- a/drivers/infiniband/core/Makefile
>> +++ b/drivers/infiniband/core/Makefile
>> @@ -28,4 +28,5 @@ ib_umad-y :=                        user_mad.o
>>
>>  ib_ucm-y :=                  ucm.o
>>
>> -ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o
>> +ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>> uverbs_marshall.o \
>> +                             rdma_core.o
>> diff --git a/drivers/infiniband/core/device.c
>> b/drivers/infiniband/core/device.c
>> index c3b68f5..43994b1 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>>       spin_lock_init(&device->client_data_lock);
>>       INIT_LIST_HEAD(&device->client_data_list);
>>       INIT_LIST_HEAD(&device->port_list);
>> +     INIT_LIST_HEAD(&device->type_list);
>>
>>       return device;
>>  }
>> diff --git a/drivers/infiniband/core/rdma_core.c
>> b/drivers/infiniband/core/rdma_core.c
>> new file mode 100644
>> index 0000000..337abc2
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.c
>> @@ -0,0 +1,489 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#include <linux/file.h>
>> +#include <linux/anon_inodes.h>
>> +#include <rdma/ib_verbs.h>
>> +#include "uverbs.h"
>> +#include "rdma_core.h"
>> +#include <rdma/uverbs_ioctl.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type)
>> +{
>> +     const struct uverbs_types_group *groups = ibdev->types_group;
>> +     const struct uverbs_types *types;
>> +     int ret = groups->dist(&type, groups->priv);
>> +
>> +     if (ret >= groups->num_groups)
>> +             return NULL;
>> +
>> +     types = groups->type_groups[ret];
>> +
>> +     if (type >= types->num_types)
>> +             return NULL;
>> +
>> +     return types->types[type];
>> +}
>> +
>> +static int uverbs_lock_object(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_READ)
>> +             return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>> +
>> +     /* lock is either WRITE or DESTROY - should be exclusive */
>> +     return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>
> This function could take the lock type directly (read or write), versus inferring it based on some other access type.
>

We can, but since we use these enums in the attribute specifications,
I thought it would be more convenient.

>> +}
>> +
>> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
>> *context)
>> +{
>> +     struct ib_uobject *uobj;
>> +
>> +     rcu_read_lock();
>> +     uobj = idr_find(&context->device->idr, id);
>> +     if (uobj && uobj->live) {
>> +             if (uobj->context != context)
>> +                     uobj = NULL;
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_ucontext_lock {
>> +     struct kref  ref;
>> +     /* locking the uobjects_list */
>> +     struct mutex lock;
>> +};
>> +
>> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
>> +{
>> +     mutex_init(&lock->lock);
>> +     kref_init(&lock->ref);
>> +}
>> +
>> +static void release_uobjects_list_lock(struct kref *ref)
>> +{
>> +     struct ib_ucontext_lock *lock = container_of(ref,
>> +                                                  struct ib_ucontext_lock,
>> +                                                  ref);
>> +
>> +     kfree(lock);
>> +}
>> +
>> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
>> +                   struct ib_ucontext *context)
>> +{
>> +     init_rwsem(&uobj->usecnt);
>> +     uobj->user_handle = user_handle;
>> +     uobj->context     = context;
>> +     uobj->live        = 0;
>> +}
>> +
>> +static int add_uobj(struct ib_uobject *uobj)
>> +{
>> +     int ret;
>> +
>> +     idr_preload(GFP_KERNEL);
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +
>> +     ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
>> GFP_NOWAIT);
>> +     if (ret >= 0)
>> +             uobj->id = ret;
>> +
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +     idr_preload_end();
>> +
>> +     return ret < 0 ? ret : 0;
>> +}
>> +
>> +static void remove_uobj(struct ib_uobject *uobj)
>> +{
>> +     spin_lock(&uobj->context->device->idr_lock);
>> +     idr_remove(&uobj->context->device->idr, uobj->id);
>> +     spin_unlock(&uobj->context->device->idr_lock);
>> +}
>> +
>> +static void put_uobj(struct ib_uobject *uobj)
>> +{
>> +     kfree_rcu(uobj, rcu);
>> +}
>> +
>> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext
>> *ucontext,
>> +                                                const struct
>> uverbs_type_alloc_action *type,
>> +                                                u32 idr,
>> +                                                enum uverbs_idr_access access)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     rcu_read_lock();
>> +     uobj = get_uobj(idr, ucontext);
>> +     if (!uobj)
>> +             goto free;
>> +
>> +     if (uobj->type != type) {
>> +             uobj = NULL;
>> +             goto free;
>> +     }
>> +
>> +     ret = uverbs_lock_object(uobj, access);
>> +     if (ret)
>> +             uobj = ERR_PTR(ret);
>> +free:
>> +     rcu_read_unlock();
>> +     return uobj;
>> +
>> +     return NULL;
>> +}
>> +
>> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
>> +                              const struct uverbs_type_alloc_action
>> *uobject_type)
>> +{
>> +     uobject->type = uobject_type;
>> +     return add_uobj(uobject);
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             /* lock idr */
>
> Command to lock idr, but no lock is obtained.
>

ib_uverbs_uobject_add calls add_uobj which locks the IDR.

>> +             ret = ib_uverbs_uobject_add(uobj, type);
>> +             if (ret) {
>> +                     kfree(uobj);
>> +                     return ERR_PTR(ret);
>> +             }
>> +
>> +     } else {
>> +             uobj = get_uobject_from_context(ucontext, type, idr,
>> +                                             access);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOENT);
>> +     }
>> +
>> +     return uobj;
>> +}
>> +
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd)
>> +{
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             int _fd;
>> +             struct ib_uobject *uobj = NULL;
>> +             struct file *filp;
>> +
>> +             _fd = get_unused_fd_flags(O_CLOEXEC);
>> +             if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
>> ib_uobject)))
>> +                     return ERR_PTR(_fd);
>> +
>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>> +             init_uobj(uobj, 0, ucontext);
>> +
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>> +
>> +             filp = anon_inode_getfile(type->fd.name, type->fd.fops,
>> +                                       uobj + 1, type->fd.flags);
>> +             if (IS_ERR(filp)) {
>> +                     put_unused_fd(_fd);
>> +                     kfree(uobj);
>> +                     return (void *)filp;
>> +             }
>> +
>> +             uobj->type = type;
>> +             uobj->id = _fd;
>> +             uobj->object = filp;
>> +
>> +             return uobj;
>> +     } else if (access == UVERBS_IDR_ACCESS_READ) {
>> +             struct file *f = fget(fd);
>> +             struct ib_uobject *uobject;
>> +
>> +             if (!f)
>> +                     return ERR_PTR(-EBADF);
>> +
>> +             uobject = f->private_data - sizeof(struct ib_uobject);
>> +             if (f->f_op != type->fd.fops ||
>> +                 !uobject->live) {
>> +                     fput(f);
>> +                     return ERR_PTR(-EBADF);
>> +             }
>> +
>> +             /*
>> +              * No need to protect it with a ref count, as fget
>> increases
>> +              * f_count.
>> +              */
>> +             return uobject;
>> +     } else {
>> +             return ERR_PTR(-EOPNOTSUPP);
>> +     }
>> +}
>> +
>> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
>> +{
>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_add(&uobject->list, &uobject->context->uobjects);
>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>
> Why not just insert the object into the list on creation?
>
>> +     uobject->live = 1;
>
> See my comments above on removing the live field.
>

Seems that the list could suffice, but I'll look into that.

>> +}
>> +
>> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject, bool
>> lock)
>> +{
>> +     /*
>> +      * Calling remove requires exclusive access, so it's not possible
>> +      * another thread will use our object.
>> +      */
>> +     uobject->live = 0;
>> +     uobject->type->free_fn(uobject->type, uobject);
>> +     if (lock)
>> +             mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_del(&uobject->list);
>> +     if (lock)
>> +             mutex_unlock(&uobject->context->uobjects_lock->lock);
>> +     remove_uobj(uobject);
>> +     put_uobj(uobject);
>> +}
>> +
>> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
>> +                           enum uverbs_idr_access access,
>> +                           bool success)
>> +{
>> +     switch (access) {
>> +     case UVERBS_IDR_ACCESS_READ:
>> +             up_read(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_NEW:
>> +             if (success) {
>> +                     ib_uverbs_uobject_enable(uobj);
>> +             } else {
>> +                     remove_uobj(uobj);
>> +                     put_uobj(uobj);
>> +             }
>> +             break;
>> +     case UVERBS_IDR_ACCESS_WRITE:
>> +             up_write(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_IDR_ACCESS_DESTROY:
>> +             if (success)
>> +                     ib_uverbs_uobject_remove(uobj, true);
>> +             else
>> +                     up_write(&uobj->usecnt);
>> +             break;
>> +     }
>> +}
>> +
>> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
>> +                          enum uverbs_idr_access access,
>> +                          bool success)
>> +{
>> +     struct file *filp = uobj->object;
>> +
>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>> +             if (success) {
>> +                     kref_get(&uobj->context->ufile->ref);
>> +                     uobj->uobjects_lock = uobj->context->uobjects_lock;
>> +                     kref_get(&uobj->uobjects_lock->ref);
>> +                     ib_uverbs_uobject_enable(uobj);
>> +                     fd_install(uobj->id, uobj->object);
>
> I don't get this.  The function is unlocking something, but there are calls to get krefs?
>

Before invoking the user's callback, we first lock all objects, and
afterwards we unlock them. When we need to create a new object, the
"lock" step becomes object creation and the "unlock" step becomes
(assuming the user's callback succeeded) enabling the new object.
When we add a new object (or fd, in this case), we take a reference
on both the uverbs_file and the locking context.
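
In other words, the dispatch sequence looks roughly like this (a
condensed sketch, not the actual code; the function name is made up):

static int dispatch(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
		    const struct uverbs_action *action,
		    struct uverbs_attr_array *attrs, size_t num)
{
	int ret;

	/*
	 * "lock" phase happened while parsing: existing objects were
	 * (try)locked, new IDR/fd objects were allocated but are not
	 * yet visible to other commands.
	 */
	ret = action->handler(ib_dev, ufile, attrs, num, action->priv);

	/*
	 * "unlock" phase: on success a new object is enabled (added to
	 * the ucontext list, fd_install()ed, krefs taken on the ufile
	 * and on the locking context); on failure it is torn down.
	 */
	uverbs_unlock_objects(attrs, num, &action->spec, !ret);

	return ret;
}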

>> +             } else {
>> +                     fput(uobj->object);
>> +                     put_unused_fd(uobj->id);
>> +                     kfree(uobj);
>> +             }
>> +     } else {
>> +             fput(filp);
>> +     }
>> +}
>> +
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success)
>> +{
>> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +             uverbs_unlock_idr(uobj, access, success);
>> +     else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
>> +             uverbs_unlock_fd(uobj, access, success);
>> +     else
>> +             WARN_ON(true);
>> +}
>> +
>> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
>> +{
>> +     /*
>> +      * user should release the uobject in the release
>> +      * callback.
>> +      */
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             uobject->type->free_fn(uobject->type, uobject);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +}
>> +
>> +void ib_uverbs_close_fd(struct file *f)
>> +{
>> +     struct ib_uobject *uobject = f->private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     mutex_lock(&uobject->uobjects_lock->lock);
>> +     if (uobject->live) {
>> +             uobject->live = 0;
>> +             list_del(&uobject->list);
>> +             kref_put(&uobject->context->ufile->ref,
>> ib_uverbs_release_file);
>> +             uobject->context = NULL;
>> +     }
>> +     mutex_unlock(&uobject->uobjects_lock->lock);
>> +     kref_put(&uobject->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +void ib_uverbs_cleanup_fd(void *private_data)
>> +{
>> +     struct ib_uboject *uobject = private_data - sizeof(struct
>> ib_uobject);
>> +
>> +     kfree(uobject);
>> +}
>> +
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success)
>> +{
>> +     unsigned int i;
>> +
>> +     for (i = 0; i < num; i++) {
>> +             struct uverbs_attr_array *attr_spec_array = &attr_array[i];
>> +             const struct uverbs_attr_group_spec *group_spec =
>> +                     spec->attr_groups[i];
>> +             unsigned int j;
>> +
>> +             for (j = 0; j < attr_spec_array->num_attrs; j++) {
>> +                     struct uverbs_attr *attr = &attr_spec_array-
>> >attrs[j];
>> +                     struct uverbs_attr_spec *spec = &group_spec-
>> >attrs[j];
>> +
>> +                     if (!attr->valid)
>> +                             continue;
>> +
>> +                     if (spec->type == UVERBS_ATTR_TYPE_IDR ||
>> +                         spec->type == UVERBS_ATTR_TYPE_FD)
>> +                             /*
>> +                              * refcounts should be handled at the object
>> +                              * level and not at the uobject level.
>> +                              */
>> +                             uverbs_unlock_object(attr->obj_attr.uobject,
>> +                                                  spec->obj.access, success);
>> +             }
>> +     }
>> +}
>> +
>> +static unsigned int get_type_orders(const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int i;
>> +     unsigned int max = 0;
>> +
>> +     for (i = 0; i < types_group->num_groups; i++) {
>> +             unsigned int j;
>> +             const struct uverbs_types *types = types_group-
>> >type_groups[i];
>> +
>> +             for (j = 0; j < types->num_types; j++) {
>> +                     if (!types->types[j] || !types->types[j]->alloc)
>> +                             continue;
>> +                     if (types->types[j]->alloc->order > max)
>> +                             max = types->types[j]->alloc->order;
>> +             }
>> +     }
>> +
>> +     return max;
>> +}
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group)
>> +{
>> +     unsigned int num_orders = get_type_orders(types_group);
>> +     unsigned int i;
>> +
>> +     for (i = 0; i <= num_orders; i++) {
>> +             struct ib_uobject *obj, *next_obj;
>> +
>> +             /*
>> +              * No need to take lock here, as cleanup should be called
>> +              * after all commands finished executing. Newly executed
>> +              * commands should fail.
>> +              */
>> +             mutex_lock(&ucontext->uobjects_lock->lock);
>
> It's really confusing to see a comment about 'no need to take lock' immediately followed by a call to lock.
>

Yeah :) That was before adding the fd. I'll delete the comment.

>> +             list_for_each_entry_safe(obj, next_obj, &ucontext-
>> >uobjects,
>> +                                      list)
>> +                     if (obj->type->order == i) {
>> +                             if (obj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +                                     ib_uverbs_uobject_remove(obj, false);
>> +                             else
>> +                                     ib_uverbs_remove_fd(obj);
>> +                     }
>> +             mutex_unlock(&ucontext->uobjects_lock->lock);
>> +     }
>> +     kref_put(&ucontext->uobjects_lock->ref,
>> release_uobjects_list_lock);
>> +}
>> +
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext)
>
> Please work on the function names.  This is horrendously long and still doesn't help describe what it does.
>

This just initializes the types part of the ucontext. Any suggestions?

>> +{
>> +     ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
>> >uobjects_lock),
>> +                                       GFP_KERNEL);
>> +     if (!ucontext->uobjects_lock)
>> +             return -ENOMEM;
>> +
>> +     init_uobjects_list_lock(ucontext->uobjects_lock);
>> +     INIT_LIST_HEAD(&ucontext->uobjects);
>> +
>> +     return 0;
>> +}
>> +
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext)
>> +{
>> +     kfree(ucontext->uobjects_lock);
>> +}
>
> No need to wrap a call to 'free'.
>

It's there in order to abstract away the ucontext type data structure.

>> +
>> diff --git a/drivers/infiniband/core/rdma_core.h
>> b/drivers/infiniband/core/rdma_core.h
>> new file mode 100644
>> index 0000000..8990115
>> --- /dev/null
>> +++ b/drivers/infiniband/core/rdma_core.h
>> @@ -0,0 +1,75 @@
>> +/*
>> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
>> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
>> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
>> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef UOBJECT_H
>> +#define UOBJECT_H
>> +
>> +#include <linux/idr.h>
>> +#include <rdma/uverbs_ioctl.h>
>> +#include <rdma/ib_verbs.h>
>> +#include <linux/mutex.h>
>> +
>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>> *ibdev,
>> +                                       uint16_t type);
>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>> uverbs_type_alloc_action *type,
>> +                                         struct ib_ucontext *ucontext,
>> +                                         enum uverbs_idr_access access,
>> +                                         uint32_t idr);
>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>> uverbs_type_alloc_action *type,
>> +                                        struct ib_ucontext *ucontext,
>> +                                        enum uverbs_idr_access access,
>> +                                        int fd);
>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>> +                       enum uverbs_idr_access access,
>> +                       bool success);
>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>> +                        size_t num,
>> +                        const struct uverbs_action_spec *spec,
>> +                        bool success);
>> +
>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>> *ucontext,
>> +                                          const struct uverbs_types_group
>> *types_group);
>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>> *ucontext);
>> +void ib_uverbs_close_fd(struct file *f);
>> +void ib_uverbs_cleanup_fd(void *private_data);
>> +
>> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
>> +{
>> +     return uobj + 1;
>> +}
>
> This seems like a rather useless function.
>

Why? The user shouldn't know or care how we put our structs together.

>> +
>> +#endif /* UIDR_H */
>> diff --git a/drivers/infiniband/core/uverbs.h
>> b/drivers/infiniband/core/uverbs.h
>> index 8074705..ae7d4b8 100644
>> --- a/drivers/infiniband/core/uverbs.h
>> +++ b/drivers/infiniband/core/uverbs.h
>> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
>> *uverbs_file,
>>                                       struct ib_device *ib_dev,
>>                                       int is_async);
>> +void ib_uverbs_release_file(struct kref *ref);
>>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
>> *uverbs_file);
>>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
>>
>> diff --git a/drivers/infiniband/core/uverbs_main.c
>> b/drivers/infiniband/core/uverbs_main.c
>> index f783723..e63357a 100644
>> --- a/drivers/infiniband/core/uverbs_main.c
>> +++ b/drivers/infiniband/core/uverbs_main.c
>> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
>> ib_uverbs_device *dev)
>>       complete(&dev->comp);
>>  }
>>
>> -static void ib_uverbs_release_file(struct kref *ref)
>> +void ib_uverbs_release_file(struct kref *ref)
>>  {
>>       struct ib_uverbs_file *file =
>>               container_of(ref, struct ib_uverbs_file, ref);
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index b5d2075..7240615 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
>>
>>  struct ib_umem;
>>
>> +struct ib_ucontext_lock;
>> +
>>  struct ib_ucontext {
>>       struct ib_device       *device;
>> +     struct ib_uverbs_file  *ufile;
>>       struct list_head        pd_list;
>>       struct list_head        mr_list;
>>       struct list_head        mw_list;
>> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>>       struct list_head        rwq_ind_tbl_list;
>>       int                     closing;
>>
>> +     /* lock for uobjects list */
>> +     struct ib_ucontext_lock *uobjects_lock;
>> +     struct list_head        uobjects;
>> +
>>       struct pid             *tgid;
>>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>       struct rb_root      umem_tree;
>> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>>  #endif
>>  };
>>
>> +struct uverbs_object_list;
>> +
>> +#define OLD_ABI_COMPAT
>> +
>>  struct ib_uobject {
>>       u64                     user_handle;    /* handle given to us by userspace
>> */
>>       struct ib_ucontext     *context;        /* associated user context
>> */
>>       void                   *object;         /* containing object */
>>       struct list_head        list;           /* link to context's list */
>> -     int                     id;             /* index into kernel idr */
>> -     struct kref             ref;
>> -     struct rw_semaphore     mutex;          /* protects .live */
>> +     int                     id;             /* index into kernel idr/fd */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct kref             ref;
>> +#endif
>> +     struct rw_semaphore     usecnt;         /* protects exclusive
>> access */
>> +#ifdef OLD_ABI_COMPAT
>> +     struct rw_semaphore     mutex;          /* protects .live */
>> +#endif
>>       struct rcu_head         rcu;            /* kfree_rcu() overhead */
>>       int                     live;
>> +
>> +     const struct uverbs_type_alloc_action *type;
>> +     struct ib_ucontext_lock *uobjects_lock;
>>  };
>>
>>  struct ib_udata {
>> @@ -2101,6 +2120,9 @@ struct ib_device {
>>        */
>>       int (*get_port_immutable)(struct ib_device *, u8, struct
>> ib_port_immutable *);
>>       void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
>> str_len);
>> +     struct list_head type_list;
>> +
>> +     const struct uverbs_types_group *types_group;
>>  };
>>
>>  struct ib_client {
>> diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
>> new file mode 100644
>> index 0000000..2f50045
>> --- /dev/null
>> +++ b/include/rdma/uverbs_ioctl.h
>> @@ -0,0 +1,195 @@
>> +/*
>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>> reserved.
>> + *
>> + * This software is available to you under a choice of one of two
>> + * licenses.  You may choose to be licensed under the terms of the GNU
>> + * General Public License (GPL) Version 2, available from the file
>> + * COPYING in the main directory of this source tree, or the
>> + * OpenIB.org BSD license below:
>> + *
>> + *     Redistribution and use in source and binary forms, with or
>> + *     without modification, are permitted provided that the following
>> + *     conditions are met:
>> + *
>> + *      - Redistributions of source code must retain the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer.
>> + *
>> + *      - Redistributions in binary form must reproduce the above
>> + *        copyright notice, this list of conditions and the following
>> + *        disclaimer in the documentation and/or other materials
>> + *        provided with the distribution.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>> + * SOFTWARE.
>> + */
>> +
>> +#ifndef _UVERBS_IOCTL_
>> +#define _UVERBS_IOCTL_
>> +
>> +#include <linux/kernel.h>
>> +
>> +struct uverbs_object_type;
>> +struct ib_ucontext;
>> +struct ib_uobject;
>> +struct ib_device;
>> +struct uverbs_uobject_type;
>> +
>> +/*
>> + * =======================================
>> + *   Verbs action specifications
>> + * =======================================
>> + */
>
> I intentionally used urdma (though condensed to 3 letters that I don't recall atm), rather than uverbs.  This will need to work with non-verbs devices and interfaces -- again, consider how this fits with the rdma cm.  Verbs has a very specific meaning, which gets lost if we start referring to everything as 'verbs'.  It's bad enough that we're stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also means ethernet and rdma means nothing.
>

IMHO - let's agree on the concept of this infrastructure first. Once we
decide its scope, we could generalize it (i.e. ioctl_provider and
ioctl_context) and apply it to rdma-cm as well.

>> +
>> +enum uverbs_attr_type {
>> +     UVERBS_ATTR_TYPE_PTR_IN,
>> +     UVERBS_ATTR_TYPE_PTR_OUT,
>> +     UVERBS_ATTR_TYPE_IDR,
>> +     UVERBS_ATTR_TYPE_FD,
>> +};
>> +
>> +enum uverbs_idr_access {
>> +     UVERBS_IDR_ACCESS_READ,
>> +     UVERBS_IDR_ACCESS_WRITE,
>> +     UVERBS_IDR_ACCESS_NEW,
>> +     UVERBS_IDR_ACCESS_DESTROY
>> +};
>> +
>> +struct uverbs_attr_spec {
>> +     u16                             len;
>> +     enum uverbs_attr_type           type;
>> +     struct {
>> +             u16                     obj_type;
>> +             u8                      access;
>
> Is access intended to be an enum uverbs_idr_access value?
>

Yeah, worth using this enum. Thanks.

>> +     } obj;
>
> I would remove (flatten) the substructure and re-order the fields for better alignment.
>

I noticed there are several places which aren't aligned. It's on my todo list.

>> +};
>> +
>> +struct uverbs_attr_group_spec {
>> +     struct uverbs_attr_spec         *attrs;
>> +     size_t                          num_attrs;
>> +};
>> +
>> +struct uverbs_action_spec {
>> +     const struct uverbs_attr_group_spec             **attr_groups;
>> +     /* if > 0 -> validator, otherwise, error */
>
> ? not sure what this comment means
>
>> +     int (*dist)(__u16 *attr_id, void *priv);
>
> What does 'dist' stand for?
>

dist = distribution function.
It maps the attributes you got from user-space to your groups. You
could think of each group as a namespace, where its attributes (or
types/actions) start from zero for the sake of compactness.
So, for example, it gets an attribute 0x8010 and maps it to "group 1"
(provider) and attribute 0x10.
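
For instance, a possible dist callback could look roughly like this
(the bit layout is purely illustrative - the patches don't fix any
particular encoding):

/* Illustrative only: assume the top bit marks provider attributes,
 * so 0x8010 maps to group 1, attribute 0x10.
 */
static int example_dist(__u16 *attr_id, void *priv)
{
	int group = *attr_id >> 15;	/* 0 = common, 1 = provider */

	*attr_id &= 0x7fff;		/* 0x8010 -> 0x0010         */
	return group;
}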

>> +     void                                            *priv;
>> +     size_t                                          num_groups;
>> +};
>> +
>> +struct uverbs_attr_array;
>> +struct ib_uverbs_file;
>> +
>> +struct uverbs_action {
>> +     struct uverbs_action_spec spec;
>> +     void *priv;
>> +     int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
>> *ufile,
>> +                    struct uverbs_attr_array *ctx, size_t num, void
>> *priv);
>> +};
>> +
>> +struct uverbs_type_alloc_action;
>> +typedef void (*free_type)(const struct uverbs_type_alloc_action
>> *uobject_type,
>> +                       struct ib_uobject *uobject);
>> +
>> +struct uverbs_type_alloc_action {
>> +     enum uverbs_attr_type           type;
>> +     int                             order;
>
> I think this is being used as destroy order, in which case I would rename it to clarify the intent.  Though I'd prefer we come up with a more efficient destruction mechanism than the repeated nested looping.
>

In one of the earlier revisions I used a sorted list, which was
efficient. I recall that Jason didn't like its complexity and,
rethinking it, he's right. Most of your types are "order number" 0
anyway, so you'll probably iterate over very few objects in the next
rounds (in verbs, everything but MRs and PDs is order 0).

>> +     size_t                          obj_size;
>
> This can be alloc_fn
>
>> +     free_type                       free_fn;
>> +     struct {
>> +             const struct file_operations    *fops;
>> +             const char                      *name;
>> +             int                             flags;
>> +     } fd;
>> +};
>> +
>> +struct uverbs_type_actions_group {
>> +     size_t                                  num_actions;
>> +     const struct uverbs_action              **actions;
>> +};
>> +
>> +struct uverbs_type {
>> +     size_t                                  num_groups;
>> +     const struct uverbs_type_actions_group  **action_groups;
>> +     const struct uverbs_type_alloc_action   *alloc;
>> +     int (*dist)(__u16 *action_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +struct uverbs_types {
>> +     size_t                                  num_types;
>> +     const struct uverbs_type                **types;
>> +};
>> +
>> +struct uverbs_types_group {
>> +     const struct uverbs_types               **type_groups;
>> +     size_t                                  num_groups;
>> +     int (*dist)(__u16 *type_id, void *priv);
>> +     void                                    *priv;
>> +};
>> +
>> +/* =================================================
>> + *              Parsing infrastructure
>> + * =================================================
>> + */
>> +
>> +struct uverbs_ptr_attr {
>> +     void    * __user ptr;
>> +     __u16           len;
>> +};
>> +
>> +struct uverbs_fd_attr {
>> +     int             fd;
>> +};
>> +
>> +struct uverbs_uobj_attr {
>> +     /*  idr handle */
>> +     __u32   idr;
>> +};
>> +
>> +struct uverbs_obj_attr {
>> +     /* pointer to the kernel descriptor -> type, access, etc */
>> +     const struct uverbs_attr_spec *val;
>> +     struct ib_uverbs_attr __user    *uattr;
>> +     const struct uverbs_type_alloc_action   *type;
>> +     struct ib_uobject               *uobject;
>> +     union {
>> +             struct uverbs_fd_attr           fd;
>> +             struct uverbs_uobj_attr         uobj;
>> +     };
>> +};
>> +
>> +struct uverbs_attr {
>> +     bool valid;
>> +     union {
>> +             struct uverbs_ptr_attr  cmd_attr;
>> +             struct uverbs_obj_attr  obj_attr;
>> +     };
>> +};
>
> It's odd to have a union that's part of a structure without some field to indicate which union field is accessible.
>

You index this array by the attribute id from the user's callback
function. The user should know what the type of each attribute is, as
[s]he declared the specification.
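
So a handler would do something like the following (sketch only; the
attribute indices are made up):

/* The handler knows attribute 0 was declared as PTR_IN and attribute 1
 * as an IDR object, so it picks the right union member itself.
 */
static int example_handler(struct ib_device *ib_dev,
			   struct ib_uverbs_file *ufile,
			   struct uverbs_attr_array *ctx, size_t num,
			   void *priv)
{
	struct uverbs_attr *cmd = &ctx[0].attrs[0]; /* declared PTR_IN  */
	struct uverbs_attr *obj = &ctx[0].attrs[1]; /* declared IDR obj */

	if (!cmd->valid || !obj->valid)
		return -EINVAL;

	/*
	 * cmd->cmd_attr.ptr / cmd->cmd_attr.len and
	 * obj->obj_attr.uobject are the valid members here; the spec,
	 * not a runtime tag, says which side of the union to use.
	 */
	return 0;
}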

>> +
>> +/* output of one validator */
>> +struct uverbs_attr_array {
>> +     size_t num_attrs;
>> +     /* arrays of attrubytes, index is the id i.e SEND_CQ */
>> +     struct uverbs_attr *attrs;
>> +};
>> +
>> +/* =================================================
>> + *              Types infrastructure
>> + * =================================================
>> + */
>> +
>> +int ib_uverbs_uobject_type_add(struct list_head      *head,
>> +                            void (*free)(struct uverbs_uobject_type *type,
>> +                                         struct ib_uobject *uobject,
>> +                                         struct ib_ucontext *ucontext),
>> +                            uint16_t obj_type);
>> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
>> +
>> +#endif
>> --
>> 2.7.4
>

Thanks for taking a look.

Regards,
Matan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A445F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2016-10-30  9:13           ` Leon Romanovsky
@ 2016-11-07 23:55           ` Jason Gunthorpe
       [not found]             ` <20161107235516.GE7002-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2016-11-07 23:55 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Christoph Lameter, Liran Liss, Haggai Eran, Majd Dibbiny,
	Tal Alon, Leon Romanovsky

On Fri, Oct 28, 2016 at 10:53:13PM +0000, Hefty, Sean wrote:
> > The current code creates an IDR per type. Since types are currently
> > common for all vendors and known in advance, this was good enough.
> > However, the proposed ioctl based infrastructure allows each vendor
> > to declare only some of the common types and declare its own specific
> > types.
> > 
> > Thus, we decided to implement IDR to be per device and refactor it to
> > use a new file.
> 
> I think this needs to be more abstract.  I would consider
> introducing the concept of an 'ioctl provider', with the idr per
> ioctl provider.  You could then make each ib_device an ioctl
> provider.  (Just embed the structure).  I believe this will be
> necessary to support the rdma_cm, ib_cm, as well as devices that
> export different sets of ioctls, where an ib_device isn't
> necessarily available.
> 
> Essentially, I would treat plugging into the uABI independent from
> plugging into the kernel verbs API.  Otherwise, I think we'll end up
> with multiple ioctl 'frameworks'.

Matan,

I think you should change things so that all the *general* code uses
'urdma_' as a prefix instead of uverbs_. Only use uverbs_ on things
that truely only apply to uverbs. This will make things much
clearer how the code sharing is expected to work with rdma_cm

Sean is right - this shows why having the IDR be per device does not
work; rdma-cm really does need a per-file or global IDR. Both
approaches should really be the same, and I think per-file has better
locking characteristics, so I'd recommend that.

Jason


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                                 ` <CAAKD3BB0k1UxV2qO3SqAD_t1vM2pcduOXiz8aJ5c+JXAmq_aWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-08  0:43                                   ` Jason Gunthorpe
       [not found]                                     ` <20161108004351.GA32444-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2016-11-08  0:43 UTC (permalink / raw)
  To: Matan Barak
  Cc: Leon Romanovsky, Christoph Hellwig, Matan Barak, linux-rdma,
	Doug Ledford, Sean Hefty, Christoph Lameter, Liran Liss,
	Haggai Eran, Majd Dibbiny, Tal Alon

On Sun, Oct 30, 2016 at 10:48:39AM +0200, Matan Barak wrote:
> On Fri, Oct 28, 2016 at 5:46 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> > On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
> >> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
> >> > Just to summarize, to be sure that I understood you correctly.
> >> >
> >> > | write | -> | conversion logic | ---
> >> > | ioctl | ---------------------------
> >> >
> >> > Am I right?
> >>
> >> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.

> If we accept the limitations here (i.e. all command attributes come
> either from the kernel or from user-space, but you can't mix them -
> which means the write compatibility layer either needs to copy all
> attributes or use a direct mapping for all of them), I could just
> either break ib_uverbs_cmd_verbs() into a few functions or just pass
> a callback that boxes the descriptor copy.

From what I saw in the series, this looks easy enough to fix.

Just lightly refactor things so that the write() compat layer can call
into the ioctl processor with an already prepared tlv list in kernel
memory and form such a list on the stack when doing the compat stuff.

The bigger problem is the tlv list pointers themselves: they have to
point to user memory, so the compat layer can only do so much of a
transformation.

I guess another flag in the copy_from_user wrapper would do the trick
if we need it.
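
ie something like this (hand-wavy sketch; the struct layout and both
function names below are made up):

/* A write()-path command builds its attribute ("tlv") list in kernel
 * memory, on the stack, and calls the same processor the ioctl path
 * uses.  The data pointers still refer to user memory, so the
 * copy_from_user() happens later, in the handler, as in the ioctl case.
 */
struct compat_attr {		/* stand-in for the series' tlv entry */
	u16 id;
	u16 len;
	u64 user_ptr;
};

static int compat_write_cmd(struct ib_uverbs_file *ufile, u16 action,
			    const char __user *cmd, size_t cmd_len,
			    char __user *resp, size_t resp_len)
{
	struct compat_attr attrs[2] = {
		{ .id = 0 /* cmd  */, .len = cmd_len,
		  .user_ptr = (u64)(uintptr_t)cmd  },
		{ .id = 1 /* resp */, .len = resp_len,
		  .user_ptr = (u64)(uintptr_t)resp },
	};

	/* hypothetical entry point shared with the ioctl dispatcher */
	return process_action_kernel_attrs(ufile, action, attrs,
					   ARRAY_SIZE(attrs));
}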

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
       [not found]             ` <20161107235516.GE7002-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-11-09  9:34               ` Matan Barak
  0 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-11-09  9:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Hefty, Sean
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Christoph Lameter, Liran Liss, Haggai Eran, Majd Dibbiny,
	Tal Alon, Leon Romanovsky

On 08/11/2016 01:55, Jason Gunthorpe wrote:
> On Fri, Oct 28, 2016 at 10:53:13PM +0000, Hefty, Sean wrote:
>>> The current code creates an IDR per type. Since types are currently
>>> common for all vendors and known in advance, this was good enough.
>>> However, the proposed ioctl based infrastructure allows each vendor
>>> to declare only some of the common types and declare its own specific
>>> types.
>>>
>>> Thus, we decided to implement IDR to be per device and refactor it to
>>> use a new file.
>>
>> I think this needs to be more abstract.  I would consider
>> introducing the concept of an 'ioctl provider', with the idr per
>> ioctl provider.  You could then make each ib_device an ioctl
>> provider.  (Just embed the structure).  I believe this will be
>> necessary to support the rdma_cm, ib_cm, as well as devices that
>> export different sets of ioctls, where an ib_device isn't
>> necessarily available.
>>
>> Essentially, I would treat plugging into the uABI independent from
>> plugging into the kernel verbs API.  Otherwise, I think we'll end up
>> with multiple ioctl 'frameworks'.
>
> Matan,
>
> I think you should change things so that all the *general* code uses
> 'urdma_' as a prefix instead of uverbs_. Only use uverbs_ on things
> that truely only apply to uverbs. This will make things much
> clearer how the code sharing is expected to work with rdma_cm
>

Yeah, I'll change the general infrastructure to be urdma.

> Sean is right, this shows why having the IDR be per device does not
> work, rdma-cm really does need a per-file or global IDR - both
> approaches should really be the same, and I think per-file has better
> locking characteristics, so I'd recommend that.
>

Eventually, I think ending up with an ioctl_provider and ioctl_context 
is the way to go here. The IDR and locks should be per ioctl_provider.
In the ib_device world, an ioctl_provider is indeed an ib_device. In the 
rdma_cm world, the ioctl_provider is the rdma_cm global file.
However, in order to do such a large amount of changes, let's push 
things incrementally. We could start with the current scheme, where it's 
ib_device specific, lay out the foundations and then refactor this to be 
more abstract when adding rdma_cm. We could even do that refactoring 
before enabling the ioctl interface, so if we see that something in the 
model is broken, we could still back off.
Does that sound reasonable?
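
To make the above concrete, a rough sketch of that split (all names are
hypothetical; the only points illustrated are "embed the structure" and
"IDR plus lock per provider"):

#include <linux/idr.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>

struct uverbs_types_group;

struct ioctl_provider {
        const struct uverbs_types_group *types_group;  /* types/actions it exposes */
        struct idr                       idr;          /* all objects of this provider */
        spinlock_t                       idr_lock;
};

struct ioctl_context {
        struct ioctl_provider   *provider;
        struct list_head         uobjects;      /* live objects, for cleanup */
        struct mutex             uobjects_lock;
};

/* An ib_device would simply embed a provider ... */
struct ib_device_sketch {
        /* ... existing ib_device fields ... */
        struct ioctl_provider   ioctl;
};

/* ... while rdma_cm would embed one in its single global file. */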

> Jason
>

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
       [not found]                                     ` <20161108004351.GA32444-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-11-09  9:45                                       ` Matan Barak
  0 siblings, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-11-09  9:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Matan Barak
  Cc: Leon Romanovsky, Christoph Hellwig, linux-rdma, Doug Ledford,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon

On 08/11/2016 02:43, Jason Gunthorpe wrote:
> On Sun, Oct 30, 2016 at 10:48:39AM +0200, Matan Barak wrote:
>> On Fri, Oct 28, 2016 at 5:46 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>> On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
>>>> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
>>>>> Just to summarize, to be sure that I understood you correctly.
>>>>>
>>>>> | write | -> | conversion logic | ---
>>>>> | ioctl | ---------------------------
>>>>>
>>>>> Am I right?
>>>>
>>>> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.
>
>> If we accept the limitations here (i.e. all command attributes
>> come either from kernel or from user, but you can't mix them -
>> that means the write compatibility layer either needs to copy all
>> attributes or use a direct mapping for all of them), I could just
>> either break ib_uverbs_cmd_verbs into a few functions or just pass a
>> callback that boxes the descriptor copy.
>
> From what I saw in the series, this looks easy enough to fix..
>
> Just lightly refactor things so that the write() compat layer can call
> into the ioctl processor with an already prepared tlv list in kernel
> memory and form such a list on the stack when doing the compat stuff.
>

Yeah, it's just an easy refactor of ib_uverbs_cmd_verbs and there are 
multiple ways of doing that :)

> The bigger problem is the tlv list pointers themselves, they have to
> point to user memory so the compat layer can only do so much of a
> transformation.
>
> I guess another flag in the copy_from_user wrapper would do the trick
> if we need it.
>

Currently we assume the payload itself is in user space only, so direct 
mapping is mandatory.
If we ever need to do something other than mapping (a bunch of 
consecutive write-ABI struct fields) -> (an attribute in the ioctl 
world), we'll have to box these copy macros/functions with 
copy_from_attr and copy_to_attr calls.
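
As a rough sketch of such a boxed helper (the struct layout and the
is_kernel flag are assumptions for illustration only, echoing Jason's
"another flag in the copy_from_user wrapper" idea):

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct uverbs_ptr_attr_sketch {
        union {
                void __user     *uptr;
                const void      *kptr;
        };
        u16     len;
        bool    is_kernel;      /* set by the write() compat layer */
};

/* Handlers call this instead of copy_from_user() directly, so the same
 * handler works for both the ioctl path and the write() compat path. */
static int copy_from_attr(void *dst,
                          const struct uverbs_ptr_attr_sketch *attr,
                          size_t size)
{
        if (attr->len < size)
                return -EINVAL;

        if (attr->is_kernel) {
                memcpy(dst, attr->kptr, size);
                return 0;
        }

        return copy_from_user(dst, attr->uptr, size) ? -EFAULT : 0;
}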

> Jason

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found]             ` <CAAKD3BDWyb10baLrDu=m_mYPB64i9OOPEPVYKtDo9zVbvMM-UA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-09 18:00               ` Hefty, Sean
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB0A8000-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Hefty, Sean @ 2016-11-09 18:00 UTC (permalink / raw)
  To: Matan Barak
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Jason Gunthorpe, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon, Leon Romanovsky

> I had thought about that, but the user could initialize its part of
> the object in the function handler. It can't allocate the object as we
> need it in order to allocate an IDR entry and co. The assumption here
> is that the "unlock" stage can't fail.

This is creating a generic OO type of framework, so just add constructor/destructor functions and have all objects inherit from a base ioctl object class.
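
For illustration, a minimal sketch of that base-class idea (the names
and the split between a fallible ctor and an infallible dtor are
assumptions, not something defined in this series):

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/types.h>

struct ioctl_object;

struct ioctl_object_class {
        size_t  obj_size;                               /* full object, base first */
        int     (*ctor)(struct ioctl_object *obj);      /* may fail */
        void    (*dtor)(struct ioctl_object *obj);      /* must not fail */
};

struct ioctl_object {
        const struct ioctl_object_class *class;
        struct kref                      ref;
        struct list_head                 list;  /* per-context cleanup list */
        int                              id;    /* idr handle or fd */
};

/* A concrete object "inherits" by embedding the base at offset 0. */
struct uqp_object_sketch {
        struct ioctl_object     base;
        /* uverbs/driver specific QP state follows */
};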

> > In fact, it would be great if we could just cleanup the list in the
> reverse order that items were created.  Maybe this requires supporting
> a pre-cleanup handler, so that the driver can pluck items out of the
> list that may need to be destroyed out of order.
> >
> 
> So that's essentially one layer of ordering. Why do you consider a
> driver iterating over all objects simpler than this model?

This problem is a verbs-specific issue, and one that only involves MWs.  We have reference counts that can provide the same functionality.  I want to minimize the amount of meta-data needed to describe objects.

> >> Adding an object is done in two parts.
> >> First, an object is allocated and added to IDR/fd table. Then, the
> >> command's handlers (in downstream patches) could work on this object
> >> and fill in its required details.
> >> After a successful command, ib_uverbs_uobject_enable is called and
> >> this user objects becomes ucontext visible.
> >
> > If you have a way to mark that an object is used for exclusive
> access, you may be able to use that instead of introducing a new
> variable.  (I.e. acquire the object's write lock).  I think we want to
> make an effort to minimize the size of the kernel structure needed to
> track every user space object (within reason).
> >
> 
> I didn't really follow. A command attribute states the nature of the
> locking (for example, in MODIFY_QP the QP could be exclusively locked,
> but in QUERY_QP it's only locked for reading). I don't want to really
> grab a lock, as if I did I could face a deadlock (user-space could
> pass parameters in a colliding order). It could be solved by sorting
> the handles, but that would degrade performance without a good reason.

I'm suggesting that the locking attribute and command be separate.  This allows the framework to acquire the proper type of lock independent of what function it will invoke.

The framework doesn't need to hold locks.  It should be able to mark access to an object.  If that access is not available, it can abort.  This pushes more complex synchronization and thread handling to user space.

> >> Removing an uboject is done by calling ib_uverbs_uobject_remove.
> >>
> >> We should make sure IDR (per-device) and list (per-ucontext) could
> >> be accessed concurrently without corrupting them.
> >>
> >> Signed-off-by: Matan Barak <matanb@mellanox.com>
> >> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> >> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >> ---
> >
> > As a general comment, I do have concerns that the resulting
> generalized parsing of everything will negatively impact performance
> for operations that do have to transition into the kernel.  Not all
> devices offload all operations to user space.  Plus the resulting code
> is extremely difficult to read and non-trivial to use.  It's equivalent
> to reading C++ code that has 4 layers of inheritance with overrides to
> basic operators...
> 
> There are two parts here. I think the handlers themselves are simpler,
> easier to read and less error-prone. They contain less code
> duplication. The macro-based definition language explicitly declares
> all attributes, their types, sizes, etc.
> The model here is a bit more complex as we want to achieve both code
> reuse and add/override of new types/actions/attributes.
> 
> 
> >
> > Pre and post operators per command that can do straightforward
> validation seem like a better option.
> >
> >
> 
> I think that would duplicate a lot of code and would be more
> error-prone than one infrastructure that automates all that work for
> you.

I think that's a toss-up.  Either you have to write the code correctly or write the rules correctly.  Reading code is straightforward, manually converting rules into code is not.

In any case, the two approaches are not exclusive.  By forcing the rule language into the framework, everything is forced to deal with it.  By leaving it out, each ioctl provider can decide if they need this or not.  If you want verbs to process all ioctl's using a single pre-validation function that operates based on these rules you can.  Nothing prevents that.  But ioctl providers that want better performance can elect for a more straightforward validation model.

> >>  drivers/infiniband/core/Makefile      |   3 +-
> >>  drivers/infiniband/core/device.c      |   1 +
> >>  drivers/infiniband/core/rdma_core.c   | 489
> >> ++++++++++++++++++++++++++++++++++
> >>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
> >>  drivers/infiniband/core/uverbs.h      |   1 +
> >>  drivers/infiniband/core/uverbs_main.c |   2 +-
> >>  include/rdma/ib_verbs.h               |  28 +-
> >>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
> >>  8 files changed, 789 insertions(+), 5 deletions(-)
> >>  create mode 100644 drivers/infiniband/core/rdma_core.c
> >>  create mode 100644 drivers/infiniband/core/rdma_core.h
> >>  create mode 100644 include/rdma/uverbs_ioctl.h
> >>
> >> diff --git a/drivers/infiniband/core/Makefile
> >> b/drivers/infiniband/core/Makefile
> >> index edaae9f..1819623 100644
> >> --- a/drivers/infiniband/core/Makefile
> >> +++ b/drivers/infiniband/core/Makefile
> >> @@ -28,4 +28,5 @@ ib_umad-y :=                        user_mad.o
> >>
> >>  ib_ucm-y :=                  ucm.o
> >>
> >> -ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
> >> uverbs_marshall.o
> >> +ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
> >> uverbs_marshall.o \
> >> +                             rdma_core.o
> >> diff --git a/drivers/infiniband/core/device.c
> >> b/drivers/infiniband/core/device.c
> >> index c3b68f5..43994b1 100644
> >> --- a/drivers/infiniband/core/device.c
> >> +++ b/drivers/infiniband/core/device.c
> >> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
> >>       spin_lock_init(&device->client_data_lock);
> >>       INIT_LIST_HEAD(&device->client_data_list);
> >>       INIT_LIST_HEAD(&device->port_list);
> >> +     INIT_LIST_HEAD(&device->type_list);
> >>
> >>       return device;
> >>  }
> >> diff --git a/drivers/infiniband/core/rdma_core.c
> >> b/drivers/infiniband/core/rdma_core.c
> >> new file mode 100644
> >> index 0000000..337abc2
> >> --- /dev/null
> >> +++ b/drivers/infiniband/core/rdma_core.c
> >> @@ -0,0 +1,489 @@
> >> +/*
> >> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> >> reserved.
> >> + *
> >> + * This software is available to you under a choice of one of two
> >> + * licenses.  You may choose to be licensed under the terms of the
> GNU
> >> + * General Public License (GPL) Version 2, available from the file
> >> + * COPYING in the main directory of this source tree, or the
> >> + * OpenIB.org BSD license below:
> >> + *
> >> + *     Redistribution and use in source and binary forms, with or
> >> + *     without modification, are permitted provided that the
> following
> >> + *     conditions are met:
> >> + *
> >> + *      - Redistributions of source code must retain the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer.
> >> + *
> >> + *      - Redistributions in binary form must reproduce the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer in the documentation and/or other materials
> >> + *        provided with the distribution.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
> OF
> >> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
> HOLDERS
> >> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> AN
> >> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
> IN
> >> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> >> + * SOFTWARE.
> >> + */
> >> +
> >> +#include <linux/file.h>
> >> +#include <linux/anon_inodes.h>
> >> +#include <rdma/ib_verbs.h>
> >> +#include "uverbs.h"
> >> +#include "rdma_core.h"
> >> +#include <rdma/uverbs_ioctl.h>
> >> +
> >> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> >> *ibdev,
> >> +                                       uint16_t type)
> >> +{
> >> +     const struct uverbs_types_group *groups = ibdev->types_group;
> >> +     const struct uverbs_types *types;
> >> +     int ret = groups->dist(&type, groups->priv);
> >> +
> >> +     if (ret >= groups->num_groups)
> >> +             return NULL;
> >> +
> >> +     types = groups->type_groups[ret];
> >> +
> >> +     if (type >= types->num_types)
> >> +             return NULL;
> >> +
> >> +     return types->types[type];
> >> +}
> >> +
> >> +static int uverbs_lock_object(struct ib_uobject *uobj,
> >> +                           enum uverbs_idr_access access)
> >> +{
> >> +     if (access == UVERBS_IDR_ACCESS_READ)
> >> +             return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -
> EBUSY;
> >> +
> >> +     /* lock is either WRITE or DESTROY - should be exclusive */
> >> +     return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
> >
> > This function could take the lock type directly (read or write),
> versus inferring it based on some other access type.
> >
> 
> We can, but since we use these enums in the attribute specifications,
> I thought it could be more convenient.
> 
> >> +}
> >> +
> >> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
> >> *context)
> >> +{
> >> +     struct ib_uobject *uobj;
> >> +
> >> +     rcu_read_lock();
> >> +     uobj = idr_find(&context->device->idr, id);
> >> +     if (uobj && uobj->live) {
> >> +             if (uobj->context != context)
> >> +                     uobj = NULL;
> >> +     }
> >> +     rcu_read_unlock();
> >> +
> >> +     return uobj;
> >> +}
> >> +
> >> +struct ib_ucontext_lock {
> >> +     struct kref  ref;
> >> +     /* locking the uobjects_list */
> >> +     struct mutex lock;
> >> +};
> >> +
> >> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
> >> +{
> >> +     mutex_init(&lock->lock);
> >> +     kref_init(&lock->ref);
> >> +}
> >> +
> >> +static void release_uobjects_list_lock(struct kref *ref)
> >> +{
> >> +     struct ib_ucontext_lock *lock = container_of(ref,
> >> +                                                  struct
> ib_ucontext_lock,
> >> +                                                  ref);
> >> +
> >> +     kfree(lock);
> >> +}
> >> +
> >> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
> >> +                   struct ib_ucontext *context)
> >> +{
> >> +     init_rwsem(&uobj->usecnt);
> >> +     uobj->user_handle = user_handle;
> >> +     uobj->context     = context;
> >> +     uobj->live        = 0;
> >> +}
> >> +
> >> +static int add_uobj(struct ib_uobject *uobj)
> >> +{
> >> +     int ret;
> >> +
> >> +     idr_preload(GFP_KERNEL);
> >> +     spin_lock(&uobj->context->device->idr_lock);
> >> +
> >> +     ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
> >> GFP_NOWAIT);
> >> +     if (ret >= 0)
> >> +             uobj->id = ret;
> >> +
> >> +     spin_unlock(&uobj->context->device->idr_lock);
> >> +     idr_preload_end();
> >> +
> >> +     return ret < 0 ? ret : 0;
> >> +}
> >> +
> >> +static void remove_uobj(struct ib_uobject *uobj)
> >> +{
> >> +     spin_lock(&uobj->context->device->idr_lock);
> >> +     idr_remove(&uobj->context->device->idr, uobj->id);
> >> +     spin_unlock(&uobj->context->device->idr_lock);
> >> +}
> >> +
> >> +static void put_uobj(struct ib_uobject *uobj)
> >> +{
> >> +     kfree_rcu(uobj, rcu);
> >> +}
> >> +
> >> +static struct ib_uobject *get_uobject_from_context(struct
> ib_ucontext
> >> *ucontext,
> >> +                                                const struct
> >> uverbs_type_alloc_action *type,
> >> +                                                u32 idr,
> >> +                                                enum
> uverbs_idr_access access)
> >> +{
> >> +     struct ib_uobject *uobj;
> >> +     int ret;
> >> +
> >> +     rcu_read_lock();
> >> +     uobj = get_uobj(idr, ucontext);
> >> +     if (!uobj)
> >> +             goto free;
> >> +
> >> +     if (uobj->type != type) {
> >> +             uobj = NULL;
> >> +             goto free;
> >> +     }
> >> +
> >> +     ret = uverbs_lock_object(uobj, access);
> >> +     if (ret)
> >> +             uobj = ERR_PTR(ret);
> >> +free:
> >> +     rcu_read_unlock();
> >> +     return uobj;
> >> +
> >> +     return NULL;
> >> +}
> >> +
> >> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
> >> +                              const struct uverbs_type_alloc_action
> >> *uobject_type)
> >> +{
> >> +     uobject->type = uobject_type;
> >> +     return add_uobj(uobject);
> >> +}
> >> +
> >> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> >> uverbs_type_alloc_action *type,
> >> +                                         struct ib_ucontext
> *ucontext,
> >> +                                         enum uverbs_idr_access
> access,
> >> +                                         uint32_t idr)
> >> +{
> >> +     struct ib_uobject *uobj;
> >> +     int ret;
> >> +
> >> +     if (access == UVERBS_IDR_ACCESS_NEW) {
> >> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
> >> +             if (!uobj)
> >> +                     return ERR_PTR(-ENOMEM);
> >> +
> >> +             init_uobj(uobj, 0, ucontext);
> >> +
> >> +             /* lock idr */
> >
> > Command to lock idr, but no lock is obtained.
> >
> 
> ib_uverbs_uobject_add calls add_uobj which locks the IDR.
> 
> >> +             ret = ib_uverbs_uobject_add(uobj, type);
> >> +             if (ret) {
> >> +                     kfree(uobj);
> >> +                     return ERR_PTR(ret);
> >> +             }
> >> +
> >> +     } else {
> >> +             uobj = get_uobject_from_context(ucontext, type, idr,
> >> +                                             access);
> >> +
> >> +             if (!uobj)
> >> +                     return ERR_PTR(-ENOENT);
> >> +     }
> >> +
> >> +     return uobj;
> >> +}
> >> +
> >> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> >> uverbs_type_alloc_action *type,
> >> +                                        struct ib_ucontext
> *ucontext,
> >> +                                        enum uverbs_idr_access
> access,
> >> +                                        int fd)
> >> +{
> >> +     if (access == UVERBS_IDR_ACCESS_NEW) {
> >> +             int _fd;
> >> +             struct ib_uobject *uobj = NULL;
> >> +             struct file *filp;
> >> +
> >> +             _fd = get_unused_fd_flags(O_CLOEXEC);
> >> +             if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
> >> ib_uobject)))
> >> +                     return ERR_PTR(_fd);
> >> +
> >> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
> >> +             init_uobj(uobj, 0, ucontext);
> >> +
> >> +             if (!uobj)
> >> +                     return ERR_PTR(-ENOMEM);
> >> +
> >> +             filp = anon_inode_getfile(type->fd.name, type-
> >fd.fops,
> >> +                                       uobj + 1, type->fd.flags);
> >> +             if (IS_ERR(filp)) {
> >> +                     put_unused_fd(_fd);
> >> +                     kfree(uobj);
> >> +                     return (void *)filp;
> >> +             }
> >> +
> >> +             uobj->type = type;
> >> +             uobj->id = _fd;
> >> +             uobj->object = filp;
> >> +
> >> +             return uobj;
> >> +     } else if (access == UVERBS_IDR_ACCESS_READ) {
> >> +             struct file *f = fget(fd);
> >> +             struct ib_uobject *uobject;
> >> +
> >> +             if (!f)
> >> +                     return ERR_PTR(-EBADF);
> >> +
> >> +             uobject = f->private_data - sizeof(struct ib_uobject);
> >> +             if (f->f_op != type->fd.fops ||
> >> +                 !uobject->live) {
> >> +                     fput(f);
> >> +                     return ERR_PTR(-EBADF);
> >> +             }
> >> +
> >> +             /*
> >> +              * No need to protect it with a ref count, as fget
> >> increases
> >> +              * f_count.
> >> +              */
> >> +             return uobject;
> >> +     } else {
> >> +             return ERR_PTR(-EOPNOTSUPP);
> >> +     }
> >> +}
> >> +
> >> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
> >> +{
> >> +     mutex_lock(&uobject->context->uobjects_lock->lock);
> >> +     list_add(&uobject->list, &uobject->context->uobjects);
> >> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
> >
> > Why not just insert the object into the list on creation?
> >
> >> +     uobject->live = 1;
> >
> > See my comments above on removing the live field.
> >
> 
> Seems that the list could suffice, but I'll look into that.
> 
> >> +}
> >> +
> >> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject,
> bool
> >> lock)
> >> +{
> >> +     /*
> >> +      * Calling remove requires exclusive access, so it's not
> possible
> >> +      * another thread will use our object.
> >> +      */
> >> +     uobject->live = 0;
> >> +     uobject->type->free_fn(uobject->type, uobject);
> >> +     if (lock)
> >> +             mutex_lock(&uobject->context->uobjects_lock->lock);
> >> +     list_del(&uobject->list);
> >> +     if (lock)
> >> +             mutex_unlock(&uobject->context->uobjects_lock->lock);
> >> +     remove_uobj(uobject);
> >> +     put_uobj(uobject);
> >> +}
> >> +
> >> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
> >> +                           enum uverbs_idr_access access,
> >> +                           bool success)
> >> +{
> >> +     switch (access) {
> >> +     case UVERBS_IDR_ACCESS_READ:
> >> +             up_read(&uobj->usecnt);
> >> +             break;
> >> +     case UVERBS_IDR_ACCESS_NEW:
> >> +             if (success) {
> >> +                     ib_uverbs_uobject_enable(uobj);
> >> +             } else {
> >> +                     remove_uobj(uobj);
> >> +                     put_uobj(uobj);
> >> +             }
> >> +             break;
> >> +     case UVERBS_IDR_ACCESS_WRITE:
> >> +             up_write(&uobj->usecnt);
> >> +             break;
> >> +     case UVERBS_IDR_ACCESS_DESTROY:
> >> +             if (success)
> >> +                     ib_uverbs_uobject_remove(uobj, true);
> >> +             else
> >> +                     up_write(&uobj->usecnt);
> >> +             break;
> >> +     }
> >> +}
> >> +
> >> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
> >> +                          enum uverbs_idr_access access,
> >> +                          bool success)
> >> +{
> >> +     struct file *filp = uobj->object;
> >> +
> >> +     if (access == UVERBS_IDR_ACCESS_NEW) {
> >> +             if (success) {
> >> +                     kref_get(&uobj->context->ufile->ref);
> >> +                     uobj->uobjects_lock = uobj->context-
> >uobjects_lock;
> >> +                     kref_get(&uobj->uobjects_lock->ref);
> >> +                     ib_uverbs_uobject_enable(uobj);
> >> +                     fd_install(uobj->id, uobj->object);
> >
> > I don't get this.  The function is unlocking something, but there are
> calls to get krefs?
> >
> 
> Before invoking the user's callback, we're first locking all objects
> and afterwards we're unlocking them. When we need to create a new
> object, the lock becomes object creation and the unlock could become
> (assuming the user's callback succeeded) enabling this new object.
> When you add a new object (or fd in this case), we take a reference
> count to both the uverbs_file and the locking context.
> 
> >> +             } else {
> >> +                     fput(uobj->object);
> >> +                     put_unused_fd(uobj->id);
> >> +                     kfree(uobj);
> >> +             }
> >> +     } else {
> >> +             fput(filp);
> >> +     }
> >> +}
> >> +
> >> +void uverbs_unlock_object(struct ib_uobject *uobj,
> >> +                       enum uverbs_idr_access access,
> >> +                       bool success)
> >> +{
> >> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
> >> +             uverbs_unlock_idr(uobj, access, success);
> >> +     else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
> >> +             uverbs_unlock_fd(uobj, access, success);
> >> +     else
> >> +             WARN_ON(true);
> >> +}
> >> +
> >> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
> >> +{
> >> +     /*
> >> +      * user should release the uobject in the release
> >> +      * callback.
> >> +      */
> >> +     if (uobject->live) {
> >> +             uobject->live = 0;
> >> +             list_del(&uobject->list);
> >> +             uobject->type->free_fn(uobject->type, uobject);
> >> +             kref_put(&uobject->context->ufile->ref,
> >> ib_uverbs_release_file);
> >> +             uobject->context = NULL;
> >> +     }
> >> +}
> >> +
> >> +void ib_uverbs_close_fd(struct file *f)
> >> +{
> >> +     struct ib_uobject *uobject = f->private_data - sizeof(struct
> >> ib_uobject);
> >> +
> >> +     mutex_lock(&uobject->uobjects_lock->lock);
> >> +     if (uobject->live) {
> >> +             uobject->live = 0;
> >> +             list_del(&uobject->list);
> >> +             kref_put(&uobject->context->ufile->ref,
> >> ib_uverbs_release_file);
> >> +             uobject->context = NULL;
> >> +     }
> >> +     mutex_unlock(&uobject->uobjects_lock->lock);
> >> +     kref_put(&uobject->uobjects_lock->ref,
> >> release_uobjects_list_lock);
> >> +}
> >> +
> >> +void ib_uverbs_cleanup_fd(void *private_data)
> >> +{
> >> +     struct ib_uboject *uobject = private_data - sizeof(struct
> >> ib_uobject);
> >> +
> >> +     kfree(uobject);
> >> +}
> >> +
> >> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> >> +                        size_t num,
> >> +                        const struct uverbs_action_spec *spec,
> >> +                        bool success)
> >> +{
> >> +     unsigned int i;
> >> +
> >> +     for (i = 0; i < num; i++) {
> >> +             struct uverbs_attr_array *attr_spec_array =
> &attr_array[i];
> >> +             const struct uverbs_attr_group_spec *group_spec =
> >> +                     spec->attr_groups[i];
> >> +             unsigned int j;
> >> +
> >> +             for (j = 0; j < attr_spec_array->num_attrs; j++) {
> >> +                     struct uverbs_attr *attr = &attr_spec_array-
> >> >attrs[j];
> >> +                     struct uverbs_attr_spec *spec = &group_spec-
> >> >attrs[j];
> >> +
> >> +                     if (!attr->valid)
> >> +                             continue;
> >> +
> >> +                     if (spec->type == UVERBS_ATTR_TYPE_IDR ||
> >> +                         spec->type == UVERBS_ATTR_TYPE_FD)
> >> +                             /*
> >> +                              * refcounts should be handled at the
> object
> >> +                              * level and not at the uobject level.
> >> +                              */
> >> +                             uverbs_unlock_object(attr-
> >obj_attr.uobject,
> >> +                                                  spec->obj.access,
> success);
> >> +             }
> >> +     }
> >> +}
> >> +
> >> +static unsigned int get_type_orders(const struct uverbs_types_group
> >> *types_group)
> >> +{
> >> +     unsigned int i;
> >> +     unsigned int max = 0;
> >> +
> >> +     for (i = 0; i < types_group->num_groups; i++) {
> >> +             unsigned int j;
> >> +             const struct uverbs_types *types = types_group-
> >> >type_groups[i];
> >> +
> >> +             for (j = 0; j < types->num_types; j++) {
> >> +                     if (!types->types[j] || !types->types[j]-
> >alloc)
> >> +                             continue;
> >> +                     if (types->types[j]->alloc->order > max)
> >> +                             max = types->types[j]->alloc->order;
> >> +             }
> >> +     }
> >> +
> >> +     return max;
> >> +}
> >> +
> >> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> >> *ucontext,
> >> +                                          const struct
> uverbs_types_group
> >> *types_group)
> >> +{
> >> +     unsigned int num_orders = get_type_orders(types_group);
> >> +     unsigned int i;
> >> +
> >> +     for (i = 0; i <= num_orders; i++) {
> >> +             struct ib_uobject *obj, *next_obj;
> >> +
> >> +             /*
> >> +              * No need to take lock here, as cleanup should be
> called
> >> +              * after all commands finished executing. Newly
> executed
> >> +              * commands should fail.
> >> +              */
> >> +             mutex_lock(&ucontext->uobjects_lock->lock);
> >
> > It's really confusing to see a comment about 'no need to take lock'
> immediately followed by a call to lock.
> >
> 
> Yeah :) That was before adding the fd. I'll delete the comment.
> 
> >> +             list_for_each_entry_safe(obj, next_obj, &ucontext-
> >> >uobjects,
> >> +                                      list)
> >> +                     if (obj->type->order == i) {
> >> +                             if (obj->type->type ==
> UVERBS_ATTR_TYPE_IDR)
> >> +                                     ib_uverbs_uobject_remove(obj,
> false);
> >> +                             else
> >> +                                     ib_uverbs_remove_fd(obj);
> >> +                     }
> >> +             mutex_unlock(&ucontext->uobjects_lock->lock);
> >> +     }
> >> +     kref_put(&ucontext->uobjects_lock->ref,
> >> release_uobjects_list_lock);
> >> +}
> >> +
> >> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> >> *ucontext)
> >
> > Please work on the function names.  This is horrendously long and
> still doesn't help describe what it does.
> >
> 
> This just initializes the types part of the ucontext. Any suggestions?
> 
> >> +{
> >> +     ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
> >> >uobjects_lock),
> >> +                                       GFP_KERNEL);
> >> +     if (!ucontext->uobjects_lock)
> >> +             return -ENOMEM;
> >> +
> >> +     init_uobjects_list_lock(ucontext->uobjects_lock);
> >> +     INIT_LIST_HEAD(&ucontext->uobjects);
> >> +
> >> +     return 0;
> >> +}
> >> +
> >> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> >> *ucontext)
> >> +{
> >> +     kfree(ucontext->uobjects_lock);
> >> +}
> >
> > No need to wrap a call to 'free'.
> >
> 
> In order to abstract away the ucontext type data structure.
> 
> >> +
> >> diff --git a/drivers/infiniband/core/rdma_core.h
> >> b/drivers/infiniband/core/rdma_core.h
> >> new file mode 100644
> >> index 0000000..8990115
> >> --- /dev/null
> >> +++ b/drivers/infiniband/core/rdma_core.h
> >> @@ -0,0 +1,75 @@
> >> +/*
> >> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> >> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
> >> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights
> reserved.
> >> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
> >> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
> >> + *
> >> + * This software is available to you under a choice of one of two
> >> + * licenses.  You may choose to be licensed under the terms of the
> GNU
> >> + * General Public License (GPL) Version 2, available from the file
> >> + * COPYING in the main directory of this source tree, or the
> >> + * OpenIB.org BSD license below:
> >> + *
> >> + *     Redistribution and use in source and binary forms, with or
> >> + *     without modification, are permitted provided that the
> following
> >> + *     conditions are met:
> >> + *
> >> + *      - Redistributions of source code must retain the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer.
> >> + *
> >> + *      - Redistributions in binary form must reproduce the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer in the documentation and/or other materials
> >> + *        provided with the distribution.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
> OF
> >> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
> HOLDERS
> >> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> AN
> >> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
> IN
> >> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> >> + * SOFTWARE.
> >> + */
> >> +
> >> +#ifndef UOBJECT_H
> >> +#define UOBJECT_H
> >> +
> >> +#include <linux/idr.h>
> >> +#include <rdma/uverbs_ioctl.h>
> >> +#include <rdma/ib_verbs.h>
> >> +#include <linux/mutex.h>
> >> +
> >> +const struct uverbs_type *uverbs_get_type(const struct ib_device
> >> *ibdev,
> >> +                                       uint16_t type);
> >> +struct ib_uobject *uverbs_get_type_from_idr(const struct
> >> uverbs_type_alloc_action *type,
> >> +                                         struct ib_ucontext
> *ucontext,
> >> +                                         enum uverbs_idr_access
> access,
> >> +                                         uint32_t idr);
> >> +struct ib_uobject *uverbs_get_type_from_fd(const struct
> >> uverbs_type_alloc_action *type,
> >> +                                        struct ib_ucontext
> *ucontext,
> >> +                                        enum uverbs_idr_access
> access,
> >> +                                        int fd);
> >> +void uverbs_unlock_object(struct ib_uobject *uobj,
> >> +                       enum uverbs_idr_access access,
> >> +                       bool success);
> >> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
> >> +                        size_t num,
> >> +                        const struct uverbs_action_spec *spec,
> >> +                        bool success);
> >> +
> >> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
> >> *ucontext,
> >> +                                          const struct
> uverbs_types_group
> >> *types_group);
> >> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
> >> *ucontext);
> >> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
> >> *ucontext);
> >> +void ib_uverbs_close_fd(struct file *f);
> >> +void ib_uverbs_cleanup_fd(void *private_data);
> >> +
> >> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
> >> +{
> >> +     return uobj + 1;
> >> +}
> >
> > This seems like a rather useless function.
> >
> 
> Why? The user shouldn't know or care how we put our structs together.
> 
> >> +
> >> +#endif /* UIDR_H */
> >> diff --git a/drivers/infiniband/core/uverbs.h
> >> b/drivers/infiniband/core/uverbs.h
> >> index 8074705..ae7d4b8 100644
> >> --- a/drivers/infiniband/core/uverbs.h
> >> +++ b/drivers/infiniband/core/uverbs.h
> >> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
> >>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
> >> *uverbs_file,
> >>                                       struct ib_device *ib_dev,
> >>                                       int is_async);
> >> +void ib_uverbs_release_file(struct kref *ref);
> >>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
> >> *uverbs_file);
> >>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
> >>
> >> diff --git a/drivers/infiniband/core/uverbs_main.c
> >> b/drivers/infiniband/core/uverbs_main.c
> >> index f783723..e63357a 100644
> >> --- a/drivers/infiniband/core/uverbs_main.c
> >> +++ b/drivers/infiniband/core/uverbs_main.c
> >> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
> >> ib_uverbs_device *dev)
> >>       complete(&dev->comp);
> >>  }
> >>
> >> -static void ib_uverbs_release_file(struct kref *ref)
> >> +void ib_uverbs_release_file(struct kref *ref)
> >>  {
> >>       struct ib_uverbs_file *file =
> >>               container_of(ref, struct ib_uverbs_file, ref);
> >> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> >> index b5d2075..7240615 100644
> >> --- a/include/rdma/ib_verbs.h
> >> +++ b/include/rdma/ib_verbs.h
> >> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
> >>
> >>  struct ib_umem;
> >>
> >> +struct ib_ucontext_lock;
> >> +
> >>  struct ib_ucontext {
> >>       struct ib_device       *device;
> >> +     struct ib_uverbs_file  *ufile;
> >>       struct list_head        pd_list;
> >>       struct list_head        mr_list;
> >>       struct list_head        mw_list;
> >> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
> >>       struct list_head        rwq_ind_tbl_list;
> >>       int                     closing;
> >>
> >> +     /* lock for uobjects list */
> >> +     struct ib_ucontext_lock *uobjects_lock;
> >> +     struct list_head        uobjects;
> >> +
> >>       struct pid             *tgid;
> >>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> >>       struct rb_root      umem_tree;
> >> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
> >>  #endif
> >>  };
> >>
> >> +struct uverbs_object_list;
> >> +
> >> +#define OLD_ABI_COMPAT
> >> +
> >>  struct ib_uobject {
> >>       u64                     user_handle;    /* handle given to us
> by userspace
> >> */
> >>       struct ib_ucontext     *context;        /* associated user
> context
> >> */
> >>       void                   *object;         /* containing object
> */
> >>       struct list_head        list;           /* link to context's
> list */
> >> -     int                     id;             /* index into kernel
> idr */
> >> -     struct kref             ref;
> >> -     struct rw_semaphore     mutex;          /* protects .live */
> >> +     int                     id;             /* index into kernel
> idr/fd */
> >> +#ifdef OLD_ABI_COMPAT
> >> +     struct kref             ref;
> >> +#endif
> >> +     struct rw_semaphore     usecnt;         /* protects exclusive
> >> access */
> >> +#ifdef OLD_ABI_COMPAT
> >> +     struct rw_semaphore     mutex;          /* protects .live */
> >> +#endif
> >>       struct rcu_head         rcu;            /* kfree_rcu()
> overhead */
> >>       int                     live;
> >> +
> >> +     const struct uverbs_type_alloc_action *type;
> >> +     struct ib_ucontext_lock *uobjects_lock;
> >>  };
> >>
> >>  struct ib_udata {
> >> @@ -2101,6 +2120,9 @@ struct ib_device {
> >>        */
> >>       int (*get_port_immutable)(struct ib_device *, u8, struct
> >> ib_port_immutable *);
> >>       void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
> >> str_len);
> >> +     struct list_head type_list;
> >> +
> >> +     const struct uverbs_types_group *types_group;
> >>  };
> >>
> >>  struct ib_client {
> >> diff --git a/include/rdma/uverbs_ioctl.h
> b/include/rdma/uverbs_ioctl.h
> >> new file mode 100644
> >> index 0000000..2f50045
> >> --- /dev/null
> >> +++ b/include/rdma/uverbs_ioctl.h
> >> @@ -0,0 +1,195 @@
> >> +/*
> >> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
> >> reserved.
> >> + *
> >> + * This software is available to you under a choice of one of two
> >> + * licenses.  You may choose to be licensed under the terms of the
> GNU
> >> + * General Public License (GPL) Version 2, available from the file
> >> + * COPYING in the main directory of this source tree, or the
> >> + * OpenIB.org BSD license below:
> >> + *
> >> + *     Redistribution and use in source and binary forms, with or
> >> + *     without modification, are permitted provided that the
> following
> >> + *     conditions are met:
> >> + *
> >> + *      - Redistributions of source code must retain the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer.
> >> + *
> >> + *      - Redistributions in binary form must reproduce the above
> >> + *        copyright notice, this list of conditions and the
> following
> >> + *        disclaimer in the documentation and/or other materials
> >> + *        provided with the distribution.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
> OF
> >> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
> HOLDERS
> >> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
> AN
> >> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
> IN
> >> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> >> + * SOFTWARE.
> >> + */
> >> +
> >> +#ifndef _UVERBS_IOCTL_
> >> +#define _UVERBS_IOCTL_
> >> +
> >> +#include <linux/kernel.h>
> >> +
> >> +struct uverbs_object_type;
> >> +struct ib_ucontext;
> >> +struct ib_uobject;
> >> +struct ib_device;
> >> +struct uverbs_uobject_type;
> >> +
> >> +/*
> >> + * =======================================
> >> + *   Verbs action specifications
> >> + * =======================================
> >> + */
> >
> > I intentionally used urdma (though condensed to 3 letters that I
> don't recall atm), rather than uverbs.  This will need to work with
> non-verbs devices and interfaces -- again, consider how this fits with
> the rdma cm.  Verbs has a very specific meaning, which gets lost if we
> start referring to everything as 'verbs'.  It's bad enough that we're
> stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also
> means ethernet and rdma means nothing.
> >
> 
> IMHO - let's agree on the concept of this infrastructure. Once we
> decide its scope, we could generalize it (i.e. ioctl_provider and
> ioctl_context) and implement it for rdma-cm as well.
> 
> >> +
> >> +enum uverbs_attr_type {
> >> +     UVERBS_ATTR_TYPE_PTR_IN,
> >> +     UVERBS_ATTR_TYPE_PTR_OUT,
> >> +     UVERBS_ATTR_TYPE_IDR,
> >> +     UVERBS_ATTR_TYPE_FD,
> >> +};
> >> +
> >> +enum uverbs_idr_access {
> >> +     UVERBS_IDR_ACCESS_READ,
> >> +     UVERBS_IDR_ACCESS_WRITE,
> >> +     UVERBS_IDR_ACCESS_NEW,
> >> +     UVERBS_IDR_ACCESS_DESTROY
> >> +};
> >> +
> >> +struct uverbs_attr_spec {
> >> +     u16                             len;
> >> +     enum uverbs_attr_type           type;
> >> +     struct {
> >> +             u16                     obj_type;
> >> +             u8                      access;
> >
> > Is access intended to be an enum uverbs_idr_access value?
> >
> 
> Yeah, worth using this enum. Thanks.
> 
> >> +     } obj;
> >
> > I would remove (flatten) the substructure and re-order the fields for
> better alignment.
> >
> 
> I noticed there are several places which aren't aliged. It's in my todo
> list.
> 
> >> +};
> >> +
> >> +struct uverbs_attr_group_spec {
> >> +     struct uverbs_attr_spec         *attrs;
> >> +     size_t                          num_attrs;
> >> +};
> >> +
> >> +struct uverbs_action_spec {
> >> +     const struct uverbs_attr_group_spec             **attr_groups;
> >> +     /* if > 0 -> validator, otherwise, error */
> >
> > ? not sure what this comment means
> >
> >> +     int (*dist)(__u16 *attr_id, void *priv);
> >
> > What does 'dist' stand for?
> >
> 
> dist = distribution function.
> It maps the attributes you got from user-space to your groups. You
> could think of each group as a namespace - where its attributes (or
> types/actions) start from zero for the sake of compactness.
> So, for example, it gets an attribute 0x8010 and maps it to "group 1"
> (provider) and attribute 0x10.
> 
> >> +     void                                            *priv;
> >> +     size_t                                          num_groups;
> >> +};
> >> +
> >> +struct uverbs_attr_array;
> >> +struct ib_uverbs_file;
> >> +
> >> +struct uverbs_action {
> >> +     struct uverbs_action_spec spec;
> >> +     void *priv;
> >> +     int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
> >> *ufile,
> >> +                    struct uverbs_attr_array *ctx, size_t num, void
> >> *priv);
> >> +};
> >> +
> >> +struct uverbs_type_alloc_action;
> >> +typedef void (*free_type)(const struct uverbs_type_alloc_action
> >> *uobject_type,
> >> +                       struct ib_uobject *uobject);
> >> +
> >> +struct uverbs_type_alloc_action {
> >> +     enum uverbs_attr_type           type;
> >> +     int                             order;
> >
> > I think this is being used as destroy order, in which case I would
> rename it to clarify the intent.  Though I'd prefer we come up with a
> more efficient destruction mechanism than the repeated nested looping.
> >
> 
> In one of the earlier revisions I used a sorted list, which was
> efficient. I recall that Jason didn't like its complexity and,
> rethinking it, he's right. Most of your types are "order
> number" 0 anyway. So you'll probably iterate over very few objects in
> the next round (in verbs, everything but MRs and PDs).
> 
> >> +     size_t                          obj_size;
> >
> > This can be alloc_fn
> >
> >> +     free_type                       free_fn;
> >> +     struct {
> >> +             const struct file_operations    *fops;
> >> +             const char                      *name;
> >> +             int                             flags;
> >> +     } fd;
> >> +};
> >> +
> >> +struct uverbs_type_actions_group {
> >> +     size_t                                  num_actions;
> >> +     const struct uverbs_action              **actions;
> >> +};
> >> +
> >> +struct uverbs_type {
> >> +     size_t                                  num_groups;
> >> +     const struct uverbs_type_actions_group  **action_groups;
> >> +     const struct uverbs_type_alloc_action   *alloc;
> >> +     int (*dist)(__u16 *action_id, void *priv);
> >> +     void                                    *priv;
> >> +};
> >> +
> >> +struct uverbs_types {
> >> +     size_t                                  num_types;
> >> +     const struct uverbs_type                **types;
> >> +};
> >> +
> >> +struct uverbs_types_group {
> >> +     const struct uverbs_types               **type_groups;
> >> +     size_t                                  num_groups;
> >> +     int (*dist)(__u16 *type_id, void *priv);
> >> +     void                                    *priv;
> >> +};
> >> +
> >> +/* =================================================
> >> + *              Parsing infrastructure
> >> + * =================================================
> >> + */
> >> +
> >> +struct uverbs_ptr_attr {
> >> +     void    * __user ptr;
> >> +     __u16           len;
> >> +};
> >> +
> >> +struct uverbs_fd_attr {
> >> +     int             fd;
> >> +};
> >> +
> >> +struct uverbs_uobj_attr {
> >> +     /*  idr handle */
> >> +     __u32   idr;
> >> +};
> >> +
> >> +struct uverbs_obj_attr {
> >> +     /* pointer to the kernel descriptor -> type, access, etc */
> >> +     const struct uverbs_attr_spec *val;
> >> +     struct ib_uverbs_attr __user    *uattr;
> >> +     const struct uverbs_type_alloc_action   *type;
> >> +     struct ib_uobject               *uobject;
> >> +     union {
> >> +             struct uverbs_fd_attr           fd;
> >> +             struct uverbs_uobj_attr         uobj;
> >> +     };
> >> +};
> >> +
> >> +struct uverbs_attr {
> >> +     bool valid;
> >> +     union {
> >> +             struct uverbs_ptr_attr  cmd_attr;
> >> +             struct uverbs_obj_attr  obj_attr;
> >> +     };
> >> +};
> >
> > It's odd to have a union that's part of a structure without some
> field to indicate which union field is accessible.
> >
> 
> You index this array by the attribute id from the user's callback
> function. The user should know the type of the attribute, as
> [s]he declared the specification.
> 
> >> +
> >> +/* output of one validator */
> >> +struct uverbs_attr_array {
> >> +     size_t num_attrs;
> >> +     /* arrays of attrubytes, index is the id i.e SEND_CQ */
> >> +     struct uverbs_attr *attrs;
> >> +};
> >> +
> >> +/* =================================================
> >> + *              Types infrastructure
> >> + * =================================================
> >> + */
> >> +
> >> +int ib_uverbs_uobject_type_add(struct list_head      *head,
> >> +                            void (*free)(struct uverbs_uobject_type
> *type,
> >> +                                         struct ib_uobject
> *uobject,
> >> +                                         struct ib_ucontext
> *ucontext),
> >> +                            uint16_t obj_type);
> >> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
> >> +
> >> +#endif
> >> --
> >> 2.7.4

Matan, please re-look at the architecture that I proposed:

https://patchwork.kernel.org/patch/9178991/

including the terminology (and consider using common OOP terms).  The *core* of the ioctl framework is to simply invoke a function dispatch table.  IMO, that's where we should start.  Anything beyond that is extra that we should have a strong reason for including.  (Yes, I think we need more.)  Starting simple and adding necessary functionality should let us get something upstream quicker and re-use more of the existing code.

If we're going to re-create netlink as part of the rdma ioctl interface, then why don't we just use netlink directly?
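
For reference, a minimal sketch of what a dispatch-table-only core looks
like (names are illustrative; rule checking and locking would be left to
the handler or to an optional pre-validation hook the provider installs):

#include <linux/errno.h>
#include <linux/types.h>

struct ib_uverbs_file;
struct urdma_ioctl_args;        /* parsed header plus user pointers */

typedef int (*urdma_ioctl_handler)(struct ib_uverbs_file *ufile,
                                   struct urdma_ioctl_args *args);

struct urdma_dispatch_table {
        const urdma_ioctl_handler       *handlers;
        unsigned int                     num_handlers;
};

static int urdma_dispatch(const struct urdma_dispatch_table *table,
                          unsigned int cmd, struct ib_uverbs_file *ufile,
                          struct urdma_ioctl_args *args)
{
        if (cmd >= table->num_handlers || !table->handlers[cmd])
                return -EINVAL;

        return table->handlers[cmd](ufile, args);
}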

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB0A8000-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-11-09 18:50                   ` Jason Gunthorpe
  2016-11-10  8:29                   ` Matan Barak
  1 sibling, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2016-11-09 18:50 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Matan Barak, Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Doug Ledford, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon, Leon Romanovsky

On Wed, Nov 09, 2016 at 06:00:48PM +0000, Hefty, Sean wrote:

> In any case, the two approaches are not exclusive.  By forcing the
> rule language into the framework, everything is forced to deal with
> it.  By leaving it out, each ioctl provider can decide if they need
> this or not.  If you want verbs to process all ioctl's using a
> single pre-validation function that operates based on these rules
> you can.  Nothing prevents that.  But ioctl providers that want
> better performance can elect for a more straightforward validation
> model.

The pre-validation is tied into the hash expansion and will hopefully
be the raw data to support a new discoverability scheme. So, making it
optional really wrecks the whole design, I think.

Also, this is really the best way to ensure that we have consistent
checking and error reporting around attributes (e.g. what happens if the
kernel does not support a requested attribute, or uses the wrong size,
etc.). It is very important that these things use the correct errnos,
not the random mismatch we see today.

I'm not seeing that it is a clear performance loss (relative to open
coding, at least); the major work is the expansion into the hash table,
and doing a couple of size tests along the way is not hard. Matan's
revised series should be even better in this regard, as I gave a lot
of feedback at Plumbers on speeding it up.
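
As a rough illustration of that expansion step (types, limits and errno
choices below are assumptions, not the series' actual code): incoming
attributes are scattered on input and get expanded into a dense array
indexed by attribute id, with the size and support checks done once in
one place.

#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>

struct user_attr_sketch {       /* as copied from the ioctl payload */
        u16     attr_id;
        u16     len;
        u64     data;           /* inline value or user pointer */
};

struct spec_sketch {            /* per-id rule declared by the provider */
        u16     min_len;
        bool    mandatory;
};

struct expanded_attr_sketch {
        bool                    valid;
        struct user_attr_sketch in;
};

static int expand_attrs(const struct spec_sketch *spec, unsigned int num_spec,
                        const struct user_attr_sketch *in, unsigned int num_in,
                        struct expanded_attr_sketch *out)
{
        unsigned int i;

        memset(out, 0, num_spec * sizeof(*out));

        for (i = 0; i < num_in; i++) {
                u16 id = in[i].attr_id;

                if (id >= num_spec)
                        return -EOPNOTSUPP;     /* kernel doesn't know this attribute */
                if (in[i].len < spec[id].min_len)
                        return -EINVAL;         /* wrong size */
                if (out[id].valid)
                        return -EINVAL;         /* duplicate attribute */

                out[id].valid = true;
                out[id].in = in[i];
        }

        for (i = 0; i < num_spec; i++)
                if (spec[i].mandatory && !out[i].valid)
                        return -EINVAL;         /* missing mandatory attribute */

        return 0;
}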

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
       [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB0A8000-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2016-11-09 18:50                   ` Jason Gunthorpe
@ 2016-11-10  8:29                   ` Matan Barak
  1 sibling, 0 replies; 29+ messages in thread
From: Matan Barak @ 2016-11-10  8:29 UTC (permalink / raw)
  To: Hefty, Sean, Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Christoph Lameter, Liran Liss, Haggai Eran, Majd Dibbiny,
	Tal Alon, Leon Romanovsky

On 09/11/2016 20:00, Hefty, Sean wrote:
>> I had thought about that, but the user could initialize its part of
>> the object in the function handler. It can't allocate the object as we
>> need it in order to allocate an IDR entry and co. The assumption here
>> is that the "unlock" stage can't fail.
>
> This is creating a generic OO type of framework, so just add constructor/destructor functions and have all objects inherit from a base ioctl object class.
>

Adding a constructor and destructor to every object would make the 
infrastructure slower, it would open-code the locks (which are more 
related to the actions themselves than to the types), and it would 
duplicate more code. Anyway, examining our use case, I don't really see 
good value in ctors and dtors right now.

>>> In fact, it would be great if we could just cleanup the list in the
>> reverse order that items were created.  Maybe this requires supporting
>> a pre-cleanup handler, so that the driver can pluck items out of the
>> list that may need to be destroyed out of order.
>>>
>>
>> So that's essentially one layer of ordering. Why do you consider a
>> driver iterating over all objects simpler than this model?
>
> This problem is a verbs-specific issue, and one that only involves MWs.  We have reference counts that can provide the same functionality.  I want to minimize the amount of meta-data needed to describe objects.
>

It currently happens with verbs MWs/MRs. However, it could happen 
with any types whose bindings are done in user space or in hardware.

>>>> Adding an object is done in two parts.
>>>> First, an object is allocated and added to IDR/fd table. Then, the
>>>> command's handlers (in downstream patches) could work on this object
>>>> and fill in its required details.
>>>> After a successful command, ib_uverbs_uobject_enable is called and
>>>> this user object becomes ucontext visible.
>>>
>>> If you have a way to mark that an object is used for exclusive
>> access, you may be able to use that instead of introducing a new
>> variable.  (I.e. acquire the object's write lock).  I think we want to
>> make an effort to minimize the size of the kernel structure needed to
>> track every user space object (within reason).
>>>
>>
>> I didn't really follow. A command attribute states the nature of the
>> locking (for example, in MODIFY_QP the QP could be exclusively locked,
>> but in QUERY_QP it's only locked for reading). I don't want to actually
>> grab a blocking lock, as I could face a deadlock (user-space could
>> pass parameters in a colliding order). It could be solved by sorting
>> the handles, but that would degrade performance without a good reason.
>
> I'm suggesting that the locking attribute and command be separate.  This allows the framework to acquire the proper type of lock independent of what function it will invoke.
>

The locking is tied to what you want to do on that type. If you query 
something, read locks are probably enough. I don't think separating them 
will make the code more readable.

> The framework doesn't need to hold locks.  It should be able to mark access to an object.  If that access is not available, it can abort.  This pushes more complex synchronization and thread handling to user space.
>

The try_{read,write} locks achieve exactly that and are simple enough.
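
For example (a rough sketch on top of the functions in this patch; the
handler and parameter names are made up):

static int example_modify_qp_handler(struct ib_ucontext *ucontext,
                                     const struct uverbs_type_alloc_action *qp_type,
                                     u32 qp_handle)
{
        struct ib_uobject *uobj;

        uobj = uverbs_get_type_from_idr(qp_type, ucontext,
                                        UVERBS_IDR_ACCESS_WRITE, qp_handle);
        if (IS_ERR(uobj))
                return PTR_ERR(uobj); /* -EBUSY on a colliding access, no sleep */

        /* ... the actual modify work goes here ... */

        uverbs_unlock_object(uobj, UVERBS_IDR_ACCESS_WRITE, true);
        return 0;
}

A colliding request simply fails instead of blocking, so the heavier
synchronization really does stay in user space.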

>>>> Removing an uobject is done by calling ib_uverbs_uobject_remove.
>>>>
>>>> We should make sure IDR (per-device) and list (per-ucontext) could
>>>> be accessed concurrently without corrupting them.
>>>>
>>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> ---
>>>
>>> As a general comment, I do have concerns that the resulting
>> generalized parsing of everything will negatively impact performance
>> for operations that do have to transition into the kernel.  Not all
>> devices offload all operations to user space.  Plus the resulting code
>> is extremely difficult to read and non-trivial to use.  It's equivalent
>> to reading C++ code that has 4 layers of inheritance with overrides to
>> basic operators...
>>
>> There are two parts here. I think the handlers themselves are simpler,
>> easier to read and less error-prone. They contain less code
>> duplication. The macro-based definition language explicitly declares all
>> attributes, their types, sizes, etc.
>> The model here is a bit more complex as we want to achieve both code
>> reuse and add/override of new types/actions/attributes.
>>
>>
>>>
>>> Pre and post operators per command that can do straightforward
>> validation seem like a better option.
>>>
>>>
>>
>> I think that would duplicate a lot of code and will be more
>> error-prone than one infrastructure that automates all that work for
>> you.
>
> I think that's a toss-up.  Either you have to write the code correctly or write the rules correctly.  Reading code is straightforward, manually converting rules into code is not.
>
> In any case, the two approaches are not exclusive.  By forcing the rule language into the framework, everything is forced to deal with it.  By leaving it out, each ioctl provider can decide if they need this or not.  If you want verbs to process all ioctl's using a single pre-validation function that operates based on these rules you can.  Nothing prevents that.  But ioctl providers that want better performance can elect for a more straightforward validation model.
>

Please see Jason's response.

>>>>  drivers/infiniband/core/Makefile      |   3 +-
>>>>  drivers/infiniband/core/device.c      |   1 +
>>>>  drivers/infiniband/core/rdma_core.c   | 489
>>>> ++++++++++++++++++++++++++++++++++
>>>>  drivers/infiniband/core/rdma_core.h   |  75 ++++++
>>>>  drivers/infiniband/core/uverbs.h      |   1 +
>>>>  drivers/infiniband/core/uverbs_main.c |   2 +-
>>>>  include/rdma/ib_verbs.h               |  28 +-
>>>>  include/rdma/uverbs_ioctl.h           | 195 ++++++++++++++
>>>>  8 files changed, 789 insertions(+), 5 deletions(-)
>>>>  create mode 100644 drivers/infiniband/core/rdma_core.c
>>>>  create mode 100644 drivers/infiniband/core/rdma_core.h
>>>>  create mode 100644 include/rdma/uverbs_ioctl.h
>>>>
>>>> diff --git a/drivers/infiniband/core/Makefile
>>>> b/drivers/infiniband/core/Makefile
>>>> index edaae9f..1819623 100644
>>>> --- a/drivers/infiniband/core/Makefile
>>>> +++ b/drivers/infiniband/core/Makefile
>>>> @@ -28,4 +28,5 @@ ib_umad-y :=                        user_mad.o
>>>>
>>>>  ib_ucm-y :=                  ucm.o
>>>>
>>>> -ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>>>> uverbs_marshall.o
>>>> +ib_uverbs-y :=                       uverbs_main.o uverbs_cmd.o
>>>> uverbs_marshall.o \
>>>> +                             rdma_core.o
>>>> diff --git a/drivers/infiniband/core/device.c
>>>> b/drivers/infiniband/core/device.c
>>>> index c3b68f5..43994b1 100644
>>>> --- a/drivers/infiniband/core/device.c
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>>>>       spin_lock_init(&device->client_data_lock);
>>>>       INIT_LIST_HEAD(&device->client_data_list);
>>>>       INIT_LIST_HEAD(&device->port_list);
>>>> +     INIT_LIST_HEAD(&device->type_list);
>>>>
>>>>       return device;
>>>>  }
>>>> diff --git a/drivers/infiniband/core/rdma_core.c
>>>> b/drivers/infiniband/core/rdma_core.c
>>>> new file mode 100644
>>>> index 0000000..337abc2
>>>> --- /dev/null
>>>> +++ b/drivers/infiniband/core/rdma_core.c
>>>> @@ -0,0 +1,489 @@
>>>> +/*
>>>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>>>> reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses.  You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + *     Redistribution and use in source and binary forms, with or
>>>> + *     without modification, are permitted provided that the
>> following
>>>> + *     conditions are met:
>>>> + *
>>>> + *      - Redistributions of source code must retain the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer.
>>>> + *
>>>> + *      - Redistributions in binary form must reproduce the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer in the documentation and/or other materials
>>>> + *        provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#include <linux/file.h>
>>>> +#include <linux/anon_inodes.h>
>>>> +#include <rdma/ib_verbs.h>
>>>> +#include "uverbs.h"
>>>> +#include "rdma_core.h"
>>>> +#include <rdma/uverbs_ioctl.h>
>>>> +
>>>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>>>> *ibdev,
>>>> +                                       uint16_t type)
>>>> +{
>>>> +     const struct uverbs_types_group *groups = ibdev->types_group;
>>>> +     const struct uverbs_types *types;
>>>> +     int ret = groups->dist(&type, groups->priv);
>>>> +
>>>> +     if (ret >= groups->num_groups)
>>>> +             return NULL;
>>>> +
>>>> +     types = groups->type_groups[ret];
>>>> +
>>>> +     if (type >= types->num_types)
>>>> +             return NULL;
>>>> +
>>>> +     return types->types[type];
>>>> +}
>>>> +
>>>> +static int uverbs_lock_object(struct ib_uobject *uobj,
>>>> +                           enum uverbs_idr_access access)
>>>> +{
>>>> +     if (access == UVERBS_IDR_ACCESS_READ)
>>>> +             return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -
>> EBUSY;
>>>> +
>>>> +     /* lock is either WRITE or DESTROY - should be exclusive */
>>>> +     return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>>>
>>> This function could take the lock type directly (read or write),
>> versus inferring it based on some other access type.
>>>
>>
>> We can, but since we use these enums in the attribute specifications,
>> I thought it could be more convenient.
>>
>>>> +}
>>>> +
>>>> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
>>>> *context)
>>>> +{
>>>> +     struct ib_uobject *uobj;
>>>> +
>>>> +     rcu_read_lock();
>>>> +     uobj = idr_find(&context->device->idr, id);
>>>> +     if (uobj && uobj->live) {
>>>> +             if (uobj->context != context)
>>>> +                     uobj = NULL;
>>>> +     }
>>>> +     rcu_read_unlock();
>>>> +
>>>> +     return uobj;
>>>> +}
>>>> +
>>>> +struct ib_ucontext_lock {
>>>> +     struct kref  ref;
>>>> +     /* locking the uobjects_list */
>>>> +     struct mutex lock;
>>>> +};
>>>> +
>>>> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
>>>> +{
>>>> +     mutex_init(&lock->lock);
>>>> +     kref_init(&lock->ref);
>>>> +}
>>>> +
>>>> +static void release_uobjects_list_lock(struct kref *ref)
>>>> +{
>>>> +     struct ib_ucontext_lock *lock = container_of(ref,
>>>> +                                                  struct
>> ib_ucontext_lock,
>>>> +                                                  ref);
>>>> +
>>>> +     kfree(lock);
>>>> +}
>>>> +
>>>> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
>>>> +                   struct ib_ucontext *context)
>>>> +{
>>>> +     init_rwsem(&uobj->usecnt);
>>>> +     uobj->user_handle = user_handle;
>>>> +     uobj->context     = context;
>>>> +     uobj->live        = 0;
>>>> +}
>>>> +
>>>> +static int add_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> +     int ret;
>>>> +
>>>> +     idr_preload(GFP_KERNEL);
>>>> +     spin_lock(&uobj->context->device->idr_lock);
>>>> +
>>>> +     ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
>>>> GFP_NOWAIT);
>>>> +     if (ret >= 0)
>>>> +             uobj->id = ret;
>>>> +
>>>> +     spin_unlock(&uobj->context->device->idr_lock);
>>>> +     idr_preload_end();
>>>> +
>>>> +     return ret < 0 ? ret : 0;
>>>> +}
>>>> +
>>>> +static void remove_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> +     spin_lock(&uobj->context->device->idr_lock);
>>>> +     idr_remove(&uobj->context->device->idr, uobj->id);
>>>> +     spin_unlock(&uobj->context->device->idr_lock);
>>>> +}
>>>> +
>>>> +static void put_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> +     kfree_rcu(uobj, rcu);
>>>> +}
>>>> +
>>>> +static struct ib_uobject *get_uobject_from_context(struct
>> ib_ucontext
>>>> *ucontext,
>>>> +                                                const struct
>>>> uverbs_type_alloc_action *type,
>>>> +                                                u32 idr,
>>>> +                                                enum
>> uverbs_idr_access access)
>>>> +{
>>>> +     struct ib_uobject *uobj;
>>>> +     int ret;
>>>> +
>>>> +     rcu_read_lock();
>>>> +     uobj = get_uobj(idr, ucontext);
>>>> +     if (!uobj)
>>>> +             goto free;
>>>> +
>>>> +     if (uobj->type != type) {
>>>> +             uobj = NULL;
>>>> +             goto free;
>>>> +     }
>>>> +
>>>> +     ret = uverbs_lock_object(uobj, access);
>>>> +     if (ret)
>>>> +             uobj = ERR_PTR(ret);
>>>> +free:
>>>> +     rcu_read_unlock();
>>>> +     return uobj;
>>>> +
>>>> +     return NULL;
>>>> +}
>>>> +
>>>> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
>>>> +                              const struct uverbs_type_alloc_action
>>>> *uobject_type)
>>>> +{
>>>> +     uobject->type = uobject_type;
>>>> +     return add_uobj(uobject);
>>>> +}
>>>> +
>>>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>>>> uverbs_type_alloc_action *type,
>>>> +                                         struct ib_ucontext
>> *ucontext,
>>>> +                                         enum uverbs_idr_access
>> access,
>>>> +                                         uint32_t idr)
>>>> +{
>>>> +     struct ib_uobject *uobj;
>>>> +     int ret;
>>>> +
>>>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>>>> +             if (!uobj)
>>>> +                     return ERR_PTR(-ENOMEM);
>>>> +
>>>> +             init_uobj(uobj, 0, ucontext);
>>>> +
>>>> +             /* lock idr */
>>>
>>> Command to lock idr, but no lock is obtained.
>>>
>>
>> ib_uverbs_uobject_add calls add_uobj which locks the IDR.
>>
>>>> +             ret = ib_uverbs_uobject_add(uobj, type);
>>>> +             if (ret) {
>>>> +                     kfree(uobj);
>>>> +                     return ERR_PTR(ret);
>>>> +             }
>>>> +
>>>> +     } else {
>>>> +             uobj = get_uobject_from_context(ucontext, type, idr,
>>>> +                                             access);
>>>> +
>>>> +             if (!uobj)
>>>> +                     return ERR_PTR(-ENOENT);
>>>> +     }
>>>> +
>>>> +     return uobj;
>>>> +}
>>>> +
>>>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>>>> uverbs_type_alloc_action *type,
>>>> +                                        struct ib_ucontext
>> *ucontext,
>>>> +                                        enum uverbs_idr_access
>> access,
>>>> +                                        int fd)
>>>> +{
>>>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> +             int _fd;
>>>> +             struct ib_uobject *uobj = NULL;
>>>> +             struct file *filp;
>>>> +
>>>> +             _fd = get_unused_fd_flags(O_CLOEXEC);
>>>> +             if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
>>>> ib_uobject)))
>>>> +                     return ERR_PTR(_fd);
>>>> +
>>>> +             uobj = kmalloc(type->obj_size, GFP_KERNEL);
>>>> +             init_uobj(uobj, 0, ucontext);
>>>> +
>>>> +             if (!uobj)
>>>> +                     return ERR_PTR(-ENOMEM);
>>>> +
>>>> +             filp = anon_inode_getfile(type->fd.name, type-
>>> fd.fops,
>>>> +                                       uobj + 1, type->fd.flags);
>>>> +             if (IS_ERR(filp)) {
>>>> +                     put_unused_fd(_fd);
>>>> +                     kfree(uobj);
>>>> +                     return (void *)filp;
>>>> +             }
>>>> +
>>>> +             uobj->type = type;
>>>> +             uobj->id = _fd;
>>>> +             uobj->object = filp;
>>>> +
>>>> +             return uobj;
>>>> +     } else if (access == UVERBS_IDR_ACCESS_READ) {
>>>> +             struct file *f = fget(fd);
>>>> +             struct ib_uobject *uobject;
>>>> +
>>>> +             if (!f)
>>>> +                     return ERR_PTR(-EBADF);
>>>> +
>>>> +             uobject = f->private_data - sizeof(struct ib_uobject);
>>>> +             if (f->f_op != type->fd.fops ||
>>>> +                 !uobject->live) {
>>>> +                     fput(f);
>>>> +                     return ERR_PTR(-EBADF);
>>>> +             }
>>>> +
>>>> +             /*
>>>> +              * No need to protect it with a ref count, as fget
>>>> increases
>>>> +              * f_count.
>>>> +              */
>>>> +             return uobject;
>>>> +     } else {
>>>> +             return ERR_PTR(-EOPNOTSUPP);
>>>> +     }
>>>> +}
>>>> +
>>>> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
>>>> +{
>>>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>>>> +     list_add(&uobject->list, &uobject->context->uobjects);
>>>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>>>
>>> Why not just insert the object into the list on creation?
>>>
>>>> +     uobject->live = 1;
>>>
>>> See my comments above on removing the live field.
>>>
>>
>> Seems that the list could suffice, but I'll look into that.
>>
>>>> +}
>>>> +
>>>> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject,
>> bool
>>>> lock)
>>>> +{
>>>> +     /*
>>>> +      * Calling remove requires exclusive access, so it's not
>> possible
>>>> +      * another thread will use our object.
>>>> +      */
>>>> +     uobject->live = 0;
>>>> +     uobject->type->free_fn(uobject->type, uobject);
>>>> +     if (lock)
>>>> +             mutex_lock(&uobject->context->uobjects_lock->lock);
>>>> +     list_del(&uobject->list);
>>>> +     if (lock)
>>>> +             mutex_unlock(&uobject->context->uobjects_lock->lock);
>>>> +     remove_uobj(uobject);
>>>> +     put_uobj(uobject);
>>>> +}
>>>> +
>>>> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
>>>> +                           enum uverbs_idr_access access,
>>>> +                           bool success)
>>>> +{
>>>> +     switch (access) {
>>>> +     case UVERBS_IDR_ACCESS_READ:
>>>> +             up_read(&uobj->usecnt);
>>>> +             break;
>>>> +     case UVERBS_IDR_ACCESS_NEW:
>>>> +             if (success) {
>>>> +                     ib_uverbs_uobject_enable(uobj);
>>>> +             } else {
>>>> +                     remove_uobj(uobj);
>>>> +                     put_uobj(uobj);
>>>> +             }
>>>> +             break;
>>>> +     case UVERBS_IDR_ACCESS_WRITE:
>>>> +             up_write(&uobj->usecnt);
>>>> +             break;
>>>> +     case UVERBS_IDR_ACCESS_DESTROY:
>>>> +             if (success)
>>>> +                     ib_uverbs_uobject_remove(uobj, true);
>>>> +             else
>>>> +                     up_write(&uobj->usecnt);
>>>> +             break;
>>>> +     }
>>>> +}
>>>> +
>>>> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
>>>> +                          enum uverbs_idr_access access,
>>>> +                          bool success)
>>>> +{
>>>> +     struct file *filp = uobj->object;
>>>> +
>>>> +     if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> +             if (success) {
>>>> +                     kref_get(&uobj->context->ufile->ref);
>>>> +                     uobj->uobjects_lock = uobj->context-
>>> uobjects_lock;
>>>> +                     kref_get(&uobj->uobjects_lock->ref);
>>>> +                     ib_uverbs_uobject_enable(uobj);
>>>> +                     fd_install(uobj->id, uobj->object);
>>>
>>> I don't get this.  The function is unlocking something, but there are
>> calls to get krefs?
>>>
>>
>> Before invoking the user's callback, we're first locking all objects
>> and afterwards we're unlocking them. When we need to create a new
>> object, the lock becomes object creation and the unlock could become
>> (assuming the user's callback succeeded) enabling this new object.
>> When you add a new object (or fd in this case), we take a reference
>> count to both the uverbs_file and the locking context.
>>
>>>> +             } else {
>>>> +                     fput(uobj->object);
>>>> +                     put_unused_fd(uobj->id);
>>>> +                     kfree(uobj);
>>>> +             }
>>>> +     } else {
>>>> +             fput(filp);
>>>> +     }
>>>> +}
>>>> +
>>>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>>>> +                       enum uverbs_idr_access access,
>>>> +                       bool success)
>>>> +{
>>>> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>>>> +             uverbs_unlock_idr(uobj, access, success);
>>>> +     else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
>>>> +             uverbs_unlock_fd(uobj, access, success);
>>>> +     else
>>>> +             WARN_ON(true);
>>>> +}
>>>> +
>>>> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
>>>> +{
>>>> +     /*
>>>> +      * user should release the uobject in the release
>>>> +      * callback.
>>>> +      */
>>>> +     if (uobject->live) {
>>>> +             uobject->live = 0;
>>>> +             list_del(&uobject->list);
>>>> +             uobject->type->free_fn(uobject->type, uobject);
>>>> +             kref_put(&uobject->context->ufile->ref,
>>>> ib_uverbs_release_file);
>>>> +             uobject->context = NULL;
>>>> +     }
>>>> +}
>>>> +
>>>> +void ib_uverbs_close_fd(struct file *f)
>>>> +{
>>>> +     struct ib_uobject *uobject = f->private_data - sizeof(struct
>>>> ib_uobject);
>>>> +
>>>> +     mutex_lock(&uobject->uobjects_lock->lock);
>>>> +     if (uobject->live) {
>>>> +             uobject->live = 0;
>>>> +             list_del(&uobject->list);
>>>> +             kref_put(&uobject->context->ufile->ref,
>>>> ib_uverbs_release_file);
>>>> +             uobject->context = NULL;
>>>> +     }
>>>> +     mutex_unlock(&uobject->uobjects_lock->lock);
>>>> +     kref_put(&uobject->uobjects_lock->ref,
>>>> release_uobjects_list_lock);
>>>> +}
>>>> +
>>>> +void ib_uverbs_cleanup_fd(void *private_data)
>>>> +{
>>>> +     struct ib_uobject *uobject = private_data - sizeof(struct
>>>> ib_uobject);
>>>> +
>>>> +     kfree(uobject);
>>>> +}
>>>> +
>>>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>>>> +                        size_t num,
>>>> +                        const struct uverbs_action_spec *spec,
>>>> +                        bool success)
>>>> +{
>>>> +     unsigned int i;
>>>> +
>>>> +     for (i = 0; i < num; i++) {
>>>> +             struct uverbs_attr_array *attr_spec_array =
>> &attr_array[i];
>>>> +             const struct uverbs_attr_group_spec *group_spec =
>>>> +                     spec->attr_groups[i];
>>>> +             unsigned int j;
>>>> +
>>>> +             for (j = 0; j < attr_spec_array->num_attrs; j++) {
>>>> +                     struct uverbs_attr *attr = &attr_spec_array-
>>>>> attrs[j];
>>>> +                     struct uverbs_attr_spec *spec = &group_spec-
>>>>> attrs[j];
>>>> +
>>>> +                     if (!attr->valid)
>>>> +                             continue;
>>>> +
>>>> +                     if (spec->type == UVERBS_ATTR_TYPE_IDR ||
>>>> +                         spec->type == UVERBS_ATTR_TYPE_FD)
>>>> +                             /*
>>>> +                              * refcounts should be handled at the
>> object
>>>> +                              * level and not at the uobject level.
>>>> +                              */
>>>> +                             uverbs_unlock_object(attr-
>>> obj_attr.uobject,
>>>> +                                                  spec->obj.access,
>> success);
>>>> +             }
>>>> +     }
>>>> +}
>>>> +
>>>> +static unsigned int get_type_orders(const struct uverbs_types_group
>>>> *types_group)
>>>> +{
>>>> +     unsigned int i;
>>>> +     unsigned int max = 0;
>>>> +
>>>> +     for (i = 0; i < types_group->num_groups; i++) {
>>>> +             unsigned int j;
>>>> +             const struct uverbs_types *types = types_group-
>>>>> type_groups[i];
>>>> +
>>>> +             for (j = 0; j < types->num_types; j++) {
>>>> +                     if (!types->types[j] || !types->types[j]-
>>> alloc)
>>>> +                             continue;
>>>> +                     if (types->types[j]->alloc->order > max)
>>>> +                             max = types->types[j]->alloc->order;
>>>> +             }
>>>> +     }
>>>> +
>>>> +     return max;
>>>> +}
>>>> +
>>>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>>>> *ucontext,
>>>> +                                          const struct
>> uverbs_types_group
>>>> *types_group)
>>>> +{
>>>> +     unsigned int num_orders = get_type_orders(types_group);
>>>> +     unsigned int i;
>>>> +
>>>> +     for (i = 0; i <= num_orders; i++) {
>>>> +             struct ib_uobject *obj, *next_obj;
>>>> +
>>>> +             /*
>>>> +              * No need to take lock here, as cleanup should be
>> called
>>>> +              * after all commands finished executing. Newly
>> executed
>>>> +              * commands should fail.
>>>> +              */
>>>> +             mutex_lock(&ucontext->uobjects_lock->lock);
>>>
>>> It's really confusing to see a comment about 'no need to take lock'
>> immediately followed by a call to lock.
>>>
>>
>> Yeah :) That was before adding the fd. I'll delete the comment.
>>
>>>> +             list_for_each_entry_safe(obj, next_obj, &ucontext-
>>>>> uobjects,
>>>> +                                      list)
>>>> +                     if (obj->type->order == i) {
>>>> +                             if (obj->type->type ==
>> UVERBS_ATTR_TYPE_IDR)
>>>> +                                     ib_uverbs_uobject_remove(obj,
>> false);
>>>> +                             else
>>>> +                                     ib_uverbs_remove_fd(obj);
>>>> +                     }
>>>> +             mutex_unlock(&ucontext->uobjects_lock->lock);
>>>> +     }
>>>> +     kref_put(&ucontext->uobjects_lock->ref,
>>>> release_uobjects_list_lock);
>>>> +}
>>>> +
>>>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>>>> *ucontext)
>>>
>>> Please work on the function names.  This is horrendously long and
>> still doesn't help describe what it does.
>>>
>>
>> This just initializes the types part of the ucontext. Any suggestions?
>>
>>>> +{
>>>> +     ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
>>>>> uobjects_lock),
>>>> +                                       GFP_KERNEL);
>>>> +     if (!ucontext->uobjects_lock)
>>>> +             return -ENOMEM;
>>>> +
>>>> +     init_uobjects_list_lock(ucontext->uobjects_lock);
>>>> +     INIT_LIST_HEAD(&ucontext->uobjects);
>>>> +
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>>>> *ucontext)
>>>> +{
>>>> +     kfree(ucontext->uobjects_lock);
>>>> +}
>>>
>>> No need to wrap a call to 'free'.
>>>
>>
>> In order to abstract away the ucontext type data structure.
>>
>>>> +
>>>> diff --git a/drivers/infiniband/core/rdma_core.h
>>>> b/drivers/infiniband/core/rdma_core.h
>>>> new file mode 100644
>>>> index 0000000..8990115
>>>> --- /dev/null
>>>> +++ b/drivers/infiniband/core/rdma_core.h
>>>> @@ -0,0 +1,75 @@
>>>> +/*
>>>> + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>>>> + * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
>>>> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights
>> reserved.
>>>> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
>>>> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses.  You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + *     Redistribution and use in source and binary forms, with or
>>>> + *     without modification, are permitted provided that the
>> following
>>>> + *     conditions are met:
>>>> + *
>>>> + *      - Redistributions of source code must retain the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer.
>>>> + *
>>>> + *      - Redistributions in binary form must reproduce the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer in the documentation and/or other materials
>>>> + *        provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#ifndef UOBJECT_H
>>>> +#define UOBJECT_H
>>>> +
>>>> +#include <linux/idr.h>
>>>> +#include <rdma/uverbs_ioctl.h>
>>>> +#include <rdma/ib_verbs.h>
>>>> +#include <linux/mutex.h>
>>>> +
>>>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>>>> *ibdev,
>>>> +                                       uint16_t type);
>>>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>>>> uverbs_type_alloc_action *type,
>>>> +                                         struct ib_ucontext
>> *ucontext,
>>>> +                                         enum uverbs_idr_access
>> access,
>>>> +                                         uint32_t idr);
>>>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>>>> uverbs_type_alloc_action *type,
>>>> +                                        struct ib_ucontext
>> *ucontext,
>>>> +                                        enum uverbs_idr_access
>> access,
>>>> +                                        int fd);
>>>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>>>> +                       enum uverbs_idr_access access,
>>>> +                       bool success);
>>>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>>>> +                        size_t num,
>>>> +                        const struct uverbs_action_spec *spec,
>>>> +                        bool success);
>>>> +
>>>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>>>> *ucontext,
>>>> +                                          const struct
>> uverbs_types_group
>>>> *types_group);
>>>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>>>> *ucontext);
>>>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>>>> *ucontext);
>>>> +void ib_uverbs_close_fd(struct file *f);
>>>> +void ib_uverbs_cleanup_fd(void *private_data);
>>>> +
>>>> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
>>>> +{
>>>> +     return uobj + 1;
>>>> +}
>>>
>>> This seems like a rather useless function.
>>>
>>
>> Why? The user shouldn't know or care how we put our structs together.
>>
>>>> +
>>>> +#endif /* UIDR_H */
>>>> diff --git a/drivers/infiniband/core/uverbs.h
>>>> b/drivers/infiniband/core/uverbs.h
>>>> index 8074705..ae7d4b8 100644
>>>> --- a/drivers/infiniband/core/uverbs.h
>>>> +++ b/drivers/infiniband/core/uverbs.h
>>>> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>>>>  struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
>>>> *uverbs_file,
>>>>                                       struct ib_device *ib_dev,
>>>>                                       int is_async);
>>>> +void ib_uverbs_release_file(struct kref *ref);
>>>>  void ib_uverbs_free_async_event_file(struct ib_uverbs_file
>>>> *uverbs_file);
>>>>  struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
>>>>
>>>> diff --git a/drivers/infiniband/core/uverbs_main.c
>>>> b/drivers/infiniband/core/uverbs_main.c
>>>> index f783723..e63357a 100644
>>>> --- a/drivers/infiniband/core/uverbs_main.c
>>>> +++ b/drivers/infiniband/core/uverbs_main.c
>>>> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
>>>> ib_uverbs_device *dev)
>>>>       complete(&dev->comp);
>>>>  }
>>>>
>>>> -static void ib_uverbs_release_file(struct kref *ref)
>>>> +void ib_uverbs_release_file(struct kref *ref)
>>>>  {
>>>>       struct ib_uverbs_file *file =
>>>>               container_of(ref, struct ib_uverbs_file, ref);
>>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>>> index b5d2075..7240615 100644
>>>> --- a/include/rdma/ib_verbs.h
>>>> +++ b/include/rdma/ib_verbs.h
>>>> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
>>>>
>>>>  struct ib_umem;
>>>>
>>>> +struct ib_ucontext_lock;
>>>> +
>>>>  struct ib_ucontext {
>>>>       struct ib_device       *device;
>>>> +     struct ib_uverbs_file  *ufile;
>>>>       struct list_head        pd_list;
>>>>       struct list_head        mr_list;
>>>>       struct list_head        mw_list;
>>>> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>>>>       struct list_head        rwq_ind_tbl_list;
>>>>       int                     closing;
>>>>
>>>> +     /* lock for uobjects list */
>>>> +     struct ib_ucontext_lock *uobjects_lock;
>>>> +     struct list_head        uobjects;
>>>> +
>>>>       struct pid             *tgid;
>>>>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>>>       struct rb_root      umem_tree;
>>>> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>>>>  #endif
>>>>  };
>>>>
>>>> +struct uverbs_object_list;
>>>> +
>>>> +#define OLD_ABI_COMPAT
>>>> +
>>>>  struct ib_uobject {
>>>>       u64                     user_handle;    /* handle given to us
>> by userspace
>>>> */
>>>>       struct ib_ucontext     *context;        /* associated user
>> context
>>>> */
>>>>       void                   *object;         /* containing object
>> */
>>>>       struct list_head        list;           /* link to context's
>> list */
>>>> -     int                     id;             /* index into kernel
>> idr */
>>>> -     struct kref             ref;
>>>> -     struct rw_semaphore     mutex;          /* protects .live */
>>>> +     int                     id;             /* index into kernel
>> idr/fd */
>>>> +#ifdef OLD_ABI_COMPAT
>>>> +     struct kref             ref;
>>>> +#endif
>>>> +     struct rw_semaphore     usecnt;         /* protects exclusive
>>>> access */
>>>> +#ifdef OLD_ABI_COMPAT
>>>> +     struct rw_semaphore     mutex;          /* protects .live */
>>>> +#endif
>>>>       struct rcu_head         rcu;            /* kfree_rcu()
>> overhead */
>>>>       int                     live;
>>>> +
>>>> +     const struct uverbs_type_alloc_action *type;
>>>> +     struct ib_ucontext_lock *uobjects_lock;
>>>>  };
>>>>
>>>>  struct ib_udata {
>>>> @@ -2101,6 +2120,9 @@ struct ib_device {
>>>>        */
>>>>       int (*get_port_immutable)(struct ib_device *, u8, struct
>>>> ib_port_immutable *);
>>>>       void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
>>>> str_len);
>>>> +     struct list_head type_list;
>>>> +
>>>> +     const struct uverbs_types_group *types_group;
>>>>  };
>>>>
>>>>  struct ib_client {
>>>> diff --git a/include/rdma/uverbs_ioctl.h
>> b/include/rdma/uverbs_ioctl.h
>>>> new file mode 100644
>>>> index 0000000..2f50045
>>>> --- /dev/null
>>>> +++ b/include/rdma/uverbs_ioctl.h
>>>> @@ -0,0 +1,195 @@
>>>> +/*
>>>> + * Copyright (c) 2016, Mellanox Technologies inc.  All rights
>>>> reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses.  You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + *     Redistribution and use in source and binary forms, with or
>>>> + *     without modification, are permitted provided that the
>> following
>>>> + *     conditions are met:
>>>> + *
>>>> + *      - Redistributions of source code must retain the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer.
>>>> + *
>>>> + *      - Redistributions in binary form must reproduce the above
>>>> + *        copyright notice, this list of conditions and the
>> following
>>>> + *        disclaimer in the documentation and/or other materials
>>>> + *        provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#ifndef _UVERBS_IOCTL_
>>>> +#define _UVERBS_IOCTL_
>>>> +
>>>> +#include <linux/kernel.h>
>>>> +
>>>> +struct uverbs_object_type;
>>>> +struct ib_ucontext;
>>>> +struct ib_uobject;
>>>> +struct ib_device;
>>>> +struct uverbs_uobject_type;
>>>> +
>>>> +/*
>>>> + * =======================================
>>>> + *   Verbs action specifications
>>>> + * =======================================
>>>> + */
>>>
>>> I intentionally used urdma (though condensed to 3 letters that I
>> don't recall atm), rather than uverbs.  This will need to work with
>> non-verbs devices and interfaces -- again, consider how this fits with
>> the rdma cm.  Verbs has a very specific meaning, which gets lost if we
>> start referring to everything as 'verbs'.  It's bad enough that we're
>> stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also
>> means ethernet and rdma means nothing.
>>>
>>
>> IMHO - let's agree on the concept of this infrastructure. Once we
>> decide on its scope, we could generalize it (e.g. ioctl_provider and
>> ioctl_context) and implement it for rdma-cm as well.
>>
>>>> +
>>>> +enum uverbs_attr_type {
>>>> +     UVERBS_ATTR_TYPE_PTR_IN,
>>>> +     UVERBS_ATTR_TYPE_PTR_OUT,
>>>> +     UVERBS_ATTR_TYPE_IDR,
>>>> +     UVERBS_ATTR_TYPE_FD,
>>>> +};
>>>> +
>>>> +enum uverbs_idr_access {
>>>> +     UVERBS_IDR_ACCESS_READ,
>>>> +     UVERBS_IDR_ACCESS_WRITE,
>>>> +     UVERBS_IDR_ACCESS_NEW,
>>>> +     UVERBS_IDR_ACCESS_DESTROY
>>>> +};
>>>> +
>>>> +struct uverbs_attr_spec {
>>>> +     u16                             len;
>>>> +     enum uverbs_attr_type           type;
>>>> +     struct {
>>>> +             u16                     obj_type;
>>>> +             u8                      access;
>>>
>>> Is access intended to be an enum uverbs_idr_access value?
>>>
>>
>> Yeah, worth using this enum. Thanks.
>>
>>>> +     } obj;
>>>
>>> I would remove (flatten) the substructure and re-order the fields for
>> better alignment.
>>>
>>
>> I noticed there are several places which aren't aligned. It's in my todo
>> list.
>>
>>>> +};
>>>> +
>>>> +struct uverbs_attr_group_spec {
>>>> +     struct uverbs_attr_spec         *attrs;
>>>> +     size_t                          num_attrs;
>>>> +};
>>>> +
>>>> +struct uverbs_action_spec {
>>>> +     const struct uverbs_attr_group_spec             **attr_groups;
>>>> +     /* if > 0 -> validator, otherwise, error */
>>>
>>> ? not sure what this comment means
>>>
>>>> +     int (*dist)(__u16 *attr_id, void *priv);
>>>
>>> What does 'dist' stand for?
>>>
>>
>> dist = distribution function.
>> It maps the attributes you got from user-space to your groups. You
>> could think of each group as a namespace - where its attributes (or
>> types/actions) start from zero for the sake of compactness.
>> So, for example, it gets an attribute 0x8010 and maps it to "group 1"
>> (provider) and attribute 0x10.
>>
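
To make that concrete, a made-up dist callback for the example above
could look roughly like this (the bit layout is illustrative only, not
what mlx5 or the core will actually use):

static int example_attr_dist(__u16 *attr_id, void *priv)
{
        if (*attr_id & 0x8000) {        /* e.g. 0x8010 ...           */
                *attr_id &= ~0x8000;    /* ... becomes 0x10 ...      */
                return 1;               /* ... in group 1 (provider) */
        }
        return 0;                       /* otherwise group 0 (core)  */
}

The same idea applies to the dist callbacks for types and actions.
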
>>>> +     void                                            *priv;
>>>> +     size_t                                          num_groups;
>>>> +};
>>>> +
>>>> +struct uverbs_attr_array;
>>>> +struct ib_uverbs_file;
>>>> +
>>>> +struct uverbs_action {
>>>> +     struct uverbs_action_spec spec;
>>>> +     void *priv;
>>>> +     int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
>>>> *ufile,
>>>> +                    struct uverbs_attr_array *ctx, size_t num, void
>>>> *priv);
>>>> +};
>>>> +
>>>> +struct uverbs_type_alloc_action;
>>>> +typedef void (*free_type)(const struct uverbs_type_alloc_action
>>>> *uobject_type,
>>>> +                       struct ib_uobject *uobject);
>>>> +
>>>> +struct uverbs_type_alloc_action {
>>>> +     enum uverbs_attr_type           type;
>>>> +     int                             order;
>>>
>>> I think this is being used as destroy order, in which case I would
>> rename it to clarify the intent.  Though I'd prefer we come up with a
>> more efficient destruction mechanism than the repeated nested looping.
>>>
>>
>> In one of the earlier revisions I used a sorted list, which was
>> efficient. I recall that Jason didn't like its complexity and,
>> re-thinking it, he's right. Most of your types are at "order
>> number" 0 anyway, so you'll probably iterate over very few objects in
>> the next round (in verbs, everything but MRs and PDs).
>>
>>>> +     size_t                          obj_size;
>>>
>>> This can be alloc_fn
>>>
>>>> +     free_type                       free_fn;
>>>> +     struct {
>>>> +             const struct file_operations    *fops;
>>>> +             const char                      *name;
>>>> +             int                             flags;
>>>> +     } fd;
>>>> +};
>>>> +
>>>> +struct uverbs_type_actions_group {
>>>> +     size_t                                  num_actions;
>>>> +     const struct uverbs_action              **actions;
>>>> +};
>>>> +
>>>> +struct uverbs_type {
>>>> +     size_t                                  num_groups;
>>>> +     const struct uverbs_type_actions_group  **action_groups;
>>>> +     const struct uverbs_type_alloc_action   *alloc;
>>>> +     int (*dist)(__u16 *action_id, void *priv);
>>>> +     void                                    *priv;
>>>> +};
>>>> +
>>>> +struct uverbs_types {
>>>> +     size_t                                  num_types;
>>>> +     const struct uverbs_type                **types;
>>>> +};
>>>> +
>>>> +struct uverbs_types_group {
>>>> +     const struct uverbs_types               **type_groups;
>>>> +     size_t                                  num_groups;
>>>> +     int (*dist)(__u16 *type_id, void *priv);
>>>> +     void                                    *priv;
>>>> +};
>>>> +
>>>> +/* =================================================
>>>> + *              Parsing infrastructure
>>>> + * =================================================
>>>> + */
>>>> +
>>>> +struct uverbs_ptr_attr {
>>>> +     void    * __user ptr;
>>>> +     __u16           len;
>>>> +};
>>>> +
>>>> +struct uverbs_fd_attr {
>>>> +     int             fd;
>>>> +};
>>>> +
>>>> +struct uverbs_uobj_attr {
>>>> +     /*  idr handle */
>>>> +     __u32   idr;
>>>> +};
>>>> +
>>>> +struct uverbs_obj_attr {
>>>> +     /* pointer to the kernel descriptor -> type, access, etc */
>>>> +     const struct uverbs_attr_spec *val;
>>>> +     struct ib_uverbs_attr __user    *uattr;
>>>> +     const struct uverbs_type_alloc_action   *type;
>>>> +     struct ib_uobject               *uobject;
>>>> +     union {
>>>> +             struct uverbs_fd_attr           fd;
>>>> +             struct uverbs_uobj_attr         uobj;
>>>> +     };
>>>> +};
>>>> +
>>>> +struct uverbs_attr {
>>>> +     bool valid;
>>>> +     union {
>>>> +             struct uverbs_ptr_attr  cmd_attr;
>>>> +             struct uverbs_obj_attr  obj_attr;
>>>> +     };
>>>> +};
>>>
>>> It's odd to have a union that's part of a structure without some
>> field to indicate which union field is accessible.
>>>
>>
>> You index this array by the attribute id from the user's callback
>> function. The user should know the type of the attribute, as
>> [s]he declared the specification.
>>
>>>> +
>>>> +/* output of one validator */
>>>> +struct uverbs_attr_array {
>>>> +     size_t num_attrs;
>>>> +     /* arrays of attributes, index is the id, i.e. SEND_CQ */
>>>> +     struct uverbs_attr *attrs;
>>>> +};
>>>> +
>>>> +/* =================================================
>>>> + *              Types infrastructure
>>>> + * =================================================
>>>> + */
>>>> +
>>>> +int ib_uverbs_uobject_type_add(struct list_head      *head,
>>>> +                            void (*free)(struct uverbs_uobject_type
>> *type,
>>>> +                                         struct ib_uobject
>> *uobject,
>>>> +                                         struct ib_ucontext
>> *ucontext),
>>>> +                            uint16_t obj_type);
>>>> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
>>>> +
>>>> +#endif
>>>> --
>>>> 2.7.4
>
> Matan, please re-look at the architecture that I proposed:
>
> https://patchwork.kernel.org/patch/9178991/
>
> including the terminology (and consider using common OOP terms).  The *core* of the ioctl framework is to simply invoke a function dispatch table.  IMO, that's where we should start.  Anything beyond that is extra that we should have a strong reason for including.  (Yes, I think we need more.)  Starting simple and adding necessary functionality should let us get something upstream quicker and re-use more of the existing code.

Actually, I've looked at your proposal. Some of the ideas here are a 
blend of your proposal and my original proposal.
One of our goals is to get rid of the commonalities between handlers and 
push the validation and locking into common code which can be validated 
once to be correct. That tends to be a lot easier (and more correct) than 
re-examining every function call.
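
Very roughly, the flow we are aiming for in the common code is the
following (a sketch only, not the actual patch code):

static int example_dispatch(struct ib_device *ib_dev,
                            struct ib_uverbs_file *ufile,
                            const struct uverbs_action *action,
                            struct uverbs_attr_array *attr_array, size_t num)
{
        int ret;

        /*
         * 1. Parse and validate the user attributes against action->spec,
         *    locking/creating every IDR/FD object they reference
         *    (uverbs_get_type_from_idr()/_fd() from this patch).
         */

        /* 2. Run the handler - it only sees validated, locked objects. */
        ret = action->handler(ib_dev, ufile, attr_array, num, action->priv);

        /* 3. Commit or roll back the locks/new objects per the result. */
        uverbs_unlock_objects(attr_array, num, &action->spec, !ret);

        return ret;
}

The handler bodies shrink to the device-specific work, and the
validation/locking logic exists exactly once.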

>
> If we're going to re-create netlink as part of the rdma ioctl interface, then why don't we just use netlink directly?
>

We had a netlink-based proposal first, but:
(a) We want one-call dispatching (as opposed to send-receive).
(b) We don't want to copy the driver's data from user-space to the 
libibverbs buffer.
(c) We don't need the complexity of nesting.
(d) Bare netlink is considered unreliable.
(e) We need semantics of objects.
(f) By using pointers, we could eliminate some copies (see the rough 
sketch below).
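
To make (b) and (f) concrete: each attribute is basically an
(id, length, pointer) triplet. The layout below is only an
illustration, not the actual struct ib_uverbs_attr from the series:

#include <linux/types.h>

struct example_ioctl_attr {
        __u16   attr_id;  /* namespaced id, resolved by the dist callbacks */
        __u16   len;      /* payload length */
        __u32   reserved;
        __u64   ptr;      /* user pointer to the payload (an SG entry) */
};

The provider library can point such an attribute directly at its own
command structure, so the kernel copies it once from the caller's
buffer and libibverbs never has to repack driver-specific data.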

Thanks for looking at this patch.

Matan


end of thread, other threads:[~2016-11-10  8:29 UTC | newest]

Thread overview: 29+ messages
2016-10-27 14:43 [RFC ABI V5 00/10] SG-based RDMA ABI Proposal Matan Barak
     [not found] ` <1477579398-6875-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-10-27 14:43   ` [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device Matan Barak
     [not found]     ` <1477579398-6875-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-10-28 22:53       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A445F-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2016-10-30  9:13           ` Leon Romanovsky
2016-11-07 23:55           ` Jason Gunthorpe
     [not found]             ` <20161107235516.GE7002-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-11-09  9:34               ` Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 02/10] RDMA/core: Add support for custom types Matan Barak
     [not found]     ` <1477579398-6875-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-10-30 19:28       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373AB0A47BD-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2016-10-31 22:58           ` Matan Barak
     [not found]             ` <CAAKD3BDWyb10baLrDu=m_mYPB64i9OOPEPVYKtDo9zVbvMM-UA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-09 18:00               ` Hefty, Sean
     [not found]                 ` <1828884A29C6694DAF28B7E6B8A82373AB0A8000-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2016-11-09 18:50                   ` Jason Gunthorpe
2016-11-10  8:29                   ` Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 03/10] RDMA/core: Add new ioctl interface Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 04/10] RDMA/core: Add initialize and cleanup of common types Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 05/10] RDMA/core: Add uverbs types, actions, handlers and attributes Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 06/10] IB/mlx5: Implement common uverb objects Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space Matan Barak
     [not found]     ` <1477579398-6875-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-10-28  6:59       ` Christoph Hellwig
     [not found]         ` <20161028065943.GA10418-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-10-28 15:16           ` Leon Romanovsky
     [not found]             ` <20161028151606.GN3617-2ukJVAZIZ/Y@public.gmane.org>
2016-10-28 15:21               ` Christoph Hellwig
     [not found]                 ` <20161028152138.GA16421-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-10-28 15:33                   ` Leon Romanovsky
     [not found]                     ` <20161028153306.GO3617-2ukJVAZIZ/Y@public.gmane.org>
2016-10-28 15:37                       ` Christoph Hellwig
     [not found]                         ` <20161028153725.GA14166-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-10-28 15:46                           ` Leon Romanovsky
     [not found]                             ` <20161028154628.GP3617-2ukJVAZIZ/Y@public.gmane.org>
2016-10-30  8:48                               ` Matan Barak
     [not found]                                 ` <CAAKD3BB0k1UxV2qO3SqAD_t1vM2pcduOXiz8aJ5c+JXAmq_aWw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-08  0:43                                   ` Jason Gunthorpe
     [not found]                                     ` <20161108004351.GA32444-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-11-09  9:45                                       ` Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 08/10] IB/core: Implement compatibility layer for get context command Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 09/10] IB/core: Add create_qp command to the new ABI Matan Barak
2016-10-27 14:43   ` [RFC ABI V5 10/10] IB/core: Add modify_qp " Matan Barak
