All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC ABI V1 0/8] SG-based RDMA ABI Proposal
@ 2016-06-30 13:39 Matan Barak
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

The following patch set comes to enrich security model as a follow up
to commit e6bd18f57aad ('IB/security: Restrict use of the write() interface').

DISCLAIMER:
These patches don't work and are far from being completed. They only present
the fundamental concepts and are sent in order to be a basis of discussions.

The ideas presented here are based on our V0 series in addition to some ideas
presented in OFVWG and Sean's series.

This patch series add ioctl() interface to the existing write() interface and
provide an easy route to backport this change to legacy supported systems.
Analyzing the current uverbs role in dispatching and parsing commands, we find
that:
(a) uverbs validates the basic properties of the command
(b) uverbs is responsible of doing all the IDR and uobject management and
    locking.
(c) uverbs transforms the user<-->kernel ABI to kernel API.

(a) and (b) are valid for every kABI. Although the nature of commands could
change, they still have to be validated and transform to kernel pointers.
In order to avoid duplications between the various drivers, we would like to
keep (a) and (b) as shared code.

In addition, this is a good time to expand the ABI to be more scalable, so we
added a few goals:
(1) Command's attributes shall be extensible in an easy one. Either by allowing
    vendors to have their own extensible set of attributes or core code
    extensible attributes. Moreover, vendor's specific attributes could some
    day become core's standard attributes. We would like to still support
    old user-space while avoid duplicating the code in kernel.
(2) Each driver may have specific type system (i.e QP, CQ, ....). It may
    or may not even implement the standard type system. It could extend this
    type system in the future.
(3) Do not change or recompile vendor libraries and don't copy their data.
(4) Efficient dispatching.

Thus, in order to allow this flexibility, we decide giving (a) and (b) as a
common infrastructure, but use per-driver guidelines in order to do that
parsing and uobject management. Handlers are also set by the drivers
themselves (though they can point to either shared common code) or
vendor specific code.

Since types are no longer enforced by the common infrastructure, there is no
point of pre-allocating common IDR types in the common code. Instead, we
provide an API for driver to add new types. We use one IDR per vendor
for all its types. The driver should add all its usable types before
any application runs. The order of which the driver adds its types (and the
common types it uses) dictates the process release order. After that,
all uboject, reference counts and types are handled automatically for the
driver by the infrastructure.

Scatter gather was chosen in order to allow us not to recompile user space
drivers. By using pointers to vendor specific data, we could just use it
without introduce copying data and without changing the user-space driver at
all.

We chose to go with lockless uobjects. When exclusive (WRITE or DESTROY) access
is required, we dispatch the action if and only if no other action needs
this object as well. Otherwise, -EBUSY is returned to the user-space.
Device removal is synced with the commands by using a rwsem that is present
on the ib_uverbs_file struct.
If we were using locks, we would have need to sort the given user-space handles.
Otherwise, a user-space application may result in causing a deadlock.
Moving to a lockless based behaviour, the dispatching in kernel becomes more
efficient.

Further uverbs related subsystem (such as RDMA-CM) may use other fds or use
other ioctl codes.

Haggai Eran (1):
  RDMA/core: Add support for custom types

Leon Romanovsky (3):
  RDMA/core: Export RDMA IOCTL declarations
  RDMA/core: Move uobject code to separate files
  RDMA/core: Refactor IDR to be per-device

Matan Barak (4):
  RDMA/core: Introduce add/remove uobj from types
  RDMA/core: Add new ioctl interface
  RDMA/core: Change locking of ib_uobject
  RDMA/{hw, core}: Provide simple skeleton to IOCTL interface

 drivers/infiniband/core/Makefile           |   3 +-
 drivers/infiniband/core/device.c           |  22 ++
 drivers/infiniband/core/uidr.c             | 287 +++++++++++++++++++++
 drivers/infiniband/core/uidr.h             |  68 +++++
 drivers/infiniband/core/uobject.c          | 208 ++++++++++++++++
 drivers/infiniband/core/uobject.h          | 108 ++++++++
 drivers/infiniband/core/user_mad.c         |   2 +-
 drivers/infiniband/core/uverbs.h           |  17 +-
 drivers/infiniband/core/uverbs_cmd.c       | 385 ++++++-----------------------
 drivers/infiniband/core/uverbs_ioctl.c     | 279 +++++++++++++++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c |  97 ++++++++
 drivers/infiniband/core/uverbs_main.c      |  43 ++--
 drivers/infiniband/hw/ioctl-skel.c         | 119 +++++++++
 include/rdma/ib_ioctl.h                    |  38 +++
 include/rdma/ib_verbs.h                    |  17 +-
 include/rdma/uverbs_ioctl.h                | 202 +++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  88 +++++++
 include/uapi/rdma/Kbuild                   |   1 +
 include/uapi/rdma/ib_user_ioctl.h          |  78 ++++++
 include/uapi/rdma/ib_user_mad.h            |  12 -
 include/uapi/rdma/ib_user_verbs.h          |  19 ++
 21 files changed, 1725 insertions(+), 368 deletions(-)
 create mode 100644 drivers/infiniband/core/uidr.c
 create mode 100644 drivers/infiniband/core/uidr.h
 create mode 100644 drivers/infiniband/core/uobject.c
 create mode 100644 drivers/infiniband/core/uobject.h
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 drivers/infiniband/hw/ioctl-skel.c
 create mode 100644 include/rdma/ib_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h
 create mode 100644 include/uapi/rdma/ib_user_ioctl.h

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-06-30 13:39   ` Matan Barak
       [not found]     ` <1467293971-25688-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files Matan Barak
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Place all external RDMA IOCTL declarations into one UAPI exported
header file and move all legacy MAD commands to that file.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/user_mad.c |  2 +-
 include/rdma/ib_ioctl.h            | 38 ++++++++++++++++++++++++++
 include/uapi/rdma/Kbuild           |  1 +
 include/uapi/rdma/ib_user_ioctl.h  | 55 ++++++++++++++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_mad.h    | 12 ---------
 5 files changed, 95 insertions(+), 13 deletions(-)
 create mode 100644 include/rdma/ib_ioctl.h
 create mode 100644 include/uapi/rdma/ib_user_ioctl.h

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..bee2a34 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -52,8 +52,8 @@
 
 #include <asm/uaccess.h>
 
+#include <rdma/ib_ioctl.h>
 #include <rdma/ib_mad.h>
-#include <rdma/ib_user_mad.h>
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand userspace MAD packet access");
diff --git a/include/rdma/ib_ioctl.h b/include/rdma/ib_ioctl.h
new file mode 100644
index 0000000..d75cce0
--- /dev/null
+++ b/include/rdma/ib_ioctl.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_IOCTL_H
+#define IB_IOCTL_H
+
+#include <rdma/ib_user_ioctl.h>
+
+#endif /* IB_IOCTL_H */
diff --git a/include/uapi/rdma/Kbuild b/include/uapi/rdma/Kbuild
index 231901b..c9344b3 100644
--- a/include/uapi/rdma/Kbuild
+++ b/include/uapi/rdma/Kbuild
@@ -1,5 +1,6 @@
 # UAPI Header export list
 header-y += ib_user_cm.h
+header-y += ib_user_ioctl.h
 header-y += ib_user_mad.h
 header-y += ib_user_sa.h
 header-y += ib_user_verbs.h
diff --git a/include/uapi/rdma/ib_user_ioctl.h b/include/uapi/rdma/ib_user_ioctl.h
new file mode 100644
index 0000000..d1928dea
--- /dev/null
+++ b/include/uapi/rdma/ib_user_ioctl.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_IOCTL_H
+#define IB_USER_IOCTL_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define IB_IOCTL_MAGIC		0x1b
+
+/* Legacy part
+ * !!!! NOTE: It uses the same command index as VERBS
+ */
+#include <rdma/ib_user_mad.h>
+#define IB_USER_MAD_REGISTER_AGENT	_IOWR(IB_IOCTL_MAGIC, 1, \
+					      struct ib_user_mad_reg_req)
+
+#define IB_USER_MAD_UNREGISTER_AGENT	_IOW(IB_IOCTL_MAGIC, 2, __u32)
+
+#define IB_USER_MAD_ENABLE_PKEY		_IO(IB_IOCTL_MAGIC, 3)
+
+#define IB_USER_MAD_REGISTER_AGENT2     _IOWR(IB_IOCTL_MAGIC, 4, \
+					      struct ib_user_mad_reg_req2)
+
+#endif /* IB_USER_IOCTL_H */
diff --git a/include/uapi/rdma/ib_user_mad.h b/include/uapi/rdma/ib_user_mad.h
index 09f809f..7e722e7 100644
--- a/include/uapi/rdma/ib_user_mad.h
+++ b/include/uapi/rdma/ib_user_mad.h
@@ -230,16 +230,4 @@ struct ib_user_mad_reg_req2 {
 	__u8	reserved[3];
 };
 
-#define IB_IOCTL_MAGIC		0x1b
-
-#define IB_USER_MAD_REGISTER_AGENT	_IOWR(IB_IOCTL_MAGIC, 1, \
-					      struct ib_user_mad_reg_req)
-
-#define IB_USER_MAD_UNREGISTER_AGENT	_IOW(IB_IOCTL_MAGIC, 2, __u32)
-
-#define IB_USER_MAD_ENABLE_PKEY		_IO(IB_IOCTL_MAGIC, 3)
-
-#define IB_USER_MAD_REGISTER_AGENT2     _IOWR(IB_IOCTL_MAGIC, 4, \
-					      struct ib_user_mad_reg_req2)
-
 #endif /* IB_USER_MAD_H */
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
       [not found]     ` <1467293971-25688-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device Matan Barak
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The new proposed infrastructure allows driver's to declare new object
types. Furthermore, uobject management code will be used from both
the old legacy code and the new ioctl infrastructure.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile     |   3 +-
 drivers/infiniband/core/uobject.c    |  95 ++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uobject.h    |  87 ++++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_cmd.c | 101 +----------------------------------
 4 files changed, 185 insertions(+), 101 deletions(-)
 create mode 100644 drivers/infiniband/core/uobject.c
 create mode 100644 drivers/infiniband/core/uobject.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..68190b4 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -28,4 +28,5 @@ ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+				uobject.o
diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
new file mode 100644
index 0000000..55138ea
--- /dev/null
+++ b/drivers/infiniband/core/uobject.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_verbs.h>
+
+#include "uobject.h"
+
+/*
+ * The ib_uobject locking scheme is as follows:
+ *
+ * - uobj->context->device->idr_lock protects the uverbs idrs themselves, so
+ *   it needs to be held during all idr write operations.  When an object is
+ *   looked up, a reference must be taken on the object's kref before
+ *   dropping this lock.  For read operations, the rcu_read_lock()
+ *   and rcu_write_lock() but similarly the kref reference is grabbed
+ *   before the rcu_read_unlock().
+ *
+ * - Each object also has an rwsem.  This rwsem must be held for
+ *   reading while an operation that uses the object is performed.
+ *   For example, while registering an MR, the associated PD's
+ *   uobject.mutex must be held for reading.  The rwsem must be held
+ *   for writing while initializing or destroying an object.
+ *
+ * - In addition, each object has a "live" flag.  If this flag is not
+ *   set, then lookups of the object will fail even if it is found in
+ *   the idr.  This handles a reader that blocks and does not acquire
+ *   the rwsem until after the object is destroyed.  The destroy
+ *   operation will set the live flag to 0 and then drop the rwsem;
+ *   this will allow the reader to acquire the rwsem, see that the
+ *   live flag is 0, and then drop the rwsem and its reference to
+ *   object.  The underlying storage will not be freed until the last
+ *   reference to the object is dropped.
+ */
+
+void init_uobj(struct ib_uobject *uobj, u64 user_handle,
+	       struct ib_ucontext *context, struct uverbs_lock_class *c)
+{
+	uobj->user_handle = user_handle;
+	uobj->context     = context;
+	kref_init(&uobj->ref);
+	init_rwsem(&uobj->mutex);
+	lockdep_set_class_and_name(&uobj->mutex, &c->key, c->name);
+	uobj->live        = 0;
+}
+
+void release_uobj(struct kref *kref)
+{
+	kfree_rcu(container_of(kref, struct ib_uobject, ref), rcu);
+}
+
+void put_uobj(struct ib_uobject *uobj)
+{
+	kref_put(&uobj->ref, release_uobj);
+}
+
+void put_uobj_read(struct ib_uobject *uobj)
+{
+	up_read(&uobj->mutex);
+	put_uobj(uobj);
+}
+
+void put_uobj_write(struct ib_uobject *uobj)
+{
+	up_write(&uobj->mutex);
+	put_uobj(uobj);
+}
diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h
new file mode 100644
index 0000000..2958ef7
--- /dev/null
+++ b/drivers/infiniband/core/uobject.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef UOBJECT_H
+#define UOBJECT_H
+
+struct uverbs_lock_class {
+	struct lock_class_key	key;
+	char			name[16];
+};
+
+void init_uobj(struct ib_uobject *uobj, u64 user_handle,
+	       struct ib_ucontext *context, struct uverbs_lock_class *c);
+
+void release_uobj(struct kref *kref);
+void put_uobj(struct ib_uobject *uobj);
+void put_uobj_read(struct ib_uobject *uobj);
+void put_uobj_write(struct ib_uobject *uobj);
+
+static inline void put_pd_read(struct ib_pd *pd)
+{
+	put_uobj_read(pd->uobject);
+}
+
+static inline void put_cq_read(struct ib_cq *cq)
+{
+	put_uobj_read(cq->uobject);
+}
+
+static inline void put_ah_read(struct ib_ah *ah)
+{
+	put_uobj_read(ah->uobject);
+}
+
+static inline void put_qp_read(struct ib_qp *qp)
+{
+	put_uobj_read(qp->uobject);
+}
+
+static inline void put_qp_write(struct ib_qp *qp)
+{
+	put_uobj_write(qp->uobject);
+}
+
+static inline void put_srq_read(struct ib_srq *srq)
+{
+	put_uobj_read(srq->uobject);
+}
+
+static inline void put_xrcd_read(struct ib_uobject *uobj)
+{
+	put_uobj_read(uobj);
+}
+#endif /* UIDR_H */
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1a8babb..fd7e320 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -41,13 +41,9 @@
 #include <asm/uaccess.h>
 
 #include "uverbs.h"
+#include "uobject.h"
 #include "core_priv.h"
 
-struct uverbs_lock_class {
-	struct lock_class_key	key;
-	char			name[16];
-};
-
 static struct uverbs_lock_class pd_lock_class	= { .name = "PD-uobj" };
 static struct uverbs_lock_class mr_lock_class	= { .name = "MR-uobj" };
 static struct uverbs_lock_class mw_lock_class	= { .name = "MW-uobj" };
@@ -58,66 +54,6 @@ static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
 static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
-/*
- * The ib_uobject locking scheme is as follows:
- *
- * - ib_uverbs_idr_lock protects the uverbs idrs themselves, so it
- *   needs to be held during all idr write operations.  When an object is
- *   looked up, a reference must be taken on the object's kref before
- *   dropping this lock.  For read operations, the rcu_read_lock()
- *   and rcu_write_lock() but similarly the kref reference is grabbed
- *   before the rcu_read_unlock().
- *
- * - Each object also has an rwsem.  This rwsem must be held for
- *   reading while an operation that uses the object is performed.
- *   For example, while registering an MR, the associated PD's
- *   uobject.mutex must be held for reading.  The rwsem must be held
- *   for writing while initializing or destroying an object.
- *
- * - In addition, each object has a "live" flag.  If this flag is not
- *   set, then lookups of the object will fail even if it is found in
- *   the idr.  This handles a reader that blocks and does not acquire
- *   the rwsem until after the object is destroyed.  The destroy
- *   operation will set the live flag to 0 and then drop the rwsem;
- *   this will allow the reader to acquire the rwsem, see that the
- *   live flag is 0, and then drop the rwsem and its reference to
- *   object.  The underlying storage will not be freed until the last
- *   reference to the object is dropped.
- */
-
-static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
-		      struct ib_ucontext *context, struct uverbs_lock_class *c)
-{
-	uobj->user_handle = user_handle;
-	uobj->context     = context;
-	kref_init(&uobj->ref);
-	init_rwsem(&uobj->mutex);
-	lockdep_set_class_and_name(&uobj->mutex, &c->key, c->name);
-	uobj->live        = 0;
-}
-
-static void release_uobj(struct kref *kref)
-{
-	kfree_rcu(container_of(kref, struct ib_uobject, ref), rcu);
-}
-
-static void put_uobj(struct ib_uobject *uobj)
-{
-	kref_put(&uobj->ref, release_uobj);
-}
-
-static void put_uobj_read(struct ib_uobject *uobj)
-{
-	up_read(&uobj->mutex);
-	put_uobj(uobj);
-}
-
-static void put_uobj_write(struct ib_uobject *uobj)
-{
-	up_write(&uobj->mutex);
-	put_uobj(uobj);
-}
-
 static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
 {
 	int ret;
@@ -213,31 +149,16 @@ static struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
 	return idr_read_obj(&ib_uverbs_pd_idr, pd_handle, context, 0);
 }
 
-static void put_pd_read(struct ib_pd *pd)
-{
-	put_uobj_read(pd->uobject);
-}
-
 static struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context, int nested)
 {
 	return idr_read_obj(&ib_uverbs_cq_idr, cq_handle, context, nested);
 }
 
-static void put_cq_read(struct ib_cq *cq)
-{
-	put_uobj_read(cq->uobject);
-}
-
 static struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
 {
 	return idr_read_obj(&ib_uverbs_ah_idr, ah_handle, context, 0);
 }
 
-static void put_ah_read(struct ib_ah *ah)
-{
-	put_uobj_read(ah->uobject);
-}
-
 static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
 {
 	return idr_read_obj(&ib_uverbs_qp_idr, qp_handle, context, 0);
@@ -251,26 +172,11 @@ static struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
 	return uobj ? uobj->object : NULL;
 }
 
-static void put_qp_read(struct ib_qp *qp)
-{
-	put_uobj_read(qp->uobject);
-}
-
-static void put_qp_write(struct ib_qp *qp)
-{
-	put_uobj_write(qp->uobject);
-}
-
 static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
 {
 	return idr_read_obj(&ib_uverbs_srq_idr, srq_handle, context, 0);
 }
 
-static void put_srq_read(struct ib_srq *srq)
-{
-	put_uobj_read(srq->uobject);
-}
-
 static struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
 				     struct ib_uobject **uobj)
 {
@@ -278,11 +184,6 @@ static struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *contex
 	return *uobj ? (*uobj)->object : NULL;
 }
 
-static void put_xrcd_read(struct ib_uobject *uobj)
-{
-	put_uobj_read(uobj);
-}
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
  2016-06-30 13:39   ` [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
       [not found]     ` <1467293971-25688-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 4/8] RDMA/core: Add support for custom types Matan Barak
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The current code creates an IDR per type. Since types are currently
common for all drivers and known in advance, this was good enough.
However, the proposed ioctl based infrastructure allows each driver
to declare only some of the common types and declare its own specific
types.

Thus, we decided to implement IDR to be per device and refactor it to
use a new file.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/device.c      |  22 ++++
 drivers/infiniband/core/uidr.c        | 165 ++++++++++++++++++++++++++
 drivers/infiniband/core/uidr.h        |  55 +++++++++
 drivers/infiniband/core/uobject.c     |   1 +
 drivers/infiniband/core/uverbs.h      |  13 ---
 drivers/infiniband/core/uverbs_cmd.c  | 212 +++++++---------------------------
 drivers/infiniband/core/uverbs_main.c |  37 ++----
 include/rdma/ib_verbs.h               |   4 +
 9 files changed, 299 insertions(+), 212 deletions(-)
 create mode 100644 drivers/infiniband/core/uidr.c
 create mode 100644 drivers/infiniband/core/uidr.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 68190b4..c8bf8de 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -29,4 +29,4 @@ ib_umad-y :=			user_mad.o
 ib_ucm-y :=			ucm.o
 
 ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
-				uobject.o
+				uidr.o uobject.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 5516fb0..e0211d74 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -44,6 +44,7 @@
 #include <rdma/ib_cache.h>
 
 #include "core_priv.h"
+#include "uidr.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("core kernel InfiniBand API");
@@ -311,6 +312,18 @@ static int read_port_immutable(struct ib_device *device)
 	return 0;
 }
 
+static int ib_device_allocate_idrs(struct ib_device *device)
+{
+	spin_lock_init(&device->idr_lock);
+	idr_init(&device->idr);
+	return 0;
+}
+
+static void ib_device_destroy_idrs(struct ib_device *device)
+{
+	idr_destroy(&device->idr);
+}
+
 /**
  * ib_register_device - Register an IB device with IB core
  * @device:Device to register
@@ -370,6 +383,14 @@ int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
+	ret = ib_device_allocate_idrs(device);
+	if (ret) {
+		pr_warn("Couldn't allocate IDRs device %s with driver model\n",
+			device->name);
+		ib_cache_cleanup_one(device);
+		goto out;
+	}
+
 	device->reg_state = IB_DEV_REGISTERED;
 
 	list_for_each_entry(client, &client_list, list)
@@ -416,6 +437,7 @@ void ib_unregister_device(struct ib_device *device)
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
+	ib_device_destroy_idrs(device);
 	ib_cache_cleanup_one(device);
 
 	down_write(&lists_rwsem);
diff --git a/drivers/infiniband/core/uidr.c b/drivers/infiniband/core/uidr.c
new file mode 100644
index 0000000..1acdf64
--- /dev/null
+++ b/drivers/infiniband/core/uidr.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_verbs.h>
+#include "uidr.h"
+#include "uobject.h"
+
+int idr_add_uobj(struct ib_uobject *uobj)
+{
+	int ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&uobj->context->device->idr_lock);
+
+	ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0, GFP_NOWAIT);
+	if (ret >= 0)
+		uobj->id = ret;
+
+	spin_unlock(&uobj->context->device->idr_lock);
+	idr_preload_end();
+
+	return ret < 0 ? ret : 0;
+}
+
+void idr_remove_uobj(struct ib_uobject *uobj)
+{
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
+}
+
+static struct ib_uobject *__idr_get_uobj(int id, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	rcu_read_lock();
+	uobj = idr_find(&context->device->idr, id);
+	if (uobj) {
+		if (uobj->context == context)
+			kref_get(&uobj->ref);
+		else
+			uobj = NULL;
+	}
+	rcu_read_unlock();
+
+	return uobj;
+}
+
+static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
+					int nested)
+{
+	struct ib_uobject *uobj;
+
+	uobj = __idr_get_uobj(id, context);
+	if (!uobj)
+		return NULL;
+
+	if (nested)
+		down_read_nested(&uobj->mutex, SINGLE_DEPTH_NESTING);
+	else
+		down_read(&uobj->mutex);
+	if (!uobj->live) {
+		put_uobj_read(uobj);
+		return NULL;
+	}
+
+	return uobj;
+}
+
+struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	uobj = __idr_get_uobj(id, context);
+	if (!uobj)
+		return NULL;
+
+	down_write(&uobj->mutex);
+	if (!uobj->live) {
+		put_uobj_write(uobj);
+		return NULL;
+	}
+
+	return uobj;
+}
+
+static void *idr_read_obj(int id, struct ib_ucontext *context,
+			  int nested)
+{
+	struct ib_uobject *uobj;
+
+	uobj = idr_read_uobj(id, context, nested);
+	return uobj ? uobj->object : NULL;
+}
+
+struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
+{
+	return idr_read_obj(pd_handle, context, 0);
+}
+
+struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context,
+			  int nested)
+{
+	return idr_read_obj(cq_handle, context, nested);
+}
+
+struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
+{
+	return idr_read_obj(ah_handle, context, 0);
+}
+
+struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
+{
+	return idr_read_obj(qp_handle, context, 0);
+}
+
+struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	uobj = idr_write_uobj(qp_handle, context);
+	return uobj ? uobj->object : NULL;
+}
+
+struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
+{
+	return idr_read_obj(srq_handle, context, 0);
+}
+
+struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
+			      struct ib_uobject **uobj)
+{
+	*uobj = idr_read_uobj(xrcd_handle, context, 0);
+	return *uobj ? (*uobj)->object : NULL;
+}
+
diff --git a/drivers/infiniband/core/uidr.h b/drivers/infiniband/core/uidr.h
new file mode 100644
index 0000000..581b394
--- /dev/null
+++ b/drivers/infiniband/core/uidr.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef UIDR_H
+#define UIDR_H
+
+#include <linux/idr.h>
+
+int idr_add_uobj(struct ib_uobject *uobj);
+void idr_remove_uobj(struct ib_uobject *uobj);
+struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context);
+
+struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context);
+struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context,
+			  int nested);
+struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context);
+struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context);
+struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context);
+struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context);
+struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
+			      struct ib_uobject **uobj);
+#endif /* UIDR_H */
diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index 55138ea..86f6460 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -33,6 +33,7 @@
 #include <rdma/ib_verbs.h>
 
 #include "uobject.h"
+#include "uidr.h"
 
 /*
  * The ib_uobject locking scheme is as follows:
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 612ccfd..11bad7a 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -171,19 +171,6 @@ struct ib_ucq_object {
 	u32			async_events_reported;
 };
 
-extern spinlock_t ib_uverbs_idr_lock;
-extern struct idr ib_uverbs_pd_idr;
-extern struct idr ib_uverbs_mr_idr;
-extern struct idr ib_uverbs_mw_idr;
-extern struct idr ib_uverbs_ah_idr;
-extern struct idr ib_uverbs_cq_idr;
-extern struct idr ib_uverbs_qp_idr;
-extern struct idr ib_uverbs_srq_idr;
-extern struct idr ib_uverbs_xrcd_idr;
-extern struct idr ib_uverbs_rule_idr;
-
-void idr_remove_uobj(struct idr *idp, struct ib_uobject *uobj);
-
 struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file *uverbs_file,
 					struct ib_device *ib_dev,
 					int is_async);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index fd7e320..22406fe 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -41,7 +41,9 @@
 #include <asm/uaccess.h>
 
 #include "uverbs.h"
+#include "uidr.h"
 #include "uobject.h"
+
 #include "core_priv.h"
 
 static struct uverbs_lock_class pd_lock_class	= { .name = "PD-uobj" };
@@ -54,136 +56,6 @@ static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
 static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
 static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
 
-static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
-{
-	int ret;
-
-	idr_preload(GFP_KERNEL);
-	spin_lock(&ib_uverbs_idr_lock);
-
-	ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT);
-	if (ret >= 0)
-		uobj->id = ret;
-
-	spin_unlock(&ib_uverbs_idr_lock);
-	idr_preload_end();
-
-	return ret < 0 ? ret : 0;
-}
-
-void idr_remove_uobj(struct idr *idr, struct ib_uobject *uobj)
-{
-	spin_lock(&ib_uverbs_idr_lock);
-	idr_remove(idr, uobj->id);
-	spin_unlock(&ib_uverbs_idr_lock);
-}
-
-static struct ib_uobject *__idr_get_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
-{
-	struct ib_uobject *uobj;
-
-	rcu_read_lock();
-	uobj = idr_find(idr, id);
-	if (uobj) {
-		if (uobj->context == context)
-			kref_get(&uobj->ref);
-		else
-			uobj = NULL;
-	}
-	rcu_read_unlock();
-
-	return uobj;
-}
-
-static struct ib_uobject *idr_read_uobj(struct idr *idr, int id,
-					struct ib_ucontext *context, int nested)
-{
-	struct ib_uobject *uobj;
-
-	uobj = __idr_get_uobj(idr, id, context);
-	if (!uobj)
-		return NULL;
-
-	if (nested)
-		down_read_nested(&uobj->mutex, SINGLE_DEPTH_NESTING);
-	else
-		down_read(&uobj->mutex);
-	if (!uobj->live) {
-		put_uobj_read(uobj);
-		return NULL;
-	}
-
-	return uobj;
-}
-
-static struct ib_uobject *idr_write_uobj(struct idr *idr, int id,
-					 struct ib_ucontext *context)
-{
-	struct ib_uobject *uobj;
-
-	uobj = __idr_get_uobj(idr, id, context);
-	if (!uobj)
-		return NULL;
-
-	down_write(&uobj->mutex);
-	if (!uobj->live) {
-		put_uobj_write(uobj);
-		return NULL;
-	}
-
-	return uobj;
-}
-
-static void *idr_read_obj(struct idr *idr, int id, struct ib_ucontext *context,
-			  int nested)
-{
-	struct ib_uobject *uobj;
-
-	uobj = idr_read_uobj(idr, id, context, nested);
-	return uobj ? uobj->object : NULL;
-}
-
-static struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
-{
-	return idr_read_obj(&ib_uverbs_pd_idr, pd_handle, context, 0);
-}
-
-static struct ib_cq *idr_read_cq(int cq_handle, struct ib_ucontext *context, int nested)
-{
-	return idr_read_obj(&ib_uverbs_cq_idr, cq_handle, context, nested);
-}
-
-static struct ib_ah *idr_read_ah(int ah_handle, struct ib_ucontext *context)
-{
-	return idr_read_obj(&ib_uverbs_ah_idr, ah_handle, context, 0);
-}
-
-static struct ib_qp *idr_read_qp(int qp_handle, struct ib_ucontext *context)
-{
-	return idr_read_obj(&ib_uverbs_qp_idr, qp_handle, context, 0);
-}
-
-static struct ib_qp *idr_write_qp(int qp_handle, struct ib_ucontext *context)
-{
-	struct ib_uobject *uobj;
-
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, qp_handle, context);
-	return uobj ? uobj->object : NULL;
-}
-
-static struct ib_srq *idr_read_srq(int srq_handle, struct ib_ucontext *context)
-{
-	return idr_read_obj(&ib_uverbs_srq_idr, srq_handle, context, 0);
-}
-
-static struct ib_xrcd *idr_read_xrcd(int xrcd_handle, struct ib_ucontext *context,
-				     struct ib_uobject **uobj)
-{
-	*uobj = idr_read_uobj(&ib_uverbs_xrcd_idr, xrcd_handle, context, 0);
-	return *uobj ? (*uobj)->object : NULL;
-}
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -451,7 +323,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	atomic_set(&pd->usecnt, 0);
 
 	uobj->object = pd;
-	ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_idr;
 
@@ -475,7 +347,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_idr:
 	ib_dealloc_pd(pd);
@@ -498,7 +370,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_pd_idr, cmd.pd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.pd_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	pd = uobj->object;
@@ -516,7 +388,7 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_uverbs_file *file,
 	uobj->live = 0;
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -692,7 +564,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 
 	atomic_set(&obj->refcnt, 0);
 	obj->uobject.object = xrcd;
-	ret = idr_add_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_idr;
 
@@ -736,7 +608,7 @@ err_copy:
 	}
 
 err_insert_xrcd:
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_idr:
 	ib_dealloc_xrcd(xrcd);
@@ -770,7 +642,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 		return -EFAULT;
 
 	mutex_lock(&file->device->xrcd_tree_mutex);
-	uobj = idr_write_uobj(&ib_uverbs_xrcd_idr, cmd.xrcd_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.xrcd_handle, file->ucontext);
 	if (!uobj) {
 		ret = -EINVAL;
 		goto out;
@@ -803,7 +675,7 @@ ssize_t ib_uverbs_close_xrcd(struct ib_uverbs_file *file,
 	if (inode && !live)
 		xrcd_table_delete(file->device, inode);
 
-	idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+	idr_remove_uobj(uobj);
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
 	mutex_unlock(&file->mutex);
@@ -896,7 +768,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mr;
-	ret = idr_add_uobj(&ib_uverbs_mr_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unreg;
 
@@ -924,7 +796,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unreg:
 	ib_dereg_mr(mr);
@@ -969,8 +841,7 @@ ssize_t ib_uverbs_rereg_mr(struct ib_uverbs_file *file,
 	     (cmd.start & ~PAGE_MASK) != (cmd.hca_va & ~PAGE_MASK)))
 			return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 
 	if (!uobj)
 		return -EINVAL;
@@ -1039,7 +910,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mr_idr, cmd.mr_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mr_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1054,7 +925,7 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1114,7 +985,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	atomic_inc(&pd->usecnt);
 
 	uobj->object = mw;
-	ret = idr_add_uobj(&ib_uverbs_mw_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_unalloc;
 
@@ -1141,7 +1012,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_unalloc:
 	uverbs_dealloc_mw(mw);
@@ -1167,7 +1038,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof(cmd)))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_mw_idr, cmd.mw_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.mw_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 
@@ -1182,7 +1053,7 @@ ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1296,7 +1167,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	atomic_set(&cq->usecnt, 0);
 
 	obj->uobject.object = cq;
-	ret = idr_add_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	ret = idr_add_uobj(&obj->uobject);
 	if (ret)
 		goto err_free;
 
@@ -1322,7 +1193,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	return obj;
 
 err_cb:
-	idr_remove_uobj(&ib_uverbs_cq_idr, &obj->uobject);
+	idr_remove_uobj(&obj->uobject);
 
 err_free:
 	ib_destroy_cq(cq);
@@ -1592,7 +1463,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_cq_idr, cmd.cq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.cq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	cq      = uobj->object;
@@ -1608,7 +1479,7 @@ ssize_t ib_uverbs_destroy_cq(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -1779,7 +1650,7 @@ static int create_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -1825,7 +1696,7 @@ static int create_qp(struct ib_uverbs_file *file,
 
 	return 0;
 err_cb:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2009,7 +1880,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	qp->uobject = &obj->uevent.uobject;
 
 	obj->uevent.uobject.object = qp;
-	ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -2038,7 +1909,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	return in_len;
 
 err_remove:
-	idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_qp(qp);
@@ -2278,7 +2149,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 
 	memset(&resp, 0, sizeof resp);
 
-	uobj = idr_write_uobj(&ib_uverbs_qp_idr, cmd.qp_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.qp_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	qp  = uobj->object;
@@ -2301,7 +2172,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 	if (obj->uxrcd)
 		atomic_dec(&obj->uxrcd->refcnt);
 
-	idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -2753,7 +2624,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	ah->uobject  = uobj;
 	uobj->object = ah;
 
-	ret = idr_add_uobj(&ib_uverbs_ah_idr, uobj);
+	ret = idr_add_uobj(uobj);
 	if (ret)
 		goto err_destroy;
 
@@ -2778,7 +2649,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	return in_len;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 err_destroy:
 	ib_destroy_ah(ah);
@@ -2803,7 +2674,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_ah_idr, cmd.ah_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.ah_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	ah = uobj->object;
@@ -2817,7 +2688,7 @@ ssize_t ib_uverbs_destroy_ah(struct ib_uverbs_file *file,
 	if (ret)
 		return ret;
 
-	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3085,7 +2956,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 	flow_id->uobject = uobj;
 	uobj->object = flow_id;
 
-	err = idr_add_uobj(&ib_uverbs_rule_idr, uobj);
+	err = idr_add_uobj(uobj);
 	if (err)
 		goto destroy_flow;
 
@@ -3110,7 +2981,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		kfree(kern_flow_attr);
 	return 0;
 err_copy:
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 destroy_flow:
 	ib_destroy_flow(flow_id);
 err_free:
@@ -3145,8 +3016,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 	if (cmd.comp_mask)
 		return -EINVAL;
 
-	uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle,
-			      file->ucontext);
+	uobj = idr_write_uobj(cmd.flow_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	flow_id = uobj->object;
@@ -3157,7 +3027,7 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file,
 
 	put_uobj_write(uobj);
 
-	idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
@@ -3245,7 +3115,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	atomic_set(&srq->usecnt, 0);
 
 	obj->uevent.uobject.object = srq;
-	ret = idr_add_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	ret = idr_add_uobj(&obj->uevent.uobject);
 	if (ret)
 		goto err_destroy;
 
@@ -3279,7 +3149,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	return 0;
 
 err_copy:
-	idr_remove_uobj(&ib_uverbs_srq_idr, &obj->uevent.uobject);
+	idr_remove_uobj(&obj->uevent.uobject);
 
 err_destroy:
 	ib_destroy_srq(srq);
@@ -3455,7 +3325,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
 
-	uobj = idr_write_uobj(&ib_uverbs_srq_idr, cmd.srq_handle, file->ucontext);
+	uobj = idr_write_uobj(cmd.srq_handle, file->ucontext);
 	if (!uobj)
 		return -EINVAL;
 	srq = uobj->object;
@@ -3476,7 +3346,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_uverbs_file *file,
 		atomic_dec(&us->uxrcd->refcnt);
 	}
 
-	idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+	idr_remove_uobj(uobj);
 
 	mutex_lock(&file->mutex);
 	list_del(&uobj->list);
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 31f422a..5dce312 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -51,6 +51,7 @@
 #include <rdma/ib.h>
 
 #include "uverbs.h"
+#include "uidr.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand userspace verbs access");
@@ -66,17 +67,6 @@ enum {
 
 static struct class *uverbs_class;
 
-DEFINE_SPINLOCK(ib_uverbs_idr_lock);
-DEFINE_IDR(ib_uverbs_pd_idr);
-DEFINE_IDR(ib_uverbs_mr_idr);
-DEFINE_IDR(ib_uverbs_mw_idr);
-DEFINE_IDR(ib_uverbs_ah_idr);
-DEFINE_IDR(ib_uverbs_cq_idr);
-DEFINE_IDR(ib_uverbs_qp_idr);
-DEFINE_IDR(ib_uverbs_srq_idr);
-DEFINE_IDR(ib_uverbs_xrcd_idr);
-DEFINE_IDR(ib_uverbs_rule_idr);
-
 static DEFINE_SPINLOCK(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
@@ -227,7 +217,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
 		struct ib_ah *ah = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_ah(ah);
 		kfree(uobj);
 	}
@@ -236,7 +226,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mw_list, list) {
 		struct ib_mw *mw = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+		idr_remove_uobj(uobj);
 		uverbs_dealloc_mw(mw);
 		kfree(uobj);
 	}
@@ -244,7 +234,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->rule_list, list) {
 		struct ib_flow *flow_id = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_rule_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_flow(flow_id);
 		kfree(uobj);
 	}
@@ -254,7 +244,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uqp_object *uqp =
 			container_of(uobj, struct ib_uqp_object, uevent.uobject);
 
-		idr_remove_uobj(&ib_uverbs_qp_idr, uobj);
+		idr_remove_uobj(uobj);
 		if (qp != qp->real_qp) {
 			ib_close_qp(qp);
 		} else {
@@ -270,7 +260,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uevent_object *uevent =
 			container_of(uobj, struct ib_uevent_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_srq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_srq(srq);
 		ib_uverbs_release_uevent(file, uevent);
 		kfree(uevent);
@@ -282,7 +272,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_ucq_object *ucq =
 			container_of(uobj, struct ib_ucq_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_cq_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_destroy_cq(cq);
 		ib_uverbs_release_ucq(file, ev_file, ucq);
 		kfree(ucq);
@@ -291,7 +281,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->mr_list, list) {
 		struct ib_mr *mr = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_mr_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dereg_mr(mr);
 		kfree(uobj);
 	}
@@ -302,7 +292,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 		struct ib_uxrcd_object *uxrcd =
 			container_of(uobj, struct ib_uxrcd_object, uobject);
 
-		idr_remove_uobj(&ib_uverbs_xrcd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_uverbs_dealloc_xrcd(file->device, xrcd);
 		kfree(uxrcd);
 	}
@@ -311,7 +301,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	list_for_each_entry_safe(uobj, tmp, &context->pd_list, list) {
 		struct ib_pd *pd = uobj->object;
 
-		idr_remove_uobj(&ib_uverbs_pd_idr, uobj);
+		idr_remove_uobj(uobj);
 		ib_dealloc_pd(pd);
 		kfree(uobj);
 	}
@@ -1326,13 +1316,6 @@ static void __exit ib_uverbs_cleanup(void)
 	unregister_chrdev_region(IB_UVERBS_BASE_DEV, IB_UVERBS_MAX_DEVICES);
 	if (overflow_maj)
 		unregister_chrdev_region(overflow_maj, IB_UVERBS_MAX_DEVICES);
-	idr_destroy(&ib_uverbs_pd_idr);
-	idr_destroy(&ib_uverbs_mr_idr);
-	idr_destroy(&ib_uverbs_mw_idr);
-	idr_destroy(&ib_uverbs_ah_idr);
-	idr_destroy(&ib_uverbs_cq_idr);
-	idr_destroy(&ib_uverbs_qp_idr);
-	idr_destroy(&ib_uverbs_srq_idr);
 }
 
 module_init(ib_uverbs_init);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 432bed5..14bfe3b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1706,6 +1706,10 @@ struct ib_device {
 
 	struct iw_cm_verbs	     *iwcm;
 
+	struct idr		idr;
+	/* Global lock in use to safely release device IDR */
+	spinlock_t		idr_lock;
+
 	/**
 	 * alloc_hw_stats - Allocate a struct rdma_hw_stats and fill in the
 	 *   driver initialized data.  The struct is kfree()'ed by the sysfs
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-06-30 13:39   ` [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
       [not found]     ` <1467293971-25688-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 5/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

From: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

The new ioctl infrastructure supports driver specific types.
This is implemented by having a list of uverbs_uobject_type in the
ib_device. Each element of this list corresponds to a specific type
and specifies its free function, driver's type_id, etc.
The order of elements dictates the order of process release.
The whole type_list should be initialized before any ucontext was
created.

When a ucontext is created, a new list is created in this ib_ucontext.
This list corresponds to the ib_device's type list, as any element
in the ib_dev's type list has a corresponding element in this list.
Each element in the ucontext's list points to its respective
corresponding element in the ib_dev's type list.
In addition, it has a data structure (currently implemented by a list,
but should probably move to using a hash) that maps all ib_uobjects
of the same ib_ucontext and the respective type.

+-------------------------------------------------------------------+
|    ib_device                                                      |
|   +--------------+      +--------------+      +----------------+  |
|   |uobject_type  |      |              |      |                |  |
|   |free_fn       |      |              |      |                |  |
|   |              +----->+              +----->+                |  |
|   |              |      |              |      |                |  |
|   +----^---------+      +--------------+      +----------------+  |
| +------|                                                          |
+-------------------------------------------------------------------+
  |
  |
  |
+-------------------------------------------------------------------+
| |  ib_ucontext                                                    |
| | +--------------+      +--------------+      +----------------+  |
| | |uobject_list  |      |              |      |                |  |
| +-+type          |      |              |      |                |  |
|   |list+         +----->+              +----->+                |  |
|   |    |         |      |              |      |                |  |
|   +--------------+      +--------------+      +----------------+  |
|        |                                                          |
+-------------------------------------------------------------------+
         |
         |
         |
         |
         |  +-----------+        +------------+
         |  | ib_uobject|        |ib_uobject  |
         +-->           +------> |            |
            |           |        |            |
            +-----------+        +------------+

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uobject.c | 83 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uobject.h | 16 ++++++++
 include/rdma/ib_verbs.h           |  8 ++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index 86f6460..bc15be1 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -35,6 +35,89 @@
 #include "uobject.h"
 #include "uidr.h"
 
+int ib_uverbs_uobject_type_add(struct list_head	*head,
+			       void (*free)(struct uverbs_uobject_type *uobject_type,
+					    struct ib_uobject *uobject,
+					    struct ib_ucontext *ucontext),
+			       uint16_t	obj_type)
+{
+	/*
+	 * Allocate a new object type for the vendor, this should be done when a
+	 * vendor is initialized.
+	 */
+	struct uverbs_uobject_type *uobject_type;
+
+	uobject_type = kzalloc(sizeof(*uobject_type), GFP_KERNEL);
+	if (!uobject_type)
+		return -ENOMEM;
+
+	uobject_type->free = free;
+	uobject_type->obj_type = obj_type;
+	list_add_tail(&uobject_type->type_list, head);
+	return 0;
+}
+
+void ib_uverbs_uobject_type_remove(struct uverbs_uobject_type *uobject_type)
+{
+	/*
+	 * Allocate a new object type for the vendor, this should be done when a
+	 * vendor is initialized.
+	 */
+	WARN_ON(list_empty(&uobject_type->type_list));
+	list_del(&uobject_type->type_list);
+	kfree(uobject_type);
+}
+
+void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext *ucontext)
+{
+	struct uverbs_uobject_list *uobject_list, *next_list;
+
+	list_for_each_entry_safe(uobject_list, next_list,
+				 &ucontext->uobjects_lists, type_list) {
+		struct uverbs_uobject_type *type = uobject_list->type;
+		struct ib_uobject *obj, *next_obj;
+
+		list_for_each_entry_safe(obj, next_obj, &uobject_list->list,
+					 idr_list) {
+			/* TODO */
+			type->free(type, obj, ucontext);
+			list_del(&obj->idr_list);
+		}
+
+		list_del(&uobject_list->type_list);
+	}
+}
+
+int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext *ucontext,
+					       struct list_head *type_list)
+{
+	/* create typed list in ucontext */
+	struct uverbs_uobject_type *type;
+	int err;
+
+	INIT_LIST_HEAD(&ucontext->uobjects_lists);
+
+	list_for_each_entry(type, type_list, type_list) {
+		struct uverbs_uobject_list *cur;
+
+		cur = kzalloc(sizeof(*cur), GFP_KERNEL);
+		if (!cur) {
+			err = -ENOMEM;
+			goto err;
+		}
+
+		cur->type = type;
+		INIT_LIST_HEAD(&cur->list);
+		list_add_tail(&cur->type_list, &ucontext->uobjects_lists);
+	}
+
+	return 0;
+
+err:
+	ib_uverbs_uobject_type_cleanup_ucontext(ucontext);
+	return err;
+}
+
 /*
  * The ib_uobject locking scheme is as follows:
  *
diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h
index 2958ef7..40f6ff1 100644
--- a/drivers/infiniband/core/uobject.h
+++ b/drivers/infiniband/core/uobject.h
@@ -42,6 +42,22 @@ struct uverbs_lock_class {
 	char			name[16];
 };
 
+struct uverbs_uobject_type {
+	struct list_head	type_list;
+	void (*free)(struct uverbs_uobject_type *uobject_type,
+		     struct ib_uobject *uobject,
+		     struct ib_ucontext *ucontext);
+	u16			obj_type;
+	struct uverbs_lock_class lock_class;
+};
+
+/* embed in ucontext per type */
+struct uverbs_uobject_list {
+	struct uverbs_uobject_type	*type;
+	struct list_head		list;
+	struct list_head		type_list;
+};
+
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
 	       struct ib_ucontext *context, struct uverbs_lock_class *c);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 14bfe3b..c18155f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1325,6 +1325,8 @@ struct ib_ucontext {
 	struct list_head	rule_list;
 	int			closing;
 
+	struct list_head	uobjects_lists;
+
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	struct rb_root      umem_tree;
@@ -1354,6 +1356,9 @@ struct ib_uobject {
 	struct rw_semaphore	mutex;		/* protects .live */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
+	/* List of object under uverbs_object_type */
+	struct list_head	idr_list;
+	struct uverbs_uobject_list *type;	/* ptr to ucontext type */
 };
 
 struct ib_udata {
@@ -1960,6 +1965,9 @@ struct ib_device {
 	 * in fast paths.
 	 */
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
+	struct list_head type_list;
+
+	const struct uverbs_types	*types;
 };
 
 struct ib_client {
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 5/8] RDMA/core: Introduce add/remove uobj from types
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-06-30 13:39   ` [RFC ABI V1 4/8] RDMA/core: Add support for custom types Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
  2016-06-30 13:39   ` [RFC ABI V1 6/8] RDMA/core: Add new ioctl interface Matan Barak
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

We introduce here adding new user objects to the type list.
Adding an object is done in two parts. First, an object is allocated
and added to IDR. Then, the command's handlers (in downstream patches)
could work on this object and fill its required details.
After a successful command, ib_uverbs_uobject_enable is called and
this user object becomes ucontext visible.

Removing an uboject is done by calling ib_uverbs_uobject_remove.

We should make sure IDR (per-device) and lists (per-ucontext) could
be accessed concurrently without corrupting them (some parts are
missing here).

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uobject.c | 33 +++++++++++++++++++++++++++++++++
 drivers/infiniband/core/uobject.h |  5 +++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index bc15be1..c480b3b 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -118,6 +118,39 @@ err:
 	return err;
 }
 
+int ib_uverbs_uobject_add(struct ib_uobject *uobject,
+			  struct uverbs_uobject_type *uobject_type)
+{
+	int ret = -EINVAL;
+	struct uverbs_uobject_list *type;
+
+	/* TODO: add locking */
+	list_for_each_entry(type, &uobject->context->uobjects_lists, type_list)
+		if (type->type == uobject_type) {
+			uobject->type = type;
+			ret = idr_add_uobj(uobject);
+			return ret;
+		}
+
+	return ret;
+}
+
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject)
+{
+	spin_lock(&uobject->context->device->idr_lock);
+	list_del(&uobject->idr_list);
+	spin_unlock(&uobject->context->device->idr_lock);
+	idr_remove_uobj(uobject);
+}
+
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
+{
+	spin_lock(&uobject->context->device->idr_lock);
+	list_add(&uobject->idr_list, &uobject->type->list);
+	spin_unlock(&uobject->context->device->idr_lock);
+	uobject->live = 1;
+}
+
 /*
  * The ib_uobject locking scheme is as follows:
  *
diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h
index 40f6ff1..13fdaef 100644
--- a/drivers/infiniband/core/uobject.h
+++ b/drivers/infiniband/core/uobject.h
@@ -58,6 +58,11 @@ struct uverbs_uobject_list {
 	struct list_head		type_list;
 };
 
+int ib_uverbs_uobject_add(struct ib_uobject *uobject,
+			  struct uverbs_uobject_type *uobject_type);
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
+
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
 	       struct ib_ucontext *context, struct uverbs_lock_class *c);
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 6/8] RDMA/core: Add new ioctl interface
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-06-30 13:39   ` [RFC ABI V1 5/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
  2016-06-30 13:39   ` [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject Matan Barak
  2016-06-30 13:39   ` [RFC ABI V1 8/8] RDMA/{hw, core}: Provide simple skeleton to IOCTL interface Matan Barak
  7 siblings, 0 replies; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

In this proposed ioctl interface, processing the command starts from
validating properties of the command and fetching the appropriate user
objects before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported types are declared.
For each type we list all supported actions. Each action specifies a
handler, which could be either be a standard function or a vendor
specific function.
Along with the handler, a chain is specified as well. This chain
lists all supported attributes and is used for automatically
fetching and validating the command, response and its related objects.

Each chain consists of a validator_chains and a distribution function
which maps different command attributes to the different validators.
It also maps the attribute from the KABI world (which is vendor
specific) to the common kernel language.
Currently, in order to ease transition, we have a standard
distribution function which maps the attributes using the upper
attribute type bit.

A chain is an array of attributes. Each attribute has a type (PTR_IN,
PTR_OUT, IDR and in the future maybe FD, which could be used for
completion channel) and a length. Future work here might add
validation of mandatory attributes (i.e - make sure a specific
attribute was given).

If an IDR attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject management is done automatically by the infrastructure,
meaning - the infrastructure will fail when executing concurrent
actions and least one of these actions requires exclusive access
(WRITE/DESTROY).
Automatically synchronizing actions with device removals (dissociate
context events) and taking care of reference counting
(increase/decrease) is also one of the infrastructure goals.
Automatic reference counting isn't coded yet, but the general concept
here is to add (INCREASE_REF, DECREASE_REF) to the access
specification field and update these reference in success
automatically.

 types
+--------+
|        |                                                                          +--------+
|        |   actions    action      action_spec                           +-----+   |len     |
+--------+  +------+   +-------+   +----------------+   +------------+    |attr1+-> |type    |
| type   +> |action+-> |chain  +-> +validator_chains+-> |common_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |distribution_fn |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |vendor chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *ctx, size_t num, void *priv);

Where ctx is an array of uverbs_attr_array. Each element in this array
is an array of attributes which corresponds to one validator.
For example, in the usually used case:

 ctx                               core
+----------------------------+     +------------+
| core: uverbs_attr_array    +---> | valid      |
+----------------------------+     | cmd_attr   |
| driver: uverbs_attr_array  |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  driver
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+

Ctx array's indices corresponds to the various validators in the chain
spec and is given by the same order. The indices of core and driver
corresponds to the attributes name spaces of each validator.
Thus, we could think of the following as belong to one "object":
1. Set of attribute specification (with their attribute IDs)
2. Validator which owns (1) specifications
3. A function which could handle these attributes which the action's
   handler probably calls.

That means that core and driver attributes are the validated arguments
of function (3). Their "types" exist in this function scope in the
same way as bar exists in foo's scope in the following example:
int foo(some_type_t bar);

Since the common and vendor distinction is the frequent use case, we
propose a wrapper that allows a definition of the following handler:
int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
               struct uverbs_attr_array *common,
               struct uverbs_attr_array *vendor,
               void *priv);

Upon success, reference count of uobjects and use count will be a
updated automatically according to the specification.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/Makefile           |   2 +-
 drivers/infiniband/core/uidr.c             | 129 +++++++++++++
 drivers/infiniband/core/uidr.h             |  13 ++
 drivers/infiniband/core/uobject.c          |   2 +-
 drivers/infiniband/core/uverbs.h           |   4 +
 drivers/infiniband/core/uverbs_ioctl.c     | 279 +++++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_ioctl_cmd.c |  57 ++++++
 drivers/infiniband/core/uverbs_main.c      |   6 +
 include/rdma/ib_verbs.h                    |   4 +
 include/rdma/uverbs_ioctl.h                | 202 +++++++++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  53 ++++++
 include/uapi/rdma/ib_user_ioctl.h          |  23 +++
 12 files changed, 772 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 drivers/infiniband/core/uverbs_ioctl_cmd.c
 create mode 100644 include/rdma/uverbs_ioctl.h
 create mode 100644 include/rdma/uverbs_ioctl_cmd.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index c8bf8de..45caf7b 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -29,4 +29,4 @@ ib_umad-y :=			user_mad.o
 ib_ucm-y :=			ucm.o
 
 ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
-				uidr.o uobject.o
+				uidr.o uobject.o uverbs_ioctl.o uverbs_ioctl_cmd.o
diff --git a/drivers/infiniband/core/uidr.c b/drivers/infiniband/core/uidr.c
index 1acdf64..72c4c77 100644
--- a/drivers/infiniband/core/uidr.c
+++ b/drivers/infiniband/core/uidr.c
@@ -31,9 +31,138 @@
  */
 
 #include <rdma/ib_verbs.h>
+#include <rdma/uverbs_ioctl.h>
 #include "uidr.h"
 #include "uobject.h"
 
+struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
+					    uint16_t type)
+{
+	struct uverbs_uobject_type *uobj_type;
+
+	list_for_each_entry(uobj_type, &ibdev->type_list, type_list) {
+		if (uobj_type->obj_type == type)
+			return uobj_type;
+	}
+
+	return NULL;
+}
+
+static int uverbs_lock_object(struct ib_uobject *uobj, int access)
+{
+	if (access == UVERBS_IDR_ACCESS_READ)
+		return __atomic_add_unless(&uobj->usecnt, 1, -1) == -1 ?
+			-EBUSY : 0;
+	else
+		/* lock is either WRITE or DESTROY - should be exclusive */
+		return atomic_cmpxchg(&uobj->usecnt, 0, -1) == 0 ? 0 : -EBUSY;
+}
+
+static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
+						   const struct uverbs_uobject_type *type,
+						   uint32_t idr)
+{
+	struct uverbs_uobject_list *iter;
+	struct ib_uobject *uobj;
+
+	/* TODO: use something smarter.... hash? */
+	list_for_each_entry(iter, &ucontext->uobjects_lists, type_list)
+		if (iter->type == type)
+			list_for_each_entry(uobj, &iter->list, idr_list)
+				if (uobj->id == idr)
+					return uobj;
+
+	return NULL;
+}
+
+struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
+					    struct ib_ucontext *ucontext,
+					    int access,
+					    uint32_t idr)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	if (access == UVERBS_IDR_ACCESS_NEW) {
+		uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
+		if (!uobj)
+			return ERR_PTR(-ENOMEM);
+
+		init_uobj(uobj, 0, ucontext, &type->lock_class);
+
+		/* lock idr */
+		ret = ib_uverbs_uobject_add(uobj, type);
+		if (ret)
+			goto free_uobj;
+
+		ret = uverbs_lock_object(uobj, access);
+		if (ret)
+			goto remove_uobj;
+	} else {
+		uobj = get_uobject_from_context(ucontext, type, idr);
+
+		if (uobj) {
+			ret = uverbs_lock_object(uobj, access);
+			if (ret)
+				return ERR_PTR(ret);
+			return uobj;
+		}
+
+		return ERR_PTR(-ENOENT);
+	}
+remove_uobj:
+	ib_uverbs_uobject_remove(uobj);
+free_uobj:
+	kfree(uobj);
+	return ERR_PTR(ret);
+}
+
+static void uverbs_unlock_object(struct ib_uobject *uobj, int access)
+{
+	if (access == UVERBS_IDR_ACCESS_READ) {
+		atomic_dec(&uobj->usecnt);
+	} else {
+		if (access == UVERBS_IDR_ACCESS_NEW)
+			ib_uverbs_uobject_enable(uobj);
+
+		if (access == UVERBS_IDR_ACCESS_WRITE ||
+		    access == UVERBS_IDR_ACCESS_NEW)
+			atomic_set(&uobj->usecnt, 0);
+
+		if (access == UVERBS_IDR_ACCESS_DESTROY)
+			ib_uverbs_uobject_remove(uobj);
+	}
+}
+
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct action_spec *chain,
+			   bool success)
+{
+	unsigned int i;
+
+	for (i = 0; i < num; i++) {
+		struct uverbs_attr_array *attr_spec_array = &attr_array[i];
+		const struct uverbs_attr_chain_spec *chain_spec =
+			chain->validator_chains[i];
+		unsigned int j;
+
+		for (j = 0; j < attr_spec_array->num_attrs; j++) {
+			struct uverbs_attr *attr = &attr_spec_array->attrs[j];
+			struct uverbs_attr_spec *spec = &chain_spec->attrs[j];
+
+			if (spec->type != UVERBS_ATTR_TYPE_IDR || !attr->valid)
+				continue;
+
+			/* TODO: if (!success) -> reduce refcount, otherwise
+			 * fetching from idr already increased the refcount
+			 */
+			uverbs_unlock_object(attr->obj_attr.uobject,
+					     spec->idr.access);
+		}
+	}
+}
+
 int idr_add_uobj(struct ib_uobject *uobj)
 {
 	int ret;
diff --git a/drivers/infiniband/core/uidr.h b/drivers/infiniband/core/uidr.h
index 581b394..9eb1e66 100644
--- a/drivers/infiniband/core/uidr.h
+++ b/drivers/infiniband/core/uidr.h
@@ -38,7 +38,20 @@
 #define UIDR_H
 
 #include <linux/idr.h>
+#include <rdma/uverbs_ioctl.h>
 
+struct uverbs_uobject_type *uverbs_get_type(struct ib_device *ibdev,
+					    uint16_t type);
+struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
+					    struct ib_ucontext *ucontext,
+					    int access,
+					    uint32_t idr);
+void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
+void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
+void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
+			   size_t num,
+			   const struct action_spec *chain,
+			   bool success);
 int idr_add_uobj(struct ib_uobject *uobj);
 void idr_remove_uobj(struct ib_uobject *uobj);
 struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context);
diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index c480b3b..c3d1098 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -31,7 +31,7 @@
  */
 
 #include <rdma/ib_verbs.h>
-
+#include <rdma/uverbs_ioctl.h>
 #include "uobject.h"
 #include "uidr.h"
 
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 11bad7a..d2e964e 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -42,6 +42,7 @@
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/cdev.h>
+#include <linux/rwsem.h>
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_umem.h>
@@ -84,6 +85,8 @@
  * released when the CQ is destroyed.
  */
 
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+
 struct ib_uverbs_device {
 	atomic_t				refcount;
 	int					num_comp_vectors;
@@ -122,6 +125,7 @@ struct ib_uverbs_file {
 	struct ib_uverbs_event_file	       *async_file;
 	struct list_head			list;
 	int					is_closed;
+	struct rw_semaphore			close_sem;
 };
 
 struct ib_uverbs_event {
diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
new file mode 100644
index 0000000..6160151
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -0,0 +1,279 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_user_ioctl.h>
+#include <rdma/uverbs_ioctl.h>
+#include "uidr.h"
+#include "uverbs.h"
+
+static int uverbs_validate_attr(struct ib_device *ibdev,
+				struct ib_ucontext *ucontext,
+				const struct ib_uverbs_attr *uattr,
+				u16 attr_id,
+				const struct uverbs_attr_chain_spec *chain_spec,
+				struct uverbs_attr *elements)
+{
+	const struct uverbs_attr_spec *spec;
+	struct uverbs_attr *e;
+
+	if (uattr->reserved)
+		return -EINVAL;
+
+	if (attr_id >= chain_spec->num_attrs)
+		return -EINVAL;
+
+	spec = &chain_spec->attrs[attr_id];
+	e = &elements[attr_id];
+
+	if (e->valid)
+		return -EINVAL;
+
+	switch (spec->type) {
+	case UVERBS_ATTR_TYPE_PTR_IN:
+	case UVERBS_ATTR_TYPE_PTR_OUT:
+		if (uattr->len != spec->len)
+			return -EINVAL;
+		e->cmd_attr.ptr = uattr->ptr_idr;
+		e->cmd_attr.len = uattr->len;
+		break;
+
+	case UVERBS_ATTR_TYPE_IDR:
+		if (uattr->len != 0 || (uattr->ptr_idr >> 32) || (!ucontext))
+			return -EINVAL;
+
+		e->obj_attr.idr = (uint32_t)uattr->ptr_idr;
+		e->obj_attr.val = spec;
+		e->obj_attr.type = uverbs_get_type(ibdev,
+						   spec->idr.idr_type);
+		if (!e->obj_attr.type)
+			return -EINVAL;
+
+		e->obj_attr.uobject = uverbs_get_type_from_idr(e->obj_attr.type,
+							       ucontext,
+							       spec->idr.access,
+							       e->obj_attr.idr);
+		if (!e->obj_attr.uobject)
+			return -EINVAL;
+
+		break;
+	};
+
+	e->valid = 1;
+	return 0;
+}
+
+static int uverbs_validate(struct ib_device *ibdev,
+			   struct ib_ucontext *ucontext,
+			   const struct ib_uverbs_attr *uattrs,
+			   size_t num_attrs,
+			   const struct action_spec *action_spec,
+			   struct uverbs_attr_array *attr_array)
+{
+	size_t i;
+	int ret;
+	int n_val = -1;
+
+	for (i = 0; i < num_attrs; i++) {
+		const struct ib_uverbs_attr *uattr = &uattrs[i];
+		__u16 attr_id = uattr->attr_id;
+		const struct uverbs_attr_chain_spec *chain_spec;
+
+		ret = action_spec->dist(&attr_id, action_spec->priv);
+		if (ret < 0)
+			return ret;
+
+		if (ret > n_val)
+			n_val = ret;
+
+		chain_spec = action_spec->validator_chains[ret];
+		ret = uverbs_validate_attr(ibdev, ucontext, uattr, attr_id,
+					   chain_spec, attr_array[ret].attrs);
+		if (ret)
+			return ret;
+
+		if (attr_array[ret].num_attrs < attr_id)
+			attr_array[ret].num_attrs = attr_id;
+	}
+
+	return n_val >= 0 ? n_val + 1 : n_val;
+}
+
+static int uverbs_handle_action(const struct ib_uverbs_attr *uattrs,
+				size_t num_attrs,
+				struct ib_device *ibdev,
+				struct ib_ucontext *ucontext,
+				const struct uverbs_action *handler,
+				struct uverbs_attr_array *attr_array)
+{
+	int ret;
+	int n_val;
+
+	n_val = uverbs_validate(ibdev, ucontext, uattrs, num_attrs,
+				&handler->chain, attr_array);
+	if (n_val <= 0)
+		return n_val;
+
+	ret = handler->handler(ibdev, ucontext, attr_array, n_val,
+			       handler->priv);
+	uverbs_unlock_objects(attr_array, n_val, &handler->chain, !ret);
+
+	return ret;
+}
+
+static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
+				struct ib_uverbs_file *file,
+				struct ib_uverbs_ioctl_hdr *hdr,
+				void __user *buf)
+{
+	const struct uverbs_type_actions *type;
+	const struct uverbs_action *action;
+	const struct action_spec *action_spec;
+	long err = 0;
+	unsigned int num_specs = 0;
+	unsigned int i;
+	struct {
+		struct ib_uverbs_attr		*uattrs;
+		struct uverbs_attr_array	*uverbs_attr_array;
+	} *ctx = NULL;
+	struct uverbs_attr *curr_attr;
+	size_t ctx_size;
+
+	if (!ib_dev)
+		return -EIO;
+
+	if (ib_dev->types->num_types < hdr->object_type ||
+	    ib_dev->driver_id != hdr->driver_id)
+		return -EINVAL;
+
+	type = ib_dev->types->types[hdr->object_type];
+	if (!type || type->num_actions < hdr->action)
+		return -EOPNOTSUPP;
+
+	action = &type->actions[hdr->action];
+	if (!action)
+		return -EOPNOTSUPP;
+
+	action_spec = &action->chain;
+	for (i = 0; i < action_spec->num_chains;
+	     num_specs += action_spec->validator_chains[i]->num_attrs, i++)
+		;
+
+	ctx_size = sizeof(*ctx->uattrs) * hdr->num_attrs +
+		   sizeof(*ctx->uverbs_attr_array->attrs) * num_specs +
+		   sizeof(struct uverbs_attr_array) * action_spec->num_chains +
+		   sizeof(*ctx);
+
+	ctx = kzalloc(ctx_size, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->uverbs_attr_array = (void *)(ctx + 1);
+	ctx->uattrs = (void *)(&ctx->uverbs_attr_array + 1);
+	curr_attr = (void *)(ctx->uattrs + hdr->num_attrs);
+	for (i = 0; i < action_spec->num_chains; i++) {
+		ctx->uverbs_attr_array[i].attrs = curr_attr;
+		ctx->uverbs_attr_array[i].num_attrs =
+			action_spec->validator_chains[i]->num_attrs;
+		curr_attr += action_spec->validator_chains[i]->num_attrs;
+	}
+
+	err = copy_from_user(ctx->uattrs, hdr->attrs,
+			     sizeof(*ctx->uattrs) * hdr->num_attrs);
+	if (err) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	err = uverbs_handle_action(ctx->uattrs, hdr->num_attrs, ib_dev,
+				   file->ucontext, action, ctx->uverbs_attr_array);
+out:
+	kfree(ctx);
+	return err;
+}
+
+#define IB_UVERBS_MAX_CMD_SZ 4096
+
+long ib_uverbs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	struct ib_uverbs_file *file = filp->private_data;
+	struct ib_uverbs_ioctl_hdr __user *user_hdr =
+		(struct ib_uverbs_ioctl_hdr __user *)arg;
+	struct ib_uverbs_ioctl_hdr hdr;
+	struct ib_device *ib_dev;
+	int srcu_key;
+	long err;
+
+	srcu_key = srcu_read_lock(&file->device->disassociate_srcu);
+	ib_dev = srcu_dereference(file->device->ib_dev,
+				  &file->device->disassociate_srcu);
+	if (!ib_dev) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (cmd == IB_USER_DIRECT_IOCTL_COMMAND) {
+		/* TODO? */
+		err = -ENOSYS;
+		goto out;
+	} else {
+		if (cmd != IB_USER_VERBS_IOCTL_COMMAND) {
+			err = -ENOIOCTLCMD;
+			goto out;
+		}
+
+		err = copy_from_user(&hdr, user_hdr, sizeof(hdr));
+
+		if (err || hdr.length > IB_UVERBS_MAX_CMD_SZ ||
+		    hdr.length <= sizeof(hdr) ||
+		    hdr.length != sizeof(hdr) + hdr.num_attrs * sizeof(struct ib_uverbs_attr)) {
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* currently there are no flags supported */
+		if (hdr.flags) {
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
+		/* We're closing, fail all commands */
+		if (!down_read_trylock(&file->close_sem))
+			return -EIO;
+		err = ib_uverbs_cmd_verbs(ib_dev, file, &hdr,
+					  (__user void *)arg + sizeof(hdr));
+		up_read(&file->close_sem);
+	}
+out:
+	srcu_read_unlock(&file->device->disassociate_srcu, srcu_key);
+
+	return err;
+}
diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
new file mode 100644
index 0000000..d38b7ec
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -0,0 +1,57 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/uverbs_ioctl_cmd.h>
+#include <linux/bug.h>
+
+#define IB_UVERBS_VENDOR_FLAG	0x8000
+
+int ib_uverbs_std_dist(__u16 *attr_id, void *priv)
+{
+	if (*attr_id & IB_UVERBS_VENDOR_FLAG) {
+		*attr_id &= ~IB_UVERBS_VENDOR_FLAG;
+		return 1;
+	}
+	return 0;
+}
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv)
+{
+	struct uverbs_action_std_handler *priv = _priv;
+
+	WARN_ON(num != 2);
+
+	return priv->handler(ib_dev, ucontext, &ctx[0], &ctx[1], priv->priv);
+}
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 5dce312..ab20647 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -49,6 +49,7 @@
 #include <asm/uaccess.h>
 
 #include <rdma/ib.h>
+#include <rdma/ib_ioctl.h>
 
 #include "uverbs.h"
 #include "uidr.h"
@@ -212,6 +213,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 {
 	struct ib_uobject *uobj, *tmp;
 
+	down_write(&file->close_sem);
 	context->closing = 1;
 
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
@@ -307,6 +309,7 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	}
 
 	put_pid(context->tgid);
+	up_write(&file->close_sem);
 
 	return context->device->dealloc_ucontext(context);
 }
@@ -916,6 +919,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
 		goto err;
 	}
 
+	init_rwsem(&file->close_sem);
 	file->device	 = dev;
 	file->ucontext	 = NULL;
 	file->async_file = NULL;
@@ -974,6 +978,7 @@ static const struct file_operations uverbs_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static const struct file_operations uverbs_mmap_fops = {
@@ -983,6 +988,7 @@ static const struct file_operations uverbs_mmap_fops = {
 	.open	 = ib_uverbs_open,
 	.release = ib_uverbs_close,
 	.llseek	 = no_llseek,
+	.unlocked_ioctl = ib_uverbs_ioctl,
 };
 
 static struct ib_client uverbs_client = {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c18155f..e402473 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1346,6 +1346,8 @@ struct ib_ucontext {
 #endif
 };
 
+struct uverbs_object_list;
+
 struct ib_uobject {
 	u64			user_handle;	/* handle given to us by userspace */
 	struct ib_ucontext     *context;	/* associated user context */
@@ -1353,6 +1355,7 @@ struct ib_uobject {
 	struct list_head	list;		/* link to context's list */
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
+	atomic_t		usecnt;
 	struct rw_semaphore	mutex;		/* protects .live */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
@@ -1967,6 +1970,7 @@ struct ib_device {
 	int (*get_port_immutable)(struct ib_device *, u8, struct ib_port_immutable *);
 	struct list_head type_list;
 
+	u16				driver_id;
 	const struct uverbs_types	*types;
 };
 
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
new file mode 100644
index 0000000..625bd6e
--- /dev/null
+++ b/include/rdma/uverbs_ioctl.h
@@ -0,0 +1,202 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_
+#define _UVERBS_IOCTL_
+
+#include <linux/kernel.h>
+
+struct uverbs_object_type;
+struct ib_ucontext;
+struct ib_device;
+
+/*
+ * =======================================
+ *	Verbs action specifications
+ * =======================================
+ */
+
+enum uverbs_attr_type {
+	UVERBS_ATTR_TYPE_PTR_IN,
+	UVERBS_ATTR_TYPE_PTR_OUT,
+	UVERBS_ATTR_TYPE_IDR,
+	/*
+	 * TODO: we could add FD type for command which will migrate the events
+	 * to a specific FD.
+	 */
+};
+
+enum uverbs_idr_access {
+	UVERBS_IDR_ACCESS_READ,
+	UVERBS_IDR_ACCESS_WRITE,
+	UVERBS_IDR_ACCESS_NEW,
+	UVERBS_IDR_ACCESS_DESTROY
+};
+
+struct uverbs_attr_spec {
+	u16				len;
+	enum uverbs_attr_type		type;
+	struct {
+		u16			idr_type;
+		enum uverbs_idr_access	access;
+	} idr;
+	/* TODO: In case of FD, we could validate here the fops pointer */
+};
+
+struct uverbs_attr_chain_spec {
+	struct uverbs_attr_spec		*attrs;
+	size_t				num_attrs;
+};
+
+struct action_spec {
+	const struct uverbs_attr_chain_spec		**validator_chains;
+	/* if > 0 -> validator, otherwise, error */
+	int (*dist)(__u16 *attr_id, void *priv);
+	void						*priv;
+	size_t						num_chains;
+};
+
+struct uverbs_attr_array;
+
+struct uverbs_action {
+	struct action_spec chain;
+	void *priv;
+	int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
+		       struct uverbs_attr_array *ctx, size_t num, void *priv);
+};
+
+struct uverbs_type_actions {
+	size_t				num_actions;
+	const struct uverbs_action	*actions;
+};
+
+struct uverbs_types {
+	size_t					num_types;
+	const struct uverbs_type_actions	**types;
+};
+
+#define UVERBS_ATTR(_id, _len, _type)					\
+	[_id] = {.len = _len, .type = _type}
+#define UVERBS_ATTR_PTR_IN(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_IN)
+#define UVERBS_ATTR_PTR_OUT(_id, _len)					\
+	UVERBS_ATTR(_id, _len, UVERBS_ATTR_TYPE_PTR_OUT)
+#define UVERBS_ATTR_IDR_IN(_id, _idr_type, _access)			\
+	[_id] = {.type = UVERBS_ATTR_TYPE_IDR,				\
+		 .idr = {.idr_type = _idr_type,				\
+			 .access = _access} }
+#define UVERBS_ATTR_CHAIN_SPEC_SZ(...)					\
+	(sizeof((const struct uverbs_attr_spec[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_attr_spec))
+#define UVERBS_ATTR_CHAIN_SPEC(...)					\
+	(const struct uverbs_attr_chain_spec)				\
+	{.attrs = (struct uverbs_attr_spec[]){__VA_ARGS__},		\
+	 .num_attrs = UVERBS_ATTR_CHAIN_SPEC_SZ(__VA_ARGS__)}
+#define DECLARE_UVERBS_ATTR_CHAIN_SPEC(name, ...)			\
+	const struct uverbs_attr_chain_spec name =			\
+		UVERBS_ATTR_CHAIN_SPEC(__VA_ARGS__)
+#define UVERBS_ATTR_ACTION_SPEC_SZ(...)					  \
+	(sizeof((const struct uverbs_attr_chain_spec *[]){__VA_ARGS__}) / \
+				 sizeof(const struct uverbs_attr_chain_spec *))
+#define UVERBS_ATTR_ACTION_SPEC(_distfn, _priv, ...)			\
+	{.dist = _distfn,						\
+	 .priv = _priv,							\
+	 .num_chains =	UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),	\
+	 .validator_chains = (const struct uverbs_attr_chain_spec *[]){__VA_ARGS__} }
+#define UVERBS_STD_ACTION_SPEC(...)						\
+	UVERBS_ATTR_ACTION_SPEC(ib_uverbs_std_dist,				\
+				(void *)UVERBS_ATTR_ACTION_SPEC_SZ(__VA_ARGS__),\
+				__VA_ARGS__)
+#define UVERBS_STD_ACTION(_handler, _priv, ...)				\
+	{								\
+		.priv = &(struct uverbs_action_std_handler)		\
+			{.handler = _handler,				\
+			 .priv = _priv},				\
+		.handler = uverbs_action_std_handle,			\
+		.chain = UVERBS_STD_ACTION_SPEC(__VA_ARGS__)}
+#define UVERBS_ACTIONS_SZ(...)					\
+	(sizeof((const struct uverbs_action []){__VA_ARGS__}) /		\
+	 sizeof(const struct uverbs_action))
+#define UVERBS_ACTION(action_idx, _handler, _priv,  ...)		\
+	[action_idx] = UVERBS_STD_ACTION(_handler, _priv, __VA_ARGS__)
+#define UVERBS_ACTIONS(...)					\
+	((const struct uverbs_type_actions)				\
+	  {.num_actions = UVERBS_ACTIONS_SZ(__VA_ARGS__),		\
+	   .actions = (const struct uverbs_action[]){__VA_ARGS__} })
+#define DECLARE_UVERBS_TYPE(name, ...)					\
+	const struct uverbs_type_actions name = UVERBS_ACTIONS(__VA_ARGS__)
+#define UVERBS_TYPES_SZ(...)						\
+	(sizeof((const struct uverbs_type_actions *[]){__VA_ARGS__}) /	\
+	 sizeof(const struct uverbs_type_actions *))
+#define UVERBS_TYPE_ACTIONS(type_idx, ...)				\
+	[type_idx] = &UVERBS_ACTIONS(__VA_ARGS__)
+#define UVERBS_TYPE(type_idx, type_ptr)					\
+	[type_idx] = ((const struct uverbs_type_actions * const)&type_ptr)
+#define UVERBS_TYPES(...)						\
+	{.num_types = UVERBS_TYPES_SZ(__VA_ARGS__),			\
+	 .types = (const struct uverbs_type_actions *[]){__VA_ARGS__} }
+
+/* =================================================
+ *              Parsing infrastructure
+ * =================================================
+ */
+
+struct uverbs_ptr_attr {
+	__u64	ptr;
+	__u16	len;
+};
+
+struct uverbs_obj_attr {
+	/*  idr handle */
+	__u32	idr;
+	/* pointer to the kernel descriptor -> type, access, etc */
+	const struct uverbs_attr_spec *val;
+	struct ib_uobject		*uobject;
+	struct uverbs_uobject_type	*type;
+};
+
+struct uverbs_attr {
+	bool valid;
+	union {
+		struct uverbs_ptr_attr	cmd_attr;
+		struct uverbs_obj_attr	obj_attr;
+	};
+};
+
+/* output of one validator */
+struct uverbs_attr_array {
+	size_t num_attrs;
+	/* arrays of attrubytes, index is the id i.e SEND_CQ */
+	struct uverbs_attr *attrs;
+};
+
+#endif
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
new file mode 100644
index 0000000..07cf6fa
--- /dev/null
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_CMD_
+#define _UVERBS_IOCTL_CMD_
+
+#include <rdma/uverbs_ioctl.h>
+
+/* common validators */
+
+int uverbs_action_std_handle(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *ctx, size_t num,
+			     void *_priv);
+
+struct uverbs_action_std_handler {
+	int (*handler)(struct ib_device *ib_dev, struct ib_ucontext *ucontext,
+		       struct uverbs_attr_array *common,
+		       struct uverbs_attr_array *vendor,
+		       void *priv);
+	void *priv;
+};
+#endif
+
diff --git a/include/uapi/rdma/ib_user_ioctl.h b/include/uapi/rdma/ib_user_ioctl.h
index d1928dea..bd25a86 100644
--- a/include/uapi/rdma/ib_user_ioctl.h
+++ b/include/uapi/rdma/ib_user_ioctl.h
@@ -38,6 +38,29 @@
 
 #define IB_IOCTL_MAGIC		0x1b
 
+#define IB_USER_VERBS_IOCTL_COMMAND \
+	_IOWR(IB_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
+
+#define IB_USER_DIRECT_IOCTL_COMMAND \
+	_IOWR(IB_IOCTL_MAGIC, 2, struct ib_uverbs_ioctl_hdr)
+
+struct ib_uverbs_attr {
+	__u16 attr_id;		/* command specific type attribute */
+	__u16 len;		/* NA for idr */
+	__u32 reserved;
+	 __u64 ptr_idr;		/* ptr typeo command/idr handle */
+};
+
+struct ib_uverbs_ioctl_hdr {
+	__u16 length;
+	__u16 flags;
+	__u16 object_type;
+	__u16 driver_id;
+	__u16 action;
+	__u16 num_attrs;
+	struct ib_uverbs_attr  attrs[0];
+};
+
 /* Legacy part
  * !!!! NOTE: It uses the same command index as VERBS
  */
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-06-30 13:39   ` [RFC ABI V1 6/8] RDMA/core: Add new ioctl interface Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
       [not found]     ` <1467293971-25688-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-06-30 13:39   ` [RFC ABI V1 8/8] RDMA/{hw, core}: Provide simple skeleton to IOCTL interface Matan Barak
  7 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

This patch presents some concept about lockless scheme. It's only
intended to open discussion and still misses some crucial parts.

Every ib_uobject has a usecnt atomic variable. Single rwsem in
ib_uverbs_file (protect destruction of device from execution of
commands).

For every command execution:
  a. Check if device is available (as of today, with srcu)
  b. Take down_read(&file->close_sem)
  c. <User is now protected from closing the process and ib-dev
     destruction>
  d. When an object is used -> use uverbs_lock_object function
     implemented in this patch.
       i .If the object isn't available (-EBUSY)
  e. Unlock with uverbs_unlock_object (implemented in this patch)
  f. Release up_read(&file->close_sem)

Disassociate/process destruction:
  * In disassociate, do thee following for every process
  1. Grab down_write(&file->close_sem)
  2. Release all objects from context, ordered by type list (call free
     function the driver has specified)
  3. Release up_write(&file->close_sem)

ib_uobjects are lockless. If two commands are executed in
parallel and one need exclusive access (WRITE/DESTROY) -> one
command will fail. User-space needs to either synchronize or do
something productive with the failure.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uidr.c       | 11 +-----
 drivers/infiniband/core/uobject.c    |  6 +--
 drivers/infiniband/core/uobject.h    |  4 +-
 drivers/infiniband/core/uverbs_cmd.c | 72 +++++++++++++++---------------------
 include/rdma/ib_verbs.h              |  1 -
 5 files changed, 35 insertions(+), 59 deletions(-)

diff --git a/drivers/infiniband/core/uidr.c b/drivers/infiniband/core/uidr.c
index 72c4c77..6018717 100644
--- a/drivers/infiniband/core/uidr.c
+++ b/drivers/infiniband/core/uidr.c
@@ -88,7 +88,7 @@ struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
 		if (!uobj)
 			return ERR_PTR(-ENOMEM);
 
-		init_uobj(uobj, 0, ucontext, &type->lock_class);
+		init_uobj(uobj, 0, ucontext);
 
 		/* lock idr */
 		ret = ib_uverbs_uobject_add(uobj, type);
@@ -213,10 +213,6 @@ static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
 	if (!uobj)
 		return NULL;
 
-	if (nested)
-		down_read_nested(&uobj->mutex, SINGLE_DEPTH_NESTING);
-	else
-		down_read(&uobj->mutex);
 	if (!uobj->live) {
 		put_uobj_read(uobj);
 		return NULL;
@@ -233,11 +229,8 @@ struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context)
 	if (!uobj)
 		return NULL;
 
-	down_write(&uobj->mutex);
-	if (!uobj->live) {
-		put_uobj_write(uobj);
+	if (!uobj->live)
 		return NULL;
-	}
 
 	return uobj;
 }
diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index c3d1098..ed862e8 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -179,13 +179,11 @@ void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
  */
 
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
-	       struct ib_ucontext *context, struct uverbs_lock_class *c)
+	       struct ib_ucontext *context)
 {
 	uobj->user_handle = user_handle;
 	uobj->context     = context;
 	kref_init(&uobj->ref);
-	init_rwsem(&uobj->mutex);
-	lockdep_set_class_and_name(&uobj->mutex, &c->key, c->name);
 	uobj->live        = 0;
 }
 
@@ -201,12 +199,10 @@ void put_uobj(struct ib_uobject *uobj)
 
 void put_uobj_read(struct ib_uobject *uobj)
 {
-	up_read(&uobj->mutex);
 	put_uobj(uobj);
 }
 
 void put_uobj_write(struct ib_uobject *uobj)
 {
-	up_write(&uobj->mutex);
 	put_uobj(uobj);
 }
diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h
index 13fdaef..3514a1b 100644
--- a/drivers/infiniband/core/uobject.h
+++ b/drivers/infiniband/core/uobject.h
@@ -48,12 +48,12 @@ struct uverbs_uobject_type {
 		     struct ib_uobject *uobject,
 		     struct ib_ucontext *ucontext);
 	u16			obj_type;
-	struct uverbs_lock_class lock_class;
 };
 
 /* embed in ucontext per type */
 struct uverbs_uobject_list {
 	struct uverbs_uobject_type	*type;
+	/* TODO: replace with hash for faster access */
 	struct list_head		list;
 	struct list_head		type_list;
 };
@@ -64,7 +64,7 @@ void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
 void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
 
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
-	       struct ib_ucontext *context, struct uverbs_lock_class *c);
+	       struct ib_ucontext *context);
 
 void release_uobj(struct kref *kref);
 void put_uobj(struct ib_uobject *uobj);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 22406fe..148e26e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -46,16 +46,6 @@
 
 #include "core_priv.h"
 
-static struct uverbs_lock_class pd_lock_class	= { .name = "PD-uobj" };
-static struct uverbs_lock_class mr_lock_class	= { .name = "MR-uobj" };
-static struct uverbs_lock_class mw_lock_class	= { .name = "MW-uobj" };
-static struct uverbs_lock_class cq_lock_class	= { .name = "CQ-uobj" };
-static struct uverbs_lock_class qp_lock_class	= { .name = "QP-uobj" };
-static struct uverbs_lock_class ah_lock_class	= { .name = "AH-uobj" };
-static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
-static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
-static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = ib_dev->alloc_pd(ib_dev, file->ucontext, &udata);
 	if (IS_ERR(pd)) {
@@ -342,7 +332,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -543,9 +533,8 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 		goto err_tree_mutex_unlock;
 	}
 
-	init_uobj(&obj->uobject, 0, file->ucontext, &xrcd_lock_class);
-
-	down_write(&obj->uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uobject, 0, file->ucontext);
 
 	if (!xrcd) {
 		xrcd = ib_dev->alloc_xrcd(ib_dev, file->ucontext, &udata);
@@ -595,7 +584,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 	mutex_unlock(&file->mutex);
 
 	obj->uobject.live = 1;
-	up_write(&obj->uobject.mutex);
+	up_read(&file->close_sem);
 
 	mutex_unlock(&file->device->xrcd_tree_mutex);
 	return in_len;
@@ -737,8 +726,8 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &mr_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -791,7 +780,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -959,8 +948,8 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &mw_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -1007,7 +996,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -1129,8 +1118,8 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	if (!obj)
 		return ERR_PTR(-ENOMEM);
 
-	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class);
-	down_write(&obj->uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->comp_channel >= 0) {
 		ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel);
@@ -1188,7 +1177,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 
 	obj->uobject.live = 1;
 
-	up_write(&obj->uobject.mutex);
+	up_read(&file->close_sem);
 
 	return obj;
 
@@ -1530,9 +1519,8 @@ static int create_qp(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
-		  &qp_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->qp_type == IB_QPT_XRC_TGT) {
 		xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
@@ -1692,7 +1680,7 @@ static int create_qp(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return 0;
 err_cb:
@@ -1853,8 +1841,8 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext);
 
 	xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
 	if (!xrcd) {
@@ -1904,7 +1892,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -2593,8 +2581,8 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, cmd.user_handle, file->ucontext, &ah_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, cmd.user_handle, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -2644,7 +2632,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -2904,8 +2892,8 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		err = -ENOMEM;
 		goto err_free_attr;
 	}
-	init_uobj(uobj, 0, file->ucontext, &rule_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
 	if (!qp) {
@@ -2975,7 +2963,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 	kfree(flow_attr);
 	if (cmd.flow_attr.num_of_specs)
 		kfree(kern_flow_attr);
@@ -3055,8 +3043,8 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext, &srq_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->srq_type == IB_SRQT_XRC) {
 		attr.ext.xrc.xrcd  = idr_read_xrcd(cmd->xrcd_handle, file->ucontext, &xrcd_uobj);
@@ -3144,7 +3132,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e402473..84d72d2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1356,7 +1356,6 @@ struct ib_uobject {
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
 	atomic_t		usecnt;
-	struct rw_semaphore	mutex;		/* protects .live */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
 	/* List of object under uverbs_object_type */
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC ABI V1 8/8] RDMA/{hw, core}: Provide simple skeleton to IOCTL interface
       [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-06-30 13:39   ` [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject Matan Barak
@ 2016-06-30 13:39   ` Matan Barak
  7 siblings, 0 replies; 35+ messages in thread
From: Matan Barak @ 2016-06-30 13:39 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Doug Ledford, Jason Gunthorpe, Sean Hefty, Tal Alon, Liran Liss,
	Haggai Eran, Matan Barak, Majd Dibbiny, Christoph Lameter,
	Leon Romanovsky

This patch presents simple skeleton to shared and hardware specific
code.

This code demonstrates how shared code specifications are written
(uverbs_create_qp_spec, uverbs_destroy_qp_spec and
uverbs_create_cq_spec), while the driver could choose implementing
a handler (which may call a common code function) or use a common
code function as handler directly (uverbs_destroy_qp_handler).

The driver declares an array of types UVERBS_TYPES. Each type could
be declared by pointer (UVERBS_TYPE) or directly inlined
(UVERBS_TYPE_ACTIONS).

A type is composed of actions (UVERBS_ACTION). Each action specifies
the handler and the chain of specifications used to validate the
action. This chain of specification could either be declared by
pointer or inlined (UVERBS_ATTR_CHAIN_SPEC).
Each such chain consists of attributes that could be either
PTR_IN, PTR_OUT, IDR_IN and maybe others in the future.
There's no need for IDR_OUT as it could just be written as ant other
esponse data.
Attributes states the type and length of their respected data. If an
attribute is an IDR attribute, the object type and required access
is also stated.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_ioctl_cmd.c |  40 ++++++++++
 drivers/infiniband/hw/ioctl-skel.c         | 119 +++++++++++++++++++++++++++++
 include/rdma/uverbs_ioctl_cmd.h            |  35 +++++++++
 include/uapi/rdma/ib_user_verbs.h          |  19 +++++
 4 files changed, 213 insertions(+)
 create mode 100644 drivers/infiniband/hw/ioctl-skel.c

diff --git a/drivers/infiniband/core/uverbs_ioctl_cmd.c b/drivers/infiniband/core/uverbs_ioctl_cmd.c
index d38b7ec..57e9d76 100644
--- a/drivers/infiniband/core/uverbs_ioctl_cmd.c
+++ b/drivers/infiniband/core/uverbs_ioctl_cmd.c
@@ -31,6 +31,7 @@
  */
 
 #include <rdma/uverbs_ioctl_cmd.h>
+#include <rdma/ib_user_verbs.h>
 #include <linux/bug.h>
 
 #define IB_UVERBS_VENDOR_FLAG	0x8000
@@ -55,3 +56,42 @@ int uverbs_action_std_handle(struct ib_device *ib_dev,
 
 	return priv->handler(ib_dev, ucontext, &ctx[0], &ctx[1], priv->priv);
 }
+
+DECLARE_UVERBS_ATTR_CHAIN_SPEC(
+	uverbs_create_qp_spec,
+	UVERBS_ATTR_PTR_IN(CREATE_QP_CMD, sizeof(struct rdma_ioctl_create_qp)),
+	UVERBS_ATTR_PTR_OUT(CREATE_QP_RESP, sizeof(struct ib_uverbs_create_qp_resp)),
+	UVERBS_ATTR_IDR_IN(CREATE_QP_QP, UVERBS_TYPE_QP,
+			   UVERBS_IDR_ACCESS_NEW),
+	UVERBS_ATTR_IDR_IN(CREATE_QP_PD, UVERBS_TYPE_PD,
+			   UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_IDR_IN(CREATE_QP_RECV_CQ, UVERBS_TYPE_CQ,
+			   UVERBS_IDR_ACCESS_READ),
+	UVERBS_ATTR_IDR_IN(CREATE_QP_SEND_CQ, UVERBS_TYPE_CQ,
+			   UVERBS_IDR_ACCESS_READ));
+
+DECLARE_UVERBS_ATTR_CHAIN_SPEC(
+	uverbs_destroy_qp_spec,
+	UVERBS_ATTR_PTR_OUT(DESTROY_QP_RESP, sizeof(struct ib_uverbs_destroy_qp_resp)),
+	UVERBS_ATTR_IDR_IN(DESTROY_QP_QP, UVERBS_TYPE_QP,
+			   UVERBS_IDR_ACCESS_DESTROY));
+
+DECLARE_UVERBS_ATTR_CHAIN_SPEC(
+	uverbs_create_cq_spec,
+	UVERBS_ATTR_PTR_IN(CREATE_CQ_CMD, sizeof(struct rdma_create_cq)),
+	UVERBS_ATTR_PTR_OUT(CREATE_CQ_RESP, sizeof(struct ib_uverbs_create_cq_resp)));
+
+
+int uverbs_destroy_qp_handler(struct ib_device *ib_dev,
+			      struct ib_ucontext *ucontext,
+			      struct uverbs_attr_array *common,
+			      struct uverbs_attr_array *vendor,
+			      void *priv)
+{
+	/*
+	 * Implementation:
+	 * 1. Serialize to kAPI
+	 * 2. Call ib_dev with @vendor as vendor specific data
+	 */
+	return 0;
+}
diff --git a/drivers/infiniband/hw/ioctl-skel.c b/drivers/infiniband/hw/ioctl-skel.c
new file mode 100644
index 0000000..4bb646e
--- /dev/null
+++ b/drivers/infiniband/hw/ioctl-skel.c
@@ -0,0 +1,119 @@
+/* Vendor demo */
+
+#include <rdma/uverbs_ioctl_cmd.h>
+
+/*
+ * Vendor could have its own object model as well
+ * enum {
+ *	MLX5_VENDOR_TYPE_OBJ
+ * };
+ */
+
+struct mlx5_ib_create_qp_vendor_cmd {
+	__u64	buf_addr;
+	__u64	db_addr;
+	__u32	sq_wqe_count;
+	__u32	rq_wqe_count;
+	__u32	rq_wqe_shift;
+	__u32	flags;
+	__u32	uidx;
+	__u32	reserved0;
+	__u64	sq_buf_addr;
+};
+
+struct mlx5_ib_create_qp_vendor_resp {
+	__u32	uuar_index;
+};
+
+struct mlx5_ib_create_cq_vendor_cmd {
+	__u64	buf_addr;
+	__u64	db_addr;
+	__u32	cqe_size;
+	__u32	reserved;
+};
+
+struct mlx5_ib_create_cq_vendor_resp {
+	__u32	cqn;
+	__u32	reserved;
+};
+
+/* Could be per vendor handler */
+static int create_qp_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv)
+{
+	/* Some smart things here */
+	return 0;
+}
+
+/* Could be per vendor handler */
+static int create_cq_handler(struct ib_device *ib_dev,
+			     struct ib_ucontext *ucontext,
+			     struct uverbs_attr_array *common,
+			     struct uverbs_attr_array *vendor,
+			     void *priv)
+{
+	/* Some smart things here */
+	return 0;
+}
+
+enum mlx5_qp_commands {
+	MLX5_QP_COMMAND_CREATE,
+	MLX5_QP_COMMAND_DESTROY
+	/* TODO: Other commands */
+};
+
+enum mlx5_cq_commands {
+	MLX5_CQ_COMMAND_CREATE,
+};
+
+enum mlx5_qp_create {
+	MLX5_CREATE_QP_VENDOR_CMD = IB_UVERBS_VENDOR_FLAG,
+	MLX5_CREATE_QP_VENDOR_RESP,
+};
+
+DECLARE_UVERBS_TYPE(mlx5_ib_qp,
+	UVERBS_ACTION(
+		MLX5_QP_COMMAND_CREATE, create_qp_handler, NULL, &uverbs_create_qp_spec,
+		&UVERBS_ATTR_CHAIN_SPEC(
+			UVERBS_ATTR_PTR_IN(MLX5_CREATE_QP_VENDOR_CMD,
+					   sizeof(struct mlx5_ib_create_qp_vendor_cmd)),
+			UVERBS_ATTR_PTR_OUT(MLX5_CREATE_QP_VENDOR_RESP,
+					    sizeof(struct mlx5_ib_create_qp_vendor_resp)),
+			/*
+			 * User could have its own objects and IDRs
+			 * UVERBS_ATTR_IDR_IN(MLX_CREATE_QP_VENDOR_OBJ,
+			 *		   UVERBS_COMMON_TYPE_QP, 0)
+			 */
+			),
+	),
+	UVERBS_ACTION(MLX5_QP_COMMAND_DESTROY, uverbs_destroy_qp_handler, NULL,
+		      &uverbs_destroy_qp_spec),
+);
+
+enum mlx5_cq_create {
+	MLX5_CREATE_CQ_VENDOR_CMD = IB_UVERBS_VENDOR_FLAG,
+	MLX5_CREATE_CQ_VENDOR_RESP,
+};
+
+struct uverbs_types objects = UVERBS_TYPES(
+	/* Decalre types by pointer */
+	UVERBS_TYPE(UVERBS_TYPE_QP, mlx5_ib_qp),
+	/* Types could only declared in-lined */
+	UVERBS_TYPE_ACTIONS(
+		UVERBS_TYPE_CQ,
+		UVERBS_ACTION(
+			MLX5_CQ_COMMAND_CREATE, create_cq_handler, NULL,
+			&uverbs_create_cq_spec,
+			&UVERBS_ATTR_CHAIN_SPEC(
+				UVERBS_ATTR_PTR_IN(MLX5_CREATE_CQ_VENDOR_CMD,
+						   sizeof(struct mlx5_ib_create_cq_vendor_cmd)),
+				UVERBS_ATTR_PTR_OUT(MLX5_CREATE_QP_VENDOR_RESP,
+						    sizeof(struct mlx5_ib_create_cq_vendor_resp)),
+			),
+		),
+	)
+);
+
diff --git a/include/rdma/uverbs_ioctl_cmd.h b/include/rdma/uverbs_ioctl_cmd.h
index 07cf6fa..ffb4d80 100644
--- a/include/rdma/uverbs_ioctl_cmd.h
+++ b/include/rdma/uverbs_ioctl_cmd.h
@@ -49,5 +49,40 @@ struct uverbs_action_std_handler {
 		       void *priv);
 	void *priv;
 };
+
+enum uverbs_common_types {
+	UVERBS_TYPE_PD,
+	UVERBS_TYPE_CQ,
+	UVERBS_TYPE_QP,
+};
+
+enum uverbs_create_qp_cmd_attr {
+	CREATE_QP_CMD,
+	CREATE_QP_RESP,
+	CREATE_QP_QP,
+	CREATE_QP_PD,
+	CREATE_QP_RECV_CQ,
+	CREATE_QP_SEND_CQ,
+};
+
+enum uverbs_destroy_qp_cmd_attr {
+	DESTROY_QP_RESP,
+	DESTROY_QP_QP,
+};
+
+enum uverbs_create_cq_cmd_attr {
+	CREATE_CQ_CMD,
+	CREATE_CQ_RESP,
+};
+
+extern const struct uverbs_attr_chain_spec uverbs_create_qp_spec;
+extern const struct uverbs_attr_chain_spec uverbs_destroy_qp_spec;
+extern const struct uverbs_attr_chain_spec uverbs_create_cq_spec;
+
+int uverbs_destroy_qp_handler(struct ib_device *ib_dev,
+			      struct ib_ucontext *ucontext,
+			      struct uverbs_attr_array *common,
+			      struct uverbs_attr_array *vendor,
+			      void *priv);
 #endif
 
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index b6543d7..c6b2ef0 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -368,6 +368,14 @@ struct ib_uverbs_ex_create_cq {
 	__u32 reserved;
 };
 
+struct rdma_create_cq {
+	__u64 user_handle;
+	__u32 cqe;
+	__u32 comp_vector;
+	__s32 comp_channel;
+	__u32 flags;
+};
+
 struct ib_uverbs_create_cq_resp {
 	__u32 cq_handle;
 	__u32 cqe;
@@ -518,6 +526,17 @@ struct ib_uverbs_create_qp {
 	__u64 driver_data[0];
 };
 
+struct rdma_ioctl_create_qp
+{
+	__u32 max_send_wr;
+	__u32 max_recv_wr;
+	__u32 max_send_sge;
+	__u32 max_recv_sge;
+	__u32 max_inline_data;
+	__u8  sq_sig_all;
+	__u8  qp_type;
+};
+
 struct ib_uverbs_ex_create_qp {
 	__u64 user_handle;
 	__u32 pd_handle;
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]     ` <1467293971-25688-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-12 19:12       ` Jason Gunthorpe
       [not found]         ` <20160712191256.GA8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-12 19:12 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jun 30, 2016 at 04:39:24PM +0300, Matan Barak wrote:
> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Place all external RDMA IOCTL declarations into one UAPI exported
> header file and move all legacy MAD commands to that file.
> 
> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch seems fine to me.

> +/* Legacy part
> + * !!!! NOTE: It uses the same command index as VERBS

Not sure what this comment means, all IOCTLs in this file should use
IB_IOCTL_MAGIC  ..

> +#include <rdma/ib_user_mad.h>

Strange not to be at the top.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files
       [not found]     ` <1467293971-25688-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-12 19:15       ` Jason Gunthorpe
       [not found]         ` <20160712191507.GB8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-12 19:15 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jun 30, 2016 at 04:39:25PM +0300, Matan Barak wrote:
> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The new proposed infrastructure allows driver's to declare new object
> types. Furthermore, uobject management code will be used from both
> the old legacy code and the new ioctl infrastructure.

Seems reasonable.

> +static inline void put_pd_read(struct ib_pd *pd)
> +{
> +	put_uobj_read(pd->uobject);
> +}

Ultimately I think we should get rid of these sorts of wrappers and
just open code it - especailly as we are talking about growing the
number of uboject types.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]     ` <1467293971-25688-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-12 19:23       ` Jason Gunthorpe
       [not found]         ` <20160712192345.GC8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-12 19:23 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jun 30, 2016 at 04:39:27PM +0300, Matan Barak wrote:
> From: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> The new ioctl infrastructure supports driver specific types.
> This is implemented by having a list of uverbs_uobject_type in the
> ib_device. Each element of this list corresponds to a specific type
> and specifies its free function, driver's type_id, etc.
> The order of elements dictates the order of process release.
> The whole type_list should be initialized before any ucontext was
> created.

Do we need to have a linked list? Can we use static const arrays like
so much other stuff in the kernel?


> +int ib_uverbs_uobject_type_add(struct list_head	*head,
> +			       void (*free)(struct uverbs_uobject_type *uobject_type,
> +					    struct ib_uobject *uobject,
> +					    struct ib_ucontext *ucontext),

Don't pass function pointers like this, create a proper 'ops' kind of
structure.

Why is this taking a list_head not a ib_device?

> +void ib_uverbs_uobject_type_remove(struct uverbs_uobject_type
> *uobject_type)

Why would a driver ever have to remove a type?

> +{
> +	/*
> +	 * Allocate a new object type for the vendor, this should be done when a
> +	 * vendor is initialized.

Please stop using the word 'vendor' and scrub it from your patches.

This is 'driver specific'

> +	/* List of object under uverbs_object_type */
> +	struct list_head	idr_list;
> +	struct uverbs_uobject_list *type;	/* ptr to ucontext type */
>  };

I'm unclear why we need a list in the driver and list in the
ucontext.. I would have thought they are almost the same.

What use is tracking the ubojects in a per-type list? Do we ever
iterate that way other than on global destruct?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device
       [not found]     ` <1467293971-25688-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-12 19:29       ` Jason Gunthorpe
       [not found]         ` <20160712192933.GD8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-12 19:29 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jun 30, 2016 at 04:39:26PM +0300, Matan Barak wrote:
>  
> +static int ib_device_allocate_idrs(struct ib_device *device)
> +{
> +	spin_lock_init(&device->idr_lock);
> +	idr_init(&device->idr);
> +	return 0;
> +}
> +

> +	ret = ib_device_allocate_idrs(device);
> +	if (ret) {
> +		pr_warn("Couldn't allocate IDRs device %s with driver model\n",
> +			device->name);
> +		ib_cache_cleanup_one(device);
> +		goto out;

Eh? Can't happen, don't over design stuff like this. Return void from
ib_device_allocate_idrs

> +static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
> +					int nested)

What chance is there to get rid of 'nested' ? If you want to use a new
locking model, I'm wondering if we should trial it with uverbs???

> +struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
> +{
> +	return idr_read_obj(pd_handle, context, 0);
> +}

All this stuff needs a type check. Something needs to confirm that
'pd_handle' is in fact a pd. This was being done before with the split
IDRs.

I'd also suggest that the uobj restructuring be completed a dedicated
series.

It looks like all approaches require this sort of stuff, and perhaps it could be
merged??

I didn't look at the locking in any of this stuff, maybe once you finish
it up.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files
       [not found]         ` <20160712191507.GB8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-12 19:38           ` Leon Romanovsky
  2016-07-13 14:34           ` Matan Barak
  1 sibling, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2016-07-12 19:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

On Tue, Jul 12, 2016 at 01:15:07PM -0600, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:25PM +0300, Matan Barak wrote:
> > From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > 
> > The new proposed infrastructure allows driver's to declare new object
> > types. Furthermore, uobject management code will be used from both
> > the old legacy code and the new ioctl infrastructure.
> 
> Seems reasonable.
> 
> > +static inline void put_pd_read(struct ib_pd *pd)
> > +{
> > +	put_uobj_read(pd->uobject);
> > +}
> 
> Ultimately I think we should get rid of these sorts of wrappers and
> just open code it - especailly as we are talking about growing the
> number of uboject types.

Yes, we are doing it in next version.

> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations
       [not found]         ` <20160712191256.GA8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-12 19:43           ` Leon Romanovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2016-07-12 19:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 1399 bytes --]

On Tue, Jul 12, 2016 at 01:12:56PM -0600, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:24PM +0300, Matan Barak wrote:
> > From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > 
> > Place all external RDMA IOCTL declarations into one UAPI exported
> > header file and move all legacy MAD commands to that file.
> > 
> > Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> This patch seems fine to me.
> 
> > +/* Legacy part
> > + * !!!! NOTE: It uses the same command index as VERBS
> 
> Not sure what this comment means, all IOCTLs in this file should use
> IB_IOCTL_MAGIC  ..

I meant numbers 1,2,... in the defines, wrote as a reminder.
+#define IB_USER_MAD_REGISTER_AGENT     _IOWR(IB_IOCTL_MAGIC, 1, \
------------------------------------------------------------^^^^^---

> 
> > +#include <rdma/ib_user_mad.h>
> 
> Strange not to be at the top.

To be on safe side, to preserve legacy structure.

> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device
       [not found]         ` <20160712192933.GD8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-12 19:48           ` Leon Romanovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2016-07-12 19:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

On Tue, Jul 12, 2016 at 01:29:33PM -0600, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:26PM +0300, Matan Barak wrote:
> >  
> > +static int ib_device_allocate_idrs(struct ib_device *device)
> > +{
> > +	spin_lock_init(&device->idr_lock);
> > +	idr_init(&device->idr);
> > +	return 0;
> > +}
> > +
> 
> > +	ret = ib_device_allocate_idrs(device);
> > +	if (ret) {
> > +		pr_warn("Couldn't allocate IDRs device %s with driver model\n",
> > +			device->name);
> > +		ib_cache_cleanup_one(device);
> > +		goto out;
> 
> Eh? Can't happen, don't over design stuff like this. Return void from
> ib_device_allocate_idrs

Right,
I'll do.

> 
> > +static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
> > +					int nested)
> 
> What chance is there to get rid of 'nested' ? If you want to use a new
> locking model, I'm wondering if we should trial it with uverbs???

It is removed in our next version.

> 
> > +struct ib_pd *idr_read_pd(int pd_handle, struct ib_ucontext *context)
> > +{
> > +	return idr_read_obj(pd_handle, context, 0);
> > +}
> 
> All this stuff needs a type check. Something needs to confirm that
> 'pd_handle' is in fact a pd. This was being done before with the split
> IDRs.
> 
> I'd also suggest that the uobj restructuring be completed a dedicated
> series.
> 
> It looks like all approaches require this sort of stuff, and perhaps it could be
> merged??
> 
> I didn't look at the locking in any of this stuff, maybe once you finish
> it up.

I think that the best way will be to wait for the next version, because we
reworked locking approach and removed all these idr_read.../idr_write..
calls.

> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files
       [not found]         ` <20160712191507.GB8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-07-12 19:38           ` Leon Romanovsky
@ 2016-07-13 14:34           ` Matan Barak
  1 sibling, 0 replies; 35+ messages in thread
From: Matan Barak @ 2016-07-13 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 12/07/2016 22:15, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:25PM +0300, Matan Barak wrote:
>> From: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> The new proposed infrastructure allows driver's to declare new object
>> types. Furthermore, uobject management code will be used from both
>> the old legacy code and the new ioctl infrastructure.
>
> Seems reasonable.
>
>> +static inline void put_pd_read(struct ib_pd *pd)
>> +{
>> +	put_uobj_read(pd->uobject);
>> +}
>
> Ultimately I think we should get rid of these sorts of wrappers and
> just open code it - especailly as we are talking about growing the
> number of uboject types.
>

I agree. The new infrastructure fetches and manages use count 
automatically for you, so ultimately all these functions will be deleted.

> Jason
>

Matan

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]         ` <20160712192345.GC8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-13 14:44           ` Matan Barak
       [not found]             ` <081a02c0-0650-d0c2-494c-19a64b83cbc1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-13 14:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 12/07/2016 22:23, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:27PM +0300, Matan Barak wrote:
>> From: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> The new ioctl infrastructure supports driver specific types.
>> This is implemented by having a list of uverbs_uobject_type in the
>> ib_device. Each element of this list corresponds to a specific type
>> and specifies its free function, driver's type_id, etc.
>> The order of elements dictates the order of process release.
>> The whole type_list should be initialized before any ucontext was
>> created.
>
> Do we need to have a linked list? Can we use static const arrays like
> so much other stuff in the kernel?
>
>

We can, but then every driver should declare the common objects order by 
itself.
Maybe we could provide a macro that does this for you (if you don't have 
any custom types interleaved).

The basic advantage of a linked list is that we could still have a 
function that builds the common types list and a driver could still 
insert and custom type where ever it wants.

>> +int ib_uverbs_uobject_type_add(struct list_head	*head,
>> +			       void (*free)(struct uverbs_uobject_type *uobject_type,
>> +					    struct ib_uobject *uobject,
>> +					    struct ib_ucontext *ucontext),
>
> Don't pass function pointers like this, create a proper 'ops' kind of
> structure.
>
> Why is this taking a list_head not a ib_device?
>

In order to let you insert a type anywhere you want in the type list.
This affects the destruction order.

>> +void ib_uverbs_uobject_type_remove(struct uverbs_uobject_type
>> *uobject_type)
>
> Why would a driver ever have to remove a type?
>

Correct, unnecessary.

>> +{
>> +	/*
>> +	 * Allocate a new object type for the vendor, this should be done when a
>> +	 * vendor is initialized.
>
> Please stop using the word 'vendor' and scrub it from your patches.
>
> This is 'driver specific'
>

Ok, we'll replace vendor -> driver.

>> +	/* List of object under uverbs_object_type */
>> +	struct list_head	idr_list;
>> +	struct uverbs_uobject_list *type;	/* ptr to ucontext type */
>>  };
>
> I'm unclear why we need a list in the driver and list in the
> ucontext.. I would have thought they are almost the same.
>
> What use is tracking the ubojects in a per-type list? Do we ever
> iterate that way other than on global destruct?
>

The only purpose of these lists are either ucontext destruction or 
device removal (which ultimately deletes all ucontexts).
ucontext destruction requires list per ucontext x type.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]             ` <081a02c0-0650-d0c2-494c-19a64b83cbc1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-13 16:39               ` Jason Gunthorpe
       [not found]                 ` <20160713163924.GA19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-13 16:39 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Wed, Jul 13, 2016 at 05:44:46PM +0300, Matan Barak wrote:

> We can, but then every driver should declare the common objects
> order by itself.  Maybe we could provide a macro that does this for
> you (if you don't have any custom types interleaved).

Hum, that seems so very fragile and complex..

There are other ways to deal with destruction ordering. refcounts
only for use during descruction might be easier. Repeatedly iterate
over the uobject list deleting 0 refcount objects until the list is
empty.

So, eg when you create a MR on a PD then PD's destruct refcount gets
++'d, and when the MR is deleted it gets --'d. This naturally forces
the needed ordering.

> >What use is tracking the ubojects in a per-type list? Do we ever
> >iterate that way other than on global destruct?
> >
> 
> The only purpose of these lists are either ucontext destruction or device
> removal (which ultimately deletes all ucontexts).
> ucontext destruction requires list per ucontext x type.

I wouldn't waste memory on optimizing destruction like that, just run
over the idr repeatedly, matching against the object type.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                 ` <20160713163924.GA19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-13 16:57                   ` Matan Barak
       [not found]                     ` <0959d391-75fb-75e8-ef2e-9d8c06b1b96f-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-13 16:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 13/07/2016 19:39, Jason Gunthorpe wrote:
> On Wed, Jul 13, 2016 at 05:44:46PM +0300, Matan Barak wrote:
>
>> We can, but then every driver should declare the common objects
>> order by itself.  Maybe we could provide a macro that does this for
>> you (if you don't have any custom types interleaved).
>
> Hum, that seems so very fragile and complex..
>
> There are other ways to deal with destruction ordering. refcounts
> only for use during descruction might be easier. Repeatedly iterate
> over the uobject list deleting 0 refcount objects until the list is
> empty.
>
> So, eg when you create a MR on a PD then PD's destruct refcount gets
> ++'d, and when the MR is deleted it gets --'d. This naturally forces
> the needed ordering.
>

Yeah, that's nicer. We'll do that instead :)

>>> What use is tracking the ubojects in a per-type list? Do we ever
>>> iterate that way other than on global destruct?
>>>
>>
>> The only purpose of these lists are either ucontext destruction or device
>> removal (which ultimately deletes all ucontexts).
>> ucontext destruction requires list per ucontext x type.
>
> I wouldn't waste memory on optimizing destruction like that, just run
> over the idr repeatedly, matching against the object type.
>

Maybe instead of having an idr per device, it's better to have an idr 
per uverbs file. Doing that, if we decide to unify rdma-cm with the 
device itself, they can share the same idr. This gives us another 
advantage of destroying a ucontext without going over all opened objects.

However, when a device is removed, we need to find all its related 
objects and destroy them. In order to do that in a simple way, we could 
say that a uverbs_file is either not bound to any rdma device or bound 
to a single IB device (do you see any value of bounding a uverbs_file to 
several devices?). By this model you could have one ioctl code for 
device commands, another one for client commands (rdma-cm) and a third 
one for infrastructure based commands (query devices, query clients).

This is of course only relevant if we go for a unified fd for rdma-cm 
and rdma device.


> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]     ` <1467293971-25688-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-13 17:16       ` Jason Gunthorpe
       [not found]         ` <20160713171657.GB19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-13 17:16 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote:
> This patch presents some concept about lockless scheme. It's only
> intended to open discussion and still misses some crucial parts.

It isn't really lockless, is it? I gather the intent is to remove the
per-uobj lock(s) and replace with a ucontext rwsem?

> Every ib_uobject has a usecnt atomic variable. Single rwsem in
> ib_uverbs_file (protect destruction of device from execution of
> commands).
> 
> For every command execution:
>   a. Check if device is available (as of today, with srcu)
>   b. Take down_read(&file->close_sem)
>   c. <User is now protected from closing the process and ib-dev
>      destruction>
>   d. When an object is used -> use uverbs_lock_object function
>      implemented in this patch.
>        i .If the object isn't available (-EBUSY)

I'd like to see a detailed list of all the calls that change from
blocking to EBUSY with this approach.

I also think this should be stand-alone as part of the uobject rework
series.. Again it seems like this benifits any approach.

uverbs_lock_object is in an different patch in the series..

I'm always nervous when I see people create their own locks with
atomics. Don't do that. That is a rw lock using try_lock.

>   e. Unlock with uverbs_unlock_object (implemented in this patch)
>   f. Release up_read(&file->close_sem)

g. release srcu

If the close_sem is always being held, then I suspect we can drop SRCU
as well??

> ib_uobjects are lockless. If two commands are executed in
> parallel and one need exclusive access (WRITE/DESTROY) -> one
> command will fail. User-space needs to either synchronize or do
> something productive with the failure.

I think this can work for destroy, but I have a hard time seeing how
it is going to be OK for WRITE/MODIFY/etc operations.

eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing
that be allowed to fail ..

> @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
>  	if (!uobj)
>  		return -ENOMEM;
>  
> -	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
> -	down_write(&uobj->mutex);
> +	down_read(&file->close_sem);
> +	init_uobj(uobj, 0, file->ucontext);

Seeing this pattern in your patches really makes me wonder.. What is
the down_write even doing at that point???

        uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
        init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
        down_write(&uobj->mutex);

init_uobj doesn't add the memory to any lists or anything. How could
another thread get access to the pointer and contend on the mutex?

I guess it is because later we do:

        ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);

While holding the lock. But that is a really ugly API. Nothing should be
added to the idr until it is 100% initialized, so we don't need
concurrency protection. What is the write lock doing at that point??

This whole flow should be simplified, which will make the locking
requirements a lot clearer.

  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
  [.. full setup ..]
  uobj_activate(uobj); // live = 1, list_add_tail, idr_add

  resp.handle = uobj->id;
  copy_to_user(..);

  put_uobj(uobj);
  return ret;

err1:
  uobj_deactivate(uobj);
err2:
  put_obj(uobj);

I would do the above sort of restructuring as another (mergable, even)
patch, then rip out the locking.

Also, I think the destruction should be equally cleaned up. If you
introduce a patch with actual ops and a destroy function pointer
per-type then a centralized destruct call can be implemented that does
the correct lock, wrapping the functionc all. Again this would be a
nice clean up to have distinct from the ioctl stuff...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                     ` <0959d391-75fb-75e8-ef2e-9d8c06b1b96f-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-13 17:21                       ` Jason Gunthorpe
       [not found]                         ` <20160713172116.GC19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-13 17:21 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Wed, Jul 13, 2016 at 07:57:04PM +0300, Matan Barak wrote:

> Maybe instead of having an idr per device, it's better to have an idr per
> uverbs file.

Oh, I thought you were doing that already!

Yes, the idr must be per ucontext, otherwise we create a big security
problem. Access to one files object #'s cannot be allowed from another
file.

> However, when a device is removed, we need to find all its related objects
> and destroy them. In order to do that in a simple way, we could say that a
> uverbs_file is either not bound to any rdma device or bound to a
> single IB

This is just more searching on the disassociate path. Search all
ucontexts for any uobj connected to the victim device.

If we drop the file == device scheme then we need a generic op to
tell what device a uobj is associated with. All our kobjs already
store this information, so it isn't a big deal.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                         ` <20160713172116.GC19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-14  5:30                           ` Matan Barak
       [not found]                             ` <5fdae959-5ab2-ee17-e36e-3642ddd3e6ce-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2016-07-14  6:30                           ` Leon Romanovsky
  1 sibling, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-14  5:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 13/07/2016 20:21, Jason Gunthorpe wrote:
> On Wed, Jul 13, 2016 at 07:57:04PM +0300, Matan Barak wrote:
>
>> Maybe instead of having an idr per device, it's better to have an idr per
>> uverbs file.
>
> Oh, I thought you were doing that already!
>
> Yes, the idr must be per ucontext, otherwise we create a big security
> problem. Access to one files object #'s cannot be allowed from another
> file.
>

We don't have a security problem, as the ucontext pointer is store on 
the uobject. Therefore, the uobject is returned only if it matches the 
current ucontext.

>> However, when a device is removed, we need to find all its related objects
>> and destroy them. In order to do that in a simple way, we could say that a
>> uverbs_file is either not bound to any rdma device or bound to a
>> single IB
>
> This is just more searching on the disassociate path. Search all
> ucontexts for any uobj connected to the victim device.
>
> If we drop the file == device scheme then we need a generic op to
> tell what device a uobj is associated with. All our kobjs already
> store this information, so it isn't a big deal.
>

I've thought about ditching the lists, but there's one fundamental 
problem. Keeping reference counts in kernel doesn't encompass dependency 
relations in the user space. For example, since memory windows could be 
bounded and unbounded from a MR, we need to delete all MWs before 
deleting the MRs.
Secondly, we could safely change the list to hlist. The memory overhead 
will be minimal (probably better than adding and IDR per type).

If we break the file == device scheme, it also requires a few new object 
types (for example, DEVICE and CLIENT object) to pass them from user-space.
So the flow might be using the QUERY ioctl code and query the 
devices/clients. Creating a device/client and get a IDR handle for that. 
For every IOCTL code (either device or client commands), you pass this 
idr handle as part of the ioctl code specific header.

However, does it really worth it? Do you see any usage of binding the 
same uverbs_file to several rdma devices?
If not, we could have a simpler approach of having a file either bounded 
to a device or not bounded to any device. We could still share the IDR 
with clients on the same fd, but they are all guaranteed to set or use 
this exact device.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                         ` <20160713172116.GC19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-07-14  5:30                           ` Matan Barak
@ 2016-07-14  6:30                           ` Leon Romanovsky
  1 sibling, 0 replies; 35+ messages in thread
From: Leon Romanovsky @ 2016-07-14  6:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford,
	Sean Hefty, Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Wed, Jul 13, 2016 at 11:21:16AM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 13, 2016 at 07:57:04PM +0300, Matan Barak wrote:
> 
> > Maybe instead of having an idr per device, it's better to have an idr per
> > uverbs file.
> 
> Oh, I thought you were doing that already!
> 
> Yes, the idr must be per ucontext, otherwise we create a big security
> problem. Access to one files object #'s cannot be allowed from another
> file.

On top of the Matan's response, regarding no security issue here. I
didn't see added value in multiplying IDRs, which would be if
creation per-file is done.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]         ` <20160713171657.GB19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-14  7:59           ` Matan Barak
       [not found]             ` <6ae1a553-408c-723a-93a2-3d46d952ce35-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-14  7:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 13/07/2016 20:16, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote:
>> This patch presents some concept about lockless scheme. It's only
>> intended to open discussion and still misses some crucial parts.
>
> It isn't really lockless, is it? I gather the intent is to remove the
> per-uobj lock(s) and replace with a ucontext rwsem?
>
>> Every ib_uobject has a usecnt atomic variable. Single rwsem in
>> ib_uverbs_file (protect destruction of device from execution of
>> commands).
>>
>> For every command execution:
>>   a. Check if device is available (as of today, with srcu)
>>   b. Take down_read(&file->close_sem)
>>   c. <User is now protected from closing the process and ib-dev
>>      destruction>
>>   d. When an object is used -> use uverbs_lock_object function
>>      implemented in this patch.
>>        i .If the object isn't available (-EBUSY)
>
> I'd like to see a detailed list of all the calls that change from
> blocking to EBUSY with this approach.
>

I'm not at the office right now, but AFAIK it's attach_mcast, 
detach_mcast, rereg_mr and all the destroy functions.

> I also think this should be stand-alone as part of the uobject rework
> series.. Again it seems like this benifits any approach.
>

Yeah, but since the new infrastructure is going to do the locks 
automatically for you, we'll ditch the locks in uverbs_cmd anyway. 
Therefore, I don't see a real benefit refactoring uverbs_cmd and then 
ditching all these changes.

> uverbs_lock_object is in an different patch in the series..
>
> I'm always nervous when I see people create their own locks with
> atomics. Don't do that. That is a rw lock using try_lock.
>

Ok

>>   e. Unlock with uverbs_unlock_object (implemented in this patch)
>>   f. Release up_read(&file->close_sem)
>
> g. release srcu
>
> If the close_sem is always being held, then I suspect we can drop SRCU
> as well??
>

No, it's not the same thing. SRCU protects device removal from command 
execution. close_sem protects context destruction from command execution.
You're correct that in the current state of the code, context is only 
closed when a file is closed or when a device is removed, so it's 
protected. However, if we want to add a destroy_context action - it 
becomes necessary.

If we move to a per uverbs_file idr, contexts (either file context or 
client context), we still need to sync destruction of these contexts 
with carrying out actions. Since the uverbs_file outlives them, it's 
reasonable to put the lock there.

>> ib_uobjects are lockless. If two commands are executed in
>> parallel and one need exclusive access (WRITE/DESTROY) -> one
>> command will fail. User-space needs to either synchronize or do
>> something productive with the failure.
>
> I think this can work for destroy, but I have a hard time seeing how
> it is going to be OK for WRITE/MODIFY/etc operations.
>
> eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing
> that be allowed to fail ..
>

Currently, as far as I remember, modify isn't protected by the write 
lock anyway. Besides that, nothing stops the user from syncing the actions.

If you do the synchronization in kernel, you need to make sure locks are 
taken in the exact same order for all commands to avoid deadlock in the 
kernel. This is more expensive than just letting the user do that.
For example, if a user has a single thread to control his resources, why 
do we have to waste time or sorting his handles in kernel?

>> @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
>>  	if (!uobj)
>>  		return -ENOMEM;
>>
>> -	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
>> -	down_write(&uobj->mutex);
>> +	down_read(&file->close_sem);
>> +	init_uobj(uobj, 0, file->ucontext);
>
> Seeing this pattern in your patches really makes me wonder.. What is
> the down_write even doing at that point???
>
>         uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
>         init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
>         down_write(&uobj->mutex);
>
> init_uobj doesn't add the memory to any lists or anything. How could
> another thread get access to the pointer and contend on the mutex?
>

Our latest locking series (not posted yet) removes the mutex from the 
uobject.

> I guess it is because later we do:
>
>         ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);
>
> While holding the lock. But that is a really ugly API. Nothing should be
> added to the idr until it is 100% initialized, so we don't need
> concurrency protection. What is the write lock doing at that point??
>

We assume the enable function can't fail. Since adding something to idr 
could fail, we preferred adding it first with live = 0 flag and then 
enabling it.

> This whole flow should be simplified, which will make the locking
> requirements a lot clearer.
>
>   uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>   [.. full setup ..]
>   uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>
>   resp.handle = uobj->id;
>   copy_to_user(..);
>
>   put_uobj(uobj);
>   return ret;
>
> err1:
>   uobj_deactivate(uobj);
> err2:
>   put_obj(uobj);
>

I agree with the general schema, but I think idr_add should be done in 
uobj_alloc as it can fail.

> I would do the above sort of restructuring as another (mergable, even)
> patch, then rip out the locking.
>

This locking is done automatically using the new infrastructure. The 
goal is to remove all these locking semantics from the user handler.
Therefore, I don't think we should refactor uverbs_cmd and then delete 
all locks from there.

> Also, I think the destruction should be equally cleaned up. If you
> introduce a patch with actual ops and a destroy function pointer
> per-type then a centralized destruct call can be implemented that does
> the correct lock, wrapping the functionc all. Again this would be a
> nice clean up to have distinct from the ioctl stuff...
>

We have a free function per object type. The correct locking is indeed 
done by the wrapper.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                             ` <5fdae959-5ab2-ee17-e36e-3642ddd3e6ce-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-14 16:41                               ` Jason Gunthorpe
       [not found]                                 ` <20160714164121.GA2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-14 16:41 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jul 14, 2016 at 08:30:05AM +0300, Matan Barak wrote:

> We don't have a security problem, as the ucontext pointer is store on the
> uobject. Therefore, the uobject is returned only if it matches the current
> ucontext.

That still allows a different security problem, idrs have bounded
capacity and now unprivileged users can exhaust them for all users.

> I've thought about ditching the lists, but there's one fundamental problem.
> Keeping reference counts in kernel doesn't encompass dependency relations in
> the user space. For example, since memory windows could be bounded and
> unbounded from a MR, we need to delete all MWs before deleting the MRs.
> Secondly, we could safely change the list to hlist. The memory overhead will
> be minimal (probably better than adding and IDR per type).

Bleck, that is a terrible API. Destroying the MR should return
associated MW's to unbound.

Is MR the only case of this? I can't think of much else that uses the
WQ that way..

I think the lists should always go, always use repeated iteration. If
you really can't make refcount work then use a type compare instead,
but I don't think we don't need the overhead of keeping lists to
optimize destruction.

The MW case could be handled specially with a simple priority field.

> If we break the file == device scheme, it also requires a few new object
> types (for example, DEVICE and CLIENT object) to pass them from
> user-space.

Don't know what a CLIENT object is, but I think we always need to
have a DEVICE object even if there is only allowed 1 device object per
FD.

We need the device object to match what ibv_open_device does and
provide a basis for sending device specific driver commands and
issuing device specific queries.

If it is a singleton per-fd or created doesn't really matter from an
API design perspective.

> So the flow might be using the QUERY ioctl code and query the
> devices/clients. Creating a device/client and get a IDR handle for that. For
> every IOCTL code (either device or client commands), you pass this idr
> handle as part of the ioctl code specific header.

Yes, as I said, we pretty much have to do that already, and it
wouldn't be on every ioctl, just certain top level ones like PDs.

> However, does it really worth it? Do you see any usage of binding the same
> uverbs_file to several rdma devices?

I'd like to hear more arguments on this point.

The main value I see is combining rdma_cm and uverbs into one fd & IDR, and
that requires allowing multiple devices per fd.

This seems simpler for clients and helps move things in the direction
Sean was talking about where all events can all be aggregated into one
poll loop.

As far as I can tell, the only downside to that is we loose the
per-device permissions on /dev/infiniband/uverbsX, which don't really
work today anyhow.

> If not, we could have a simpler approach of having a file either bounded to
> a device or not bounded to any device. We could still share the IDR with
> clients on the same fd, but they are all guaranteed to set or use this exact
> device.

Why is this simpler?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]             ` <6ae1a553-408c-723a-93a2-3d46d952ce35-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-14 17:00               ` Jason Gunthorpe
       [not found]                 ` <20160714170051.GB2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-14 17:00 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote:

> I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast,
> rereg_mr and all the destroy functions.

I don't think we can change those three, I would be too scared that
would break existing apps that don't serialize them.

I think we need to change the kernel locking for those three cases to
rely on a different lock so that the modify style operations can not
return EBUSY.

That seems like it should be simple enough? Can we use the lock in the
kobj or something? Why is it like that today anyhow?

> >I also think this should be stand-alone as part of the uobject rework
> >series.. Again it seems like this benifits any approach.
> 
> Yeah, but since the new infrastructure is going to do the locks
> automatically for you, we'll ditch the locks in uverbs_cmd anyway.
> Therefore, I don't see a real benefit refactoring uverbs_cmd and then
> ditching all these changes.

We still have to go through a transitional approach here, to me that
means we should move existing uverbs closer to the new approach
(particularly in terms of locking and code sharing), trial that in
real apps, and then add the new api on that tested basis. We can't
just abandon uverbs, and we probably shouldn't try to do a flag day
convert either.

So I don't know what your plan is to 'ditch the locks' with the new
infrastucture while maintaining the uverbs api..

My preference is to see smaller patch series with clear goals that can
be applied - and 'rework uverbs locking' seems to fit that bill quite
well.

For instance, if we are introducing a uobject type scheme like this
series does then it should be another stand alone series that doesn't
try and introduce the ioctl stuff..

> No, it's not the same thing. SRCU protects device removal from command
> execution. close_sem protects context destruction from command
> execution.

It is the same, device removal requires 'context destruction' as
a precondition, so it should be able to ride on the same lock.

The 'context destruction' for device removal is a bit special but it
is still basically the same action as close(fd).

> You're correct that in the current state of the code, context is only closed
> when a file is closed or when a device is removed, so it's protected.
> However, if we want to add a destroy_context action - it becomes necessary.

What would destroy_context do? Why would we need that?

> If we move to a per uverbs_file idr, contexts (either file context or client
> context), we still need to sync destruction of these contexts with carrying
> out actions. Since the uverbs_file outlives them, it's reasonable to put the
> lock there.

Are you proposing that all object destroy will be serialized?

> Currently, as far as I remember, modify isn't protected by the write lock
> anyway. Besides that, nothing stops the user from syncing the actions.

You can't make that argument when dealing with compat. We can't change
something like the locking requirements in user space.

> >While holding the lock. But that is a really ugly API. Nothing should be
> >added to the idr until it is 100% initialized, so we don't need
> >concurrency protection. What is the write lock doing at that point??
> 
> We assume the enable function can't fail. Since adding something to idr
> could fail, we preferred adding it first with live = 0 flag and then
> enabling it.

Why can't it fail?

> >  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
> >  [.. full setup ..]
> >  uobj_activate(uobj); // live = 1, list_add_tail, idr_add
> >
> >  resp.handle = uobj->id;
> >  copy_to_user(..);
> >
> >  put_uobj(uobj);
> >  return ret;
> >
> >err1:
> >  uobj_deactivate(uobj);
> >err2:
> >  put_obj(uobj);
> >
> 
> I agree with the general schema, but I think idr_add should be done in
> uobj_alloc as it can fail.

In my version uobj_activate can fail, and we already have to have the
goto-unwind infrastructure to cope with that (copy_to_user could
fail), so I'm not sure why avoiding it would be helpful?

I also wonder if the above flow is enough to remove live? That always
seemed like a strange approach to me, most idr users would just use in-idr
as 'live' ...

> >I would do the above sort of restructuring as another (mergable, even)
> >patch, then rip out the locking.
> 
> This locking is done automatically using the new infrastructure. The goal is
> to remove all these locking semantics from the user handler.
> Therefore, I don't think we should refactor uverbs_cmd and then delete all
> locks from there.

What happens to uverbs_cmd then? As I said above we still need to keep
it around and it still need to fit with the new infrastructure. I'd
rather see a patch flow that slowly transforms uverbs_cmd to a point
where adding the new ioctl becomes a simpler patch.

There is obvoisly a lot of in-kernel infrastructure work to do.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                                 ` <20160714164121.GA2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-19 14:05                                   ` Matan Barak
       [not found]                                     ` <a1fcb1a7-0028-329a-d34a-ca8b52323916-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-19 14:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 14/07/2016 19:41, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2016 at 08:30:05AM +0300, Matan Barak wrote:
>
>> We don't have a security problem, as the ucontext pointer is store on the
>> uobject. Therefore, the uobject is returned only if it matches the current
>> ucontext.
>
> That still allows a different security problem, idrs have bounded
> capacity and now unprivileged users can exhaust them for all users.
>

Right, but that's an existing security issue.

>> I've thought about ditching the lists, but there's one fundamental problem.
>> Keeping reference counts in kernel doesn't encompass dependency relations in
>> the user space. For example, since memory windows could be bounded and
>> unbounded from a MR, we need to delete all MWs before deleting the MRs.
>> Secondly, we could safely change the list to hlist. The memory overhead will
>> be minimal (probably better than adding and IDR per type).
>
> Bleck, that is a terrible API. Destroying the MR should return
> associated MW's to unbound.
>
> Is MR the only case of this? I can't think of much else that uses the
> WQ that way..
>

According to the IB spec, a CI could choose either to return an error or 
to move MWs to orphaned MWs (and make sure any access to these MWs will 
be blocked). Some hardwares might implement the first approach and keep 
reference from MRs to MWs in hardware (without having a way to query 
that in software). In these cases, you can't force this unbound (as you 
don't know if they were unbound because of remote invalidate).

AFAIK, MW are the only case.

> I think the lists should always go, always use repeated iteration. If
> you really can't make refcount work then use a type compare instead,
> but I don't think we don't need the overhead of keeping lists to
> optimize destruction.
>
> The MW case could be handled specially with a simple priority field.
>

So you suggest having a priority field in the uobject and iterate over 
the IDR freeing by order. Seems simple enough.

>> If we break the file == device scheme, it also requires a few new object
>> types (for example, DEVICE and CLIENT object) to pass them from
>> user-space.
>
> Don't know what a CLIENT object is, but I think we always need to
> have a DEVICE object even if there is only allowed 1 device object per
> FD.
>

By saying CLIENT I mean rdma-cm or other such users which can handle 
rdma devices and provide services.

> We need the device object to match what ibv_open_device does and
> provide a basis for sending device specific driver commands and
> issuing device specific queries.
>
> If it is a singleton per-fd or created doesn't really matter from an
> API design perspective.
>

If it's a singleton per-fd, you don't need to pass it per action. If you 
get an action on this fd - you know exactly to which device you should 
delegate it.

>> So the flow might be using the QUERY ioctl code and query the
>> devices/clients. Creating a device/client and get a IDR handle for that. For
>> every IOCTL code (either device or client commands), you pass this idr
>> handle as part of the ioctl code specific header.
>
> Yes, as I said, we pretty much have to do that already, and it
> wouldn't be on every ioctl, just certain top level ones like PDs.
>

I meant querying the possible devices or clients. Meaning, introduce 
another ioctl code that is used to get all available clients and devices.

>> However, does it really worth it? Do you see any usage of binding the same
>> uverbs_file to several rdma devices?
>
> I'd like to hear more arguments on this point.
>
> The main value I see is combining rdma_cm and uverbs into one fd & IDR, and
> that requires allowing multiple devices per fd.
>
> This seems simpler for clients and helps move things in the direction
> Sean was talking about where all events can all be aggregated into one
> poll loop.
>
> As far as I can tell, the only downside to that is we loose the
> per-device permissions on /dev/infiniband/uverbsX, which don't really
> work today anyhow.
>

The point here isn't about unifying rdma-cm and uverbs. Assuming we do 
that, we could still allow a user to bind fd to several devices or just 
to a single device.

>> If not, we could have a simpler approach of having a file either bounded to
>> a device or not bounded to any device. We could still share the IDR with
>> clients on the same fd, but they are all guaranteed to set or use this exact
>> device.
>
> Why is this simpler?
>

It's simpler as you don't need to tie objects in the IDR to an IB device 
(as there's only one option). In addition, you don't have to pass a 
device handle to all commands. You could either have a bounded device 
and then it's clear what you're using or you didn't bind any device and 
then all device specific commands (except init_ucontext) fail. Client 
commands (i.e rdma-cm) could either fail or succeed, depending on their 
requirements. It's a closer model to what we have today - uverbs_file 
points to an ib_dev (which might be NULL).

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]                 ` <20160714170051.GB2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-19 15:16                   ` Matan Barak
       [not found]                     ` <b47386f4-fdb5-d24b-dbc6-9b5e18be6bf9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-19 15:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 14/07/2016 20:00, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote:
>
>> I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast,
>> rereg_mr and all the destroy functions.
>
> I don't think we can change those three, I would be too scared that
> would break existing apps that don't serialize them.
>
> I think we need to change the kernel locking for those three cases to
> rely on a different lock so that the modify style operations can not
> return EBUSY.
>
> That seems like it should be simple enough? Can we use the lock in the
> kobj or something? Why is it like that today anyhow?
>

Yeah, I think it'll be safer to use lower level locking (like in kobj).

>>> I also think this should be stand-alone as part of the uobject rework
>>> series.. Again it seems like this benifits any approach.
>>
>> Yeah, but since the new infrastructure is going to do the locks
>> automatically for you, we'll ditch the locks in uverbs_cmd anyway.
>> Therefore, I don't see a real benefit refactoring uverbs_cmd and then
>> ditching all these changes.
>
> We still have to go through a transitional approach here, to me that
> means we should move existing uverbs closer to the new approach
> (particularly in terms of locking and code sharing), trial that in
> real apps, and then add the new api on that tested basis. We can't
> just abandon uverbs, and we probably shouldn't try to do a flag day
> convert either.
>

Well, another option (not saying it's a better one) is to develop the 
new ABI in a separate branch and try real world applications using it 
(with a modified libibverbs of course). This shall give use a feel if 
things are mature enough. The last step could be adding a write->ioctl 
interface. That's qualified to "flag day" conversion in the main branch.

Saying that, I see that advantages of a transitional approach. We need 
to balance between transitional approach and try avoiding write code 
that we know will be dumped anyway.

Lets decide on the development phases here after we have a clear idea on 
how the new ABI is going to look like and we could all work towards this 
approach.

> So I don't know what your plan is to 'ditch the locks' with the new
> infrastucture while maintaining the uverbs api..
>

Using a conversion layer of the old ABI to the new ABI. Move the current 
write locks to lower kobj level as you suggested.

> My preference is to see smaller patch series with clear goals that can
> be applied - and 'rework uverbs locking' seems to fit that bill quite
> well.
>
> For instance, if we are introducing a uobject type scheme like this
> series does then it should be another stand alone series that doesn't
> try and introduce the ioctl stuff..
>

Yeah, changes have a clear smaller goals and won't be changed 
dramatically as the ABI task is evolving should definitely go through a 
different patch-set.

>> No, it's not the same thing. SRCU protects device removal from command
>> execution. close_sem protects context destruction from command
>> execution.
>
> It is the same, device removal requires 'context destruction' as
> a precondition, so it should be able to ride on the same lock.
>
> The 'context destruction' for device removal is a bit special but it
> is still basically the same action as close(fd).
>

rechecked - you're right, both calls ib_uverbs_cleanup_ucontext.

>> You're correct that in the current state of the code, context is only closed
>> when a file is closed or when a device is removed, so it's protected.
>> However, if we want to add a destroy_context action - it becomes necessary.
>
> What would destroy_context do? Why would we need that?
>

If you want to break device != fd, you might want to destroy a device 
and start using the same fd with another device.

>> If we move to a per uverbs_file idr, contexts (either file context or client
>> context), we still need to sync destruction of these contexts with carrying
>> out actions. Since the uverbs_file outlives them, it's reasonable to put the
>> lock there.
>
> Are you proposing that all object destroy will be serialized?
>

When a process/device dies, I think it's reasonable to destroy objects 
in a serial way. I think it's reasonable that when a device is removed, 
you serialize this with handling commands on the same fd. Of course 
destroying unrelated objects could be done in parallel.

>> Currently, as far as I remember, modify isn't protected by the write lock
>> anyway. Besides that, nothing stops the user from syncing the actions.
>
> You can't make that argument when dealing with compat. We can't change
> something like the locking requirements in user space.
>

Of course, but if currently modify_qp uses read_lock - there's no change 
in user space locking requirement.

>>> While holding the lock. But that is a really ugly API. Nothing should be
>>> added to the idr until it is 100% initialized, so we don't need
>>> concurrency protection. What is the write lock doing at that point??
>>
>> We assume the enable function can't fail. Since adding something to idr
>> could fail, we preferred adding it first with live = 0 flag and then
>> enabling it.
>
> Why can't it fail?
>

Because in our v2 (to be sent soon) we only adds it to the type list and 
set the live flag.

>>>  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>>>  [.. full setup ..]
>>>  uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>>>
>>>  resp.handle = uobj->id;
>>>  copy_to_user(..);
>>>
>>>  put_uobj(uobj);
>>>  return ret;
>>>
>>> err1:
>>>  uobj_deactivate(uobj);
>>> err2:
>>>  put_obj(uobj);
>>>
>>
>> I agree with the general schema, but I think idr_add should be done in
>> uobj_alloc as it can fail.
>
> In my version uobj_activate can fail, and we already have to have the
> goto-unwind infrastructure to cope with that (copy_to_user could
> fail), so I'm not sure why avoiding it would be helpful?
>

I think it should be the other way around - copy_to_user is done by the 
handler and uobj_activate is done by the infrastructure. Therefore, the 
general doesn't know how to roll back the command and I want to 
guarantee anything it does after calling the handler succeed.

> I also wonder if the above flow is enough to remove live? That always
> seemed like a strange approach to me, most idr users would just use in-idr
> as 'live' ...
>
>>> I would do the above sort of restructuring as another (mergable, even)
>>> patch, then rip out the locking.
>>
>> This locking is done automatically using the new infrastructure. The goal is
>> to remove all these locking semantics from the user handler.
>> Therefore, I don't think we should refactor uverbs_cmd and then delete all
>> locks from there.
>
> What happens to uverbs_cmd then? As I said above we still need to keep
> it around and it still need to fit with the new infrastructure. I'd
> rather see a patch flow that slowly transforms uverbs_cmd to a point
> where adding the new ioctl becomes a simpler patch.
>

IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
We don't want to have two paths doing almost the same thing in two 
different ways.

> There is obvoisly a lot of in-kernel infrastructure work to do.
>

Yeah, that's why we're still in the RFC phase. I think we need to agree 
on how we are going to do things and get the general picture clear.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                                     ` <a1fcb1a7-0028-329a-d34a-ca8b52323916-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-19 20:36                                       ` Jason Gunthorpe
       [not found]                                         ` <20160719203609.GD16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-19 20:36 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 05:05:46PM +0300, Matan Barak wrote:

> According to the IB spec, a CI could choose either to return an error or to
> move MWs to orphaned MWs (and make sure any access to these MWs will be
> blocked). Some hardwares might implement the first approach and keep
> reference from MRs to MWs in hardware (without having a way to query that in
> software). In these cases, you can't force this unbound (as you don't know
> if they were unbound because of remote invalidate).

Ugh, I don't why that was let in the spec, from an app perspective you
have to assume the annoying behavior...

> >The MW case could be handled specially with a simple priority field.

> So you suggest having a priority field in the uobject and iterate over the
> IDR freeing by order. Seems simple enough.

That is one option.. 

> >>So the flow might be using the QUERY ioctl code and query the
> >>devices/clients. Creating a device/client and get a IDR handle for that. For
> >>every IOCTL code (either device or client commands), you pass this idr
> >>handle as part of the ioctl code specific header.
> >
> >Yes, as I said, we pretty much have to do that already, and it
> >wouldn't be on every ioctl, just certain top level ones like PDs.

> I meant querying the possible devices or clients. Meaning, introduce another
> ioctl code that is used to get all available clients and devices.

Not necessarily, the 'create device' could accept the existing sysfs
device string as the starting point, libibverbs already has that
information.

> The point here isn't about unifying rdma-cm and uverbs. Assuming we do that,
> we could still allow a user to bind fd to several devices or just to a
> single device.

I don't understand this comment.

If we keep /dev/uverbs0 then that has to remain 1:1 with a device
because that is our permissions model.

If we move to /dev/rdma_uapi, then that would support 1:N devices.

I can't see a reason to do both concurrently even though that would be
easy enough technically..

> It's simpler as you don't need to tie objects in the IDR to an IB device (as
> there's only one option). In addition, you don't have to pass a device
> handle to all commands. You could either have a bounded device and then it's
> clear what you're using or you didn't bind any device and then all device
> specific commands (except init_ucontext) fail. Client commands (i.e rdma-cm)
> could either fail or succeed, depending on their requirements. It's a closer
> model to what we have today - uverbs_file points to an ib_dev (which might
> be NULL).

I don't see todays model as actually be very good, and those don't
seem like very strong reasons. An extra argument on a couple of
commands (and PD create is the only one that springs to mind) seems
trivial. libiverbs already keeps track of this.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]                     ` <b47386f4-fdb5-d24b-dbc6-9b5e18be6bf9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-19 20:44                       ` Jason Gunthorpe
       [not found]                         ` <20160719204456.GF16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-19 20:44 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote:
> On 14/07/2016 20:00, Jason Gunthorpe wrote:
> >That seems like it should be simple enough? Can we use the lock in the
> >kobj or something? Why is it like that today anyhow?
> 
> Yeah, I think it'll be safer to use lower level locking (like in kobj).

That is something that could productively be a dedicated patch
series..

> Well, another option (not saying it's a better one) is to develop the new
> ABI in a separate branch and try real world applications using it (with a
> modified libibverbs of course). This shall give use a feel if things are
> mature enough. The last step could be adding a write->ioctl interface.
> That's qualified to "flag day" conversion in the main branch.

My advice is to not do that.

I've observed a low success rate with such huge flag day projects.

Transitional is more work, but it is the historical 'kernel way'

> >So I don't know what your plan is to 'ditch the locks' with the new
> >infrastucture while maintaining the uverbs api..

> Using a conversion layer of the old ABI to the new ABI. Move the current
> write locks to lower kobj level as you suggested.

Hm, that is a long way to go :)

> If you want to break device != fd, you might want to destroy a device and
> start using the same fd with another device.

But that would return EBUSY like all our other destory calls if the
subordinate objects have not been deleted?

> >>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
> >>> [.. full setup ..]
> >>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add
> >>>
> >>> resp.handle = uobj->id;
> >>> copy_to_user(..);
> >>>
> >>> put_uobj(uobj);
> >>> return ret;
> >>>
> >>>err1:
> >>> uobj_deactivate(uobj);
> >>>err2:
> >>> put_obj(uobj);
> >>>
> >>
> >>I agree with the general schema, but I think idr_add should be done in
> >>uobj_alloc as it can fail.
> >
> >In my version uobj_activate can fail, and we already have to have the
> >goto-unwind infrastructure to cope with that (copy_to_user could
> >fail), so I'm not sure why avoiding it would be helpful?
> >
> 
> I think it should be the other way around - copy_to_user is done by the
> handler and uobj_activate is done by the infrastructure. Therefore, the
> general doesn't know how to roll back the command and I want to guarantee
> anything it does after calling the handler succeed.

That just shuffles the requirement around. copy_to_user can always
fail, and it always has to happen after the idr_add - so there must
always be an unwind for it. Which means we can allow uobj_activate to
fail too and use the same basic unwind.

> IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
> We don't want to have two paths doing almost the same thing in two different
> ways.

Right, but that is a long way off...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]                         ` <20160719204456.GF16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-20  7:54                           ` Matan Barak (External)
       [not found]                             ` <5c6e6a5d-9fde-a31f-1038-dce9e09662c4-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak (External) @ 2016-07-20  7:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 19/07/2016 23:44, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote:
>> On 14/07/2016 20:00, Jason Gunthorpe wrote:
>>> That seems like it should be simple enough? Can we use the lock in the
>>> kobj or something? Why is it like that today anyhow?
>>
>> Yeah, I think it'll be safer to use lower level locking (like in kobj).
>
> That is something that could productively be a dedicated patch
> series..
>
>> Well, another option (not saying it's a better one) is to develop the new
>> ABI in a separate branch and try real world applications using it (with a
>> modified libibverbs of course). This shall give use a feel if things are
>> mature enough. The last step could be adding a write->ioctl interface.
>> That's qualified to "flag day" conversion in the main branch.
>
> My advice is to not do that.
>
> I've observed a low success rate with such huge flag day projects.
>
> Transitional is more work, but it is the historical 'kernel way'
>

Yeah, I'm aware of that.

>>> So I don't know what your plan is to 'ditch the locks' with the new
>>> infrastucture while maintaining the uverbs api..
>
>> Using a conversion layer of the old ABI to the new ABI. Move the current
>> write locks to lower kobj level as you suggested.
>
> Hm, that is a long way to go :)
>
>> If you want to break device != fd, you might want to destroy a device and
>> start using the same fd with another device.
>
> But that would return EBUSY like all our other destory calls if the
> subordinate objects have not been deleted?
>

Not sure -EBUSY, but an error of course. I think this should be done in 
a lower layer. I would like to avoid managing dependencies between 
object in the general infrastructure. It's the same as you can't destroy 
a PD when there's a bounded CQ (you should get some error for that).

>>>>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>>>>> [.. full setup ..]
>>>>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>>>>>
>>>>> resp.handle = uobj->id;
>>>>> copy_to_user(..);
>>>>>
>>>>> put_uobj(uobj);
>>>>> return ret;
>>>>>
>>>>> err1:
>>>>> uobj_deactivate(uobj);
>>>>> err2:
>>>>> put_obj(uobj);
>>>>>
>>>>
>>>> I agree with the general schema, but I think idr_add should be done in
>>>> uobj_alloc as it can fail.
>>>
>>> In my version uobj_activate can fail, and we already have to have the
>>> goto-unwind infrastructure to cope with that (copy_to_user could
>>> fail), so I'm not sure why avoiding it would be helpful?
>>>
>>
>> I think it should be the other way around - copy_to_user is done by the
>> handler and uobj_activate is done by the infrastructure. Therefore, the
>> general doesn't know how to roll back the command and I want to guarantee
>> anything it does after calling the handler succeed.
>
> That just shuffles the requirement around. copy_to_user can always
> fail, and it always has to happen after the idr_add - so there must
> always be an unwind for it. Which means we can allow uobj_activate to
> fail too and use the same basic unwind.
>

[Infra]
validate params
allocate uobjects and IDRs
		[handler]
			copy_from_user
			execute
			if (success) {
				copy_to_user
				return success
			} else {
				return failure
			}

[infra contiune]
if success
	enable_new_uobjs
	unlock_uobjs
	destroy_what_has_to_be_destroyed
else
	delete_new_uobjs
	unlock_uobjs


In that way, the infrastructure doesn't need to roll back commands. Only 
to roll back whatever it executed.

>> IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
>> We don't want to have two paths doing almost the same thing in two different
>> ways.
>
> Right, but that is a long way off...
>
> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                                         ` <20160719203609.GD16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-07-20  8:01                                           ` Matan Barak
       [not found]                                             ` <9727f6e6-47fd-fb0a-537f-6264e8d742a9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Matan Barak @ 2016-07-20  8:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On 19/07/2016 23:36, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 05:05:46PM +0300, Matan Barak wrote:
>
>> According to the IB spec, a CI could choose either to return an error or to
>> move MWs to orphaned MWs (and make sure any access to these MWs will be
>> blocked). Some hardwares might implement the first approach and keep
>> reference from MRs to MWs in hardware (without having a way to query that in
>> software). In these cases, you can't force this unbound (as you don't know
>> if they were unbound because of remote invalidate).
>
> Ugh, I don't why that was let in the spec, from an app perspective you
> have to assume the annoying behavior...
>

I agree.

>>> The MW case could be handled specially with a simple priority field.
>
>> So you suggest having a priority field in the uobject and iterate over the
>> IDR freeing by order. Seems simple enough.
>
> That is one option..
>
>>>> So the flow might be using the QUERY ioctl code and query the
>>>> devices/clients. Creating a device/client and get a IDR handle for that. For
>>>> every IOCTL code (either device or client commands), you pass this idr
>>>> handle as part of the ioctl code specific header.
>>>
>>> Yes, as I said, we pretty much have to do that already, and it
>>> wouldn't be on every ioctl, just certain top level ones like PDs.
>
>> I meant querying the possible devices or clients. Meaning, introduce another
>> ioctl code that is used to get all available clients and devices.
>
> Not necessarily, the 'create device' could accept the existing sysfs
> device string as the starting point, libibverbs already has that
> information.
>

But how would you know if rdma-cm is loaded and available? Don't we want 
to have a decent query interface rather than iterating through sysfs?
I agree that it could be lower priority than the rest.

>> The point here isn't about unifying rdma-cm and uverbs. Assuming we do that,
>> we could still allow a user to bind fd to several devices or just to a
>> single device.
>
> I don't understand this comment.
>
> If we keep /dev/uverbs0 then that has to remain 1:1 with a device
> because that is our permissions model.
>
> If we move to /dev/rdma_uapi, then that would support 1:N devices.
>
> I can't see a reason to do both concurrently even though that would be
> easy enough technically..
>

I meant moving to /dev/rdma_uapi, but when you fopen the device and get 
a fd, you could either have fd->NULL or fd->single_rdma_dev mapping.
It's a little bit simpler, however, not that much from just doing 1:N 
mapping.

>> It's simpler as you don't need to tie objects in the IDR to an IB device (as
>> there's only one option). In addition, you don't have to pass a device
>> handle to all commands. You could either have a bounded device and then it's
>> clear what you're using or you didn't bind any device and then all device
>> specific commands (except init_ucontext) fail. Client commands (i.e rdma-cm)
>> could either fail or succeed, depending on their requirements. It's a closer
>> model to what we have today - uverbs_file points to an ib_dev (which might
>> be NULL).
>
> I don't see todays model as actually be very good, and those don't
> seem like very strong reasons. An extra argument on a couple of
> commands (and PD create is the only one that springs to mind) seems
> trivial. libiverbs already keeps track of this.
>

I agree that if we move to /dev/rdma_uapi file, the gap from doing 1:N 
mapping isn't large.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 4/8] RDMA/core: Add support for custom types
       [not found]                                             ` <9727f6e6-47fd-fb0a-537f-6264e8d742a9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20 17:49                                               ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:49 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Wed, Jul 20, 2016 at 11:01:18AM +0300, Matan Barak wrote:

> But how would you know if rdma-cm is loaded and available? Don't we want to
> have a decent query interface rather than iterating through sysfs?
> I agree that it could be lower priority than the rest.

1) 'loaded available' I think you will agree we have a fairly bad user
   experience when it comes to autoloading modules. If we have one fd,
   and the core of that FD is part of the ib_core module that is
   pulled in on driver load then the a request for an API segment (eg uverbs,
   rdma_cm) can now actually trigger a module load, just like net does
   for PF's. So we actually get to a better, saner, place.

2) Yes, an in-band query interface makes much more sense than
   schlepping around sysfs. That could be done in the ioctl, or maybe
   via netlink, but as you say, lower priority.

> I meant moving to /dev/rdma_uapi, but when you fopen the device and get a
> fd, you could either have fd->NULL or fd->single_rdma_dev mapping.
> It's a little bit simpler, however, not that much from just doing 1:N
> mapping.

A major point of the unified interface is to sanely support rdma_cm
when working with multiple devices, so I'd expect 1:N is the only
reasonable option for that interface..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject
       [not found]                             ` <5c6e6a5d-9fde-a31f-1038-dce9e09662c4-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2016-07-20 17:56                               ` Jason Gunthorpe
  0 siblings, 0 replies; 35+ messages in thread
From: Jason Gunthorpe @ 2016-07-20 17:56 UTC (permalink / raw)
  To: Matan Barak (External)
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Sean Hefty,
	Tal Alon, Liran Liss, Haggai Eran, Majd Dibbiny,
	Christoph Lameter, Leon Romanovsky

On Wed, Jul 20, 2016 at 10:54:55AM +0300, Matan Barak (External) wrote:
> Not sure -EBUSY, but an error of course. I think this should be done in a
> lower layer. I would like to avoid managing dependencies between object in
> the general infrastructure. It's the same as you can't destroy a PD when
> there's a bounded CQ (you should get some error for that).

Yes, that is my thinking..
> >>I think it should be the other way around - copy_to_user is done by the
> >>handler and uobj_activate is done by the infrastructure. Therefore, the
> >>general doesn't know how to roll back the command and I want to guarantee
> >>anything it does after calling the handler succeed.
> >
> >That just shuffles the requirement around. copy_to_user can always
> >fail, and it always has to happen after the idr_add - so there must
> >always be an unwind for it. Which means we can allow uobj_activate to
> >fail too and use the same basic unwind.
> >
> 
> [Infra]
> validate params
> allocate uobjects and IDRs
> 		[handler]
> 			copy_from_user
> 			execute
> 			if (success) {
> 				copy_to_user
> 				return success
> 			} else {
> 				return failure
> 			}
> 
> [infra contiune]
> if success
> 	enable_new_uobjs
> 	unlock_uobjs
> 	destroy_what_has_to_be_destroyed
> else
> 	delete_new_uobjs
> 	unlock_uobjs
> 
> 
> In that way, the infrastructure doesn't need to roll back commands. Only to
> roll back whatever it executed.

I'd really prefer to avoid this kind of flow where the idr is
installed at the top. The current code does that and uses 'live' to
fix it, but that is not how idrs are typically used in the kernel, and
getting rid of live is a noble goal, IMHO.

I suggest you use the same basic flow but 'handler' returns a
functional uobj that is *not* inside the idr, and the destroy of a
functional uboj is simply handled by calling the types' destroy
method, like any other case.

To make this work better I suggest having a standard place in the
ioctl response for the allocated uobj # and let the core code fill
that in, that way we don't need to copy_to_user in the handler for the
uobj #.

If idr insertion fails, then call the type's destroy function, just
like on a fd close or otherwise. No trouble for the core code.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-07-20 17:56 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-30 13:39 [RFC ABI V1 0/8] SG-based RDMA ABI Proposal Matan Barak
     [not found] ` <1467293971-25688-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-06-30 13:39   ` [RFC ABI V1 1/8] RDMA/core: Export RDMA IOCTL declarations Matan Barak
     [not found]     ` <1467293971-25688-2-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-12 19:12       ` Jason Gunthorpe
     [not found]         ` <20160712191256.GA8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-12 19:43           ` Leon Romanovsky
2016-06-30 13:39   ` [RFC ABI V1 2/8] RDMA/core: Move uobject code to separate files Matan Barak
     [not found]     ` <1467293971-25688-3-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-12 19:15       ` Jason Gunthorpe
     [not found]         ` <20160712191507.GB8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-12 19:38           ` Leon Romanovsky
2016-07-13 14:34           ` Matan Barak
2016-06-30 13:39   ` [RFC ABI V1 3/8] RDMA/core: Refactor IDR to be per-device Matan Barak
     [not found]     ` <1467293971-25688-4-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-12 19:29       ` Jason Gunthorpe
     [not found]         ` <20160712192933.GD8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-12 19:48           ` Leon Romanovsky
2016-06-30 13:39   ` [RFC ABI V1 4/8] RDMA/core: Add support for custom types Matan Barak
     [not found]     ` <1467293971-25688-5-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-12 19:23       ` Jason Gunthorpe
     [not found]         ` <20160712192345.GC8206-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-13 14:44           ` Matan Barak
     [not found]             ` <081a02c0-0650-d0c2-494c-19a64b83cbc1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-13 16:39               ` Jason Gunthorpe
     [not found]                 ` <20160713163924.GA19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-13 16:57                   ` Matan Barak
     [not found]                     ` <0959d391-75fb-75e8-ef2e-9d8c06b1b96f-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-13 17:21                       ` Jason Gunthorpe
     [not found]                         ` <20160713172116.GC19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-14  5:30                           ` Matan Barak
     [not found]                             ` <5fdae959-5ab2-ee17-e36e-3642ddd3e6ce-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-14 16:41                               ` Jason Gunthorpe
     [not found]                                 ` <20160714164121.GA2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-19 14:05                                   ` Matan Barak
     [not found]                                     ` <a1fcb1a7-0028-329a-d34a-ca8b52323916-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-19 20:36                                       ` Jason Gunthorpe
     [not found]                                         ` <20160719203609.GD16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-20  8:01                                           ` Matan Barak
     [not found]                                             ` <9727f6e6-47fd-fb0a-537f-6264e8d742a9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20 17:49                                               ` Jason Gunthorpe
2016-07-14  6:30                           ` Leon Romanovsky
2016-06-30 13:39   ` [RFC ABI V1 5/8] RDMA/core: Introduce add/remove uobj from types Matan Barak
2016-06-30 13:39   ` [RFC ABI V1 6/8] RDMA/core: Add new ioctl interface Matan Barak
2016-06-30 13:39   ` [RFC ABI V1 7/8] RDMA/core: Change locking of ib_uobject Matan Barak
     [not found]     ` <1467293971-25688-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-13 17:16       ` Jason Gunthorpe
     [not found]         ` <20160713171657.GB19657-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-14  7:59           ` Matan Barak
     [not found]             ` <6ae1a553-408c-723a-93a2-3d46d952ce35-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-14 17:00               ` Jason Gunthorpe
     [not found]                 ` <20160714170051.GB2046-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-19 15:16                   ` Matan Barak
     [not found]                     ` <b47386f4-fdb5-d24b-dbc6-9b5e18be6bf9-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-19 20:44                       ` Jason Gunthorpe
     [not found]                         ` <20160719204456.GF16042-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-07-20  7:54                           ` Matan Barak (External)
     [not found]                             ` <5c6e6a5d-9fde-a31f-1038-dce9e09662c4-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2016-07-20 17:56                               ` Jason Gunthorpe
2016-06-30 13:39   ` [RFC ABI V1 8/8] RDMA/{hw, core}: Provide simple skeleton to IOCTL interface Matan Barak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.